Assignments as influential factor to improve the prediction of student performance in online courses

This website contains additional material for the paper “Assignments as influential factor to improve the prediction of student performance in online courses”.

Abstract

Studies on the prediction of student success in distance learning have mainly explored demographic factors and student interactions with virtual learning environments. However, remarkably few studies use information about the assignments submitted by students as an influential factor for predicting student achievement. This paper aims to explore the real importance of assignment information for predicting student performance in distance learning and to evaluate the beneficial effect of including this information. We investigated and compared the use of this factor with both a traditional representation and a more flexible representation based on Multiple Instance Learning (MIL), which can handle weakly labeled data. A comparative study using the Open University Learning Analytics Dataset, from one of the most important online universities in the United Kingdom, and a wide range of machine learning algorithms shows that algorithms using only assignment information with a MIL-based representation can improve accuracy by more than 20% with respect to a representation based on single instance learning. Thus, using an appropriate representation that eliminates the sparseness of the data reveals the relevance of this factor. Moreover, algorithms using only assignment information as the impact factor obtain better results than those reported by previous studies that address the same problem using other factors.

Datasets

All the datasets used in this work come from the Open University Learning Analytics Dataset (OULAD), but have been processed in order to adapt them to this study. The new structure of the database from which datasets have been generated is the following.

There are two approaches, multiple instance and single instance. For each one, the information has been split into the 7 courses existing in the dataset, merging all their presentations. This gives 14 datasets in ARFF format that summarize the information about the assignments submitted by the students in each course under the two approaches.

Multiple Instance Datasets
Simple Instance Datasets  

Multiple Instance Datasets

Dataset                           Course  #Bags  #Positive bags  #Negative bags  Avg instances per bag  #Instances  #Attributes
mil-assessment-binary-nomiss-AAA  AAA       705             530             175                   4.47        3149            5
mil-assessment-binary-nomiss-BBB  BBB      6077            3754            2323                   7.08       43032            5
mil-assessment-binary-nomiss-CCC  CCC      3413            1677            1736                   4.98       17025            5
mil-assessment-binary-nomiss-DDD  DDD      4940            2607            2333                   5.63       27820            5
mil-assessment-binary-nomiss-EEE  EEE      2298            1649             649                   3.43        7893            5
mil-assessment-binary-nomiss-FFF  FFF      6294            3648            2646                   8.71       54815            5
mil-assessment-binary-nomiss-GGG  GGG      2112            1514             598                   7.21       15219            5

Simple Instance Datasets

Dataset                              Course  #Instances  #Positive instances  #Negative instances  #Attributes
simple-assessment-binary-nomiss-AAA  AAA            705                  530                  175           51
simple-assessment-binary-nomiss-BBB  BBB           6077                 3754                 2323          191
simple-assessment-binary-nomiss-CCC  CCC           3413                 1677                 1736           81
simple-assessment-binary-nomiss-DDD  DDD           4940                 2607                 2333          156
simple-assessment-binary-nomiss-EEE  EEE           2298                 1649                  649           61
simple-assessment-binary-nomiss-FFF  FFF           6294                 3648                 2646          241
simple-assessment-binary-nomiss-GGG  GGG           2112                 1514                  598          136

Experimental study

This section presents a more extended study to support the results shown in the paper. First, the study of the configuration of the multi-instance proposals is described. Then, the complete experimentation of the comparative study between the two multi-instance proposals and the simple instance approach is presented.

Study of the configuration of multi-instance proposals

The objective of this study is to evaluate the different alternatives that the wrapper methods offer for carrying out the transformation and to determine which one works better in this problem. For SimpleMI, given the characteristics of the data, two different methods for transforming the problem are studied:

  • Configuration 1: computing the arithmetic mean of each attribute using all instances of the bag and using it in the summarized instance.
  • Configuration 2: computing the geometric mean of each attribute using all instances of the bag and using it in the summarized instance.
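
The two summarization options above can be illustrated with a minimal Python sketch. It is not the code used in the experiments: the function name `summarize_bag` is hypothetical, the geometric mean is the textbook definition and assumes strictly positive attribute values, and the wrapper implementation used in the study may define its second option differently.

```python
import math

def summarize_bag(bag, method="arithmetic"):
    """Collapse a bag (list of equal-length numeric instances) into one instance.

    Hypothetical helper for illustration only.
    """
    columns = list(zip(*bag))  # one tuple of values per attribute
    if method == "arithmetic":
        return [sum(col) / len(col) for col in columns]
    if method == "geometric":
        # Textbook geometric mean, computed in log space.
        # Assumption: every attribute value is strictly positive.
        return [math.exp(sum(math.log(v) for v in col) / len(col)) for col in columns]
    raise ValueError(f"unknown method: {method}")

# A toy bag with two instances and two attributes.
toy_bag = [[1.0, 4.0], [4.0, 16.0]]
arithmetic_summary = summarize_bag(toy_bag, "arithmetic")
geometric_summary = summarize_bag(toy_bag, "geometric")
```

Either way, each bag becomes a single instance, so any single-instance classifier can then be trained on the summarized data.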

For MIWrapper, three different methods for transforming the problem are studied:

  • Configuration 1: computing the arithmetic average of the class probabilities of all the individual instances of the bag.
  • Configuration 2: computing the geometric average of the class probabilities of all the individual instances of the bag.
  • Configuration 3: checking the maximum probability of single positive instances. If there is at least one instance with its positive probability greater than 0.5, the entire bag is positive.
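
The three ways of combining instance-level predictions into a bag label can be sketched as follows. This is an illustrative sketch under stated assumptions, not the wrapper's actual code: the function name `bag_is_positive` is hypothetical, the problem is binary, each instance contributes only its positive-class probability, and the averaged configurations label the bag with the class whose combined probability is higher.

```python
import math

def bag_is_positive(instance_probs, method="arithmetic", eps=1e-12):
    """Decide a bag label from the P(positive) of each of its instances.

    Hypothetical helper for illustration only.
    """
    if method == "arithmetic":
        # The mean positive probability exceeds the mean negative one
        # exactly when it is above 0.5.
        return sum(instance_probs) / len(instance_probs) > 0.5
    if method == "geometric":
        # Compare the geometric means of both classes in log space.
        # Both sums share the same 1/n factor, so it can be dropped.
        # eps guards against log(0).
        log_pos = sum(math.log(max(p, eps)) for p in instance_probs)
        log_neg = sum(math.log(max(1.0 - p, eps)) for p in instance_probs)
        return log_pos > log_neg
    if method == "max":
        # One confident positive instance is enough to label the bag positive.
        return max(instance_probs) > 0.5
    raise ValueError(f"unknown method: {method}")

# One strongly positive instance among mostly negative ones.
probs = [0.9, 0.2, 0.2]
labels = {m: bag_is_positive(probs, m) for m in ("arithmetic", "geometric", "max")}
```

In this toy bag only the max rule predicts a positive bag, which shows how a single instance can dominate the third configuration.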

The problem focuses on predicting whether a student will pass a course or, by contrast, fail or drop out of it. Thus, the performance of every studied algorithm is measured in terms of binary classification. Specifically, for the configuration of the proposed wrappers we focus on classification accuracy, since the classes in the different datasets are fairly balanced. The experimentation consists of a 10-fold stratified cross-validation for every combination of wrapper configuration, algorithm and course. The full data can be downloaded here:

Results of the wrappers configurations
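
As an illustration of the stratified 10-fold cross-validation mentioned above, the following sketch assigns instance indices to folds so that each fold keeps roughly the class proportions of the whole dataset. The function name `stratified_folds` and the seed are hypothetical, and the experimentation framework may build its folds differently.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Split instance indices into k folds with similar class proportions.

    Hypothetical helper for illustration only.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across classes to balance fold sizes
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)
    return folds

# Class counts of course AAA from the dataset tables: 530 positive, 175 negative.
aaa_labels = [1] * 530 + [0] * 175
aaa_folds = stratified_folds(aaa_labels, k=10)
positives_per_fold = [sum(aaa_labels[i] for i in fold) for fold in aaa_folds]
```

Each fold is used once for testing while the remaining nine are used for training, and the accuracy is averaged over the ten runs.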

Using the average accuracy of the cross-validation, a statistical analysis is carried out to find significant differences between configurations. The following table contains the results of the Wilcoxon signed-rank test on accuracy between the configurations of each wrapper: R+ is the sum of ranks in which the first configuration outperforms the second one, R- is the sum of ranks in which the second configuration outperforms the first one, and the p-value is the two-tailed probability used to decide whether the two configurations differ significantly. The last column shows the conclusions obtained at a 99% confidence level (significance level alpha = 0.01), since in all cases the p-value is smaller than 0.01. For SimpleMI, the test confirms that configuration 1 has significantly higher accuracy than configuration 2. Thus, in this problem it is better to summarize the bag with the arithmetic mean. For MIWrapper, the tests show that the third configuration is by far the worst option, while configurations 1 and 2 perform more evenly, although configuration 1 ultimately obtains significantly higher accuracy. That is, it is better to use the arithmetic mean to combine the class probabilities of the instances into the final class of the bag.

Wrapper    Comparison                   R+      R-   p-value  Conclusions at 99% level of confidence
SimpleMI   Config. 1 vs Config. 2  12740.5   300.5  4.78e-25  Config. 1 improves Config. 2
MIWrapper  Config. 1 vs Config. 2   8274.0  4606.0   5.19e-4  Config. 1 improves Config. 2
MIWrapper  Config. 1 vs Config. 3  12846.0   195.0  2.32e-25  Config. 1 improves Config. 3
MIWrapper  Config. 2 vs Config. 3  12847.0   194.0  2.28e-25  Config. 2 improves Config. 3
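
The rank sums R+ and R- reported in the table can be computed along the following lines. This is a minimal sketch of the standard procedure, assuming that zero differences are discarded and tied absolute differences share their average rank; the statistical tool used for the table may treat ties differently. The function name `signed_rank_sums` and the toy accuracies are hypothetical, and the p-value computation is left out.

```python
def signed_rank_sums(first, second):
    """Return (R+, R-) of the Wilcoxon signed-rank test for paired results.

    Hypothetical helper for illustration only.
    """
    # Assumption: pairs with identical results are discarded.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))

    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the absolute differences are tied.
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1

    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return r_plus, r_minus

# Toy paired accuracies for two configurations (not taken from the study).
config_a = [0.80, 0.75, 0.90, 0.60]
config_b = [0.70, 0.76, 0.85, 0.60]
r_plus, r_minus = signed_rank_sums(config_a, config_b)
```

With n retained pairs, R+ + R- equals n(n+1)/2. In the SimpleMI row, 12740.5 + 300.5 = 13041 = 161 × 162 / 2, which is consistent with one paired result per combination of the 23 algorithms and 7 courses.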

Complete experimental results

The experimental study consists of the cross-validation of 23 algorithms over the 7 courses using the 3 different data representations: traditional single-instance and the two MIL wrappers. The cross-validation has 10 folds. Moreover, several algorithms are not deterministic but stochastic, so their experiments have been repeated 5 times with different seeds. Specifically, 13 algorithms are in this situation. Thus, the final number of experiments is 7 × 3 × 10 × (10 + 13 × 5) = 15750.

Paradigm                    Algorithm                     Stochastic?
Methods based on trees      DecisionStump                 No
                            J48                           No
                            RandomTree                    Yes
                            RandomForest                  Yes
Methods based on rules      ZeroR                         No
                            OneR                          No
                            NNge                          No
                            PART                          No
                            Ridor                         Yes
                            Naive Bayes                   No
                            Logistic                      No
Methods based on SVM        LibSVM                        No
                            SPegasos                      No
                            SGD                           Yes
                            SMO                           Yes
Methods based on ANN        RBFNetwork                    Yes
                            Multilayer Perceptron         Yes
Methods based on ensembles  AdaBoostM1 with RandomForest  Yes
                            AdaBoostM1 with PART          Yes
                            AdaBoostM1 with NaiveBayes    Yes
                            Bagging with RandomForest     Yes
                            Bagging with PART             Yes
                            Bagging with NaiveBayes       Yes
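
The number of experiments stated above can be checked against the table with a short sketch. The algorithm list and stochastic flags are transcribed from the table; the variable and function names are hypothetical.

```python
# (algorithm, is_stochastic) pairs transcribed from the table above.
ALGORITHMS = [
    ("DecisionStump", False), ("J48", False),
    ("RandomTree", True), ("RandomForest", True),
    ("ZeroR", False), ("OneR", False), ("NNge", False), ("PART", False),
    ("Ridor", True), ("Naive Bayes", False), ("Logistic", False),
    ("LibSVM", False), ("SPegasos", False), ("SGD", True), ("SMO", True),
    ("RBFNetwork", True), ("Multilayer Perceptron", True),
    ("AdaBoostM1 with RandomForest", True), ("AdaBoostM1 with PART", True),
    ("AdaBoostM1 with NaiveBayes", True), ("Bagging with RandomForest", True),
    ("Bagging with PART", True), ("Bagging with NaiveBayes", True),
]

def count_experiments(algorithms, courses=7, representations=3, folds=10, seeds=5):
    """Count runs: deterministic algorithms run once, stochastic ones once per seed."""
    stochastic = sum(1 for _, is_stochastic in algorithms if is_stochastic)
    deterministic = len(algorithms) - stochastic
    runs_per_fold = deterministic + stochastic * seeds
    return courses * representations * folds * runs_per_fold

total_experiments = count_experiments(ALGORITHMS)
```

The table yields 10 deterministic and 13 stochastic algorithms, so the count is 7 × 3 × 10 × 75 = 15750.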

The full reports obtained for every combination can be downloaded below. The results are in CSV format, following the Weka style, since this is the framework used for the experimentation. The reports are grouped by representation. Inside each download there is one CSV file per course with all the executions of all the algorithms and multiple performance metrics, such as accuracy, specificity and sensitivity, among others. Moreover, for each course there is another file that summarizes the metrics used in the paper.

Results of traditional representation
Results of SimpleMI
Results of MIWrappers
