Previous: Method and results
Main page
Next: Method
Datasets

Students sitting CSE1301 in 2001 were required to undertake 14 distinct summative assessment tasks:
Choice of activities

Activities were chosen for inclusion in the study on the basis of availability. The aim was to maximise the number of activities included, taking ease of availability and likely usefulness into account.

The prac and bonus marks were easy to obtain, as they were entered into a database at the time of marking. The exam and the mid-semester test were delivered on paper. Although it would have been easy to acquire an electronic list of total marks, it is impossible to infer much conceptual structure from a single numeric mark: the finer the granularity of measurement, the better the chances of being able to infer conceptual structure. Therefore, the physical exam papers were acquired, and the page totals were transcribed from the front of each exam. This was a time-consuming task, so it was decided not to repeat the process with the mid-semester test; the test results are therefore not included in this study. This still left 12 prac marks, 6 prac bonus marks, and 28 exam pages: a total of 46 marks per student, which was felt to be ample.

The prac and exam marks were combined into a single file, with one line per student and one column per activity. At this time, the data was stripped of identifying features in accordance with Ethics Committee guidelines. Consequently, no breakdown by student demographics (for example, gender or ethnicity) was possible using this data. It was, however, possible to break students down according to their final mark.

The activities that were used comprise 12 pracs, ranging from an introduction to using the computer to an advanced assignment; six bonus prac questions, which were not compulsory and were not attempted by many students; and 28 exam pages. These activities are described in more detail in Appendix II.
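The combining and anonymising step described above might look something like the following Python sketch. It is a hypothetical illustration, not the procedure actually used in the study: the student-ID keys, the column names and the use of in-memory dictionaries are all assumptions.

```python
import csv
import io

# Hypothetical column names: 12 pracs, 6 bonus questions, 28 exam pages (46 in all).
PRAC_COLS = [f"prac{i}" for i in range(1, 13)] + [f"bonus{i}" for i in range(1, 7)]
EXAM_COLS = [f"exam_p{i}" for i in range(1, 29)]
HEADER = PRAC_COLS + EXAM_COLS


def combine_marks(prac_marks, exam_marks):
    """Join prac and exam marks on a (hypothetical) student ID.

    prac_marks maps a student ID to 18 marks (12 pracs, then 6 bonus questions);
    exam_marks maps a student ID to 28 exam-page totals. Only students present
    in both sources are kept, and the ID itself is not copied into the output.
    """
    rows = []
    for student_id in sorted(prac_marks.keys() & exam_marks.keys()):
        row = list(prac_marks[student_id]) + list(exam_marks[student_id])
        if len(row) != len(HEADER):
            raise ValueError("unexpected number of marks in a record")
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialise rows as CSV text: one line per student, one column per activity."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()
```

Because the key is used only for the join and then discarded, the resulting file carries no student identifier. Whether the original study dropped students missing from one source in this way is not stated here.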
Data validation

Before analysis began, the data was vetted to ensure that the mark values were sane. This entailed:
After this modification, 346 student records remained in the main dataset, each comprising 46 separate activity marks: a total of 15,916 marks.
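The individual vetting checks are not listed in this section, so the following is only a hypothetical sketch of what a basic sanity check on the combined file could look like. It assumes that each activity has a known maximum mark (the `max_marks` list is an assumption) and flags records with the wrong number of marks, or marks outside the range from 0 to that maximum.

```python
def validate(rows, max_marks):
    """Flag records whose marks are not 'sane'.

    rows: one list of numeric marks per student.
    max_marks: hypothetical maximum mark for each activity column.
    Returns (row_index, column_index, reason) tuples; column_index is None
    when the whole record has the wrong number of marks.
    """
    problems = []
    for r, row in enumerate(rows):
        if len(row) != len(max_marks):
            problems.append((r, None, "wrong number of marks"))
            continue
        for c, (mark, top) in enumerate(zip(row, max_marks)):
            if not 0 <= mark <= top:
                problems.append((r, c, "mark out of range"))
    return problems


def total_marks(rows):
    """Count the individual marks across all records (records x activities)."""
    return sum(len(row) for row in rows)
```

When every record holds 46 marks, `total_marks` is simply 46 times the number of records, which gives a quick cross-check on any reported total.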
Choice of datasets

Several subsets of the data were chosen for analysis. The first dataset chosen for analysis was, of course, the full set ALL: all students and all activities. However, this dataset produced a graph with so many points on it that it was difficult to interpret, so ways were investigated to reduce its visual complexity.

A correlation matrix was calculated for the data, using Pearson's r. A first look at the matrix showed some surprising results: correlation coefficients between exam questions were relatively high across the board, mostly in the neighbourhood of 0.7, whereas correlation coefficients between pracs tended to be around 0.35. Prac bonus questions correlated poorly with almost everything. Examining the raw data showed that very few students had attempted them, and they were eventually removed from most datasets, leaving 40 activities per student. Note that the sets ALL and TMA include the bonus questions.

When the exam paper was analysed, it was found that the first ten pages consisted of multiple-choice questions, four to a page, and that another four pages contained short-answer questions, six to a page. If the questions on each page had been related, this would not have been a problem; however, the questions were heterogeneous. So many factors contribute to each page's mark that cluster analysis is unlikely to be able to separate them, so datasets were generated with totals, rather than page marks, for the multiple-choice and short-answer questions. Replacing those 14 page marks with two totals left 28 unique activities per student. Note that set ALL is the only dataset that includes separate marks for each page of multiple-choice and short-answer questions.
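The two steps described above, computing Pearson's r between activities and reducing each record from 46 marks to 28, can be illustrated with a short Python sketch. This is not the study's own code. The column layout (pracs first, then bonus questions, then exam pages in order) is an assumption, and so is the position of the short-answer pages: the text says only that the multiple-choice questions occupied the first ten pages and that another four pages held short-answer questions.

```python
import math


def pearson(xs, ys):
    """Pearson's r between two equal-length sequences of marks."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return math.nan  # r is undefined when an activity's marks are constant
    return cov / (sx * sy)


def correlation_matrix(rows):
    """Pairwise Pearson's r between activity columns; each row is one student."""
    cols = list(zip(*rows))
    return [[pearson(a, b) for b in cols] for a in cols]


def reduce_record(row, bonus_idx, mcq_idx, sa_idx):
    """Drop the bonus columns and replace the multiple-choice and short-answer
    pages with one total each. The remaining columns keep their original
    order, and the two totals are appended at the end."""
    drop = set(bonus_idx) | set(mcq_idx) | set(sa_idx)
    kept = [mark for i, mark in enumerate(row) if i not in drop]
    return kept + [sum(row[i] for i in mcq_idx), sum(row[i] for i in sa_idx)]


# Hypothetical column layout of a 46-mark record (0-based indices).
BONUS = range(12, 18)  # six bonus prac questions after the 12 pracs
MCQ = range(18, 28)    # exam pages 1-10: multiple choice
SHORT = range(28, 32)  # assumed: exam pages 11-14 hold the short answers
```

Dropping the six bonus columns gives 40 activities; replacing the 14 multiple-choice and short-answer pages with two totals gives 40 - 14 + 2 = 28. A column in which almost every student scored zero, as with the rarely attempted bonus questions, has little variance, so its r values depend on a handful of students; a fully constant column makes r undefined, which the sketch reports as NaN.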