Previous: Results
Main page
Next: Analysis and conclusions
Random data

How can these graphs be interpreted? Exactly what can be inferred from them? It is a circular problem: the accuracy of the representation of the map cannot be checked against the underlying conceptual structure of the subject, because the underlying conceptual structure (as modified through the students' learning experiences) is not known with certainty. It is useful to generate dummy data with known properties and see how well competency mapping performs. This exercise will also shed light on the best ways to design assessment for use as input to competency mapping.

In order to generate this dummy data, it was first necessary to design a mathematical model for assessment marks.
A model for competencies

There are two entities that we must model: students and questions. Each question will test one or more competencies, and each competency will contribute a proportion of the total mark for the question between 0 (completely irrelevant) and 1 (marks for that question depend exclusively on this competency). Furthermore, each student will have a certain degree of mastery of each competency, between 0 (no idea at all) and 1 (complete mastery).

Assume that the domain comprises N competencies (roughly equivalent to the clusters on the map). Assume, optimistically, that the clusters are independent and disjoint: they do not overlap. The set of capabilities of each student can then be represented as a vector S of numbers between 0 and 1, where S[i] represents the proportion of competency i that the student has mastered. Assessment tasks can be modelled as a stochastic vector Q of numbers between 0 and 1 (that is, a vector whose entries sum to 1), where Q[i] represents the contribution of competency i to the question. Now, the probability of a given student getting a given question right is the dot product of the student's capability vector S and the question's vector Q. To put it another way, the expected mark is (Q . S)M, where M is the total marks available for the question.

Note that, under this model, assessment is an attempt to infer each student's S vector by getting the student to sit a number of tasks with (it is hoped!) known Q vectors. The aim of teaching, of course, is to get all the S[i] as close to 1 as possible; more realistically, to get the S[i] over some predetermined minimum value. This project is an attempt to infer the Q vectors.

Of course, this model is an oversimplification. It is unlikely that real domains have independent disjoint competencies. However, it is good enough to be used as a basis for analysis: it is simple, and not hard to calculate.
This model may have uses outside competency mapping, and can provide a basis for any competency-based analysis of student results.
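As an illustration of the model, the expected mark (Q . S)M can be computed directly. The following is a minimal sketch, not the script from Appendix IV: the function name `expected_mark` and the example vectors are assumptions made here for illustration.

```python
import numpy as np

def expected_mark(q, s, total_marks):
    """Expected mark (Q . S) * M for one student on one question.

    q: question vector, non-negative entries summing to 1 (stochastic).
    s: student capability vector, entries between 0 and 1.
    total_marks: M, the marks available for the question.
    """
    q = np.asarray(q, dtype=float)
    s = np.asarray(s, dtype=float)
    if q.shape != s.shape:
        raise ValueError("q and s must have one entry per competency")
    if np.any(q < 0) or not np.isclose(q.sum(), 1.0):
        raise ValueError("q must be non-negative and sum to 1")
    if np.any(s < 0) or np.any(s > 1):
        raise ValueError("entries of s must lie between 0 and 1")
    return float(np.dot(q, s) * total_marks)

# Hypothetical example with N = 3 competencies: the question draws
# mostly on the first competency, and the student is weakest on the second.
q = [0.5, 0.25, 0.25]
s = [0.8, 0.4, 1.0]
mark = expected_mark(q, s, total_marks=10)
```

Because Q sums to 1 and every S[i] is at most 1, the dot product never exceeds 1, so the expected mark never exceeds M.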
The random datasets

Data was generated according to several different models:
The data was generated using Python scripts, three of which are attached at Appendix IV. Student competency vectors were generated according to a normal distribution, with identical mean and standard deviation for each student and competency. Note that this does not imply that the student competency vectors are identical: only that they were randomly generated by the same function. This simple model does not take into account the possibility of a mixed student population; nor does it take into account the intuition that some topics are inherently harder than others. Because this project is seeking to draw inferences about the subject matter of the course rather than about the student population, this was felt to be adequate.

Fourteen datasets were randomly generated, and the following were chosen for inclusion in this thesis:
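The generation procedure described above might look something like the following sketch. It is not the code from Appendix IV. The mean and standard deviation, the clipping of values to the range 0 to 1, the one-competency-per-question layout, and all function names are assumptions made here for illustration.

```python
import numpy as np

def make_students(n_students, n_competencies, mean, sd, rng):
    """Draw every S[i] from the same normal distribution.

    Assumption: values outside [0, 1] are clipped, since the model
    requires each S[i] to lie between 0 and 1.
    """
    raw = rng.normal(mean, sd, size=(n_students, n_competencies))
    return np.clip(raw, 0.0, 1.0)

def make_orthogonal_questions(n_questions, n_competencies):
    """Each question assesses exactly one competency (Q is a unit vector)."""
    q = np.zeros((n_questions, n_competencies))
    rows = np.arange(n_questions)
    q[rows, rows % n_competencies] = 1.0
    return q

def simulate_marks(students, questions, rng):
    """Mark each answer right (1) or wrong (0) with probability Q . S."""
    p_correct = students @ questions.T  # shape: (students, questions)
    return (rng.random(p_correct.shape) < p_correct).astype(int)

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
students = make_students(50, 4, mean=0.6, sd=0.2, rng=rng)
questions = make_orthogonal_questions(12, 4)
marks = simulate_marks(students, questions, rng)
```

The resulting `marks` array has one row per student and one column per question, which is the shape of input that a correlation-based analysis of questions would need.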
Performance

Competency mapping performed predictably well on the orthogonal datasets. In these datasets, each question assessed only one competency. As can be seen from the example, they produced tight, well-separated clusters that are easy to distinguish. They do not resemble the results of competency mapping on actual student data, but this is unsurprising: student assessment is far from orthogonal. However, it does imply that questions that are as close as practical to orthogonal might produce competency maps that are easier to read and interpret.

The mixed datasets produced the plots that looked most like those obtained from real data. Competency mapping did a good job of clustering the questions according to the strongest base competency, but the clusters tend to overlap more than the clusters drawn from real student data do. This is at least partly due to the higher stresses in the multidimensional scaling: when an 80-question mixed dataset with eight base competencies was plotted, the stress in the scaling was over 0.4. In comparison, the stresses for multidimensional scaling in the real student data were in the neighbourhood of 0.1. The reason for this poor fit is not known at this stage. The example shown had a stress value of roughly 0.18.

The competency map for the orthomodal dataset with both modal competencies set to 0.25 looks very similar to the map for the orthogonal dataset. The main difference is that the clusters are less tight. However, when one modal competency is set at 0.75 (modelling a mode in which delivery and interface issues dominate student performance) and the other remains at 0.25, the results are quite different. While the tasks for the lighter mode still cluster by base competency, the tasks for the heavier mode cluster together most strongly. These "mode A" tasks, which were assigned to cluster 5 in the example figure, form a single, relatively diffuse cluster.
This resembles the clustering of prac tasks in the real data, which opens up the possibility that issues associated with the delivery of pracs, rather than familiarity with essential concepts or mastery of basic skills, may be the dominating factor in determining prac marks. Other interpretations of these data are possible, of course. For example, it is possible that the pracs are measuring programming ability, but that the exam is not: in other words, the ability to program may be acting as a modal competency in that it is a determiner of marks in one mode but not in the other. But the exam contained programming questions as well as theory questions! If this is the reason for the divergence in correlations, then it must take a substantially different skill set to write a good program under exam conditions rather than in a lab class.
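The stress values quoted above measure how badly the distances on the two-dimensional map fit the dissimilarities being scaled. The text does not say which stress formula was used. The sketch below assumes Kruskal's stress-1 with the raw dissimilarities used as the target distances, and the function name `stress1` is hypothetical.

```python
import numpy as np

def stress1(dissimilarities, coords):
    """Kruskal stress-1 of a configuration (assumed formula).

    dissimilarities: symmetric (n, n) matrix of target distances.
    coords: (n, k) array of point positions on the map.
    Returns sqrt(sum((d_fit - d_target)^2) / sum(d_fit^2)) over all pairs.
    """
    d_target = np.asarray(dissimilarities, dtype=float)
    x = np.asarray(coords, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    d_fit = np.sqrt((diff ** 2).sum(axis=-1))  # distances on the map
    pairs = np.triu_indices(len(x), k=1)       # count each pair once
    num = ((d_fit[pairs] - d_target[pairs]) ** 2).sum()
    den = (d_fit[pairs] ** 2).sum()
    return float(np.sqrt(num / den))

# Three points forming a 3-4-5 right triangle.
target = np.array([[0.0, 3.0, 4.0],
                   [3.0, 0.0, 5.0],
                   [4.0, 5.0, 0.0]])
perfect = stress1(target, [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
squashed = stress1(target, [[0.0], [3.0], [0.0]])  # forced onto one axis
```

A configuration that reproduces every dissimilarity exactly has a stress of 0. Forcing the same points into too few dimensions, as in `squashed`, raises the stress.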