Method
It had originally been planned to use William Trochim's Concept System software but this turned out not to be possible. Concept System is a dedicated system for Trochim's own version of concept mapping and will not accept data at the correlation matrix stage. It would have been possible to write a dedicated competency mapping system, but then there would not have been time to analyse the data in any depth. It was therefore decided to use SPSS for factor analysis, multidimensional scaling and cluster analysis.
The raw numeric marks were converted to percentages before the analysis to ensure that they were commensurate. Factor analysis was performed on this normalized dataset to try to determine the appropriate number of clusters to use. In factor analysis, components of variation are successively extracted in such a way as to maximize the amount of variance captured at each step, until all the variance in the original data has been accounted for. The amount of variance extracted at each step is called the eigenvalue, due to its method of computation.
When the eigenvalue extracted at each step is plotted, the resulting graph is called a scree plot. It is usually shaped like a cliff with a scree slope at the bottom: an initially sharp descending gradient, as the most important factors are extracted; followed by a gentler gradient, corresponding to minor factors; and finally the graph flattens out completely as the point of diminishing returns is reached. Further factors contribute little to the model, and may be considered to be artifacts.
The "true" number of factors may be estimated in two ways: the Kaiser criterion, which counts factors with eigenvalues greater than 1; or the scree test, which tries to find the point of diminishing returns on a scree plot. Both of these tests only give a rule of thumb: the Kaiser test tends to overestimate, while the scree test tends to underestimate the true number of factors.
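The eigenvalue extraction and the Kaiser criterion described above can be illustrated with a minimal sketch. It is not the SPSS procedure used in this study: it assumes principal-components extraction from the correlation matrix and computes the eigenvalues with a plain cyclic Jacobi iteration. The marks, the maximum marks per question and all function names are made up for illustration.

```python
import math

def to_percent(marks, max_marks):
    """Convert each raw mark to a percentage of that question's maximum."""
    return [[100.0 * m / mx for m, mx in zip(row, max_marks)] for row in marks]

def correlation_matrix(rows):
    """Pearson correlations between the columns (variables) of the data."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    devs = [[x - mu for x in c] for c, mu in zip(cols, means)]
    norms = [math.sqrt(sum(d * d for d in dv)) for dv in devs]
    if any(nm == 0 for nm in norms):
        # A variable with the same value in all cases has no variance.
        raise ValueError("constant variable: correlation is undefined")
    k = len(cols)
    return [[sum(a * b for a, b in zip(devs[i], devs[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations,
    returned largest first (the order in which a scree plot shows them)."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q]; |theta| <= pi/4.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def kaiser_count(eigenvalues):
    """Kaiser criterion: number of factors with eigenvalue greater than 1."""
    return sum(1 for e in eigenvalues if e > 1.0)

# Made-up marks: 6 students (rows) by 4 questions (columns).
marks = [[8, 15, 2, 5], [6, 12, 4, 9], [9, 18, 1, 3],
         [3, 7, 5, 10], [5, 9, 3, 6], [7, 14, 2, 4]]
max_marks = [10, 20, 5, 10]  # hypothetical maximum mark per question

eigenvalues = symmetric_eigenvalues(correlation_matrix(to_percent(marks, max_marks)))
print("eigenvalues (scree plot order):", eigenvalues)
print("factors retained by the Kaiser criterion:", kaiser_count(eigenvalues))
```

Plotting `eigenvalues` against their index gives the scree plot. Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue of 1 corresponds to the variance of a single original variable, which is the threshold the Kaiser criterion uses.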
It was therefore necessary to go back to the data and, using the estimates gleaned from factor analysis, choose the clustering that made the most sense. Hierarchical cluster analysis, with the number of clusters ranging from 2 to 12, was performed on each dataset, and multidimensional scaling (MDS) was used to map the data into two dimensions.
Multidimensional scaling is a method for assigning variables to points in two or more dimensions, in such a way that variables that have a higher measured similarity (in this case, correlations between marks) are closer together. It is essentially an optimisation problem: the stress, which is a numerical measure of "badness-of-fit", is minimised over several iterations.
It is important to note that the clustering is not done on the points generated by multidimensional scaling. Because multidimensional scaling rarely produces a perfect fit to the input data, it is possible for points that do not look particularly close on the multidimensional scaling plot to cluster together on the basis of correlation. If this actually happens, it indicates that the final stress, or "badness-of-fit", was too high and that the dimensionality of the scaling should be increased.
Finally, the points generated by multidimensional scaling were grouped into separate files according to the cluster they had been assigned, and gnuplot was used to plot them to a graph. This process was performed for the following sets of data:
Very few of the bottom 115 students had any non-zero bonus marks, so attempting to use them would have been futile. In fact, all of these students had zero for bonus question B8: factor analysis cannot be performed if any of the variables is zero in all cases, because such a variable has no variance and its correlations with the other variables are undefined.
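The separation between clustering and scaling described in this section can be sketched as follows. This is a rough illustration, not the SPSS procedures used here: it assumes a dissimilarity of 1 minus the correlation, average-linkage agglomerative clustering, and SMACOF (iterated Guttman transforms) as the stress-minimising scaling method. None of these choices is stated in the source, and the correlation matrix and function names are made up.

```python
import math
import random

def dissimilarity(corr):
    """Assumed mapping: highly correlated variables get a small dissimilarity."""
    return [[1.0 - r for r in row] for row in corr]

def agglomerate(diss, k):
    """Average-linkage agglomerative clustering on the dissimilarities
    (not on MDS points), stopped when k clusters remain."""
    clusters = [[i] for i in range(len(diss))]

    def link(a, b):
        return sum(diss[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        _, x, y = min((link(clusters[x], clusters[y]), x, y)
                      for x in range(len(clusters))
                      for y in range(x + 1, len(clusters)))
        merged = clusters.pop(y)  # y > x, so index x is unaffected
        clusters[x].extend(merged)
    labels = [0] * len(diss)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def distances(points):
    """Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]

def normalised_stress(diss, points):
    """Badness-of-fit: sqrt(sum (d_ij - delta_ij)^2 / sum delta_ij^2)."""
    d = distances(points)
    pairs = [(i, j) for i in range(len(diss)) for j in range(i + 1, len(diss))]
    num = sum((d[i][j] - diss[i][j]) ** 2 for i, j in pairs)
    den = sum(diss[i][j] ** 2 for i, j in pairs)
    return math.sqrt(num / den)

def smacof(diss, dims=2, iterations=200, seed=0):
    """Metric MDS by repeated Guttman transforms from a random start."""
    rng = random.Random(seed)
    n = len(diss)
    x = [[rng.uniform(-1, 1) for _ in range(dims)] for _ in range(n)]
    for _ in range(iterations):
        d = distances(x)
        b = [[-diss[i][j] / d[i][j] if i != j and d[i][j] > 0 else 0.0
              for j in range(n)] for i in range(n)]
        for i in range(n):
            b[i][i] = -sum(b[i])  # diagonal is still 0 when summed
        x = [[sum(b[i][j] * x[j][c] for j in range(n)) / n for c in range(dims)]
             for i in range(n)]
    return x

# Made-up correlations between five questions: two visible blocks.
corr = [[1.00, 0.90, 0.80, 0.10, 0.10],
        [0.90, 1.00, 0.85, 0.10, 0.10],
        [0.80, 0.85, 1.00, 0.10, 0.10],
        [0.10, 0.10, 0.10, 1.00, 0.90],
        [0.10, 0.10, 0.10, 0.90, 1.00]]

diss = dissimilarity(corr)
labels = agglomerate(diss, k=2)      # clusters come from the correlations
points = smacof(diss, dims=2)        # map coordinates come from MDS
print("stress:", normalised_stress(diss, points))

# Group the MDS points by cluster, as was done before plotting.
groups = {}
for label, point in zip(labels, points):
    groups.setdefault(label, []).append(point)
for label, members in sorted(groups.items()):
    print("cluster", label, members)
```

If points sharing a label end up far apart on the map, the reported stress is the first thing to check; calling `smacof` with `dims=3` is the sketch's equivalent of increasing the dimensionality of the scaling.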