Computer Intensive Methods

Explanation

 

Conventional statistical tests are derived from mathematical and statistical theory.

That theory rests on a number of assumptions about the real world, and those assumptions are not always met. An alternative to fighting with theoretical statistics is to use methods which treat the sample you have as the population of interest.

You create an empirical distribution by repeatedly resampling from the existing sample.

Computers, with a little bit of grunt, make this kind of resampling quite effortless.

There are a number of ways to create these empirical distributions so that you can test hypotheses and establish confidence intervals. Each has its own place in the scheme of things.
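As one illustration of building an empirical distribution, the sketch below resamples a single sample with replacement and reads a rough 95% interval for the mean off the resulting distribution. This is only one of the possible approaches (usually called a bootstrap percentile interval) and is an assumption here, not a method named in the text. The ratings, the function name resample_means, the number of resamples and the seed are all made up for the example.

```python
import random
from statistics import mean


def resample_means(sample, n_resamples=2000, seed=0):
    """Return the means of n_resamples resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


# Hypothetical ratings, invented purely for illustration.
ratings = [7, 8, 6, 9, 8, 5, 7, 6, 8, 7, 9, 6, 7]

# The sorted resample means form the empirical distribution of the mean.
means = sorted(resample_means(ratings))

# Rough 95% percentile interval: cut 2.5% off each tail.
lower = means[int(0.025 * len(means))]
upper = means[int(0.975 * len(means)) - 1]
print(f"approximate 95% interval for the mean: {lower:.2f} to {upper:.2f}")
```

The interval comes directly from the resampled means, so no theoretical distribution is assumed. With a sample this small it should be read as indicative only.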

Example

You have data on two groups of students (Full time and Part time) showing their rating of your performance as a lecturer. The class is small and you only have 13 and 11 in the two groups.

You want to test for a difference between the two groups even though you know the samples are biased and small.

You use a computer intensive method - probably a permutation test.

Basic methodology

  1. Calculate a conventional statistic for the difference between the groups.

  2. Use a systematic randomisation approach
    • Pool the observations and reallocate them into two new groups of the original sizes
    • Recalculate the statistic for the new groups
    • If the new statistic is greater than or equal to the original statistic
        Update a counter NGE

  3. After each reallocation, update a counter PERMS to give the count of permutations

  4. When all permutations have been completed
      Divide the NGE count by the PERMS count

The figure obtained in Stage 4 is treated as the probability value used in hypothesis testing or confidence interval calculation.
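The stages above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the statistic is taken to be the difference in group means, the test is one-sided (matching the "greater than or equal to" rule), and the function name permutation_test and the ratings are hypothetical. The counters keep the names NGE and PERMS from the text.

```python
from itertools import combinations
from math import comb


def permutation_test(group_a, group_b):
    """Exact one-sided permutation test on the difference in group means.

    Returns (p_value, nge, perms).
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    # Stage 1: the statistic for the groups as actually observed.
    observed = sum(group_a) / n_a - sum(group_b) / n_b

    nge = 0    # rearrangements with statistic >= observed
    perms = 0  # rearrangements examined
    # Stage 2: systematically try every way of choosing group A from the pool.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        stat = sum_a / n_a - (total - sum_a) / n_b
        # Small tolerance guards against floating-point rounding.
        if stat >= observed - 1e-12:
            nge += 1
        perms += 1  # Stage 3

    # Stage 4: proportion of rearrangements at least as extreme.
    return nge / perms, nge, perms


# Hypothetical ratings, invented purely for illustration.
full_time = [7, 8, 6, 9, 8]
part_time = [5, 6, 4, 7]

p_value, nge, perms = permutation_test(full_time, part_time)
print(f"NGE = {nge}, PERMS = {perms}, NGE/PERMS = {p_value:.4f}")

# Number of distinct rearrangements for groups of 13 and 11.
n_splits = comb(13 + 11, 13)
print(f"rearrangements for groups of 13 and 11: {n_splits}")
```

The original arrangement is itself one of the rearrangements, so NGE is always at least 1 and the result can never be exactly zero. For groups of 13 and 11 there are 2,496,144 distinct rearrangements, which is feasible to enumerate but slow in pure Python. A common shortcut is to examine a large random sample of rearrangements instead of all of them.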

For example, if NGE/PERMS = 0.025 and you were using a rejection level of 0.05, you would be able to reject the null hypothesis (from the original example) that there is no difference in performance rating between part time and full time students.

Empirically, what you have shown is that, within your mini-population, a difference as large as the one you found is unusual: it occurs infrequently when you rearrange the sample in different ways.