Clustor generates jobs from the parameter combinations specified in a plan
script. This study uses four types of Clustor parameters, which specify:
- Selection method
Apart from the five selection methods previously mentioned, six other methods
were examined in this case study. Their results are not presented, mainly
because the methods failed to converge to models with reasonably good
predictive performance even after 50 variables had been included in the model.
In total there are 11 selection methods.
- Data file
The data for this case study are randomly sampled into 10 train-test data sets.
In addition, one train-test data set is formed using data from the consecutive
years 1950-1987 as the training data and 1987-1994 as the test data.
The latter data set was used to build the benchmark forecasting models
SHIFOR and SHIFOR94, against which the predictive performance of the model
built in this case study is compared. In total there are 11 data sets,
each stored in a different data file.
- Regression coefficient calculation
In these experiments, the coefficients of a set of variables can be calculated
using one of two methods: Jacobian Transformation or Gaussian Elimination.
The cost of a model, especially one whose variables are highly correlated,
can differ depending on which method is used to find the coefficients.
This in turn can cause the search algorithm to follow a different path in its
pursuit of an optimum model. This Clustor parameter therefore has 2 values.
- MML model parameter estimation
This parameter distinguishes how a model's variable coefficients and standard
deviation are calculated under MML from how they are calculated under the
other methods. The distinction is needed because the MML method derives those
parameters from its initial formula in a different way. Hence, this Clustor
parameter has 2 values.
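The parameter counts above imply the number of jobs, assuming Clustor generates
one job for every combination in the full cross product of parameter values
(the text does not state the total explicitly). A minimal Python sketch of that
arithmetic follows. The parameter names and placeholder values are
hypothetical; only the counts (11, 11, 2 and 2) come from the description above.

```python
from itertools import product

# Hypothetical names and placeholder values; only the number of values
# per parameter (11, 11, 2, 2) is taken from the text.
parameters = {
    "selection_method": [f"method{i}" for i in range(1, 12)],  # 11 methods
    "data_file": [f"data{i}" for i in range(1, 12)],           # 11 data files
    "coefficient_method": ["jacobian", "gaussian"],            # 2 values
    "mml_estimation": [0, 1],                                  # 2 values
}


def enumerate_jobs(params):
    """Return one dict per combination of parameter values (full cross product)."""
    names = list(params)
    return [dict(zip(names, combo)) for combo in product(*params.values())]


jobs = enumerate_jobs(parameters)
print(len(jobs))  # 11 * 11 * 2 * 2 = 484
```

Under this assumption the study amounts to 484 jobs, which is why the
concurrency limits in the option files matter.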

Three Clustor script files must be created before submitting the simulations
to Clustor:
- Plan file (example: complete.pln).
This file contains all the parameters and the command lines which Clustor
will parameterize to generate jobs at run time.
- Clustor option file (example: clustor.options).
Among other things, this file sets the maximum number of jobs a user can run
concurrently on a node, which prevents any single user from dominating Clustor
at any given time.
- Root option file (example: root.options).
The number of concurrent node activations is set in this file.