Printer-friendly versions [PS] [PDF]
Data Mining is a multidisciplinary field which brings together a wide
variety of techniques from areas of research and development with longer
histories: machine learning, pattern recognition, statistics, databases and
visualisation. The aim is to extract knowledge from the raw
information stored in large databases, in order to better describe
or understand the existing data, or to predict how new data will be
generated in the future.
Over the course of this semester, you will work in groups on papers
focusing on the theory and application of these techniques.
The marks for this unit will be allocated as follows:
| Group research paper of approximately 5000 words | 50% |
| Presentation of the paper | 20% |
| Individual literature survey document and tasks | 25% |
| Attendance at group paper presentations | 5% |
Group Research Paper (50%): due week 12
The assessment for this unit is based on a group research paper, and tasks
associated with its production and presentation. Each group will prepare a
research paper on a particular data mining technique and its applications.
At the end of the semester, all the group papers will be made available to
class members as an "electronic book" on the unit website, where each
paper will form a chapter. The production of this book is the aim of the
whole class for the semester.
Students must form groups of four or five, and submit a Group Registration
Form (see the printer-friendly versions of this document) to the lecturer
by the end of week 3. Forms may be submitted to the lecturer's letter box
on level 5 of building B. Students having difficulty finding group members
are encouraged to use the Feedback Forum on the unit website to seek others
in the same situation.
Each group will be assigned a data mining technique or issue as a topic
from a list provided by the lecturer. The number of groups assigned to each
topic will be minimised (for most topics, there will be two groups).
Assignments of topics to groups will be decided on the basis of preferences
expressed via a form on the web.
One or two group members must take responsibility for researching and
writing each of these parts of the paper:
- A literature survey giving the research background for the technique
and brief accounts of how and where it is applied.
- An explanation of how the technique and the algorithms implementing it
actually work, preferably with a worked example.
- Two or more detailed case studies showing how the technique
has been applied in business, industrial or scientific applications.
Papers are expected to be of a quality suitable for
publication, since they will form chapters of the book which will be made
available to the class at the end of the semester. Students should make use
of the Faculty Guide to writing assignments, paying particular attention to
section 4, "Citations", and section 5, "Quotations and Paraphrases".
Papers are to be approximately 5000 words. A list of allowed paper topics
will be available from the unit website. Here are examples of possible
topics:
- Association Rule Discovery
- Back-propagation Neural Networks
- Self-Organising Maps
- Decision Trees
- Clustering
- Bayesian Networks
- Hidden Markov Models
- Information Filtering (e.g. for "spam" email)
- Visualisation for Data Mining
- Ethics and Data Mining
Your group can specify its topic preferences using a web form.
Assessment
The group research paper will be marked as follows:
| Understanding of technique/algorithm (or issue) | 20 |
| Case studies | 20 |
| Organization and clarity | 5 |
| Accuracy of referencing | 5 |
Individual Literature Survey (25%): due week 7
The literature survey is due in week 7 and should consist of a
discussion of the papers read, including the problems addressed,
the techniques used, and their advantages and disadvantages. Each student
must discuss at least five (preferably more) papers covering the topic of
their group's paper. These papers must include the set reading from the lecturer, as
well as papers the student has located independently. The majority of papers surveyed
must be academic papers, published in peer-reviewed conferences or
journals, not magazine articles.
The expected length of the literature survey is 1000 words. There will also
be tutorial tasks related to the literature survey.
Assessment
The individual literature survey will be marked as follows:
| Understanding of techniques/algorithms (or issues) and their advantages and disadvantages | 10 |
| Organization and clarity | 5 |
| Accuracy of referencing | 5 |
| Tutorial exercises | 5 |
Group Seminar Presentation (20%)
Each presentation will run for 20 minutes, with 5 minutes for questions.
Groups should provide copies of their overheads to the lecturer. All group
members must participate in the presentation. Depending on the final
number of groups, time for presentations may be extended.
Assessment
The group seminar will be marked as follows:
| Content | 10 |
| Structure | 5 |
| Presentation | 5 |
Attendance at student paper presentations (5%)
Students must attend at least 75% of group presentations, and participate
by asking questions.