 CSE5230: Research Paper

Printer-friendly versions [PS] [PDF]

Data Mining is a multidisciplinary field that brings together a wide variety of techniques from areas of research and development with longer histories: machine learning, pattern recognition, statistics, databases and visualisation. Its aim is to extract knowledge from the raw information stored in large databases, in order to better describe or understand the existing data, or to predict how new data will be generated in the future. Over the course of this semester, you will work in groups on papers focusing on the theory and application of these techniques. The marks for this unit will be allocated as follows:

Group research paper of approximately 5000 words    50%
Presentation of the paper                           20%
Individual literature survey document and tasks     25%
Attendance at group paper presentations              5%

Group Research Paper (50%): due week 12

The assessment for this unit is based on a group research paper, and tasks associated with its production and presentation. Each group will prepare a research paper on a particular data mining technique and its applications. At the end of the semester, all the group papers will be made available to class members as an "electronic book" on the unit website, where each paper will form a chapter. The production of this book is the aim of the whole class for the semester.

Students must form groups of four or five, and submit a Group Registration Form (see the printer-friendly versions of this document) to the lecturer by the end of week 3. Forms may be submitted to the lecturer's letter box on level 5 of building B. Students having difficulty finding group members are encouraged to use the Feedback Forum on the unit website to seek others in the same situation.

Each group will be assigned a data mining technique or issue as a topic from a list provided by the lecturer. The number of groups assigned to each topic will be minimised (for most topics, there will be two groups). Assignments of topics to groups will be decided on the basis of preferences expressed via a form on the web.

One or two group members must take responsibility for researching and writing each of these parts of the paper:

  • A literature survey giving the research background for the technique and brief accounts of how and where it is applied.
  • An explanation of how the technique and the algorithms implementing it actually work, preferably with a worked example.
  • Two or more detailed case studies showing how the technique has been applied in business, industrial or scientific applications.

Papers are expected to be of a quality suitable for publication, since they will form chapters of the book that will be made available to the class at the end of the semester. Students should make use of the Faculty Guide to writing assignments, paying particular attention to section 4, "Citations", and section 5, "Quotations and Paraphrases".

Papers are to be approximately 5000 words. A list of allowed paper topics will be available from the unit website. Here are examples of possible topics:

  • Association Rule Discovery
  • Back-propagation Neural Networks
  • Self-Organising Maps
  • Decision Trees
  • Clustering
  • Bayesian Networks
  • Hidden Markov Models
  • Information Filtering (e.g. for "spam" email)
  • Visualisation for Data Mining
  • Ethics and Data Mining

Each group can specify its topic preferences using a web form.

Assessment

The group research paper will be marked as follows:
Understanding of technique/algorithm (or issue)    20
Case studies                                       20
Organization and clarity                            5
Accuracy of referencing                             5

Individual Literature Survey (25%): due week 7

The literature survey is due in week 7 and should consist of a discussion of the papers read, including the problems addressed, the techniques used, and their advantages and disadvantages. Each student must discuss at least five (preferably more) papers covering the topic of their group's paper. These must include the set reading from the lecturer, as well as papers the student has located independently. The majority of papers surveyed must be academic papers published in peer-reviewed conferences or journals, not magazine articles.

The expected length of the literature survey is 1000 words. There will also be tutorial tasks related to the literature survey.

Assessment

The individual literature survey will be marked as follows:
Understanding of techniques/algorithms (or issues) and their advantages and disadvantages    10
Organization and clarity                                                                      5
Accuracy of referencing                                                                       5
Tutorial exercises                                                                            5

Group Seminar Presentation (20%)

Each presentation will run for 20 minutes, followed by 5 minutes for questions. Groups should provide copies of their overheads to the lecturer. All group members must participate in the presentation. Depending on the final number of groups, time for presentations may be extended.

Assessment

The group seminar will be marked as follows:
Content         10
Structure        5
Presentation     5

Attendance at student paper presentations (5%)

Students must attend at least 75% of group presentations, and participate by asking questions.