COMP6237 Data Mining
Effort: ~40 hours per student
Credit: 30% of overall module mark
Team size: up to 4 students.
Brief due date: Friday 23rd February 2024, 16:00 (submit via ECS Handin).
Required files: brief.pdf (a suggested brief template is provided)
Paper due date: Friday 16th May 2024, 16:00 (submit via ECS Handin).
Required files: paper.pdf; mark_split.pdf
In this coursework, you will form a team and choose a predictive data mining problem to tackle. Teams are expected to perform a series of experiments on their chosen problem in order to build a predictive model (or set of models), which they then evaluate and compare, both across the techniques used by the team and against any other published approaches. The team must present its work as a written conference paper. The team is also expected to give an oral interim progress report to the class in the week before the Easter vacation.
Students will form groups of no more than 4 members. A page on the student wiki (the student group list) has been set up to help you form teams.
Each group will propose a project in consultation with members of staff. We expect you to pick a dataset or challenge that has been, or is currently being, used in data mining research (see the links at the end of this document for some ideas). The project must demonstrate the team’s ability to tackle a real-world predictive data mining problem scientifically (as opposed to the descriptive or understanding-type problem that you will tackle in the individual coursework).
There are three deliverables for this coursework: a project brief, an interim oral progress presentation, and a written conference paper. The conference paper must use the sigconf style option for the template and be at most 6 pages in length, including all references and appendices (if used). Additionally, each team is required to submit a proposed marks distribution form (see below). Each team will receive an overall mark (broken down into sub-categories), and individual marks will be assigned based on a split decided by the team. Full details are given below.
The conference paper and presentation will be marked as a single piece of work using the following criteria:
| Criterion | Description | Marks |
|---|---|---|
| Experimentation and Analysis | Analyse the problem and the results obtained | 35 |
| Application of techniques | Show the ability to apply predictive data mining techniques and preprocessing operations | 35 |
| Reflection | Reflect on what the experimental results tell us about the problem and the techniques used | 20 |
| Reporting | Clear and professional reporting | 10 |
Standard ECS late submission penalties apply.
Written group feedback covering the above criteria will be emailed to each team once marking is complete.
Team members should agree among themselves how the marks awarded for the team submission will be divided between them (see below for how to proceed if this is not possible). The Team Leader should print out the marks distribution form, complete it as agreed, and arrange for every member of the team to sign and date it. The completed, signed form must be submitted via the ECS Handin system (as mark_split.pdf) together with the conference paper. An incomplete form (e.g. one with missing signatures) means that the entire ECS Handin submission is incomplete and therefore subject to penalties.
Teams are encouraged to split the work evenly between all team members, in which case the marks are split evenly. Teams should consider any proposed non-uniform distribution very carefully before submission. Note that an individual contribution of zero is acceptable and will result in that team member being effectively removed from the team. One or more individual contributions of 10% or less may result in an ad hoc reduction in the effective team size. Any proposed non-uniform distribution will be discussed with the team after the presentation and may be modified by the Module Leader at that stage.
Teams are advised to make every effort to agree on the marks distribution, because failure to agree will be interpreted as demonstrating a general lack of competence. However, if no agreement can be reached, the procedure to follow is set out below:
The team should divide into two or more subteams (in the worst case, a team of size N could have N subteams). Each subteam should elect a subteam leader, who should make a full submission as detailed above. Each marks distribution form submitted should indicate the proposed percentages of the overall team mark to be allocated to the members of that subteam, together with a written one-page explanation of why such an allocation would be appropriate. Note that any attempt by a team member to exploit the advice above (that teams should make every effort to agree), for example by refusing to sign the marks distribution form, will not succeed. In the unlikely event that this happens, each individual should make a brief signed statement of the facts of the case and submit it with the other documentation.
The final marks breakdown for a team that fails to agree will be determined by the Module Leader, taking all relevant factors into account. This decision will be final.
The following list has some pointers to places where you might get some inspiration for data mining challenges together with associated data such as evaluation criteria and comparative performance data:
If you have any problems or questions, email Shoaib or Markus.