COMP6237 Data Mining

2024-25


Maintained by Dr Shoaib Ehsan.

Welcome to the homepage for the ECS COMP6237 Data Mining module.

The challenge of data mining is to transform raw data into useful information and actionable knowledge. Data mining is the computational process of discovering patterns in data sets, using methods at the intersection of artificial intelligence, machine learning, statistics, and data management.

This course will introduce key concepts in data mining, information extraction, and information indexing, including specific algorithms and techniques for feature extraction, clustering, outlier detection, topic modelling and prediction of complex unstructured data sets. By taking this course, you will gain a broad view of the general issues surrounding unstructured and semi-structured data and the application of algorithms to such data. At a practical level, you will have the chance to explore an assortment of data mining techniques, which you will apply to problems involving real-world data.

Lectures

The lectures for this course will be given by Dr Markus Brede (email), Dr Zhiwu Huang (email) and Dr Shoaib Ehsan (email).

The lecture slots are as follows:

Day        Time   Room
Mondays    12 PM  B06 1081 (L/R B)
Mondays    5 PM   B06 1081 (L/R B)
Tuesdays   10 AM  B07 3031 (L/R F2)
Thursdays  11 AM  B07 3027 (L/R F1)

There will generally be three lectures each week, in the Monday (12 PM), Tuesday and Thursday slots. In some weeks, we may also use the Monday (5 PM) slot.

The current timetable is shown below. Be aware that it might change (especially if you ask us to add additional tutorial sessions):

Date         Semester Week  Lecturer(s)              Topic/Title
27-Jan       1              Shoaib                   Intro to data mining
30-Jan                      Shoaib                   Linear Regression I
03-Feb       2              Shoaib                   Linear Regression II (MLE); Group CW set
04-Feb                      Shoaib                   Linear Regression III
06-Feb                      Shoaib                   Linear Regression Problem Sets
10-Feb 12pm  3              Shoaib & Zhiwu           Group coursework Q & A
10-Feb 5pm                  Shoaib                   Logistic Regression
11-Feb                      Shoaib & Zhiwu           Group coursework Q & A
13-Feb                      Shoaib & Zhiwu           Group coursework Q & A
17-Feb 12pm  4              Zhiwu                    Making Recommendations
17-Feb 5pm                  Shoaib                   Dealing with non-linear data
18-Feb                      Zhiwu                    Finding Groups
20-Feb                      Zhiwu                    Covariance
24-Feb       5              Zhiwu                    Embedding Data
25-Feb                      Zhiwu                    Search
27-Feb                      Zhiwu                    Document filtering
03-Mar       6              Zhiwu                    Modelling with decision trees
04-Mar                      Zhiwu                    Modelling Prices & Nearest Neighbours
06-Mar                      Zhiwu                    Market Basket Analysis
10-Mar       7              Zhiwu                    Semantic Spaces & Latent Semantics
11-Mar                      Zhiwu                    Topic Modelling
13-Mar                      Zhiwu                    Outlier Detection
17-Mar       8              Shoaib & Zhiwu & Markus  Group Coursework Presentations
18-Mar                      Shoaib & Zhiwu & Markus  Group Coursework Presentations
20-Mar                      Shoaib & Zhiwu & Markus  Group Coursework Presentations
24-Mar       9              Shoaib                   Logistic Regression Problem Sets
25-Mar                      Shoaib                   Intro to Information Theory
27-Mar                      Shoaib                   Information Theory II
Easter
28-Apr       10             Markus                   Link Prediction on Networks
29-Apr                      Markus                   Community Detection on Networks
01-May                      Markus                   Exploiting network structure for IR
05-May       11                                      Bank Holiday
06-May                      Shoaib                   Mining Data Streams
08-May                      Shoaib                   Exam Revision Q&A
12-May       12             Zhiwu                    Exam Revision Q&A
13-May                      Markus                   Exam Revision Q&A
16-May                                               CW Due
20-May       13                                      Exams

Lecture Materials

Materials to accompany the lectures can be found here:

Coursework

The schedule for each coursework is shown below. Further details will become available below as each coursework is set:

Use this link to form groups for the coursework: Link.

Where to get additional help

Here are a couple of documents to help you. The first is an overview of the course, so that you can see the structure:

Talk to us! You are more than welcome to arrange to meet with us via Teams (or in the unused lecture slots, or at other times if appropriate) to discuss issues related to the course. Zhiwu (email), Shoaib (email) and Markus (email) can all be reached by email.

Copyright ©2019 The University of Southampton. All rights reserved.