COMP6237 Data Mining

2025-26


Maintained by Dr Shoaib Ehsan.

Welcome to the homepage for the ECS COMP6237 Data Mining module.

The challenge of data mining is to transform raw data into useful information and actionable knowledge. Data mining is the computational process of discovering patterns in data sets, using methods at the intersection of artificial intelligence, machine learning, statistics, and data management.

This course will introduce key concepts in data mining, information extraction, and information indexing, including specific algorithms and techniques for feature extraction, clustering, outlier detection, topic modelling and prediction of complex unstructured data sets. By taking this course, you will gain a broad view of the general issues surrounding unstructured and semi-structured data and the application of algorithms to such data. At a practical level, you will have the chance to explore an assortment of data mining techniques, which you will apply to problems involving real-world data.

Lectures

The lectures for this course will be given by Dr Markus Brede (email), Dr Zhiwu Huang (email) and Dr Shoaib Ehsan (email).

The lecture slots are as follows:

Day       Time   Room
Monday    5 PM   B100 4011 (Harvard L/TB)
Tuesday   5 PM   B06 1077 (L/T A)
Thursday  12 PM  B46 2003 (L/T B)
Friday    3 PM   B46 2003 (L/T B)

The current timetable is shown below. Be aware that it might change (especially if you ask us to add additional tutorial sessions):

Date    Semester Week  Lecturer(s)              Topic/Title
26-Jan  1              Shoaib                   Intro to data mining
29-Jan                 Shoaib                   Linear Regression I
30-Jan                 Shoaib                   Linear Regression II
02-Feb  2              Shoaib                   Linear Regression Problem Sets
03-Feb                 Shoaib                   Logistic Regression
06-Feb                 Shoaib                   Logistic Regression Problem Sets; Group CW set
09-Feb  3              Zhiwu                    Making Recommendations
12-Feb                 Shoaib                   Dealing with non-linear data
13-Feb                 Shoaib & Zhiwu           Group coursework Q & A
16-Feb  4              Shoaib & Zhiwu           Group coursework Q & A
17-Feb                 Shoaib & Zhiwu           Group coursework Q & A
19-Feb                 Zhiwu                    Finding Groups
23-Feb  5              Zhiwu                    Covariance
26-Feb                 Zhiwu                    Embedding Data
27-Feb                 Zhiwu                    Search
02-Mar  6              Zhiwu                    Document filtering
03-Mar                 Zhiwu                    Modelling with decision trees
05-Mar                 Zhiwu                    Modelling Prices & Nearest Neighbours
09-Mar  7              Zhiwu                    Market Basket Analysis
10-Mar                 Zhiwu                    Semantic Spaces & Latent Semantics
12-Mar                 Zhiwu                    Topic Modelling
16-Mar  8              Shoaib & Zhiwu & Markus  Group Coursework Presentations
17-Mar                 Shoaib & Zhiwu & Markus  Group Coursework Presentations
19-Mar                 Shoaib & Zhiwu & Markus  Group Coursework Presentations
20-Mar                 Shoaib & Zhiwu & Markus  Group Coursework Presentations
Easter
20-Apr  13             Zhiwu                    Outlier Detection
21-Apr                 Markus                   Link Prediction on Networks
23-Apr                 Markus                   Community Detection on Networks
28-Apr  14             Markus                   Exploiting network structure for IR
30-Apr                 Shoaib                   Intro to Information Theory
01-May                 Shoaib                   Information Theory II
05-May  15             Shoaib                   Information Theory Problem Sets
07-May                 Shoaib                   Mining Data Streams
11-May  16             Shoaib                   Exam Revision Q&A
12-May                 Zhiwu                    Exam Revision Q&A
14-May                 Markus                   Exam Revision Q&A
15-May                                          CW Due
18-May  17                                      Exams

Lecture Materials

Materials to accompany the lectures can be found here:

Coursework

The schedule for each coursework is shown below. Further details will become available below as each coursework is set:

To form groups for the coursework, use this link: Link.

Where to get additional help

Here are a couple of documents to help you. The first is an overview of the course, so that you can see the structure:

Talk to us! You are more than welcome to arrange to meet with us via Teams (or in the unused lecture slots, or at other times if appropriate) to discuss issues related to the course. Zhiwu (email), Shoaib (email) and Markus (email) can all be reached by email.

Copyright ©2019 The University of Southampton. All rights reserved.