COMP6237 Data Mining

2023-24


Maintained by Dr Shoaib Ehsan.

Welcome to the homepage for the ECS COMP6237 Data Mining module.

The challenge of data mining is to transform raw data into useful information and actionable knowledge. Data mining is the computational process of discovering patterns in data sets, using methods at the intersection of artificial intelligence, machine learning, statistics, and data management.

This course will introduce key concepts in data mining, information extraction, and information indexing, including specific algorithms and techniques for feature extraction, clustering, outlier detection, topic modelling and prediction of complex unstructured data sets. By taking this course, you will gain a broad view of the general issues surrounding unstructured and semi-structured data and the application of algorithms to such data. At a practical level, you will have the chance to explore an assortment of data mining techniques, which you will apply to problems involving real-world data.

Lectures

The lectures for this course will be given by Dr Markus Brede (email), Dr Zhiwu Huang (email) and Dr Shoaib Ehsan (email).

The lecture slots are as follows:

Day       | Time | Room
Mondays   | 9AM  | B02 1039 (L/T K)
Tuesdays  | 9AM  | B46 2003 (L/T B)
Thursdays | 10AM | B02 1039 (L/T K)
Fridays   | 1PM  | B02 1039 (L/T K)

There will generally be three lectures each week, using the Monday, Tuesday and Thursday slots. In some weeks we will also use the Friday slot.

The current timetable is shown below - be aware that this might change (especially if you ask us to add additional tutorial sessions):

Date   | Semester Week | Lecturer(s)             | Topic/Title
29-Jan | 1             | Zhiwu & Markus & Shoaib | Intro to data mining
01-Feb |               | Shoaib                  | Linear Regression
02-Feb |               | Shoaib                  | Maximum Likelihood Estimation
06-Feb | 2             | Shoaib                  | Tutorial/seminar: linear regression and MLE; CW set
08-Feb |               | Shoaib                  | Logistic regression
09-Feb |               | Shoaib                  | Dealing with non-linear data
12-Feb | 3             | Zhiwu & Shoaib          | Group coursework Q & A
13-Feb |               | Zhiwu & Shoaib          | Group coursework Q & A
15-Feb |               | Zhiwu & Shoaib          | Group coursework Q & A
19-Feb | 4             | Shoaib                  | Tutorial/seminar: logistic regression
20-Feb |               | Shoaib                  | Intro to information theory
22-Feb |               | Shoaib                  | Information theory II
26-Feb | 5             | Zhiwu                   | Making Recommendations
27-Feb |               | Zhiwu                   | Finding Groups
29-Feb |               | Zhiwu                   | Covariance
04-Mar | 6             | Zhiwu                   | Embedding Data
05-Mar |               | Zhiwu                   | Search
07-Mar |               | Zhiwu                   | Document filtering
11-Mar | 7             | Zhiwu                   | Modelling with decision trees
12-Mar |               | Zhiwu                   | Modelling Prices & Nearest Neighbours
14-Mar |               | Zhiwu                   | Market Basket Analysis
18-Mar | 8             | Zhiwu & Shoaib & Markus | Group coursework presentations
19-Mar |               | Zhiwu & Shoaib & Markus | Group coursework presentations
21-Mar |               | Zhiwu & Shoaib & Markus | Group coursework presentations
22-Mar |               | Zhiwu & Shoaib & Markus | Group coursework presentations
Easter |               |                         |
22-Apr | 9             | Zhiwu                   | Semantic Spaces & Latent Semantics
23-Apr |               | Zhiwu                   | Topic Modelling
25-Apr |               | Zhiwu                   | Outlier Detection
29-Apr | 10            | Markus                  | Link Prediction on Networks
30-Apr |               | Markus                  | Community Detection on Networks
02-May |               | Markus                  | Exploiting network structure for IR
06-May | 11            |                         | Bank Holiday
07-May |               | Shoaib                  | Mining Data Streams
09-May |               | Shoaib                  | Exam Revision Q&A
13-May | 12            | Markus                  | Exam Revision Q&A
14-May |               | Zhiwu                   | Exam Revision Q&A
16-May |               |                         | CW Due
20-May | 13            |                         | Exams

Lecture Materials

Materials to accompany the lectures can be found here:

Coursework

The schedule for each coursework is shown below. Further details will become available below as each coursework is set:

Link for forming groups for the coursework: Link.

Where to get additional help

Here are a couple of documents to help you. The first is an overview of the course, so that you can see the structure:

Talk to us! As we said above, you are more than welcome to arrange to meet with us via Teams (or in the unused lecture slots / at other times if appropriate) to discuss issues related to the course. Zhiwu (email), Shoaib (email) and Markus (email) can all be reached by email.

Copyright ©2019 The University of Southampton. All rights reserved.