Welcome to the homepage for the ECS COMP6237 Data Mining module.
The challenge of data mining is to transform raw data into useful information and actionable knowledge. Data mining is the computational process of discovering patterns in data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and data management.
This course will introduce key concepts in data mining, information extraction, and information indexing, including specific algorithms and techniques for feature extraction, clustering, outlier detection, topic modelling and prediction of complex unstructured data sets. By taking this course, you will be given a broad view of the general issues surrounding unstructured and semi-structured data and the application of algorithms to such data. At a practical level, you will have the chance to explore an assortment of data mining techniques which you will apply to problems involving real-world data.
The lectures for this course will be given by Dr Markus Brede (email), Dr Zhiwu Huang (email) and Dr Shoaib Ehsan (email).
The lecture slots are as follows:
Day | Time | Room |
---|---|---|
Mondays | 12 PM | B06 1081 (L/R B) |
Mondays | 5 PM | B06 1081 (L/R B) |
Tuesdays | 10 AM | B07 3031 (L/R F2) |
Thursdays | 11 AM | B07 3027 (L/R F1) |
There will generally be three lectures each week, in the Monday (12 PM), Tuesday and Thursday slots. In some weeks, we may also use the Monday (5 PM) slot.
The current timetable is shown below. Be aware that it might change (especially if you ask us to add additional tutorial sessions):
Date | Semester Week | Lecturer(s) | Topic/Title |
---|---|---|---|
27-Jan | 1 | Shoaib | Intro to data mining |
30-Jan | | Shoaib | Linear Regression I |
03-Feb | 2 | Shoaib | Linear Regression II (MLE); Group CW set |
04-Feb | | Shoaib | Linear Regression III |
06-Feb | | Shoaib | Linear Regression Problem Sets |
10-Feb 12pm | 3 | Shoaib & Zhiwu | Group coursework Q & A |
10-Feb 5pm | | Shoaib | Logistic Regression |
11-Feb | | Shoaib & Zhiwu | Group coursework Q & A |
13-Feb | | Shoaib & Zhiwu | Group coursework Q & A |
17-Feb 12pm | 4 | Zhiwu | Making Recommendations |
17-Feb 5pm | | Shoaib | Dealing with non-linear data |
18-Feb | | Zhiwu | Finding Groups |
20-Feb | | Zhiwu | Covariance |
24-Feb | 5 | Zhiwu | Embedding Data |
25-Feb | | Zhiwu | Search |
27-Feb | | Zhiwu | Document filtering |
03-Mar | 6 | Zhiwu | Modelling with decision trees |
04-Mar | | Zhiwu | Modelling Prices & Nearest Neighbours |
06-Mar | | Zhiwu | Market Basket Analysis |
10-Mar | 7 | Zhiwu | Semantic Spaces & Latent Semantics |
11-Mar | | Zhiwu | Topic Modelling |
13-Mar | | Zhiwu | Outlier Detection |
17-Mar | 8 | Shoaib & Zhiwu & Markus | Group Coursework Presentations |
18-Mar | | Shoaib & Zhiwu & Markus | Group Coursework Presentations |
20-Mar | | Shoaib & Zhiwu & Markus | Group Coursework Presentations |
24-Mar | 9 | Shoaib | Logistic Regression Problem Sets |
25-Mar | | Shoaib | Intro to Information Theory |
27-Mar | | Shoaib | Information Theory II |
Easter | | | |
28-Apr | 10 | Markus | Link Prediction on Networks |
29-Apr | | Markus | Community Detection on Networks |
01-May | | Markus | Exploiting network structure for IR |
05-May | 11 | | Bank Holiday |
06-May | | Shoaib | Mining Data Streams |
08-May | | Shoaib | Exam Revision Q&A |
12-May | 12 | Zhiwu | Exam Revision Q&A |
13-May | | Markus | Exam Revision Q&A |
16-May | | | CW Due |
20-May | 13 | | Exams |
Materials to accompany the lectures can be found here:
The schedule for each coursework is shown below. Further details will become available below as each coursework is set:
Link for forming groups for the coursework: Link.
Here are a couple of documents to help you. The first is an overview of the course, so that you can see the structure:
Talk to us! As we said above, you are more than welcome to arrange to meet with us via Teams (or in the unused lecture slots / at other times if appropriate) to discuss issues related to the course. Zhiwu (email), Shoaib (email) and Markus (email) can all be reached by email.