Reference no: EM132370173
Machine Learning Assignment - Sleep EEG Data Classification
About the Assignment - This assignment covers the course content delivered in Modules 6-10. Specifically, the objectives of the assignment are:
- Obtaining a deep understanding of unsupervised machine learning methods.
- Understanding machine learning paradigms.
- Applying unsupervised machine learning techniques to practice.
- Becoming confident and comfortable with data analysis using machine learning techniques.
Requirements - In this assignment, you need to complete a practical task and submit a report on the classification of sleep electroencephalogram (EEG) signals using the unsupervised machine learning techniques covered in CSC8003.
Background information - Sleep EEG Data Classification
We spend about one third of our lives asleep. Human sleep is a dynamic process that can be divided into two main states: rapid eye movement (REM) and non-rapid eye movement (NREM). The latter can be further divided into four stages, namely Stages 1, 2, 3 and 4. EEG signals are an important source for studying human brain activity and for diagnosing and monitoring neurological conditions such as sleep disorders. The main process of EEG data classification is shown in Figure 1.

The assignment covers only the feature selection and classification tasks. Your target is to classify the 1500 cases of sleep EEG data into 2 stages (Awake and Sleep) based on the given feature sets. You are not required to label the cases as Awake or Sleep; you only need to separate the cases into two different groups.
Data description - A data set with 1500 cases is given in the attachment, in which each case is the data collected during the observation of three people in a certain unit of time (56 seconds). For all 1500 cases, there are 5 data sets (x1, x2, x3, x4 and x5) that can be considered feature sets. The feature sets are calculated from the raw EEG data using 5 different feature extraction methods, one method per feature set. Within a feature set the method is the same for every case: the x1 data of all cases are calculated by the same feature extraction method, and the same holds for x2, x3, x4 and x5.
The labels of the 1500 cases will be given in another file. However, you may use the labels only to evaluate your classification results. Do not use them for feature selection or model building.
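Because the two groups produced by an unsupervised method carry no inherent meaning, a comparison with the given labels has to allow for either group corresponding to either label. The following is a minimal sketch of such an evaluation, not a prescribed method. It assumes that both the group assignments and the given labels are coded as 0/1 integers; the function names and the toy arrays are hypothetical and are not taken from the assignment data.

```python
import numpy as np

def contingency_table(groups, labels):
    """Count cases per (group, label) pair; rows are groups, columns are labels."""
    table = np.zeros((2, 2), dtype=int)
    for g, y in zip(groups, labels):
        table[g, y] += 1
    return table

def best_match_accuracy(groups, labels):
    """Fraction of cases that agree with the labels under the better of the
    two possible group-to-label mappings (group names are arbitrary)."""
    table = contingency_table(groups, labels)
    direct = table[0, 0] + table[1, 1]   # group 0 -> label 0, group 1 -> label 1
    swapped = table[0, 1] + table[1, 0]  # group 0 -> label 1, group 1 -> label 0
    return max(direct, swapped) / table.sum()

# Hypothetical toy data standing in for a clustering result and the label file.
groups = np.array([0, 0, 0, 1, 1, 1, 1, 0])
labels = np.array([1, 1, 1, 0, 0, 0, 1, 1])

table = contingency_table(groups, labels)
accuracy = best_match_accuracy(groups, labels)
print(table)
print(accuracy)
```

The contingency table can go directly into the report as a comparison table. The labels are used here only after the grouping has been produced, in line with the restriction above.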
Report - In the report, the following deliverables are expected:
- A survey of the practical application of unsupervised machine learning techniques to Sleep EEG Data classification. The survey should focus only on the application of unsupervised machine learning to feature selection and model building, excluding feature extraction.
- Analysis of the features in the given data sets.
- Based on the data sets and the survey, discuss which unsupervised machine learning methods you will use in this assignment, with appropriate justification. You need to select two unsupervised machine learning methods (e.g. K-means, PCA) covered in the course content.
- Process the feature data sets using the unsupervised machine learning methods you selected and analyze which data set is more useful for EEG Data classification.
- Discuss and apply the two unsupervised machine learning methods independently to complete the EEG Data classification based on the selected feature data sets. As a result, the 1500 cases should be classified into two groups. You should label them as "A and B" instead of "awake and sleep" for ease of practice.
- The feature selection methods and model building results need to be presented clearly in the report, including key equations, figures and (or) tables that may help presentation and discussion. The programming code and the results in spreadsheets, including supporting figures and (or) tables, should also be submitted together with the report (for the sake of description, we use the term "Appendix" for these submitted materials, such as code and spreadsheets).
- Compare the classification results obtained from your two independent unsupervised learning methods to see whether they match each other, and discuss why. Compare your classification results with the given labels. The comparison results should be presented in tables or figures.
- Discuss the advantages and disadvantages of your two unsupervised machine learning methods used in this assignment separately.
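As an illustration of the workflow described in the deliverables above, the following minimal sketch groups cases with two independent methods and measures how well the two groupings agree. It is a sketch under stated assumptions, not the expected solution: the data are synthetic stand-ins rather than the x1 to x5 feature sets, K-means is written as a plain Lloyd iteration in NumPy, and splitting cases by the sign of their first principal component score is only one assumed way to turn PCA into a two-group assignment. All function names are hypothetical.

```python
import numpy as np

def kmeans_two_groups(X, n_iter=100, seed=0):
    """Plain Lloyd's K-means with k = 2; returns an array of 0/1 group ids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)]
    for _ in range(n_iter):
        # Distance from every case to each of the two centers.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        groups = dists.argmin(axis=1)
        # Recompute each center as the mean of its cases (keep it if empty).
        new_centers = np.array([
            X[groups == k].mean(axis=0) if np.any(groups == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return groups

def pca_split(X):
    """Project centered data onto the first principal component and split
    the cases by the sign of the score (an assumed grouping rule)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[0]
    return (scores > 0).astype(int)

def agreement(a, b):
    """Share of cases placed together by both methods, allowing for swapped group ids."""
    same = np.mean(a == b)
    return max(same, 1.0 - same)

# Synthetic stand-in for a feature set: two well-separated blobs of 100 cases each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 3)),
               rng.normal(6.0, 1.0, size=(100, 3))])

km_groups = kmeans_two_groups(X)
pca_groups = pca_split(X)
km_names = np.where(km_groups == 0, "A", "B")  # neutral names, as required
match = agreement(km_groups, pca_groups)
print(match)
```

On real feature sets the two groupings will not necessarily agree this closely; the cases where they differ are a natural starting point for the discussion of why the methods do or do not match.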
The report should contain at least 1200 words and be no longer than 10 pages (excluding the title page, table of contents, appendix and reference list).
Other Requirements - Some other requirements are specified below:
- The unsupervised machine learning methods used in this assignment should not go beyond the course content.
- Matlab is suggested for the practical part of the assignment, but it is not compulsory. You may use another programming tool if you believe it better suits your background and skill set. The teaching team will do their best to assist you as always; however, assistance cannot be guaranteed if a tool other than Matlab is used. The teaching team does not, and does not claim to, have knowledge of every programming tool.
- The experimental results are not the only criterion for the outcome of the assignment. Deep analysis and justifiable reasoning about machine learning methods in practice are also very important. To obtain full marks, you should present your analysis and deep understanding clearly and logically along with the results.