Perform unsupervised learning of data in a real-world

Reference no: EM132090380

Assessment: Individual Problem solving task

Learning Outcomes

This assessment assesses the following Unit Learning Outcomes (ULO) and related Graduate Learning Outcomes (GLO):

ULO 1: Apply suitable clustering/dimensionality reduction techniques to perform unsupervised learning of data in a real-world

Purpose
In this assignment, you need to demonstrate your skills in data clustering and dimensionality reduction. There are two parts to this assignment.

Instructions
This is an individual assessment task of a maximum of 20 pages, including all relevant material, graphs, images and tables. Students will be required to provide responses to a series of problem situations related to their analysis techniques. They are also required to provide evidence through articulation of the scenario and application of programming skills and analysis techniques, and to provide a rationale for their response.

Task A - Clustering
Download the BBC sports dataset from the Cloud. This dataset consists of 737 documents from the BBC Sport website, corresponding to sports news articles in five topical areas from 2004-2005. There are 5 class labels: athletics, cricket, football, rugby, tennis. The original dataset and raw text files can be downloaded from here.

1. There are 3 files in the dataset, corresponding to the feature matrix, the class labels and the term dictionary. Read these files in a Python notebook and store them in the variables X, trueLabels, and terms.

2. Next, perform K-means clustering with 5 clusters using Euclidean distance as the similarity measure. Evaluate the clustering performance using the adjusted Rand index and adjusted mutual information. Report the clustering performance averaged over 50 random initializations of K-means.

3. Repeat K-means clustering with 5 clusters using a similarity measure other than Euclidean distance. Evaluate the clustering performance over 50 random initializations of K-means using the adjusted Rand index and adjusted mutual information. Report the clustering performance and compare it with the results obtained in step 2.

4. For both clustering cases (Euclidean distance and the other similarity measure), visualize the cluster centres as a tag cloud using the Python package WordCloud.
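Steps 2 to 4 could be sketched roughly as follows. This is a minimal illustration, not the assignment's reference solution. It assumes X (documents in rows), trueLabels and terms have already been loaded as in step 1, and that scikit-learn is available. The helper names evaluate_kmeans and top_terms are hypothetical. For step 3, the sketch picks cosine similarity as the "other" measure and approximates it by scaling every document to unit L2 norm before running ordinary K-means; this is one possible choice, not something the task prescribes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score, adjusted_rand_score
from sklearn.preprocessing import normalize


def evaluate_kmeans(X, true_labels, n_clusters=5, n_runs=50, use_cosine=False):
    """Average ARI and AMI over n_runs randomly initialised K-means fits."""
    if use_cosine:
        # Assumption: on unit-length rows, squared Euclidean distance equals
        # 2 - 2 * cosine similarity, so plain K-means approximates cosine K-means.
        X = normalize(X, norm="l2")
    ari, ami = [], []
    for seed in range(n_runs):
        # One random initialisation per run; the seed makes each run repeatable.
        km = KMeans(n_clusters=n_clusters, init="random", n_init=1, random_state=seed)
        pred = km.fit_predict(X)
        ari.append(adjusted_rand_score(true_labels, pred))
        ami.append(adjusted_mutual_info_score(true_labels, pred))
    return {
        "ari_mean": float(np.mean(ari)),
        "ari_std": float(np.std(ari)),
        "ami_mean": float(np.mean(ami)),
        "ami_std": float(np.std(ami)),
    }


def top_terms(centre, terms, top_n=50):
    """Map the top_n heaviest terms of one cluster centre to their weights."""
    centre = np.asarray(centre).ravel()
    idx = np.argsort(centre)[::-1][:top_n]
    # Keep only positive weights, since a tag cloud sizes words by frequency.
    return {terms[i]: float(centre[i]) for i in idx if centre[i] > 0}
```

With the real data, evaluate_kmeans(X, trueLabels) would give the step 2 averages and evaluate_kmeans(X, trueLabels, use_cosine=True) the step 3 averages. For step 4, the dictionary returned by top_terms for a row of a fitted model's cluster_centers_ can be passed to WordCloud().generate_from_frequencies(...) from the wordcloud package to draw one tag cloud per cluster.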

Task B - Dimensionality Reduction using PCA/SVD

For the provided BBC sports dataset, perform PCA and plot the captured variance with respect to increasing latent dimensionality. What is the minimum dimension that captures (a) at least 95% variance and (b) at least 98% variance?
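One way to approach this is sketched below, using only NumPy. It assumes X is a dense array with documents in rows (a sparse matrix would first need converting, for example with X.toarray()). The function names cumulative_variance and min_components are hypothetical. PCA is carried out here through the SVD of the mean-centred data, where the squared singular values are proportional to the variance captured by each component.

```python
import numpy as np


def cumulative_variance(X):
    """Cumulative fraction of variance captured by the first k principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # centre each feature
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, largest first
    var = s ** 2                             # proportional to per-component variance
    return np.cumsum(var) / var.sum()


def min_components(cum, threshold):
    """Smallest number of components whose cumulative variance reaches threshold."""
    # searchsorted returns the first index with cum[index] >= threshold;
    # adding 1 converts that zero-based index into a component count.
    return int(np.searchsorted(cum, threshold) + 1)
```

Given cum = cumulative_variance(X), the answers to (a) and (b) would be min_components(cum, 0.95) and min_components(cum, 0.98). Plotting cum against range(1, len(cum) + 1), for instance with matplotlib, gives the requested curve of captured variance versus latent dimensionality.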


Reviews

PART 2: For the provided BBC sports dataset:
* Perform PCA.
* Plot the captured variance with respect to increasing latent dimensionality.
* What is the minimum dimension that captures (a) at least 95% variance and (b) at least 98% variance?

Excellent (5 marks): Successfully completed all three tasks.
Good (3 marks): Successfully completed any two of the three tasks.
Fair (2 marks): Successfully completed any one of the three tasks.
Unsatisfactory (0 marks): Failed to complete any given task.

Criteria 3:
* Repeat K-means clustering with 5 clusters using a similarity measure other than Euclidean distance.
* Evaluate the clustering performance over 50 random initializations of K-means using adjusted rand index and adjusted mutual information.
* Report the clustering performance and compare it with the results obtained in step 2.

Excellent (5 marks): Successfully completed all three tasks.
Good (3 marks): Successfully completed any two of the three tasks.
Fair (2 marks): Successfully completed any one of the three tasks.
Unsatisfactory (0 marks): Failed to complete any given task.

Criteria 2:
* Perform K-means clustering with 5 clusters using Euclidean distance as similarity measure.
* Evaluate the clustering performance using adjusted rand index and adjusted mutual information.
* Report the clustering performance averaged over 50 random initializations of K-means.

Excellent (5 marks): Successfully completed all three tasks.
Good (3 marks): Successfully completed any two of the three tasks.
Fair (2 marks): Successfully completed only one of the three tasks.
Unsatisfactory (0 marks): Failed to complete any given task.

Criteria 1: Read the files corresponding to the feature matrix, class labels and the term dictionary, and store them in the variables X, trueLabels and terms using a Python notebook.

Excellent (5 marks): Successfully read all files and stored them in the corresponding variables using a Python notebook.
Good (3 marks): Partially achieved the goal, missing the reading or storing of one file or variable.
Fair (2 marks): Only able to either read files or create variables in Python to store any value.
Unsatisfactory (0 marks): Failed to read and store using a Python notebook.

Deakin University has a strict standard on plagiarism as part of Academic Integrity. To avoid any issues with plagiarism, students are strongly encouraged to run a similarity check with the Turnitin system, which is available through Unistart. The similarity score MUST NOT exceed 39% in any case. The late submission penalty is 5% per 24 hours from 11.30pm, 22nd. No submission will be marked after 5 days (24 hours x 5 days from 4pm 10th). Be sure to downsize the photos in your report before submission so that your file uploads in time.


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd