Demonstrate your skills for data clustering

Reference no: EM132218849

Purpose

In this assignment, you need to demonstrate your skills in data clustering and dimensionality reduction. The assignment has two parts.

Instructions

This is an individual assessment task of at most 20 pages, including all relevant material, graphs, images and tables. Students are required to provide responses to a series of problem situations related to their analysis techniques.

Students are also required to provide evidence through articulation of the scenario, application of programming skills and analysis techniques, and a rationale for their responses.

Task A - Clustering

Download the BBC sports dataset from the Cloud. This dataset consists of 737 documents from the BBC Sport website, corresponding to sports news articles in five topical areas from 2004-2005. There are 5 class labels: athletics, cricket, football, rugby, tennis. The original dataset and raw text files can be downloaded from here.

1. There are 3 files in the dataset, corresponding to the feature matrix, the class labels and the term dictionary. Read these files into a Python notebook and store them in the variables X, trueLabels, and terms.

2. Next, perform K-means clustering with 5 clusters using Euclidean distance as the similarity measure. Evaluate the clustering performance using the adjusted Rand index and adjusted mutual information. Report the clustering performance averaged over 50 random initializations of K-means.

3. Repeat K-means clustering with 5 clusters using a similarity measure other than Euclidean distance. Evaluate the clustering performance over 50 random initializations of K-means using the adjusted Rand index and adjusted mutual information. Report the clustering performance and compare it with the results obtained in step 2.

4. For both clustering cases (Euclidean distance and the other similarity measure), visualize the cluster centres as a tag cloud using the Python package WordCloud.
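
Steps 2 to 4 can be sketched roughly as follows. This is only an illustration under several assumptions, not a reference solution: a small synthetic count matrix stands in for the real X, trueLabels and terms (rows are assumed to be documents, so transpose the matrix if the file stores terms in rows); cosine similarity is picked as the "other" measure and is approximated by L2-normalising each document before running ordinary K-means; and the helper names `evaluate_kmeans` and `top_term_weights` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score, adjusted_rand_score
from sklearn.preprocessing import normalize


def evaluate_kmeans(X, true_labels, n_clusters=5, n_runs=50, cosine=False):
    """Average ARI and AMI over n_runs randomly initialised K-means runs."""
    # Between unit-length vectors, squared Euclidean distance equals
    # 2 - 2 * cosine similarity, so K-means on L2-normalised rows
    # approximates clustering by cosine similarity (an assumption of this
    # sketch; the centres themselves are not re-normalised).
    data = normalize(X) if cosine else X
    ari, ami = [], []
    for seed in range(n_runs):
        km = KMeans(n_clusters=n_clusters, init="random", n_init=1,
                    random_state=seed)
        pred = km.fit_predict(data)
        ari.append(adjusted_rand_score(true_labels, pred))
        ami.append(adjusted_mutual_info_score(true_labels, pred))
    return float(np.mean(ari)), float(np.mean(ami))


def top_term_weights(centre, terms, n_top=30):
    """Map the n_top heaviest terms of one cluster centre to their weights."""
    order = np.argsort(centre)[::-1][:n_top]
    return {terms[i]: float(centre[i]) for i in order if centre[i] > 0}


# Synthetic stand-in for the real data: 5 classes, 30 documents each,
# 50 terms, with each class favouring its own block of 10 terms.
rng = np.random.default_rng(0)
trueLabels = np.repeat(np.arange(5), 30)
rates = np.full((5, 50), 0.2)
for k in range(5):
    rates[k, k * 10:(k + 1) * 10] = 3.0
X = rng.poisson(rates[trueLabels]).astype(float)
terms = [f"term{i}" for i in range(50)]

ari_euc, ami_euc = evaluate_kmeans(X, trueLabels)
ari_cos, ami_cos = evaluate_kmeans(X, trueLabels, cosine=True)
print(f"Euclidean: ARI={ari_euc:.3f}  AMI={ami_euc:.3f}")
print(f"Cosine:    ARI={ari_cos:.3f}  AMI={ami_cos:.3f}")

# Term weights for the first centre of one fitted model.
km = KMeans(n_clusters=5, init="random", n_init=1, random_state=0).fit(X)
freqs = top_term_weights(km.cluster_centers_[0], terms, n_top=5)
print(freqs)
```

A dictionary such as `freqs` can then be passed to `WordCloud().generate_from_frequencies(freqs)` from the WordCloud package to render one tag cloud per cluster centre. Averaging over seeds rather than relying on a single run matters here because K-means with random initialisation can converge to different local optima.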

Task B - (Dimensionality Reduction using PCA/SVD)

For the provided BBC sports dataset, perform PCA and plot the captured variance with respect to increasing latent dimensionality. What is the minimum dimension that captures (a) at least 95% variance and (b) at least 98% variance?
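
One possible way to compute the captured variance and the minimum dimensions is sketched below using the SVD of the centred data. It is a minimal illustration under stated assumptions: the matrix is dense with documents in rows (a sparse matrix would need converting first), random data stands in for the real feature matrix, and the helper names `cumulative_explained_variance` and `min_dimension` are hypothetical.

```python
import numpy as np


def cumulative_explained_variance(X):
    """Cumulative fraction of variance captured by the leading components."""
    centred = X - X.mean(axis=0)
    # The squared singular values of the centred data are proportional
    # to the variances along the principal components.
    s = np.linalg.svd(centred, compute_uv=False)
    var = s ** 2
    return np.cumsum(var) / var.sum()


def min_dimension(cum_var, threshold):
    """Smallest number of components whose cumulative variance >= threshold."""
    # searchsorted returns the first index where cum_var >= threshold;
    # adding 1 converts that index into a component count.
    return int(np.searchsorted(cum_var, threshold) + 1)


# Synthetic stand-in for the feature matrix: 200 samples, 10 features
# with decreasing spread, so the leading components dominate.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10)) * np.linspace(5.0, 0.5, 10)

cum_var = cumulative_explained_variance(X_demo)
d95 = min_dimension(cum_var, 0.95)
d98 = min_dimension(cum_var, 0.98)
print(f"95% variance: {d95} dimensions, 98% variance: {d98} dimensions")
```

Plotting `cum_var` against `range(1, len(cum_var) + 1)`, for example with matplotlib, gives the captured-variance curve the task asks for; the two thresholds can be read off the same array.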
