Evaluate clustering performance using adjusted Rand index

Reference no: EM132091500

Instructions

This is an individual assessment task of at most 20 pages, including all relevant material, graphs, images and tables. Students are required to respond to a series of problem situations related to their analysis techniques. They must also provide evidence through articulation of the scenario, application of programming skills and analysis techniques, and a rationale for their responses.

Task A - Clustering

Download the BBC sports dataset from the Cloud. This dataset consists of 737 documents from the BBC Sport website, corresponding to sports news articles in five topical areas from 2004-2005. There are 5 class labels: athletics, cricket, football, rugby and tennis. The original dataset and raw text files can be downloaded from here.

1. There are 3 files in the dataset, corresponding to the feature matrix, the class labels and the term dictionary. Read these files in a Python notebook and store them in the variables X, trueLabels and terms.

2. Next, perform K-means clustering with 5 clusters using Euclidean distance as the similarity measure. Evaluate the clustering performance using the adjusted Rand index and adjusted mutual information. Report the clustering performance averaged over 50 random initializations of K-means.

3. Repeat K-means clustering with 5 clusters using a similarity measure other than Euclidean distance. Evaluate the clustering performance over 50 random initializations of K-means using the adjusted Rand index and adjusted mutual information. Report the clustering performance and compare it with the results obtained in step 2.

4. For both clustering cases (Euclidean distance and the other similarity measure), visualize the cluster centres as tag clouds using the Python package WordCloud.
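Steps 2 to 4 might be approached along the following lines. This is a minimal sketch, not the assignment's reference solution. It assumes NumPy and scikit-learn are available, uses a small synthetic count matrix in place of the BBC data (one row per document, so a terms-by-documents file would need transposing), and picks cosine similarity as the alternative measure by scaling each row to unit length before running standard K-means. The helper names evaluate_kmeans and top_term_weights are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score, adjusted_rand_score
from sklearn.preprocessing import normalize


def evaluate_kmeans(X, true_labels, n_clusters=5, n_runs=50, cosine=False):
    """Return mean ARI and mean AMI over n_runs randomly initialised fits.

    With cosine=True each row is scaled to unit L2 norm first. For unit
    vectors, squared Euclidean distance equals 2 - 2 * cosine similarity,
    so standard K-means then approximates cosine-based clustering.
    """
    data = normalize(X) if cosine else X
    ari, ami = [], []
    for seed in range(n_runs):
        # One random initialisation per run; the seed makes runs repeatable.
        km = KMeans(n_clusters=n_clusters, init="random", n_init=1,
                    random_state=seed)
        pred = km.fit_predict(data)
        ari.append(adjusted_rand_score(true_labels, pred))
        ami.append(adjusted_mutual_info_score(true_labels, pred))
    return float(np.mean(ari)), float(np.mean(ami))


def top_term_weights(centre, terms, top_n=30):
    """Map the top_n highest-weighted terms of one cluster centre to weights."""
    order = np.argsort(centre)[::-1][:top_n]
    return {terms[i]: float(centre[i]) for i in order if centre[i] > 0}


# Synthetic stand-in for the real data (assumption): 200 documents, 50 terms,
# 5 topics, each topic favouring its own block of 10 terms.
rng = np.random.default_rng(0)
X = rng.poisson(0.2, size=(200, 50)).astype(float)
trueLabels = np.repeat(np.arange(5), 40)
for c in range(5):
    X[c * 40:(c + 1) * 40, c * 10:(c + 1) * 10] += rng.poisson(5.0, size=(40, 10))
terms = [f"term{i}" for i in range(50)]

ari_euc, ami_euc = evaluate_kmeans(X, trueLabels)
ari_cos, ami_cos = evaluate_kmeans(X, trueLabels, cosine=True)
print(f"Euclidean: ARI={ari_euc:.3f}, AMI={ami_euc:.3f}")
print(f"Cosine:    ARI={ari_cos:.3f}, AMI={ami_cos:.3f}")

# Term weights for each centre of a single Euclidean fit.
km = KMeans(n_clusters=5, init="random", n_init=1, random_state=0).fit(X)
clouds = [top_term_weights(c, terms, top_n=10) for c in km.cluster_centers_]
```

Each dictionary in clouds has the term-to-weight form accepted by the generate_from_frequencies method of the WordCloud class, so one tag cloud can be rendered per cluster centre.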

Task B - Dimensionality Reduction using PCA/SVD

For the provided BBC sports dataset, perform PCA and plot the captured variance with respect to increasing latent dimensionality. What is the minimum dimension that captures (a) at least 95% variance and (b) at least 98% variance?
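The variance question can be explored with a singular value decomposition of the mean-centred data. The sketch below assumes NumPy and a dense documents-by-terms array; the function names are hypothetical, and the random matrix only stands in for the real feature matrix.

```python
import numpy as np


def cumulative_variance(X):
    """Cumulative fraction of variance captured by the first k components."""
    Xc = X - X.mean(axis=0)                  # PCA works on mean-centred columns
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, in descending order
    var = s ** 2                             # proportional to per-component variance
    return np.cumsum(var) / var.sum()


def min_components(cum_var, threshold):
    """Smallest number of components whose cumulative variance reaches threshold."""
    k = int(np.searchsorted(cum_var, threshold, side="left")) + 1
    return min(k, len(cum_var))


# Synthetic stand-in (assumption): roughly rank-5 data plus a little noise.
rng = np.random.default_rng(0)
X_demo = (rng.normal(size=(100, 5)) @ rng.normal(size=(5, 40))
          + 0.1 * rng.normal(size=(100, 40)))
cum_var = cumulative_variance(X_demo)
k95 = min_components(cum_var, 0.95)
k98 = min_components(cum_var, 0.98)
print(f"95% variance: {k95} components, 98% variance: {k98} components")
```

Plotting cum_var against the component index gives the captured-variance curve the task asks for.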

Attachment:- bbcsport_terms.rar


For clustering cases (Euclidean distance and the other similarity measure reported in previous two tasks), visualise the cluster centres as tag clouds using the Python package WordCloud.
5 marks: Successfully used the WordCloud package to visualise the cluster centres using at least two different similarity measures.
3 marks: Successfully used the WordCloud package to visualise the cluster centres using at least one similarity measure.
2 marks: Demonstrated knowledge of the WordCloud package and visualisation, but could not use them successfully.
0 marks: Failed to show any evidence of knowledge of the WordCloud package and visualisation.

Criteria 3:
• Repeat K-means clustering with 5 clusters using a similarity measure other than Euclidean distance.
• Evaluate the clustering performance over 50 random initializations of K-means using adjusted Rand index and adjusted mutual information.
• Report the clustering performance and compare it with the results obtained in step 2.
5 marks: Successfully completed all three tasks.
3 marks: Successfully completed any two of the three tasks.
2 marks: Successfully completed any one of the three tasks.
0 marks: Failed to complete any given task.

Criteria 2:
• Perform K-means clustering with 5 clusters using Euclidean distance as similarity measure.
• Evaluate the clustering performance using adjusted Rand index and adjusted mutual information.
• Report the clustering performance averaged over 50 random initializations of K-means.
5 marks: Successfully completed all three tasks.
3 marks: Successfully completed any two of the three tasks.
2 marks: Successfully completed only one of the three tasks.
0 marks: Failed to complete any given task.

Criteria 1: Read the files corresponding to the feature matrix, class labels and the term dictionary, and store them in the variables X, trueLabels and terms using a Python notebook.
Excellent (5 marks): Successfully read all files and stored them in the corresponding variables using a Python notebook.
Good (3 marks): Partially achieved the goal, missing the reading or storing of one file or variable.
Fair (2 marks): Able only to either read the files or create variables in Python to store any value.
Unsatisfactory (0 marks): Failed to read and store the files using a Python notebook.

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd