Implementing two clustering validation measures

Reference no: EM132271097

Assignment

In this assignment, you will be implementing two clustering validation measures: Normalized Mutual Information (NMI) and Jaccard similarity.
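
For reference, here is one common way to write the two measures. It is a sketch under stated assumptions, because the assignment does not say which NMI normalization to use. The sketch assumes the geometric mean of the two entropies; arithmetic-mean and max normalizations also exist.

Let $T$ be the ground-truth partition and $C$ a test clustering of the same $n$ items. Let $n_{ij}$ be the number of items in ground-truth cluster $i$ and test cluster $j$, with $a_i = \sum_j n_{ij}$ and $b_j = \sum_i n_{ij}$. The mutual-information sum runs over nonzero $n_{ij}$ only.

For Jaccard, pairs of items are counted:
- TP: pairs placed together in both $T$ and $C$.
- FN: pairs together only in $T$.
- FP: pairs together only in $C$.

```latex
I(C,T) = \sum_{i}\sum_{j} \frac{n_{ij}}{n}\,\log_2 \frac{n\,n_{ij}}{a_i\,b_j}
H(T) = -\sum_i \frac{a_i}{n}\log_2\frac{a_i}{n}, \qquad H(C) = -\sum_j \frac{b_j}{n}\log_2\frac{b_j}{n}
\mathrm{NMI}(C,T) = \frac{I(C,T)}{\sqrt{H(C)\,H(T)}}
\mathrm{TP} = \sum_{i,j}\binom{n_{ij}}{2}, \qquad \mathrm{TP}+\mathrm{FN} = \sum_i \binom{a_i}{2}, \qquad \mathrm{TP}+\mathrm{FP} = \sum_j \binom{b_j}{2}
J(C,T) = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}+\mathrm{FP}}
```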

You will be given one set of ground-truth clustering (partition) results and five clustering test cases. You need to evaluate each test case against the ground truth using the NMI and Jaccard measures and submit the resulting scores. You will be graded on whether your measures are correct.
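
A minimal sketch of the pair-counting Jaccard measure, under two assumptions:
- Each clustering is a Python dict mapping item id to cluster id.
- Only item ids present in both clusterings are counted.

Rather than looping over all O(n²) item pairs, the sketch derives TP, TP+FN and TP+FP from cluster and contingency-cell sizes. This gives the same counts. The return value of 1.0 when no pair shares a cluster in either clustering is a convention chosen here, not something the assignment specifies.

```python
from collections import Counter
from typing import Dict


def pairs(k: int) -> int:
    """Number of unordered pairs among k items."""
    return k * (k - 1) // 2


def jaccard(truth: Dict[int, int], pred: Dict[int, int]) -> float:
    """Pair-counting Jaccard: TP / (TP + FN + FP)."""
    items = truth.keys() & pred.keys()  # assumption: score shared item ids only
    cells = Counter((truth[i], pred[i]) for i in items)  # contingency-table cells
    truth_sizes = Counter(truth[i] for i in items)
    pred_sizes = Counter(pred[i] for i in items)
    tp = sum(pairs(k) for k in cells.values())               # together in both
    same_truth = sum(pairs(k) for k in truth_sizes.values())  # TP + FN
    same_pred = sum(pairs(k) for k in pred_sizes.values())    # TP + FP
    denom = same_truth + same_pred - tp                       # TP + FN + FP
    return tp / denom if denom else 1.0  # convention for the all-singleton case
```

For example, `truth = {1: 0, 2: 0, 3: 1, 4: 1}` and `pred = {1: 0, 2: 0, 3: 0, 4: 1}` give TP = 1, FP = 2 and FN = 1, so the score is 0.25.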

Use base 2 for all logarithms in the NMI calculation.
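
A minimal NMI sketch using `math.log2`, as the base-2 requirement asks. It makes three assumptions:
- Normalization is by the geometric mean `sqrt(H(T) * H(C))`. The assignment does not name a normalization, and scikit-learn's default, for instance, is the arithmetic mean.
- Clusterings are dicts from item id to cluster id, restricted to shared ids.
- When an entropy is zero (a single cluster), the code returns 1.0 if both entropies are zero and 0.0 otherwise. This convention is chosen here.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def entropy(sizes: Iterable[int], n: int) -> float:
    """Base-2 entropy of a clustering given its cluster sizes."""
    return -sum((s / n) * math.log2(s / n) for s in sizes if s > 0)


def nmi(truth: Dict[int, int], pred: Dict[int, int]) -> float:
    """Mutual information normalized by sqrt(H(truth) * H(pred)), all in base 2."""
    items = truth.keys() & pred.keys()  # assumption: score shared item ids only
    n = len(items)
    cells = Counter((truth[i], pred[i]) for i in items)
    truth_sizes = Counter(truth[i] for i in items)
    pred_sizes = Counter(pred[i] for i in items)
    # I = sum over nonzero cells of p_ij * log2(p_ij / (p_i * q_j))
    mi = sum(
        (c / n) * math.log2(n * c / (truth_sizes[t] * pred_sizes[p]))
        for (t, p), c in cells.items()
    )
    h_truth = entropy(truth_sizes.values(), n)
    h_pred = entropy(pred_sizes.values(), n)
    if h_truth == 0 or h_pred == 0:
        return 1.0 if h_truth == h_pred else 0.0  # convention, see lead-in
    return mi / math.sqrt(h_truth * h_pred)
```

A relabeled copy of the ground truth scores 1.0. A test clustering that is statistically independent of it scores 0.0.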

The ground-truth clustering (partition) results are stored in the file "partitions.txt"; the five test-case clustering results are stored in the files "clustering_1.txt" through "clustering_5.txt".

All files, including partitions.txt, clustering_1.txt, ..., can be downloaded from the attached data.zip file.

Each clustering result (both ground-truth and test cases) is represented by a file. Each line in a file consists of two integers, separated by a space. The first integer represents the id of a data item, and the second integer represents the id of the cluster that this item belongs to.
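
A small reader for this two-integer-per-line format is sketched below. The helper names `parse_clustering` and `read_clustering` are hypothetical, chosen for illustration. Blank lines are skipped as an assumption, in case a file ends with an extra newline.

```python
from typing import Dict, Iterable


def parse_clustering(lines: Iterable[str]) -> Dict[int, int]:
    """Map each item id to its cluster id from '<item_id> <cluster_id>' lines."""
    labels: Dict[int, int] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines, e.g. a trailing newline
        item_id, cluster_id = int(fields[0]), int(fields[1])
        labels[item_id] = cluster_id
    return labels


def read_clustering(path: str) -> Dict[int, int]:
    """Read one clustering file, e.g. the ground truth or a test case."""
    with open(path) as f:
        return parse_clustering(f)
```

For example, `truth = read_clustering("partitions.txt")` loads the ground truth. `[read_clustering(f"clustering_{i}.txt") for i in range(1, 6)]` loads the five test cases.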

You need to submit a file named "scores.txt" consisting of 5 lines. Each line contains two floating-point numbers separated by a space. The first number on the i-th line is the NMI measure you calculated for the i-th test case (i.e., "clustering_i.txt") with respect to the ground truth given in "partitions.txt", and the second number on the i-th line is the Jaccard measure you calculated for the i-th test case.
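
A minimal sketch of writing the output in this format. It assumes the scores are already collected as a list of `(nmi, jaccard)` pairs, ordered from test case 1 to 5. The helper names `format_scores` and `write_scores` are hypothetical. Numbers are written at Python's default full precision because the assignment does not state a required number of decimals.

```python
from typing import Iterable, Tuple


def format_scores(scores: Iterable[Tuple[float, float]]) -> str:
    """One line per test case: '<nmi> <jaccard>', separated by a single space."""
    return "".join(f"{nmi_value} {jaccard_value}\n" for nmi_value, jaccard_value in scores)


def write_scores(scores: Iterable[Tuple[float, float]], path: str = "scores.txt") -> None:
    """Write the formatted scores to the submission file."""
    with open(path, "w") as f:
        f.write(format_scores(scores))
```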

Note - You will be graded on whether your file format is correct and on how many of the measures you submitted are correct.

Attachment: Assignment Files.rar

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd