Implementing two clustering validation measures

Reference no: EM132271097

Assignment -

In this assignment, you will be implementing two clustering validation measures: Normalized Mutual Information (NMI) and Jaccard similarity.
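The assignment does not spell out the formulas, so the definitions below are an assumption. They use a common reading of both measures: NMI normalized by the geometric mean of the two entropies (other normalizations exist, such as arithmetic mean or max, and give different numbers), and Jaccard as the pair-counting coefficient. Let $T = \{T_1,\dots,T_k\}$ be the ground-truth partition and $C = \{C_1,\dots,C_r\}$ a test clustering of the same $n$ items, with $p_{ij} = |C_i \cap T_j|/n$, $p_{C_i} = |C_i|/n$ and $p_{T_j} = |T_j|/n$. Over all unordered pairs of items, $TP$ counts pairs placed together in both $C$ and $T$, $FN$ pairs together in $T$ but split in $C$, and $FP$ pairs together in $C$ but split in $T$. Terms with $p_{ij} = 0$ contribute zero to the sum.

```latex
\begin{aligned}
I(C,T) &= \sum_{i=1}^{r}\sum_{j=1}^{k} p_{ij}\,\log_2\frac{p_{ij}}{p_{C_i}\,p_{T_j}},\\
H(C) &= -\sum_{i=1}^{r} p_{C_i}\log_2 p_{C_i}, \qquad
H(T) = -\sum_{j=1}^{k} p_{T_j}\log_2 p_{T_j},\\
\mathrm{NMI}(C,T) &= \frac{I(C,T)}{\sqrt{H(C)\,H(T)}},\\
\mathrm{Jaccard}(C,T) &= \frac{TP}{TP + FN + FP}.
\end{aligned}
```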

You will be given one ground-truth clustering (partition) and five test clusterings. Evaluate each test clustering against the ground truth using the NMI and Jaccard measures, and submit the resulting scores. You will be graded on whether your scores are correct.
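The Jaccard half of that evaluation can be sketched as follows, assuming "Jaccard similarity" means the pair-counting coefficient TP / (TP + FN + FP). The function name `jaccard_pairs` and the dict input (item id mapped to cluster id) are hypothetical choices for illustration. Instead of enumerating all O(n^2) pairs, it counts pairs per cluster and per (test cluster, truth cluster) cell, which gives the same counts.

```python
from collections import Counter
from math import comb


def jaccard_pairs(truth: dict[int, int], test: dict[int, int]) -> float:
    """Pair-counting Jaccard coefficient TP / (TP + FN + FP).

    Both arguments map item id -> cluster id and must label the same items.
    """
    if truth.keys() != test.keys():
        raise ValueError("both clusterings must label the same items")

    # Pairs placed together in the test clustering (TP + FP).
    same_test = sum(comb(size, 2) for size in Counter(test.values()).values())
    # Pairs placed together in the ground truth (TP + FN).
    same_truth = sum(comb(size, 2) for size in Counter(truth.values()).values())
    # Pairs together in both: count items in each (test, truth) cluster cell.
    cells = Counter((test[item], truth[item]) for item in truth)
    tp = sum(comb(size, 2) for size in cells.values())

    fp = same_test - tp
    fn = same_truth - tp
    denominator = tp + fp + fn
    # Assumed convention: if no pair is together in either clustering
    # (all singletons on both sides), treat the clusterings as identical.
    return tp / denominator if denominator else 1.0
```

For example, with ground truth `{1: 0, 2: 0, 3: 1, 4: 1}` and test `{1: 0, 2: 0, 3: 0, 4: 1}`, there is one TP pair (1, 2), two FP pairs (1, 3) and (2, 3), and one FN pair (3, 4), so the score is 1/4.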

Use base-2 logarithms for every logarithm in the NMI calculation.
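A minimal NMI sketch with base-2 logarithms follows. It assumes the geometric-mean normalization I(C;T) / sqrt(H(C) H(T)); if your course uses a different normalization, change only the final line. The name `nmi`, the dict input (item id mapped to cluster id), and the handling of single-cluster inputs are hypothetical choices.

```python
from collections import Counter
from math import log2, sqrt


def nmi(truth: dict[int, int], test: dict[int, int]) -> float:
    """NMI = I(C;T) / sqrt(H(C) * H(T)), with all logarithms in base 2."""
    if truth.keys() != test.keys():
        raise ValueError("both clusterings must label the same items")
    n = len(truth)
    truth_sizes = Counter(truth.values())
    test_sizes = Counter(test.values())
    # Joint counts n_ij = |C_i intersect T_j|; only non-empty cells appear,
    # so the 0 * log(0) terms are skipped automatically.
    joint = Counter((test[item], truth[item]) for item in truth)

    def entropy(sizes: Counter) -> float:
        return -sum((s / n) * log2(s / n) for s in sizes.values())

    # p_ij / (p_i * p_j) simplifies to n_ij * n / (|C_i| * |T_j|).
    mutual_info = sum(
        (n_ij / n) * log2(n_ij * n / (test_sizes[c] * truth_sizes[t]))
        for (c, t), n_ij in joint.items()
    )
    h_test = entropy(test_sizes)
    h_truth = entropy(truth_sizes)
    if h_test == 0 or h_truth == 0:
        # Assumed convention: two single-cluster partitions match perfectly;
        # otherwise the mutual information is 0 anyway.
        return 1.0 if h_test == h_truth else 0.0
    return mutual_info / sqrt(h_test * h_truth)
```

Relabeling clusters does not change the score, so `nmi({1: 0, 2: 0, 3: 1, 4: 1}, {1: 5, 2: 5, 3: 7, 4: 7})` is 1.0, while a test clustering that is independent of the ground truth scores 0.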

The ground-truth clustering (partition) is stored in the file "partitions.txt"; the five test clusterings are stored in the files "clustering_1.txt" through "clustering_5.txt".

All files, including partitions.txt, clustering_1.txt, ..., can be downloaded from the attached data.zip file.

Each clustering result (both ground-truth and test cases) is represented by a file. Each line in a file consists of two integers, separated by a space. The first integer represents the id of a data item, and the second integer represents the id of the cluster that this item belongs to.
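The file format described above can be read with a small parser like the one below. This is a sketch: the function names are hypothetical, and tolerating blank lines while rejecting duplicate item ids are assumptions about how strict the input is. Splitting parsing from file reading keeps the parser easy to test.

```python
from collections.abc import Iterable
from pathlib import Path


def parse_clustering(lines: Iterable[str]) -> dict[int, int]:
    """Parse '<item id> <cluster id>' lines into {item_id: cluster_id}."""
    labels: dict[int, int] = {}
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines (e.g. a trailing newline) are ignored
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected two integers, got {line!r}")
        item_id, cluster_id = map(int, fields)
        if item_id in labels:
            raise ValueError(f"line {line_no}: duplicate item id {item_id}")
        labels[item_id] = cluster_id
    return labels


def read_clustering(path: str) -> dict[int, int]:
    """Read one clustering file such as partitions.txt."""
    return parse_clustering(Path(path).read_text().splitlines())
```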

You need to submit a file named "scores.txt" consisting of 5 lines. Each line contains two floating-point numbers separated by a space. On the i-th line, the first number is the NMI of the i-th test case ("clustering_i.txt") with respect to the ground truth in "partitions.txt", and the second number is the Jaccard measure of the same test case.
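Producing the required output can be sketched as below, assuming you already have one (NMI, Jaccard) pair per test case, ordered clustering_1.txt through clustering_5.txt. The helper names are hypothetical. Floats are written with Python's default `str` formatting, which keeps full precision; whether the grader expects a fixed number of decimals is not stated, so that is left unchanged.

```python
def format_scores(rows: list[tuple[float, float]], expected: int = 5) -> str:
    """Render one 'NMI Jaccard' line per test case, in test-case order."""
    if len(rows) != expected:
        raise ValueError(f"expected {expected} test cases, got {len(rows)}")
    return "".join(f"{nmi_value} {jaccard_value}\n" for nmi_value, jaccard_value in rows)


def write_scores(rows: list[tuple[float, float]], path: str = "scores.txt") -> None:
    """Write the formatted scores to scores.txt (or another path)."""
    with open(path, "w") as handle:
        handle.write(format_scores(rows))
```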

Note: You will be graded on whether your file format is correct and on how many of the submitted measures are correct.
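Because the file format is graded, a quick self-check before submitting can help. The sketch below (a hypothetical helper, not part of the assignment) checks only the rules stated above: exactly 5 lines, each with two space-separated values that parse as floats.

```python
def check_scores_format(text: str, expected_lines: int = 5) -> list[str]:
    """Return a list of format problems in scores.txt content (empty if none)."""
    problems = []
    lines = text.splitlines()
    if len(lines) != expected_lines:
        problems.append(f"expected {expected_lines} lines, found {len(lines)}")
    for number, line in enumerate(lines, start=1):
        fields = line.split(" ")
        if len(fields) != 2:
            problems.append(f"line {number}: expected two space-separated numbers")
            continue
        for field in fields:
            try:
                float(field)
            except ValueError:
                problems.append(f"line {number}: {field!r} is not a float")
    return problems
```

For example, `check_scores_format(open("scores.txt").read())` should return an empty list for a well-formed file.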

Attachment: Assignment Files.rar


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd