Why might k-means clustering not be a good method to use

Assignment Help Applied Statistics
Reference no: EM131578430

Case Problem: Know Thy Customer

Know Thy Customer (KTC) is a financial consulting company that provides personalized financial advice to its clients. As a basis for developing this tailored advising, KTC would like to segment its customers into several representative groups based on key characteristics.

Peyton Avery, the director of KTC's fledgling analytics division, plans to establish the set of representative customer profiles based on 600 customer records in the file KnowThyCustomer. Each customer record contains data on age, gender, annual income, marital status, number of children, whether the customer has a car loan, and whether the customer has a home mortgage. KTC's market research staff has determined that these seven characteristics should form the basis of the customer clustering.

Peyton has invited a summer intern, Danny Riles, into her office so they can discuss how to proceed. As they review the data on the computer screen, Peyton's brow furrows as she realizes that this task may not be trivial. The data contain both categorical variables (Female, Married, Car, Mortgage) and interval variables (Age, Income, and Children).
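To make Peyton's concern concrete, the sketch below compares Euclidean distances (the measure k-means relies on) before and after z-score standardization. The three records are hypothetical values invented for illustration, not rows from the KnowThyCustomer file, and the helper names `euclidean` and `standardize` are likewise assumptions.

```python
import math
from statistics import mean, pstdev

# Hypothetical records, NOT taken from the KnowThyCustomer file.
# Columns: Age, Income, Children, Female, Married, Car, Mortgage
customers = [
    [25, 30000.0, 0, 1, 0, 0, 0],
    [52, 30500.0, 3, 0, 1, 1, 1],  # differs from customer 0 on almost everything but income
    [26, 42000.0, 0, 1, 0, 0, 0],  # matches customer 0 on almost everything but income
]

def euclidean(a, b):
    """Straight-line distance between two records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def standardize(rows):
    """Rescale every column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

# On the raw scale, the dollar-valued Income column swamps every other variable.
raw_01 = euclidean(customers[0], customers[1])
raw_02 = euclidean(customers[0], customers[2])

# After standardization, each column contributes on a comparable scale.
z = standardize(customers)
z_01 = euclidean(z[0], z[1])
z_02 = euclidean(z[0], z[2])

print("raw:", round(raw_01, 1), round(raw_02, 1))
print("standardized:", round(z_01, 2), round(z_02, 2))
```

On the raw scale, customer 0 appears closer to customer 1; after standardization, it appears closer to customer 2. Note that standardization only addresses scale. Whether a Euclidean distance on 0/1 variables such as Female or Mortgage is meaningful at all is a separate question that the report is asked to discuss.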

Managerial Report

Playing the role of Peyton, you must write a report documenting the construction of the representative customer profiles. Because Peyton would like to use this report as a training reference for interns such as Danny, your report should experiment with several approaches and explain the strengths and weaknesses of each. In particular, your report should include the following analyses:

1. Using k-means clustering on all seven variables, experiment with different values of k. Recommend a value of k and describe these k clusters according to their "average" characteristics. Why might k-means clustering not be a good method to use for these seven variables?

2. Using hierarchical clustering on all seven variables, experiment with using complete linkage and group average linkage as the clustering method. Recommend a set of customer profiles (clusters). Describe these clusters according to their "average" characteristics. Why might hierarchical clustering not be a good method to use for these seven variables?

3. Apply a two-step clustering method:

a. Apply hierarchical clustering on the binary variables Female, Married, Car, and Mortgage to recommend a set of clusters. Use matching coefficients as the similarity measure and group average linkage as the clustering method.

b. Based on the clusters from part (a), split the original 600 observations into m separate data sets, where m is the number of clusters recommended from part (a). For each of these m data sets, apply 2-means clustering using Age, Income, and Children as variables. This will generate a total of 2m clusters. Describe these 2m clusters according to their "average" characteristics.

What benefit does this two-step clustering approach have over the approaches in parts (1) and (2)? What weakness does it have?
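The two-step method in part 3 could be sketched roughly as follows, using only the Python standard library. This is an illustration under stated assumptions, not the case's prescribed software workflow: the eight records are hypothetical (not from the KnowThyCustomer file), m = 2 is picked arbitrarily, the interval variables are z-scored within each subset before 2-means, and the 2-means step is seeded deterministically with the first and last row instead of using random restarts.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical records for illustration only, not from the KnowThyCustomer file.
# Each record: ((Female, Married, Car, Mortgage), (Age, Income in $1000s, Children))
records = [
    ((1, 0, 0, 0), (24, 28, 0)),
    ((1, 0, 0, 0), (27, 31, 0)),
    ((1, 0, 1, 0), (55, 80, 2)),
    ((1, 0, 0, 0), (58, 85, 3)),
    ((0, 1, 1, 1), (30, 40, 1)),
    ((0, 1, 1, 1), (33, 42, 2)),
    ((0, 1, 0, 1), (60, 95, 3)),
    ((0, 1, 1, 1), (62, 90, 2)),
]

def matching_distance(a, b):
    """1 minus the matching coefficient: share of binary attributes that differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def average_linkage(points, n_clusters, dist):
    """Agglomerative clustering with group average linkage; returns index lists."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        def avg(pair):
            ci, cj = clusters[pair[0]], clusters[pair[1]]
            total = sum(dist(points[p], points[q]) for p in ci for q in cj)
            return total / (len(ci) * len(cj))
        # Merge the two clusters with the smallest average pairwise distance.
        i, j = min(combinations(range(len(clusters)), 2), key=avg)
        clusters[i] += clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

def standardize(rows):
    """z-score each column (an assumption; the case does not specify scaling)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def two_means(rows, iters=20):
    """Plain k-means with k = 2, seeded with the first and last row."""
    centers = [list(rows[0]), list(rows[-1])]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min((0, 1), key=lambda c: math.dist(r, centers[c])) for r in rows]
        for c in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Step (a): hierarchical clustering on the four binary variables.
m = 2  # assumed number of step-one clusters
step_one = average_linkage([b for b, _ in records], m, matching_distance)

# Step (b): 2-means on Age, Income, Children within each step-one cluster.
final = {}
for g, members in enumerate(step_one):
    z = standardize([records[i][1] for i in members])
    for i, label in zip(members, two_means(z)):
        final.setdefault((g, label), []).append(i)

# Describe each of the 2m clusters by its "average" characteristics.
for key, idx in sorted(final.items()):
    avg_profile = [round(mean(records[i][1][k] for i in idx), 1) for k in range(3)]
    print(key, idx, "mean Age/Income/Children:", avg_profile)
```

Each key in `final` pairs a step-one cluster with a 2-means label, so the dictionary holds the 2m final clusters. Separating the steps lets each variable type be compared with a measure suited to it, at the cost of fixing the binary-variable grouping before the interval variables are considered.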

Attachment:- Know-the-Customer.rar






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd