What is the dominant vehicle class in each cluster

Reference no: EM131726788

K-Means Clustering

Part 1 -

1. Introduction - Cluster analysis involves grouping things together so that the members of each group are more similar to each other than to members of other groups. There are numerous algorithms or models associated with clustering such as k-means clustering, hierarchical clustering, and density models. 

Cluster analysis is popular in market segmentation. For example, the market for a product or service may be segmented into groups of customers or regions that share common interests or are similar in terms of their preferences and socio-economic attributes. An appropriate marketing strategy may then be devised to serve the needs of identified segments better.

This part involves using k-means clustering as a clustering tool to be applied to a data mining study within your domain of interest using R and RStudio.

2. Steps to Completion - For each study the general procedure is to:

  • Review theoretical background based on available resources in the course content
  • Select a dataset from the module's recommended datasets list
  • Run an analysis, perform evaluation, and capture the results
  • Document your findings and analysis in a data mining analytical report

3. Deliverables - Submit your analysis report by addressing the following critical areas:

Introduction: give some background and context about the domain of application, provide the rationale for the type of analysis, and state the objective clearly.

Analysis: describe the data both qualitatively and quantitatively through exploratory analysis, perform necessary preprocessing activities, give some intuition about the algorithm and core parameters, demonstrate the model building steps along with parameter tuning, and explain all your assumptions.

Result: explain the result and interpret the model output using terms that reflect the application area, perform model evaluation using the appropriate metrics, and leverage visualization.

Conclusion: summarize your main findings, discuss experimental limitations related to the data and/or implementation of the algorithm, and suggest areas for improvement as potential future work.

Miscellaneous:

  • Proofread your report for correct structure, grammar, and spelling
  • Follow appropriate APA formatting and provide all references
  • Include your R script and extended model outputs in an Appendix section.

The length of the report should be 7-10 pages excluding the title page, appendix and R script.

Part 2 -

Run an exercise on a vehicle dataset and write a report on your findings and your interpretation of the results in your own words. The report needs to cover the key points of the exercise below, in order.

Download the vehicle.csv file to your hard drive.

1. Introduction - What do you expect the k-means clustering method to accomplish for the vehicle data?

2. Data pre-processing

  • Run the set.seed command. Include the command in the report and explain the reason for running it.
  • Load the data from the vehicle.csv file into R. Create a copy of the vehicle dataset called myvehicle. Include the commands in the report.
  • Remove the variable class from myvehicle. Include the command in the report, and explain why we remove the class variable.
  • Run the scale command to scale myvehicle. Include the command in the report, and explain why we scale the data.
  • Discuss any additional data pre-processing that you run. Include the commands and explain what each command does in the report.
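
The pre-processing steps above might look like the following in R. This is a minimal sketch, not a reference solution: vehicle.csv is not reproduced here, so R's built-in iris data (with its label column renamed to class) is used as a stand-in, and the seed value 123 is an arbitrary choice.

```r
# Fix the random seed: kmeans() picks its initial centers at random, so a
# fixed seed makes the clustering reproducible from run to run.
set.seed(123)

# In the exercise you would load the real file, for example:
#   vehicle <- read.csv("vehicle.csv")
# Assumption: iris stands in for the vehicle data in this sketch.
vehicle <- iris
names(vehicle)[names(vehicle) == "Species"] <- "class"

# Work on a copy so the original, with its class labels, stays available
# for the evaluation step later on.
myvehicle <- vehicle

# Drop the class label: k-means is unsupervised and works on numeric
# variables only, and keeping the label would leak the answer.
myvehicle$class <- NULL

# Standardize every column to mean 0 and standard deviation 1 so that
# variables with large numeric ranges do not dominate the distances.
myvehicle <- scale(myvehicle)

summary(myvehicle)
```

Keeping the untouched vehicle data frame alongside myvehicle is a deliberate choice: the class column is needed again when the clusters are compared with the actual classes.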

3. Run the kmeans method with k=4 and store the output in the variable kc. Include the command in the report and discuss the input parameters you used. Enter kc at the command prompt and press Enter. Include the command output in the report and answer the following questions.

  • How many instances are in each cluster?
  • What information does the cluster means section of the output provide, and how were the numbers obtained?
  • What is the clustering vector?
  • What is the within-cluster sum of squares by cluster, and what does it mean?
  • Run the kc$iter command, and explain what the output shows. Include the command, the output, and explanation in the report.
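
As a hedged illustration of this step, the sketch below runs kmeans with four centers and pulls out the pieces of the output that the questions refer to. It again assumes the built-in iris data as a stand-in for vehicle.csv and an arbitrary seed; only the required centers argument is set, so all other kmeans arguments keep their defaults.

```r
set.seed(123)

# Assumption: iris stands in for the vehicle data (see read.csv("vehicle.csv")).
vehicle <- iris
names(vehicle)[names(vehicle) == "Species"] <- "class"
myvehicle <- scale(vehicle[, names(vehicle) != "class"])

# centers = 4 requests four clusters. Optional arguments such as nstart and
# iter.max are left at their defaults here.
kc <- kmeans(myvehicle, centers = 4)

kc            # printed summary of the whole fit
kc$size       # number of instances in each cluster
kc$centers    # cluster means: average of each scaled variable per cluster
kc$cluster    # clustering vector: the cluster number assigned to each row
kc$withinss   # within-cluster sum of squares, one value per cluster
kc$iter       # number of (outer) iterations the algorithm performed

# The cluster means are plain column averages of the rows in each cluster.
manual_center_1 <- colMeans(myvehicle[kc$cluster == 1, , drop = FALSE])
all.equal(manual_center_1, kc$centers[1, ])
```

The last two lines recompute the first cluster's center by hand, which is one way to show in the report how the numbers in the cluster means section were obtained.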

4. Clustering evaluation

Build a cross-tabulation to compare how the method clustered the vehicles with the actual vehicle class. Include the command and the output in the report. Answer the following questions.

  • What is the dominant vehicle class in each cluster?
  • What additional information does the table show?
  • What percentage of vehicles were clustered in agreement with the actual class?
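
One possible way to build and read the cross-tabulation is sketched below, again with iris standing in for vehicle.csv. The agreement figure treats each cluster's most frequent class as the "correct" one; that is an assumed interpretation of the question, and the names tab, dominant, and agreement are illustrative.

```r
set.seed(123)

# Assumption: iris stands in for the vehicle data (see read.csv("vehicle.csv")).
vehicle <- iris
names(vehicle)[names(vehicle) == "Species"] <- "class"
myvehicle <- scale(vehicle[, names(vehicle) != "class"])
kc <- kmeans(myvehicle, centers = 4)

# Rows are the actual classes, columns are the cluster numbers.
tab <- table(actual = vehicle$class, cluster = kc$cluster)
tab

# Dominant class per cluster: the row with the largest count in each column.
dominant <- rownames(tab)[apply(tab, 2, which.max)]
names(dominant) <- colnames(tab)
dominant

# Share of rows that belong to the dominant class of their cluster.
agreement <- sum(apply(tab, 2, max)) / sum(tab)
round(100 * agreement, 1)
```

Cluster numbers are arbitrary labels, so the table has to be read column by column rather than along a diagonal as in a classification confusion matrix.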

5. Build a cluster plot. Include the command, the plot, and your interpretation of the plot in the report.
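
The exercise does not name a plotting function, so the sketch below is only one option: it uses base R to project the scaled data onto its first two principal components and colors the points by cluster. The course material may expect a dedicated function instead. As before, iris is an assumed stand-in for vehicle.csv.

```r
set.seed(123)

# Assumption: iris stands in for the vehicle data (see read.csv("vehicle.csv")).
vehicle <- iris
names(vehicle)[names(vehicle) == "Species"] <- "class"
myvehicle <- scale(vehicle[, names(vehicle) != "class"])
kc <- kmeans(myvehicle, centers = 4)

# Reduce the scaled data to principal components so it can be drawn in 2-D.
pca <- prcomp(myvehicle)

# Scatter plot of the first two components, one color per cluster.
plot(pca$x[, 1:2], col = kc$cluster, pch = 19,
     xlab = "PC1", ylab = "PC2", main = "k-means clusters (k = 4)")

# Project the cluster centers into the same space and mark them with stars.
centers_pc <- predict(pca, kc$centers)
points(centers_pc[, 1:2], pch = 8, cex = 2)
```

Well-separated groups of a single color suggest compact clusters, while overlapping colors indicate clusters that are hard to tell apart in the first two components.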

6. Experiment with three different k values in addition to k=4, and summarize the findings in tabular format.

k | Number of instances in each cluster | Between-clusters sum of squares | Within-clusters sum of squares | Number of iterations
4 | | | |
Value of your choice | | | |
Value of your choice | | | |
Value of your choice | | | |
Explain the effect of k values on method results.

What is an ideal value of k for the vehicle data? (This is an open-ended question.)
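
The comparison across k values can be collected programmatically. The following is a sketch under the same assumptions as before (iris as a stand-in for vehicle.csv, arbitrary seed); the values 2, 3, and 5 are arbitrary choices alongside the required k=4, and the column names of results are illustrative.

```r
set.seed(123)

# Assumption: iris stands in for the vehicle data (see read.csv("vehicle.csv")).
vehicle <- iris
names(vehicle)[names(vehicle) == "Species"] <- "class"
myvehicle <- scale(vehicle[, names(vehicle) != "class"])

# k = 4 comes from the exercise; the other values are arbitrary examples.
ks <- c(2, 3, 4, 5)

# Fit one model per k and keep the quantities the summary table asks for.
results <- do.call(rbind, lapply(ks, function(k) {
  fit <- kmeans(myvehicle, centers = k)
  data.frame(
    k            = k,
    sizes        = paste(fit$size, collapse = ", "),
    betweenss    = fit$betweenss,
    tot_withinss = fit$tot.withinss,
    iterations   = fit$iter
  )
}))
results
```

Because the between-clusters and total within-clusters sums of squares always add up to the same total, the table makes the trade-off visible: as k changes, variation moves between the two columns. A common heuristic for choosing k is to look for the point where the within-clusters value stops dropping sharply.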

7. Summary

  • What differences between k-means clustering and classification methods did you observe?
  • Which part of this exercise did you find the most challenging and which approach did you take to resolve the challenge?

Submit the following files in the Exercise 7 Assignment folder: the report addressing the key points above, and an R script containing the commands you ran with brief comments on the purpose of each command.
