Reference no: EM13918762
Assignment:
- Part I:
- Use the data collected in the attachment (Excel file).
- Select a reasonable number of objects to represent the population, e.g., 2000, 1000, 500, ... (your call as the domain expert).
- Select a set of "representative attributes", e.g., 8, 15, 20, ... (your call as the domain expert).
- Decide a similarity measure between any two objects.
- Number the objects.
- Note: it is a good idea to remove outliers and objects with missing values.
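A minimal preprocessing sketch for Part I, assuming the Excel sheet has already been exported to a list of numeric rows (one row per object, `None` for a missing cell). The function names are hypothetical, and the 3-standard-deviation outlier cutoff is an assumed rule of thumb, not something the assignment prescribes.

```python
import random
import statistics

# Hypothetical helpers; rows are lists of floats, with None marking a missing cell.


def drop_missing(rows):
    """Keep only objects whose attribute values are all present."""
    return [r for r in rows if all(v is not None for v in r)]


def drop_outliers(rows, z_max=3.0):
    """Drop objects with any attribute more than z_max population standard
    deviations from that attribute's mean (assumed cutoff)."""
    if len(rows) < 2:
        return list(rows)
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    return [
        r for r in rows
        # A constant attribute (stdev 0) never marks an object as an outlier.
        if all(s == 0 or abs(v - m) / s <= z_max
               for v, m, s in zip(r, means, stdevs))
    ]


def select_and_number(rows, n, seed=0):
    """Randomly sample up to n objects and number them 1..n."""
    sample = random.Random(seed).sample(rows, min(n, len(rows)))
    return {i: row for i, row in enumerate(sample, start=1)}


if __name__ == "__main__":
    raw = [[float(i % 3), 2.0] for i in range(20)] + [[100.0, 2.0], [None, 2.0]]
    clean = drop_outliers(drop_missing(raw))
    objects = select_and_number(clean, 5)
    print(len(clean), objects)
```

Dropping missing values before computing means keeps `None` out of the statistics; a fixed seed makes the sampled population reproducible for the report.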
-
- Part II:
- Select k, the number of clusters, and justify your choice as the domain expert.
- Manually select k centroids.
- Cluster the data into k clusters.
- Compute the squared error (SE) for each cluster and show the sum of SEs (SSE).
- Explain the clustering.
- Randomly select k centroids.
- Cluster the data into k clusters.
- Compute the SE for each cluster and show the sum of SEs.
- Explain how this clustering differs from the previous one.
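The Part II steps can be sketched as standard Lloyd's k-means. The assignment does not name an algorithm; k-means is assumed here because the task is defined by centroids and SE/SSE. The sketch also assumes Euclidean distance, and `random_centroids` picks k distinct data objects as the random starting centroids, which is one common convention. All function names are hypothetical.

```python
import math
import random


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means: alternate assignment and mean update until labels stop changing."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_se(points, labels, centroids):
    """Squared error of each cluster: sum of squared distances to its centroid."""
    se = [0.0] * len(centroids)
    for p, lab in zip(points, labels):
        se[lab] += math.dist(p, centroids[lab]) ** 2
    return se


def random_centroids(points, k, seed=None):
    """Pick k distinct data objects as starting centroids."""
    return random.Random(seed).sample(points, k)


if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    runs = [("manual", [[0, 0], [10, 10]]), ("random", random_centroids(pts, 2, seed=42))]
    for name, init in runs:
        labels, cents = kmeans(pts, init)
        se = cluster_se(pts, labels, cents)
        print(name, labels, [round(s, 3) for s in se], "SSE =", round(sum(se), 3))
```

On this six-point toy data the manual start converges to the first three and last three points, each cluster with SE = 4/3, so SSE = 8/3. In general a random start may converge to a different, possibly worse, local minimum, which is the kind of difference the last step asks you to explain.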
-
- Part III:
- Add a new feature to this assignment.
- Explain your algorithm, program, and result.
- For example:
- A new way to select data.
- A new way to calculate similarity, specifically for your own unique data.
- A new clustering algorithm.
- A new way to calculate the effectiveness of clustering.
- A new way to visualize the clustering.
What to turn in:
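As one possible starting point for the "effectiveness of clustering" option, the sketch below computes the mean silhouette coefficient, an established measure rather than a new one. Unlike SSE, which always shrinks as k grows, it rewards clusters that are both tight and well separated, so it can compare clusterings with different k. Euclidean distance and the function name are assumptions.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient: for each object, a = mean distance to its
    own cluster, b = lowest mean distance to another cluster,
    s = (b - a) / max(a, b). Objects in singleton clusters score 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(silhouette(pts, [0, 0, 0, 1, 1, 1]), silhouette(pts, [0, 1, 0, 1, 0, 1]))
```

Values near 1 indicate well-separated clusters and values near 0 or below indicate overlapping or misassigned objects. The quadratic cost in the number of objects is acceptable for a few thousand objects.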
- Part I:
- The number of objects and the number of attributes selected, and why and how you selected them.
- Explain the similarity measure: how is the similarity (or dissimilarity) between any two objects defined?
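One reasonable choice of similarity measure, sketched under the assumption that all selected attributes are numeric: min-max normalize each attribute so none dominates by scale, use Euclidean distance as the dissimilarity, and map it to a similarity in (0, 1] with 1 / (1 + d). The transform and function names are illustrative assumptions, not a required definition.

```python
import math


def min_max_normalize(rows):
    """Rescale every attribute to [0, 1]; a constant attribute becomes 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [[(v - lo) / s if s else 0.0 for v, lo, s in zip(r, lows, spans)]
            for r in rows]


def dissimilarity(a, b):
    """Euclidean distance between two normalized objects."""
    return math.dist(a, b)


def similarity(a, b):
    """1 for identical objects, approaching 0 as objects move apart."""
    return 1.0 / (1.0 + dissimilarity(a, b))


if __name__ == "__main__":
    norm = min_max_normalize([[0, 100], [10, 200], [5, 150]])
    print(norm, similarity(norm[0], norm[1]))
```

Normalizing first matters when attributes use very different units; without it, the attribute with the largest range would dominate every distance.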
-
- Part II:
- What is k, and why that value? What are the k centroids?
- What is the clustering result?
- What is the SSE for this clustering?
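Assuming each centroid $c_i$ is the mean of its cluster $C_i$ and $\operatorname{dist}$ is the chosen distance (Euclidean here), the per-cluster SE and the SSE over $k$ clusters can be written as below, with a small worked example for one cluster.

```latex
SE_i = \sum_{x \in C_i} \operatorname{dist}(c_i, x)^2,
\qquad
SSE = \sum_{i=1}^{k} SE_i

% Example: C = {(0,0), (0,1), (1,0)}, centroid c = (1/3, 1/3)
SE = \tfrac{2}{9} + \tfrac{5}{9} + \tfrac{5}{9} = \tfrac{4}{3}
```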
-
- Repeat the above for randomly selected centroids.
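To support the "why k" question, one common heuristic is the elbow method: compute SSE for k = 1, 2, ... and pick the k after which SSE stops dropping sharply. The sketch below is self-contained; it seeds k-means with farthest-first selection, a deterministic choice made here (an assumption) so the curve is reproducible and not an artifact of one unlucky random start. Function names are hypothetical.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding: start at the first object, then repeatedly add
    the object farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans_sse(points, k, max_iter=100):
    """Run Lloyd's k-means from farthest-first seeds and return the final SSE."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def sse_curve(points, k_max):
    """SSE for k = 1..k_max; look for the 'elbow' where the drop levels off."""
    return {k: kmeans_sse(points, k) for k in range(1, k_max + 1)}


if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [1, 0], [10, 0], [10, 1], [11, 0],
           [0, 10], [0, 11], [1, 10]]
    print(sse_curve(pts, 4))
```

For this nine-point toy data with three obvious groups, SSE falls from about 404 (k = 1) to 154 (k = 2) to 4 (k = 3) and then only to about 3.17 (k = 4), so k = 3 is the elbow. On real data the elbow is usually less sharp, and domain knowledge should still drive the final choice.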
-
- Part III:
- Clearly describe your idea, algorithm, program, and result.
-
- Part IV:
- Source code, and an explanation of how to run your program.
Attachment: project datamining.rar