Cluster analysis project, Database Management System

(a) Data Mining Process: In the context of this cluster analysis project, and in your own words, explain how you would execute the first stage of data mining, namely the "Pre-modelling" stage. Be sure to differentiate the sub-tasks in this stage.

(b) Pre-modelling: Describe the potential business problem and data mining problem in the context of this project. Be sure to differentiate these two problems in your description.

(c) Data Preparation: Use the "seeds_dataset_twoClass.csv" file to prepare the dataset for cluster analysis. You can use the following table format to justify the data type (i.e., measurement) and direction (i.e., role) used for each attribute.


Attribute | Data Type (or Measurement) | Direction (or Role): Input, Target or None

(d) Data Exploration: Analyse the dataset "seeds_dataset_twoClass.csv" using the following summary statistics in the Data Audit node. Discuss the use of these summary statistics for deciding if further data preparation is required.

a. Mean and Standard Deviation (Std. Dev), Min and Max

b. % Complete and Valid Records

c. Outliers and Extremes
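The statistics listed above can be reproduced for a single field with the sketch below, which helps show what each one says about data quality. Mean, standard deviation, minimum and maximum expose scale differences between attributes and implausible values. Percentage complete exposes missing data. Outlier and extreme counts flag values far from the mean. The cutoffs are assumed here to be 3 and 5 standard deviations; confirm them against the Data Audit node's own settings. In this sketch, "valid" simply means not missing.

```python
import math
import statistics


def audit(values, outlier_sd=3.0, extreme_sd=5.0):
    """Summarise one numeric field, data-audit style.

    `values` may contain None (or NaN) for missing entries. A value counts as
    an extreme if it lies more than `extreme_sd` standard deviations from the
    mean, and as an outlier if it lies more than `outlier_sd` (but not more
    than `extreme_sd`) standard deviations away.
    """
    valid = [v for v in values if v is not None and not math.isnan(v)]
    mean = statistics.mean(valid)
    sd = statistics.stdev(valid)  # sample standard deviation
    outliers = extremes = 0
    for v in valid:
        z = abs(v - mean) / sd if sd > 0 else 0.0
        if z > extreme_sd:
            extremes += 1
        elif z > outlier_sd:
            outliers += 1
    return {
        "mean": mean,
        "std_dev": sd,
        "min": min(valid),
        "max": max(valid),
        "pct_complete": 100.0 * len(valid) / len(values),
        "valid_records": len(valid),
        "outliers": outliers,
        "extremes": extremes,
    }
```

A field with a low percentage complete, or with outliers and extremes, suggests further preparation (imputation, filtering or capping). Very different standard deviations across fields point towards the normalisation question in (h).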

(e) Data Preparation: From the scenario and data given, explain why the attribute A3 (compactness) is probably not useful for cluster analysis. Prepare the data (for mining) by filtering out this field using IBM SPSS Modeler.
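In the UCI seeds data, from which this file appears to derive, compactness is defined as C = 4πA/P², where A is area and P is perimeter. Assuming that A1 is area and A2 is perimeter in this CSV, A3 is a deterministic function of two other inputs, so it adds no independent information to the clustering. That redundancy is one likely reason the question calls it not useful. The sketch below computes C from A and P, and drops a field from dict records, roughly as a Filter node does. `drop_field` is a hypothetical helper, not a Modeler API.

```python
import math


def compactness(area, perimeter):
    """Compactness C = 4*pi*A / P**2; a perfect circle gives 1.0."""
    return 4 * math.pi * area / perimeter ** 2


def drop_field(rows, field):
    """Return copies of the records (dicts) without `field`, like a Filter node."""
    return [{k: v for k, v in row.items() if k != field} for row in rows]


# Sanity check with a circle of radius 1: area = pi, perimeter = 2*pi.
print(compactness(math.pi, 2 * math.pi))  # ≈ 1.0
```

Computing C from A1 and A2 for a few records and comparing it with the stored A3 is a quick way to confirm the dependency before filtering.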

(f) Executing Clustering Technique: Decide on the number of clusters (i.e., K) and then execute K-Means on the filtered dataset. Assess the appropriateness of applying K-Means on this dataset. Interpret the clustering results.
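K-Means (Lloyd's algorithm) alternates two steps. It assigns every record to its nearest centroid, then moves each centroid to the mean of its members, and in doing so reduces the within-cluster sum of squared Euclidean distances. The minimal sketch below illustrates the method. It is not SPSS Modeler's implementation: initialisation is simply the first k points. K = 2 is a natural starting point because the file name suggests two varieties, though that is an assumption. Comparing the within-cluster SSE across several K (the "elbow" heuristic) is one common check. When assessing appropriateness, note that K-Means favours compact, roughly spherical clusters of similar size and is sensitive to attribute scale and outliers.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-Means on a list of equal-length numeric tuples.

    Centroids start at the first k points (a simple deterministic choice).
    Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def within_sse(points, labels, centroids):
    """Total squared distance from each point to its own centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
```

Running `kmeans` for k = 1 to 5 on the filtered records and plotting `within_sse` against k shows where adding clusters stops paying off.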

(g) Interpreting Clustering Results: Use the Graphboard node to generate a scatter plot based on attributes A4 and A5. The plot should show each data point labelled or coloured based on the cluster number assigned by K-Means. Evaluate the clustering results using this plot (and you may also use the project information given in the Background section of this assignment).
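The Graphboard scatter plot gives a visual check. A numeric complement is to cross-tabulate the cluster number against the known variety (for example, with a Matrix node) and compute purity. Purity is the share of records that belong to the majority class of their cluster. The sketch below is a swapped-in evaluation technique, not part of the assignment's required steps. Cluster numbers are arbitrary labels, so cluster 1 need not correspond to class 1.

```python
from collections import Counter


def crosstab(clusters, classes):
    """Count how many records of each known class fall in each cluster."""
    return Counter(zip(clusters, classes))


def purity(clusters, classes):
    """Share of records that belong to the majority class of their cluster."""
    best = {}
    for (cluster, _), n in crosstab(clusters, classes).items():
        best[cluster] = max(best.get(cluster, 0), n)
    return sum(best.values()) / len(clusters)
```

Purity close to 1.0 means the clusters line up with the varieties. Lower values point to the overlapping regions that should also be visible in the A4 against A5 plot.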

(h) Data Preparation: Having read your preliminary analysis, a colleague gave the following comment: "the dataset should have been normalised before the clustering process." Evaluate the clustering solutions with and without normalisation and then discuss whether normalisation is necessary in this case.
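K-Means relies on Euclidean distance, so attributes with larger numeric ranges dominate the distance calculation. Normalising puts every attribute on a comparable scale. The sketch below shows two common choices, min-max scaling to 0 to 1 and z-scores. Whether normalisation changes the clusters here depends on the ranges found in (d), so the way to decide is to run K-Means on both the raw and the normalised data and compare the results. The function names are illustrative, not Modeler settings.

```python
import math


def min_max(column):
    """Rescale a list of numbers to the 0-1 range."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]


def z_score(column):
    """Centre on the mean and divide by the (population) standard deviation."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [0.0 if sd == 0 else (v - mean) / sd for v in column]


def normalise_columns(points, scaler=min_max):
    """Apply `scaler` to every attribute (column) of a list of tuples."""
    cols = [scaler(list(col)) for col in zip(*points)]
    return [tuple(row) for row in zip(*cols)]
```

Suppose one attribute spans about 10 units and another about 0.1. Raw distances are then driven almost entirely by the first attribute, whereas after `min_max` both span 0 to 1 and contribute comparably.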

Posted Date: 2/20/2013 1:23:45 AM | Location: United States