Cluster analysis project, Database Management System


(a) Data Mining Process: In the context of this cluster analysis project, and in your own words, explain how you would execute the first stage of data mining, namely the "Pre-modelling" stage. Be sure to distinguish between the sub-tasks in this stage.

(b) Pre-modelling: Describe the potential business problem and data mining problem in the context of this project. Be sure to differentiate these two problems in your description.

(c) Data Preparation: Use the "seeds_dataset_twoClass.csv" file to prepare the dataset for cluster analysis. You can use the following table format to justify the data type (i.e., measurement) and direction (i.e., role) used for each attribute.

Attribute | Data Type (or Measurement) | Direction (or Role): Input, Target or None | Justification

(d) Data Exploration: Analyse the dataset "seeds_dataset_twoClass.csv" using the following summary statistics in the Data Audit node. Discuss the use of these summary statistics for deciding if further data preparation is required.

a. Mean and Standard Deviation (Std. Dev), Min and Max

b. % Complete and Valid Records

c. Outliers and Extremes
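The Data Audit node reports these statistics per field. As a rough illustration of what each number means, the sketch below computes them for one numeric column in plain Python. It assumes missing values arrive as `None` or NaN, uses the sample standard deviation, and flags outliers and extremes with a "distance from the mean in standard deviations" rule using cut-offs of 3 and 5. These cut-offs are an assumption here; check the Quality settings of your own Data Audit node before comparing numbers.

```python
import math
import statistics
from typing import Optional, Sequence

# Assumed cut-offs (in standard deviations from the mean); verify them
# against the Data Audit node's settings rather than taking them as given.
OUTLIER_SD = 3.0
EXTREME_SD = 5.0


def audit_field(values: Sequence[Optional[float]]) -> dict:
    """Summarise one numeric field: centre, spread, range, completeness, unusual values."""
    # Treat None and NaN as missing (incomplete) entries.
    valid = [v for v in values if v is not None and not math.isnan(v)]
    if not valid:
        raise ValueError("field has no valid values")
    pct_complete = 100.0 * len(valid) / len(values)

    mean = statistics.mean(valid)
    std = statistics.stdev(valid) if len(valid) > 1 else 0.0

    # Count values whose distance from the mean exceeds each cut-off.
    outliers = extremes = 0
    if std > 0:
        for v in valid:
            z = abs(v - mean) / std
            if z > EXTREME_SD:
                extremes += 1
            elif z > OUTLIER_SD:
                outliers += 1

    return {
        "mean": mean, "std": std, "min": min(valid), "max": max(valid),
        "pct_complete": pct_complete, "outliers": outliers, "extremes": extremes,
    }


if __name__ == "__main__":
    # Hypothetical column: 19 typical readings plus one unusually large one.
    column = [10.0] * 19 + [100.0]
    print(audit_field(column))
```

A field with 100% completeness and no outliers or extremes usually needs no further cleaning; a low completeness percentage or a cluster of extremes suggests imputation, trimming or a closer look before clustering.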

(e) Data Preparation: From the scenario and data given, explain why the attribute A3 (compactness) is probably not useful for cluster analysis. Prepare the data (for mining) by filtering out this field using IBM SPSS Modeler.
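One likely reason, assuming this file follows the UCI seeds dataset layout (A1 = area, A2 = perimeter, A3 = compactness; an assumption, since the Background section is not reproduced here), is that compactness is not an independent measurement. The UCI description defines it as C = 4πA/P², so it is computed entirely from A1 and A2 and adds largely redundant information to the distance calculation. The sketch below shows the formula.

```python
import math


def compactness(area: float, perimeter: float) -> float:
    """Compactness C = 4*pi*A / P**2; equals 1.0 for a perfect circle."""
    # Assumes area and perimeter come from A1 and A2 respectively.
    return 4.0 * math.pi * area / perimeter ** 2


if __name__ == "__main__":
    # A circle of radius 1 has A = pi and P = 2*pi, so C = 1.
    print(compactness(math.pi, 2 * math.pi))
    # A unit square (A = 1, P = 4) is less compact: C = pi/4.
    print(compactness(1.0, 4.0))
```

Recomputing C from A1 and A2 for a few rows and comparing it with A3 is a quick way to check whether the assumption holds for this particular file.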

(f) Executing Clustering Technique: Decide on the number of clusters (i.e., K) and then execute K-Means on the filtered dataset. Assess the appropriateness of applying K-Means on this dataset. Interpret the clustering results.
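To see what the K-Means node is doing, here is a minimal, dependency-free sketch of Lloyd's K-Means. It assumes each record has already been loaded as a tuple of the filtered numeric inputs. The deterministic "farthest-first" seeding is a swapped-in choice made for reproducibility; it is not necessarily how SPSS Modeler initialises its centroids. The returned within-cluster sum of squared errors (SSE) can be compared across several values of K as an "elbow" check when deciding on K.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _farthest_first(points: List[Point], k: int) -> List[Point]:
    """Assumed seeding: start at the first point, then repeatedly add the
    point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100):
    """Lloyd's K-Means. Returns (labels, centroids, within-cluster SSE)."""
    centroids = _farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable, so centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        centroids = updated
    sse = sum(_sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


if __name__ == "__main__":
    # Hypothetical two-field records forming two well-separated groups.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    for k in range(1, 4):
        print(k, kmeans(pts, k)[2])  # SSE per K for an elbow check
```

SSE always falls as K grows, so the useful signal is where the decrease levels off, weighed against what the project Background says about the expected number of groups.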

(g) Interpreting Clustering Results: Use the Graphboard node to generate a scatter plot based on attributes A4 and A5. The plot should show each data point labelled or coloured based on the cluster number assigned by K-Means. Evaluate the clustering results using this plot (and you may also use the project information given in the Background section of this assignment).
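The scatter plot gives a visual check. A numeric complement, assuming the known class label in the file is kept aside (role None) and only used after clustering, is to cross-tabulate the cluster number against the class and compute purity: the share of records that fall in their cluster's majority class. The labels in the sketch below are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def crosstab(clusters: Sequence[Hashable], classes: Sequence[Hashable]) -> Dict[Hashable, Counter]:
    """Count how many records of each known class fall in each cluster."""
    table: Dict[Hashable, Counter] = {}
    for c, y in zip(clusters, classes):
        table.setdefault(c, Counter())[y] += 1
    return table


def purity(clusters: Sequence[Hashable], classes: Sequence[Hashable]) -> float:
    """Fraction of records that belong to their cluster's majority class."""
    table = crosstab(clusters, classes)
    majority = sum(counts.most_common(1)[0][1] for counts in table.values())
    return majority / len(clusters)


if __name__ == "__main__":
    # Hypothetical K-Means cluster numbers and known class labels.
    cluster_ids = [1, 1, 1, 2, 2, 2]
    known_class = ["a", "a", "b", "b", "b", "b"]
    print(crosstab(cluster_ids, known_class))
    print(purity(cluster_ids, known_class))
```

Purity close to 1 means the clusters line up with the known classes; points that break this pattern are usually the ones sitting between the groups in the A4/A5 scatter plot.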

(h) Data Preparation: Having read your preliminary analysis, a colleague gave the following comment: "the dataset should have been normalised before the clustering process." Evaluate the clustering solutions with and without normalisation and then discuss whether normalisation is necessary in this case.
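K-Means relies on Euclidean distance, so a field measured on a larger numeric range can dominate the distance and, in turn, the cluster assignments. The sketch below shows two common per-field rescalings, min-max (to 0-1) and z-score. Which scaler to use is a judgement call, and whether SPSS Modeler rescales fields internally should be checked in its documentation rather than assumed.

```python
import statistics
from typing import List, Sequence, Tuple


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale a field to the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant field carries no spread
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values: Sequence[float]) -> List[float]:
    """Centre on the mean and divide by the sample standard deviation."""
    mu, sd = statistics.mean(values), statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def normalise_columns(rows: Sequence[Tuple[float, ...]], method=min_max) -> List[Tuple[float, ...]]:
    """Apply a per-field scaler to every column of a list of numeric rows."""
    cols = [method(list(col)) for col in zip(*rows)]
    return [tuple(r) for r in zip(*cols)]


if __name__ == "__main__":
    # Hypothetical rows whose second field spans a much wider range.
    rows = [(0.0, 100.0), (5.0, 200.0), (10.0, 300.0)]
    print(normalise_columns(rows))
    print(normalise_columns(rows, method=z_score))
```

To evaluate the colleague's comment, cluster both the raw and the rescaled rows with the same K and compare the resulting assignments and how well they match the known classes.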


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd