Cluster analysis project, Database Management System

Assignment Help:

(a) Data Mining Process: In the context of this cluster analysis project, and in your own words, explain how you would execute the first stage of data mining, namely the "Pre-modelling" stage. Be sure to differentiate the sub-tasks in this stage.

(b) Pre-modelling: Describe the potential business problem and data mining problem in the context of this project. Be sure to differentiate these two problems in your description.

(c) Data Preparation: Use the "seeds_dataset_twoClass.csv" file to prepare the dataset for cluster analysis. You can use the following table format to justify the data type (i.e., measurement) and direction (i.e., role) used for each attribute.

 

Attribute | Data Type (or Measurement) | Direction (or Role) (Input, Target or None) | Justification

(d) Data Exploration: Analyse the dataset "seeds_dataset_twoClass.csv" using the following summary statistics in the Data Audit node. Discuss the use of these summary statistics for deciding if further data preparation is required.

a. Mean and Standard Deviation (Std. Dev), Min and Max

b. % Complete and Valid Records

c. Outliers and Extremes
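
The three groups of statistics listed above can be sketched outside SPSS Modeler as follows. This is a minimal Python illustration on a made-up column, not the Data Audit node's implementation. The function name `audit`, the sample values, and the cut-offs of 3 and 5 standard deviations for outliers and extremes are all assumptions chosen for the example.

```python
import statistics

def audit(values, outlier_sd=3.0, extreme_sd=5.0):
    """Summarise one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    std = statistics.stdev(present)  # sample standard deviation
    # Distance of each value from the mean, measured in standard deviations.
    if std > 0:
        z = [abs(v - mean) / std for v in present]
    else:
        z = [0.0] * len(present)
    return {
        "mean": mean,
        "std": std,
        "min": min(present),
        "max": max(present),
        "pct_complete": 100.0 * len(present) / len(values),
        # Assumed cut-offs: outliers lie between 3 and 5 SD, extremes beyond 5 SD.
        "outliers": sum(1 for s in z if outlier_sd < s <= extreme_sd),
        "extremes": sum(1 for s in z if s > extreme_sd),
    }

# Hypothetical column: 19 ordinary values, one missing value, one far from the rest.
column = [5.0, 5.1, 4.9, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0, 5.2,
          4.8, 5.1, 4.9, 5.0, 5.1, 4.9, 5.0, 5.2, 4.8, None, 9.0]
summary = audit(column)
for name, value in summary.items():
    print(name, value)
```

A `pct_complete` below 100 calls for a decision on missing values, and a non-zero outlier or extreme count flags values worth inspecting before clustering, because K-Means centroids are means and are pulled towards extreme values.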

(e) Data Preparation: From the scenario and data given, explain why the attribute A3 (compactness) is probably not useful for cluster analysis. Prepare the data (for mining) by filtering out this field using IBM SPSS Modeler.

(f) Executing Clustering Technique: Decide on the number of clusters (i.e., K) and then execute K-Means on the filtered dataset. Assess the appropriateness of applying K-Means on this dataset. Interpret the clustering results.
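
To make the K-Means step concrete, the following is a minimal Python sketch of the standard assignment-and-update loop (Lloyd's algorithm). It is not how the K-Means node in SPSS Modeler is implemented. The function `kmeans`, its parameters, the random choice of starting centroids, and the six toy points are assumptions for illustration, and K = 2 is used only because the toy data has two obvious groups.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
    return centroids, labels

# Hypothetical two-attribute records forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 1.2), (2.0, 0.9),
          (8.0, 1.1), (8.5, 0.8), (9.0, 1.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

Because the algorithm relies on Euclidean distance and cluster means, it suits numeric attributes and roughly compact clusters, which is the kind of consideration the appropriateness assessment in (f) asks for.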

(g) Interpreting Clustering Results: Use the Graphboard node to generate a scatter plot based on attributes A4 and A5. The plot should show each data point labelled or coloured based on the cluster number assigned by K-Means. Evaluate the clustering results using this plot (and you may also use the project information given in the Background section of this assignment).

(h) Data Preparation: Having read your preliminary analysis, a colleague gave the following comment: "the dataset should have been normalised before the clustering process." Evaluate the clustering solutions with and without normalisation and then discuss whether normalisation is necessary in this case.
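
The colleague's point can be illustrated with a small Python sketch of min-max normalisation, which is one of several possible scaling choices and is an assumption here, as are the function name `min_max_scale` and the three made-up rows. It shows how an attribute with a wide range can dominate Euclidean distance until the columns are rescaled. Whether this changes the clustering of the seeds data has to be checked on the real file.

```python
import math

def min_max_scale(rows):
    """Rescale every column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    # Use a span of 1.0 for constant columns to avoid dividing by zero.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        tuple((v - lo) / span for v, lo, span in zip(row, lows, spans))
        for row in rows
    ]

# Hypothetical rows: the first attribute spans 10 units, the second only 0.1.
rows = [(10.0, 0.80), (20.0, 0.81), (12.0, 0.90)]
scaled = min_max_scale(rows)

# Raw distances are driven almost entirely by the first attribute.
raw_01 = math.dist(rows[0], rows[1])
raw_02 = math.dist(rows[0], rows[2])
# After scaling, both attributes contribute on the same footing.
scaled_01 = math.dist(scaled[0], scaled[1])
scaled_02 = math.dist(scaled[0], scaled[2])

print("raw:   ", raw_01, raw_02)
print("scaled:", scaled_01, scaled_02)
```

In this toy case the nearest neighbour of the first row changes once the columns are rescaled, which is the effect to look for when comparing the K-Means solutions with and without normalisation.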




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd