Cluster analysis project, Database Management System

Assignment Help:

(a) Data Mining Process: In the context of this cluster analysis project, and in your own words, explain how you would execute the first stage of data mining, namely the "Pre-modelling" stage. Be sure to differentiate the sub-tasks in this stage.

(b) Pre-modelling: Describe the potential business problem and data mining problem in the context of this project. Be sure to differentiate these two problems in your description.

(c) Data Preparation: Use the "seeds_dataset_twoClass.csv" file to prepare the dataset for cluster analysis. You can use the following table format to justify the data type (i.e., measurement) and direction (i.e., role) used for each attribute.

 

Attribute | Data Type (or Measurement) | Direction (or Role): Input, Target or None | Justification

(d) Data Exploration: Analyse the dataset "seeds_dataset_twoClass.csv" using the following summary statistics in the Data Audit node. Discuss the use of these summary statistics for deciding if further data preparation is required.

a. Mean and Standard Deviation (Std. Dev), Min and Max

b. % Complete and Valid Records

c. Outliers and Extremes
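For readers who want to cross-check the figures reported by the Data Audit node, the statistics listed above can be sketched in plain Python. This is a minimal illustration, not the node's own implementation: the function name `audit_column`, the use of `None` for a missing value, and the 3 and 5 standard deviation cut-offs for outliers and extremes are all assumptions made for the example, and the sample values are made up.

```python
import statistics

def audit_column(values, outlier_sd=3.0, extreme_sd=5.0):
    """Summarise one numeric column; None marks a missing value (assumption)."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    std = statistics.stdev(present)  # sample standard deviation
    # Distance of each value from the mean, measured in standard deviations.
    if std > 0:
        distances = [abs(v - mean) / std for v in present]
    else:
        distances = [0.0] * len(present)
    return {
        "mean": mean,
        "std_dev": std,
        "min": min(present),
        "max": max(present),
        "pct_complete": 100.0 * len(present) / len(values),
        # Assumed thresholds: beyond 3 SD is an outlier, beyond 5 SD an extreme.
        "outliers": sum(outlier_sd < d <= extreme_sd for d in distances),
        "extremes": sum(d > extreme_sd for d in distances),
    }

# Made-up values for illustration only, not taken from the seeds file.
sample = [5.0, 5.2, 4.9, 5.1, None, 5.0, 5.3, 4.8]
report = audit_column(sample)
print(report)
```

A column that is less than 100% complete, or one that reports outliers or extremes, is a candidate for further preparation (imputation, removal or capping) before clustering.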

(e) Data Preparation: From the scenario and data given, explain why the attribute A3 (compactness) is probably not useful for cluster analysis. Prepare the data (for mining) by filtering out this field using IBM SPSS Modeler.
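One candidate explanation, to be checked against the scenario, is redundancy. The sketch below assumes the file follows the published seeds dataset, in which compactness is derived from the area A and perimeter P as C = 4*pi*A / P^2. The mapping of A1 to area and A2 to perimeter, the helper names and the measurements are assumptions made for illustration. Under that assumption A3 carries no information that A1 and A2 do not already provide.

```python
import math

def compactness(area, perimeter):
    """Compactness C = 4*pi*A / P**2; equals 1.0 for a perfect circle."""
    return 4.0 * math.pi * area / perimeter ** 2

def drop_field(rows, field):
    """Return copies of the rows without the named field."""
    return [{k: v for k, v in row.items() if k != field} for row in rows]

# Made-up measurements for illustration only, not taken from the seeds file.
# Assumption: A1 is area and A2 is perimeter.
rows = [
    {"A1": 15.0, "A2": 14.8},
    {"A1": 12.0, "A2": 13.4},
]
for row in rows:
    # A3 is fully determined by A1 and A2, so it adds no new information.
    row["A3"] = compactness(row["A1"], row["A2"])

filtered = drop_field(rows, "A3")
print(filtered)
```

In IBM SPSS Modeler the equivalent of `drop_field` is a Filter node placed before the modelling node.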

(f) Executing Clustering Technique: Decide on the number of clusters (i.e., K) and then execute K-Means on the filtered dataset. Assess the appropriateness of applying K-Means on this dataset. Interpret the clustering results.
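To make the mechanics of K-Means concrete, here is a minimal sketch of the standard assign-and-update loop (Lloyd's algorithm) in plain Python. It is not the implementation used by the K-Means node, whose initialisation and defaults may differ. The function names, the optional `init` argument and the six data points are hypothetical and chosen only so the result is easy to verify by hand.

```python
import math
import random

def k_means(points, k, init=None, max_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples.

    Returns (labels, centroids). `init` optionally fixes the starting centroids.
    """
    if init is None:
        init = random.Random(seed).sample(points, k)
    centroids = [tuple(c) for c in init]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centre
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids

def wcss(points, labels, centroids):
    """Within-cluster sum of squared distances; lower means tighter clusters."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Made-up points forming two obvious groups, for illustration only.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = k_means(points, k=2, init=[points[0], points[-1]])
print(labels)  # -> [0, 0, 0, 1, 1, 1]
print(wcss(points, labels, centroids))
```

Comparing `wcss` across several values of K is one common way to support the choice of K. The loop also shows why K-Means suits numeric attributes and roughly compact clusters, since both steps rely on Euclidean distance and means.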

(g) Interpreting Clustering Results: Use the Graphboard node to generate a scatter plot based on attributes A4 and A5. The plot should show each data point labelled or coloured based on the cluster number assigned by K-Means. Evaluate the clustering results using this plot (and you may also use the project information given in the Background section of this assignment).

(h) Data Preparation: Having read your preliminary analysis, a colleague gave the following comment: "the dataset should have been normalised before the clustering process." Evaluate the clustering solutions with and without normalisation and then discuss whether normalisation is necessary in this case.
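The colleague's point rests on the fact that Euclidean distance is dominated by whichever attribute has the largest numeric range. The sketch below shows two common rescaling options, z-score standardisation and min-max scaling, in plain Python. The function names and the two columns are made up for illustration; which option (if any) the modelling tool applies internally should be checked in its documentation.

```python
import statistics

def z_score(column):
    """Rescale to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(column)
    std = statistics.stdev(column)
    return [(v - mean) / std for v in column]

def min_max(column):
    """Rescale linearly so the smallest value is 0 and the largest is 1."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

# Made-up columns on very different scales, for illustration only.
wide = [10.0, 20.0, 30.0]
narrow = [0.1, 0.2, 0.3]

# After rescaling, both columns contribute comparably to a distance.
print(min_max(wide), min_max(narrow))
print(z_score(wide), z_score(narrow))
```

Running K-Means once on the raw attributes and once on the rescaled ones, then comparing cluster membership, gives direct evidence for or against the colleague's comment.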




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd