Cluster analysis project, Database Management System

Assignment Help:

(a) Data Mining Process: In the context of this cluster analysis project, and in your own words, explain how you would execute the first stage of data mining, namely the "Pre-modelling" stage. Be sure to differentiate the sub-tasks in this stage.

(b) Pre-modelling: Describe the potential business problem and data mining problem in the context of this project. Be sure to differentiate these two problems in your description.

(c) Data Preparation: Use the "seeds_dataset_twoClass.csv" file to prepare the dataset for cluster analysis. You can use the following table format to justify the data type (i.e., measurement) and direction (i.e., role) used for each attribute.

 

Attribute | Data Type (or Measurement) | Direction (or Role): Input, Target or None | Justification

(d) Data Exploration: Analyse the dataset "seeds_dataset_twoClass.csv" using the following summary statistics in the Data Audit node. Discuss the use of these summary statistics for deciding if further data preparation is required.

a. Mean and Standard Deviation (Std. Dev), Min and Max

b. % Complete and Valid Records

c. Outliers and Extremes
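
The three groups of statistics above can also be computed by hand to see what each one reveals. The following is a minimal sketch in plain Python, not the Data Audit node itself. The function name `audit` is hypothetical, and the thresholds of 3 and 5 standard deviations from the mean for outliers and extremes are an assumption about the node's default settings, so check them against your own stream. The sample values are made up and are not taken from seeds_dataset_twoClass.csv.

```python
import math

def audit(values, outlier_sd=3.0, extreme_sd=5.0):
    """Summarise one numeric field; None marks a missing record."""
    valid = [v for v in values if v is not None]
    n = len(valid)
    mean = sum(valid) / n
    # Sample standard deviation (n - 1 in the denominator).
    std = math.sqrt(sum((v - mean) ** 2 for v in valid) / (n - 1))
    outliers = extremes = 0
    for v in valid:
        distance = abs(v - mean) / std if std > 0 else 0.0
        if distance > extreme_sd:      # assumed threshold for "extremes"
            extremes += 1
        elif distance > outlier_sd:    # assumed threshold for "outliers"
            outliers += 1
    return {
        "mean": mean, "std_dev": std, "min": min(valid), "max": max(valid),
        "pct_complete": 100.0 * n / len(values),
        "valid_records": n, "outliers": outliers, "extremes": extremes,
    }

# Made-up illustrative values, not taken from the assignment dataset.
sample = [10.0, 12.0, None, 11.0, 13.0, 14.0]
report = audit(sample)
```

A field with a low `pct_complete` suggests that missing values need handling. A wide gap between `min` and `max` across fields suggests a scaling issue, and non-zero outlier or extreme counts suggest records that could distort the cluster centres.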

(e) Data Preparation: From the scenario and data given, explain why the attribute A3 (compactness) is probably not useful for cluster analysis. Prepare the data (for mining) by filtering out this field using IBM SPSS Modeler.
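
One plausible line of argument, offered here as an assumption rather than the expected answer, is that compactness is derived from two other attributes. If A1 is the kernel area and A2 is the perimeter (assumed column meanings; verify them against the scenario), and compactness is defined as 4*pi*area / perimeter^2, then A3 adds no independent information. The sketch below uses made-up rows, and `drop_field` is a hypothetical stand-in for what a Filter node does.

```python
import math

def compactness(area, perimeter):
    """Compactness of a shape: 4*pi*area / perimeter**2 (1.0 for a circle)."""
    return 4 * math.pi * area / perimeter ** 2

def drop_field(rows, name):
    """Return copies of the rows without the named field."""
    return [{k: v for k, v in row.items() if k != name} for row in rows]

# Made-up rows; A1 = area and A2 = perimeter are assumed column meanings.
rows = [
    {"A1": 16.0, "A2": 15.0, "A4": 5.8, "A5": 3.4},
    {"A1": 12.0, "A2": 13.5, "A4": 5.2, "A5": 2.9},
]
for row in rows:
    # A3 is fully determined by A1 and A2 under the assumed definition.
    row["A3"] = compactness(row["A1"], row["A2"])

filtered = drop_field(rows, "A3")
```

To test the assumption on the real file, recompute the ratio from A1 and A2 and compare it with the stored A3 values. A close match would support filtering the field out.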

(f) Executing Clustering Technique: Decide on the number of clusters (i.e., K) and then execute K-Means on the filtered dataset. Assess the appropriateness of applying K-Means on this dataset. Interpret the clustering results.
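
To make the mechanics of K-Means concrete, here is a minimal sketch of Lloyd's algorithm in plain Python. It is not the SPSS Modeler K-Means node, whose initialisation and stopping rules may differ. Using the first K points as starting centres is a simplification chosen so the result is repeatable, K = 2 is an assumption suggested only by the "twoClass" file name, and the points are made up.

```python
import math

def k_means(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    # Deterministic start for this sketch: the first k points are the seeds.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are stable
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up two-dimensional points forming two obvious groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.2)]
labels, centroids = k_means(points, k=2)
```

Because every step relies on Euclidean distance to a mean, K-Means suits numeric fields and roughly compact, similarly sized groups. That is the main point to weigh when assessing whether it is appropriate for this dataset.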

(g) Interpreting Clustering Results: Use the Graphboard node to generate a scatter plot based on attributes A4 and A5. The plot should show each data point labelled or coloured based on the cluster number assigned by K-Means. Evaluate the clustering results using this plot (and you may also use the project information given in the Background section of this assignment).

(h) Data Preparation: Having read your preliminary analysis, a colleague gave the following comment: "the dataset should have been normalised before the clustering process." Evaluate the clustering solutions with and without normalisation and then discuss whether normalisation is necessary in this case.
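
The colleague's comment rests on the fact that Euclidean distance is dominated by whichever field has the widest range. The sketch below illustrates this with min-max scaling, which is one common normalisation and not necessarily the one your stream should use. The function name `min_max_scale` is hypothetical and the rows are made up.

```python
import math

def min_max_scale(rows):
    """Rescale every column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        tuple((v - lo) / span if span > 0 else 0.0
              for v, lo, span in zip(row, lows, spans))
        for row in rows
    ]

# Made-up rows: the first column has a much wider range than the second.
raw = [(10.0, 0.80), (20.0, 0.82), (11.0, 0.90)]
scaled = min_max_scale(raw)

# Distance from row 0 to rows 1 and 2, before and after scaling.
before = (math.dist(raw[0], raw[1]), math.dist(raw[0], raw[2]))
after = (math.dist(scaled[0], scaled[1]), math.dist(scaled[0], scaled[2]))
```

Before scaling, row 0 looks far closer to row 2 than to row 1 because only the first column matters. After scaling, the two distances are nearly equal. Whether normalisation changes the clusters in this assignment depends on how different the field ranges are, which the Min and Max statistics from part (d) will show.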




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd