Cluster analysis project, Database Management System

Assignment Help:

(a) Data Mining Process: In the context of this cluster analysis project, and in your own words, explain how you would execute the first stage of data mining, namely the "Pre-modelling" stage. Be sure to differentiate the sub-tasks in this stage.

(b) Pre-modelling: Describe the potential business problem and data mining problem in the context of this project. Be sure to differentiate these two problems in your description.

(c) Data Preparation: Use the "seeds_dataset_twoClass.csv" file to prepare the dataset for cluster analysis. You can use the following table format to justify the data type (i.e., measurement) and direction (i.e., role) used for each attribute.

 

Attribute | Data Type (or Measurement) | Direction (or Role): Input, Target or None | Justification

(d) Data Exploration: Analyse the dataset "seeds_dataset_twoClass.csv" using the following summary statistics in the Data Audit node. Discuss the use of these summary statistics for deciding if further data preparation is required.

a. Mean and Standard Deviation (Std. Dev), Min and Max

b. % Complete and Valid Records

c. Outliers and Extremes
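The summary statistics listed above can also be reproduced outside the Data Audit node as a cross-check. The following is a minimal Python sketch, not the SPSS Modeler implementation. The function name `audit`, the sample column, and the thresholds (outliers beyond 3 standard deviations from the mean, extremes beyond 5) are assumptions chosen for illustration; check the thresholds configured in your own Data Audit node.

```python
from statistics import mean, stdev

def audit(values, outlier_sd=3.0, extreme_sd=5.0):
    """Summarise one numeric column; None marks a missing value.

    The 3 and 5 standard deviation thresholds are assumed defaults.
    """
    present = [v for v in values if v is not None]
    m = mean(present)
    sd = stdev(present)  # sample standard deviation
    # Outliers lie beyond outlier_sd but within extreme_sd; extremes lie beyond extreme_sd.
    outliers = [v for v in present if outlier_sd * sd < abs(v - m) <= extreme_sd * sd]
    extremes = [v for v in present if abs(v - m) > extreme_sd * sd]
    return {
        "mean": m,
        "std_dev": sd,
        "min": min(present),
        "max": max(present),
        "pct_complete": 100.0 * len(present) / len(values),
        "outliers": len(outliers),
        "extremes": len(extremes),
    }

# Hypothetical column with one missing value (not taken from the assignment data).
column = [5.0, 5.2, 4.8, None, 5.1]
summary = audit(column)
print(summary)
```

A completeness figure below 100% suggests imputing or discarding records, and a non-zero outlier or extreme count suggests inspecting those records before clustering, since K-Means centroids are sensitive to them.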

(e) Data Preparation: From the scenario and data given, explain why the attribute A3 (compactness) is probably not useful for cluster analysis. Prepare the data (for mining) by filtering out this field using IBM SPSS Modeler.

(f) Executing Clustering Technique: Decide on the number of clusters (i.e., K) and then execute K-Means on the filtered dataset. Assess the appropriateness of applying K-Means on this dataset. Interpret the clustering results.
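To make the K-Means step in (f) concrete, here is a minimal Python sketch of the standard assign-and-update loop (Lloyd's algorithm). It is an illustration only, not the SPSS Modeler K-Means node; the function name `kmeans`, the seeded random initialisation, and the sample points are assumptions and do not come from "seeds_dataset_twoClass.csv".

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster a list of numeric tuples into k groups.

    Returns (labels, centroids). Initial centroids are k points drawn
    at random with a fixed seed so that runs are repeatable.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical two-attribute records forming two well-separated groups.
sample = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(sample, k=2)
print(labels, centroids)
```

Because the algorithm relies on Euclidean distance and cluster means, it suits numeric attributes forming roughly compact groups, which is one angle for assessing its appropriateness on this dataset.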

(g) Interpreting Clustering Results: Use the Graphboard node to generate a scatter plot based on attributes A4 and A5. The plot should show each data point labelled or coloured based on the cluster number assigned by K-Means. Evaluate the clustering results using this plot (and you may also use the project information given in the Background section of this assignment).

(h) Data Preparation: Having read your preliminary analysis, a colleague gave the following comment: "the dataset should have been normalised before the clustering process." Evaluate the clustering solutions with and without normalisation and then discuss whether normalisation is necessary in this case.
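One way to test the colleague's comment in (h) is to rescale every attribute to a common scale and then rerun the clustering. The sketch below uses z-score standardisation, which is only one of several possible normalisation methods and is not necessarily the one SPSS Modeler applies; the function name `z_score_columns` and the sample rows are hypothetical.

```python
from statistics import mean, stdev

def z_score_columns(rows):
    """Rescale each column to mean 0 and sample standard deviation 1.

    A constant column (standard deviation 0) is mapped to 0.0.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    return [
        tuple((v - m) / sd if sd > 0 else 0.0 for v, m, sd in zip(row, means, sds))
        for row in rows
    ]

# Hypothetical records: the first attribute has a far larger range than the second.
rows = [(10.0, 0.1), (20.0, 0.3), (30.0, 0.2)]
scaled = z_score_columns(rows)
print(scaled)
```

Without rescaling, the attribute with the larger range dominates the Euclidean distances that K-Means uses. If the attributes already share similar ranges, the clustering solutions with and without normalisation are likely to be close, and that comparison is what the discussion in (h) should rest on.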




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd