Discussing data preprocessing steps

Reference no: EM132120705

The Groceries Dataset

Imagine 10,000 receipts sitting on your table. Each receipt represents a transaction and lists the items that were purchased, that is, what went into a customer's basket. That is exactly what the Groceries data set contains: a collection of receipts, with each line representing one receipt and the items purchased. Each line is called a transaction, and each column in a row represents an item.
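
The layout described above can be sketched as follows. This is an illustration in Python only (the assignment itself asks for R); the sample lines and item names are made up, and `parse_transactions` is a hypothetical helper, not part of any library.

```python
from collections import Counter

# Hypothetical sample: one receipt per line, items separated by commas.
SAMPLE = """citrus fruit,semi-finished bread,margarine
tropical fruit,yogurt,coffee
whole milk
yogurt,whole milk,coffee
"""

def parse_transactions(text):
    """Turn each non-empty line into a set of item names (one set per receipt)."""
    transactions = []
    for line in text.splitlines():
        items = {item.strip() for item in line.split(",") if item.strip()}
        if items:
            transactions.append(items)
    return transactions

transactions = parse_transactions(SAMPLE)

# Count in how many receipts each item appears.
item_counts = Counter(item for basket in transactions for item in basket)
print(len(transactions), item_counts.most_common(3))
```

Rows have different lengths because baskets differ in size, which is why each receipt is kept as a set rather than forced into a fixed number of columns.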

Task 1: Data Pre-processing

Read the data into R. There are many ways to read CSV tables into R. For more details, please refer to data import/export in R.

For the clustering experiments, the column for class labels needs to be removed. Refer to lecture Module 10 to see how to do so.
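
A minimal sketch of removing a class label column, written in Python for illustration rather than R. The column names (`class`, `cap_shape`, `odor`) and the sample rows are assumptions, and `drop_column` is a hypothetical helper; the method taught in Module 10 may differ.

```python
import csv
import io

# Hypothetical sample with a "class" column; real column names may differ.
SAMPLE_CSV = """class,cap_shape,odor
edible,convex,almond
poisonous,flat,foul
edible,bell,anise
"""

def drop_column(rows, name):
    """Return copies of the row dicts without the given column."""
    return [{k: v for k, v in row.items() if k != name} for row in rows]

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
labels = [row["class"] for row in rows]   # set the labels aside
features = drop_column(rows, "class")     # what a clustering algorithm would see
print(features[0], labels)
```

Setting the labels aside instead of discarding them is a design choice made here, not something the assignment requires; it allows the clusters to be compared against the true classes afterwards.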

Check whether any other pre-processing would benefit the analysis, for example replacing missing values, normalizing attribute ranges, or converting numerical or string values to nominal values.
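
Two of the steps mentioned above, replacing missing values and attribute range normalization, can be sketched as follows. This is a Python illustration under stated assumptions: missing values are represented as `None`, mean imputation is just one possible replacement strategy, and both function names are hypothetical.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:            # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

column = [2.0, None, 6.0, 10.0]     # hypothetical attribute with one missing value
filled = impute_mean(column)        # [2.0, 6.0, 6.0, 10.0]
scaled = min_max_scale(filled)      # [0.0, 0.5, 0.5, 1.0]
print(filled, scaled)
```

Normalization matters mostly for distance-based methods such as k-means, where an attribute with a large numeric range would otherwise dominate the distance.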

Task 2: Data Mining

- Association Rule Mining experiments: Use R to explore association rules on the Groceries dataset. Try out different algorithms. Visualize the results you find. Report any interesting association rules discovered in the experiments and explain why they are interesting.
- Classification experiments: Use R to construct classifiers on the Mushroom or Ionosphere dataset. Randomly split the data set into training and test sets (80% vs. 20%). Select at least one classifier from each of two of the following categories of classifiers: tree-based models, Bayes classifiers, and rule-based classifiers. Compare the results of the chosen classifiers.
- Clustering experiments: Use R to explore clusters on the Mushroom or Ionosphere dataset. Select and compare two clustering algorithms from R (e.g. k-means vs. density-based). Use R to visually explore the resulting clusters.
- For all of the above experiments, try different parameter settings to fine-tune the outcome. In principle, select methods that work well on the given data set.
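
To judge whether a discovered association rule is interesting, the usual measures are support, confidence, and lift. The sketch below computes them by hand in Python for illustration; the baskets are made up, the function names are hypothetical, and an actual experiment in R would rely on an association rule mining package rather than this brute-force counting.

```python
# Hypothetical baskets; each set is one transaction.
transactions = [
    {"milk", "bread"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(lhs, rhs, transactions):
    """Support, confidence, and lift of the rule lhs -> rhs.

    Assumes lhs and rhs each occur in at least one transaction.
    """
    s_both = support(set(lhs) | set(rhs), transactions)
    s_lhs = support(lhs, transactions)
    s_rhs = support(rhs, transactions)
    confidence = s_both / s_lhs      # P(rhs | lhs)
    lift = confidence / s_rhs        # above 1: lhs and rhs co-occur more than by chance
    return s_both, confidence, lift

print(rule_metrics({"bread"}, {"milk"}, transactions))
```

A lift close to 1 suggests the two sides of the rule are roughly independent, so a rule with high confidence but lift near or below 1 is usually not worth reporting.
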

Task 3: Prepare a report

Your report should contain the following:

- Theoretical Discussion: Limited to two pages, discussing the data preprocessing steps, the motivation for selecting a particular method, and how the parameters were chosen.
- Results: Include results and screenshots of the above experiments.
- Discussion and error analysis: Interpret the results of your model. Discuss intuitions or hypotheses that can be obtained by visual inspection of the resulting classes or clusters. State any assumptions, and discuss issues that might have affected the model's performance.
- References: If you use information from sources other than the R manual and official website, you should cite them.


Verified Expert

The solution file is prepared in MS Word, with the work done in R. It applies association rule mining, clustering methods, and classification algorithms to the mushroom data set. The classifiers used to analyze the mushroom data set are tree-based models and rule-based classifiers. The report covers data pre-processing, the classifier and clustering algorithms applied, and error analysis.



This project is worth 20% of the total assessment of this unit and is due in week 12. The goal of this project is to apply association rule mining, classification, and clustering methods to the Mushroom or Ionosphere and Groceries data sets. For detailed information about the Mushroom or Ionosphere data set, refer to the

Submission Instructions

This section is intended for submission instructions in learning systems.

Grading

Report Section                                   Max. points
Theoretical discussion and data pre-processing   5%
Results                                          10%
Error analysis & references                      5%
Total                                            20%


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd