Calculate the correlations between er and pgr

Reference no: EM131032614

Part 1 - Description, Visualisation and Pre-processing [R Only]

a) Explore the data

i. Use as many functions/techniques in R as necessary to adequately describe and visualise the data. Provide a table covering every attribute of the dataset, including measures of central tendency (mean, median, etc.), measures of dispersion, and the number of missing values for each attribute. Use the table to comment on the data.

ii. Produce histograms for each attribute. Explain how you created the histograms and comment on the distribution of the data. Also use the descriptive statistics you produced above to help you characterise the shape of each distribution.
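The summary table asked for in a) i can be sketched as follows. This is only an illustration in Python (the assignment itself requires R); the toy values and the use of None to mark a missing value are assumptions, and only the attribute names er and pgr come from the brief.

```python
import statistics

def describe(column):
    """Summarise one numeric attribute; None marks a missing value (assumption)."""
    present = [v for v in column if v is not None]
    return {
        "n": len(column),
        "missing": len(column) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "sd": statistics.stdev(present),  # sample standard deviation
        "min": min(present),
        "max": max(present),
    }

# Hypothetical toy data, not taken from the assignment's dataset.
data = {
    "er": [10.0, 12.0, None, 14.0, 20.0],
    "pgr": [1.0, 2.0, 3.0, None, None],
}

summary = {name: describe(values) for name, values in data.items()}
for name, stats in summary.items():
    print(name, stats)
```

A mean noticeably above the median, as for er in this toy data, is one hint of a right-skewed distribution that the histogram can then confirm.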

b) Explore the relationships between the attributes, and between the class and the attributes

i. Calculate the correlations between er and pgr, b1 and b2, and p1 and p2 (three correlations). What do these tell you about the relationships between these variables?

ii. Produce scatterplots between the class variable and each of the er, pgr and h1 variables (note: you may have to recode the class variable as numeric to produce scatterplots). What do these tell you about the relationships between these three variables and the class?
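For b) i and ii, a minimal sketch of a Pearson correlation and of recoding a class label as numeric is given below. It is written in Python for illustration only (the work itself is to be done in R). The toy values, the None marker for missing values, and the choice to skip pairs with a missing value are all assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)

def recode_class(labels):
    """Map each distinct class label to an integer code, in sorted label order."""
    codes = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [codes[label] for label in labels]

# Hypothetical toy values: pgr is exactly twice er wherever er is present.
er = [1.0, 2.0, 3.0, None, 5.0]
pgr = [2.0, 4.0, 6.0, 8.0, 10.0]
r = pearson(er, pgr)
print(round(r, 6))

class_codes = recode_class(["B", "A", "C", "A"])
print(class_codes)
```

The integer codes are arbitrary positions on a plot axis; for a nominal class their order and spacing carry no meaning, so a correlation computed against them should be read with caution.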

c) General Conclusions

Take into consideration all the descriptive statistics, the visualisations and the correlations you produced, together with the missing values, and comment on the importance of the attributes. Which of the attributes seem to hold significant information, and which can you regard as insignificant? Provide an explanation for your choice.

d) Dealing with missing values in R

i. Write a script in R to find missing values and replace them using three strategies: replace missing values with 0, with the mean, and with the median.

ii. Compare and contrast these approaches
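The three replacement strategies in d) i can be sketched as below. This is an illustrative Python version (the assignment requires an R script); the strategy names "zero", "mean" and "median", the None marker for missing values, and the toy column are assumptions.

```python
import statistics

def impute(column, strategy):
    """Replace None with 0, or with the mean or median of the present values."""
    present = [v for v in column if v is not None]
    if strategy == "zero":
        fill = 0.0
    elif strategy == "mean":
        fill = statistics.mean(present)
    elif strategy == "median":
        fill = statistics.median(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in column]

# Hypothetical column with one large value, so mean and median differ.
column = [1.0, None, 2.0, 9.0, None]
imputed = {s: impute(column, s) for s in ("zero", "mean", "median")}
for strategy, values in imputed.items():
    print(strategy, values)
```

Comparing the three outputs already suggests points for d) ii: filling with 0 pulls the column mean towards zero, filling with the mean keeps the mean but shrinks the spread, and the median is less affected by extreme values such as the 9.0 here.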

f) Attribute transformation

Explore the use of three transformation techniques (mean centering, normalisation and standardisation) to scale the attributes, and compare their various effects.
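A minimal sketch of the three scalings follows, in Python for illustration only (the assignment asks for R). It assumes that "normalisation" means min-max scaling to the range [0, 1] and that "standardisation" means z-scores using the sample standard deviation; the brief does not define either term, and the toy values are hypothetical.

```python
import statistics

def mean_centre(xs):
    """Subtract the mean so the result has mean 0; the spread is unchanged."""
    m = statistics.mean(xs)
    return [x - m for x in xs]

def normalise(xs):
    """Min-max scaling to [0, 1] (assumed meaning of 'normalisation')."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]  # fails if all values are equal

def standardise(xs):
    """Z-scores: mean 0 and sample standard deviation 1."""
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return [(x - m) / s for x in xs]

values = [2.0, 4.0, 6.0, 8.0]  # hypothetical attribute values
centred = mean_centre(values)
scaled = normalise(values)
zscores = standardise(values)
print(centred)
print(scaled)
print(zscores)
```

Mean centering only shifts the values, min-max scaling fixes the range but is sensitive to outliers, and standardisation fixes the spread, which matters for distance-based methods such as k-means.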

g) Attribute / instance selection

i. Starting again from the raw data, consider attribute and instance deletion strategies to deal with missing values. Choose a number of missing values per instance or per attribute and delete instances/attributes accordingly. Explain your choice.

ii. Consider using correlations between attributes to reduce the number of attributes. Try to reduce the dataset to contain only uncorrelated attributes.

iii. Use principal component analysis in R to create a data set with ten attributes.
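Steps g) i and g) ii can be sketched as below, in Python for illustration only (the assignment requires R). The limits on missing values, the 0.9 correlation threshold, the greedy keep-the-first rule and the toy data are all assumptions; the brief leaves these choices to you, and strictly uncorrelated attributes (r exactly 0) are rarely achievable in practice.

```python
import math

def drop_sparse(rows, max_missing_per_attr, max_missing_per_row):
    """Drop attributes, then instances, that exceed the missing-value limits."""
    names = list(rows[0])
    kept = [a for a in names
            if sum(r[a] is None for r in rows) <= max_missing_per_attr]
    reduced = [{a: r[a] for a in kept} for r in rows]
    return [r for r in reduced
            if sum(v is None for v in r.values()) <= max_missing_per_row]

def pearson(xs, ys):
    """Pearson correlation for two complete columns of equal length."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def drop_correlated(columns, threshold):
    """Greedily keep an attribute only if |r| with every kept one is below threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical instances; None marks a missing value.
rows = [
    {"er": 1.0, "pgr": None, "h1": 3.0},
    {"er": 2.0, "pgr": None, "h1": None},
    {"er": 3.0, "pgr": 5.0, "h1": 1.0},
    {"er": None, "pgr": None, "h1": 2.0},
]
cleaned = drop_sparse(rows, max_missing_per_attr=2, max_missing_per_row=0)
print(cleaned)

# Hypothetical complete columns: b2 is almost a multiple of b1.
columns = {
    "b1": [1.0, 2.0, 3.0, 4.0],
    "b2": [2.0, 4.0, 6.0, 8.5],
    "p1": [4.0, 1.0, 3.0, 2.0],
}
selected = drop_correlated(columns, threshold=0.9)
print(selected)
```

Dropping attributes before instances is itself a design choice: removing a mostly empty attribute first can save instances that would otherwise be deleted because of it.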

As a result, you will end up with several different data sets to be used in Parts 2 and 3. Give each data set a clear and distinct name so that you can easily refer to it again in the later stages.

Part 2 - Clustering [R Only]

Using R (only), explore the use of clustering to find natural groupings in the data, without using the class variable - i.e. use only the 20 numeric (input) attributes to perform the clustering. Once the data is clustered, you may use the class variable to evaluate or interpret the results (how do the new clusters compare to the original classes?).

a) Use hierarchical clustering, k-means and PAM to partition the data into seven clusters and report the results. Which algorithm produces the best results when compared to the class attribute?

b) As each of these algorithms has adjustable parameters, you may explore the 'optimisation' or 'tuning' of these parameters, either manually or (preferably) automatically. Which parameters produce the best results for each clustering algorithm? Explain the reasoning behind the techniques you used to find the optimal parameters.

c) Choose one of the clustering algorithms above and apply it to the alternative data sets that you produced in Part 1:

i. The reduced data set featuring only the first 10 Principal Components.

ii. The dataset after deletion of instances and attributes.

iii. The three datasets after you replaced missing values with the three techniques.

iv. Which of these datasets had a positive impact on the quality of the clustering? Provide explanations using the results for each clustering of the alternative data set.
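One way to compare a clustering with the original classes, as Part 2 asks, is a cluster-by-class contingency table together with a purity score. The sketch below is in Python for illustration only (the clustering itself must be done in R); purity is an assumed choice of measure, not one the brief prescribes, and the cluster assignments and labels are hypothetical.

```python
from collections import Counter

def contingency(clusters, classes):
    """Count how many instances of each class fall into each cluster."""
    table = {}
    for cluster, label in zip(clusters, classes):
        table.setdefault(cluster, Counter())[label] += 1
    return table

def purity(clusters, classes):
    """Fraction of instances that belong to the majority class of their cluster."""
    table = contingency(clusters, classes)
    majority_total = sum(max(counts.values()) for counts in table.values())
    return majority_total / len(classes)

# Hypothetical cluster assignments and true class labels for eight instances.
clusters = [1, 1, 1, 2, 2, 3, 3, 3]
classes = ["a", "a", "b", "b", "b", "c", "c", "a"]

table = contingency(clusters, classes)
score = purity(clusters, classes)
print(table)
print(score)
```

Purity tends to rise as the number of clusters grows, so it is most meaningful when every result being compared uses the same number of clusters, as with the seven clusters required here.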

Part 3 - Classification [Weka and R]

You must use Weka to perform the classification, but you may choose to use R to present results. Use Weka to explore the use of various classification techniques to create models that predict the given class from the input attributes. Split the data (randomly) into a training set (2/3 of the data) and a test set (1/3 of the data).
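The random 2/3 to 1/3 split can be sketched as follows. This Python version only illustrates the idea (the classification itself must be run in Weka); the function name, the fixed seed and the toy instances are assumptions.

```python
import random

def train_test_split(rows, train_fraction=2 / 3, seed=42):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(30))  # stand-in for 30 instances
train, test = train_test_split(rows)
print(len(train), len(test))
```

Fixing the seed means every algorithm and every alternative data set is evaluated on the same partition, which keeps the later comparisons fair.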

a) Try the following classification algorithms: ZeroR, OneR, NaïveBayes, IBk (k-NN) and J48 (C4.5). Which algorithm produces the best results?

b) Choose one classification algorithm of the above and explore various parameter settings for each of the different splits of data. Which parameters improve the predictive ability of the algorithm?

c) Choose one of the classification algorithms above and use the data sets you created in Part 1:

i. The reduced data set featuring only the first 10 Principal Components.

ii. The dataset after deletion of instances and attributes.

iii. The three datasets after you replaced missing values with the three techniques.

iv. Which of the datasets had a positive impact on the predictive ability of the algorithm? Provide explanations using the classification results for each alternative data set.
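When comparing algorithms and data sets in this part, ZeroR gives a useful baseline: it ignores the input attributes and always predicts the most frequent class in the training set. The sketch below illustrates that idea in Python (the actual runs must be done in Weka); the function names and the toy labels are hypothetical.

```python
from collections import Counter

def zero_r_fit(train_labels):
    """Return the most frequent class label in the training set."""
    return Counter(train_labels).most_common(1)[0][0]

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical class labels for a training set and a test set.
train_labels = ["x", "x", "y", "x", "z", "y"]
test_labels = ["x", "y", "x", "x"]

majority = zero_r_fit(train_labels)
baseline = accuracy([majority] * len(test_labels), test_labels)
print(majority, baseline)
```

A classifier or data set variant is only worth discussing as an improvement if its test accuracy clearly exceeds this baseline.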
