Draw scatterplot with country group of each point indicated

Reference no: EM13880746

Assignment: Multivariate Data Analysis

Part A

Refer to the data in FoodConsumptionNutrients en.xls. It has information for about 175 countries. Choose 30 or so countries that interest you to work on. Be sure that you use countries from at least three different country groups from different regions (see the sheet CountryGroupComposition to get some ideas for groupings that you might use). Collect the information on energy consumption, fat consumption and protein consumption for your chosen countries onto a single sheet. Create a variable for the country group.

1. Choose two of the three original variables. Draw a scatterplot with the country group of each point indicated. Comment.

2. Generate classification rules using

• Linear discriminant analysis

• Quadratic discriminant analysis

• Multinomial logistic regression

• Classification trees
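
As an illustration of the first method in the list above, here is a minimal sketch of what a linear discriminant rule computes for two predictors: group means, a pooled within-group covariance matrix, and a linear score per group. It is written in plain Python only to expose the arithmetic (the assignment itself asks for R code). The function names and the data are assumptions: the values are made-up placeholders, not figures from the spreadsheet, and priors are taken as the group proportions in the sample.

```python
from collections import defaultdict
from math import log

def fit_lda(points, labels):
    """Estimate group means, priors and the inverse pooled covariance (2 variables)."""
    groups = defaultdict(list)
    for p, g in zip(points, labels):
        groups[g].append(p)
    n, k = len(points), len(groups)
    means, priors = {}, {}
    sxx = sxy = syy = 0.0
    for g, pts in groups.items():
        mx = sum(p[0] for p in pts) / len(pts)
        my = sum(p[1] for p in pts) / len(pts)
        means[g] = (mx, my)
        priors[g] = len(pts) / n  # assumption: priors = sample proportions
        for x, y in pts:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    # Pooled within-group covariance uses the divisor n - k.
    sxx, sxy, syy = sxx / (n - k), sxy / (n - k), syy / (n - k)
    det = sxx * syy - sxy ** 2
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return means, priors, inv

def lda_score(point, mean, prior, inv):
    """Linear score: x' S^-1 m - 0.5 * m' S^-1 m + log(prior)."""
    ax = inv[0][0] * mean[0] + inv[0][1] * mean[1]
    ay = inv[1][0] * mean[0] + inv[1][1] * mean[1]
    return (point[0] * ax + point[1] * ay
            - 0.5 * (mean[0] * ax + mean[1] * ay) + log(prior))

def predict_lda(point, model):
    """Allocate the point to the group with the largest linear score."""
    means, priors, inv = model
    return max(means, key=lambda g: lda_score(point, means[g], priors[g], inv))

# Hypothetical placeholder data: (energy in kcal/day, protein in g/day).
points = [(2100, 58), (2250, 54), (2050, 56), (2300, 61),
          (2800, 84), (2950, 79), (2750, 82), (3000, 86),
          (3400, 104), (3550, 109), (3350, 107), (3600, 103)]
labels = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

model = fit_lda(points, labels)
predicted = [predict_lda(p, model) for p in points]
print(list(zip(labels, predicted)))
```

Quadratic discriminant analysis differs in that each group keeps its own covariance matrix instead of the pooled one, which makes the boundaries between groups curved rather than straight.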

3. Using the confusion matrix and the apparent error rate, compare the effectiveness of each of the classification rules.
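
The two summaries named in step 3 can be sketched as follows. This is a plain-Python illustration of the definitions, not a prescribed implementation; the function names and the `actual`/`predicted` labels are hypothetical placeholders rather than results from the data.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each actual group is allocated to each predicted group."""
    return Counter(zip(actual, predicted))

def apparent_error_rate(actual, predicted):
    """Proportion of the training observations that the rule misclassifies."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)

# Hypothetical labels for ten observations in three groups.
actual    = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"]
predicted = ["A", "A", "B", "B", "B", "B", "C", "B", "C", "C"]

cm = confusion_matrix(actual, predicted)
aper = apparent_error_rate(actual, predicted)

# Print the matrix with actual groups as rows and predicted groups as columns.
groups = sorted(set(actual) | set(predicted))
print("actual\\predicted", *groups)
for a in groups:
    print(a, *[cm[(a, p)] for p in groups])
print("apparent error rate:", aper)
```

Because the apparent error rate is computed on the same observations used to build the rule, it tends to understate the error on new data, which is worth keeping in mind when comparing flexible rules such as trees with simpler ones.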

4. Assume that you did not know which countries were in which groups. Use the following methods to group the observations.

• One hierarchical implementation of cluster analysis

• K-means cluster analysis

• Multidimensional scaling

Do any of these correctly divide all the observations into the original groups?
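
One way to check the question above for k-means is sketched below: run the algorithm, then cross-tabulate cluster labels against the original groups. Cluster numbers are arbitrary, so the comparison looks for a one-to-one match rather than equal labels. This is a plain-Python illustration under stated assumptions: the function names are hypothetical, the points are made-up placeholders, and the starting centroids are chosen by hand. With variables on very different scales (for example kcal and grams), standardising them first is a common choice.

```python
from collections import Counter

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    assign = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [min(range(len(centroids)),
                      key=lambda j: sum((a - b) ** 2
                                        for a, b in zip(p, centroids[j])))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new = []
        for j, c in enumerate(centroids):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                new.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new.append(c)  # keep an empty cluster's centroid unchanged
        if new == centroids:
            break
        centroids = new
    return assign, centroids

def recovers_groups(groups, clusters):
    """True when every group maps to exactly one cluster and vice versa."""
    pairs = set(zip(groups, clusters))
    return len(pairs) == len(set(groups)) == len(set(clusters))

# Hypothetical, already-scaled placeholder points in three well-separated groups.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.3),
          (10.0, 0.2), (10.1, 0.0), (9.8, 0.3)]
groups = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

assign, centroids = kmeans(points, [points[0], points[3], points[6]])
crosstab = Counter(zip(groups, assign))
print(sorted(crosstab.items()))
print("groups recovered:", recovers_groups(groups, assign))
```

The same cross-tabulation applies to the clusters obtained by cutting a hierarchical tree. Multidimensional scaling produces a configuration of points rather than cluster labels, so there the comparison is usually made by plotting the configuration with the original groups marked.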

Part B

Find two datasets using online sources that you can use to demonstrate the techniques that you have learned in this subject. Some good places to find interesting data are:

• https://blog.visual.ly/data-sources/
• https://blog.bigml.com/2013/02/28/data-data-data-thousands-of-public-data-sources/
• https://www.tableausoftware.com/public/community/sample-data-sets
• https://www.kaggle.com/
• https://lib.stat.cmu.edu/DASL/
• https://www.models.kvl.dk/datasets
• https://research.library.gsu.edu/c.php?g=115854&p=754836
• https://www.stat.ufl.edu/~winner/datasets.html
• https://www.statsci.org/data/

You must get approval from me for your datasets before you begin. I may not approve two students using the same dataset.

Some datasets are quite extensive and you may feel that you can illustrate a range of techniques with different subsets of the same dataset. If you think this applies to your chosen dataset talk to me about this when you are getting approval for your dataset.

If you are having trouble thinking about what you need to be able to do, think back over the broad areas that we have covered in class - inferences about mean vectors, MANOVA (one- and two-way), multivariate linear regression, PCA and factor analysis, canonical correlation, discrimination and classification including clustering. You don't need to show that you can do all of these but I would hope (read expect) to see at least 5 of these broad areas represented in your answer.

For each of your chosen datasets, you need to pose one or more questions that you believe you can (try to) address using the dataset. You then need to use appropriate techniques to analyse the data to address the research question(s) that you have posed. Finally, you will need to reflect on the adequacy of the dataset to address the questions that you have posed, and make suggestions about how you might collect the data differently to better address your question (consider what to collect or how to collect, for instance).

Your answer to this question should include (separately for each of the two datasets, if appropriate):

• A report that describes the data, poses the research question(s), analyses the research question(s) and reflects on the usefulness of the data to answer the question(s). This should be in a report format, with essential output in the report and any other output that you use in an appendix. You should also indicate where you obtained the data from (e.g. reference to a paper or URL).

• A .R file containing your code.

• A .csv file containing the data set (if it is not already in your .R file).

Attachment:- FoodConsumptionNutrients en.xlsx

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd