Estimate correlation between the illiteracy and birth rates

Reference no: EM131421443

Note: All dataset files have a header row, so be sure to indicate this in R/RStudio when loading each one.

1. Suppose we have postulated the model

$Y = \beta \sin(X)$.

To estimate $\beta$, a random sample $(x_1, y_1), \ldots, (x_n, y_n)$ is obtained. Then the equation $\hat{y}_i = \hat{\beta} \sin(x_i)$ gives the fitted value of $Y$ when $X = x_i$.

Derive a formula for the least-squares estimate of $\beta$, i.e., $\hat{\beta}$.
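As a hedged sketch of where this derivation leads, assume the usual least-squares criterion (minimize the sum of squared residuals) and that at least one $\sin(x_i)$ is nonzero. The symbol $S(\beta)$ is introduced here for illustration and does not appear in the assignment.

```latex
S(\beta) = \sum_{i=1}^{n} \bigl(y_i - \beta \sin(x_i)\bigr)^2,
\qquad
\frac{dS}{d\beta} = -2 \sum_{i=1}^{n} \sin(x_i)\,\bigl(y_i - \beta \sin(x_i)\bigr) = 0
\;\Longrightarrow\;
\hat{\beta} = \frac{\sum_{i=1}^{n} y_i \sin(x_i)}{\sum_{i=1}^{n} \sin^2(x_i)}.
```

The second derivative, $2\sum_{i=1}^{n} \sin^2(x_i)$, is positive under the stated assumption, so the stationary point is a minimum.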

2. A campaign manager conducts a survey to gauge voter support for her candidate, Lopez. She gathers data on the age of each registered voter (x) and whether this person supports Lopez (Y = 1) or somebody else (Y = 0).

An analysis yields the following logistic equation:

$\ln\left(\dfrac{\hat{p}(x)}{1 - \hat{p}(x)}\right) = -0.324 + 0.012x$,

where $\hat{p}(x)$ is the estimated probability of a vote for Lopez.

(a) Find the estimated probability that a 21-year-old voter will vote for Lopez.

(b) Compare the odds of support for Lopez between two people who are 10 years apart in age.

(c) Interpret the coefficient on x (0.012) in the logistic equation.

(d) At what age is the estimated probability of a vote for Lopez equal to 0.5?
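The arithmetic behind parts (a), (b), and (d) can be sketched as follows. The assignment itself calls for R; this is a minimal Python illustration that uses only the two coefficients printed in the logistic equation. The function names are hypothetical and chosen for this example.

```python
import math

# Coefficients taken from the fitted logistic equation in Question 2.
INTERCEPT = -0.324
SLOPE = 0.012


def estimated_probability(age: float) -> float:
    """Invert the logit: p = 1 / (1 + exp(-(b0 + b1 * age)))."""
    log_odds = INTERCEPT + SLOPE * age
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_ratio(age_gap: float) -> float:
    """Multiplicative change in the odds for a voter who is `age_gap` years older."""
    return math.exp(SLOPE * age_gap)


def age_at_probability(p: float) -> float:
    """Age at which the estimated probability equals p (solve the logit for x)."""
    return (math.log(p / (1.0 - p)) - INTERCEPT) / SLOPE


p_21 = estimated_probability(21)      # part (a)
ratio_10 = odds_ratio(10)             # part (b)
age_half = age_at_probability(0.5)    # part (d)
print(round(p_21, 3), round(ratio_10, 3), round(age_half, 1))
```

For part (c), the same reasoning with a one-year gap gives `odds_ratio(1)`, i.e. each additional year of age multiplies the estimated odds of supporting Lopez by exp(0.012), roughly a 1.2% increase.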

3. In your own words, describe the difference in the classification methods given by logistic regression and linear discriminant analysis.

4. Is there a relationship between female illiteracy and birth rate? In particular, can the birth rate in a given country be effectively predicted using the illiteracy rate? The file Illiteracy contains data on a sample of countries where female illiteracy is more than 5%. The variable Illit is the percentage of women over 15 years of age who are illiterate, and the variable Births is the number of births per woman in that country.

(a) Estimate the correlation between the illiteracy and birth rates. Interpret the value. What does it say about a possible relationship between the variables?

(b) Create a scatter plot of birth rate against illiteracy rate and comment on the relationship.

(c) Give the estimated regression equation and interpret the slope coefficient and $R^2$ statistic.

(d) Create a residual plot and comment on the appropriateness of a linear model.

(e) Can we say that improving literacy (i.e., reducing illiteracy) will result in a lower birth rate? Justify your answer with an appropriate hypothesis test.
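The quantities requested in parts (a), (c), and (e) of Question 4 can be sketched as below. This is a Python illustration under loud assumptions: the five data pairs are made up for the example and are NOT the contents of the assignment's data file, and `simple_regression` is a hypothetical helper. In R the same quantities come from `cor()` and `lm()`.

```python
import math

# Hypothetical illustrative values, NOT the assignment's data set.
illit = [10.0, 20.0, 30.0, 40.0, 50.0]   # percent of women who are illiterate
births = [2.0, 3.1, 3.9, 5.2, 5.8]       # births per woman


def simple_regression(x, y):
    """Return slope, intercept, and Pearson correlation for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = simple_regression(illit, births)
r_squared = r ** 2  # in simple linear regression, R^2 equals the squared correlation

# t statistic for H0: slope = 0, with n - 2 degrees of freedom.
n = len(illit)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1.0 - r_squared)
print(slope, intercept, r_squared, t_stat)
```

A significant slope test supports an association between the two variables. Because the data are observational, the test on its own does not establish that reducing illiteracy would cause the birth rate to fall.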

5. The data set Walleye from the Minnesota Pollution Control Agency contains data on length (inches) and weight (pounds) measurements for a sample of 60 walleye caught in Minnesota lakes.

(a) Fit a linear model to the data that predicts weight based on length. Provide visual evidence that this model is not appropriate for the relationship between length and weight of fish.

(b) Applying an appropriate transformation to the data, fit a power model to the data, i.e., $W = aL^b$. Give the estimated values of a and b, as well as a 95% confidence interval for b.

(c) Applying an appropriate transformation to the data, fit an exponential model to the data, i.e., $W = ae^{bL}$. Give the estimated values of a and b, as well as a 95% confidence interval for b.

(d) Which model, power or exponential, provides the better fit to the data? Justify your answer.

(e) Using the model selected in (d), give a 95% bootstrap percentile interval for b. Compare this interval to the confidence interval found in either (b) or (c), depending on which model you selected.
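The transformation in part (b) and the bootstrap in part (e) can be sketched as follows, assuming the power model is linearized as log(W) = log(a) + b log(L) and that the bootstrap resamples (length, weight) pairs with replacement. This is a Python illustration, although the assignment calls for R. The data are synthetic values generated inside the block, NOT the walleye measurements, and both function names are hypothetical.

```python
import math
import random


def fit_power_model(lengths, weights):
    """Fit W = a * L**b by least squares on log(W) = log(a) + b * log(L)."""
    log_l = [math.log(v) for v in lengths]
    log_w = [math.log(v) for v in weights]
    n = len(log_l)
    mean_l = sum(log_l) / n
    mean_w = sum(log_w) / n
    sxx = sum((u - mean_l) ** 2 for u in log_l)
    sxy = sum((u - mean_l) * (v - mean_w) for u, v in zip(log_l, log_w))
    b = sxy / sxx
    a = math.exp(mean_w - b * mean_l)
    return a, b


def bootstrap_percentile_interval(lengths, weights, n_boot=2000, level=0.95, seed=0):
    """Percentile interval for b from resampling (length, weight) pairs."""
    rng = random.Random(seed)
    n = len(lengths)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_l = [lengths[i] for i in idx]
        if len(set(sample_l)) < 2:
            continue  # a resample with one distinct length has no defined slope
        _, b = fit_power_model(sample_l, [weights[i] for i in idx])
        slopes.append(b)
    slopes.sort()
    tail = (1.0 - level) / 2.0
    lower = slopes[int(tail * (n_boot - 1))]
    upper = slopes[int(math.ceil((1.0 - tail) * (n_boot - 1)))]
    return lower, upper


# Synthetic stand-in data: 60 fish, weights generated from a known power law plus noise.
rng = random.Random(1)
lengths = [12.0 + 0.3 * i for i in range(60)]
weights = [0.0004 * L ** 3 * math.exp(rng.gauss(0.0, 0.1)) for L in lengths]

a_hat, b_hat = fit_power_model(lengths, weights)
lower, upper = bootstrap_percentile_interval(lengths, weights)
print(a_hat, b_hat, (lower, upper))
```

The exponential model in part (c) follows the same pattern with log(W) regressed on L itself rather than on log(L).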

6. The data set Carseats contains sales information for child car seats at 400 different stores.

(a) Fit a multiple linear regression model to predict Sales using the following predictors:

o Income - community income level (in thousands of dollars)
o Advertising - local advertising budget for company at each location (in thousands of dollars)
o Price - price company charges for car seats at each site
o ShelveLoc - a factor with levels Bad, Good, and Medium indicating the quality of the shelving location for the car seats at each site
o Age - average age of the local population
o Urban - a factor with levels No and Yes to indicate whether the store is in an urban or rural location
o US - a factor with levels No and Yes to indicate whether the store is in the U.S. or not

Also include an interaction between Income and Advertising, and between Price and Age. Is the model significant overall in predicting sales?

(b) Provide an interpretation of each coefficient in the model you fit in (a).

(c) For which of the predictors in (a) can you reject the null hypothesis $H_0: \beta_j = 0$? Justify your answer and explain what it means to reject $H_0$.

(d) Comment on the results of (c). Do they make intuitive sense?

(e) On the basis of your response to (c), fit a smaller model that only uses the predictors for which there is evidence of association with the response.

(f) How well do the models in (a) and (e) fit the data?

7. The data set Boston contains the following information about 506 neighborhoods around Boston.

o crim - per capita crime rate by neighborhood
o zn - proportion of residential land zoned for lots over 25,000 sq.ft.
o indus - proportion of non-retail business acres per neighborhood
o chas - Charles River dummy variable (1 if neighborhood touches river; 0 otherwise)
o nox - nitrogen oxides concentration (parts per 10 million)
o rm - average number of rooms per dwelling
o age - proportion of owner-occupied units built prior to 1940
o dis - weighted mean of distances to five Boston employment centers
o rad - index of accessibility to radial highways
o tax - full-value property-tax rate per $10,000
o ptratio - pupil-teacher ratio by neighborhood
o black - $1000(Bk - 0.63)^2$, where Bk is the proportion of black residents by neighborhood
o lstat - percentage of households with low socioeconomic status
o medv - median value of owner-occupied homes (in thousands of dollars)

In this question, you will develop models to predict whether a given neighborhood has a crime rate above or below the median.

(a) Create a binary variable crim01 that codes whether or not a neighborhood is above (1) or below (0) the median crime rate given by the data set.

(b) Fit a logistic regression model that predicts whether a neighborhood has a crime rate above or below the median using all other variables in Boston as predictors.

(c) In the model fit in (b), note that R codes the individual significance of nox at the highest level (***). Give an explanation as to why this variable would be significant in predicting the probability that a neighborhood would have a crime rate above the median.

(d) One by one, remove predictors from the model fit in (b) until only predictors with the highest level of individual significance (***) remain.

(e) Compare the deviance of the full model fit in (b) with the final model fit in (d). Comment on the difference between the deviance for these models and the differences when compared to the null model.

(f) Split the data into a training set and a test set. Use set.seed(352017).

(g) Using as predictors the variables identified as having the highest level of individual significance in (d), fit a logistic regression model to the training set and estimate the test error of the model using the test set.

(h) Using the same predictors used in (g), perform LDA on the training data to predict crim01 and estimate the test error of the model using the test set.

(i) Using the same predictors used in (g), perform QDA on the training data to predict crim01 and estimate the test error of the model using the test set.

(j) Which of these methods, logistic regression, LDA, or QDA, provides the best model for classifying neighborhoods with a high crime rate? Compare the performance of that model to that of the null model.

(k) Summarize your findings regarding predicting whether a given neighborhood has a crime rate above or below the median. What advice, based on these findings, would you give to a family moving to Boston on selecting a neighborhood to live in? Use appropriate visualizations to support your summary and advice.

Attachment:- datasets.zip


Reviews

len1421443

3/9/2017 7:52:14 AM

Subject: R Programming Comment: Please refer to the attached PDF file and dataset zip folder.


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd