Estimate correlation between the illiteracy and birth rates

Reference no: EM131421443

Note: All dataset files have a heading, so be sure to indicate so in R/RStudio when loading each.
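The note about headings can be illustrated with a minimal R sketch. It assumes the files are comma-separated, which the assignment does not state; the temporary file and its two data rows are made up purely for illustration.

```r
# Write a tiny CSV with a header row to a temporary file (hypothetical data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("Illit,Births", "12.5,2.1", "40.0,4.3"), tmp)

# header = TRUE tells R that the first line holds the column names.
with_header <- read.csv(tmp, header = TRUE)

# header = FALSE treats the first line as data, so the names end up in row 1.
no_header <- read.csv(tmp, header = FALSE)

print(names(with_header))
print(nrow(no_header))
```

Note that `read.csv()` defaults to `header = TRUE`, while `read.table()` defaults to `header = FALSE`, so stating the argument explicitly avoids surprises.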

1. Suppose we have postulated the model

$Y = \beta \sin(X)$.

To estimate $\beta$, a random sample $(x_1, y_1), \ldots, (x_n, y_n)$ is obtained. Then the equation $\hat{y}_i = \hat{\beta} \sin(x_i)$ denotes the fitted value of $Y$ when $X = x_i$.

Derive a formula for the least-squares estimate of $\beta$, i.e., $\hat{\beta}$.
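One way to approach the derivation, sketched under the usual assumption that "least squares" means minimizing the sum of squared residuals $S(\beta)$ (the symbol $S$ is introduced here and is not part of the assignment):

```latex
\begin{align*}
S(\beta) &= \sum_{i=1}^{n} \bigl(y_i - \beta \sin(x_i)\bigr)^2 \\
\frac{dS}{d\beta} &= -2 \sum_{i=1}^{n} \sin(x_i)\,\bigl(y_i - \beta \sin(x_i)\bigr) = 0 \\
\hat{\beta} &= \frac{\sum_{i=1}^{n} y_i \sin(x_i)}{\sum_{i=1}^{n} \sin^2(x_i)}
\end{align*}
```

The last line requires $\sum_i \sin^2(x_i) > 0$, and the second derivative $2\sum_i \sin^2(x_i)$ is then positive, so the stationary point is a minimum.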

2. A campaign manager conducts a survey to gauge voter support for her candidate, Lopez. She gathers data on the age of a registered voter (x) and whether this person supports Lopez (Y = 1) or somebody else (Y = 0).

An analysis yields the following logistic equation:

$\ln\left(\dfrac{\hat{p}(x)}{1 - \hat{p}(x)}\right) = -0.324 + 0.012x$,

where $\hat{p}(x)$ is the estimated probability of a vote for Lopez.

(a) Find the estimated probability that a 21-year-old voter will vote for Lopez.

(b) Compare the odds of support for Lopez between two people that are 10 years apart in age.

(c) Interpret the coefficient on x (0.012) in the logistic equation.

(d) At what age is the estimated probability of a vote for Lopez equal to 0.5?
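The quantities asked for in (a), (b), and (d) all follow from inverting the log-odds equation. A minimal R sketch, using only the two coefficients given above (the variable names are illustrative):

```r
b0 <- -0.324  # intercept of the fitted log-odds equation
b1 <- 0.012   # coefficient on age

# plogis() is the inverse logit: plogis(z) = 1 / (1 + exp(-z)).
p_hat <- function(age) plogis(b0 + b1 * age)

p21      <- p_hat(21)      # (a) estimated probability at age 21
or10     <- exp(10 * b1)   # (b) odds ratio for a 10-year age difference
age_half <- -b0 / b1       # (d) age at which the log-odds equal 0, i.e. p = 0.5

print(c(p21 = p21, or10 = or10, age_half = age_half))
```

Because the model is linear on the log-odds scale, the odds ratio in (b) does not depend on which two ages are compared, only on the 10-year gap.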

3. In your own words, describe the difference in the classification methods given by logistic regression and linear discriminant analysis.

4. Is there a relationship between female illiteracy and birth rate? In particular, can the birth rate in a given country be effectively predicted using the illiteracy rate? The file Illiteracy contains data on a sample of countries where female illiteracy is more than 5%. The variable Illit is the percentage of women over 15 years of age that are illiterate, and the variable Births is the number of births per woman in that country.

(a) Estimate the correlation between the illiteracy and birth rates. Interpret the value. What does it say about a possible relationship between the variables?

(b) Create a scatter plot of birth rate against illiteracy rate and comment on the relationship.

(c) Give the estimated regression equation and interpret the slope coefficient and $R^2$ statistic.

(d) Create a residuals plot and comment on the appropriateness of a linear model.

(e) Can we say that improving literacy (i.e., reducing illiteracy) will result in a lower birth rate? Justify your answer with an appropriate hypothesis test.
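The steps in (a) through (e) map onto a handful of base R calls. The sketch below uses simulated stand-in data, because the Illiteracy file is not reproduced here; the column names follow the description above, but every number is made up.

```r
set.seed(1)

# Hypothetical stand-in for the Illiteracy data set (simulated values).
illit_df <- data.frame(Illit = runif(30, 5, 80))
illit_df$Births <- 1.9 + 0.05 * illit_df$Illit + rnorm(30, sd = 0.6)

# (a) sample correlation
r <- cor(illit_df$Illit, illit_df$Births)

# (c) simple linear regression, slope and R-squared
fit   <- lm(Births ~ Illit, data = illit_df)
slope <- unname(coef(fit)["Illit"])
r2    <- summary(fit)$r.squared

# (b), (d) plots; uncomment in an interactive session
# plot(Births ~ Illit, data = illit_df)
# plot(fitted(fit), resid(fit)); abline(h = 0, lty = 2)

# (e) two-sided p-value for H0: slope = 0, taken from the coefficient table
slope_p <- summary(fit)$coefficients["Illit", "Pr(>|t|)"]

print(c(r = r, slope = slope, r2 = r2, slope_p = slope_p))
```

In simple linear regression, $R^2$ equals the squared correlation, which gives a quick consistency check between (a) and (c). A significant slope shows association only; whether it supports a causal claim is the point of (e).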

5. The data set Walleye from the Minnesota Pollution Control Agency contains data on length (inches) and weight (pounds) measurements for a sample of 60 walleye caught in Minnesota lakes.

(a) Fit a linear model to the data that predicts weight based on length. Provide visual evidence that this model is not appropriate for the relationship between length and weight of fish.

(b) Applying an appropriate transformation to the data, fit a power model to the data, i.e., $W = aL^b$. Give the estimated values of a and b, as well as a 95% confidence interval for b.

(c) Applying an appropriate transformation to the data, fit an exponential model to the data, i.e., $W = ae^{bL}$. Give the estimated values of a and b, as well as a 95% confidence interval for b.

(d) Which model, power or exponential, provides the best fit to the data? Justify your answer.

(e) Using the model selected in (d), give a 95% bootstrap percentile interval for b. Compare this interval to the confidence interval found in either (b) or (c), depending on which model you selected.
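Both models become linear after a log transformation: the power model gives log W = log a + b log L, and the exponential model gives log W = log a + bL. The sketch below fits both and builds a bootstrap percentile interval for b. It uses simulated fish with hypothetical column names `Length` and `Weight`, since the Walleye file is not reproduced here, and it resamples cases, which is one common bootstrap scheme among several.

```r
set.seed(42)
n <- 60

# Hypothetical stand-in for the Walleye data (simulated from a power law).
fish <- data.frame(Length = runif(n, 10, 30))
fish$Weight <- 0.0003 * fish$Length^3 * exp(rnorm(n, sd = 0.1))

# (b) power model: regress log(W) on log(L)
power_fit <- lm(log(Weight) ~ log(Length), data = fish)
b_hat <- unname(coef(power_fit)[2])
a_hat <- unname(exp(coef(power_fit)[1]))
ci_b  <- confint(power_fit, level = 0.95)[2, ]

# (c) exponential model: regress log(W) on L
exp_fit <- lm(log(Weight) ~ Length, data = fish)

# (e) case-resampling bootstrap, percentile interval for b in the power model
boot_b <- replicate(2000, {
  idx <- sample(n, replace = TRUE)
  coef(lm(log(Weight) ~ log(Length), data = fish[idx, ]))[2]
})
boot_ci <- quantile(boot_b, c(0.025, 0.975))

print(c(a_hat = a_hat, b_hat = b_hat))
print(ci_b)
print(boot_ci)
```

Since both fits share the same response, log(Weight), their residual standard errors or $R^2$ values can be compared directly when answering (d).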

6. The data set Carseats contains sales information for child car seats at 400 different stores.

(a) Fit a multiple linear regression model to predict Sales using the following predictors:

o Income - community income level (in thousands of dollars)
o Advertising - local advertising budget for company at each location (in thousands of dollars)
o Price - price company charges for car seats at each site
o ShelveLoc - a factor with levels Bad, Good, and Medium indicating the quality of the shelving location for the car seats at each site
o Age - average age of the local population
o Urban - a factor with levels No and Yes to indicate whether the store is in an urban or rural location
o US - a factor with levels No and Yes to indicate whether the store is in the U.S. or not

Also include an interaction between Income and Advertising, and between Price and Age. Is the model significant overall in predicting sales?

(b) Provide an interpretation of each coefficient in the model you fit in (a).

(c) For which of the predictors in (a) can you reject the null hypothesis $H_0: \beta_j = 0$? Justify your answer and explain what it means to reject $H_0$.

(d) Comment on the results of (c). Do they make intuitive sense?

(e) On the basis of your response to (c), fit a smaller model that only uses the predictors for which there is evidence of association with the response.

(f) How well do the models in (a) and (e) fit the data?
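The model in (a) can be written with R's formula syntax, where `A:B` adds an interaction term. The sketch below uses a simulated data frame whose column names match the predictor list above; the values are made up, and the real Carseats file would be loaded in its place.

```r
set.seed(7)
n <- 100

# Hypothetical stand-in for the Carseats data (simulated values).
seats <- data.frame(
  Income      = runif(n, 20, 120),
  Advertising = runif(n, 0, 30),
  Price       = runif(n, 50, 190),
  Age         = runif(n, 25, 80),
  ShelveLoc   = factor(sample(c("Bad", "Good", "Medium"), n, replace = TRUE)),
  Urban       = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  US          = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
seats$Sales <- 10 + 0.01 * seats$Income + 0.1 * seats$Advertising -
  0.05 * seats$Price + rnorm(n)

# (a) main effects plus the two requested interactions
full_fit <- lm(Sales ~ Income + Advertising + Price + ShelveLoc + Age +
                 Urban + US + Income:Advertising + Price:Age, data = seats)

# Overall F-test: H0 is that every slope coefficient equals zero.
f <- summary(full_fit)$fstatistic
overall_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

print(summary(full_fit)$coefficients)  # per-coefficient t-tests for (c)
print(overall_p)
```

For (f), `summary(full_fit)$adj.r.squared` and the residual standard error are natural quantities to compare between the full model and the reduced model from (e).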

7. The data set Boston contains the following information about 506 neighborhoods around Boston.
o crim - per capita crime rate by neighborhood
o zn - proportion of residential land zoned for lots over 25,000 sq. ft.
o indus - proportion of non-retail business acres per neighborhood
o chas - Charles River dummy variable (1 if neighborhood touches river; 0 otherwise)
o nox - nitrogen oxides concentration (parts per 10 million)
o rm - average number of rooms per dwelling
o age - proportion of owner-occupied units built prior to 1940
o dis - weighted mean of distances to five Boston employment centers
o rad - index of accessibility to radial highways
o tax - full-value property-tax rate per $10,000
o ptratio - pupil-teacher ratio by neighborhood
o black - $1000(B_k - 0.63)^2$, where $B_k$ is the proportion of black residents by neighborhood
o lstat - percentage of households with low socioeconomic status
o medv - median value of owner-occupied homes (in thousands of dollars)

In this question, you will develop models to predict whether a given neighborhood has a crime rate above or below the median.

(a) Create a binary variable crim01 that codes whether or not a neighborhood is above (1) or below (0) the median crime rate given by the data set.

(b) Fit a logistic regression model that predicts whether a neighborhood has a crime rate above or below the median using all other variables in Boston as predictors.

(c) In the model fit in (b), note that R codes the individual significance of nox at the highest level (***). Give an explanation as to why this variable would be significant in predicting the probability that a neighborhood would have a crime rate above the median.

(d) One by one, remove predictors from the model fit in (b) until only predictors with the highest level of individual significance (***) remain.

(e) Compare the deviance of the full model fit in (b) with the final model fit in (d). Comment on the difference between the deviance for these models and the differences when compared to the null model.

(f) Split the data into a training set and a test set. Use set.seed(352017).

(g) Using as predictors the variables identified as having the highest level of individual significance in (d), fit a logistic regression model to the training set and estimate the test error of the model using the test set.

(h) Using the same predictors used in (g), perform LDA on the training data to predict crim01 and estimate the test error of the model using the test set.

(i) Using the same predictors used in (g), perform QDA on the training data to predict crim01 and estimate the test error of the model using the test set.

(j) Which of these methods, logistic regression, LDA, or QDA, provides the best model for classifying neighborhoods with a high crime rate? Compare the performance of that model to that of the null model.

(k) Summarize your findings regarding predicting whether a given neighborhood has a crime rate above or below the median. What advice, based on these findings, would you give to a family moving to Boston on selecting a neighborhood to live in? Use appropriate visualizations to support your summary and advice.
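The workflow in (a) and (f) through (j) can be sketched as follows. Several choices here are assumptions rather than part of the assignment: `MASS::Boston` stands in for the supplied Boston file (its columns may differ slightly), the split is 70/30, and the predictors `nox`, `rad`, and `dis` are placeholders for whatever survives the elimination in (d).

```r
library(MASS)  # recommended package bundled with R: Boston, lda(), qda()

set.seed(352017)  # seed given in part (f)

# Assumption: MASS::Boston stands in for the course's Boston file.
boston <- MASS::Boston

# (a) 1 if the crime rate is above the median, 0 otherwise
boston$crim01 <- as.integer(boston$crim > median(boston$crim))

# (f) assumed 70/30 split; the assignment does not fix the proportion
train_idx <- sample(nrow(boston), size = round(0.7 * nrow(boston)))
train <- boston[train_idx, ]
test  <- boston[-train_idx, ]

# Placeholder predictors; substitute the ones remaining after part (d).
form <- crim01 ~ nox + rad + dis

# (g) logistic regression, classifying at a 0.5 probability cutoff
logit_fit  <- glm(form, data = train, family = binomial)
logit_prob <- predict(logit_fit, newdata = test, type = "response")
logit_err  <- mean(as.integer(logit_prob > 0.5) != test$crim01)

# Helper: test error for an lda/qda fit (class labels come back as a factor)
da_error <- function(fit) {
  pred <- as.integer(as.character(predict(fit, newdata = test)$class))
  mean(pred != test$crim01)
}

lda_err <- da_error(lda(form, data = train))  # (h)
qda_err <- da_error(qda(form, data = train))  # (i)

# (j) null model: always predict the majority class of the training set
null_err <- mean(test$crim01 != as.integer(mean(train$crim01) > 0.5))

print(c(logistic = logit_err, lda = lda_err, qda = qda_err, null = null_err))
```

The test error is the misclassification rate on the held-out rows, so lower is better, and a useful classifier should clearly beat the null model's rate.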

Attachment:- datasets.zip


