Apply the function plot to the formula

Reference no: EM131892847

ASSIGNMENT

In this assignment we apply some of the statistical tools to a data set. The data were collected from the donor database of the Blood Transfusion Service Center in Hsin-Chu City, Taiwan. About every three months the center sends its blood transfusion service bus to a university in Hsin-Chu City to collect donated blood. The assignment uses data on a random sample of 748 donors, obtained from the UCI Machine Learning Repository.

The file "transfusion.csv" contains the data. It holds 5 variables:
- recency = The number of months since the last donation. (numeric)
- frequency = The total number of donations. (numeric)
- monetary = Total blood donated (in c.c.). (numeric)
- time = The number of months since the first donation. (numeric)
- march2007 = An indicator of whether the donor gave blood in March 2007. (factor)
In the assignment we consider the last four variables.
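As a starting point, the data might be read into R along the following lines. This is a minimal sketch rather than a prescribed method: it assumes the file sits in the working directory and that its header row uses the five variable names listed above. The data frame name `transfusion` and the fallback rows are made up for illustration and are not taken from the real file.

```r
# Assumption: "transfusion.csv" is in the working directory and its columns
# are named recency, frequency, monetary, time, march2007.
path <- "transfusion.csv"

if (file.exists(path)) {
  transfusion <- read.csv(path)
} else {
  # Made-up stand-in rows (NOT real data) so that the sketch still runs
  # when the file is not available.
  transfusion <- data.frame(
    recency   = c(3, 1, 2, 5, 14, 20),
    frequency = c(12, 7, 9, 3, 2, 1),
    monetary  = c(3100, 1700, 2300, 800, 450, 300),
    time      = c(60, 30, 41, 18, 15, 22),
    march2007 = c(1, 1, 0, 0, 1, 0)
  )
}

# Treat the indicator as a factor, as the variable list describes it.
transfusion$march2007 <- factor(transfusion$march2007)

# Inspect the column types.
str(transfusion)
```

Converting `march2007` to a factor matters later: when the explanatory variable in a formula is a factor, `plot` draws box plots instead of a scatter plot.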

Comparing Two Samples
Consider "frequency" as a response and "march2007" as an explanatory variable. Plot the relation between the two variables, test the equality of the expectation in the two sub-samples and the equality of the variance. Repeat the same analysis for the case where the response "frequency" is replaced by the log-transformed response: "log(frequency)". In Tasks 1-3 you are asked to describe the results of the analysis.
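The two-sample analysis described here could be sketched in R as follows. The data frame `d` is a simulated stand-in (an assumption made so the example runs on its own), so its output says nothing about the real data. With the actual file, `d` would be replaced by the data frame read from "transfusion.csv".

```r
# Simulated stand-in data (NOT the real transfusion data).
set.seed(1)
n <- 200
d <- data.frame(
  frequency = rpois(n, 5) + 1,              # positive counts, so log() is finite
  march2007 = factor(rbinom(n, 1, 0.25))    # two-level indicator
)

# Box plots: plot() applied to a formula with a factor on the right-hand side.
plot(frequency ~ march2007, data = d)
plot(log(frequency) ~ march2007, data = d)

# Equality of expectations in the two sub-samples (Welch t-test by default).
t_raw <- t.test(frequency ~ march2007, data = d)
t_log <- t.test(log(frequency) ~ march2007, data = d)

# Equality of variances in the two sub-samples (F test).
v_raw <- var.test(frequency ~ march2007, data = d)
v_log <- var.test(log(frequency) ~ march2007, data = d)

# Each p-value is compared with the chosen significance level.
c(t_raw = t_raw$p.value, t_log = t_log$p.value,
  v_raw = v_raw$p.value, v_log = v_log$p.value)
```

A null hypothesis is rejected at the 5% level when the corresponding p-value is below 0.05.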

Linear Regression
In Tasks 4-7 you are asked to conduct an analysis similar to the analysis of Tasks 1-3. The difference is that the numerical variable "time" is used as the explanatory variable. The model of linear regression assumes that the expectation of the response is a linear function of the explanatory variable. Another assumption of the model is that the variance of the response is constant for each value of the explanatory variable. Frequently, however, one may observe an increase in the variance for larger values of the explanatory variable. Replacing the response by the log-transformed response is a commonly used method to overcome this difficulty. The analysis that involves the log of the response can be carried out via the replacement of the response "frequency" in the formula by the transformed response "log(frequency)".
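The regression analysis described here might look like the following sketch. The data are simulated, and the simulation deliberately makes the spread of the response grow with the explanatory variable to mirror the situation the paragraph describes. The names `d`, `fit_raw`, `fit_log` and `ci_log` are illustrative assumptions, and the output does not describe the real data.

```r
# Simulated stand-in data (NOT the real transfusion data): the mean and the
# variance of the response both increase with the explanatory variable.
set.seed(2)
n <- 200
time <- sample(2:98, n, replace = TRUE)
frequency <- rpois(n, lambda = 1 + time / 10) + 1   # positive, so log() is finite
d <- data.frame(time = time, frequency = frequency)

# Fit the regression for the raw and for the log-transformed response.
fit_raw <- lm(frequency ~ time, data = d)
fit_log <- lm(log(frequency) ~ time, data = d)

# Scatter plot with the fitted regression line added.
plot(frequency ~ time, data = d)
abline(fit_raw)

# Test of H0: slope = 0; the p-value is in the "Pr(>|t|)" column.
summary(fit_raw)$coefficients["time", ]
summary(fit_log)$coefficients["time", ]

# 95% confidence interval for the slope in the log-response model.
ci_log <- confint(fit_log, "time", level = 0.95)
ci_log
```

Only the left-hand side of the formula changes between the two fits, which is the replacement of "frequency" by "log(frequency)" that the paragraph refers to.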

The Relation Between Two Variables
The final Task 8 involves the investigation of the relation between the response "frequency" and the variable "monetary".
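One way to judge how closely the points follow the fitted line is to look at the residuals alongside the plot. The sketch below uses arbitrary simulated values (an assumption, not the real relation between the two variables); the names `d`, `fit`, `max_resid` and `r` are illustrative.

```r
# Simulated stand-in data (NOT the real transfusion data); the relation
# between the two variables here is arbitrary.
set.seed(3)
d <- data.frame(frequency = rpois(150, 5) + 1)
d$monetary <- 200 * d$frequency + rnorm(150, sd = 100)

# Scatter plot with the regression line.
fit <- lm(frequency ~ monetary, data = d)
plot(frequency ~ monetary, data = d)
abline(fit)

# If every point lay exactly on one line, the largest absolute residual
# would be zero up to rounding error and the correlation would be +1 or -1.
max_resid <- max(abs(residuals(fit)))
r <- cor(d$frequency, d$monetary)
c(max_resid = max_resid, correlation = r)
```

A clearly non-zero largest residual together with a correlation close to, but not equal to, 1 in absolute value would point to a linear trend with scatter around the line.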

Tasks

Comparing Two Samples:

1. Apply the function "plot" to the formula that relates the response "frequency" to the explanatory variable "march2007" in order to produce the two box-plots of the response. Redo the plotting with "frequency" replaced by "log(frequency)". The distribution of the variable "log(frequency)" is:

__ More symmetric, __ Less symmetric compared to the distribution of the variable "frequency".

Mark the most appropriate option and attach the R code that produces the two plots:

2. Mark the null hypotheses that you reject with a significance level of 5% and those that you do not reject:

(Reject/Don't Reject) H0: The expectation of "frequency" is the same in the two subsets,

(Reject/Don't Reject) H0: The expectation of "log(frequency)" is the same in the two subsets.

Explain your answer:

3. Mark the null hypotheses that you reject with a significance level of 5% and those that you do not reject:

(Reject/Don't Reject) H0: The variance of "frequency" is the same in the two subsets,

(Reject/Don't Reject) H0: The variance of "log(frequency)" is the same in the two subsets.

Explain your answer:

Linear Regression:

4. Apply the function "plot" to the formula that relates the response "frequency" to the explanatory variable "time" in order to produce the scatter plot. Add the regression line to the plot. The variability of the variable "frequency", for larger values of the explanatory variable, is:

__ Smaller, __ Larger, __ Constant.

Mark the most appropriate option and attach the R code that produces the plot:

5. Mark the null hypotheses that you reject with a significance level of 5% and those that you do not reject:

(Reject/Don't Reject) H0: The slope of "time" in the regression line of the response "frequency" is equal to zero,

(Reject/Don't Reject) H0: The slope of "time" in the regression line of the response "log(frequency)" is equal to zero.

Explain your answer:

6. The 95% confidence interval for the slope of "time" in the regression line of the response "log(frequency)" is:
Lower end = ____, Upper end = ____.

Attach the R code that produces the confidence interval:

7. The regression line between "time" as an explanatory variable and "log(frequency)" as a response is:
__ Increasing, __ Decreasing, __ Constant.

Mark the most appropriate option and explain your answer:

The Relation Between Two Variables:

8. Apply the function "plot" to the formula that relates the response "frequency" to the explanatory variable "monetary" in order to produce the scatter plot. Add the regression line to the plot. The points in the scatter plot are:

__ All on the same line, __ Show a linear trend but are not on the same line, __ Don't show a linear trend.

Mark the most appropriate option and attach the R code that produces the plot:

Attachment: Data.rar


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd