FIT5197 Modelling for data analysis Assignment

Reference no: EM132490721

FIT5197 - Modelling for data analysis - Monash University

Confidence Interval, Hypothesis Testing, Data Mining Models

Objectives
This assignment assesses your understanding of Confidence Interval, Hypothesis Testing, and Data Mining Models.

Question 1: Central Limit Theorem
The Central Limit Theorem states that the sampling distribution of the sample mean has a particular property. Regardless of the population we want to make inferences about (provided it has a finite variance), if we draw many samples of size n, the distribution of their means becomes approximately normal, that is, symmetric and bell-shaped, as n grows. Write a program that simulates the Central Limit Theorem for different population distributions (at least 3) and different sample sizes (at least 3), giving at least 9 trials in total.
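A minimal base-R sketch of such a simulation follows. The three populations (exponential, uniform and chi-squared), the sample sizes 2, 10 and 50, the 1000 repetitions per trial and the seed are illustrative assumptions, not values fixed by the assignment. Any clearly non-normal populations would serve the same purpose.

```r
# Simulate the Central Limit Theorem: 3 populations x 3 sample sizes = 9 trials.
# Assumed settings: populations, sample sizes, repetitions and seed are illustrative.
set.seed(5197)

# Each population is a function that draws n random values
populations <- list(
  "Exponential(rate = 1)" = function(n) rexp(n, rate = 1),
  "Uniform(0, 1)"         = function(n) runif(n, min = 0, max = 1),
  "Chi-squared(df = 2)"   = function(n) rchisq(n, df = 2)
)
sample_sizes <- c(2, 10, 50)
n_reps <- 1000  # number of samples drawn per trial

# Draw `reps` samples of size n from `draw` and return their means
simulate_means <- function(draw, n, reps) {
  replicate(reps, mean(draw(n)))
}

means <- list()
old_par <- par(mfrow = c(length(populations), length(sample_sizes)))
for (pop_name in names(populations)) {
  for (n in sample_sizes) {
    m <- simulate_means(populations[[pop_name]], n, n_reps)
    means[[paste0(pop_name, " | n = ", n)]] <- m
    hist(m, breaks = 30, col = "grey",
         main = paste0(pop_name, ", n = ", n), xlab = "Sample mean")
  }
}
par(old_par)
```

As n increases, each row of histograms should look more symmetric and bell-shaped. The spread of the sample means should also shrink roughly like σ/√n.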

Question 2: Hypothesis Testing

(1). A steel-making factory wants to know whether a newly introduced method increases its productivity. The staff recorded 10 productivity results for each of the old and new methods. The results are given below. Explain your t-test findings.

Old Method: 78.1 72.4 76.2 74.3 77.4 78.4 76.0 75.5 76.7 77.3
New Method: 79.1 81.0 77.3 79.1 80.0 79.1 79.1 77.3 80.2 82.1

Note that the samples are independent of each other and come from normal distributions $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, where $\mu_1$, $\mu_2$ and $\sigma^2$ are unknown.
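Under these assumptions (independent normal samples with a common unknown variance), the standard test is the pooled two-sample t-test. The block below writes it for the one-sided alternative that the new method has the higher mean. This reading of "increase its productivity" is an assumption. Here $n_1 = n_2 = 10$, so the reference distribution has 18 degrees of freedom.

```latex
H_0:\ \mu_2 = \mu_1 \quad \text{vs.} \quad H_1:\ \mu_2 > \mu_1

s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad
t = \frac{\bar{x}_2 - \bar{x}_1}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2} \text{ under } H_0
```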

# Productivity results for the old and new methods
old <- c(78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3)
new <- c(79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1)
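Here is one way to run this test in R. It assumes the one-sided alternative that the new method has the higher mean productivity. It uses `var.equal = TRUE` because the question states a common variance σ². The `sprintf` summary line is just a convenience for the write-up.

```r
# Productivity results from the question
old <- c(78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3)
new <- c(79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1)

# Pooled (equal-variance) two-sample t-test
# H0: mean(new) = mean(old)   vs   H1: mean(new) > mean(old)
tt <- t.test(new, old, alternative = "greater", var.equal = TRUE)
print(tt)

# Pieces usually quoted when explaining the result
cat(sprintf("t = %.3f, df = %d, one-sided p-value = %.2g\n",
            tt$statistic, as.integer(tt$parameter), tt$p.value))
```

A small p-value would be evidence that the new method's mean productivity is higher. For this one-sided alternative, the printed confidence interval gives a lower bound on the difference in means.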
(2). Do the old and new samples truly come from two distributions with the same variance?
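Below is a sketch of the classical F-test for equality of two normal variances, using base R's `var.test()`. The statistic is the ratio of sample variances $s_1^2 / s_2^2$ on $(n_1 - 1, n_2 - 1) = (9, 9)$ degrees of freedom. The two-sided alternative is an assumption based on the wording of the question.

```r
old <- c(78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3)
new <- c(79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1)

# F-test for equal variances: F = var(old) / var(new) on (9, 9) df
# H0: sigma_old^2 = sigma_new^2   vs   H1: the two variances differ
ft <- var.test(old, new, alternative = "two.sided")
print(ft)

# Same statistic computed by hand as a check
f_manual <- var(old) / var(new)
cat(sprintf("F = %.3f (manual %.3f), p-value = %.3f\n",
            ft$statistic, f_manual, ft$p.value))
```

A large p-value means the data give no evidence against equal variances; it does not prove they are equal. The F-test is also sensitive to departures from normality.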

Question 3: Linear Regression and Anova

Use the dataset 'Q3 Data.txt' (tab-delimited) on different brands of cigarettes to predict CO (carbon monoxide output) from the other variables.

1. Fit all seven possible linear models with CO as the dependent variable (i.e. with all possible sets of independent variables except for no independent variables) and summarise the results in a table.

2. Identify what you think is the best model for predicting CO and explain why you think it is good.

3. Include a summary of the diagnostic checks you try for your best model (Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage).
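Here is a hedged sketch of the model enumeration and diagnostics. `fit_all_subsets()` is a hypothetical helper written for this example. It fits one `lm()` per non-empty subset of predictors (2³ − 1 = 7 models for three predictors) and tabulates R², adjusted R², residual standard error and AIC.

'Q3 Data.txt' is not reproduced here, so the demo runs on R's built-in `mtcars` data instead. The commented lines show how the helper might be applied to the cigarette data. They assume columns named `CO`, `Tar`, `Nicotine` and `Weight`, so check the actual file header first.

```r
# Fit every non-empty subset of predictors with lm() and tabulate fit statistics.
# fit_all_subsets() is a hypothetical helper written for this sketch.
fit_all_subsets <- function(data, response, predictors) {
  rows <- list()
  models <- list()
  for (k in seq_along(predictors)) {
    for (vars in combn(predictors, k, simplify = FALSE)) {
      f <- reformulate(vars, response = response)
      fit <- lm(f, data = data)
      s <- summary(fit)
      label <- deparse(f)
      models[[label]] <- fit
      rows[[label]] <- data.frame(formula = label, n_predictors = k,
                                  r2 = s$r.squared, adj_r2 = s$adj.r.squared,
                                  sigma = s$sigma, AIC = AIC(fit),
                                  stringsAsFactors = FALSE)
    }
  }
  tab <- do.call(rbind, rows)
  rownames(tab) <- NULL
  list(table = tab, models = models)
}

# Demo on the built-in mtcars data (three predictors -> 7 models)
res <- fit_all_subsets(mtcars, "mpg", c("wt", "hp", "disp"))
print(res$table[order(-res$table$adj_r2), ])

# For the assignment data (column names are an assumption; check the file header):
# cig <- read.delim("Q3 Data.txt")
# res <- fit_all_subsets(cig, "CO", c("Tar", "Nicotine", "Weight"))

# Diagnostics for the model with the highest adjusted R^2:
# Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage
best <- res$models[[res$table$formula[which.max(res$table$adj_r2)]]]
old_par <- par(mfrow = c(2, 2))
plot(best)
par(old_par)
```

Ranking by adjusted R² or AIC is one reasonable criterion. The choice of best model should also weigh coefficient significance and the diagnostic plots. `plot()` on an `lm` object draws the four panels the question lists by default.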

Question 4: Logistic Regression
You are required to predict affairs using logistic regression in this task. The dataset comes from a survey conducted by Psychology Today in 1969 and contains 601 observations on 9 variables. A detailed data description is given below.

affairs
numeric. How often engaged in extramarital sexual intercourse during the past year? 0 = none, 1 = once, 2 = twice, 3 = 3 times, 7 = 4-10 times, 12 = monthly, 12 = weekly, 12 = daily.

gender
factor indicating gender.

age
numeric variable coding age in years: 17.5 = under 20, 22 = 20-24, 27 = 25-29, 32 = 30-34, 37 = 35-39, 42 = 40-44, 47 = 45-49, 52 = 50-54, 57 = 55 or over.

yearsmarried
numeric variable coding number of years married: 0.125 = 3 months or less, 0.417 = 4-6 months, 0.75 = 6 months-1 year, 1.5 = 1-2 years, 4 = 3-5 years, 7 = 6-8 years, 10 = 9-11 years, 15 = 12 or more years.

children
factor. Are there children in the marriage?

religiousness
numeric variable coding religiousness: 1 = anti, 2 = not at all, 3 = slightly, 4 = somewhat, 5 = very.

education
numeric variable coding level of education: 9 = grade school, 12 = high school graduate, 14 = some college, 16 = college graduate, 17 = some graduate work, 18 = master's degree, 20 = Ph.D., M.D., or other advanced degree.

occupation
numeric variable coding occupation according to Hollingshead classification (reverse numbering).

rating
numeric variable coding self rating of marriage: 1 = very unhappy, 2 = somewhat unhappy, 3 = average, 4 = happier than average, 5 = very happy.

# install.packages("AER")
data(Affairs,package="AER")

summary(Affairs)
## affairs gender age yearsmarried children
## Min. : 0.000 female:315 Min. :17.50 Min. : 0.125 no :171
## 1st Qu.: 0.000 male :286 1st Qu.:27.00 1st Qu.: 4.000 yes:430
## Median : 0.000 Median :32.00 Median : 7.000
## Mean : 1.456 Mean :32.49 Mean : 8.178
## 3rd Qu.: 0.000 3rd Qu.:37.00 3rd Qu.:15.000
## Max. :12.000 Max. :57.00 Max. :15.000
## religiousness education occupation rating
## Min. :1.000 Min. : 9.00 Min. :1.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:14.00 1st Qu.:3.000 1st Qu.:3.000
## Median :3.000 Median :16.00 Median :5.000 Median :4.000
## Mean :3.116 Mean :16.17 Mean :4.195 Mean :3.932
## 3rd Qu.:4.000 3rd Qu.:18.00 3rd Qu.:6.000 3rd Qu.:5.000
## Max. :5.000 Max. :20.00 Max. :7.000 Max. :5.000
1. Data pre-processing (e.g. removal of null values, conversion of factor features to numeric values, and a split into training and test sets with a ratio of 8:2).

2. Build a logistic regression model on the training data. Determine which features to use based on an analysis of their p-values.

3. Evaluate the trained model on the test set.
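Below is a minimal sketch of the three steps, written under several assumptions:

- The outcome is recoded as binary (any affair = 1, none = 0).
- Features are kept when p < 0.05 in the full model. This is a simple one-pass rule, not the only valid selection method.
- Test-set predictions use a 0.5 probability threshold.
- The seed is arbitrary.

`preprocess_affairs()` and `split_train_test()` are hypothetical helpers written for this example. The modelling part only runs when the AER package is installed.

```r
# Logistic regression sketch for Question 4 (assumptions noted inline).

# 1. Pre-processing: drop incomplete rows, recode the outcome as binary
#    (any affair = 1, none = 0) and turn the two factors into 0/1 columns.
preprocess_affairs <- function(df) {
  df <- na.omit(df)
  df$affair <- as.integer(df$affairs > 0)
  df$male   <- as.integer(df$gender == "male")
  df$kids   <- as.integer(df$children == "yes")
  df[, c("affair", "male", "age", "yearsmarried", "kids",
         "religiousness", "education", "occupation", "rating")]
}

# Random 8:2 train/test split (the seed is an arbitrary choice)
split_train_test <- function(df, train_frac = 0.8, seed = 5197) {
  set.seed(seed)
  idx <- sample(seq_len(nrow(df)), size = round(train_frac * nrow(df)))
  list(train = df[idx, , drop = FALSE], test = df[-idx, , drop = FALSE])
}

if (requireNamespace("AER", quietly = TRUE)) {
  data(Affairs, package = "AER")
  parts <- split_train_test(preprocess_affairs(Affairs))

  # 2. Fit the full model, then keep predictors with p < 0.05
  #    (a simple one-pass rule; backward elimination is an alternative).
  full <- glm(affair ~ ., data = parts$train, family = binomial)
  print(summary(full))
  pvals <- summary(full)$coefficients[, "Pr(>|z|)"]
  keep <- setdiff(names(pvals)[pvals < 0.05], "(Intercept)")
  if (length(keep) == 0) keep <- "1"  # fall back to an intercept-only model
  reduced <- glm(reformulate(keep, response = "affair"),
                 data = parts$train, family = binomial)
  print(summary(reduced))

  # 3. Evaluate on the held-out test set with a 0.5 probability threshold
  prob <- predict(reduced, newdata = parts$test, type = "response")
  pred <- as.integer(prob > 0.5)
  print(table(actual    = factor(parts$test$affair, levels = 0:1),
              predicted = factor(pred, levels = 0:1)))
  cat(sprintf("Test accuracy: %.3f\n", mean(pred == parts$test$affair)))
}
```

Accuracy alone can flatter a model when most respondents report no affair. The confusion matrix shows how many actual affairs the model misses.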

Attachment:- Modelling for data analysis.zip

Reviews

len2490721

4/11/2020 2:17:21 AM

Read the attached A2.html file for the assignment requirements. Using RStudio (I'm using Windows), enter the answers to the assignment questions in the A2.Rmd file and ensure the R code runs completely without error, so that it generates the answers in HTML format (the "Knit to HTML" option). For Question 4, it is acceptable to predict whether there is an affair or not (predicting the number of affairs is not required). Please ensure the code is well documented and the answers are clear.

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd