BHV Estimation Project Proposal

Reference no: EM131068782

Member 4 recently dropped, so we need one more member.

B. Which predictors are important for estimating the Boston housing value, and how do these predictors affect it? We want to characterize the relationship between the predictor variables and the response variable (home value).

C. Description of the dataset:

There are 506 observations of 14 variables in this dataset. These 14 variables contain 13 continuous attributes and 1 binary-valued attribute.

1) CRIM refers to per capita crime rate by town
2) ZN refers to the proportion of residential land zoned for lots over 25,000 square feet
3) INDUS refers to the proportion of non-retail business acres per town
4) CHAS is a qualitative variable: the Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
5) NOX refers to the nitric oxides concentration (parts per 10 million)
6) RM refers to average number of rooms per dwelling
7) AGE refers to the proportion of owner-occupied units built prior to 1940
8) DIS refers to weighted distances to five Boston employment centres
9) RAD refers to the index of accessibility to radial highways
10) TAX refers to the full-value property-tax rate per $10,000
11) PTRATIO refers to pupil-teacher ratio by town
12) B refers to 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13) LSTAT refers to % lower status of the population
14) MEDV refers to Median value of owner-occupied homes in $1000's

D. The techniques we think would be useful are cross-validation, PCA (principal component analysis) and LDA (linear discriminant analysis). For the cross-validation, we first need to split the data into a training set and a test set in suitable proportions. We call set.seed() before the random split so that the same training and test sets can be reproduced, and then use the K-fold cross-validation strategy to estimate the test error (the misclassification error when the response is treated as categorical). We can then compare the test errors at each value of K, the number of folds, and find the minimal test error and the corresponding best K.

For the PCA, we need to draw histograms of the variables to check their skewness and normality. If the data are roughly normal, we scale them using the mean and standard deviation. If the data are skewed, we scale them using the median and the median absolute deviation.
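The cross-validation step in D can be sketched as follows. This is only an illustration under stated assumptions: it is written in Python rather than R (so a seeded `random.Random` plays the role of `set.seed()`), the data are synthetic stand-ins rather than the Boston dataset, the model is a one-predictor least-squares line, and the error is mean squared error because the toy response is continuous. The names `k_fold_indices` and `cv_error` are hypothetical helpers, not part of any library.

```python
import random

def k_fold_indices(n, k, seed):
    """Shuffle 0..n-1 reproducibly and deal the indices into k folds."""
    rng = random.Random(seed)  # fixed seed, analogous to set.seed() in R
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_error(x, y, k, seed=1):
    """Mean squared test error of a least-squares line, averaged over k folds."""
    folds = k_fold_indices(len(x), k, seed)
    errors = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(x)) if i not in held_out]
        # Fit y = a + b * x using the training folds only.
        mx = sum(x[i] for i in train) / len(train)
        my = sum(y[i] for i in train) / len(train)
        sxx = sum((x[i] - mx) ** 2 for i in train)
        sxy = sum((x[i] - mx) * (y[i] - my) for i in train)
        b = sxy / sxx
        a = my - b * mx
        # Score the fitted line on the held-out fold.
        mse = sum((y[i] - (a + b * x[i])) ** 2 for i in test) / len(test)
        errors.append(mse)
    return sum(errors) / k

# Synthetic stand-in data (invented numbers, NOT the Boston dataset).
rng = random.Random(0)
rooms = [rng.uniform(4, 9) for _ in range(100)]
value = [9 * r - 34 + rng.gauss(0, 3) for r in rooms]

for k in (5, 10):
    print(k, round(cv_error(rooms, value, k), 2))
```

Every observation is held out exactly once, so each fold's error is computed on data the fitted line never saw. For a classification model the squared-error line would be replaced by a misclassification rate.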

We can then look at the loadings of each principal component (PC) to see which predictor variables contribute most to it and to decide which PCs to keep. For further investigation, we can use a biplot to visualize the PCs and determine which ones work best. We can then apply LDA (linear discriminant analysis) to the data. Because LDA needs a categorical response, this step would require grouping the home values into classes. We can then compare the LDA and PCA results to find the most useful predictor variables. These are the techniques we have learned in class so far. There may be more useful techniques to apply as the course progresses, so our choice of techniques for this data might change in the future.
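The scaling rule and the loadings mentioned in the PCA plan could look roughly like the sketch below. It is an assumption-laden illustration in plain Python: the two columns are made-up toy values, `standardize`, `robust_scale` and `first_pc_loadings` are hypothetical helper names, and the first PC is found by power iteration on the covariance matrix instead of a statistics package's PCA routine.

```python
import math
import statistics

def standardize(col):
    """Scale with mean and standard deviation (for roughly normal columns)."""
    m, s = statistics.mean(col), statistics.stdev(col)
    return [(v - m) / s for v in col]

def robust_scale(col):
    """Scale with median and median absolute deviation (for skewed columns).
    Assumes the MAD is non-zero."""
    med = statistics.median(col)
    mad = statistics.median([abs(v - med) for v in col])
    return [(v - med) / mad for v in col]

def first_pc_loadings(columns, iterations=200):
    """Loadings of the first PC via power iteration on the covariance matrix."""
    p, n = len(columns), len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    cov = [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered]
           for ci in centered]
    # Asymmetric start vector; assumed not orthogonal to the first PC.
    w = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w

# Toy columns (invented values, NOT taken from the Boston dataset).
rooms = [5.9, 6.1, 6.4, 6.6, 7.0, 7.2]      # roughly symmetric
crime = [0.02, 0.05, 0.09, 0.3, 4.1, 25.0]  # strongly right-skewed

scaled = [standardize(rooms), robust_scale(crime)]
print(first_pc_loadings(scaled))
```

The returned vector has unit length, and the magnitude of each entry indicates how strongly the corresponding variable contributes to the first PC. Its overall sign is arbitrary.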

E. How many PCs should we select or use?
What should we do if there is more than one indicator variable?
How should we treat the outliers?
Do we need to use a Box-Cox transformation for the data?
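For reference on the last question, the Box-Cox transformation maps a strictly positive value y to (y^lambda - 1) / lambda, or to log(y) when lambda = 0. The minimal sketch below only shows the transform itself. The helper name `box_cox` and the sample values are assumptions for illustration, and it does not settle whether the transformation is needed for this dataset or which lambda to use.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

# Invented right-skewed sample (NOT from the Boston dataset).
sample = [1.0, 2.0, 4.0, 8.0, 64.0]
print(box_cox(sample, 0))    # log transform
print(box_cox(sample, 0.5))  # square-root-type transform
```

Smaller values of lambda pull in a long right tail more strongly, which is why the transformation is usually considered for skewed, positive variables.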
