Outliers - reasons for screening data, Advanced Statistics


Outliers - Reasons for Screening Data

Outliers arise from data entry errors, from a subject who is not a member of the population the sample is meant to represent, or from a subject who genuinely differs from the rest. Many statistical tests are quite sensitive to outliers, so the problem should be addressed before the analysis.

Univariate outliers are easy to detect (z-scores, box plots, histograms, etc.). Cases with standard scores beyond +/-3 are treated as outliers (consider 4 if n>100, or 2.5 if n<10).
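The z-score rule can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure: the function name `zscore_outliers`, the default threshold of 3, and the sample scores are all assumptions made for the example.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Return a list of booleans marking values whose |z-score| exceeds threshold."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [abs((v - m) / s) > threshold for v in values]

# Hypothetical data: 29 typical scores plus one extreme entry
scores = [50.0 + (i % 5) for i in range(29)] + [120.0]
mask = zscore_outliers(scores)
outliers = [v for v, is_out in zip(scores, mask) if is_out]
print(outliers)
```

Passing a different `threshold` (for example 4 for a large sample or 2.5 for a very small one) applies the adjusted cutoffs mentioned above.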

Multivariate outliers are difficult to detect. Mahalanobis distance is one powerful technique to use in this case (discussed later). The squared distance for each case is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant at the p<0.001 level is treated as an outlier.
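As a rough sketch of the Mahalanobis check, assuming NumPy is available: the function name `mahalanobis_outliers`, the simulated data, and the small lookup table of critical values are illustrative choices, not part of the original note. The table holds standard chi-square critical values at p = 0.001 for one to five variables.

```python
import numpy as np

# Chi-square critical values at p = 0.001 for small degrees of freedom
# (standard table values; scipy.stats.chi2.ppf(0.999, df) covers the general case).
CHI2_CRIT_001 = {1: 10.828, 2: 13.816, 3: 16.266, 4: 18.467, 5: 20.515}

def mahalanobis_outliers(data):
    """Return squared Mahalanobis distances and a mask of cases beyond the cutoff."""
    x = np.asarray(data, dtype=float)
    n_vars = x.shape[1]  # degrees of freedom = number of variables
    centered = x - x.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(x, rowvar=False))
    # Squared distance per row: d_i^2 = (x_i - mean)^T S^-1 (x_i - mean)
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return d2, d2 > CHI2_CRIT_001[n_vars]

# Hypothetical data: two correlated variables plus one case that breaks the pattern
rng = np.random.default_rng(0)
x1 = rng.normal(0, 1, 200)
x2 = 0.9 * x1 + rng.normal(0, 0.3, 200)
data = np.column_stack([x1, x2])
data = np.vstack([data, [2.0, -2.0]])  # about 2 SD on each variable, but against the correlation
d2, flagged = mahalanobis_outliers(data)
print(np.flatnonzero(flagged))
```

The added case sits only about two standard deviations from the mean on each variable, so a univariate z-score screen would likely pass it, while its combination of values is what the multivariate distance picks up.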

In most cases, it is acceptable to drop the outlying value from the sample. If the researcher decides to keep it in the analysis, steps can be taken to reduce its relative influence.

