Outliers - reasons for screening data, Advanced Statistics

Outliers - Reasons for Screening Data

Outliers arise for three main reasons: a data entry error, a subject who is not a member of the population the sample is meant to represent, or a subject who genuinely differs from the rest of the sample. Statistical tests are quite sensitive to outliers, so the problem should be addressed before analysis.

Univariate outliers are easy to detect (z-scores, box plots, histograms, etc.). Cases with standard scores beyond +/-3 are treated as outliers (consider a cutoff of 4 if n>100, or 2.5 if n<10).
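The z-score rule above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name zscore_outliers, the default cutoff of 3, and the sample scores are all made up for the example, and the sample standard deviation (n - 1 denominator) is an assumption.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag each value whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1), an assumption
    return [abs((v - mean) / sd) > threshold for v in values]

# Hypothetical data: 29 ordinary scores between 50 and 54, plus one extreme value.
scores = [50.0 + (i % 5) for i in range(29)] + [120.0]
flags = zscore_outliers(scores)

print([v for v, f in zip(scores, flags) if f])  # prints [120.0]
```

Passing threshold=4.0 or threshold=2.5 gives the adjusted cutoffs mentioned for large and small samples.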

Multivariate outliers are more difficult to detect. Mahalanobis distance is one powerful technique to use in this case (discussed later). The squared distance is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant at the p<0.001 level is treated as an outlier.
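As a rough sketch of the Mahalanobis screening described above, the following Python example computes the squared distance of each case from the sample centroid and compares it with the chi-square critical value at p = 0.001. It assumes NumPy is available. The function name, the small lookup table of critical values (standard table values for 1 to 5 variables), and the data are illustrative assumptions, not part of the original note.

```python
import numpy as np

# Chi-square critical values at p = 0.001, keyed by degrees of freedom.
# Standard table values; only 1 to 5 variables are covered in this sketch.
CHI2_CRIT_001 = {1: 10.828, 2: 13.816, 3: 16.266, 4: 18.467, 5: 20.515}

def mahalanobis_outliers(data):
    """Return (squared Mahalanobis distances, outlier mask) for the rows of data."""
    X = np.asarray(data, dtype=float)
    diff = X - X.mean(axis=0)                 # deviations from the centroid
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    # Row-wise quadratic form: diff_i' * cov_inv * diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    crit = CHI2_CRIT_001[X.shape[1]]          # df = number of variables
    return d2, d2 > crit

# Hypothetical data: y tracks x closely, except for the last case (14, 24),
# which is within the observed range of each variable but breaks the x-y pattern.
x = np.arange(29, dtype=float)
y = x + (np.arange(29) % 3 - 1) * 0.5
data = np.vstack([np.column_stack([x, y]), [14.0, 24.0]])

d2, mask = mahalanobis_outliers(data)
print(np.flatnonzero(mask))  # indices of cases flagged as multivariate outliers
```

In this made-up data set the last case would pass a univariate z-score check on either variable, which is the situation where a multivariate measure earns its keep.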

In most cases, it is acceptable to drop the outlying value from the sample. If the researcher decides to keep such values in the analysis, steps can be taken to reduce their relative influence.
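The note does not say which steps to take when outliers are kept. One possibility, shown here purely as an assumption, is to pull extreme values back to a fixed number of standard deviations from the mean so that they stay in the sample but carry less weight. The function name clip_to_z and the data are hypothetical.

```python
import statistics

def clip_to_z(values, threshold=3.0):
    """Pull values beyond mean +/- threshold * sd back to that boundary."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1), an assumption
    low, high = mean - threshold * sd, mean + threshold * sd
    return [min(max(v, low), high) for v in values]

# Hypothetical data: 29 ordinary scores plus one extreme value.
scores = [50.0 + (i % 5) for i in range(29)] + [120.0]
clipped = clip_to_z(scores)

print(scores[-1], "->", round(clipped[-1], 1))
```

The extreme case remains the largest value after clipping, but it sits much closer to the rest of the data; the other cases are unchanged.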


