Outliers - reasons for screening data, Advanced Statistics

Outliers arise from data entry errors, from subjects who are not members of the population the sample is meant to represent, or from subjects who genuinely differ from the rest of the sample. Statistical tests are quite sensitive to outliers, so the problem should be addressed before the analysis is run.

Univariate outliers are easy to detect (z-scores, box plots, histograms, etc.). Cases with standard scores beyond +/-3 are treated as outliers (consider a cutoff of 4 if n>100, or 2.5 if n<10).
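The z-score rule above can be sketched in a few lines of Python. This is only an illustration: the function names (zscore_cutoff, univariate_outliers) and the sample scores are made up for the example, and the standard deviation used is the sample (n-1) version, which the text does not specify.

```python
import statistics

def zscore_cutoff(n):
    """Rule-of-thumb cutoff from the text: 3 by default, 4 if n > 100, 2.5 if n < 10."""
    if n > 100:
        return 4.0
    if n < 10:
        return 2.5
    return 3.0

def univariate_outliers(values):
    """Return the values whose absolute z-score exceeds the cutoff for this sample size."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD with n - 1 in the denominator
    cutoff = zscore_cutoff(n)
    return [x for x in values if abs((x - mean) / sd) > cutoff]

# Hypothetical scores: nineteen values near 50 and one suspicious entry of 120.
scores = [48, 52, 50, 49, 51, 47, 53, 50, 49, 51,
          50, 52, 48, 50, 51, 49, 50, 52, 48, 120]
print(univariate_outliers(scores))
```

Note that a single extreme value also inflates the mean and standard deviation it is judged against, so in very small samples a z-score can never become large; this is one reason a lower cutoff is suggested when n<10.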

Multivariate outliers are difficult to detect. Mahalanobis distance is one powerful technique to use in this case (discussed later). The squared distance of each case from the centroid is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant at the p<0.001 level is treated as an outlier.
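As a rough sketch of the Mahalanobis check, the example below handles only the two-variable case, using nothing beyond the Python standard library. With two variables the chi-square statistic has 2 degrees of freedom, and its upper-tail probability is exactly exp(-d2/2), so no statistics package is needed. The function names and the restriction to two variables are assumptions made for illustration; with more variables one would normally use a linear algebra library and a chi-square routine instead.

```python
import math

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the sample centroid."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d' S^-1 d with the inverse of the 2x2 covariance matrix written out.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

def multivariate_outliers_2d(points, alpha=0.001):
    """Flag points whose squared distance is significant at alpha (chi-square, df = 2)."""
    flagged = []
    for point, d2 in zip(points, mahalanobis_sq_2d(points)):
        p_value = math.exp(-d2 / 2)  # exact chi-square upper tail for df = 2 only
        if p_value < alpha:
            flagged.append(point)
    return flagged
```

Calling multivariate_outliers_2d on a list of (x, y) tuples returns the cases that would be flagged at p<0.001. Because the centroid and covariance matrix are computed from the same sample, a squared distance can never exceed (n-1)^2/n, so with fewer than about 16 cases no point can reach the p<0.001 threshold for two variables.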

In most cases, it is acceptable to drop the outlying case from the sample. If the researcher decides to keep such cases in the analysis, steps can be taken to reduce their relative influence.

Posted Date: 3/4/2013 6:22:24 AM | Location : United States






