Outliers - reasons for screening data, Advanced Statistics


Outliers can arise from data entry errors, from a subject who is not a member of the population the sample is meant to represent, or from a subject who is genuinely different from the rest. Statistical tests are quite sensitive to outliers, so this problem should be addressed before the analysis.

Univariate outliers are relatively easy to detect with z-scores, box plots, histograms, and similar tools. As a rule of thumb, cases with standard scores beyond ±3 are outliers (consider a cutoff of ±4 if n > 100, or ±2.5 if n < 10).
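The z-score rule of thumb can be sketched in Python as follows. The function names `z_cutoff` and `univariate_outliers` are hypothetical, the z-scores use the sample standard deviation from the standard-library `statistics` module, and the sample-size breakpoints simply mirror the ones stated in the text.

```python
from statistics import mean, stdev


def z_cutoff(n):
    """Rule-of-thumb |z| cutoff from the text: 4 if n > 100, 2.5 if n < 10, else 3."""
    if n > 100:
        return 4.0
    if n < 10:
        return 2.5
    return 3.0


def univariate_outliers(values):
    """Return (index, z) pairs whose |z| exceeds the sample-size-dependent cutoff."""
    m, s = mean(values), stdev(values)  # stdev uses the n - 1 denominator; assumes s > 0
    cutoff = z_cutoff(len(values))
    scores = ((i, (v - m) / s) for i, v in enumerate(values))
    return [(i, z) for i, z in scores if abs(z) > cutoff]


values = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10,
          10, 11, 9, 10, 12, 8, 10, 11, 9, 40]
print([i for i, _ in univariate_outliers(values)])  # [19]
```

With the sample standard deviation, the largest attainable |z| is (n - 1)/√n; for n = 10 that is about 2.85, so a ±3 cutoff could never flag anything in a sample that small, which helps explain why a lower cutoff is suggested for small samples.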

Multivariate outliers are harder to detect. Mahalanobis distance is one powerful technique for this case (discussed later). The squared distance is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis, and a case whose value is significant at the p < 0.001 level is treated as an outlier.
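A minimal pure-Python sketch of this screen, assuming complete numeric data and a non-singular covariance matrix: each case's squared Mahalanobis distance D² = (x - x̄)ᵀ S⁻¹ (x - x̄) is computed from the sample mean vector x̄ and sample covariance S, then referred to a chi-square distribution with df equal to the number of variables. All function names are hypothetical. The closed-form `chi2_sf` below is valid only for integer df; in practice `scipy.stats.chi2.sf` and NumPy would normally do this work.

```python
import math


def covariance_matrix(rows):
    """Mean vector and sample covariance matrix (n - 1 denominator) of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
            for b in range(p)]
           for a in range(p)]
    return means, cov


def solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    p = len(vector)
    aug = [list(matrix[i]) + [vector[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]  # fails if the covariance is singular
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * p
    for i in reversed(range(p)):
        x[i] = (aug[i][p] - sum(aug[i][j] * x[j] for j in range(i + 1, p))) / aug[i][i]
    return x


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    tail = math.erfc(math.sqrt(y))
    tail += sum(math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return tail


def squared_distances(rows):
    """Squared Mahalanobis distance of every row from the sample centroid."""
    means, cov = covariance_matrix(rows)
    result = []
    for r in rows:
        diff = [v - m for v, m in zip(r, means)]
        # Solve S w = diff instead of inverting S explicitly; D^2 = diff . w
        result.append(sum(d * w for d, w in zip(diff, solve(cov, diff))))
    return result


def mahalanobis_outliers(rows, alpha=0.001):
    """Return (index, D^2, p) for rows whose D^2 is significant at alpha."""
    df = len(rows[0])  # degrees of freedom = number of variables
    flagged = []
    for i, d2 in enumerate(squared_distances(rows)):
        p = chi2_sf(d2, df)
        if p < alpha:
            flagged.append((i, d2, p))
    return flagged


# Two strongly correlated variables plus one case that breaks the pattern
rows = [(i, i + (1 if i % 2 == 0 else -1)) for i in range(50)] + [(24.5, 34.5)]
print([i for i, _, _ in mahalanobis_outliers(rows)])  # [50]
```

The last case is unremarkable on each variable separately (x = 24.5 is the mean of x, and y = 34.5 is well inside the range of y), yet it is flagged because it departs from the joint pattern, which is exactly the kind of case a univariate z-score screen misses.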

In most cases, it is acceptable to drop such a case from the sample. If the researcher decides to keep the values in the analysis, steps can instead be taken to reduce their relative influence.
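One common way to reduce an outlier's influence while keeping the case (an assumption here; the text does not name a specific method) is to reassign the extreme score so that it sits just one unit beyond the most extreme non-outlying score on the same side. The hypothetical function `pull_in_outliers` below takes the indices of cases already flagged by some screen, such as a z-score rule; the `unit` parameter is likewise an illustrative assumption.

```python
def pull_in_outliers(values, flagged_indices, unit=1):
    """Move each flagged score to one unit beyond the most extreme unflagged score on its side."""
    flagged = set(flagged_indices)
    kept = [v for i, v in enumerate(values) if i not in flagged]
    high, low = max(kept), min(kept)
    adjusted = []
    for i, v in enumerate(values):
        if i in flagged and v > high:
            v = high + unit   # high outlier: pulled down to just above the next-highest score
        elif i in flagged and v < low:
            v = low - unit    # low outlier: pulled up to just below the next-lowest score
        adjusted.append(v)    # unflagged scores (and flagged ones inside the range) are unchanged
    return adjusted


print(pull_in_outliers([10, 11, 9, 12, 8, 40], [5]))  # [10, 11, 9, 12, 8, 13]
```

The case stays in the sample and keeps its rank as the most extreme score, but it no longer dominates the mean and variance. Transforming the variable is another option with a similar aim.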

Posted Date: 3/4/2013 6:22:24 AM | Location : United States
