Outliers - reasons for screening data, Advanced Statistics

Outliers - Reasons for Screening Data

Outliers are due to data entry errors, subject is not a member of the population that the sample is trying to represent, or the subject is really different. Statistical tests are quite sensitive to outliers so this problem should be addressed.

Univariate outliers are easy to detect (z-scores, box plots, histograms, etc.) standard scores larger than +/-3 are outliers (consider 4 is n>100 or 2.5 if n<10)

Multivariate outliers are difficult to detect. Mahalanobis distance is one powerful technique to use in this case (discussed later). This is evaluated as a chi-square statistic with degrees of freedom equal to number of variables in the analysis. A chi-sqaure statistic value that is significant beyond p<0.001 level determines outliers.

In most cases, it is ok to drop the value from the sample. One can also take steps to reduce the relative influence of outliers if the researcher decides to include the values in the analysis.

Posted Date: 3/4/2013 6:22:24 AM | Location : United States







Related Discussions:- Outliers - reasons for screening data, Assignment Help, Ask Question on Outliers - reasons for screening data, Get Answer, Expert's Help, Outliers - reasons for screening data Discussions

Write discussion on Outliers - reasons for screening data
Your posts are moderated
Related Questions
The procedure in which initially the sample of subjects is selected for generating the auxillary information only, and then the second sample is selected in which the variable of i

The interplay of the genes and environment on, for instance, the risk of disease. The term represents the step away from the argument as to whether the nature or nurture is the pre

Median absolute deviation (MAD) : It is the very robust estimator of the scale given by the following equation   or, in other words we can say that, the median of the absolute

Cartogram : It is the diagram in which descriptive statistical information is displayed on the geographical map by the means of shading, different symbols or in some other possibly

In the time series plot and scatter graphs there were many outliers that were clearly visible. These have been removed to identify if they were influential or had high leverage and

Duck Lovers Unlimited (DLU) Inc. assembles specially configured light jet aircrafts for airborne duck hunting. The quarterly demand forecasts for the upcoming fiscal year are:

1) Let N1(t) and N2(t) be independent Poisson processes with rates, ?1 and ?2, respectively. Let N (t) = N1(t) + N2(t). a) What is the distribution of the time till the next epoch

In the network shown below, the rst of the two numbers on each arc indicates the arc capacity and the second (in parentheses) of the two numbers indicates the current  flow. Use t

Basic reproduction number : A term used in the theory of infectious diseases for the number of secondary cases which one case would generate in a completely susceptible population.

Particlefilters is a simulation method for tracking moving target distributions and for reducing computational burden of the dynamic Bayesian analysis. The method uses a Markov ch