Outliers - reasons for screening data, Advanced Statistics


Outliers arise for three main reasons: a data entry error, a subject who is not a member of the population the sample is meant to represent, or a subject who genuinely differs from the rest of the sample. Many statistical tests are quite sensitive to outliers, so they should be identified and dealt with before the analysis is run.

Univariate outliers are easy to detect with z-scores, box plots, histograms, and similar tools. Cases with standard scores beyond +/-3 are treated as outliers (consider a cutoff of 4 if n > 100, or 2.5 if n < 10).
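As a rough illustration of the z-score rule above, the following Python sketch flags cases by standard score. The function names, the made-up scores, and the use of the sample (n - 1) standard deviation are assumptions chosen for the example, not part of the original discussion.

```python
import statistics

def suggested_cutoff(n):
    """Rule of thumb from the text: 4 if n > 100, 2.5 if n < 10, otherwise 3."""
    if n > 100:
        return 4.0
    if n < 10:
        return 2.5
    return 3.0

def zscore_outliers(values, cutoff=3.0):
    """Return (value, z) pairs whose absolute z-score exceeds the cutoff."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    if sd == 0:
        return []  # all values identical, so nothing can be flagged
    return [(x, (x - mean) / sd) for x in values if abs(x - mean) / sd > cutoff]

# Hypothetical scores: 19 ordinary values and one suspicious entry.
scores = [52, 48, 50, 51, 49, 47, 53, 50, 52, 48,
          51, 49, 50, 46, 54, 50, 51, 49, 52, 120]

flagged = zscore_outliers(scores, suggested_cutoff(len(scores)))
for value, z in flagged:
    print(f"value={value} z={z:.2f}")
```

With these 20 scores the cutoff is 3, and only the value 120 is flagged. Note that a single case can never have an absolute z-score above (n - 1)/sqrt(n) when the sample standard deviation is used, which is about 2.85 for n = 10, so a cutoff of 3 cannot flag anything in very small samples.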

Multivariate outliers are harder to detect. Mahalanobis distance is one powerful technique to use in this case (discussed later). The squared distance for each case is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant at the p < 0.001 level is treated as an outlier.
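The Mahalanobis screening rule can be sketched for the simplest case of two variables. This is a minimal sketch under stated assumptions: it is restricted to two variables so that the 2x2 covariance inverse and the chi-square tail probability (which equals exp(-x/2) for 2 degrees of freedom) can be written out with the standard library alone. The function names and the data are hypothetical; with more variables, a numerical library would be needed for the matrix inverse and the chi-square distribution.

```python
import math

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) case from the sample centroid."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix entries (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d' S^-1 d with the 2x2 inverse written out explicitly.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

def chi2_upper_tail_2df(x):
    """P(chi-square with 2 df > x); this closed form holds only for 2 df."""
    return math.exp(-x / 2)

# Made-up data: 24 cases where y tracks x closely, plus one case (4, 22)
# that breaks the pattern although neither coordinate is out of range.
points = [(i, i + (1 if i % 2 else -1)) for i in range(1, 25)] + [(4, 22)]

d2 = mahalanobis_sq_2d(points)
flagged = [p for p, d in zip(points, d2) if chi2_upper_tail_2df(d) < 0.001]
print(flagged)
```

In this made-up data the case (4, 22) is the only one flagged, even though 4 and 22 each fall inside the observed range of their own variable, which is why a univariate check alone would miss it.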

In most cases it is acceptable to drop the outlying case from the sample. If the researcher decides to keep the case in the analysis, steps can be taken to reduce its relative influence.
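The text does not say which steps to take. One common option, shown here purely as an assumption, is to pull each flagged score in to the most extreme value that is not flagged (a form of winsorizing). The function name, the z-score cutoff of 3, and the scores are hypothetical.

```python
import statistics

def pull_in_outliers(values, cutoff=3.0):
    """Replace scores with |z| > cutoff by the nearest non-outlying score."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    if sd == 0:
        return list(values)  # all values identical, nothing to adjust
    inliers = [x for x in values if abs(x - mean) / sd <= cutoff]
    low, high = min(inliers), max(inliers)
    # Clamp every score into the range spanned by the non-outlying scores.
    return [min(max(x, low), high) for x in values]

# Hypothetical scores: 19 ordinary values and one suspicious entry.
scores = [52, 48, 50, 51, 49, 47, 53, 50, 52, 48,
          51, 49, 50, 46, 54, 50, 51, 49, 52, 120]

adjusted = pull_in_outliers(scores)
print(statistics.fmean(scores), statistics.fmean(adjusted))
```

Here the score 120 is replaced by 54, the largest score that was not flagged, so the case stays in the sample but no longer dominates the mean.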

Posted Date: 3/4/2013 6:22:24 AM | Location : United States






