Outliers - Reasons for Screening Data

Outliers arise from data entry errors, from subjects who are not members of the population that the sample is meant to represent, or from subjects who are genuinely different from the rest of the sample. Statistical tests are quite sensitive to outliers, so this problem should be addressed before analysis.

Univariate outliers are easy to detect using z-scores, box plots, histograms, etc. Cases with standard scores beyond ±3 are treated as outliers (consider a cutoff of ±4 if n > 100, or ±2.5 if n < 10).
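A minimal Python sketch of the z-score rule, assuming the cutoffs quoted above (3 by default, 4 when n > 100, 2.5 when n < 10) and the sample standard deviation. The function name `z_score_outliers` and the example data are hypothetical.

```python
import statistics


def z_score_outliers(values):
    """Return indices of values whose |z| exceeds a sample-size-dependent cutoff.

    Cutoffs follow the rule of thumb in the text: 3 by default,
    4 when n > 100, and 2.5 when n < 10 (assumed, not a fixed standard).
    """
    n = len(values)
    if n > 100:
        cutoff = 4.0
    elif n < 10:
        cutoff = 2.5
    else:
        cutoff = 3.0
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return [i for i, v in enumerate(values) if abs((v - mean) / sd) > cutoff]


# Hypothetical scores with one suspicious entry at the end
data = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10, 11, 9, 10, 12, 8, 10, 11, 9, 10, 50]
print(z_score_outliers(data))  # → [19]
```

With the sample standard deviation, a single score's |z| can never exceed (n − 1)/√n, so in very small samples (n ≤ 7) even the 2.5 cutoff cannot be reached.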

Multivariate outliers are harder to detect. Mahalanobis distance is one powerful technique for this case (discussed later). The squared distance is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis, and a case whose chi-square value is significant at p < 0.001 is classed as an outlier.
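A sketch of the Mahalanobis check, assuming NumPy and SciPy are available and that each row of `X` is a case and each column a variable; `mahalanobis_outliers` is a hypothetical helper. Each row's squared distance from the centroid, computed with the inverse sample covariance matrix, is compared with the chi-square critical value at p = 0.001 with df equal to the number of variables.

```python
import numpy as np
from scipy.stats import chi2


def mahalanobis_outliers(X, alpha=0.001):
    """Flag rows whose squared Mahalanobis distance exceeds the chi-square
    critical value at level alpha, with df = number of variables (columns)."""
    X = np.asarray(X, dtype=float)
    n_vars = X.shape[1]
    centered = X - X.mean(axis=0)
    # Inverse of the sample covariance matrix (variables in columns)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    # Squared distance for each row i: d_i^T S^{-1} d_i
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    critical = chi2.ppf(1 - alpha, df=n_vars)
    return np.flatnonzero(d2 > critical), d2


# Hypothetical data: 100 ordinary cases on 2 variables plus one unusual case
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(100, 2)), [8.0, -8.0]])
flagged, d2 = mahalanobis_outliers(X)
print(flagged)  # the appended case (index 100) should be among the flagged rows
```

Because the mean and covariance are estimated from the same data, a very extreme case inflates them and can partly mask other outliers.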

In most cases it is acceptable to drop the outlying case from the sample. If the researcher decides to keep the values in the analysis, steps can instead be taken to reduce their relative influence.
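The text does not name a specific way to reduce an outlier's influence. One common option, shown here as an assumption rather than the author's method, is winsorizing: clamping values to a chosen lower and upper quantile so extreme cases stay in the sample but pull less on the results. The helper `winsorize` and its default 5th/95th percentile bounds are hypothetical choices.

```python
def winsorize(values, lower_q=0.05, upper_q=0.95):
    """Clamp values to approximate lower/upper quantiles.

    Quantiles are approximated by the order statistic at index
    floor(q * (n - 1)) of the sorted data (one simple definition among several).
    """
    ordered = sorted(values)
    n = len(ordered)
    low = ordered[int(lower_q * (n - 1))]
    high = ordered[int(upper_q * (n - 1))]
    # Values below `low` are raised to it; values above `high` are lowered to it
    return [min(max(v, low), high) for v in values]


# Hypothetical scores: the extreme 50 is pulled in to the upper bound
data = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10, 11, 9, 10, 12, 8, 10, 11, 9, 10, 50]
print(winsorize(data)[-1])  # → 12
```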

Posted Date: 3/4/2013 6:22:24 AM | Location : United States