Outliers - reasons for screening data, Advanced Statistics

Outliers arise for three main reasons: a data entry error, a subject who is not a member of the population the sample is meant to represent, or a subject who genuinely differs from the rest of the sample. Many statistical tests are quite sensitive to outliers, so the problem should be addressed before analysis.

Univariate outliers are easy to detect with z-scores, box plots, histograms, and similar tools. As a rule of thumb, standard scores beyond +/-3 are treated as outliers (consider a cutoff of 4 if n > 100, or 2.5 if n < 10).
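The z-score rule above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names `suggested_threshold` and `zscore_outliers` and the sample scores are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which form to use.

```python
from statistics import mean, stdev


def suggested_threshold(n):
    """Cutoff from the rule of thumb: 4 if n > 100, 2.5 if n < 10, else 3."""
    if n > 100:
        return 4.0
    if n < 10:
        return 2.5
    return 3.0


def zscore_outliers(values, threshold=None):
    """Return (index, value, z) for every value whose |z| exceeds the cutoff."""
    if threshold is None:
        threshold = suggested_threshold(len(values))
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1)
    if s == 0:
        return []  # all values identical, nothing can be flagged
    flagged = []
    for i, v in enumerate(values):
        z = (v - m) / s
        if abs(z) > threshold:
            flagged.append((i, v, z))
    return flagged


# Hypothetical scores: nineteen values near 50 and one suspicious entry.
scores = [48, 52, 50, 49, 51, 47, 53, 50, 49, 51,
          50, 48, 52, 51, 49, 50, 47, 53, 50, 95]

for index, value, z in zscore_outliers(scores):
    print(f"case {index}: value={value}, z={z:.2f}")
```

With these hypothetical scores only the last case (95) is flagged. Note that when the sample standard deviation is used, no |z| can exceed (n - 1)/sqrt(n), so in very small samples a z-score cutoff may never trigger and a box plot may be more informative.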

Multivariate outliers are harder to detect. Mahalanobis distance is one powerful technique to use in this case (discussed later). The squared distance is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant at the p < 0.001 level is treated as an outlier.
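As a rough sketch of the Mahalanobis screening described above, the following Python example handles the two-variable case only, so that it runs with the standard library. The restriction to two variables, the function names, and the data are all assumptions made for illustration. With two variables the chi-square test has 2 degrees of freedom, for which the upper-tail probability has the closed form exp(-x/2); an analysis with more variables would need a linear algebra library for the covariance inverse and a chi-square distribution function.

```python
import math
from statistics import mean


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) case from the centroid."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    # Sample covariance matrix entries (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # d^2 = [dx dy] S^-1 [dx dy]^T, with the 2x2 inverse written out.
    return [
        (syy * (x - mx) ** 2
         - 2 * sxy * (x - mx) * (y - my)
         + sxx * (y - my) ** 2) / det
        for x, y in points
    ]


def chi2_upper_tail_2df(x):
    """P(chi-square with 2 degrees of freedom > x), which equals exp(-x / 2)."""
    return math.exp(-x / 2)


def multivariate_outliers_2d(points, alpha=0.001):
    """Indices of cases whose squared distance is significant at alpha."""
    distances = mahalanobis_sq_2d(points)
    return [i for i, d2 in enumerate(distances)
            if chi2_upper_tail_2df(d2) < alpha]


# Hypothetical data: 30 cases where y tracks x closely, plus one case
# (25, 5) whose x and y are each unremarkable but whose combination is not.
cases = [(x, x + (1 if x % 2 else -1)) for x in range(1, 31)]
cases.append((25, 5))

print(multivariate_outliers_2d(cases))
```

In this made-up data set the final case has ordinary values on each variable taken separately, which is why univariate screening would miss it; only its combination of values is unusual, and it is the one case the chi-square criterion flags.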

In most cases, it is acceptable to drop the outlying value from the sample. If the researcher instead decides to keep the value in the analysis, steps can be taken to reduce its relative influence.
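The text does not say which steps to take when an outlier is kept. One common option, shown here purely as an assumption and not as the method this tutorial has in mind, is winsorizing: the most extreme values are pulled in to the nearest value that is retained. The function name `winsorize` and the parameter `k` are hypothetical.

```python
def winsorize(values, k=1):
    """Replace the k smallest and k largest values with their nearest
    retained neighbours, keeping the sample size unchanged."""
    n = len(values)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < n / 2")
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - k - 1]
    # Clamp every value into the interval [low, high].
    return [min(max(v, low), high) for v in values]


# Hypothetical scores with one extreme value.
scores = [48, 52, 50, 49, 51, 95]
print(winsorize(scores, k=1))
```

Here the extreme score 95 is pulled in to 52, the next largest value, and the smallest score 48 is raised to 49, so every case stays in the analysis but no single case dominates the mean or variance.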

Posted Date: 3/4/2013 6:22:24 AM | Location : United States






