Outliers - reasons for screening data, Advanced Statistics

Outliers - Reasons for Screening Data

Outliers arise for several reasons: a data entry error, a subject who is not a member of the population the sample is meant to represent, or a subject who genuinely differs from the rest. Many statistical tests are quite sensitive to outliers, so the problem should be addressed before the analysis.

Univariate outliers are easy to detect with z-scores, box plots, histograms, and similar tools. As a rule of thumb, standard scores beyond +/-3 are treated as outliers (consider a cutoff of 4 if n>100, or 2.5 if n<10).
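The z-score rule above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a definitive implementation. The function names `suggested_threshold` and `zscore_outliers` are hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which one to use.

```python
from statistics import mean, stdev


def suggested_threshold(n):
    """Cutoff for |z| following the rule of thumb in the text."""
    if n > 100:
        return 4.0
    if n < 10:
        return 2.5
    return 3.0


def zscore_outliers(values, threshold=None):
    """Return (index, value, z) for every point whose |z| exceeds the cutoff."""
    if threshold is None:
        threshold = suggested_threshold(len(values))
    m = mean(values)
    s = stdev(values)  # sample standard deviation (assumption, see lead-in)
    flagged = []
    for i, v in enumerate(values):
        z = (v - m) / s
        if abs(z) > threshold:
            flagged.append((i, v, z))
    return flagged


# Illustrative data: 19 ordinary scores and one suspicious value (50).
scores = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10,
          11, 9, 10, 12, 10, 9, 11, 10, 10, 50]
for index, value, z in zscore_outliers(scores):
    print(f"case {index}: value={value}, z={z:.2f}")
```

With the sample standard deviation, the largest z-score a single case can reach is (n - 1) / sqrt(n), which is below 3 for n of 10 or fewer. That is one reason a lower cutoff makes sense for very small samples.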

Multivariate outliers are harder to detect. Mahalanobis distance is one powerful technique for this case (discussed later). The squared distance is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant at the p<0.001 level is treated as an outlier.
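As a rough sketch of the Mahalanobis check, the Python below handles the two-variable case with the standard library only. Restricting it to two variables is a simplification made here for illustration: the 2x2 covariance matrix can be inverted by hand, and with 2 degrees of freedom the chi-square critical value has the closed form -2 ln(alpha), about 13.82 at alpha = 0.001. The function name `mahalanobis_outliers_2d` is hypothetical, and more variables would normally call for a linear algebra library.

```python
import math
from statistics import mean


def mahalanobis_outliers_2d(points, alpha=0.001):
    """Flag (x, y) cases whose squared Mahalanobis distance is significant
    at `alpha` against a chi-square distribution with 2 degrees of freedom."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)

    # Entries of the sample covariance matrix (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2

    # For 2 degrees of freedom, P(chi-square > c) = exp(-c / 2),
    # so the critical value is -2 * ln(alpha).
    cutoff = -2.0 * math.log(alpha)

    flagged = []
    for i, (x, y) in enumerate(points):
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^-1 [dx dy]^T with the 2x2 inverse written out.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 > cutoff:
            flagged.append((i, d2))
    return flagged


# Illustrative data: 40 cases lying close to the line y = x, plus one case
# (5, 35) whose x and y are each within the observed range.
cases = [(i, i + (1 if i % 2 == 0 else -1)) for i in range(40)] + [(5, 35)]
for index, d2 in mahalanobis_outliers_2d(cases):
    print(f"case {index}: squared distance = {d2:.1f}")
```

The added case is unremarkable on either variable alone, so a univariate z-score screen would miss it. It stands out only because it breaks the joint pattern of the two variables, which is what the Mahalanobis distance measures.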

In most cases, it is acceptable to drop the outlying case from the sample. If the researcher decides to keep such cases in the analysis, steps can be taken to reduce their relative influence.
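The text does not say which steps to take. One common option, assumed here purely for illustration, is winsorizing: extreme values are pulled in to a chosen percentile instead of being deleted. The sketch below uses the standard library. The function name `winsorize` and the 5th/95th percentile limits are hypothetical choices, not a recommendation from the source.

```python
from statistics import quantiles


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp values below/above the given percentiles to those percentiles."""
    # 99 cut points; cuts[k - 1] is the k-th percentile. The "inclusive"
    # method keeps every cut point inside the observed data range.
    cuts = quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


# Illustrative data: the value 50 is pulled in toward the rest of the sample.
scores = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10,
          11, 9, 10, 12, 10, 9, 11, 10, 10, 50]
adjusted = winsorize(scores)
print(max(scores), "->", max(adjusted))
```

The case stays in the sample, so n is unchanged, but it no longer dominates the mean and variance.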




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd