Outliers - reasons for screening data, Advanced Statistics


Outliers - Reasons for Screening Data

Outliers arise from data entry errors, from subjects who are not members of the population the sample is meant to represent, or from subjects who genuinely differ from the rest of the sample. Statistical tests are quite sensitive to outliers, so they should be addressed.

Univariate outliers are easy to detect with z-scores, box plots, histograms, and similar tools. Cases with standard scores beyond +/-3 are treated as outliers (consider a cutoff of 4 if n>100, or 2.5 if n<10).
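The z-score rule above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function names `suggested_cutoff` and `z_score_outliers` and the sample values are made up for the example, and the sample standard deviation (n - 1 in the denominator) is assumed.

```python
import statistics

def suggested_cutoff(n):
    """Cutoff on |z| following the rule of thumb in the text (assumed reading)."""
    if n < 10:
        return 2.5
    if n > 100:
        return 4.0
    return 3.0

def z_score_outliers(values, cutoff=None):
    """Return (index, z) pairs for values whose |z| exceeds the cutoff."""
    if cutoff is None:
        cutoff = suggested_cutoff(len(values))
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; assumes the values are not all equal
    flagged = []
    for i, v in enumerate(values):
        z = (v - mean) / sd
        if abs(z) > cutoff:
            flagged.append((i, z))
    return flagged

# Hypothetical data: twenty ordinary scores and one extreme score.
scores = [49] * 10 + [51] * 10 + [70]
print(z_score_outliers(scores))  # flags index 20, z is about 4.25
```

Note that in very small samples a single case cannot reach a large z-score at all, because the extreme value inflates the standard deviation it is measured against. This is one reason a lower cutoff is suggested when n is small.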

Multivariate outliers are more difficult to detect. Mahalanobis distance is one powerful technique for this case (discussed later). It is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant at the p<0.001 level is considered an outlier.
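As a rough sketch of this screening step, the code below computes the squared Mahalanobis distance of each case from the sample centroid and compares it with the chi-square distribution at p<0.001. It uses only the Python standard library; all function names and the data are hypothetical, the sample covariance matrix (n - 1 denominator) is assumed, and the covariance matrix is assumed to be non-singular. In practice a statistics package would normally supply these pieces.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # closed form for df = 1
    while a < df / 2.0:                       # step df up by 2 each pass
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def mean_and_cov(rows):
    """Column means and sample covariance matrix of a list of cases."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    return means, cov

def solve(a, b):
    """Solve a*y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (m[i][n] - sum(m[i][j] * y[j] for j in range(i + 1, n))) / m[i][i]
    return y

def mahalanobis_outliers(rows, alpha=0.001):
    """Return (index, D^2) for cases whose chi-square p-value is below alpha."""
    means, cov = mean_and_cov(rows)
    p = len(means)
    flagged = []
    for idx, r in enumerate(rows):
        d = [r[j] - means[j] for j in range(p)]
        d2 = sum(di * yi for di, yi in zip(d, solve(cov, d)))  # d' S^-1 d
        if chi2_sf(d2, p) < alpha:
            flagged.append((idx, d2))
    return flagged

# Hypothetical data: y tracks x closely, except for the last case (0, 10).
rows = [(i, i + (1 if i % 2 == 0 else -1)) for i in range(-10, 11)] + [(0, 10)]
print(mahalanobis_outliers(rows))  # flags index 21, D^2 is about 16.41
```

In this made-up data set the flagged case is not extreme on either variable taken alone (both of its z-scores are below 3 in absolute value). It stands out only because it breaks the pattern shared by the other cases, which is what makes multivariate outliers hard to spot with univariate tools.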

In most cases, it is acceptable to drop the outlying case from the sample. If the researcher decides to keep such cases in the analysis, steps can be taken to reduce their relative influence.
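The text does not say which steps to take. One common option, shown here purely as an assumption, is to pull each flagged value in to the most extreme value that was not flagged. The function name `cap_outliers` and the data are hypothetical.

```python
import statistics

def cap_outliers(values, cutoff=3.0):
    """Replace values with |z| > cutoff by the nearest non-outlying value.

    Assumes the values are not all equal (non-zero standard deviation).
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inliers = [v for v in values if abs(v - mean) / sd <= cutoff]
    low, high = min(inliers), max(inliers)
    # Clamp every value into the range spanned by the non-outlying cases.
    return [min(max(v, low), high) for v in values]

# Hypothetical data: the last score is an outlier by the z-score rule.
scores = [49] * 10 + [51] * 10 + [70]
capped = cap_outliers(scores)
print(capped[-1])  # 51
```

Capping keeps the case in the sample, so n is unchanged, while limiting how far one score can pull the mean and inflate the variance. Dropping the case or transforming the variable are alternatives, and whichever is used should be reported with the analysis.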




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd