Outliers - reasons for screening data, Advanced Statistics

Outliers - Reasons for Screening Data

Outliers arise from data entry errors, from a subject who is not a member of the population that the sample is meant to represent, or from a subject who is genuinely different from the rest. Statistical tests are quite sensitive to outliers, so this problem should be addressed before the analysis.

Univariate outliers are easy to detect with z-scores, box plots, histograms, etc. Cases with standard scores beyond +/-3 are treated as outliers (consider a cutoff of 4 if n>100, or 2.5 if n<10).
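The z-score rule can be sketched in a few lines of Python. This is only an illustration: the function name z_score_outliers, the cutoff parameter and the scores data are hypothetical, and the sample standard deviation (n - 1 in the denominator) is assumed.

```python
import statistics

def z_score_outliers(values, cutoff=3.0):
    """Return (index, value, z) for cases whose |z| exceeds the cutoff."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    flagged = []
    for i, x in enumerate(values):
        z = (x - mean) / sd
        if abs(z) > cutoff:
            flagged.append((i, x, z))
    return flagged

# Hypothetical data: 19 typical scores and one suspicious entry (95).
scores = [50, 52, 49, 51, 50, 48, 53, 50, 49, 51,
          52, 50, 47, 51, 50, 49, 52, 50, 51, 95]
outliers = z_score_outliers(scores)
for index, value, z in outliers:
    print(f"case {index}: value={value}, z={z:.2f}")
```

The cutoff is left as a parameter so that it can be raised to 4 for large samples or lowered to 2.5 for very small ones.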

Multivariate outliers are more difficult to detect. Mahalanobis distance is one powerful technique to use in this case (discussed later). The squared distance is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant at the p<0.001 level is treated as an outlier.
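A minimal sketch of the Mahalanobis check, restricted to two variables so that it needs only the standard library: with 2 degrees of freedom the chi-square p-value has the closed form exp(-d2/2). The function name mahalanobis_outliers_2d and the data are hypothetical; with more variables one would normally use a matrix library and a chi-square routine instead.

```python
import math
import statistics

def mahalanobis_outliers_2d(points, alpha=0.001):
    """Flag (x, y) cases whose squared Mahalanobis distance is significant.

    Assumes exactly two variables, so the chi-square p-value (df = 2)
    is exp(-d2 / 2). Returns (index, d2, p_value) for flagged cases.
    """
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    # Sample variances and covariance (n - 1 in the denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    flagged = []
    for i, (x, y) in enumerate(points):
        dx, dy = x - mx, y - my
        # Squared distance via the closed-form inverse of a 2x2 matrix.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        p_value = math.exp(-d2 / 2)
        if p_value < alpha:
            flagged.append((i, d2, p_value))
    return flagged

# Hypothetical data: 22 cases close to the line y = x, plus one case
# (-4, 4) that is unremarkable on each variable but far off the pattern.
points = [(i, i + s) for i in range(-5, 6) for s in (1, -1)] + [(-4, 4)]
result = mahalanobis_outliers_2d(points)
for index, d2, p in result:
    print(f"case {index}: d2={d2:.2f}, p={p:.5f}")
```

In this made-up data the last case lies inside the observed range of both variables, so a univariate z-score check would not flag it. Only the joint distance exposes it.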

In most cases, it is acceptable to drop the outlying value from the sample. Alternatively, if the researcher decides to keep the values in the analysis, steps can be taken to reduce the relative influence of the outliers.
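The text does not name a specific way of reducing an outlier's influence. One possible approach, shown here purely as an assumption, is to pull extreme scores back to the cutoff boundary (mean +/- 3 standard deviations) instead of deleting them. The function name clamp_to_cutoff and the data are hypothetical.

```python
import statistics

def clamp_to_cutoff(values, cutoff=3.0):
    """Pull values beyond mean +/- cutoff * sd back to that boundary."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    low, high = mean - cutoff * sd, mean + cutoff * sd
    return [min(max(x, low), high) for x in values]

# Hypothetical data: 19 typical scores and one extreme entry (95).
scores = [50, 52, 49, 51, 50, 48, 53, 50, 49, 51,
          52, 50, 47, 51, 50, 49, 52, 50, 51, 95]
adjusted = clamp_to_cutoff(scores)
print(adjusted[-1])  # the extreme score, moved to the upper boundary
```

The case stays in the sample, but its distance from the rest of the scores shrinks, so it pulls less on the mean and variance.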



All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd