Outliers - reasons for screening data, Advanced Statistics

Outliers - Reasons for Screening Data

Outliers arise from data entry errors, from subjects who are not members of the population the sample is meant to represent, or from subjects who genuinely differ from the rest of the sample. Statistical tests are quite sensitive to outliers, so the problem should be addressed before analysis.

Univariate outliers are easy to detect with z-scores, box plots, histograms, and similar tools. Cases with standard scores beyond +/-3 are treated as outliers (consider a cutoff of 4 if n > 100, or 2.5 if n < 10).
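
The z-score rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `default_threshold` and `univariate_outliers` and the sample data are made up for this example, and the sample standard deviation (n - 1 in the denominator) is assumed.

```python
import statistics


def default_threshold(n):
    """Cutoff suggested in the text: 4 for n > 100, 2.5 for n < 10, otherwise 3."""
    if n > 100:
        return 4.0
    if n < 10:
        return 2.5
    return 3.0


def univariate_outliers(values, threshold=None):
    """Return the indices of values whose absolute z-score exceeds the cutoff."""
    if threshold is None:
        threshold = default_threshold(len(values))
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [i for i, x in enumerate(values) if abs((x - mean) / sd) > threshold]


# Hypothetical data: the integers 1..19 plus one extreme value.
scores = list(range(1, 20)) + [100]
print(univariate_outliers(scores))
```

Note that in very small samples a single case cannot reach a large z-score at all, because the extreme value inflates the standard deviation it is measured against; this may be why a lower cutoff is suggested for n < 10.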

Multivariate outliers are more difficult to detect. Mahalanobis distance is one powerful technique to use in this case (discussed later). The squared distance is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant at the p < 0.001 level is treated as an outlier.
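
As a rough sketch of the Mahalanobis check, the following Python example computes the squared distance of each case from the sample centroid and compares it with the chi-square critical value at p = 0.001. It assumes NumPy and SciPy are available; the function name `multivariate_outliers` and the data are hypothetical. The last case is deliberately placed off the trend of the other points while neither of its coordinates is extreme on its own, which is the situation where univariate screening tends to miss the outlier.

```python
import numpy as np
from scipy.stats import chi2


def multivariate_outliers(data, alpha=0.001):
    """Flag rows whose squared Mahalanobis distance exceeds the chi-square cutoff."""
    X = np.asarray(data, dtype=float)
    diff = X - X.mean(axis=0)                         # deviations from the centroid
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))  # inverse sample covariance
    # Squared distance for each row: diff_i' * inv_cov * diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    # Degrees of freedom equal the number of variables (columns).
    critical = chi2.ppf(1.0 - alpha, df=X.shape[1])
    return np.flatnonzero(d2 > critical), d2, critical


# Hypothetical data: 50 points close to the line y = x, plus one case off the trend.
x = np.linspace(-2.0, 2.0, 50)
y = x + 0.1 * np.sin(np.arange(50))
data = np.vstack([np.column_stack([x, y]), [1.5, -1.5]])

flagged, d2, critical = multivariate_outliers(data)
print("flagged rows:", flagged)
print("critical value:", round(float(critical), 3))
```

The covariance matrix must be invertible for this to work, so the sketch assumes more cases than variables and no perfectly collinear variables.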

In most cases, it is acceptable to drop the outlying case from the sample. If the researcher decides to keep such cases in the analysis, steps can be taken to reduce their relative influence.
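
The text does not say which steps it has in mind for reducing influence. One common option, shown here purely as an assumption, is to pull extreme values in to a fixed z-score boundary (a winsorizing-style adjustment) instead of deleting them. The function name `clamp_to_z` and the data are hypothetical.

```python
import statistics


def clamp_to_z(values, threshold=3.0):
    """Pull values lying beyond +/- threshold standard deviations back to that boundary."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    lower = mean - threshold * sd
    upper = mean + threshold * sd
    return [min(max(x, lower), upper) for x in values]


# Hypothetical data: the integers 1..19 plus one extreme value.
scores = list(range(1, 20)) + [100]
adjusted = clamp_to_z(scores)
print(adjusted[-1])  # the extreme value is pulled in; the other values are unchanged
```

The case stays in the sample, but it contributes less to the mean and variance than the raw value would.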


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd