Outliers - reasons for screening data, Advanced Statistics


Outliers - Reasons for Screening Data

Outliers arise from data entry errors, from a subject who is not a member of the population that the sample is meant to represent, or from a subject who really is different from the rest. Statistical tests are quite sensitive to outliers, so this problem should be addressed.

Univariate outliers are easy to detect (z-scores, box plots, histograms, etc.). Standard scores beyond +/-3 are treated as outliers (consider a cutoff of 4 if n>100, or 2.5 if n<10).
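The z-score rule above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `suggested_threshold` and `zscore_outliers` and the sample data are hypothetical, and the cutoffs simply encode the rule of thumb stated above.

```python
from statistics import mean, stdev


def suggested_threshold(n):
    """Rule-of-thumb |z| cutoff from the text: 4 if n > 100, 2.5 if n < 10, else 3."""
    if n > 100:
        return 4.0
    if n < 10:
        return 2.5
    return 3.0


def zscore_outliers(values, threshold=None):
    """Return (index, value, z) for every value whose |z| exceeds the threshold."""
    if threshold is None:
        threshold = suggested_threshold(len(values))
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        return []  # all values identical, nothing can be an outlier
    flagged = []
    for i, x in enumerate(values):
        z = (x - m) / s
        if abs(z) > threshold:
            flagged.append((i, x, z))
    return flagged


# Made-up scores for illustration: 19 values near 50 and one value of 120.
scores = [50, 52, 49, 51, 48, 50, 53, 47, 51, 49,
          50, 52, 48, 51, 50, 49, 52, 50, 51, 120]
flagged = zscore_outliers(scores)
```

Note that the mean and standard deviation here are computed with the suspect value included, so in very small samples a single extreme value inflates the standard deviation and limits how large its own z-score can get.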

Multivariate outliers are difficult to detect. Mahalanobis distance is one powerful technique to use in this case (discussed later). The distance is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant beyond the p<0.001 level is treated as an outlier.
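As a rough sketch of the Mahalanobis screening described above, the code below handles the special case of exactly two variables, which is a simplification chosen here so that no external library is needed: the 2x2 covariance matrix can be inverted by hand, and for 2 degrees of freedom the chi-square upper-tail probability is exp(-x/2), so the p = 0.001 critical value is -2 ln(0.001), about 13.82. The function names and the data are hypothetical. For more than two variables a numerical library would normally be used instead.

```python
import math
from statistics import mean


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the sample centroid."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    # Entries of the sample covariance matrix (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^-1 [dx dy]^T, with the 2x2 inverse written out.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def chi2_critical_df2(alpha=0.001):
    """Chi-square critical value for 2 df: solve exp(-x / 2) = alpha for x."""
    return -2.0 * math.log(alpha)


def multivariate_outliers_2d(points, alpha=0.001):
    """Indices of points whose squared distance exceeds the chi-square cutoff."""
    cutoff = chi2_critical_df2(alpha)
    return [i for i, d2 in enumerate(mahalanobis_sq_2d(points)) if d2 > cutoff]


# Made-up data: 24 points lying close to the line y = x, plus the point (5, -5).
regular = [((u + v) // 2, (u - v) // 2) for u in range(-11, 12, 2) for v in (-1, 1)]
points = regular + [(5, -5)]
d2 = mahalanobis_sq_2d(points)
flagged = multivariate_outliers_2d(points)
```

In this made-up data set the last point has x = 5 and y = -5, both well inside the range of each variable taken separately, so a univariate screen would not flag it. It stands out only because it breaks the strong positive relationship between the two variables, which is what the Mahalanobis distance picks up.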

In most cases, it is acceptable to drop the outlying value from the sample. If the researcher decides to keep such values in the analysis, steps can also be taken to reduce their relative influence.
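The text does not say which steps to take when outliers are kept. One commonly used option, shown here purely as an assumption and not as the method the text has in mind, is to pull each outlying score in to the most extreme non-outlying score on the same side. The function name `pull_in_outliers` and the data are hypothetical.

```python
from statistics import mean, stdev


def pull_in_outliers(values, threshold=3.0):
    """Clamp values with |z| > threshold to the nearest non-outlying extreme."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        return list(values)  # no spread, so nothing to adjust
    # Values that pass the z-score screen define the allowed range.
    inliers = [x for x in values if abs((x - m) / s) <= threshold]
    low, high = min(inliers), max(inliers)
    return [min(max(x, low), high) for x in values]


# Made-up scores for illustration: 19 values near 50 and one value of 120.
scores = [50, 52, 49, 51, 48, 50, 53, 47, 51, 49,
          50, 52, 48, 51, 50, 49, 52, 50, 51, 120]
adjusted = pull_in_outliers(scores)
```

The case stays in the sample, so the sample size is unchanged, but its score no longer dominates the mean and variance. Whether this is appropriate depends on why the value is extreme in the first place.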




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd