Outliers - reasons for screening data, Advanced Statistics

Outliers arise for three main reasons: a data entry error, a subject who is not a member of the population that the sample is meant to represent, or a subject who really is different from the rest of the sample. Many statistical tests are quite sensitive to outliers, so the problem should be addressed before the analysis is run.

Univariate outliers are easy to detect with z-scores, box plots, histograms, and similar tools. As a rule of thumb, standard scores beyond +/-3 are treated as outliers (consider a cutoff of 4 if n>100, or 2.5 if n<10).
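The z-score rule above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive screening procedure: the function names `default_cutoff` and `flag_univariate_outliers` and the sample scores are made up for the example, and the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not say which form to use.

```python
import statistics


def default_cutoff(n):
    """Pick the |z| cutoff from the rule of thumb: 4 if n>100, 2.5 if n<10, else 3."""
    if n > 100:
        return 4.0
    if n < 10:
        return 2.5
    return 3.0


def flag_univariate_outliers(values, cutoff=None):
    """Return the indices of values whose absolute z-score exceeds the cutoff."""
    n = len(values)
    if cutoff is None:
        cutoff = default_cutoff(n)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    if sd == 0:
        return []  # all values identical, so nothing can be an outlier
    return [i for i, v in enumerate(values) if abs((v - mean) / sd) > cutoff]


# Hypothetical data: 19 scores clustered near 50 and one extreme score.
scores = [50, 52, 48, 51, 49, 50, 53, 47, 50, 51,
          49, 52, 48, 50, 51, 49, 50, 52, 48, 120]
flagged = flag_univariate_outliers(scores)
print(flagged)
```

With 20 cases the cutoff is 3, and only the score of 120 (z of about 4.2) is flagged. Note that an extreme score inflates the mean and standard deviation used to compute its own z-score, so in very small samples a z-score may be unable to reach the cutoff at all.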

Multivariate outliers are more difficult to detect. Mahalanobis distance is one powerful technique to use in this case (discussed later). It is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant at the p<0.001 level is treated as an outlier.
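As a rough sketch of the Mahalanobis criterion, the example below handles the two-variable case using only the Python standard library. Several things here are assumptions rather than part of the original text: the function names, the made-up data, the use of the squared distance from the sample centroid with the sample covariance matrix, and the restriction to two variables. That restriction is chosen because a chi-square variable with 2 degrees of freedom has the closed-form tail probability exp(-d2/2); with more variables one would need a general matrix inverse and a chi-square quantile function (for example `scipy.stats.chi2.ppf`).

```python
import math
import statistics


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) case from the sample centroid."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    n = len(points)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = statistics.variance(xs)
    syy = statistics.variance(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det.
    result = []
    for x, y in points:
        dx, dy = x - mx, y - my
        result.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return result


def flag_multivariate_outliers_2d(points, alpha=0.001):
    """Return indices of cases whose chi-square (df = 2) p-value is below alpha."""
    d2 = mahalanobis_sq_2d(points)
    # For df = 2 the chi-square upper-tail probability is exp(-d2 / 2).
    return [i for i, v in enumerate(d2) if math.exp(-v / 2) < alpha]


# Hypothetical data: 22 cases where y tracks x closely, plus one case (-3, 3)
# that breaks the pattern without being extreme on either variable alone.
points = [(float(x), x + off) for x in range(-5, 6) for off in (0.5, -0.5)]
points.append((-3.0, 3.0))
d2 = mahalanobis_sq_2d(points)
flagged_cases = flag_multivariate_outliers_2d(points)
print(flagged_cases)
```

Only the last case is flagged: its squared distance is about 18.2, above the df = 2 critical value of about 13.8 at p = 0.001, even though its x and y values each lie within one standard deviation of their means. This is the kind of case that univariate screening would miss.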

In most cases, it is acceptable to drop the outlying value from the sample. If the researcher decides to keep such values in the analysis, steps can be taken to reduce their relative influence.

Posted Date: 3/4/2013 6:22:24 AM | Location : United States