Outliers - reasons for screening data, Advanced Statistics

Outliers - Reasons for Screening Data

Outliers arise from data entry errors, from a subject who is not a member of the population the sample is meant to represent, or from a subject who really is different from the rest. Many statistical tests are quite sensitive to outliers, so the problem should be addressed before the analysis is run.

Univariate outliers are easy to detect with z-scores, box plots, histograms, and similar tools. As a rule of thumb, cases with standard scores beyond +/-3 are treated as outliers (consider +/-4 if n>100, or +/-2.5 if n<10).
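The z-score rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names z_threshold and univariate_outliers are hypothetical, the sample standard deviation (n-1 denominator) is used, and the cutoffs are the rule-of-thumb values from the text rather than fixed standards.

```python
from statistics import mean, stdev


def z_threshold(n):
    """Rule-of-thumb cutoff from the text: 3 by default, 4 if n > 100, 2.5 if n < 10."""
    if n > 100:
        return 4.0
    if n < 10:
        return 2.5
    return 3.0


def univariate_outliers(values):
    """Return (index, value, z) for every case whose |z| exceeds the cutoff."""
    n = len(values)
    if n < 2:
        return []  # a standard deviation needs at least two cases
    m = mean(values)
    s = stdev(values)  # sample standard deviation (assumption: n - 1 denominator)
    if s == 0:
        return []  # all values identical, so nothing can be flagged
    cutoff = z_threshold(n)
    flagged = []
    for i, x in enumerate(values):
        z = (x - m) / s
        if abs(z) > cutoff:
            flagged.append((i, x, z))
    return flagged


# Toy data (made up for illustration): 19 typical scores and one extreme score.
scores = [9, 10, 11] * 6 + [10, 30]
print(univariate_outliers(scores))
```

Note that a z-score computed from the same sample can never exceed (n-1)/sqrt(n) in absolute value, so with very small samples even an extreme case may not cross the cutoff; a box plot or histogram is a useful second check.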

Multivariate outliers are more difficult to detect. Mahalanobis distance is one powerful technique to use in this case (discussed later). The squared distance is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant at the p<0.001 level is treated as an outlier.
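As a rough sketch of the Mahalanobis check described above, the following Python handles the two-variable case only. That restriction is an assumption made to keep the example free of external libraries: with two variables the chi-square critical value has the closed form -2 ln(alpha), and the 2x2 covariance matrix can be inverted by hand. All function names here are hypothetical, and the data are made up.

```python
import math


def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance (n - 1 denominator) of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance of one (x, y) point from the mean."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # d' S^-1 d written out for the 2x2 case
    return (dx * dx * syy - 2 * dx * dy * sxy + dy * dy * sxx) / det


def chi2_critical_2df(alpha=0.001):
    """Upper-tail chi-square critical value for exactly 2 degrees of freedom."""
    return -2.0 * math.log(alpha)


def multivariate_outliers(points, alpha=0.001):
    """Indices of cases whose squared distance exceeds the chi-square cutoff."""
    mean, cov = mean_and_cov(points)
    cutoff = chi2_critical_2df(alpha)
    return [i for i, p in enumerate(points)
            if mahalanobis_sq(p, mean, cov) > cutoff]


# Toy data: x and y move together in most cases; the last two cases break the pattern.
toy = ([(3, 3)] * 10 + [(-3, -3)] * 10 + [(1, 1)] * 10 + [(-1, -1)] * 10
       + [(0.5, -0.5)] * 10 + [(-0.5, 0.5)] * 10 + [(2, -2), (-2, 2)])
print(multivariate_outliers(toy))
```

In this toy data the two flagged cases have ordinary values on x and on y taken separately; it is the combination that is unusual, which is why a univariate screen would miss them. For more than two variables, a linear algebra library and a chi-square table (or quantile function) would be needed instead of the closed forms used here.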

In most cases, it is acceptable to drop the outlying case from the sample. If the researcher decides to keep outliers in the analysis, steps can be taken to reduce their relative influence.


