
Outliers - Reasons for Screening Data

Outliers arise from data entry errors, from a subject who is not a member of the population the sample is meant to represent, or from a subject who is genuinely different from the rest. Statistical tests are quite sensitive to outliers, so this problem should be addressed.

Univariate outliers are easy to detect with z-scores, box plots, histograms, etc. Cases with standard scores beyond +/-3 are treated as outliers (consider a cutoff of 4 if n>100, or 2.5 if n<10).
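The z-score rule above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive procedure: the function names and the sample data are hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which form to use.

```python
from statistics import mean, stdev


def z_score_cutoff(n):
    """Cutoff from the rule of thumb: 2.5 if n < 10, 4 if n > 100, else 3."""
    if n < 10:
        return 2.5
    if n > 100:
        return 4.0
    return 3.0


def univariate_outliers(values):
    """Return (index, value, z) for every value whose |z| exceeds the cutoff."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1)
    if s == 0:
        return []  # all values identical, so nothing can be flagged
    cutoff = z_score_cutoff(len(values))
    flagged = []
    for i, x in enumerate(values):
        z = (x - m) / s
        if abs(z) > cutoff:
            flagged.append((i, x, z))
    return flagged


# Hypothetical data: twenty scores near 50 and one suspicious entry of 95.
scores = [50, 52, 48, 51, 49] * 4 + [95]
for index, value, z in univariate_outliers(scores):
    print(f"case {index}: value={value}, z={z:.2f}")
```

Note that in a small sample a single extreme value inflates the standard deviation and so limits how large its own z-score can become, which is one reason a lower cutoff is suggested when n is small.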

Multivariate outliers are more difficult to detect. Mahalanobis distance is one powerful technique to use in this case (discussed later). The squared distance is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant at the p<0.001 level is treated as an outlier.
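As a rough sketch of the Mahalanobis screening described above, the following Python handles the two-variable case only, using the standard library. The restriction to two variables is a simplifying assumption: it lets the covariance matrix be inverted by hand, and for 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-d2/2). The function name and data are hypothetical; with more variables, a linear algebra and statistics library would be the usual choice.

```python
import math


def mahalanobis_outliers_2d(points, alpha=0.001):
    """Flag (x, y) cases whose squared Mahalanobis distance is significant.

    Returns (index, d2, p) for each case with p < alpha, where p is the
    chi-square upper-tail probability with 2 degrees of freedom.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample variances and covariance (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    flagged = []
    for i, (x, y) in enumerate(points):
        dx, dy = x - mx, y - my
        # d2 = [dx dy] * inverse(covariance) * [dx dy]^T, written out for 2x2.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        p = math.exp(-d2 / 2)  # valid only for df = 2
        if p < alpha:
            flagged.append((i, d2, p))
    return flagged


# Hypothetical data: 30 cases lying near the line y = x, plus one case (15, 0)
# that is unremarkable on each variable alone but far from the joint pattern.
cases = [(i, i + (1 if i % 2 == 0 else -1)) for i in range(30)] + [(15, 0)]
for index, d2, p in mahalanobis_outliers_2d(cases):
    print(f"case {index}: d2={d2:.2f}, p={p:.2e}")
```

The last case illustrates why multivariate outliers are harder to find: neither of its coordinates has an extreme z-score, yet the combination is unusual given how strongly the two variables move together.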

In most cases, it is acceptable to drop the outlying case from the sample. If the researcher decides to keep such values in the analysis, steps can be taken to reduce their relative influence.
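The text does not say which steps to take when outliers are kept. One possible approach, shown here purely as an assumption, is to pull extreme scores in to a fixed number of standard deviations from the mean, so the case stays in the sample but counts for less. The function name and data below are hypothetical; transforming the variable is another commonly used option.

```python
from statistics import mean, stdev


def clip_to_z_limit(values, z_limit=3.0):
    """Replace values beyond mean +/- z_limit * sd with the boundary value."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1)
    lower, upper = m - z_limit * s, m + z_limit * s
    return [min(max(x, lower), upper) for x in values]


# Hypothetical data: twenty scores near 50 and one extreme entry of 95.
raw = [50, 52, 48, 51, 49] * 4 + [95]
adjusted = clip_to_z_limit(raw)
print("mean before:", round(mean(raw), 2))
print("mean after: ", round(mean(adjusted), 2))
```

Whichever approach is chosen, it should be reported alongside the results so readers can judge its effect.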


