Outliers - reasons for screening data, Advanced Statistics

Outliers - Reasons for Screening Data

Outliers can arise from data entry errors, from a subject who is not a member of the population the sample is meant to represent, or from a subject who is genuinely different from the rest. Statistical tests are quite sensitive to outliers, so this problem should be addressed.

Univariate outliers are easy to detect (z-scores, box plots, histograms, etc.). Standard scores larger than 3 in absolute value are usually treated as outliers (consider a cutoff of 4 if n > 100, or 2.5 if n < 10).
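A minimal sketch of univariate screening, using only the Python standard library. The z-score check applies the sample-size rule of thumb above (2.5 if n < 10, 4 if n > 100, otherwise 3). The box-plot check assumes the common 1.5 × IQR fence convention, which the text does not specify. The function names and the sample data are hypothetical, chosen for illustration.

```python
import statistics
from typing import List, Optional, Sequence


def z_cutoff(n: int) -> float:
    """Hypothetical helper: |z| cutoff from the rule of thumb (2.5 if n < 10, 4 if n > 100, else 3)."""
    if n < 10:
        return 2.5
    if n > 100:
        return 4.0
    return 3.0


def z_score_outliers(values: Sequence[float], cutoff: Optional[float] = None) -> List[int]:
    """Indices of values whose standard score exceeds the cutoff in absolute value."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    limit = z_cutoff(len(values)) if cutoff is None else cutoff
    return [i for i, v in enumerate(values) if abs((v - mean) / sd) > limit]


def box_plot_outliers(values: Sequence[float], k: float = 1.5) -> List[int]:
    """Indices of values outside the box-plot fences Q1 - k*IQR and Q3 + k*IQR (k = 1.5 assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


values = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10] * 2 + [100]  # hypothetical scores, n = 21
print(z_score_outliers(values))   # -> [20]
print(box_plot_outliers(values))  # -> [20]
```

Both checks flag the same case here; in practice a histogram is still worth inspecting, because a single extreme value inflates the standard deviation and can mask milder outliers in the z-score check.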

Multivariate outliers are harder to detect. Mahalanobis distance is one powerful technique to use in this case (discussed later). The squared Mahalanobis distance is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis; a case whose distance is significant at the p < 0.001 level is treated as a multivariate outlier.
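The procedure above can be sketched in plain Python as follows. This is an illustrative, standard-library-only version under stated assumptions: it uses the sample covariance matrix (n - 1 divisor), assumes that matrix is non-singular, computes the chi-square upper-tail probability with a closed-form recurrence valid for integer degrees of freedom, and relies on approximate multivariate normality for the chi-square reference to hold. The helper names (`chi2_sf`, `solve`, `mahalanobis_outliers`) and the grid data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Recurrence: Q(k + 2, x) = Q(k, x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(1, x) = erfc(sqrt(x/2)) or Q(0, x) = 0 for x > 0.
    """
    if x <= 0:
        return 1.0
    k, q = (1, math.erfc(math.sqrt(x / 2))) if df % 2 else (0, 0.0)
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis_outliers(
    data: Sequence[Sequence[float]], alpha: float = 0.001
) -> Tuple[List[int], List[float], List[float]]:
    """Return (flagged indices, squared distances, p-values); df = number of variables."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    devs = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample covariance matrix S with an n - 1 divisor.
    cov = [[sum(d[i] * d[j] for d in devs) / (n - 1) for j in range(p)] for i in range(p)]
    d2 = []
    for d in devs:
        y = solve(cov, d)                                 # y = S^-1 d
        d2.append(sum(di * yi for di, yi in zip(d, y)))   # D^2 = d' S^-1 d
    pvals = [chi2_sf(v, p) for v in d2]
    flagged = [i for i, pv in enumerate(pvals) if pv < alpha]
    return flagged, d2, pvals


grid = [(float(x), float(y)) for x in range(8) for y in range(5)]  # 40 ordinary cases
data = grid + [(50.0, 50.0)]  # one hypothetical extreme case at index 40
flagged, d2, pvals = mahalanobis_outliers(data)
print(flagged)  # -> [40]
```

Solving S y = d for each case avoids forming the inverse explicitly; with NumPy and SciPy available, the same steps are usually written with a covariance routine, a linear solver, and the chi-square survival function.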

In most cases, it is acceptable to drop the outlying case from the sample. If the researcher decides to keep the values in the analysis, steps can be taken to reduce their relative influence (for example, transforming the variable or changing the outlying score to a less extreme value).
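One way to keep a case while reducing its influence is to replace an outlying score with a value just beyond the most extreme non-outlying score. The sketch below is a hypothetical helper illustrating that option only; the one-unit step is an assumption, and transformations are an equally valid alternative not shown here. The indices of outlying cases are assumed to come from a prior screening step.

```python
from typing import List, Sequence


def pull_in_outliers(values: Sequence[float], outliers: Sequence[int], unit: float = 1.0) -> List[float]:
    """Hypothetical helper: move each flagged score to one `unit` beyond the
    most extreme unflagged score on the same side, keeping its rank order."""
    flagged = set(outliers)
    kept = [v for i, v in enumerate(values) if i not in flagged]
    low, high = min(kept), max(kept)
    adjusted = list(values)
    for i in flagged:
        if values[i] > high:
            adjusted[i] = high + unit   # still the largest score, but less extreme
        elif values[i] < low:
            adjusted[i] = low - unit    # still the smallest score, but less extreme
    return adjusted


print(pull_in_outliers([10, 11, 9, 12, 100], [4]))  # -> [10, 11, 9, 12, 13.0]
```

The case stays in the analysis and remains the most extreme score, but it no longer dominates the mean and variance.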


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd