Outliers - reasons for screening data, Advanced Statistics

Outliers - Reasons for Screening Data

Outliers arise from data entry errors, from subjects who are not members of the population the sample is meant to represent, or from subjects who genuinely differ from the rest. Statistical tests are quite sensitive to outliers, so the problem should be addressed before the analysis.

Univariate outliers are easy to detect with z-scores, box plots, histograms, and similar tools. Standard scores beyond +/-3 are treated as outliers (consider a cutoff of 4 if n>100, or 2.5 if n<10).
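The z-score rule above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `zscore_outliers` and the sample `scores` are made up for the example, and the cutoff selection simply encodes the rule of thumb stated in the text (4 if n>100, 2.5 if n<10, otherwise 3).

```python
from statistics import mean, stdev


def zscore_outliers(values, cutoff=None):
    """Return (index, value, z) for every value whose |z| exceeds the cutoff.

    If no cutoff is given, it is chosen from the sample size using the
    rule of thumb in the text: 4 if n > 100, 2.5 if n < 10, otherwise 3.
    """
    n = len(values)
    if cutoff is None:
        if n > 100:
            cutoff = 4.0
        elif n < 10:
            cutoff = 2.5
        else:
            cutoff = 3.0

    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)

    flagged = []
    for i, x in enumerate(values):
        z = (x - m) / s
        if abs(z) > cutoff:
            flagged.append((i, x, z))
    return flagged


# Hypothetical data: 19 ordinary scores and one suspicious entry (45).
scores = [12, 11, 13, 12, 10, 11, 12, 13, 11, 12,
          10, 12, 13, 11, 12, 11, 12, 13, 12, 45]

for index, value, z in zscore_outliers(scores):
    print(f"case {index}: value={value}, z={z:.2f}")
```

Note that an extreme value inflates the mean and standard deviation it is judged against, so in very small samples a z-score can never get large; this is one reason a lower cutoff is suggested when n<10.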

Multivariate outliers are harder to detect. Mahalanobis distance is one powerful technique for this case (discussed later). The squared distance is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant beyond the p<0.001 level is treated as an outlier.
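As a rough illustration of the Mahalanobis screening described above, the sketch below handles only the two-variable case, which is an assumption made to keep the example dependency-free: with two variables the covariance matrix can be inverted by hand, and the chi-square upper-tail probability with 2 degrees of freedom has the closed form exp(-x/2). With more variables one would normally rely on a linear algebra and statistics library instead. All function names and the data are hypothetical.

```python
import math
from statistics import mean


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the centroid,
    using the sample covariance matrix (n - 1 denominator)."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)

    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)

    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")

    # Inverse of [[sxx, sxy], [sxy, syy]] is (1/det) * [[syy, -sxy], [-sxy, sxx]].
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2)


def multivariate_outliers(points, alpha=0.001):
    """Indices of points whose squared distance is significant beyond alpha."""
    return [i for i, d2 in enumerate(mahalanobis_sq_2d(points))
            if chi2_sf_df2(d2) < alpha]


# Hypothetical data: 40 cases where y tracks x closely, plus one case
# (10, 30) whose values are ordinary one at a time but unusual together.
cases = [(i, i + (1 if i % 2 == 0 else -1)) for i in range(40)] + [(10, 30)]

print(multivariate_outliers(cases))
```

The last case shows why univariate screening is not enough: neither 10 nor 30 is extreme on its own variable, yet the combination breaks the pattern shared by the other cases.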

In most cases it is acceptable to drop the outlying value from the sample. If the researcher decides to keep the values in the analysis, steps can instead be taken to reduce their relative influence.
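The text does not say which steps to take to reduce an outlier's influence. One possibility, shown here purely as an assumed example and not as the method the text has in mind, is to pull extreme scores back to the value that sits exactly at the z-score cutoff. The function name `clamp_to_z` and the data are hypothetical.

```python
from statistics import mean, stdev


def clamp_to_z(values, cutoff=3.0):
    """Return a copy of values with anything beyond mean +/- cutoff * sd
    replaced by the boundary value; other values are left unchanged."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    low, high = m - cutoff * s, m + cutoff * s
    return [min(max(x, low), high) for x in values]


# Hypothetical data: the final score is far from the rest.
scores = [12, 11, 13, 12, 10, 11, 12, 13, 11, 12,
          10, 12, 13, 11, 12, 11, 12, 13, 12, 45]

adjusted = clamp_to_z(scores)
print(adjusted[-1])
```

The case stays in the sample, so n is unchanged, but its pull on the mean and variance is reduced. Whichever adjustment is used should be reported alongside the results.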


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd