Outliers - reasons for screening data, Advanced Statistics

Outliers - Reasons for Screening Data

Outliers arise from data entry errors, from a subject who is not a member of the population that the sample is meant to represent, or from a subject who is genuinely different from the rest. Many statistical tests are quite sensitive to outliers, so the problem should be addressed before the analysis is run.

Univariate outliers are easy to detect (z-scores, box plots, histograms, etc.). Standard scores beyond +/-3 are treated as outliers (consider 4 if n>100, or 2.5 if n<10).
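The z-score rule can be sketched in a few lines of Python. This is only an illustration: the function name zscore_outliers and the sample data are made up for the example, and the sample standard deviation (n - 1 denominator) is an assumed choice.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Return (index, value, z) for every value whose |z| exceeds threshold."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator), an assumption
    flagged = []
    for i, x in enumerate(values):
        z = (x - m) / s
        if abs(z) > threshold:
            flagged.append((i, x, z))
    return flagged

# Hypothetical scores: nineteen values near 50 and one extreme value.
scores = [48, 52, 50, 49, 51, 47, 53, 50, 49, 51,
          50, 52, 48, 49, 51, 50, 50, 49, 51, 120]

for index, value, z in zscore_outliers(scores):
    print(f"case {index}: value={value}, z={z:.2f}")
```

With the sample standard deviation, no z-score can exceed (n - 1)/sqrt(n) in absolute value, which is about 2.85 for n = 10. A cutoff of 3 can therefore never flag anything in a sample that small, which is one reason to lower the threshold for small n.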

Multivariate outliers are more difficult to detect. Mahalanobis distance is one powerful technique to use in this case (discussed later). The squared distance of each case from the centroid is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant at the p<0.001 level is treated as an outlier.
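A minimal sketch of the Mahalanobis check, restricted to two variables so that it needs only the Python standard library. The function names and the data are hypothetical. With 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-d2/2); for more variables one would normally compute the distance with matrix routines and take the p-value from a statistics library instead.

```python
import math
from statistics import mean

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the sample centroid."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    # Sample covariance matrix entries (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # d^2 = [dx dy] S^-1 [dx dy]^T, with the 2x2 inverse written out by hand.
    return [
        (syy * (x - mx) ** 2
         - 2 * sxy * (x - mx) * (y - my)
         + sxx * (y - my) ** 2) / det
        for x, y in points
    ]

def chi2_sf_df2(d2):
    """Upper-tail chi-square probability for exactly 2 degrees of freedom."""
    return math.exp(-d2 / 2)

def multivariate_outliers(points, alpha=0.001):
    """Indices of cases whose squared distance is significant beyond alpha."""
    return [i for i, d2 in enumerate(mahalanobis_sq_2d(points))
            if chi2_sf_df2(d2) < alpha]

# Hypothetical data: a 3x3 grid repeated three times, plus one distant case.
grid = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
points = grid * 3 + [(8, -8)]

print(multivariate_outliers(points))
```

The p<0.001 criterion with two variables corresponds to a squared distance above -2 ln(0.001), roughly 13.8.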

In most cases, it is acceptable to drop the outlying case from the sample. If the researcher decides to keep the values in the analysis, steps can instead be taken to reduce their relative influence.

