Missing data - reasons for screening data, Advanced Statistics

Missing Data - Reasons for screening data

If any data are missing, the researcher needs to test whether the pattern of the missing cases is random.

Create a dichotomous variable (missing vs. non-missing) for the variable in question. Then run a simple independent-samples t-test on a different variable in the collected sample to see whether the two groups differ significantly. A significant difference suggests that the missing cases are not random.
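The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names (`income`, `age`) and the data are made up, missing values are represented as `None`, and the function computes the classic pooled-variance t statistic by hand using the standard library rather than calling a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def missingness_t_test(target, other):
    """Compare `other` between cases where `target` is missing and not missing.

    Returns (t, df) for a pooled-variance independent-samples t-test.
    Missing values are assumed to be coded as None.
    """
    # Dichotomous split: cases where the target is missing vs. cases where it is observed.
    missing = [y for x, y in zip(target, other) if x is None and y is not None]
    present = [y for x, y in zip(target, other) if x is not None and y is not None]

    n1, n2 = len(missing), len(present)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two cases in each group")

    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(missing) + (n2 - 1) * variance(present)) / df
    t = (mean(missing) - mean(present)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical data: income has missing values, age is fully observed.
income = [52, None, 61, None, 48, 55, None, 67, 59, None]
age = [34, 58, 41, 62, 29, 38, 55, 45, 40, 60]

t, df = missingness_t_test(income, age)
print(f"t = {t:.2f}, df = {df}")
```

The t statistic is then compared with the critical value for the given degrees of freedom (about 2.306 for df = 8 at the two-tailed 5% level). In this made-up sample the respondents with missing income are clearly older, which would point to a non-random pattern.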

Handling missing values:

1. Delete cases with missing data (a good idea if only a few cases are affected)

2. Delete variables containing missing values (a good idea if most of the missing values are concentrated in only a couple of variables; still problematic if those variables are important to the ultimate goal of the research)

3. Estimate missing values, using one of the following approaches:

   a. Prior knowledge: substitute a well-informed guess based on the researcher's experience with the subject area.

   b. Mean substitution: replace missing values with the mean of the variable (main concern: this lowers the calculated variance compared to the unknown actual variance). One variation uses group means for the missing values when the analysis involves group comparisons.

   c. Regression approach: use several IVs to explain the DV (the variable that contains the missing values), then predict the missing values from the IV values. Concerns include finding IVs that properly explain the DV, and the fact that predicted estimates are more consistent with the scores used to predict them than the real values would be.

Whichever of the techniques described above is used, the researcher has to ascertain that the solution hasn't changed the results of the analysis (run the tests with and without the treatment).
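The estimation approaches above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the data and the names `score`, `group`, and `hours` are hypothetical, missing values are assumed to be coded as `None`, and the regression step uses a single IV fitted by ordinary least squares for brevity, whereas the text describes using several IVs.

```python
from statistics import mean, variance


def mean_impute(values):
    """Replace each missing value with the mean of the observed values."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]


def group_mean_impute(values, groups):
    """Replace each missing value with the mean of its own group."""
    by_group = {}
    for v, g in zip(values, groups):
        if v is not None:
            by_group.setdefault(g, []).append(v)
    means = {g: mean(vs) for g, vs in by_group.items()}
    return [means[g] if v is None else v for v, g in zip(values, groups)]


def regression_impute(dv, iv):
    """Predict missing DV values from one IV (simplification: a single predictor)."""
    pairs = [(x, y) for x, y in zip(iv, dv) if y is not None]
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x, _ in pairs)
    intercept = my - slope * mx
    return [intercept + slope * x if y is None else y for x, y in zip(iv, dv)]


# Hypothetical data: two scores are missing, group and hours are fully observed.
score = [10, 12, None, 14, 20, None, 22, 24]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
hours = [1, 2, 3, 4, 6, 7, 8, 9]

observed = [v for v in score if v is not None]
treatments = {
    "observed only": observed,
    "mean": mean_impute(score),
    "group mean": group_mean_impute(score, group),
    "regression": regression_impute(score, hours),
}

# Compare the same statistics with and without each treatment.
for name, data in treatments.items():
    print(f"{name:>13}: mean = {mean(data):.2f}, variance = {variance(data):.2f}")
```

Printing the mean and variance for each version is a small instance of running the analysis with and without the treatment. In this made-up sample, plain mean substitution gives a smaller variance than the observed cases alone, which is the concern raised for that approach.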

Posted Date: 3/4/2013 6:07:24 AM | Location : United States






