Missing data - reasons for screening data, Advanced Statistics

When a data set contains missing values, the researcher needs to run tests to establish whether the pattern of missing cases is random.

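Create a dichotomous variable (non-missing vs. missing) for a specific variable. Then run a simple independent-samples t-test on a different variable in the collected sample, comparing the missing and non-missing groups, to see if there are any significant differences. A significant difference suggests that the missing values are not randomly distributed.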
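
The check described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the variable names (`income`, `age`), the toy data, and the helper `missingness_t_test` are all hypothetical, and missing values are assumed to be coded as `None`. It computes the pooled-variance (Student's) t statistic and its degrees of freedom; the p-value would still have to be looked up in a t table or obtained from a statistics package such as `scipy.stats.ttest_ind`.

```python
import math
from statistics import mean, variance

def missingness_t_test(target, other):
    """Compare `other` between cases where `target` is missing vs. observed.

    Returns (t, df) for an independent-samples t-test with pooled variance.
    Missing values are assumed to be coded as None (an assumption of this sketch).
    """
    # Dichotomous split: cases missing on `target` vs. cases observed on `target`
    missing = [o for t, o in zip(target, other) if t is None and o is not None]
    present = [o for t, o in zip(target, other) if t is not None and o is not None]
    n1, n2 = len(missing), len(present)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two cases")
    df = n1 + n2 - 2
    # Pooled variance across the two groups
    pooled = ((n1 - 1) * variance(missing) + (n2 - 1) * variance(present)) / df
    t = (mean(missing) - mean(present)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df

# Hypothetical data: income has missing values, age is fully observed
income = [52, None, 61, None, 58, 49, None, 63, 55, 60]
age = [34, 51, 29, 48, 31, 36, 55, 27, 33, 30]

t_stat, df = missingness_t_test(income, age)
print(f"t = {t_stat:.2f}, df = {df}")
```

In this made-up sample the cases with missing income are clearly older, so the t statistic is large in absolute value, which would point to a non-random missingness pattern.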

Handling missing values:

1. Delete cases with missing data (a good idea if only a few cases have missing values).

2. Delete variables containing missing values (a good idea if most of the missing values are concentrated in only a couple of variables, but still problematic if those variables are important to the ultimate goal of the research).

3. Estimate missing values, using one of the following approaches:

   a. Prior knowledge: substitute a value based on an educated guess from the researcher's knowledge of the area.

   b. Replace missing values with the mean (main concern: this lowers the calculated variance compared with the unknown actual variance). One variation uses group means for the missing values when the analysis involves group comparisons.

   c. Regression approach: treat the variable that contains the missing values as the DV and use several other variables as IVs to explain it, then predict the missing values from the IV values. Concerns include finding proper IVs that explain the DV, and the fact that estimates obtained from prediction are more consistent with the scores used to predict them than the real values would be.

Whichever of the techniques above is used, the researcher has to ascertain that the solution has not changed the results of the analysis (run the tests with and without the treatment).
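
The estimation options in the list above (overall mean, group mean, and regression) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the variable names (`score`, `group`, `hours`), the toy data, the `None` coding for missing values, and the helper functions. For brevity the regression helper uses a single IV fitted by ordinary least squares, whereas the text describes using several IVs.

```python
from statistics import mean, variance

def impute_mean(values):
    """Replace each None with the mean of the observed values."""
    m = mean([v for v in values if v is not None])
    return [m if v is None else v for v in values]

def impute_group_mean(values, groups):
    """Replace each None with the mean of the observed values in the same group."""
    group_means = {}
    for g in set(groups):
        observed = [v for v, gg in zip(values, groups) if gg == g and v is not None]
        group_means[g] = mean(observed)
    return [group_means[g] if v is None else v for v, g in zip(values, groups)]

def impute_regression(dv, iv):
    """Fit dv = intercept + slope * iv on complete cases, then predict missing dv."""
    pairs = [(x, y) for x, y in zip(iv, dv) if y is not None]
    mx = mean([x for x, _ in pairs])
    my = mean([y for _, y in pairs])
    slope = (sum((x - mx) * (y - my) for x, y in pairs)
             / sum((x - mx) ** 2 for x, _ in pairs))
    intercept = my - slope * mx
    return [intercept + slope * x if y is None else y for x, y in zip(iv, dv)]

# Hypothetical data: `score` has two missing values
score = [10, 12, None, 15, 20, None, 22, 25]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
hours = [1, 2, 3, 4, 5, 6, 7, 8]

observed = [v for v in score if v is not None]
mean_filled = impute_mean(score)
group_filled = impute_group_mean(score, group)
reg_filled = impute_regression(score, hours)

# Mean imputation shrinks the sample variance relative to the observed cases
print("variance, observed cases only:", round(variance(observed), 2))
print("variance, after mean imputation:", round(variance(mean_filled), 2))

# Simple with/without comparison of a summary statistic for each treatment
for label, data in [("listwise deletion", observed), ("overall mean", mean_filled),
                    ("group mean", group_filled), ("regression", reg_filled)]:
    print(f"{label}: mean = {mean(data):.2f}, variance = {variance(data):.2f}")
```

The final loop is a rough stand-in for the with-and-without check: in practice the comparison would be made on the actual analysis of interest, not only on means and variances.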

Posted Date: 3/4/2013 6:07:24 AM | Location : United States






