Data reduction, Applied Statistics

Principal component analysis (PCA) is among the oldest of the multivariate statistical methods of data reduction. It is a technique for simplifying a dataset by reducing multidimensional data to a lower number of dimensions for analysis. It produces a small number of derived variables that are uncorrelated and that account for most of the variation in the original data set. By reducing the number of variables in this way, we can understand the underlying structure of the data. The derived variables are linear combinations of the original variables. For example, it might be that students take 10 examinations and some students do well in one examination while other students do better in another. It is difficult to compare one student with another when we have 10 marks to consider. One obvious way of comparing students is to calculate the mean score.

This is a constructed combination of the existing variables. However, one might get a more useful comparison of overall performance by considering other constructed combinations of the 10 exam marks. PCA is one way of constructing such combinations, doing so in such a way as to account for the maximum possible variation in the original data. We can then compare students' performance by considering this much smaller number of variables.
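As a rough illustration of the exam example, the sketch below computes principal components with NumPy by taking the eigenvectors of the covariance matrix of the marks. The data are synthetic and invented purely for illustration (50 hypothetical students, 10 exams, one shared "ability" effect plus noise), and the function name `pca` is an assumption, not part of any library. It is one common way to carry out PCA, not the only one.

```python
import numpy as np

def pca(marks):
    """PCA of a students-by-exams array via the covariance matrix.

    Returns (weights, scores, explained_ratio), components sorted by
    decreasing variance. Hypothetical helper written for this example.
    """
    centered = marks - marks.mean(axis=0)        # remove each exam's mean
    cov = np.cov(centered, rowvar=False)         # 10 x 10 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs                  # derived (uncorrelated) variables
    return eigvecs, scores, eigvals / eigvals.sum()

# Synthetic marks (assumption): a shared ability effect plus exam-level noise.
rng = np.random.default_rng(0)
ability = rng.normal(60, 12, size=(50, 1))
marks = ability + rng.normal(0, 6, size=(50, 10))

weights, scores, explained = pca(marks)

# Share of total variation carried by the first three components.
print(np.round(explained[:3], 3))

# The mean score is itself a combination with equal weights of 1/10.
mean_score = marks.mean(axis=1)
```

Each column of `weights` gives the coefficients of one derived variable, and `scores[:, 0]` can be used to compare students in place of the 10 separate marks. Among all combinations whose weights have unit length, the first component has the largest variance, which is the sense in which it accounts for the maximum possible variation.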

PCA states and then solves a well-defined statistical problem and, except for special cases, always gives a unique solution with some very nice mathematical properties. We can even describe some very artificial practical problems for which PCA provides the exact solution. The difficulty comes in trying to relate PCA to real-life scientific problems; the match is simply not very good. PCA often provides a good approximation to common factor analysis, but that feature is now unimportant since both methods are now easy enough to carry out.

Posted Date: 4/4/2013 3:43:13 AM | Location : United States






