Data reduction, Applied Statistics

Principal component analysis (PCA) is among the oldest multivariate statistical methods of data reduction. It simplifies a dataset by reducing multidimensional data to lower dimensions for analysis. It produces a small number of derived variables that are uncorrelated and that account for most of the variation in the original data set. By reducing the number of variables in this way, we can understand the underlying structure of the data. The derived variables are linear combinations of the original variables. For example, it might be that students take 10 examinations and some students do well in one examination while other students do better in another. It is difficult to compare one student with another when we have 10 marks to consider. One obvious way of comparing students is to calculate the mean score.

The mean score is a constructed combination of the existing variables. However, one might get a more useful comparison of overall performance by considering other constructed combinations of the 10 exam marks. PCA is one way of constructing such combinations, doing so in such a way as to account for the maximum possible variation in the original data. We can then compare students' performance by considering this much smaller number of variables.
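As a rough illustration of the exam example, the sketch below builds a made-up table of marks (200 students, 10 exams, generated from a random seed, not real data) and reduces it to two derived variables. The function name `pca`, the variable names, and the simulated marks are all assumptions introduced here; the method shown is the standard eigendecomposition of the sample covariance matrix, which is one common way to compute principal components.

```python
import numpy as np

def pca(marks, n_components=2):
    """Principal components via eigendecomposition of the sample covariance."""
    centered = marks - marks.mean(axis=0)        # remove each exam's mean
    cov = np.cov(centered, rowvar=False)         # 10 x 10 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    weights = eigvecs[:, :n_components]          # combination weights per exam
    scores = centered @ weights                  # derived variables per student
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, weights, explained

# Synthetic marks (an assumption for illustration only): one shared
# "ability" factor plus independent noise on each of the 10 exams.
rng = np.random.default_rng(0)
n_students, n_exams = 200, 10
ability = rng.normal(size=(n_students, 1))
marks = 60 + 10 * ability + rng.normal(scale=5, size=(n_students, n_exams))

scores, weights, explained = pca(marks, n_components=2)

# The mean score is also a combination of the marks, with equal weights.
# Scaled to unit length, its variance cannot exceed that of the first component.
centered = marks - marks.mean(axis=0)
equal_weights = np.ones(n_exams) / np.sqrt(n_exams)
var_mean_direction = np.var(centered @ equal_weights, ddof=1)
var_first_component = np.var(scores[:, 0], ddof=1)

print("share of variance explained:", explained)
print("variance along mean-score direction:", var_mean_direction)
print("variance along first component:", var_first_component)
```

Each row of `scores` gives one student's values on the two derived variables, so students can be compared on two numbers instead of ten. In data like this, where one common factor drives all exams, the first component tends to resemble the mean score; with more varied data the weights can differ considerably.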

PCA states and then solves a well-defined statistical problem and, except in special cases, always gives a unique solution with some very nice mathematical properties. We can even describe some very artificial practical problems for which PCA provides the exact solution. The difficulty comes in trying to relate PCA to real-life scientific problems; the match is simply not very good. In practice PCA often provides a good approximation to common factor analysis, but that feature is now unimportant since both methods are now easy enough to carry out.
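For reference, the statistical problem mentioned above is usually written as follows. The notation is an assumption introduced here: $x$ is the vector of original variables, $\Sigma$ its covariance matrix, and $a_k$ the weight vector of the $k$-th derived variable.

\[
a_1 = \arg\max_{a^\top a = 1} \operatorname{Var}(a^\top x) = \arg\max_{a^\top a = 1} a^\top \Sigma\, a ,
\]

with each later $a_k$ maximising the same quantity subject to being orthogonal to $a_1, \dots, a_{k-1}$. The solution is given by the eigenvectors of $\Sigma$:

\[
\Sigma\, a_k = \lambda_k a_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0, \qquad \operatorname{Var}(a_k^\top x) = \lambda_k, \qquad \operatorname{Cov}(a_j^\top x,\, a_k^\top x) = 0 \ \ (j \ne k).
\]

When the eigenvalues are all distinct, each $a_k$ is unique up to sign. Repeated eigenvalues are the usual special case in which the solution is not unique.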

Posted Date: 4/4/2013 3:43:13 AM | Location : United States






