Data reduction, Applied Statistics

Principal component analysis (PCA) is among the oldest multivariate statistical methods of data reduction. It simplifies a dataset by reducing multidimensional data to a smaller number of dimensions for analysis. It produces a small number of derived variables that are uncorrelated and that account for most of the variation in the original data set. By reducing the number of variables in this way, we can understand the underlying structure of the data. The derived variables are linear combinations of the original variables. For example, it might be that students take 10 examinations and some students do well in one examination while other students do better in another. It is difficult to compare one student with another when we have 10 marks to consider. One obvious way of comparing students is to calculate the mean score.

The mean score is itself a constructed combination of the existing variables. However, one might get a more useful comparison of overall performance by considering other constructed combinations of the 10 exam marks. PCA is one way of constructing such combinations, doing so in such a way as to account for the maximum possible variation in the original data. We can then compare students' performance by considering this much smaller number of variables.
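The exam example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the marks are simulated (200 hypothetical students, 10 exams, one shared "ability" factor plus noise), the function name `pca` is made up for this sketch, and the components are obtained from an eigen-decomposition of the sample covariance matrix, which is one common way to compute them.

```python
import numpy as np

def pca(marks, n_components=2):
    """Return (weights, scores, explained) for the leading principal components.

    marks: array of shape (n_students, n_exams).
    """
    # Centre each exam so that variation is measured around its mean mark.
    centered = marks - marks.mean(axis=0)
    # Sample covariance matrix of the exams (columns are variables).
    cov = np.cov(centered, rowvar=False)
    # eigh is for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    weights = eigvecs[:, :n_components]        # one column of weights per component
    scores = centered @ weights                # derived variables for each student
    explained = eigvals[:n_components] / eigvals.sum()
    return weights, scores, explained

# Simulated marks (an assumption for illustration, not real data).
rng = np.random.default_rng(0)
ability = rng.normal(60, 12, size=(200, 1))
marks = ability + rng.normal(0, 6, size=(200, 10))

weights, scores, explained = pca(marks, n_components=2)

# The mean score is the equal-weights combination mentioned in the text.
mean_score = marks.mean(axis=1)
```

Here `scores` holds two derived variables per student in place of the 10 original marks, and `explained` gives the share of total variance each one accounts for. The sign of each weight vector is arbitrary, so a component may come out negatively related to `mean_score` without changing its interpretation.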

PCA states and then solves a well-defined statistical problem, and except for special cases it always gives a unique solution with some very nice mathematical properties. We can even describe some very artificial practical problems for which PCA provides the exact solution. The difficulty comes in trying to relate PCA to real-life scientific problems; the match is simply not very good. PCA often provides a good approximation to common factor analysis, but that feature matters little now that both methods are easy enough to compute.
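For reference, the statistical problem can be written out as follows. The notation is an assumption introduced here, not taken from the post: S is the sample covariance matrix of the original variables, w_k is the weight vector of the k-th component, and λ_1 ≥ λ_2 ≥ ... are the eigenvalues of S.

```latex
% First principal component: the unit-length combination with maximum variance
w_1 = \arg\max_{\lVert w \rVert = 1} \; w^{\top} S \, w

% Later components: the same problem, restricted to directions
% orthogonal to the components already found
w_k = \arg\max_{\substack{\lVert w \rVert = 1 \\ w^{\top} w_j = 0,\; j < k}} \; w^{\top} S \, w

% Solution: w_k is an eigenvector of S, and the variance it captures is the eigenvalue
S \, w_k = \lambda_k w_k, \qquad \operatorname{Var}\!\left(w_k^{\top} x\right) = \lambda_k
```

On this reading, the "special cases" without a unique solution are those where two or more eigenvalues are equal; otherwise each w_k is determined up to its sign.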

Posted Date: 4/4/2013 3:43:13 AM | Location : United States






