Data reduction, Applied Statistics

Principal component analysis (PCA) is among the oldest multivariate statistical methods of data reduction. It simplifies a dataset by reducing multidimensional data to fewer dimensions for analysis. It produces a small number of derived variables that are uncorrelated and that account for most of the variation in the original dataset. By reducing the number of variables in this way, we can better understand the underlying structure of the data. The derived variables are combinations of the original variables. For example, suppose students take 10 examinations, and some students do well in one examination while others do better in another. It is difficult to compare one student with another when there are 10 marks to consider. One obvious way of comparing students is to calculate each student's mean score.
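
In symbols, each derived variable is a weighted sum of the original marks. As a sketch, writing x_1, ..., x_10 for one student's ten marks and a_1, ..., a_10 for the weights (notation introduced here, not in the original), the mean score is simply the special case in which every weight equals 1/10:

```latex
z = a_1 x_1 + a_2 x_2 + \cdots + a_{10} x_{10} = \mathbf{a}^\top \mathbf{x},
\qquad
\bar{x} = \frac{1}{10}\sum_{j=1}^{10} x_j
\quad \text{(the case } a_1 = \cdots = a_{10} = \tfrac{1}{10}\text{)}.
```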

The mean is itself a constructed combination of the existing variables. However, one might get a more useful comparison of overall performance by considering other constructed combinations of the 10 exam marks. PCA is one way of constructing such combinations, choosing them so that they account for the maximum possible variation in the original data. We can then compare students' performance using this much smaller number of variables.
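
The procedure can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the document's own method: the marks matrix is synthetic (50 hypothetical students and 10 exams, generated from a shared "ability" term plus noise), the function name `pca` and its parameters are introduced here for illustration, and the combinations are found by an eigendecomposition of the covariance matrix of the exam marks.

```python
import numpy as np


def pca(marks, n_components=2):
    """Hypothetical helper: PCA of a (students x exams) matrix.

    Returns the component scores, the loading vectors (one per column)
    and the proportion of total variance each component accounts for.
    """
    # Centre each exam (column) on its mean so the components describe
    # variation about the average student.
    centred = marks - marks.mean(axis=0)
    # Sample covariance matrix of the exams (10 x 10 for 10 exams).
    cov = np.cov(centred, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in
    # ascending order, so reorder them from largest to smallest.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components]   # weights defining each combination
    scores = centred @ loadings            # derived variables, one row per student
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, loadings, explained


# Synthetic, purely illustrative data: each mark is a shared
# "general ability" term plus exam-specific noise.
rng = np.random.default_rng(0)
ability = rng.normal(60, 10, size=(50, 1))
marks = ability + rng.normal(0, 5, size=(50, 10))

scores, loadings, explained = pca(marks, n_components=2)
print("share of variance explained:", explained.round(3))
print("first-component loadings:", loadings[:, 0].round(2))
# Scores on different components are uncorrelated (up to rounding error).
print("correlation of scores:", np.corrcoef(scores, rowvar=False)[0, 1].round(6))
```

With data generated this way, the first component should account for most of the variance and its loadings should come out roughly equal in magnitude, so its score behaves much like a rescaled, centred mean mark; the sign of each loading vector is arbitrary. Using `eigh` rather than `eig` exploits the symmetry of the covariance matrix; a singular value decomposition of the centred data is a common alternative.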

PCA states and then solves a well-defined statistical problem and, except in special cases, always gives a unique solution with some very nice mathematical properties. One can even describe some highly artificial practical problems for which PCA provides the exact solution. The difficulty comes in trying to relate PCA to real-life scientific problems; the match is simply not very good. PCA does often provide a good approximation to common factor analysis, but that feature is now unimportant, since both methods are now easy enough to compute.
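
The well-defined problem can be written down briefly. As a sketch in standard notation (S for the sample covariance matrix of the original variables and a for a vector of weights, both symbols introduced here), the first principal component chooses the weights that maximise variance subject to a unit-length constraint, and the stationarity condition turns this into an eigenvalue problem:

```latex
\max_{\mathbf{a}} \; \operatorname{Var}(\mathbf{a}^\top \mathbf{x}) = \mathbf{a}^\top S \mathbf{a}
\quad \text{subject to} \quad \mathbf{a}^\top \mathbf{a} = 1,

L(\mathbf{a}, \lambda) = \mathbf{a}^\top S \mathbf{a} - \lambda\,(\mathbf{a}^\top \mathbf{a} - 1),
\qquad
\frac{\partial L}{\partial \mathbf{a}} = 2 S \mathbf{a} - 2 \lambda \mathbf{a} = \mathbf{0}
\;\Longrightarrow\; S \mathbf{a} = \lambda \mathbf{a},

\mathbf{a}^\top S \mathbf{a} = \lambda\, \mathbf{a}^\top \mathbf{a} = \lambda .
```

So the variance of a component equals its eigenvalue, and the first component uses the eigenvector of S with the largest eigenvalue; later components use the remaining eigenvectors in decreasing order of eigenvalue. Because the eigenvectors of a symmetric matrix can be chosen orthogonal, the resulting components are uncorrelated. The weights are unique apart from the sign of each vector unless two eigenvalues are equal, which is presumably the kind of special case the paragraph has in mind.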

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd