Data reduction, Applied Statistics

Principal component analysis (PCA) is among the oldest multivariate statistical methods of data reduction. It simplifies a dataset by reducing multidimensional data to a lower number of dimensions for analysis. It produces a small number of derived variables that are uncorrelated and that account for most of the variation in the original data set. By reducing the number of variables in this way, we can understand the underlying structure of the data. The derived variables are combinations of the original variables. For example, suppose students take 10 examinations, and some students do well in one examination while others do better in another. It is difficult to compare one student with another when there are 10 marks to consider. One obvious way of comparing students is to calculate the mean score.

The mean score is itself a constructed combination of the existing variables. However, one might get a more useful comparison of overall performance by considering other constructed combinations of the 10 exam marks. PCA is one way of constructing such combinations, doing so in such a way as to account for the maximum possible variation in the original data. We can then compare students' performance by considering this much smaller number of variables.
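The exam example can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the marks are simulated (50 students, 10 exams, one shared "ability" effect plus noise), and the helper name `pca` and all parameter values are hypothetical, not taken from the text. It computes the components as eigenvectors of the sample covariance matrix, which is one common way to carry out PCA.

```python
import numpy as np

def pca(X, n_components):
    """Illustrative PCA via eigendecomposition of the sample covariance matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # centre each variable (exam)
    cov = np.cov(Xc, rowvar=False)           # 10 x 10 covariance of the marks
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]   # weights of each derived variable
    scores = Xc @ components                 # derived variables for each student
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, components, explained

# Simulated marks (an assumption for illustration, not real data).
rng = np.random.default_rng(0)
ability = rng.normal(60, 12, size=(50, 1))           # shared effect per student
marks = ability + rng.normal(0, 5, size=(50, 10))    # 10 exam marks per student

scores, components, explained = pca(marks, n_components=2)
print("share of variance per component:", explained)
```

Each column of `scores` is one derived variable, so students can be compared on two numbers instead of ten. With data of this kind the first component tends to weight all exams roughly equally, which makes it behave much like the mean score (up to sign and scale).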

PCA states and then solves a well-defined statistical problem and, except in special cases, always gives a unique solution with some very nice mathematical properties. We can even describe some very artificial practical problems for which PCA provides the exact solution. The difficulty comes in trying to relate PCA to real-life scientific problems; the match is simply not very good. In practice, PCA often provides a good approximation to common factor analysis, but that feature is now unimportant since both methods are now easy enough to carry out.
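One standard way to write down the well-defined problem referred to above is given below. The notation is an assumption introduced here for illustration: $S$ is the sample covariance matrix of the original variables, $w$ is a vector of weights, and $\lambda_1 \ge \lambda_2 \ge \dots$ are the eigenvalues of $S$.

```latex
% First principal component: the unit-length weight vector that
% maximises the variance of the derived variable w^T x.
\[
  w_1 = \arg\max_{\|w\| = 1} \; w^{\top} S\, w ,
  \qquad
  S\, w_1 = \lambda_1 w_1 ,
  \qquad
  \operatorname{Var}\!\left(w_1^{\top} x\right) = \lambda_1 .
\]
% Later components solve the same problem subject to being
% orthogonal to the earlier ones:
\[
  w_k = \arg\max_{\substack{\|w\| = 1 \\ w \perp w_1, \dots, w_{k-1}}} w^{\top} S\, w ,
  \qquad
  S\, w_k = \lambda_k w_k .
\]
```

In this formulation the solution is unique up to the sign of each $w_k$ whenever the eigenvalues are distinct; repeated eigenvalues are the kind of special case in which it is not.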

