Explain your understanding of eigenvectors and eigenvalues

Reference no: EM132320219

Part A

Objective:

The purpose of this project is to provide you with an opportunity to demonstrate an advanced level of synthesis, understanding and communication of the concepts, statistical methods and practical analyses within R that you have learnt throughout this course.

Please remember that this course is a postgraduate level course which requires that students demonstrate an advanced level of knowledge, skills, reasoning and problem-solving. Also, this project is a significant assessment item worth 50% of your final grade. As such, you should expect to find it challenging and expect to spend considerable time working on it. I encourage you to start as soon as possible. You do not need to have completed all the course work and topics to make a start on becoming familiar with the data.

The Data:

This data comes from a water quality study [1] where samples were taken from sites on different European rivers during a period of approximately one year. These samples were analysed for various chemical substances and, in parallel, algae samples were collected to determine the algae population distributions for seven algal species.

The impact on the environment of toxic waste, from a wide variety of manufacturing processes, is well known. It has also become clear that the subtler effects of nutrient level and chemical balance changes arising from farming land run-off and sewage water treatment also have a serious, but indirect, effect on the states of rivers, lakes and even the sea. Research has focused on the relationships among groups of chemical variables, among algae species and between both chemical and biological variables. The influences of season, river size and river flow rate are also considered important.

There are a total of 200 cases (rows/rivers), each containing 18 values (columns/variables). The first 3 variables in the data set are the season, river size, and fluid velocity of the river. The next eight variables are the chemical concentrations which should be relevant for the algae distribution: pH (measured on a scale of 1 to 14) and nitrogen, nitrates, nitrites, ammonia, phosphate, oxygen and chloride (all measured in mg/L).

The last seven values of each row are the amounts of the different kinds of algae. These 7 kinds are only a very small part of the whole algae community. The value 0.0 means that the frequency is very low. The data set also contains some missing data, which are labelled with the string XXXXX.
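As a minimal sketch of how the XXXXX marker might be handled during cleaning, the snippet below reads a tiny invented extract and converts the marker to a missing value. The assignment itself expects R; Python is used here purely for illustration, and the column names and all values are hypothetical, not taken from the real data set.

```python
import csv
import io

# Hypothetical three-row extract; column names and values are invented
# for illustration and are NOT taken from the real data set.
raw = io.StringIO(
    "season,river_size,river_vel,pH,nitrates\n"
    "winter,small,medium,8.0,XXXXX\n"
    "spring,large,high,XXXXX,1.288\n"
    "summer,medium,low,7.9,2.5\n"
)

MISSING = "XXXXX"             # missing-value marker described in the text
NUMERIC = {"pH", "nitrates"}  # assumed numeric columns in this toy extract


def parse_row(row):
    """Convert the marker to None and numeric strings to floats."""
    parsed = {}
    for name, value in row.items():
        if value == MISSING:
            parsed[name] = None
        elif name in NUMERIC:
            parsed[name] = float(value)
        else:
            parsed[name] = value
    return parsed


rows = [parse_row(r) for r in csv.DictReader(raw)]
# Keep only rows with no missing values (one possible cleaning choice).
complete = [r for r in rows if None not in r.values()]
print(len(rows), "rows read;", len(complete), "complete")
```

Dropping incomplete rows is only one option; whichever cleaning rule is chosen should be stated in the report, since it determines the final number of rivers.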

A consultancy firm has asked you to explore this data and address three specific aspects of interest (Tasks 1, 2 and 3 below) for their client, and then report your process (what you have done and why) and findings in a written report. Before beginning the Tasks, you may need to do some data cleaning due to missing data or outliers. All analysis for the following tasks should be based on your cleaned data set and the structure/characteristics of your final data set should be well defined.

For this exercise assume that the data meets any required MVN assumptions (do not test for UVN or MVN).

Reference:

[1] Lichman, M. (2013). UCI Machine Learning Repository

The Tasks:

Task 1 : The client would like to know the number of rivers in the sample after cleaning. In addition, the client requires the number of rivers measured in each season and in each river size category, along with appropriate summary statistics/plots for each of the 8 chemical variables individually.

Action: Clean the data as you think necessary and then provide a frequency table of the number of rivers measured in each season, and then each river size. Determine appropriate summary statistics for each of the 8 chemical variables and the best way to present this information. Interpret interesting aspects of this data summary.

They would also like to know how the combinations of river size and velocity relate to one another in terms of the chemical variables. Which groups are most similar, and which are most different?

Action: First, create a new variable called 'river_size_vel' with categories that are combinations of the 3 River_Size and 3 River_Vel categories, and provide a frequency table of the number of rivers in each new category. To show the multivariate relationships among the categories of the new 'river_size_vel' variable, present the dendrogram and the MDS plot that best represent the relationships. As part of your interpretation, explain what types of distances and clustering methods you have used and why.
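The first step of this action, building the combined variable and its frequency table, can be sketched as follows. This is an illustrative Python sketch (the assignment expects R); the category labels and the six toy rivers are hypothetical, since the text states only that each variable has three categories.

```python
from collections import Counter

# Hypothetical (River_Size, River_Vel) pairs; the labels are assumptions,
# the text only says each variable has three categories.
rivers = [
    ("small", "high"),
    ("small", "high"),
    ("medium", "low"),
    ("large", "medium"),
    ("medium", "low"),
    ("small", "low"),
]

# Combine the two categorical variables into one label per river.
river_size_vel = [f"{size}_{vel}" for size, vel in rivers]

# Frequency table: number of rivers in each combined category.
freq = Counter(river_size_vel)
for category, count in sorted(freq.items()):
    print(f"{category:15s} {count}")
```

With 3 sizes and 3 velocities there are at most 9 combined categories; some may be empty or sparse after cleaning, which is worth noting before clustering on them.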

Task 2 : The client would like to know if there are significant differences among the four seasons in terms of average river health, as indicated by all of the chemical and algae variables.

Action: Select the best method (from those covered in this course only) to explore this question, perform the analysis and interpret. Include in your answer appropriate p-values for all significance tests performed.

Task 3 : The client would like to know if the season can be predicted based on all chemical and algae variables. The client is only interested in data related to Autumn, Spring and Summer.

Action: Select the best method (from those covered in this course only) to explore this question, perform the analysis, explain all relevant details of your process and interpret the results.

Part B:

Include your responses to these Part B questions at the end of your Part A pdf submission (i.e. only one pdf file to be submitted for this whole project assessment item)

Question 1 :

Recreate and complete the table below by indicating which features are relevant to each method.

Feature                                           | MANOVA | PCA | FA | DFA | CCA | CA | MDS
Eigen analysis                                    |        |     |    |     |     |    |
Distance matrix                                   |        |     |    |     |     |    |
Data/Dimension reduction                          |        |     |    |     |     |    |
Classification                                    |        |     |    |     |     |    |
Can be used to identify group structure/clusters  |        |     |    |     |     |    |
Need independent a priori categorical variable(s) |        |     |    |     |     |    |
Ordination method                                 |        |     |    |     |     |    |

Question 2 :

Construct, by hand, a simple nearest-neighbour dendrogram from the distance matrix below. Do not produce the dendrogram in R. Use the distances to 'sketch' the relationships.

          1         2         3         4
2  1.912370
3  5.382450  7.120542
4  3.385996  5.059430  2.138709
5  1.512238  3.190303  4.575420  2.910661
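To check a hand-drawn dendrogram, the nearest-neighbour (single-linkage) merge order can be traced with a short script. This is a plain-Python sketch written for illustration (the question asks for hand working and the course uses R); the helper names `d` and `single_link` are hypothetical.

```python
from itertools import combinations

# Lower-triangular distances from the matrix above, keyed by (i, j), i < j.
dist = {
    (1, 2): 1.912370,
    (1, 3): 5.382450, (2, 3): 7.120542,
    (1, 4): 3.385996, (2, 4): 5.059430, (3, 4): 2.138709,
    (1, 5): 1.512238, (2, 5): 3.190303, (3, 5): 4.575420, (4, 5): 2.910661,
}


def d(i, j):
    """Look up the distance between two individuals in either order."""
    return dist[(min(i, j), max(i, j))]


def single_link(a, b):
    """Nearest-neighbour distance: smallest distance between any two members."""
    return min(d(i, j) for i in a for j in b)


clusters = [frozenset([k]) for k in range(1, 6)]
merges = []
while len(clusters) > 1:
    # Merge the pair of clusters with the smallest nearest-neighbour distance.
    a, b = min(combinations(clusters, 2), key=lambda pair: single_link(*pair))
    height = single_link(a, b)
    merges.append((sorted(a), sorted(b), height))
    clusters = [c for c in clusters if c not in (a, b)] + [a | b]

for left, right, height in merges:
    print(left, "+", right, "at", height)
```

Each printed line is one join in the dendrogram, and the height is where the two branches meet on the distance axis.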

Question 3 :

Calculate by hand the Euclidean distance between individuals 1 and 2 for variables X1 and X2. Show all working.

 

Individual     X1      X2
1           -0.46   -0.46
2           -1.41   -1.79
3            1.78    1.48
4            0.60    0.55
5            0.13    0.31
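Hand working for the Euclidean distance can be verified numerically. The sketch below (Python for illustration; the question itself requires hand working) follows the usual steps: difference per variable, square, sum, square root.

```python
import math

# Coordinates (X1, X2) of individuals 1 and 2 from the table above.
p1 = (-0.46, -0.46)
p2 = (-1.41, -1.79)

# Step 1: squared difference on each variable.
squared_diffs = [(a - b) ** 2 for a, b in zip(p1, p2)]
# Step 2: sum the squared differences and take the square root.
distance = math.sqrt(sum(squared_diffs))

print("squared differences:", [round(s, 4) for s in squared_diffs])
print("distance:", round(distance, 4))
```

The differences are 0.95 and 1.33, so the distance is the square root of 0.9025 + 1.7689 = 2.6714, roughly 1.634.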

Question 4 :

What are some limitations or disadvantages of multivariate methods generally? (no more than 300 words)

Question 5 :

Explain your understanding of eigenvectors and eigenvalues (your answer must be in your own words and will be checked using a plagiarism checker) (no more than 300 words)
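A small numeric check can make the defining relation A v = λ v concrete. The 2×2 matrix below is chosen purely for illustration and is not part of the assignment data; the helper `matvec` is a hypothetical name. Python is used for illustration only.

```python
# Small symmetric matrix chosen for illustration; not from the assignment data.
A = [[2.0, 1.0],
     [1.0, 2.0]]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


# Candidate (eigenvalue, eigenvector) pairs for A.
pairs = [(3.0, [1.0, 1.0]), (1.0, [1.0, -1.0])]

for lam, v in pairs:
    lhs = matvec(A, v)          # A applied to the vector
    rhs = [lam * x for x in v]  # the same vector scaled by the eigenvalue
    print(lam, v, lhs == rhs)
```

For each pair, multiplying by A only rescales the vector without changing its direction, which is the property the question asks you to explain in your own words.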

Question 6 :

Based on the Parallel Analysis table below, how many factors would you interpret? Explain your answer.

Factor   Actual eigen value   95th percentile
1        2.45                 1.99
2        1.98                 1.89
3        1.13                 1.14
4        1.02                 1.08
5        0.89                 1.03
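One commonly used reading of a parallel analysis table is to keep factors, in order, for as long as the actual eigenvalue exceeds the 95th percentile of eigenvalues from random data. The sketch below applies that rule to the table above; the rule is an assumption here, and the course may specify a different criterion. Python is used for illustration only.

```python
# (factor, actual eigenvalue, 95th percentile) from the table above.
table = [
    (1, 2.45, 1.99),
    (2, 1.98, 1.89),
    (3, 1.13, 1.14),
    (4, 1.02, 1.08),
    (5, 0.89, 1.03),
]

retained = []
for factor, actual, threshold in table:
    if actual <= threshold:
        break  # stop at the first factor that does not beat the random-data threshold
    retained.append(factor)

print("factors retained under this rule:", retained)
```

Note how close factor 3 is to its threshold (1.13 against 1.14); a written answer would normally comment on that margin rather than report the count alone.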

 

 

 


Please read this document fully and carefully. There are two parts to this assessment. Part A is the project analysis of data as described in detail on pages 2-4 of this document. Your submission will be no more than 6-9 pages for Part A. Part B is the completion of a set of 4 questions on page 5-6 of this document. Additionally, you will submit one R script file for your work in Part A.
