Write a function that has as input two row numbers

Reference no: EM131918872

Question - The city of Pittsburgh, Pennsylvania, lies where three rivers, the Allegheny, Monongahela and Ohio, meet. It has long been important to build bridges there, to enable its residents to cross the rivers safely. See the Wikipedia page "List of bridges of Pittsburgh" for a listing (with pictures) of the bridges. The data contain details for a large number of past and present bridges in Pittsburgh. All the variables we will use are categorical.

Here they are:

  • id identifying the bridge (we ignore)
  • river: initial letter of river that the bridge crosses
  • location: a numerical code indicating the location within Pittsburgh (we ignore)
  • erected: time period in which the bridge was built (a name, from CRAFTS, earliest, to MODERN, most recent).
  • purpose: what the bridge carries: foot traffic ("walk"), water (aqueduct), road or railroad.
  • length categorized as long, medium or short.
  • lanes of traffic (or number of railroad tracks): a number, 1, 2, 4 or 6, that we will count as categorical.
  • clear_g: whether a vertical navigation requirement was included in the bridge design (that is, ships of a certain height had to be able to get under the bridge). I think G means "yes".
  • t_d: method of construction. DECK means the bridge deck is on top of the construction, THROUGH means that when you cross the bridge, some of the bridge supports are next to you or above you.
  • material the bridge is made of: iron, steel or wood.
  • span: whether the bridge covers a short, medium or long distance.
  • rel_l: Relative length of the main span of the bridge (between the two central piers) to the total crossing length. The categories are S, S-F and F. I don't know what these mean.
  • type of bridge: wood, suspension, arch and three types of truss bridge: cantilever, continuous and simple.

The website SteelConstruction is an excellent source of information about bridges.

(a) The bridges are stored in CSV format. Some of the information is not known and was recorded in the spreadsheet as ?. Turn these into genuine missing values by adding na="?" to your file-reading command. Display some of your data, enough to see that you have some missing data.
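A minimal sketch of part (a), under two assumptions: the real CSV file is not available here, so a few invented rows are read from an inline string, and base R is used instead of readr. The `na="?"` in the question is the argument name used by readr's `read_csv`; the base R equivalent is `na.strings = "?"`.

```r
# Hypothetical stand-in for the bridges CSV; these rows are invented.
csv_text <- "id,river,length,material
E1,M,?,WOOD
E2,A,MEDIUM,WOOD
E3,A,?,IRON
E4,O,SHORT,STEEL"

# Base R uses na.strings; with readr it would be read_csv(file, na = "?").
bridges <- read.csv(text = csv_text, na.strings = "?")

print(bridges)                  # the "?" entries now show as NA
print(colSums(is.na(bridges)))  # number of missing values per column
```

With the real file, `text = csv_text` would be replaced by the file name.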

(b) The R function complete.cases takes a data frame as input and returns a vector of TRUE or FALSE values. Each row of the data frame is checked to see whether it is "complete" (has no missing values), in which case the result is TRUE, or not (has one or more missing values), in which case the result is FALSE. Add a new column called is_complete to your data frame that indicates whether each row is complete. Save the result, and then display (some of) your length column along with your new column. Do the results make sense?
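One way part (b) might look, sketched on a small invented data frame (the column values are assumptions, not the real data). A row is flagged FALSE as soon as any one of its columns is missing, so a row can be incomplete even when its length is known.

```r
# Invented toy data with missing values in two different columns.
bridges <- data.frame(
  id       = c("E1", "E2", "E3", "E4"),
  length   = c(NA, "MEDIUM", NA, "SHORT"),
  material = c("WOOD", "WOOD", "IRON", NA)
)

# TRUE only for rows with no NA in any column.
bridges$is_complete <- complete.cases(bridges)

# Row 4 has a known length but a missing material, so it is not complete.
print(bridges[, c("length", "is_complete")])
```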

(c) Create the data frame that will be used for the analysis by picking out only those rows that have no missing values. (Use what you have done so far to help you.)
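A sketch of part (c) on the same kind of invented toy data: the logical column from part (b) is used directly to pick out the complete rows. The name `bridges_complete` is a hypothetical choice.

```r
# Invented toy data, as in the previous sketch.
bridges <- data.frame(
  id       = c("E1", "E2", "E3", "E4"),
  length   = c(NA, "MEDIUM", "LONG", "SHORT"),
  material = c("WOOD", "WOOD", "IRON", NA)
)
bridges$is_complete <- complete.cases(bridges)

# Keep only the rows flagged as complete.
bridges_complete <- bridges[bridges$is_complete, ]
print(bridges_complete)
```

With dplyr loaded, `filter(bridges, is_complete)` would do the same job.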

(d) We are going to assess the dissimilarity between two bridges by the number of the categorical variables they disagree on. This is called a "simple matching coefficient", and is the same thing we did in the question about clustering fruits based on their properties. This time, though, we want to count matches in things that are rows of our data frame (properties of two different bridges), so we will need to use a strategy like the one I used in calculating the Bray-Curtis distances.

First, write a function that takes as input two vectors v and w and counts the number of their entries that differ (comparing the first with the first, the second with the second, ..., the last with the last). I can think of a quick way and a slow way, but either way is good. To test your function, create two vectors (using c) of the same length, and see whether it correctly counts the number of corresponding values that are different.
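A possible sketch of both approaches; the function names `count_diff` and `count_diff_slow` are hypothetical. The quick way compares the vectors elementwise and sums the resulting TRUE values; the slow way loops over positions.

```r
# Quick way: v != w is a logical vector, and sum() counts its TRUEs.
count_diff <- function(v, w) {
  stopifnot(length(v) == length(w))
  sum(v != w)
}

# Slow way: step through the positions and count mismatches one by one.
count_diff_slow <- function(v, w) {
  stopifnot(length(v) == length(w))
  n_diff <- 0
  for (i in seq_along(v)) {
    if (v[i] != w[i]) n_diff <- n_diff + 1
  }
  n_diff
}

v <- c("a", "b", "c", "d")
w <- c("a", "x", "c", "y")
print(count_diff(v, w))       # 2: positions 2 and 4 differ
print(count_diff_slow(v, w))  # 2
```

Neither version handles missing values, which is fine here because the rows with missing values are removed before the comparison.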

(e) Write a function that has as input two row numbers and a data frame to take those rows from. The function needs to select all the columns except for id, location and is_complete, select the rows required one at a time, and turn them into vectors. (There may be some repetitiousness here. That's OK.) Then those two vectors are passed into the function you wrote in the previous part, and the count of the number of differences is returned. This is like the code in the Bray-Curtis problem. Test your function on rows 3 and 4 of your bridges data set (with the missings removed).
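A base R sketch of part (e), run on invented rows rather than the real bridges (so it does not reproduce the check on rows 3 and 4). The names `row_diff` and `count_diff` are hypothetical. Each selected row is converted to a character vector so that text and numeric columns can be compared in one go.

```r
# Count positions where two equal-length vectors differ (as in part (d)).
count_diff <- function(v, w) sum(v != w)

# Differences between rows i and j of d, ignoring the bookkeeping columns.
row_diff <- function(i, j, d) {
  keep <- setdiff(names(d), c("id", "location", "is_complete"))
  # Turn each one-row selection into a character vector, column by column.
  v <- vapply(d[i, keep], as.character, character(1))
  w <- vapply(d[j, keep], as.character, character(1))
  count_diff(v, w)
}

# Invented toy data; only river, purpose and lanes are compared.
bridges <- data.frame(
  id          = c("E1", "E2", "E3"),
  river       = c("M", "A", "A"),
  location    = c(3, 25, 39),
  purpose     = c("HIGHWAY", "HIGHWAY", "AQUEDUCT"),
  lanes       = c(2, 2, 1),
  is_complete = TRUE
)

print(row_diff(1, 2, bridges))  # 1: only river differs
print(row_diff(2, 3, bridges))  # 2: purpose and lanes differ
```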

There should be six variables that are different.

(f) Create a matrix or data frame of pairwise dissimilarities between each pair of bridges (using only the ones with no missing values). Use loops, or crossing and map2_int, as you prefer. Display the first six rows of your matrix (using head) or the first few rows of your data frame. (The whole thing is big, so don't display it all.)
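A sketch of the loop version of part (f) on invented data. The helper `row_diff` is a hypothetical name for the function from part (e), repeated here so the example runs on its own.

```r
# Helper from part (e): number of compared columns on which rows i and j differ.
row_diff <- function(i, j, d) {
  keep <- setdiff(names(d), c("id", "location", "is_complete"))
  v <- vapply(d[i, keep], as.character, character(1))
  w <- vapply(d[j, keep], as.character, character(1))
  sum(v != w)
}

# Invented toy data.
bridges <- data.frame(
  id          = c("E1", "E2", "E3"),
  river       = c("M", "A", "A"),
  location    = c(3, 25, 39),
  purpose     = c("HIGHWAY", "HIGHWAY", "AQUEDUCT"),
  lanes       = c(2, 2, 1),
  is_complete = TRUE
)

# Fill an n-by-n matrix, one pair of rows at a time.
n <- nrow(bridges)
m <- matrix(0L, nrow = n, ncol = n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    m[i, j] <- row_diff(i, j, bridges)
  }
}

print(head(m))  # symmetric, with zeros on the diagonal
```

The matrix is symmetric, so only half of the pairs strictly need computing; the full double loop is kept here for clarity.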

(g) Turn your matrix or data frame into a dist object. Do not display your distance object.
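For part (g), a matrix of dissimilarities can be converted with `as.dist`. The sketch below uses a small invented matrix; the size and class are checked instead of printing the object.

```r
# Invented symmetric dissimilarity matrix for three bridges.
m <- matrix(c(0, 1, 3,
              1, 0, 2,
              3, 2, 0), nrow = 3, byrow = TRUE)

# as.dist keeps the lower triangle and marks the result as a "dist" object.
d <- as.dist(m)

print(class(d))         # "dist"
print(attr(d, "Size"))  # 3
```

If the dissimilarities are held in a long data frame (one row per pair) rather than a matrix, they would first need reshaping into a square matrix before this step.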

(h) Run a cluster analysis using Ward's method, and display a dendrogram. The labels for the bridges (rows of the data frame) may come out too big; experiment with a cex less than 1 on the plot so that you can see them.
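A sketch of part (h) on an invented five-bridge dissimilarity matrix. It assumes `method = "ward.D"`; R's `hclust` also offers `"ward.D2"`, and which of the two is intended is not stated in the question.

```r
# Invented dissimilarities between five bridges, labelled E1 to E5.
labs <- paste0("E", 1:5)
m <- matrix(c(0, 1, 4, 5, 6,
              1, 0, 3, 5, 6,
              4, 3, 0, 2, 5,
              5, 5, 2, 0, 4,
              6, 6, 5, 4, 0),
            nrow = 5, byrow = TRUE, dimnames = list(labs, labs))
d <- as.dist(m)

# Hierarchical clustering with Ward's method (assumed variant: ward.D).
fit <- hclust(d, method = "ward.D")

# cex below 1 shrinks the leaf labels so they do not overlap.
plot(fit, cex = 0.7)
```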

(i) How many clusters do you think is reasonable for these data? Draw them on your plot.
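Once a number of clusters has been chosen, `rect.hclust` can draw boxes around them on an existing dendrogram. In this sketch the data are invented and `k = 2` is an arbitrary choice for illustration, not a recommendation for the real bridges.

```r
# Invented dissimilarities between five bridges.
labs <- paste0("E", 1:5)
m <- matrix(c(0, 1, 4, 5, 6,
              1, 0, 3, 5, 6,
              4, 3, 0, 2, 5,
              5, 5, 2, 0, 4,
              6, 6, 5, 4, 0),
            nrow = 5, byrow = TRUE, dimnames = list(labs, labs))
fit <- hclust(as.dist(m), method = "ward.D")

# The dendrogram must be plotted first; rect.hclust draws on top of it.
plot(fit, cex = 0.7)
boxes <- rect.hclust(fit, k = 2)  # k = 2 is an assumed value
```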

(j) Pick three bridges in the same one of your clusters (it doesn't matter which three bridges or which cluster). Display the data for these bridges. Does it make sense that these three bridges ended up in the same cluster? Explain briefly.
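To see which bridges share a cluster, `cutree` returns a cluster number for each row, which can then be used to subset the data. The sketch below uses invented data and an assumed `k = 2`.

```r
# Invented bridge data and dissimilarities (rows in the same order).
bridges <- data.frame(
  id       = paste0("E", 1:5),
  river    = c("M", "M", "A", "A", "O"),
  material = c("WOOD", "WOOD", "IRON", "IRON", "STEEL")
)
m <- matrix(c(0, 1, 4, 5, 6,
              1, 0, 3, 5, 6,
              4, 3, 0, 2, 5,
              5, 5, 2, 0, 4,
              6, 6, 5, 4, 0), nrow = 5, byrow = TRUE)
fit <- hclust(as.dist(m), method = "ward.D")

# One cluster number per bridge, in row order.
cluster <- cutree(fit, k = 2)  # k = 2 is an assumed value

# Show every bridge in the same cluster as the first bridge.
print(bridges[cluster == cluster[1], ])
```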

Complete parts (d), (e), (f) and (g) of Question 8, giving both the R code and its output.


