Experimental Design and Statistics Assignment

Section 1: Which test?

The Human Microbiome Project analyzed the diversity of the microbial communities that live in and on the human body by taking samples from healthy individuals and sequencing the DNA of the microbes present in different regions of the body (see image in attached file). This allowed the identification of the different taxa of bacteria present in each region, as well as quantification of the relative number of each taxon.

There are many statistical questions that can be addressed with these data. For each research question below, state the null and alternative hypotheses and the test you would use, including the variables to be tested. Be as specific as possible, including whether the test should be one- or two-tailed where appropriate.

A) Is there a difference in the number of bacterial taxa present in the saliva of men and women? Assume that the number of taxa present follows a normal distribution, with the same standard deviation in men and women.




B) Do people tend to harbor more bacterial taxa on the skin behind their ears or in their elbows? Assume the measurements for the left and right sides for each area are combined for each person.




C) An earlier paper proposed that individuals could be classified as belonging to one of three "enterotypes" based on the types of bacteria present in their gut. Is there a difference in the frequencies of the three enterotypes between meat-eaters and vegetarians?




D) The enterotype hypothesis comes partly from the observation that the distributions of frequencies of some bacterial groups across individuals are bimodal. For one of these taxa, Prevotella, people have either fairly high frequencies or nearly undetectable levels; few people have moderate frequencies. You want to test whether the frequency of Prevotella in the gut is affected by dietary fat levels, so you talk to a friend who has been doing an unrelated study in which subjects were randomly assigned to either a high- or low-fat diet. You do not know what each individual's Prevotella level was before the study began, but you can measure the current level.




Section 2: Snakes and Snails

A number of snake species in south-east Asia have evolved to prey extensively or exclusively on land snails, and have evolved special morphological features to facilitate extracting snails from their shells, including jaws with many teeth to grip the slippery, slimy beasts. Most snail shells coil to the right, so a snake with a similar asymmetry in its own morphology might have an advantage in predation.

Researchers measured the asymmetry in snake jaws across a number of snake species by counting the number of teeth on the right and left side of the jaw (R and L, respectively) and calculating an asymmetry index: 100 × (R - L)/(R + L). This index was normally distributed within species.
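
As a quick illustration of the formula above, the index can be computed directly from the two tooth counts. This is only a minimal sketch: the function name and the tooth counts in the usage line are made up for illustration and are not data from the study.

```python
def asymmetry_index(right: int, left: int) -> float:
    """Return 100 * (R - L) / (R + L) for right and left tooth counts."""
    total = right + left
    if total == 0:
        raise ValueError("at least one tooth count must be nonzero")
    return 100 * (right - left) / total


# Hypothetical counts: 12 teeth on the right, 8 on the left.
print(asymmetry_index(12, 8))  # 20.0
```

A positive value indicates more teeth on the right side, a negative value more on the left, and zero a symmetrical jaw.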

A) Why did they calculate the asymmetry index rather than just using R-L?

B) One of the snake species, Pareas iwasakii, had a mean asymmetry index in a sample of 28 snakes of 17.5, with a standard deviation of 8.5. Perform an appropriate test to determine if P. iwasakii shows significant asymmetry in tooth number. Be sure to clearly state your conclusions.

C) To test whether an asymmetrical jaw was helpful in predation against coiled snails, the researchers tested snake predation success on a number of different snails with either left- or right-handed shells. The snakes were scored based on the frequency with which they successfully extracted and ate the snails. Each snake was tested only on one type of shell. Using the data below, test whether the snakes are better at extracting snails with coils of one direction or the other. You may assume the extraction frequencies follow a normal distribution in each group.


[Data table not reproduced: Success Rate (%) for snakes tested on left-handed shells and on right-handed shells]

D) To improve the experiment above, a scientist decides to (1) test more snakes and (2) have each snake attempt to open both left- and right-handed shells. To simplify planning, she tests each snake (3) first on left-handed shells, then on right-handed shells. Describe the effects on sampling error and/or bias for each of the three modifications.
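
For questions such as part B, where only a sample mean, standard deviation, and sample size are given, a test statistic can be computed from those summary values alone. The sketch below assumes a one-sample t-test is the approach chosen and that the null value is 0; both are assumptions for illustration, as are the function name and the example numbers, which are not the values from the assignment.

```python
import math


def one_sample_t(mean: float, sd: float, n: int, mu0: float = 0.0) -> tuple[float, int]:
    """Return the t statistic and degrees of freedom for a one-sample t-test."""
    if n < 2:
        raise ValueError("need at least two observations")
    standard_error = sd / math.sqrt(n)  # standard error of the mean
    return (mean - mu0) / standard_error, n - 1


# Hypothetical summary statistics, not the values from part B.
t_stat, df = one_sample_t(mean=5.0, sd=10.0, n=25)
print(t_stat, df)  # 2.5 24
```

The resulting statistic would then be compared against a t distribution with the returned degrees of freedom, using a table or statistical software.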

Section 3: False and False

All of the following statements are false. Please correct the statement or explain the error.

A) According to the Central Limit Theorem, the larger the sample size, the closer a sample's distribution will be to the normal distribution.

B) An experiment with a larger sample size will always be more accurate, with less bias, than one with a smaller sample.

C) A scientist observed grizzly bears fishing for salmon in a stream. After the bears have left, she collects the fish carcasses and measures the jawbones of the fish to estimate their sizes. In a sample of 10 fish, she finds a mean jawbone length of 6.8 cm, with a standard deviation of 1.2 cm. Assuming jaw lengths in the population are normally distributed, her 95% confidence interval for the mean is 6.05 - 7.54 cm.

D) 6.8 cm is an unbiased estimate of the mean jaw length of the salmon in the stream.

E) In a case-control study of rates of smoking and lung cancer in Beijing, 126 of 226 smokers were found to have lung cancer, as compared to 35 lung cancer cases in a sample of 96 non-smokers. This means the odds ratio for lung cancer associated with smoking (in Beijing) is 1.53.

F) The odds ratio for lung cancer associated with smoking is much lower in China than in the United States. This suggests that lung cancer rates in China are lower than in the United States.
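
When checking a claim like the one in statement E, it can help to write out how an odds ratio is built from the four cells of a 2×2 table. The following is a minimal sketch; the function name, parameter names, and counts are hypothetical and deliberately differ from the numbers in the statement.

```python
def odds_ratio(exposed_cases: int, exposed_noncases: int,
               unexposed_cases: int, unexposed_noncases: int) -> float:
    """Return the odds ratio from the four cells of a 2x2 table."""
    odds_exposed = exposed_cases / exposed_noncases        # odds of disease if exposed
    odds_unexposed = unexposed_cases / unexposed_noncases  # odds of disease if unexposed
    return odds_exposed / odds_unexposed


# Hypothetical table: 30 cases among 50 exposed, 10 cases among 50 unexposed.
print(odds_ratio(30, 20, 10, 40))  # 6.0
```

Note that the function takes cases and non-cases in each group, not cases and group totals; dividing by the totals gives proportions rather than odds.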

Section 4: Oh, reporters!

Article - Putting a Value to 'Real' in Medical Research By NICHOLAS BAKALAR.

The first paragraph and the last one are mostly okay, though I might have some quibbles. The real trouble is in that middle paragraph.

A) Rewrite the first sentence of the second paragraph to make it accurate.

B) The last sentence of the second paragraph implies that a p-value of 0.06 indicates that a study's results "were probably due only to chance." Why is this incorrect?

C) If we actually wanted to quantify the probability that the results of an experiment were due to chance, what probability would we need to know in addition to the p-value?

Section 5: Elephant Evolution

Recently, it was discovered that African elephants, previously classified as one species, are actually two distinct species: the African forest elephant and the African savannah elephant. You want to get a sense of the rate at which differences have accumulated in DNA between the forest and savannah elephants, so you sequence 1000-base-pair segments of DNA from each of 100 genetic regions in the two species, and count the number of differences between the species. The results appear below.


[Data table not reproduced: Number of regions and Expected number of regions, by number of differences]

A) What is the mean number of differences per base-pair between the two species?

B) I have pre-calculated the expected values for the number of regions with a given number of differences. What distribution did I use? What is the null hypothesis associated with this set of expected values?

C) Perform the appropriate test of the null hypothesis and report your results.

D) By sequencing the Asian elephant and woolly mammoth, it is sometimes possible to identify whether a mutation that separates the two African species occurred in the ancestors of the forest elephants or savannah elephants. Looking at those mutations, we can then classify them by whether the mutation occurred in an amino acid coding region or between genes (intergenic). If 8 of 56 mutations that occurred in the forest elephant are from coding regions and 9 of 48 mutations in the savannah elephant are from coding regions, do the two species differ in the proportion of their mutations that occur in coding regions? Perform the appropriate test and report your results.
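
For part A, the arithmetic amounts to a weighted mean over a frequency table, divided by the segment length. The sketch below shows one way to set that up; the function name and the counts are hypothetical and are not the values from the assignment's table.

```python
def mean_differences_per_bp(region_counts: dict[int, int], segment_length: int) -> float:
    """Return mean differences per base pair.

    region_counts maps a number of differences to the number of regions
    in which that many differences were observed.
    """
    total_regions = sum(region_counts.values())
    total_differences = sum(k * n for k, n in region_counts.items())
    return total_differences / (total_regions * segment_length)


# Hypothetical table: 50 regions with 0 differences, 30 with 1, 20 with 2.
counts = {0: 50, 1: 30, 2: 20}
print(mean_differences_per_bp(counts, segment_length=1000))  # 0.0007
```

Multiplying the result by the segment length gives the mean number of differences per region.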

