Reference no: EM133191314
Genetic Data Analysis for Conservation Management and Wildlife Forensics
Assignment - Data interpretation report
Aim
The aim of this assignment is to demonstrate your understanding of genetic data analyses relevant to conservation genetics and wildlife forensics. You will achieve this by evaluating data generated in two case scenarios.
Background & Task
During this course, we explored a variety of DNA sequence and microsatellite analyses relevant to conservation genetics and wildlife forensics. We have learnt about the different methodologies for acquiring DNA sequence and microsatellite data, approaches to processing these sequence and genotypic data, the use of different software packages to conduct an array of data analyses, and how to interpret the resulting data.
Learning Outcome 1: Demonstrate the knowledge of the standard approaches to data analysis and common software packages within applied conservation genetics and wildlife DNA forensics.
Learning Outcome 2: Analyse data to address principal questions in applied conservation genetics and wildlife forensics.
Learning Outcome 3: Critically evaluate relevant numerical and graphical results.
Data interpretation report
Using the templates provided, answer all the questions for Part A and Part B. Part A is worth 50 marks and Part B is worth 50 marks. Details on word count and marks for the questions in each part are specified on the respective templates.
For this assignment, you do not need to include any bibliographic material but you might be asked to briefly justify your answers to the questions based on your learning of the course material.
Word limit for the interpretation report: 1200 (600 words part A, 600 words part B)
The assignment should be no more than 1200 words, and the word limits for each part respected. This is the maximum number of words permitted. Word counts must be stated at the beginning of your submission. Word count limits exclude the reference list, tables, figures, footnotes and appendices. All citations (references in the main body of the text) are included in the word count.
If you exceed the word count, a matched reduction in the total mark will be applied for each percentage point over the word count. For example, if you would have received a mark of 70% but have exceeded the word limit by 10%, your mark will be reduced to 60%. Penalties will be applied by the Exam Board. Your provisional mark will therefore not include the penalty.
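The penalty rule above can be checked with a short calculation. This is a minimal sketch only: the function name `penalised_mark` is hypothetical, and it assumes that fractional percentages are not rounded and that the mark cannot fall below zero, neither of which is stated in this brief.

```python
def penalised_mark(mark_percent: float, word_count: int, word_limit: int) -> float:
    """Return the mark after a matched reduction for exceeding the word limit.

    Assumption (not stated in the brief): the overrun is not rounded,
    and the penalised mark is floored at zero.
    """
    if word_count <= word_limit:
        return float(mark_percent)
    # Percentage by which the submission exceeds the limit.
    percent_over = (word_count - word_limit) * 100 / word_limit
    return max(0.0, mark_percent - percent_over)


# The example from the brief: a 70% mark, 10% over a 1200-word limit.
print(penalised_mark(70, 1320, 1200))  # 60.0
```

The Exam Board applies the actual penalty, so this only illustrates the arithmetic.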
If you have any questions about the assessment, please place them on the assessment discussion board so that others can also benefit from the answer to your question.
Discussion board contribution
As part of this assignment, you are also required to summarise and reflect on your best discussion board contribution. This can include a summary of your best contribution and a reflection on how this contribution might have prompted responses from other students. Please include details of the discussion board you are referring to (i.e. the title of the discussion board and week of teaching). The maximum word count for this part is 100 words.
Data interpretation report part A - Microsatellite data analyses
Scenario
As a conservation geneticist, you have been asked to study the genetic diversity and population structure of four populations of Gray brocket (Mazama gouazoubira) in Uruguay. Samples originated from populations near the Río Negro, an affluent of the Río de la Plata. Two populations are located on the east side, and two on the west side, of the Río Negro. Individuals sampled from the four populations have been genotyped at 16 microsatellite loci. These microsatellites were previously developed for studies of other deer or ungulates. You can find all the details of this dataset and screenshots of the results from all data analyses in the document associated with this scenario. Carefully inspect the output of the data analyses and answer all the questions for the sections below.
Marks and word count are specified for each of the sections.
Part 1. Microsatellite scoring
Examine the electropherograms included in Figure 1a and Figure 1b and answer the following questions (max. words 150, 10 marks):
1. Figure 3a illustrates the screenshot of a multiplex for one of the microsatellite panels used in the study.
• Considering the microsatellite markers highlighted by a black box, what artefacts should you be aware of during the allele scoring process? Are there any other areas in the multiplex (not highlighted with the box) where you should be aware of similar artefacts?
2. Figure 3b illustrates the electropherogram for the genotype of a locus for one individual.
• How would you score the alleles in the electropherogram? Why do the peaks not fall within the bins? Could you do anything to try to make the peaks coincide with the bins?
Part 2. HWE and LD tests
Examine all the tables with the outputs from the HWE and LD tests (Table 2a - Table 3d) and answer the following questions. Justify your answers briefly.
3. Tables 2a - 2d. Results from the HWE tests.
• What loci are not in HWE in each population?
• Would you exclude any loci from further analyses?
4. Tables 3a - 3d. Results from the LD tests.
• What loci were not in linkage equilibrium?
• Would you exclude any loci from further analyses?
Part 3. Genetic diversity analyses
Examine the tables and figure with the genetic diversity analyses outputs (Table 4a, Table 4b, and Figure 4) and answer the following questions (max. words 100, 10 marks). Justify your answers briefly.
5. What locus presented the highest diversity?
6. What population was most genetically diverse?
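As a reminder of what typical diversity tables summarise, observed and expected heterozygosity for one locus in one population can be sketched as follows. This is an illustrative sketch, not the method used to produce the tables in this assignment: the function name and the genotypes are hypothetical, and the expected heterozygosity shown is the simple uncorrected form (1 minus the sum of squared allele frequencies), without a small-sample correction.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    Expected heterozygosity is the uncorrected form: 1 - sum(p_i ** 2).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    # Observed heterozygosity: proportion of individuals with two different alleles.
    observed = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies are counted over all 2n gene copies.
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    total_copies = 2 * n
    expected = 1.0 - sum((count / total_copies) ** 2 for count in allele_counts.values())
    return observed, expected


# Hypothetical allele sizes (in base pairs) for four individuals.
example = [(100, 100), (100, 104), (104, 104), (100, 104)]
print(heterozygosity(example))  # (0.5, 0.5)
```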
Part 4. Population structure F-statistics
Examine all the tables with the population structure outputs using an F-statistics approach (Table 5a - Table 5c) and answer the following questions (max. words 100, 10 marks). Justify your answers briefly.
7. What was the overall population structure in the study area?
8. What four loci had the most resolution to assess population structure in the dataset?
9. What populations presented the highest genetic differentiation? And the lowest?
Part 5. Population structure Bayesian clustering
Examine all the figures with the outputs from STRUCTURE and Structure Harvester (Figures 5a - 5c) and answer the following questions (max 200 words, 15 marks). Justify your answers briefly.
10. What is the most likely number of genetic populations in the data set?
11. Where is the largest genetic differentiation in the study area?
12. Do these results agree with the population structure estimates using F-statistics?
13. Do these results make sense biologically?
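When choosing the number of clusters from replicate STRUCTURE runs, one widely used summary is the delta K statistic of Evanno et al. The sketch below is a simplified illustration that works on the mean Ln P(D) per K and divides the absolute second-order difference by the standard deviation across runs at that K. The function name and all likelihood values are hypothetical, and the exact calculation performed by Structure Harvester may differ in detail.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Simplified delta K per K from replicate Ln P(D) values.

    lnp_by_k: dict mapping K -> list of Ln P(D) values (two or more runs per K).
    Delta K is only defined where both K - 1 and K + 1 were also run.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue
        # Absolute second-order rate of change of the mean likelihood.
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / stdev(lnp_by_k[k])
    return result


# Hypothetical Ln P(D) values from two runs at each K.
runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -894.0],
    4: [-885.0, -895.0],
}
scores = delta_k(runs)
print(max(scores, key=scores.get))  # 2
```

A peak in delta K is a heuristic only, and it cannot evaluate K = 1, so it should be read alongside the likelihood plot and the bar plots.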
Data interpretation report part B - Sequence analysis
Scenario
You are running a wildlife DNA forensic laboratory in Africa. The investigations unit of the National Parks Authority has seized a quantity of meat that they believe is from an illegally killed wild species. The suspect in possession of the meat denies that it is bushmeat, but refuses to say what species it did come from. The investigations unit has removed a sample of the meat, submitted it as evidence to your laboratory and asked you to conduct DNA analysis of the sample to identify the species.
You decided to conduct DNA extraction, PCR amplification and mitochondrial DNA sequencing analysis. You used universal primers to target a part of the cytochrome b gene region. Your task now is to analyse the resulting sequence data and to draw conclusions on the identity of the meat.
Sequence editing (100 words)
Examine Results 1a and 1b showing the forward and reverse sequence raw reads for the test sample.
1. State the sequence position (in bases) where you would choose to trim each sequence and briefly explain why.
2. After trimming, are there any other nucleotide bases that you would consider ambiguous in either sequence? If so, identify them.
Sequence alignment (100 words)
Examine Results 2a and 2b. The forward and reverse sequences have been trimmed, the reverse sequence has been subject to reverse complementation and the two sequences have been aligned.
3. Explain the similarities and differences between the two sequences in the alignment.
4. Are you confident in the accuracy of the sequence data? Describe the key issues that may affect base calling in the two sequences.
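The reverse complementation step mentioned above, and a position-by-position comparison of the two aligned reads, can be sketched in a few lines. This is an illustration with hypothetical function names and made-up sequences, and it assumes the two reads have already been aligned to the same length.

```python
# Translation table for the four bases plus N (ambiguous base), both cases.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatch_positions(seq_a: str, seq_b: str) -> list:
    """Return 1-based positions where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        position
        for position, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=1)
        if a != b
    ]


# Made-up reads: the reverse read is shown as it comes off the sequencer.
forward = "ATGCCGTA"
reverse_read = "TACNGCAT"
print(reverse_complement(reverse_read))                               # ATGCNGTA
print(mismatch_positions(forward, reverse_complement(reverse_read)))  # [5]
```

Positions flagged this way are the ones to check against the chromatogram peaks before accepting a base call.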
Species identification using a BLASTn search of GenBank (200 words)
Examine Result 3.
5. Can you identify the species of the test sample (DK_NY23)? Provide scientific and common names and the percentage sequence divergence between the test and the top reference sample sequences.
6. What possible reasons are there for not getting a 100% sequence match?
7. The 6th and 9th highest sequence matches have different genus names, but the same species name. Which is correct, how can this happen and what are the implications for data interpretation?
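Percentage sequence divergence between two aligned sequences is commonly reported as the uncorrected proportion of differing sites, which is also 100 minus the percentage identity over the compared sites. The sketch below illustrates that calculation under stated assumptions: the function name and sequences are hypothetical, and sites with a gap or an N in either sequence are skipped, which is one convention among several.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Uncorrected percentage divergence between two aligned sequences.

    Assumption: alignment columns containing a gap ('-') or 'N' in either
    sequence are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * mismatches / compared


# Made-up 10-base alignment with one difference at the last site.
print(percent_divergence("ACGTACGTAC", "ACGTACGTAA"))  # 10.0
```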
Species identification using a phylogenetic tree approach (200 words)
Examine Result 4. Due to the divergence between the test sample sequence and the top reference sequence match, you have decided to verify your identification using a phylogenetic reconstruction (tree-building) approach.
8. Does the position of the test sample (DK_NY23) in the tree support or refute the BLASTn result (Result 3)?
9. What information is missing from the tree that may affect how it should be interpreted?
10. Before concluding on the identity of the meat sample for the investigation, what other species sequences should you consider for comparison with the test sample sequence?