Examine multiple variation parameters for a genomic region, Biology

  1. Determine SNP variation among the aligned DNAs for a genomic region. See below for how to count SNP variation. The output file (Your_name_snp.txt) should have two columns of numbers. The first column gives the total number of SNP sites per species, and the second gives the percentage of sequences/species that have that same number of variant nucleotides.
  2. Determine in-del variation among the aligned DNAs for a genomic region. The output file (Your_name_in_del.txt) should have two columns of numbers. The first column gives the total number of in-del sites per species, and the second gives the percentage of sequences/species that have that same number of in-dels.
  3. Determine overall variation (SNPs and in-dels) among the aligned DNAs for a genomic region. The output file (Your_name_both.txt) should have two columns of numbers. The first column gives the total number of variant sites (SNP and in-del) per species, and the second gives the percentage of sequences/species that have that same number of variant sites. This will generate the same data used for the figure on page 3.
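
The two-column output described in the three tasks can be sketched as follows. This is a minimal sketch, not the assignment's official solution: the function names (count_distribution, format_rows, write_distribution) are hypothetical, the tab separator and two-decimal percentages are assumptions, and the example counts are simply the Seq1 row of the worked example in this post.

```python
from collections import Counter


def count_distribution(counts):
    """Turn a list of per-sequence counts into sorted (count, percent) rows."""
    tally = Counter(counts)
    total = len(counts)
    return [(n, 100.0 * k / total) for n, k in sorted(tally.items())]


def format_rows(rows):
    """Format rows as two tab-separated columns (separator is an assumption)."""
    return "\n".join(f"{n}\t{pct:.2f}" for n, pct in rows)


def write_distribution(path, counts):
    """Write the two-column table to a file such as Your_name_snp.txt."""
    with open(path, "w") as handle:
        handle.write(format_rows(count_distribution(counts)) + "\n")


# Example: changes found when Seq1 is compared with Seq1..Seq5
rows = count_distribution([0, 3, 4, 7, 8])
print(format_rows(rows))
```

The same helper would serve all three output files; only the list of counts passed in (SNPs, in-dels, or both) changes.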

Sample Alignment: 48 bases; differences are highlighted

Seq1      ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC

Seq2      AAAAATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC

Seq3      AAAAATGCATGCATGCA-GCATGCATGCATGCATGCATGCATGCATGC

Seq4      AAAAATGCATGCATGCA-GCATGCATGCATTTTTGCATGCATGCATGC

Seq5      AAAAATGCATGCATGCA-GCATGCATGCATTTTTGCAT-CATGCATGC

Computation: Compare Seq1 to Seq2, Seq3, Seq4, and Seq5 and count the differences (SNPs and in-dels).

Seq1:Seq1 = 0 changes

Seq1:Seq2 = 3 changes

Seq1:Seq3 = 4 changes

Seq1:Seq4 = 7 changes

Seq1:Seq5 = 8 changes
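
A minimal sketch of how one such comparison might be coded, splitting the changes into SNPs and in-dels so that all three tasks can reuse it. The function name count_changes is hypothetical, and two conventions are assumed here rather than stated in the assignment: each '-' column counts as one in-del site, and a column where both sequences have '-' counts as no change.

```python
def count_changes(a, b, gap="-"):
    """Return (snps, indels) between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    snps = indels = 0
    for x, y in zip(a, b):
        if x == y:
            continue            # identical column (including gap vs gap)
        if x == gap or y == gap:
            indels += 1         # assumption: one gap column = one in-del site
        else:
            snps += 1           # two different bases = one SNP site
    return snps, indels


seq1 = "ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC"
seq3 = "AAAAATGCATGCATGCA-GCATGCATGCATGCATGCATGCATGCATGC"
seq5 = "AAAAATGCATGCATGCA-GCATGCATGCATTTTTGCAT-CATGCATGC"

snps, indels = count_changes(seq1, seq5)
print(snps, indels, snps + indels)
```

For Seq1:Seq5 the SNP and in-del counts add up to the 8 changes listed above; for Seq1:Seq3 they add up to 4.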

Repeat, using each of the other sequences in turn as the basis for comparison:

Seq2:Seq1 = 3 changes                  Seq3:Seq1 = 4 changes

Seq2:Seq2 = 0 changes                  Seq3:Seq2 = 1 change

Seq2:Seq3 = 1 change                   Seq3:Seq3 = 0 changes

Seq2:Seq4 = 4 changes                  Seq3:Seq4 = 3 changes

Seq2:Seq5 = 5 changes                  Seq3:Seq5 = 4 changes

Seq4:Seq1 = 7 changes                  Seq5:Seq1 = 8 changes

Seq4:Seq2 = 4 changes                  Seq5:Seq2 = 5 changes

Seq4:Seq3 = 3 changes                  Seq5:Seq3 = 4 changes

Seq4:Seq4 = 0 changes                  Seq5:Seq4 = 1 change

Seq4:Seq5 = 1 change                   Seq5:Seq5 = 0 changes
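
The full set of all-against-all comparisons listed above can be reproduced with a short script. This is only a sketch: the names SEQS, n_changes, and matrix are hypothetical, and it counts every differing column (SNP or in-del) as one change, which matches the sample numbers.

```python
# The five sequences of the sample alignment
SEQS = {
    "Seq1": "ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC",
    "Seq2": "AAAAATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC",
    "Seq3": "AAAAATGCATGCATGCA-GCATGCATGCATGCATGCATGCATGCATGC",
    "Seq4": "AAAAATGCATGCATGCA-GCATGCATGCATTTTTGCATGCATGCATGC",
    "Seq5": "AAAAATGCATGCATGCA-GCATGCATGCATTTTTGCAT-CATGCATGC",
}


def n_changes(a, b):
    """Count aligned columns where the two sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


names = list(SEQS)
# matrix[i][j] = number of changes when names[i] is compared with names[j]
matrix = [[n_changes(SEQS[r], SEQS[c]) for c in names] for r in names]

for name, row in zip(names, matrix):
    print(name, *row, sep="\t")
```

Each row of the matrix is one "basis for comparison"; the matrix is symmetric, so only half of the comparisons strictly need to be computed.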

 

Our input file is a FASTA-format file containing all of the sequences/species, previously aligned and trimmed. The file contains some odd characters, so we will have to deal with them.
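
One possible way to read such a file is sketched below. The post does not say what the odd characters are, so the clean-up rule here is purely an assumption: sequence letters are upper-cased, surrounding whitespace and line endings are stripped, and any character outside A, C, G, T, N and '-' is replaced with N so that the alignment columns do not shift. The function name read_fasta and the allowed parameter are hypothetical.

```python
def read_fasta(lines, allowed="ACGTN-"):
    """Parse FASTA lines into {name: sequence}, cleaning unexpected characters."""
    records = {}
    name = None
    for raw in lines:
        line = raw.strip()                    # drops "\n", "\r", stray spaces
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Use the first word of the header; fall back to a numbered name
            name = header[0] if header else f"seq{len(records) + 1}"
            records[name] = []
        elif name is not None:
            # Assumption: replace (not delete) odd characters to keep columns aligned
            cleaned = "".join(c if c in allowed else "N" for c in line.upper())
            records[name].append(cleaned)
    return {n: "".join(parts) for n, parts in records.items()}


sample = [">Seq1 demo\n", "ATGC\r\n", "at?c\n", ">Seq2\n", "A-G*\n"]
print(read_fasta(sample))
```

To read a real file, the open file handle can be passed directly, for example read_fasta(open("alignment.fasta")), where the file name is a placeholder. Whether the odd characters should become N, become gaps, or be dropped depends on what they turn out to be in the actual input.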

Posted Date: 2/25/2013 7:29:13 AM | Location : United States






