Write Perl code to process and analyze the sequence data
Course: Programming Languages

1. Download and decompress the sequence data of chromosome 22.

2. Write Perl code to process and analyze the downloaded sequence data file.
 a. Read in the data from the file.
 b. Use a regular expression to extract the sequence from the file.
 c. Remove non-ATGC characters from the sequence
 d. Extract all the open reading frames (ORFs) from the whole sequence.
     - An ORF is a part of a DNA sequence that has the potential to be translated into protein.
     - Its length should be a multiple of 3.
     - ORFs are defined as subsequences that begin with the start codon 'ATG' and end with one of the three stop codons 'TAA', 'TAG' or 'TGA'. Each codon consists of three nucleotides.
     - Excluding the start and stop codons, each ORF extracted here should contain 30 to 90 nucleotides.
     - Only one stop codon is allowed in each ORF: the one that ends it.
 e. Print a message to the screen showing how many open reading frames were found.
 f. Translate each ORF into amino acid sequence using the subroutines provided.
 g. Write all ORFs found to a new data file.
 h. Write all translated amino acid sequences to another new data file.
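Steps 2.b to 2.e can be sketched in Perl roughly as follows. This is one possible approach under stated assumptions, not the expected solution: it assumes the file holds a single FASTA record (one '>' header line followed by sequence lines), reads "only one stop codon" as "no in-frame stop codon before the final one", and uses a zero-width lookahead so that overlapping and nested ORFs are all reported. The subroutine names extract_sequence, clean_sequence and find_orfs are hypothetical.

```perl
use strict;
use warnings;

# Step b: take everything after the first ('>') header line.
# Assumption: the file contains a single FASTA record.
sub extract_sequence {
    my ($fasta_text) = @_;
    my ($seq) = $fasta_text =~ /\A>[^\n]*\n(.*)\z/s
        or die "No FASTA header line found\n";
    return $seq;
}

# Step c: delete every character that is not A, T, G or C
# (newlines, runs of N and anything else).
sub clean_sequence {
    my ($seq) = @_;
    $seq =~ s/[^ATGC]//g;
    return $seq;
}

# Step d: ATG, then 10 to 30 codons that are not stop codons (30-90 nt),
# then exactly one stop codon. The zero-width lookahead (?=...) lets the
# regex report ORFs that overlap or are nested inside one another.
sub find_orfs {
    my ($seq) = @_;
    my $stop = qr/TAA|TAG|TGA/;
    my @orfs = $seq =~ /(?=(ATG(?:(?!$stop)[ATGC]{3}){10,30}$stop))/g;
    return @orfs;
}

# Step e on a tiny made-up record (not real chromosome 22 data).
my $demo_fasta = ">demo\nCCATG" . ("GCC" x 10) . "TAANN\nATGGCCTGA\n";
my @demo_orfs  = find_orfs(clean_sequence(extract_sequence($demo_fasta)));
print "Found ", scalar(@demo_orfs), " open reading frames.\n";
```

A plain m/.../g without the (?=...) wrapper would instead return only non-overlapping ORFs; the assignment does not say which behaviour is wanted. If the file uses lowercase letters for masked bases, applying uc before clean_sequence keeps those bases instead of discarding them.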

Important notes:
     Steps 2.a through 2.h should be done in one .pl file.
     Please use the subroutines provided to perform the translation.
     When I run your program, I expect to enter the data file name at the screen prompt. An example is shown below.

[Figure: example of entering the data file name at the prompt]
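For step 2.a and the requirement to enter the file name at the screen, a minimal sketch might look like this. The prompt wording and the subroutine names prompt_for_filename and read_file are assumptions; the filehandles are passed in only so the code can be tried without typing at a terminal.

```perl
use strict;
use warnings;
use IO::Handle;

# Print a prompt and read one line (the file name) from the user.
# The filehandles are parameters so the sub can be tried without a terminal;
# in the real script pass \*STDIN and \*STDOUT.
sub prompt_for_filename {
    my ($in, $out) = @_;
    print {$out} "Please enter the data file name: ";
    $out->flush;    # make sure the prompt appears before we wait for input
    my $name = <$in>;
    die "No file name entered\n" unless defined $name;
    chomp $name;
    return $name;
}

# Step a: read the whole file into one string.
sub read_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!\n";
    local $/;    # undefined record separator = slurp mode
    my $text = <$fh>;
    close $fh;
    return $text;
}

# In the real script (commented out so this sketch does not wait for input):
# my $name       = prompt_for_filename(\*STDIN, \*STDOUT);
# my $fasta_text = read_file($name);
```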

Your output files in steps 2.g and 2.h should be created automatically. Below is an example of one of these output files.

[Figure: example output file]
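Steps 2.f to 2.h could be wired together as below. The assignment requires the provided translation subroutines, which are not reproduced here, so translate_orf is a hypothetical stand-in built from the standard genetic code (stop codons become '*', unknown codons 'X') purely so the sketch runs; use the provided subroutines in the real submission. write_lines and the file names in the usage comment are also hypothetical.

```perl
use strict;
use warnings;

# HYPOTHETICAL stand-in for the provided translation subroutines, using the
# standard genetic code. Codons are enumerated in T, C, A, G order, so the
# 64-letter string lists the amino acid for TTT, TTC, TTA, ... GGG.
my @bases      = qw(T C A G);
my $aa_letters = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_to_aa;
my $index = 0;
for my $n1 (@bases) {
    for my $n2 (@bases) {
        for my $n3 (@bases) {
            $codon_to_aa{"$n1$n2$n3"} = substr($aa_letters, $index++, 1);
        }
    }
}

# Step f: translate codon by codon; '*' marks a stop, 'X' an unknown codon.
sub translate_orf {
    my ($orf) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($orf); $i += 3) {
        $protein .= $codon_to_aa{ substr($orf, $i, 3) } // 'X';
    }
    return $protein;
}

# Steps g and h: one entry per line, the same idea as the support
# information's  print MFILE "$_\n" for @ORFs;  but with a lexical filehandle.
sub write_lines {
    my ($path, @lines) = @_;
    open my $fh, '>', $path or die "Cannot write '$path': $!\n";
    print {$fh} "$_\n" for @lines;
    close $fh or die "Cannot close '$path': $!\n";
    return scalar @lines;
}

# Example wiring (file names are hypothetical):
# my @proteins = map { translate_orf($_) } @orfs;
# write_lines('orfs.txt', @orfs);
# write_lines('amino_acids.txt', @proteins);
```

Opening each output file with '>' creates it if it does not exist, which covers the requirement that the output files be created automatically.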

Support information:

1. To write an array to a file with each entry on its own line, use the following statement:

print MFILE "$_\n" for @ORFs;

MFILE is the filehandle of the output file.
@ORFs is the array.

2. If you have groups in your pattern:
my $string = "TTTATGTGCTGCTAAAAA";
my @matches = $string =~ m/(ATG).*(TAA)/g;

=> With parentheses () surrounding the subpatterns "ATG" and "TAA", m//g in list context returns only the substrings captured by those groups, not the whole match.
=> The values in @matches will be ("ATG", "TAA").
=> If you add ?: at the front of each group, as in m/(?:ATG).*(?:TAA)/g, the groups no longer capture, so the whole matched substring is returned instead.
=> In this case the output will be ("ATGTGCTGCTAA").
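The behaviour described above can be checked with a short script. The third pattern, which wraps the whole expression in one more pair of parentheses, is an extra illustration (not part of the original notes) of how to get the whole match and the group captures together.

```perl
use strict;
use warnings;

my $string = "TTTATGTGCTGCTAAAAA";

# Capturing groups: list-context m//g returns only the captured groups.
my @with_groups = $string =~ m/(ATG).*(TAA)/g;

# Non-capturing groups (?:...): no captures, so the whole match is returned.
my @without_groups = $string =~ m/(?:ATG).*(?:TAA)/g;

# Extra outer parentheses: the whole match is captured too.
my @both = $string =~ m/((ATG).*(TAA))/g;

print "with groups:    @with_groups\n";
print "without groups: @without_groups\n";
print "both:           @both\n";
```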

Download the sequence data of chromosome 22:

https://www.dropbox.com/s/6v5whj22boa3kur/hs_ref_GRCh38.p7_chr22.fa.gz?dl=0
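If you prefer to do the decompression in step 1 from Perl rather than by running gunzip in a shell, the core module IO::Uncompress::Gunzip can do it. This is a minimal sketch; decompress_gz is a hypothetical helper, and the usage line assumes the file was saved under its original name.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Decompress a .gz file to a plain file; dies with the module's error message.
sub decompress_gz {
    my ($gz_path, $out_path) = @_;
    gunzip($gz_path => $out_path)
        or die "gunzip failed: $GunzipError\n";
    return $out_path;
}

# Usage, assuming the download kept its original file name:
# decompress_gz('hs_ref_GRCh38.p7_chr22.fa.gz', 'hs_ref_GRCh38.p7_chr22.fa');
```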



