Write Perl code to process and analyze the sequence data
Course: Programming Languages
Reference No.: EM131469714





1. Download and decompress the sequence data of chromosome 22.

2. Write Perl code to process and analyze the downloaded sequence data file.
 a. Read in the data from the file.
 b. Use a regular expression to extract the sequence from the file.
 c. Remove non-ATGC characters from the sequence.
 d. Extract all the open reading frames (ORFs) from the whole sequence.
     - An ORF is a part of a DNA sequence that has the potential to be translated.
     - Its length should be a multiple of 3.
     - ORFs are defined as subsequences that begin with the start codon 'ATG' and end with any of the three stop codons 'TAA', 'TAG' and 'TGA'. Each codon consists of three nucleotides.
     - In addition to the start and stop codons, ORFs extracted here should have 30-90 nucleotides.
     - Only one stop codon is allowed in each ORF.
 e. Print a message on the screen showing how many open reading frames were found.
 f. Translate each ORF into an amino acid sequence using the subroutines provided.
 g. Write all found ORFs to a new data file.
 h. Write all translated amino acid sequences to another new data file.
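Steps 2.a-2.c can be sketched in Perl as follows. This is a minimal sketch, not a reference solution. The subroutine name read_sequence is hypothetical, and the code assumes a plain FASTA file whose header lines start with '>'. It also assumes lowercase bases should be kept, so it uppercases the text before deleting every non-ATGC character. Drop the uc call if lowercase bases are meant to be discarded.

```perl
use strict;
use warnings;

# Hypothetical helper for steps 2.a-2.c: read a FASTA file and return one
# cleaned DNA string. Assumes header lines start with '>'.
sub read_sequence {
    my ($path) = @_;

    # 2.a: read the whole file at once (slurp mode)
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    local $/;
    my $text = <$fh>;
    close $fh;
    $text = '' unless defined $text;

    # 2.b: a regular expression removes the header line(s), leaving the sequence
    $text =~ s/^>.*$//mg;

    # 2.c: uppercase (assumption: lowercase bases are kept), then delete
    # every character that is not A, T, G or C (newlines, N, ...)
    my $seq = uc $text;
    $seq =~ tr/ATGC//cd;
    return $seq;
}
```

Because tr///cd deletes the complement of ATGC, newlines and runs of N disappear in the same pass.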

Important notes:
    2.a - 2.i should be done in one .pl file.
    Please use the subroutines provided to perform the translation.
    When I run your program, I expect to enter the data file name on the screen. An example is shown below.

[Figure: example of entering the data file name on the screen]
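A minimal sketch of reading the file name from the screen. The subroutine name prompt_for_filename and the prompt wording are assumptions, since the example figure is not reproduced here. The input and output handles are parameters only so that the sub can be tested without a keyboard.

```perl
use strict;
use warnings;

# Hypothetical helper: print a prompt, read one line, return the file name.
# Defaults to STDIN/STDOUT; other handles can be passed in for testing.
sub prompt_for_filename {
    my ($in, $out) = @_;
    $in  //= \*STDIN;
    $out //= \*STDOUT;
    print {$out} "Please enter the data file name: ";
    my $name = <$in>;
    die "No file name entered\n" unless defined $name;
    chomp $name;                   # drop the trailing newline
    $name =~ s/^\s+|\s+$//g;       # tolerate stray spaces around the name
    return $name;
}
```

In the main script, my $file = prompt_for_filename(); prints the prompt and reads the name from the keyboard.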

The output files in steps 2.h and 2.i should be created automatically. Below is an example of the 2.h output file.

[Figure: example of the 2.h output file]
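The output files can be created automatically by opening them for writing. Mode '>' creates a missing file and overwrites an existing one. Below is a minimal sketch of the print-one-entry-per-line idiom from the support information, using a lexical filehandle and error checks. The subroutine name write_lines and the file names in the usage line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: write one array element per line to a new file,
# creating it if needed and overwriting it otherwise.
sub write_lines {
    my ($path, @lines) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!\n";
    print {$out} "$_\n" for @lines;
    close $out or die "Cannot close $path: $!\n";
    return scalar @lines;          # number of lines written
}
```

For example, write_lines('ORFs.txt', @ORFs) and write_lines('proteins.txt', @proteins) would produce the two output files.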

Support information:

1. To write an array to a file with one entry per line, use the following command:

print MFILE "$_\n" for @ORFs;

MFILE is the handle for the output file.
@ORFs is the array.

2. If your pattern contains groups (parentheses), a match with /g in list context returns the substring captured by each group:
my $string = "TTTATGTGCTGCTAAAAA";
my @matches = $string =~ m/((ATG).*(TAA))/g;

=> The outer parentheses capture the whole match. The parentheses surrounding the subpatterns "ATG" and "TAA" cause the substrings matching "ATG" and "TAA" to be returned as well.
=> The values in @matches will be ("ATGTGCTGCTAA", "ATG", "TAA").
=> Add ?: after an opening parenthesis, as in m/(?:ATG).*(?:TAA)/g, and the substrings matching "ATG" and "TAA" will not be returned.
=> With no capturing groups left, the match returns the whole matched substring, so the output will be ("ATGTGCTGCTAA").
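Capture groups can be combined with a zero-width lookahead to pull every ORF out of the cleaned sequence (steps 2.d and 2.e). This is a sketch under stated assumptions:

- find_orfs is a hypothetical name.
- Only the forward strand is scanned.
- "30-90 nucleotides" is read as 10-30 codons between the start and stop codons.
- "Only one stop codon" is read as no in-frame stop codon before the final one.

The codon-by-codon pattern avoids the greedy .* used above, which on a whole chromosome would span huge regions. Because the capture sits inside (?=...), the match consumes nothing, so an ORF starting at an in-frame ATG inside a longer ORF is reported too.

```perl
use strict;
use warnings;

# Hypothetical helper for step 2.d: ATG, then 10-30 in-frame codons that are
# not stop codons, then exactly one stop codon. Overlapping ORFs are kept.
sub find_orfs {
    my ($seq) = @_;
    my @orfs = $seq =~ m/
        (?=                                          # zero-width, consumes nothing
            (                                        # capture the whole ORF
                ATG                                  # start codon
                (?: (?!TAA|TAG|TGA) [ATGC]{3} ){10,30}  # 30-90 nt, no stop codon
                (?:TAA|TAG|TGA)                      # the single stop codon
            )
        )
    /xg;
    return @orfs;
}
```

Step 2.e then follows directly: my @ORFs = find_orfs($seq); print "Found ", scalar(@ORFs), " open reading frames.\n";. If nested ORFs should not be counted separately, drop the lookahead and keep only the inner group so that matches cannot overlap.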

Download the sequence data of chromosome 22

https://www.dropbox.com/s/6v5whj22boa3kur/hs_ref_GRCh38.p7_chr22.fa.gz?dl=0
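If you would rather do step 1 from Perl than with a command-line tool, the core module IO::Uncompress::Gunzip can decompress the downloaded file. This is a minimal sketch: decompress_gz is a hypothetical name, and the .fa.gz file is assumed to have already been saved locally from the link above.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper for step 1: decompress a .gz file to a plain file.
# Plain scalars passed to gunzip are treated as file names.
sub decompress_gz {
    my ($gz_path, $out_path) = @_;
    gunzip($gz_path => $out_path)
        or die "gunzip failed for $gz_path: $GunzipError\n";
    return $out_path;
}
```

For example: decompress_gz('hs_ref_GRCh38.p7_chr22.fa.gz', 'hs_ref_GRCh38.p7_chr22.fa');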



