Write Perl code to process and analyze the sequence data
Course:- Programming Languages
Reference No.:- EM131469714

1. Download and decompress the sequence data of chromosome 22.

2. Write Perl code to process and analyze the sequence data file downloaded.
 a. Read in the data from the file.
 b. Use a regular expression to extract the sequence from the file.
 c. Remove non-ATGC characters from the sequence.
 d. Extract all the open reading frames (ORFs) from the whole sequence.
     - An ORF is a part of a DNA sequence that has the potential to be translated.
     - Its length should be a multiple of 3.
     - ORFs are defined as subsequences that begin with the start codon 'ATG' and end with any of the three stop codons 'TAA', 'TAG', or 'TGA'. Each codon consists of three nucleotides.
     - In addition to the start and stop codons, the ORFs extracted here should contain 30-90 nucleotides.
     - Only one stop codon is allowed in each ORF.
 e. Print a message to the screen showing how many open reading frames were found.
 f. Translate each ORF into amino acid sequence using the subroutines provided.
 g. Write all found ORFs to a new data file.
 h. Write all translated amino acid sequence into another new data file.
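
Steps 2.a to 2.e can be sketched roughly as follows. This is only one possible reading of the requirements, under several assumptions: the input is FASTA-style text (header lines start with '>'), "30-90 nucleotides" is taken to mean 10 to 30 in-frame codons between the start and stop codons, and ORFs are collected left to right without overlapping. The subroutine names read_file, clean_sequence, and find_orfs are hypothetical, and the demo record is made up. Translation (step 2.f) is left to the subroutines provided with the assignment.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 2.a: read the whole file into one string (hypothetical helper).
sub read_file {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    local $/;                  # slurp mode: read everything at once
    my $text = <$in>;
    close($in);
    return $text;
}

# Steps 2.b and 2.c: keep only the sequence, then only A, T, G, C.
# Assumption: FASTA-style input where header lines start with '>'.
sub clean_sequence {
    my ($text) = @_;
    $text =~ s/^>.*$//mg;      # drop header lines
    $text = uc $text;          # treat lowercase bases like uppercase ones
    $text =~ s/[^ATGC]//g;     # remove N, newlines, and anything else
    return $text;
}

# Step 2.d: collect ORFs (assumed non-overlapping, scanned left to right).
sub find_orfs {
    my ($seq) = @_;
    my @orfs = $seq =~ /
        ( ATG                                     # start codon
          (?: (?!TAA|TAG|TGA) [ATGC]{3} ){10,30}  # 10 to 30 codons, none of them a stop
          (?: TAA | TAG | TGA )                   # the single stop codon
        )
    /gx;
    return @orfs;
}

# Demo on a tiny made-up record rather than the real chromosome file.
my $demo = ">demo record\nccATG" . ("GCT" x 10) . "TAAnnATGTAA\n";
my @ORFs = find_orfs(clean_sequence($demo));

# Step 2.e: report how many ORFs were found.
print "Found ", scalar(@ORFs), " open reading frame(s)\n";
print "$_\n" for @ORFs;
```

For the real data, the demo string would be replaced by something like read_file($file). The negative lookahead keeps any stop codon out of the interior, so each ORF ends at the first in-frame stop after its 'ATG'.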

     Steps 2.a - 2.i should be done in one .pl file.
     Please use the subroutines provided to perform the translation.
     When I run your program, I expect to enter the data file name at the screen prompt. Shown below is an example.
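
One way the file-name prompt could be handled is sketched here. The prompt wording and the helper parse_filename are illustrative assumptions, not part of the assignment.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the raw line typed by the user into a file name.
# Returns undef when nothing usable was entered.
sub parse_filename {
    my ($line) = @_;
    return unless defined $line;
    $line =~ s/^\s+|\s+$//g;            # trim the newline and surrounding spaces
    return length($line) ? $line : undef;
}

print "Enter the sequence file name: ";
my $file = parse_filename(scalar <STDIN>);

if (!defined $file) {
    print "\nNo file name given.\n";
} elsif (open(my $in, '<', $file)) {
    print "Opened $file for reading.\n";
    close($in);
} else {
    print "Cannot open '$file': $!\n";
}
```

Trimming the input matters because the line read from STDIN still carries its trailing newline, which would otherwise become part of the file name.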


The output files for steps 2.h and 2.i should be created automatically. Below is an example of the 2.h output file.


Support information:

1. To write an array to a file with one entry per line, use the following statement:

print MFILE "$_\n" for @ORFs;

MFILE is the filehandle for the output file.
@ORFs is the array.
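
For context, a minimal sketch of the statement in use, including opening and closing the MFILE handle. The file name orfs_demo.txt and the two array entries are placeholders chosen for illustration only.

```perl
use strict;
use warnings;

my @ORFs    = ('ATGGCTTAA', 'ATGCCCTGA');   # placeholder entries, not real ORFs
my $outfile = 'orfs_demo.txt';              # hypothetical output file name

# '>' creates the file if it does not exist and overwrites it if it does.
open(MFILE, '>', $outfile) or die "Cannot create $outfile: $!";
print MFILE "$_\n" for @ORFs;               # one array entry per line
close(MFILE) or die "Cannot close $outfile: $!";

print "Wrote ", scalar(@ORFs), " entries to $outfile\n";
```

Because the file is opened for writing, it is created automatically the first time the script runs.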

2. If you have groups in your pattern:
@matches = $string =~ m/^(ATG).*(TAA)$/g;

=> With parentheses () surrounding the subpatterns "ATG" and "TAA", a match in list context returns the substrings captured by the groups rather than the whole match.
=> For a string such as "ATGTGCTGCTAA", the values in @matches will be ("ATG", "TAA").
=> Add ?: at the front of each group, as in m/^(?:ATG).*(?:TAA)$/g, and the substrings matching "ATG" and "TAA" will not be captured; with the g modifier the whole match is returned instead.
=> In this case the output will be ("ATGTGCTGCTAA").
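
The behaviour of the two patterns can be checked with a short script. In list context with the g modifier, Perl returns the captured groups when the pattern has any, and the whole match only when it has none. The third pattern, which wraps the entire expression in an extra pair of parentheses, is an addition for illustration; the variable names are arbitrary.

```perl
use strict;
use warnings;

my $string = 'ATGTGCTGCTAA';

# Capturing groups: only the captured substrings come back.
my @with_groups = $string =~ m/^(ATG).*(TAA)$/g;

# Non-capturing groups: with /g the whole match comes back.
my @without_groups = $string =~ m/^(?:ATG).*(?:TAA)$/g;

# Extra outer group (illustrative): whole match plus both parts.
my @whole_and_parts = $string =~ m/^((ATG).*(TAA))$/g;

print "capturing:       @with_groups\n";      # ATG TAA
print "non-capturing:   @without_groups\n";   # ATGTGCTGCTAA
print "whole and parts: @whole_and_parts\n";  # ATGTGCTGCTAA ATG TAA
```

When extracting ORFs, the non-capturing form (or a single outer group) is usually the convenient choice, since each element of the result is then one complete subsequence.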

