Perl program that finds all the ORFs in a genomic sequence

Reference no: EM131017675

Project Description:

Given the genomic sequences for an organism, one of the first steps in identifying the genes is to identify the open reading frames (ORFs).

An open reading frame is a maximal-length stretch of DNA that starts with the start codon ATG and ends with a stop codon (TAA, TAG or TGA). In prokaryotes, genes may occur within ORFs. In eukaryotes, the story is complicated by the presence of introns that are spliced out of the mRNA before translation. In this assignment, you will write a Perl program that finds all the ORFs in a genomic sequence.
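The definition above can be sketched as a scan over one reading frame. This is only a minimal illustration, not a reference solution: the helper name `orfs_in_frame` is hypothetical, the sequence is assumed to be upper case already, and "maximal length" is read here as "start at the first ATG seen after the previous in-frame stop codon, and end at the next in-frame stop codon".

```perl
use strict;
use warnings;

# Hypothetical helper: find the ORFs in one reading frame of $seq.
# $offset is 0, 1 or 2 (the number of leading bases skipped before codon 1).
# Returns a list of [ start_index (0-based), orf_string ] pairs.
sub orfs_in_frame {
    my ($seq, $offset) = @_;
    my %is_stop = map { $_ => 1 } qw(TAA TAG TGA);
    my @orfs;
    my $start;    # index of the first ATG seen since the last stop, if any

    for (my $i = $offset; $i + 3 <= length $seq; $i += 3) {
        my $codon = substr($seq, $i, 3);
        if (!defined $start) {
            $start = $i if $codon eq 'ATG';    # open a new ORF
        }
        elsif ($is_stop{$codon}) {
            # close the ORF, keeping the stop codon as its last codon
            push @orfs, [ $start, substr($seq, $start, $i + 3 - $start) ];
            undef $start;
        }
    }
    return @orfs;    # an ATG with no later in-frame stop is not reported
}

# Two leading bases, so the codons of interest sit at offset 2.
my @found = orfs_in_frame('CCATGAAAATGCCCTAGGGGATGTTTTGA', 2);
printf "%d %s\n", @$_ for @found;
# prints:
# 2 ATGAAAATGCCCTAG
# 20 ATGTTTTGA
```

Note that the second ATG inside the first ORF does not start a new ORF, which is what makes the reported ORF maximal under the reading assumed here.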

A genomic sequence has 6 reading frames, corresponding to the six possible ways of translating the sequence into three-letter codons. Frame 1 treats each group of three bases as a codon, starting from the first base. Frame 2 starts at the second base, and frame 3 starts at the third base. Frames 4, 5 and 6 are defined in a similar way, but refer to the opposite strand, which is the reverse complement of the first strand.
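As a rough sketch of the six frames described above, the opposite strand can be built by reversing the sequence and complementing each base, after which frames 4 to 6 are read from it exactly as frames 1 to 3 are read from the given strand. The helper names `reverse_complement` and `six_frames` are illustrative assumptions, and the sketch assumes the sequence is at least three bases long.

```perl
use strict;
use warnings;

# Reverse complement of a DNA string (A<->T, C<->G); case is preserved.
sub reverse_complement {
    my ($seq) = @_;
    my $rc = scalar reverse $seq;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Hypothetical helper: map each frame number (1-6) to the bases read in it.
sub six_frames {
    my ($seq) = @_;
    my $rc = reverse_complement($seq);
    my %frames;
    for my $offset (0 .. 2) {
        $frames{ $offset + 1 } = substr($seq, $offset);    # frames 1-3: given strand
        $frames{ $offset + 4 } = substr($rc,  $offset);    # frames 4-6: opposite strand
    }
    return %frames;
}

my %frames = six_frames('ATGCCGTAA');
print "$_: $frames{$_}\n" for sort keys %frames;
# prints:
# 1: ATGCCGTAA
# 2: TGCCGTAA
# 3: GCCGTAA
# 4: TTACGGCAT
# 5: TACGGCAT
# 6: ACGGCAT
```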

Specifications:

Write a Perl program called orfs to find all the open reading frames.

INPUT:
The program will take in as input a file, which will contain any number of DNA sequences in the FASTA format:
- A line beginning with a ">" is the header line for the next sequence
- All lines after the header contain sequence data.
- There will be any number of sequences per file.
- Sequences may be split over many lines.
- Sequence data may be upper or lower case.
- Sequence data may contain white space, which should be ignored.
Ask the user for the minimum ORF length to search for. The default is 50, which means your program should print out all ORFs with at least 50 bases.
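One possible way to handle the input rules above is sketched below. The function names `read_fasta` and `min_orf_length` are assumptions made for illustration, and the choice to fall back to 50 on a blank or non-numeric answer is one interpretation of "default", not something the assignment spells out. The demo reads from an in-memory string so that it runs without an input file.

```perl
use strict;
use warnings;

# Read FASTA records from an open filehandle.
# Returns a list of [ header_without_the_leading_">", sequence ] pairs.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;                 # drop the newline and trailing blanks
        if ($line =~ /^>(.*)/) {
            push @records, [ $1, '' ];      # header line starts a new sequence
        }
        elsif (@records) {
            $line =~ s/\s+//g;              # white space in sequence data is ignored
            $records[-1][1] .= uc $line;    # upper and lower case are treated alike
        }
    }
    return @records;
}

# Interpret the user's answer; anything that is not a whole number gives 50.
sub min_orf_length {
    my ($answer) = @_;
    return 50 unless defined $answer;
    return $answer =~ /^\s*(\d+)\s*$/ ? $1 : 50;
}

# In a real run the answer would come from the terminal, for example:
#   print "Minimum ORF length [50]: ";
#   my $min = min_orf_length(scalar <STDIN>);

my $text = ">seq1 demo\nat gcc\nGTAA\n>seq2\nTTT\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my @records = read_fasta($fh);
close $fh;
print "$_->[0]: $_->[1]\n" for @records;
# prints:
# seq1 demo: ATGCCGTAA
# seq2: TTT
```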

OUTPUT:

Print your output in FASTA format, with one header line for each ORF, followed by the DNA in the ORF. The header should be the same as the header in the input file, followed by a bar "|", followed by

FRAME = <N> POS = <P> LEN = <L>, where
<N> is the frame number (1-6)
<P> is the genomic position of the start of the ORF (left end is base 1)
<L> is the length of the ORF (in bases)

If N = 4, 5 or 6, then P should be a negative number that indicates the position of the start of the ORF from the right end of the sequence.
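The header and position rules can be sketched as follows. This assumes the bar is the vertical bar character "|", that ORFs in frames 4 to 6 were found by scanning the reverse complement, and that the ORF start is held as a 0-based index on whichever strand was scanned. Under those assumptions, index i on the reverse complement is the (i + 1)-th base counted from the right end of the original sequence, so POS is -(i + 1). The function name `orf_header` is hypothetical.

```perl
use strict;
use warnings;

# Build the FASTA header line for one ORF.
# $start is the 0-based index of the ORF on the strand that was scanned.
sub orf_header {
    my ($header, $frame, $start, $orf_len) = @_;
    my $pos = $start + 1;           # positions are 1-based
    $pos = -$pos if $frame >= 4;    # frames 4-6 count from the right end
    return ">$header | FRAME = $frame POS = $pos LEN = $orf_len";
}

print orf_header('seq1 demo', 1, 5214, 138), "\n";
print orf_header('seq1 demo', 5, 9, 60), "\n";
# prints:
# >seq1 demo | FRAME = 1 POS = 5215 LEN = 138
# >seq1 demo | FRAME = 5 POS = -10 LEN = 60
```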
The DNA in the ORF should be printed out with a space between each codon, and no more than 15 codons per line. For example:

>gi|1786181| Escherichia coli K-12 | FRAME = 1 POS = 5215 LEN = 138
ATG ATA AAA GGA GTA ACC TGT GAA AAA GAT GCA ATC TAT CGT ACT
CGC ACT TTC CCT GGT TCT GGT CGC TCC CAT GGC AGC ACA GGC TGC
GGA AAT TAC GTT AGT CCC GTC AGT AAA ATT ACA GAT AGG CGA TCG
TGA
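The codon layout can be produced with a small formatter along these lines. It is a sketch only: `format_codons` is an illustrative name, and the second argument exists just so the demo can use a shorter line than the 15 codons the assignment asks for.

```perl
use strict;
use warnings;

# Split an ORF into codons, separate them with spaces, and wrap the lines.
# $per_line defaults to 15 codons, as required by the output specification.
sub format_codons {
    my ($orf, $per_line) = @_;
    $per_line ||= 15;
    my @codons = $orf =~ /(.{1,3})/g;    # consecutive groups of three bases
    my @lines;
    while (my @chunk = splice(@codons, 0, $per_line)) {
        push @lines, join(' ', @chunk);
    }
    return join("\n", @lines) . "\n";
}

print format_codons('ATGAAACCCGGGTTTTGA', 4);
# prints:
# ATG AAA CCC GGG
# TTT TGA
```

Called with one argument, for example `format_codons($orf)`, the same function wraps at 15 codons per line.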
