Perl program that finds all the orfs in a genomic sequence

Assignment Help Programming Languages
Reference no: EM131017675

Project Description:

Given the genomic sequences for an organism; one of the first steps in identifying the genes is to identify the open reading frames (ORFs).

An open reading frame is a maximal length sequence of the DNA that starts with a start codon ATG and ends with a stop codon (TAA, TAG or TGA). In prokaryotes, gene may occur within ORFs. in eukaryotes, the story is complicated by the presence of introns that are spliced out of the mRNA before translation. in this assignment, you will write a Perl program that finds all the ORFs in a genomic sequence.

A genomic sequence has 6 reading frames, corresponding to the six possible ways of translating the sequence into three-letter codons. Frame 1 treats each group of three bases as a codon, starting from the first base. Frame 2 starts at the second base, and frame 3 starts at the third base. Frames 4, 5 and 6 are defined in a similar way, but refer to the opposite strand, which is the reverse complement of the first strand.

Specifications:

Write a Pert program called oils to find all the open reading frames

INPUT:
The program will take in as input a file, which will contain any number of DNA sequences in the FASTA format:
- A line beginning with a ">" is the header line for the next sequence
- All lines after the header contain sequence data.
- There will be any number of sequences per file.
- Sequences may be split over many lines.
- Sequence data may be upper or lower case.
- Sequence data may contain white space, which should be ignored.
Ask the user for the minimum ORF to search for. The default is 50, which means your program should print out all ORFs with at least 50 bases.

OUTPUT:

Print your output in FASTA format, with one header line for each ORF, followed by the DNA in the ORF. The header should be the same as the header in the input file, followed by a bar "1" followed by

FRAME = <N> POS = <P> LEN = <L>, where
<N> is the frame number (1-6)
<P> is the genomic position of the start of the ORF (left end is base 1) <L> is the length of the ORF (in bases)

If N = 4, 5 or 6, then P should be a negative number that indicates the position of the start of the ORF from the right end of the sequence.
The DNA in the ORF should be printed out with a space between each codon, and no more than 15 codons per line. For example:

>gi117861811 Escherichia coli K-12 1 FRAME = 1 POS = 5215 LEN = 138 ATG ATA AAA GGA GTA ACC TGT GAA AAA GAT GCA ATC TAT CGT ACT CGC ACT TTC CCT GGT TCT GGT CGC TCC CAT GGC AGC ACA GGC TGC GGA AAT TAC GTT AGT CCC GTC AGT AAA ATT ACA GAT AGG CGA TCG TGA

Reference no: EM131017675

Questions Cloud

Be sure to make an electronic copy of your answer : Be sure to make an electronic copy of your answer before submitting it to Ashworth College for grading. Unless otherwise stated, answer in complete sentences, and be sure to use correct English spelling and grammar.
Find probability of disease a2 given symptoms : suppose that any one of three mutually exclusive symptom states (B1, B2, and B3) may be associated with each of these diseases. Experience shows that the likelihood of P(Bi /Ai) having a given symptom state when the disease is present is as shown ..
Compare your hometown barquisimeto to city you live in miami : Compare and contast essay on topic "Compare and contrast your hometown (Barquisimeto,Venezuela) to the city you live in now(Miami, Florida)"
Prepare a single journal entry to record : It also pays a total of $1,440,000 in construction costs this amount consists of $1,354,500 for the new building and $85,500 for lighting and paving a parking area next to the building. Prepare a single journal entry to record these costs incurred..
Perl program that finds all the orfs in a genomic sequence : Given the genomic sequences for an organism; one of the first steps in identifying the genes is to identify the open reading frames - write a Perl program that finds all the ORFs in a genomic sequence.
Calculate the break-even time for the new product : Sales will begin after two years and will generate an annual discounted net cash flow of $200,000 starting in year three. Calculate the break-even time for the new product.
Notes receivable and crediting accounts receivable : Accounts Receivable and crediting Notes Receivable and Interest Revenue.
Find the probability that a patient truly had appendicitis : Find the probability that a patient truly did not have appendicitis given that the radiological determination was definite appendicitis (DA). Find the probability that a patient truly did not have appendicitis given that the radiological determin..
What did they gain by resisting and assimilating : What did they gain by resisting and/or assimilating? What did they hope to gain? Were they successful? Why or why not?

Reviews

Write a Review

Programming Languages Questions & Answers

  Write accessors for each of the declared class variables.

Read the file and display in the listbox each record splitting out the fields, eliminating the comma delimiters and placing spaces between the fields.

  Create the structure with 3 members and fill in data

Create a function that will display() all the data for each member and call it from the main program.

  Write program to find smallest-largest value from n numbers

Write down program which will determine the smallest, largest and average values in collection of N numbers. Get value of N before scanning each value in collection of N numbers.

  Create gui application that allows the user to enter the

A movie theater only keeps a percentage of the revenue earned from ticket sales. The remainder goes to the movie company. Create a GUI application that allows the user to enter the data into text fields.

  Ruby on rials to design app

Use ruby on rials to design app. It has to have a database and at least 4 pages Style is free you can design it as the way that you like

  Write program to compute the volume flow rate

Write program to compute volume flow rate in cubic feet per second of water flowing through pipe of diameter d in inches and a velocity of v feet per second.

  Calculate the total number of jobs

Write a function that takes the name of a report file as its argument and returns the percentage value from the bottom table in the report - create a data frame with the year-month for each report in one column and the percentage values in another c..

  Program which calculate and displays fifteen percent tip

Write a program which calculate and displays a 15 percent tip when the price of a meal is input by the user. (Hint: the tip is computed by multiplying the price of the meal by 0.15.).

  Discuss what is meant by low-level programming

Discuss what is meant by "low-level" programming. What are the advantages of assembly language over higher-level languages for this type of programming?

  Does the kind of values assigned to a a variable

Does the kind of values assigned to a a variable (numeric or character) influence the way SAS reads data?

  Two-dimensional array to store weekly hours for employees

Assume the weekly hours for all employees are stored in two-dimensional array. Each row records emaployee's seven-day workhours with seven columns.

  Create a simple command line program

Create a simple command line program that simulates the rolling of a pair of six sided dice a user given number of times. The number of times to roll the pair of dice should be read as input from the argv array on the command line.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd