Write a program that will open a blastn

Assignment Help Python Programming
Reference no: EM13718086

QUESTION:

The GFF3 format is a commonly-used one in bioinformatics for representing sequence annotation. You can find the specification here:

Note that this same file has both the annotation feature table and the FASTA sequence for the molecules referenced. (See the '##FASTA' directive in the specification.) Within the feature table another column of note is the 9th, where we can store any key=value pairs relevant to that row's feature such as ID, Ontology_term or Note.

Your task is to write a GFF3 feature exporter. A user should be able to run your script like this:

$ export_gff3_feature.py --source_gff=/path/to/some.gff3 --type=gene --attribute=ID --value=YAR003W

There are 4 arguments here that correspond to values in the GFF3 columns. In this case, your script should read the path to a GFF3 file, find any gene (column 3) which has an ID=YAR003W (column 9). When it finds this, it should use the coordinates for that feature (columns 4, 5 and 7) and the FASTA sequence at the end of the document to return its FASTA sequence.

Your script should work regardless of the parameter values passed, warning the user if no features were found that matched their query. (It should also check and warn if more than one feature matches the query.)

The output should just be printed on STDOUT (no writing to a file is necessary.) It should have a header which matches their query, like this:

>gene:ID:YAR003W
.... sequence here ...

Some bonus points will be awarded if you format the sequence portion of the FASTA output as 60-characters per line, which follows the standard.

Provide the complete source code AND the output of the program as it runs. You should do test runs with 3 features which are present in the file and 1 where you intentionally enter a feature NOT present in the file. Your script should handle this gracefully

QUESTION:

Write a program that will open a BLASTN (nucleotide to nucleotide search) output file, parse out specific information, and produce formatted output that will be written to STDOUT (i.e. Standard Output; the terminal window / command line).

Before writing your program, copy the Your program should start by opening the input file (you may hardcode the filename in this case), parsing and storing both the query sequence ID (from near the top of the file; look for the string following "Query=") and the query length (found on the line below the query sequence), and displaying them both to STDOUT. Add some additional characters and formatting to your
output such that these two fields appear exactly like this in STDOUT:

Query ID: IREALLYLIKEPYTHON

Query Length: 15

Then, it is time to parse information about the significant alignments for this query. Each alignment begins with the ">" symbol. For just the
first ten hits, parse out only the accession (located between the first set of pipe symbols, | | )
, length and score. For each of these hits, these three fields should then be written to STDOUT in exactly this format including capitalization, spacing, and punctuation (as shown here using the real values for the first hit ; study the file to understand exactly where these values came from):

Alignment #1: Accession = ref|XM_005094338.1| (Length = 2377, Score = 1098)

You must use regular expressions to pull out precisely the parts of the file that you want, which is the definition of parsing. Hint: you will very likely need to use parentheses to put some parts of those expressions into temporary memory (m.group(1), etc.) for later use.

Do not have your regular expression search for hardcoded values; your program should be able to read another BLASTN output file and run successfully, not just this specific one.

Pay careful attention to the exact appearance of the sample output, above. Although it is a good start to be able to, at a minimum, report the requested values, your program must also strive to match the formats specified.

Provide the complete source code AND the output of the program as it runs BLASTP output file,

/home/jorvis1/example_blast.txt

to your home directory.

Look through the file and explore the format.

Reference no: EM13718086

Questions Cloud

Determine whether vaporization is a major concern : In coal gasifiers, refractories and metals must withstand the coal/air or coal/oxygen combusting atmospheres. The Zr-Nb-Ti system is under consideration as alternative to the Fe and Ni alloy systems because the coal combustion degrades these alloy sy..
Explain a foundational knowledge of childrens development : Analyzing child development research and find one additional scholarly source - Explain a foundational knowledge of children''s development?
What is the pressure of this same gas : A tank of oxygen holds 14.0L of oxygen (O2) at a pressure of 30.0atm . When the gas is released, it provides 230.L of oxygen. What is the pressure of this same gas at a volume of 230.L and constant temperature?
Data from this table of thermodynamic properties : Use the data from this table of thermodynamic properties to calculate the maximum amount of work that can be obtained from the combustion of 1.00 moles of ethane, CH3CH3(g), at 25 °C and standard conditions.
Write a program that will open a blastn : Write a program that will open a BLASTN (nucleotide to nucleotide search) output file, parse out specific information, and produce formatted output that will be written to STDOUT
Design an automatic speed control for an automobile : We wish to design an automatic speed control for an automobile. Assuming that pure integral control (that is, no proportional term) is advantageous; select the feedback gain so that the roots have critical damping (? = 1).
Melt along the interface between the fe-si alloy-al coating : Your company is considering creating an alumina coating by immersing an Fe-Si alloy into an Al melt, withdrawing the coated Fe-Si, and then oxidizing the Al coated alloy at 1000°C in air to form an alumina coating.
Describe what a riser is in the context of sand casting : Describe what a riser is in the context of sand casting. What are its main functions? In light of Chvorinovs rule, what should be taken into account when determining the geometry, dimension, and location of a riser in a sand casting mold?
Volume rate of flow and time required to fill mold cavity : A mold has a downsprue of length = 6.0 in. The cross-sectional area at the bottom of the sprue is 0.45 in2. The sprue leads into a horizontal runner which feeds the mold cavity, whose volume = 75 in3. Determine (a) the velocity of the molten metal fl..

Reviews

Write a Review

Python Programming Questions & Answers

  You are tasked with improving the code for the haunted

you are tasked with improving the code for the haunted house game. please read the associated hand-out and the code

  Write a program using the ''requetinteger''

using python/jython programming write a program using the 'requetInteger' function that will ask the user to type a value that will draw a line from one point on a picture to another. I don't need specific help just a gerneral idea.

  The computer game function collision

The computer game function collision () checks whether two circular objects collide; it returns True if they do and False otherwise. Each circular object will be given by it's radius and the (x,y) coordinates of it's center.

  Let ll be a list of integers

Let LL be a list of integers. Use list comprehension to produce teh following lists. Each one should just take onel line. Anser questions as two comments.

  Write a function trans(m) which returns the transpose

Write a function trans(M) which returns the transpose of an n-by-n matrix M. The matrix M is represented by a list of n lists, each of length n. Transposing M means that each M[i][j] is swapped (once!) with M[j][i].

  Code for the haunted house game

Improve the game by adding more features, for example you can examine more items, more props etc. You may implement this using more lists regarding items and props, remember, you should check if the object is being carried or in the location of th..

  Interaction between the customer and the machine

In Python:Simulate a cash register or ATM including the interaction between the customer and the machine (i.e. assume that you are automating the responses)

  Write a python program that computes the mean, median, mode

How to write a python program that computes the mean, median, and mode?

  Write a function comp(d1,u1,d2,u2)

Write a function comp(d1,u1,d2,u2)

  Implement your algorithm in python

Write an algorithm in structured English (pseudocode) that describes the steps required to perform the task specified and reinforce topic material related to the programming work cycle, and the input, processing, output program structure.

  Write a function rmduplic(l), where l is any list

Write a function rmDuplic(L), where L is any list. The function should return a list M that contains the same items as L, except that repetitions (duplicates) have been removed: only the first occurrence of each entry is kept (i.e., order is prese..

  Two-dimensional word puzzle

Write a python function called find_words that finds and prints the list of valid words in a two-dimensional word puzzle

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd