Write a program that will open a BLASTN output file

Reference no: EM13545981

Question 1:

Your task is to write a GFF3 feature exporter. A user should be able to run your script like this:

$ export_gff3_feature.py --source_gff=/path/to/some.gff3 --type=gene --attribute=ID --value=YAR003W
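The four options map naturally onto Python's standard `argparse` module, which accepts the `--name=value` form shown above. This is a minimal sketch: the function name `parse_args`, the help strings, and the choice to make all four options required are assumptions, not part of the assignment.

```python
import argparse


def parse_args(argv=None):
    """Parse the four query options used on the example command line.

    argv=None makes argparse read sys.argv; passing a list is handy for testing.
    """
    parser = argparse.ArgumentParser(
        description="Print the FASTA sequence of a GFF3 feature")
    parser.add_argument("--source_gff", required=True,
                        help="path to a GFF3 file ending in a ##FASTA section")
    parser.add_argument("--type", required=True,
                        help="feature type to match (GFF3 column 3)")
    parser.add_argument("--attribute", required=True,
                        help="attribute key to match (GFF3 column 9)")
    parser.add_argument("--value", required=True,
                        help="attribute value to match")
    return parser.parse_args(argv)
```

Calling `parse_args()` inside the script reads the real command line; a missing option makes `argparse` print a usage message and exit.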

There are four arguments here, each corresponding to a value in the GFF3 columns. In this case, your script should read the GFF3 file at the given path and find any gene (column 3) that has ID=YAR003W (column 9). When it finds one, it should use that feature's start, end, and strand (columns 4, 5, and 7) together with the FASTA sequence at the end of the file to return the feature's sequence in FASTA format.
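A minimal sketch of the lookup, assuming a tab-separated GFF3 file whose sequences follow a `##FASTA` directive and whose FASTA header IDs match column 1. The helper names (`read_gff3`, `attribute_values`, `find_features`, `feature_sequence`) are hypothetical. GFF3 coordinates are 1-based and inclusive, so columns 4 and 5 become the Python slice `[start - 1:end]`; a `-` in column 7 means the reverse complement is returned. Per the GFF3 conventions, attribute values may be percent-encoded and a tag may carry several comma-separated values, so both are handled.

```python
from urllib.parse import unquote

# Translation table for the complement of plain and lowercase bases.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_gff3(lines):
    """Split GFF3 lines into 9-column feature rows and {seqid: sequence}."""
    features, chunks = [], {}
    in_fasta, seqid = False, None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not in_fasta:
            if line.startswith("##FASTA"):
                in_fasta = True          # everything after this is FASTA
            elif line and not line.startswith("#"):
                cols = line.split("\t")
                if len(cols) == 9:       # skip malformed feature rows
                    features.append(cols)
        elif line.startswith(">"):
            words = line[1:].split()     # first word of the header is the ID
            seqid = words[0] if words else ""
            chunks[seqid] = []
        elif seqid is not None:
            chunks[seqid].append(line.strip())
    return features, {k: "".join(v) for k, v in chunks.items()}


def attribute_values(column9, key):
    """Return the decoded values of one tag in column 9 (possibly several)."""
    for pair in column9.strip().split(";"):
        tag, sep, value = pair.strip().partition("=")
        if sep and tag == key:
            return [unquote(v) for v in value.split(",")]
    return []


def find_features(features, ftype, key, value):
    """Rows whose column 3 equals ftype and whose tag `key` includes value."""
    return [cols for cols in features
            if cols[2] == ftype and value in attribute_values(cols[8], key)]


def feature_sequence(cols, sequences):
    """Slice columns 4-5 (1-based, inclusive) and honor the column 7 strand."""
    start, end, strand = int(cols[3]), int(cols[4]), cols[6]
    seq = sequences[cols[0]][start - 1:end]
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]   # reverse complement
    return seq
```

Reading the whole file into memory is simple and fine for a single chromosome or small genome; a very large file would call for an index instead.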

Your script should work regardless of the parameter values passed, warning the user if no features were found that matched their query. (It should also check and warn if more than one feature matches the query.)
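One way to handle the zero-match and multiple-match cases is to write the warning to STDERR, so STDOUT stays valid FASTA. This is a sketch: the helper names `match_warning` and `report` and the message wording are hypothetical, and the assignment does not say whether to stop or print every match after a multiple-match warning. Here `report` returns True whenever at least one feature matched, assuming the caller then prints all of them.

```python
import sys


def match_warning(matches, label):
    """Return a warning for zero or multiple matches, or None for exactly one."""
    if not matches:
        return "WARNING: no features matched {}".format(label)
    if len(matches) > 1:
        return "WARNING: {} features matched {}".format(len(matches), label)
    return None


def report(matches, label):
    """Print any warning on stderr; return whether anything matched."""
    warning = match_warning(matches, label)
    if warning:
        print(warning, file=sys.stderr)
    return bool(matches)
```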

The output should simply be printed to STDOUT (no file needs to be written). It should have a header that matches the query, like this:

>gene:ID:YAR003W
.... sequence here ...

Some bonus points will be awarded if you format the sequence portion of the FASTA output at 60 characters per line, the conventional FASTA line width.
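Building the `>type:attribute:value` header and wrapping the sequence takes only a few lines of slicing. The helper name `format_fasta` and its `width` parameter are hypothetical; the default of 60 follows the line width requested above.

```python
def format_fasta(ftype, attribute, value, sequence, width=60):
    """Build '>type:attribute:value' followed by the sequence wrapped at width."""
    header = ">{}:{}:{}".format(ftype, attribute, value)
    # Consecutive slices of `width` characters; the last line may be shorter.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines)
```

For example, `print(format_fasta("gene", "ID", "YAR003W", seq))` prints the header from the query followed by lines of at most 60 bases.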

Provide the complete source code AND the output of the program as it runs. You should do test runs with 3 features which are present in the file and 1 where you intentionally enter a feature NOT present in the file. Your script should handle this gracefully.

Question 2:

Write a program that will open a BLASTN (nucleotide-to-nucleotide search) output file, parse out specific information, and write formatted output to STDOUT (standard output, i.e. the terminal window or command line). Before writing your program, copy the BLASTN output file /home/jorvis1/example_blast.txt to your home directory. Look through the file and explore its format.

Your program should start by opening the input file (you may hardcode the filename in this case), then parse and store both the query sequence ID (near the top of the file; look for the string following "Query=") and the query length (found on the line below the Query= line), and display both on STDOUT. Add whatever extra characters and formatting are needed so that these two fields appear exactly like this in STDOUT:

 

Next, parse information about the significant alignments for this query. Each alignment begins with the ">" symbol. For the first ten hits only, parse out just the accession (located between the first set of pipe symbols, | |), the length, and the score. For each of these hits, write the three fields to STDOUT in exactly this format, including capitalization, spacing, and punctuation (shown here with the real values for the first hit; study the file to see exactly where these values come from):

Alignment #1: Accession = ref|XM_005094338.1| (Length = 2377, Score = 1098)

You must use regular expressions to pull out precisely the parts of the file you want; that is what parsing means here. Hint: you will very likely need parentheses (capturing groups) to hold parts of each match in temporary memory (m.group(1), etc.) for later use.

Do not have your regular expression search for hardcoded values; your program should be able to read another BLASTN output file and run successfully, not just this specific one.
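A minimal sketch of the parser using Python's `re` module and capturing groups. It assumes BLAST+-style text output: a `Query=` line, a `Length=` line below it, and hit deflines such as `>ref|XM_005094338.1| ...`, each followed by its own `Length=` and ` Score = ... bits` lines (spaces around `=` are tolerated). Older BLAST output that reports the query length as `(N letters)`, or deflines without pipes, would need extra patterns. The names `parse_blast` and `format_hit` are hypothetical, and no accession or number is hardcoded in the patterns.

```python
import re

QUERY_RE = re.compile(r"^Query=\s*(\S+)")
LENGTH_RE = re.compile(r"^\s*Length\s*=\s*(\d+)")
# Accession: text up to and including the second pipe, e.g. ref|XM_...|
HIT_RE = re.compile(r"^>\s*(\S+?\|[^|\s]+\|)")
SCORE_RE = re.compile(r"^\s*Score\s*=\s*([\d.]+)\s*bits")


def parse_blast(lines, max_hits=10):
    """Return (query_id, query_length, hits) from BLAST+ text output lines.

    Each hit is an (accession, length, score) tuple. Only the first Score
    line of a hit is kept, so extra HSPs for the same subject are skipped.
    """
    query_id = query_length = None
    hits, current = [], None
    for line in lines:
        m = HIT_RE.match(line)
        if m:
            if len(hits) == max_hits:
                break                    # stop after the first max_hits hits
            current = [m.group(1), None, None]
            hits.append(current)
            continue
        if current is None:              # still in the query header section
            if query_id is None:
                m = QUERY_RE.match(line)
                if m:
                    query_id = m.group(1)
            elif query_length is None:
                m = LENGTH_RE.match(line)
                if m:
                    query_length = int(m.group(1))
        elif current[1] is None:         # first Length= after the defline
            m = LENGTH_RE.match(line)
            if m:
                current[1] = int(m.group(1))
        elif current[2] is None:         # first Score = ... bits line
            m = SCORE_RE.match(line)
            if m:
                current[2] = m.group(1)  # kept as text to preserve its form
    return query_id, query_length, [tuple(h) for h in hits]


def format_hit(number, accession, length, score):
    """Format one hit in the required 'Alignment #N: ...' layout."""
    return "Alignment #{}: Accession = {} (Length = {}, Score = {})".format(
        number, accession, length, score)
```

A driver would open the hardcoded file, call `parse_blast` on the file handle, print the query ID and length in the required layout, and then print `format_hit(i, *hit)` for each hit numbered from 1. Summary-table lines near the top of the report do not start with ">", so the hit pattern skips them.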

Pay careful attention to the exact appearance of the sample output above. Reporting the requested values is the minimum; your program must also strive to match the specified formats.

Provide the complete source code AND the output of the program as it runs.

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd