Creation of a program that will assemble dna chains

Assignment Help Python Programming
Reference no: EM131030777

Genome Assembly

One of the most significant developments of the recent decades was the deciphering of the human genome. The techniques used in this effort can now be used to investigate genetic illnesses, identification of mutations, understanding genomes of ancient organisms that have disappeared, and in numerous other applications.

Genes are encoded within the DNA, a large organic molecule which consists of double chains of four bases: cytosine (cytosine, C), guanine (guanine, G), adenine (adenine, A), and thymine (thymine, T).

Each of the two chains comprises a series of bases, e.g. ACCGTATAG. The other chain consists of bases connected one by one to the first chain bases, with the rules: A-T, C-G. Thus if a chain is ACCGTATAG, the other will be TGGCATATC.

To find the composition of an unknown DNA chain, we work as follows. Using molecular biology techniques, we create multiple copies of the chain and break it into small fragments, for example fragments consisting of three bases each.

Such fragments can easily be identified with special devices. So we end up with a set of known fragments. The problem we then face is to assemble the known fragments into the DNA chain, so that we can learn its exact composition.

Suppose that we have these fragments, or polymersas they are called: GTG, TGG, ATG, GGC, GCG, CGT, GCA, TGC, CAA, AAT, ATG, AAT. Each of them has a length of 3, but generally theycould have a "k" length.To find out from which DNA sequencethey (the fragments) emerged, we create a graph whosevertices are polymers with a length of 2 (or generally k-1),which result from polymers with a length of 3, taking for each polymer with a length of 3, the first 2 (k-1) and the last 2 (k-1) polymers.

Thatway, from GTG wewilltake GT and TG, from TGG we will take TG and GG, etc.

In the graph we add an edge for each one of the initial polymers with a length of 3 (k), from which the current two vertices emerged. We give the name of the initial polymer to the edge. For example, from the polymer ATG we take the vertices AT and ATG and connect them with an edge, which we call ATG. The following illustration shows the graph obtained in this example.

2069_Figure.png

Then, in order to find the original DNA chain, all we have to do is find a path within the graph which visits all the edges exactly once.A path like that is called an Eulerian trail and it uses each edge exactly once.It exists only if at each vertex v, the in-degree of v equals the out-degree of v.

To locate the desired path within the graph we constructed, we use the Hierholzer algorithm:

• We select a random starting node, v.
• We move from node to node until we return to v. The path we have found until now does not necessarily cover all the edges of the graph.
• As long as there is a vertex uthat belongs to the path that we have found but has edges that do not belong to the path:
o We start another path from u, using the edges that have not been used until now, until we get back to u. Then, we connect this path to the path we have already found.

If we use this algorithm on the graph above (picture), we will find the Euler path shown in the figure below:

1772_Figure1.png

Then, by traversing the path and joining the nodes, keeping their common base only once, we obtain the DNA sequence ATGGCGTGCA.Note that the sequence is cyclic: the AA node exists in the sequence cycling from the end to the beginning, so we could also take GGCGTGCAAT sequence, or any other that results from rotating the first sequence.

Also, depending on where we start to traverse the path and what choices we make when we have more than one outgoing edges, the starting and finishing points of the sequence can appear to be different.For example, if we start from the TG node, we can obtainthe TGGCAATGCG sequence, or any other that resultsfrom the rotation of this sequence.This does not bother us. The following figures show the cyclical nature of the sequence.

415_Figure2.png

The aim of this assignment is the creation of a program that will assemble DNA chains, using the fragments given.

PROGRAM REQUIREMENTS

Each student will work in his personal repository on GitHub. In order for an assignment to be properly assessed, it must meet the following conditions:

1. The assignment must be contained in a list (folder) called"assignment-2016-2"within the student's repository.

2. The program must be called "dna_assembly.py".

3. The use of ready-made graph libraries or any ready implementations of algorithms or parts of them is strictly forbidden.

4. The program must be called as follows:
python dna_assembly.py fragments_file

The "fragments_file" parameter gives the name of the file in which the fragments are stored (this does not mean that the file MUST be called "fragments_file", the user can give any name he likes).

The file that contains the fragments must have the following format:

ATG
GTG
TGG
GGC
GCG
CGT
GCA
TGC
CAA
AAT
ATG
AAT

That means it has one polymer per line.

Output Description

Caution: if the output is not exactly consistent with its description,the assignment cannot be assessed.

The program should display as an output the DNA sequence that has been identified. So for example, the output will be:

ATGGCGTGCA

or, as mentioned above, another sequence that is equivalent to "ATGGCGTGCA" (since it is cyclic), such as:

TGGCAATGCG

Or

AATGGCGTGC

Reference no: EM131030777

Questions Cloud

Problem regarding the market analysis : Problem: The investors have asked that you consider running your analysis for at least 5 years. Your current widgets weigh 300 grams and the market analysis shows that you can sell the product for $3.15/widget.
Distinguish among joint tenancy with right of survivorship : Discuss and distinguish among joint tenancy with the right of survivorship, tenancy in common, and tenancy by the entirety. Is one form of ownership more advantageous than the others? Explain.
Contrast the critical success factors : Contrast the critical success factors (CSFs) and SWOT (i.e., strengths, weaknesses, opportunities, and threats) approaches for assessing opportunities as part of a strategic IS planning process. Under what circumstances might one of these approaches ..
Problem regarding the confusing numbers : Then I remembered that you use that funny LIFO retail inventory method where you play with such confusing numbers. Will the purchase reduce our retail ratio, or whatever you call it, so that our inventory is lower and cost of goods sold higher, be..
Creation of a program that will assemble dna chains : The aim of this assignment is the creation of a program that will assemble DNA chains, using the fragments given
Determine the minimum time needed for the plate temperature : Initially, the iron is in thermal equilibrium with the ambient air at 22°C. Assuming 85 percent of the heat generated in the resistance wires is transferred to the plate, determine the minimum time needed for the plate temperature to reach 140°C.
List the state where the law is effective : List the elements of the crime and facts that establish each element. Provide a specific law for the charge. List the state where the law is effective.
Determine the minimum time needed for the plate temperature : Initially, the iron is in thermal equilibrium with the ambient air at 22°C. Assuming 85 percent of the heat generated in the resistance wires is transferred to the plate, determine the minimum time needed for the plate temperature to reach 140°C.
Determine the price of the io and po security : Determine the price of the IO and PO security, assuming that there is no prepayment. Further assume that the IO investors require a market rate of return of 4.5% and the PO investors require a market return of 6%.

Reviews

Write a Review

 

Python Programming Questions & Answers

  Reinforce topic material on simple functions

Select 3 sets of test data that will demonstrate the correct 'normal' operation of your program. Select another 2 sets of test data that will demonstrate the "abnormal" operation of your program.

  1 one factor that leads to a decline in biodiversity is the

1. one factor that leads to a decline in biodiversity is the introduction of non-native species. however most species

  Write a python program that reads a dumbbasic program

You should write a Python program that reads a DUMBBASIC program from standard input1, and prints the results of executing that program to standard output.

  The program is to print the time

The program is to print the time in seconds that the iterative version takes, the time in seconds that the recursive version takes, and the difference between the times.

  The integers should be printed in order with addresses

The integers should be printed in order with addresses from main. When the steps of the function order have been completed the smallest value will be stored in a, the middle in b, and the largest in c.

  Development on windows and linux systems

develop a simple, data-intensive application in Python - Data Analysis of a Document Tracker

  Function by sum of sines

code the program using an editor (DO NOT COMMAND LINE THE PROGRAM) and show the results

  We would like to implement the lexical order

We would like to implement the lexical order for lists. For simplicity, we only consider lists of numbers, where , >= have their usual meaning.

  Project title email spam filterabstractanalyze the emails

project title email spam filterabstractanalyze the emails and predict whether the mail is a spam or not a spam.to work

  Write a loop that counts the number of space

Write a loop that counts the number of space characters in a string. Recall that the space character is represented a

  Prepare a python program

Prepare a Python program which evaluates how many stuck numbers there are in a range of integers. The range will be input as two command-line arguments.

  Asign true to the variable has_dups

Asign True to the variable has_dups if the string s1 has any duplicate character (that is if any character appears more than once) and False otherwise

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd