Creation of a program that will assemble dna chains

Assignment Help Python Programming
Reference no: EM131030777

Genome Assembly

One of the most significant developments of the recent decades was the deciphering of the human genome. The techniques used in this effort can now be used to investigate genetic illnesses, identification of mutations, understanding genomes of ancient organisms that have disappeared, and in numerous other applications.

Genes are encoded within the DNA, a large organic molecule which consists of double chains of four bases: cytosine (cytosine, C), guanine (guanine, G), adenine (adenine, A), and thymine (thymine, T).

Each of the two chains comprises a series of bases, e.g. ACCGTATAG. The other chain consists of bases connected one by one to the first chain bases, with the rules: A-T, C-G. Thus if a chain is ACCGTATAG, the other will be TGGCATATC.

To find the composition of an unknown DNA chain, we work as follows. Using molecular biology techniques, we create multiple copies of the chain and break it into small fragments, for example fragments consisting of three bases each.

Such fragments can easily be identified with special devices. So we end up with a set of known fragments. The problem we then face is to assemble the known fragments into the DNA chain, so that we can learn its exact composition.

Suppose that we have these fragments, or polymersas they are called: GTG, TGG, ATG, GGC, GCG, CGT, GCA, TGC, CAA, AAT, ATG, AAT. Each of them has a length of 3, but generally theycould have a "k" length.To find out from which DNA sequencethey (the fragments) emerged, we create a graph whosevertices are polymers with a length of 2 (or generally k-1),which result from polymers with a length of 3, taking for each polymer with a length of 3, the first 2 (k-1) and the last 2 (k-1) polymers.

Thatway, from GTG wewilltake GT and TG, from TGG we will take TG and GG, etc.

In the graph we add an edge for each one of the initial polymers with a length of 3 (k), from which the current two vertices emerged. We give the name of the initial polymer to the edge. For example, from the polymer ATG we take the vertices AT and ATG and connect them with an edge, which we call ATG. The following illustration shows the graph obtained in this example.

2069_Figure.png

Then, in order to find the original DNA chain, all we have to do is find a path within the graph which visits all the edges exactly once.A path like that is called an Eulerian trail and it uses each edge exactly once.It exists only if at each vertex v, the in-degree of v equals the out-degree of v.

To locate the desired path within the graph we constructed, we use the Hierholzer algorithm:

• We select a random starting node, v.
• We move from node to node until we return to v. The path we have found until now does not necessarily cover all the edges of the graph.
• As long as there is a vertex uthat belongs to the path that we have found but has edges that do not belong to the path:
o We start another path from u, using the edges that have not been used until now, until we get back to u. Then, we connect this path to the path we have already found.

If we use this algorithm on the graph above (picture), we will find the Euler path shown in the figure below:

1772_Figure1.png

Then, by traversing the path and joining the nodes, keeping their common base only once, we obtain the DNA sequence ATGGCGTGCA.Note that the sequence is cyclic: the AA node exists in the sequence cycling from the end to the beginning, so we could also take GGCGTGCAAT sequence, or any other that results from rotating the first sequence.

Also, depending on where we start to traverse the path and what choices we make when we have more than one outgoing edges, the starting and finishing points of the sequence can appear to be different.For example, if we start from the TG node, we can obtainthe TGGCAATGCG sequence, or any other that resultsfrom the rotation of this sequence.This does not bother us. The following figures show the cyclical nature of the sequence.

415_Figure2.png

The aim of this assignment is the creation of a program that will assemble DNA chains, using the fragments given.

PROGRAM REQUIREMENTS

Each student will work in his personal repository on GitHub. In order for an assignment to be properly assessed, it must meet the following conditions:

1. The assignment must be contained in a list (folder) called"assignment-2016-2"within the student's repository.

2. The program must be called "dna_assembly.py".

3. The use of ready-made graph libraries or any ready implementations of algorithms or parts of them is strictly forbidden.

4. The program must be called as follows:
python dna_assembly.py fragments_file

The "fragments_file" parameter gives the name of the file in which the fragments are stored (this does not mean that the file MUST be called "fragments_file", the user can give any name he likes).

The file that contains the fragments must have the following format:

ATG
GTG
TGG
GGC
GCG
CGT
GCA
TGC
CAA
AAT
ATG
AAT

That means it has one polymer per line.

Output Description

Caution: if the output is not exactly consistent with its description,the assignment cannot be assessed.

The program should display as an output the DNA sequence that has been identified. So for example, the output will be:

ATGGCGTGCA

or, as mentioned above, another sequence that is equivalent to "ATGGCGTGCA" (since it is cyclic), such as:

TGGCAATGCG

Or

AATGGCGTGC

Reference no: EM131030777

Questions Cloud

Problem regarding the market analysis : Problem: The investors have asked that you consider running your analysis for at least 5 years. Your current widgets weigh 300 grams and the market analysis shows that you can sell the product for $3.15/widget.
Distinguish among joint tenancy with right of survivorship : Discuss and distinguish among joint tenancy with the right of survivorship, tenancy in common, and tenancy by the entirety. Is one form of ownership more advantageous than the others? Explain.
Contrast the critical success factors : Contrast the critical success factors (CSFs) and SWOT (i.e., strengths, weaknesses, opportunities, and threats) approaches for assessing opportunities as part of a strategic IS planning process. Under what circumstances might one of these approaches ..
Problem regarding the confusing numbers : Then I remembered that you use that funny LIFO retail inventory method where you play with such confusing numbers. Will the purchase reduce our retail ratio, or whatever you call it, so that our inventory is lower and cost of goods sold higher, be..
Creation of a program that will assemble dna chains : The aim of this assignment is the creation of a program that will assemble DNA chains, using the fragments given
Determine the minimum time needed for the plate temperature : Initially, the iron is in thermal equilibrium with the ambient air at 22°C. Assuming 85 percent of the heat generated in the resistance wires is transferred to the plate, determine the minimum time needed for the plate temperature to reach 140°C.
List the state where the law is effective : List the elements of the crime and facts that establish each element. Provide a specific law for the charge. List the state where the law is effective.
Determine the minimum time needed for the plate temperature : Initially, the iron is in thermal equilibrium with the ambient air at 22°C. Assuming 85 percent of the heat generated in the resistance wires is transferred to the plate, determine the minimum time needed for the plate temperature to reach 140°C.
Determine the price of the io and po security : Determine the price of the IO and PO security, assuming that there is no prepayment. Further assume that the IO investors require a market rate of return of 4.5% and the PO investors require a market return of 6%.

Reviews

Write a Review

Python Programming Questions & Answers

  Find a python script

Find a python script

  Write a program that accepts as input a sentence

Write a program that accepts as input a sentence in which all of the words are run together but the first character of each word is uppercase. Convert the sentence to a string in which the words are separated by space and only the first word start..

  Design program that asks user to enter amount in python

IN Python Design a program that asks the user to enter the amount that he or she has budget in a month. A loop should then prompt the user to enter his or her expenses for the month.

  Function should return a dictionary

Write a function numOccur(s), where s is a string; the function should return a dictionary whose keys are the 26 ascii letters abcdefghijklmnopqrstuvwxyz

  Email spam filter

Analyze the emails and predict whether the mail is a spam or not a spam - Create a training file and copy the text of several mails and spams in to it And create a test set identical to the training set but with different examples.

  Design reusable parameterised functions

create multiple icon styles that can be drawn at different sizes, it would be very repetitive if you tried to code the whole solution using ‘brute force.' Instead you are strongly encouraged to design reusable parameterised functions to draw the i..

  Aussie best car abcdeclares that based on its yearly sales

aussie best car abcdeclares that based on its yearly sales it will award a bonus as follows. the bonus will be equally

  Improve the code for the haunted house game

Improve the readability of the code by improving the function names, variables, and loops, as well as whitespace. Document these changes in your journal and define a win condition for the game, for example, collecting all items and returning to the..

  You are tasked with improving the code for the haunted

you are tasked with improving the code for the haunted house game. please read the associated hand-out and the code

  Same directory as your program

In the same directory as your program, create a file FF1, and write into it Hello (with a space at teh end). Similarly, create a file FF2, and write into it world! (with a new line, i.e., an ENTER at the end). And create a file DD and write into i..

  Q1if we knew all the ecological social and competitive

q1if we knew all the ecological social and competitive forces that regulate populations and in reality we couldnt what

  Design an application in python

Design an application in Python- The application would generate a set of 100 integers in a random manner in the range of 50 to 150 including both the numbers in the range

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd