1let s1 and s2 be two strings of lengths m and n

Assignment Help Theory of Computation
Reference no: EM13346868

1.

Let s1 and s2 be two strings of lengths m and n respectively. By de?nition, a superstring of s1 and s2 is one which contains s1 and s2 as substrings. Give a dynamic programming algorithm to compute a shortest superstring of two strings, and analyze its complexity.

Example: If s1 = aaaagatacac and s2 = acacggtttt then the following are all valid superstrings:
{aaaagatacacggtttt, aaaagatacacacggtttt, aaaagatacacacacggtttt}

However, aaaagatacacggtttt is the shortest. Note that the above formulation does not allow any variations (mismatches or gaps) within the overlapping region.

The motivating application for this problem is genome assembly, where the goal is to reconstruct an unknown supersequence by assembling smaller (known) fragments obtained from it.

2. Give a dynamic programming algorithm for the problem of ?nding an optimal local alignment between a string and itself - ie., we wish to ?nd a pair of best aligning substrings within s.

Note that we cannot directly use the Smith-Waterman local alignment algorithm because it will then only output the entire string aligned to itself as its ?nal answer, which is not the answer we want here. We are after a pair of shorter substrings within s. To make this routine more biologically relevant, you can assume that any optimally aligning pair of substrings that your algorithm detects should NOT overlap within the string - i.e., your optimal local alignment should correspond to two substrings s[i1 . . . i2] and s[j1 . . . j2], such that i2 < j1 (without loss of generality).

The motivating application for this problem is to ?nd "genomic repeats", which are repetitive substrings (with a few variations) present in different loci along a long genome.

3.

How will you modify the linear space Hirschberg technique to work for optimal local alignment computation in linear space (both opt. local alignment score & traceback path)? Explain the main steps of your new (modi?ed) algorithm. No need to expand on parts that are identical to the version of the algorithm discussed for global alignment computation in class.

4.

A q-gram1 is de?ned as a string of length q, where q > 0 is a "small" constant (say, q < 10). Given a string s, let Qq s denote the set of all q-grams in s. Given two strings s1 and s2 of lengths m and n respectively, their q-gram distance is the sum of the number of unique q-grams in each - i.e., qd(s1, s2) = |Qq s1\Qq s2 | + |Qq s2\Qq s1

(Note: the pipe symbol '|' in this expression stands for set cardinality - not absolute values; also the \ symbol stands for set difference.)

a) Give an algorithm to compute qd(s1, s2).

b) Derive a tight lower-bound for edit distance between two strings as a function of their q-gram distance.

c) Using the above results, argue how we can save time in practice in the following scenario: we are interesting in knowing the (optimal) edit distance between two strings only if it is below a certain cutoff, say τ . Otherwise, we don't care what is output.

5.

Given a pattern P of length m and a text T of length n, give an algorithm to ?nd and report all occurrences of P in T using the look-up table data-structure. You can assume that n > m > k, where k is the window length used to build the lookup table.

6.

The nested pairing problem:

Let s be a DNA sequence of length n, as illustrated in Figure 1. We de?ne "pairing" as a set of disjoint pairs of indices in s. A pairing is "proper" only if it is one of the following four pairs: {(a, t), (t, a), (c, g), (g, c)}. Any proper pairing becomes a "nested pairing" when the
string index intervals covered by no two pairs overlap. Put another way with reference to the Figure 1, no two edges should criss-cross.
The problem for this question is to ?nd an optimal nested pairing - i.e., a nested pairing with maximum cardinality.

Give an e?cient algorithm to compute an optimal nested pairing using dynamic programming and comment on its runtime and space requirements.

PS: As a reference, my solution for this problem takes O(n3) time.

496_Compute a shortest superstring.png

Figure: Illustration of a nested pairing of a string s of length n. Each pair is shown as an edge connecting those two character locations along the string. Note that the sequence shown s is *not* a circular string, as the start index (1) and end index (n) are clearly marked. It is only shown as circular for ease of illustrating the nested pairing.

Reference no: EM13346868

Questions Cloud

Leadership and transformationhow does your transformational : leadership and transformationhow does your transformational experience illustrate - or refute - a leadership
You are the nal voter in a brand new start-up league the : you are the ?nal voter in a brand new start-up league the ultra fun foosball league uffl. the directors are looking to
Calculate yield to maturity ytm and bond prices a : calculate yield to maturity ytm and bond prices. a continuous compounding convention for ytm calculations should be
Theory of interestnpv irr nominal and real amortization : theory of interestnpv irr nominal and real amortization sinking fund twrr dwrr1. given npv-1000500v3800v8 and a rate of
1let s1 and s2 be two strings of lengths m and n : 1.let s1 and s2 be two strings of lengths m and n respectively. by de?nition a superstring of s1 and s2 is one which
Create a cost-benefit analysis to evaluate the projectthe : create a cost-benefit analysis to evaluate the projectthe state of massachusetts would like to replace a national guard
Q1 figure shows a scale drawing of the temperature profile : q1 figure shows a scale drawing of the temperature profile through a composite plane wall.i identify the layer with the
Experiment can be used to investigate heat transfer : experiment can be used to investigate heat transfer associated with flow past cylindrical tubes either in isolation or
1- a region is a geographical area which possesses certain : 1- a region is a geographical area which possesses certain characteristics e.g. political economic physical which give

Reviews

Write a Review

Theory of Computation Questions & Answers

  Create a parser to check expression for allowable form

Find out its grammatical structure with respect to given formal grammar. You are needed to create a parser which will check expression for allowable form.

  What ambiguity exists in the statement

Suppose f is a function that returns the result of reversing the string of symbols given as its input, and g. What ambiguity exists in the statement x?

  Explain proof of rice-s theorem for infinite language

If you perform reduction in proof of Rice's theorem for special case of property P: "infinite language", does this reduction also show that language P L = { | N is Turing machine.

  Argue that the problem is np complete

Argue that the following prob is NP Complete. Given list of positive integers, u1,u2,...un (in binary representation) and asked if there is partition of this set into 3 subsets, each of which has same sum.

  Subset-sum problem

Calculate some number x= Sum - 2K. Create new set A by add x to the set B {b1, b2,....., bn} U {x}, where the summation now is B+x. it is possible to split the numbers in A into some subsets iff they can summing up to K:

  Show polynomial-time algorithm for gdp

Goal is to find expedition of maximum profit. Either show that there exists polynomial-time algorithm for GDP, or show that corresponding decision problem is NP-complete.

  How to construct an nfa

Give a construction that assumes you are given a DFA for L and show how to construct an NFA (with or without ε-moves) to recognize sort(L).

  Write mathematical formulation for non-terminal

Non-terminal A is useless if there is no derivation from start symbol to string of tokens in which A appears. Write a mathematical formulation of this property.

  Construct and dfa or lr items for grammar

Consider the following grammar: S S (S) | ε. Construct and DFA or LR(0) items for this grammar. Construct SLR(1) parsing table.

  Design and draw the state diagram

Design and draw the state diagram (graph-representation) of a deterministic finite-state automata that recognizes the language generated by the grammar

  Proving language to be pumping lemma

Show that the language F = {a^i b^j c^k | i, j, k greater than or equal to 0 and if i = 1 then j = k} is not regular. Show, however, that it satisfies the statement of the pumping lemma

  How to express correctness properties in ltl

Express the given correctness properties in LTL. Defne propositions/variables to model the events mentioned in the question. If a parent process calls the blocking waitpid() system call then it is blocked until child process terminates.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd