1let s1 and s2 be two strings of lengths m and n

Assignment Help Theory of Computation
Reference no: EM13346868

1.

Let s1 and s2 be two strings of lengths m and n respectively. By de?nition, a superstring of s1 and s2 is one which contains s1 and s2 as substrings. Give a dynamic programming algorithm to compute a shortest superstring of two strings, and analyze its complexity.

Example: If s1 = aaaagatacac and s2 = acacggtttt then the following are all valid superstrings:
{aaaagatacacggtttt, aaaagatacacacggtttt, aaaagatacacacacggtttt}

However, aaaagatacacggtttt is the shortest. Note that the above formulation does not allow any variations (mismatches or gaps) within the overlapping region.

The motivating application for this problem is genome assembly, where the goal is to reconstruct an unknown supersequence by assembling smaller (known) fragments obtained from it.

2. Give a dynamic programming algorithm for the problem of ?nding an optimal local alignment between a string and itself - ie., we wish to ?nd a pair of best aligning substrings within s.

Note that we cannot directly use the Smith-Waterman local alignment algorithm because it will then only output the entire string aligned to itself as its ?nal answer, which is not the answer we want here. We are after a pair of shorter substrings within s. To make this routine more biologically relevant, you can assume that any optimally aligning pair of substrings that your algorithm detects should NOT overlap within the string - i.e., your optimal local alignment should correspond to two substrings s[i1 . . . i2] and s[j1 . . . j2], such that i2 < j1 (without loss of generality).

The motivating application for this problem is to ?nd "genomic repeats", which are repetitive substrings (with a few variations) present in different loci along a long genome.

3.

How will you modify the linear space Hirschberg technique to work for optimal local alignment computation in linear space (both opt. local alignment score & traceback path)? Explain the main steps of your new (modi?ed) algorithm. No need to expand on parts that are identical to the version of the algorithm discussed for global alignment computation in class.

4.

A q-gram1 is de?ned as a string of length q, where q > 0 is a "small" constant (say, q < 10). Given a string s, let Qq s denote the set of all q-grams in s. Given two strings s1 and s2 of lengths m and n respectively, their q-gram distance is the sum of the number of unique q-grams in each - i.e., qd(s1, s2) = |Qq s1\Qq s2 | + |Qq s2\Qq s1

(Note: the pipe symbol '|' in this expression stands for set cardinality - not absolute values; also the \ symbol stands for set difference.)

a) Give an algorithm to compute qd(s1, s2).

b) Derive a tight lower-bound for edit distance between two strings as a function of their q-gram distance.

c) Using the above results, argue how we can save time in practice in the following scenario: we are interesting in knowing the (optimal) edit distance between two strings only if it is below a certain cutoff, say τ . Otherwise, we don't care what is output.

5.

Given a pattern P of length m and a text T of length n, give an algorithm to ?nd and report all occurrences of P in T using the look-up table data-structure. You can assume that n > m > k, where k is the window length used to build the lookup table.

6.

The nested pairing problem:

Let s be a DNA sequence of length n, as illustrated in Figure 1. We de?ne "pairing" as a set of disjoint pairs of indices in s. A pairing is "proper" only if it is one of the following four pairs: {(a, t), (t, a), (c, g), (g, c)}. Any proper pairing becomes a "nested pairing" when the
string index intervals covered by no two pairs overlap. Put another way with reference to the Figure 1, no two edges should criss-cross.
The problem for this question is to ?nd an optimal nested pairing - i.e., a nested pairing with maximum cardinality.

Give an e?cient algorithm to compute an optimal nested pairing using dynamic programming and comment on its runtime and space requirements.

PS: As a reference, my solution for this problem takes O(n3) time.

496_Compute a shortest superstring.png

Figure: Illustration of a nested pairing of a string s of length n. Each pair is shown as an edge connecting those two character locations along the string. Note that the sequence shown s is *not* a circular string, as the start index (1) and end index (n) are clearly marked. It is only shown as circular for ease of illustrating the nested pairing.

Reference no: EM13346868

Previous Q& A

  Create a cost-benefit analysis to evaluate the projectthe

create a cost-benefit analysis to evaluate the projectthe state of massachusetts would like to replace a national guard

  Q1 figure shows a scale drawing of the temperature profile

q1 figure shows a scale drawing of the temperature profile through a composite plane wall.i identify the layer with the

  Experiment can be used to investigate heat transfer

experiment can be used to investigate heat transfer associated with flow past cylindrical tubes either in isolation or

  1- a region is a geographical area which possesses certain

1- a region is a geographical area which possesses certain characteristics e.g. political economic physical which give

  Compares the number of comparisons used by various data

compares the number of comparisons used by various data structures for a single algorithm. the algorithm is the one

  1harveys muffler offers a full refund to anyone who is not

1.harveys muffler offers a full refund to anyone who is not satisfied with thenbsp replacement of mufflers. the owners

  Based on the ppars projectapply the core is capabilities

based on the ppars projectapply the core is capabilities framework as described by feeny and willcocks 1998 to the

  This sample webpages aim is to will serve most of the

this sample webpages aim is to will serve most of the audience including elder people and people with disabilities. it

  1 in john stossels article in praise of price gouging

1. in john stossels article in praise of price gouging stossel argues that a law banning price gouging would result in

  Write paper on inventory management systemreport

write paper on inventory management system.report containsabstractacknowledgementintroductionmotivation for the project

Reviews

Write a Review

 

Similar Q& A

  Each part of this problem that the eax register

Assume for each part of this problem that the EAX register contains 00 00 00 4F and the doubleword referenced by value contains FF FF FF 38. Determine whether each of the conditional jump statements causes a jump to dest.

  Explaining syntactically legal boolean expression

In this problem, we consider a very restricted subset of Boolean expressions. Define an operator to be one of  the four symbols: ¬, ∧, ∨, and →. Define a variable to be one of the five symbols

  Explanation of turing machine

Devise a Turing machine with input given in unary notation such that the equipments produces the following output, 0 if x is divisible by 4,

  1using suffix trees give an algorithm to nd a longest

1.using suffix trees give an algorithm to nd a longest common substring shared among three input strings. s1 of length

  Compute a shortest superstring

Dynamic programming algorithm to compute a shortest superstring.

  Front end and back end processes of office automation

Discuss the difference between the front end and back-end processes of office automation? Provide some examples in your workplace or that you come into contact with?

  Equivalence classes to construct minimal dfa for language

How many equivalence classes does this relation have and what are they? Use these equivalence classes to construct the minimal DFA for the language.

  Show polynomial-time algorithm for gdp

Goal is to find expedition of maximum profit. Either show that there exists polynomial-time algorithm for GDP, or show that corresponding decision problem is NP-complete.

  Create mealy type state machine with input and output

Create the Mealy type state machine with input X and output Y. Y must be 1 whenever sequence 110 or 101 has been detected on X on last 3 consecutive rising clock edges.

  Write algorithm for finding useless-productive nonterminals

Write down the algorithm for finding useless/productive nonterminals. Describe how this gives you the algorithm for whether language generated by grammar is empty.

  Many programs require the use of an input

Many programs require the use of an input mechanism to get data into the program and an output mechanism to present results and guidance.

  Communication process using a particular computer device

The enhancement of communication process using a particular computer device or software application by the people.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd