Calculate middle index middle between start and end index

Assignment Help Computer Engineering
Reference no: EM13910219

Suffix table

Please read the following text carefully and use it to reply to the six queries at the end of the problem.

Finding a given character string (a so-called pattern) in a longer character string (a text) is one of the basic applications of computer science. You can implement the search by going through the whole text from start to finish while checking whether the pattern you are searching for is present. This is a very slow way of doing it if the text is very large. You can make the search quicker by creating an index structure based on the text beforehand. With the help of the index structure, you can avoid going through the whole text. One simple and frequently used index structure is called a suffix table. This method has been used for efficient searching in large DNA data collections, for example, such as the human genome with some 3 billion characters.

A character string consists of characters following each other. You can refer to a character string with a character string variable. You can refer to certain characters in the string by giving the index of the character within square brackets following the character string variable. For instance, x[1] refers to the first character in the string x. The annotation x[i..j] refers to the partial character string in x that starts at index i and ends at index j. A partial character string that continues to the end of the character string is called a suffix. The starting index of the suffix is called the index of the suffix. We use <, ≤, =, ≥, > for the character strings x and y to compare their alphabetical order. For example, x ≤ y means that x is alphabetically smaller than or as large as y, and x = y means that x and y are the same character string.

Example 1.

If the character string x is "group" it is true that:
• x[1] = " g", x[2] = " r" and x[4] = " u".
• x[1..3] = " gro", x[3..3] = " o" and x[2..5] = " roup".

• The suffixes to the character string x given in order according to the suffix indexes, i.e. according to the start indexes 1, 2, 3, 4 and 5, are x[1..5] = " group", x[2..5] = " roup", x[3..5] = " oup", x[4..5] = " up" and x[5..5] = " p".

We refer to a text with the character string variable t and to the pattern we are looking for in the text with the string variable p. Furthermore, we assume that the length of the text t is n.

The suffix table S for text t is an integer table with n elements, which lists the indexes of the suffixes in the text in alphabetical order of the suffixes. The value of index i in table S, annotated as S[i], indicates from which position in the text the ith smallest index suffix starts: S[1] indicates the index of the smallest suffix (in alphabetical order) in the text, S[2] the index for the next suffix in alphabetical order, etc. You can also express this as follows: for index i = 2, . . . , n it is true that t[S[i - 1]..n] ≤ t[S[i]..n].

Example 2. The suffixes for text t = " abababba" are " abababba", " bababba", " ababba", " babba", " abba", " bba", " ba" and " a". Below to the left are the suf- fixes for text t in alphabetical order, and to the right is the suffix table S for text t. Please note that the values of the suffix table are the same as the suf- fix indexes to the left. For example, the value S[5] = 7 tells us that the fifth largest suffix alphabetically starts from index 7, i.e. t[7..8] = is " ba".

Suffix   (alphabetical order)

a

Suffix  index

8

Index i

1

S[i]

8

abababba

1

2

1

ababba

3

3

3

abba

5

4

5

ba

7

5

7

bababba

2

6

2

babba

4

7

4

bba

6

8

6

A basic suffix feature is that, if pattern p occurs anywhere in the text, then p occurs as the start of the suffix that starts at that point in the text. In that case, we say that pattern p matches the suffix in question. We can search for pattern p in the text by searching for a suffix that matches p. With the help of the suffix table, this search can be made efficient by using a so-called binary search.

The binary search maintains information on a suffix table interval that, in accordance with our current information, could contain a suffix that matches pattern p. We use the annotation start for the start index of the search interval, and end for the end index. Further, we will determine the middle index middle with the formula middle = (start + end)/2, which we will round upwards if the sum of (start + end) is uneven. At first, all the suffixes are possible, so start = 1 and end = n.

The binary search compares the pattern and suffix in the middle of the search interval t[S[middle]..n] to each other. If the pattern matches, the search can end1 . Otherwise, either p < t[S[middle]..n] or p > t[S[middle]..n] is true.

If p < t[S[middle]..n], i.e. the pattern is alphabetically smaller than the suffix in index S[middle], none of the suffixes in interval middle, . . . , end can match the pattern. This follows directly from the suffix table containing the suffixes in alphabetical order. In that case, the binary search updates the upper bound of the search interval to end = middle - 1 and compares the pattern to the suffixes in the middle of the new search interval during the next search round.

In the same way, if p > t[S[middle]..n], we can say that none of the suffixes in the interval start, . . . , middle can match the pattern. Then we can update the lower bound to start = middle + 1.

If the search interval remains empty during the search, i.e. the criterion in start > end is met, the search ends with no result: the text does not contain pattern p. Below we see the search for pattern p in text t with the help of suffix table S in more detail, step by step:

1. Set for the search interval start index start = 1 and end index end = n.
2. If start > end i.e. the search interval is empty, end the search: pattern p does not occur in the text.
3. Calculate the middle index middle between start and end index: middle = (start + end)/2, round off upwards if necessary.
4. Compare pattern p with middle suffix t[S[middle]..n] of the interval.
• If p matches the start of t[S[middle]..n], end the search with the information that pattern p was found starting at index S[middle] of the text.
• If p does not match the middle suffix t[S[middle]..n] of the interval, then:
- If p < t[S[middle]..n], update end = middle - 1 and continue searching by going back to .
- If p > t[S[middle]..n], update start = middle + 1 and continue the search by going back to 4.

Queries

Query 1. Give the steps a binary search makes when searching for the pattern p = " babaa" in the text in the example 3 t = " abbaababbababbab". Please write your answer in the same format as the example 3, i.e. for each step, write the search interval values start, end and middle.

Query 2. Write the suffix table for the character string t = " assignment".

Query 3.

a) Write the suffix table for the character string t = " aacatcgatagctagaacat".

b) Write the steps a binary search takes when searching for the pattern p = " cga" in the text in a). Please write your answer in the same format as the example 3, i.e. for each step, write the search interval values start, end and middle.

Query 4. Please write a character string consisting of characters in the English alphabet with a suffix table like the one below. This means that characters in the alphabet" a", " b", " c", . . ., " z" are acceptable.

i

S[i]

1

8

2

7

3

6

4

5

5

4

6

3

7

2

8

1

Query 5. Write a character string that contains only the characters " a" and" b" with a suffix table like the one below.

i

S[i]

1

16

2

13

3

3

4

14

5

11

6

9

7

4

8

6

9

15

10

12

11

2

12

10

13

8

14

5

15

1

16

7

Query 6. Program the algorithm given on page 3. Test your program using the example 3 and queries 2-5.

Reference no: EM13910219

Questions Cloud

Find a general formula for given expression : find a general formula for f(x)= [f(x+h)-f(x)]/h as h→0 (steps 1 through 3 performed). factor the expression in part b. with one factor equal to (sinθ)/θ. (find remaining factor).
Regulatory agencies, and governmental fac : Research web sites and identify, investigate three (3) sites that highlight consumer rights, activist groups, consumer regulatory agencies, and governmental fact based portals providing research information and data that can be helpful in formulating..
Concentration of the diluted solution : A 100 mL sample of a 6.0 mol/L solution of H2SO4 is diluted to a final volume of 500 mL. What is the concentration of the diluted solution?
Question regarding the solution of sodium chloride : Explain the steps you would follow to make 500 mL of a 0.5 mol/L solution of sodium chloride (NaCl) in the lab.
Calculate middle index middle between start and end index : Write the steps a binary search takes when searching for the pattern p = " cga" in the text in a). Please write your answer in the same format as the example 3, i.e. for each step, write the search interval values start, end and middle.
Determining the equation of the reaction : The thermite reaction has been used to weld railroad rails, make bombs, and ignite solid rocket fuel. The equation of the reaction is:
What is the domain for the following function : 4: What is the domain for the following function? Y = 4/x^2 + 1 A: (x ≠ 3) B: (x ≠ 0) C: (all real numbers)  D: (x ≠ - 3)
Calculate the mass of copper formed : If 10.00 g of magnesium is reacted with 95.75 g of copper (II) sulphate, magnesium sulphate and copper are formed.
Discuss the overall purpose people have for investing : Discuss the overall purpose people have for investing. Define investment. Discuss why you would expect the saving-borrowing pattern to differ by occupation for example

Reviews

Write a Review

Computer Engineering Questions & Answers

  Mathematics in computing

Binary search tree, and postorder and preorder traversal Determine the shortest path in Graph

  Ict governance

ICT is defined as the term of Information and communication technologies, it is diverse set of technical tools and resources used by the government agencies to communicate and produce, circulate, store, and manage all information.

  Implementation of memory management

Assignment covers the following eight topics and explore the implementation of memory management, processes and threads.

  Realize business and organizational data storage

Realize business and organizational data storage and fast access times are much more important than they have ever been. Compare and contrast magnetic tapes, magnetic disks, optical discs

  What is the protocol overhead

What are the advantages of using a compiled language over an interpreted one? Under what circumstances would you select to use an interpreted language?

  Implementation of memory management

Paper describes about memory management. How memory is used in executing programs and its critical support for applications.

  Define open and closed loop control systems

Define open and closed loop cotrol systems.Explain difference between time varying and time invariant control system wth suitable example.

  Prepare a proposal to deploy windows server

Prepare a proposal to deploy Windows Server onto an existing network based on the provided scenario.

  Security policy document project

Analyze security requirements and develop a security policy

  Write a procedure that produces independent stack objects

Write a procedure (make-stack) that produces independent stack objects, using a message-passing style, e.g.

  Define a suitable functional unit

Define a suitable functional unit for a comparative study between two different types of paint.

  Calculate yield to maturity and bond prices

Calculate yield to maturity (YTM) and bond prices

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd