Calculate middle index middle between start and end index

Assignment Help Computer Engineering
Reference no: EM13910219

Suffix table

Please read the following text carefully and use it to reply to the six queries at the end of the problem.

Finding a given character string (a so-called pattern) in a longer character string (a text) is one of the basic applications of computer science. You can implement the search by going through the whole text from start to finish while checking whether the pattern you are searching for is present. This is a very slow way of doing it if the text is very large. You can make the search quicker by creating an index structure based on the text beforehand. With the help of the index structure, you can avoid going through the whole text. One simple and frequently used index structure is called a suffix table. This method has been used for efficient searching in large DNA data collections, for example, such as the human genome with some 3 billion characters.

A character string consists of characters following each other. You can refer to a character string with a character string variable. You can refer to certain characters in the string by giving the index of the character within square brackets following the character string variable. For instance, x[1] refers to the first character in the string x. The annotation x[i..j] refers to the partial character string in x that starts at index i and ends at index j. A partial character string that continues to the end of the character string is called a suffix. The starting index of the suffix is called the index of the suffix. We use <, ≤, =, ≥, > for the character strings x and y to compare their alphabetical order. For example, x ≤ y means that x is alphabetically smaller than or as large as y, and x = y means that x and y are the same character string.

Example 1.

If the character string x is "group" it is true that:
• x[1] = " g", x[2] = " r" and x[4] = " u".
• x[1..3] = " gro", x[3..3] = " o" and x[2..5] = " roup".

• The suffixes to the character string x given in order according to the suffix indexes, i.e. according to the start indexes 1, 2, 3, 4 and 5, are x[1..5] = " group", x[2..5] = " roup", x[3..5] = " oup", x[4..5] = " up" and x[5..5] = " p".

We refer to a text with the character string variable t and to the pattern we are looking for in the text with the string variable p. Furthermore, we assume that the length of the text t is n.

The suffix table S for text t is an integer table with n elements, which lists the indexes of the suffixes in the text in alphabetical order of the suffixes. The value of index i in table S, annotated as S[i], indicates from which position in the text the ith smallest index suffix starts: S[1] indicates the index of the smallest suffix (in alphabetical order) in the text, S[2] the index for the next suffix in alphabetical order, etc. You can also express this as follows: for index i = 2, . . . , n it is true that t[S[i - 1]..n] ≤ t[S[i]..n].

Example 2. The suffixes for text t = " abababba" are " abababba", " bababba", " ababba", " babba", " abba", " bba", " ba" and " a". Below to the left are the suf- fixes for text t in alphabetical order, and to the right is the suffix table S for text t. Please note that the values of the suffix table are the same as the suf- fix indexes to the left. For example, the value S[5] = 7 tells us that the fifth largest suffix alphabetically starts from index 7, i.e. t[7..8] = is " ba".

Suffix   (alphabetical order)

a

Suffix  index

8

Index i

1

S[i]

8

abababba

1

2

1

ababba

3

3

3

abba

5

4

5

ba

7

5

7

bababba

2

6

2

babba

4

7

4

bba

6

8

6

A basic suffix feature is that, if pattern p occurs anywhere in the text, then p occurs as the start of the suffix that starts at that point in the text. In that case, we say that pattern p matches the suffix in question. We can search for pattern p in the text by searching for a suffix that matches p. With the help of the suffix table, this search can be made efficient by using a so-called binary search.

The binary search maintains information on a suffix table interval that, in accordance with our current information, could contain a suffix that matches pattern p. We use the annotation start for the start index of the search interval, and end for the end index. Further, we will determine the middle index middle with the formula middle = (start + end)/2, which we will round upwards if the sum of (start + end) is uneven. At first, all the suffixes are possible, so start = 1 and end = n.

The binary search compares the pattern and suffix in the middle of the search interval t[S[middle]..n] to each other. If the pattern matches, the search can end1 . Otherwise, either p < t[S[middle]..n] or p > t[S[middle]..n] is true.

If p < t[S[middle]..n], i.e. the pattern is alphabetically smaller than the suffix in index S[middle], none of the suffixes in interval middle, . . . , end can match the pattern. This follows directly from the suffix table containing the suffixes in alphabetical order. In that case, the binary search updates the upper bound of the search interval to end = middle - 1 and compares the pattern to the suffixes in the middle of the new search interval during the next search round.

In the same way, if p > t[S[middle]..n], we can say that none of the suffixes in the interval start, . . . , middle can match the pattern. Then we can update the lower bound to start = middle + 1.

If the search interval remains empty during the search, i.e. the criterion in start > end is met, the search ends with no result: the text does not contain pattern p. Below we see the search for pattern p in text t with the help of suffix table S in more detail, step by step:

1. Set for the search interval start index start = 1 and end index end = n.
2. If start > end i.e. the search interval is empty, end the search: pattern p does not occur in the text.
3. Calculate the middle index middle between start and end index: middle = (start + end)/2, round off upwards if necessary.
4. Compare pattern p with middle suffix t[S[middle]..n] of the interval.
• If p matches the start of t[S[middle]..n], end the search with the information that pattern p was found starting at index S[middle] of the text.
• If p does not match the middle suffix t[S[middle]..n] of the interval, then:
- If p < t[S[middle]..n], update end = middle - 1 and continue searching by going back to .
- If p > t[S[middle]..n], update start = middle + 1 and continue the search by going back to 4.

Queries

Query 1. Give the steps a binary search makes when searching for the pattern p = " babaa" in the text in the example 3 t = " abbaababbababbab". Please write your answer in the same format as the example 3, i.e. for each step, write the search interval values start, end and middle.

Query 2. Write the suffix table for the character string t = " assignment".

Query 3.

a) Write the suffix table for the character string t = " aacatcgatagctagaacat".

b) Write the steps a binary search takes when searching for the pattern p = " cga" in the text in a). Please write your answer in the same format as the example 3, i.e. for each step, write the search interval values start, end and middle.

Query 4. Please write a character string consisting of characters in the English alphabet with a suffix table like the one below. This means that characters in the alphabet" a", " b", " c", . . ., " z" are acceptable.

i

S[i]

1

8

2

7

3

6

4

5

5

4

6

3

7

2

8

1

Query 5. Write a character string that contains only the characters " a" and" b" with a suffix table like the one below.

i

S[i]

1

16

2

13

3

3

4

14

5

11

6

9

7

4

8

6

9

15

10

12

11

2

12

10

13

8

14

5

15

1

16

7

Query 6. Program the algorithm given on page 3. Test your program using the example 3 and queries 2-5.

Reference no: EM13910219

Questions Cloud

Find a general formula for given expression : find a general formula for f(x)= [f(x+h)-f(x)]/h as h→0 (steps 1 through 3 performed). factor the expression in part b. with one factor equal to (sinθ)/θ. (find remaining factor).
Regulatory agencies, and governmental fac : Research web sites and identify, investigate three (3) sites that highlight consumer rights, activist groups, consumer regulatory agencies, and governmental fact based portals providing research information and data that can be helpful in formulating..
Concentration of the diluted solution : A 100 mL sample of a 6.0 mol/L solution of H2SO4 is diluted to a final volume of 500 mL. What is the concentration of the diluted solution?
Question regarding the solution of sodium chloride : Explain the steps you would follow to make 500 mL of a 0.5 mol/L solution of sodium chloride (NaCl) in the lab.
Calculate middle index middle between start and end index : Write the steps a binary search takes when searching for the pattern p = " cga" in the text in a). Please write your answer in the same format as the example 3, i.e. for each step, write the search interval values start, end and middle.
Determining the equation of the reaction : The thermite reaction has been used to weld railroad rails, make bombs, and ignite solid rocket fuel. The equation of the reaction is:
What is the domain for the following function : 4: What is the domain for the following function? Y = 4/x^2 + 1 A: (x ≠ 3) B: (x ≠ 0) C: (all real numbers)  D: (x ≠ - 3)
Calculate the mass of copper formed : If 10.00 g of magnesium is reacted with 95.75 g of copper (II) sulphate, magnesium sulphate and copper are formed.
Discuss the overall purpose people have for investing : Discuss the overall purpose people have for investing. Define investment. Discuss why you would expect the saving-borrowing pattern to differ by occupation for example

Reviews

Write a Review

Computer Engineering Questions & Answers

  Determine the new optimum solution

Identify the new solution space, and determine the new optimum solution - Determine the new optimum solution and please don't copy and paste from Google.

  Servers have on the traditional sdlc

express in your own words the advantages of specifying pre-conditions, post-conditions, and invariants. How, specifically, do they help to increase the quality of functions.

  Drawbacks of supporting links to files that cross mount

explain the advantages and disadvantages of supporting links to files that cross mount points (that is, the file link refers to a file that is stored in a different volume).

  Design the requires and the provides interfaces of at least

as the lead software engineer for a medium-sized hospital you have been asked to spearhead an effort to improve the

  Build appropriate functions for these classes

A CollegeCourse class includes fields representing department, course number, credit hours, and tuition. Its child, LabCourse, includes one more field that holds a lab fee charged in addition to the tuition.

  What is bitlocker technology

What is BitLocker technology. Why is it used in the simulation. What are Windows Deployment Services.

  Cloud computing to the rescuewrite a two to three 4-5 page

cloud computing to the rescuewrite a two to three 4-5 page paper in which you1. describe the hardware software and

  Questionassume this loop is taken many times what is

questionassume this loop is taken many times what is steady-state cpi of this loop on the scalar pipeline discussed in

  Design your own c class keep it simple however use the

design your own c class. keep it simple but use the readings this week to include a constructor members and methods. be

  Review sectioncontemporary hardware platform trends and

review sectioncontemporary hardware platform trends and section contemporary software platform trends in of management

  Load the file into the parallel arrays

Load the file into the parallel arrays and show the list of customers` names and phone numbers in the alphabetical order.

  Assume that queue is a queue type object

assume that queue is a queue type object and the size of the array-implementing queue is 100. Also, suppose that the value of the queueFront is 25 and the value of queueRear is 25.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd