##### Reference no: EM13522

**Problem 1 - Algorithm Attributes**

Suppose you and a group of friends decide to build a pyramid of ice blocks. The base will consist of 10 rows and 10 columns of blocks. The next level will be 9 rows and 9 columns. The next level will be 8 by 8, and so on until the top level is just a single block. Here is an algorithm to calculate how many ice blocks you will need total:

Input: None.

Output: The total number of ice blocks needed for an ice pyramid with a 10 by 10 base.

1 Set total = 0

2 For i = 1 to 10

3 Set total = total + i * i

4 Print total

5 Stop

Here is an alternative algorithm that relies on a formula: if n is any positive integer, then

Input: None.

Output: The total number of ice blocks needed for an ice pyramid with a 10 by 10 base.

1 Print 10 * (10 + 1) * (2*10 + 1)/6

2 Stop

Now answer the following questions about these two algorithms.

(a) One algorithm attribute (brie?y) discussed in class was readability. Which of the two algorithms do you ?nd easier to understand? Why?

(b) Another attribute was efficiency. Which of the two algorithms is more efficient? To determine this, count the number of operations done in each. In counting operations, count each addition, multiplication, or division as one operation; don't include adding 1 to i (done implicitly by the for loop) in the operation count; report the total number of operations for each algorithm; and identify which algorithm takes fewest operations.

(c) Another attribute was generality. One problem with these algorithms are they are not general. That is, suppose that instead of a 10 by 10 base you decided to use an 8 by 8 base. You'd need to rewrite the algorithms to compute the total number of blocks. And if you decided to use a 13 by 13 base, you'd need to rewrite them again. And so on. You can solve this problem by rewriting the algorithms so the input is a positive integer n that gives the number of rows (equivalently columns) in the ice pyramid's base. Rewrite both algorithms so they compute the total number of ice blocks for an n by n base. In your answer write the entire modi?ed algorithms rather than just listing the changes.

(d) What is the time complexity for the generalized version of the ?rst algorithm? Write your answer as a big-Θ estimate in terms of n. Your big-Θ estimate should be as simple as possible; for example if the operation count was 2n2 + 17 write Θ(n^{2}).

(e) What is the time complexity for the generalized version of the ?rst algorithm? Again, write your answer as a big-Θ estimate that is as simple as possible.

**Problem 2 **

For each of the answers below, set up the problem, and then calculate the answer as a speci?c number. You may use a calculator, and may give large numbers as approximations rather than as their exact value. Additionally, classify the size of each numeric answer according to the following:

- "some": numbers less than one hundred.
- "large": numbers greater than or equal to one hundred, but less than a million.
- "colossal": numbers greater than or equal to a million, but less than a billion.
- "mammoth": numbers greater than or equal to a billion, but less than a trillion.
- "gargantuan": numbers greater than or equal to a trillion.

Suppose you are investigating a computer crime. You have partial information, but are working on establishing a communication timeline for the suspect. You know the suspect sent out 12 email messages during the time period you are investigating. Each email message went to a single person (so no cc'ing, etc.). However, because the suspect and his associates used anonymity tools you do not know which email messages were sent to whom when. Answer the following questions. Remember to classify your answers as mentioned above.

(a) Suppose the suspect had 20 known associates, and each email message went to one of these associates. How many ways are there to select 12 associates out of this group of 20 if order matters and no associate received more than one email message?

(b) How many ways are there in problem (a) if order does not matter and no associate received more than one email message?

(c) How many ways are there in problem (a) if order matters, but associates might have received more than one of the 12 email messages? (For example, it's possible all 12 went to a single associate.)

(d) Suppose that due to new evidence you limit the suspect's list of associates to 15 people. Now how many possibilities are there if order matters and each email message went to a different person?

(e) Suppose you get still further new evidence: the ?rst email went to associate X or associate Y. The next 4 emails went to people from a group of 5 associates, but some people in this group might have received more than one email message. The last 7 emails went to 7 distinct people from a group of 10 associates. Assume order matters. How many different possibilities are there?

**Problem 3 **

(a) Recall the pattern matching algorithm from HW 1. This algorithm has time complexity O(nm), since it depends both on the text string length n and the pattern length m. Explain, in your own words, why we use big-O here rather than big-Θ. If you need to refer to the pseudocode for the algorithm, it is still posted on the Moodle page.

Note this refers to the original pattern matching algorithm in HW 1, not to any modi?ed version. Note also that this question does not ask you to explain or justify the complexity - it just asks why the complexity is written in terms of big-O rather than big-Θ.

(b) In genomic analysis the sequences can be extremely long. Suppose you have an algorithm that takes 40n arithmetic ?oating point operations, where n is the sequence length. Suppose you are working on a computer that can do 5 GFLOPS per second (if you forget what a GFLOP is, refer to the machine organization notes, or ask one of the course staff). Could that computer execute the algorithm in a reasonable amount of time if the sequence length was n = 20 million?

(c) Redo part (b), except now assume the algorithm takes 40n^{2} arithmetic ?oating point operations.

**Problem 4 **

Solve each of the problems below. Remember to show enough work that it is clear how you obtained your answer.

(a) Suppose you are doing public health research studying diabetes. You have a database consisting of 10,000 records, and have a Θ(n^{2} ) program (where n is the number of records) looking for correlations in the data. Suppose that when you run the program on your

computer it takes about 7 minutes to ?nish.

(a) Suppose that you get additional records so that the database size becomes 20,000 record. Now about how long will it take to run the program?

(b) Suppose a colleague asks you to run your program for a database she has that has 100,000 records. Now about how long will it take to run the program?

(c) Suppose you have 20 databases, each of 10,000 records. You run the program on the ?rst of these, then immediately after it is done you run it on the second, and so forth until the program has been run on all 20 databases. About how long will this take?

(d) Which takes less time: running the program on 20 databases each with 10,000 records, or running it on a single database with 100,000 records? (e) Suppose a coworker with programming and algorithm experience claims he can improve your algorithm. You run the program on your 10,000 record data set and it now takes 9 minutes. Moreover, due to a mistake the coworker made, the program now is Θ(n^{3}). Now about how how long will it take if you run the algorithm on the 100,000 dataset?

(f) Going back to the Θ(n^{2}) algorithm that takes 7 minutes on a database of 10,000 records: suppose you increase the database size by a factor of 5, so you have a database of 50,000 records. But you also get a computer that is 5 times as fast. How long will it take your program to run using the new computer on the 50,000 database?

**Problem The Best of Times, the Worst of Times**

Abrupt changes in adjacent data values often indicate something signi?cant in the data. In a medical data set, for example MRI data, a large difference between adjacent data might indicate a transition between different tissue types, say bone and muscle. Or in a geologic data set a large difference might indicate the transition from one rock formation to another.

The following algorithm goes through an n-row, n-column, n-depth data set, and checks if there is a large difference between a data value and the next data value in the same column and at the same depth (so it is checking for differences in one direction, but not all directions). In the pseudocode below abs refers to the absolute value function. So, for example abs(-5) returns 5.

Input: an n-row, n-column, n-depth data set of numbers, along with a positive number threshold.

Output: a message "Large difference found" if any value differs from the next value in the same column and same depth by more than threshold, and a message "No large difference found" otherwise.

1 Get threshold, n, and the data set

2 Set i= 1

3 Set sigDiffFound = false

4 While i < n and sigDiffFound equals false

5 Set j = 1

6 While j <= n and sigDiffFound equals false

7 Set k = 1

8 While k <= n and sigDiffFound equals false

9 If abs(a[i,j,k] - a[i+1,j,k]) > threshold

10 Set sigDiffFound = true

11 Set k = k + 1

12 Set j = j + 1

13 Set i = i + 1

14 If sigDiffFound equals true

15 Print 'Large difference found'

16 Else

17 Print 'No large difference found'

18 Stop

Now answer the following questions. In each part do not make any assumptions on the value of n, so your answer might be a function of n. However, write your answer as a number or as a function of n and not, for example, as a big-Θ estimate.

(a) What is the fewest number of times Line 9 is executed?

(b) What is the largest number of times Line 9 is executed?

(c) Suppose the number of rows, column, and depths all increased from n to 2n. What is the ratio of the fewest number of times Line 9 is executed in this case to the number of times in your answer for part (a)?

(d) Suppose the number of rows, column, and depths all increased from n to 2n. What is the ratio of the largest number of times Line 9 is executed in this case to the number of times in your answer for part (b)