What is the running time of the algorithm that creates tree

Assignment Help Data Structure & Algorithms
Reference no: EM131308397

Assignment: Classi?cation trees

1. Introduction

A classification tree is used for classifying inputs into one of several categories. Each node in the classification tree consists of a criterion that is applied to the input to classify it into one of (potentially) many sets. The following tree illustrates this for classifying 12 TV (s1,...,s12) shows into various categories:

156_Tree.jpg

As can be seen there are 3 shows that are half-hour mysteries, and two shows that are hour comedies. The tree allows one to find all shows that satisfy various criteria, by starting at the top of the tree, and descending through the tree based upon supplied criteria, eventually arriving at the appropriate category (shown in rectangles) at the leaves of the tree.

In this project you will build (by hand) a binary classification tree for a fixed set of inputs and analyze its behavior.

2. Binary classi?cation trees

Each criterion in a binary classification tree classifies the input into either one of two categories. It can therefore be implemented as a binary tree in which the interior nodes contained the classification criterion and each leaf contains all elements that satisfy the all the criteria along the path from the root to the leaf. Any classification tree can be built using a binary classification tree, by splitting any non-binary criteria into a sequence of binary criteria.

For this project, the leaves in the binary tree will contain single elements. This means that the criteria nodes must uniquely select a single element from the set of elements that comprise the leaves in the tree. Ideally a search tree will be balanced, in the sense that for every node in the tree, the heights of its left and right subtrees are with k| of one another, for some small k| (ideally, 1). call such a tree a balanced binary classification search tree, or a search tree for short.

3. Constructing a search tree

Let us assume that keys we are interested in are 2 characters long, e.g.

CD
ac
kc
ol

Choose 10 such codes, all different. Write each code out in binary. Now perform this recursive algorithm:

1. If you have but one code, append = to it and return the augmented pattern.

2. Total each column, counting the number of 1's. Call the leftmost column 0. Let b[i] designate this number for column-i and let a[i] = 10-b[i] designate the number of zeros.

3. Find a collection of a's or b's, such that the sum of the collection is closest to n/2 where n is the number of elements in your list.

a. If your collection is of b's, then write a bit pattern that consists of 16 bits, where the bit-i is set to 1 if b[i] is in your collection. Append to your result an operator of & (the bitwise "and").

b. If your collection is of a's, then write a bit pattern that consists of 16 bits, where the bit-i is set to 1 if b[i] is in your collection. Append to your result an operator of ^ (the bitwise "exclusive or").

4. Apply your chosen operator one-by-by to your bit patterns. Those that return "true" (i.e., a non-zero number) remove and place into a separate collection. These will form the starting block for the left-hand side of the tree. Those that are not selected will form the starting block fro the right-hand side of the tree.

5. You now have two starting sets. Recursively apply steps 1-5 to form two subtrees, with root being the bit pattern + operator you formed in step 3 above.

Here is an example

HEX

Binary

CD

43

44

0

1

0

0

0

0

1

1

0

1

0

0

0

1

0

0

ac

61

63

1

0

1

0

0

0

0

1

1

0

1

0

0

0

1

1

kc

6B

63

1

0

1

0

1

0

1

1

1

0

1

0

0

0

1

1

ol

6F

6C

1

0

1

0

1

1

1

1

1

0

1

0

1

1

0

0

b-Totals

3

1

3

0

2

1

3

4

3

1

3

0

1

2

2

2

a-Totals

1

3

1

4

2

3

1

0

1

3

1

4

3

2

2

2

In our case n=4, so we want to form a subset of size n/2=2 of either a's exclusively, or b's, exclusively.

For b's we note there are several columns that are candidates for inclusion in our subset of (approximately) size n/2.

b[1]
b[1]
b[4]
b[5]
b[9]
b[12]
b[13]
b[14]
b[15]

These are all 2 or less.

The goal is to form a subset from the list above that will match (as close to) two of the four input rows. Clearly one could pick either b[4], b[13], b[14] or `b[15]. because matching a 1 in any of these columns would pick two rows. Say we pick b[14]. Then we form the selection pattern:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0

When this pattern is matched against the four inputs (using an &), it will select exactly two:

                                                     *
ac 61 63 1 0 1 0 0 0 0 1  1 0 1 0 0 0 1 1
kc 6B 63 1 0 1 0 1 0 1 1  1 0 1 0 0 0 1 1

Had b[4] been chosen, then we would form the pattern:

0 0 0 0 1 0 0 0  0 0 0 0 0 0 0 0

When this pattern is matched against the four inputs, it will select exactly two:

kc  6B  63  1 0 1 0 1 0 1 1   1 0 1 0 0 0 1 1
ol  6F   6C  1 0 1 0 1 1 1 1   1 0 1 0 1 1 0 0

In the case where there is no n/2 in the b-totals, then we can pick combinations of bits, which, when applied will choose close to n/2 inputs. This is done as follows.

Choose b[1], which will pick one row:

Pattern:      0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
CD  43  44  0 1 0 0 0 0 1 1 0 1 0 0 0 1 0 0

Having chosen that row, eliminate it from consideration and recalculate the b-totals (which can be computed by subtracting CD from the b-totals).

ac  61  63   1 0 1 0 0 0 0 1 1 0 1 0 0 0 1 1
kc  6B  63   1 0 1 0 1 0 1 1 1 0 1 0 0 0 1 1
ol  6F  6C    1 0 1 0 1 1 1 1 1 0 1 0 1 1 0 0
b-Totals     3 0 3 0 2 1 2 3 3 0 3 0 1 1 2 2

Now choose another column with a 1 in it (say b[12]) and "or' this with the previous pattern:

Pattern: 0 1 0 0 0 0 0 0  0 0 0 0 1 0 0 0

Now and-ing this against the original four rows

CD

43

44

0

1

0

0

0

0

1

1

0

1

0

0

0

1

0

0

ac

61

63

1

0

1

0

0

0

0

1

1

0

1

0

0

0

1

1

kc

6B

63

1

0

1

0

1

0

1

1

1

0

1

0

0

0

1

1

ol

6F

6C

1

0

1

0

1

1

1

1

1

0

1

0

1

1

0

0

will choose any row with a 1 in column 1 or column 10, giving:
=>

CD  43  44  0 1 0 0 0 0 1 1  0 1 0 0 0 1 0 0
ol    6F  6C  1 0 1 0 1 1 1 1  1 0 1 0 1 1 0 0

Of course, selecting different combinations of b-totals may choose different rows, but that does not matter, because the goal is to choose n/2 rows, which is accomplished.
Writing our pattern in hex we have:

Pattern:  0 1 0 0 0 0 0 0  0 0 0 0 1 0 0 0
Pattern:      4         0           0          8

At this point we have formed:

1085_Tree1.jpg

The designation &:4008 in the root node means that when deciding which side of the tree a candidate key would be found, perform key & 4008. If it matches, take the T branch, otherwise take the F branch.

Having split the set in half, the process is repeated recursively on both subsets.

The left set b-totals are:

CD  43  44

0

1

0

0

0

0

1

1

0

1

0

0

0

1

0

0

ol    6F  6C

1

0

1

0

1

1

1

1

1

0

1

0

1

1

0

0

b-totals:

1

1

1

0

1

1

2

2

1

1

1

0

1

2

0

0

In this case n/2=1, so we choose any b-total with a 1, say (b[1]), forming the pattern:

Pattern:   0 1 0

0

0

0 0

0

0

0 0

0

0

0 0

0

Pattern:               4

 

 

0

 

 

0

 

 

0

 

which chooses:

which chooses:

 

 

 

 

 

 

 

 

 

 

The b-totals for the left-hand set are:

ac  61 63

1

0

1

0

0

0

0

1

1

0

1

0

0

0

1

1

kc  6B 63

1

0

1

0

1

0

1

1

1

0

1

0

0

0

1

1

b-totals:

2

0

2

0

1

0

1

2

2

0

2

0

0

0

2

2

Again we choose any b-total with a 1, say b[4],

Pattern:   0 0 0

0

1

0 0

0

0

0 0

0

0

0 0

0

Pattern:               0

 

 

8

 

 

0

 

 

0

 

which chooses:

kc  6B 63  1 0 1

0

1

0 1

1

1

0 1

0

0

0 1

1

Our search tree now looks like:

825_Tree2.jpg

Note that the nodes contain patterns to be matched against a search key. If the search key is key = "ol" then key & 0x4008 matches so the left branch is selected. key & 0x4000 does not match (evaluates to 0), so the right branch is taken. Last, key =

"ol" matches at a leaf node, so ol is in the set.

Searching for "oZ' will follow the same path, but the final comparison of "oZ" = "ol" fails, so oz is not in the set.

Analysis

Now that you have worked through an example, it is time analyze both the algorithm for building the tree and the algorithm for searching the tree.

Build a tree

To demonstrate that you understand the algorithm, choose 6 two-character keys, translate them into binary (using ASCII mapping), and build a balanced search tree. Draw your tree.

Analyze the construction algorithm

Suppose we have n| records and that each record contains a key of k| bytes and information associated with the key of r| bytes. Assume also that the tree is built using 4-byte pointers.

1. The algorithm for building the tree proceeds by splitting the list in two each time, based on the b-counts to try and create a balanced tree. However, what would happen in the following pathological case?

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

a. There are 8 members of this set. Can the b-counts be used to split the set into two equally-sized subsets?

b. Suggest a way of using the a-counts in a situation like this. Specifically,

c. How would you use the a-counts to create a pattern that would split the list in two almost equal-size sets? List a particular pattern that would accomplish this for the 8 rows above.

d. What operation(s) would need to be applied to match your pattern to select 4 rows? Which four rows would be selected?

2. Suppose the b-counts are all the same value c|. This then means that all the values for the a-counts are n-c|. For what value c| would this cause the most uneven splitting of the set? (This is tricky. Spend some time on it).

3. How much space will the tree consume? Your answer should be in terms of n|, k| and r|. You may assume the r| bytes for records are stored in the leaves of the search tree.

a. What is the best possible case? (Hint consider a perfectly balanced search tree. Further hint: Your discrete math background should be sufficient to know that count of interior nodes given the count of leaf nodes. If it makes things easier for you, assume n| is a power of 2.)

b. Bonus: What is the worst case? (Hint: consider how badly the tree might be unbalanced. Calculate the number of nodes required in such a case. Consider your answer to the problem labeled "tricky" above.)

4. What is the running time of the algorithm that creates the tree? where the measure of the size of the program is the number of rows (n|) and the number of bytes (k|) in the key, and the size of the record information (r|) in the situation in which the list can be split perfectly in half each time? This is a very hard problem and it is likely that few will get it right. It is your thinking processes that will be judged rather than perfect mathematical analysis. Big-O answer without constant coefficients is good enough.

a. How many steps will it take to calculate the b-counts from a collection of rows?
(Note that this depends on k| and n|.)

b. How many steps to find the pattern? (Worst case: you have to form the pattern from a collection of 1's.)

c. How many steps does it take to reduce the b-counts for each selected pattern?

d. How many times will three steps above have to be repeated in a perfectly split binary tree?

e. Combine your results together to give your estimate of the running time of the tree-construction algorithm.

5. What is the running time of the search routine for a perfectly balanced search tree?

6. What is the running time of the search routine for the worst-balanced search tree?

Reference no: EM131308397

Questions Cloud

Promote participation that is more consistent : To encourage teamwork and a culture of excellence, a new director of an ambulatory care center in a hospital has begun holding bi-monthly management team meetings, consisting of several physicians, nurses, physician assistants, financial managers,..
Show the encoding for the integer 1456 using ber : Show the encoding for the object identifier 1.3.6.1.2.1.7.1 (the udpInDatagram variable in udp group) using BER.
Describe two specific products from different vendors : You are to take one of the five identified categories of tools and identify two specific products from different vendors. Based on two products, please research the differences and similarities between the two products.
Introduce the company zero-tolerance harassment policy : Introduce the objective of the presentation. Introduce the company's zero-tolerance harassment policy, and explain why it must be adhered to.
What is the running time of the algorithm that creates tree : What is the running time of the algorithm that creates the tree? where the measure of the size of the program is the number of rows (n|) and the number of bytes (k|) in the key.
Statements is correct in relation to independent projects : Which one of the following statements is correct in relation to independent projects?
Defines interface numbers through which ip should be sent : Assume the table has four rows at the moment with the destination IP addresses (201.14.67.0), (123.16.0.0), (11.0.0.0), and (0.0.0.0). Show how SNMP can access all four instances of the second column, called ipRouteIfIndex, which defines the inter..
Opinion and ideal information system : There are new approaches for system building in the digital firm era. What is your opinion and ideal information system that works for global management?
Drawing the flowchart for displays payroll information : Drawing the flowchart for displays payroll information.Write the code by using Visual Basic programming language.

Reviews

Write a Review

Data Structure & Algorithms Questions & Answers

  Create greedy algorithm to find market to buy apples

Assume we drive pickup truck from city A to city B. Along high way, we will go through n apple markets, labeled with 1, 2, ..., n, where you can buy or sell apples. which means you buy and sell apples at the same market i.

  Website creation

Construct a basic, generic structure of a web site. Name it something generic etc. Demonstrate some basic layout of the content of the pages.

  Find fraction of time during which queue grows

Suppose now there are three users. Find the probability that at a given time, all three users are transmitting simultaneously. Find the fraction of time during which the queue grows.

  Formula to compute number of address bus conductors

If an address bus needs to be able to address 8-devices, how many conductors will be needed? What if each of those devices also requires to be able to talk back to the I/O control device?

  Discuss fault tolerance approaches that systems managers use

Discuss fault tolerance approaches that systems managers use to assure continuity of operations

  Draw a binary search tree for an array

Draw a binary search tree for an array of element from 0 to 20

  Running time analyses of all the methods

You need to give the running time analyses of all the methods in terms of the Big O notation. Include your running time analyses in the source file of the CompressedSuffixTrie class and comment out them.

  Sharing a large computer file

Assume you are sitting at desk at office and using your laptop computer. The boss calls an emergency meeting for you and many colleagues, and asks everyone to bring his or her laptop computer.

  Question about pure aloha

A group of N stations share a 56-kbps pure ALOHA channel. Every station outputs a 1000-bit frame on an average of once every one-hundred secs, even if the previous one has not yet been sent.

  Astronaut.data must be read into a 1-d array

Data from the Astronaut.data must be read into a 1-D array of structures(or classes) named ASTRONAUT and thereafter all processing must be performed on the array of structures.

  Complete pseudo code for the insert hash table operations

The task is to complete the pseudo code for the following hash table operations: Insert and Remove. Assume Hashtable is a simple array of size 8, with indices 0..7.

  Implement the array-based stack class

Implement the array-based stack class - Use it in the client code to convert an infix expression into post-fix expression, and compute the result.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd