Find the maximal frequent sequences

Assignment Help Data Structure & Algorithms
Reference no: EM132380532

Exercises 1:

(1994). The tidlist intersection based Eclat method is described in Zaki et al. (1997), and the dEclat approach that uses diffset appears in Zaki and Gouda (2003). Finally, the FPGrowth algorithm is described in Han, Pei, and Yin (2000). For an experimental comparison between several of the frequent itemset mining algorithms see Goethals and Zaki (2004). There is a very close connection between itemset mining and association rules, and formal concept analysis; see Ganter, Wille, and Franzke, 1997. For example, association rules can be considered to be partial implications with frequency constraints; see Luxenburger (1991).

Q1. Given the database in Table 8.2.

(a) Using minsup = 3/8, show how the Apriori algorithm enumerates all frequent patterns from this dataset.

(b) With minsup = 2/8, show how FPGrowth enumerates the frequent itemsets.

Q2. Consider the vertical database shown in Table 8.3. Assuming that minsup = 3, enumerate all the frequent itemsets using the Eclat method.

Table 8.2. Transaction database for Q1

tid itemset
t1 ABCD
t2 ACDF
t3 ACDEG
t4 ABDF
ts BCG
t6 DFG
t7 ABG
t8 CDFG

Table 8.3. Dataset for Q2

A B C D E
1 2 1 1 2
3 3 2 6 3
5 4 3
4
6 5 5
5

6 6

Q3. Given two k-itemsets Xa, = {x1,...,xk-1,xa} and Xb, = {x1,...,xk-1, xb} that share the common (k - 1)-itemset X = {x1,.x2,...,xk-1} as a prefix, prove that

sup(Xab) = sup(Xa) - |d(Xab)| where Xab = Xa U Xb and d(Xab)) is the diffset of Xab.

Q4. Given the database in Table 8.4. Show all rules that one can generate from the set ABE.

Table 8.4. Dataset for Q4

tid itemset
t1 ACD
t2 BCE
t3 ABCE
t4 BDE
ts ABCE
t6 ABCD

Q5. Consider the partition algorithm for itemset mining. It divides the database into k partitions, not necessarily equal, such that D = Ui=1kDi, where Di is partition i, and for any i ≠ j, we have Di ∩ Dj = Ø. Also let ni = |Di| denote the number of transactions in partition Di. The algorithm first mines only locally frequent itemsets, that is, itemsets whose relative support is above the minsup threshold specified as a fraction. In the second step, it takes the union of all locally frequent itemsets, and computes their support in the entire database D to determine which of them are globally frequent. Prove that if a pattern is globally frequent in the database, then it must be locally frequent in at least one partition.

Exercises 2: Summarizing Itemsets

The concept of closed itemsets is based on the elegant lattice theoretic framework of formal concept analysis in Ganter, Wille, and Franzke (1997).The Charm algorithm for mining frequent closed itemsets appears in Zaki and Hsiao (2005), and the GenMax method for mining maximal frequent itemsets is described in Gouda and Zaki (2005). For an Apriori style algorithm for maximal patterns, called MaxMiner, that uses very effective support lower bound based itemset pruning see Bayardo Jr (1998). The notion of minimal generators was proposed in Bastide et al. (2000); they refer to them as key patterns. Nonderivable itemset mining task was introduced in Calders and Goethals (2007).

Q1. True or False:
(a) Maximal frequent itemsets are sufficient to determine all frequent itemsets with their supports.
(b) An itemset and its closure share the same set of transactions.
(c) The set of all maximal frequent sets is a subset of the set of all closed frequent itemsets.
(d) The set of all maximal frequent sets is the set of longest possible frequent itemsets.

Q2. Given the database in Table 9.1

(a) Show the application of the closure operator on AE, that is, compute c(A£). Is AE closed?

(b) Find all frequent, closed, and maximal itemsets using minsup = 2/6.

Q3. Given the database in Table 9.2, find all minimal generators using minsup = 1. Exercises 257

Table 9.1. Dataset for Q2

tid itemset
t1 ACD
t2 BCE
t3 ABCE
t4 BDE
ts ABCE
t6 ABCD

Table 9.2. Dataset for Q3

tid itemset
1 ACD
2 BCD
3 AC
4 ABD
5 ABCD
6 BCD

1341_closed itemset.jpg

Figure 9.7. Closed itemset lattice for Q4.

Q4. Consider the frequent closed itemset lattice shown in Figure 9.7. Assume that the item space is I = {A, B, C, D, E}. Answer the following questions:

(a) What is the frequency of CD?

(b) Find all frequent itemsets and their frequency, for itemsets in the subset interval [B, ABD].

(c) Is ADE frequent? If yes, show its support. If not, why?

Q5. Let C be the set of all closed frequent itemsets and M the set of all maximal frequent itemsets for some database. Prove that M ⊆ C.

Q6. Prove that the closure operator c = i o t satisfies the following properties (X and Y are some itemsets):

(a) Extensive: X ⊆ c(X)

(b) Monotonic: If X ⊆ c(Y) then x(X) ⊆ c(Y)

(c) Idempotent: c(X) = c(e(X))

Exercises 3:

time algorithm. The total time over the entire database of N sequences is thus O(Nn), ifn is the longest sequence length.

The level-wise GSP method for mining sequential patterns was proposed in Srikant and Agrawal (1996). Spade is described in Zaki (2001), and the PrefixSpan algorithm in Pei et al. (2004). Ukkonen's linear time suffix tree construction method appears in Ukkonen (1995). For an excellent introduction to suffix trees and their numerous applications see Gusfield (1997); the suffix tree description in this chapter has been heavily influenced by it.

Q1. Consider the database shown in Table 10.2. Answer the following questions:

(a) Let minsup = 4. Find all frequent sequences.

(b) Given that the alphabet is ∑ = {A,C,G,T}. How many possible sequences of length k can there be?

Table 10 2. Sequence database for Q1

Id

Sequence

s1

AATACAAGAAC

s2

GTATGGTGAT

s3

AACATGGCCAA

S4

AAGCGTGGTCAA

Q2. Given the DNA sequence database in Table 10.3, answer the following questions using minsup = 4

(a) Find the maximal frequent sequences.

(b) Find all the closed frequent sequences.

(c) Find the maximal frequent substrings.

Attachment:- Exercises.rar

Reference no: EM132380532

Questions Cloud

Explain the gaap financial reporting requirements : Explain the GAAP financial reporting requirements and choices for a Canadian privately held corporation. The response paper should be in APA format.
What is business writing : What is business writing? How do you make constructive recommendations that will be accepted and not shunned by the organization's leadership
What was the annual average compound growth rate : Suppose that the NASDAQ Composite index hit a level of 4,684 in February of 2005 In February of 1990 it was at a level of 2,028
Calculate the net income for 2020 under the accrual basis : In its first year of operations, 2020, Whispering Winds Corp. invoiced $189,000 in service revenue. Of that amount, $23,000 was still owing.
Find the maximal frequent sequences : Find all frequent itemsets and their frequency, for itemsets in the subset interval - Find the maximal frequent sequences - how the Apriori algorithm enumerates
What is comet prides 2013 revenue using accrual accounting : What is Comet Pride's 2013 revenue using accrual accounting? Received $4,800 in advance for services to be performed in 2014.
What is the value of inventory listed on the firms balance : Burt's TVs has current liabilities of $24.2 million. Cash makes up 33 percent of the current assets and accounts receivable makes up another.
What is the year-end 2008 balance in retained earnings : TriCycle, Corp. began the year 2008 with $27 million in retained earnings. The firm earned net income of $8.2 million in 2008 and paid $1.12 million.
Prepare martinezs entry to record the exchange : Martinez Company traded a used truck for a new truck. The used truck cost $35,700 and has accumulated depreciation of $32,130.

Reviews

Write a Review

Data Structure & Algorithms Questions & Answers

  Implement an open hash table

In this programming assignment you will implement an open hash table and compare the performance of four hash functions using various prime table sizes.

  Use a search tree to find the solution

Explain how will use a search tree to find the solution.

  How to access virtualised applications through unicore

How to access virtualised applications through UNICORE

  Recursive tree algorithms

Write a recursive function to determine if a binary tree is a binary search tree.

  Determine the mean salary as well as the number of salaries

Determine the mean salary as well as the number of salaries.

  Currency conversion development

Currency Conversion Development

  Cloud computing assignment

WSDL service that receives a request for a stock market quote and returns the quote

  Design a gui and implement tic tac toe game in java

Design a GUI and implement Tic Tac Toe game in java

  Recursive implementation of euclids algorithm

Write a recursive implementation of Euclid's algorithm for finding the greatest common divisor (GCD) of two integers

  Data structures for a single algorithm

Data structures for a single algorithm

  Write the selection sort algorithm

Write the selection sort algorithm

  Design of sample and hold amplifiers for 100 msps by using n

The report is divided into four main parts. The introduction about sample, hold amplifier and design, bootstrap switch design followed by simulation results.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd