Find the maximal frequent sequences

Assignment Help Data Structure & Algorithms

Reference no: EM132380532

Exercises 1:

(1994). The tidlist intersection based Eclat method is described in Zaki et al. (1997), and the dEclat approach that uses diffset appears in Zaki and Gouda (2003). Finally, the FPGrowth algorithm is described in Han, Pei, and Yin (2000). For an experimental comparison between several of the frequent itemset mining algorithms see Goethals and Zaki (2004). There is a very close connection between itemset mining and association rules, and formal concept analysis; see Ganter, Wille, and Franzke, 1997. For example, association rules can be considered to be partial implications with frequency constraints; see Luxenburger (1991).

Q1. Given the database in Table 8.2.

(a) Using minsup = 3/8, show how the Apriori algorithm enumerates all frequent patterns from this dataset.

(b) With minsup = 2/8, show how FPGrowth enumerates the frequent itemsets.

Q2. Consider the vertical database shown in Table 8.3. Assuming that minsup = 3, enumerate all the frequent itemsets using the Eclat method.

Table 8.2. Transaction database for Q1

tid	itemset
t₁	ABCD
t₂	ACDF
t₃	ACDEG
t₄	ABDF
t_s	BCG
t₆	DFG
t₇	ABG
t₈	CDFG

Table 8.3. Dataset for Q2

A	B	C	D	E
1	2	1	1	2
3	3	2	6	3
5	4	3		4
6	5	5		5
	6	6

Q3. Given two k-itemsets X_a, = {x1,...,x_k-1,x_a} and X_b, = {x1,...,x_k-1, x_b} that share the common (k - 1)-itemset X = {x₁,.x₂,...,x_k-1} as a prefix, prove that

sup(X_ab) = sup(X_a) - |d(X_ab)| where X_ab = X_a U X_b and d(X_ab)) is the diffset of X_ab.

Q4. Given the database in Table 8.4. Show all rules that one can generate from the set ABE.

Table 8.4. Dataset for Q4

tid	itemset
t₁	ACD
t₂	BCE
t₃	ABCE
t₄	BDE
t_s	ABCE
t₆	ABCD

Q5. Consider the partition algorithm for itemset mining. It divides the database into k partitions, not necessarily equal, such that D = U_i=1^kD_i, where D_i is partition i, and for any i ≠ j, we have D_i ∩ D_j = Ø. Also let n_i = |D_i| denote the number of transactions in partition D_i. The algorithm first mines only locally frequent itemsets, that is, itemsets whose relative support is above the minsup threshold specified as a fraction. In the second step, it takes the union of all locally frequent itemsets, and computes their support in the entire database D to determine which of them are globally frequent. Prove that if a pattern is globally frequent in the database, then it must be locally frequent in at least one partition.

Exercises 2: Summarizing Itemsets

The concept of closed itemsets is based on the elegant lattice theoretic framework of formal concept analysis in Ganter, Wille, and Franzke (1997).The Charm algorithm for mining frequent closed itemsets appears in Zaki and Hsiao (2005), and the GenMax method for mining maximal frequent itemsets is described in Gouda and Zaki (2005). For an Apriori style algorithm for maximal patterns, called MaxMiner, that uses very effective support lower bound based itemset pruning see Bayardo Jr (1998). The notion of minimal generators was proposed in Bastide et al. (2000); they refer to them as key patterns. Nonderivable itemset mining task was introduced in Calders and Goethals (2007).

Q1. True or False:
(a) Maximal frequent itemsets are sufficient to determine all frequent itemsets with their supports.
(b) An itemset and its closure share the same set of transactions.
(c) The set of all maximal frequent sets is a subset of the set of all closed frequent itemsets.
(d) The set of all maximal frequent sets is the set of longest possible frequent itemsets.

Q2. Given the database in Table 9.1

(a) Show the application of the closure operator on AE, that is, compute c(A£). Is AE closed?

(b) Find all frequent, closed, and maximal itemsets using minsup = 2/6.

Q3. Given the database in Table 9.2, find all minimal generators using minsup = 1. Exercises 257

Table 9.1. Dataset for Q2

tid	itemset
t₁	ACD
t₂	BCE
t₃	ABCE
t₄	BDE
t_s	ABCE
t₆	ABCD

Table 9.2. Dataset for Q3

tid	itemset
1	ACD
2	BCD
3	AC
4	ABD
5	ABCD
6	BCD

1341_closed itemset.jpg

Figure 9.7. Closed itemset lattice for Q4.

Q4. Consider the frequent closed itemset lattice shown in Figure 9.7. Assume that the item space is I = {A, B, C, D, E}. Answer the following questions:

(a) What is the frequency of CD?

(b) Find all frequent itemsets and their frequency, for itemsets in the subset interval [B, ABD].

Q5. Let C be the set of all closed frequent itemsets and M the set of all maximal frequent itemsets for some database. Prove that M ⊆ C.

Q6. Prove that the closure operator c = i o t satisfies the following properties (X and Y are some itemsets):

(a) Extensive: X ⊆ c(X)

(b) Monotonic: If X ⊆ c(Y) then x(X) ⊆ c(Y)

Exercises 3:

time algorithm. The total time over the entire database of N sequences is thus O(Nn), ifn is the longest sequence length.

The level-wise GSP method for mining sequential patterns was proposed in Srikant and Agrawal (1996). Spade is described in Zaki (2001), and the PrefixSpan algorithm in Pei et al. (2004). Ukkonen's linear time suffix tree construction method appears in Ukkonen (1995). For an excellent introduction to suffix trees and their numerous applications see Gusfield (1997); the suffix tree description in this chapter has been heavily influenced by it.

Q1. Consider the database shown in Table 10.2. Answer the following questions:

(a) Let minsup = 4. Find all frequent sequences.

(b) Given that the alphabet is ∑ = {A,C,G,T}. How many possible sequences of length k can there be?

Table 10 2. Sequence database for Q1

Id	Sequence
s₁	AATACAAGAAC
s2	GTATGGTGAT
s3	AACATGGCCAA
S4	AAGCGTGGTCAA

Q2. Given the DNA sequence database in Table 10.3, answer the following questions using minsup = 4

(a) Find the maximal frequent sequences.

(b) Find all the closed frequent sequences.

Attachment:- Exercises.rar

Reference no: EM132380532

Questions Cloud

Explain the gaap financial reporting requirements : Explain the GAAP financial reporting requirements and choices for a Canadian privately held corporation. The response paper should be in APA format.

What is business writing : What is business writing? How do you make constructive recommendations that will be accepted and not shunned by the organization's leadership

What was the annual average compound growth rate : Suppose that the NASDAQ Composite index hit a level of 4,684 in February of 2005 In February of 1990 it was at a level of 2,028

Calculate the net income for 2020 under the accrual basis : In its first year of operations, 2020, Whispering Winds Corp. invoiced $189,000 in service revenue. Of that amount, $23,000 was still owing.

Find the maximal frequent sequences : Find all frequent itemsets and their frequency, for itemsets in the subset interval - Find the maximal frequent sequences - how the Apriori algorithm enumerates

What is comet prides 2013 revenue using accrual accounting : What is Comet Pride's 2013 revenue using accrual accounting? Received $4,800 in advance for services to be performed in 2014.

What is the value of inventory listed on the firms balance : Burt's TVs has current liabilities of $24.2 million. Cash makes up 33 percent of the current assets and accounts receivable makes up another.

What is the year-end 2008 balance in retained earnings : TriCycle, Corp. began the year 2008 with $27 million in retained earnings. The firm earned net income of $8.2 million in 2008 and paid $1.12 million.

Prepare martinezs entry to record the exchange : Martinez Company traded a used truck for a new truck. The used truck cost $35,700 and has accumulated depreciation of $32,130.

User Account

All Pages