How Cross Validation tests a model derived from training data

Reference no: EM132242428

Data Mining Assignment

Suppose that the following table of instances (cases) was recorded for an insurance company's promotions for its life assurance product. The attributes are self-explanatory. The values of the two promotion attributes should be read as follows: Yes means that the individual was offered that particular promotion only if they would take out the insurance, and No means that the promotion was not offered.

ID  Income Range  Gender  Age Range  Holiday Promotion  Wine Promotion  Life Insurance Take-up
1   40-50K        Male    30-40      No                 Yes             Yes
2   30-40K        Female  30-40      No                 Yes             No
3   40-50K        Male    30-40      No                 No              No
4   30-40K        Male    30-40      Yes                Yes             Yes
5   50-60K        Female  20-30      No                 No              No
6   20-30K        Female  40-50      No                 No              No
7   30-40K        Male    20-30      Yes                No              No
8   20-30K        Male    20-30      No                 Yes             Yes
9   30-40K        Male    30-40      No                 Yes             Yes
10  30-40K        Female  30-40      No                 No              Yes
11  40-50K        Female  30-40      No                 No              No
12  20-30K        Male    20-30      No                 Yes             Yes
13  50-60K        Female  20-30      No                 No              No
14  40-50K        Male    40-50      No                 Yes             No
15  20-30K        Female  20-30      Yes                Yes             No
16  40-50K        Female  30-40      No                 No              No
17  50-60K        Male    40-50      Yes                Yes             Yes
18  20-30K        Female  30-40      No                 Yes             No
19  20-30K        Male    40-50      Yes                Yes             Yes
20  30-40K        Female  20-30      Yes                Yes             No
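For reference when working through ID3 by hand: the class attribute in the table has 8 Yes and 12 No values, so the entropy of the full training set S, which is the starting point of every information-gain calculation ID3 makes, works out as below. The second line is the standard information-gain definition, where S_v is the subset of S having value v for attribute A.

```latex
H(S) = -\frac{8}{20}\log_2\frac{8}{20} - \frac{12}{20}\log_2\frac{12}{20}
     \approx 0.529 + 0.442 = 0.971 \text{ bits}

\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
```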

Questions

1. Use the ID3 decision tree induction method available in the Weka package (with the default settings) to derive a classifier (decision tree) from this set of data. The class attribute is Life Insurance Take-up.
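The steps ID3 performs can be sketched in Python. This is a minimal illustration under stated assumptions, not Weka's Id3 source: it splits on the attribute with the highest information gain, creates a branch for every value that attribute takes anywhere in the data, stops when no attribute gives a positive gain, and returns None for a branch that no training instance reaches (Weka shows such leaves as null). The names ROWS, DOMAINS, id3 and predict are hypothetical.

```python
import math
from collections import Counter

# The 20 instances from the table, one per line, in column order:
# Income Range, Gender, Age Range, Holiday Promotion, Wine Promotion, Take-up
ROWS = """\
40-50K,Male,30-40,No,Yes,Yes
30-40K,Female,30-40,No,Yes,No
40-50K,Male,30-40,No,No,No
30-40K,Male,30-40,Yes,Yes,Yes
50-60K,Female,20-30,No,No,No
20-30K,Female,40-50,No,No,No
30-40K,Male,20-30,Yes,No,No
20-30K,Male,20-30,No,Yes,Yes
30-40K,Male,30-40,No,Yes,Yes
30-40K,Female,30-40,No,No,Yes
40-50K,Female,30-40,No,No,No
20-30K,Male,20-30,No,Yes,Yes
50-60K,Female,20-30,No,No,No
40-50K,Male,40-50,No,Yes,No
20-30K,Female,20-30,Yes,Yes,No
40-50K,Female,30-40,No,No,No
50-60K,Male,40-50,Yes,Yes,Yes
20-30K,Female,30-40,No,Yes,No
20-30K,Male,40-50,Yes,Yes,Yes
30-40K,Female,20-30,Yes,Yes,No"""

ATTRS = ["Income Range", "Gender", "Age Range", "Holiday Promotion", "Wine Promotion"]
DATA = [dict(zip(ATTRS + ["Take-up"], line.split(","))) for line in ROWS.splitlines()]
# Branch on every value an attribute takes anywhere in the data, so some
# branches may receive no training instances at all.
DOMAINS = {a: sorted({row[a] for row in DATA}) for a in ATTRS}


def entropy(rows):
    """Entropy (in bits) of the Take-up class over a non-empty list of rows."""
    n = len(rows)
    counts = Counter(row["Take-up"] for row in rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def info_gain(rows, attr):
    """Entropy reduction obtained by splitting rows on attr."""
    remainder = 0.0
    for value in DOMAINS[attr]:
        subset = [row for row in rows if row[attr] == value]
        if subset:
            remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder


def id3(rows, attrs):
    """Return a class label, None for an empty branch, or (attr, {value: subtree})."""
    if not rows:
        return None
    majority = Counter(row["Take-up"] for row in rows).most_common(1)[0][0]
    if not attrs:
        return majority
    best = max(attrs, key=lambda a: info_gain(rows, a))
    if info_gain(rows, best) <= 1e-12:  # no attribute helps (e.g. a pure node)
        return majority
    rest = [a for a in attrs if a != best]
    return (best, {v: id3([row for row in rows if row[best] == v], rest)
                   for v in DOMAINS[best]})


def predict(tree, case):
    """Follow the case's attribute values down to a leaf (a label or None)."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches.get(case[attr])
    return tree


tree = id3(DATA, ATTRS)
for attr in ATTRS:
    print(f"{attr}: gain {info_gain(DATA, attr):.3f}")
unseen = {"Income Range": "40-50K", "Gender": "Male", "Age Range": "20-30",
          "Holiday Promotion": "No", "Wine Promotion": "Yes"}
print("root:", tree[0], "| unseen case ->", predict(tree, unseen))
```

In this sketch Gender has the highest gain at the root, and the unseen case of question 2 ends at a branch with no training instances, so predict returns None. Ties between equal gains are broken by attribute order here, which may not match Weka in every data set.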

2. What should be the class value for the following unseen case based on the derived tree? Justify your answer.

Income Range  Gender  Age Range  Holiday Promotion  Wine Promotion  Life Insurance Take-up
40-50K        Male    20-30      No                 Yes             ?

How would you deal with such cases in general? Outline your solution algorithmically using the structure given below:

algorithm DT-based Classification
    # traversing the tree to reach a leaf node N
    if N's class value is null then
        :
        : write your pseudo code to implement your solution here
        :
    else
        return the class value
    end
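One possible way to handle a null leaf, offered only as an illustration and not as the expected answer, is to back off to the training instances stored at the leaf's ancestors and return the majority class of the nearest ancestor whose majority is unambiguous. The sketch assumes every node keeps the class counts of the training instances that reached it; the Node class, strict_majority, classify and the small tree are all hypothetical, and the tree's counts are made up.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    counts: Counter                  # class counts of the training instances at this node
    attr: Optional[str] = None       # attribute tested here; None means a leaf
    children: dict = field(default_factory=dict)


def strict_majority(counts):
    """Return the most frequent class if it is unique, else None."""
    top = counts.most_common(2)
    if not top or (len(top) == 2 and top[0][1] == top[1][1]):
        return None
    return top[0][0]


def classify(root, case):
    path = [root]
    while path[-1].attr is not None:          # descend to a leaf, remembering the path
        node = path[-1]
        # A value with no branch is treated like a null leaf.
        path.append(node.children.get(case[node.attr], Node(Counter())))
    leaf = path[-1]
    if leaf.counts:                           # ordinary leaf: its majority class
        return leaf.counts.most_common(1)[0][0]
    for ancestor in reversed(path[:-1]):      # null leaf: back off up the tree
        label = strict_majority(ancestor.counts)
        if label is not None:
            return label
    return None                               # every ancestor is tied


# Hypothetical tree with made-up counts; Node(Counter()) is a null leaf.
ROOT = Node(Counter(Yes=4, No=5), "Gender", {
    "Female": Node(Counter(No=4)),
    "Male": Node(Counter(Yes=4, No=1), "Income Range", {
        "20-30K": Node(Counter(Yes=3)),
        "40-50K": Node(Counter(Yes=1, No=1), "Age Range", {
            "20-30": Node(Counter()),
            "30-40": Node(Counter(Yes=1)),
            "40-50": Node(Counter(No=1)),
        }),
    }),
})

print(classify(ROOT, {"Gender": "Male", "Income Range": "40-50K", "Age Range": "20-30"}))
```

Here the null leaf's parent is tied 1 to 1, so the sketch moves one more level up and returns Yes from the Male node. Returning the majority class of the whole training set is a simpler alternative; backing off to the nearest ancestor keeps more of the context the case has already matched.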

3. A decision tree derived from data can be used not only to predict class values for unseen cases, but also to summarize data for analysis. Based on the tree derived in 1), comment on whether the company has conducted its promotion effectively.
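As a complement to reading the tree, a simple cross-tabulation of take-up against each promotion gives the raw frequencies behind any judgement about the promotions. This is a minimal sketch: RECORDS holds only the two promotion columns and the class column of the table, and take_up_rates is a hypothetical helper name.

```python
from collections import defaultdict

# (Holiday Promotion, Wine Promotion, Life Insurance Take-up) for IDs 1-20, in order
RECORDS = [
    ("No", "Yes", "Yes"), ("No", "Yes", "No"), ("No", "No", "No"), ("Yes", "Yes", "Yes"),
    ("No", "No", "No"), ("No", "No", "No"), ("Yes", "No", "No"), ("No", "Yes", "Yes"),
    ("No", "Yes", "Yes"), ("No", "No", "Yes"), ("No", "No", "No"), ("No", "Yes", "Yes"),
    ("No", "No", "No"), ("No", "Yes", "No"), ("Yes", "Yes", "No"), ("No", "No", "No"),
    ("Yes", "Yes", "Yes"), ("No", "Yes", "No"), ("Yes", "Yes", "Yes"), ("Yes", "Yes", "No"),
]


def take_up_rates(records, column):
    """Map each value of a promotion column (0 = Holiday, 1 = Wine) to (take-ups, total)."""
    tally = defaultdict(lambda: [0, 0])
    for rec in records:
        counts = tally[rec[column]]
        counts[0] += rec[2] == "Yes"
        counts[1] += 1
    return {value: (yes, total) for value, (yes, total) in tally.items()}


for name, column in (("Holiday Promotion", 0), ("Wine Promotion", 1)):
    for value, (yes, total) in sorted(take_up_rates(RECORDS, column).items()):
        print(f"{name} = {value}: {yes}/{total} took up the insurance")
```

From the table, 7 of the 12 people offered the wine promotion took up the insurance versus 1 of the 8 who were not, while for the holiday promotion the figures are 3 of 6 versus 5 of 14. With only 20 instances these are associations, not measured effects of the promotions.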

4. In Weka's default settings, the test options include "Cross-Validation Folds 10". Briefly explain how Cross Validation tests a model derived from training data and why we use it for testing.
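The k-fold procedure can be sketched as follows: the data are split into k disjoint folds, and each fold in turn is held out for testing while a model is trained on the remaining k - 1 folds, so every instance is predicted exactly once by a model that never saw it. This sketch assumes round-robin folds without shuffling, whereas Weka randomizes and (for a nominal class) stratifies its folds. The majority-class learner is a stand-in for ID3, similar in spirit to Weka's ZeroR baseline, and the names folds, cross_validate, fit_majority and predict_majority are hypothetical.

```python
from collections import Counter


def folds(n, k):
    """Split indices 0..n-1 into k disjoint test folds (round-robin, no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_validate(xs, ys, k, fit, predict):
    """Pooled k-fold accuracy: every instance is predicted exactly once, by a
    model trained without it."""
    correct = 0
    for test_idx in folds(len(xs), k):
        held_out = set(test_idx)
        train = [(xs[i], ys[i]) for i in range(len(xs)) if i not in held_out]
        model = fit(train)
        correct += sum(predict(model, xs[i]) == ys[i] for i in test_idx)
    return correct / len(xs)


# Stand-in learner: always predict the majority class of the training split.
def fit_majority(train):
    return Counter(y for _, y in train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


# Life Insurance Take-up for IDs 1-20; the instance IDs serve as placeholder features.
TAKE_UP = ["Yes", "No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "Yes",
           "No", "Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "No"]
print(cross_validate(list(range(20)), TAKE_UP, 10, fit_majority, predict_majority))
```

With k = 10 every training split has No as its majority class, so this baseline scores 12/20 = 0.6; any learner worth using should beat it under the same folds.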

5. Now perform the following tests. Vary the number of folds from 2 to 10, run ID3, and observe the classification accuracy for each setting. Then change the test option to "Use training set", run ID3 again, and observe the classification accuracy. Record and present these results as a table or a bar chart. Comment on your results: which method (cross validation or the training set) is better for testing your derived tree, and why?
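The shape of the question 5 experiment can be mimicked outside Weka with a compact ID3 and pooled k-fold accuracy. This is an illustrative sketch, not Weka: folds are taken round-robin without shuffling or stratification, a None ("null") prediction is counted as an error, and the helper names are hypothetical, so the cross-validation accuracies will generally differ from Weka's.

```python
import math
from collections import Counter

# Table rows: Income Range, Gender, Age Range, Holiday, Wine, Take-up (class last)
TABLE = """\
40-50K Male 30-40 No Yes Yes
30-40K Female 30-40 No Yes No
40-50K Male 30-40 No No No
30-40K Male 30-40 Yes Yes Yes
50-60K Female 20-30 No No No
20-30K Female 40-50 No No No
30-40K Male 20-30 Yes No No
20-30K Male 20-30 No Yes Yes
30-40K Male 30-40 No Yes Yes
30-40K Female 30-40 No No Yes
40-50K Female 30-40 No No No
20-30K Male 20-30 No Yes Yes
50-60K Female 20-30 No No No
40-50K Male 40-50 No Yes No
20-30K Female 20-30 Yes Yes No
40-50K Female 30-40 No No No
50-60K Male 40-50 Yes Yes Yes
20-30K Female 30-40 No Yes No
20-30K Male 40-50 Yes Yes Yes
30-40K Female 20-30 Yes Yes No"""

DATA = [tuple(line.split()) for line in TABLE.splitlines()]
CLASS = 5
ATTRS = list(range(CLASS))
DOMAINS = [sorted({row[a] for row in DATA}) for a in ATTRS]


def entropy(rows):
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in Counter(r[CLASS] for r in rows).values())


def gain(rows, a):
    parts = [[r for r in rows if r[a] == v] for v in DOMAINS[a]]
    return entropy(rows) - sum(len(p) / len(rows) * entropy(p) for p in parts if p)


def id3(rows, attrs):
    if not rows:
        return None  # empty branch: a "null" leaf
    majority = Counter(r[CLASS] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a), default=None)
    if best is None or gain(rows, best) <= 1e-12:
        return majority
    rest = [a for a in attrs if a != best]
    return (best, {v: id3([r for r in rows if r[best] == v], rest) for v in DOMAINS[best]})


def n_correct(tree, rows):
    """Count correct predictions; a None ("null") prediction counts as wrong."""
    correct = 0
    for r in rows:
        node = tree
        while isinstance(node, tuple):
            node = node[1].get(r[node[0]])
        correct += node == r[CLASS]
    return correct


def cv_accuracy(rows, k):
    """Pooled k-fold accuracy with round-robin folds (no shuffling, no stratification)."""
    correct = 0
    for i in range(k):
        train = [r for j, r in enumerate(rows) if j % k != i]
        correct += n_correct(id3(train, ATTRS), rows[i::k])
    return correct / len(rows)


print(f"Use training set: {n_correct(id3(DATA, ATTRS), DATA) / len(DATA):.2f}")
for k in range(2, 11):
    print(f"{k:2d}-fold cross validation: {cv_accuracy(DATA, k):.2f}")
```

The table contains no two identical cases with different take-up values, and the sketch's fully grown tree classifies all 20 training instances correctly, so its "Use training set" figure is 1.00. That number only shows the tree fits the data it was built from; the cross-validation figures estimate how it does on cases it has not seen.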

6. Use the JRip rule induction method available in the Weka package (with the default setting) to derive a classifier (classification rules) from this set of data.

7. What observations do you have on the two classifiers you have obtained in terms of using them for business analysis (as in 3) and for classification of an unseen case (as in 2)?
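When comparing the two classifiers, it may help to note that a decision tree converts directly into IF-THEN rules, one per root-to-leaf path, and these rules are mutually exclusive; JRip instead produces an ordered rule list that ends with a default rule. The sketch below assumes a tree written as (attribute, {value: subtree}) with class values at the leaves; tree_to_rules and the small tree are hypothetical and not derived from the table.

```python
def tree_to_rules(tree, conditions=()):
    """Yield (conditions, class value) pairs, one per root-to-leaf path."""
    if not isinstance(tree, tuple):
        yield list(conditions), tree
        return
    attr, branches = tree
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, conditions + (f"{attr} = {value}",))


# Hypothetical two-level tree, for illustration only.
TREE = ("Gender", {
    "Female": "No",
    "Male": ("Wine Promotion", {"Yes": "Yes", "No": "No"}),
})

for conds, label in tree_to_rules(TREE):
    print("IF " + " AND ".join(conds) + f" THEN Life Insurance Take-up = {label}")
```

Because each tree rule repeats every test on its path, tree-derived rule sets tend to be longer than a rule list that relies on ordering and a default rule, which is one point worth weighing when using either classifier for business analysis.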

Attachment: Assignment Files.rar


Criteria for assessment (posted by len2242428 on 2/25/2019 9:37:09 PM). Credit will be awarded against the following criteria:

- The classifier derived using ID3 for Q1 [5 marks]
- Convincing arguments and solution for Q2 [25 marks]
- Valid analysis for Q3 [15 marks]
- Clarity of explanation for Q4 [20 marks]
- Experiment results and analysis for Q5 [10 marks]
- The classifier derived using JRip for Q6 [5 marks]
- Clarity of your observations for Q7 [20 marks]


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd