Assignment 1: Decision tree learning for cancer diagnosis

In this mini-project, you will implement a decision tree algorithm and apply it to breast cancer diagnosis. For each patient, an image of a fine needle aspirate (FNA) of a breast mass was taken, and nine features in the image that are potentially correlated with breast cancer were extracted. Your task is to develop a decision tree algorithm, learn from the data, and predict whether new patients have breast cancer. The dataset can be downloaded from the UC Irvine Machine Learning Repository.

1. Collect the data set from my website. Each patient is represented by one line, with columns separated by commas: the first column is the identifier number, the last is the class (benign or malignant), and the rest are attribute values, which are integers ranging from 1 to 10. The attributes are (in case you are curious): Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses. (Note that the UCI document page specifies a different number of attributes, because it refers to a set of several related datasets. For detailed information on the dataset that we use here, see this document.)

2. Implement the ID3 decision tree learner, as described in Chapter 3 of Mitchell. You may program in C/C++ or Java. Your program should assume input in the above format.

3. Implement both misclassification impurity and information gain as evaluation criteria. Also implement split stopping using the chi-square test.

4. Randomly divide the data set into training (80%) and testing (20%) sets. Use your algorithm to train a decision tree classifier and report its accuracy on the test set. Run the same experiment 100 times, then calculate the average test performance (accuracy, precision, recall, F-measure, G-mean).

5. Compare performance across the evaluation criteria. Make a table as follows:

Evaluation Criteria        | Accuracy | Precision | Recall | F-measure | G-mean
misclassification impurity |          |           |        |           |
information gain           |          |           |        |           |

6. Answer the following:

a. Which evaluation criterion and which confidence level (for the chi-square test) work well? Why?

b. Do you see evidence of overfitting in some experiments? Explain.
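
For step 1, the input format (identifier, nine integer attributes, class) could be parsed roughly as follows. This is a minimal sketch, not a reference solution: the class name `PatientRecord`, the `MISSING` marker, and the sample line in the usage note are all hypothetical. The assignment does not say how the class column is encoded or whether attribute values can be missing, so the sketch keeps the class as raw text and maps any non-integer attribute to `MISSING`.

```java
// Hypothetical record type for one patient line; names are illustrative only.
final class PatientRecord {
    static final int NUM_ATTRIBUTES = 9;
    // Assumption: any attribute that is not an integer is treated as missing.
    static final int MISSING = -1;

    final String id;          // first column: identifier number
    final int[] attributes;   // nine attribute values, expected in 1..10
    final String label;       // last column: class, kept as raw text

    PatientRecord(String id, int[] attributes, String label) {
        this.id = id;
        this.attributes = attributes;
        this.label = label;
    }

    /** Parses one comma-separated line: id, nine attributes, class. */
    static PatientRecord parse(String line) {
        String[] cols = line.trim().split(",");
        if (cols.length != NUM_ATTRIBUTES + 2) {
            throw new IllegalArgumentException(
                "expected " + (NUM_ATTRIBUTES + 2) + " columns, got " + cols.length);
        }
        int[] attrs = new int[NUM_ATTRIBUTES];
        for (int i = 0; i < NUM_ATTRIBUTES; i++) {
            try {
                attrs[i] = Integer.parseInt(cols[i + 1].trim());
            } catch (NumberFormatException e) {
                attrs[i] = MISSING;
            }
        }
        return new PatientRecord(cols[0].trim(), attrs, cols[cols.length - 1].trim());
    }
}
```

For a made-up line such as `12345,5,1,1,1,2,?,3,1,1,benign`, `PatientRecord.parse` would yield the identifier `12345`, the label `benign`, and `MISSING` in the sixth attribute slot.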

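The two evaluation criteria in step 3 can both be written as an impurity reduction: the impurity of the parent node minus the size-weighted impurity of its children. With entropy as the impurity this is information gain. The sketch below assumes class counts are passed as plain `int` arrays; the class name `Impurity` and the `useEntropy` flag are hypothetical.

```java
// Hypothetical helper class; names and signatures are illustrative only.
final class Impurity {
    /** Entropy in bits of a node with the given per-class counts. */
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    /** Misclassification impurity: 1 minus the majority-class fraction. */
    static double misclassification(int[] counts) {
        int total = 0, max = 0;
        for (int c : counts) {
            total += c;
            max = Math.max(max, c);
        }
        return total == 0 ? 0.0 : 1.0 - (double) max / total;
    }

    static double impurity(int[] counts, boolean useEntropy) {
        return useEntropy ? entropy(counts) : misclassification(counts);
    }

    /**
     * Impurity reduction of a split: impurity(parent) minus the weighted
     * impurity of the children. children[v][k] is the count of class k
     * among examples with attribute value v.
     */
    static double gain(int[] parent, int[][] children, boolean useEntropy) {
        int n = 0;
        for (int c : parent) n += c;
        if (n == 0) return 0.0;
        double remainder = 0.0;
        for (int[] child : children) {
            int nv = 0;
            for (int c : child) nv += c;
            remainder += (double) nv / n * impurity(child, useEntropy);
        }
        return impurity(parent, useEntropy) - remainder;
    }
}
```

The two criteria can disagree. For a parent with counts {8, 2} split into children {4, 0} and {4, 2}, the misclassification-based gain is zero while the information gain is positive, which is one reason the comparison in step 5 can show different trees.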
 
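For the chi-square split stopping in step 3, one common formulation compares the observed class counts in each child with the counts expected if the attribute were independent of the class, and keeps the split only when the statistic exceeds a critical value. The sketch below is one such formulation under stated assumptions: the class name `ChiSquare` is hypothetical, the confidence level is fixed at 95%, empty children and empty classes are excluded from the degrees of freedom, and the critical values are hardcoded for 1 to 9 degrees of freedom. Other confidence levels would need their own table.

```java
// Hypothetical helper class; a sketch of chi-square split stopping.
final class ChiSquare {
    // Upper 5% critical values of the chi-square distribution, df = 1..9.
    static final double[] CRITICAL_95 = {
        3.841, 5.991, 7.815, 9.488, 11.070, 12.592, 14.067, 15.507, 16.919
    };

    /** observed[v][k] = count of class k among examples with attribute value v. */
    static double statistic(int[][] observed) {
        int numClasses = observed[0].length;
        int[] childTotals = new int[observed.length];
        int[] classTotals = new int[numClasses];
        int n = 0;
        for (int v = 0; v < observed.length; v++) {
            for (int k = 0; k < numClasses; k++) {
                childTotals[v] += observed[v][k];
                classTotals[k] += observed[v][k];
                n += observed[v][k];
            }
        }
        double chi2 = 0.0;
        for (int v = 0; v < observed.length; v++) {
            for (int k = 0; k < numClasses; k++) {
                double expected = (double) childTotals[v] * classTotals[k] / n;
                if (expected > 0) {
                    double d = observed[v][k] - expected;
                    chi2 += d * d / expected;
                }
            }
        }
        return chi2;
    }

    /** (non-empty children - 1) * (non-empty classes - 1). */
    static int degreesOfFreedom(int[][] observed) {
        int numClasses = observed[0].length;
        int rows = 0;
        boolean[] classSeen = new boolean[numClasses];
        for (int[] child : observed) {
            boolean any = false;
            for (int k = 0; k < numClasses; k++) {
                if (child[k] > 0) { any = true; classSeen[k] = true; }
            }
            if (any) rows++;
        }
        int cols = 0;
        for (boolean seen : classSeen) if (seen) cols++;
        return (rows - 1) * (cols - 1);
    }

    /** True if the split is significant at the 95% level (assumed level). */
    static boolean splitIsSignificant(int[][] observed) {
        int df = degreesOfFreedom(observed);
        if (df < 1) return false;
        if (df > CRITICAL_95.length) {
            throw new IllegalArgumentException("no critical value for df=" + df);
        }
        return statistic(observed) > CRITICAL_95[df - 1];
    }
}
```

With two classes and attribute values in 1 to 10, the degrees of freedom never exceed 9, so the hardcoded table covers this data set. A learner would call `splitIsSignificant` on the best candidate split and turn the node into a leaf when it returns false.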

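The random 80/20 division in step 4 can be done by shuffling a copy of the data and cutting it at the 80% mark. The sketch below assumes the data is held in a `List`; the class name `TrainTestSplit` and the `seed` parameter are hypothetical. Passing a different seed for each of the 100 runs gives 100 different but reproducible splits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper class; a sketch of a seeded random train/test split.
final class TrainTestSplit {
    /**
     * Returns a two-element list: index 0 is the training set,
     * index 1 is the test set. The input list is not modified.
     */
    static <T> List<List<T>> split(List<T> data, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(trainFraction * shuffled.size());
        List<T> train = new ArrayList<>(shuffled.subList(0, cut));
        List<T> test = new ArrayList<>(shuffled.subList(cut, shuffled.size()));
        return Arrays.asList(train, test);
    }
}
```

A run loop would then call `TrainTestSplit.split(records, 0.8, run)` for `run` from 0 to 99, train on the first list, and evaluate on the second.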

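The five measures in step 4 can all be computed from the four confusion-matrix counts of a single test run. The sketch below makes two assumptions that the assignment does not state: malignant is treated as the positive class, and G-mean is taken as the geometric mean of recall and specificity (some texts define it as the geometric mean of precision and recall instead). The class name `BinaryMetrics` is hypothetical, and a ratio with a zero denominator is reported as 0.

```java
// Hypothetical helper class; a sketch of the metrics for one test run.
final class BinaryMetrics {
    final int tp, fp, tn, fn; // true/false positives and negatives

    BinaryMetrics(int tp, int fp, int tn, int fn) {
        this.tp = tp;
        this.fp = fp;
        this.tn = tn;
        this.fn = fn;
    }

    // Assumption: an undefined ratio (zero denominator) is reported as 0.
    private static double ratio(double num, double den) {
        return den == 0 ? 0.0 : num / den;
    }

    double accuracy()    { return ratio(tp + tn, tp + fp + tn + fn); }
    double precision()   { return ratio(tp, tp + fp); }
    double recall()      { return ratio(tp, tp + fn); }
    double specificity() { return ratio(tn, tn + fp); }

    /** Harmonic mean of precision and recall. */
    double fMeasure() {
        double p = precision(), r = recall();
        return ratio(2 * p * r, p + r);
    }

    /** Assumed definition: sqrt(recall * specificity). */
    double gMean() {
        return Math.sqrt(recall() * specificity());
    }
}
```

Averaging each of these values over the 100 runs gives the entries for one row of the comparison table.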