Decision tree learning for cancer diagnosis, Computer Engineering

Assignment 1: Decision tree learning for cancer diagnosis

In this mini-project, you will implement a decision-tree algorithm and apply it to breast cancer diagnosis. For each patient, an image of a fine needle aspirate (FNA) of a breast mass was taken, and nine features in the image potentially correlated with breast cancer were extracted. Your task is to develop a decision tree algorithm, learn a classifier from the data, and predict whether new patients have breast cancer. The dataset can be downloaded from the U.C. Irvine (UCI) Machine Learning Repository.

1.       Collect the data set from my website. Each patient is represented by one line, with columns separated by commas: the first column is the identifier number, the last is the class (benign or malignant), and the rest are attribute values, which are integers ranging from 1 to 10. The attributes are (in case you are curious): Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses. (Note that the UCI documentation page specifies a different number of attributes because it refers to a set of several related datasets. For detailed information on the dataset that we use here, see this document.)

2.       Implement the ID3 decision tree learner, as described in Chapter 3 of Mitchell. You may program in C/C++ or Java. Your program should assume input in the above format.

3.       Implement both misclassification impurity and information gain as evaluation criteria. Also implement split stopping using the chi-square test.

4.       Divide the data set randomly into training (80%) and testing (20%) sets. Use your algorithm to train a decision tree classifier and report its accuracy on the test set. Run the same experiment 100 times, then calculate the average test performance (accuracy, precision, recall, F-measure, G-mean).

5.       Compare performances by varying the evaluation criteria. Make a table as follows:

Evaluation Criteria          | Accuracy | Precision | Recall | F-measure | G-mean
misclassification impurity   |          |           |        |           |
information gain             |          |           |        |           |

6.       Answer the following:

a.       Which evaluation criterion and confidence level work well? Why?

b.       Do you see evidence of overfitting in some experiments? Explain.
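
The line format described in step 1 could be read along the following lines. This is only a minimal sketch: the class name PatientRecord and its parse method are hypothetical, and the class column is kept as raw text because the assignment does not say how benign and malignant are encoded in the file.

```java
// Hypothetical holder for one line of the data file (all names are illustrative).
class PatientRecord {
    final String id;          // first column: identifier number
    final int[] attributes;   // middle columns: integer attribute values (1 to 10)
    final String label;       // last column: class, kept as raw text

    PatientRecord(String id, int[] attributes, String label) {
        this.id = id;
        this.attributes = attributes;
        this.label = label;
    }

    // Parses one comma-separated line: id, attribute values..., class.
    static PatientRecord parse(String line) {
        String[] cols = line.trim().split(",");
        if (cols.length < 3) {
            throw new IllegalArgumentException("Expected id, attributes, class: " + line);
        }
        int[] attrs = new int[cols.length - 2];
        for (int i = 0; i < attrs.length; i++) {
            // Assumption: every attribute is a plain integer. Any other token
            // (for example a missing-value marker) makes parseInt throw
            // NumberFormatException, so such lines need separate handling.
            attrs[i] = Integer.parseInt(cols[i + 1].trim());
        }
        return new PatientRecord(cols[0].trim(), attrs, cols[cols.length - 1].trim());
    }
}
```

For a made-up line such as "1001,5,1,1,1,2,1,3,1,1,benign", parse would yield nine attribute values, with 5 as the first and 1 as the last.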

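The ID3 learner requested in step 2 might be organised roughly as follows. This is a compact sketch under stated assumptions, not Mitchell's pseudocode verbatim: each row is an int[] holding the attribute values followed by a class label already mapped to 0 or 1, each node splits into one branch per observed attribute value, and growth stops when no attribute gives a positive information gain. The names Id3Sketch, Node, build and predict are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative ID3 sketch. Row layout (an assumption): attributes..., label (0 or 1).
class Id3Sketch {
    static class Node {
        int attribute = -1;                             // attribute tested here; -1 marks a leaf
        int label;                                      // majority class of the rows at this node
        Map<Integer, Node> children = new HashMap<>();  // one child per observed attribute value
    }

    // Entropy (in bits) of the binary labels stored in the last column.
    static double entropy(List<int[]> rows) {
        if (rows.isEmpty()) return 0.0;
        int pos = 0;
        for (int[] r : rows) pos += r[r.length - 1];
        double p = (double) pos / rows.size();
        if (p == 0.0 || p == 1.0) return 0.0;
        return -(p * Math.log(p) + (1 - p) * Math.log(1 - p)) / Math.log(2);
    }

    // Groups rows by their value of the given attribute.
    static Map<Integer, List<int[]>> partition(List<int[]> rows, int attr) {
        Map<Integer, List<int[]>> parts = new HashMap<>();
        for (int[] r : rows) parts.computeIfAbsent(r[attr], k -> new ArrayList<>()).add(r);
        return parts;
    }

    // Information gain: parent entropy minus size-weighted child entropy.
    static double gain(List<int[]> rows, int attr) {
        double remainder = 0.0;
        for (List<int[]> part : partition(rows, attr).values()) {
            remainder += (double) part.size() / rows.size() * entropy(part);
        }
        return entropy(rows) - remainder;
    }

    // Recursively grows the tree from non-empty rows using the still-available attributes.
    static Node build(List<int[]> rows, List<Integer> available) {
        Node node = new Node();
        int pos = 0;
        for (int[] r : rows) pos += r[r.length - 1];
        node.label = (2 * pos >= rows.size()) ? 1 : 0;
        // Leaf if the node is pure or no attributes remain.
        if (pos == 0 || pos == rows.size() || available.isEmpty()) return node;
        int best = -1;
        double bestGain = 0.0;
        for (int a : available) {
            double g = gain(rows, a);
            if (g > bestGain) { bestGain = g; best = a; }
        }
        if (best < 0) return node;  // sketch-specific choice: no positive gain, stop here
        node.attribute = best;
        List<Integer> rest = new ArrayList<>(available);
        rest.remove(Integer.valueOf(best));
        for (Map.Entry<Integer, List<int[]>> e : partition(rows, best).entrySet()) {
            node.children.put(e.getKey(), build(e.getValue(), rest));
        }
        return node;
    }

    // Follows the tree; falls back to the node's majority class for unseen values.
    static int predict(Node node, int[] attributes) {
        while (node.attribute >= 0) {
            Node child = node.children.get(attributes[node.attribute]);
            if (child == null) return node.label;
            node = child;
        }
        return node.label;
    }
}
```

The chi-square stopping rule and the alternative impurity measure from step 3 are deliberately left out here; they would slot in where best is chosen inside build.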
 

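The two evaluation criteria and the chi-square statistic from step 3 can be computed from per-branch class counts, as sketched below. The class SplitCriteria and its method names are hypothetical, and the sketch assumes two classes, with counts[v][c] holding the number of training examples in branch v that belong to class c. The statistic is the standard Pearson chi-square comparing observed counts with those expected if the split were independent of the class.

```java
// Illustrative split criteria for a two-class problem (all names are hypothetical).
// counts[v][c] = number of training examples in branch v with class c (c is 0 or 1).
class SplitCriteria {
    // Misclassification impurity of one node: 1 minus the majority-class proportion.
    static double misclassification(int[] c) {
        int n = c[0] + c[1];
        return n == 0 ? 0.0 : 1.0 - (double) Math.max(c[0], c[1]) / n;
    }

    // Entropy of one node in bits.
    static double entropy(int[] c) {
        int n = c[0] + c[1];
        double h = 0.0;
        for (int k : c) {
            if (k > 0) {
                double p = (double) k / n;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Sums the class counts over all branches, giving the parent node's counts.
    static int[] totals(int[][] counts) {
        int[] total = new int[2];
        for (int[] branch : counts) {
            total[0] += branch[0];
            total[1] += branch[1];
        }
        return total;
    }

    // Parent impurity minus size-weighted child impurity.
    // With useEntropy = true this is the information gain.
    static double impurityDrop(int[][] counts, boolean useEntropy) {
        int[] total = totals(counts);
        int n = total[0] + total[1];
        if (n == 0) return 0.0;
        double drop = useEntropy ? entropy(total) : misclassification(total);
        for (int[] branch : counts) {
            double weight = (double) (branch[0] + branch[1]) / n;
            drop -= weight * (useEntropy ? entropy(branch) : misclassification(branch));
        }
        return drop;
    }

    // Pearson chi-square statistic: sum of (observed - expected)^2 / expected,
    // where expected = branch size * overall class proportion.
    static double chiSquare(int[][] counts) {
        int[] total = totals(counts);
        int n = total[0] + total[1];
        double stat = 0.0;
        for (int[] branch : counts) {
            int size = branch[0] + branch[1];
            for (int c = 0; c < 2; c++) {
                double expected = n == 0 ? 0.0 : (double) size * total[c] / n;
                if (expected > 0) {
                    double diff = branch[c] - expected;
                    stat += diff * diff / expected;
                }
            }
        }
        return stat;
    }
}
```

The statistic has (branches - 1) x (classes - 1) degrees of freedom. To use it as a stopping rule, compare it with the critical value for the chosen confidence level, taken from a table or a statistics library. For instance, with two branches and two classes there is one degree of freedom, and the 95% critical value is 3.841, so a split would be kept only if chiSquare(counts) exceeds that threshold.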

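The random 80/20 split and the five performance measures in step 4 could be computed as in this sketch. Several choices here are assumptions, not part of the assignment: the names Evaluation, split and metrics are hypothetical, the positive class is taken to be malignant, and G-mean is taken as the square root of recall times specificity (some texts instead use the square root of precision times recall, so check which definition the course intends).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative evaluation helpers (all names are hypothetical).
class Evaluation {
    // Shuffles a copy of data and returns [training, testing],
    // with trainFraction of the examples going to training.
    static <T> List<List<T>> split(List<T> data, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(trainFraction * shuffled.size());
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    // Safe division that returns 0 when the denominator is 0.
    static double ratio(int num, int den) {
        return den == 0 ? 0.0 : (double) num / den;
    }

    // Returns {accuracy, precision, recall, F-measure, G-mean} from confusion counts:
    // tp/fn = positives predicted correctly/incorrectly, tn/fp = likewise for negatives.
    static double[] metrics(int tp, int fp, int tn, int fn) {
        double accuracy = ratio(tp + tn, tp + fp + tn + fn);
        double precision = ratio(tp, tp + fp);
        double recall = ratio(tp, tp + fn);
        double specificity = ratio(tn, tn + fp);
        double fMeasure = (precision + recall == 0.0)
                ? 0.0
                : 2 * precision * recall / (precision + recall);
        // Assumed definition: geometric mean of recall and specificity.
        double gMean = Math.sqrt(recall * specificity);
        return new double[] {accuracy, precision, recall, fMeasure, gMean};
    }
}
```

For the 100 repetitions, one option is to call split with a different seed on each run, accumulate the five values returned by metrics, and divide each sum by 100 at the end.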