Decision tree learning for cancer diagnosis, Computer Engineering

Assignment 1: Decision tree learning for cancer diagnosis

In this mini-project, you will implement a decision-tree algorithm and apply it to breast cancer diagnosis. For each patient, an image of a fine needle aspirate (FNA) of a breast mass was taken, and nine features in the image potentially correlated with breast cancer were extracted. Your task is to develop a decision tree algorithm, learn from data, and predict for new patients whether they have breast cancer. The dataset can be downloaded from the UC Irvine Machine Learning Repository.

1.       Collect the data set from my website. Each patient is represented by one line, with columns separated by commas: the first column is the identifier number, the last is the class (benign or malignant), and the rest are attribute values, which are integers ranging from 1 to 10. The attributes are (in case you are curious): Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses. (Note that the UCI document page specifies a different number of attributes, because it refers to a set of several related datasets. For detailed information on the dataset that we use here, see this document.)
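A minimal sketch of reading one line in the format above. The class name `Patient`, the `MISSING` sentinel, and the accepted label spellings are illustrative assumptions, not part of the assignment: the code accepts either the words benign/malignant or the numeric codes 2/4, and treats "?" as an unknown attribute value, which is how copies of this dataset are commonly distributed. Adjust it if the course copy differs.

```java
/** One patient row: identifier, nine integer attributes, and the class. */
final class Patient {
    // Assumed sentinel for an unknown attribute value (written "?" in some copies).
    static final int MISSING = -1;

    final String id;
    final int[] attributes;   // nine values, each 1..10 or MISSING
    final boolean malignant;

    Patient(String id, int[] attributes, boolean malignant) {
        this.id = id;
        this.attributes = attributes;
        this.malignant = malignant;
    }

    /** Parses a line of the form "id,a1,...,a9,class". */
    static Patient parse(String line) {
        String[] f = line.trim().split(",");
        if (f.length != 11) {
            throw new IllegalArgumentException("expected 11 columns, got " + f.length);
        }
        int[] attrs = new int[9];
        for (int i = 0; i < 9; i++) {
            String v = f[i + 1].trim();
            attrs[i] = v.equals("?") ? MISSING : Integer.parseInt(v);
        }
        // Assumption: the class is spelled out, or coded 2 = benign, 4 = malignant.
        String label = f[10].trim().toLowerCase();
        boolean malignant;
        if (label.equals("4") || label.equals("malignant")) {
            malignant = true;
        } else if (label.equals("2") || label.equals("benign")) {
            malignant = false;
        } else {
            throw new IllegalArgumentException("unknown class label: " + label);
        }
        return new Patient(f[0].trim(), attrs, malignant);
    }
}
```

Calling `Patient.parse` on each line of the file yields the list of examples that the learner consumes. How to treat `MISSING` values (drop the row, or substitute the most common value) is a design choice left open here.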

2.       Implement the ID3 decision tree learner, as described in Chapter 3 of Mitchell. You may program in C/C++ or Java. Your program should assume input in the above format.
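The recursion at the heart of ID3 can be sketched as follows. This is a simplified illustration under stated assumptions, not a reference solution: each example is an `int[]` holding the attribute values followed by a 0/1 class label in the last position, attributes are treated as categorical with one branch per observed value, and the names `Id3Sketch`, `Node`, `build`, and `predict` are made up for this example. It uses information gain only and omits chi-square stopping and missing-value handling.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Id3Sketch {
    /** Tree node: a leaf when attribute < 0, otherwise a multiway split. */
    static final class Node {
        int attribute = -1;
        int label;                                   // majority class at this node
        Map<Integer, Node> children = new HashMap<>();
    }

    /** Entropy in bits of a two-class sample with pos positives out of n. */
    static double entropy(int pos, int n) {
        if (n == 0 || pos == 0 || pos == n) return 0.0;
        double p = (double) pos / n, q = 1.0 - p;
        return -(p * Math.log(p) + q * Math.log(q)) / Math.log(2);
    }

    /** Number of rows whose label (last column) is 1. */
    static int positives(List<int[]> rows) {
        int c = 0;
        for (int[] r : rows) c += r[r.length - 1];
        return c;
    }

    /** Groups rows by their value of the given attribute. */
    static Map<Integer, List<int[]>> partition(List<int[]> rows, int attr) {
        Map<Integer, List<int[]>> parts = new HashMap<>();
        for (int[] r : rows) parts.computeIfAbsent(r[attr], k -> new ArrayList<>()).add(r);
        return parts;
    }

    /** Information gain: parent entropy minus weighted entropy of the branches. */
    static double gain(List<int[]> rows, int attr) {
        double remainder = 0.0;
        for (List<int[]> part : partition(rows, attr).values()) {
            remainder += (double) part.size() / rows.size()
                    * entropy(positives(part), part.size());
        }
        return entropy(positives(rows), rows.size()) - remainder;
    }

    /** Grows the tree until nodes are pure or no attributes remain. */
    static Node build(List<int[]> rows, List<Integer> attrs) {
        Node node = new Node();
        int pos = positives(rows);
        node.label = 2 * pos >= rows.size() ? 1 : 0;
        if (pos == 0 || pos == rows.size() || attrs.isEmpty()) return node; // leaf
        int best = attrs.get(0);
        for (int a : attrs) if (gain(rows, a) > gain(rows, best)) best = a;
        node.attribute = best;
        List<Integer> rest = new ArrayList<>(attrs);
        rest.remove(Integer.valueOf(best));
        for (Map.Entry<Integer, List<int[]>> e : partition(rows, best).entrySet()) {
            node.children.put(e.getKey(), build(e.getValue(), rest));
        }
        return node;
    }

    /** Follows the branches; an unseen attribute value falls back to the node's majority class. */
    static int predict(Node node, int[] row) {
        while (node.attribute >= 0) {
            Node child = node.children.get(row[node.attribute]);
            if (child == null) return node.label;
            node = child;
        }
        return node.label;
    }
}
```

For this dataset, `attrs` would hold the indices 0 through 8. Falling back to the majority class for attribute values never seen during training is one possible choice among several.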

3.       Implement both misclassification impurity and information gain as evaluation criteria. Also implement split stopping using the chi-square test.
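As a rough illustration of the two pieces that differ from entropy-based information gain, the sketch below computes the reduction in misclassification impurity for a candidate split and the Pearson chi-square statistic used to decide whether to stop splitting. The representation is an assumption made for this example: `counts[v]` holds `{negatives, positives}` for branch `v`, and the class and method names are hypothetical. The critical value is passed in by the caller and would come from a chi-square table for the chosen confidence level and degrees of freedom (for instance, 3.841 for one degree of freedom at 95%).

```java
final class SplitCriteria {
    /** Misclassification impurity of a two-class node: 1 - max(p, 1 - p). */
    static double misclassification(int pos, int n) {
        if (n == 0) return 0.0;
        double p = (double) pos / n;
        return 1.0 - Math.max(p, 1.0 - p);
    }

    /** Parent impurity minus the size-weighted impurity of the branches. */
    static double impurityReduction(int[][] counts) {
        int n = 0, pos = 0;
        for (int[] c : counts) { n += c[0] + c[1]; pos += c[1]; }
        if (n == 0) return 0.0;
        double after = 0.0;
        for (int[] c : counts) {
            int m = c[0] + c[1];
            after += (double) m / n * misclassification(c[1], m);
        }
        return misclassification(pos, n) - after;
    }

    /** Pearson chi-square statistic against "attribute is independent of class". */
    static double chiSquare(int[][] counts) {
        int n = 0, pos = 0;
        for (int[] c : counts) { n += c[0] + c[1]; pos += c[1]; }
        if (n == 0) return 0.0;
        int neg = n - pos;
        double stat = 0.0;
        for (int[] c : counts) {
            int m = c[0] + c[1];
            double expNeg = (double) m * neg / n;   // expected negatives in this branch
            double expPos = (double) m * pos / n;   // expected positives in this branch
            if (expNeg > 0) stat += (c[0] - expNeg) * (c[0] - expNeg) / expNeg;
            if (expPos > 0) stat += (c[1] - expPos) * (c[1] - expPos) / expPos;
        }
        return stat;
    }

    /** Degrees of freedom for two classes: (non-empty branches - 1) * (2 - 1). */
    static int degreesOfFreedom(int[][] counts) {
        int branches = 0;
        for (int[] c : counts) if (c[0] + c[1] > 0) branches++;
        return Math.max(branches - 1, 0);
    }

    /** Split only if the statistic exceeds the caller-supplied critical value. */
    static boolean shouldSplit(int[][] counts, double criticalValue) {
        return chiSquare(counts) > criticalValue;
    }
}
```

One property worth keeping in mind when comparing criteria: misclassification impurity can report zero reduction for a split that still separates the classes somewhat. With branch counts `{40, 10}` and `{20, 0}`, for example, the reduction is 0 even though the second branch is pure.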

4.       Divide the data set randomly into training (80%) and testing (20%) sets. Use your algorithm to train a decision tree classifier and report its accuracy on the test set. Run the same experiment 100 times. Then calculate the average test performance (accuracy, precision, recall, f-measure, g-mean).
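The random split and the five metrics might be sketched as below. Several choices here are assumptions rather than requirements of the assignment: malignant is taken as the positive class, g-mean is taken to be the geometric mean of recall and specificity (another definition uses precision and recall, so check which one the course intends), undefined ratios are reported as 0, and the names `Evaluation`, `split`, and `Metrics` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class Evaluation {
    /** Shuffles a copy of the data and returns [training, testing]. */
    static <T> List<List<T>> split(List<T> data, double trainFraction, Random rng) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, rng);
        int cut = (int) Math.round(trainFraction * shuffled.size());
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    /** Confusion-matrix counts, with malignant assumed to be the positive class. */
    static final class Metrics {
        int tp, fp, tn, fn;

        void add(boolean actual, boolean predicted) {
            if (actual && predicted) tp++;
            else if (!actual && predicted) fp++;
            else if (!actual) tn++;
            else fn++;
        }

        double accuracy()    { return ratio(tp + tn, tp + fp + tn + fn); }
        double precision()   { return ratio(tp, tp + fp); }
        double recall()      { return ratio(tp, tp + fn); }
        double specificity() { return ratio(tn, tn + fp); }

        /** Harmonic mean of precision and recall. */
        double fMeasure() {
            double p = precision(), r = recall();
            return p + r == 0 ? 0.0 : 2 * p * r / (p + r);
        }

        /** Assumed definition: sqrt(recall * specificity). */
        double gMean() { return Math.sqrt(recall() * specificity()); }

        // Convention chosen for this sketch: an undefined ratio counts as 0.
        private static double ratio(int num, int den) {
            return den == 0 ? 0.0 : (double) num / den;
        }
    }
}
```

For the 100 repetitions, one option is to create a fresh `Metrics` object per run, call `split(data, 0.8, rng)` with a shared `Random`, and average each metric over the runs. Using a fixed seed for the `Random` makes the experiment reproducible.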

5.       Compare performances by varying the evaluation criteria. Make a table as follows:

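Evaluation Criteria        | Accuracy | Precision | Recall | F-measure | G-mean
---------------------------+----------+-----------+--------+-----------+-------
misclassification impurity |          |           |        |           |
information gain           |          |           |        |           |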

6.       Answer the following:

a.       Which evaluation criterion and confidence level work well? Why?

b.       Do you see evidence of overfitting in some experiments? Explain.
