Assignment 1: Decision tree learning for cancer diagnosis

In this mini-project, you will implement a decision-tree algorithm and apply it to breast cancer diagnosis. For each patient, an image of a fine needle aspirate (FNA) of a breast mass was taken, and nine features of the image that are potentially correlated with breast cancer were extracted. Your task is to develop a decision tree algorithm, learn a tree from the data, and predict whether new patients have breast cancer. The dataset can be downloaded from the UCI Machine Learning Repository.

1.       Collect the data set from my website. Each patient is represented by one line, with columns separated by commas: the first column is the identifier number, the last is the class (benign or malignant), and the rest are attribute values, which are integers ranging from 1 to 10. The attributes are (in case you are curious): Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses. (Note that the UCI documentation page specifies a different number of attributes, because it refers to a set of several related datasets. For details of the dataset used here, see this document.)
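The record format described above could be read with a small parser along the following lines. This is only a sketch: the class name `FnaRecord`, the `MISSING` marker, and the handling of a `?` field are assumptions, as is the acceptance of either the numeric codes 2 (benign) and 4 (malignant) or the literal words as class labels. Adjust these to whatever the actual file contains.

```java
/** One patient record: identifier, nine attribute values, and the class. */
final class FnaRecord {
    static final int MISSING = -1;   // assumed marker for a "?" (unknown) field

    final String id;
    final int[] attributes;          // nine values, in column order
    final boolean malignant;

    FnaRecord(String id, int[] attributes, boolean malignant) {
        this.id = id;
        this.attributes = attributes;
        this.malignant = malignant;
    }

    /** Parses one comma-separated line: id, nine attributes, class. */
    static FnaRecord parse(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length != 11) {
            throw new IllegalArgumentException(
                "expected 11 columns, got " + fields.length);
        }
        int[] attrs = new int[9];
        for (int i = 0; i < 9; i++) {
            String value = fields[i + 1].trim();
            attrs[i] = value.equals("?") ? MISSING : Integer.parseInt(value);
        }
        // Assumption: the class is coded 2/4 or spelled out in words.
        String label = fields[10].trim();
        boolean malignant;
        if (label.equals("4") || label.equalsIgnoreCase("malignant")) {
            malignant = true;
        } else if (label.equals("2") || label.equalsIgnoreCase("benign")) {
            malignant = false;
        } else {
            throw new IllegalArgumentException("unknown class label: " + label);
        }
        return new FnaRecord(fields[0].trim(), attrs, malignant);
    }
}
```

For a made-up line such as `42,5,1,1,1,2,1,3,1,1,benign`, `FnaRecord.parse` would yield a record whose first attribute (Clump Thickness) is 5 and whose `malignant` flag is false.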

2.       Implement the ID3 decision tree learner, as described in Chapter 3 of Mitchell. You may program in C/C++ or Java. Your program should assume input in the above format.
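At each node, ID3 picks the attribute with the highest information gain, Gain(S, A) = H(S) - sum over values v of (|S_v| / |S|) H(S_v), where H is the entropy of the class distribution. The sketch below shows only that selection measure, not the full recursive learner. The class name `InfoGain` and the count-table layout (`byValue[v][c]` is the number of examples with attribute value `v` and class `c`) are illustrative assumptions.

```java
/** Entropy and information gain computed from class-count tables. */
final class InfoGain {

    /** Entropy in bits of a class-count vector, e.g. {benign, malignant}. */
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {                       // 0 * log(0) is taken as 0
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /**
     * Information gain of a split. byValue[v][c] holds the number of
     * examples with attribute value v and class c; at least one row is
     * assumed, and all rows must have the same length.
     */
    static double gain(int[][] byValue) {
        int classes = byValue[0].length;
        int[] parent = new int[classes];
        int total = 0;
        for (int[] child : byValue) {
            for (int c = 0; c < classes; c++) {
                parent[c] += child[c];
                total += child[c];
            }
        }
        if (total == 0) return 0.0;
        double remainder = 0.0;                // weighted entropy of children
        for (int[] child : byValue) {
            int n = 0;
            for (int c : child) n += c;
            remainder += (double) n / total * entropy(child);
        }
        return entropy(parent) - remainder;
    }
}
```

As a check against Mitchell's PlayTennis example, the Wind attribute (weak: 6 yes, 2 no; strong: 3 yes, 3 no) gives `InfoGain.gain(new int[][]{{6, 2}, {3, 3}})` of about 0.048.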

3.       Implement both misclassification impurity and information gain as evaluation criteria. Also, implement split stopping using a chi-square test.
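A minimal sketch of the two remaining pieces, under stated assumptions: misclassification impurity is taken as 1 minus the largest class proportion, and the chi-square test uses the usual Pearson statistic comparing observed class counts in each branch with the counts expected if the attribute were independent of the class. The class name `SplitTests`, the count-table layout (`byValue[v][c]` is the number of examples with attribute value `v` and class `c`), and the choice to pass the critical value in as a parameter are illustrative, not prescribed by the assignment.

```java
/** Misclassification impurity and a chi-square split test on count tables. */
final class SplitTests {

    /** Misclassification impurity: 1 - (largest class proportion). */
    static double misclassification(int[] counts) {
        int total = 0, max = 0;
        for (int c : counts) {
            total += c;
            max = Math.max(max, c);
        }
        return total == 0 ? 0.0 : 1.0 - (double) max / total;
    }

    /** Drop in misclassification impurity achieved by a split. */
    static double impurityDrop(int[][] byValue) {
        int classes = byValue[0].length;
        int[] parent = new int[classes];
        int total = 0;
        for (int[] child : byValue) {
            for (int c = 0; c < classes; c++) {
                parent[c] += child[c];
                total += child[c];
            }
        }
        if (total == 0) return 0.0;
        double weighted = 0.0;
        for (int[] child : byValue) {
            int n = 0;
            for (int c : child) n += c;
            weighted += (double) n / total * misclassification(child);
        }
        return misclassification(parent) - weighted;
    }

    /** Pearson chi-square statistic: sum of (observed - expected)^2 / expected. */
    static double chiSquare(int[][] byValue) {
        int classes = byValue[0].length;
        int[] classTotals = new int[classes];
        int[] rowTotals = new int[byValue.length];
        int total = 0;
        for (int v = 0; v < byValue.length; v++) {
            for (int c = 0; c < classes; c++) {
                rowTotals[v] += byValue[v][c];
                classTotals[c] += byValue[v][c];
                total += byValue[v][c];
            }
        }
        double stat = 0.0;
        for (int v = 0; v < byValue.length; v++) {
            for (int c = 0; c < classes; c++) {
                double expected = (double) rowTotals[v] * classTotals[c] / total;
                if (expected > 0) {            // skip empty rows or classes
                    double diff = byValue[v][c] - expected;
                    stat += diff * diff / expected;
                }
            }
        }
        return stat;
    }

    /** Keep splitting only if the statistic exceeds the critical value. */
    static boolean splitIsSignificant(int[][] byValue, double criticalValue) {
        return chiSquare(byValue) > criticalValue;
    }
}
```

The critical value depends on the chosen confidence level and on the degrees of freedom, (number of non-empty attribute values - 1) × (number of classes - 1). For example, with two classes and all ten attribute values present there are 9 degrees of freedom, and the 95% critical value from a standard chi-square table is 16.919.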

4.       Divide the data set randomly into training (80%) and testing (20%) sets. Use your algorithm to train a decision tree classifier and report its accuracy on the test set. Run the same experiment 100 times, then calculate the average test performance (accuracy, precision, recall, F-measure, G-mean).
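The random split and the five metrics could be sketched as follows. The assignment does not define the metrics, so the usual definitions are assumed here: malignant is treated as the positive class, F-measure is the harmonic mean of precision and recall, and G-mean is the geometric mean of recall (sensitivity) and specificity. The class name `Evaluation`, the convention of returning 0 for an undefined ratio, and the order of the returned array are illustrative choices.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Random train/test split and metrics from a binary confusion matrix. */
final class Evaluation {

    /** Shuffles a copy of the data and returns [train, test]. */
    static <T> List<List<T>> split(List<T> data, double trainFraction, Random rng) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, rng);
        int cut = (int) Math.round(trainFraction * shuffled.size());
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    /**
     * Returns {accuracy, precision, recall, F-measure, G-mean} from the
     * counts of true/false positives and true/false negatives.
     */
    static double[] metrics(int tp, int fp, int tn, int fn) {
        double accuracy = ratio(tp + tn, tp + fp + tn + fn);
        double precision = ratio(tp, tp + fp);
        double recall = ratio(tp, tp + fn);
        double specificity = ratio(tn, tn + fp);
        double fMeasure = (precision + recall == 0.0)
            ? 0.0
            : 2.0 * precision * recall / (precision + recall);
        double gMean = Math.sqrt(recall * specificity);
        return new double[] {accuracy, precision, recall, fMeasure, gMean};
    }

    /** Division that yields 0 when the denominator is 0 (assumed convention). */
    private static double ratio(int numerator, int denominator) {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }
}
```

Calling `split(records, 0.8, new Random(seed))` with a different seed in each of the 100 runs gives independent partitions, and averaging the five entries of `metrics(...)` over the runs yields the values to report.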

5.       Compare performance across the evaluation criteria. Make a table as follows:

Evaluation Criteria        | Accuracy | Precision | Recall | F-measure | G-mean
misclassification impurity |          |           |        |           |
information gain           |          |           |        |           |

6.       Answer the following:

a.       Which evaluation criterion and confidence level work well? Why?

b.       Do you see evidence of overfitting in some experiments? Explain.

 

