Assignment 1: Decision tree learning for cancer diagnosis

In this mini-project, you will implement a decision-tree algorithm and apply it to breast cancer diagnosis. For each patient, an image of a fine needle aspirate (FNA) of a breast mass was taken, and nine features in the image potentially correlated with breast cancer were extracted. Your task is to develop a decision tree algorithm, learn from the data, and predict whether new patients have breast cancer. The dataset can be downloaded from the UC Irvine Machine Learning Repository.

1.       Collect the data set from my website. Each patient is represented by one line, with columns separated by commas: the first column is the identifier number, the last is the class (benign or malignant), and the rest are attribute values, which are integers ranging from 1 to 10. The attributes are (in case you are curious): Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses. (Note that the UCI document page specifies a different number of attributes, because it refers to a set of several related datasets. For detailed information about the dataset that we use here, see this document.)

2.       Implement the ID3 decision tree learner, as described in Chapter 3 of Mitchell. You may program in C/C++ or Java. Your program should assume input in the above format.

3.       Implement both misclassification impurity and information gain as evaluation criteria. Also implement split stopping using a chi-square test.

4.       Divide the data set randomly into training (80%) and testing (20%) sets. Use your algorithm to train a decision tree classifier and report its accuracy on the test set. Run the same experiment 100 times, then calculate the average test performance (accuracy, precision, recall, F-measure, G-mean).

5.       Compare performances by varying the evaluation criteria. Make a table as follows:

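Evaluation Criteria        | Accuracy | Precision | Recall | F-measure | G-mean
---------------------------+----------+-----------+--------+-----------+-------
misclassification impurity |          |           |        |           |
information gain           |          |           |        |           |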

6.       Answer the following:

a.       Which evaluation criterion and confidence level work well? Why?

b.       Do you see evidence of overfitting in some experiments? Explain.
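
For step 1, the input format can be read with a small parser. The following is a minimal sketch in Java, not a required design. The class names Patient and PatientParser are hypothetical, the class label is kept as the raw text of the last column, and skipping lines that contain a non-integer attribute value is an assumption about how to treat malformed or missing entries.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical record type for one patient; names are illustrative. */
final class Patient {
    final String id;        // first column: identifier number
    final int[] attributes; // middle columns: integer features in 1..10
    final String label;     // last column: class, kept as the raw text

    Patient(String id, int[] attributes, String label) {
        this.id = id;
        this.attributes = attributes;
        this.label = label;
    }
}

final class PatientParser {
    /** Parses one comma-separated line, or returns null if it cannot be parsed. */
    static Patient parseLine(String line) {
        String[] cols = line.trim().split(",");
        if (cols.length < 3) {
            return null; // need at least an id, one attribute, and a class
        }
        int[] attrs = new int[cols.length - 2];
        try {
            for (int i = 0; i < attrs.length; i++) {
                attrs[i] = Integer.parseInt(cols[i + 1].trim());
            }
        } catch (NumberFormatException e) {
            return null; // assumption: skip lines with non-integer attribute values
        }
        return new Patient(cols[0].trim(), attrs, cols[cols.length - 1].trim());
    }

    /** Parses every line and keeps only the records that parsed cleanly. */
    static List<Patient> parseAll(List<String> lines) {
        List<Patient> patients = new ArrayList<>();
        for (String line : lines) {
            Patient p = parseLine(line);
            if (p != null) {
                patients.add(p);
            }
        }
        return patients;
    }
}
```

The lines themselves can be obtained with java.nio.file.Files.readAllLines and passed to parseAll. If skipped records matter for your results, count them and mention the number in your report.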
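
For step 2, the recursion at the heart of ID3 can be sketched compactly. This is an illustrative outline under several assumptions, not the reference implementation from Mitchell: the class Id3Sketch and its method names are hypothetical, each attribute is treated as categorical with one branch per observed value, only information gain is used, no chi-square stopping is applied, and an attribute value never seen during training falls back to the majority label of the node where the lookup fails.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Id3Sketch {
    /** A leaf has attribute == -1; an internal node tests one attribute index. */
    static final class Node {
        int attribute = -1;
        String label; // majority label of the training rows that reached this node
        Map<Integer, Node> children = new HashMap<>();
    }

    static Map<String, Integer> countLabels(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String l : labels) counts.merge(l, 1, Integer::sum);
        return counts;
    }

    /** Entropy in bits of a list of class labels. */
    static double entropy(List<String> labels) {
        double h = 0.0;
        for (int c : countLabels(labels).values()) {
            double p = (double) c / labels.size();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    static String majority(List<String> labels) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : countLabels(labels).entrySet()) {
            if (e.getValue() > bestCount) { best = e.getKey(); bestCount = e.getValue(); }
        }
        return best;
    }

    /** Groups row indices by the value they take on attribute a. */
    static Map<Integer, List<Integer>> partition(int[][] x, List<Integer> rows, int a) {
        Map<Integer, List<Integer>> parts = new HashMap<>();
        for (int r : rows) parts.computeIfAbsent(x[r][a], k -> new ArrayList<>()).add(r);
        return parts;
    }

    static List<String> labelsOf(String[] y, List<Integer> rows) {
        List<String> out = new ArrayList<>();
        for (int r : rows) out.add(y[r]);
        return out;
    }

    /** Builds a tree over the given rows (non-empty) using the given attribute indices. */
    static Node build(int[][] x, String[] y, List<Integer> rows, List<Integer> attrs) {
        Node node = new Node();
        List<String> labels = labelsOf(y, rows);
        node.label = majority(labels);
        double h = entropy(labels);
        if (h == 0.0 || attrs.isEmpty()) return node; // pure node, or nothing left to test

        int bestAttr = -1;
        double bestGain = -1.0; // any real gain beats this, so some attribute is always chosen
        for (int a : attrs) {
            double remainder = 0.0;
            for (List<Integer> part : partition(x, rows, a).values()) {
                remainder += (double) part.size() / rows.size() * entropy(labelsOf(y, part));
            }
            if (h - remainder > bestGain) { bestGain = h - remainder; bestAttr = a; }
        }

        node.attribute = bestAttr;
        List<Integer> remaining = new ArrayList<>(attrs);
        remaining.remove(Integer.valueOf(bestAttr)); // each attribute is used once per path
        for (Map.Entry<Integer, List<Integer>> e : partition(x, rows, bestAttr).entrySet()) {
            node.children.put(e.getKey(), build(x, y, e.getValue(), remaining));
        }
        return node;
    }

    static String predict(Node node, int[] sample) {
        while (node.attribute >= 0) {
            Node child = node.children.get(sample[node.attribute]);
            if (child == null) break; // unseen value: use this node's majority label
            node = child;
        }
        return node.label;
    }
}
```

To support step 3, the gain computation would be replaced by a pluggable criterion and a stopping check would be added before the recursive calls.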
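
For step 3, both criteria and the chi-square statistic can be computed from a table of class counts per branch. The sketch below assumes the usual definitions: misclassification impurity is 1 minus the largest class proportion, information gain is the decrease in entropy, and the chi-square statistic is the Pearson statistic comparing observed branch counts with the counts expected if the split were independent of the class. The class SplitCriteria and its method names are hypothetical, and the critical value is passed in by the caller rather than computed, since it depends on the confidence level and on the degrees of freedom, (branches - 1) x (classes - 1). Your course may intend a different variant of the test.

```java
final class SplitCriteria {
    static int sum(int[] counts) {
        int n = 0;
        for (int c : counts) n += c;
        return n;
    }

    /** Entropy in bits of a class-count vector, e.g. {benign, malignant}. */
    static double entropy(int[] counts) {
        int n = sum(counts);
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue; // 0 * log(0) is taken as 0
            double p = (double) c / n;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** Misclassification impurity: 1 minus the largest class proportion. */
    static double misclassification(int[] counts) {
        int n = sum(counts);
        int max = 0;
        for (int c : counts) max = Math.max(max, c);
        return n == 0 ? 0.0 : 1.0 - (double) max / n;
    }

    /** Class counts of the parent node; branches[v][k] = count of class k in branch v. */
    static int[] parentCounts(int[][] branches) {
        int[] parent = new int[branches[0].length];
        for (int[] b : branches) {
            for (int k = 0; k < parent.length; k++) parent[k] += b[k];
        }
        return parent;
    }

    /** Impurity of the parent minus the size-weighted impurity of the branches. */
    static double gain(int[][] branches, boolean useEntropy) {
        int[] parent = parentCounts(branches);
        int total = sum(parent);
        double g = useEntropy ? entropy(parent) : misclassification(parent);
        for (int[] b : branches) {
            int n = sum(b);
            if (n == 0) continue;
            double impurity = useEntropy ? entropy(b) : misclassification(b);
            g -= (double) n / total * impurity;
        }
        return g;
    }

    /** Pearson chi-square statistic over the branch-by-class count table. */
    static double chiSquare(int[][] branches) {
        int[] parent = parentCounts(branches);
        int total = sum(parent);
        double stat = 0.0;
        for (int[] b : branches) {
            int n = sum(b);
            for (int k = 0; k < parent.length; k++) {
                double expected = (double) n * parent[k] / total;
                if (expected > 0) {
                    double diff = b[k] - expected;
                    stat += diff * diff / expected;
                }
            }
        }
        return stat;
    }

    /** Split only if the statistic exceeds the caller-supplied critical value. */
    static boolean shouldSplit(int[][] branches, double criticalValue) {
        return chiSquare(branches) > criticalValue;
    }
}
```

As an example of how the two criteria can disagree, take two branches with class counts {4, 0} and {2, 2}. Misclassification impurity reports a decrease of zero for this split, while information gain is positive. The chi-square statistic for the same table is 8/3, which is below 3.841, the critical value for one degree of freedom at the 95% confidence level, so the split would be rejected at that level.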
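
For step 4, the random split and the five metrics can be sketched as follows. Several choices here are assumptions rather than part of the assignment text: malignant is treated as the positive class when counting true and false positives, G-mean is taken to be the square root of recall times specificity, and any ratio with a zero denominator is reported as 0. The class EvaluationSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class EvaluationSketch {
    /** Shuffles a copy of the data and returns {train, test}. */
    static <T> List<List<T>> split(List<T> data, double trainFraction, Random rng) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, rng);
        int cut = (int) Math.round(trainFraction * shuffled.size());
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    static double ratio(int numerator, int denominator) {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }

    /** Returns {accuracy, precision, recall, F-measure, G-mean} from confusion counts. */
    static double[] metrics(int tp, int fp, int tn, int fn) {
        double accuracy = ratio(tp + tn, tp + fp + tn + fn);
        double precision = ratio(tp, tp + fp);
        double recall = ratio(tp, tp + fn);
        double specificity = ratio(tn, tn + fp);
        double f = (precision + recall == 0.0)
                ? 0.0
                : 2.0 * precision * recall / (precision + recall);
        double g = Math.sqrt(recall * specificity); // assumed G-mean definition
        return new double[] {accuracy, precision, recall, f, g};
    }

    /** Element-wise mean of the metric vectors from several runs (at least one run). */
    static double[] average(List<double[]> runs) {
        double[] mean = new double[runs.get(0).length];
        for (double[] run : runs) {
            for (int i = 0; i < mean.length; i++) mean[i] += run[i] / runs.size();
        }
        return mean;
    }
}
```

One experiment would call split with a training fraction of 0.8, train on the first part, count tp, fp, tn, and fn on the second part, and store the result of metrics. Repeating this 100 times and passing the stored vectors to average gives one row of the table in step 5. Seeding the Random instance makes the runs reproducible.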

 



All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd