Assignment 1: Decision tree learning for cancer diagnosis

In this mini-project, you will implement a decision-tree algorithm and apply it to breast cancer diagnosis. For each patient, an image of a fine needle aspirate (FNA) of a breast mass was taken, and nine features in the image potentially correlated with breast cancer were extracted. Your task is to develop a decision tree algorithm, learn from the data, and predict whether new patients have breast cancer. The dataset can be downloaded from the U.C. Irvine Machine Learning Repository.

1.       Collect the data set from my website. Each patient is represented by one line, with columns separated by commas: the first column is the identifier number, the last is the class (benign or malignant), and the rest are attribute values, which are integers ranging from 1 to 10. The attributes are (in case you are curious): Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses. (Note that the UCI document page specifies a different number of attributes, because it refers to a set of several related datasets. For detailed information about the dataset that we use here, see this document.)
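A minimal sketch of reading one line in this format, written in Python for brevity (the assignment itself asks for C/C++ or Java). The names `Patient` and `parse_line` are hypothetical. Two details are assumptions not stated above: that the class column is coded 2 for benign and 4 for malignant, and that a missing attribute value appears as `?`, as in the UCI distribution of this dataset. Adjust both if your copy of the data differs.

```python
from typing import List, NamedTuple, Optional


class Patient(NamedTuple):
    patient_id: str
    features: List[Optional[int]]  # nine values in 1..10, None if missing
    malignant: bool


def parse_line(line: str) -> Patient:
    """Parse one comma-separated record: id, nine attributes, class."""
    fields = line.strip().split(",")
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    # Assumption: a missing attribute value is written as "?".
    features = [None if f == "?" else int(f) for f in fields[1:10]]
    # Assumption: the class column is 2 for benign and 4 for malignant.
    label = fields[10]
    if label not in ("2", "4"):
        raise ValueError(f"unexpected class label: {label!r}")
    return Patient(fields[0], features, label == "4")


# Made-up record for illustration only, not taken from the dataset.
example = parse_line("1001,5,1,1,1,2,?,3,1,1,4")
```

How to treat records with a missing value (drop them, or substitute the most common value) is a design choice the assignment leaves open.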

2.       Implement the ID3 decision tree learner, as described in Chapter 3 of Mitchell. You may program in C/C++ or Java. Your program should assume input in the above format.
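The recursive structure of ID3 can be sketched as follows. This is an illustration in Python rather than the C/C++ or Java the assignment requires, and it is not Mitchell's pseudocode verbatim: the names `id3`, `predict`, and the dictionary-based node layout are hypothetical, and instead of creating a branch for every possible attribute value it stores a majority-class default that is used when a value was not seen during training.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute index attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Return a class label (leaf) or a dict describing an internal node."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    partitions = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = partitions.setdefault(row[best], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    children = {value: id3(sub_rows, sub_labels, remaining)
                for value, (sub_rows, sub_labels) in partitions.items()}
    return {"attr": best, "default": majority, "children": children}


def predict(tree, row):
    """Walk the tree; fall back to the node's majority class on unseen values."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree
```

Here each row is a sequence of attribute values and `attrs` is the list of column indices still available for splitting, for example `id3(rows, labels, list(range(9)))` for the nine features.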

3.       Implement both misclassification impurity and information gain as evaluation criteria. Also implement split stopping using the chi-square test.
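A hedged sketch of the two criteria and of a chi-square stopping rule, in Python for brevity (the assignment itself asks for C/C++ or Java). All function names are hypothetical. The assumptions are: misclassification impurity is one minus the majority-class fraction, the chi-square statistic is Pearson's statistic on the attribute-value by class contingency table, and the test is run at the 0.05 significance level using a small table of standard critical values. Other confidence levels need their own critical values.

```python
import math
from collections import Counter


def misclassification(labels):
    """1 minus the fraction of the majority class."""
    return 1.0 - max(Counter(labels).values()) / len(labels)


def entropy(labels):
    """Shannon entropy in bits; used for information gain."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impurity_reduction(values, labels, impurity):
    """Impurity before the split minus weighted impurity after it."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    n = len(labels)
    return impurity(labels) - sum(len(g) / n * impurity(g)
                                  for g in groups.values())


# Upper critical values of the chi-square distribution at the 0.05 level,
# keyed by degrees of freedom (enough for 10 attribute values, 2 classes).
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                    6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919}


def chi_square(values, labels):
    """Pearson chi-square statistic and degrees of freedom for a split."""
    n = len(labels)
    value_counts = Counter(values)
    class_counts = Counter(labels)
    observed = Counter(zip(values, labels))
    stat = 0.0
    for v, nv in value_counts.items():
        for c, nc in class_counts.items():
            expected = nv * nc / n
            stat += (observed[(v, c)] - expected) ** 2 / expected
    df = (len(value_counts) - 1) * (len(class_counts) - 1)
    return stat, df


def split_is_significant(values, labels, critical=CHI2_CRITICAL_05):
    """Stop splitting (return False) when the attribute looks independent."""
    stat, df = chi_square(values, labels)
    return df > 0 and stat > critical[df]
```

Here `values` is the column of one attribute at the current node and `labels` the matching classes. Passing `entropy` to `impurity_reduction` gives information gain; passing `misclassification` gives the other criterion.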

4.       Divide the data set randomly into training (80%) and testing (20%) sets. Use your algorithm to train a decision tree classifier and report accuracy on the test set. Run the same experiment 100 times. Then calculate the average test performance (accuracy, precision, recall, F-measure, G-mean).
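The experiment loop might look like the sketch below, again in Python for brevity rather than the required C/C++ or Java. The names `split_indices`, `metrics`, and `average_over_runs` are hypothetical, and `train_fn` and `predict_fn` stand in for your own learner. Assumptions: malignant is the positive class (labels are booleans), F-measure is the harmonic mean of precision and recall, G-mean is the square root of recall times specificity (some texts define it differently), and any ratio with a zero denominator is reported as 0.

```python
import math
import random


def split_indices(n, train_fraction=0.8, rng=random):
    """Shuffle 0..n-1 and cut into train and test index lists."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(round(train_fraction * n))
    return idx[:cut], idx[cut:]


def metrics(y_true, y_pred):
    """Binary metrics; True is the positive (malignant) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)

    def ratio(a, b):
        return a / b if b else 0.0  # assumption: undefined ratios count as 0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "g_mean": math.sqrt(recall * specificity),
    }


def average_over_runs(rows, labels, train_fn, predict_fn, runs=100, seed=0):
    """Repeat the random 80/20 experiment and average each metric."""
    rng = random.Random(seed)
    totals = {}
    for _ in range(runs):
        train_idx, test_idx = split_indices(len(rows), 0.8, rng)
        model = train_fn([rows[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        predictions = [predict_fn(model, rows[i]) for i in test_idx]
        result = metrics([labels[i] for i in test_idx], predictions)
        for name, value in result.items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / runs for name, total in totals.items()}
```

Fixing the seed makes the 100 splits reproducible, so both evaluation criteria can be compared on the same sequence of train/test partitions.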

5.       Compare performance across the evaluation criteria. Make a table as follows:

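| Evaluation Criteria        | Accuracy | Precision | Recall | F-measure | G-mean |
|----------------------------|----------|-----------|--------|-----------|--------|
| misclassification impurity |          |           |        |           |        |
| information gain           |          |           |        |           |        |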

6.       Answer the following:

a.       Which evaluation criterion and confidence level work well? Why?

b.       Do you see evidence of overfitting in some experiments? Explain.

 

