Functions to encode the categorical attribute values

Reference no: EM131251512

Part I -

Q1. In this question, we are going to build a neural network (NN) classifier to predict the approval of credit card applications. For confidentiality reasons, no attribute names are given in the CSV dataset "credit_data.csv". The last column is the approval status ("+" = approve, "-" = reject). Use the first 75% of the dataset as the training data, and the remaining 25% as the testing data.
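The split described here is positional rather than random, so it can be done by row index. A minimal sketch, assuming the rows have already been read into a variable; the placeholder matrix `data` and the names `trainData` and `testData` are illustrative and not part of the assignment:

```matlab
% Placeholder data: 8 rows stand in for the rows of credit_data.csv.
data = (1:8)' * [1 10];

nTrain    = floor(0.75 * size(data, 1));   % number of rows in the first 75%
trainData = data(1:nTrain, :);             % first 75% of the rows
testData  = data(nTrain+1:end, :);         % remaining 25% of the rows
```

Rounding down with floor is a choice made here; when the row count is not a multiple of four, the assignment does not say which way to round.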

a. The categorical attribute values (e.g. columns 1 and 6) need to be converted to numeric values before they can be used for neural network training and classification. This process is called encoding. Implement specific MATLAB functions to encode the categorical attribute values as numeric values. For example, column 1 consists of only the categorical values "a" and "b", so they can be encoded as "1" and "2" respectively. The function source code must be submitted as MATLAB function files (.m files).
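One possible shape for such an encoding function is sketched below. It assumes each categorical column arrives as a cell array of character vectors; the function name `encodeCategorical` is hypothetical and not prescribed by the assignment. It relies on the third output of `unique`, which gives the position of each value in the sorted list of distinct values.

```matlab
function [codes, categories] = encodeCategorical(values)
% ENCODECATEGORICAL Map categorical text values to integer codes 1..K.
%   values     - cell array of character vectors, e.g. {'a'; 'b'; 'a'}
%   codes      - column vector of integers; codes(i) indexes categories
%   categories - sorted cell array of the distinct values found
%
% Hypothetical helper for illustration; the name is not from the assignment.
    [categories, ~, codes] = unique(values);   % sorted distinct values and their indices
end
```

For example, `encodeCategorical({'a'; 'b'; 'b'})` returns the codes `[1; 2; 2]`. Any placeholder used for missing entries in the data would be treated as just another category by this sketch, so it may need separate handling.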

b. The NN classifier is created using the MATLAB function feedforwardnet with the following parameters:

Number of hidden layers: 1

Number of neurons: 10

Use default settings for other parameters. Train the classifier using the training dataset. Show the training performance by pasting the performance curve in your answer. Submit your MATLAB script file for this training.
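The training step might look like the sketch below, which assumes the Deep Learning Toolbox is installed. The data are synthetic placeholders (the names `X` and `t` and the rule that generates the targets are invented for illustration), so the resulting curve says nothing about the credit data.

```matlab
% Placeholder data (not the credit data): 200 samples, 6 numeric attributes.
rng(0);                              % fix the random seed for repeatability
X = rand(200, 6);                    % one row per sample
t = double(sum(X, 2) > 3);           % synthetic target: 1 = approve, 0 = reject

net = feedforwardnet(10);            % one hidden layer with 10 neurons
[net, tr] = train(net, X', t');      % train expects one column per sample

plotperform(tr);                     % performance against epoch
```

Note that with default settings feedforwardnet randomly divides the data passed to train into training, validation, and test subsets, so the performance plot shows one curve for each.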

Note: feedforwardnet is similar to patternnet, and you can use patternnet for this task instead. However, to use patternnet, every training class label needs to be encoded as a column vector. For example, the "+" class is represented as one column vector (shown in the original as image 37_Figure.png) and the "-" class as another (image 856_Figure1.png). The predicted values are likewise column vectors, so decoding is needed to convert them to class labels. You might find the functions ind2vec and vec2ind useful for encoding and decoding respectively.
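The encode and decode round trip with ind2vec and vec2ind can be sketched as follows. The convention that index 1 means "+" and index 2 means "-" is an assumption made for this example only.

```matlab
labels  = [1 2 2 1];                 % class indices: 1 for "+", 2 for "-" (assumed convention)
T       = full(ind2vec(labels));     % 2-by-4 matrix, one one-hot column per sample
decoded = vec2ind(T);                % back to class indices
```

ind2vec returns a sparse matrix, so full is used here to obtain an ordinary matrix that is easier to inspect.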

c. Use the NN classifier to predict the approval decisions of the samples in the testing dataset. Obtain and show both the confusion matrix and the receiver operating characteristic (ROC) curve. What is the accuracy of the classifier? Submit your MATLAB script file for this testing and evaluation.
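The evaluation step can be sketched with confusionmat and perfcurve from the Statistics and Machine Learning Toolbox (plotconfusion and plotroc from the Deep Learning Toolbox are alternatives). The labels and scores below are made-up illustrative values, not results from the credit data, and the 0.5 threshold is an assumption.

```matlab
% Illustrative values only (not results from the credit data).
trueLabels = [1 1 1 0 0 0 1 0];                   % 1 = "+", 0 = "-"
scores     = [0.9 0.8 0.4 0.3 0.2 0.6 0.7 0.1];   % classifier outputs
predLabels = double(scores >= 0.5);               % threshold at 0.5 (assumed)

C = confusionmat(trueLabels, predLabels);         % rows: true class, columns: predicted class
accuracy = sum(diag(C)) / sum(C(:));              % fraction classified correctly

[fpr, tpr, ~, auc] = perfcurve(trueLabels, scores, 1);   % ROC with "+" as the positive class
plot(fpr, tpr);
xlabel('False positive rate');
ylabel('True positive rate');
```

Accuracy is the sum of the diagonal of the confusion matrix divided by the total number of test samples.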

Please submit your MATLAB source codes for parts (a) - (c) in separate MATLAB function/script files. No marks will be given to your answer unless the relevant source codes are submitted.

Q2. We are going to apply K-means clustering on a set of clinical data from 216 patients. This (built-in) dataset can be loaded into the MATLAB workspace using the following MATLAB commands:

load ovariancancer;

The clinical data is stored in the matrix obs and the cell array grp indicates whether a patient has ovarian cancer or not.

a. Cluster the clinical data using MATLAB K-means clustering with K = 2. Submit your MATLAB script file for this process.
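A minimal sketch of the clustering step, assuming the Statistics and Machine Learning Toolbox is available. The built-in dataset file is named ovariancancer; the fixed seed and the number of replicates are choices made here, not requirements of the assignment.

```matlab
load ovariancancer;                        % provides obs (one row per patient) and grp
rng(1);                                    % k-means starts from random centroids; fix the seed
idx = kmeans(obs, 2, 'Replicates', 5);     % idx(i) is the cluster (1 or 2) of patient i
```

Cluster numbers are arbitrary labels, so a different seed can swap which group is called cluster 1.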

b. Using the clustering results and the labels in grp, count and fill in the number of patients for each of the four categories in the table below:

             Cancer    Normal
Cluster 1    ____      ____
Cluster 2    ____      ____

If one is going to predict whether these patients have ovarian cancer or not by applying a classification method to this dataset, is it a good idea? State reasons.
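The four counts for the table in part (b) could be tallied with crosstab, as sketched below. Toy labels are used so the example stands on its own; in the assignment, idx would come from kmeans and grp from the loaded dataset.

```matlab
% Toy labels for illustration only.
idx = [1; 1; 2; 2; 2];
grp = {'Cancer'; 'Cancer'; 'Normal'; 'Cancer'; 'Normal'};

[counts, ~, ~, labels] = crosstab(idx, grp);
% counts(i, j): number of patients in cluster labels{i, 1} with status labels{j, 2}
disp(labels);
disp(counts);
```

Reading the row and column order from the labels output, rather than assuming it, avoids mixing up the Cancer and Normal columns.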

Part II -

Q3. The transactions for a restaurant are shown below:

Transaction Id    Food Ordered
T001              spaghetti, chicken salad, pizza, sandwiches
T002              soft drink, ice-cream, hamburger
T003              lemon tea, hamburger, sushi, noodle soup, curry chicken rice
T004              chicken salad, fruit juice, beef pie, kebab, spaghetti, coffee
T005              noodle soup, ice-cream, sushi, lemon-tea, fruit juice
T006              beef pie, coffee, kebab, noodle soup
T007              hamburger, soft drinks, kebab
T008              chicken salad, fruit juice, pizza, spaghetti
T009              noodle soup, ice-cream, hamburger, sushi, lemon tea, soft drinks
T010              kebab, chicken salad, coffee, pizza, beef pie, spaghetti
T011              soft drinks, beef pie, curry chicken rice, kebab
T012              sushi, noodle soup, lemon tea, pizza

a. Using either the Apriori or the FP-tree algorithm, obtain all frequent itemsets for the above transactions with a minimum support of 25%. You must show all the manual calculations.
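The question asks for manual calculations, but a short script can serve as a cross-check. The sketch below is a simplified level-wise search in the spirit of Apriori: it joins frequent k-itemsets into (k+1)-item candidates and counts their support, but skips the subset-pruning step, which affects only efficiency and not the result. It assumes that "soft drink" and "soft drinks", and "lemon-tea" and "lemon tea", are spelling variants of the same item. With 12 transactions, a minimum support of 25% corresponds to 3 transactions.

```matlab
% Transactions from the table, with item names normalised (assumption:
% "soft drink" = "soft drinks" and "lemon-tea" = "lemon tea").
T = { ...
    {'spaghetti','chicken salad','pizza','sandwiches'}, ...
    {'soft drinks','ice-cream','hamburger'}, ...
    {'lemon tea','hamburger','sushi','noodle soup','curry chicken rice'}, ...
    {'chicken salad','fruit juice','beef pie','kebab','spaghetti','coffee'}, ...
    {'noodle soup','ice-cream','sushi','lemon tea','fruit juice'}, ...
    {'beef pie','coffee','kebab','noodle soup'}, ...
    {'hamburger','soft drinks','kebab'}, ...
    {'chicken salad','fruit juice','pizza','spaghetti'}, ...
    {'noodle soup','ice-cream','hamburger','sushi','lemon tea','soft drinks'}, ...
    {'kebab','chicken salad','coffee','pizza','beef pie','spaghetti'}, ...
    {'soft drinks','beef pie','curry chicken rice','kebab'}, ...
    {'sushi','noodle soup','lemon tea','pizza'}};

minCount = ceil(0.25 * numel(T));        % 25% of 12 transactions = 3

items = unique([T{:}]);                  % all distinct item names, sorted
B = false(numel(T), numel(items));       % B(t, i): transaction t contains item i
for t = 1:numel(T)
    B(t, :) = ismember(items, T{t});
end

L = find(sum(B, 1) >= minCount)';        % frequent 1-itemsets, one item index per row
k = 1;
while ~isempty(L)
    % Report the frequent k-itemsets with their support counts
    for r = 1:size(L, 1)
        fprintf('{%s}  support = %d\n', strjoin(items(L(r, :)), ', '), ...
                sum(all(B(:, L(r, :)), 2)));
    end
    % Join step: merge pairs of k-itemsets whose union has k+1 items
    C = zeros(0, k + 1);
    for a = 1:size(L, 1)
        for b = a+1:size(L, 1)
            u = union(L(a, :), L(b, :));
            if numel(u) == k + 1
                C(end+1, :) = u;         %#ok<SAGROW>
            end
        end
    end
    C = unique(C, 'rows');               % drop duplicate candidates
    % Keep candidates that meet the minimum support count
    keep = false(size(C, 1), 1);
    for r = 1:size(C, 1)
        keep(r) = sum(all(B(:, C(r, :)), 2)) >= minCount;
    end
    L = C(keep, :);
    k = k + 1;
end
```

The incidence matrix B makes support counting a one-liner: an itemset is contained in a transaction exactly when every one of its columns is true in that row.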

b. Generate the association rules for the maximal frequent itemsets in (a) using a minimum confidence of 75%. You must show all the manual calculations.
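The confidence of a rule A -> B is the support count of A and B together divided by the support count of A alone. The sketch below works through one rule, chosen only to illustrate the arithmetic, using transaction numbers read off the table; the variable names are illustrative.

```matlab
% Transaction numbers (from the table) that contain each item
spaghetti    = [1 4 8 10];
chickenSalad = [1 4 8 10];
pizza        = [1 8 10 12];

% Rule {spaghetti, chicken salad} -> {pizza}
antecedent = intersect(spaghetti, chickenSalad);   % transactions containing both items
both       = intersect(antecedent, pizza);         % ... that also contain pizza
confidence = numel(both) / numel(antecedent);      % 3 / 4 = 0.75
meetsThreshold = confidence >= 0.75;               % compare with the 75% minimum
```

A rule that sits exactly on the threshold, as this one does, is kept when the minimum is read as "at least 75%".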

Q4. a. Discuss what factors you would consider when choosing an appropriate clustering method for a dataset.

b. Suggest a dataset property that makes it hard to decide on an appropriate clustering method.

c. Give an example of a dataset consisting of three natural clusters for which K-means would almost always find the correct clusters, but bisecting K-means would not.

Dataset Reference:

Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository [https://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.



Please put ALL your assignment answers except the MATLAB source code into an MS Word .doc/.docx file, and put this Word file and all MATLAB source code (.m files) into a zip file for submission. Files in other file formats will NOT be marked.
