Functions to encode the categorical attribute values

Reference no: EM131251512

Part I -

Q1. In this question, we are going to build a neural network (NN) classifier to predict the approval of credit card applications. For confidentiality reasons, no attribute names are given in the CSV dataset "credit_data.csv". The last column is the approval status ("+" = approve, "-" = reject). Use the first 75% of the dataset as the training data, and the remaining 25% as the testing data.
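The 75%/25% split described above can be sketched as follows. This is only an illustration: the matrix `data` is a toy stand-in for the encoded dataset, and rounding down when 75% of the rows is not a whole number is an assumption, since the brief does not say how to round.

```matlab
% Toy 8-by-2 matrix standing in for the encoded dataset (hypothetical values).
data = (1:8)' * [1 10];

nRows  = size(data, 1);
nTrain = floor(0.75 * nRows);        % assumption: round down

trainData = data(1:nTrain, :);       % first 75% of the rows
testData  = data(nTrain+1:end, :);   % remaining 25% of the rows
```

The rows are taken in file order, without shuffling, because the brief asks for the first 75% of the dataset.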

a. The categorical attribute values (e.g. columns 1 and 6) need to be converted to numeric values before they can be used for neural network training and classification. This process is called encoding. Implement MATLAB functions that encode the categorical attribute values as numeric values. For example, column 1 consists only of the categorical values "a" and "b", so they can be encoded as "1" and "2" respectively. The function source code must be submitted as MATLAB function files (.m files).
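One possible way to do this kind of encoding is with the third output of `unique`, which assigns each distinct value its position in the sorted list of values. The sketch below is a minimal illustration, not a complete solution: the variable names and the toy column are hypothetical, and missing values are not handled.

```matlab
% Toy categorical column (hypothetical values modelled on column 1).
col = {'b'; 'a'; 'b'; 'a'};

% levels: sorted distinct values; codes: index of each entry within levels.
[levels, ~, codes] = unique(col);
codes = codes(:);                    % force a column vector

% With this toy column, 'a' maps to 1 and 'b' maps to 2.
```

Keeping `levels` alongside `codes` makes it possible to apply the same mapping to the testing data later, or to decode the numbers back to the original values.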

b. The NN classifier is created using the MATLAB function feedforwardnet with the following parameters:

Number of hidden layers: 1

Number of neurons: 10

Use default settings for other parameters. Train the classifier using the training dataset. Show the training performance by pasting the performance curve in your answer. Submit your MATLAB script file for this training.

Note: feedforwardnet is similar to patternnet, and you may use patternnet for this task instead. To use patternnet, however, every training class label needs to be encoded as a column vector. For example, the "+" class is represented as [figure: 37_Figure.png] and the "-" class is represented as [figure: 856_Figure1.png]. The predicted values are likewise column vectors, so decoding is needed to convert them back to class labels. You might find the functions ind2vec and vec2ind useful for encoding and decoding, respectively.
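The column-vector encoding and decoding described in the note can be sketched without any toolbox, as below. This is a hand-rolled stand-in for the idea behind ind2vec and vec2ind, not a call to those functions. Which column vector stands for which class is an assumption here, since the figures are not available: the sketch uses [1; 0] for "+" and [0; 1] for "-".

```matlab
labels  = {'+'; '-'; '+'};           % toy class labels (hypothetical)
classes = {'+', '-'};                % assumed order: '+' is class 1, '-' is class 2

% Encode: label -> class index -> one column vector per sample.
[~, idx] = ismember(labels, classes);
I = eye(numel(classes));
targets = I(:, idx);                 % 2-by-3 matrix, one column per sample

% Decode: take the row with the largest value in each column.
[~, decoded] = max(targets, [], 1);
decodedLabels = classes(decoded);
```

The same decoding step applies to network outputs, which are generally not exact zeros and ones: the largest entry in each column selects the predicted class.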

c. Use the NN classifier to predict the approval decisions of the samples in the testing dataset. Obtain and show both the confusion matrix and the receiver operating characteristic (ROC) curve. What is the accuracy of the classifier? Submit your MATLAB script file for this testing and evaluation.
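The evaluation quantities named in part (c) can be computed by hand, as in the sketch below. All scores and labels are made-up toy values, not results from the credit dataset. The 0.5 decision threshold and the absence of tied scores are assumptions.

```matlab
% Toy classifier outputs and true classes (hypothetical values).
scores = [0.9; 0.8; 0.4; 0.3; 0.7; 0.2];     % higher means more likely '+'
isPos  = logical([1; 1; 0; 0; 0; 1]);        % true when the true class is '+'

% Confusion matrix: rows are true classes, columns are predicted classes,
% with index 1 for '+' and index 2 for '-'.
predPos = scores >= 0.5;                     % assumed threshold
trueIdx = 2 - isPos;
predIdx = 2 - predPos;
C = accumarray([trueIdx, predIdx], 1, [2 2]);

% Accuracy: correctly classified samples over all samples.
accuracy = trace(C) / sum(C(:));

% ROC points: lower the threshold one sample at a time.
[~, order] = sort(scores, 'descend');
tpr = [0; cumsum(double(isPos(order)))  / sum(isPos)];
fpr = [0; cumsum(double(~isPos(order))) / sum(~isPos)];
% plot(fpr, tpr) would draw the curve.
```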

Please submit your MATLAB source code for parts (a) - (c) in separate MATLAB function/script files. No marks will be given to your answer unless the relevant source code is submitted.

Q2. We are going to apply K-means clustering on a set of clinical data from 216 patients. This (built-in) dataset can be loaded into the MATLAB workspace using the following MATLAB commands:

load ovariancancer;

The clinical data is stored in the matrix obs and the cell array grp indicates whether a patient has ovarian cancer or not.

a. Cluster the clinical data using MATLAB K-means clustering with K = 2. Submit your MATLAB script file for this process.

b. Using the clustering results and the labels in grp, count and fill in the number of patients for each of the four categories in the table below:

            | Cancer | No cancer
Cluster 1   |        |
Cluster 2   |        |
If one is going to predict whether these patients have ovarian cancer or not by applying a classification method to this dataset, is it a good idea? State reasons.
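The counting step in part (b) amounts to a cross-tabulation of cluster index against label. The sketch below uses toy stand-ins: in the assignment, `idx` would be the cluster assignments returned by the clustering step and `labels` would be grp. The label strings shown are placeholders, not values taken from the dataset.

```matlab
% Toy stand-ins (hypothetical values).
idx    = [1; 1; 2; 2; 1; 2];                 % cluster index per patient
labels = {'Cancer'; 'Normal'; 'Cancer'; 'Cancer'; 'Normal'; 'Normal'};

% Map each label to an integer, then count every (cluster, label) pair.
[names, ~, labIdx] = unique(labels);
counts = accumarray([idx, labIdx(:)], 1, [2, numel(names)]);

% counts(i, j) is the number of patients in cluster i with label names{j}.
```

Note that K-means numbers its clusters arbitrarily, so cluster 1 does not necessarily correspond to either label.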

Part II -

Q3. The transactions for a restaurant are shown below:

Transaction Id | Food Ordered
1  | spaghetti, chicken salad, pizza, sandwiches
2  | soft drinks, ice-cream, hamburger
3  | lemon tea, hamburger, sushi, noodle soup, curry chicken rice
4  | chicken salad, fruit juice, beef pie, kebab, spaghetti, coffee
5  | noodle soup, ice-cream, sushi, lemon tea, fruit juice
6  | beef pie, coffee, kebab, noodle soup
7  | hamburger, soft drinks, kebab
8  | chicken salad, fruit juice, pizza, spaghetti
9  | noodle soup, ice-cream, hamburger, sushi, lemon tea, soft drinks
10 | kebab, chicken salad, coffee, pizza, beef pie, spaghetti
11 | soft drinks, beef pie, curry chicken rice, kebab
12 | sushi, noodle soup, lemon tea, pizza

a. Using either the Apriori or the FP-tree algorithm, obtain all frequent itemsets for the above transactions with a minimum support of 25%. You must show all the manual calculations.
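As a way to check the first pass of the manual calculation, the sketch below counts the support of single items only. It is not a full Apriori or FP-tree implementation. Item names are written in one consistent spelling ("lemon tea", "soft drinks"), which assumes the spelling variants in the listing refer to the same items, and a support of exactly 25% is assumed to qualify as frequent.

```matlab
% The 12 transactions, in listing order.
T = { ...
  {'spaghetti','chicken salad','pizza','sandwiches'}, ...
  {'soft drinks','ice-cream','hamburger'}, ...
  {'lemon tea','hamburger','sushi','noodle soup','curry chicken rice'}, ...
  {'chicken salad','fruit juice','beef pie','kebab','spaghetti','coffee'}, ...
  {'noodle soup','ice-cream','sushi','lemon tea','fruit juice'}, ...
  {'beef pie','coffee','kebab','noodle soup'}, ...
  {'hamburger','soft drinks','kebab'}, ...
  {'chicken salad','fruit juice','pizza','spaghetti'}, ...
  {'noodle soup','ice-cream','hamburger','sushi','lemon tea','soft drinks'}, ...
  {'kebab','chicken salad','coffee','pizza','beef pie','spaghetti'}, ...
  {'soft drinks','beef pie','curry chicken rice','kebab'}, ...
  {'sushi','noodle soup','lemon tea','pizza'} };

N = numel(T);
minCount = ceil(0.25 * N);           % 25% of 12 transactions is a count of 3

% No item repeats within a transaction, so occurrences equal support counts.
allItems = [T{:}];
[items, ~, idx] = unique(allItems);
counts = accumarray(idx(:), 1);

frequent = items(counts >= minCount);   % frequent 1-itemsets
```

Larger itemsets are then built only from these frequent items, which is the pruning idea behind Apriori.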

b. Generate the association rules for the maximal frequent itemsets in (a) using a minimum confidence of 75%. You must show all the manual calculations.
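The confidence of a rule A -> B is the support count of A and B together divided by the support count of A. The sketch below works this out for one pair of items chosen purely for illustration. It does not claim that this pair is a maximal frequent itemset. Transactions are referred to by their position in the listing above (1 to 12), and a confidence of exactly 75% is assumed to pass.

```matlab
nTrans = 12;

% Positions of the transactions that contain each item.
hasSpaghetti = ismember(1:nTrans, [1 4 8 10]);
hasPizza     = ismember(1:nTrans, [1 8 10 12]);

nBoth    = sum(hasSpaghetti & hasPizza);     % transactions containing both items
suppBoth = nBoth / nTrans;                   % support of {spaghetti, pizza}

confSP = nBoth / sum(hasSpaghetti);          % confidence of spaghetti -> pizza
confPS = nBoth / sum(hasPizza);              % confidence of pizza -> spaghetti

minConf = 0.75;
keepSP = confSP >= minConf;                  % assumption: the boundary value passes
keepPS = confPS >= minConf;
```

The two directions of a rule share the same numerator but can have different denominators, so each direction has to be checked separately.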

Q4. a. Discuss what factors you would consider when choosing an appropriate clustering method for a dataset.

b. Suggest a dataset property that makes it hard to decide on an appropriate clustering method.

c. Give an example of a data set consisting of three natural clusters, for which (almost always) K-means would likely find the correct clusters, but bisecting K-means would not.

Dataset Reference:

Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.



10/21/2016 7:24:50 AM

Please put ALL your assignment answers except MATLAB source codes into a MS Word .doc/.docx file and put this Word file and all MATLAB source codes (.m files) into a zip file for submission. Files in other file formats will NOT be marked.

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd