Computer Exercise #1
Computer Exercise 1.1 (Generative Bayesian Classification Using GMMs):
We have seen how a Gaussian mixture model of the form
p(x) = ∑_{k=1}^{K} π_k N(x | µ_k, Σ_k)
may be used to approximate any density to arbitrary accuracy. Finding the parameters of the Gaussian densities, µ_k and Σ_k, along with the mixing coefficients, π_k, from a set of training samples, X = [x_1, x_2, . . . , x_N]^T, is not an easy problem. The reason is that it is not known from which Gaussian in the mixture a data point x_n comes, and therefore which samples should be used to estimate the mean and covariance of each Gaussian density. The Expectation-Maximization (EM) algorithm, however, provides an iterative approach to finding these parameters.
In this exercise, we look at binary classification using a Bayes classifier. The probability density function for each class will be modeled as a mixture of Gaussians, and the Gaussian Mixture Model (GMM) parameters will be estimated using the EM algorithm. The individual steps needed to write a Python program implementing the classifier are given below. On Piazza, all of the commands found below are available in a Jupyter notebook, so all you need to do is fill in the missing pieces.
(a) To design and implement your Bayes classifier, there are a number of standard imports (libraries) that you will need, so the first statements in your Python code should be the following:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from scipy.stats import multinomial
from sklearn.mixture import GaussianMixture as GMM
(b) Create a data set of 500 two-dimensional vectors, X1, that are samples associated with class ω1 and that will be used for training and testing. These vectors are drawn from a mixture of three Gaussian densities, N(x|µ_i, Σ_i) for i = 1, 2, 3, where the means and covariances are
µ_1 = [1.25, 1.25]^T,  Σ_1 = [[0.1, 0], [0, 0.1]]
µ_2 = [2.75, 2.75]^T,  Σ_2 = [[0.2, 0], [0, 0.2]]
µ_3 = [2.00, 6.00]^T,  Σ_3 = [[0.3, 0], [0, 0.3]]
and the mixing coefficients are π_1 = 0.4, π_2 = 0.4, and π_3 = 0.2.
The following Python code may be used to generate this data set and put it into the array X1:
#
# Define the three Gaussian means, covariances, and mixing coefficients
#
mu_true = np.array([[1.25, 1.25], [2.75, 2.75], [2, 6]])
sigma_true = np.array([[[0.1, 0], [0, 0.1]],
                       [[0.2, 0], [0, 0.2]],
                       [[0.3, 0], [0, 0.3]]])
lambda_true = np.array([.4, .4, .2])
#
# Create an n x 3 array that has a one in the kth position with
# probabilities given in the array lambda_true
#
n = 500
z = multinomial.rvs(1, lambda_true, size=n)
#
# Draw samples from the Gaussian mixture and create an array, y1,
# containing the class label for each sample (in this case, 1)
#
X1 = np.array([np.random.multivariate_normal(mu_true[np.argmax(i)],
                                             sigma_true[np.argmax(i)]) for i in z])
y1 = np.ones((n,), dtype=int)
Note that X1 is an array of size n × 2, where n = 500, as you may verify by typing X1.shape.
It should be pointed out that you might think the Gaussian mixture could be generated by creating three sets of samples, each drawn from one of the three Gaussian densities, with the number of samples in proportion to the mixing coefficients, and then combining the three sets. However, these samples would not be generated according to the model, which assumes that there is a random variable, Y, that is sampled to determine which of the three Gaussian densities is used to generate each data sample.
Shown in Fig. 1 is what your dataset might look like (the blue '+'s). Also shown are the contours of the GMM. The points labeled with orange 'x's are the training samples for class ω2 that are generated in part (f).
(c) The next step is to partition the data set X1 into a training set and a test set. This is a necessary step in any machine learning project, since data used for testing must never appear in the training set. This partitioning may be done with the train_test_split function as follows:
X1_train, X1_test, y1_train, y1_test = train_test_split(X1, y1, test_size=0.25)
where, in this case, one fourth of the samples are held out for testing.
(d) You now are ready to model the data in the training set X1_train using a GMM. Assuming that you do not know anything about the data sets, use the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC) to estimate the appropriate value for the number of Gaussians to use in the mixture. An example showing how to find and plot the AIC for data stored in the array data over a range from one to 20 Gaussians is shown below (also read the documentation on the GaussianMixture class):
n_components = np.arange(1, 21)
models = [GMM(n, covariance_type='full', random_state=0) for n in n_components]
aics = [model.fit(data).aic(data) for model in models]
plt.plot(n_components, aics)
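The BIC may be computed and plotted in the same way; here is a minimal sketch that reuses the models list above:

bics = [model.fit(data).bic(data) for model in models]
plt.plot(n_components, bics)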
What size mixture should you consider using? Discuss.
(e) For this part, set the number of components in your GMM equal to three. Note that it is assumed that the model corresponds exactly to the way the data were generated, which will obviously not be the case in practice. It does, however, allow us to see how well we are able to estimate the true parameters of the model.
Figure 1: Mixture of three Gaussians for two classes ω1 and ω2.
To find the GMM, the class GaussianMixture may be used, which uses the expectation-maximization (EM) algorithm to find the GMM parameters for a set of data samples. For example, the following will fit a sum of three Gaussians to the data stored in the array X1_train:
#
# Fit a Gaussian mixture with EM using three components
#
gmm = GMM(n_components=3, covariance_type='full')
gmm.fit(X1_train)
You have a choice on what covariance parameters to use, as specified by covariance_type. The options are full, tied, diag, and spherical.
Discussion: Read the documentation to see what the differences are and determine which one would be most appropriate for this problem, and why. Discuss whether or not there are any differences, from a computational point of view, between the various options.
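As a starting point for this discussion, you might fit the same training data with each option and compare the resulting AIC values; a minimal sketch, assuming X1_train from part (c):

# Fit a three-component GMM with each covariance option and compare AIC
for cov in ['full', 'tied', 'diag', 'spherical']:
    g = GMM(n_components=3, covariance_type=cov).fit(X1_train)
    print(cov, g.aic(X1_train))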
The weights, means, and covariances are attributes of the gmm object and are given by gmm.weights_, gmm.means_, and gmm.covariances_. You may, for example, print the GMM mixing coefficients with the command
print(gmm.weights_)
or use them in any method that requires these parameters.
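For the discussion below, it may help to print the estimated and true parameters side by side; a minimal sketch (keep in mind that EM may recover the components in a different order than they appear in mu_true):

print(np.sort(gmm.weights_), np.sort(lambda_true))  # mixing coefficients
print(gmm.means_)        # compare with mu_true
print(gmm.covariances_)  # compare with sigma_true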
Discussion: How well does your GMM model the data, i.e., how close are the estimated parameters to the true parameters? Are some more difficult to estimate than others? Do different initializations lead to different parameters? If so, do they differ much?
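To probe the initialization question, one simple experiment is to refit with different random seeds and compare the estimates; a minimal sketch (the seed values are illustrative):

# Fit the same data with three different initializations and compare weights
for seed in (0, 1, 2):
    g = GMM(n_components=3, covariance_type='full', random_state=seed).fit(X1_train)
    print(seed, np.sort(g.weights_))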
(f) Repeat parts (b)-(e) to create a second data set, X2, consisting of samples associated with class ω2, drawn from a mixture of three Gaussians that have means and covariances
µ_1 = [1.25, 2.75]^T,  Σ_1 =
µ_2 = [2.75, 1.25]^T,  Σ_2 =
µ_3 = [4.00, 6.00]^T,  Σ_3 =
and mixing coefficients π_1 = 0.2, π_2 = 0.3, and π_3 = 0.5. This data should look something like that shown in the previous figure, represented by the orange 'x's.
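A minimal sketch of how the part (b) code can be adapted for X2 is shown below; the covariance matrices in sigma2_true are placeholders only, and you should substitute the values specified in the exercise:

mu2_true = np.array([[1.25, 2.75], [2.75, 1.25], [4.0, 6.0]])
# Placeholder covariances -- NOT the exercise's values; substitute the given ones
sigma2_true = np.array([np.eye(2) * 0.2 for _ in range(3)])
lambda2_true = np.array([.2, .3, .5])
z2 = multinomial.rvs(1, lambda2_true, size=n)
X2 = np.array([np.random.multivariate_normal(mu2_true[np.argmax(i)],
                                             sigma2_true[np.argmax(i)]) for i in z2])
y2 = 2 * np.ones((n,), dtype=int)  # class label 2 for every sample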
(g) Having Gaussian mixture models p(x|ω1) and p(x|ω2) for both classes, you are now to design and implement a Bayes classifier that will classify a new sample x as either being from class ω1 or class ω2. You may assume that the two classes are equally likely.
Building the Bayes Classifier: There are a couple of different ways that you may proceed. One is to note that, with the assumption that Pr{ω1} = Pr{ω2}, the Bayes classifier has the form of a likelihood ratio test,

p(x|ω1) / p(x|ω2) ≷ 1

where we decide ω1 if the ratio is greater than one and ω2 otherwise.
The method score_samples may be used to compute the weighted log probability of each sample. For example, for a test set X_test,

Z = gmm.score_samples(X_test)

uses the GMM parameters found with the gmm object and returns an array containing the weighted log probability of each sample in X_test. By comparing the two log probabilities obtained from the GMMs for ω1 and ω2, a decision can be made about the class to which each sample belongs.
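Putting this together, a minimal sketch of the decision rule, assuming gmm1 and gmm2 are GaussianMixture models fitted to X1_train and X2_train (the names are illustrative):

log_p1 = gmm1.score_samples(X_test)
log_p2 = gmm2.score_samples(X_test)
# Assign class 1 where log p(x|w1) > log p(x|w2), class 2 otherwise
y_pred = np.where(log_p1 > log_p2, 1, 2)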
(h) Use the test sets X1_test and X2_test to find the classification error for your classifier.

(i) Your results so far assume that you use the correct number of Gaussians in the GMM. Examine what happens if, instead of K = 3, you were to use K = 2 or K = 5. Discuss your findings.
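For part (h), a minimal sketch of one way to estimate the error, again assuming the illustrative names gmm1 and gmm2 and equal priors:

# Fraction of class-1 test samples misclassified as class 2, and vice versa
err1 = np.mean(gmm1.score_samples(X1_test) <= gmm2.score_samples(X1_test))
err2 = np.mean(gmm2.score_samples(X2_test) <= gmm1.score_samples(X2_test))
print(0.5 * (err1 + err2))  # overall error with equal priors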