Introduction and Purpose
In the lectures and tutorials you have been introduced to a number of probability distributions. You have also learned that modelling any system (such as a communication system) requires analysis of the input data that will be fed to the model. Random variables play an important role in the development of a model, as they represent the input(s) to the modelled system. Before random variables can be used in the modelling process, they need to be analysed and tested to verify that they closely fit the real-world input. To do this, a 'goodness of fit' test is applied to a data sample in order to accept or reject a hypothesis. The hypothesis generally states that the data sample conforms to a particular probability distribution.
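For reference, here is one common formulation of the chi-square goodness-of-fit test used later in this assignment. It is stated from standard statistics texts, not taken from the lecture notes. The n observations are grouped into k class intervals. O_i is the observed count in interval i, and p_i is the probability of interval i under the hypothesised distribution, so the expected count is E_i = n p_i:

$$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = n\,p_i$$

The null hypothesis is rejected at significance level α if χ_0² > χ²_{α, k−s−1}. Here s is the number of distribution parameters estimated from the sample. For example, s = 2 for a normal distribution whose mean and standard deviation are estimated from the data. A common guideline is to merge intervals until every E_i is at least 5.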
The purpose of this assignment is to determine, via statistical analysis, the probability distributions of the numerical data contained in two *.csv (comma-separated values) files.
Scenario and advice
There are two data files, Data(a).csv and Data(b).csv. The files have been logged by a communications device and represent the input to a system. To determine the effects on the system's output, we need to determine the probability distributions of these inputs. If you open the files in a text editor, it should be apparent that one may follow a continuous distribution and the other possibly a discrete distribution (however, the specific distributions are unknown). To resolve this, it is suggested that you carry out the following:
Minimum objectives
1. For both files, read them into MATLAB (one at a time) using the csvread('filename') function and save them to a vector.
2. Calculate the mean and standard deviation using either custom functions or the functions built into MATLAB.
3. Create a q-q plot of the data using the qqplot(x) function, which by default plots the sample quantiles against those of a normal distribution (this should help you to guess the data's distribution).
4. Propose a probability distribution (the null hypothesis) and generate a dataset from it with approximately the same number of elements as the data provided in the data files.
5. Create a custom chi-square function in MATLAB.
6. Carry out a chi-square analysis of the data (reference tables are provided in chi-square-table.pdf).
7. Complete a regression analysis (if appropriate) and determine the equation of the line (please refer to lecture 4 for an example).
8. Report on all your findings.
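The following sketch covers steps 2, 5, 6 and 7. It is written in Python using only the standard library rather than in MATLAB, so treat it as a language-neutral outline of what a custom chi-square function might do. The function names (sample_mean_std, normal_cdf, chi_square_statistic, least_squares_line) and the use of open-ended bins defined by cut points are illustrative assumptions, not part of the assignment specification. For step 7, it assumes the regression is a straight-line least-squares fit. One example would be a fit to the points of a q-q plot, where data drawn from a normal distribution N(μ, σ) fall close to the line y = σx + μ.

```python
import bisect
import math
from typing import Callable, List, Sequence, Tuple


def sample_mean_std(data: Sequence[float]) -> Tuple[float, float]:
    """Sample mean and standard deviation (n - 1 denominator, like MATLAB's std)."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    return mean, math.sqrt(variance)


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def chi_square_statistic(
    data: Sequence[float],
    cuts: Sequence[float],
    cdf: Callable[[float], float],
) -> Tuple[float, List[int], List[float]]:
    """Chi-square goodness-of-fit statistic for binned data.

    The increasing cut points define len(cuts) + 1 bins:
    (-inf, cuts[0]], (cuts[0], cuts[1]], ..., (cuts[-1], +inf).
    Returns (statistic, observed counts, expected counts).
    """
    n = len(data)
    k = len(cuts) + 1
    observed = [0] * k
    for x in data:
        # bisect_left returns i such that cuts[i-1] < x <= cuts[i].
        observed[bisect.bisect_left(cuts, x)] += 1

    # Hypothesised CDF at every bin boundary, including the two open ends.
    boundary_cdf = [0.0] + [cdf(c) for c in cuts] + [1.0]
    expected = [n * (boundary_cdf[i + 1] - boundary_cdf[i]) for i in range(k)]

    # Every expected count must be positive (a common guideline is >= 5).
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, observed, expected


def least_squares_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx
```

To test a normal hypothesis with this sketch, one would estimate mu and sigma with sample_mean_std and pass lambda x: normal_cdf(x, mu, sigma) as cdf. The statistic would then be compared with the tabulated critical value in chi-square-table.pdf for k − 3 degrees of freedom (k bins, two estimated parameters). For discrete data, the cdf argument would instead be the hypothesised discrete CDF, with cut points placed between the possible values.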
Additional objectives
1. Using the results obtained above, select a data distribution and sample it using Student's t-distribution (please refer to slide 21 in lecture 2).
2. Demonstrate graphically how, as the number of degrees of freedom increases, Student's t-distribution approximates the normal distribution, and plot the difference between the curves.
3. Verify your findings from minimum objective 6 using the Kolmogorov-Smirnov test (kstest(x)). Note that kstest(x) with no further arguments tests against the standard normal distribution.
4. Report on all your findings.
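Additional objectives 2 and 3 can be outlined numerically as shown below. Like the earlier sketch, this uses standard-library Python rather than MATLAB and is illustrative only. The helper names (t_pdf, normal_pdf, max_pdf_difference, ks_statistic) and the evaluation grid from −5 to 5 in steps of 0.01 are assumptions. max_pdf_difference measures how far Student's t density lies from the standard normal density for a given number of degrees of freedom ν. ks_statistic computes the one-sample Kolmogorov-Smirnov statistic D = max |F_n(x) − F(x)| against any hypothesised CDF F.

```python
import math
from typing import Callable, Sequence


def t_pdf(t: float, nu: float) -> float:
    """Student's t probability density with nu degrees of freedom."""
    # lgamma avoids the overflow math.gamma would hit for large nu.
    log_coeff = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coeff = math.exp(log_coeff) / math.sqrt(nu * math.pi)
    return coeff * (1 + t * t / nu) ** (-(nu + 1) / 2)


def normal_pdf(t: float) -> float:
    """Standard normal probability density."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def max_pdf_difference(nu: float, grid: Sequence[float]) -> float:
    """Largest |t_pdf - normal_pdf| over the grid points."""
    return max(abs(t_pdf(t, nu) - normal_pdf(t)) for t in grid)


def ks_statistic(data: Sequence[float], cdf: Callable[[float], float]) -> float:
    """One-sample Kolmogorov-Smirnov statistic D against a hypothesised CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


if __name__ == "__main__":
    grid = [i / 100 for i in range(-500, 501)]
    for nu in (1, 2, 5, 10, 30):
        # The maximum difference shrinks as nu grows.
        print(nu, round(max_pdf_difference(nu, grid), 4))
```

Plotting t_pdf(t, ν) − normal_pdf(t) over the grid for several values of ν would give the difference curves asked for in objective 2. For objective 3, a commonly quoted large-sample approximation to the 5% critical value of D is 1.36/√n.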
Deliverables
The following must be submitted by the date outlined above:
An individual report (Times New Roman, 12 pt, single-line spacing) that outlines your solution and the development of your MATLAB simulation. As a minimum, it should include an explanation of your MATLAB implementation, screen captures of the MATLAB plots, and a summary of the input analysis, including the quantile-quantile (q-q) plot and a plot of the sample distribution. Any functions you have developed should be included in the appendices. The report should approximately follow this structure: Title page, Contents page, List of figures, Introduction, MATLAB Implementation, Input Analysis, Conclusions and Appendices (make sure all MATLAB code is clearly available in the appendices).
Note about Plagiarism
Make sure that:
• any sentences, including definitions, that are copied word for word are in quotation marks and cite the source(s);
• any figures copied include citations to their sources;
• any code taken from any source (textbooks, WWW, journals, etc.) is fully acknowledged.