Modelling for Data Analysis Assignment -
This assignment contains 6 questions.
1. Probabilities in Cards
Take a regular deck of cards with no jokers (13 cards per suit, 4 suits), giving 52 cards. Suppose we draw a 5-card hand, i.e. 5 cards without replacement. For each answer, write out the full calculation in R to show your working.
1.1 A special flush
What is the probability of getting a royal flush where the cards, ordered by rank, have alternating colours? Note that in a proper royal flush the cards are all one suit, but we have changed that to alternating colour. So, for example: red 10, black J, red Q, black K, red A. The order in which the cards are drawn from the pack is not considered.
1.2 No repeats
What is the probability that in the sequence of cards, as they are drawn, no rank occurs twice in a row? So ignoring the suit, the following are allowed: A, 10, 4, J, 10 or A, 10, A, 4, A, but the following are not allowed: A, A, 10, 4, A (A repeated in positions 1 and 2), A, 4, 10, 10, J (10 repeated in positions 3 and 4).
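The no-repeats probability in 1.2 can be sanity-checked by simulation before committing to the exact combinatorial calculation. The sketch below is a hedged cross-check, not the full R working the question asks for; all names are our own:

```r
# Monte-Carlo check for 1.2: probability that no rank repeats in
# consecutive draw positions. Suits are irrelevant here, so we sample
# rank values directly from the 52-card deck.
set.seed(1)
ranks  <- rep(1:13, 4)           # 52 rank values (4 copies of each rank)
n_sims <- 100000
hits <- replicate(n_sims, {
  hand <- sample(ranks, 5)       # 5 cards drawn without replacement
  all(diff(hand) != 0)           # no card's rank equals its predecessor's
})
mean(hits)                       # rough estimate to compare with the exact answer
```

An estimate in this range gives confidence that the exact expression (a product over consecutive-draw conditional probabilities) has been set up correctly.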
2. PDF and Expectations
Let X have the PDF given by a piecewise function with different negative and positive parts:
f(x) = (12/7)(1 + x)^2 for -1 < x ≤ 0
     = (12/7)(1 - x)^3 for 0 < x < 1
     = 0 otherwise
You can use Wolfram Alpha to do the definite integrals.
2.1 Plot
Draw the plot in R.
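One way to sketch this in R is to code the piecewise density as a vectorised function and plot it over a range that includes the zero regions; the function name and plotting choices below are our own:

```r
# Piecewise density from Question 2, coded with vectorised ifelse().
f <- function(x) {
  ifelse(x > -1 & x <= 0, (12/7) * (1 + x)^2,
  ifelse(x > 0  & x <  1, (12/7) * (1 - x)^3, 0))
}
x <- seq(-1.5, 1.5, by = 0.01)
plot(x, f(x), type = "l", xlab = "x", ylab = "f(x)", main = "PDF of X")
```

A quick check that the density is valid: `integrate(f, -1, 1)$value` should return 1 (up to numerical error).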
2.2 Mean
Find E(X). Why is it not zero?
2.3 Variance
Find variance, V ar(X).
2.4 Skewness
Find skewness, using the formula in the lecture notes. Interpret the value.
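The question allows Wolfram Alpha for the definite integrals; as a hedged numerical cross-check of 2.2-2.4, the moments can also be obtained in R with `integrate()` (the skewness formula used here is the standard third standardised moment, which should be checked against the lecture notes):

```r
# Numerical moments of the piecewise density from Question 2.
f <- function(x) {
  ifelse(x > -1 & x <= 0, (12/7) * (1 + x)^2,
  ifelse(x > 0  & x <  1, (12/7) * (1 - x)^3, 0))
}
EX   <- integrate(function(x) x * f(x), -1, 1)$value           # E(X)
VX   <- integrate(function(x) (x - EX)^2 * f(x), -1, 1)$value  # Var(X)
skew <- integrate(function(x) (x - EX)^3 * f(x), -1, 1)$value / VX^1.5
c(mean = EX, var = VX, skewness = skew)
```

The mean is nonzero because the two pieces of the density are not mirror images of each other, so the distribution is not symmetric about 0.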
3. Distributions
One study has evaluated a number of leukaemia records in a rural area. The population of the area was 35,000. In a year there were 16 leukaemia cases identified, of which 4 were not local residents but tourists or new immigrants (of which there are not many). In a general population, the annual rate of leukaemia is typically about one in 10,000.
3.1 Model
Describe the model you recommend to use for the counts, and estimate the parameters using suitable point estimates.
3.2 Checking
Also, consider the hypothesis "the annual rate of leukaemia in the area is 1/10,000". Assume this is the rate for the residents only. Plot the distribution over counts under this hypothesis. Where does your data lie, and do you think it is consistent with the hypothesis?
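A hedged sketch of the plot: under the hypothesised rate, resident counts would have mean 35000/10000 = 3.5, and a count model over a large population with a small individual rate is naturally plotted with `dpois()`. Treating 16 - 4 = 12 as the resident case count (as the question's residents-only assumption suggests) gives the reference line:

```r
# Distribution of annual resident leukaemia counts under the
# hypothesised rate of 1 in 10,000, for a population of 35,000.
lambda0 <- 35000 / 10000               # hypothesised mean count = 3.5
counts  <- 0:20
plot(counts, dpois(counts, lambda0), type = "h",
     xlab = "annual leukaemia cases (residents)", ylab = "probability")
abline(v = 12, col = "red", lty = 2)   # observed resident cases (16 - 4)
ppois(11, lambda0, lower.tail = FALSE) # P(X >= 12) under the hypothesis
```

The tail probability printed at the end is one way to quantify how far out in the hypothesised distribution the observed count lies.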
4. Entropy
In this question, we will use a modified version of the Titanic dataset from the Kaggle competition, Titanic: Machine Learning from Disaster. The dataset includes information about passenger characteristics as well as whether they survived the disaster.
Import the Titanic data using the following R code:
df <- read.csv("Titanic.csv",header=TRUE, sep=",")
The Survived column is coded 0/1, so convert it to a truth value with:
df[['Survived']] <- df[['Survived']]==1
4.1 Conditional probabilities
Compute tables for the frequency estimates of P(Survived), P(Survived|Pclass = val) and P(Survived|Gender = val), for the different vals. Do the computation in R, but it is OK to present the final table as a separate Word table (since it might be hard to lay out in R). What does this tell you about survival?
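The tables can be built with `table()` and `prop.table()`. Since the real Titanic.csv is in the attachment, the sketch below substitutes a tiny made-up data frame with the column names the question uses; the values are illustrative only:

```r
# Stand-in for the real Titanic data (illustrative values only).
df <- data.frame(
  Survived = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  Pclass   = c(1, 3, 1, 2, 3, 3),
  Gender   = c("female", "male", "female", "male", "female", "male")
)
p_s   <- prop.table(table(df$Survived))                         # P(Survived)
p_s_c <- prop.table(table(df$Pclass, df$Survived), margin = 1)  # P(Survived | Pclass)
p_s_g <- prop.table(table(df$Gender, df$Survived), margin = 1)  # P(Survived | Gender)
```

`margin = 1` normalises each row, so each row of the conditional tables sums to 1, i.e. it is a probability distribution over Survived given that row's value of the conditioning variable.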
4.2 Entropies
Calculate the entropy (log2()) of Survived, H(Survived) and the conditional entropy of Survived given Pclass, H(Survived|Pclass), and of Survived given Gender, H(Survived|Gender). Do not use an entropy function but write the code yourself. Use R functions table() and prop.table() to gather stats and form probabilities from the data frame. What do these three entropies tell you about Survived?
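A hedged skeleton of the hand-written entropy calculation (again using a tiny made-up data frame in place of the real Titanic.csv; only the structure matters). The conditional entropy uses the standard decomposition H(Survived|Pclass) = Σ_c P(Pclass = c) H(Survived|Pclass = c), and the Gender case is identical with the other column:

```r
# Stand-in for the real Titanic data (illustrative values only).
df <- data.frame(
  Survived = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  Pclass   = c(1, 3, 1, 2, 3, 3)
)

# Hand-written entropy in bits; skips zero-probability cells.
entropy <- function(p) -sum(p[p > 0] * log2(p[p > 0]))

# H(Survived):
H <- entropy(prop.table(table(df$Survived)))

# H(Survived | Pclass) = sum over classes of P(class) * H(Survived | class):
joint <- prop.table(table(df$Pclass, df$Survived))  # joint distribution
p_c   <- rowSums(joint)                             # marginal over Pclass
cond  <- prop.table(table(df$Pclass, df$Survived), margin = 1)
Hcond <- sum(p_c * apply(cond, 1, entropy))
c(H = H, Hcond = Hcond)
```

Conditioning never increases entropy, so H(Survived|Pclass) ≤ H(Survived); how much smaller it is measures how informative Pclass is about survival.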
4.3 Coding
Consider the joint space (Survived, Pclass) which has six outcomes, (True, 1), (True, 2), (True, 3), (False, 1), (False, 2), (False, 3). Develop an efficient binary prefix code to transmit these outcomes. Would it be adequate to just provide the codelengths, or is a code needed too? Justify your answer.
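One hedged way to start 4.3 is to compute Shannon codelengths, ceiling(-log2 p), for each outcome from the empirical joint probabilities; a Huffman code built from the same probabilities does at least as well. The sketch again uses a made-up stand-in data frame:

```r
# Stand-in for the real Titanic data (illustrative values only).
df <- data.frame(
  Survived = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  Pclass   = c(1, 3, 1, 2, 3, 3)
)
joint <- prop.table(table(df$Survived, df$Pclass))  # joint over (Survived, Pclass)
p     <- joint[joint > 0]                           # observed outcomes only
lens  <- ceiling(-log2(p))                          # Shannon codelengths in bits
sum(2^(-lens))                                      # Kraft sum: <= 1 means a
                                                    # prefix code with these
                                                    # lengths exists
```

The Kraft inequality is the key fact for the "codelengths vs. actual code" part of the question: any set of lengths with Kraft sum at most 1 is realisable as a prefix code.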
5. Maximum likelihood estimation of parameters
One of the central problems of sensory neuroscience is to separate recordings of background physiological processes that are irrelevant (noise) from neural responses that are of experimental interest (signal). This is by no means an easy task, as the signals that neurons produce when they fire are extremely weak and highly variable. It is therefore of particular interest to examine the randomness of neural signals, as this allows researchers to study the brain at a cellular level.
Let's assume that we have conducted one experiment and recorded the spike signals from one particular neuron for a duration of 15 seconds. After some data processing, we can obtain spike signals with data given by a time in seconds and a spike size, similar to the following data and figure.
5.1 Model
Let us assume that the rate of signals remains constant over time, and that the size of each signal is also independent of time. If the rate of the signals remains constant over time, which distribution would be most suitable for modelling the probability distribution of the number of spike signals over 15 seconds? Why? Briefly answer this question in a sentence or two. Also, while we don't know enough to suggest a distribution for spike sizes, what properties should it have?
5.2 Maximum likelihood fitting
Using the model above, what is the log-likelihood function for number of spike signals for the period of experiment time, and what is the maximum likelihood estimate for its parameters?
You're told that a candidate distribution for spike sizes is the Weibull with shape 0.7 and unknown scale, between 0.5 and 2. This is supported in R by the [dpqr]weibull() functions. One can do maximum likelihood fitting of the Weibull density over the unknown parameter. Use the optimize() function for that, so something like:
optimize(fn, c(minvalue, maxvalue), maximum = TRUE, tol = .Machine$double.eps^0.25)
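A hedged sketch of how the pieces fit together. The real spike sizes come from the attached data file, so the sketch simulates stand-in data; the log-likelihood is the sum of `dweibull(..., log = TRUE)` terms with the shape fixed at 0.7:

```r
# Stand-in spike sizes (the real data comes from the attachment).
set.seed(42)
sizes <- rweibull(200, shape = 0.7, scale = 1.2)

# Log-likelihood of the scale parameter, shape fixed at 0.7.
loglik <- function(scale) {
  sum(dweibull(sizes, shape = 0.7, scale = scale, log = TRUE))
}

# Maximise over the stated range for the scale, 0.5 to 2.
fit <- optimize(loglik, c(0.5, 2), maximum = TRUE,
                tol = .Machine$double.eps^0.25)
fit$maximum   # maximum likelihood estimate of the scale
```

Using `log = TRUE` and summing is numerically safer than taking the log of a product of densities.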
6. Central Limit Theorem
Assume that we draw random integers from a Poisson distribution with rate λ equal to one of λ1 = 1, λ2 = 5, or λ3 = 20.
6.1 Sampling distribution
According to the Central Limit Theorem, what are the mean and standard deviation of the sample mean, for the three rates λ1, λ2, λ3, when we have a sample size of 10, 100, 1000 and 10000? Give the theory, then compute the values in R.
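Since a Poisson(λ) population has mean λ and variance λ, the theory gives mean λ and standard deviation sqrt(λ/n) for the sample mean. A sketch of the full grid of values (variable names are our own):

```r
# Theoretical mean and SD of the sample mean for Poisson(lambda):
# E(xbar) = lambda, SD(xbar) = sqrt(lambda / n).
lambdas <- c(1, 5, 20)
ns      <- c(10, 100, 1000, 10000)
grid <- expand.grid(lambda = lambdas, n = ns)
grid$mean_xbar <- grid$lambda
grid$sd_xbar   <- sqrt(grid$lambda / grid$n)
grid   # one row per (lambda, n) combination
```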
6.2 Simulation
Experimentally justify the result in the CLT that says the sample mean has a mean given by the population mean and a variance given by the population variance divided by the sample size. See the CLT Theorem in Lecture 4. Use simulation with sample sizes of 10, 100 and 1000. For each sample size use 50000 simulations to generate samples and their means. From these means compute the mean and variance of the sample means, and discuss how the results reflect the CLT. Plot the results (3 sample sizes and 3 rates with mean and SD) to demonstrate any effects you want to discuss.
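A minimal sketch for a single (λ, n) pair; the assignment loops this over all three rates and all three sample sizes:

```r
# Simulate 50000 sample means for Poisson(lambda) samples of size n,
# then compare their mean and variance with the CLT predictions.
set.seed(1)
lambda <- 5; n <- 100; n_sims <- 50000
means <- replicate(n_sims, mean(rpois(n, lambda)))
c(mean_of_means = mean(means),   # CLT: should be close to lambda
  var_of_means  = var(means),    # CLT: should be close to lambda / n
  theory_var    = lambda / n)
```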
6.3 Plotting normality
When rate λ1 = 1 and λ2 = 5 and sample size is 10 or 100, obtain the z scores of the sampling means (from 50000 simulations). Plot the distributions in a histogram with the theoretical Gaussian curve overlaid. Note for sample size 100, the plots overlay very nicely. But what happens with sample size 10? Explain the differences between the four plots.
For each simulation, the z score of the mean can be calculated as:
z = (X̄ - µ) / (σ/√n),
where X̄ is the mean of the sample, µ is the population mean and σ is the population SD.
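A sketch for one of the four panels (λ = 1, sample size 10); the other three combinations follow by changing `lambda` and `n`. For Poisson(λ), µ = λ and σ = sqrt(λ):

```r
# z scores of 50000 simulated sample means, with the standard
# normal density overlaid for comparison.
set.seed(1)
lambda <- 1; n <- 10; n_sims <- 50000
means <- replicate(n_sims, mean(rpois(n, lambda)))
z <- (means - lambda) / (sqrt(lambda) / sqrt(n))   # z = (xbar - mu)/(sigma/sqrt(n))
hist(z, breaks = 50, freq = FALSE,
     main = "z scores: lambda = 1, n = 10")
curve(dnorm(x), add = TRUE, col = "red")           # theoretical Gaussian
```

With small n and small λ the sample mean takes only a few discrete values, so the histogram is lumpy and skewed relative to the Gaussian; this is the effect the question asks you to explain.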