Write Python code to generate three lists of random numbers

Reference no: EM131404588

Homework 6 - R assignment

1. Can we detect when a marketing campaign has been successful?

a. On homework 4, you simulated data from the TableFarm salad chain before and after the implementation of a new marketing campaign.  Read the combined data (both before and after) into R.  (You could do this by saving the data as a .csv file and using read.csv(), or by copying the data into a text file, separating the values by commas, and enclosing the data in c( ... ) to make a vector.)  The Homework 4 TableFarm problem is reproduced below.

Average monthly revenue at each store in the TableFarm salad chain is $100,000, with a standard deviation of $12,000. An advertising firm claims they can increase monthly revenue to $120,000, but the standard deviation will increase as well, to $25,000.

Write Python code to generate three lists of random numbers which model potential revenue: one list with 12 months of revenue using the current mean and standard deviation, another list with 12 months of revenue using the predicted mean and standard deviation, and a third list combining your first two lists. You can assume a normal distribution. Round each number to the nearest $1,000.
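One way the three lists could be generated is sketched below. This is a minimal sketch, not a reference solution: the function name simulate_revenue and the seed value 710 are assumptions chosen for illustration, and only Python's standard random module is used.

```python
import random

def simulate_revenue(mean, sd, months=12, rng=None):
    """Draw `months` normal revenues and round each to the nearest $1,000."""
    if rng is None:
        rng = random.Random()
    # round(x, -3) rounds to the nearest thousand; int() drops the trailing ".0"
    return [int(round(rng.gauss(mean, sd), -3)) for _ in range(months)]

rng = random.Random(710)  # arbitrary fixed seed so the run is reproducible

before = simulate_revenue(100_000, 12_000, rng=rng)  # current mean and SD
after = simulate_revenue(120_000, 25_000, rng=rng)   # predicted mean and SD
combined = before + after                            # 24 months, before then after

print(before)
print(after)
print(combined)
```

Keeping the "before" values first in the combined list means month 13 marks the start of the campaign, which is convenient for the vertical line in the scatterplot.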

b. Make a scatterplot of the data.  Add a vertical line to mark the month in which the new marketing campaign began, and add a legend to your plot.

c. Make side-by-side boxplots of the revenue before and after implementing the marketing campaign.  Write a few sentences describing and comparing the boxplots, and relating them to the underlying model you used to simulate the data.

d. Based on the way you simulated the data, you know that the marketing campaign was successful; that is, the data after implementing the marketing campaign was simulated from an underlying model with a higher mean than before the marketing campaign.  However, in real life we probably wouldn't know this.  Based on the scatterplot and boxplots, would you be confident in claiming that the marketing campaign was successful?  Why or why not?

e. Write the null and alternative hypotheses for a test of whether the marketing campaign was successful.  (I.e., whether the mean revenue with the marketing campaign is higher than the mean revenue before the marketing campaign.)

f. In a few sentences, explain why a 2-sample, 1-sided t-test is appropriate for testing the hypotheses in part e.

g. Conduct a 2-sample, 1-sided t-test in R.  Include the R output and state your conclusion in the context of the problem.
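The assignment asks for this test in R, where t.test() performs Welch's unequal-variance test by default. To see what statistic is being computed, the sketch below works out the Welch t statistic and its approximate degrees of freedom in plain Python. The function name welch_t and the revenue figures are hypothetical, made up only for illustration; the p-value is left to R because the standard library has no t distribution.

```python
from statistics import mean, variance

def welch_t(before, after):
    """Return (t, df) for H0: mean(after) = mean(before) vs Ha: mean(after) > mean(before)."""
    va = variance(after) / len(after)    # squared standard error, after group
    vb = variance(before) / len(before)  # squared standard error, before group
    t = (mean(after) - mean(before)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(after) - 1) + vb ** 2 / (len(before) - 1))
    return t, df

# Hypothetical revenues in dollars, not data from the assignment
before = [98_000, 105_000, 91_000, 110_000, 102_000, 96_000]
after = [131_000, 99_000, 142_000, 118_000, 87_000, 125_000]

t_stat, df = welch_t(before, after)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

A large positive t supports the alternative that mean revenue is higher after the campaign; the larger standard deviation in the "after" group is the reason the unequal-variance form is a natural choice here.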

2. Can we detect an association between chocolate consumption and Nobel prizes?  The Homework 4 problems referred to are below:

Researchers have observed a (presumably spurious) correlation between per capita chocolate consumption and the rate of Nobel prize laureates: see Chocolate Consumption, Cognitive Function, and Nobel Laureates. In this problem, we will create some sample data to simulate this relationship.

Write Python code to produce a list of 50 ordered pairs (c, n), where c represents chocolate consumption in kg/year/person and n represents the number of Nobel laureates per 10 million population. The values for c should be random numbers (not necessarily integers!) between 0 and 15. You may assume that c and n are related by

n = 0.4 · c - 0.8.

However, it is not possible for a nation to have a negative number of Nobel laureates, so if your predicted value of n is less than 0, replace that value by 0.

Report your values of c and n to 2 decimal places. Print your list of ordered pairs.
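A minimal sketch of this step follows, assuming a uniform distribution for c (the problem only says "random numbers between 0 and 15"). The helper name laureates and the seed are assumptions for illustration.

```python
import random

def laureates(c):
    """Theoretical model n = 0.4 * c - 0.8, floored at 0."""
    return max(0.0, 0.4 * c - 0.8)

rng = random.Random(710)  # arbitrary fixed seed for reproducibility

pairs = []
for _ in range(50):
    c = round(rng.uniform(0, 15), 2)  # kg/year/person, not necessarily an integer
    n = round(laureates(c), 2)        # laureates per 10 million population
    pairs.append((c, n))

print(pairs)
```

Note that the floor at 0 applies whenever c is below 2, since 0.4 · 2 - 0.8 = 0.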

Problem - Error Term

Our list of data from part (a) is not a good simulation of real-world data, because it is perfectly linear. Starting with the c and n values you generated in part (a), generate new n values, using the following formula:

n_ε = n + ε.

Here ε should be a random variable with normal distribution, mean 0, and standard deviation 1. Using the list of ordered pairs generated in 3(a), create a new list of 50 ordered pairs (c, n_ε).

Again, your simulated data should not predict negative numbers of Nobel laureates. Again, do not generate a new list; make sure to use the list of ordered pairs already generated in 3(a).

Print your new list of ordered pairs.
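The error-term step could look like the sketch below. It reuses existing (c, n) pairs rather than generating new ones, adds a normal error with mean 0 and standard deviation 1 to each n, and again replaces negative results by 0. The function name add_noise and the three example pairs are hypothetical stand-ins for the 50 pairs from the previous part.

```python
import random

def add_noise(pairs, rng, sd=1.0):
    """Return new (c, n + error) pairs; c is unchanged and n is floored at 0."""
    noisy = []
    for c, n in pairs:
        n_eps = n + rng.gauss(0, sd)                 # add the normal error term
        noisy.append((c, round(max(0.0, n_eps), 2)))
    return noisy

# Hypothetical stand-in for the list produced in the previous part
example_pairs = [(1.50, 0.00), (6.25, 1.70), (12.00, 4.00)]

noisy_pairs = add_noise(example_pairs, random.Random(710))
print(noisy_pairs)
```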

a. On homework 4, you simulated data on countries' per-capita chocolate consumption and number of Nobel Prize winners, using an error term ε (representing random "noise").  Read these data into R and make a scatterplot of the number of Nobel Prize winners versus chocolate consumption.

b. Fit a linear model to the data.  What is the equation of the line of best fit?  How does it compare to the theoretical model you used to simulate the data?  Graph the line of best fit with the scatterplot.
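For intuition about what the fitted line is, the least-squares slope and intercept can be computed by hand. The sketch below does this in plain Python; the function name fit_line and the sample points are assumptions for illustration (the points lie exactly on the theoretical line, so the fit should recover a slope near 0.4 and an intercept near -0.8).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical noise-free points on n = 0.4 * c - 0.8
c_values = [2.0, 5.0, 10.0, 15.0]
n_values = [0.4 * c - 0.8 for c in c_values]

slope, intercept = fit_line(c_values, n_values)
print(f"n = {slope:.3f} * c + {intercept:.3f}")
```

With the noisy simulated data, the estimates will differ from 0.4 and -0.8; flooring negative values at 0 also pulls the fitted line away from the theoretical one for small c.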

c. State the null and alternative hypotheses for a test of whether the number of Nobel Prize winners (per 10 million population) is associated with per-capita chocolate consumption.

d. State your conclusion about the hypotheses in part c, in the context of the problem.

e. Graph the diagnostic plots for the regression. Explain what they tell us.

3. In homework 5, you counted the frequencies of letters in two encrypted texts.  In this problem, you will use statistical analysis to identify the language in which the text was written, and decrypt it.

a. Read the letter frequencies from encryptedA into R and attach the data.  Use the following code to make a barplot of the letter frequencies, with the letters listed in order of increasing frequency:  (Here I've assumed that your columns were named "key" and "count".)

encrypt_order = order(count)

barplot( count[encrypt_order], names.arg = key[encrypt_order] )

Be sure you understand what this code does.
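For readers more comfortable with Python, the sketch below mirrors what order() does: it returns the positions that would sort the counts in increasing order, and indexing both vectors by those positions keeps each letter aligned with its count. The letters and counts are hypothetical, and Python positions start at 0 where R's start at 1.

```python
# Hypothetical letters and counts, for illustration only
letters = ["a", "b", "c", "d"]
count = [7, 0, 12, 3]

# Analogue of R's order(count): indices that sort count in increasing order
encrypt_order = sorted(range(len(count)), key=lambda i: count[i])

sorted_counts = [count[i] for i in encrypt_order]    # like count[encrypt_order]
sorted_letters = [letters[i] for i in encrypt_order]  # like key[encrypt_order]

print(encrypt_order)   # [1, 3, 0, 2]
print(sorted_counts)   # [0, 3, 7, 12]
print(sorted_letters)  # ['b', 'd', 'a', 'c']
```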

b. The file Letter Frequencies.csv contains data on the frequencies of letters in different languages.  (Source:  https://www.sttmedia.com/characterfrequency-english and https://www.sttmedia.com/characterfrequency-welsh, accessed 21 August 2015.  Used by permission of Stefan Trost.)  Read these data into R.

c. In a single graphing window, display two bar plots:  A plot on top showing the encrypted frequencies, and a plot below it showing the frequencies of letters in English.  Each plot should be sorted in order of increasing frequency.  Each plot should also have a title telling whether it is from the encrypted text or from plain English.

d. Based on the shape of the plots, do you think it is likely that the encrypted text came from English?  Explain.

e. We want to conduct a hypothesis test to be more precise about whether it is plausible that the text came from English.  To do this, we will pair up each letter in the encrypted text with a letter in English, based on the order of frequency.  So, encryptedA "r" is paired with English "e", encryptedA "c" is paired with English "t", etc.  Then we will test whether the resulting letter frequencies plausibly come from a random sample of English words.

To pair up the letters, sort the vector of counts from the encrypted text in order of increasing frequency, and store it as a new vector.  Then do the same thing with the vector of frequencies from English.

f. To pair up the letters, we need the data (the counts of letters from encryptedA.txt) and the probability model (the theoretical frequencies from Letter Frequencies.csv) to have the same number of letters.  Depending on how you formatted your output from Python, your letter counts may include 20 or 26 letters.  This is due to the fact that some letters did not appear in the encrypted text, so they appeared 0 times.  If necessary, prepend 6 zeroes to the count vector to make it the same length as the theoretical frequencies:

count = c( rep(0, 6), count )

g. State the null and alternative hypotheses for a chi-squared Goodness of Fit test of this question.

h. To satisfy the assumptions of a Goodness of Fit test, we need the expected counts of each category to be greater than or equal to 5.  Find the total number of letters in the encrypted text.  Then multiply this number by the probabilities from Letter Frequencies.csv to get the expected counts. 

i. Combine categories (letters) to get expected counts that are greater than or equal to 5.  For example, if you decided to combine the first two categories, you could use the code

sortEnglish_combined = c( sum(sortEnglish[1:2]), sortEnglish[3:26] )

Combine the same categories in the encrypted counts.

j. Use R to conduct the chi-squared Goodness of Fit test. 
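The arithmetic behind steps h through j can be sketched as follows: compute expected counts from the total and the probabilities, combine the same categories in both vectors, and sum the squared differences scaled by the expected counts. All names and numbers here are hypothetical, with five categories standing in for the 26 letters; the p-value is left to R, since the standard library has no chi-squared distribution.

```python
def combine_first(values, k):
    """Merge the first k categories into one, like c(sum(x[1:k]), x[(k+1):length(x)]) in R."""
    return [sum(values[:k])] + list(values[k:])

def chi_squared_stat(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data, both sorted in order of increasing frequency
probs = [0.02, 0.04, 0.14, 0.30, 0.50]  # theoretical probabilities, sum to 1
observed = [1, 5, 17, 27, 50]           # observed counts

total = sum(observed)                   # total number of letters
expected = [total * p for p in probs]   # step h: expected counts

# Step i: the first two expected counts are below 5, so merge them in both vectors
expected_combined = combine_first(expected, 2)
observed_combined = combine_first(observed, 2)

stat = chi_squared_stat(observed_combined, expected_combined)
df = len(observed_combined) - 1         # degrees of freedom = categories - 1
print(f"chi-squared = {stat:.4f}, df = {df}")
```

Combining the same categories in both vectors matters: the test compares observed and expected counts position by position, so the two vectors must stay aligned and have the same length.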

k. State your conclusion in the context of the problem.

l. Repeat steps h-k for Welsh, and then repeat for both languages for encryptedB.  Based on the hypothesis tests, which text do you think came from which language?  How confident are you in your assessment?

m. Optional:  Try to decrypt the English text.  Simon Singh's Black Chamber website (https://www.simonsingh.net/The_Black_Chamber/substitutioncrackingtool.html) will automatically substitute letters for you, so you can test different possibilities for what English plaintext letter is represented by each letter in the ciphertext.  Start by substituting the letter E for the most common letter in the ciphertext.  Then use frequencies of letters in the ciphertext, common patterns of letters, and experimentation to determine other substitutions.

Submit a single .docx or .pdf file to GitHub containing your R code, R output and graphs, and your written interpretations and explanations. Include your name at the top of the file. Keep all portions of a problem together (don't put all the R code at the end of the file).

Note: The order of the letters along the horizontal axis of each plot will be quite different, because one plot shows the frequencies in plain English, and the other shows the frequencies in the encrypted text. So, you should ignore what letter is written below each bar when answering this question. Instead, look at things like the relative frequency of the most common letter and the second-most common.
