
Assignment - Inferential Statistics

Task 1: Climate change and temperature anomalies

If we want to study climate change, we can find data on the Combined Land-Surface Air and Sea-Surface Water Temperature Anomalies in the Northern Hemisphere at NASA's Goddard Institute for Space Studies.

To define temperature anomalies, you need a reference, or base, period, which NASA clearly states is 1951-1980.

You have two objectives in this section:

1. Select the year and the twelve month variables from the 'weather' dataset. We do not need the others (J-D, D-N, DJF, etc.) for this assignment. Hint: use the 'select()' function.

2. Convert the dataframe from wide to 'long' format. Hint: use the 'gather()' or 'pivot_longer()' function. Name the new dataframe 'tidyweather', name the variable containing the name of the month 'month', and name the temperature deviation values 'delta'. (A sketch of both steps follows the checklist below.)

Inspect your dataframe. It should have three variables now, one each for

1. year,

1. month, and

1. delta, or temperature deviation.
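
A minimal sketch of the two steps, assuming the raw NASA file has already been read into a dataframe called 'weather' whose columns include 'Year' and one column per month, 'Jan' through 'Dec' (the column names are assumptions based on the GISTEMP file layout):

```{r tidyweather_sketch, eval=FALSE}
library(tidyverse)

tidyweather <- weather %>%
  # step 1: keep only the year and the twelve month columns
  select(Year, Jan:Dec) %>%
  # step 2: wide to long -- one row per year/month pair
  pivot_longer(cols = Jan:Dec,
               names_to = "month",
               values_to = "delta")
```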

Plotting Information - Let us plot the data using a time-series scatter plot, and add a trendline. To do that, we first need to create a new variable called 'date' in order to ensure that the 'delta' values are plotted chronologically.

In the following chunk of code, I used the 'eval=FALSE' argument, which prevents a chunk from running; I did this so that you can knit the document before tidying the data and creating the new dataframe 'tidyweather'. When you actually want to run this code and knit your document, you must delete 'eval=FALSE', not just here but in all chunks where 'eval=FALSE' appears.

```{r scatter_plot, eval=FALSE, warning=FALSE}
tidyweather <- tidyweather %>%
  # build a proper date from the year and the month name,
  # so that deltas plot chronologically
  mutate(date = ymd(paste(as.character(Year), month, "1")),
         month = month(date, label = TRUE),
         year = year(date))

ggplot(tidyweather, aes(x = date, y = delta)) +
  geom_point() +
  geom_smooth(color = "red") +  # add a smoothed trendline
  theme_bw() +
  labs(title = "Weather Anomalies")
```

Is the effect of increasing temperature more pronounced in some months? Use 'facet_wrap()' to produce a separate scatter plot for each month, again with a smoothing line. Your chart should have human-readable labels; that is, each month should be labeled "Jan", "Feb", "Mar" (full or abbreviated month names are fine), not '1', '2', '3'.

```{r facet_wrap, echo=FALSE, warning=FALSE}
# YOUR CODE GOES HERE
```
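
A minimal sketch of one way to do this, assuming 'tidyweather' already has the 'date' and labelled 'month' variables created in the chunk above:

```{r facet_wrap_sketch, eval=FALSE, warning=FALSE}
ggplot(tidyweather, aes(x = date, y = delta)) +
  geom_point() +
  geom_smooth(color = "red") +
  # one panel per month; 'month' is an ordered factor, so panels run Jan..Dec
  facet_wrap(~month) +
  theme_bw() +
  labs(title = "Weather Anomalies by Month")
```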

It is sometimes useful to group data into different time periods when studying historical data. For example, we often refer to decades such as the 1970s, 1980s, and 1990s. NASA calculates a temperature anomaly as the difference from the base period of 1951-1980. The code below creates a new data frame called 'comparison' that groups data into five time periods: 1881-1920, 1921-1950, 1951-1980, 1981-2010 and 2011-present.

We remove data before 1881 using 'filter'. Then, we use the 'mutate' function to create a new variable, 'interval', which records which period each observation belongs to. We can assign the different periods using 'case_when()'.

```{r intervals, eval=FALSE}
comparison <- tidyweather %>%
  filter(Year >= 1881) %>%  # remove years prior to 1881
  # create new variable 'interval', and assign values based on the criteria below
  mutate(interval = case_when(
    Year %in% c(1881:1920) ~ "1881-1920",
    Year %in% c(1921:1950) ~ "1921-1950",
    Year %in% c(1951:1980) ~ "1951-1980",
    Year %in% c(1981:2010) ~ "1981-2010",
    TRUE ~ "2011-present"
  ))
```

Inspect the 'comparison' dataframe by clicking on it in the 'Environment' pane.

Now that we have the 'interval' variable, we can create a density plot to study the distribution of monthly deviations ('delta'), grouped by the different time periods we are interested in. Set 'fill' to 'interval' to group and colour the data by different time periods.

```{r density_plot, eval=FALSE, warning=FALSE}
ggplot(comparison, aes(x = delta, fill = interval)) +
  geom_density(alpha = 0.2) +  # density plot with transparency set to 20%
  theme_bw() +                 # theme
  labs(title = "Density Plot for Monthly Temperature Anomalies",
       y = "Density")          # changing y-axis label to sentence case
```

So far, we have been working with monthly anomalies. However, we might be interested in average annual anomalies. We can do this by using 'group_by()' and 'summarise()', followed by a scatter plot to display the result.

```{r averaging, warning=FALSE, eval=FALSE}
# creating yearly averages
average_annual_anomaly <- tidyweather %>%
  group_by(Year) %>%  # grouping data by Year
  # creating summaries for mean delta
  # use 'na.rm=TRUE' to eliminate NA (not available) values
  summarise(annual_average_delta = mean(delta, na.rm = TRUE))

# plotting the data
ggplot(average_annual_anomaly, aes(x = Year, y = annual_average_delta)) +
  geom_point() +
  # fit the best-fit line, using the LOESS method
  geom_smooth() +
  # theme_bw() gives a white background and a black frame around the plot
  theme_bw() +
  labs(title = "Average Yearly Anomaly",
       y = "Average Annual Delta")
```

Hypothesis Test -

A one-degree global change is significant because it takes a vast amount of heat to warm all the oceans, atmosphere, and land by that much. In the past, a one- to two-degree drop was all it took to plunge the Earth into the Little Ice Age.

Your task is to determine (test) whether the average temperature deviation (delta) since 2011 is (statistically) significantly different from 1.5 degrees.

First, state what you are doing. What is your null hypothesis? What is your alternative hypothesis?
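
As an illustration (not the only acceptable phrasing), with $\mu$ denoting the true mean temperature deviation since 2011, the hypotheses could be written as

$$H_0: \mu = 1.5 \quad \text{versus} \quad H_1: \mu \neq 1.5$$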

Confidence Interval for delta -

Let us construct a confidence interval for the average annual delta since 2011. Recall that the dataframe 'comparison' has already grouped temperature anomalies according to time intervals; we are only interested in what is happening between 2011-present.

```{r, calculate_CI_by_hand}
formula_ci <- comparison %>%
  # choose the interval 2011-present
  # what dplyr verb will you use?

  # calculate summary statistics for temperature deviation (delta):
  # mean, SD, count, SE, lower/upper 95% CI
  # what dplyr verb will you use?

# print out formula_ci
formula_ci
```
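
One possible completion, offered only as a hedged sketch; the summary-column names ('mean_delta', 'se_delta', etc.) are my own choices, not required by the assignment:

```{r calculate_CI_sketch, eval=FALSE}
formula_ci <- comparison %>%
  filter(interval == "2011-present") %>%  # choose the interval 2011-present
  summarise(mean_delta = mean(delta, na.rm = TRUE),
            sd_delta   = sd(delta, na.rm = TRUE),
            count      = sum(!is.na(delta)),  # non-missing observations
            se_delta   = sd_delta / sqrt(count),
            # 95% CI using the t-distribution
            t_critical = qt(0.975, count - 1),
            lower_ci   = mean_delta - t_critical * se_delta,
            upper_ci   = mean_delta + t_critical * se_delta)

formula_ci
```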

### t-stat for observed delta

In hypothesis testing, we want to calculate a **t-stat**, namely how far away what we observed (the actual mean delta since 2011) is from what we assumed, expressed not in degrees Celsius but in standard errors. Given the 'formula_ci' numbers calculated earlier, how far away is the observed (actual) mean delta from 1.5 degrees?
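
A one-line sketch, assuming 'formula_ci' holds the 'mean_delta' and 'se_delta' columns from the sketch above (those names are my own):

```{r t_stat_sketch, eval=FALSE}
# distance of the observed mean delta from 1.5 degrees, in standard errors
t_stat <- (formula_ci$mean_delta - 1.5) / formula_ci$se_delta
t_stat
```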

What is the data showing us? Please type your answer after (and outside!) this blockquote. You have to explain what you have done, the result of your test, and the interpretation of that result. One paragraph max, please!

Task 2: IMDB ratings: Differences between directors

I would like you to explore whether the mean IMDB ratings for Steven Spielberg and Tim Burton are the same or not. I have already calculated the confidence intervals for the mean ratings of these two directors, and as you can see they overlap.

<center>

![](images/directors.png)

</center>

You should use both the 't.test' command and the 'infer' package to simulate from a null distribution, where you assume zero difference between the two.

Before anything, write down the null and alternative hypotheses, as well as the resulting test statistic and the associated t-stat or p-value. At the end of the day, what do you conclude?

You can load the data and examine its structure:

```{r load-movies-data, message=FALSE, warning=FALSE}
movies <- read_csv(here::here("Data", "movies.csv"))
glimpse(movies)
```

Your R code and analysis should go here. If you want to insert a blank R code chunk, you can just hit 'Ctrl/Cmd+Alt+I'.

```{r}

```
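
A hedged sketch of both approaches, assuming 'movies' has a 'director' column and a numeric 'rating' column (the column names are assumptions):

```{r directors_test_sketch, eval=FALSE}
library(infer)

spielberg_burton <- movies %>%
  filter(director %in% c("Steven Spielberg", "Tim Burton"))

# classical two-sample t-test
t.test(rating ~ director, data = spielberg_burton)

# simulation-based test with the infer package
obs_diff <- spielberg_burton %>%
  specify(rating ~ director) %>%
  calculate(stat = "diff in means",
            order = c("Steven Spielberg", "Tim Burton"))

null_dist <- spielberg_burton %>%
  specify(rating ~ director) %>%
  hypothesize(null = "independence") %>%  # assume zero true difference
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in means",
            order = c("Steven Spielberg", "Tim Burton"))

null_dist %>%
  get_p_value(obs_stat = obs_diff, direction = "two-sided")
```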

Task 3: Calculate and plot risk/return profile of stocks

We will use the 'tidyquant' package to download historical data of stock prices, calculate returns, and examine the distribution of returns.

The 'tidyquant' package allows us to download historical prices for many financial assets, most of them coming through Yahoo Finance. We must first identify which stocks we want to download data for, and for this we must know their **ticker** symbol; Apple is known as AAPL, Microsoft as MSFT, McDonald's as MCD, etc.

In September 2017, Samir Khan from the [investexcel.net website](https://investexcel.net/all-yahoo-finance-stock-tickers/) got a list of all Yahoo Finance tickers, and we will use a modified version of that list.

```{r get_tickers, warning=FALSE, message=FALSE}
tickers <- read_csv(here::here("Data", "yahoo_finance_tickers.csv"))
```

The 'tickers' dataframe contains 207,533 tickers of various instruments, along with the 'name' of the instrument, the 'exchange' it trades on, the market sector it belongs to ('category_name'), and the 'type' of the instrument, namely stocks, market indices, or ETFs.

Based on this dataset, I want you to create two bar plots:

A bar plot that shows the top 25 countries with respect to the number of tickers. The bars should be arranged in descending order, with the largest first.

Similarly, a bar plot with the top 25 market sectors ('category_name'), again arranged in descending order.

```{r bar_plots_country_category}
# YOUR CODE GOES HERE
```
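
One way to build the country plot, as a sketch; it assumes the country column in 'tickers' is called 'country' (the actual column name may differ). The sector plot is the same with 'category_name' in place of 'country'.

```{r bar_plots_sketch, eval=FALSE}
tickers %>%
  count(country, sort = TRUE) %>%  # tickers per country
  slice_max(n, n = 25) %>%         # keep the top 25
  ggplot(aes(x = n, y = fct_reorder(country, n))) +
  geom_col() +
  theme_bw() +
  labs(title = "Top 25 countries by number of tickers",
       x = "Number of tickers", y = NULL)
```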

Next, choose around a dozen stocks, preferably from your country, or a sector ('category_name') that interests you. If I had chosen AAPL and MSFT, I would create a variable 'my_tickers <- c("AAPL", "MSFT")' and would then use tidyquant to download the last three years' worth of data.

```{r get_price_data, message=FALSE, warning=FALSE}
my_tickers <- c("AAPL", "MSFT") # enter your chosen tickers here -- *NOT* AAPL or MSFT, sorry

myStocks <- my_tickers %>%
  tq_get(get = "stock.prices",
         from = "2016-07-01",
         to = "2019-09-01") %>%
  group_by(symbol)

glimpse(myStocks) # examine the structure of the resulting data frame
```

Financial performance and CAPM analysis depend on returns. If I buy a stock today for 100 and sell it tomorrow for 101.75, my one-day return, assuming no transaction costs, is 1.75%. So, given the adjusted closing prices we downloaded, our first step is to calculate daily and monthly returns.

```{r calculate_returns, message=FALSE, warning=FALSE}
# calculate monthly arithmetic returns from the adjusted closing prices
myStocks_returns_monthly <- myStocks %>%
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn,
               period = "monthly",
               type = "arithmetic",
               col_rename = "monthly_returns")
```

Create a table where you summarise monthly returns for each of the stocks: min, max, median, mean, and SD.

```{r summarise_monthly_returns}
# YOUR CODE GOES HERE
```
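
A sketch of one way to build this table, using 'myStocks_returns_monthly' from the previous chunk (the summary-column names are my own):

```{r summarise_monthly_returns_sketch, eval=FALSE}
monthly_summary <- myStocks_returns_monthly %>%
  group_by(symbol) %>%
  summarise(min_return    = min(monthly_returns),
            max_return    = max(monthly_returns),
            median_return = median(monthly_returns),
            mean_return   = mean(monthly_returns),
            sd_return     = sd(monthly_returns))

monthly_summary
```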

Plot a faceted density plot, using 'geom_density()', for each of the stocks.

```{r density_monthly_returns}
# YOUR CODE GOES HERE
```
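
A minimal sketch, again assuming 'myStocks_returns_monthly' from above:

```{r density_monthly_returns_sketch, eval=FALSE}
ggplot(myStocks_returns_monthly, aes(x = monthly_returns)) +
  geom_density() +
  facet_wrap(~symbol) +  # one panel per stock
  theme_bw() +
  labs(title = "Distribution of monthly returns by stock",
       x = "Monthly return")
```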

What can you infer from this plot? Which stock has the highest/lowest volatility?

TYPE YOUR ANSWER AFTER (AND OUTSIDE!) THIS BLOCKQUOTE.

Finally, make a plot that shows the expected monthly return (mean) of a stock on the Y axis and the risk (standard deviation) on the X axis. You can use different colours for different tickers, but more importantly please print the label of each ticker next to the stock, using 'geom_text(aes(label = ticker))' (note that in the tidyquant output the ticker column is called 'symbol').

```{r risk_return_plot}
# YOUR CODE GOES HERE
```
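
A sketch, reusing the 'monthly_summary' table from the earlier sketch (the 'mean_return' and 'sd_return' names come from that sketch, not from the assignment):

```{r risk_return_sketch, eval=FALSE}
ggplot(monthly_summary, aes(x = sd_return, y = mean_return, colour = symbol)) +
  geom_point() +
  geom_text(aes(label = symbol), hjust = -0.2) +  # print the ticker next to each point
  theme_bw() +
  labs(title = "Risk/return profile, monthly returns",
       x = "Risk (SD of monthly returns)",
       y = "Expected monthly return (mean)")
```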

What can you infer from this plot? Are there any stocks which, while being riskier, do not have a higher expected return?

TYPE YOUR ANSWER AFTER (AND OUTSIDE!) THIS BLOCKQUOTE.

Challenge 1: Ridge plots

Using your newfound visualisation skills (and referencing [the 'ggridges' vignette](https://cran.r-project.org/web/packages/ggridges/vignettes/introduction.html)), make a ridge plot showing either

- the distribution of temperature anomalies from the NASA dataset over different periods, or

- the distribution of IMDB ratings by genre, as shown below.

<center>

![](images/imdb_ggridges.png)

</center>

Save the plot you create as a PNG file in your 'images' folder with 'ggsave()'.
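
A hedged sketch of the NASA-anomalies option, assuming the 'comparison' dataframe from Task 1; 'geom_density_ridges()' comes from the 'ggridges' package.

```{r ridge_plot_sketch, eval=FALSE}
library(ggridges)

ridge_plot <- ggplot(comparison, aes(x = delta, y = interval, fill = interval)) +
  geom_density_ridges(alpha = 0.5) +  # one ridge per time period
  theme_bw() +
  labs(title = "Distribution of temperature anomalies by period",
       x = "Temperature deviation (delta)", y = NULL)

ridge_plot

# save as PNG in the images folder
ggsave(here::here("images", "anomaly_ridges.png"), plot = ridge_plot,
       width = 8, height = 6)
```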
