Reference no: EM132271838
Statistics for Political Science Assignment - STATA Problem Set
Problem 1 - Using the dataset BormannGolder2013.dta, perform the following tasks:
1. Rename the variable tier1_avemag as averagma.
2. Label the variable as "Average district magnitude".
3. Ask STATA for basic summary statistics and indicate the mean, the standard deviation, the median, and the relevant quartiles.
4. Look at the key values of this variable. If you think that this variable needs to be "cleaned", please make the appropriate changes so that it accurately reflects the information provided by averagma.
5. Plot a box plot of averagma and briefly comment on the key features of this variable.
6. Plot a histogram of averagma and discuss the shape and spread of this variable.
7. Now, look at the variable mixed_type. Ask STATA to show a frequency table of this variable.
8. Using the codebook of this dataset, create a value label for mixed_type to describe what each category of this variable captures.
9. Create a bar graph of mixed_type and describe this variable.
10. Create a bar graph showing the average district magnitude (your averagma variable) for each type of electoral system, as captured by your variable mixed_type.
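The tasks in Problem 1 map onto a handful of standard Stata commands. The do-file below is a minimal sketch, not a model answer. It assumes BormannGolder2013.dta sits in the working directory, that negative values of averagma are missing-data codes (check this against the codebook before recoding), and that mixed_type takes the values 1 to 3. The value-label text is a hypothetical placeholder and must be replaced with the wording from the codebook.

```stata
* Sketch for Problem 1 (assumes the dataset is in the working directory)
use "BormannGolder2013.dta", clear

* Tasks 1-2: rename and label the variable
rename tier1_avemag averagma
label variable averagma "Average district magnitude"

* Task 3: detailed summary; the 50th percentile is the median,
* the 25th and 75th percentiles are the quartiles
summarize averagma, detail

* Task 4: inspect suspicious values before changing anything.
* ASSUMPTION: negative values are missing-data codes, not real magnitudes
tabulate averagma if averagma < 0
replace averagma = . if averagma < 0

* Tasks 5-6: box plot and histogram
graph box averagma
histogram averagma

* Task 7: frequency table, including missing values
tabulate mixed_type, missing

* Task 8: HYPOTHETICAL codes and placeholder text; use the codebook wording
label define mixed_lbl 1 "Type 1 (see codebook)" ///
                       2 "Type 2 (see codebook)" ///
                       3 "Type 3 (see codebook)"
label values mixed_type mixed_lbl

* Task 9: number of observations per category
* (the (count) statistic without a variable needs a recent Stata release)
graph bar (count), over(mixed_type)

* Task 10: mean district magnitude per electoral-system type
graph bar (mean) averagma, over(mixed_type)
```

Running summarize again after the recode in Task 4 shows how much the cleaning changes the mean and the quartiles.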
Problem 2 - Using the dataset AVdataset.dta perform the following tasks:
1. Using variable dq11, create a new variable named fptp_tradition with four categories. Category 1 should contain values in dq11 referring to agreeing. Category 2 should contain values in dq11 referring to neither agree nor disagree. Category 3 should contain values referring to disagreeing. Category 4 should reflect those who don't know. Provide labels for the new categories if appropriate. Label the new variable fptp_tradition as "FPT is part of British tradition". Show a distribution of the new variable.
2. Show the confidence intervals for each category of fptp_tradition and explain the meaning of the CI for category 1.
3. Rename variable dq20 as like_conservatives and show the 95% confidence interval of the mean of this variable. Explain the confidence interval.
4. Using variable dq104, create a new variable named class with three categories. Category 1 should refer to people with an income of less than £20,000 and be labeled as "Lower class". Category 2 should refer to people declaring an income of £20,000 or more but less than £80,000 and be labeled as "Middle class". Category 3 should refer to people declaring an income of £80,000 or more and be labeled as "Upper class". The rest of the values of variable dq104 should be treated as missing.
5. Show the mean value of the variable like_conservatives for every value of the variable class. With this information, we would like to see how social class affects liking for the Conservative Party. To do so, we test, for every category of the variable class, the hypothesis that
H0 : μ = 4.4
When is the null hypothesis rejected?
6. We would like to compare the mean of like_conservatives for individuals belonging to the lower class with the mean for everyone else. The hypotheses we would like to test are the following:
H0 : μr = μl
Ha : μr > μl
where μl refers to the population mean of those who belong to the lower class and μr is the population mean of those who belong to the other social classes. Is the null hypothesis rejected?
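The recoding and testing steps in Problem 2 can be sketched as follows. Every numeric code used for dq11 and dq104 below is a hypothetical stand-in, because the actual codings are given only in the dataset's codebook; adjust the recode rules before running. The ci means syntax assumes Stata 14 or later (older releases use ci like_conservatives).

```stata
* Sketch for Problem 2 (assumes AVdataset.dta is in the working directory)
use "AVdataset.dta", clear

* Task 1. ASSUMPTION (hypothetical coding): 1-2 agree, 3 neither,
* 4-5 disagree, 6 don't know; anything else becomes missing
recode dq11 (1 2 = 1 "Agree") (3 = 2 "Neither agree nor disagree") ///
            (4 5 = 3 "Disagree") (6 = 4 "Don't know") (else = .), ///
            generate(fptp_tradition)
label variable fptp_tradition "FPT is part of British tradition"
tabulate fptp_tradition

* Task 2: proportion in each category with its 95% confidence interval
proportion fptp_tradition

* Task 3: 95% confidence interval of the mean
rename dq20 like_conservatives
ci means like_conservatives

* Task 4. ASSUMPTION (hypothetical income bands): codes 1-4 are below
* 20,000; 5-12 are 20,000 to under 80,000; 13-15 are 80,000 or more
recode dq104 (1/4 = 1 "Lower class") (5/12 = 2 "Middle class") ///
             (13/15 = 3 "Upper class") (else = .), generate(class)

* Task 5: group means, then a one-sample t test of H0: mu = 4.4 per class
tabulate class, summarize(like_conservatives)
bysort class: ttest like_conservatives == 4.4 if !missing(class)

* Task 6: lower class (1) versus everyone else (0)
generate lower = (class == 1) if !missing(class)
ttest like_conservatives, by(lower)
```

In the two-sample output, Stata reports diff = mean(0) - mean(1), which here is the rest minus the lower class, so the one-sided alternative μr > μl corresponds to the "Ha: diff > 0" column. Adding the unequal option relaxes the equal-variance assumption if that seems more appropriate for the data.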
Problem 3 - Using the dataset world.dta perform the following tasks:
1. Provide descriptive statistics for variables hdi2001 and eth_het.
2. Create a new variable named hdi which indicates the value of hdi2001 multiplied by 100. Add an appropriate label to this variable.
3. Create a new variable named ethnicity which indicates the value of eth_het multiplied by 100. Add an appropriate label to this variable.
4. Draw a scatterplot showing the relationship between hdi and ethnicity. Add to the scatterplot a fitted line, assuming that there is a linear relationship between the two variables. Describe this relationship, also using the value of the correlation coefficient.
5. Estimate the following regression
hdi = β0 + β1·ethnicity
6. Discuss the main information of the regression including the meaning of β0 and β1, statistical significance and R-sq. (3 lines max)
N.B. Save the dataset to make sure that your new variables are preserved for later.
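A minimal do-file sketch for Problem 3 follows. It assumes world.dta is in the working directory. The label wording is an assumption about what hdi2001 and eth_het measure, so check it against the codebook. The final save overwrites the original file; save under a different name if the untouched original should be kept.

```stata
* Sketch for Problem 3 (assumes world.dta is in the working directory)
use "world.dta", clear

* Task 1: descriptive statistics
summarize hdi2001 eth_het

* Tasks 2-3: rescale to 0-100 (label text is an assumed wording)
generate hdi = hdi2001 * 100
label variable hdi "Human Development Index 2001 (x100)"
generate ethnicity = eth_het * 100
label variable ethnicity "Ethnic heterogeneity (x100)"

* Task 4: scatterplot with a linear fit overlaid, plus the correlation
twoway (scatter hdi ethnicity) (lfit hdi ethnicity)
correlate hdi ethnicity

* Task 5: bivariate OLS regression
regress hdi ethnicity

* N.B.: keep the new variables for Problem 4
save "world.dta", replace
```

In the regress output, _cons is the estimate of β0 and the coefficient on ethnicity is the estimate of β1.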
Problem 4 - Using the dataset world.dta as saved at the end of Problem 3, perform the following tasks:
1. Create a new variable named dem_area which indicates the value of the variable dem_oth multiplied by 100. Label the new variable dem_area as "% of democracies in same area". Look at the variable rural and create a new variable named rural_cat, where 1 indicates that more than 50% of the population is rural and 0 otherwise. Provide relevant labels.
2. Estimate the following regression model
hdi = β0 + β1·ethnicity + β2·dem_area + δ1·rural_cat
3. Discuss the main information of the regression including the meaning of β1, β2 and δ1, statistical significance and R-sq. (5 lines max).
4. Show in a graph how hdi changes as ethnicity increases from 0 to 100 in increments of 10. Briefly describe the graph.
5. Show in a graph how hdi changes as dem_area increases from 0 to 100 in increments of 10. Briefly describe the graph.
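The steps in Problem 4 can be sketched with regress, margins, and marginsplot. The sketch assumes that world.dta already contains hdi and ethnicity from Problem 3 and that rural is measured in percent (0 to 100); if rural is stored as a proportion, the cutoff would be 0.5 instead. The value-label wording is illustrative.

```stata
* Sketch for Problem 4 (assumes world.dta as saved at the end of Problem 3)
use "world.dta", clear

* Task 1: rescale dem_oth and build the rural dummy
generate dem_area = dem_oth * 100
label variable dem_area "% of democracies in same area"

* ASSUMPTION: rural is a percentage; missing values stay missing
generate rural_cat = (rural > 50) if !missing(rural)
label variable rural_cat "Majority of population is rural"
label define rural_lbl 0 "50% or less rural" 1 "More than 50% rural"
label values rural_cat rural_lbl

* Task 2: multiple regression with the dummy entered as a factor variable
regress hdi ethnicity dem_area i.rural_cat

* Task 4: predicted hdi as ethnicity goes from 0 to 100 in steps of 10,
* averaging over the observed values of the other regressors
margins, at(ethnicity = (0(10)100))
marginsplot

* Task 5: the same for dem_area
margins, at(dem_area = (0(10)100))
marginsplot
```

Because the model is linear with no interactions, each marginsplot is a straight line whose slope equals the corresponding regression coefficient; the confidence bands show where the predictions are least precise.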