Define a program called univar


Reference no: EM132299244

EPIDEMIOLOGY STATA PROGRAMMING AND DATA MANAGEMENT Assignment -

Write a .do file that performs the tasks described below. Your .do file should follow conventions for .do file structure described in class. Make sure your script will run on our machines, even if we are using a different version of Stata. Do not submit your log files as part of the assignment.

Evaluation - For Question 1, use the datasets hw2_pra_hist.dta and hw2_hosp.dta to perform the required tasks. Your .do file will be run on a different dataset with more visits.

For the other questions, simply define your program. You do not need to run the programs you write in your .do file. The graders will run your programs using a dataset that will not be released to you.

Question 1 -

Context: You are conducting a study that examines regional variation in the distribution of panel-reactive antibody (PRA). So far, you have recruited 73 patients (px_id = 1, ..., 73) from 10 hospitals (hosp_id = 1, ..., 10) in 3 regions (region = A, B, C), and measured PRA four times: visit 0 (baseline), visit 1, visit 2, and visit 3. You hear that the organization that funds your research plans to extend the funding for several more visits (visit 4, visit 5, ..., visit N). Since you do not know how many more visits there will be, you decide to write a .do file that works regardless of how many visits the dataset has.

Codebook

Variable	Description	Values/Range
hw2_pra_hist.dta
hosp_id	Hospital ID	Integers: 1-10
px_id	Patient ID	Integers: 1-73
pra_vX	PRA value at visit X	Integers: 0-100 (visit 0 indicates baseline)
hw2_hosp.dta
hosp_id	Hospital ID	Integers: 1-10
Region	Region	Letters (A, B, C)

Note: the study might add more patients, hospitals, and regions in the future, so hosp_id, px_id, and region might include more values.

i) Load hw2_pra_hist.dta. Print a table, as shown in the attached file, that displays the number of patients with a valid PRA value greater than 80 (i.e., between 81 and 100) for each outcome variable (pra_v0, pra_v1, ..., pra_vN). N and XX should be replaced with the correct values from the dataset.
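One way to sketch part (i) in Stata, assuming every visit variable follows the pra_v0, pra_v1, ... naming from the codebook so that the wildcard pra_v* picks up however many visits the file holds. The two-column layout is a placeholder, since the exact table format is in the attached file, which is not reproduced here.

```stata
* Question 1(i): number of patients with a valid PRA above 80 at each visit.
* The wildcard pra_v* expands to every visit variable, so the loop adapts
* automatically when visits 4, 5, ..., N are added.
use hw2_pra_hist.dta, clear
display as text "Question 1(i)"
display as text "Variable" _col(14) "PRA > 80 (n)"
foreach v of varlist pra_v* {
    * inrange() returns 0 for missing values, so only valid 81-100 counts
    quietly count if inrange(`v', 81, 100)
    display as text "`v'" _col(14) as result r(N)
}
```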

ii) Create a new variable peak_pra that contains the highest valid PRA measurement for each participant. Print the median (IQR) of peak_pra as shown in the attached file. XX.X should be replaced with the correct values from the dataset, formatted with one digit after the decimal point (e.g., 12.0). (Hint: the rowmax() function of egen might be helpful.)
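A sketch of part (ii), assuming the PRA data are still in memory and treating "valid" as a non-missing value between 0 and 100 (an interpretation of the codebook range, not a stated rule). The median (IQR) line uses a placeholder layout; string() with a %9.1f format supplies the one-decimal formatting the task asks for.

```stata
* Question 1(ii): peak PRA per participant and its median (IQR).
* Assumes hw2_pra_hist.dta is already in memory (e.g., use hw2_pra_hist.dta, clear).
* Only values in the valid 0-100 range are copied into temporary variables,
* so an out-of-range code can never become a participant's peak.
local valid ""
foreach v of varlist pra_v* {
    tempvar t
    quietly generate `t' = `v' if inrange(`v', 0, 100)
    local valid `valid' `t'
}
* rowmax ignores missing values; it is missing only if every visit is missing
egen peak_pra = rowmax(`valid')
drop `valid'

quietly summarize peak_pra, detail
display as text "Question 1(ii)"
display as text "Median (IQR) of peak PRA: " ///
    trim(string(r(p50), "%9.1f")) " (" ///
    trim(string(r(p25), "%9.1f")) ", " trim(string(r(p75), "%9.1f")) ")"
```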

iii) Another dataset provided to you, hw2_hosp.dta, records the region in which each hospital is located. Merge the dataset in memory with hw2_hosp.dta. Use the list command to list the ID of the patient with the highest peak_pra value in each region, as shown in the attached file.

X should be replaced with the correct values from the dataset. If there are ties (i.e., multiple patients share the highest value), print all tied patients; for example, if region C has ties (while A and B do not), the table will look like the one in the attached file. If a region has no patients in hw2_pra_hist.dta, do not list that region.
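A possible sketch of part (iii). It assumes peak_pra from part (ii) is in memory and that the region variable in hw2_hosp.dta is named region (the codebook writes Region, and Stata variable names are case-sensitive, so adjust if needed). Comparing each patient with the regional maximum, rather than keeping one row per region, is what makes all tied patients appear.

```stata
* Question 1(iii): patient(s) with the highest peak PRA in each region.
* keep(match) drops hospitals that have no patients, so regions without
* patients in hw2_pra_hist.dta never appear in the listing.
merge m:1 hosp_id using hw2_hosp.dta, keep(match) nogenerate

* Regional maximum; comparing against it keeps every tied patient
bysort region: egen max_peak = max(peak_pra)
sort region px_id
display as text "Question 1(iii)"
list region px_id if peak_pra == max_peak & !missing(peak_pra), noobs sepby(region)
drop max_peak
```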

Question 2 -

Define a program called univar. This program runs a series of univariable (simple) linear or logistic regressions between each of the independent variables and the dependent variable.

For example, if the user runs

univar var1 var2 var3 var4, outcome(var5)

this program will quietly run four univariable linear regressions on var5,

regress var5 var1

regress var5 var2

regress var5 var3

regress var5 var4

and return the following output, assuming that var2 and var4 were significantly (p<0.05) associated with var5. P-values should be formatted with three digits after the decimal point.

Significantly associated with var5:

var2 (p=x.xxx)

var4 (p=x.xxx)

Similarly, if the user runs this program with the logistic option,

univar var1 var2 var3 var4, outcome(var5) logistic

this program will quietly run four univariable logistic regressions on var5,

logistic var5 var1

logistic var5 var2

logistic var5 var3

logistic var5 var4

and return the following output, assuming that var2 and var4 were significantly (p<0.05) associated with var5. P-values should be formatted with three digits after the decimal point.

Significantly associated with var5:

var2 (p=x.xxx)

var4 (p=x.xxx)

This program should not alter the dataset in memory; i.e., if you need to alter the dataset, restore it to its original state after completing your procedures.
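A minimal sketch of univar under stated assumptions: linear regression is the default, logistic is an optional switch, and the p-values use the expressions from the assignment's hint. Neither regress nor logistic modifies the data, so no preserve/restore is needed here; if you add steps that do change the data, wrap them in preserve and restore. Returning the significant names in r(significant) is a hypothetical extra, not part of the specification.

```stata
* Question 2: univariable linear (default) or logistic regressions of one
* outcome on each listed variable; prints those with p < 0.05.
capture program drop univar
program define univar, rclass
    version 12
    syntax varlist(numeric), OUTcome(varname numeric) [LOGistic]

    display as text "Significantly associated with `outcome':"
    local sig ""
    foreach x of local varlist {
        if "`logistic'" != "" {
            * Wald z-test p-value, as given in the assignment's hint
            quietly logistic `outcome' `x'
            local p = (1 - normal(abs(_b[`x']/_se[`x'])))*2
        }
        else {
            * t-test p-value using the residual degrees of freedom
            quietly regress `outcome' `x'
            local p = ttail(e(df_r), abs(_b[`x']/_se[`x']))*2
        }
        * a missing p-value (e.g., an omitted variable) is never printed
        if `p' < 0.05 {
            display as text "`x' (p=" %5.3f `p' ")"
            local sig `sig' `x'
        }
    }
    * hypothetical extra: the significant names are also returned in r()
    return local significant "`sig'"
end
```

For instance, after sysuse auto, typing univar mpg weight trunk, outcome(price) prints the header followed by one line for each variable with p < 0.05.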

Hint: The program model in lecture 4 has some similarities with this question. The p-value after regress and logistic can be obtained using the following code:

Command	Code for p-value (change var1 as appropriate)
regress	ttail(e(df_r), abs(_b[var1]/_se[var1]))*2
logistic	(1-normal(abs(_b[var1]/_se[var1])))*2
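To see the two expressions from the hint at work, the sketch below applies them to Stata's bundled auto dataset (loaded with sysuse auto); the choice of price, mpg, and foreign is arbitrary and has nothing to do with the grading data. The logistic p-value computed this way should agree with the one Stata itself stores in r(table).

```stata
* Checking the hint's p-value formulas on Stata's bundled auto dataset.
sysuse auto, clear

* Linear regression: two-sided t-test using the residual df
quietly regress price mpg
scalar p_reg = ttail(e(df_r), abs(_b[mpg]/_se[mpg]))*2
display as text "regress:  p=" %5.3f p_reg

* Logistic regression: two-sided Wald z-test; _b[] holds the log-odds
* coefficient even though logistic reports odds ratios
quietly logistic foreign mpg
scalar p_logit = (1 - normal(abs(_b[mpg]/_se[mpg])))*2
display as text "logistic: p=" %5.3f p_logit
```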

Question 3 -

Print the following text: "Question 3: I estimate that it took me xxxx hours to complete this assignment."

For example, if it took you six hours of active work time (not counting time spent eating, sleeping, or doing other things), your .do file will contain a line such as display "Question 3: I estimate that it took me 6 hours to complete this assignment." Give an honest answer; this is just for our data collection purposes. However, this question is worth some points, so don't skip it!

Question 4 -

A prime number is a natural number greater than 1 that cannot be formed by multiplying two smaller natural numbers. Write a program called prime that takes any real number as the option n() and determines whether that number is prime. The program should also display an error message when the user enters a number that is not a natural number greater than 1.

For example: If the user types prime, n(100), your program will display "100 is NOT a prime number."

If the user types prime, n(109), your program will display "109 is a prime number."

If the user types prime, n(1), your program will display "Invalid input: enter a natural number greater than 1."

If the user types prime, n(3.14), your program will display "Invalid input: enter a natural number greater than 1."
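A minimal sketch of prime using trial division up to the square root of n (one common approach, not necessarily the one taught in class). Two choices are assumptions: the program returns its verdict in r(prime), and invalid input prints the error text and exits with return code 0 so that a grading script keeps running; use exit 198 instead if a nonzero return code is preferred.

```stata
* Question 4: trial-division primality test for the value passed in n().
capture program drop prime
program define prime, rclass
    version 12
    syntax , n(real)

    * Reject missing values, numbers <= 1, and non-integers
    if missing(`n') | `n' <= 1 | `n' != floor(`n') {
        display as error "Invalid input: enter a natural number greater than 1."
        return scalar prime = .
        exit
    }

    * 2 and 3 are prime; otherwise rule out even numbers, then try odd
    * divisors up to floor(sqrt(n)) (the loop is skipped when that is < 3)
    local isprime = 1
    if `n' > 3 {
        if mod(`n', 2) == 0 {
            local isprime = 0
        }
        else {
            local last = floor(sqrt(`n'))
            forvalues d = 3(2)`last' {
                if mod(`n', `d') == 0 {
                    local isprime = 0
                    continue, break
                }
            }
        }
    }

    if `isprime' {
        display as text "`n' is a prime number."
    }
    else {
        display as text "`n' is NOT a prime number."
    }
    return scalar prime = `isprime'
end
```

With this sketch, prime, n(109) prints "109 is a prime number." and prime, n(100) prints "100 is NOT a prime number."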

Note: only Q2 and Q4 need to be solved, using Stata.

Attachment: Assignment Files.rar



Reviews

len2299244

5/4/2019 5:24:56 AM

Instructions: Q2 and Q4 only, to be solved using Stata. Your .do file must be called assignment2_yourname.do (e.g., assignment2_allanmassie.do).

5/4/2019 5:24:49 AM

The datasets hw2_pra_hist.dta and hw2_hosp.dta for Question 1 can be downloaded from CoursePlus.

5/4/2019 5:24:42 AM

Another script called assignment2_test.do can be downloaded from CoursePlus. We will use a script that looks like this to grade your responses. Use this script to test your code yourself. Partial credit will be awarded if the output is wrong, so have your script do something for every question. Make sure the output includes the question number as indicated. Make sure to follow the coding guidelines from class; for example, your script should include comments. EXTRA CREDIT CHALLENGE - You can earn a full score on Assignment 2 without answering the following questions. You will earn an extra 3 points for each correctly answered extra credit challenge question. Extra bonus points go to the program that runs the fastest!
