Define a program called univar

Reference no: EM132299244

EPIDEMIOLOGY STATA PROGRAMMING AND DATA MANAGEMENT Assignment -

Write a .do file that performs the tasks described below. Your .do file should follow conventions for .do file structure described in class. Make sure your script will run on our machines, even if we are using a different version of Stata. Do not submit your log files as part of the assignment.

Evaluation - For Question 1, use the dataset hw2_pra_hist.dta and hw2_hosp.dta to perform the required tasks. Your .do file will be run on a different dataset with more visits.

For the other questions, simply define your program. You do not need to run the programs you write in your .do file. The graders will run your programs using a dataset that will not be released to you.

Question 1 -

Context: You are conducting a study that examines regional variation in the distribution of panel-reactive antibody (PRA). So far, you have recruited 73 patients (px_id = 1, ..., 73) from 10 hospitals (hosp_id = 1, ..., 10) in 3 regions (region = A, B, C, ...), and measured PRA four times: visit 0 (baseline), visit 1, visit 2, and visit 3. You hear that the organization that funds your research plans to extend the funding for several more visits (visit 4, visit 5, ..., visit N). Since you do not know how many more visits there will be, you decide to write a .do file that works regardless of how many visits the dataset has.

Codebook

Variable    Description             Values/Range

hw2_pra_hist.dta
hosp_id     Hospital ID             Integers: 1 - 10
px_id       Patient ID              Integers: 1 - 73
pra_vX      PRA value at visit X    Integers: 0 - 100. Visit 0 indicates baseline.

hw2_hosp.dta
hosp_id     Hospital ID             Integers: 1 - 10
Region      Region                  Letters (A, B, C, ...)

Note: the study might add more patients, hospitals, and regions in the future, so hosp_id, px_id, and region might include more values.

i) Load hw2_pra_hist.dta. Print a table, as shown in the attached file, that displays the number of patients with a valid PRA value greater than 80 (i.e., between 81 and 100) for each outcome variable (pra_v0, pra_v1, ..., pra_vN). N and XX should be replaced with the correct values from the dataset.
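The counting logic in part i) can be sketched outside Stata. The Python snippet below is only an illustration under stated assumptions: the records, the use of `None` for a missing value, and the helper names `visit_columns` and `count_high_pra` are all hypothetical. It discovers the pra_v columns from the data rather than hard-coding N, and it excludes missing values explicitly. That exclusion matters in Stata as well, because a bare comparison such as pra_v0 > 80 is also true for missing values there (Stata treats missing as larger than any number).

```python
import re

# Hypothetical wide-format records; None stands in for a missing PRA value.
patients = [
    {"px_id": 1, "pra_v0": 85, "pra_v1": 90, "pra_v2": None},
    {"px_id": 2, "pra_v0": 10, "pra_v1": 81, "pra_v2": 100},
    {"px_id": 3, "pra_v0": None, "pra_v1": 80, "pra_v2": 95},
]

def visit_columns(rows):
    """Return every pra_v<number> column name, ordered by visit number."""
    names = {key for row in rows for key in row if re.fullmatch(r"pra_v\d+", key)}
    return sorted(names, key=lambda name: int(name[len("pra_v"):]))

def count_high_pra(rows, threshold=80):
    """Count patients per visit whose non-missing PRA exceeds the threshold."""
    return {
        col: sum(
            1 for row in rows
            if row.get(col) is not None and row[col] > threshold
        )
        for col in visit_columns(rows)
    }

print(count_high_pra(patients))
# {'pra_v0': 1, 'pra_v1': 2, 'pra_v2': 2}
```

Sorting by the numeric suffix keeps pra_v10 after pra_v2, which a plain alphabetical sort would not. The table layout from the attached file is not reproduced here.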

ii) Create a new variable, peak_pra, that contains the highest valid PRA measurement for each participant. Print the median (IQR) of peak_pra as shown in the attached file. XX.X should be replaced with the correct values from the dataset and formatted with one digit after the decimal point (e.g., 12.0). (Hint: the rowmax function in egen might be helpful.)
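As a rough illustration of part ii), the Python sketch below takes a row-wise maximum over the non-missing pra_v values and then summarises the peaks. The data, the `None` convention for missing values, and the function name `peak_pra` are assumptions made for this example. Quartile conventions differ between tools, so the quartiles computed here need not match what Stata reports for the same data.

```python
import statistics

# Hypothetical records; None marks a missing measurement.
rows = [
    {"px_id": 1, "pra_v0": 85, "pra_v1": 90, "pra_v2": None},
    {"px_id": 2, "pra_v0": 10, "pra_v1": 81, "pra_v2": 100},
    {"px_id": 3, "pra_v0": None, "pra_v1": 80, "pra_v2": 95},
    {"px_id": 4, "pra_v0": 40, "pra_v1": None, "pra_v2": 35},
    {"px_id": 5, "pra_v0": None, "pra_v1": None, "pra_v2": None},
]

def peak_pra(row):
    """Highest non-missing pra_v value in a row, or None if all are missing."""
    values = [v for k, v in row.items() if k.startswith("pra_v") and v is not None]
    return max(values) if values else None

# Patients with no valid measurement contribute nothing to the summary.
peaks = [p for p in (peak_pra(r) for r in rows) if p is not None]

median = statistics.median(peaks)
# "inclusive" is one of several quartile definitions (an assumption here).
q1, _, q3 = statistics.quantiles(peaks, n=4, method="inclusive")

print(peaks)              # [90, 100, 95, 40]
print(f"{median:.1f}")    # 92.5
print(q1, q3)
```

The `:.1f` format specifier is what produces the single digit after the decimal point.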

iii) Another dataset provided to you, hw2_hosp.dta, has information on which region each hospital is located in. Merge the dataset currently in memory with hw2_hosp.dta. Use the list command to list the ID of the patient with the highest peak_pra value in each region, as shown in the attached file.

X should be replaced with the correct values from the dataset. If there are ties (i.e., multiple patients with the highest value), print all tied patients. If region C has ties (while A and B do not), the table will look like the one in the attached file. If a region has no patients in hw2_pra_hist, do not list that region.
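The tie-handling rule in part iii) can be sketched as follows. This Python snippet is a hedged illustration, not the expected Stata solution: the hospital-to-region mapping, the patient records, and the function name `top_patients_by_region` are hypothetical. It keeps every patient whose peak equals the regional maximum and naturally omits regions with no patients.

```python
from collections import defaultdict

# Hypothetical lookup of hospital -> region; region D has no patients.
hospital_region = {1: "A", 2: "B", 3: "C", 4: "D"}

# Hypothetical per-patient peaks (already computed).
patients = [
    {"px_id": 1, "hosp_id": 1, "peak_pra": 90},
    {"px_id": 2, "hosp_id": 1, "peak_pra": 70},
    {"px_id": 3, "hosp_id": 2, "peak_pra": 100},
    {"px_id": 4, "hosp_id": 3, "peak_pra": 95},
    {"px_id": 5, "hosp_id": 3, "peak_pra": 95},
]

def top_patients_by_region(patients, hospital_region):
    """Map each region with patients to all px_ids that share its highest peak."""
    by_region = defaultdict(list)
    for p in patients:
        if p["peak_pra"] is None:
            continue  # a patient with no valid PRA cannot hold the maximum
        by_region[hospital_region[p["hosp_id"]]].append(p)

    result = {}
    for region in sorted(by_region):
        best = max(p["peak_pra"] for p in by_region[region])
        result[region] = [
            p["px_id"] for p in by_region[region] if p["peak_pra"] == best
        ]
    return result

print(top_patients_by_region(patients, hospital_region))
# {'A': [1], 'B': [3], 'C': [4, 5]}
```

Comparing each patient against the regional maximum, rather than sorting and taking the first row, is what keeps tied patients in the result.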

Question 2 -

Define a program called univar. This program runs a series of univariable (simple) linear or logistic regressions between each of the independent variables and the dependent variable.

For example, if the user runs

univar var1 var2 var3 var4, outcome(var5)

this program will quietly run four univariable linear regressions on var5,

regress var5 var1

regress var5 var2

regress var5 var3

regress var5 var4

and return the following output, assuming that var2 and var4 were significantly (p<0.05) associated with var5. P-values should be formatted with three digits after the decimal point.

Significantly associated with var5:

var2 (p=x.xxx)

var4 (p=x.xxx)

Similarly, if the user runs this program with the logistic option,

univar var1 var2 var3 var4, outcome(var5) logistic

this program will quietly run four univariable logistic regressions on var5,

logistic var5 var1

logistic var5 var2

logistic var5 var3

logistic var5 var4

and return the following output, assuming that var2 and var4 were significantly (p<0.05) associated with var5. P-values should be formatted with three digits after the decimal point.

Significantly associated with var5:

var2 (p=x.xxx)

var4 (p=x.xxx)

This program should not alter the dataset in memory: i.e., if you need to alter the dataset, restore it to its original state after completing your procedures.

Hint: The program model in lecture 4 has some similarities with this question. The p-value after regress and logistic can be obtained using the following code:

Command     Code for p-value (change var1 as appropriate)

regress     ttail(e(df_r), abs(_b[var1]/_se[var1]))*2
logistic    (1-normal(abs(_b[var1]/_se[var1])))*2
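To make the logistic formula concrete, the Python sketch below computes the same two-sided normal p-value, 2 * (1 - Phi(|b/se|)), and then applies the p<0.05 filter and three-decimal formatting described for univar. The coefficients and standard errors are made up for illustration, and the function names `wald_p_value` and `significant_lines` are hypothetical. The regress formula uses a t distribution instead, which the Python standard library does not provide, so it is not covered here.

```python
import math

def wald_p_value(coef, se):
    """Two-sided p-value from a normal reference: 2 * (1 - Phi(|coef/se|))."""
    z = abs(coef / se)
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for the standard normal CDF Phi.
    return math.erfc(z / math.sqrt(2))

def significant_lines(outcome, p_values, alpha=0.05):
    """Build the report lines for predictors with p below alpha."""
    lines = [f"Significantly associated with {outcome}:"]
    for name, p in p_values.items():
        if p < alpha:
            lines.append(f"{name} (p={p:.3f})")
    return lines

# Hypothetical (coefficient, standard error) pairs, one per univariable model.
estimates = {
    "var1": (0.10, 0.20),
    "var2": (0.50, 0.20),
    "var3": (-0.30, 0.25),
    "var4": (-0.90, 0.30),
}
p_values = {name: wald_p_value(b, se) for name, (b, se) in estimates.items()}

for line in significant_lines("var5", p_values):
    print(line)
# Significantly associated with var5:
# var2 (p=0.012)
# var4 (p=0.003)
```

Collecting the p-values first and printing afterwards mirrors the requirement that the individual regressions run quietly and only the summary is shown.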

Question 3 -

Print the following text: "Question 3: I estimate that it took me xxxx hours to complete this assignment."

For example, if it took you six hours of active work time (not counting when you ate/slept/did other things), your .do file will contain a line that prints this text with xxxx replaced by 6. Give an honest answer; this is just for our data collection purposes. However, this question is worth some points, so don't skip it!

Question 4 -

A prime number is a natural number greater than 1 that cannot be formed by multiplying two smaller natural numbers. Write prime, a program that takes any real number as an option n and determines whether the number is a prime number or not. The program will also display an error message when the user enters any number that is not a natural number greater than 1.

For example: If the user types prime, n(100), your program will display "100 is NOT a prime number."

If the user types prime, n(109), your program will display "109 is a prime number."

If the user types prime, n(1), your program will display "Invalid input: enter a natural number greater than 1."

If the user types prime, n(3.14), your program will display "Invalid input: enter a natural number greater than 1."
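The decision logic behind prime can be sketched in a few lines. The Python function below is an illustration only (the name `prime_message` is hypothetical, and the assignment itself asks for a Stata program). It first rejects anything that is not a whole number greater than 1, then tries divisors up to the integer square root, which is sufficient because any composite number has a divisor no larger than its square root.

```python
import math

def prime_message(n):
    """Return the message the assignment describes for the input n."""
    # Valid input: a finite whole number greater than 1 (2, 3, 4, ...).
    if not math.isfinite(n) or n != int(n) or n <= 1:
        return "Invalid input: enter a natural number greater than 1."

    n = int(n)
    # Trial division: a composite n must have a divisor d with d * d <= n.
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return f"{n} is NOT a prime number."
    return f"{n} is a prime number."

for value in (100, 109, 1, 3.14):
    print(prime_message(value))
# 100 is NOT a prime number.
# 109 is a prime number.
# Invalid input: enter a natural number greater than 1.
# Invalid input: enter a natural number greater than 1.
```

Stopping at the square root also keeps the loop short for large inputs, which is relevant if running time is being compared.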

Note - Need Q2 and Q4 only - to be solved using STATA.

Attachment:- Assignment Files.rar


Instructions: Q2 and Q4 only, to be solved using Stata. Your .do file must be called assignment2_yourname.do (e.g., assignment2_allanmassie.do).


Another script called assignment2_test.do can be downloaded from CoursePlus. We will use a script that looks like this to grade your responses. Use this script to test your code yourself. Partial credit will be awarded if the output is wrong, so have your script do something for every question. Make sure the output includes the question number as indicated. Make sure to follow the coding guidelines from class; for example, your script should include comments.

EXTRA CREDIT CHALLENGE - You can earn full score on Assignment 2 without answering the following questions. You will earn extra 3 points for each correctly answered extra credit challenge question. Extra bonus points to the program that runs the fastest!


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd