# Replicating ADH
## Aggregate Facts
TBD
## Descriptive Stats
Load the following packages, installing any you do not have yet.
```{r, warning=FALSE}
library("readr")
library("tibble")
library("dplyr")
library("Hmisc")
```
### Load Data
As always, we load the data and store it as a tibble.
```{r}
df = read_csv("data/adh_data.csv") %>% as_tibble()
```
### Compute Simple Grouped Mean
1. Find which years (`yr`) are reflected in the data.
```{r}
unique(df$yr)
```
2. Compute the average value of Chinese imports per worker (`l_tradeusch_pw` & `d_tradeusch_pw`) for each year.
```{r}
df_yr = group_by(df, yr)
df_yr %>% summarise(l_tradeusch_pw_avg = mean(l_tradeusch_pw),
d_tradeusch_pw_avg = mean(d_tradeusch_pw)
)
```
### Compute Weighted Group Means and Standard Deviations
For the rest of the exercise, weight the means by the population count per region (`l_popcount`) and compare them with the numbers in the table.
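Before weighting the real data, here is a toy illustration (with made-up numbers) of what passing weights to `weighted.mean` does: large regions count for more than small ones.

```r
x <- c(1, 2, 10)        # e.g. imports per worker in three regions
w <- c(100, 100, 800)   # population counts used as weights
weighted.mean(x, w)     # (1*100 + 2*100 + 10*800) / 1000 = 8.3
mean(x)                 # unweighted mean: 4.33
```

Without the second argument, `weighted.mean(x)` silently falls back to the plain mean, which is an easy mistake to make.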
3. Repeat step 2 with weights.
```{r}
df_yr %>% summarise(l_tradeusch_pw = weighted.mean(l_tradeusch_pw, l_popcount),
d_tradeusch_pw = weighted.mean(d_tradeusch_pw, l_popcount))
```
4. Now also compute the weighted standard deviations for both variables. Hint: Use the `Hmisc` package and find the relevant function.
```{r}
df_yr %>% summarise(l_tradeusch_pw_avg = weighted.mean(l_tradeusch_pw, l_popcount),
d_tradeusch_pw_avg = weighted.mean(d_tradeusch_pw, l_popcount),
l_tradeusch_pw_sd = sqrt(wtd.var(l_tradeusch_pw, l_popcount)),
d_tradeusch_pw_sd = sqrt(wtd.var(d_tradeusch_pw, l_popcount))
)
```
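As a sanity check, the weighted standard deviation can also be computed by hand. This sketch with made-up numbers assumes `wtd.var`'s default behaviour of treating the weights as frequency weights, with denominator `sum(w) - 1`:

```r
x <- c(1, 2, 10)        # toy values
w <- c(100, 100, 800)   # toy frequency weights
m <- weighted.mean(x, w)
# weighted variance with frequency-weight denominator sum(w) - 1
wtd_sd <- sqrt(sum(w * (x - m)^2) / (sum(w) - 1))
wtd_sd
```

Under its defaults, `sqrt(wtd.var(x, w))` from `Hmisc` should agree with this by-hand value.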
5. Now compute the weighted mean and standard deviation of the average household wage and salary (`l_avg_hhincwage_pc_pw`, `d_avg_hhincwage_pc_pw`).
```{r}
df_yr %>% summarise(l_avg_hhincwage_pc_pw_avg = weighted.mean(l_avg_hhincwage_pc_pw, l_popcount),
d_avg_hhincwage_pc_pw_avg = weighted.mean(d_avg_hhincwage_pc_pw, l_popcount),
l_avg_hhincwage_pc_pw_sd = sqrt(wtd.var(l_avg_hhincwage_pc_pw, l_popcount)),
d_avg_hhincwage_pc_pw_sd = sqrt(wtd.var(d_avg_hhincwage_pc_pw, l_popcount))
)
```
6. And once more for the share not in the labor force (`l_sh_nilf`, `d_sh_nilf`).
```{r}
df_yr %>% summarise(l_sh_nilf_avg = weighted.mean(l_sh_nilf, l_popcount),
d_sh_nilf_avg = weighted.mean(d_sh_nilf, l_popcount),
l_sh_nilf_sd = sqrt(wtd.var(l_sh_nilf, l_popcount)),
d_sh_nilf_sd = sqrt(wtd.var(d_sh_nilf, l_popcount))
)
```
## Regression
Let's first load the necessary packages to read data and do fancy regressions:
```{r}
library("readr")
library("tibble")
library("dplyr")
library("sandwich")
library("lmtest")
library("lfe")
```
And let's load the data like we always do:
```{r}
df = read_csv("data/adh_data.csv")
```
### OLS regression
The core of the paper looks at what happened to laborers when there is an increase in US imports from China.
Let's try to replicate part of Table 9, namely the estimate from panel A, column 2.
The y variable is `relchg_avg_hhincwage_pc_pw`.
The key x variable is the decadal trade between the US and China, `d_tradeusch_pw`.
1. Run that simple regression
```{r}
lm_1 = lm(relchg_avg_hhincwage_pc_pw ~ d_tradeusch_pw, data = df)
summary(lm_1)
```
2. Now add heteroskedasticity-robust standard errors (HC1). Hint: use the `sandwich` and `lmtest` packages.
```{r}
coeftest(lm_1, vcov = vcovHC(lm_1, type="HC1"))
```
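For intuition, HC1 is White's heteroskedasticity-consistent sandwich estimator with a small-sample scaling of n/(n - k). A minimal base-R sketch on simulated data (all numbers here are illustrative, not from the ADH data):

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = abs(x))  # error variance depends on x
fit <- lm(y ~ x)

X <- model.matrix(fit)
e <- residuals(fit)
k <- ncol(X)
# sandwich: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}, scaled by n / (n - k)
bread  <- solve(crossprod(X))
meat   <- crossprod(X * e)   # equals X' diag(e^2) X
vc_hc1 <- n / (n - k) * bread %*% meat %*% bread
sqrt(diag(vc_hc1))           # HC1 robust standard errors
```

These numbers should match `vcovHC(fit, type = "HC1")` from `sandwich` applied to the same fit.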
Now we will start to add extra x variables.
3. Start by adding `t2` - a dummy variable for whether the observation is in the second decade.
Fit again with HC1 robust standard errors.
```{r}
lm_2 = lm(relchg_avg_hhincwage_pc_pw ~ d_tradeusch_pw + t2, data = df)
coeftest(lm_2, vcov = vcovHC(lm_2, type="HC1"))
```
### Clustering
Let us now use clustered standard errors instead. ADH cluster by `statefip`.
Hint: use the `felm` command from the `lfe` package.
1. Run the basic regression with clustering
```{r}
felm_1 = felm(relchg_avg_hhincwage_pc_pw ~ d_tradeusch_pw + t2 | 0 | 0 | statefip, data = df)
summary(felm_1)
```
2. Add the following controls to your last regression:
- `l_shind_manuf_cbp`
- `l_sh_popedu_c`
- `l_sh_popfborn`
- `l_sh_empl_f`
- `l_sh_routine33`
- `l_task_outsource`
```{r}
felm_2 = felm(relchg_avg_hhincwage_pc_pw ~ d_tradeusch_pw + t2 +
l_shind_manuf_cbp + l_sh_popedu_c + l_sh_popfborn +
l_sh_empl_f + l_sh_routine33 + l_task_outsource
| 0 | 0 | statefip, data = df)
summary(felm_2)
```
3. Add region fixed effects to your regression.
- First find all variables in the dataset that start with `reg_`
- Add these to your last regression
```{r}
names(select(df, starts_with("reg_")))
felm_3 = felm(relchg_avg_hhincwage_pc_pw ~ d_tradeusch_pw + t2 +
l_shind_manuf_cbp + l_sh_popedu_c + l_sh_popfborn +
l_sh_empl_f + l_sh_routine33 + l_task_outsource +
reg_midatl + reg_encen + reg_wncen + reg_satl +
reg_escen + reg_wscen + reg_mount + reg_pacif
| 0 | 0 | statefip, data = df)
summary(felm_3)
```
### Instrumental Variables
1. Instrument `d_tradeusch_pw` with `d_tradeotch_pw_lag` in your last regression
```{r}
felm_4 = felm(relchg_avg_hhincwage_pc_pw ~ 1 + t2 +
l_shind_manuf_cbp + l_sh_popedu_c + l_sh_popfborn +
l_sh_empl_f + l_sh_routine33 + l_task_outsource +
reg_midatl + reg_encen + reg_wncen + reg_satl +
reg_escen + reg_wscen + reg_mount + reg_pacif
| 0 | (d_tradeusch_pw ~ d_tradeotch_pw_lag) | statefip,
data = df)
summary(felm_4)
```
2. Weight your regression by `timepwt48`.
The `felm` function is a bit picky about the order of its arguments. Let us first try to define the weights after the `data` argument, like so:
```{r, error = TRUE}
felm_5 = felm(relchg_avg_hhincwage_pc_pw ~ 1 + t2 +
l_shind_manuf_cbp + l_sh_popedu_c + l_sh_popfborn +
l_sh_empl_f + l_sh_routine33 + l_task_outsource +
reg_midatl + reg_encen + reg_wncen + reg_satl +
reg_escen + reg_wscen + reg_mount + reg_pacif
| 0 | (d_tradeusch_pw ~ d_tradeotch_pw_lag) | statefip,
data = df,
weights = timepwt48)
summary(felm_5)
```
`felm` did not find `timepwt48` because it only looks for column names inside `df` for arguments that come before `data = df`. We can solve this in two ways.
1. A good rule is to make `data = df` the last argument.
```{r, error = TRUE}
felm_5 = felm(relchg_avg_hhincwage_pc_pw ~ 1 + t2 +
l_shind_manuf_cbp + l_sh_popedu_c + l_sh_popfborn +
l_sh_empl_f + l_sh_routine33 + l_task_outsource +
reg_midatl + reg_encen + reg_wncen + reg_satl +
reg_escen + reg_wscen + reg_mount + reg_pacif
| 0 | (d_tradeusch_pw ~ d_tradeotch_pw_lag) | statefip,
weights = timepwt48,
data = df)
summary(felm_5)
```
2. Alternatively, you can define the weights after `data = df`, but then you have to refer to them as `df$timepwt48`, like so:
```{r}
felm_5 = felm(relchg_avg_hhincwage_pc_pw ~ 1 + t2 +
l_shind_manuf_cbp + l_sh_popedu_c + l_sh_popfborn +
l_sh_empl_f + l_sh_routine33 + l_task_outsource +
reg_midatl + reg_encen + reg_wncen + reg_satl +
reg_escen + reg_wscen + reg_mount + reg_pacif
| 0 | (d_tradeusch_pw ~ d_tradeotch_pw_lag) | statefip,
data = df,
weights = df$timepwt48)
summary(felm_5)
```
And now we have the numbers reported in Column 2 of Panel A of Table 9 of the paper.