In 1962 Arthur Okun regressed quarterly changes in the U.S. unemployment rate on output growth. The data for 1948q1-1960q4 revealed a strong negative relationship. The economic intuition is as follows: when output falls, firms lay off workers, and unemployment rises. This relationship is widely used in economics and is known as Okun’s law. In this exercise we replicate Okun’s original findings and check whether they hold in other countries and time periods.
First, we replicate Okun’s original findings for U.S. data over 1948q1-1960q4. To do that, we download data for GDP and unemployment into Stata, do some data cleaning, and fit a regression. For all your Stata work, we strongly advise using a do-file with a sensible name.
Before you start working with data, it is important to know where any output files (including your do-files) get saved on your computer. Such a folder is called your working directory or current directory.
If you use UU’s SolisWorkspace, you can go to File -> Change working directory, and find your local disk (and a suitable folder on your local disk). Otherwise you’ll only need to find a suitable folder on your local disk. Make sure the folder you choose has been given a sensible name so you can easily find it at a later point. You can also change your current directory by using a cd command.
. * cd C:\User\Location\FolderCourse\SubfolderCase
If you use Stata with SolisWorkspace, you need to install freduse (and any other package) every time you open Stata. Otherwise you only need to install each package once.
. * ssc install freduse
Get the series codes for real GDP and the unemployment rate by searching the FRED website: https://fred.stlouisfed.org/. We use the series GDPC96 (real GDP) and UNRATE (unemployment rate).
You can copy a series code from the URL of its FRED page and pass it to the freduse command to specify which series to download. Specifying “clear” overwrites any data currently in Stata’s memory when you run this code.
. freduse GDPC96 UNRATE, clear
(282 observations read)
(895 observations read)
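If freduse is not available on your installation, Stata 15 and later ship a built-in import fred command that can serve the same purpose. The lines below are only a sketch under two assumptions that are not part of the original exercise: YOUR_API_KEY is a placeholder for a personal FRED API key, and GDPC1 is used as the code under which FRED currently publishes real GDP (the GDPC96 series is no longer updated, which is consistent with it having fewer quarterly observations than the sample period spans).

```stata
* Sketch only: needs Stata 15+ and a personal FRED API key.
* YOUR_API_KEY is a placeholder, not a real key.
set fredkey YOUR_API_KEY, permanently

* GDPC1 is an assumption: the current real GDP series on FRED.
import fred GDPC1 UNRATE, clear

* Inspect the variables before continuing; the names of the date
* variables should be checked here rather than taken for granted.
describe
```

If you take this route, replace GDPC96 by GDPC1 in the commands that follow.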
If we browse the data, first, we see two variables indicating time: date and daten. The former is a string, the latter is numeric. We use the latter.
Second, we see that the unemployment rate is recorded monthly while GDP is recorded quarterly. We should bring both variables to the same, lower frequency, that is, quarterly. For that we first need a numeric variable that specifies time in quarters in a form Stata understands. Currently our time variable uses the first day of each month, so it is in daily format (even though the actual observations are monthly and quarterly). To find the quarter each date falls in, we can use the qofd function, which stands for “quarter of day”.
. g qt=qofd(daten)
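To see what qofd does, here is a minimal sketch on made-up dates (the six dates are illustrative, not FRED data). It also applies the %tq display format, so that the quarters show up as 2000q1 rather than as raw integers.

```stata
* Toy data: the first day of each month, January to June 2000
clear
set obs 6
generate daten = mdy(_n, 1, 2000)
format daten %td

* qofd() maps a daily date to the quarter that contains it
generate qt = qofd(daten)
format qt %tq          // display as 2000q1 instead of the integer 160
list daten qt
```

The first three rows fall in 2000q1 and the last three in 2000q2.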
Now we have our quarterly dates, but the rows of the data still correspond to months (so every quarter spans 3 rows). We can make each row correspond to a single quarter using the collapse command, specifying the grouping variable, in this case the quarterly date we calculated earlier.
Unemployment rate is currently a monthly variable. To calculate the quarterly unemployment, we take the average unemployment from the 3 months in the quarter. So we can collapse the variable UNRATE to quarterly by taking the mean.
GDP is a quarterly variable: the value for the quarter appears as the observation for the first month in the quarter, while the values for the other two months are missing. So we can collapse GDPC96 to quarterly by taking the first observation in each quarter.
. collapse (mean) UNRATE (first) GDPC96, by(qt)
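The same collapse can be tried on a small made-up dataset that mimics the layout described here: a monthly unemployment series, and a GDP value only in the first month of each quarter. The variable names u and gdp and all numbers are hypothetical.

```stata
* Toy data: 6 months, GDP observed only in the first month of a quarter
clear
input str10 date u gdp
"2000-01-01" 4.0 100
"2000-02-01" 4.1 .
"2000-03-01" 4.0 .
"2000-04-01" 3.8 102
"2000-05-01" 4.0 .
"2000-06-01" 4.0 .
end
generate daten = date(date, "YMD")
generate qt = qofd(daten)
format qt %tq

* (first) takes the first row within each quarter, so sort by date first
sort daten
collapse (mean) u (first) gdp, by(qt)
list
```

Because (first) depends on the row order, sorting by date before collapsing is a sensible precaution. collapse also offers (firstnm), which takes the first nonmissing value and does not rely on the GDP value sitting in the first month.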
Stata recognizes data as time-series data if you specify the time dimension using tsset. The optional q argument stands for “quarterly” and specifies the unit of the time variable.
. tsset qt, q
        time variable:  qt, 1947q1 to 2022q3
                delta:  1 quarter
Since Okun’s law describes a relationship between output growth and changes in the unemployment rate, we need to generate these variables. l. stands for lag: the observation from the previous time period. d. stands for difference: d.u is shorthand for u - l.u.
. g growth=(GDPC96/l.GDPC96 - 1)*100
(22 missing values generated)

. g du=d.UNRATE
(5 missing values generated)
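The lag and difference operators can be checked on a small artificial series. In this sketch y is a hypothetical variable that grows by exactly 2% per quarter, so the computed growth rate should be 2 in every quarter except the first, where no lag exists.

```stata
* Toy quarterly series growing 2% per quarter (hypothetical data)
clear
set obs 5
generate qt = tq(2000q1) + _n - 1
format qt %tq
tsset qt, quarterly

generate double y = 100 * 1.02^(_n - 1)
generate double growth = (y / l.y - 1) * 100   // percentage growth
generate double dy = d.y                       // same as y - l.y
list qt y growth dy
```

The first row has missing values for growth and dy, which is why generating such variables always reports at least one missing value.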
Regress du on growth for 1948q1-1960q4. The condition only sets the end of the sample; earlier quarters drop out automatically because du is missing through 1948q1, which leaves 51 observations.
. reg du growth if qt<tq(1961q1)

      Source │       SS           df       MS      Number of obs   =        51
─────────────┼──────────────────────────────────   F(1, 49)        =     72.86
       Model │  10.2020311         1  10.2020311   Prob > F        =    0.0000
    Residual │  6.86101944        49  .140020805   R-squared       =    0.5979
─────────────┼──────────────────────────────────   Adj R-squared   =    0.5897
       Total │  17.0630505        50  .341261011   Root MSE        =    .37419

─────────────┬────────────────────────────────────────────────────────────────
          du │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
      growth │  -.3216277   .0376796    -8.54   0.000    -.3973478   -.2459077
       _cons │   .3298173   .0618275     5.33   0.000     .2055703    .4540643
─────────────┴────────────────────────────────────────────────────────────────
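One way to read these estimates is to ask at which growth rate the fitted change in unemployment is zero, that is, to solve 0 = _cons + b*growth for growth. The sketch below does this arithmetic with the two coefficients copied by hand from the 1948-1960 output; the scalar names are hypothetical.

```stata
* Coefficients copied from the 1948-1960 regression output
scalar b_growth = -.3216277
scalar b_cons   =  .3298173

* Growth rate at which the fitted change in unemployment equals zero
scalar g_star = -b_cons / b_growth
display "Break-even quarterly growth (%): " %5.2f g_star
```

The result is roughly 1 percent per quarter: in this sample, unemployment tends to rise whenever quarterly growth falls below that rate. Directly after running the regression, nlcom -_b[_cons]/_b[growth] should give the same number together with a standard error.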
Second, we extend Okun’s original sample to 2011. We can use the same cleaned dataset as before, and only change the time period specified in the reg command as follows:
. reg du growth if qt<tq(2012q1)

      Source │       SS           df       MS      Number of obs   =       255
─────────────┼──────────────────────────────────   F(1, 253)       =    251.61
       Model │  20.0689886         1  20.0689886   Prob > F        =    0.0000
    Residual │  20.1801884       253   .07976359   R-squared       =    0.4986
─────────────┼──────────────────────────────────   Adj R-squared   =    0.4966
       Total │   40.249177       254  .158461327   Root MSE        =    .28242

─────────────┬────────────────────────────────────────────────────────────────
          du │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
      growth │  -.2855234   .0180004   -15.86   0.000     -.320973   -.2500738
       _cons │   .2491008   .0228656    10.89   0.000     .2040695     .294132
─────────────┴────────────────────────────────────────────────────────────────
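An alternative way to restrict the sample, once the data are tsset, is the tin() function, which takes the first and last period of the window. The sketch below uses an artificial quarterly calendar (no real data) to show that both ways of writing the condition select the same observations.

```stata
* Artificial calendar: 20 quarters starting in 1959q1
clear
set obs 20
generate qt = tq(1959q1) + _n - 1
format qt %tq
tsset qt, quarterly

* Both conditions select 1959q1 through 1960q4
count if qt < tq(1961q1)
count if tin(1959q1, 1960q4)
```

With tin() the regression on the extended sample could be written as reg du growth if tin(1948q1, 2011q4), which states both ends of the period explicitly.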
Third, we check whether the law holds in other countries.
For these models we use data from the World Bank databank, which we can access via the wbopendata package. Remember, if you use SolisWorkspace, you need to install the package every time you open Stata.
. * ssc install wbopendata
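Get the indicator codes for real GDP and unemployment by searching the World Bank’s website: https://data.worldbank.org/.
We use the following indicators: NY.GDP.MKTP.KD (GDP, constant 2015 US$) and SL.UEM.TOTL.NE.ZS (unemployment, total, % of total labor force, national estimate).
You can copy an indicator code from the URL of its page on the World Bank website.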
wbopendata arguments: indicator() lists the indicator codes to download, separated by semicolons; clear replaces the data currently in memory; long requests long format.
The World Bank’s default is wide format, in which the data for every year form a separate column and each row corresponds to one country-indicator combination. Long format has a variable “year” and a separate variable per indicator, so each row corresponds to one country-year combination. Long format is usually more convenient.
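The difference between the two layouts can be seen on a tiny made-up table. In this sketch the country codes AAA and BBB, the yr prefix, and all values are hypothetical; reshape long converts one row per country into one row per country-year.

```stata
* Wide layout: one row per country, one column per year
clear
input str3 countrycode yr2000 yr2001
"AAA" 1 2
"BBB" 3 4
end

* Long layout: one row per country-year
reshape long yr, i(countrycode) j(year)
rename yr value
list
```

With the long option, wbopendata delivers the data in the second layout directly, so no reshape is needed in this exercise.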
. wbopendata, indicator(NY.GDP.MKTP.KD; SL.UEM.TOTL.NE.ZS) clear long

Metadata for indicator NY.GDP.MKTP.KD
────────────────────────────────────────────────────────────────────────────────
Name: GDP (constant 2015 US$)
────────────────────────────────────────────────────────────────────────────────
Collection: 2 World Development Indicators
────────────────────────────────────────────────────────────────────────────────
Description: GDP at purchaser's prices is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources. Data are in constant 2015 prices, expressed in U.S. dollars. Dollar figures for GDP are converted from domestic currencies using 2015 official exchange rates. For a few countries where the official exchange rate does not reflect the rate effectively applied to actual foreign exchange transactions, an alternative conversion factor is used.
────────────────────────────────────────────────────────────────────────────────
Note: World Bank national accounts data, and OECD National Accounts data files.
────────────────────────────────────────────────────────────────────────────────
Topic(s): 3 Economy and Growth
────────────────────────────────────────────────────────────────────────────────

Metadata for indicator SL.UEM.TOTL.NE.ZS
────────────────────────────────────────────────────────────────────────────────
Name: Unemployment, total (% of total labor force) (national estimate)
────────────────────────────────────────────────────────────────────────────────
Collection: 2 World Development Indicators
────────────────────────────────────────────────────────────────────────────────
Description: Unemployment refers to the share of the labor force that is without work but available for and seeking employment. Definitions of labor force and unemployment differ by country.
────────────────────────────────────────────────────────────────────────────────
Note: International Labour Organization, ILOSTAT database. Data as of June 2022.
────────────────────────────────────────────────────────────────────────────────
Topic(s): 10 Social Protection and Labor
────────────────────────────────────────────────────────────────────────────────
. rename (ny_gdp_mktp_kd sl_uem_totl_ne_zs) (gdp u)
To specify the cross-sectional dimension of your panel data, you need a numeric identifier. Strings (like countryname or countrycode) can’t be used as such an identifier, so we use encode to create a numeric variable id from countryname.
. encode countryname, g(id)
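A small made-up example shows what encode produces. The country names below are only illustrative. encode numbers the distinct strings in alphabetical order and attaches the original names as value labels, so the new variable is numeric but still displays readable names.

```stata
* Toy data: a string variable with repeated values
clear
input str12 countryname
"Netherlands"
"Belgium"
"Netherlands"
end

encode countryname, generate(id)
list countryname id            // id shows the value labels
list countryname id, nolabel   // id shows the underlying integers
```

Here Belgium is stored as 1 and Netherlands as 2.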
Stata recognizes data as time-series data if you specify the time dimension using tsset. If you have panel data, you first need to specify the cross-sectional dimension (in this case id, which differentiates between countries). The optional y argument stands for “yearly” and specifies the unit of the time variable.
. tsset id year, y
       panel variable:  id (strongly balanced)
        time variable:  year, 1960 to 2021
                delta:  1 year
We create the same variables as with the FRED data.
. g growth=(gdp/l.gdp - 1)*100
(4,298 missing values generated)

. g du=d.u
(12,035 missing values generated)
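In panel data the lag operator works within each panel unit, so the first year of one country never borrows the last year of another. The following sketch with two hypothetical countries and two years illustrates this; all numbers are made up.

```stata
* Toy panel: two units, two years each
clear
input id year gdp
1 2000 100
1 2001 110
2 2000 200
2 2001 190
end
tsset id year, yearly

* l.gdp is missing in the first year of every unit
generate growth = (gdp / l.gdp - 1) * 100
list id year gdp growth
```

Growth is missing in 2000 for both units, and the 2001 value for unit 2 is computed from unit 2’s own 2000 value, not from the row directly above it in another unit.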
We regress du on growth per country, meaning we run a separate regression for every country in the sample.
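statsby runs the command specified after the colon separately for each sub-sample defined by by() and collects the output listed after statsby in a new dataset. The new dataset replaces the data in memory. We collect the following output: _b, the estimated coefficients (stored as _b_growth and _b_cons), and e(N), the number of observations used in each regression (stored as _eq2_stat_1).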
Each row in the new dataset shows results for a single country.
. statsby _b e(N), clear by(id): reg du growth
(running regress on estimation sample)

      command:  regress du growth
  _eq2_stat_1:  e(N)
           by:  id

Statsby groups
────┼─── 1 ───┼─── 2 ───┼─── 3 ───┼─── 4 ───┼─── 5
xxx..xx.x..............x......x..xx..x...x.xx...xx    50
x........xx........xx.xx.....x..x.xxx...x.....xxxx   100
x.......xx....x........xxx....x.....xxxxxx.xx...xx   150
.x..xx..x.....x...x..xx.x..x...x.x.xxx.x.x......x.   200
....x......x.x..xxx...x..x.xxxxx........xxx...xxx.   250
.......xx.x....x
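To see what the resulting dataset looks like, statsby can be tried on simulated data. Everything in this sketch is made up: two groups of 20 observations with true slopes of -0.3 and -0.5. It also names the collected sample size n, which is an optional variation on the command used in this exercise and avoids the automatic name _eq2_stat_1.

```stata
* Simulated data: two groups with different slopes (hypothetical)
clear
set seed 12345
set obs 40
generate byte id = cond(_n <= 20, 1, 2)
generate x = rnormal()
generate y = cond(id == 1, -0.3, -0.5) * x + 0.1 * rnormal()

* One regression per group; keep coefficients and sample size
statsby _b n = e(N), clear by(id): regress y x
list
```

The new dataset has one row per group, with the variables id, _b_x, _b_cons and n.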
Keep results from estimations with a sample size of at least 40.
. drop if missing(_eq2_stat_1) | _eq2_stat_1<40
(235 observations deleted)
Plot coefficients per country:
. graph hbar _b_growth, over(id, sort(1))
Suggested reading: