SSCECON307 Workshop II: Gravity Model

Introduction

Since the 1960s, economists have applied Newton’s law of universal gravitation to economic and social interactions, including trade: the flow between two units is directly proportional to the product of their economic sizes and inversely proportional to the distance between them. Briefly: when supply in location A and demand in location B increase, the flow of goods from A to B increases. Likewise, as the distance between A and B decreases, the flow of goods from A to B increases. In this exercise we explore the gravity model of international trade, extend it to account for geopolitical and cultural factors, and check how the effect of distance changes over time.
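In symbols, the relationship described above can be written as follows. The notation is an assumption chosen for illustration and does not come from the dataset: F_ij is the flow from origin i to destination j, M_i and M_j are the economic sizes (here, GDP) of the two countries, D_ij is the distance between them, and G is a constant.

```latex
F_{ij} = G \, \frac{M_i \, M_j}{D_{ij}}
```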

Modeling gravity

First, we build a simple gravity model. To do so, we download data on trade flows, the GDP of the origin and destination countries, and the distance between them, transform the data, and fit a regression. For all your Stata work, we strongly advise using a do-file with a sensible name.

Set working directory

Before you start working with data, it is important to know where any output files (including your do-files) will be saved on your computer. Such a folder is called your working directory or current directory.

If you use UU’s SolisWorkspace, you can go to File -> Change working directory, and find your local disk (and a suitable folder on your local disk). Otherwise you’ll only need to find a suitable folder on your local disk. Make sure the folder you choose has been given a sensible name, so you can easily find it at a later point.

. * cd C:\User\Location\FolderCourse\SubfolderCase

Download data

For this exercise you will need to download the data file from here: https://www.dropbox.com/s/6apelk086izbenc/col_regfile09.dta?dl=0. If you would like to explore an extended and updated dataset, you will find one on the CEPII website: http://www.cepii.fr/cepii/en/bdd_modele/bdd_modele.asp. Stata can only load the data from a location it can access, so save the file in the working directory you set above.

. use col_regfile09, clear

Using loops

Following Head (2003), we estimate the gravity equation in logarithmic form. To do that, we take the natural logarithm of the following variables: trade flow between countries (flow), GDP of the origin (gdp_o) and destination (gdp_d) countries, and distance (dist). To apply the same transformation to several variables automatically, we use a loop. A loop makes Stata execute the commands inside the braces once for each element listed before the braces. Here, for each variable listed, Stata generates a new variable holding the natural logarithm of that variable for every observation.

. foreach x in flow gdp_o gdp_d dist {
  2.            gen log`x'=log(`x')
  3. }
(495,098 missing values generated)
(171,252 missing values generated)
(119,822 missing values generated)
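The "missing values generated" messages can be understood with a small sketch. It uses made-up toy data rather than the workshop file, so the numbers are purely illustrative; only the variable names flow and gdp_o are borrowed from the exercise.

```stata
* Toy data standing in for the workshop file (values are made up)
clear
input flow gdp_o
100 2000
0   1500
.   3000
end

* Same loop pattern as above: one new log variable per listed variable
foreach x in flow gdp_o {
    generate log`x' = log(`x')
}

* log() returns missing when its argument is zero or missing,
* so rows 2 and 3 have a missing logflow
count if missing(logflow)
list
```

Observations with a missing log are dropped from any regression that uses that variable. If you prefer Stata to check that every listed name is an existing variable, foreach x of varlist flow gdp_o is an alternative form of the same loop.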

Model fitting

Previously, we used simple regression models to illustrate Okun’s law. In this exercise, we extend our methods to multiple regression by regressing the log of flow on more than one explanatory variable (the logs of gdp_o, gdp_d, and dist).

. reg log*

      Source │       SS           df       MS      Number of obs   =   624,145
─────────────┼──────────────────────────────────   F(3, 624141)    >  99999.00
       Model │  3855252.63         3  1285084.21   Prob > F        =    0.0000
    Residual │   3713093.1   624,141  5.94912544   R-squared       =    0.5094
─────────────┼──────────────────────────────────   Adj R-squared   =    0.5094
       Total │  7568345.73   624,144  12.1259609   Root MSE        =    2.4391

─────────────┬────────────────────────────────────────────────────────────────
     logflow │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
    loggdp_o │   .8713428   .0013879   627.80   0.000     .8686225    .8740631
    loggdp_d │   .7062829   .0013229   533.88   0.000       .70369    .7088758
     logdist │  -1.198455   .0037274  -321.52   0.000     -1.20576   -1.191149
       _cons │  -4.172614    .035314  -118.16   0.000    -4.241828   -4.103399
─────────────┴────────────────────────────────────────────────────────────────

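The asterisk in log* tells Stata to include all variables whose names begin with “log”, in the order in which they appear in the dataset. Stata treats the first variable in the list as the dependent variable. Because logflow was created first in the loop, the model has logflow as the dependent variable and three independent variables: loggdp_o, loggdp_d, and logdist.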
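Written out, the regression above corresponds to the log-linear equation below. The symbols are notation assumed for illustration (F_ij for flow, M_i and M_j for the GDP of origin and destination, D_ij for distance); year subscripts are omitted because the regression pools all years. The second line plugs in the coefficients from the table, rounded to two decimals.

```latex
\ln F_{ij} = \beta_0 + \beta_1 \ln M_i + \beta_2 \ln M_j + \beta_3 \ln D_{ij} + \varepsilon_{ij}

\widehat{\ln F_{ij}} = -4.17 + 0.87 \ln M_i + 0.71 \ln M_j - 1.20 \ln D_{ij}
```

Because every variable is in logs, the slope coefficients can be read as elasticities: in this sample, a 1% greater distance is associated with roughly 1.2% less trade, holding the two GDPs fixed.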

Saving output

Install outreg2

If you use Stata with SolisWorkspace, you need to install outreg2 (and any other package) every time you open Stata. Otherwise you only need to install each package once.

. ssc install outreg2
checking outreg2 consistency and verifying not already installed...
all files already exist and are up to date.

Export output table

Export the regression output to the working directory you set above. The option word saves the output as a file that Word can open (gravity.rtf), while replace overwrites an existing file with the same name.

. outreg2 using gravity, word replace
gravity.rtf
dir : seeout

Continuing to an advanced gravity model

Second, we extend the model above by controlling for the presence of a shared border, a common official language, colonial history, regional trade agreements, GATT/WTO membership, and a common currency.

. reg log* contig comlang_off col_hist rta gatt_o gatt_d comcur

      Source │       SS           df       MS      Number of obs   =   624,145
─────────────┼──────────────────────────────────   F(10, 624134)   =  68672.82
       Model │  3964868.57        10  396486.857   Prob > F        =    0.0000
    Residual │  3603477.16   624,134  5.77356331   R-squared       =    0.5239
─────────────┼──────────────────────────────────   Adj R-squared   =    0.5239
       Total │  7568345.73   624,144  12.1259609   Root MSE        =    2.4028

─────────────┬────────────────────────────────────────────────────────────────
     logflow │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
    loggdp_o │   .8739599   .0014697   594.67   0.000     .8710794    .8768404
    loggdp_d │   .7202029   .0014101   510.75   0.000     .7174392    .7229667
     logdist │  -1.017497   .0043098  -236.09   0.000    -1.025945    -1.00905
      contig │   .6663002    .019088    34.91   0.000     .6288885     .703712
 comlang_off │   .4030361   .0085301    47.25   0.000     .3863174    .4197549
    col_hist │   1.784972   .0198994    89.70   0.000      1.74597    1.823974
         rta │   .6136453   .0155947    39.35   0.000     .5830801    .6442105
      gatt_o │  -.0629879   .0071939    -8.76   0.000    -.0770877   -.0488882
      gatt_d │  -.2496916   .0071023   -35.16   0.000    -.2636119   -.2357714
      comcur │   .8100091   .0261462    30.98   0.000     .7587634    .8612548
       _cons │  -5.847367   .0400869  -145.87   0.000    -5.925936   -5.768798
─────────────┴────────────────────────────────────────────────────────────────

. outreg2 using gravity, word
gravity.rtf
dir : seeout

Open the gravity.rtf file in Word to see what outreg2 does.
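The coefficients on the dummy variables in the extended model need a small extra step to interpret. Since the dependent variable is in logs, a dummy coefficient b implies a proportional difference in trade of exp(b) - 1. The sketch below simply copies two coefficients from the regression table above; it is a reading aid, not part of the original exercise.

```stata
* Coefficient on contig (shared border), copied from the table above
display exp(.6663002) - 1

* Coefficient on comcur (common currency), copied from the table above
display exp(.8100091) - 1
```

The results are roughly 0.95 and 1.25: in this sample, country pairs that share a border trade about 95% more, and pairs with a common currency about 125% more, than otherwise similar pairs.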

Check how the effect of distance evolves over time

We now regress the log of flow on the logs of origin GDP, destination GDP, and distance separately for each year. This means we need to run as many regressions as there are years in the sample.

statsby runs the command specified after the colon separately for each sub-sample defined in by() and collects the output named after statsby in a new dataset. With the clear option, the new dataset replaces the data in memory. Here we collect the estimated coefficients (_b), which statsby stores in variables such as _b_logdist. Each row in the new dataset shows results for a single year.

. statsby _b, clear by(year): reg log*
(running regress on estimation sample)

      command:  regress log*
           by:  year

Statsby groups
────┼─── 1 ───┼─── 2 ───┼─── 3 ───┼─── 4 ───┼─── 5 
x.................................................    50
.........
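To see what the dataset produced by statsby looks like, here is a minimal sketch on Stata's bundled auto dataset. It is a stand-in chosen so the example runs on its own; the variables price, weight, mpg, and foreign belong to that dataset, not to the trade data.

```stata
* Stand-in example on Stata's bundled auto dataset (not the trade data)
sysuse auto, clear

* One regression per value of foreign; keep only the coefficients
statsby _b, by(foreign) clear: regress price weight mpg

* The data in memory are now one row per group, with the variables
* foreign, _b_weight, _b_mpg, and _b_cons
describe
list
```

Because clear discards the original data, you would need to reload col_regfile09 and recreate the log variables to run further regressions on the trade data. Wrapping the statsby step in preserve and restore, or using the saving() option of statsby instead of clear, are possible ways to avoid this.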

Plot the effect of distance on flow over time

We can visualise how the effect of distance on flow (the yearly coefficient on logdist, stored in _b_logdist) changes over time using a time-series line graph. Stata recognises data as time-series data once you specify the time dimension using tsset.

. tsset year, yearly
        time variable:  year, 1948 to 2006
                delta:  1 year

Now, we can plot a graph of the effect of distance between origin and destination countries on trade flow over time.

. tsline _b_logdist

Suggested reading: