Econometrics

There are three main sets of tools for econometrics: statsmodels, QuantEcon, and (for Bayesians) Stan.

Stats with StatsModels

statsmodels is the go-to library for doing econometrics (linear regression, logit regression, etc.).

You can find a good tutorial here, and a brand new book built around statsmodels here (with lots of example code here).

The most important things are also covered on the statsmodels page here, especially the pages on OLS here and here.

(If you want to do machine learning, by the way, or want to know the difference between statsmodels and the machine learning library scikit-learn, head over to Machine Learning with scikit-learn.)

Here are some simple illustrative examples of standard OLS:

# Load pandas and statsmodels
In [1]: import pandas as pd

In [2]: import statsmodels.formula.api as smf

# Load a csv dataset of World Development Indicators
In [3]: my_data = pd.read_csv('wdi_indicators.csv')

# Look at first three lines
In [4]: my_data.head(3)
Out[4]: 
   year country_name country_code   gdp_per_cap  literacy_rate  \
0  2011  Afghanistan          AFG   1712.588720      31.741117   
1  2011      Albania          ALB   9640.130216      96.845299   
2  2011      Algeria          DZA  12964.827210            NaN   

   life_expectancy  population_density  region  
0        60.065366           44.127634     NaN  
1        77.163220          106.013869     NaN  
2        70.751683           15.416096     NaN  

# OLS
In [5]: results = smf.ols('life_expectancy ~ population_density + gdp_per_cap',
   ...:                   data=my_data).fit()
   ...: 

In [6]: print(results.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:        life_expectancy   R-squared:                       0.378
Model:                            OLS   Adj. R-squared:                  0.372
Method:                 Least Squares   F-statistic:                     65.10
Date:                Sun, 07 Aug 2016   Prob (F-statistic):           8.23e-23
Time:                        09:27:24   Log-Likelihood:                -734.50
No. Observations:                 217   AIC:                             1475.
Df Residuals:                     214   BIC:                             1485.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
======================================================================================
                         coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------
Intercept             65.0441      0.660     98.610      0.000        63.744    66.344
population_density    -0.0008      0.000     -2.033      0.043        -0.002 -2.38e-05
gdp_per_cap            0.0003   2.75e-05     11.023      0.000         0.000     0.000
==============================================================================
Omnibus:                       44.979   Durbin-Watson:                   2.081
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               66.635
Skew:                          -1.226   Prob(JB):                     3.39e-15
Kurtosis:                       4.166   Cond. No.                     3.55e+04
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.55e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

# Categorical variables are easy
# Make a categorical variable
In [7]: my_data['low_income'] = my_data['gdp_per_cap'] < 4000

In [8]: results2 = smf.ols('life_expectancy ~ population_density + gdp_per_cap + C(low_income)', data=my_data).fit()

In [9]: print(results2.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:        life_expectancy   R-squared:                       0.580
Model:                            OLS   Adj. R-squared:                  0.574
Method:                 Least Squares   F-statistic:                     97.92
Date:                Sun, 07 Aug 2016   Prob (F-statistic):           7.27e-40
Time:                        09:27:24   Log-Likelihood:                -692.02
No. Observations:                 217   AIC:                             1392.
Df Residuals:                     213   BIC:                             1406.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
=========================================================================================
                            coef    std err          t      P>|t|      [95.0% Conf. Int.]
-----------------------------------------------------------------------------------------
Intercept                69.7352      0.715     97.543      0.000        68.326    71.144
C(low_income)[T.True]   -10.6282      1.052    -10.103      0.000       -12.702    -8.555
population_density       -0.0003      0.000     -0.920      0.358        -0.001     0.000
gdp_per_cap               0.0002   2.57e-05      7.022      0.000         0.000     0.000
==============================================================================
Omnibus:                       75.439   Durbin-Watson:                   2.145
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              205.379
Skew:                          -1.527   Prob(JB):                     2.53e-45
Kurtosis:                       6.659   Cond. No.                     7.67e+04
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.67e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

# Heteroskedasticity-robust standard errors (the default robust type is HC1)
In [10]: results2_robust = results2.get_robustcov_results()

In [11]: print(results2_robust.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:        life_expectancy   R-squared:                       0.580
Model:                            OLS   Adj. R-squared:                  0.574
Method:                 Least Squares   F-statistic:                     83.84
Date:                Sun, 07 Aug 2016   Prob (F-statistic):           7.43e-36
Time:                        09:27:24   Log-Likelihood:                -692.02
No. Observations:                 217   AIC:                             1392.
Df Residuals:                     213   BIC:                             1406.
Df Model:                           3                                         
Covariance Type:                  HC1                                         
=========================================================================================
                            coef    std err          t      P>|t|      [95.0% Conf. Int.]
-----------------------------------------------------------------------------------------
Intercept                69.7352      0.918     75.969      0.000        67.926    71.545
C(low_income)[T.True]   -10.6282      1.203     -8.832      0.000       -13.000    -8.256
population_density       -0.0003      0.000     -0.851      0.396        -0.001     0.000
gdp_per_cap               0.0002   3.96e-05      4.564      0.000         0.000     0.000
==============================================================================
Omnibus:                       75.439   Durbin-Watson:                   2.145
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              205.379
Skew:                          -1.527   Prob(JB):                     2.53e-45
Kurtosis:                       6.659   Cond. No.                     7.67e+04
==============================================================================

Warnings:
[1] Standard Errors are heteroscedasticity robust (HC1)
[2] The condition number is large, 7.67e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

# Output to LaTeX
In [12]: latex = results2_robust.summary().as_latex()

In [13]: latex
Out[13]: '\\begin{center}\n\\begin{tabular}{lclc}\n\\toprule\n\\textbf{Dep. Variable:}        & life_expectancy  & \\textbf{  R-squared:         } &     0.580   \\\\\n\\textbf{Model:}                &       OLS        & \\textbf{  Adj. R-squared:    } &     0.574   \\\\\n\\textbf{Method:}               &  Least Squares   & \\textbf{  F-statistic:       } &     83.84   \\\\\n\\textbf{Date:}                 & Sun, 07 Aug 2016 & \\textbf{  Prob (F-statistic):} &  7.43e-36   \\\\\n\\textbf{Time:}                 &     09:27:24     & \\textbf{  Log-Likelihood:    } &   -692.02   \\\\\n\\textbf{No. Observations:}     &         217      & \\textbf{  AIC:               } &     1392.   \\\\\n\\textbf{Df Residuals:}         &         213      & \\textbf{  BIC:               } &     1406.   \\\\\n\\textbf{Df Model:}             &           3      & \\textbf{                     } &             \\\\\n\\bottomrule\n\\end{tabular}\n\\begin{tabular}{lccccc}\n                               & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$>$$|$t$|$} & \\textbf{[95.0\\% Conf. 
Int.]}  \\\\\n\\midrule\n\\textbf{Intercept}             &      69.7352  &        0.918     &    75.969  &         0.000        &        67.926    71.545       \\\\\n\\textbf{C(low_income)[T.True]} &     -10.6282  &        1.203     &    -8.832  &         0.000        &       -13.000    -8.256       \\\\\n\\textbf{population_density}    &      -0.0003  &        0.000     &    -0.851  &         0.396        &        -0.001     0.000       \\\\\n\\textbf{gdp_per_cap}           &       0.0002  &     3.96e-05     &     4.564  &         0.000        &         0.000     0.000       \\\\\n\\bottomrule\n\\end{tabular}\n\\begin{tabular}{lclc}\n\\textbf{Omnibus:}       & 75.439 & \\textbf{  Durbin-Watson:     } &    2.145  \\\\\n\\textbf{Prob(Omnibus):} &  0.000 & \\textbf{  Jarque-Bera (JB):  } &  205.379  \\\\\n\\textbf{Skew:}          & -1.527 & \\textbf{  Prob(JB):          } & 2.53e-45  \\\\\n\\textbf{Kurtosis:}      &  6.659 & \\textbf{  Cond. No.          } & 7.67e+04  \\\\\n\\bottomrule\n\\end{tabular}\n%\\caption{OLS Regression Results}\n\\end{center}'

# Save to disk
In [14]: with open("regression_table.tex", "w") as text_file:
   ....:     text_file.write(latex)
   ....: 
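
The session above depends on a local file, wdi_indicators.csv, so it cannot be rerun without that data. As a rough, self-contained sketch of the same workflow, the script below builds a synthetic stand-in dataset (the numbers are made up for illustration and are not World Development Indicators), fits the same formula, and compares nonrobust and HC1 standard errors side by side. The names `toy` and `comparison` are hypothetical. It requests the robust covariance directly in `fit()` rather than through `get_robustcov_results()`; `use_t=True` is passed on the assumption that you want t-based inference, as in the summaries above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the WDI data (values are invented for illustration)
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "gdp_per_cap": rng.uniform(500, 50000, size=n),
    "population_density": rng.uniform(1, 500, size=n),
})
toy["low_income"] = toy["gdp_per_cap"] < 4000

# Noise whose spread grows with income, so the errors are heteroskedastic
noise = rng.normal(0, 1, size=n) * (1 + toy["gdp_per_cap"] / 10000)
toy["life_expectancy"] = (
    65 + 0.0003 * toy["gdp_per_cap"] - 8 * toy["low_income"] + noise
)

formula = "life_expectancy ~ population_density + gdp_per_cap + C(low_income)"
model = smf.ols(formula, data=toy)

plain = model.fit()                               # default (nonrobust) errors
robust = model.fit(cov_type="HC1", use_t=True)    # heteroskedasticity-robust

# Point estimates are identical; only the standard errors differ
comparison = pd.DataFrame({
    "coef": plain.params,
    "se_nonrobust": plain.bse,
    "se_HC1": robust.bse,
})
print(comparison)
```

Because `params` and `bse` are pandas Series indexed by term name, collecting them in a DataFrame like this is often a convenient starting point for building a custom results table instead of parsing the printed summary.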

QuantEcon

QuantEcon is a new library specifically for economists, with some tools not found in statsmodels. A full index is here.

PyStan

PyStan is the Python interface for the Stan library – a set of tools for statisticians, especially Bayesians. You can find resources on Stan in general here, and PyStan in particular here.