spreg.OLS

class spreg.OLS(y, x, w=None, robust=None, gwk=None, sig2n_k=True, nonspat_diag=True, spat_diag=False, moran=False, white_test=False, vm=False, name_y=None, name_x=None, name_w=None, name_gwk=None, name_ds=None)

Ordinary least squares with results and diagnostics.

Parameters

y (array): nx1 array for dependent variable
x (array): Two dimensional array with n rows and one column for each independent (exogenous) variable, excluding the constant
w (pysal W object): Spatial weights object (required if running spatial diagnostics)
robust (string): If ‘white’, then a White consistent estimator of the variance-covariance matrix is given. If ‘hac’, then a HAC consistent estimator of the variance-covariance matrix is given. Default set to None.
gwk (pysal W object): Kernel spatial weights needed for HAC estimation. Note: matrix must have ones along the main diagonal.
sig2n_k (boolean): If True, then use n-k to estimate sigma^2. If False, use n.
nonspat_diag (boolean): If True, then compute non-spatial diagnostics on the regression.
spat_diag (boolean): If True, then compute Lagrange multiplier tests (requires w). Note: see moran for further tests.
moran (boolean): If True, compute Moran’s I on the residuals. Note: requires spat_diag=True.
white_test (boolean): If True, compute White’s specification robust test (requires nonspat_diag=True).
vm (boolean): If True, include variance-covariance matrix in summary results
name_y (string): Name of dependent variable for use in output
name_x (list of strings): Names of independent variables for use in output
name_w (string): Name of weights matrix for use in output
name_gwk (string): Name of kernel weights matrix for use in output
name_ds (string): Name of dataset for use in output
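
The effect of sig2n_k can be checked by hand. The sketch below is not spreg code; it simply takes the sum of squared residuals, n and k reported in the Columbus example further down this page and applies the two denominators.

```python
# Values copied from the Columbus example summary on this page.
utu = 10647.015   # sum of squared residuals
n = 49            # number of observations
k = 3             # estimated coefficients, including the constant

sig2_n_k = utu / (n - k)   # denominator used when sig2n_k=True
sig2_n = utu / n           # denominator used when sig2n_k=False

print(round(sig2_n_k, 3))  # compare with "Sigma-square" in the summary
print(round(sig2_n, 3))    # compare with "Sigma-square ML" in the summary
```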

Examples

>>> import numpy as np
>>> import libpysal
>>> from spreg import OLS

Open data on Columbus neighborhood crime (49 areas) using libpysal.io.open(). This is the DBF associated with the Columbus shapefile. Note that libpysal.io.open() also reads data in CSV format. Because the OLS class only requires that data be passed in as numpy arrays, users can read their data in using any method.

>>> db = libpysal.io.open(libpysal.examples.get_path('columbus.dbf'),'r')

Extract the HOVAL column (home values) from the DBF file and make it the dependent variable for the regression. Note that PySAL requires this to be an nx1 numpy array.

>>> hoval = db.by_col("HOVAL")
>>> y = np.array(hoval)
>>> y.shape = (len(hoval), 1)

Extract CRIME (crime) and INC (income) vectors from the DBF to be used as independent variables in the regression. Note that PySAL requires this to be an nxj numpy array, where j is the number of independent variables (not including a constant). spreg.OLS adds a vector of ones to the independent variables passed in.

>>> X = []
>>> X.append(db.by_col("INC"))
>>> X.append(db.by_col("CRIME"))
>>> X = np.array(X).T
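
An equivalent way to build the two arrays uses reshape and numpy.column_stack. The short lists below are made-up stand-ins for the DBF columns (illustrative values, not Columbus data), so the sketch runs without libpysal.

```python
import numpy as np

# Made-up stand-ins for db.by_col("HOVAL"), db.by_col("INC"), db.by_col("CRIME").
hoval = [80.5, 44.6, 26.4, 33.2]
inc = [19.5, 21.2, 16.0, 4.5]
crime = [15.7, 18.8, 30.6, 32.4]

y = np.array(hoval).reshape(-1, 1)   # n x 1 dependent variable
X = np.column_stack([inc, crime])    # n x j, one column per regressor, no constant

print(y.shape, X.shape)
```

The column order of X determines the order of the coefficients, so name_x should list the names in the same order.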

The minimum parameters needed to run an ordinary least squares regression are the two numpy arrays containing the dependent variable and the independent variables, respectively. To make the printed results more meaningful, the user can pass in explicit names for the variables used; this is optional.

>>> ols = OLS(y, X, name_y='home value', name_x=['income','crime'], name_ds='columbus', white_test=True)

spreg.OLS computes the regression coefficients and their standard errors, t-stats and p-values. It also computes a large battery of diagnostics on the regression. In this example we also request White’s test, which is not computed by default, by passing white_test=True. All of these results can be accessed individually as attributes of the regression object created by running spreg.OLS. They can also be accessed all at once by printing the summary attribute of the regression object. In the example below, the parameter on crime is -0.4849, with a t-statistic of -2.6544 and p-value of 0.01087.

>>> ols.betas
array([[46.42818268],
       [ 0.62898397],
       [-0.48488854]])
>>> print(round(ols.t_stat[2][0],3))
-2.654
>>> print(round(ols.t_stat[2][1],3))
0.011
>>> print(round(ols.r2,3))
0.35
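
Because t_stat holds one (statistic, p-value) tuple per coefficient, in the same order as betas, it can be paired with variable names. The list below is a hand-written stand-in copied from the values in the summary output on this page, not the result of calling spreg, and the 0.05 threshold is only an illustrative choice.

```python
# Hand-copied stand-in for ols.t_stat: one (t-statistic, p-value) tuple per coefficient.
t_stat = [
    (3.5194844, 0.0009867),    # CONSTANT
    (1.1736736, 0.2465669),    # income
    (-2.6544086, 0.0108745),   # crime
]
names = ["CONSTANT", "income", "crime"]

# Keep the variables whose p-value falls below the chosen threshold.
significant = [name for name, (t, p) in zip(names, t_stat) if p < 0.05]
print(significant)
```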

Or we can easily obtain a full summary of all the results nicely formatted and ready to be printed:

>>> print(ols.summary)
REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :    columbus
Weights matrix      :        None
Dependent Variable  :  home value                Number of Observations:          49
Mean dependent var  :     38.4362                Number of Variables   :           3
S.D. dependent var  :     18.4661                Degrees of Freedom    :          46
R-squared           :      0.3495
Adjusted R-squared  :      0.3212
Sum squared residual:   10647.015                F-statistic           :     12.3582
Sigma-square        :     231.457                Prob(F-statistic)     :   5.064e-05
S.E. of regression  :      15.214                Log likelihood        :    -201.368
Sigma-square ML     :     217.286                Akaike info criterion :     408.735
S.E of regression ML:     14.7406                Schwarz criterion     :     414.411

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     t-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      46.4281827      13.1917570       3.5194844       0.0009867
              income       0.6289840       0.5359104       1.1736736       0.2465669
               crime      -0.4848885       0.1826729      -2.6544086       0.0108745
------------------------------------------------------------------------------------

REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER           12.538

TEST ON NORMALITY OF ERRORS
TEST                             DF        VALUE           PROB
Jarque-Bera                       2          39.706           0.0000

DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST                             DF        VALUE           PROB
Breusch-Pagan test                2           5.767           0.0559
Koenker-Bassett test              2           2.270           0.3214

SPECIFICATION ROBUST TEST
TEST                             DF        VALUE           PROB
White                             5           2.906           0.7145
================================ END OF REPORT =====================================
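
Several entries in the report can be reproduced from others using textbook formulas. The sketch below assumes the standard definitions (they are not taken from the spreg source) and uses the rounded values printed above, so the results agree with the report only to about two or three decimals.

```python
import math

n, k = 49, 3
r2 = 0.3495          # R-squared
sig2 = 231.457       # Sigma-square
logll = -201.368     # Log likelihood

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)        # Adjusted R-squared
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))   # F-statistic
se_reg = math.sqrt(sig2)                         # S.E. of regression
aic = -2 * logll + 2 * k                         # Akaike info criterion
schwarz = -2 * logll + k * math.log(n)           # Schwarz criterion

print(adj_r2, f_stat, se_reg, aic, schwarz)
```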

If the optional parameter w is passed to spreg.OLS together with spat_diag=True, spatial diagnostics are also computed for the regression. These include the Lagrange multiplier tests and, when moran=True is also set, Moran’s I of the residuals. The w parameter is a PySAL spatial weights object. In this example, w is built directly from the shapefile columbus.shp, but it can also be read from a GAL or GWT file. Here a rook contiguity weights matrix is built; PySAL also offers queen contiguity, distance weights and k-nearest-neighbor weights, among others. In the example, Moran’s I of the residuals is 0.204, with a standardized value of 2.592 and a p-value of 0.0095.

>>> w = libpysal.weights.Rook.from_shapefile(libpysal.examples.get_path("columbus.shp"))
>>> ols = OLS(y, X, w, spat_diag=True, moran=True, name_y='home value', name_x=['income','crime'], name_ds='columbus')
>>> ols.betas
array([[46.42818268],
       [ 0.62898397],
       [-0.48488854]])

>>> print(round(ols.moran_res[0],3))
0.204
>>> print(round(ols.moran_res[1],3))
2.592
>>> print(round(ols.moran_res[2],4))
0.0095
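
The reported p-value is consistent with a two-sided test of the standardized statistic against a standard normal distribution. That reading is an assumption checked only against the numbers above, not against the spreg source; the sketch uses math.erfc from the standard library.

```python
import math

z = 2.592  # standardized Moran's I from the example above (rounded)

# Two-sided p-value under a standard normal:
# 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
print(round(p_two_sided, 4))
```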

Attributes

summary (string): Summary of regression results and diagnostics (note: use in conjunction with the print command)
betas (array): kx1 array of estimated coefficients
u (array): nx1 array of residuals
predy (array): nx1 array of predicted y values
n (integer): Number of observations
k (integer): Number of variables for which coefficients are estimated (including the constant)
y (array): nx1 array for dependent variable
x (array): Two dimensional array with n rows and one column for each independent (exogenous) variable, including the constant
robust (string): Adjustment for robust standard errors
mean_y (float): Mean of dependent variable
std_y (float): Standard deviation of dependent variable
vm (array): Variance covariance matrix (kxk)
r2 (float): R squared
ar2 (float): Adjusted R squared
utu (float): Sum of squared residuals
sig2 (float): Sigma squared used in computations
sig2ML (float): Sigma squared (maximum likelihood)
f_stat (tuple): Statistic (float), p-value (float)
logll (float): Log likelihood
aic (float): Akaike information criterion
schwarz (float): Schwarz information criterion
std_err (array): 1xk array of standard errors of the betas
t_stat (list of tuples): t statistic; each tuple contains the pair (statistic, p-value), where each is a float
mulColli (float): Multicollinearity condition number
jarque_bera (dictionary): ‘jb’: Jarque-Bera statistic (float); ‘pvalue’: p-value (float); ‘df’: degrees of freedom (int)
breusch_pagan (dictionary): ‘bp’: Breusch-Pagan statistic (float); ‘pvalue’: p-value (float); ‘df’: degrees of freedom (int)
koenker_bassett (dictionary): ‘kb’: Koenker-Bassett statistic (float); ‘pvalue’: p-value (float); ‘df’: degrees of freedom (int)
white (dictionary): ‘wh’: White statistic (float); ‘pvalue’: p-value (float); ‘df’: degrees of freedom (int)
lm_error (tuple): Lagrange multiplier test for spatial error model; tuple contains the pair (statistic, p-value), where each is a float
lm_lag (tuple): Lagrange multiplier test for spatial lag model; tuple contains the pair (statistic, p-value), where each is a float
rlm_error (tuple): Robust Lagrange multiplier test for spatial error model; tuple contains the pair (statistic, p-value), where each is a float
rlm_lag (tuple): Robust Lagrange multiplier test for spatial lag model; tuple contains the pair (statistic, p-value), where each is a float
lm_sarma (tuple): Lagrange multiplier test for spatial SARMA model; tuple contains the pair (statistic, p-value), where each is a float
moran_res (tuple): Moran’s I for the residuals; tuple containing the triple (Moran’s I, standardized Moran’s I, p-value)
name_y (string): Name of dependent variable for use in output
name_x (list of strings): Names of independent variables for use in output
name_w (string): Name of weights matrix for use in output
name_gwk (string): Name of kernel weights matrix for use in output
name_ds (string): Name of dataset for use in output
title (string): Name of the regression method used
sig2n (float): Sigma squared (computed with n in the denominator)
sig2n_k (float): Sigma squared (computed with n-k in the denominator)
xtx (array): kxk array \(X'X\)
xtxi (array): kxk array \((X'X)^{-1}\)
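
Several of these attributes are linked by the usual OLS algebra. The following sketch uses plain NumPy on made-up data; it is not how spreg is implemented, and the variable names merely mirror the attribute names for readability. P-values and diagnostics are omitted.

```python
import numpy as np

# Made-up data: two regressors plus noise (not the Columbus data).
rng = np.random.default_rng(0)
n = 30
x_raw = rng.normal(size=(n, 2))
y = 1.0 + x_raw @ np.array([[2.0], [-0.5]]) + rng.normal(scale=0.1, size=(n, 1))

x = np.hstack([np.ones((n, 1)), x_raw])   # prepend the constant column
k = x.shape[1]

xtx = x.T @ x                             # X'X, k x k
xtxi = np.linalg.inv(xtx)                 # (X'X)^{-1}, k x k
betas = xtxi @ x.T @ y                    # k x 1 coefficients
predy = x @ betas                         # n x 1 fitted values
u = y - predy                             # n x 1 residuals
utu = (u.T @ u).item()                    # sum of squared residuals
sig2 = utu / (n - k)                      # sigma squared with n-k denominator
vm = sig2 * xtxi                          # k x k variance-covariance matrix
std_err = np.sqrt(np.diag(vm))            # standard errors of the betas
t_values = betas.ravel() / std_err        # t statistics
r2 = 1 - utu / ((y - y.mean()) ** 2).sum()

print(betas.ravel(), std_err, r2)
```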

__init__(y, x, w=None, robust=None, gwk=None, sig2n_k=True, nonspat_diag=True, spat_diag=False, moran=False, white_test=False, vm=False, name_y=None, name_x=None, name_w=None, name_gwk=None, name_ds=None): Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(y, x[, w, robust, gwk, sig2n_k, …])

Initialize self.

Attributes

`mean_y`
`sig2n`
`sig2n_k`
`std_y`
`utu`
`vm`

property mean_y

property sig2n

property sig2n_k

property std_y

property utu

property vm