spreg.Probit

class spreg.Probit(y, x, w=None, optim='newton', scalem='phimean', maxiter=100, vm=False, name_y=None, name_x=None, name_w=None, name_ds=None, spat_diag=False)
Classic non-spatial Probit and spatial diagnostics. The class includes a printout that formats all the results and tests.
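The diagnostics for spatial dependence currently implemented are:
 Pinkse_error: Lagrange Multiplier test against spatial error correlation [Pin04]
 KP_error: Moran’s I type test against spatial error correlation [KP01]
 PS_error: Lagrange Multiplier test against spatial error correlation [PS98]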
 Parameters
 x : array
    nxk array of independent variables (assumed to be aligned with y)
 y : array
    nx1 array of dependent binary variable
 w : W
    PySAL weights instance aligned with y
 optim : string
    Optimization method. Default: ‘newton’ (Newton-Raphson). Alternatives: ‘ncg’ (Newton-CG), ‘bfgs’ (BFGS algorithm)
 scalem : string
    Method to calculate the scale of the marginal effects. Default: ‘phimean’ (mean of individual marginal effects). Alternative: ‘xmean’ (marginal effects at the variables’ mean)
 maxiter : int
    Maximum number of iterations until optimizer stops
 name_y : string
    Name of dependent variable for use in output
 name_x : list of strings
    Names of independent variables for use in output
 name_w : string
    Name of weights matrix for use in output
 name_ds : string
    Name of dataset for use in output
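As a rough illustration of the two scalem options described above, the sketch below computes both scale factors with the standard library only. It assumes the usual probit definitions (the scale is the standard normal density evaluated at the linear predictor); the helper names normal_pdf and probit_scale and the toy data are hypothetical and are not part of spreg.

```python
import math

def normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def probit_scale(x_rows, betas, scalem="phimean"):
    """Scale factor for probit marginal effects (hypothetical helper).

    x_rows: list of rows, each including the leading constant 1.0.
    betas:  list of coefficients, same length as each row.
    """
    if scalem == "phimean":
        # Average the density over the individual linear predictors.
        xb = [sum(x * b for x, b in zip(row, betas)) for row in x_rows]
        return sum(normal_pdf(v) for v in xb) / len(xb)
    if scalem == "xmean":
        # Evaluate the density once, at the column means of x.
        n = len(x_rows)
        means = [sum(col) / n for col in zip(*x_rows)]
        return normal_pdf(sum(m * b for m, b in zip(means, betas)))
    raise ValueError("scalem must be 'phimean' or 'xmean'")

# Hypothetical data: a constant plus one regressor.
x_rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
betas = [0.5, -0.5]
scale_phimean = probit_scale(x_rows, betas, "phimean")
scale_xmean = probit_scale(x_rows, betas, "xmean")
# Marginal effect of the regressor (the constant is excluded).
slope_phimean = scale_phimean * betas[1]
```

The two options generally give different numbers because the normal density is nonlinear, so averaging densities is not the same as taking the density at the average.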
Examples
We first need to import the needed modules, namely numpy to convert the data we read into arrays that spreg understands, and libpysal to read the data and the spatial weights.
>>> import numpy as np
>>> import libpysal
>>> np.set_printoptions(suppress=True) #prevent scientific format
Open data on Columbus neighborhood crime (49 areas) using libpysal.io.open(). This is the DBF associated with the Columbus shapefile. Note that libpysal.io.open() also reads data in CSV format; since the actual class requires data to be passed in as numpy arrays, the user can read their data in using any method.
>>> dbf = libpysal.io.open(libpysal.examples.get_path('columbus.dbf'),'r')
Extract the CRIME column (crime) from the DBF file and make it the dependent variable for the regression. Note that spreg requires this to be a numpy array of shape (n, 1), as opposed to the also common shape of (n, ) that other packages accept. Since we want to run a probit model and this example uses the Columbus data, we also need to transform the continuous CRIME variable into a binary variable. As in [McM92], we define y = 1 if CRIME > 40.
>>> y = np.array([dbf.by_col('CRIME')]).T
>>> y = (y>40).astype(float)
Extract HOVAL (home values) and INC (income) vectors from the DBF to be used as independent variables in the regression. Note that spreg requires this to be an nxj numpy array, where j is the number of independent variables (not including a constant). By default this class adds a vector of ones to the independent variables passed in.
>>> names_to_extract = ['INC', 'HOVAL']
>>> x = np.array([dbf.by_col(name) for name in names_to_extract]).T
Since we want to test the probit model for spatial dependence, we need to specify the spatial weights matrix that includes the spatial configuration of the observations into the error component of the model. To do that, we can open an already existing gal file or create a new one. In this case, we will use columbus.gal, which contains contiguity relationships between the observations in the Columbus dataset we are using throughout this example. Note that, in order to read the file and not only open it, we need to append ‘.read()’ at the end of the command.
>>> w = libpysal.io.open(libpysal.examples.get_path("columbus.gal"), 'r').read()
Unless there is a good reason not to do it, the weights should be row-standardized so that every row of the matrix sums to one. In libpysal, this can be done in the following way:
>>> w.transform='r'
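To make the effect of row-standardization concrete, here is a minimal standard-library sketch on a tiny hypothetical neighbor structure. The function row_standardize and the neighbors dictionary are illustrative assumptions and do not reflect how libpysal stores or transforms weights internally.

```python
def row_standardize(neighbors):
    """Turn a dict of neighbor lists into row-standardized weights."""
    weights = {}
    for i, neigh in neighbors.items():
        # Each neighbor gets an equal share, so the row sums to one.
        weights[i] = {j: 1.0 / len(neigh) for j in neigh} if neigh else {}
    return weights

# Hypothetical contiguity structure for four areas.
neighbors = {0: [1, 2], 1: [0], 2: [0, 1, 3], 3: [2]}
w_std = row_standardize(neighbors)
row_sums = {i: sum(row.values()) for i, row in w_std.items()}
```

After the transformation, an area with two neighbors gives each a weight of 0.5, and an area with three neighbors gives each a weight of one third.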
With the preliminaries in place, we are ready to run the model. In this case, we will need the variables and the weights matrix. If we want the names of the variables printed in the output summary, we have to pass them in as well, although this is optional.
>>> from spreg import Probit
>>> model = Probit(y, x, w=w, name_y='crime', name_x=['income','home value'], name_ds='columbus', name_w='columbus.gal')
Once we have run the model, we can explore the output. The regression object we have created has many attributes, so take your time to discover them.
>>> np.around(model.betas, decimals=6)
array([[ 3.353811],
       [-0.199653],
       [-0.029514]])
>>> np.around(model.vm, decimals=6)
array([[ 0.852814, -0.043627, -0.008052],
       [-0.043627,  0.004114, -0.000193],
       [-0.008052, -0.000193,  0.00031 ]])
Since we have provided a spatial weights matrix, the diagnostics for spatial dependence have also been computed. We can access them and their p-values individually:
>>> tests = np.array([['Pinkse_error','KP_error','PS_error']])
>>> stats = np.array([[model.Pinkse_error[0],model.KP_error[0],model.PS_error[0]]])
>>> pvalue = np.array([[model.Pinkse_error[1],model.KP_error[1],model.PS_error[1]]])
>>> print(np.hstack((tests.T,np.around(np.hstack((stats.T,pvalue.T)),6))))
[['Pinkse_error' '3.131719' '0.076783']
 ['KP_error' '1.721312' '0.085194']
 ['PS_error' '2.558166' '0.109726']]
Or we can obtain a full, formatted summary of all the results, ready to be printed, by typing ‘print(model.summary)’.
 Attributes
 x : array
    Two-dimensional array with n rows and one column for each independent (exogenous) variable, including the constant
 y : array
    nx1 array of dependent variable
 betas : array
    kx1 array with estimated coefficients
 predy : array
    nx1 array of predicted y values
 n : int
    Number of observations
 k : int
    Number of variables
 vm : array
    Variance-covariance matrix (kxk)
 z_stat : list of tuples
    z statistic; each tuple contains the pair (statistic, p-value), where each is a float
 xmean : array
    Mean of the independent variables (kx1)
 predpc : float
    Percent of y correctly predicted
 logl : float
    Log-likelihood of the estimation
 scalem : string
    Method to calculate the scale of the marginal effects.
 scale : float
    Scale of the marginal effects.
 slopes : array
    Marginal effects of the independent variables (k-1 x 1)
 slopes_vm : array
    Variance-covariance matrix of the slopes (k-1 x k-1)
 LR : tuple
    Likelihood Ratio test of all coefficients = 0 (test statistic, p-value)
 Pinkse_error : float
    Lagrange Multiplier test against spatial error correlation. Implemented as presented in [Pin04]
 KP_error : float
    Moran’s I type test against spatial error correlation. Implemented as presented in [KP01]
 PS_error : float
    Lagrange Multiplier test against spatial error correlation. Implemented as presented in [PS98]
 warning : boolean
    True if the maximum number of iterations was exceeded or the gradient and/or function calls stopped changing.
 name_y : string
    Name of dependent variable for use in output
 name_x : list of strings
    Names of independent variables for use in output
 name_w : string
    Name of weights matrix for use in output
 name_ds : string
    Name of dataset for use in output
 title : string
    Name of the regression method used
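The (statistic, p-value) pairs described for z_stat can be related to betas and vm with a short calculation. The sketch below assumes the conventional definition (coefficient divided by the square root of the matching diagonal entry of the variance-covariance matrix, with a two-sided normal p-value); the function z_stats and the input numbers are hypothetical and are not taken from the Columbus example.

```python
import math

def z_stats(betas, vm):
    """Return a list of (z statistic, two-sided p-value) tuples."""
    results = []
    for i, b in enumerate(betas):
        se = math.sqrt(vm[i][i])  # standard error from the diagonal of vm
        z = b / se
        # Two-sided p-value under the standard normal distribution.
        p = math.erfc(abs(z) / math.sqrt(2.0))
        results.append((z, p))
    return results

# Hypothetical coefficients and variance-covariance matrix.
betas = [1.0, -0.5]
vm = [[0.25, 0.0], [0.0, 0.0625]]
stats = z_stats(betas, vm)
```

With these inputs both coefficients sit two standard errors from zero, so they share the same p-value despite having opposite signs.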

__init__(y, x, w=None, optim='newton', scalem='phimean', maxiter=100, vm=False, name_y=None, name_x=None, name_w=None, name_ds=None, spat_diag=False)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(y, x[, w, optim, scalem, maxiter, …])    Initialize self.
gradient(par)
hessian(par)
ll(par)
par_est()
Attributes

property KP_error

property LR

property PS_error

property Pinkse_error

gradient(par)

hessian(par)

ll(par)

par_est()

property phiy

property predpc

property predy

property scale

property slopes

property slopes_std_err

property slopes_vm

property slopes_z_stat

property u_gen

property u_naive

property vm

property xb

property xmean

property z_stat
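The ll(par) method and the LR attribute are listed without descriptions. As a hedged illustration of what they conventionally represent in a probit model, the sketch below evaluates the standard probit log-likelihood and forms a likelihood-ratio style statistic against the constant-only model. The helper names, the toy data, and the trial coefficients are hypothetical; the trial coefficients are not fitted estimates, and none of this is taken from spreg's source.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def probit_loglik(y, x_rows, betas):
    """Probit log-likelihood: sum of y*log(Phi(xb)) + (1-y)*log(1-Phi(xb))."""
    ll = 0.0
    for yi, row in zip(y, x_rows):
        p = normal_cdf(sum(x * b for x, b in zip(row, betas)))
        ll += math.log(p) if yi == 1 else math.log(1.0 - p)
    return ll

# Hypothetical data: binary outcome, constant plus one regressor.
y = [1, 0, 1, 0]
x_rows = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 0.5]]

# Half the outcomes are 1, so the constant-only fit predicts p = 0.5,
# which corresponds to all coefficients equal to zero.
ll_null = probit_loglik(y, x_rows, [0.0, 0.0])
# Trial coefficients chosen for illustration only (not ML estimates).
ll_trial = probit_loglik(y, x_rows, [-0.25, 1.0])
lr_stat = 2.0 * (ll_trial - ll_null)
```

In an actual likelihood-ratio test the unrestricted log-likelihood would be evaluated at the maximum likelihood estimates, and the statistic would be compared with a chi-squared distribution whose degrees of freedom equal the number of slope coefficients.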