Cointegration Test using Stata

This post gives a short answer to a common question about cointegration testing in Stata. The question appears often in social media groups: what is a cointegration test, and how can we test for cointegration using Stata? The answer is not entirely simple, but it is not difficult either. The first thing to settle is the type of data at hand, so the answer is split into two sections. The first lists the commands available for cointegration tests on time series data, and the second lists the commands available for panel data.

Time Series Cointegration test using Stata

bayerhanck
'BAYERHANCK': module to compute test for non-cointegration. bayerhanck produces a joint test-statistic for the null of no-cointegration based on Engle-Granger, Johansen maximum eigenvalue, Boswijk, and Banerjee tests.

egranger
'EGRANGER': module to perform Engle-Granger cointegration tests and 2-step ECM estimation. egranger conducts tests for cointegration proposed by Engle and Granger (1987), reporting test statistics plus critical values calculated by MacKinnon (1990, 2010). egranger will also estimate an ...

ghansen
'GHANSEN': module to perform Gregory-Hansen test for cointegration with regime shifts. ghansen performs the Gregory-Hansen test for cointegration with regime shifts (structural breaks) proposed in Gregory and Hansen (1996). The test's null hypothesis is no cointegration against the ...

johans
'JOHANS': module to perform Johansen-Juselius ML estimates of cointegration. This command is an updated version of mlcoint, originally written by Heinecke and Morris, part of the tslib (Stata 5) time-series package, the use of which is somewhat problematic under Stata 6 or 7.

lmeg
'LMEG': module to compute Augmented Engle-Granger Cointegration Test at Higher Order AR(p). Requires: Stata version 11.0. Distribution-Date: 20120618.

mlcoint
'MLCOINT': module to compute Johansen cointegration tests. This is a corrected version of mlcoint from that originally published in STB-21 and updated to work under Stata 6, available in the Becketti tslib. That version failed when more than 9 variables were included in the varlist.

ranktest
'RANKTEST': module to test the rank of a matrix using the Kleibergen-Paap rk statistic. ranktest implements the Kleibergen-Paap (2006) rk test for the rank of a matrix. Tests of the rank of a matrix have many practical applications. For example, in econometrics the requirement ...

vececm
'VECECM': module to estimate vector error correction models (ECMs). vececm estimates a vector error correction model (ECM) after one or more cointegrating vectors have been identified using Johansen's maximum-likelihood cointegration rank test (see help johans). vececm ...
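To make the Engle-Granger idea behind several of these commands concrete, here is a minimal sketch of the two-step procedure done by hand with built-in commands only. It assumes internet access for webuse and uses the lutkepohl2 example dataset from the Stata manuals, which is not the data discussed in this post. Note that the critical values printed by dfuller are not valid for residual-based tests; commands such as egranger report the appropriate ones.

```stata
* Engle-Granger two-step sketch with built-in commands (illustrative only)
webuse lutkepohl2, clear        // quarterly example data, already tsset

* Step 1: long-run (levels) regression
regress ln_consump ln_inc

* Step 2: unit root test on the residuals of the levels regression
predict double ehat, residuals
dfuller ehat, noconstant lags(1)

* Caution: compare the test statistic with Engle-Granger/MacKinnon
* critical values, not with the ones dfuller prints.
```

If the residuals are stationary, the two series are taken to be cointegrated. The community-contributed commands listed above can be installed from SSC, for example with ssc install egranger.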


Panel Data Cointegration test using Stata

nharvey
'NHARVEY': module to perform Nyblom-Harvey panel test of common stochastic trends. nharvey estimates one form of the test of common stochastic trends developed by Nyblom and Harvey (NH, 2000). The test is of the validity of a specified value of the rank of the covariance matrix of ...

xtdolshm
'XTDOLSHM': module to perform panel data cointegration. xtdolshm fits a model of depvar on indepvars using Kao and Chiang (2000) Dynamic Ordinary Least Squares for Cointegrated Panel Data with homogeneous long-run covariance structure across cross-sectional units. You must ...

xtpedroni
'XTPEDRONI': module to perform Pedroni's panel cointegration tests and Panel Dynamic OLS estimation. xtpedroni has two functions: First, it allows Stata users to compute Pedroni's (OBES 1999, REStat 2001) seven test statistics under a null of no cointegration in a heterogeneous ...

xtpmg
'XTPMG': module for estimation of nonstationary heterogeneous panels. We introduce a new Stata command, xtpmg, for estimating nonstationary heterogeneous panels in which the number of groups and number of time-series observations are both large. Based on recent advances in the ...

xtwest
'XTWEST': module for testing for cointegration in heterogeneous panels. The xtwest command implements the four panel cointegration tests developed by Westerlund (2007). The underlying idea is to test for the absence of cointegration by determining whether there exists error ...
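Besides the community-contributed commands above, recent Stata versions (15 and later) ship a built-in xtcointtest command covering the Kao, Pedroni and Westerlund tests. The sketch below assumes such a version, internet access for webuse, and the xtcoint example dataset with its variables productivity, rddomestic and rdforeign from the Stata manuals; none of these come from this post.

```stata
* Built-in panel cointegration tests (requires Stata 15 or later)
webuse xtcoint, clear           // example panel, already xtset

* Null hypothesis in each case: no cointegration
xtcointtest kao        productivity rddomestic rdforeign
xtcointtest pedroni    productivity rddomestic rdforeign
xtcointtest westerlund productivity rddomestic rdforeign
```

With your own data, xtset the panel first and replace the variable names with your dependent variable followed by the regressors.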

Sensitivity Analysis of Regression Models using Stata

Let us see how to conduct sensitivity analysis in Stata for regression models, for coefficients from different types of regression models, and for various other approaches to statistical analysis.

What is sensitivity analysis?

Sensitivity analysis is a tool used in regression modeling to analyze how different values of a set of independent variables affect a specific dependent variable under given conditions. For example, once a regression model has been estimated from the data, we can vary the values of the independent variables from lowest to highest over different ranges and observe how the coefficients change; this shows how sensitive each coefficient is. In general, sensitivity analysis is used in a wide range of fields, from biology and geography to economics and engineering.

It is especially useful in the study of “black box” processes, where the output is an opaque function of several inputs. An opaque function or process is one that for some reason cannot be studied and analyzed directly. For example, climate models in geography are usually very complex, so the exact relationship between the inputs and outputs is not well understood. (Full article on CFI)
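As a rough illustration of the idea of re-estimating a model over different ranges of an independent variable, the sketch below splits Stata's shipped auto dataset into terciles of weight and compares the coefficient on mpg across them. The dataset, the variables and the choice of three ranges are all assumptions made for the example.

```stata
* Coefficient sensitivity across ranges of a regressor (illustrative only)
sysuse auto, clear

* Split the sample into three ranges of weight, lowest to highest
xtile wgroup = weight, nq(3)

forvalues g = 1/3 {
    quietly regress price mpg if wgroup == `g'
    display "weight tercile `g': b[mpg] = " %9.2f _b[mpg] "  (se " %9.2f _se[mpg] ")"
}
```

Large swings in the coefficient across the three ranges would suggest that the estimated effect is sensitive to where in the data it is measured.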

Stata Commands for Sensitivity Analysis using Stata

Sensitivity analysis in Stata can be conducted along various dimensions and for various situations. The first command, -senspec-, computes sensitivity and specificity for a quantitative classification variable.

'SENSPEC': module to compute sensitivity and specificity results saved in generated variables. senspec inputs a reference variable with two values and a quantitative classification variable. It creates, as output, a set of new variables, containing, in each observation, the numbers ...

The second method for sensitivity analysis using Stata is for multiple imputation. The code for this objective in Stata is called -mimix-.

'MIMIX': module to perform reference based multiple imputation for sensitivity analysis of longitudinal clinical trials with protocol deviation. mimix imputes missing numerical outcomes for a longitudinal trial with protocol deviation under distinct reference group (typically ...

'MBSENS': module to compute Sensitivity metric for matched sample using McNemar's test. mbsens is used for calculating the binary sensitivity metric (gamma) using McNemar's statistic using the matched sample as the input.

'ISA': module to perform Imbens' (2003) sensitivity analysis. isa produces a figure for the sensitivity analysis developed by Imbens (American Economic Review, 2003). Observational studies cannot control for the bias due to the omission of unobservables. The sensitivity ...

'GSA': module to perform generalized sensitivity analysis. gsa produces a figure for the sensitivity analysis similar to Imbens (American Economic Review, 2003). Observational studies cannot control for the bias due to the omission of unobservables. The sensitivity analysis provides a ...

'EPISENS': module for basic sensitivity analysis of epidemiological results. episens provides basic sensitivity analysis of the observed relative risks adjusting for unmeasured confounding and misclassification of the exposure. episensi is the immediate form of ...

'EPISENSRRI': module for basic sensitivity analysis for unmeasured confounders. episensrri provides basic sensitivity analysis of the apparent or observed relative risks according to specified plausible values of the prevalence of the unmeasured confounding among exposed and ...

The above commands can be found in Stata by running the code:

findit sensitivity
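Once findit has shown a module of interest, it can be inspected and installed from the SSC archive. The lines below are a small sketch using senspec as the example, assuming internet access and that the module is still hosted on SSC.

```stata
* Inspect and install a community-contributed module from SSC
ssc describe senspec     // show the package description and files
ssc install senspec      // download and install the package
help senspec             // open its help file for the exact syntax
```

The same three steps apply to the other modules listed above.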

I hope this simple Stata tutorial will be useful in learning more about Sensitivity Analysis.

Stata vs SPSS: Structural Equation Modeling

We have been comparing Stata vs SPSS on several aspects, such as ease of use, data handling, support for specific statistical techniques, and community help with uncommon problems. These comparisons help students and researchers choose between Stata and SPSS for data analysis in academic or commercial research projects. In this tutorial, we provide a simple comparison of how Stata and SPSS can be used for Structural Equation Modeling (SEM) and note the differences to consider before choosing between Stata and SPSS.

Downloading Stata and SPSS

Stata has a built-in SEM Builder, so there is no need to buy a separate package or extension, whereas SPSS needs AMOS to be bought and installed as separate software. Stata can be purchased here, and an evaluation (trial) version with full features, including the built-in SEM Builder, can be requested here. AMOS, on the other hand, must be purchased separately here, can be downloaded for trial here, and requires SPSS to be installed before AMOS is installed. Download a trial version of SPSS here.

SEM Builder vs AMOS

The two SEM builders look different at first glance, but they work in much the same way. Both allow you to draw your path diagram, link latent variables to their items, and specify the covariances. The two screenshots from Stata and AMOS are shown here.


The Stata SEM Builder Screenshot

stata vs spss sem builder

Picture: AnEc Center for Econometrics Research Online Course in Structural Equation Model using Stata

The AMOS Screenshot

stata vs spss amos

Picture: AnEc Center for Econometrics Research Online Course in Structural Equation Model using SPSS/AMOS

Process of SEM in Stata vs SPSS AMOS

To run SEM in Stata, we follow these simple steps:

  1. Open Stata and load the data containing your observed items.
  2. Click on Statistics.
  3. Click on SEM (structural equation modeling).
  4. Then click on Model building and estimation.

The SEM Builder window will open. Read the menu items and side bars of this window carefully. The basic drawing tools are ovals, squares, single-headed arrows and two-headed arrows, and some icons combine these. We use these tools to draw the path diagram for our theoretical framework and to estimate the model. Ovals denote latent variables, squares denote items (observed variables), single-headed arrows define one-way effects, and two-headed arrows define covariances/correlations between variables.

The path diagram can be seen in the above screenshot from the Stata SEM Builder. Once such a path diagram has been created, with names given to the latent ovals and variables assigned to the observed squares, we can click on Estimation and proceed through the menus to produce the model output in Stata. The output looks like a simple OLS regression table with coefficients, SEs, and p-values. We will interpret these results in a separate article later.
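The same kind of model can also be typed as a command instead of being drawn. The sketch below fits a single-factor measurement model with Stata's sem command, assuming internet access and the sem_1fmm example dataset (observed items x1 to x4) from the Stata manuals. The latent variable name X is chosen for illustration; by default sem treats capitalized names as latent variables.

```stata
* Command-line counterpart of a simple SEM Builder diagram (illustrative only)
webuse sem_1fmm, clear

* One latent variable X (oval) measured by four observed items (squares)
sem (X -> x1 x2 x3 x4)

* Goodness-of-fit statistics for the fitted model
estat gof, stats(all)
```

Each single-headed arrow in the diagram corresponds to one path inside the parentheses, which makes the command form convenient for do-files.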

To run SEM in AMOS, once AMOS has been installed separately, we follow these steps:

  1. Open SPSS and the data file.
  2. Click on Analyze.
  3. Click on AMOS at the bottom of the menu items (it appears only if AMOS is installed).
  4. AMOS Graphics will open in a new window, which looks like the above screenshot.

In AMOS, the menu bar is also simple to follow, and the left bar has direct options to create latent variables, observed variables, groups of measurements (a latent variable with its observed variables) and arrows. This side bar also has many other options to alter and finalize the path diagram. Once the model has been created, it can be estimated and the results produced (we will write a separate article on interpreting SEM results in the coming days).

Community Support for Stata vs SPSS

In our experience, Stata has a much larger community than SPSS across social media, mailing lists and forums, while SPSS is at a disadvantage here despite some great privately managed tutorial websites. Basic answers to many questions are easy to find on the official FAQ pages of both Stata and SPSS, but the quick community-based discussion gives Stata an edge over SPSS.


This comparison is not for marketing purposes, nor is it a technical evaluation of Stata and SPSS. We recommend requesting a technical review of both statistical packages from our experts if you need a deeper analysis.

Interpreting Panel Data Regression Models

Many students ask on social media groups, or through private messages on Facebook and LinkedIn, how to interpret panel data regression models. In this tutorial, I explain how to interpret the results from panel data regression models such as Fixed Effects and Random Effects models.


Interpreting Fixed Effects Model

In this section, we are Interpreting Panel Data Regression Models with Fixed Effects.

In the top left corner of the first set of regression results we see Number of obs = 3,792, which is the total number of entity-period observations, i.e. the time periods (t) summed over all entities (i). The number of groups (i) in this data is 130. So the average number of time periods for which each (i) has data is 3,792/130 ≈ 29.2, or roughly 30. This is not the maximum number of time periods for which data are available for a given variable, say debt in our example. For some countries a variable may be available for fewer than 30 years, and for others for more than 30 years, so the value means that on average about 30 years of data are available per country. The same part of the fixed effects output reports this directly: the minimum number of observations per (i) is 2, the maximum is 36 and the average is about 29.2.
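The observation counts can be checked before any regression is run. The sketch below uses the nlswork example panel from the Stata manuals (an assumption, since the debt data used in this post is not available here) and requires internet access for webuse.

```stata
* Inspect the panel structure: entities, periods and observations per entity
webuse nlswork, clear
xtset idcode year          // declare entity (i) and time (t) variables
xtdescribe                 // number of entities, span, participation patterns

* The arithmetic from the text: average periods per entity
display 3792 / 130         // roughly 29.17
```

In an unbalanced panel such as this one, the average number of periods per entity will lie between the reported minimum and maximum.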

The lower part of this section of the output shows the F test, which is loosely considered a goodness of fit statistic (in addition to the R-squared statistics, which measure the explanatory power of the regression model). If the F statistic is greater than the tabulated value of the F distribution for the given degrees of freedom, we reject the null hypothesis that none of the included independent variables (Xit = gdp gfc indva trade) affects the dependent variable (y = debt). The first degrees-of-freedom value is 4, which equals the number of Xit variables, i.e. the k slope parameters of the regression. The second value is N-i-k = 3,792-130-4 = 3,658. If the p-value corresponding to this test is less than 0.05, we reject the null hypothesis that the Xit variables jointly do not affect Yit and accept the alternative that at least some of them do.
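The F statistic and its degrees of freedom can be pulled from the stored results after estimation. This is a minimal sketch on the nlswork example panel with three regressors chosen for illustration; with the data from this post the command would instead be of the form xtreg debt gdp gfc indva trade, fe.

```stata
* Fixed effects regression and its overall F test (illustrative data)
webuse nlswork, clear
xtset idcode year
xtreg ln_wage age tenure hours, fe

* Degrees of freedom: k regressors, and N - (number of groups) - k
display "F(" e(df_m) ", " e(df_r) ") = " %9.2f e(F)
display "p-value = " Ftail(e(df_m), e(df_r), e(F))

* The second degrees-of-freedom value from the text
display 3792 - 130 - 4     // 3658
```

Ftail() returns the upper-tail probability of the F distribution, which is the p-value reported next to the F statistic in the output header.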

Interpreting Panel Data Regression Models 1

The second half of the top results includes R-squared. As the panel structure shows, the variation in Yit can be over time as well as across entities, so R-squared can be split into two types, within R-squared and between R-squared, in addition to the overall R-squared. Note that R-squared in panel data models is not the simple R-squared we obtain from OLS. These R-squared values are squared correlations between the actual Yit and its predicted values from the regression equation.

Let us assume, when we estimate a panel data model and its prediction system becomes like this:

Interpreting Panel Data Regression Models 1

The correlation between the predictions from equation 1 and the actual Y gives the overall R-squared. The predictions from equation 2 and the actual Y give the between R-squared, while the predictions from equation 3 and the actual Y give the within R-squared. Here the within R-squared is 0.9710, the between R-squared is 0.9940 and the overall R-squared is 0.9764; these are relatively high values and indicate a good fit.
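The image with the three prediction equations is not reproduced in this text. Assuming they follow the definitions in Stata's xtreg documentation, equations 1 to 3 can be written as below, where bars denote entity means over time; the notation is an assumption and not taken from the original image.

```latex
\begin{align}
\hat{y}_{it} &= x_{it}\hat{\beta}
  && \text{(1) overall: } R^2_o = \operatorname{corr}^2\!\left(x_{it}\hat{\beta},\; y_{it}\right) \\
\hat{\bar{y}}_{i} &= \bar{x}_{i}\hat{\beta}
  && \text{(2) between: } R^2_b = \operatorname{corr}^2\!\left(\bar{x}_{i}\hat{\beta},\; \bar{y}_{i}\right) \\
\hat{y}_{it} - \hat{\bar{y}}_{i} &= (x_{it} - \bar{x}_{i})\hat{\beta}
  && \text{(3) within: } R^2_w = \operatorname{corr}^2\!\left((x_{it} - \bar{x}_{i})\hat{\beta},\; y_{it} - \bar{y}_{i}\right)
\end{align}
```

Under these definitions the between and within measures compare predictions with the entity means and with deviations from the entity means of Y, respectively.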

The last part of the top results reports the correlation between the entity-specific error term Ui and Xb, the part of the model explained by the explanatory variables. From the above output we can see a high correlation between Ui and Xb, equal to 0.6631. A value close to 0 would instead suggest that the entity effects are unrelated to the regressors, in which case an alternative specification such as random effects might be considered. This is not a formal test, but it provides a basis for formal tests comparing Fixed Effects with Random Effects models, such as the Hausman test, which we will explain in a separate post.
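For readers who want to try the formal comparison mentioned above, here is a minimal sketch of the Hausman test on the nlswork example panel. The dataset and the three regressors are assumptions for illustration, and webuse needs internet access.

```stata
* corr(u_i, Xb) and the Hausman test, fixed vs random effects (illustrative)
webuse nlswork, clear
xtset idcode year

quietly xtreg ln_wage age tenure hours, fe
display "corr(u_i, Xb) = " %6.4f e(corr)
estimates store fe

quietly xtreg ln_wage age tenure hours, re
estimates store re

* Null hypothesis: the difference in coefficients is not systematic
hausman fe re
```

A small p-value from hausman points toward the fixed effects specification, which is the usual outcome when the correlation between Ui and Xb is far from zero.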

Before reading the next paragraph, note that the main table of coefficients and other statistics appears after the above statistics at the top. The first column of this table is headed by Yit, with all the Xit variables of the model listed in the rows below it. The intercept of the regression model is denoted by _cons in Stata.

The key issue in interpreting panel data regression models relates to the statistics above, which are commonly misunderstood. The key results, however, are in the table with the coefficients (denoted Coef. in Stata results), standard errors (Std. Err.), t or z statistics (t or z), p-values (P>|t|) and confidence intervals ([95% Conf. Interval]). I will explain these briefly in turn. The coefficient is the effect of Xit on Yit, and we have to note both its sign and its size. The objective of the regression model determines whether we need the sign, the size or both: for inference we usually need the sign, for predictive analysis we need the size, and when both matter we use the sign and the size of the coefficient. Also, the variation in Yit is across both time and i, so the effect of Xit on Yit has to be stated in that context as well; the interpretation of Coef. is in terms of both time and i.

The second column gives the standard errors of the coefficients. A standard error is the standard deviation of the coefficient estimates we would obtain for the same variable if the same regression model were estimated on many samples drawn from the same population; the distribution of these estimates is called the sampling distribution. The standard errors thus show the margin within which the coefficients can vary across samples. They are required to compute the t or z statistics used for hypothesis tests on the coefficients to determine the significance of effects.

The next column shows the t statistics for all coefficients. They indicate whether a given coefficient is statistically significant. The null hypothesis on each coefficient is H0: b(X) = 0 against the alternative H1: b(X) is not equal to 0. If the absolute value of the t statistic is greater than 1.96 (the 5% two-sided critical value in large samples), we reject the null hypothesis in favor of the alternative. Rejecting the null on this basis means that the effect of X is significantly different from 0, or simply that X affects Y significantly. This should be stated for each independent variable in the model. The null hypothesis on the intercept is that the expected value of Y is 0 when all the X variables are jointly 0; this does not test any effect.

The most important result when interpreting panel data regression models is the p-value. The next column of the fixed effects results in the given Stata output shows the p-values. If the p-value is less than 0.05, we reject the null hypothesis H0: b(X) = 0 in favor of the alternative H1: b(X) is not equal to 0.
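The link between the coefficient, its standard error, the t statistic and the p-value can be reproduced by hand. The sketch below uses a plain regression on Stata's shipped auto dataset as a stand-in, since the panel data from this post is not available; the same stored results exist after xtreg, fe.

```stata
* Rebuild the t statistic and two-sided p-value from stored results
sysuse auto, clear
quietly regress price mpg weight

scalar t_mpg = _b[mpg] / _se[mpg]                  // t = Coef. / Std. Err.
scalar p_mpg = 2 * ttail(e(df_r), abs(t_mpg))      // two-sided p-value

display "t = " %6.2f t_mpg "   p = " %6.4f p_mpg
display "5% critical value = " %6.3f invttail(e(df_r), 0.025)
```

The exact critical value depends on the residual degrees of freedom and approaches 1.96 as the sample grows.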


Interpreting Random Effects Model

In this section, we are Interpreting Panel Data Regression Models with Random Effects.

Interpreting Panel Data Regression Models 2






Stata versus SPSS: Multiple regression example

Comparing Stata versus SPSS on multiple regression is a useful way to assess the capabilities of the two packages. Many researchers and data analysts in the social sciences prefer SPSS over Stata for various reasons, while some prefer Stata for other reasons. In this tutorial, we demonstrate the estimation of a multiple regression model for a given dataset and what we can do with Stata versus SPSS.

Stata versus SPSS: Multiple Regression

Generally, in our view, Stata offers more options for estimating a multiple regression model than SPSS, because it can handle many types of data that SPSS cannot handle. One example of this preference for Stata over SPSS among many in the social sciences and healthcare is the analysis of limited dependent variables and count data with Poisson-family regression models, such as Negative Binomial regression.

Multiple Regression using SPSS: Menu System

In the following screenshot, we can see the menu in SPSS to select running a multiple regression model.

stata verses spss menu 1

The second screenshot shows that SPSS offers a submenu to produce additional post-estimation results, such as measuring autocorrelation through the Durbin-Watson statistic and testing for multicollinearity.

stata verses spss submenu spss

Another submenu can be used to plot the residuals and predicted values from the multiple regression model. This gives the impression that it is much easier to run a multiple regression model in SPSS than in Stata.

The output in SPSS for multiple regression looks like this:

stata verses spss output in spss

Multiple Regression using Stata: Menu System

The menu system in Stata is very similar to that of SPSS. The differences lie in the names on the main menu bar and, more fundamentally, in the specification menu used to select the dependent and independent variables. SPSS lets us select post-estimation tests during model specification and variable selection, whereas this menu in Stata offers other useful options instead, such as selecting weights, comparing results across groups and restricting the sample through the if/in options. Stata also lets us select the reporting style and conditions for the estimated model. See the menu in Stata here:

stata verses spss menu 2

and submenu for options (not post estimation tests as we did in SPSS) here:

stata verses spss postestimation in Stata

Stata offers a dedicated Postestimation submenu through the main menu bar. It allows many more tests than autocorrelation and multicollinearity, such as tests for heteroscedasticity, and it can also compute marginal effects and elasticities, which SPSS lacks.
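The same post-estimation tools are available from the command line. The sketch below runs a multiple regression on Stata's shipped auto dataset (chosen only for illustration) and then calls a few of the post-estimation commands that correspond to the menu items described above.

```stata
* Multiple regression followed by common post-estimation commands
sysuse auto, clear
regress price mpg weight foreign

estat hettest              // Breusch-Pagan test for heteroscedasticity
estat vif                  // variance inflation factors (multicollinearity)
margins, dydx(*)           // average marginal effects of all regressors
margins, eyex(mpg weight)  // elasticities of price with respect to mpg, weight
```

For time series data declared with tsset, estat dwatson gives the Durbin-Watson statistic mentioned in the SPSS section.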

Finally, the outputs in Stata versus SPSS also differ. SPSS splits the regression results across tables: ANOVA and diagnostics are produced in separate tables, R-squared and explanatory statistics in another table, and coefficients in yet another. The results in Stata are arranged into one condensed block. The ANOVA table appears at the top left, above the table of coefficient estimates, with R-squared and the other explanatory statistics at the top right. This makes the regression estimates easy to read. SPSS gives post-estimation tests separately, as Stata does.

To conclude on Stata versus SPSS, there are slight differences in the menu systems as well as in the tests offered by Stata and SPSS. Either package lets one specify a model quickly, so the choice can be made based on convenience.


Note: This comparison is our private view. There is sponsoring content. Also, Stata and SPSS are copyrights of StataCorp and IBM, respectively.

VAR vs VECM: An Overview of Basic Differences

A Note On VAR vs VECM

Beginners in applied time series econometrics often ask about VAR vs VECM. In this simple tutorial, I mention some of the key differences between VAR and VECM. We can think of a VAR as a multiple regression model with multiple dependent variables and no restrictions on the variables, while a VECM is a restricted version with restrictions on some coefficients and a modified functional form.

To determine the key difference between VAR and VECM, we can use the mathematical form of the functions that define both VAR vs VECM. The VAR system can be represented in the following:

var vs vecm eqn 1

while the general form of the equation to estimate a VECM is given by:

var vs vecm eqn 2

This gives the basic answer when comparing VAR vs VECM. The equations show how a VAR and a VECM system differ, and why the VECM can be seen as a modified version of the VAR, or simply as a restricted VAR.

In the VAR system, yt is a vector of K variables, each modeled as a function of p lags of those variables and, optionally, a set of exogenous variables xt, where we assume that E(ut) = 0, E(ut ut') = Σ and E(ut us') = 0 for all t ≠ s. For the VECM, if all variables in yt are I(1), the matrix Π has rank 0 ≤ r < K, where r is the number of linearly independent cointegrating vectors. If the variables are cointegrated (r > 0), the VAR in first differences is misspecified as it excludes the error correction term. (Kit Baum, 2013).
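The two equation images are not reproduced in this text. In the notation of the paragraph above, the standard textbook forms of the two systems can be written as follows; the symbols v, A, B and Γ are assumptions introduced here, not taken from the original images.

```latex
\begin{align}
\text{VAR:}\quad
y_t &= v + A_1 y_{t-1} + \cdots + A_p y_{t-p} + B x_t + u_t \\
\text{VECM:}\quad
\Delta y_t &= v + \Pi y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \,\Delta y_{t-i} + B x_t + u_t \\
\text{where}\quad
\Pi &= \sum_{j=1}^{p} A_j - I_K,
\qquad
\Gamma_i = -\sum_{j=i+1}^{p} A_j
\end{align}
```

Subtracting y_{t-1} from both sides of the VAR and rearranging gives the VECM, so the two are the same system until the rank of Π is restricted to r < K.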

Next, we compare VAR vs VECM using the Eviews and Stata results for a given set of time series variables. The following Stata output shows the VAR setup for three variables: Debt, GDP and GFC.

var vs vecm stata 1

The same system can be estimated using the VECM setup in Stata. This shows the basic difference between VAR and VECM: the VECM has additional parameters due to the restrictions and modification of the VAR setup.

var vs vecm stata 2

In a simple VAR the variables enter in levels unless we define a differenced VAR, whereas in the VECM the same variables enter in first differences in the short-run equations and in levels in the normalized cointegrating equation.
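The two setups can be compared side by side with built-in commands. The sketch below uses the lutkepohl2 example dataset from the Stata manuals in place of the Debt, GDP and GFC series, which are not available here; the lag length and rank(1) are illustrative choices, not results.

```stata
* VAR in levels versus VECM on the same variables (illustrative data)
webuse lutkepohl2, clear        // quarterly data, already tsset

* Unrestricted VAR with two lags, variables in levels
var ln_inv ln_inc ln_consump, lags(1/2)

* Johansen test for the number of cointegrating equations
vecrank ln_inv ln_inc ln_consump, lags(2)

* VECM with one cointegrating equation: differenced short-run
* equations plus a normalized cointegrating equation in levels
vec ln_inv ln_inc ln_consump, rank(1) lags(2)
```

In practice the rank passed to vec should come from the vecrank results rather than being fixed in advance.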


ARDL Lag Selection

A commonly asked question concerns ARDL lag selection, or how to select the lag structure for ARDL models. In this simple tutorial using Eviews, we demonstrate and explain how to select the ARDL model based on various criteria. Note that Eviews selects the ARDL model automatically, so we explain the model selection criteria in a little more detail to be clear about what lag selection is.

Information Criteria

Eviews reports the log likelihood, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), the Hannan-Quinn Information Criterion and the adjusted R-squared (not strictly an information criterion). Since models are proposed in different settings, the choice of an information criterion for lag selection in a time series model should be made carefully.

One can read more about AIC vs BIC in a reply to Adrift by Methodology Center (Read more about Latent Class Analysis here):

Dear Adrift,

As you know, AIC and BIC are both penalized-likelihood criteria. They are sometimes used for choosing best predictor subsets in regression and often used for comparing nonnested models, which ordinary statistical tests cannot do. The AIC or BIC for a model is usually written in the form [-2logL + kp], where L is the likelihood function, p is the number of parameters in the model, and k is 2 for AIC and log(n) for BIC.

AIC is an estimate of a constant plus the relative distance between the unknown true likelihood function of the data and the fitted likelihood function of the model, so that a lower AIC means a model is considered to be closer to the truth. BIC is an estimate of a function of the posterior probability of a model being true, under a certain Bayesian setup, so that a lower BIC means that a model is considered to be more likely to be the true model. Both criteria are based on various assumptions and asymptotic approximations. Each, despite its heuristic usefulness, has therefore been criticized as having questionable validity for real world data. But despite various subtle theoretical differences, their only difference in practice is the size of the penalty; BIC penalizes model complexity more heavily. The only way they should disagree is when AIC chooses a larger model than BIC.

AIC and BIC are both approximately correct according to a different goal and a different set of asymptotic assumptions. Both sets of assumptions have been criticized as unrealistic. Understanding the difference in their practical behavior is easiest if we consider the simple case of comparing two nested models. In such a case, several authors have pointed out that IC’s become equivalent to likelihood ratio tests with different alpha levels. Checking a chi-squared table, we see that AIC becomes like a significance test at alpha=.16, and BIC becomes like a significance test with alpha depending on sample size, e.g., .13 for n = 10, .032 for n = 100, .0086 for n = 1000, .0024 for n = 10000. Remember that power for any given alpha is increasing in n. Thus, AIC always has a chance of choosing too big a model, regardless of n. BIC has very little chance of choosing too big a model if n is sufficient, but it has a larger chance than AIC, for any given n, of choosing too small a model.

So what’s the bottom line? In general, it might be best to use AIC and BIC together in model selection. For example, in selecting the number of latent classes in a model, if BIC points to a three-class model and AIC points to a five-class model, it makes sense to select from models with 3, 4 and 5 latent classes. AIC is better in situations when a false negative finding would be considered more misleading than a false positive, and BIC is better in situations where a false positive is as misleading as, or more misleading than, a false negative.

So we can conclude that in choosing between AIC and BIC for ARDL lag selection, one should rely not merely on references from the literature but also on the reason why the model has been proposed.
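The AIC versus BIC comparison can also be tried outside Eviews. The sketch below fits simple ARDL(p, p) regressions for p = 1 to 4 with Stata's regress on the lutkepohl2 example dataset and prints both criteria; the dataset, the variables and the maximum lag of 4 are assumptions for illustration, and the estimation sample is held fixed so the criteria are comparable.

```stata
* Compare AIC and BIC across lag lengths for a simple ARDL(p, p) regression
webuse lutkepohl2, clear        // quarterly data, already tsset

* Common sample: observations usable at the longest lag considered
generate byte common = !missing(L4.dln_inv, L4.dln_inc)

forvalues p = 1/4 {
    quietly regress dln_inv L(1/`p').dln_inv L(0/`p').dln_inc if common
    quietly estat ic
    matrix S = r(S)             // columns: N, ll(null), ll(model), df, AIC, BIC
    display "lags = `p'   AIC = " %9.3f S[1,5] "   BIC = " %9.3f S[1,6]
}
```

The preferred lag length is the one with the lowest value of the chosen criterion; because BIC penalizes extra parameters more heavily, it will tend to pick the same or a shorter lag than AIC.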

ARDL Lag Selection using Eviews

In Eviews, we can proceed as follows to select the lags for a proposed ARDL model.

  1. Open the time series data in an Eviews workfile.
  2. Test all variables to be included in the model for a unit root to make sure none is I(2), i.e. has a unit root in first differences and is stationary only in second differences.
  3. Then estimate the ARDL equation, selecting your dependent variable (DV) and all the independent variables (IVs). A question asked many times is whether the DV must be I(0) before estimating the ARDL model; this is not a required condition. In the literature, most papers use I(1) variables, and the model specification itself puts the DV in first differences, so an I(1) DV becomes stationary by construction when the ARDL model is estimated. One example model is shown here (see the full paper here).
  4. Once the ARDL model has been estimated, click on the View menu of the results window.
  5. Click on Model Selection Summary.
  6. Now we have two options: view the ARDL lag selection as a Table listing all the candidate ARDL models with the corresponding selection criteria (LogL, AIC, BIC etc.), or view a Graph of the selected models ranked by the criterion we chose.
  7. Either option shows the ARDL lag selection criteria, so we can determine the best model based on the lowest AIC or the lowest BIC, following the guidance above on when to use AIC and when to use BIC.
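For readers working in Stata rather than Eviews, the same workflow can be sketched as follows. This is only an illustration under assumptions: the variable names year, y, x1 and x2 are hypothetical, the lag choices are arbitrary, and ardl is a community-contributed command (not part of official Stata) whose options should be checked with help ardl after installation.

```stata
* Hypothetical variable names: year (time index), y (DV), x1 and x2 (IVs)
tsset year

* Step 2: unit root tests in levels and in first differences
dfuller y, lags(1)
dfuller D.y, lags(1)

* Step 3 onward: let ardl search over lag combinations up to 4 lags
* (community-contributed command; install once with: ssc install ardl)
ardl y x1 x2, maxlags(4) aic
ardl y x1 x2, maxlags(4) bic
```

Comparing the lag orders chosen under the aic and bic options mirrors the Model Selection Summary comparison described in the steps above.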

We hope this simple introduction to ARDL lag selection will help in determining a feasible ARDL model. For more details, enroll in one of our Econometrics Courses here to develop similar critical skills in Applied Econometrics Research.

If you need Assistance in Data Analysis for writing your PhD Thesis or MS Dissertation, hire Top Econometrics Freelancer here.

Event Study Analysis using Stata

To run Event Study Analysis using Stata, copy, paste and edit the following do file, and make sure the variables (prices, returns and market returns) are properly named so that the code runs on your data. The data should first be prepared in Excel, as shown in this video, for the do file to run successfully.


*** Note: change the folder path to where your data is saved
import delimited "F:\Dropbox\Stata Training\Do File For Event Study Analysis using Stata Workshop\data.csv", clear

*** Create a numeric company ID
sort name
by name : gen newid = 1 if _n==1
replace newid = sum(newid)
replace newid = . if missing(name)
br newid name

*** Create return series from stock and market prices
*** (computed within each company so that returns never span two companies)
sort newid date
by newid: gen ret = ln(price / price[_n-1])
by newid: gen market_return = ln(index / index[_n-1])

*** Create Event Windows
sort newid date
by newid: gen datenum=_n
gen evwin = 0
by newid: gen target=datenum if date==20080219 | date==20130512
egen td=min(target), by(newid)
drop target
gen dif=datenum-td
by newid: gen event_window=1 if dif>=-7 & dif<=30
egen count_event_obs=count(event_window), by(newid)
by newid: gen estimation_window=1 if dif<-50 & dif>=-300
egen count_est_obs=count(estimation_window), by(newid)
replace event_window=0 if event_window==.
replace estimation_window=0 if estimation_window==.
tab newid if count_event_obs<5
tab newid if count_est_obs<30
drop if count_event_obs < 5
drop if count_est_obs < 30
* this command just keeps Stata from pausing after each screen of output *
set more off

gen predicted_return=.
egen id=group(newid)
* for multiple event dates, use: egen id = group(group_id) *
* N is set to the highest value of id *
summ id
local N = r(max)
* estimate the market model for each company and predict normal returns *
forvalues i=1(1)`N' {
    l id newid if id==`i' & dif==0
    reg ret market_return if id==`i' & estimation_window==1
    predict p if id==`i'
    replace predicted_return = p if id==`i' & event_window==1
    drop p
}

*** Determine the AAR and CAAR

sort id date
gen abnormal_return=ret-predicted_return if event_window==1
bysort id: egen cumulative_abnormal_return = total(abnormal_return)
sort id date
by id: egen ar_sd = sd(abnormal_return)

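*** Test statistic: scale by the square root of the number of days in the event window
*** count_event_obs holds that count for each company (it replaces a hard-coded 385,
*** which does not match the -7 to +30 window defined above)
gen test = (1/sqrt(count_event_obs)) * (cumulative_abnormal_return / ar_sd)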
list newid cumulative_abnormal_return test if dif==0

The do file can be better understood with a practical demonstration. Request a free demo here.
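The do file computes each company's cumulative abnormal return. A minimal sketch of the average abnormal return (AAR) and cumulative average abnormal return (CAAR) across companies is given below. It assumes the variables abnormal_return, event_window and dif created by the do file are in memory; the names aar and caar are introduced here for illustration only.

```stata
* Work on a temporary copy so the company-level data are restored afterwards
preserve
keep if event_window == 1

* AAR: mean abnormal return across companies for each day relative to the event
collapse (mean) aar = abnormal_return, by(dif)

* CAAR: running sum of the AAR over the event window
sort dif
gen caar = sum(aar)
list dif aar caar
restore
```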

There is also an extensive list of resources and literature on Event Study Analysis using Stata. One can begin reading the material here and here.

Stochastic frontier models using Stata

How to run Stochastic Frontier Models using Stata is a commonly asked question, alongside the more basic question of when to apply them. Through this simple tutorial and the related freelance project support, we help our students run stochastic frontier models using Stata.

Read More About Stochastic Frontier Models

Stochastic Frontier Models using Stata

We can estimate stochastic frontier models for panel data and for cross-sectional data using Stata. There are a variety of menu-driven tools and user-written ado files in Stata to help us estimate stochastic frontier models. In the following, we describe the most common tools.
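As a starting point, the built-in commands frontier (cross-sectional data) and xtfrontier (panel data) can be sketched as below. The variable names lnoutput, lnlabor, lncapital, firm and year are hypothetical, and the half-normal and time-invariant specifications are chosen only for illustration, not as a recommendation.

```stata
* Hypothetical variables: lnoutput, lnlabor, lncapital (in logs), firm, year

* Cross-sectional production frontier with a half-normal inefficiency term
frontier lnoutput lnlabor lncapital, distribution(hnormal)
* Technical efficiency estimates for each observation
predict te_cs, te

* Panel data: declare the panel, then fit a time-invariant inefficiency model
xtset firm year
xtfrontier lnoutput lnlabor lncapital, ti
predict te_panel, te
```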

Stochastic Frontier Models for Panel Data

You can request stochastic frontier models using Stata for panel data if you are writing your PhD thesis.

Request SFM using Stata

Stochastic Frontier Models for Cross Sectional Data

Stata also offers an excellent menu driven system to conduct stochastic frontier modeling for cross sectional data.

Request SFM using Stata

Freelancing and Learning Solutions

AnEc Center for Econometrics Research provides private and instructor led courses in Applied Econometrics, Advanced Econometrics, Financial Econometrics, Statistical Analysis and Quantitative Research. You can also request complete data analysis for thesis writing on freelance terms.

Online Courses

Learn Efficiency Analysis and Stochastic Frontier Models using Stata in a Private and Instructor Led course in Econometrics

Enroll Now

Freelance Offers

Get your data analysed for stochastic frontier models and thesis written with the help of our top econometrics freelancers and consultants

Hire Freelancer


Request for a private and customized online training through Virtual Classroom from AnEc Center for Econometrics Research

Get Trained

Examples of SFM Application

Stochastic Frontier Models are applied to investigate issues in production efficiency, cost efficiency and input-output relationships in industrial and organizational settings. Applying SFM gives useful feedback and insights on the related factors, which entities can use to develop tools and policy suggestions for future growth.

Request Analysis of Data using Stochastic Frontier Models

Firm's production efficiency
Firm's cost efficiency
Regional trade efficiency
Technical efficiency
Efficiency in services sectors

Fees and Charges for Research Consultancy

AnEc Center for Econometrics Research offers private and instructor led courses and research consultancy on freelance basis in the area of Data Analysis and Writing of Results Chapters for Thesis and Research Papers

Online Courses

Providing private and instructor led online courses in Applied Econometrics, Applied Statistics, Quantitative Research and Applied Economics to students in Economics and Finance

Enroll Now

Freelance Data Analysis

Producing complete results from your data using Stochastic Frontier Models in Stata and other software to help you write an effective research report, PhD Thesis or MSc Dissertation

Hire Freelancer

Training and Workshop

Creating completely customized learning solutions for students, faculty and researchers in using Stata, Eviews, R, Matlab, RATS or GAUSS for Data Analysis and Research Report Writing

Request Workshop

Testimonials and Feedback

Our students and clients provide regular feedback and remarks on our courses and freelance projects. Read some of the feedback to help you judge the quality of our work.
AnEc freelancers are really top of the list. They work excellent and provide complete solutions. Only problem is to get them start due to the selected areas of work they do in Econometrics. I recommend they should extend their services to writing and editing as many students face problems in writing as well. Overall, I feel 100% satisfied with the work they did for me.
Students Feedback
Jamal ud Din
PhD Student in Economics, UK
Anees worked well. He provided complete data analysis and results writing in given time. Highly recommended freelancers.
Shahid Hakan
PhD Student in Economics, UK
Econometrics is a tough subject for many students in Economics and Finance. I found the explanation by Prof. Anees easy to understand and follow for application to my research project. I am happy to have him as my research mentor.
Javed Barkat
PhD Student in Finance, Germany

Logistic Regression Models

Logistic regression models, and other econometric methods for estimating relationships and causal linkages between a categorical dependent variable and independent variables, are important tools for exploring a variety of social and economic conditions. In this simple tutorial, we explain when to use a given regression model in a given situation, with Stata as the main software and sometimes Eviews.

Read More About Logistic Regression Models

Logistic Regression Model

The Logistic Regression Model is applied to analyse the effect of independent variables on the likelihood of a favorable outcome against the non-favorable one in a binary (dichotomous) dependent variable. That is, when the dependent variable in a regression model has two possible values such as (Yes, No) or (1, 0), we use logistic regression. The model can report the effect of the independent variables on the likelihood of the outcome either as coefficients or as odds ratios. The two forms serve different reporting purposes, though: coefficients are on the log-odds scale, while odds ratios are their exponentials.

Enroll for Econometrics Courses

Using Stata for Logistic Regression Models
  1. Click File, then Open, and locate the data file to open it.
  2. Click on Statistics, then click on Binary outcomes.
  3. To estimate coefficients, click on Report Outcomes.
  4. To estimate odds ratios, click on Odds Ratio.
  5. Follow the windows and select the DepVar and IndepVars.
  6. Click on Options, select any that apply and click on OK to finish.
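The menu steps above correspond to a few short commands. A minimal sketch, assuming a hypothetical binary variable employed (coded 0/1) and hypothetical regressors income and age:

```stata
* Hypothetical variables: employed (0/1), income, age

* Report coefficients (log-odds)
logit employed income age

* Report odds ratios instead of coefficients
logit employed income age, or

* logistic reports odds ratios by default
logistic employed income age
```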

Multinomial Logistic Regression

The Multinomial Logistic Regression Model is used to analyse the effect of independent variables on the likelihood of each outcome in a categorical dependent variable with more than two unordered categories, where each category is compared against a base category. The model can report the effect of the independent variables either as coefficients or as ratios (Stata labels them relative-risk ratios). The two forms serve different reporting purposes, though.

Using Stata for Multinomial Logistic Regression Models
  1. Click File, then Open, and locate the data file to open it.
  2. Click on Statistics, then click on Categorical outcomes.
  3. Then click on Multinomial Logistic Regression.
  4. To estimate, select the DepVar and IndepVars.
  5. Click on Options and select the relevant options.
  6. To finish the estimation, click on OK.
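The same estimation can be sketched on the command line. The variable names choice, income and age are hypothetical, and taking category 1 as the base outcome is an assumption made only for illustration:

```stata
* Hypothetical variables: choice (unordered categories 1, 2, 3), income, age

* Coefficients relative to the base category 1
mlogit choice income age, baseoutcome(1)

* Replay the same results as relative-risk ratios
mlogit, rrr
```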

Ordered Logistic Regression

The Ordered Logistic Regression Model is used to analyse the effect of independent variables on the likelihood of outcomes in a categorical dependent variable whose categories can be ordered, so that the odds of the outcomes can be ranked. Coefficients and odds ratios serve different reporting purposes, but the estimation itself remains very important.

Using Stata for Ordered Logistic Regression Models
  1. Click File, then Open, and locate the data file to open it.
  2. Click on Statistics, then click on Ordinal outcomes.
  3. Then click on Ordered Logistic Regression.
  4. To estimate, select the DepVar and IndepVars.
  5. Click on Options and select the relevant options.
  6. To finish the estimation, click on OK.
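A command-line sketch of the same steps, assuming a hypothetical ordered variable satisfaction (for example coded 1 = low to 5 = high) and hypothetical regressors income and age:

```stata
* Hypothetical variables: satisfaction (ordered categories), income, age

* Report coefficients
ologit satisfaction income age

* Replay the same results as odds ratios
ologit, or
```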

Panel Data Logistic Regression Models

The Panel Data Logistic Regression Model is used to analyse the effect of independent variables on the likelihood of a favorable outcome against the non-favorable one in a binary variable observed in a panel data structure. Coefficients and odds ratios serve different reporting purposes, but the estimation itself remains very important because the variation is over time t and across units i.

Using Stata for Panel Data Logistic Regression Models
  1. Click File, then Open, and locate the data file to open it.
  2. Click on Statistics, then click on Longitudinal/panel data.
  3. Then click on Binary Outcomes.
  4. Then click on Logistic Regression (FE, RE etc).
  5. Select your depvar, indepvars and other options.
  6. To finish the estimation, click on OK.
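On the command line, the panel must be declared before estimation. A minimal sketch, assuming hypothetical identifiers id (unit) and year (time), a hypothetical binary outcome employed and hypothetical regressors income and age:

```stata
* Hypothetical variables: id (unit), year (time), employed (0/1), income, age
xtset id year

* Fixed-effects (conditional) logit
xtlogit employed income age, fe

* Random-effects logit
xtlogit employed income age, re

* Replay the last results as odds ratios
xtlogit, or
```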
Enroll for Econometrics using Stata
Our recommended book on logistic regression models is by Joseph Hilbe. You can buy it here. For use in the courses, we will provide a PDF copy rented for you directly from the publisher. You can enroll for our courses here.