Friday, October 17, 2008

 

Wald, LR, and Score Tests

Cornell's "Econ 620: Three Classical Tests; Wald, LM(Score), and LR tests" is a good description of the Wald, likelihood ratio, and score tests. The Hausman test seems more like an LR test, since it estimates both the restricted and unrestricted equations. I found the Statalist post below on the Wald test for exogeneity of regressors:
This test is mentioned along with the theory behind -ivprobit- in Wooldridge's "Econometric Analysis of Cross Section and Panel Data" (2002, pp. 472-477). For the maximum likelihood variant with a single endogenous variable, the test is simply a Wald test that the correlation parameter rho is equal to zero. That is, the test simply asks whether the error terms in the structural equation and the reduced-form equation for the endogenous variable are correlated. If there are multiple endogenous variables, then it is a joint test of the covariances between the k reduced form equations' errors and the structural equation's error. In the two-step estimator, in the second stage we include the residuals from the first-stage OLS regression(s) as regressors. The Wald test is a test of significance on those residuals' coefficients.
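Here is a minimal sketch in Python of the two-step version described above, using simulated data; the variable names and data-generating numbers are my own illustration, not from the Statalist post. With a single endogenous regressor, the Wald test of exogeneity reduces to a test that the coefficient on the first-stage residuals is zero:

    # Two-step control-function sketch: first-stage OLS, then a probit that
    # includes the first-stage residuals, then a Wald test on their coefficient.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 1000
    x = rng.normal(size=n)                  # exogenous regressor
    z = rng.normal(size=n)                  # instrument
    u = rng.normal(size=n)                  # structural error
    v = 0.5 * u + rng.normal(size=n)        # reduced-form error, correlated with u
    y2 = 1.0 * z + 0.5 * x + v              # endogenous regressor
    y = (1.0 * y2 + 1.0 * x + u > 0).astype(int)

    # First stage: OLS of the endogenous regressor on the exogenous variable and the instrument.
    first = sm.OLS(y2, sm.add_constant(pd.DataFrame({"x": x, "z": z}))).fit()

    # Second stage: probit that adds the first-stage residuals as a regressor.
    X2 = sm.add_constant(pd.DataFrame({"y2": y2, "x": x, "vhat": first.resid}))
    second = sm.Probit(y, X2).fit(disp=False)

    # Wald test of exogeneity: is the coefficient on the residuals zero?
    print(second.wald_test("vhat = 0"))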


Saturday, October 11, 2008

 

Conditional Logit

I was trying to understand how conditional logit and fixed effects in multinomial logit worked, to explain to someone who asked, and I failed. Greene's text was not very helpful. The best thing I found was some notes from Penn: "Conditional Logistic Regression (CLR) for Matched or Stratified Data". The bottom line seems to be that conditional logit (clogit in Stata) chooses its parameter estimates to maximize the likelihood of the variation we see within the strata, while ignoring variation across strata. Thus, if we have data on 30 people choosing to travel by either car or bus over 200 days, we could use 30 dummies for the people, but in conditional logit we don't. Also, in conditional logit, unlike logit with dummies, if someone always travels by car instead of varying, that person is useless to the estimation.
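Here is a sketch of that conditional likelihood for a single stratum (one person i observed on days t = 1,...,T), in notation of my own choosing, not Greene's or the Penn notes':

$$ P\Big(y_{i1},\ldots,y_{iT} \,\Big|\, \sum_t y_{it} = k_i\Big) = \frac{\exp\big(\sum_t y_{it}\, x_{it}'\beta\big)}{\sum_{d \in S(k_i)} \exp\big(\sum_t d_t\, x_{it}'\beta\big)}, $$

where $S(k_i)$ is the set of all 0/1 sequences of length T that sum to the observed count $k_i$. A person-specific intercept multiplies numerator and denominator by the same factor and cancels, which is why no dummies are needed; and if someone always travels by car ($k_i = 0$ or $k_i = T$), then $S(k_i)$ contains only the observed sequence, the ratio is one for every $\beta$, and that person drops out of the likelihood.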


Tuesday, June 24, 2008

 

Ratio Variables in Regressions

I was reading Gibbs and Firebaugh (Criminology, 1990) on ratio variables in regressions. Suppose you regress Arrests/Crime on Crimes/Population using city-by-city data, and in fact there is no causal connection. Will the two ratios be negatively correlated anyway, since Crime appears in both variables?

No, so long as all relevant control variables are in the regression. Here is a way to see it. Arrests/Crime is Arrests times 1/Crime, so the mechanical worry runs through the 1/Crime factor; suppose we regress 1/Crime on Crimes/Population. Suppose, too, that Crime and Crimes/Population are unrelated--- that bigger cities do not have a higher crime rate. Then 1/Crime and Crimes/Population will be uncorrelated.
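Here is a minimal simulation sketch of that no-causal-connection case in Python; the distributions and numbers are made up purely for illustration:

    # Crime and the crime rate are drawn independently, and the arrest share has
    # no causal link to the crime rate, so the regression of Arrests/Crime on
    # Crimes/Population should find a slope near zero despite Crime appearing in both.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 5000
    crime = rng.lognormal(mean=6, sigma=0.5, size=n)          # total crimes per city
    crime_rate = rng.lognormal(mean=-3, sigma=0.3, size=n)    # crimes per capita, independent of crime
    population = crime / crime_rate                           # population consistent with the two draws
    arrest_share = rng.uniform(0.2, 0.6, size=n)              # arrests per crime, unrelated to the rate
    arrests = arrest_share * crime

    res = sm.OLS(arrests / crime, sm.add_constant(crime_rate)).fit()
    print(res.params[1], res.pvalues[1])   # slope on Crimes/Population: near zero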

If, of course, bigger cities do have higher crime rates, then 1/Crime and Crimes/Population will be correlated; but if we suspect that to be true, then in our original regression we should have regressed Arrests/Crime not only on Crimes/Population but also on the control variable Crime.

There is some issue of measurement error--- of false correlation arising if Crime is measured with error. Then we are regressing Arrests/(Crime+Error) on (Crime+Error)/Population. I think using (Crime+Error) as a control variable would fix that problem, though.


Thursday, May 8, 2008

 

NASA's Temperature Data Adjustments

Too little attention has been given to the news last August that NASA had made a year-2000 mistake in calculating US temperatures, a mistake that meant the temperatures after 2000 were all too high. Details are at Coyote Blog. The mistake was in the adjustment NASA makes for the fact that if a weather station's location becomes urban, the measured temperature rises, because cities are always hotter. What is more important than the mistake itself is that (1) NASA very quietly fixed its data without any indication to users that it had been wrong earlier; (2) NASA's adjustment is by a secret method it refuses to disclose to outsiders; (3) NASA's adjustment appears (it is hard to say, since the method is kept secret) both to adjust "bad" stations (the ones in cities) down and to adjust "good" stations (the ones that read accurately) up, on the excuse of some kind of smoothing of off-trend stations; (4) the NASA people doing the adjustment are not statisticians; and (5) it isn't clear what adjustment, if any, is made to weather station data from elsewhere in the world. The US has some of the best data, and there seems to be no warming trend in the US.


Wednesday, January 9, 2008

 

Elasticities in Regressions (update of an old post)

Here is how to calculate elasticities from regression coefficients, a note possibly useful to economists who, like me, keep having to rederive this basic method:
  1. The elasticity is (%change in y)/(%change in x) = (dy/dx)*(x/y).
  2. If y = beta*x then the elasticity is beta*(x/y).
  3. If y = beta*log(x) then the elasticity is (beta/x)*(x/y) = beta/y.
  4. If log(y) = beta*log(x) then the elasticity is (beta*y/x)*(x/y) = beta, which is a constant elasticity.
    (reason: then y = exp(beta*log(x)), so dy/dx = beta*exp(beta*log(x))*(1/x) = beta*y/x.)
  5. If log(y) = beta*x then the elasticity is (beta*y)*(x/y) = beta*x.
    (reason: then y = exp(beta*x), so dy/dx = beta*exp(beta*x) = beta*y.)

  6. If log(y) = alpha + beta*D, where D is a dummy variable, then we are interested in the finite jump from D=0 to D=1, not an infinitesimal elasticity. That percentage jump is

    dy/y = exp(beta) - 1,

    because log(y | D=0) = alpha and log(y | D=1) = alpha + beta, so

    (y | D=1)/(y | D=0) = exp(alpha+beta)/exp(alpha) = exp(beta)

    and

    dy/y = (y | D=1)/(y | D=0) - 1 = exp(beta) - 1.

    This is consistent, but not unbiased. We know that OLS is BLUE and unbiased as an estimator of the impact of the dummy D on log(y), but that does not imply that it is unbiased as an estimator of the impact of D on y. That is because E(f(z)) does not equal f(E(z)) in general, and the ultimate effect of D on y, exp(beta) - 1, is a nonlinear function of beta. Alexander Borisov pointed out to me that Peter Kennedy (AER, 1981) suggests using exp(betahat - vhat(betahat)/2) - 1 as an estimate of the effect of going from D=0 to D=1; it is still biased, but less biased, and it is also consistent.
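Here is a minimal Python sketch of item 6 and of Kennedy's correction, with simulated data; the true coefficient and all names are illustrative only:

    # OLS of log(y) on a dummy, then the naive and Kennedy estimates of the % jump in y.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 2000
    D = rng.integers(0, 2, size=n).astype(float)
    logy = 1.0 + 0.25 * D + rng.normal(scale=0.5, size=n)   # true jump: exp(0.25)-1, about 28%

    res = sm.OLS(logy, sm.add_constant(D)).fit()
    b = res.params[1]            # betahat
    v = res.bse[1] ** 2          # vhat(betahat)

    naive = np.exp(b) - 1                 # consistent but biased estimate of the jump
    kennedy = np.exp(b - v / 2) - 1       # Kennedy (1981): still biased, but less so
    print(naive, kennedy)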


Saturday, October 13, 2007

 

Partial Identification and Chi-Squared Tests

I heard Adam Rosen give his paper, "Confidence sets for partially identified parameters that satisfy a finite number of moment inequalities." It stimulated some thoughts. (Click here to read more.)


Wednesday, October 10, 2007

 

An Umbrella with a Drip Case

I brought this umbrella back from Taipei. It has a case to prevent dripping from the wet umbrella onto the floor when it is folded up. The case opens automatically when you open the umbrella, telescoping down into a little cap on top of the umbrella.


 

A Coin Flip Example for Intelligent Design

1. Suppose we come across a hundred bags of 20-chip draws from a hundred different urns. Each bag contains 20 red chips. We naturally deduce that the urns contain only red chips. (Click here to read more.)
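For scale (my own illustrative arithmetic, not a quotation from the full post): if an urn were only half red, the chance that a 20-chip draw with replacement would come up all red is (1/2)^20, or about one in a million, so even a single all-red bag is strong evidence against a half-red urn, let alone a hundred such bags.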


Monday, October 8, 2007

 

Case Control Studies and Repeated Sampling

A standard counterintuitive result in statistics is that if the true model is logit, then it is okay to use a sample selected on the Y's, which is what the "case-control method" amounts to. You may select 1000 observations with Y=1 and 1000 observations with Y=0 and estimate the effects of every variable but the constant in the usual way, without any sort of weighting. This was shown in Prentice & Pyke (1979). They also purport to show that the standard errors may be computed in the usual way--- that is, using the curvature (2nd derivative) of the likelihood function. (Click here for more)
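Here is a minimal Python sketch of the slope-consistency part of that result; the data-generating numbers are made up for illustration:

    # Fit an unweighted logit on a sample of 1000 cases (Y=1) and 1000 controls (Y=0)
    # drawn from a large simulated population; the slope stays near its true value,
    # while the intercept absorbs the sampling rates.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    N = 200_000
    x = rng.normal(size=N)
    p = 1 / (1 + np.exp(-(-3.0 + 1.0 * x)))   # true intercept -3, true slope 1
    y = rng.binomial(1, p)

    cases = rng.choice(np.flatnonzero(y == 1), 1000, replace=False)
    controls = rng.choice(np.flatnonzero(y == 0), 1000, replace=False)
    idx = np.concatenate([cases, controls])

    res = sm.Logit(y[idx], sm.add_constant(x[idx])).fit(disp=False)
    print(res.params)   # slope should be close to 1; the constant will not be -3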


Thursday, October 4, 2007

 

Is Not Necessarily Equal To

At lunch at Nuffield I was just asking MM about some math notation I'd like: a symbol for "is not necessarily equal to". For example, an economics paper might show the following:

Proposition: Stocks with equal risks might or might not have the same returns. In the model's notation, x IS NOT NECESSARILY EQUAL TO y.
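One way such a symbol might be typeset in LaTeX (the macro and its name are my own sketch, not established notation):

    % "is not necessarily equal to": a small n stacked over the inequality sign
    \newcommand{\nneq}{\mathrel{\stackrel{\mathrm{n}}{\neq}}}
    % usage: $x \nneq y$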

Click here to read more


Tuesday, October 2, 2007

 

Bayesian vs. Frequentist Statistical Theory: George and Susan

Susan either likes George or dislikes him. His prior belief is that there is a 50% chance that she likes him. He also believes that if she does, there is an 80% chance she will smile at him, and if she does not, there is a 60% chance. She smiles at him. What should he think of that?

The Frequentist approach says that George should choose the answer that has the greatest likelihood given the data, and so he should believe that she likes him. Click here to read more
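With the numbers above (the arithmetic here is mine): the likelihood of a smile is 0.8 if she likes him and 0.6 if she does not, so the maximum-likelihood answer is "she likes him." The Bayesian posterior from the 50-50 prior is

    P(likes | smile) = 0.5*0.8/(0.5*0.8 + 0.5*0.6) = 0.4/0.7 ≈ 0.57,

so a Bayesian George also leans toward "she likes me," but only at odds of 4 to 3.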


Friday, September 28, 2007

 

Weighted Least Squares and Why More Data is Better

In doing statistics, when should we weight different observations differently?

Suppose I have 10 independent observations of $x$ and I want to estimate the population mean, $\mu$. Why should I use the unweighted sample mean rather than weighting the first observation by .91 and each of the rest by .01?

Either way, I get an unbiased estimate, but the unweighted mean gives me a lower-variance estimator. If I use just observation 1 (a weight of 100% on it), then my estimator has the full variance of the disturbance. If I use two observations with equal weights, a big positive disturbance on observation 1 might be cancelled out by a big negative one on observation 2. Indeed, the worst case is that observation 2 also has a big positive disturbance, in which case I am no worse off for having it. I do not want to overweight any one observation, because I want mistakes to cancel out as evenly as possible.
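A quick way to see the variance claim, as a sketch: for independent observations with common variance $\sigma^2$ and weights $w_i$ summing to one, $\mathrm{Var}(\sum_i w_i x_i) = \sigma^2 \sum_i w_i^2$. With the weights above that is $\sigma^2(0.91^2 + 9 \times 0.01^2) = 0.829\,\sigma^2$, versus $10 \times 0.1^2\,\sigma^2 = 0.1\,\sigma^2$ for the equal-weighted mean; equal weights minimize $\sum_i w_i^2$ subject to the weights summing to one.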

All this is completely free of the distribution of the disturbance term. It doesn't rely on the Central Limit Theorem, which says that as $n$ increases, the distribution of the estimator approaches the normal distribution (so long as I don't put too much weight on any one observation, at least!).

If I knew that observation 1 had a smaller disturbance on average, then I *would* want to weight it more heavily. That is heteroskedasticity, and weighting to correct for it is what weighted least squares does.


Tuesday, September 25, 2007

 

Asymptotics

Page 96 of David Cox’s 2006 Principles of Statistical Inference has a very nice one-sentence summary of asymptotic theory:

[A]pproximations are derived on the basis that the amount of information is large, errors of estimation are small, nonlinear relations are locally linear and a central limit effect operates to induce approximate normality of log likelihood derivatives.


 

Bayesian vs. Frequentist Statistical Theory

The Frequentist view of probability is that a coin with a 50% probability of heads will turn up heads 50% of the time.

The Bayesian view of probability is that a coin with a 50% probability of heads is one on which a knowledgeable risk-neutral observer would put a bet at even odds.

The Bayesian view is better.

When it comes to statistics, the essence of the Frequentist view is to ask whether the number of heads that shows up in one or more trials is probable given the null hypothesis that the true odds in any one toss are 50%.

When it comes to statistics, the essence of the Bayesian view is to estimate, given the number of heads that shows up in one or more trials and the observer's prior belief about the odds, the probability that the odds are 50% versus the odds being some alternative number.

I like the frequentist view better. It’s neater not to have a prior involved.
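Here is a hedged numerical sketch of the two approaches in Python, with made-up numbers: 60 heads in 100 tosses, and, for the Bayesian, a 50-50 prior over a fair coin versus a 60%-heads alternative:

    # Frequentist: how improbable is a result this extreme under the 50% null?
    # Bayesian: what is the posterior probability that the odds are 50%?
    from scipy.stats import binom

    heads, tosses = 60, 100

    p_value = 2 * (1 - binom.cdf(heads - 1, tosses, 0.5))   # two-sided tail probability
    print("p-value under the 50% null:", p_value)

    like_fair = binom.pmf(heads, tosses, 0.5)
    like_alt = binom.pmf(heads, tosses, 0.6)
    posterior_fair = 0.5 * like_fair / (0.5 * like_fair + 0.5 * like_alt)
    print("posterior probability the odds are 50%:", posterior_fair)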
