Thursday, May 8, 2008

 

NASA's Temperature Data Adjustments

Too little attention has been given to the news last August that NASA had made a year-2000 mistake in calculating US temperatures, a mistake that meant the temperatures after 2000 were all too high. Details are at Coyote Blog. The mistake was in the adjustment NASA makes for the fact that if a weather station's location becomes urban, the temperature rises because cities are always hotter. What is more important than the mistake itself is that (1) NASA very quietly fixed its data without any indication to users that it had been wrong earlier. (2) NASA's adjustment is by a secret method it refuses to disclose to outsiders. (3) NASA's adjustment appears (hard to say since it's kept secret) to adjust both "bad" stations (the ones in cities) down and "good" stations (the ones that read accurately) up, on the excuse of some kind of smoothing of off-trend stations. (4) The NASA people doing the adjustment are not statisticians. (5) It isn't clear what, if any, adjustment is made to weather station data from elsewhere in the world. The US has some of the best data, and there seems to be no warming trend in the US.


 

To view the post on a separate page, click at (the permalink). 0 Comments Links to this post

Wednesday, January 9, 2008

 

Elasticities in Regressions (update of old post)

Here is how to calculate elasticities from regression coefficients, a note possibly useful to economists who, like me, keep having to rederive this basic method:
  1. The elasticity is (%change in Y)/(%change in X) = (dy/dx)*(x/y).
  2. If y = beta*x then the elasticity is beta*(x/y).
  3. If y = beta* log(x) then the elasticity is (beta/x)*(x/y) = beta/y.
  4. If log(y) = beta* log(x) then the elasticity is (beta*y/x)*(x/y) = beta, which is a constant elasticity.
    (reason: then y= exp(beta*log(x)), so dy/dx = beta*exp(beta*log(x))*(1/x) = beta*y/x.)
  5. If log(y) = beta*x then the elasticity is (beta* y )*(x/y) = beta*x.
    (reason: then y = exp(beta*x), so dy/dx = beta*exp(beta*x) = beta*y.)

  6. If log(y) = alpha + beta*D, where D is a dummy variable, then we are interested in the finite jump from D=0 to D=1, not an infinitesimal elasticity. That percentage jump is

    dy/y = exp(beta)-1,

    because log(y,D=0) = alpha and log(y, D=1) = alpha + beta, so

    (y,D=1)/(y, D=0) = exp(alpha+beta)/exp(alpha) = exp(beta)

    and

    dy/y = (y,D=1)/(y, D=0) -1 = exp(beta)-1

    This estimator is consistent, but not unbiased. We know that OLS is BLUE, and hence unbiased, as an estimator of the impact of the dummy D on log(y), but that does not imply that it is unbiased as an estimator of the impact of D on y. That is because E(f(z)) does not equal f(E(z)) in general, and the ultimate effect of D on y, exp(beta)-1, is a nonlinear function of beta. Alexander Borisov pointed out to me that Peter Kennedy (AER, 1981) suggests using exp(betahat-vhat(betahat)/2)-1 as an estimate of the effect of going from D=0 to D=1; it is still biased, but less biased, and also consistent.
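The six cases above can be collected into a few small helper functions. This is only a sketch: the function names, the numerical check, and the example numbers (a coefficient of 0.10 with an estimated variance of 0.02) are my own illustrations, not taken from any particular regression.

```python
import math

def elasticity_level_level(beta, x, y):
    """Case 2: y = beta*x, elasticity is beta*(x/y)."""
    return beta * x / y

def elasticity_level_log(beta, y):
    """Case 3: y = beta*log(x), elasticity is beta/y."""
    return beta / y

def elasticity_log_log(beta):
    """Case 4: log(y) = beta*log(x), constant elasticity beta."""
    return beta

def elasticity_log_level(beta, x):
    """Case 5: log(y) = beta*x, elasticity is beta*x."""
    return beta * x

def dummy_effect(beta_hat):
    """Case 6: proportional jump in y when D goes from 0 to 1."""
    return math.exp(beta_hat) - 1

def dummy_effect_kennedy(beta_hat, var_beta_hat):
    """Case 6 with the Kennedy adjustment exp(betahat - vhat/2) - 1."""
    return math.exp(beta_hat - var_beta_hat / 2) - 1

def numerical_elasticity(f, x, h=1e-6):
    """Check (dy/dx)*(x/y) with a central finite difference."""
    dydx = (f(x + h) - f(x - h)) / (2 * h)
    return dydx * x / f(x)

# Hypothetical numbers, purely for illustration.
naive = dummy_effect(0.10)
adjusted = dummy_effect_kennedy(0.10, 0.02)
check = numerical_elasticity(lambda x: x ** 0.7, 2.0)  # log-log case, beta = 0.7
```

The adjusted estimate is always below the naive one, because subtracting half the variance shrinks the exponent.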


Saturday, October 13, 2007

 

Partial Identification and Chi-Squared Tests

I heard Adam Rosen give his paper, "Confidence sets for partially identified parameters that satisfy a finite number of moment inequalities." It stimulated some thoughts. (Click here to read more.)


Wednesday, October 10, 2007

 

An Umbrella with a Drip Case

I brought this umbrella back from Taipei. It has a case to prevent dripping from the wet umbrella onto the floor when it is folded up. The case opens automatically when you open the umbrella, telescoping down into a little cap on top of the umbrella.


 

A Coin Flip Example for Intelligent Design

1. Suppose we come across a hundred bags of 20-chip draws from a hundred different urns. Each bag contains 20 red chips. We naturally deduce that the urns contain only red chips. (Click here to read more.)


Monday, October 8, 2007

 

Case Control Studies and Repeated Sampling

A standard counterintuitive result in statistics is that if the true model is logit, then it is okay to use a sample selected on the Y's, which is what the "case-control method" amounts to. You may select 1000 observations with Y=1 and 1000 observations with Y=0 and do estimation of the effects of every variable but the constant in the usual way, without any sort of weighting. This was shown in Prentice & Pyke (1979). They also purport to show that the standard errors may be computed in the usual way--- that is, using the curvature (2nd derivative) of the likelihood function. (Click here for more)
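A small simulation can illustrate the claim about the coefficients. This is a sketch under my own assumptions: a single regressor, a true intercept of -3 and slope of 1, a simulated population of 50,000, and a hand-written Newton-Raphson logit fit. It says nothing about the standard errors.

```python
import math
import random

def fit_logit(xs, ys, iterations=25):
    """Fit P(y=1) = 1/(1+exp(-(a+b*x))) by Newton-Raphson; return (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score for the constant
            g1 += (y - p) * x      # score for the slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical true model, chosen so that Y=1 is fairly rare.
TRUE_A, TRUE_B = -3.0, 1.0
rng = random.Random(0)

population = []
for _ in range(50000):
    x = rng.gauss(0.0, 1.0)
    p = 1.0 / (1.0 + math.exp(-(TRUE_A + TRUE_B * x)))
    population.append((x, 1 if rng.random() < p else 0))

# Case-control sample: 1000 observations with Y=1 and 1000 with Y=0.
cases = [row for row in population if row[1] == 1][:1000]
controls = [row for row in population if row[1] == 0][:1000]
sample = cases + controls

a_hat, b_hat = fit_logit([x for x, _ in sample], [y for _, y in sample])
print("slope estimate:", round(b_hat, 3), "constant estimate:", round(a_hat, 3))
```

The slope estimate should land near the true value of 1, while the constant is shifted well away from -3 because the sample has been forced to be half cases.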


Thursday, October 4, 2007

 

Is Not Necessarily Equal To

At lunch at Nuffield I was just asking MM about some math notation I'd like: a symbol for "is not necessarily equal to". For example, an economics paper might show the following:

Proposition: Stocks with equal risks might or might not have the same returns. In the model's notation, x IS NOT NECESSARILY EQUAL TO y.

Click here to read more


Tuesday, October 2, 2007

 

Bayesian vs. Frequentist Statistical Theory: George and Susan

Susan either likes George or dislikes him. George's prior belief is that there is a 50% chance that she likes him. He also believes that if she does, there is an 80% chance she will smile at him, and if she does not, there is still a 60% chance she will smile. She smiles at him. What should he think of that?

The Frequentist approach says that George should choose the answer which has the greatest likelihood given the data, and so he should believe that she likes him. Click here to read more
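The two calculations can be put side by side using the numbers in the example. This is a minimal sketch; the variable and function names are mine.

```python
# Numbers from the example.
prior_likes = 0.5           # George's prior that Susan likes him
p_smile_if_likes = 0.8      # chance of a smile if she likes him
p_smile_if_dislikes = 0.6   # chance of a smile if she does not

def max_likelihood_answer(p_if_likes, p_if_dislikes):
    """Pick the hypothesis under which the observed smile is more probable."""
    return "likes" if p_if_likes > p_if_dislikes else "dislikes"

def posterior_likes(prior, p_if_likes, p_if_dislikes):
    """Bayes' rule: P(likes | smile)."""
    joint_likes = prior * p_if_likes
    joint_dislikes = (1 - prior) * p_if_dislikes
    return joint_likes / (joint_likes + joint_dislikes)

ml_answer = max_likelihood_answer(p_smile_if_likes, p_smile_if_dislikes)
posterior = posterior_likes(prior_likes, p_smile_if_likes, p_smile_if_dislikes)
print(ml_answer, round(posterior, 4))
```

The likelihood comparison returns a flat "likes", while Bayes' rule moves George's belief only from 50% to 4/7, about 57%.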


Friday, September 28, 2007

 

Weighted Least Squares and Why More Data is Better

In doing statistics, when should we weight different observations differently?

Suppose I have 10 independent observations of $x$ and I want to estimate the population mean, $\mu$. Why should I use the unweighted sample mean rather than weighting the first observation by .91 and each of the rest by .01?

Either way, I get an unbiased estimate, but the unweighted mean gives me lower variance of the estimator. If I use just observation 1 (a weight of 100% on it) then my estimator has the variance of the disturbance. If I use two observations, then a big positive disturbance on observation 1 might be cancelled out by a big negative on observation 2. Indeed, the worst case is that observation 2 also has a big positive disturbance, in which case I am no worse off by having it. I do not want to overweight any one observation, because I want mistakes to cancel out as evenly as possible.
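The variance comparison can be made concrete. For independent observations with a common disturbance variance $\sigma^2$, a weighted mean with weights summing to one has variance $\sigma^2 \sum w_i^2$. The sketch below (function name mine) computes that multiplier for the two weighting schemes.

```python
def variance_factor(weights):
    """Return sum(w_i^2), the multiple of sigma^2 that is the estimator's variance.

    Assumes independent observations with equal variance and weights summing to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1 for an unbiased estimate")
    return sum(w * w for w in weights)

equal_weights = [0.1] * 10
lopsided_weights = [0.91] + [0.01] * 9

equal_factor = variance_factor(equal_weights)        # about 0.1
lopsided_factor = variance_factor(lopsided_weights)  # about 0.829
print(equal_factor, lopsided_factor)
```

The lopsided weights give a variance roughly eight times that of the unweighted mean, nearly as bad as using observation 1 alone.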

None of this depends on the distribution of the disturbance term. It doesn't rely on the Central Limit Theorem, which says that as $n$ increases the distribution of the estimator approaches the normal distribution (if I don't use too much weighting, at least!).

If I knew that observation 1 had a smaller disturbance on average, then I *would* want to weight it more heavily. That's heteroskedasticity.
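With known, unequal variances, the standard result is that the minimum-variance unbiased weights are proportional to the inverse of each observation's variance. The sketch below illustrates this with made-up numbers: observation 1 has variance 0.25 and the other nine have variance 1. The function names are mine.

```python
def inverse_variance_weights(variances):
    """Weights proportional to 1/variance, normalized to sum to 1."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    return [p / total for p in precisions]

def estimator_variance(weights, variances):
    """Variance of a weighted mean of independent observations."""
    return sum(w * w * v for w, v in zip(weights, variances))

# Hypothetical variances: observation 1 is more precise than the rest.
variances = [0.25] + [1.0] * 9

optimal_weights = inverse_variance_weights(variances)
equal_weights_het = [0.1] * 10

optimal_var = estimator_variance(optimal_weights, variances)    # 1/13
equal_var = estimator_variance(equal_weights_het, variances)    # 0.0925
print(optimal_weights[0], optimal_var, equal_var)
```

Here observation 1 gets a weight of 4/13 rather than 1/10, and the variance of the estimator falls from 0.0925 to 1/13.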


Tuesday, September 25, 2007

 

Asymptotics

Page 96 of David Cox’s 2006 Principles of Statistical Inference has a very nice one-sentence summary of asymptotic theory:

[A]pproximations are derived on the basis that the amount of information is large, errors of estimation are small, nonlinear relations are locally linear and a central limit effect operates to induce approximate normality of log likelihood derivatives.


 

Bayesian vs. Frequentist Statistical Theory

The Frequentist view of probability is that a coin with a 50% probability of heads will turn up heads 50% of the time.

The Bayesian view of probability is that a coin with a 50% probability of heads is one on which a knowledgeable risk-neutral observer would put a bet at even odds.

The Bayesian view is better.

When it comes to statistics, the essence of the Frequentist view is to ask whether the number of heads that shows up in one or more trials is probable given the null hypothesis that the true odds in any one toss are 50%.
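As a sketch of the Frequentist calculation, the function below computes an exact two-sided p-value under the null hypothesis of a 50% chance of heads. The two-sided convention and the example of 15 heads in 20 tosses are my own choices for illustration.

```python
from math import comb

def two_sided_p_value(heads, tosses):
    """Probability, under a fair coin, of a head count at least as far
    from tosses/2 as the one observed."""
    distance = abs(heads - tosses / 2)
    extreme_outcomes = sum(
        comb(tosses, k)
        for k in range(tosses + 1)
        if abs(k - tosses / 2) >= distance
    )
    return extreme_outcomes / 2 ** tosses

# Hypothetical data: 15 heads in 20 tosses.
p_value = two_sided_p_value(15, 20)
print(round(p_value, 4))
```

No prior appears anywhere in the calculation; only the null hypothesis and the data do.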

When it comes to statistics, the essence of the Bayesian view is to estimate, given the number of heads that shows up in one or more trials and the observer’s prior belief about the odds, the probability that the odds are 50% versus the odds being some alternative number.
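A sketch of the Bayesian calculation follows. It assumes a single alternative, a 75% chance of heads, and a prior of one half on the fair coin; both numbers are hypothetical choices of mine, as is the example of 15 heads in 20 tosses.

```python
def posterior_fair(heads, tosses, prior_fair=0.5, alt_p=0.75):
    """Posterior probability that the coin is fair, versus one alternative.

    The binomial coefficient is common to both likelihoods and cancels.
    """
    tails = tosses - heads
    like_fair = 0.5 ** tosses
    like_alt = alt_p ** heads * (1 - alt_p) ** tails
    numerator = prior_fair * like_fair
    return numerator / (numerator + (1 - prior_fair) * like_alt)

# Hypothetical data.
post_15 = posterior_fair(15, 20)  # data favour the 75% alternative
post_10 = posterior_fair(10, 20)  # data favour the fair coin
print(round(post_15, 3), round(post_10, 3))
```

Unlike the p-value, the answer changes if the prior or the alternative is changed, which is exactly where the prior enters.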

I like the Frequentist view better. It’s neater not to have a prior involved.
