The Bell Curve Page

Last updated: July 6, 1998/5 Feb., 2007. URL: http://rasmusen.org/pacioli/bellcurve/bellcurve.htm Administered by Eric Rasmusen, [email protected], Kelley School of Business, Indiana University, BU 456, 1309 East Tenth Street, Bloomington, Indiana 47405-1701, (812)855-9219.

Charles Murray has provided the data he used in the analysis in his book, The Bell Curve. The data files are roughly 1 to 3 megabytes each, and are available in two formats. Please note that the data includes weights for each observation, because the survey from which it comes sampled different groups with different weights. Also, some of the data is in the form of `z-scores', which means that it is measured as standard deviations away from a mean of zero.
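To make the weights and z-scores concrete, here is a minimal Python sketch of how a weighted mean and a z-score are computed. The numbers are made up for illustration and do not come from the data files, and the function names are mine. Whether the z-scores in Murray's files were standardized using the sampling weights is an assumption on my part; check the documentation before relying on it.

```python
import math


def weighted_mean(values, weights):
    """Mean of values, with each observation counted in proportion to its weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def weighted_std(values, weights):
    """Weighted standard deviation around the weighted mean."""
    mean = weighted_mean(values, weights)
    total_weight = sum(weights)
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total_weight
    return math.sqrt(variance)


def z_scores(values, weights):
    """Express each value as standard deviations away from the weighted mean.

    Assumption: the standardization uses the weights; an unweighted
    version would simply pass equal weights.
    """
    mean = weighted_mean(values, weights)
    std = weighted_std(values, weights)
    return [(v - mean) / std for v in values]


# Made-up raw scores and sampling weights, for illustration only.
scores = [90.0, 100.0, 110.0, 130.0]
weights = [1.0, 2.0, 2.0, 1.0]
z = z_scores(scores, weights)
```

By construction, the weighted mean of the resulting z-scores is zero and their weighted standard deviation is one.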

If you encounter problems reading the data, please let me know. I probably can't help, since I haven't been using this data in the past few years, but you never know. If you solve your problem, let me know about that too, so I can post the solution.

Data Files From Charles Murray, in His Format

The files are saved as Macintosh text files with labels, with a tab marking the end of each field and a carriage return (CR) marking the end of each line.

NATION.TXT, which has 12,686 cases and 50 variables. This file includes variables scored for all NLSY subjects, one line per subject. Size: 3.112MB.
CHILD1.TXT, which has 8,513 cases and 26 variables. Variables scored for all NLSY children, representing one case per child for whom data were available through SY90. Size: 1.312MB.
CHILD2.TXT, which has 17,040 cases and 40 variables. Each case represents one child for one test year. A given child may therefore be represented in up to three cases. TY=test year. Percentiles on the developmental and behavioral indicators all represent within-gender percentiles. Size: 3.040MB.
WOMEN.TXT, which has 6,283 cases and 28 variables. Variables scored for all women in the NLSY (one case per subject). Size: 1.032MB.
The Documentation is available in several forms. The original is a 51K file, 1TBC_Documentation5.rtf. You can get the same thing in Word in 33K at 3TBC_Documentation.doc, or in ASCII in 25K at 2TBC_Documentation.ascii. Finally, you can get a 15K version describing just the NATION variables at 1TBC_Nation.Documentation5.rtf.
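If the CR line endings in the original text files cause trouble, the following Python sketch shows one way to read a tab-delimited file whose lines end in CR. It is only a sketch under assumptions: I assume that "with labels" means the first line holds the variable labels, and that the text is in the Mac Roman encoding (plain ASCII would read the same way). The function name and the tiny sample file are hypothetical and are not taken from the real data.

```python
import csv
import os
import tempfile


def read_tab_cr_file(path, encoding="mac_roman"):
    """Read a tab-delimited text file whose lines end in CR.

    Opening with newline="" lets Python split lines on CR, LF, or CRLF
    and leaves the csv module to handle the line endings itself.
    Assumption: the first non-empty line holds the variable labels.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle, delimiter="\t")
        rows = [row for row in reader if row]  # skip blank lines
    header, records = rows[0], rows[1:]
    return header, records


# A tiny made-up file in the same layout: tabs between fields, CR at line ends.
sample = b"ID\tSCORE\r1\t0.5\r2\t-1.25\r"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample)

header, records = read_tab_cr_file(tmp.name)
os.remove(tmp.name)
```

All values come back as strings; converting them to numbers is left to the reader, since the documentation files describe which variables are numeric.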

Data Files As I Modified Them, with Commas Separating the Entries

I used the Excel spreadsheet program to change the format into one I could use more easily, and made a few other small changes. The output is a set of CSV files, with the entries separated by commas.

nation.csv, which has 12,686 cases and 50 variables. This 2.980M file includes variables scored for all NLSY subjects, one line per subject. The variable names are listed in the file, nation.hdr.
child1.csv, a 1.201M file. The variable names are listed in a 278-byte file, child1.hdr.
child2a.csv and child2b.csv, 1.667M and 1.265M files. The variable names seem to have gone astray since 1996. They should be listed in child2.hdr, but they are not.
women.csv, a 952K file. The variable names are listed in a 291-byte file, woman.hdr.
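
Since the variable names are kept in separate .hdr files rather than in the csv files themselves, here is a rough Python sketch of how the two might be combined. The layout of the .hdr files is not described on this page, so the sketch assumes the names are separated by commas or whitespace; the function names and the sample text are hypothetical.

```python
import csv
import io
import re


def parse_header(text):
    """Split header text into variable names.

    Assumption: names are separated by commas and/or whitespace.
    """
    return [name for name in re.split(r"[,\s]+", text.strip()) if name]


def attach_names(names, csv_handle):
    """Pair each csv row with the variable names, one dict per case."""
    records = []
    for line_number, row in enumerate(csv.reader(csv_handle), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != len(names):
            raise ValueError(
                f"line {line_number}: {len(row)} entries but {len(names)} names"
            )
        records.append(dict(zip(names, row)))
    return records


# Made-up header and data text standing in for a .hdr file and a .csv file.
hdr_text = "id, age, zscore\n"
csv_text = "1,30,0.42\n2,27,-1.10\n"
records = attach_names(parse_header(hdr_text), io.StringIO(csv_text))
```

The length check is there because a header file that does not match its csv file would otherwise mislabel the columns without any warning.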

Some Regression Files Using STATA

I like the STATA program very much, and I include here some input and output files that use the data above.

bell2a.do, an input file using nation.csv. The output from this is the log file, bell2a.log, which has regression results, and a STATA data file, nation1.dta, which has a subset of the nation.txt variables in the condensed STATA format.
bell2.do, a 3K input file using nation.csv. The output from this is the 13K file, bell2.log. This do-file wasn't working in May 2002--my present version of STATA, STATA 7.0, says there is not room enough for all the observations.
jan6c.do, a 2K input file using nation.csv. The output from this is the 2K file, jan6c.log.