Inaccurate Age and Sex Data in the Census PUMS Files

In April 2009, the U.S. Census Bureau acknowledged problems in the techniques it uses to prevent the identification of specific individuals in public-use microdata sample (PUMS) files. These techniques produced inconsistent sex ratios for people ages 65 and older in the PUMS files produced from the 2000 Census, the 2003-2006 American Community Survey (ACS), and the 2005-2006 Puerto Rico Community Survey (PRCS). The errors also carried through to the 2005 and 2006 cases in the 2005-2007 3-year ACS/PRCS.

The Census Bureau released corrected age data for 2006 in December 2009 (ACS Erratum #50) and for the other samples in December 2010 (ACS Erratum #65). These corrected data are now available through IPUMS.

The IPUMS versions of these samples now contain the corrected data in the AGE variable, and the original (incorrect) data are retained in the AGEORIG variable. Almost all AGE values for persons age 65 and over have changed, suggesting that the revised data were imputed or calculated with synthetic data methods. For this reason, QAGE has been set to 4 (allocated) for all persons age 65 and up.
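To see how the correction affects a particular extract, one option is to compare AGE with AGEORIG directly and to recompute sex ratios under each variable. The following is a minimal sketch, not an IPUMS tool. It assumes the extract has been read into a list of dictionaries containing AGE, AGEORIG, SEX (coded 1 = male, 2 = female), and optionally the person weight PERWT. The function names and the toy records are hypothetical and are there only to make the example runnable.

```python
from collections import defaultdict


def share_changed(records, min_age=65):
    """Fraction of records with corrected AGE >= min_age whose AGE differs from AGEORIG."""
    older = [r for r in records if r["AGE"] >= min_age]
    if not older:
        return 0.0
    return sum(r["AGE"] != r["AGEORIG"] for r in older) / len(older)


def sex_ratio_by_age(records, age_field, min_age=65):
    """Weighted males per 100 females at each single year of age >= min_age.

    age_field is "AGE" (corrected) or "AGEORIG" (original).
    Ages with no weighted females are omitted to avoid dividing by zero.
    """
    males = defaultdict(float)
    females = defaultdict(float)
    for rec in records:
        age = rec[age_field]
        if age < min_age:
            continue
        weight = rec.get("PERWT", 1.0)  # unweighted if PERWT is absent
        if rec["SEX"] == 1:
            males[age] += weight
        elif rec["SEX"] == 2:
            females[age] += weight
    return {
        age: 100.0 * males[age] / females[age]
        for age in sorted(females)
        if females[age] > 0
    }


# Toy records (invented values, for illustration only).
records = [
    {"AGE": 66, "AGEORIG": 65, "SEX": 1, "PERWT": 100},
    {"AGE": 66, "AGEORIG": 67, "SEX": 2, "PERWT": 200},
    {"AGE": 70, "AGEORIG": 70, "SEX": 2, "PERWT": 50},
    {"AGE": 40, "AGEORIG": 40, "SEX": 1, "PERWT": 80},
]

changed = share_changed(records)
corrected_ratios = sex_ratio_by_age(records, "AGE")
original_ratios = sex_ratio_by_age(records, "AGEORIG")
```

Comparing `corrected_ratios` with `original_ratios` age by age gives a rough indication of where the original files were distorted. Because the corrected values for ages 65 and up are flagged as allocated, analyses that exclude allocated cases using QAGE will drop this entire age range.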

Uncorrected errors in the person identifiers of the original Census Bureau 1 percent samples for the United States and Puerto Rico prevented the revised data from being linked to the original data. Revised 2000 1 percent samples for the United States and Puerto Rico have therefore replaced the old files entirely. The original files with erroneous age values remain available for testing purposes.

For a full discussion of the problem and its implications for researchers, see:

Alexander, J. Trent, Michael Davern, and Betsey Stevenson. 2010. "Inaccurate Age and Sex Data in the Census PUMS Files: Evidence and Implications." National Bureau of Economic Research Working Paper No. 15703.