IPUMS 4.0 Released!

April 11, 2008

We are very pleased to announce IPUMS 4.0, the first major revision of the IPUMS files since 2004. For many of the files, these are the first improvements to the data since their original production in the 1990s. The new release includes many new samples, new variables, and dataset improvements.

IPUMS 4.0 includes the following 14 previously unavailable samples:

  • 1880 U.S. census: Preliminary 5% sample with minority oversamples
  • 1910 Puerto Rico census: 12% sample
  • 1920 Puerto Rico census: 12% sample
  • 1970 Puerto Rico census: 1% State sample, 1% Municipio sample, and 1% Neighborhood sample
  • 1980 Puerto Rico census: 5% State sample and 1% Metro sample
  • 1990 Puerto Rico census: 5% State sample and 1% Metro sample
  • 2000 Puerto Rico census: 5% sample and 1% sample
  • 2005 Puerto Rico Community Survey: 1% sample
  • 2006 Puerto Rico Community Survey: 1% sample

These new datasets include approximately fifty new variables and hundreds of variables that have been integrated with all other IPUMS samples. Additional information on all new samples is available via the sample descriptions page and the sample designs page.

In addition to these new datasets, all existing files from the U.S. censuses of 1850-1930 underwent a series of improvements resulting in a small number of corrected values, the addition and subtraction of small numbers of cases, and the addition of numerous variables.

Improvements and additions to IPUMS 4.0 files include:

  • All IPUMS 4.0 samples from 1850-1930 now draw on a single set of integrated "dictionaries" that translate original alphabetic census entries into numeric codes. In previous versions of the IPUMS, each sample had its own unique set of dictionaries specifying the correspondence between each alphabetic string and a numeric IPUMS code. These various dictionaries have been integrated into a single modern database, correcting numerous small errors and inconsistencies in the process.
  • A new long-term Hispanic identification variable going back to 1850 (HISPAN). This variable uses methods in a frequently-cited 2000 article to identify Hispanic persons prior to 1980, when the Census Bureau first began to include an explicit question on Hispanic origin.
  • A battery of new constructed socioeconomic status variables. These 10 new variables provide alternatives to the IPUMS OCCSCORE and SEI variables.
  • Variables containing original strings for occupation (OCCSTR) and industry (INDSTR). Our other occupation and industry variables reduce millions of occupational responses to one of several hundred categories (see OCC, OCC1950, IND, and IND1950). The OCCSTR and INDSTR variables permit researchers to see respondents' actual alphabetic responses to the occupation and industry questions.
  • New detailed weight variables for the historical samples (HHWTDET and PERWTDET). Like HHWT and PERWT, these variables are used to create accurate nationally-representative statistics. Whereas HHWT and PERWT are benchmarked on published population counts for the entire country, HHWTDET and PERWTDET are benchmarked on published population counts for each county. The precision available in HHWTDET and PERWTDET will be particularly useful for studies of small geographic areas or of oversampled population subgroups. UPDATE: The detailed weights formerly contained in HHWTDET and PERWTDET are now contained in the standard HHWT and PERWT variables.
  • New standardized low-level geography variables (MCD and INCORP). These variables provide the lowest-level geographic identifiers that are consistently available from 1850-1930.
  • STRATA and CLUSTER variables to permit Taylor series linear approximation corrections of the complex sample design characteristics in the IPUMS.
  • A consistent single-race identification variable from 1850-2012 (RACESING). Census and American Community Survey forms from 2000 onward allow respondents to select multiple racial categories. This presents serious problems for long-term analysis. RACESING uses new methods of assigning a single race to multi-racial people, and then codes all race responses from all years into a simple, historically compatible scheme.
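
As an illustration of how the weight and sample design variables described above fit together, the following is a minimal Python sketch, not an official IPUMS procedure. It computes a weighted mean using a person weight such as PERWT, then estimates its standard error by Taylor series linearization using STRATA and CLUSTER. The function name, the tuple layout of the input, and the choice of a with-replacement between-cluster variance estimator are all assumptions made for this example; strata containing only one cluster are simply skipped, which is a simplification.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_with_se(records):
    """Weighted mean and linearized standard error.

    `records` is an iterable of (strata, cluster, weight, value) tuples.
    This layout is hypothetical; it is not an IPUMS extract format.
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_w

    # Linearized score for the ratio estimator, summed to the cluster
    # level within each stratum.
    cluster_totals = defaultdict(lambda: defaultdict(float))
    for strata, cluster, w, y in records:
        cluster_totals[strata][cluster] += w * (y - mean) / total_w

    variance = 0.0
    for clusters in cluster_totals.values():
        n = len(clusters)
        if n < 2:
            # Assumption: a stratum with a single cluster contributes
            # nothing to the variance in this sketch.
            continue
        z = list(clusters.values())
        z_bar = sum(z) / n
        variance += n / (n - 1) * sum((v - z_bar) ** 2 for v in z)

    return mean, sqrt(variance)


if __name__ == "__main__":
    # Made-up records: (STRATA, CLUSTER, PERWT, some numeric variable).
    sample = [
        (1, 1, 100, 10),
        (1, 1, 100, 20),
        (1, 2, 200, 30),
        (2, 3, 100, 40),
        (2, 4, 100, 20),
    ]
    mean, se = weighted_mean_with_se(sample)
    print(f"weighted mean = {mean:.2f}, standard error = {se:.2f}")
```

For real analyses, an established survey package in your statistical software is likely a better choice than hand-rolled code; the sketch is only meant to show which role each variable plays.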

The most recent version of IPUMS 3.0 data and documentation is still available via the IPUMS archive page at ICPSR. The archive page permits users to revise old extracts, create new extracts, and download data and documentation. The link titled "IPUMS-USA website as of March, 2008" leads to a fully-functioning mirror of the IPUMS website as it existed prior to the release of IPUMS 4.0. The archive page contains versions of the website from previous years as well.