Frequently Asked Questions on the ACS/PRCS Multi-year files

What are the ACS/PRCS multi-year files?

Do the multi-year files simply join the previously-released one-year files?

Should I continue using the single-year files for anything?

Is the IPUMS data any different from what the Census Bureau is providing?

Where can I get better geographic identifiers?

What are the ACS/PRCS multi-year files?

Since 2008, the Census Bureau has released PUMS files from the American Community Survey and Puerto Rico Community Survey that cover multiple years of data. The 2005-2007 ACS/PRCS 3-year file, released in early 2009, combines previously-released single-year files from 2005, 2006, and 2007. The 2006-2008 ACS/PRCS 3-year file, released at the end of 2009, combines previously-released single-year files from 2006, 2007, and 2008. The single-year files each documented 1% of the population. The 3-year files combine three single-year files, documenting a total of 3% of the population.


Starting in 2010, the Census Bureau will release 5-year files. The first will be the 2005-2009 ACS/PRCS 5-year file, which will combine previously-released single-year files from 2005, 2006, 2007, 2008, and 2009. This will document a total of 5% of the population.

This Census Bureau page explains the data cycle and the projected release dates. (Note that this describes the aggregate estimates, not the PUMS files included in the IPUMS database. While the release dates are the same, the column labeled "Population Size of Area" does not apply to PUMS data.)

Do the multi-year files simply join the previously-released one-year files?

While the multi-year files contain all cases from the previously-released single-year files, they do differ in important ways:

  • Weighting variables now yield estimates for the entire 3-year period. This includes PERWT, HHWT, REPWT, and REPWTP.
  • All income and dollar-amount variables are inflated to dollars for the last data year (i.e., 2007 dollars for the 2005-2007 3-year file; 2008 dollars for the 2006-2008 3-year file; and so on).
  • Except for the final year of data, topcodes in the multi-year file differ from those in the single-year files. For some high-income cases, previously topcoded values are no longer topcoded.
  • In the 2005-2007 3-year file:
    • three New Orleans PUMAs, separately identifiable in the 2005 single-year data, were collapsed in the combined file to match the post-Hurricane Katrina geography of 2006 and 2007 (see PUMA).
    • variables on second residence were not available in the 2006 and 2007 single-year files and are not available in the 3-year file (these include SECRES, SECRESMO, and SECRESRE).
    • AGE in the 2005 and 2006 cases is inaccurate because of problems with the Census Bureau's disclosure methods. The 2005 single-year cases are likewise inaccurate, but the 2006 single-year file now contains a revised AGE variable that appears to have been created using synthetic data techniques. The original inaccurate AGE variable for the 2006 single-year file, identical to AGE for the 2006 cases in the 2005-2007 3-year file, is available as AGEORIG06.
  • In the 2006-2008 3-year file:
    • Variables that were available only in 2008 (some disability variables, marital history variables, and health insurance variables) or whose content changed so much between 2007 and 2008 that they should not be compared across time (other disability variables) are not available.
    • Several coding schemes changed between the 2007 and 2008 single-year files (most notably education, house value, and the number of rooms/bedrooms in the house). These have been set to the least detailed scheme for all years. Other coding scheme changes for 2008 have been wrapped into the 2006-2007 coding scheme (e.g., the ancestry category "Germanic", available for the first time in the 2008 single-year ACS/PRCS, has been coded as "German" for all cases in the 2006-2008 3-year ACS/PRCS).
    • Two cases in the 2006 single-year data and three cases in the 2007 single-year data had IND codes of 669; in the 2006-2008 3-year file, these cases are 669.

In sum, although the multi-year files sometimes contain reduced information relative to the single-year files, they are ideal for researchers who are studying small populations (provided that the characteristics of interest are reasonably stable in their level over the 3-year period).
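As a rough illustration of how the period weights are used, the sketch below computes a weighted share across all cases in a pooled 3-year extract. The records and the `in_group` indicator are invented for this example; only PERWT and YEAR are variable names taken from the files, and nothing here reproduces the Census Bureau's weighting procedure.

```python
# Toy person records; all values are invented for illustration only.
# PERWT is the person weight; "in_group" is a hypothetical 0/1 flag marking
# membership in the small population being studied.
records = [
    {"YEAR": 2005, "PERWT": 30.0, "in_group": 1},
    {"YEAR": 2006, "PERWT": 50.0, "in_group": 0},
    {"YEAR": 2007, "PERWT": 20.0, "in_group": 1},
]


def weighted_share(rows, flag_key, weight_key="PERWT"):
    """Return the weighted share of rows whose flag is 1, pooling all years."""
    total_weight = sum(r[weight_key] for r in rows)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    flagged_weight = sum(r[weight_key] * r[flag_key] for r in rows)
    return flagged_weight / total_weight


period_share = weighted_share(records, "in_group")
print(period_share)
```

Because all three years are pooled under one set of weights, the result is best read as a description of the whole period rather than of any single year.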

Should I continue using the single-year files for anything?

Yes, those are still the best files for studying change over time between 2005 and 2007. The weights from the single-year files are calibrated to provide population estimates during each year. The weights from the 3-year file are calibrated to provide 3-year estimates.

It would be possible to use the 3-year file's YEAR variable to separate cases by year to study change over time. This is not recommended. The best way to study change over the 2005-2007 period is to use the single-year files.
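One way to see the problem is to total the 3-year weights separately for each value of YEAR. The sketch below uses invented weights, not real ACS/PRCS values; it shows only that each year's cases carry just part of the period total, so a per-year subset is not calibrated to that year's population the way the single-year weights are.

```python
from collections import defaultdict

# Toy records; the weights are invented and purely illustrative.
records = [
    {"YEAR": 2005, "PERWT": 30.0},
    {"YEAR": 2005, "PERWT": 35.0},
    {"YEAR": 2006, "PERWT": 32.0},
    {"YEAR": 2006, "PERWT": 34.0},
    {"YEAR": 2007, "PERWT": 33.0},
    {"YEAR": 2007, "PERWT": 36.0},
]


def weight_by_year(rows):
    """Sum PERWT separately for each value of YEAR."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["YEAR"]] += r["PERWT"]
    return dict(totals)


by_year = weight_by_year(records)
period_total = sum(by_year.values())

for year in sorted(by_year):
    # Each year's subtotal is only a fraction of the pooled period total.
    print(year, by_year[year], by_year[year] / period_total)
```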

This Census Bureau report is a helpful summary of the issues that arise in deciding whether to use single-year or multi-year data, as well as differences in interpretation between the two types of data.

Is the IPUMS data any different from what the Census Bureau is providing?

Yes, there are several differences. The IPUMS data harmonizes variables as closely as possible with previous data releases; this often results in new variable names, codes, and labels. Variables reporting dollar values have been pre-standardized to constant dollars; original PUMS values are not adjusted, and users must apply the Census Bureau-provided adjustment factor manually. (Please note that the Census Bureau's adjustment factor for the multi-year files includes two components: the CPI-U factor that inflates dollar values to the final year of the sample, and a smaller adjustment factor that addresses the varying reference periods in the survey responses. The IPUMS pre-adjustment applies only the CPI-U factor; the latter factor is available as a separate download in ADJUST. For more information, see the note on the standardization of ACS dollar amounts.)
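A minimal sketch of applying the second component follows. It assumes that the ADJUST factor is applied by multiplication to a dollar amount that IPUMS has already standardized with the CPI-U factor; the income amount and the factor value below are invented, and the function name is hypothetical. Any special codes for missing or not-applicable values would need to be excluded before adjusting.

```python
def apply_reference_period_adjustment(amount, adjust):
    """Multiply a pre-standardized dollar amount by the ADJUST factor.

    Assumes `amount` is a real dollar value (not a missing-data code) and
    that ADJUST is meant to be applied multiplicatively.
    """
    return amount * adjust


# Hypothetical inputs: an income of 40,000 and an ADJUST factor of 1.02.
adjusted = apply_reference_period_adjustment(40000.0, 1.02)
print(round(adjusted, 2))
```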

IPUMS also provides many variables describing geography and family interrelationships that are not available in the original census files. Geographic variables unique to IPUMS include CITY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. Furthermore, in-laws are not completely classified in the ACS/PRCS, and subfamily variables appear to be flawed. IPUMS applies programming that provides more detail on in-laws and more consistent subfamily identifiers.

Where can I get better geographic identifiers?

The lowest unit of geography in the microdata files is still the PUMA. PUMAs contain at least 100,000 people. Aggregate data (but not microdata) is currently available from the Census Bureau for geographic areas as small as 20,000 people. In 2010, the Census Bureau will release aggregate data for all units of census geography down to the block group. These data will represent the 2005-2009 period.