Frequently Asked Questions on the ACS/PRCS Multi-year files
- What are the ACS/PRCS multi-year files?
- Do the multi-year files simply join the previously-released one-year files?
- Should I continue using the single-year files for anything?
- Is the IPUMS data any different from what the Census Bureau is providing?
- Where can I get better geographic identifiers?
What are the ACS/PRCS multi-year files?
Since 2008, the Census Bureau has released PUMS files from the American Community Survey and Puerto Rico Community Survey that cover multiple years of data. The 2005-2007 ACS/PRCS 3-year file, released in early 2009, combines previously-released single-year files from 2005, 2006, and 2007. The 2006-2008 ACS/PRCS 3-year file, released at the end of 2009, combines previously-released single-year files from 2006, 2007, and 2008. Each single-year file documents 1% of the population, so a 3-year file, which combines three single-year files, documents a total of 3% of the population.
In 2010, the Census Bureau began releasing 5-year files. The first of this series was the 2005-2009 ACS/PRCS 5-year file, which combines previously-released single-year files from 2005, 2006, 2007, 2008, and 2009. This documents a total of 5% of the population.
For more information, see the relevant sections of the Census Bureau handbooks, which are tailored to the needs of particular data users.
Do the multi-year files simply join the previously-released one-year files?
While the multi-year files contain all cases from the previously-released single-year files, they do differ in important ways:
- Weighting variables now yield estimates for the entire 3-year or 5-year period. This includes PERWT, HHWT, REPWT, and REPWTP.
- All income and dollar-amount variables are inflated to dollars for the last data year (i.e., 2007 dollars for the 2005-2007 3-year file; 2008 dollars for the 2006-2008 3-year file; and so on).
- Except for the final year of data, topcodes in the multi-year file differ from those in the single-year files. For some high-income cases, previously topcoded values are no longer topcoded.
- Variable codes and availability differ from the single-year files:
  - In the 2005-2007 3-year file and 2005-2009 5-year file, three New Orleans PUMAs, separately identifiable in the 2005 single-year data, were collapsed in the combined file to match the post-Hurricane Katrina geography of 2006 and 2007 (see PUMA).
  - Variables that are not available in all of the years in the multi-year file are excluded. For example, variables on second residence were not available in the 2006 and 2007 single-year files and are not available in the 2005-2007 3-year file (these include SECRES, SECRESMO, and SECRESRE).
  - Variables whose content changed dramatically within the multi-year period are excluded from the multi-year files by the Census Bureau. For example, the Census Bureau introduced revisions to its disability variables in the 2008 ACS/PRCS; though similar, these variables do not appear in the 2006-2008 3-year file because of comparability concerns.
  - Other variables have the same content but different coding schemes. Most notably, education, relationship to householder, house value, and the number of rooms/bedrooms in the house all gained additional detail between the 2007 and 2008 ACS questionnaires. Multi-year files that include both 2007 and 2008 data use the least detailed coding scheme available over the multi-year period.
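Because PERWT and the replicate weights in a multi-year file describe the whole 3-year or 5-year period, a weighted total computed from them is a period estimate, not an estimate for any single year. The sketch below illustrates this, assuming a hypothetical list of person records in which each record carries PERWT and its 80 replicate weights (REPWTP1 through REPWTP80) collected into one list; the record layout and the AGE filter are illustrative assumptions. The standard error uses the ACS successive-difference replicate formula, SE = sqrt(4/80 × Σ(X_r − X)²).

```python
import math

# Assumed record layout (hypothetical): {"AGE": int, "PERWT": number,
# "REPWTP": [80 replicate weights, REPWTP1..REPWTP80 in order]}.

def weighted_total(records, keep):
    """Sum PERWT over records where keep(record) is true.

    With a 3-year or 5-year file, the result describes the whole
    multi-year period, not any single year within it."""
    return sum(r["PERWT"] for r in records if keep(r))

def replicate_se(records, keep):
    """Standard error of weighted_total using the 80 replicate weights:
    SE = sqrt(4/80 * sum over r of (X_r - X)^2)."""
    full = weighted_total(records, keep)
    squared_diffs = 0.0
    for i in range(80):
        # X_r: the same total recomputed with the r-th replicate weight.
        rep_total = sum(r["REPWTP"][i] for r in records if keep(r))
        squared_diffs += (rep_total - full) ** 2
    return math.sqrt(4 / 80 * squared_diffs)

# Usage with made-up records: estimate persons aged 65+ over the period.
records = [
    {"AGE": 70, "PERWT": 100, "REPWTP": [110] * 40 + [90] * 40},
    {"AGE": 30, "PERWT": 50, "REPWTP": [50] * 80},
]
older = lambda r: r["AGE"] >= 65
print(weighted_total(records, older), replicate_se(records, older))
```

The same code applied to a single-year file gives a single-year estimate; only the meaning of the weights changes, which is why the two kinds of files should not be mixed in one calculation.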
In sum, although the multi-year files sometimes contain reduced information relative to the single-year files, they are ideal for researchers who are studying small populations (provided that the characteristics of interest are reasonably stable in their level over the three- or five-year period).
Should I continue using the single-year files for anything?
Yes. The single-year files provide the most current data possible. The weights from the single-year files are calibrated to represent the population in a single year, while the weights from the 3-year and 5-year files are calibrated to represent the population over the entire 3-year (or 5-year) period. If users need current data and have enough sample cases for their population of interest, the single-year files are a good choice.
The single-year files are also the best files for studying change over time (e.g., between 2005 and 2009). While it is possible to use the multi-year file's MULTYEAR variable to separate cases by year to study change over time, this is not recommended because the weights are designed to represent the entire 3-year or 5-year period, as described above. The Census Bureau recommends that users compare multi-year datasets only if the periods do not overlap; for example, the 2005-2007 3-year file (currently available via IPUMS) can be compared to the 2008-2010 3-year file (scheduled to be released in late 2011). Until then, the best way to study change over time is to use the single-year files.
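The Census Bureau guidance above reduces to a simple check: two multi-year files may be compared only if their periods share no year. A minimal sketch, with hypothetical helper names and periods written as inclusive (first year, last year) pairs:

```python
def periods_overlap(a, b):
    """True if two inclusive (first_year, last_year) periods share any year."""
    return a[0] <= b[1] and b[0] <= a[1]

def ok_to_compare(a, b):
    """Census Bureau guidance: compare multi-year files only when their
    periods do not overlap."""
    return not periods_overlap(a, b)

# 2005-2007 vs. 2008-2010: no shared years, so comparable.
print(ok_to_compare((2005, 2007), (2008, 2010)))
# 2005-2007 vs. 2006-2008: 2006 and 2007 appear in both, so not comparable.
print(ok_to_compare((2005, 2007), (2006, 2008)))
```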
This Census Bureau report is a helpful summary of the issues that arise in deciding whether to use single-year or multi-year data, as well as differences in interpretation between the two types of data.
Is the IPUMS data any different from what the Census Bureau is providing?
Yes, there are several differences. IPUMS harmonizes variables as closely as possible with previous data releases; this often results in new variable names, codes, and labels. Variables reporting dollar values have been pre-standardized to constant dollars; in the original PUMS, values are not adjusted, and users must apply the Census Bureau-provided adjustment factor manually. (Please note that the Census Bureau's adjustment factor for the multi-year files includes two components: the CPI-U factor that inflates dollar values to the final year of the sample, and a smaller adjustment factor that addresses the varying reference periods in the survey responses. The IPUMS pre-adjustment applies only the CPI-U factor; the latter factor is available separately in ADJUST. For more information, see the note on the standardization of ACS dollar amounts.)
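To reproduce the full Census Bureau adjustment from IPUMS values, the reference-period factor must be applied on top of the CPI-U adjustment IPUMS has already made. The sketch below assumes, without confirmation from this page, that ADJUST is stored as a plain decimal multiplier (for example 1.007) and that the dollar variable's not-applicable or missing codes are known to the user; the function name and the example code 9999999 are hypothetical.

```python
def apply_reference_adjustment(amount, adjust, na_codes=frozenset()):
    """Multiply a CPI-U-adjusted IPUMS dollar value by the ADJUST factor.

    Assumptions (hypothetical): `adjust` is a plain decimal multiplier,
    and `na_codes` lists the variable's N/A or missing codes, which are
    passed through unchanged rather than being multiplied."""
    if amount in na_codes:
        return amount
    return amount * adjust

# Usage with made-up values: a $50,000 income and an assumed factor of 1.01.
print(apply_reference_adjustment(50000, 1.01))
# An N/A code is returned as-is instead of being inflated.
print(apply_reference_adjustment(9999999, 1.01, na_codes={9999999}))
```

Passing N/A codes through untouched matters because multiplying a sentinel value would turn it into an implausible but valid-looking dollar amount.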
IPUMS also provides many variables describing geography and family interrelationships that are not available in the original census files. Geographic variables unique to the IPUMS include CITY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGCITY5, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. Furthermore, in-laws are not completely classified in the ACS/PRCS, and subfamily variables appear to be flawed. IPUMS applies programming that provides more detail on in-laws and more consistent subfamily identifiers.
Where can I get better geographic identifiers?
The smallest geographic unit identified in the microdata files is still the PUMA. Each PUMA contains at least 100,000 people. Aggregate data (but not microdata) are currently available from the Census Bureau for geographic areas as small as block groups, but only for the entire 2005-2009 period.