September 4, 2024
- CBPOVERTY has been added to PRCS samples. CBPOVERTY reports the Census Bureau's calculation of each person's total family income as a percentage of the poverty threshold for their family situation. The family situation used is as specified by the Census Bureau's family definitions, rather than the family definitions that use IPUMS family interrelationship variables. (See FAMUNIT and POVERTY for the IPUMS definitions of families and poverty.)
- In the 2022 ACS 1-year and 5-year samples, we revised 5 variables (CITY, MET2013, MET2023, MIGMET131, PWMET13) to improve the identification of the corresponding geographic areas as detailed below. IPUMS USA constructs these variables by identifying which PUMAs (PUMA), Migration PUMAs (MIGPUMA1), or Place-of-Work PUMAs (PWPUMA00) have a majority of their population in an identifiable city or metro area. We have corrected an error in this identification process that had mishandled certain cases where a city or metro area contained only part of a PUMA (or Migration or Place-of-Work PUMA). We have also updated CITYPOP and five "mismatch error" variables (CITYERR, MET2013ERR, MET2023ERR, MIGMET13ERR, PWMET13ERR) with corresponding revisions. Users who previously downloaded any of these variables in a 2022 ACS data extract should resubmit their extract request(s) to obtain the corrected data.
- CITY: Improved the identification of 24 cities and added codes for 13 previously unidentified cities. CITY now identifies 112 cities for the 2022 responses in ACS/PRCS samples. For a listing of the affected cities, including the changes in mismatch errors, see this spreadsheet: Revisions to CITY in 2022 ACS Samples.
- MET2013: Improved the identification of 21 metro areas and added codes for 2 previously unidentified metro areas. MET2013 now identifies 265 metro areas for the 2022 responses in ACS/PRCS samples. For a listing of the affected metro areas, including the changes in mismatch errors, see this spreadsheet: Revisions to MET2013 in 2022 ACS Samples, March 2024 - September 2024.
- NOTE: We also made a distinct set of corrections to MET2013 on March 7, 2024. For a listing of the 133 metro areas affected by the two revisions, including the combined effects on mismatch errors, see this spreadsheet: Revisions to MET2013 in 2022 ACS Samples, November 2023 - September 2024. We have updated the revision history to include an entry for March 7, 2024 with additional details.
- MET2023: Improved the identification of 17 metro areas and added a code for 1 previously unidentified metro area. MET2023 now identifies 264 metro areas for the 2022 responses in ACS/PRCS samples. For a listing of the affected metro areas, including the changes in mismatch errors, see this spreadsheet: Revisions to MET2023 in 2022 ACS Samples.
- MIGMET131 and PWMET13: Improved the identification of 4 metro areas and added codes for 2 previously unidentified metro areas. MIGMET131 and PWMET13 now identify 244 metro areas for the 2022 responses in ACS/PRCS samples. For a listing of the affected metro areas, including the changes in mismatch errors, see this spreadsheet: Revisions to MIGMET131 and PWMET13 in 2022 ACS Samples.
- COUNTYFIP and COUNTYICP have been revised in the 2022 ACS 1-year and 5-year samples to identify a previously unidentified county equivalent in Connecticut (STATEFIP 9): Northwest Hills Planning Region, COUNTYFIP 160, COUNTYICP 1600.
Expanded variables.
Edited variables.
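To make the majority-population rule described in this entry concrete, the following is a minimal sketch of how a single PUMA might be matched to a city or metro area. It is an illustration only, not the IPUMS USA production process: the function name, the input layout (a mapping from area name to resident population, with `None` standing for population outside any identifiable area), and the example figures are all assumptions.

```python
from typing import Dict, Optional


def assign_area(pop_by_area: Dict[Optional[str], int]) -> Optional[str]:
    """Return the area that holds a strict majority of a PUMA's population.

    Keys are area names; the key None (hypothetical convention) holds the
    population living outside any identifiable city or metro area.
    Returns None when no identifiable area has a majority.
    """
    total = sum(pop_by_area.values())
    if total == 0:
        return None
    for area, pop in pop_by_area.items():
        # Integer comparison avoids floating-point edge cases at exactly 50%.
        if area is not None and 2 * pop > total:
            return area
    return None


# A PUMA only partly contained in a metro area (hypothetical numbers).
mostly_inside = assign_area({"Metro A": 60_000, None: 40_000})
evenly_split = assign_area({"Metro A": 50_000, "Metro B": 50_000})
```

Under these assumptions, `mostly_inside` is `"Metro A"` and `evenly_split` is `None`, because neither area holds more than half of the second PUMA's population.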
July 2, 2024
- An error that resulted in FTOTINC values being inflated twice for all multi-year ACS/PRCS samples has been corrected. The previously incorrect FTOTINC values also impacted POVERTY values for the multi-year files, resulting in incorrect POVERTY estimates for secondary families (only) prior to February 6, 2024 and all families (primary and secondary) between February 6, 2024 and June 25, 2024.
- A new variable, CBPOVERTY, has been added to the ACS samples. CBPOVERTY is calculated by the Census Bureau and reports each person's total family income expressed as a percentage of the poverty threshold for their specific family situation based on Census Bureau family definitions (rather than the IPUMS family interrelationship variables that are used in the variable POVERTY). Because of differences in how the Census Bureau and IPUMS identify families, IPUMS USA provides CBPOVERTY for those who wish to replicate published Census Bureau poverty estimates.
Edited variables.
Added variables.
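Both CBPOVERTY and POVERTY express total family income as a percentage of a poverty threshold; they differ in which family definition supplies the income and the threshold. The arithmetic can be sketched as follows. This is an illustration under stated assumptions: the function name and dollar amounts are hypothetical, and any rounding or top-coding applied to the published variables is not modeled here.

```python
def income_to_poverty_pct(family_income: float, poverty_threshold: float) -> float:
    """Express family income as a percentage of the poverty threshold.

    Both arguments are in the same dollars. The threshold would come from
    the published table entry for the family's size and composition.
    """
    if poverty_threshold <= 0:
        raise ValueError("poverty_threshold must be positive")
    return 100.0 * family_income / poverty_threshold


# Hypothetical family: income of $30,000 against a $20,000 threshold.
pct = income_to_poverty_pct(30_000, 20_000)
```

Here `pct` is 150.0, meaning income is one and a half times the threshold. The same person can receive different values under the Census Bureau and IPUMS family definitions because the family's income and threshold both depend on who is counted as a family member.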
April 29, 2024
- A crosswalk between 2010 and 2020 PUMAs has been added to the 2020 PUMA Definitions page.
- An error affecting migration and place-of-work variables has been fixed in the 2022 ACS/PRCS 5-year samples. Migration and place-of-work counties (MIGCOUNTY1, PWCOUNTY) and metropolitan areas (MIGMET131, MIGMET13ERR, PWMET13, PWMET13ERR) were not being correctly identified. Users who previously downloaded data containing these samples and these variables should re-submit their extract requests to obtain these updated variables.
Improved documentation.
Edited variables.
April 15, 2024
- MLP Version 1.2 incorporates recently revised 19th century census files and adds more customization to the linking model to address changing variable availability across censuses. Additionally, this release provides links between 20 and 30-year census pairings as well as links to the 1950 census. We also provide crosswalks to the Social Security Numident data for the first time in this version in addition to World War II enlistment data linked to the 1940 census. The enlistment data were prepared by the CenSoc team at UC Berkeley. The version 1.2 links, including those for 1950, are not yet incorporated into the IPUMS extract system and are temporarily available only as crosswalks.
Data improvements.
March 27, 2024
- Migration and Place-of-Work PUMA variables are now available for the 2022 1-year ACS/PRCS samples: MIGCOUNTY1, PWCOUNTY, MIGMET131, MIGMET13ERR, PWMET13, and PWMET13ERR
- Migration and Place-of-Work PUMA variables are now available for the 2022 5-year ACS/PRCS samples, as well. With the 2022 5-year ACS/PRCS data, the Census Bureau released two different variables to account for different census definitions of Migration and Place-of-Work PUMAs (2010 and 2020). These variables have been combined into single variables, MIGPUMA1 and PWPUMA00, based on MULTYEAR, in order to provide single Migration and Place-of-Work PUMA variables for multi-year samples during the transition period from 2010 PUMA definitions (2012 - 2021 ACS samples) to 2020 PUMA definitions (2022 - 2031 ACS samples). The following Migration and Place-of-Work variables were updated to use the updated MIGPUMA1 and PWPUMA00 variables for 2022 5-year ACS/PRCS samples: MIGCOUNTY1, PWCOUNTY, MIGMET131, MIGMET13ERR, PWMET13, and PWMET13ERR
Expanded variables.
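The combination of the 2010-definition and 2020-definition variables based on MULTYEAR can be sketched as below. This is a simplified illustration rather than the actual IPUMS procedure: the function and parameter names are hypothetical, and the cutoff follows the transition described in this entry (2010 definitions through survey year 2021, 2020 definitions from 2022 onward).

```python
from typing import Optional


def combine_by_multyear(
    multyear: int,
    code_2010_def: Optional[int],
    code_2020_def: Optional[int],
) -> Optional[int]:
    """Pick the PUMA-type code whose definition vintage matches the survey year.

    multyear is the actual survey year of the record within a multi-year
    sample. Records from 2022 onward use 2020 definitions; earlier records
    use 2010 definitions.
    """
    return code_2020_def if multyear >= 2022 else code_2010_def


# Two records from a 2018-2022 5-year sample (hypothetical codes).
older = combine_by_multyear(2021, 101, None)
newer = combine_by_multyear(2022, None, 205)
```

With these inputs, `older` is 101 and `newer` is 205. Because the combined variable mixes two definition vintages, a given code should be interpreted together with MULTYEAR.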
March 7, 2024
- The following geography variables were updated to use newly-released 2020 PUMA definitions for the 2022 1-year ACS/PRCS samples: DENSITY, METPOP10, HOMELAND. For the 2022 5-year ACS/PRCS samples, the Census Bureau released two different PUMA variables to account for different census definitions of PUMA (2010 and 2020). These variables have been combined into one variable, PUMA, based on MULTYEAR, in order to provide a single PUMA variable for multi-year samples during the transition period from 2010 PUMA definitions (2012 - 2021 ACS samples) to 2020 PUMA definitions (2022 - 2031 ACS samples). The same process was used to update STRATA, which had been split into two variables by the Census Bureau based on 2010 and 2020 definitions. The following geography variables were updated to use the updated PUMA variable for 2022 5-year ACS/PRCS samples: CITY, CITYERR, MET2013, MET2013ERR, METRO, PCTMETRO, HOMELAND, DENSITY, METPOP10, COUNTYFIP, COUNTYICP, PRCOUNTY.
- SLWT was incorrect in the 1950 full count file and has been revised. Users who previously downloaded 1950 full count data should resubmit their extract requests to obtain this updated variable.
- MET2013 was revised in the 2022 ACS sample to improve the identification of 110 metro areas and to identify 14 previously unidentified metro areas. MET2013 now identifies a total of 263 metro areas in the 2022 ACS sample. MET2013ERR was also revised to correspond with the MET2013 revisions. As explained in the MET2013 variable description, IPUMS USA constructs MET2013 by identifying the 2013 metropolitan statistical area in which the majority of each PUMA's population resided. The 2022 ACS sample was the first to use new 2020 PUMAs. For our initial release of the 2022 ACS sample in November 2023, we used a faulty crosswalk to associate 2020 PUMAs with 2013 metro areas, resulting in numerous cases where a PUMA should have been associated with a MET2013 code and was not, as well as cases where a PUMA was associated with a MET2013 code and should not have been. We provide a listing of the metro areas affected by this revision, including the changes in mismatch errors, in this spreadsheet: Revisions to MET2013 in the 2022 ACS Sample, March 2024. Users who previously downloaded MET2013 in a 2022 ACS data extract should resubmit their extract request to obtain the corrected data.
Expanded variables.
Edited variables.
February 6, 2024
- The 2022 5-year American Community Survey and Puerto Rico Community Survey data are now available.
- Due to the effects of the COVID-19 pandemic on the 2020 ACS data collection and data quality, the Census Bureau revised its methodology for weighting households in the 2018-2022 5-year sample by using their standard weighting methodology for the data from non-2020 years and the entropy-balance weighting methodology for the 2020 portion of the data. The Census Bureau encourages users to proceed with caution when comparing data products containing 2020 data with other years. For more information about how this impacts the 2022 ACS 5-year and 2022 PRCS 5-year samples, please see the Census Bureau’s user notes on modifications and increased margins of error for 5-year estimates containing data collected in 2020. More information about changes to the weighting methodology can be found in the ACS 2022 5-year sample description.
- A data collection error in certain counties occurred in 2019 resulting in an error for the availability of telephone service. As a result, data for PHONE was removed or suppressed for respondents in the affected areas to account for this error. The Census Bureau has a more detailed errata note available with more information. More detail is available in the ACS 2022 5-year sample description.
- An error was found and corrected in FAMUNIT. The spouse of the head of household as identified by SPLOC was not identified as being related to the head of household in RELATE. As a result, the coresident grandchildren of the head of household and the head of household’s spouse were reported to not be part of the appropriate family unit in FAMUNIT. These third-generation children are now included in the IPUMS primary family unit.
- Several errors were found and corrected in SUBFAM. Unmarried partners of the head were previously included in their own subfamily; they now have a SUBFAM value of 0. Members of male same-sex couples where neither partner is the head were each being incorrectly put into their own, single-person subfamily. Members of female same-sex couples where neither partner is the head were not being put into a subfamily when they should have been. Parents and siblings of the head were being incorrectly put in their own subfamily in the instance where the household head is actually a young, unmarried person without children and their siblings and single parent are further down the household roster. In these cases, the parent and siblings under 18 were being assigned to their own subfamily even though they are really part of the primary family. These cases now have a SUBFAM value of 0. In some instances, NSUBFAM did not match the number of subfamilies in the household. These now match.
- An error was found and corrected in FTOTINC. In households in which the head of household has an unmarried partner, other members of the family unit (FAMUNIT) were assigned the head of household’s value of INCTOT as their FTOTINC value.
- POVERTY has been modified to improve internal consistency. POVERTY is now determined based on the IPUMS family unit, which differs from the Census Bureau family unit for primary families, for all family units within a household. Previously, for those with FAMUNIT > 1, the value of POVERTY was based on an IPUMS calculation using family size, number of children, and FTOTINC in combination with Census Bureau published thresholds, while everyone with FAMUNIT == 1 in the ACS and PRCS samples was assigned the Census Bureau recode value of poverty.
Added samples.
Edited variables.
Data improvements.
January 31, 2024
- The 1950 100% population database is now available. This first version includes core household and demographic variables, geographic indicators, and family composition information. Some data are available for occupation, income, and veteran status. Users should be aware that this is a preliminary release only, and should use appropriate caution when conducting analyses. Full details on data caveats and areas of future improvement can be found on the full count census page and individual variable descriptions as applicable.
Added samples.
November 21, 2023
- The 2022 1-year American Community Survey and Puerto Rico Community Survey data are now available.
- A new version of the 1850-1880 full count data is available. This version includes improvements to household definition, geographic, nativity, occupation, and demographic variables, as well as new slave-related variables and improvements to imputation and allocation processes. Efforts to increase standardization and improve harmonization have resulted in minor changes throughout the data. The variable DWSIZE (Dwelling Size) has been removed from the full-count files; use FAMSIZE for comparable information. Significant cleanup and standardization of coding and universe enforcement were done to group quarters variables (GQ, GQTYPE, and GQFUNDS) across all files. Previous versions of birthplace variables (BPL, FBPL, and MBPL) had inconsistent coding for persons born within certain US Territories. The detailed codes for these variables have been updated to reflect the Territory status of some states, including the following updates: Idaho (1600) to Idaho Territory (1610), New Mexico (3500) to New Mexico Territory (3510), Oklahoma (4000) to Indian Territory (4010), North/South Dakota (3800/4600) to Dakota Territory (4610), Utah (4900) to Utah Territory (4910), Washington (5300) to Washington Territory (5310), and Wyoming (5600) to Wyoming Territory (5610). A change was made to CITY to identify only cities within the stated universe. This adjustment suppressed populations (identified in CITYPOP) for some small cities that were previously available; this change also affects SIZEPL. The variables REALPROP and PERSPROP had erroneously high values that have been corrected. Small improvements to relationship imputation may result in changes to IMPREL/RELATE and ensuing dependent variables. These changes are not statistically significant, but users may see adjustments to individual records. These changes also apply to all 1/5/10% sample files where relationships are imputed rather than recorded.
Adjustments have been made to improve and refine hot-deck allocation for missing data. The new criteria exclude persons with outlier values from donor eligibility. Users should expect some changes in birthplace, occupation, and age distributions. The following variables have been added: MCDPOP (Minor Civil Division Population) for all 19th century full count files, QRPROP (Real Estate Value Editing Flag) for 1860 and 1870 files, QPPROP (Personal Estate Value Editing Flag) for 1860 and 1870 files, OCCDUPE (Duplicate Occupation Flag) for the 1870 file, SLAVEHH (Slave Inhabitants in Household) for 1850 and 1860 files, SLAVEHOLDINGS (Number of Slaveholdings Associated with Individual) for 1850 and 1860 files, SLAVENUM (Number of Slaves Associated with Individual) for 1850 and 1860 files, SLAVEOWN (Number of Slaves Owned By Individual) for the 1860 file. Updates to single-year files are outlined below. For the 1850 full count file, two new codes have been added to CITY: 3397 (historical Lafayette, LA) and 4612 (Williamsburg area within New York City). New variables SLAVEHH, SLAVENUM, and SLAVEHOLDINGS describe whether any household member was linked to a slave holding and the number of slaves and slave holdings linked to each individual. For the 1860 full count file, an error has been corrected in RACE for persons in Dakota Territory who were previously misclassified as Mulatto and are now classified as American Indian. An error in STATEFIP has been corrected in which households located in Oklahoma/Indian Lands Territory were mistakenly being coded as "Overseas Military". New codes added to CITY include 3680 (Lockport, NY), 2510 (Gloucester, MA), and 4839 (North Providence, RI). In addition to the added slave variables described for 1850, 1860 includes SLAVEOWN which indicates how many slaves were specifically "owned" by an individual.
An alignment error in several counties that previously assigned incorrect values for OCC, PERSPROP, and REALPROP to some individuals has been fixed. The counties include Dickinson, KS, West Baton Rouge, LA, Itasca, MN, Washington, NE, Hyde, NC, and San Augustine, TX. Corrections were made for some very high values to REALPROP and PERSPROP and flagged with QRPROP and QPPROP, respectively. These same corrections were applied for the 1870 full count file, which also included a fix for the variable OCCDUPE in which a number of individuals were erroneously assigned the OCC value of the head of their household. The error has been corrected and affected individuals are flagged in the new variable OCCDUPE; their occupations have been logically edited based on age and relationship to the household head. These persons now have occupation codes of "Other non-occupational response" (OCC=310 and OCC1950=995). For the 1880 full count file, significant revisions were made to URBAN, especially in Massachusetts and Rhode Island. Some persons who were living in urbanized fringe areas (MDSTATUS) were erroneously coded as "rural". This has been corrected. A new code for appendicitis (912) has been added to the variable SICKNESS. New codes 0490 (Austin, TX) and 3790 (Lynchburg, VA) were added to the variable CITY. For a full description of the changes and updates in this revision, please review the full count census page. NOTE: This version does not include a new version of MLP crosswalks.
- New metropolitan area variables, MET2023 and MET2023ERR were added for the 2022 1-year ACS/PRCS samples. MET2023 identifies metro areas of residence using the 2023 definitions for metropolitan statistical areas (MSAs) from the U.S. Office of Management and Budget (OMB). The 2023 MSAs are the first to be based on 2020 standards and 2020 census data.
- RACHSING (simplified race variable) and its predictor variables are now available for the 2020 and 2021 ACS samples.
- An error that resulted in incorrect values for CITYPOP in the 2019-2021 ACS 1-year samples has been corrected. The values for CITYPOP in the 2019-2021 samples were not updated correctly and instead reflected the 2018 ACS 1-year values. These samples have been updated and now include the correct CITYPOP values.
Added samples.
Data improvements.
Added variables.
Expanded variables.
Edited variables.
Geography update.
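Users comparing an older 1850-1880 full count extract with a new one may find it helpful to see the birthplace territory recodes listed in this entry as a lookup table. The sketch below simply restates those code pairs. It is not the IPUMS recoding logic: the names are hypothetical, and it applies the mapping unconditionally, whereas the actual revision may depend on conditions (such as census year) that this entry does not spell out.

```python
# Detailed birthplace code pairs taken from the entry above:
# old state code -> new territory code.
TERRITORY_RECODES = {
    1600: 1610,  # Idaho -> Idaho Territory
    3500: 3510,  # New Mexico -> New Mexico Territory
    4000: 4010,  # Oklahoma -> Indian Territory
    3800: 4610,  # North Dakota -> Dakota Territory
    4600: 4610,  # South Dakota -> Dakota Territory
    4900: 4910,  # Utah -> Utah Territory
    5300: 5310,  # Washington -> Washington Territory
    5600: 5610,  # Wyoming -> Wyoming Territory
}


def to_territory_code(detailed_code: int) -> int:
    """Map an old detailed birthplace code to its territory code, if listed.

    Codes that are not in the table are returned unchanged.
    """
    return TERRITORY_RECODES.get(detailed_code, detailed_code)


old_codes = [1600, 3800, 4600, 9999]  # 9999 is an arbitrary unlisted code
new_codes = [to_territory_code(code) for code in old_codes]
```

Note that North Dakota and South Dakota both map to the single Dakota Territory code, so the mapping cannot be reversed for those cases.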
April 13, 2023
- We have re-released Version 1.1 of the IPUMS MLP variables that identify individuals who can be linked across census years, as well as the HISTID crosswalks that can be used for manual linkage. This release corrects for an error in the links available only from March 10 to March 28, 2023 that did not consistently impose the appropriate age restrictions on linkages between some years.
Data improvements.
March 10, 2023
- The 2021 5-year American Community Survey and Puerto Rico Community Survey data are now available.
- Due to the effects of the COVID-19 pandemic on the 2020 ACS data collection and data quality, the Census Bureau revised its methodology for weighting households in the 2017-2021 5-year sample by using their standard weighting methodology for the 2017-2019 portion of the data and the entropy-balance weighting methodology for the 2020 portion of the data. The Census Bureau encourages users to proceed with caution when comparing data products containing 2020 data with other years. For more information about how this impacts the 2021 ACS 5-year and 2021 PRCS 5-year samples, please see the Census Bureau’s user notes on modifications and increased margins of error for 5-year estimates containing data collected in 2020. More information about changes to the weighting methodology can be found in the ACS 2021 5-year sample description.
- As with the 2020 5-year sample, a data collection error in certain counties occurred in 2016 and 2019 resulting in an error for the availability of telephone service. As a result, data for PHONE was removed or suppressed for respondents in the affected areas to account for this error. The Census Bureau has a more detailed errata note available with more information. More detail is available in the ACS 2020 5-year sample description.
- As with the 2016-2020 5-year samples, two data collection errors occurred in select areas in 2017, and as a result, the 2017 ACS 1-year estimates for the affected variables in the affected areas should not be compared to other ACS estimates. The Census Bureau has a user note for both data collection errors: one in Pennsylvania and one in Delaware. More detail is available in the ACS 2021 5-year sample description.
- Due to the monetary inflation adjustment for the 5-year files, the width of the variable RENTGRS was increased from 4 to 5 digits to accommodate adjusted values that exceed 4 digits.
- STEPMOM and STEPPOP were inadvertently released for the 2021 1-year ACS and the 2021 1-year PRCS samples. They have now been removed for those samples. Users who included STEPMOM and STEPPOP in their extracts should NOT use these variables.
- Changes have been made to YRIMMIG to correct errors. For the 2005 PRCS 1-year and 2010 PRCS 5-year samples, cases that were incorrectly assigned to a code of 9999 (Blank) are now coded as 0000 (N/A). For the 2012 and 2013 ACS 3-year and the 2012, 2013, and 2014 ACS 5-year samples, 1931 and 1933 were incorrectly coded as 1932 and 1934, respectively. This error has been fixed.
- A new code (912, appendicitis) has been added to SICKNESS for the 1880 samples. These cases had previously been coded as 9999, Missing or N/A.
- VET47X50 will no longer be available in the 5-year samples beginning with the 2017-2021 5-year file. VET47X50 was collapsed into VETOTHER.
- IPUMS USA users can now select records for inclusion in their extracts based on linkages from the IPUMS Multigenerational Longitudinal Panel (MLP). MLP links individuals across adjacent full-count censuses, providing an opportunity to study people over time. Selecting this feature will automatically limit the number of records in a user's extract to only cases that can be linked across all the selected samples, which will reduce the size and increase the usability of the resulting dataset; this feature also allows users to include the records of those residing with linked individuals.
- A new version (v1.1) of the MLP links is available. This new version reflects improvements to and refinement of the linking algorithm used by MLP to link individuals across censuses. This version also corrects an error in the previously-released data (v1.0) that did not restrict viable Step I links between the 1880 and 1900 censuses to men only. Users should note that this does not necessarily indicate that the Step I links for women are incorrect, but we strongly encourage them to use the v1.1 MLP links.
- HIUFPGBASE, HIUFPGINC, HIUHHSPOV, HIUID, HIUNPERS, HIUPOVUNIV, and HIURULE are now available for the 2021 ACS 1-year sample. These health insurance variables were constructed by SHADAC. For more detailed information, please consult the variable descriptions.
- Several typos in value labels for LANGUAGE have been corrected. These fixes apply to the 2010-2011 ACS 1-year, 2012-2013 ACS 3-year, and 2014-2015 ACS 5-year samples.
- The ACS Multiyear page has been updated to clarify the specific monetary variables that are adjusted to the final year's value.
- The codes tab for RENT was corrected to specify the differing top codes between the 1960 1% and 5%.
Added samples.
Edited variables.
Discontinued variables.
New feature release.
Data improvements.
Expanded variables.
Improved documentation.
November 30, 2022
- The 2021 1-year American Community Survey and Puerto Rico Community Survey data are now available.
- In order to preserve privacy, the Census Bureau combined two variables describing periods of military service into a single variable. All veterans who had peacetime service prior to 1950 are now included in VETOTHER. To accommodate changing definitions of a 'Yes' response over time, the width of the variable has been expanded from 1 to 2 digits. Detailed codes that preserve sample-specific detail are available for the 'Yes' response when "Detailed codes" are selected on the codes tab.
- The Census Bureau collapsed values for BUILTYR2 between 2000 and 2019 into two decadal categories, and the year 2021 has been added.
- The variable VET47X50, which describes veterans with peacetime military service during the period between World War II and 1950, has been discontinued by the Census Bureau. Veterans with service during this period have been combined with people who served prior to WWII and are classified in VETOTHER.
- Users should note that the number of samples included in the default sample selection for the USA samples has been reduced to improve variable browsing.
Added samples.
Edited variables.
Discontinued variables.
Sample selection update.
October 17, 2022
- New Supplemental Poverty Measure (SPM) variables, including MOOP, MEDICAREB, TAXID, ADJGINC, SPMTOTRES, SPMFTOTVAL, SPMSNAP, SPMCAPHOUS, SPMLUNCH, SPMHEAT, SPMWIC, SPMFEDTAXAC, SPMFEDTAXBC, SPMEITC, SPMFICA, SPMSTTAX, SPMCAPXPNS, SPMWKXPNS, SPMCHXPNS, SPMMEDXPNS, and SPMPREMIUM were added for the 2009-2019 1-year ACS samples. IPUMS now makes available all the component variables used in constructing the SPM poverty measure.
- The CITY code for Texarkana is used for cases in both Texarkana, Texas, and Texarkana, Arkansas. Previously named “Texarkana, TX”, the value label for this code was updated to “Texarkana, TX/AR” to accurately reflect that both states are represented.
- The variables RACWHT, RACPACIS, and RACOTHER were erroneously made available for the 1900 1% and 1.2% and 1910 1% and 1.4% samples. This availability has been corrected to only include the 2000 through 2020 samples.
- The variable description for ENUMDIST was updated to clarify that ENUMDIST, in the full count files, must be read with both a state variable and a county variable in order to uniquely identify enumeration districts within and between states.
Added variables.
Improved documentation.
September 15, 2022
- IPUMS USA is introducing a new feature: Adjust Monetary Values, which allows users to attach new monetary variable(s) to their data extract. In the initial release of this feature, variables will be adjusted to 2010 dollars using the CPI-U. Future expansion of the feature will include additional pricing indices, additional years available to adjust to, and additional monetary variables available for adjustment. Future expansion will also include availability of this feature for other IPUMS data collections. This feature will be available as one of the options on the extract summary page. More information about this feature is available on the Monetary Adjustment Feature page (https://usa.ipums.org/usa/adjusted_monetary_values.shtml).
- A new variable, LNKLIFEM, was added for the 1880, 1900, 1910, 1920, and 1940 full count files to link individuals to the Longitudinal, Intergenerational Family Electronic Micro (LIFE-M) database (https://life-m.org/).
- BIRTHYR is now available for the 1940 full count file.
- An error that resulted in incorrect coordinates of “0” was fixed for XGPS and YGPS in the 1880 full count file. Values of 0 are now being recoded as 998=Missing.
- A user note was added to the variable description for LINGISOL to alert users to an error in the original Census Bureau data in which several houses were incorrectly designated as ‘linguistically isolated’ or ‘not linguistically isolated’ due to the exclusion of some 14-year-olds in the 2000 U.S. samples, the 2000 5% Puerto Rican sample, and the 2000 through 2004 ACS samples.
New feature release.
Added variables.
Expanded variables.
Edited variables.
Improved documentation.
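The adjustment to 2010 dollars described in this entry follows the usual price-index ratio: a nominal amount is multiplied by the base-year index and divided by the index for the year in which the amount was reported. The sketch below shows only that arithmetic. The index values are placeholders chosen for easy checking, not actual CPI-U figures, and the function name is hypothetical; it does not reproduce the extract system's implementation.

```python
# PLACEHOLDER index values for illustration only; these are NOT real CPI-U
# figures. Substitute the published annual CPI-U series before real use.
PLACEHOLDER_INDEX = {2010: 100.0, 2015: 110.0, 2020: 120.0}

BASE_YEAR = 2010


def to_base_year_dollars(amount: float, year: int, index=PLACEHOLDER_INDEX) -> float:
    """Convert a nominal dollar amount reported in `year` to base-year dollars."""
    if year not in index:
        raise KeyError(f"no index value available for {year}")
    return amount * index[BASE_YEAR] / index[year]


# $1,200 reported in 2020, expressed in 2010 dollars under the placeholder index.
adjusted = to_base_year_dollars(1200.0, 2020)
```

With the placeholder index, `adjusted` is 1000.0. Special codes that stand for N/A or missing values in monetary variables would need to be excluded before applying any such ratio.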
July 14, 2022
- New Supplemental Poverty Measure (SPM) variables, including SPMPOV, OFFPOV, SPMFAMUNIT, SPMNPERS, SPMNADULTS, SPMNCHILD, SPMCOHABIT, SPMUICHILD, SPMTHRESH, SPMEQSCALE, SPMMORT, and SPMGEOADJ were added for the 2009-2019 1-year ACS samples.
- Experimental weights (EXPWTH and EXPWTP) are now available for the 2019 1-year ACS sample. Note that experimental weights are also available for the 2020 1-year ACS, 2020 5-year ACS, and 2020 5-year PRCS through both the experimental weight variables and the standard production weight variables (HHWT and PERWT). For more information, see this Census Bureau note about 2019 ACS 1-Year PUMS with Experimental Weights (https://www.census.gov/programs-surveys/acs/technical-documentation/user-notes/2022-07.html).
Added variables.
May 11, 2022
- The 2020 5-year American Community Survey and Puerto Rico Community Survey data are now available.
- Due to the effects of the COVID-19 pandemic on the 2020 ACS data collection and data quality, the Census Bureau revised its methodology for weighting households in the 2016-2020 5-year sample by using their standard weighting methodology for the 2016-2019 portion of the data and the entropy-balance weighting methodology for the 2020 portion of the data. The Census Bureau encourages users to proceed with caution when comparing data products containing 2020 data with other years. For more information about how this impacts the 2020 ACS and PRCS 5-year samples, please see the Census Bureau’s user notes on modifications (https://www.census.gov/programs-surveys/acs/technical-documentation/user-notes/2022-03.html) and increased margins of error (https://www.census.gov/programs-surveys/acs/technical-documentation/user-notes/2022-04.html).
- HIUFPGBASE, HIUFPGINC, HIUHHSPOV, HIUID, HIUNPERS, HIUPOVUNIV, and HIURULE are now available for the 2020 1-year sample. These health insurance variables were constructed by SHADAC. For more detailed information, please consult the variable descriptions.
- An error affecting EMPSTAT has been corrected for the 1940 complete count file. This fix addresses some cases of children under the age of 14 who were mistakenly reported as Not in Labor Force (EMPSTAT = 3) rather than as Not in universe (EMPSTAT = 0).
- An error affecting LABFORCE has been corrected for the 1940 complete count file. This fix concerns instances where residents of group quarters were erroneously being coded as Not in Labor Force (LABFORCE = 1). This has been changed so that only persons listed as institutional inmates (RELATE = 13) are classified as Not in Labor Force, and other GQ residents are classified correctly according to their employment status.
- An error in INCWAGE for the 1940 complete count file has been corrected. Previously, some institutional inmates (RELATE = 13) had values for INCWAGE despite being excluded from the universe. They have now been correctly coded as N/A (INCWAGE = 999999).
- Due to a minor adjustment in data processing, our imputation of IMPREL has changed slightly for a few cases in datasets from 1850-1910. For datasets where relationship to household head, RELATE, was not collected (1850-1870), this small number of cases also have different family interrelationship values consistent with their new IMPREL values.
- TOILET and COMMUSE, and the corresponding data quality flags QTOILET and QCOMMUSE, were discontinued in 2015, and as a result, will no longer be available in the 5-year samples beginning with the 2016-2020 5-year file.
- The comparability tab for MORTAMT1 has been updated to reflect a slight change in universe for 2020. Beginning in 2020, homeowners with a home equity loan are no longer considered to have a primary mortgage.
- The poverty definition page has been updated to include the adjustment factors for all available sample years.
- The process for agreeing to the terms of use for the full count data has changed. Previously, users needed to review the terms for full count data as part of each extract request that includes full count data. Under the new process, these terms only need to be agreed to once per year when requesting an extract that includes full count data. These terms are associated with a user's account registration for IPUMS USA, and the agreement to these terms must be recertified annually as part of account renewal.
Added samples.
Expanded variables.
Edited variables.
Discontinued variables.
Improved documentation.
Terms of agreement update.
January 18, 2022
- The 2020 1-year American Community Survey data are now available.
- Users should note that due to the effects of the COVID-19 pandemic on the 2020 ACS data collection and data quality, the 2020 1-year ACS PUMS file was released with experimental weights. Users should proceed with caution when using the 2020 1-year ACS PUMS file and should not compare it to other ACS years. Please see ACS and COVID-19: Guidance for Using the PUMS with Experimental Weights for more information.
- Users should also note that due to the effects of the COVID-19 pandemic on the 2020 ACS data collection and data quality, the 2020 1-year PRCS PUMS file is not available.
- Revised versions of the 1900-1930 complete count Census files are now available.
- Users should note that the number of records in the revised 1900-1930 files has decreased slightly, resulting in minor changes to variable frequencies.
- Users should also note that all territories, including Alaska and Hawaii, have been removed from the 1900-1930 complete count files.
- A new single race identifier, RACHSING, has been added to the 2000 census, 2010 census, and the 2000-2019 ACS samples. RACHSING assigns a single race to non-Hispanic multiple-race people and assigns a federally defined race to non-Hispanic people who reported only “some other race.” Predicted value variables used to assign a single race are also available: PREDAI, PREDAPI, PREDBLK, PREDWHT, and PREDHISP. Please see the variable descriptions for more detailed information about these variables.
- Two new supplementary health insurance variables have been added to the 2019 ACS 1-year file. These variables are: HIUPOVUNIV and HIUHHSPOV. These health insurance variables were constructed by SHADAC. For more detailed information, please consult the variable descriptions.
- A new variable, VACOTH, and the corresponding data quality flag, QVACOTH, were added to the 2020 ACS 1-year file to provide additional information about vacant housing units for records with a VACANCY value of “9= other vacant.”
- ENUMDIST is now available for the 1900, 1910, 1920, and 1930 complete count files.
- VERSIONHIST is now available for all complete count files.
- VACDUR, which reports the time, in months, that a property has been vacant, and its flag QVACDUR are now available in the 2020 ACS 1-year file. VACDUR is also available in the 1960-1990 samples.
- CITY has been updated with newly identified cities in the 1990 1% and 5% samples. In addition, duplicate codes have been combined into a single CITY code for the following places: 1400 merged with 1374 for Cohoes, NY, 3540 merged with 3521 for Lebanon, PA, 4850 merged with 4839 for North Providence, RI, 6220 merged with 6211 for San Angelo, TX, 7092 merged with 7110 for Vallejo, CA, and 7451 merged with 7460 for Wilkinsburg, PA.
- A universe error for CHBORN in the 1900 complete count file has been corrected; the universe was previously incorrectly limited to 'ever-married females,' resulting in never-married female responses being incorrectly coded to the N/A category (CHBORN=0); this has been corrected and CHBORN values for never-married females are now available. Note that this issue was corrected in the 1900 5% sample in September 2021.
- Some of the codes for VET1930 were changed in the 1930 1% and 5% samples, as well as the 1930 complete count file. A code of 6 now equals “Period of service not ascertained”, a code of 7 equals “Veteran status missing but war field implies veteran status”, and a code of 8 equals “Illegible” cases. The previous codes were 6="Illegible" and 7="Period of service not ascertained."
- An error that resulted in incorrect values for ERSCOR50 in the 2019 5-year sample has been corrected. The values for ERSCOR50 were erroneously replaced by the values for EDSCOR50.
- Some codes for VACDUR were updated to accommodate additional detail in the 2020 ACS 1-year file. All codes after the code of 3 were shifted down by one value such that 4 now equals “2-4 months,” 5 equals “4-6 months,” 6 equals “6-12 months,” 7 equals “12-24 months,” and 8 equals “more than 24 months.” As a result, some of the coding in the 1960 sample has changed to include more detail than was previously available.
- Due to a minor adjustment in data processing, our imputation of IMPREL has changed slightly for a few cases in datasets from 1850-1910. For datasets where relationship to head (RELATE) was not collected (1850-1870), this small number of cases also has different family interrelationship values consistent with their new IMPREL values.
- We extended the files that report how well metro areas and cities align with PUMAs. These mismatch reports, available through the MET2013, MET2013ERR, CITY and CITYERR variable descriptions, had previously been limited to the metro areas identified by MET2013 (those with less than 15% mismatch) and cities identified by CITY (less than 10% mismatch). They now include all 2013 metro areas and all large places (those with populations > 75,000).
- For CITY and CITYERR, we also extended the geographic crosswalk files that report relationships between 1990 PUMAs and large places. These had been limited to cities identified in the original PUMA equivalency files. They now include all large places (populations > 75,000).
Added samples.
1900-1930 Complete Count files.
Added variables.
Expanded variables.
Edited variables.
Improved documentation.
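The VET1930 code change noted in this release can be sketched as a small lookup for anyone comparing results from an older extract against the revised codes. This is only an illustration under stated assumptions: the mapping covers just the two codes the note describes, all other codes are assumed (not confirmed by the note) to be unchanged, and the function name is hypothetical. The new code 7 has no counterpart in the old scheme, so resubmitting the extract remains the reliable way to get the revised data.

```python
# Old -> new VET1930 codes, taken from the release note above.
# Codes not listed here are ASSUMED to be unchanged.
OLD_TO_NEW_VET1930 = {
    6: 8,  # "Illegible": was 6, now 8
    7: 6,  # "Period of service not ascertained": was 7, now 6
}


def recode_vet1930(old_code: int) -> int:
    """Translate a pre-revision VET1930 code to the revised scheme."""
    return OLD_TO_NEW_VET1930.get(old_code, old_code)


old_values = [1, 6, 7]
print([recode_vet1930(v) for v in old_values])  # prints [1, 8, 6]
```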
September 15, 2021
- Two errors affecting cases for INCWAGE have been fixed in the 1940 complete count file. The first fix corrected an error that was introduced in the previous data release in which the Missing (999998) and N/A (999999) values were erroneously combined with the top code value of 5001. The second fix corrected the universe specification so that individuals outside of the universe (persons under age 14, institutional inmates (RELATE=13)) get recoded to N/A (999999).
- An error affecting a significant number of cases for METRO has been fixed in the 2012, 2013, 2014, and 2015 3-year and 5-year ACS samples. The error occurred following the change in PUMA boundaries in the 2010 census.
- An error affecting the adjustment factors for income-adjusted variables in the 2007-2017 ACS and PRCS multi-year files has been corrected, resulting in minor differences in dollar values in these variables. The following variables were affected: CONDOFEE, COSTELEC, COSTFUEL, COSTGAS, COSTWATR, FDSTPAMT, FTOTINC, HHINCOME, INCBUS00, INCEARN, INCINVST, INCOTHER, INCRETIR, INCSS, INCSUPP, INCTOT, INCWAGE, INCWELFR, MOBLHOME, MORTAMT1, MORTAMT2, OWNCOST, POVERTY, PROPINSR, RENT, RENTGRS.
- LANGUAGE is no longer available for the 1920 and 1930 samples. The question upon which LANGUAGE is based, whether or not a person could speak English, was not asked in these years, and MTONGUE was incorrectly being used as the source variable for LANGUAGE in these samples. Additional information can be found on the LANGUAGE and MTONGUE comparability tabs.
- An error affecting cases for CHBORN has been fixed in the 1900 5% sample. The universe for this sample was incorrectly specified as ever-married females, resulting in never married females being incorrectly coded to the N/A category (CHBORN=0). This specification has now been removed so that the correct number of children ever born to never married females is available and the universe statement has been updated.
- An error affecting cases for PROPTX99 in the 2018 and 2019 ACS samples has now been fixed. PROPTX99 had been incorrectly adjusted to 1999 dollars; that adjustment has been reversed. In addition, the values in the 2018 and 2019 multi-year files are now deflated to sample-year dollars to maintain consistency with the previous multi-year files.
- An error in the questionnaire text for SEX and RELATE in the 2019 PRCS sample has been corrected. The text previously referenced the wrong question on the enumeration form.
- An error in the reference period for VETKOREA in the 2005-2019 1-year ACS/PRCS samples has been fixed. The variable description and enumeration forms erroneously indicated June as the start of the Korean conflict era for ACS samples 2003 and onward.
- The code 9997 = "Multiple Counties" was added to the specific variable codes for MIGCOUNTY.
- A user note was added to the variable description for QAGE and QAGE2 to clarify that QAGE is the flag for AGE based on decade and year whereas QAGE2 is the flag for AGE based on year only. In addition, an error in the source variable labels for QAGE and QAGE2 was corrected. The source variables were incorrectly labeled as the "flag for marital status" rather than the "flag for age allocation."
- WKSWORK1 documentation was updated to note that it is only available for 2019 in the 2019 ACS 5-year sample.
Edited variables.
Improved documentation.
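The INCWAGE universe rule described in this release for the 1940 complete count file can be sketched as a simple check on an extract. This is a minimal sketch, not the IPUMS processing code: it assumes records are already loaded as dictionaries with AGE, RELATE, and INCWAGE keys, it treats the two conditions named in the note (under age 14, RELATE = 13) as the full universe exclusion, and the function names are hypothetical.

```python
NA_INCWAGE = 999999    # N/A code, from the release note
INMATE_RELATE = 13     # institutional inmate, from the release note
MIN_UNIVERSE_AGE = 14  # persons under 14 are outside the universe


def in_incwage_universe(age: int, relate: int) -> bool:
    """True if a person falls inside the INCWAGE universe as described."""
    return age >= MIN_UNIVERSE_AGE and relate != INMATE_RELATE


def find_universe_violations(records):
    """Return out-of-universe records whose INCWAGE is not coded N/A."""
    return [
        r for r in records
        if not in_incwage_universe(r["AGE"], r["RELATE"])
        and r["INCWAGE"] != NA_INCWAGE
    ]


# Made-up records for illustration only.
sample = [
    {"AGE": 10, "RELATE": 3, "INCWAGE": 0},        # under 14, not N/A
    {"AGE": 40, "RELATE": 13, "INCWAGE": 500},     # inmate, not N/A
    {"AGE": 40, "RELATE": 13, "INCWAGE": 999999},  # inmate, correctly N/A
    {"AGE": 30, "RELATE": 1, "INCWAGE": 1200},     # in universe
]
print(len(find_universe_violations(sample)))  # prints 2
```

An extract created after this fix would be expected to return no violations; a non-empty result suggests the extract predates the correction and should be resubmitted.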
April 19, 2021
- A revised version of the 1940 complete count Census file is available.
- A revised version of the 1870 complete count Census file is available. This version has the same number of records, but they are in a different sort order; this correction may have impacted family and relationship variables.
- PLACENHG is a new variable that identifies incorporated places. This variable uses the IPUMS National Historical Geographic Information System (NHGIS) coding scheme for historical places. PLACENHG is currently only available for the 1940 complete count file.
- VERSIONHIST is now available for the 1850 and 1940 complete count files.
- ENUMDIST has been corrected in the 1940 complete count file. As part of this fix, SUPDIST is no longer available for the 1940 complete count file.
- INCWAGE values above $5000 have been topcoded to $5001 in the 1940 complete count file.
- Income-adjusted variables for the 2018 and 2019 ACS and PRCS multi-year files have been adjusted to sample-year dollars. The following variables were affected: CONDOFEE, COSTELEC, COSTFUEL, COSTGAS, COSTWATR, FTOTINC, HHINCOME, INCBUS00, INCEARN, INCINVST, INCOTHER, INCRETIR, INCSS, INCSUPP, INCTOT, INCWAGE, INCWELFR, MOBLHOME, MORTAMT1, MORTAMT2, OWNCOST, POVERTY, PROPINSR, RENT, RENTGRS.
- The variables SPLIT40, SPLIT80, SERIAL40, SERIAL80, NUMPREC40, and NUMPREC60 are no longer available. These sample-specific variables are now available as three variables in all complete count files. SPLIT40 and SPLIT80 are available as SPLIT. SERIAL40 and SERIAL80 are available as SPLITHID. NUMPREC40 and NUMPREC60 are available as SPLITNUM.
- An error in the codes and comparability documentation for RENTGRS has been corrected to clarify differences between the 1960 1% and 1960 5% samples.
- Value labels for CHBORN codes of 97, 98, and 99 are now listed on the codes tab for this variable; while these codes were present in the data, they were not listed in our online documentation.
- The comparability documentation for OCC and OCC1950 was updated to include a note about differences between OCC and OCC1950 in the 1910 and 1920 Puerto Rico samples. In addition, the label for OCC code 993 was updated to include a note that this is a specific code in 1910 and 1920 Puerto Rico samples.
1940 Complete Count file.
1870 Complete Count file.
Added variables.
Expanded variables.
Edited variables.
Deprecated variables.
Improved documentation.
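For users whose own scripts still reference the retired SPLIT, SERIAL, and NUMPREC names listed in this release, the renaming can be sketched as a lookup applied to a variable list. The mapping comes directly from the note; the function name and the idea of keeping variables in a plain Python list are assumptions for illustration.

```python
# Deprecated name -> replacement, as listed in the release note.
RENAMES = {
    "SPLIT40": "SPLIT",
    "SPLIT80": "SPLIT",
    "SERIAL40": "SPLITHID",
    "SERIAL80": "SPLITHID",
    "NUMPREC40": "SPLITNUM",
    "NUMPREC60": "SPLITNUM",
}


def update_variable_list(names):
    """Replace deprecated names, dropping duplicates and keeping order."""
    updated = []
    for name in names:
        new_name = RENAMES.get(name, name)
        if new_name not in updated:
            updated.append(new_name)
    return updated


print(update_variable_list(["YEAR", "SPLIT40", "SERIAL40", "SPLIT80"]))
# prints ['YEAR', 'SPLIT', 'SPLITHID']
```

Because two old names collapse into each new one, the sketch removes duplicates so that a list mentioning both SPLIT40 and SPLIT80 ends up with a single SPLIT entry.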
February 9, 2021
- The 2019 5-year American Community Survey and Puerto Rico Community Survey data are now available.
- Users should note that due to data collection errors in 2015, 2016, and 2019, values for availability of a telephone (PHONE) were removed or suppressed for respondents in the affected areas. This user note from the Census Bureau contains more information about the error.
- Two data collection errors occurred in 2017, and as a result, the 2017 ACS 1-year estimates for the affected variables should not be compared to other ACS estimates. The first error occurred in Philadelphia City, Philadelphia County, and Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Metropolitan Statistical Area resulting in a collection error for the following topics: employment status, health insurance, households and families, income and earnings, rent (no rent paid), and poverty. The second data collection error occurred in New Castle County, DE resulting in a collection error for the following topics: ancestry, commuting, disability, employment status, fertility, health insurance, income and earnings, industry and occupation, marital history and status, poverty, presence of grandchildren, and veteran status. The same error occurred in New Castle County, DE in 2018 but was detected prior to the ACS release, so users should note that affected variables will have higher allocation rates in the 2018 ACS 1-year sample. User notes from the Census Bureau for Philadelphia and Delaware contain more information.
- QWKSWORK, the quality flag for WKSWORK1 and WKSWORK2, was split into two variables: QWKSWORK1 and QWKSWORK2. This change occurred because both WKSWORK1 and WKSWORK2 variables were made available by the Census Bureau for the first time in the 2019 5-year ACS/PRCS samples, requiring two flags instead of just one. QWKSWORK previously served as the flag for both WKSWORK variables. Now, QWKSWORK1 will be the flag for WKSWORK1 and QWKSWORK2 will be the flag for WKSWORK2. This change extends to all samples that QWKSWORK was available for. Users should note that because WKSWORK2 is available for all samples that include WKSWORK, QWKSWORK will automatically be updated to QWKSWORK2 in preexisting extracts. Users can use either QWKSWORK1 or QWKSWORK2 for samples prior to the 2019 5-year file because the values will be the same.
- HIURULE, HIUFPGBASE, HIUFPGINC, HIUID, and HIUNPERS are now available for the 2019 1-year ACS sample.
- An error affecting cases for OCC2010 in the 1980 samples has been fixed. Some Armed Forces respondents (EMPSTAT detailed codes 14 (Armed Forces--at work) and 15 (Armed Forces--with job but not at work)) were not being correctly coded to the Armed Forces occupation code of 9830 for OCC2010.
- PROPTX99 has been de-adjusted for inflation in the 2018 and 2019 samples to be consistent with the standardized dollar value amounts used for PROPTX99 prior to 2018.
- The following categories were added for the variable HIURULE: “Point eligible child to primary wage earning parent, non-single parent household” and “Point eligible child to parent, single parent household”. As a result, two new values were added to HIURULE to account for these new categories.
- MTONGUE and QMTONGUE were removed from the 1910 full count sample due to a data transcription error. MTONGUE information from the original Census forms was not transcribed into our digital version of the complete count 1910 data file. The derivation of this variable led to incorrectly high rates of English as a MTONGUE value. Only the 1910 full count file was affected by this error.
- The universe statement for PROPTX99 was corrected to clarify differences between sample universes.
- Due to a data transcription error in the 1920 full count file, approximately two hundred thousand records are missing data for select variables in select enumeration districts located in Maine, California, Illinois, Georgia, and Massachusetts. The sample description for the 1920 full count file has been updated to reflect this error.
Added samples.
Added variables.
Expanded variables.
Edited variables.
Improved documentation.
December 15, 2020
- An error in the layout of source variables is now fixed. This error was introduced with the release of the 1-Year ACS on 11/12/2020.
Edited variables.
November 12, 2020
- The 2019 1-year American Community Survey and Puerto Rico Community Survey data are now available.
- Users should note that due to a data collection error in Flagler County, FL and St. Joseph County, IN, values for availability of a telephone (PHONE) were removed or suppressed for respondents in the affected areas. This user note from the Census Bureau contains more information about the error.
- A new variable, COUPLETYPE, expands SSMC to provide greater detail about the householder’s relationship type by including categories for married and unmarried partners in same-sex and opposite-sex relationships. Both COUPLETYPE and SSMC will continue to be available.
- A new variable, CBHHTYPE, classifies all households based on the relationship of the householder(s). It includes categories for households with own children under the age of 18 and households with cohabiting couples.
- A new variable, HCOVSUB2, was added to capture the number of people with subsidized marketplace health insurance coverage.
- The flag for HCOVSUB2, QHCOVSUB2, was also added.
- WKSWORK1, which reports the number of weeks a respondent worked in the past year, is available in the 2019 ACS/PRCS samples. It has not been made available since 2007 because the data were released as intervaled values in the 2008-2018 ACS/PRCS samples.
- TRANWORK categories were updated in the 2019 ACS/PRCS samples, resulting in changes to the TRANWORK codes in sample years 1960-2019. Category changes include a separation of “bus” into its own category, the addition of “light rail” in the “streetcar and trolley” category, and the replacement of “railroad” with “long-distance train or commuter train”.
Added samples.
Added variables.
Expanded variables.
Edited variables.
September 16, 2020
- The following occupational standing variables are now available for 2018 ACS/PRCS samples: OCCSCORE, SEI. The following occupational standing variables are now available for 2018 ACS samples: HWSEI, PRESGL, PRENT, EDSCOR50, EDSCOR90, ERSCOR50, ERSCOR90, NPBOSS50, NPBOSS90.
- An error affecting cases for OCC1990 has been fixed in the 2018 ACS/PRCS samples. Some Armed Forces respondents (EMPSTAT detailed codes 14 (Armed Forces--at work) and 15 (Armed Forces--with job but not at work)) were not being correctly coded to the Armed Forces occupation code of 905 for OCC1990.
- An additional error affecting cases for OCC1990 has been fixed in the 2010, 2011, 2012, 2013, and 2014 ACS/PRCS 1-year samples. The OCC code of 7850 was being incorrectly recoded to the OCC1990 code of 779 instead of the correct code of 769. Due to this change, frequencies for HWSEI, PRENT, EDSCOR90, ERSCOR90, NPBOSS90 will also be slightly different since these variables are dependent on OCC1990 codes.
- QOCC has been improved for all samples to better identify logical edits and allocation.
- Four new crosswalks are available for the OCCSOC and INDNAICS variables, which are based on the Standard Occupational Classification (SOC) system and the North American Industry Classification System (NAICS), for the Census and ACS/PRCS sample years 2000-2018. The OCCSOC crosswalk, INDNAICS crosswalk, OCC to OCCSOC crosswalk, and IND to INDNAICS crosswalk document code changes for these variables from 2000-2018. Note: these crosswalks reflect the codes in the ACS PUMS data files, which differ slightly from the Industry and Occupation Codes found on the Census Bureau website.
- Linked Full-Count Census Data are now available. The IPUMS Multigenerational Longitudinal Panel (MLP) project links individuals' records between censuses. The first IPUMS MLP data release consists of a set of crosswalks between pairs of adjacent censuses from 1900 to 1940. For more information, see the IPUMS Linked Data page.
Expanded variables.
Edited variables.
Improved documentation.
Linked Full-Count Census Data.
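The two OCC1990 corrections in this release can be expressed as consistency checks on an extract. This is a rough sketch under assumptions: records are dictionaries, the detailed employment status is held in a key named EMPSTATD (an assumed name for the "EMPSTAT detailed codes" the note refers to), and the function names are hypothetical. The second check applies only to the 2010-2014 ACS/PRCS 1-year samples named in the note.

```python
ARMED_FORCES_EMPSTATD = {14, 15}  # detailed EMPSTAT codes from the note
ARMED_FORCES_OCC1990 = 905        # Armed Forces code for OCC1990


def find_armed_forces_miscodes(records):
    """Armed Forces respondents whose OCC1990 is not 905."""
    return [
        r for r in records
        if r["EMPSTATD"] in ARMED_FORCES_EMPSTATD
        and r["OCC1990"] != ARMED_FORCES_OCC1990
    ]


def find_occ7850_miscodes(records):
    """OCC 7850 cases not recoded to OCC1990 769 (the corrected value)."""
    return [r for r in records if r["OCC"] == 7850 and r["OCC1990"] != 769]


# Made-up records for illustration only.
sample = [
    {"EMPSTATD": 14, "OCC": 9800, "OCC1990": 905},
    {"EMPSTATD": 15, "OCC": 9800, "OCC1990": 999},
    {"EMPSTATD": 10, "OCC": 7850, "OCC1990": 779},
]
print(len(find_armed_forces_miscodes(sample)))  # prints 1
print(len(find_occ7850_miscodes(sample)))       # prints 1
```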
April 30, 2020
- Computer and internet access variables are now available for the 2017 and 2018 5-year ACS/PRCS samples. As the CISMRTPHN and CILAPTOP questions were not asked prior to 2016, the Census Bureau allocated values for all 2013, 2014, and 2015 cases in these multiyear files.
- HIURULE, HIUFPGBASE, HIUFPGINC, HIUID, and HIUNPERS are now available for the 2018 1-year ACS sample.
- METRO is now available for the 2010 10% sample.
- The 2018 ACS introduced new occupation codes (derived from the 2018 Standard Occupational Classification (SOC) system) and industry codes (derived from the 2017 North American Industry Classification System (NAICS)). The occupation variables (OCC1950, OCC1990, OCC2010) and industry variables (IND1950, IND1990) have been harmonized to account for these changes. These variables, as well as OCCSOC and INDNAICS, are now available for the 2018 ACS/PRCS samples. For a general overview of the process used for harmonizing occupation and industry codes, refer to the following working paper: Harmonizing the 2010 and 2002 Census Occupation Coding Schemes. For more information on occupation and industry codes, refer to the Census Bureau's website on Industry and Occupation codes.
- The data quality flag QINCEARN erroneously coded allocated cases as "1" rather than "4 - Allocated (method not specified)" in the 2016-2018 ACS/PRCS 5-year samples. This has been corrected.
- The METRO variable has been improved to better assign PUMAs to metropolitan areas (even if the PUMA straddles two different metropolitan areas) and central cities by allowing for a 1% population tolerance (based on 2010 PUMA populations) for 2010 vintage PUMAs (see the PUMA comparability section for details on PUMA vintage). This has moved roughly 6% of households out of the "... indeterminable (mixed)" identifications, improving metro status identification.
Expanded variables.
Edited variables.
Added variables.
February 12, 2020
- The 2018 5-year American Community Survey and Puerto Rico Community Survey data are now available.
- Users should note that due to a 2018 data collection error in New Castle County, DE there are higher allocation rates for the following topics: ancestry, commuting, disability, employment status, fertility, health insurance, income and earnings, industry and occupation, marital history and status, poverty, presence of grandchildren, and veteran status. Ancestry and marital history were suppressed for the respondents in the affected area. This user note from the Census Bureau contains more information.
- A revised version of the 1880 full count Census file is available.
- HISTID is now available for the 1880 full count file.
Added samples.
1880 Full Count file.
Expanded variables.
Edited variables.
December 16, 2019
- The 2018 1-year American Community Survey and Puerto Rico Community Survey data are now available.
- Users should note that due to a data collection error in New Castle County, DE there are higher allocation rates for the following topics: ancestry, commuting, disability, employment status, fertility, health insurance, income and earnings, industry and occupation, marital history and status, poverty, presence of grandchildren, and veteran status. Ancestry and marital history were suppressed for the respondents in the affected area. This user note from the Census Bureau contains more information.
- New geographic variables measuring the level of urbanization in each PUMA have been added for samples from 2000 on. DENSITY is an index of local population density, giving the population-weighted geometric mean of census tract population densities in each PUMA. METPOP00 and METPOP10 index the size of commuting systems, giving the population-weighted geometric mean of the populations of CBSAs (metro and micro areas) in each PUMA. Specifically, METPOP10 summarizes the 2010 populations of 2013 CBSAs and noncore counties, and METPOP00 summarizes the 2000 populations of 2003 CBSAs and noncore counties. For a detailed overview of these measures, refer to the following working paper: Across the Rural-Urban Universe: Two Continuous Indices of Urbanization for U.S. Census Microdata.
- SSMC is now available for the 2017 5-year ACS/PRCS datasets.
Added samples.
Added variables.
Expanded variables.
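The population-weighted geometric mean behind DENSITY, METPOP00, and METPOP10 in this release can be illustrated with a short calculation. This is a sketch of the general formula only, exp(sum(p_i * ln(x_i)) / sum(p_i)), not the IPUMS implementation; the function name, the handling of zero-population units, and the sample numbers are assumptions. The working paper cited in the note is the authoritative description.

```python
import math


def pop_weighted_geometric_mean(values, populations):
    """Geometric mean of `values`, weighted by `populations`.

    Units with zero population carry no weight and are skipped
    (an assumption made for this sketch).
    """
    total_pop = 0.0
    weighted_log_sum = 0.0
    for value, pop in zip(values, populations):
        if pop <= 0:
            continue
        if value <= 0:
            raise ValueError("values must be positive for populated units")
        weighted_log_sum += pop * math.log(value)
        total_pop += pop
    if total_pop == 0:
        raise ValueError("total population must be positive")
    return math.exp(weighted_log_sum / total_pop)


# Two hypothetical tracts of equal population with densities 100 and 10,000:
# the result is close to 1,000, well below the arithmetic mean of 5,050.
print(pop_weighted_geometric_mean([100.0, 10000.0], [1000, 1000]))
```

Averaging in logs keeps a single very dense tract from dominating the index, which is one reason a geometric mean suits a measure spanning rural to urban areas.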
September 25, 2019
- Preliminary versions of the 1860 and 1870 full count census files are now available. More information on these 100% Population Databases can be found on the Sample Description page and on the IPUMS Complete Count Data page.
- A new variable, SAMPLE, replaces DATANUM. Though the last two digits in SAMPLE do not correlate exactly with the now-deprecated DATANUM, the variable serves the same purpose of assigning a unique ID to all cases that belong to the same dataset. SAMPLE also provides value labels for clarity. Revised extracts that contain DATANUM will automatically include SAMPLE instead, so there should be minimal impact on data extracts. However, personal code that references the deprecated variable will need to be updated.
- VERSIONHIST is now available. VERSIONHIST specifies the version of the historical census file that was used for harmonization and is currently available on the IPUMS USA website.
- DEGFIELD and DEGFIELD2 are now available for the 2013-present multi-year ACS files.
- HISTID is now available for the 1850 full count file.
- Top- and bottom-code documentation, which was previously disseminated on dataset-specific webpages, is now accessible in a single consolidated table. Additionally, the original codebooks webpage has been streamlined. Both of these resources can be accessed through the User's Guide.
- The variable description for FAMSIZE now includes clarification about cohabiting partners.
- A user caution regarding unmarried partners has been added to the HHTYPE variable description.
- In the 1850 full count file, IND1950 cases that were coded as 999 are now coded as 0.
- PERWT in the 1910 1% file and HHWT in the 1930 1% file were referring to the wrong underlying variables. This is now fixed.
- NSUBFAM and SUBFAM are now topcoded at 9 in the 1880 full count file to fix truncated values. This approach is consistent with the 1910-1940 full count files and affected two and nine cases, respectively.
- Negative SURSIM values were being assigned to a small number of cases (less than 100 per dataset) in the 1880, 1900, 1910, and 1940 full count census files. This has been corrected.
Added datasets.
Added variables.
Expanded variables.
Improved documentation.
Edited variables.
March 25, 2019
- Variables from the Urban Transition Historical GIS Project are now available for the 1880 Full Count file. The Urban Transition Historical GIS Project, directed by John Logan, Professor of Sociology at Brown University, assigns GPS coordinates (XGPS and YGPS) to 1880 Full Count file households in 39 cities for use in spatial analysis. Households that are part of the project can be identified by the variable UTP.
- A minor error affecting a small number of cases in BIRTHYR has been fixed for the 1900 Full Count file. Previously, some 9999 (missing/blank) values were being erroneously coded as 1999.
Added variables.
Edited variables.
March 7, 2019
- The 2013-2017 ACS/PRCS 5-Year files are now available. These files include all cases in the previously-released single-year files from the 2013, 2014, 2015, 2016, and 2017 ACS/PRCS. More information on the datasets can be found on the Sample Description page for the 2017 5-year ACS and the 2017 5-year PRCS.
- OCC1990 has been revised for the 2010 ACS samples onwards. Previously, Office Financial Clerks, All Others (5165) were denoted as OCC1990 = 383, Bank tellers - this has been revised so that they are categorized as 389, Administrative support jobs, n.e.c. Similarly, Transportation Security Screeners (3945) were denoted as OCC1990 = 36, Inspectors and compliance officers, outside construction - this has been revised so that they are categorized as 426, Guards, watchmen, doorkeepers.
- HIURULE, HIUFPGBASE, HIUFPGINC, HIUID, and HIUNPERS are now available for the 2017 1-year ACS sample.
Added samples.
Edited variables.
January 31, 2019
- We held a webinar describing IPUMS USA on Thursday, January 31, 2019. The webinar provided an overview of the datasets and topics available in IPUMS USA, explored website features for customizing data extracts, and shared some useful insider knowledge. If you were unable to attend, we have posted the video and slides.
Webinar on IPUMS USA.
January 29, 2019
- A revised version of the 1940 full count Census file is available.
- HISTID can now be used to consistently identify individual records within a full count file across version releases.
- Alaska and Hawaii were not included in this version of the 1940 full count release. Since Alaska and Hawaii were United States territories at the time of the 1940 Census (April 1, 1940), both were enumerated using the Territorial Census form. Due to this difference in enumeration, Alaska and Hawaii will be available in a later release of the 1940 full count census file after further harmonization.
- An improved version of SLREC which identifies sample-line respondents is now available.
- The names of three county variables have changed. COUNTY is now COUNTYICP, COUNTYFIPS is now COUNTYFIP, and NHGISJOIN is now COUNTYNHG. These new names are more intuitive and also parallel the naming convention of the corresponding state variables (STATEICP and STATEFIP), which will reduce confusion for users. Revised extracts that contain the old variable names will automatically include the new ones, so there should be minimal impact on data extracts. However, personal code that references these variables will need to be updated. This blog post sheds more light on the history of the original names and why we decided to change them.
- Migration and place-of-work variables are now available for samples that use 2010-based PUMA boundaries. This includes the addition of eight new variables and the expansion of two existing variables.
- MIGMET131, MIGMET135, MIGMET13ERR, PWMET13, and PWMET13ERR are new variables that were modelled after MET2013 and MET2013ERR.
- The standards put in place for the 2010 Migration/Place-of-work PUMA definitions require that the boundaries follow county boundaries. Due to this change, we created the variables MIGCOUNTY1, MIGCOUNTY5, and PWCOUNTY. Only a small number of cities have boundaries that are also county boundaries; consequently, IPUMS does not plan to extend MIGCITY1 or PWCITY into 2012 and later ACS samples.
- Because counties are not identified in source public-use microdata from 1950 onwards, IPUMS instead identifies counties wherever a set of geographic units identified in source microdata (e.g., county groups or PUMAs) correspond exactly to a whole county. Through this process, there should be no mismatch errors (if a county is identified, then all county residents--and only county residents--are coded as residents), but some mismatches were identified in 1980. As a result, we investigated COUNTYICP and COUNTYFIP and made the following corrections, which can be viewed in more detail in this spreadsheet.
- There were numerous errors in 1980 samples, with many cases of partially identified counties, unnecessarily unidentified counties, and in Maryland, some misidentified counties due to overlooked ICPSR/FIPS discrepancies.
- 1990 and later samples had a small number of issues. There were a few missed counties (Anchorage, AK) and missed omission errors, including partial reporting of Orleans Parish after Hurricane Katrina.
- In 1970, there were 6 counties that could be identified but weren't.
- The COUNTYFIP code for District of Columbia, which was incorrectly given as 010, has been corrected to 001.
- The source 2012, 2013, and 2014 ACS/PRCS PUMS data files from the Census Bureau (including multi-year files that span one or more of these years) contain known migration and place-of-work PUMA coding errors. To streamline use, IPUMS corrected the erroneous values in the affected source variables and harmonized variables. The codes tabs for MIGPUMA1 and PWPUMA00 explain the errors, and this document contains the person-level corrections.
- The value labels for METRO, MIGTYPE1, MIGTYPE5, and PWTYPE were revised to use consistent metropolitan area terminology. This clarified the codes and did not change the meaning.
- PWTYPE was fixed for a small number of cases in the 2005-2011 ACS/PRCS samples. Respondents who reported a place of work in Puerto Rico are now coded as 9 instead of 0. There were also seven place-of-work PUMAs (statefip,pwpuma; 22,22777; 27,02703; 27,02714; 36,03603; 48,04807; 48,04816; 48,04820) that were coded 0 when they should have been coded 9.
- The PWMETRO code for Hattiesburg, MS was changed from 3285 to 3300 to be consistent with METAREA and MIGMET.
- 2000 PUMA-PWPUMA-MIGPUMA relationship files, which were previously only available in HTML or PDF form, are now available in machine-readable form. Additionally, 2000-2010 Migration and Place-of-work PUMA crosswalks are available. These files can all be accessed through the geographic tools page.
- Links to the Census Bureau's 2010 PUMA composition and equivalency files have been added to the 2010 PUMA geographic tools page. We also added correct population totals in the 2000-2010 PUMA crosswalk for the PUMAs in Louisiana that were merged following Hurricane Katrina.
- The documentation for relevant migration and place-of-work variables now clarifies that the questions were only asked of respondents who did not live in the house or apartment in the reference period. Differences in code lengths between MIGPUMA (3 digits) and MIGPUMA1 (5 digits) have also been clarified.
1940 Full Count file.
Renamed county variables.
New and expanded migration and place-of-work variables.
Edited variables.
Improved documentation.
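For personal code that still refers to the old county variable names, the renaming described in this entry can be sketched as a simple lookup. This is a minimal illustration in Python, assuming an extract's column names are available as a list of strings; the helper name `rename_county_variables` is hypothetical and is not part of any IPUMS tool.

```python
# Old-to-new county variable names, taken from the renaming note in this entry.
COUNTY_RENAMES = {
    "COUNTY": "COUNTYICP",
    "COUNTYFIPS": "COUNTYFIP",
    "NHGISJOIN": "COUNTYNHG",
}


def rename_county_variables(columns):
    """Return a copy of `columns` with old county names replaced (hypothetical helper)."""
    return [COUNTY_RENAMES.get(name, name) for name in columns]


# Example header from an older extract (illustrative only).
old_header = ["YEAR", "STATEFIP", "COUNTY", "COUNTYFIPS", "NHGISJOIN", "PERWT"]
new_header = rename_county_variables(old_header)
print(new_header)
```

Names that are not in the lookup pass through unchanged, so the same helper can be applied to any extract header.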
November 6, 2018
- The 2017 1-year American Community Survey and Puerto Rican Community Survey data are now available. Users should note that, due to hurricane activity, data collection was temporarily suspended in Puerto Rico and certain parts of the US, which may lead to higher margins of error in the impacted areas. This user note from the Census Bureau contains more information.
- The shapefile and composition file for CNTYGP97 have been updated to correct several incorrectly defined county groups. A complete list of these changes can be found here: 1970 county group changes.
Added samples.
Corrected 1970 county group files.
October 30, 2018
- Due to a minor adjustment in data processing, our imputation of IMPREL has changed for less than 0.001 percent of cases in datasets from 1850-1910. For datasets where relationship to head (RELATE) was not collected (1850-1870), this small number of cases also has different family interrelationship values consistent with their new IMPREL values.
Edited variables.
October 1, 2018
- A revised version of the 1850 full count Census file is now available. This release includes improved GQ and GQTYPE. NHGISJOIN is also now available for this dataset.
- IPUMS allocates missing values for nearly all pre-1950 datasets, with the exception of the 1900 1%, 1900 1% oversample, 1910 1%, 1910 1.4% oversample, and 1940 1%. This release includes an upgrade to the process for allocating missing values in historical datasets, and with the upgrade the original allocated values could not be exactly replicated. Our improvements changed no more than 2% of cases in a given dataset.
- A complete list of variables that receive allocated values for missing data is as follows: AGE, AGEMARR, BIRTHMO, BIRTHQTR, BPL, CHBORN, CHSURV, CITIZEN, CLASSWKR, EMPSTAT, FARM, FBPL, FMTONGUE, GQ, GQFUNDS, GQTYPE, IND1950, LABFORCE, LANGUAGE, LIT, MARRNO, MARST, MBPL, MMTONGUE, MORTGAGE, MOUNEMP, MTONGUE, NATIVITY, OCC, OCC1950, OCCSCORE, OWNERSHP, RACE, RACESING, RELATE, SCHOOL, SEI, SEX, SPEAKENG, YRIMMIG, YRSUSA1, and YRSUSA2.
- These allocation improvements also indirectly affected other variables primarily through universe restrictions (for example, a new allocated value for MARST may result in that individual not being part of the universe for DURMARR) and largely fall under these categories: technical, demographic, family interrelationship, occupational standing, economic characteristics, migration, race and ethnicity and nativity, as well as corresponding data quality flags. For these variables, less than 1% of values were changed.
- Relationship to head (RELATE) was not collected in 1850, 1860, and 1870 so IPUMS imputes these values. We improved our IMPREL procedures for distinguishing between "Siblings" and "Non-Relatives" resulting in fewer siblings and more non-relatives. Preliminary analysis shows that IMPREL has changed for about 5% of cases.
- Due to the changes in the IMPREL variable, there have also been indirect changes to variables that build upon family interrelationships. These variables are: AGE, BIRTHMO, BIRTHQTR, BIRTHYR, BPL, CITIZEN, ELDCH, FAMUNIT, FARM, FBPL, GQ, HISPAN, HISPRULE, IMPREL, IND1950, LIT, MARRINYR, MOMLOC, MOMRULE_HIST, NATIVITY, NCHILD, NCHLT5, NCOUPLES, NFATHERS, NMOTHERS, NPBOSS50, NUMPERHH, POPLOC, POPRULE_HIST, PRESGL, QBIRTHMO, QMARINYR, QRACE, QRELATE, QSEX, RACE, RACESING, RELATE, SPLOC, SPRULE_HIST, STEPMOM, STEPPOP, and YNGCH.
- Our analysis shows that these changes affected no more than 6% of cases. Three variables changed at a higher rate: NSIBS (9%), NFAMS (10%), and FAMSIZE (14%), but these reflect small shifts in these continuous variables.
- BIRTHYR is now available for the 1900 US 5% sample and the 1930 Puerto Rico 1% sample.
- SPLIT and DWSIZE are now available for the 1880 full count.
- We now more accurately distinguish between edits made by Census, logical computer edits by IPUMS, and hot deck allocation by IPUMS in the data quality flag variables. A complete list of improved variables is as follows: QBIRTHMO, QBPL, QDURMARR, QFBPL, QFMTONG, QGQ, QGQFUNDS, QGQTYPE, QLANGUAG, QMARST, QMBPL, QMMTONG, QMTONGUE, QOCC, QOWNERSH, QRACE, QRELATE, QSEX and QYRIMM. See our Introduction to Data Editing and Allocation for more information.
- We improved the identification of non-occupations for the 1870 datasets. This resulted in fewer cases getting assigned a value of zero for SEI.
- In the 1900-1910 full counts and the 1910 PR 20% sample, IND1950 non-industry values are now coded to non-classifiable.
- All 1930 datasets that contain GQSTR now contain improved processing so more strings can be identified.
- There is now consistent enforcement of the universe for LANGUAGE in the 1930 datasets, OCC1950 in the 1860 datasets, SUPDIST in the 1920 US 1% sample, IND1950 in the 1920 PR 20% sample, and ENUMDAY and ENUMMO in the 1970 US 1.4% oversample.
- GQTYPE in the 1940 full count was improved so that non-group quarters households can only receive a value of zero. The change in GQTYPE affects a number of demographic, family interrelationship, household composition, veteran status, and race, ethnicity and nativity variables. The percent of cases that change as a result of this programming improvement does not exceed 0.2%.
- We implemented better code for identifying household fragments in historical datasets. This affected a small number of cases in NUMPERHH, GQ, GQFUNDS, and GQTYPE.
- IPUMS no longer allocates values for GQ and GQTYPE in the 1880 US full count.
- We corrected a minor error for BPL in the 1870 US 1% sample that introduced a string character ("Z").
- 1900 full count households were being erroneously included in the GQTYPE question universe. These households are now classified as NA (non-group quarter households).
- In the 1900 US full count, all children under 1 year of age were having their AGEMONTH values allocated even if they had reported a valid value. This issue has been corrected and the new data reflect the transcribed enumeration forms.
- QOWNERSH in the 1900 US full count was showing values of 5 and 6 when they should have been 4. This is now fixed.
- In the 1900 US full count, QBIRTHMO values of 0 were being output as a string character ("Z"), which has now been fixed.
- Households that included last name strings similar to "missing" (e.g. "??", "!", etc.) would occasionally lead to gaps in the SURSIM order. The logic has been corrected to prevent these gaps. In the 1920 full count, the last person in the household was intermittently coded as having the same SURSIM as the head even if they were not related to the head. This fix led to very minor changes in RELATE-dependent variables.
- HHWT and PERWT for 1910 US 1% were showing incorrect values and are now correct.
- Due to a coding error in the 1930 US 1% and 5% samples, YRSUSA2 values were being assigned to the bins incorrectly. This is now fixed.
- A coding error that did not account for implied decimals in SLWT in the 1940 full count has been fixed.
- Coding for NUMPERHH was erroneously modified for the 1900, 1920-1940 full counts to count the number of person records listed under the household record. The variable now correctly reports the number of people listed as being a part of the household on the original census form before any splitting of large group quarters, as originally intended.
- An error in CBSFTYPE swapped values of 1 and 2 in the 1970 PR samples. This is now fixed.
- CBPERNUM in the 2010 US 5-yr had missing values for what should have been values 01-09. This is now fixed.
- Coding for RACE in the 1970 Metro samples had a minor error that has now been fixed. This also resulted in a small number of cases changing for RACESING and QRACE.
- The metadata erroneously included an OCC2010 value of 1810 which affected the 2010-2011 PRCS 3-year and 2010-2013 PRCS 5-year samples. Values that had been assigned 1810 are now correctly assigned 1800.
Edited samples.
Improved allocation procedures.
Improved imputation of RELATE (IMPREL).
Expanded variables.
Variable improvements.
Fixed errors.
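The SLWT fix in this entry concerns implied decimals: in a fixed-width file a weight is stored as a string of digits, and the reader has to restore the decimal point. A minimal sketch in Python follows. The number of implied decimal places (2) is an assumption for illustration and should be checked against the codebook for the extract; the function name `apply_implied_decimals` is hypothetical.

```python
from decimal import Decimal


def apply_implied_decimals(raw, places=2):
    """Convert a fixed-width digit string to a decimal value.

    `places` is the number of implied decimals; 2 is an assumption here.
    """
    # scaleb shifts the decimal exponent without any binary rounding.
    return Decimal(raw.strip()).scaleb(-places)


# A raw field as it might appear in a fixed-width record (illustrative value).
weight = apply_implied_decimals("010000")
print(weight)
```

Using `Decimal` rather than float division keeps the restored value exact, which matters when weights are later summed over many records.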
July 25, 2018
- An error in the layout of source variables is now fixed.
- LANGUAGE is now available for ACS and PRCS 2016 5-year samples.
- An error in CBSFTYPE swapped values 1 and 2 in 1970. This is now fixed.
- SCHOOL had an error in the 1900 Full Count file. We are working on fixing this issue and have hidden the variable for the time being. Users are advised to use SCHLMNTH until the fix has been implemented.
Edited variables.
June 12, 2018
- HIURULE, HIUFPGBASE, HIUFPGINC, HIUID, and HIUNPERS are now available for the 2016 1-year ACS sample.
- HIURULE has been expanded to 2 digits in order to accommodate 2016 ACS questionnaire changes.
- CIHISPEED and similarly CIDATAPLN have been harmonized across the period 2013-2016.
- LANGUAGE is now available for ACS and PRCS 2016 1-year samples.
Edited variables.
April 4, 2018
- Revised versions of the 1900, 1910, 1920, 1930 and 1940 full count Census files are available.
- In order for researchers to identify individual records across version releases of the full count files, HISTID can now be used to consistently identify records within a full count file.
- OCC1950 and IND1950 are no longer allocated for the 1900, 1910, 1920 and 1930 complete count census files, as detailed in the editing and allocation procedures for pre-1940 and the 1940 complete count census file.
- Due to the preliminary nature of the complete count census files, although OCC1950/IND1950 string values are available for most observations on the restricted use Census files, many have yet to be coded and are currently designated 979.
- To limit confusion and possibilities of erroneous allocation, these codes will no longer be allocated until string values are coded in a later version.
- Any additional revisions are documented below by Census file:
- Universe statement applied to values for LIT
- CNTRY is now available.
- Additional edits have been applied to OCC1950 to better identify Farm Laborers, wage workers (820's)
- Removed some blank and erroneous observations.
- Missing values for RACE are now allocated.
- 999's for IND1950 are now designated as 0's (N/A or none reported)
- Removed duplicative entries that were present in the previous version.
- Made corrections to bad coding for AGEMONTH
- Universe statement applied to values for GQTYPE, which affects GQ, GQTYPE and GQFUNDS
- Edits to correct values for NUMPREC and HHTYPE
- WARD is now available. (Note: Ward is not available in the current 1910 and 1930 versions)
- CNTRY is now available.
- Universe statement applied to values for GQTYPE, which affects GQ, GQTYPE and GQFUNDS
- Edits to correct values for NUMPREC and HHTYPE
Historical Complete Count Census File Revisions.
1900 Full Count file.
1910 Full Count file.
1920 Full Count file.
1930 Full Count file.
1940 Full Count file.
February 8, 2018
- The 2012-2016 ACS/PRCS 5-Year files are now available. These files include all cases in the previously-released single-year files from the 2012, 2013, 2014, 2015 and 2016 ACS/PRCS. More information on the datasets can be found on the Sample Description page for the 2016 5-year ACS and the 2016 5-year PRCS. Note that changes in the 2016 1-year ACS files have resulted in changes for the following variables in the 2012-2016 ACS/PRCS 5-Year files: COMMUSE and TOILET.
Added samples.
January 19, 2018
- The 2016 1-year ACS and PRCS samples' POVERTY values have been updated to correct a processing error with family income adjustment values. This error did not affect any other variables.
Edited variable.
January 16, 2018
- The 1900 100% database is now available. More information on the 100% Population Databases can be found on the Sample Description page and on the IPUMS Complete Count Data page.
Added sample.
November 1, 2017
- The 2016 1-year American Community Survey and Puerto Rican Community Survey data are now available. Due to changes in the 2016 ACS questionnaire, LANGUAGE will be released at a later date.
- Computer and internet access variables have been added to reflect a change in the 2016 ACS questionnaire: CISMRTPHN, CITABLET, CIHISPEED and CIDATAPLN. The questionnaire text for CIHISPEED effectively includes the variables CIMODEM, CIDSL, CIFIBER. The variables CIHAND, COMMUSE and TOILET are no longer included in the ACS questionnaire.
Added samples.
Added variables.
August 22, 2017
- The variables CBSERIAL and CBPERNUM are now available for ACS/PRCS samples from 2005-onward and can be used to merge IPUMS USA records with the corresponding records from the harmonized Census Bureau PUMS files.
- CPI99 value changed from 0.690 to the correct value 0.703 for samples from 2015.
- The data quality flag QMARST now accurately identifies cases allocated by the Census Bureau. The 2013, 2014, and 2015 1-year ACS samples had previously identified logical edits to the "Married" category (identifying married individuals as having a spouse present or absent based on the SPLOC pointer variable) as allocations.
- As a part of the new family pointer methodology, household heads can now be in the same FAMUNIT as unrelated individuals (if they are likely partners or same-sex spouses, unidentified in the unharmonized data files). FTOTINC now includes incomes from those "unrelated" individuals.
Added variables.
Edited variables.
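The merge described in this entry can be sketched as a two-key join on household serial number and person number. This is a minimal Python illustration using plain dictionaries. It assumes the Census Bureau PUMS side identifies households and persons with columns named `SERIALNO` and `SPORDER`; those column names, the helper `merge_on_census_ids`, and the sample records are all assumptions for illustration and should be checked against the documentation of the files being merged.

```python
def merge_on_census_ids(ipums_rows, pums_rows):
    """Attach PUMS fields to IPUMS rows that share a household/person key."""
    # Index the PUMS records by (household serial, person number).
    # SERIALNO and SPORDER are assumed column names for the PUMS side.
    pums_index = {(r["SERIALNO"], r["SPORDER"]): r for r in pums_rows}
    merged = []
    for row in ipums_rows:
        match = pums_index.get((row["CBSERIAL"], row["CBPERNUM"]))
        if match is not None:
            # IPUMS values win if both sides happen to share a field name.
            merged.append({**match, **row})
    return merged


# Made-up records for illustration only.
ipums_rows = [
    {"CBSERIAL": 1001, "CBPERNUM": 1, "AGE": 34},
    {"CBSERIAL": 1001, "CBPERNUM": 2, "AGE": 36},
]
pums_rows = [
    {"SERIALNO": 1001, "SPORDER": 1, "EXTRA": "a"},
    {"SERIALNO": 1002, "SPORDER": 1, "EXTRA": "b"},
]
merged = merge_on_census_ids(ipums_rows, pums_rows)
print(merged)
```

Both key columns need the same type on each side (for example, both integers or both zero-padded strings), otherwise no records will match.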
June 14, 2017
- The 1910 100% database is now available. More information on the 100% Population Databases can be found on the Sample Description page and on the IPUMS Complete Count Data page.
Added sample.
May 25, 2017
- The IPUMS constructed family interrelationship variables, SPLOC, MOMLOC, POPLOC, SPRULE, MOMRULE, and POPRULE have been re-vamped (see details). These changes also affect the other IPUMS-CPS constructed family interrelationship variables, NCOUPLES, NFAMS, NFATHERS, NMOTHERS, NCHILD, NCHLT5, NSIBS, ELDCH, YNGCH, FAMSIZE, and FAMUNIT. Versions of these variables prior to this change are available on our website but not in the extract system. These archived versions can be found here.
- An error was corrected in STEPMOM and STEPPOP in the 1940 and 1950 1% samples. Some records with suspicious age differences between them and the identified step parent (value of 1) were being recoded to having explicitly identified a step parent (value of 3). Some records that had a value of 3 for these variables now have a value of 1.
- A programming improvement has altered some values for SURSIM for some records in the 1950 sample. No groups are changed, but SURSIM groups are now numbered sequentially in households.
- MIGTYPE5 was being incorrectly recoded based on the size of urban error in the 1970 Form 2 Neighborhood sample and 1970 Puerto Rico Neighborhood sample. Some records that had a value of 9 now have values of 1 or 2.
- A programming error resulted in incorrect values for HINSIHS in the 2010 ACS 3-year sample. Several records that had a value of 1 now have a value of 2. QHINSIHS reflects the fix made to HINSIHS; some records that had a value of 1 now have a value of 0.
- Errors were fixed in several health insurance variables in the 2011 ACS and PRCS 3-year samples.
- HINSCAID: Some records that had a value of 2 now have a value of 1.
- QHINSCAI was updated to reflect the fix in HINSCAID. Some records that had a value of 0 or 3 now have a value of 2.
- HINSCARE: Some records that had a value of 1 now have a value of 2.
- QHINSCAR has been updated to reflect the fix in HINSCARE. Some records that had values of 0 or 3 now have a value of 2.
- HINSTRI: Some records that had a value of 1 now have a value of 2.
- QHINSTRI has been updated to reflect the fix in HINSTRI. Some records that used to have a value of 2 now have a value of 0 or 4.
- An error in RELATE in the 1970 and 1980 Puerto Rico samples has been fixed. Records that used to have a value of 1284 now have values ranging from 1291 to 1294.
- Several ACS samples from 2008 to 2015 contain records whose RELATE was updated from other relative (1001) to sibling-in-law (801).
- An error in RELATE in ACS 5-year samples from 2010 and 2011 has been fixed. In these samples, records from 2006 and 2007 were identified as parents-in-law when they should have been siblings-in-law. The values for these records changed from 601 to 801.
- An error was corrected in the variable GQ in ACS and PRCS 3-year samples from 2010 and 2011. In these samples, in-laws were being counted as non-relatives. Several records which had values of 2 and 5 now have a value of 1. The ACS 3-year sample from 2012 had a similar error. In this sample in-laws and other relatives were being counted as non-relatives.
- GQ is used to modify GQTYPE. In the ACS 3-year samples in 2010-2012 and the PRCS 3-year sample from 2010, some records that had a value of 0 now have a value of 900.
- An error was corrected in the variable WRKLSTWK in PRCS 3-year samples from 2011 and 2012. Some records that had a value of 0 now have a value of 3.
- Errors were found and corrected in INCOTHER in the Puerto Rico samples from 1990 where some records were being coded as 0 which should have had non-zero values. In the 2011 3-year PRCS sample, some records were being incorrectly topcoded. These records now have non-topcoded values.
- An error was corrected in HHINCOME in the 2000 5% sample, the 2000 1% (old version) sample, the 2000 1% unweighted sample, the 2000 Puerto Rico 1% sample, and the 2000 Puerto Rico 5% sample. Some records have been recoded to NIU to address a universe restriction.
- An error was fixed in the variable NSUBFAM for samples from 1970 to 2013. Some records that had values of 2 now have values of 1.
- An error was corrected in the 1970 Puerto Rico samples in the variable COSTFUEL. Records that had a value of 0 are now coded as 9994 and 9996.
- An error was corrected in the 1970 Puerto Rico samples in the variable COSTWATR. Records that had a value of 0 are now coded as 9995.
- An error was corrected in the 1970 Puerto Rico samples in the variable COSTGAS. Records that had a value of 0 are now coded as 9994 and 9996. All other values are multiplied by 12.
- Two errors were corrected in the RACE variable. In the 1970 neighborhood samples, there is no way to differentiate different race codes for Alaska from the non-Alaska records. We are unable to differentiate Alaskan natives from Koreans and Native Hawaiians. As a result, values 371 and 372 are no longer available and these records now have values of 620 and 630. In the 2012 5-year ACS sample, programming was not being executed for records from years 2008-2010.
- The inability to differentiate between Alaska and non-Alaska records in the 1970 neighborhood samples affects RACAMIND, RACASIAN, RACPACIS, and PROBAI as well. In RACAMIND and PROBAI fewer records are identified as American Indian. More records are identified as Asian and Pacific Islanders in RACASIAN and RACPACIS, respectively.
- The inability to differentiate between Alaska and non-Alaska records in the 1970 neighborhood samples also affects RACESING. Records that used to have values of 33 or 34 now have values of 44 or 46 in these samples. RACESING in the 2012 5-year ACS sample is updated to match the fix to RACE in this sample.
- An error in programming was corrected in the LANGUAGE variable in the 2007 3-year ACS sample. Some records that had codes of 0 now have codes of 1.
- An error was corrected in the SCHLTYPE variable in the 1960 5% sample. All records with age over 34 now have a value of 0.
- An error was corrected in several occupation variables (OCC1950, OCC1990, OCCSCORE, and SEI) in Puerto Rico samples from 1970-1990. Armed forces are now separated from NIU records. These records used to have values of 999 and are now coded as 595.
- An improvement in data processing tools has increased the precision with which we are able to handle double type variables. This improvement has led to some records from multi-year samples having slightly different adjusted values for a number of variables. These differences are summarized below:
- MORTAMT1 in the 2009 ACS 5-year and PRCS 5-year samples
- PROPINSR in the 2009 ACS 5-year and PRCS 5-year samples
- RENTGRS in the 2009 ACS 5-year and PRCS 5-year samples
- CONDOFEE in the 2009 ACS 5-year and PRCS 5-year samples
- OWNCOST in the 2009 ACS 5-year and PRCS 5-year samples
- COSTFUEL in the 2009 ACS 5-year and PRCS 5-year samples
- COSTWATR in the 2009 ACS 5-year and PRCS 5-year samples
- HHINCOME in ACS and PRCS 3-year and 5-year samples from 2009-2011
- INCWAGE in ACS and PRCS 3-year and 5-year samples from 2009 and 2011.
- INCSS in the 2009 ACS 5-year and PRCS 5-year samples
- INCWELFR in the 2009 ACS 5-year and PRCS 5-year samples
- INCINVST in the 2009 ACS 5-year and PRCS 5-year samples
- INCEARN in ACS and PRCS 3-year and 5-year samples from 2009-2011
- INCOTHER in the 2009 ACS 5-year and PRCS 5-year samples
- INCBUS in ACS 5-year and PRCS 5-year in 2009 and ACS 3-year and 5-year samples from 2011
- INCRETIR in ACS 5-year and PRCS 5-year in 2009 and ACS 3-year and 5-year samples from 2011
- FTOTINC in ACS 3-year and 5-year and PRCS 3-year and 5-year samples from 2009-2011
- POVERTY in ACS 5-year in 2009, ACS 3-year and 5-year in 2010, the 2000 5% sample, and 1980 5%, 1%, Urban/Rural, Labor Market Area, and Detailed metro/non-metro samples
- An improvement in data processing tools and a more sophisticated random number generator has led to altered values for some records in the variables HHWT, PERWT, and SLWT in 1940-1960 samples.
- QMORTG2A has been updated to include the value 4 for allocated cases in ACS and PRCS samples.
- Due to a programming improvement, some records that had a value of 4 in QBPL in all 1980 samples are now correctly being coded to 1 "Allocated, original entry fell outside acceptable range (select variables only)".
- Due to a programming improvement, the value 3 (Logical computer edit by IPUMS) is now available in QRELATE and QRACE in 1970-2015 samples.
- USA samples from 1940 to the present (not including the 100% 1940 file) now include source variables. Source variables are unique to each sample and generally correspond to the variables in the source datasets published by the Census Bureau. The source variable codes and labels are not consistent across samples. Each integrated IPUMS variable description has a link to the source variables that served as the inputs for it. The unharmonized variables are also accessible in a comprehensive list using the menu buttons on the variables page. Thus researchers can get both the integrated and source forms of specific variables.
Edited variables.
Added variables.
March 22, 2017
- CITYPOP is now available for the 2015 ACS sample.
- The 2015 ACS and PRCS samples' POVERTY values are now being generated with updated family income adjustment values to more closely resemble those used by the Census Bureau.
Expanded variables.
Edited variables.
February 13, 2017
- The 2011-2015 ACS/PRCS 5-Year files are now available. These files include all cases in the previously-released single-year files from the 2011, 2012, 2013, 2014, and 2015 ACS/PRCS. More information on the datasets can be found on the Sample Description page for The 2015 5-year ACS and The 2015 5-year PRCS.
- The Multi-year files differ in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to the most recent interview year, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files.
- Many improvements were made to the IPUMS-USA Geographic Tools resources and documentation including:
- Merged distinct Alaska and Hawaii boundary files into one file with the contiguous U.S.
- In several boundary files, corrected the boundaries for Ketchikan Gateway Borough, Alaska, to represent land and water areas accurately
- Standardized boundary file names, fields, and codes.
- Updated shapefile metadata to use consistent style and content in ArcGIS format and added current IPUMS-USA citation.
A note about these revisions has also been added to the IPUMS-USA shapefile metadata.
Added samples.
Updated Geographic Tools.
December 2, 2016
- Household-level full count data is now available for the years 1790-1840. This data was made available to researchers through IPUMS from Ancestry.com. County-level summary data is also available for these years to help researchers identify missing areas and other issues that are present in the household-level microdata. More information can be found here.
Added samples.
November 29, 2016
- IPUMS-USA posted new Health Insurance Unit variables for the 2015 1-year American Community Survey. The variables HIUFPGBASE, HIUFPGINC, HIURULE, HIUID, and HIUNPERS are now available for the 2015 1-year ACS sample.
- A small error in INCINVST has been corrected. Missing values were coded as -99999 in the 2015 ACS/PRCS when they should have been coded as 999999.
Expanded variables.
Edited variable.
November 2, 2016
- The 2015 1-year American Community Survey and Puerto Rican Community Survey data are now available.
- GQ and GQTYPE were improved based on entries from the original census form for the 1920 and 1930 100% databases. These changes affect variables that use GQ as a predictor such as family interrelationship variables and AGE, SEX, and RELATE allocation.
- These data quality flags were corrected for ACS/PRCS 2013-2014: QCIBRDBND, QCIDIAL, QCIDSL, QCIFIBER, QCIHAND, QCILAPTOP, QCIMODEM, QCINETHH, QCIOTHCOMP, QCIOTHSVC, QCISAT, QRWATPR. A value of one had been assigned instead of a value of four.
Added samples.
Edited variables.
September 12, 2016
- The 1920 and 1930 100% databases are now available. More information on these 100% Population Databases can be found on the Sample Description page and on the IPUMS Complete Count Data page.
- A 5% sample of the 1930 Puerto Rico census is now available. This is a flat 1-in-20 sample drawn from the 1930 Puerto Rican census population.
- The variable VET1930 has been revised and expanded to more accurately reflect veteran status and additional data available in the 1930 100% database. VET1930 has now been expanded to 8 categories with the inclusion of new categories for Mexican Expedition and Civil War veterans. The category "Spanish-American and World War I" has now been generalized to include veterans of "World War I and other war".
Added samples.
Edited variables.
March 8, 2016
- The 1960 5% sample was drawn from a restored and improved 25% internal long form microdata file after an extensive collaboration between the U.S. Census Bureau and the Minnesota Population Center to recover, integrate, and disseminate data from the 1960 decennial census. For more information on the 1960 Restoration Project and improvements in the new 5% sample, see the 1960 Restoration Project page.
- The following geography variables are now available for the 2012-2014 ACS/PRCS samples: COUNTYFIPS, COUNTY, PRCOUNTY, METRO, MET2013, MET2013ERR, CITY, CITYERR, HOMELAND.
- Household serial numbers in 1850 100% were corrected to be unique across the full database.
- MIGCOUNTY was corrected for 1940 100%.
Added samples.
Expanded variables.
Edited variables.
February 17, 2016
- The Multi-year files differ in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to the most recent interview year, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files.
- The 2010-2014 ACS/PRCS 5-Year files are now available. These files include all cases in the previously-released single-year files from the 2010, 2011, 2012, 2013, and 2014 ACS/PRCS. More information on the datasets can be found on the Sample Description page for The 2014 5-year ACS and The 2014 5-year PRCS.
- The 2009-2013 ACS/PRCS 5-Year files are now available. These files include all cases in the previously-released single-year files from the 2009, 2010, 2011, 2012, and 2013 ACS/PRCS. More information on the datasets can be found on the Sample Description page for The 2013 5-year ACS and The 2013 5-year PRCS.
- The 2011-2013 ACS/PRCS 3-Year files are now available. These files include all cases in the previously-released single-year files from the 2011, 2012, and 2013 ACS/PRCS. More information on the datasets can be found on the Sample Description page for The 2013 3-year ACS and The 2013 3-year PRCS.
Added samples.
February 8, 2016
- The 1850 100% database is now available. More information on the 1850 100% Population Database can be found on the Sample Description page and on the IPUMS Complete Count Data page.
- The 1940 100% database was updated. The primary improvement to the 1940 100% database was the removal of a number of duplicate person records.
- MIGCOUNTY gives the county of residence 5 years prior to enumeration and is currently only available for the 1940 100% database.
- SPLIT, SPLITHID, and SPLITNUM were added as an integrated set of variables for identifying individuals from large dwellings that were split up during the processing stage of the complete count datasets (currently not available for 1880 100%, see SPLIT80, SERIAL80, and PERNUM80 for the 1880 100% counterparts).
Added samples.
Edited samples.
Added variables.
Edited variables.
December 17, 2015
- IPUMS-USA posted new Health Insurance Unit variables for the 2014 1-year American Community Survey. The variables HIUFPGBASE, HIUFPGINC, HIURULE, HIUID, and HIUNPERS are now available for the 2014 1-year ACS sample.
- RACESING and the related PROBAI, PROBAPI, PROBBLK, PROBOTH, and PROBWHT variables have been updated for USA samples from 2000 to 2012 to better integrate detailed race categories introduced after the initial creation of RACESING. These variables are now available for the 2013 and 2014 ACS samples as well.
- PUMA 77777 in Louisiana (STATEFIP 22) is now correctly associated with the "New Orleans-Metairie, LA" Metropolitan Area (code 35380) in MET2013 for samples from 2006 to 2011.
Expanded variables.
Edited variables.
November 17, 2015
- The 2014 1-year American Community Survey and Puerto Rican Community Survey data are now available. The 2014 ACS/PRCS includes a correction to the MIGPUMA1/PWPUMA00 variables. The 2012 and 2013 PUMS files released by the Census Bureau incorrectly assigned duplicate MIGPUMA1/PWPUMA00 codes to northwestern Wisconsin (Ashland, Bayfield, Burnett, Douglas, Iron, Price, Rusk, Sawyer, Taylor, and Washburn Counties) and Dane County [code 00100], as well as Gwinnett County and Richmond County, Georgia [code 04000]. These areas now have unique codes in the 2014 ACS PUMS file with Gwinnett County, Georgia taking the new code 04007 and Dane County, Wisconsin taking the new code 00104.
- The 2010 10% U.S./Puerto Rico samples are now available. These samples are considerably different from previous Decennial Census samples. As the ACS/PRCS has officially replaced the Decennial long-form survey, the 2010 Decennial Census only included the short-form survey. As a result, the 10% samples from 2010 include far fewer variables. More information can be found in the Sample Description for 2010.
- CPUMA0010 is a new, consistent PUMA variable, which harmonizes the 2000 and 2010 PUMA definitions to create consistent geographic boundaries. More information can be found on the CPUMA0010 geography page.
- ACS multi-year samples' POVERTY values are now generated with updated family income adjustment values to more closely resemble those used by the Census Bureau.
- Corrected MIGRATE1 programming error to accurately differentiate people who moved within MIGPUMAs from those who moved between MIGPUMAs.
- Non-earners in 2010, 2011, and 2012 multi-year files were being given values of -9999 for the variable INCEARN. These respondents are now being correctly given INCEARN values of 0.
- RENT in the 1940 100% dataset fixed to correctly identify individuals with "No Cash Rent".
- MET2013 and MET2013ERR have been extended to the 2005-2011 ACS/PRCS samples as well as the 2000 U.S./Puerto Rico 5% and U.S. unweighted 1% samples.
Added samples.
Added variables.
Edited variables.
Expanded variables.
July 1, 2015
- CITYPOP is now available for the 2012 and 2013 ACS samples.
- VET75X90 is now available for all samples from 1980 onward.
- CITY and CITYPOP in the 2000 Decennial Census and 2005 through 2011 ACS samples now identify Garland, TX (2450).
- MIGRATE5 now codes respondents who lived in American Samoa within the last five years as migrating from "Abroad" (MIGRATE5==40) in the 1990 Puerto Rican Census samples.
- A small number of unmarried householders receive SPLOC values in the 1880 10%, 1880 100%, 1900 5%, 1920 1%, 1920 Puerto Rico, 1930 1%, and 1930 5% samples due to conflicting RELATE information from other household members. Previously these households were given an HHTYPE code of 0 (N/A), and they are now given a code of 9 (HHTYPE could not be determined).
- Eighteen cases in the 1880 1% sample were corrected to resolve conflicts between Birth Quarter (BIRTHQTR) and Birth Month (BIRTHMO). This also resulted in updated Birth Year (BIRTHYR) values for 2 of these cases.
- A small number of cases were mistakenly being coded as Not In Universe for the variable CHBORN (Children Ever Born) in the 1900 5% sample. These cases have been updated resulting in adjusted CHSURV (Children Surviving) values as well.
- NUMHH now correctly topcodes households of more than 98 people in the 1930 1% and 5% samples.
- The 1930 1% and 5% samples previously included LANGUAGE values of 9900 and 9700. These values are now correctly recoded to 0 and 9600 respectively.
- Due to enumeration error, a small number of Black individuals in the 1880 100% dataset were incorrectly coded as Chinese. These cases have now been corrected.
- Twenty-eight cases in the 1900 5% sample previously had erroneous GQTYPE values. These cases have now been corrected.
- For 2005-onward CITYPOP is now based on the ACS estimates generated from the contemporary year.
- ELDCH and YNGCH now include topcoded values in 1960 and 1970 samples.
- QFBPL and QRELATE in the 1940 100% dataset were improved to identify specific types of editing of the FBPL and RELATE variables, respectively.
- QWKSWORK, the data quality flag for WKSWORK1 and WKSWORK2, now accurately indicates allocated values in the 2013 ACS and PRCS samples.
- VALUEH now uses the correct NA value of 9999999 in the 1940 1% sample (previously 09999999).
- SCHOOL now has the corrected universe restriction in the 1960 1% sample.
- SAMESEA5 now has the correct universe restriction in the 1940 1% sample.
- QGRADEAT, the data quality flag for GRADEATT, now accurately identifies allocated values in the 1970 samples.
- A small number of occupations in the 2013 ACS sample were previously missing values for EDSCOR90 and ERSCOR90. This issue has been corrected and all occupations now have appropriate values.
- Errors in the COUNTY variable were corrected for the 2012 and 2013 ACS samples.
- CPI-U adjustment values were corrected in multi-year ACS samples. The affected variables are:
- CONDOFEE, COSTELEC, COSTFUEL, COSTGAS, COSTWATR, FDSTPAMT, HHINCOME, MOBLHOME, MORTAMT1, MORTAMT2, OWNCOST, PROPINSR, RENT, RENTGRS, FTOTINC, INCBUS00, INCEARN, INCINVST, INCOTHER, INCRETIR, INCSS, INCSUPP, INCTOT, INCWAGE, INCWELFR, and POVERTY.
Corrected values do not differ from the previously released data by more than 1 or 2 dollars.
- CHBORN is used to refine the linking of children to mothers (MOMLOC). However, in 1960 CHBORN was not collected for never-married women, resulting in far fewer never-married mother links. The MOMLOC logic has been adjusted to account for this. The adjustment only affected 818 households.
Expanded variables.
Edited variables.
December 17, 2014
- The preliminary version of the 1940 100% Database is available. More information on the preliminary 1940 100% Population Database can be found on the Sample Description page and on the IPUMS Complete Count Data page.
- The variables HIUFPGBASE, HIUFPGINC, HIURULE, HIUID, and HIUNPERS are now available for the 2013 1-year ACS sample.
- The IPUMS-USA family pointer variables have been improved to better accommodate same-sex married couple households introduced in the 2013 ACS/PRCS sample. Details about the interaction between the IPUMS family pointers and the same-sex married couple households can be found here.
Added samples.
Expanded variables.
Edited variables.
November 5, 2014
- The 2013 1-year American Community Survey and Puerto Rican Community Survey data is now available. The 2013 ACS/PRCS includes new variables and notable changes from previous ACS microdata samples, which are detailed below.
- The 2013 ACS/PRCS included 11 new variables addressing household computer/internet access: CILAPTOP, CIHAND, CIOTHCOMP, CINETHH, CIBRDBND, CIDIAL, CIDSL, CIFIBER, CIMODEM, CISAT, and CIOTHSVC.
- Starting with the 2013 PUMS release, same-sex married couples are no longer recoded to unmarried partners. The new household-level variable SSMC identifies same-sex married couple households, including respondents logically allocated as same-sex married couples. (The 2012 PUMS included a data quality flag identifying same-sex married couples that had been recoded; see QRELATE = 9 "Same sex spouse changed to unmarried partner".)
- The 2013 PRCS no longer includes the variable HOTWATER. The new variable RWATPR indicates whether or not households in Puerto Rico have running water.
- The 2013 ACS introduced the new internet response mode (see RESPMODE). Additionally, the Census Bureau altered their "Failed Edit Follow Up" procedures, as addressed in the ACS User Notes section of the Census Bureau's website.
- Veteran Status questions were changed slightly. The detailed VETSTAT codes do not distinguish between veterans who were on active duty in the past year or prior to the past year. Also the questions associated with the variables VET75X80 and VET80X90 have been aggregated into the new variable VET75X90.
- Starting with the 2013 ACS/PRCS, IND and INDNAICS are now based on the 2012 NAICS coding scheme.
- The ACS/PRCS question addressing whether or not a household received Food Stamps (see FOODSTMP) was altered to include the new program name, the Supplemental Nutrition Assistance Program (SNAP).
Added samples.
Added variables.
Adjusted variables.
August 13, 2014
- Geography variables were updated for the 2012 1-year ACS sample using the new 2010 Decennial Census Based PUMAs. The updated variables include: METRO, CITY, CITYERR, CITYPOP, MET2013, MET2013ERR, COUNTY, HOMELAND, PWPUMA00, MIGPUMA1.
- COUNTY and HOMELAND had been released previously but have since been improved to include more identifiable counties.
- The variable METAREA, which identifies metro areas based on the Office of Management and Budget (OMB) definitions in use at the time of the Decennial Census and the 1999 OMB definitions for the ACS/PRCS samples, was not created for the ACS 2012 sample. Instead, the new variable MET2013 was created, which identifies metro areas based on the OMB 2013 definitions. Furthermore, the new MET2013 variable allows for up to 15% combined omission and commission error, whereas METAREA identifies metro areas only for residents of areas that lie entirely within a single metro area. For more information on how METAREA and MET2013 differ, see the MET2013 Comparability Statement.
- Because the new MET2013 variable allows for a certain level of error, the variable MET2013ERR was created to allow users to impose a more restrictive error limit if desired.
- The CITY variable was updated for 1990 onward U.S. samples and the 2000 onward Puerto Rico samples with less strict identification protocols and a CITYERR variable was generated to identify the level of mismatch error for each CITY code resulting from this new protocol.
- The new 100% 1880 Census file contains the new variable OCCHISCO, which provides occupation using the Historical International Standard Classification of Occupations (HISCO) coding scheme.
- Several minor edits were made to individual cases to correct entry errors in the 1880 100% file. Most notably, most of the cases formerly coded as "Adopted, n.s." (RELATE code 0304) have been changed to "Adopted Child". This change also affects the variables STEPMOM and STEPPOP, as some children coded as "Adopted, n.s." were considered step-children when adoption status could not be determined.
- The variables PWPUMA00 and MIGPUMA1 had been released previously with truncated values. The truncation has been reversed and these variables now accurately represent the data.
- Some codes in the variable RACE were changed to more appropriately group some American Indian/Alaskan Native categories. The affected categories include: South American Indian, Mexican American Indian, Other Specified American Indian Tribe (2000, ACS), Two or more American Indian Tribes (2000, ACS), Alaskan Athabaskan, Aleut, Eskimo, Alaskan Mixed, Inupiat, Yup'ik, Other AN tribe(s) (2000, ACS), Both AI and AN (2000, ACS), and AIAN, Tribe not specified.
Expanded variables.
Added variables.
Edited variables.
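The MET2013ERR bullet above describes imposing a stricter error limit than the 15% that MET2013 allows by default. A minimal Python sketch of that filtering step follows. It assumes records are held as dictionaries and that the user has already derived a percentage from MET2013ERR into a hypothetical field named `met2013_error_pct`; the actual coding of MET2013ERR should be checked in its variable description.

```python
def restrict_by_mismatch_error(records, max_error_pct):
    """Keep records whose metro area mismatch error is at or below a limit.

    Assumes each record carries a hypothetical `met2013_error_pct` field
    holding the mismatch error as a percentage, derived by the user from
    MET2013ERR. This is an illustration, not an IPUMS procedure.
    """
    return [r for r in records if r["met2013_error_pct"] <= max_error_pct]


# Hypothetical records for illustration only.
sample = [
    {"MET2013": 35380, "met2013_error_pct": 2.0},
    {"MET2013": 35380, "met2013_error_pct": 12.5},
]
strict = restrict_by_mismatch_error(sample, max_error_pct=5.0)
```

Lowering `max_error_pct` trades sample size for tighter agreement between the identified PUMAs and the metro area boundary.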
April 8, 2014
- The 2008-2012 ACS/PRCS 5-Year files are now available. These files include all cases in the previously-released single-year files from the 2008, 2009, 2010, 2011, and 2012 ACS/PRCS. The 5-year file differs in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2012 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files. Information specific to the new 2008-2012 release follows:
- There are many new variables available in the 2008-2012 ACS/PRCS that were not included in the 2007-2011 ACS/PRCS including: SHOWER, FRIDGE, HOTWATER, SINK, STOVE, TOILET, DIFFCARE, DIFFHEAR, DIFFSENS, DIFFEYE, DIFFMOB, DIFFPHYS, VETDISAB, DIFFREM, HINSEMP, HINSEMP2, HINSPUR, HINSPUR2, HINSCARE, HINSCARE2, HINSCAID, HINSCAID2, HINSTRI, HINSTRI2, HINSVA, HINSVA2, HINSIHS, HINSIHS2, HCOVANY, HCOVANY2, HCOVPRIV, HCOVPRIV2, HCOVPUB, HCOVPUB2, DIVINYR, MARRINYR, MARRNO, and WIDINYR.
- COUNTY, HOMELAND, HIUFPGBASE, HIUFPGINC, HIURULE, HIUID, and HIUNPERS, are now available in the extract system for the 2012 1-year ACS.
- PRCOUNTY is now available for the 2012 1-year PRCS.
- We are continuing to work on other improvements to the geography of the 2012 ACS data.
- Some codes in the variable BPL were changed to more appropriately group some Pacific Island nations. The affected categories include: Christmas Island, Cocos Island, Fiji, Melanesia n.s., Norfolk Islands, Niue, Pitcairn Island, Tokelau, and Tuvalu.
- Some codes in the variable RACE were changed to more appropriately group some American Indian/Alaskan Native categories. The affected categories include: South American Indian, Mexican American Indian, Other Specified American Indian Tribe (2000, ACS), Two or more American Indian Tribes (2000, ACS), Alaskan Athabaskan, Aleut, Eskimo, Alaskan Mixed, Inupiat, Yup'ik, Other AN tribe(s) (2000, ACS), Both AI and AN (2000, ACS), and AIAN, Tribe not specified.
- Corrected adjustment values for the 2011 3-year and 5-year ACS samples. This correction affects variables that represent dollar values such as INCWAGE.
- Corrected values of QHINSCAR, QHINSTRI, and QHINSCAI which were being incorrectly coded to zero.
- For samples using the 2000 Census based PUMAs, corrected mismapping of North Carolina PUMA 4300 to Super PUMA.
- Corrected miscodings for the variables MIGRATE1 and MIGRATE5 to accurately reflect moves between contiguous and non-contiguous states.
- For samples using the 2000 Census based PUMAs, identified Shelby COUNTY (1170) in Alabama.
- The variable SEI was corrected to accurately account for persons in the Armed Forces in every sample.
- Corrections were made to the CHBORN/CHSURV allocation process for the 1900 and 1910 1% sample and oversamples, such that most CHSURV values are now less than or equal to CHBORN.
- For the samples from 1880-1930, RELATE codes for some boarders and lodgers were being coded as "Relative of Employee" in cases where there was no employee in the household. These codes have been corrected. There were also several cases where an original RELATE code was missing and allocated, but the following individual RELATE codes within the household were incorrectly edited to Boarder/Lodger. This has been fixed to reflect the original input values for RELATE. This change to RELATE also affects variables that were generated based on family interrelationship.
Added samples.
Expanded variables.
Edited variables.
February 17, 2014
- Posted new 2010-2012 ACS/PRCS 3-Year data. These files include all cases in the previously-released single-year files from the 2010, 2011, and 2012 ACS/PRCS. The new 3-year file differs in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2012 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files. The 2010-2012 ACS/PRCS are generally similar to the 2009-2011 ACS/PRCS data, with several noteworthy differences:
- For the 2012 data, the Census Bureau changed the PUMA boundaries based on the 2010 Census data. As noted below for the 2012 ACS/PRCS, most of the 2010-based PUMAs used in the 2012 data cannot be mapped directly to the 2000-based PUMAs used in prior ACS releases. As a result of these changes the following IPUMS variables will be released at a later date: APPAL, CITY, CONSPUMA, COUNTY, HOMELAND, METAREA, METRO, PRCOUNTY, PUMASUPR, MIGMET1, MIGCITY1, MIGPUMS1, MIGTYPE1, PWCITY, PWMETRO, PWPUMAS, PWTYPE, and CITYPOP. The variables PUMA, MIGPUMA1, and PWPUMA00 are available. These variables contain the 2000 PUMA codes for years 2010 and 2011, and the 2010 PUMA codes for 2012. In addition to these differences, please see the revision note on December 27th for more information about coding changes to several variables that will affect the 2012 cases including PLUMBING, PHONE, FERTYR, RACE, TRIBE, OCC, ANCESTR1, ANCESTR2, BPL, MIGPLAC1, and LANGUAGE.
Added samples.
December 27, 2013
- Posted new 2012 1-year American Community Survey and Puerto Rican Community Survey data. Together, the 2012 samples contain over three million person records. The 2012 ACS is the seventh ACS sample to provide information on group quarters, which can be identified in the GQ variable. Researchers analyzing multiple ACS samples over time should remove group quarters cases, since they are available only in the 2006-2012 data. The 2011 and 2012 ACS releases are similar, but there are a couple of notable differences:
- The Census Bureau changed the PUMA boundaries based on the 2010 Census data. Most of the 2010-based PUMAs cannot be mapped directly to the 2000-based PUMAs used in prior ACS releases. As a result of these changes the 2012 data for the following IPUMS variables will be released at a later date: APPAL, CITY, CONSPUMA, COUNTY, HOMELAND, METAREA, METRO, PRCOUNTY, PUMASUPR, MIGMET1, MIGCITY1, MIGPUMS1, MIGTYPE1, PWCITY, PWMETRO, PWPUMAS, PWTYPE, and CITYPOP. The variables PUMA, MIGPUMA1, and PWPUMA00 are available and represent the 2010 PUMA codes.
- Due to data collection problems, several variables contain incomplete data. PLUMBING is not available on the 2012 PRCS. Data on PHONE is not available for 6 PUMAs in Georgia; the suppressed data is coded 8. FERTYR also contains a large number of suppressed cases.
- The Census Bureau continued the practice of recoding married same-sex couples from married to unmarried partners. They now provide the information needed to identify the recoded couples, which can be found in QRELATE (9 "Same sex spouse changed to unmarried partner").
- Several changes to the RACE and TRIBE variables should be noted. The Census Bureau added codes to RAC2P for Hopi alone, Mexican American Indian alone, Yup'ik alone, and South American Indian alone while collapsing codes for Colville, Delaware, Houma, Menominee, Paiute, Yakama, and Yuman into "Other specified American Indian tribes alone." Several other RACE codes were added, including Bhutanese, Burmese, Mongolian, Nepalese, Marshallese, and Fijian. The code for Chinese was split into two codes, one for Chinese and one for Taiwanese.
- Slight changes were made to the OCC codes, mainly the combining of several lesser-used occupation codes.
Added samples.
May 7, 2013
- New, final versions of 1930 sample data are available. These include 5% and 1% samples. The 1% sample was drawn from the 5%, but there are minor differences in allocated values. Modifications to the previous version of the 1930 sample data focused on the following areas:
- Detailed Geography. Much of this work related to reassessing breaks between enumeration districts (ENUMDIST). This also resulted in corrected values for minor civil division and incorporated municipality information (MCDSTR and INCSTR).
- Occupation and Industry Codes (OCC, OCC1930, OCC1950, IND, IND1930, and IND1950). Most of this work involved assigning codes to previously unclassified records. Consistency checks were also applied that resulted in the correction of some misclassified records. Changes to the occupation codes also resulted in modifications to variables that rely on the occupation codes as input (e.g., occupational standing variables such as OCCSCORE).
Edited samples.
April 30, 2013
- A new version of the 1940 1% sample is now available. The new 1940 release includes corrections as well as new data. Corrections were made to 374 person records that had been identified as living in Missouri that actually lived in Detroit, Michigan. Necessary changes were made to the relevant geographic and migration variables.
- New geographical variables were added to the 1940 1% data that are no longer restricted by confidentiality requirements: COUNTY, METDIST, CITYMETD, URBAN and URBPOP data are now available. County, city, minor civil division, ward, tract and enumeration district information has also been added as two new sets of string variables, one that contains "clean", standardized strings (STDCNTY, STDCITY, STDMCD, STDWARD, STDTRACT, STDED) and one that records the strings exactly as they were entered (CNTYSTR, MCDSTR, WARDSTR, INCSTR). Also for the household record, the string variable GQSTR has been added, which contains the original group quarters response as it was entered.
- New string variables have also been entered for all but 137,588 person-level records in the 1940 1% data. The records with the missing data can be identified using the SUBS4050 variable and selecting subsamples 2 and 20. The remaining data for those two subsamples will be added in the future. The new person-level string variables are: occupation and industry (OCCSTR and INDSTR), usual occupation and industry (UOCCSTR and UINDSTR), where the respondent was living in 1935 (MST5STR, MCNY5STR and MCIT5STR), and five other demographic variables (RELSTR, BPLSTR, FBPLSTR, MBPLSTR, MTONGSTR).
- The variable EDUC has been updated to reflect corrections by the Census to ACS 2001 and 2002 single-year files. The educational attainment question changed on the 1999 ACS questionnaire, which modified the response categories and eliminated the choice of "Vocational, technical, or business school degree." Previously the 2001 and 2002 single-year IPUMS data dictionary incorrectly showed labels for categories 65, 71 and 82 as "1 or more years of college credit, no degree," "2 years of college: Associate's degree - occupational program," and "2 years of college: Associate's degree - academic program," respectively. The correct data dictionary labels for categories 65, 71, and 82 are "Some college, but less than 1 year," "1 or more years of college credit, no degree," and "2 years of college: Associate's degree, type not specified," respectively.
- For the 2007-2011 American Community Survey 5-year file the variable IND had incorrect values for the cases from 2011 due to a programming error. This error has been fixed.
Edited samples.
Edited variables.
February 4, 2013
- Posted new 2007-2011 ACS/PRCS 5-Year files. These files include all cases in the previously-released single-year files from the 2007, 2008, 2009, 2010, and 2011 ACS/PRCS. The new 5-year file differs in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2011 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files.
Added samples.
January 23, 2013
Edited variables.
December 13, 2012
- Posted new 2009-2011 ACS/PRCS 3-Year files. These files include all cases in the previously-released single-year files from the 2009, 2010, and 2011 ACS/PRCS. The new 3-year file differs in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2011 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files. The 2009-2011 ACS/PRCS are quite similar to the 2008-2010 ACS/PRCS data, except that the variables WRKLSTWK, DEGFIELD, and DEGFIELD2 are now included in the 2009-2011 ACS/PRCS files.
- In addition, the supplementary health insurance variables have been added to the 2011 ACS 1-year file. These five new variables are: HIURULE, HIUFPGBASE, HIUFPGINC, HIUID, and HIUNPERS. These summary health insurance variables were constructed by SHADAC. For more detailed information, consult the variable descriptions.
Added samples.
Expanded variables.
October 30, 2012
- Posted new 2011 American Community Survey and Puerto Rican Community Survey data. Together, the 2011 samples contain over three million person records. The 2011 ACS is the sixth ACS sample to provide information on group quarters, which can be identified in the GQ variable. Researchers analyzing multiple ACS samples over time should remove group quarters cases, since they are available only in the 2006-2011 data. The 2010 and 2011 ACS releases are remarkably similar, but there are a couple of notable differences:
- Data quality flags are now available for the following variables: FTOTINC, RENTGRS, HHINCOME, OWNCOST, INCEARN, INCTOT.
- In order to address concerns about fluctuations in the group quarters populations of small areas, in 2011 the Census Bureau supplemented the group quarters population in the ACS with a large-scale whole person imputation. Roughly as many group quarters persons are imputed as interviewed. See the ACS Group Quarters Small Area Estimation user note for more details on this change. Although this should have little impact on weighted estimates of the group quarters population, users should note that unweighted frequencies of the group quarters population are larger which increases the unweighted counts of people in the "NA" category of most household variables.
Added samples.
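The advice above to remove group quarters cases when comparing ACS samples across years can be sketched in Python as a simple filter. This is a minimal illustration, not an IPUMS tool: it assumes records are dictionaries with a `GQ` key, and it assumes that GQ codes 3 and 4 mark group quarters residents, which should be confirmed against the GQ variable description.

```python
# Assumption: GQ codes 3 and 4 mark group quarters residents. Confirm
# against the GQ variable description before relying on this set.
GROUP_QUARTERS_CODES = {3, 4}


def drop_group_quarters(records):
    """Return only the records whose GQ code is not a group quarters code."""
    return [r for r in records if r["GQ"] not in GROUP_QUARTERS_CODES]


# Hypothetical records for illustration only.
people = [{"GQ": 1}, {"GQ": 3}, {"GQ": 4}, {"GQ": 5}]
households_only = drop_group_quarters(people)
```

Applying the same filter to every year keeps the universe consistent between samples that include group quarters and earlier samples that do not.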
September 7, 2012
- QMARINYR and QYRNATUR were not updated to include data from ACS/PRCS 2008 to 2010. These flags are now available.
- YRSUSA2 was wrong for a substantial number of cases in the 2006-2010 ACS file due to a programming error. This error has now been fixed.
- DEGFIELD and DEGFIELD2 were updated to include new codes for 2010 ACS/PRCS. Users interested in comparing DEGFIELD or DEGFIELD2 over time should know that there may be different codes for the same field of degree across samples. For example, Neuroscience changed from code 4003 in 2009 to 3611 in 2010. In DEGFIELD and DEGFIELD2, IPUMS preserves each sample's full range of codes.
Expanded variables.
Edited variables.
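Because the same field of degree can carry different DEGFIELD codes in different samples, comparisons over time need a lookup keyed on both year and code. The Python sketch below shows one way to structure it. Only the Neuroscience codes come from the note above; the dictionary name and function are hypothetical, and a usable crosswalk would need to be filled in from the DEGFIELD documentation.

```python
# Maps (survey year, DEGFIELD code) to a field label.
# Only the Neuroscience entries below come from the revision note; every
# other code would have to be added from the DEGFIELD documentation.
FIELD_BY_YEAR_AND_CODE = {
    (2009, 4003): "Neuroscience",
    (2010, 3611): "Neuroscience",
}


def field_label(year, degfield):
    """Return the field label for a year-specific code, or None if unmapped."""
    return FIELD_BY_YEAR_AND_CODE.get((year, degfield))
```

Keying on the pair rather than on the code alone avoids treating a code that was reused or retired in a later year as the same field.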
July 9, 2012
- Released supplementary health insurance variables for the 2008-2010 ACS 1-year files. These five new variables are: HIURULE, HIUFPGBASE, HIUFPGINC, HIUID, and HIUNPERS. These summary health insurance variables were constructed by SHADAC. For more detailed information, consult the variable descriptions.
Added variables.
April 23, 2012
- Some of the codes for the variable PUMARES2MIG displayed incorrect values due to a programming error. This error has now been fixed. The PUMARES2MIG codes were previously incorrect for the following states and PUMAS: Arkansas - 1000; California - 2601, 2602, 6701, 6702, 8101-8116, 8200; Kansas - 1401-1403, 1500; New Jersey - 701-703; Oklahoma 1100, 1200; Washington 2001-2009.
Edited variables.
March 27, 2012
- Released a preliminary version of the 1930 5% sample. Coding of string variables is still ongoing, with much of this work focused on the occupation and industry variables. We expect to release the final version in July.
- Also, FARMSCHD in the 1930 1% sample had been coded incorrectly. The error has been corrected.
Added samples.
Edited variables.
March 21, 2012
- American Community Survey and Puerto Rican Community Survey samples from 2006-2010 have been updated to include minor revisions to the POVERTY variable. For individuals with a group quarters (GQ) code of 4, about 4.5% of individuals were incorrectly omitted from the universe. This error has been fixed.
Edited variables.
March 13, 2012
- IPUMS USA samples from 1960 to the present have been updated to include CLUSTER and STRATA variables. For the 1960-2000 samples, strata were created based on the stratification criteria used to select Public Use Microdata Samples such as household size, age, race, ethnicity, home ownership, group quarters membership, and vacancy status. For the American Community Survey (ACS) samples, strata were created based on the lowest level of geography available in each sample. For the 2000-2004 samples, each state forms a stratum. In the 2005 onward ACS samples, strata were defined as unique Public Use Microdata Areas (PUMAs). For more information on the creation of STRATA, see this page: Construction of Strata in the IPUMS Samples.
Expanded variables.
Edited variables.
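The ACS part of the stratum rule above (state in 2000-2004, PUMA from 2005 onward) can be illustrated with a short Python sketch. The function name and tuple-style key are hypothetical and do not reproduce the actual STRATA codes; the sketch also assumes PUMA codes repeat across states, so the state code is kept in the key.

```python
def acs_stratum_key(year, statefip, puma):
    """Return an illustrative stratum identifier for an ACS record.

    2000-2004: each state forms a stratum.
    2005 onward: each unique PUMA forms a stratum; PUMA codes are assumed
    to repeat across states, so the state code is part of the key.
    This is not the published STRATA coding.
    """
    if 2000 <= year <= 2004:
        return (statefip,)
    if year >= 2005:
        return (statefip, puma)
    raise ValueError("rule shown here covers ACS samples from 2000 onward only")
```

For example, two records from the same state but different PUMAs share a key in 2003 and receive different keys in 2010.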
February 20, 2012
- For ACS and PRCS samples from 2000-2010, several allocation flag variables displayed incorrect values. The variables QINCOTHE and QEDUC were inaccurate for all of the ACS/PRCS samples. In addition, QHISPAN was incorrect for 2000-2004, QCOSTWAT had inaccurate codes for 2006-2010, and QDIFCARE was incorrect for the 2008-2010 samples.
- For all of the 1980-2010 samples, the COUNTY codes 90 and above (except Baltimore City) displayed inaccurate codes for Maryland state.
- For the 2008-2010 ACS and PRCS samples, parents were incorrectly coded as parents in law in the RELATE variable.
Edited variables.
January 23, 2012
- Added new 2006-2010 ACS/PRCS 5-Year files. These files include all cases in the previously-released single-year files from the 2006, 2007, 2008, 2009, and 2010 ACS/PRCS. The new 5-year file differs in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2010 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files. Information specific to the new 2006-2010 release follows:
- Similar to the 2008-2010 ACS/PRCS, the data from the Census Bureau contained two different sets of occupation codes for the variables OCC and OCCSOC. The 2006-2009 cases contain the 2005-2009 ACS occupation codes, whereas the 2010 cases contain the 2010 ACS occupation codes (a crosswalk of these changes is available at our Occupation and Industry Variables page). We provide a harmonized version in OCC1990. The original values can be found in the OCC and OCCSOC variables, although users should note that those variables contain codes that differ by the survey year, as described above. The 2006-2010 data also span minor changes made by the Census Bureau in 2008 to the classification of industries. The new classification system results in the addition of one industry code (6672), modification of one industry code (6670), and the deletion of two industry codes (6675, 6692) to the variables IND and INDNAICS.
- Imputed relationship variables (IMPREL, IMPMOM, IMPPOP, IMPSP) were previously unavailable for the 1860 and 1870 samples with oversamples. They are now available.
Added samples.
Expanded variables.
December 21, 2011
- Added new 2008-2010 ACS/PRCS 3-Year files. These files include all cases in the previously-released single-year files from the 2008, 2009, and 2010 ACS/PRCS. The new 3-year file differs in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2010 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files.
- One notable difference compared to the 2007-2009 3-Year file is that health insurance and disability variables are now included. Also, we have enhanced the data from the Census Bureau in a couple of important ways. We have included health insurance edits to the 2008 and 2009 cases and we provide integrated occupation codes. The data from the Census Bureau contained two different sets of occupation codes for the variables OCC and OCCSOC. The 2008-2009 cases contain the 2005-2009 ACS occupation codes, whereas the 2010 case contain the 2010 ACS occupation codes (a crosswalk of these changes is available here Occupation and Industry Variables). We provide a harmonized version in OCC1990. The original values can be found in the OCC and OCCSOC variables, although users should note that those variables contain codes that differ by the survey year, as described above.
- For the 2010 samples, a small number of values were reassigned to the variables OCC1990 and OCC1950 because of new information. This coding change affects less than one percent of cases and has a relatively minor impact on the occupational standing measures. Any extract including these measures made between November 2 and December 20 should be requested again.
- For the US samples from 2000 to 2010, the values of QINCWAGE and QINCSS were reversed. The programming error has been fixed and the data now displays the correct values.
Added samples.
Edited variables.
November 17, 2011
- For the 1940-2010 samples, incorrect values were given to the variable CPI99, which provides the CPI-U multiples to convert dollar figures to constant 1999 dollars. The programming error has been fixed, and all samples now display the correct CPI-U multiplier value. Any extracts including CPI99 made between November 2 and November 17 should be requested again.
Edited variables.
November 2, 2011
- Posted new 2010 American Community Survey and Puerto Rican Community Survey data. Together, the 2010 samples contain over three million person records. The 2010 ACS is the fifth ACS sample to provide information on group quarters, which can be identified in the GQ variable. Researchers analyzing multiple ACS samples over time should remove group quarters cases, since they are available only in the 2006-2010 data.
- The lowest level of geographic identifier in the 2010 ACS is the PUMA; 2010 PUMAs have the same boundaries as those in the 2005-2009 ACS and the 2000 census samples. The IPUMS version of the 2010 ACS provides the following additional geographic identifiers: CITY, COUNTY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. Additionally, information on unrelated subfamilies--a category not measured by the Census Bureau--is available. These variables were constructed at the University of Minnesota and are not available via the Census Bureau.
- The 2009 and 2010 ACS releases are quite similar, but there are some differences:
- New codes have been added to reflect Census Bureau changes to occupation (OCC). IPUMS is working on documenting the changes to the occupation codes; until then, users can consult the original Census Bureau data dictionaries.
- New codes have also been added to reflect Census Bureau changes to the field of degree first and second entry variables (DEGFIELD, DEGFIELD2). The code for "Neuroscience" has been changed from 4003 in 2009 to 3611 in 2010 and the code for "Multi-Disciplinary or General Science" has changed from 4008 in 2009 to 5098 in 2010. New codes have been added to reflect "Multi/Interdisciplinary Studies" (4000), "Materials Science" (5008), and "Miscellaneous Fine Arts" (6099). In addition, "Precision Technologies" was previously included in code 5801 and is now included in code 5701.
- For Puerto Rico only, the maximum category available changed for several variables. Individuals who are 93 or older are coded as 93 in 2010, number of bedrooms (BEDROOMS) is top-coded at 7 bedrooms, and number of rooms (ROOMS) has a maximum category of 11.
- For occupation, 1990 basis (OCC1990), judges could not be distinguished from lawyers in the original Census Bureau data between 2005 and 2009, so they were grouped with lawyers as code 178. Code 179 for judges is available again for 2010.
- One person was coded as 67 ("Two or more races") in the original Census Bureau variable RAC2P and 18 ("Filipino alone") in the original Census Bureau variable RAC3P. We assigned a code of 883 (Filipino and 'other race' write-in) in the IPUMS (RACE) variable.
- New versions of every sample have also been posted. The migration variables (MIGRATE1 and MIGRATE5) have been fundamentally revised. These variables have been available since 1940, but the original Census Bureau variables have contained progressively less information over time. For example, in the 2000-onward ACS/PRCS, individuals are simply coded "same house," "different house in the U.S.," or "different house outside the U.S." in the original census data. However, it is possible to construct additional detail about these movers from other census variables, in particular MIGPLAC1 and MIGPLAC5. In the past, users interested in additional migration detail in later samples have needed to manually recode these other variables. Users interested in comparing the migration variables across time have confronted a non-harmonized coding scheme and comparability differences across both years and samples. The current revised versions of MIGRATE1 and MIGRATE5 simplify both tasks by:
- Adding migration information by incorporating details from other variables. Without recoding or selecting other variables, users now have access in MIGRATE1 and MIGRATE5 to relevant detail from variables detailing previous place of residence (MIGPLAC1 and MIGPLAC5), current state of residence (STATEFIP), PUMAs of migration (MIGPUMA1 and MIGPUMA), and PUMAs (PUMA and PUMARES2MIG).
- Adapting a harmonized coding scheme across years and samples. Users interested in comparing the variables across time may now use general and detailed codes that are consistent across samples to the extent that information is available in given samples. For example, whereas state contiguity was previously available for only the 1950 1% PUMS, all samples now contain codes that distinguish movers between contiguous states and movers between non-contiguous states.
- Including codes for movers who moved between PUMAs or moved within PUMAs for 2000 and 2005-onward samples.
See MIGRATE1 and MIGRATE5 for more information about these changes.
- Starting with the 2003 ACS, the Census Bureau began providing four-digit occupation codes (OCC). Because the first three digits replicated the previous occupation codes and the fourth digit was always a zero, the IPUMS eliminated that fourth digit for greater comparability with previous codes. The 2010 ACS/PRCS is the first sample to include substantive detail in the fourth digit. For greater comparability across ACS/PRCS samples, the IPUMS versions of the 2003-2009 ACS/PRCS data now include the fourth digit. Users who need to replicate analyses from prior extracts can safely drop the fourth digit, as it contains no necessary detail. As a reminder, the IPUMS also includes fully harmonized versions of occupation (OCC1950 and OCC1990).
- Due to a programming error, missing values of INCWAGE received codes of 996441 instead of the correct 999999 codes for all 2008 cases in the 2007-2009 and 2005-2009 multi-year data. This has been corrected.
- To avoid the potential identification of individuals, the Census Bureau collapses some Public Use Microdata Areas (PUMA) into larger PUMAs of migration (MIGPUMA1 and MIGPUMA). A new variable, PUMARES2MIG, adapts the Public Use Microdata Area codes for individuals' place of residence to the scheme for PUMA of previous residence. This allows the PUMA in which individuals lived previously (MIGPUMA/MIGPUMA1) to be compared directly to the PUMA in which individuals currently reside (PUMARES2MIG).
Added samples.
Edited variables.
Added variable.
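As a rough sketch of the fourth-digit note above: because the fourth digit of OCC was always zero in the 2003-2009 ACS/PRCS, integer division by 10 recovers the earlier three-digit code. The function name and the sample values are hypothetical, and the sketch assumes OCC has already been read as an integer.

```python
def drop_fourth_digit(occ: int) -> int:
    """Return the three-digit form of a four-digit 2003-2009 ACS/PRCS OCC code.

    Assumes the fourth digit is zero, as described for 2003-2009.
    Raises ValueError otherwise (for example, a 2010 code that carries
    substantive detail in its fourth digit).
    """
    if occ % 10 != 0:
        raise ValueError(f"OCC {occ} carries detail in its fourth digit")
    return occ // 10


# Hypothetical four-digit values, used only for illustration.
codes = [10, 2100, 4700]
three_digit = [drop_fourth_digit(c) for c in codes]
```

The explicit check is a design choice: silently truncating a 2010 code would discard real detail, so the sketch fails loudly instead.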
August 18, 2011
- The 2005-2009 Puerto Rican Community Survey 5-year PUMS data are now available.
- The following errors in the 2007-2009 3-year ACS/PRCS and the 2005-2009 5-year ACS have been corrected:
- Persons with an employment status (EMPSTAT) detailed code of 14 ("Armed forces, at work") or 15 ("Armed forces, with job but not at work") should have received OCC1950 codes of 595 ("Members of the armed services") but instead received other occupational codes.
- The code for calculating POVERTY was not fully updated for these new multi-year samples. As a result, POVERTY codes were too high by about 65 percent on average in some cases. This error affected only persons who are not related to the householder, about 79 percent of whom (3 percent of the total cases) received erroneous POVERTY values.
- The sample density of the 2005-2009 5-year ACS was incorrectly specified in our metadata, and any users who customized their sample sizes in our extract system received numbers different from what they had requested.
- Subfamily measures are available for households with more than 9 persons unrelated to the householder (GQ codes of 5). This affects only the 2000-onward samples, and fewer than 200 households in each sample.
- Birth year (BIRTHYR) was not made available for the 1900 5% sample. It is now available.
Added samples.
Edited variables.
August 10, 2011
- The 2007-2009 and 2005-2009 multi-year American Community Survey data are now available, along with the 2007-2009 multi-year Puerto Rican Community Survey data. (Technical problems prevented the release of the 2005-2009 Puerto Rican Community Survey data; it will be released by August 18, 2011.) The 2007-2009 3-year file includes all cases in the previously-released single-year files from the 2007, 2008, and 2009 ACS/PRCS; the 2005-2009 5-year file includes all cases in the previously-released single-year files from the 2005, 2006, 2007, 2008, and 2009 ACS/PRCS. Yet the new multi-year files differ in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2009 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files. Information specific to the new 2007-2009 and 2005-2009 files follows:
- The original Census Bureau variable SMOCP (the IPUMS analogue is OWNCOST) contains erroneous values for mobile homes among the 2005 and 2006 cases in the 2005-2009 multi-year ACS data. Their suggested edit has been applied in the IPUMS data.
- The Census Bureau has documented data processing errors in the 2008 single-year ACS because questionnaire changes were not reflected in their editing procedures. While they state that the multi-year data products incorporate the corrected data, this does not seem to be the case, as the Census Bureau codes for the 2008 data appear to be the same in the single-year and multi-year files (except for necessary harmonization).
- Revised 2000 census data that include the Census Bureau's corrections for incorrect AGE data are now available via the IPUMS. For the 5 percent United States, the 5 percent Puerto Rico, and the 1 percent unweighted US sample, the correct AGE values are now part of the AGE variable, while the previous, incorrect values are contained in AGEORIG. All variables constructed by IPUMS are based on AGE, and IPUMS recommends that AGEORIG not be used for purposes other than testing the sensitivity of previous analyses. Uncorrected errors in the person identifiers of the original Census Bureau 1 percent sample for the United States and Puerto Rico prevented the revised data from being linked to the original data, and revised 2000 1 percent samples for the United States and Puerto Rico have now replaced the old files entirely. The original files with erroneous age values remain available for testing purposes.
- Information on years residing in the United States (YRSUSA1 and YRSUSA2) is constructed from survey year (YEAR / MULTYEAR) and year of migration (YRIMMIG) for all ACS/PRCS samples. The actual year of survey for the multi-year ACS/PRCS files was contained in YEAR until January 2010, when it was shifted to MULTYEAR. Our programming for YRSUSA1 and YRSUSA2 did not account for this shift. As a result, YRSUSA1 and YRSUSA2 contained errors (and conflicted with YRIMMIG) for all 2005 and 2006 cases in the 2005-2007 multi-year files, and for all 2006 and 2007 cases in the 2006-2008 multi-year files. This has been corrected.
- In all 2000-onward samples, IPUMS applied an inappropriate universe edit to the units in structure variable (UNITSSTR). The correct universe is all housing units that are not group quarters (GQ), but all units used for commercial purposes (identified in COMMUSE) were also recoded to the N/A category (UNITSSTR codes of 0). This error has been corrected, and we now perform a simple recode of the original Census Bureau variable without any additional edits. The overall distribution of UNITSSTR is very similar whether or not units used for commercial purposes are included in the universe, although estimates of the number of each type of unit in a given area will be different.
- A programming error resulted in incorrect values of GQ for households with many in-laws and other non-relatives in the 2008 and 2009 single-year ACS/PRCS. Many of these households are actually households under the 1970 definition but were instead classified as "additional households under 1990 definition" or "additional households under 2000 definition". The error affects approximately 6,500 (unweighted) person records and has been corrected. This error also affected GQTYPE and other variables where GQ is used in programming. In particular, there were erroneous POVERTY values of 0 for approximately 200 (unweighted) children under 14 in these affected households. This has also been corrected.
- Vacant units in the 1860 and 1870 samples were erroneously coded as non-vacant in the group quarters variable (GQ). This has been corrected.
- Housing units using a miscellaneous "other" fuel type for cooking should have received FUELCOOK values of 10. Instead, they received values of 1, the code reserved for housing units that do not use cooking fuel. This has been corrected.
- We have made miscellaneous improvements and clarifications to our documentation.
- Our extract system has been streamlined so that users see a summary of their extract first. If you just want a standard rectangular extract with no extra features, you can make it immediately. Or you can change aspects of your extract (data structure, customized sample sizes, case selection, attached characteristics of other household members, and data quality flags).
- Birth year (BIRTHYR), previously included in only the 1900 and 1910 samples, is now available for all samples. In most cases, it is calculated simply as the difference between survey year (YEAR/MULTYEAR) and AGE, although we have refined this crude calculation where additional information on the quarter of birth (BIRTHQTR) is available. Because of these inaccuracies and the tendency of people to report their age as a round number (particularly in the older samples), IPUMS recommends caution when using this variable. It can be used quite effectively in the extract system, however, to select synthetic cohorts (e.g., all people born between 1928 and 1932) that can then be followed through multiple census years.
Added samples.
Edited variables.
Edited website.
Edited extract system.
Expanded variables.
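The crude BIRTHYR calculation and the synthetic-cohort selection described above can be sketched as follows. This is a minimal illustration, not the IPUMS procedure: it omits the BIRTHQTR refinement, the records are hypothetical, and it assumes MULTYEAR is present only in multi-year files.

```python
def survey_year(record: dict) -> int:
    """Actual survey year: MULTYEAR when present (multi-year files), else YEAR."""
    return record.get("MULTYEAR") or record["YEAR"]


def crude_birth_year(record: dict) -> int:
    """Survey year minus AGE, with no quarter-of-birth refinement."""
    return survey_year(record) - record["AGE"]


def in_cohort(record: dict, first: int, last: int) -> bool:
    """True if the crude birth year falls in the inclusive range [first, last]."""
    return first <= crude_birth_year(record) <= last


# Hypothetical person records, for illustration only.
records = [
    {"YEAR": 1950, "AGE": 20},
    {"YEAR": 1960, "AGE": 30},
    {"YEAR": 1960, "AGE": 45},
    {"YEAR": 2009, "MULTYEAR": 2007, "AGE": 77},
]

# Everyone born between 1928 and 1932, followed across census years.
cohort = [r for r in records if in_cohort(r, 1928, 1932)]
```

Because the calculation is crude and ages are often reported as round numbers, cohort membership near the range boundaries should be treated with caution.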
July 12, 2011
- The linked representative samples have been updated. The update primarily affects variables that were not present in the original Church of Jesus Christ of Latter-day Saints complete-count database for 1880: DEAF, BLIND, MAIMED, IDIOTIC, INSANE, SICKNESS, MARRINYR, SCHOOL, LIT, MOUNEMP, and QTRUNEMP. Previous versions of the data contained information for these variables if the record was part of the 1880 10% sample. The updated versions now contain information for these variables for all 1880 records.
Expanded variables.
June 15, 2011
- The IPUMS-USA extract system now allows users to customize their sample sizes. This is useful for researchers who do not need or cannot use the large number of cases contained in some IPUMS-USA samples. It can also be used to obtain small testing datasets before running a program on a large dataset. For more information, see the FAQ.
- In all 1950-2009 samples, some individuals under age 15 and in an unrelated subfamily were erroneously given POVERTY values of 0 because of a programming error. This has been corrected, and such individuals now receive the correct poverty value based on their subfamily's total income. This affects up to 20,000 cases in the 1980-2000 decennial samples, and approximately 5,000 cases in each of the single-year ACS files.
- In all years except 1940 and 1950, SLWT contains the contents of PERWT to facilitate cross-temporal analysis of variables on the sample line in 1940 and 1950. Due to a programming error, this was not implemented correctly for the 1900 1% and 1910 1.4% samples with oversamples, along with the 1900 1% sample. This has been corrected.
Edited extract system.
Edited variables.
May 19, 2011
- The Census Bureau's November 2010 revisions to the ACS/PRCS samples are now incorporated into the IPUMS:
- To address problems in the Census Bureau's disclosure avoidance procedures, AGE has changed in the 2003-2005 ACS/PRCS samples and for the 2005 cases in the 2005-2007 3-year file. Ages were subject to change only among people who were formerly coded as being at least 65 years old. (The exception is the 2004 ACS, in which several people who formerly had ages of less than 65 now have ages of at least 65.) Because the new ages appear to have been created via synthetic data techniques, IPUMS has marked AGE as allocated (QAGE codes of 4) for all people aged 65 and up. The former, erroneous ages are now contained in a new variable, AGEORIG, which allows users to analyze the effects of the age revisions on their own research. Please note that the age revisions for the 2006 ACS have been available via the IPUMS since January 2010, with the former, erroneous values of AGE contained in AGEORIG06. AGEORIG subsumes the variable AGEORIG06 and provides original values for all of the affected samples and years, including the original values contained in AGEORIG06. Researchers should use AGEORIG only for sensitivity analyses; AGE contains more plausible values for people's true ages.
- The Census Bureau's errors in adapting their editing procedures to the new 2008 ACS questionnaire have been corrected for the 2008 ACS/PRCS and for the 2008 cases in the 2006-2008 multi-year ACS/PRCS. (For more information about these errors, see ACS Errata note 53, note 54, and note 64.) These include the following household-level variables in the IPUMS:
- Number of rooms (ROOMS).
So that users can assess the sensitivity of their results to these corrections, the original, erroneous values are also provided (PHONEORIG, QPHONEORIG, KITCHENORIG, QKITCHENORIG, FRIDGEORIG, QFRIDGEORIG, BEDROOMSORIG, QBEDROOMORIG, and ROOMSORIG).
- Because of an unspecified Census Bureau error, the variable represented by MOVEDIN contained erroneous values in the 2004 ACS. This has been fixed; the original variable is contained in MOVEDINORIG so that users can assess how their results may have changed.
- In the processing of these revisions, several other variables changed slightly in the original Census Bureau data. These changes are not due to documented Census Bureau errors, and original variables were not created to preserve the changes. These changes fall into three broader categories:
- The Census Bureau veteran variables represented by VET01LTR, VET47X50, VET90X01, VETKOREA, VETOTHER, VETSTAT, and VETWWII changed slightly for some of the samples in which the Census Bureau revised AGE, partly as a result of the age revisions.
- Topcoded values in CONDOFEE, FTOTINC, HHINCOME, INCEARN, INCINVST, INCOTHER, INCWAGE, OWNCOST, RENTGRS, and RENT changed slightly in some of the revised samples.
- The changes in the AGE values affected the IPUMS family interrelationship variables, which rely on information about household members' ages. The IPUMS also uses AGE to classify the householder's in-laws, since the original Census Bureau data does not distinguish among parents-in-law, siblings-in-law, and children-in-law. Consequently, some people's values of RELATE may differ, although only in-laws are affected. See "In-Law Classification Procedures" for details.
The values contained in these variables are from the latest, revised Census Bureau PUMS file. Users interested in specific differences between the original release and the revised release may use the IPUMS archive page to compare the revised data with the older data.
- Several improvements were made to the 1900 census 5% sample:
- Dwelling size (DWSIZE) was corrected (about 16% of all values have changed).
- ENUMDIST and SUPDIST were reversed in the last version (i.e., ENUMDIST contained data for SUPDIST, and vice versa). This has been corrected.
- Some geography variables were improved. For example, about 900 household COUNTY codes and 9,500 household METRO codes changed (the METRO changes were mostly in Washington, DC). About 6% more CITY values were populated.
- Data in PFARMSCH was corrected.
- PERWT values were integers in the old version. They now correctly have two decimal places.
- There were small corrections made to NSIBS.
- The allocation for OWNERSHP was corrected. In the old version, the distribution between 'owned' and 'rented' for allocated values did not follow the distribution for non-allocated values.
- The distribution for RELATE changed slightly; most notably, there are hundreds fewer foster children and hundreds more servants, "other probable domestic employees" and "other non-relatives".
- IPUMS now offers integrated versions of the original Census Bureau subfamily variables that parallel the subfamily variables constructed by IPUMS. Newly available variables include CBSFRELATE (relationship within the subfamily), CBSFTYPE (type of subfamily), CBSUBFAM (subfamily number), and CBNSUBFAM (number of subfamilies in the household). Users should note that the Census Bureau's procedures for classifying subfamilies have changed dramatically over time, so these variables are useful mainly for the comparability they offer with the Census Bureau's summary files. See our subfamilies page for more information.
- Information on the TRIBE of American Indians has been improved as well. Most notably, persons who were previously classified (incorrectly) as "Alaska Native, tribe not reported" in all 2000-onward samples are now classified correctly as "American Indian or Alaska Native, tribe not reported." Recoding improvements were made to the 1990, 2000, and ACS samples; labeling improvements were made to the 1900 and 1910 samples.
- Finally, because of improvements to our data construction, editing, and allocation procedures, many variables have been refined. In particular:
- In the 1940 and 1950 samples, Charleston, WV (CITY codes of 1070) was mistakenly identified as Charleston, SC (CITY codes of 1050) and was coded as having the population of Charleston, SC in CITYPOP. Charleston, WV is now identifiable in the IPUMS 1940 and 1950 samples and has the correct population in CITYPOP. Charleston, SC is not identifiable as a city in 1940 and 1950, and this is now reflected in CITY and CITYPOP.
- PUMA 00201 in Ohio was erroneously coded as being part of the Texarkana (Texas) metropolitan area in METAREA. It is now properly included in the Toledo (Ohio) metropolitan area.
- Riverside City (California) was improperly coded as Riverside County via COUNTY in the 1980 5% sample. Riverside County is not identifiable in this sample, and this is now reflected in the data.
- SURSIM contained nonsensical values in the Hispanic oversample (SAMP1910 codes of 7) in the 1910 1.4% sample with oversamples. This has been corrected, as have all variables based on SURSIM (such as the family interrelationship variables).
- QGQ and QGQTYPE were previously coded non-zero for vacant units (which are not in these variables' universes). This has been corrected.
- VETCIVWR contained nonsensical values in all 1910 samples. This has been corrected.
- CHBORN contained nonsensical values in the 1970 Puerto Rico samples. This has been corrected, as have all variables based on CHBORN (such as the family interrelationship variables).
- All values that are allocated by IPUMS-USA in the 1850-1930 samples have been assigned more accurately. In most samples, very few cases have actually changed. The improvements are most noticeable in the 1900 and 1910 samples, although typically fewer than 100 cases have different values of any one variable.
Edited samples.
Added variables.
Edited variables.
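For the sensitivity analyses mentioned above, one simple starting point is to count how many records differ between AGE and AGEORIG, and how many cross the age-65 boundary. The sketch below uses hypothetical records and a hypothetical function name; it assumes both variables are present as integers in each record.

```python
def age_revision_summary(records: list) -> dict:
    """Summarize how revised AGE values differ from the original AGEORIG values."""
    changed = [r for r in records if r["AGE"] != r["AGEORIG"]]
    # Records that sit on different sides of age 65 before and after revision.
    crossed_65 = [
        r for r in changed if (r["AGE"] >= 65) != (r["AGEORIG"] >= 65)
    ]
    return {
        "total": len(records),
        "changed": len(changed),
        "crossed_65": len(crossed_65),
    }


# Hypothetical records, for illustration only.
sample = [
    {"AGE": 40, "AGEORIG": 40},
    {"AGE": 70, "AGEORIG": 68},
    {"AGE": 66, "AGEORIG": 63},
    {"AGE": 81, "AGEORIG": 81},
]
summary = age_revision_summary(sample)
```

Rerunning an analysis once with AGE and once with AGEORIG, restricted to the changed records, would show how much the revision matters for a given result.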
November 18, 2010
- Posted new 2008-2009 ACS data. Because of miscommunications with Census Bureau staff, the health insurance edit for VA (HINSVA) and Indian Health Service (HINSIHS) insurance was performed incorrectly. These variables and their accompanying flags are now correct, and documentation of the edit has been updated on the ACS health insurance page. Additionally, the edits for all health insurance variables have been applied to Puerto Rico data (this was not done before).
Edited variables.
November 10, 2010
- Posted new 2008-2009 ACS data. Because of a programming error, the data posted on Nov. 9 contained incorrectly edited variables for Medicaid (HINSCAID), Medicare (HINSCARE), and military insurance (HINSTRI) coverage, which affected the summary variables for any (HCOVANY), private (HCOVPRIV), and public (HCOVPUB) coverage. These variables and their accompanying flags are now correct.
Edited variables.
November 9, 2010
- Posted new 2009 American Community Survey and Puerto Rican Community Survey data, along with revised data for 2008. Together, the 2009 samples contain over three million person records. The 2009 ACS is the fourth ACS sample to provide information on group quarters, which can be identified in the GQ variable. Researchers analyzing multiple ACS samples over time should remove group quarters cases, since they are available only in the 2006-2009 data.
- The lowest level of geographic identifier in the 2009 ACS is the PUMA; 2009 PUMAs have the same boundaries as those in the 2005-2008 ACS and the 2000 census samples. The IPUMS version of the 2009 ACS provides the following additional geographic identifiers: CITY, COUNTY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. Additionally, information on unrelated subfamilies--a category not measured by the Census Bureau--is available. These variables were constructed at the University of Minnesota and are not available via the Census Bureau.
- The 2008 and 2009 ACS releases are quite similar, but there are some differences:
- Health insurance variables are now edited for consistency, and these edits have also been applied to the 2008 ACS/PRCS data. The original (unedited) variables are also available through our extract system. For more information, see the ACS health insurance page.
- A new RACE code (869) describing persons who identify as both Japanese and Korean is available.
- Because of several errata in the original 2008 ACS PUMS, users should exercise caution in interpreting change over time in the number of rooms (ROOMS), the number of bedrooms (BEDROOMS), telephone service (PHONE), and kitchen facilities (KITCHEN). The revised version of the 2008 ACS PUMS will be available in IPUMS-USA in January 2011.
Added samples.
Edited variables.
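The recommendation above to remove group quarters cases when pooling ACS samples can be sketched as a simple filter on GQ. The set of codes treated as group quarters here (3 and 4) is an assumption made for illustration and should be checked against the GQ codes page; the records are hypothetical.

```python
# Assumption: GQ codes 3 and 4 denote group quarters. Verify against the
# GQ variable documentation before relying on this set.
GROUP_QUARTERS_CODES = {3, 4}


def drop_group_quarters(records: list) -> list:
    """Keep only records whose GQ code is not in the assumed group quarters set."""
    return [r for r in records if r["GQ"] not in GROUP_QUARTERS_CODES]


# Hypothetical pooled records from several ACS years.
pooled = [
    {"YEAR": 2005, "GQ": 1},
    {"YEAR": 2006, "GQ": 3},
    {"YEAR": 2008, "GQ": 5},
    {"YEAR": 2009, "GQ": 4},
    {"YEAR": 2009, "GQ": 1},
]
comparable = drop_group_quarters(pooled)
```

Applying the same filter to every year keeps the pooled universe consistent, since earlier ACS samples contain no group quarters cases at all.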
October 20, 2010
- ENUMDIST from the 1880 IPUMS complete count database was updated in all 21 of the linked representative samples. In the previous versions, ENUMDIST from 1880 had a large proportion of missing values. This has been corrected.
Edited variables.
October 13, 2010
- Weights in the linked representative sample for males, 1880-1930 have been revised. In the previous version of the male file, PERWT was constructed with erroneous age proportions for 1930. The problem was corrected and PERWT recalculated.
Edited variables.
September 7, 2010
- Geographic identifiers are now available for selected counties (COUNTY) for 1940-1950, 1970-2000, and 2005-onward. Although they are not identified in the original Census Bureau PUMS, counties with populations of at least 100,000 can be identified via other geographic identifiers in the data (the other counties receive codes of "0000"). COUNTY thus joins CITY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS in the list of variables constructed from census data and unique to the IPUMS.
- A new variable (NHGISJOIN) provides an alternate way of identifying counties in IPUMS-USA data from 1850-1930. It can be used to link IPUMS-USA microdata with aggregate data from the National Historical Geographic Information System (NHGIS), thus making contextual analysis easier for historical data.
- A new variable describing Appalachian residence (APPAL) is now available for 1850-1950, 1980-2000, and 2005-onward. Like COUNTY, it is built from available geographic identifiers.
- The syntax statements for YRIMMIG contained incorrect value labels. Single years were unaffected, but single years that also represented ranges of years in some samples were off by one year. For example, the code 1931 means that the respondent came to America in 1931 for all samples in which the code appears. However, for the 2005-onward ACS, the code 1932 means that the respondent came to America in 1931 or 1932 (in addition to its standard meaning of 1932 in other samples). The value label for codes of 1932 should have read "1932 (2005-onward ACS: 1931-1932)", but a formatting error yielded "1933 (2005-onward ACS: 1931-1932)" instead. The problem affected the value labels only; YRIMMIG data remain the same. All codes that represent ranges of years rather than individual years are now visible on the YRIMMIG codes page.
- YRIMMIPR has been streamlined to mimic YRIMMIG: ranges of years are now standardized such that the 4-digit code represents the latest year in which the respondent could have moved to Puerto Rico.
- Due to a programming omission, respondents classified as having a RACE of Aleut were erroneously coded as Hawaiian in RACESING for all 1970 samples, and Eskimos were coded as Korean. Aleuts and Eskimos now appear as Alaska Natives in RACESING.
- OWNCOST contained nonsensical data for the 2003 and 2004 ACS samples. Correct data are now available via the extract system.
- Two disability variables (DIFFREM and DIFFSENS) were omitted from the IPUMS versions of the 2003 and 2004 ACS. They are now available; however, users should exercise care when comparing them to earlier surveys because of question changes.
Added variables.
Edited variables.
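Linking microdata to county-level aggregates through NHGISJOIN, as described above, amounts to a lookup join. The sketch below is illustrative only: the key values are placeholders rather than real codes, the field name county_pop is hypothetical, and it assumes the aggregate table has already been keyed in the same format as NHGISJOIN.

```python
def attach_county_context(persons: list, county_rows: list,
                          key: str = "NHGISJOIN") -> list:
    """Left-join county-level context onto person records by a shared key.

    Persons with no matching county row receive county_pop = None.
    """
    lookup = {row[key]: row for row in county_rows}
    merged = []
    for person in persons:
        context = lookup.get(person[key])
        combined = dict(person)  # copy so the input records stay unchanged
        combined["county_pop"] = context["county_pop"] if context else None
        merged.append(combined)
    return merged


# Placeholder keys and made-up values, for illustration only.
persons = [
    {"NHGISJOIN": "KEY_A", "AGE": 34},
    {"NHGISJOIN": "KEY_B", "AGE": 51},
    {"NHGISJOIN": "KEY_C", "AGE": 8},
]
counties = [
    {"NHGISJOIN": "KEY_A", "county_pop": 12000},
    {"NHGISJOIN": "KEY_B", "county_pop": 48000},
]
linked = attach_county_context(persons, counties)
```

A left join is used so that persons in counties missing from the aggregate table are retained and can be inspected rather than silently dropped.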
June 8, 2010
- An error in the programming for the June 4 revision resulted in PERWT values of 0 for all cases in the 1880 1% and 1880 10% samples. These samples now contain the correct PERWT values.
Edited variables.
June 4, 2010
- Posted final versions of the IPUMS Linked Representative Samples. More information available here.
- CPI99 provides inflation factors to adjust dollar amounts into constant 1999 dollars. It is a constant value within each census year. CPI99 will be especially useful for extracts containing multiple years of data; users will need only to multiply dollar variables by a single inflation variable instead of manually multiplying different years of data by different constants. For more information, see our CPI adjustment page.
- For cases sampled from large households, NUMPERHH provides the number of people in that larger household. Several samples from 1850-1930 had errors in NUMPERHH. See the variable description for more information.
- For users' convenience, the standard weight variables for 1850-1930 will contain the more detailed weighting information previously available only in the detailed weights. Specifically, PERWT (and, aside from the 1940 and 1950 samples, SLWT) will contain values previously available in PERWTDET, and HHWT will contain values previously available in HHWTDET. PERWTDET and HHWTDET are no longer available. See the variable descriptions for full information.
- The new versions of PERWT and HHWT provide weight values to two decimals of precision for 1850-1930 data. Default weighting procedures in SPSS and SAS can work with fractional weights. When tabulating variables in Stata, however, users will need to specify that the weights are importance weights, which allow decimals. (The default weight in Stata's tabulate command is a frequency weight, which does not allow decimals.)
- There are slight improvements to the imputation of relationship to head (IMPREL), available only for 1850-1930 samples.
- Some residents of institutions (GQ codes of 3) were coded erroneously as being in the labor force (LABFORCE codes of 2) in the 1850-1930 samples. All such cases are now coded as NIU (LABFORCE = 0) or as not in the labor force (LABFORCE = 1).
- IMPREL, IMPMOM, IMPPOP, and IMPSP are now available in the 1880 10% sample.
- Data from one missing reel of microfilm was restored to the 1880 complete-count database.
- Parental and spousal links were occasionally illogical for households in the 1900 and 1910 data due to an error in how information on surviving children (CHSURV) and children ever born (CHBORN) was handled. This has been corrected.
- Six cases in the 1930 sample had RELATE codes of 0 (not a valid relate code). They now have the correct codes.
- RESPMODE (available only in the 2005-onward ACS/PRCS) is now a household variable rather than a person variable. No data have changed, however.
- INDNAICS in the 2008 ACS contained unnecessary characters around the correct values. It is now consistent with other years.
- There is a new code (did not work last year, but did work in the past five years) for WORKEDYR.
Added samples.
Added variables.
Edited variables.
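As a rough illustration of the CPI99 note above, the sketch below multiplies a dollar variable by the CPI99 value carried on each record. The records and the CPI99 factors shown are made-up placeholders, not published IPUMS values, and the helper name `to_1999_dollars` is hypothetical. Real extracts also contain N/A and topcode values that would need to be handled before multiplying.

```python
# Hypothetical extract rows. The CPI99 values here are placeholders chosen for
# easy arithmetic; they are NOT the factors IPUMS publishes.
rows = [
    {"YEAR": 1990, "INCTOT": 20000, "CPI99": 1.25},
    {"YEAR": 2000, "INCTOT": 30000, "CPI99": 0.5},
]


def to_1999_dollars(row, dollar_var="INCTOT"):
    """Multiply one dollar variable by the record's CPI99 factor."""
    return row[dollar_var] * row["CPI99"]


# One multiplication per record, regardless of which census year it came from.
adjusted = [to_1999_dollars(r) for r in rows]
```

Because CPI99 is constant within a census year, the same one-line multiplication works for a pooled multi-year extract without looking up year-specific constants.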
March 4, 2010
- Following conversations with Census Bureau staff about the calculation of dollar adjustment factors in the ACS/PRCS, the IPUMS no longer automatically applies the Census Bureau's adjustment factor to any dollar-amount variable in the ACS/PRCS. Such variables now exist in the IPUMS exactly as they were released by the Census Bureau, and IPUMS recommends that users analyze them without applying the adjustment factor. For more information, see the ACS adjustment page. Users who want to adjust dollar amounts may use the new variable ADJUST, which provides the adjustment factor as provided by the Census Bureau.
- Although adjustment factors are no longer applied automatically, users should know that there were problems in implementation for the 2005-2007 and 2006-2008 3-year files available between Jan. 12 and March 4, 2010. The adjustment factors in these samples should have varied with MULTYEAR. However, a programming error applied only the 2007 adjustment factor to all cases in the 2005-2007 3-year ACS/PRCS, and only the 2008 adjustment factor to all cases in the 2006-2008 3-year ACS/PRCS. Because these adjustment factors contained the CPI-U values to convert dollar amounts to constant dollars, dollar values were inaccurate. In the 2005-2007 3-year file, all dollar amounts should have been expressed in 2007 dollars, but dollar values for 2005 cases were too small by 6.1%, and dollar values for 2006 cases were too small by 2.7%. In the 2006-2008 3-year file, all dollar amounts should have been expressed in 2008 dollars, but dollar values for 2006 cases were too small by 6.1%, and dollar values for 2007 cases were too small by 3.5%. This error carried through to POVERTY values for nonrelatives of the householder (RELATE codes of 11, 12, and 13).
- In the 2003-onward ACS/PRCS, industry codes contain four digits of detail, but IPUMS codes (IND) formerly contained only three digits. All four digits are provided now. IND1950 and IND1990 (recoded, harmonized versions of industry) remain the same. New codes pages are available through IND and INDNAICS.
- OWNCOST was mistakenly omitted from the 2005-2007 and 2006-2008 ACS 3-year files. It is now available.
- Inspection of the revised 2006 AGE data revealed that almost all cases 65 years and older have changed in value, suggesting that the revised AGE data were imputed or created using synthetic data techniques. All persons age 65 or older in the 2006 ACS/PRCS (and all 2006 cases in the 2006-2008 3-year file) now have values of 4 in QAGE to indicate probable allocation.
Added variables.
Edited variables.
February 10, 2010
- Due to a programming error, respondents in the 2000-2008 ACS/PRCS files with total family incomes (FTOTINC) of 0 or very small amounts received POVERTY codes of 0, which are reserved for N/A cases (group quarters and unrelated individuals not in a subfamily under age 15). They now receive POVERTY codes of 1.
- Respondents who were unrelated to the householder, under the age of 15, and not in an unrelated subfamily should have received POVERTY codes of 0 (N/A); instead, they were coded as 1. This has been corrected.
- Respondents in the 1970 and 1980 Puerto Rico samples who should have received GRADEATT codes of 0 instead received codes of "ZZ". This has been corrected.
- Posted new versions of 1870 and 1900 datasets. In the new 1870 data, OCC and OCC1950 values are included for people under the age of 16 who reported an occupation. The previous version coded these people as "not in universe." In 1900, several cases previously having invalid SPEAKENG values are now coded properly.
Edited variables.
January 28, 2010
- Extra information on military service is now available in VETSTAT. Since 1990, the non-veteran category has included detail on people without military service, service members currently on active duty, and (since 2003) people whose only service is training in the National Guard or Reserves. The "Yes, Veteran" category has included detail on both service members who were previously on active duty at any time and (in 1990 only) people who were activated from the National Guard or Reserves. This detail is now contained in the new detailed version of VETSTAT; the general version coding remains the same as the previous VETSTAT coding. However, the actual frequencies in the general version differ from the previous one-digit version of VETSTAT for three reasons:
- In 1940, VETSTAT also contained valid codes for persons younger than 18, contrary to the stated universe for that year. All persons under age 18 are now classified as N/A in 1940.
- In 1950-1970, persons currently on active duty were not in the universe. They are now classified as non-veterans.
- Persons who should have been coded as non-veterans because they had undergone only training in the National Guard or the Reserves were erroneously classified as veterans in the 2000 census and 2000-2002 ACS data. As a result, the number of veterans was overestimated by 19.0 percent in the 2000 census and by 15.5 percent in the 2000-2002 ACS data (the difference comes from the fact that the ACS did not include group quarters in their sample). This has been corrected.
For more information, see the VETSTAT variable description.
- STATEFIP identifiers are now available in the 1% and 1.4% 1900 and 1910 samples for Alaska and Hawaii, and STATEICP is now available in 1900 for these two states (it has long been available in 1910).
- The 2006-2008 3-year file contained incorrect data for RACESING, with the result that many RACESING codes were inconsistent with RACE responses. RACESING is now correct.
- AGE has been replaced in the 2006 ACS/PRCS file. The Census Bureau released revised 2006 data in December 2009 to fix a problem with disclosure control techniques. The original age variable is still available as AGEORIG06; for full details, see the AGEORIG06 variable description. Because the family interrelationship pointer variables (MOMLOC, POPLOC, and SPLOC) rely on age, some of these have changed as well; STEPMOM, STEPPOP, and other variables based on family units also stand to be affected.
- The DIFFCARE variable in the 2008 ACS/PRCS contained data for DIFFMOB instead; the correct variable is now available.
- QAGE was mistakenly not made available for the 2005-onward PRCS or for the 2006-2008 3-year files. It may now be downloaded.
Expanded variables.
Edited variables.
January 12, 2010
- Added 2006-2008 ACS/PRCS 3-Year files. These files include all cases in the previously-released single-year files from the 2006, 2007, and 2008 ACS/PRCS. The new 3-year file differs in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2008 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files. Another notable difference is that some values of AGE in the 2006 portion of the 3-year file differ from those in the 2006 single-year file because of a change to the Census Bureau's disclosure avoidance methods. The Census Bureau has re-released the 2006 single-year file with the revised AGE variable, and it will be added to the IPUMS database soon.
- An expanded version of the 1880 100% database is now available. The new release contains a variety of improvements over the previously-available version of this data:
- New variables were added, including BPLSTR, DWSIZE, FBPLSTR, GQSTR, INCORP, INCSTR, LINE, MBPLSTR, MCD, METDIST, OCC, PROBAI, PROBAPI, PROBBLK, PROBOTH, PROBWHT, QNAMELST, QQTRUNEM, QSURSIM, RACAMIND, RACASIAN, RACBLK, RACOTHER, RACPACIS, RACWHT, SFRELATE, SFTYPE, STREET, SUBFAM, SURSIM, URBAN, and URBPOP.
- The total number of records changed slightly. The change is partly due to the removal of duplicate records. Enumerators or census clerks crossed out records when they found a duplicate or when the respondent was deceased on census day. For the new 100% database, we have removed these crossed-out records. Also, the number of records changed for the city of St. Louis (see below). Overall, the total number of households decreased by 26,089, and the total number of person records decreased by 42,572.
- The city of St. Louis was enumerated twice for the 1880 census. The previous 100% database contained data from the second enumeration. Although the second enumeration contains more records, we decided to release the first enumeration in the final database, in the interest of consistency with the data from the rest of the country. The first enumeration occurred during the U.S. Census Bureau's designated timeframe for enumeration, while the second took place six months later. As a result, the new release contains 7,573 fewer household records and 29,310 fewer person records for St. Louis.
- Improvements were made to geographic data. An audit conducted by Professor Michael Haines (Colgate University) revealed numerous county codes that were blank, invalid, or incorrect. This resulted in approximately 21,100 changes to the county variable and to variables that derive from county information (such as METRO, METAREA, etc.).
- We were able to take advantage of some additional detail in the RELSTR variable to better place individuals' detailed RELATE code, particularly in the 1200s (non-related individuals).
- Numerous other minor corrections were made to individual coding decisions.
- A new variable, HOMELAND, identifies PUMAs that contain a Census Bureau-designated American Indian, Alaska Native, or Native Hawaiian homeland area.
- A new variable, EDUC, contains all information on educational attainment that was previously spread across four variables (HIGRADE, EDUC99, EDUC00, and EDUC08). HIGRADE contains additional detail on educational attendance and has been retained. The other three variables are now superfluous and have been removed along with EDUCREC, the previous summary variable for educational attainment. EDUC has both general and detailed versions. The general version is the equivalent of EDUCREC in that it provides the set of general categories that can be identified in each year of data, but these general categories are more detailed than those formerly contained in EDUCREC. Additionally, all samples in and after 2000 contain the detailed category "Some college, but less than 1 year". In the old EDUCREC variable, this category was classified as "1-3 years of college". Because the people in this category have completed 12th grade but not 1 year of college, they are now classified as "12th grade" in the general version of the new EDUC variable.
- QGCHOUSE (the data quality flag for GCHOUSE) was mistakenly not made available before; it may now be downloaded.
- The multigenerational household variable (MULTGEN), previously released for the first time in the 2008 ACS/PRCS exactly as provided by the Census Bureau, has been revised by IPUMS to contain additional detail and is now available for samples from 1880 onward. See the variable description for full information.
- The occupational standing measures EDSCOR50, EDSCOR90, NPBOSS50, and NPBOSS90, which were based on EDUCREC, are now based on the more precise variable EDUC (general version). This has resulted in some changes. The EDSCOR measures are unchanged between 1950 and 1990; in all samples 2000 and after, though, they have decreased by an average of 6 to 7 points because of the code shift. The NPBOSS measures, which rely in part on the ordering of occupations' educational composition, have also changed, but by no more than 10 points in either direction.
- GRADEATT (grade of school now attending) has been expanded to a general/detailed coding scheme to accommodate the increased detail available in the 2008 ACS/PRCS. The variable GRADE08, which previously contained this detail, has been removed. Additionally, information from HIGRADE has been used to expand GRADEATT's availability to the 1960-1980 samples.
- Migration variables have been streamlined. Because of a programming error, YRSPR contained incorrect data for all observations, and data for year of immigration to Puerto Rico was split between YRIMMIG (for 1910-1920) and YRIMMIPR (for 1980 onward). This has been corrected. Additionally, YRSPR is limited to 1910, 1920, and 2000 onward; but YRSPR2 (an intervalled version of YRSPR) is now available and makes this information available in a less detailed form for 1980 and 1990 as well.
- For the ACS/PRCS multi-year samples, YEAR previously gave the actual year of survey (e.g., 2005, 2006, or 2007 for the 2005-2007 3-year file). To ensure that the combination of YEAR, DATANUM, SERIAL, and PERNUM uniquely identifies individuals, YEAR now provides the last year of data (e.g., 2007 for the 2005-2007 3-year file). Information on the actual year of survey has been shifted to a new household-level variable called MULTYEAR, valid only for the multi-year ACS/PRCS.
- QYRSPR (the data quality flag for YRSPR) contained all 0's due to a programming error. It now contains correct data.
- All negative values of replicate weights (REPWT and REPWTP) had been recoded to zero for ease of use in statistical software packages. However, in further discussions with StataCorp technical staff, it emerged that Stata can handle negative replicate weights. (Neither SAS nor SPSS can automatically process the kind of replicate weights included in the ACS and PRCS data.) The original replicate weight values are now provided, and IPUMS now provides an FAQ page on replicate weights that contains directions for using ACS/PRCS replicate weights in Stata.
- Because of a programming error, many cases that should have been coded as 0 on YRSUSA2 in the 2008 ACS were instead coded as 5, and many other cases that should have been coded as 1, 2, or 3 were instead coded as 0. This has been corrected. YRIMMIG and YRSUSA1 were accurate and remain unchanged, except for correcting another programming error that coded valid YRSUSA1 values of 0 as 1 in 1910-1930 samples. (YRSUSA1 codes of 0 contain both N/A cases and cases that arrived in America less than one year ago; they can be distinguished using BPL. See the YRSUSA1 codes page for more information.)
- Because of a programming error, all commutes (TRANTIME) over 99 minutes were too small by a factor of 10. This affects approximately the longest 0.5% of commutes in all ACS and PRCS samples. This has been corrected, and TRANTIME has been widened to three digits.
- QMIGRAT1 (the data quality flag for MIGRATE1) contained incorrect data for the 2006-2008 ACS samples, instead duplicating QMARST (the data quality flag for MARST). This has been corrected.
- Persons with OCC1950 values of 595 (armed services), 997 (missing/unknown), and 999 (N/A) received 59.5, 99.7, and 99.9 respectively as their NPBOSS50 scores. They now receive the appropriate N/A codes of 999.9.
- In the 2003 and 2004 ACS/PRCS samples, respondents with "some college but less than one year" were erroneously classified in EDUC (and the former EDUC99) as having an "associate's degree, occupational program", while those with "one or more years of college, no degree" were erroneously classified as having an "associate's degree, academic program". This has been corrected.
- In the 2008 ACS, three respondents with RACE codes of 827 ("White and one or more major race groups, n.e.c.") and one respondent with a code of 991 ("White race; Some other race; Black or African American race and/or American Indian and Alaska Native race and/or Asian groups and/or Native Hawaiian and Other Pacific Islander groups") received incorrect codes of "0" on RACESING. This has been corrected.
- Several other variables have been widened:
- BUILTYR2 was previously available in 2000 only for that year's ACS sample; it is now available for all samples in that year.
Added samples.
Expanded sample.
Added variables.
Edited variables.
Expanded variables.
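The note above on YEAR and MULTYEAR says that YEAR, DATANUM, SERIAL, and PERNUM together should uniquely identify individuals. A minimal sketch of how a user might check that on their own extract follows; the sample records and the helper name `duplicate_keys` are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical person records from a 3-year file: YEAR holds the last year of
# the file (2007), while MULTYEAR holds the actual survey year.
records = [
    {"YEAR": 2007, "DATANUM": 3, "SERIAL": 1, "PERNUM": 1, "MULTYEAR": 2005},
    {"YEAR": 2007, "DATANUM": 3, "SERIAL": 1, "PERNUM": 2, "MULTYEAR": 2005},
    {"YEAR": 2007, "DATANUM": 3, "SERIAL": 2, "PERNUM": 1, "MULTYEAR": 2006},
]


def duplicate_keys(rows):
    """Return every (YEAR, DATANUM, SERIAL, PERNUM) key that occurs more than once."""
    counts = Counter(
        (r["YEAR"], r["DATANUM"], r["SERIAL"], r["PERNUM"]) for r in rows
    )
    return [key for key, n in counts.items() if n > 1]


duplicates = duplicate_keys(records)  # empty when the key is unique
```

An empty result means the four-variable key is unique in the extract; MULTYEAR is not part of the key and is used only to recover the survey year.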
November 9, 2009
Edited variables.
November 6, 2009
- Posted new versions of the 2008 ACS/PRCS. HHINCOME, FTOTINC, INCEARN, and INCTOT were not provided before due to errors in the original Census Bureau data; they are now available because the Census Bureau has released new data. POVERTY was previously provided just as the Census Bureau released it, with different values for each nonrelative of the householder. The POVERTY variable is now calculated as in all previous samples, where people in unrelated subfamilies have the same value.
- The Census Bureau has not documented their data update, so users should know that any 2008 PUMS files downloaded from the Census Bureau's website between October 30 and November 3 have incorrect data for the four income summary variables with the Census Bureau names of FINCP, HINCP, PERNP, and PINCP. And, as of November 6, the DataFerrett data had not been updated; they still contain incorrect codes of -$59,999 (HINCP and FINCP), -$10,000 (PERNP), and -$19,999 (PINCP).
- YRSUSA1 was calculated incorrectly for the 2008 samples. It is now aligned with YRIMMIG.
- The documentation for MULTGEN, a new variable measuring multigenerational households that IPUMS provides without modification from the original Census Bureau data, has been updated to reflect the results of preliminary examination by IPUMS staff. We recommend that researchers use this variable with caution.
Edited samples.
Edited variables.
November 4, 2009
- Added 1% samples from the 2008 American Community Survey (ACS) and the 2008 Puerto Rico Community Survey (PRCS). Together, the samples contain approximately three million person records. The 2008 ACS is the third ACS sample to provide information on group quarters, which can be identified in the GQ variable. Researchers analyzing multiple ACS samples over time should remove group quarters cases, since they are available only in the 2006-2008 data.
- The lowest level of geographic identifier in the 2008 ACS is the PUMA; 2008 PUMAs have the same boundaries as those in the 2005-2007 ACS and the 2000 census samples. The IPUMS version of the 2008 ACS provides the following additional geographic identifiers: CITY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. These variables were constructed at the University of Minnesota and are not available via the Census Bureau.
- There are several noteworthy changes to the IPUMS data that stem from the Census Bureau's modifications to the ACS/PRCS questionnaire; users should consult our page on the 2008 ACS/PRCS.
- Incorrect adjustment factors were used for all dollar amounts in ACS and PRCS samples. This resulted in dollar amounts that were nearly 1 percent smaller than they should have been in 2006 ACS/PRCS data and 2006 cases in the 2005-2007 3-Year ACS/PRCS files. For all other ACS/PRCS data, dollar amounts were off by less than 0.5 percent. This was due to an error in the Census Bureau documentation, which states that the adjustment variables convert dollar amounts into July dollars (for instance, see the 2005 ACS PUMS Accuracy Statement, p. 13). Through further conversations with Census Bureau staff, we have realized that these variables actually convert dollar amounts into calendar year dollars. This is true for all ACS samples since 2000. All dollar amounts in IPUMS ACS/PRCS samples are pre-adjusted to reflect calendar year dollars.
- The occupational standing measures have been refined.
- NPBOSS50 and NPBOSS90, which are based on median educational attainment and median earned income, have been re-calculated using the standard formula for calculating medians from grouped data. These were previously calculated using the GMEDIAN function in the SPSS MEANS command, which appears to have a programming error that causes its calculations to diverge from its stated method. The changes are quite small.
- ERSCOR50 and ERSCOR90, previously calculated as if the income data were not grouped, have been re-calculated using the above procedure for calculating medians from grouped data. Also, IPUMS now weights occupations by the number of workers they contain when calculating the standardized median incomes on which the ERSCOR measures are based. Because this weighting was not carried out before, scores for ERSCOR50 and ERSCOR90 did not truly represent the percentage of workers in occupations having lower median earnings than a given occupation, as stated in the documentation. Rather, they represented the percentage of occupations having lower median earnings than a given occupation. Together, these two improvements have the potential to make significant changes to the ERSCOR measures: computing medians without attention to the grouped nature of the data resulted in many ties in the positioning of occupations, which were broken by the (arbitrary) occupational category numbers. The increased precision alters the relative and absolute position of the occupations in the distribution of standardized median incomes, while weighting for the size of the occupations further alters the absolute position of each occupation in this distribution.
- The EDSCORE, ERSCORE, and NPBOSS measures rely on statistics that are derived from the PUMS data (for instance, the proportion of people in a given occupation who have a college education). Samples containing data from 2006-2007 were previously based on analysis of the 2006 1-year data. Components for these variables are now calculated separately for each sample.
- Some replicate weight values (REPWT and REPWTP) in the original Census Bureau 2006 and 2007 ACS/PRCS data files are negative. However, many statistical procedures balk at negative weights. In this latest revision, IPUMS has recoded all REPWT and REPWTP values to 0 where they were less than 0 in the original Census Bureau file. This affects very few cases, typically fewer than 40 for any one set of replicate weights. According to Census Bureau staff, this introduces no bias into replicate standard errors, and tests of IPUMS data confirmed this.
- In the 1850-1870 samples, blanks on the REALPROP and PERSPROP items are now coded as 0. Some of the samples previously coded blanks as 999999.
- For household fragments in the 1910 1% data that were reunited with their proper household (SAMPRULE = 6), persons that fell outside the sampling window now receive PERWT, SLWT, and PERWTDET values of 0, in accordance with IPUMS documentation. Previously, such persons were inadvertently given the same weights as everyone else in the sample. This affects 573 household fragments.
- HWSEI, PERWTDET, and HHWTDET, which have two implied decimal places, are now divided by 100 automatically in the command setup files. EDSCOR50, EDSCOR90, ERSCOR50, ERSCOR90, NPBOSS50, NPBOSS90, PRESGL, and PRENT, all of which have one implied decimal place, are now divided by 10. Previously, users were required to perform these calculations.
- Year of naturalization (YRNATUR) now indicates the full four-digit year, rather than the last three digits (as was the case previously).
- The names of some disability variables available in the ACS have been streamlined, and minor errors in the IPUMS documentation have been corrected; for more information, see our page on the 2008 ACS/PRCS.
- The coding of BUILTYR2 (age of structure) has been changed. Previously, higher values of this variable represented older structures, and buildings constructed most recently had lower values. Because that scheme required the most recent year of construction to be coded as 1, each additional year of data would have forced inconvenient code changes. Now, higher values of this variable represent younger structures, and future revisions will merely add (not change) codes.
- For 2005-2007 ACS and PRCS samples, the data quality flag for CONDOFEE (QCONDOFE) contained data for QCOSTGAS instead. This has been corrected.
Added samples.
Edited variables.
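The NPBOSS and ERSCOR notes above refer to "the standard formula for calculating medians from grouped data" without spelling it out. The sketch below assumes this means the usual textbook linear interpolation within the median class, median = L + ((N/2 - F) / f) * w, where L is the lower bound of the class containing the median, F the cumulative count below it, f the count in it, and w its width. Whether IPUMS used exactly this variant is an assumption; the function name `grouped_median` and the bins are hypothetical.

```python
def grouped_median(bins):
    """Estimate a median from grouped data by linear interpolation.

    bins: list of (lower_bound, upper_bound, count) tuples, sorted by lower_bound.
    Counts may be weighted (non-integer) totals.
    """
    total = sum(count for _, _, count in bins)
    if total <= 0:
        raise ValueError("grouped_median needs at least one observation")
    half = total / 2
    cumulative = 0
    for lower, upper, count in bins:
        # The median class is the first non-empty class whose cumulative
        # count reaches half of the total.
        if count > 0 and cumulative + count >= half:
            return lower + (half - cumulative) / count * (upper - lower)
        cumulative += count


# Hypothetical income brackets: (lower, upper, number of workers).
example_bins = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]
example_median = grouped_median(example_bins)
```

Interpolating inside the class gives distinct medians to occupations that would otherwise tie at the same bracket value, which is the source of the reduced ties described in the ERSCOR note.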
October 9, 2009
- Added two new variables for 1850-1930. In these samples, dwellings could include more than one household. Households have always been uniquely identified by SERIAL; the new variable DWELLING is a unique identifier for dwellings. Within each value of DWELLING, there may be more than one SERIAL. The new variable DWSEQ indicates the order in which households were enumerated within the dwelling. For more information, please see the variable descriptions.
Added variables.
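To illustrate how DWELLING, SERIAL, and DWSEQ relate, the sketch below groups household records by DWELLING and lists each dwelling's households in enumeration order. The records and the helper name `households_by_dwelling` are hypothetical, and the sketch assumes all records come from a single sample.

```python
from collections import defaultdict

# Hypothetical household records from one sample: two households share dwelling 1.
households = [
    {"SERIAL": 101, "DWELLING": 1, "DWSEQ": 2},
    {"SERIAL": 100, "DWELLING": 1, "DWSEQ": 1},
    {"SERIAL": 102, "DWELLING": 2, "DWSEQ": 1},
]


def households_by_dwelling(rows):
    """Map each DWELLING to its SERIAL values, ordered by DWSEQ."""
    grouped = defaultdict(list)
    for hh in rows:
        grouped[hh["DWELLING"]].append(hh)
    return {
        dwelling: [hh["SERIAL"] for hh in sorted(members, key=lambda hh: hh["DWSEQ"])]
        for dwelling, members in grouped.items()
    }


result = households_by_dwelling(households)
```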
October 8, 2009
- Added higher-density samples for 1880 and 1900. The 1880 10% sample has replaced the preliminary 1880 5% sample, while the 1900 5% sample has replaced the preliminary 1900 2.5% sample. The final samples contain all cases from the preliminary samples (which came from odd-numbered microfilm reels only) as well as new cases from even-numbered microfilm reels. For more details, see the sample description page for 1880 and 1900.
Added samples.
August 11, 2009
- Posted new versions of all linked data samples. The LINKWT variable has been corrected in all samples. Due to a processing error, LINKWT values were too low by a factor ranging from 2 to 50. Any data downloaded previously should be replaced with these new data.
Edited samples.
June 17, 2009
- Additional detail is now available on total family income (FTOTINC). Previously, this variable was part of the household record and described only members of the primary family, those persons related to the head (FAMUNIT = 1). FTOTINC is now part of the person record and describes the family income of everyone in the household, even if they are unrelated to the head. The width of FTOTINC has been increased from 6 digits to 7 digits, which provides additional detail on very high incomes; however, users should remember that such incomes have been top-coded by the Census Bureau. Users should note two minor errors in the previous version of FTOTINC that have now been corrected; both affected only 2000-2007 ACS and 2005-2007 PRCS data:
- For household heads living with at least one nonrelative but no relatives, the IPUMS adjustment factor was erroneously applied twice, resulting in FTOTINC values that were slightly larger than they should have been for everyone in those households.
- FTOTINC values were too small by a factor of 10 for families with a total family income of $999,999 or more.
- Additional detail is also available on total household income (HHINCOME): the width of this variable has also been increased from 6 digits to 7 digits. Users should note one minor error in the previous version of HHINCOME that has now been corrected: in households where applying the IPUMS adjustment factor pushed HHINCOME values from below $999,999 to at least $999,999, HHINCOME values were too small by a factor of 10. The error affected only 2000-2007 ACS and 2005-2007 PRCS data.
- For the 2000 Census samples and ACS samples, the variables EDUC99 and EDUC00 were changed to correct errors found in the data dictionary. In EDUC99, respondents in the 2000-2004 ACS samples who were previously coded as having an 'Associate degree - occupational program' are now coded as 'Some college, no degree'. A value for 'Associate degree, type not specified' was added in EDUC99 to classify respondents with associate degrees in the 2000 samples and the ACS. Respondents with a value of 'Associate degree, academic program' in the 2001-2004 ACS samples are now coded as 'Associate degree, type not specified'. In EDUC00, respondents in the 2000-2004 ACS samples who were previously coded as having an 'Associate degree - occupational program' are now coded as 'One or more years of college but no degree'. Respondents with a value of 'Associate degree, academic program' are now coded as 'Associate degree'.
Edited variables.
June 11, 2009
- Minor correction to 2007 ACS/PRCS data. In the 2007 ACS (and the 2007 cases in the 2005-2007 ACS 3-year file), some cases in Florida had missing values of PROPINSR. These are now coded as 9999, which is the correct PROPINSR topcode for Florida. The documentation of topcodes has also been updated to reflect this change.
Edited variables.
May 29, 2009
- In the 2000 PUMS and all ACS/PRCS data, persons with OCC codes of 384 ("miscellaneous law enforcement officers") received OCC1990 codes of 405 ("housekeepers, maids, butlers, stewards, and lodging quarters cleaners"). They now receive OCC1990 codes of 423 ("Other law enforcement: sheriffs, bailiffs, correctional institution officers"). (Note that this change diverges from the BLS working paper on which OCC1990 is based.)
- In all ACS/PRCS data, persons related to the household head were erroneously coded as 0 (N/A) for POVERTY if their total family income (FTOTINC) was negative. They now receive the proper codes of 1.
- In all PRCS data, POVERTY values for all cases were based on IPUMS calculations from topcoded income data. For members of the primary family in the household, POVERTY values now reflect the original Census Bureau values (based on non-topcoded income data), in accordance with IPUMS' treatment of ACS data. The effect of this alteration is small; for 92 percent of such cases, POVERTY values change by no more than three percentage points. For unrelated individuals and members of any secondary families, POVERTY values continue to be based on IPUMS calculations (see the variable description for background).
- In the 2006 ACS (and the 2006 cases in the 2005-2007 ACS 3-year file), group-quarters residents were erroneously coded as missing for QMOVEDIN (the data quality flag for MOVEDIN). They are now coded as 0.
Edited variables.
May 13, 2009
- Added four new variables describing subfamilies to 1880-2007 IPUMS samples: SFTYPE (subfamily type), SFRELATE (relationship within subfamily), SUBFAM (subfamily membership), and NSUBFAM (total number of subfamilies in the household). For more information, see the subfamilies overview page.
- Also, documentation for other family interrelationship variables has been updated to conform to longstanding IPUMS procedures:
- When linking under the third rule for MOMRULE or POPRULE, the IPUMS uses an additional condition in surveys where respondents can give multiple race responses (2000, ACS, and PRCS): persons for whom a single race is listed may not be linked to potential parents of a different race. Users should note that this condition has long been applied to 2000 and ACS data, but is now applied to the PRCS for the first time.
- Persons receive STEPMOM codes of 1 when the difference in ages between them and their mother is less than 12 years or greater than 54 years--not less than 15 years or greater than 49 years, as the documentation previously stated.
- Persons receive STEPPOP codes of 1 when the difference in ages between them and their father is less than 14 years--not less than 15 years or greater than 64 years, as the documentation previously stated.
Added variables.
Edited variables.
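The age-difference conditions stated above for STEPMOM and STEPPOP codes of 1 can be written out as a minimal sketch. The function names are hypothetical, and the sketch covers only the age thresholds quoted in this note; it is not the IPUMS linking procedure, which involves other rules and codes.

```python
def stepmom_age_condition(person_age: int, mother_age: int) -> bool:
    """True when the mother-child age difference falls outside 12..54 years."""
    diff = mother_age - person_age
    return diff < 12 or diff > 54


def steppop_age_condition(person_age: int, father_age: int) -> bool:
    """True when the father-child age difference is under 14 years."""
    diff = father_age - person_age
    return diff < 14


# A 10-year-old linked to a 21-year-old mother (difference of 11) meets the condition.
print(stepmom_age_condition(10, 21))
# A 10-year-old linked to a 24-year-old father (difference of 14) does not.
print(steppop_age_condition(10, 24))
```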
April 21, 2009
- Corrected missing values and other minor inaccuracies in several samples. First, several variables contained missing data for some cases. Missing data has been assigned to the proper codes as follows:
- In the 1880 100% database, 21 cases that were mistakenly coded as missing on ENUMDIST are now coded as "0", and SUBSAMP (formerly unavailable in these data) is now provided.
- In the 1900 1% sample (both with and without oversamples), four cases contained missing data for SUPDIST. One of these is now coded as "7", one as "14", and two as "73". Additionally, five cases contained missing data for DWSIZE. Two of these are now coded as "3", one as "5", and two as "7".
- In the 1920 Puerto Rico sample, two cases that were mistakenly coded as missing on ENUMMO are now coded as "01".
- In the 1930 1% sample, values of IND1930 and OCC1930 that contained non-numeric characters were mistakenly coded as missing; the proper values are now available.
- In the 1950 sample, missing data for WKSWORK1 has been assigned to "00" (N/A).
- In the 1980 urban/rural sample, cases that were coded as "3570" (Lexington, KY) for CITY have been switched to "3590" (Lexington-Fayette, KY) to account for the 1974 merger of Lexington and Fayette County. Additionally, city populations (CITYPOP) are now identified for this city as well as for city codes 6410 (Scranton, PA) and 6650 (Springfield, IL).
- In the 1980 labor market sample, the variables MIGCZ5 and PWCZ were mistakenly coded as missing for a large number of cases; the proper values are now available.
- Second, all housing units with 10 or more persons unrelated to the household head have been re-classified as group quarters in all American Community Survey and Puerto Rican Community Survey samples, consistent with the treatment of such households in the 2000 census. For more information, see GQ. The cases in such housing units are now coded as 5 in the GQ variable and 9 in the GQTYPE variable (900 in the detailed version GQTYPED).
- Third, information on variable availability has been updated as follows:
- The 2000 1% Puerto Rico sample does not contain PUMA information, and all cases were coded as missing for this variable. PUMA may no longer be downloaded with this sample.
- All ACS/PRCS samples now include GQTYPE to accommodate the aforementioned change in GQ coding (see above).
- Finally, the IPUMS variables FDSTPAMT and OWNCOST are now adjusted to calendar-year dollars in all ACS/PRCS samples; see the ACS income variables note for more information.
Edited variables.
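The recoding described above for housing units with 10 or more persons unrelated to the household head can be sketched as follows. This is an illustration only: the function name and the idea of passing in a precomputed count of unrelated persons are assumptions, and only the resulting codes (GQ = 5, GQTYPE = 9, GQTYPED = 900) come from this note.

```python
from typing import Tuple


def recode_large_unrelated_unit(
    n_unrelated_to_head: int, gq: int, gqtype: int, gqtyped: int
) -> Tuple[int, int, int]:
    """Return (GQ, GQTYPE, GQTYPED) after applying the 10-or-more rule.

    n_unrelated_to_head is a hypothetical input: the number of persons in
    the housing unit who are unrelated to the household head.
    """
    if n_unrelated_to_head >= 10:
        return 5, 9, 900
    # Units below the cutoff keep whatever codes they already had.
    return gq, gqtype, gqtyped


# A unit with 12 unrelated persons is recoded; a unit with 3 is left alone.
print(recode_large_unrelated_unit(12, 1, 0, 0))
print(recode_large_unrelated_unit(3, 1, 0, 0))
```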
April 1, 2009
- Improved and updated the coding of in-laws in the 2000-2007 American Community Survey (ACS) and 2005-2007 Puerto Rican Community Survey (PRCS) samples. In these samples, the Census Bureau's relationship variable includes only a global "in-law" category. IPUMS attempts to provide a more detailed classification of parents-in-law, siblings-in-law, and children-in-law in the RELATE variable. The new release of the ACS and PRCS datasets improves the procedures for making these detailed in-law assignments. More information on the new procedures is available here. Additionally, users should take note of three coding errors in the old classification scheme that have been corrected and/or no longer apply in the new classification scheme:
- Many never-married in-laws, all of whom should have been classified as siblings-in-law under the old classification scheme, were instead classified as parents-in-law or children-in-law. This condition no longer applies.
- In households containing unmarried partners of the head, the classification of in-laws departed from the stated rules and was likely to be particularly inaccurate. This is no longer the case.
- In the 2005-2007 PRCS, all in-laws were mistakenly classified as parents-in-law. This has been corrected.
- Additionally, in the 2005-2007 ACS and PRCS 3-Year samples, the person weights (PERWT) for individuals in group quarters were not copied to the household weight variable (HHWT). This has been corrected.
Edited variables.
March 5, 2009
- Posted the 2005-2007 American Community Survey/Puerto Rican Community Survey 3-year file. This file includes all cases from the previously-released single-year files from the 2005-2007 ACS/PRCS. The new 3-year file differs in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2007 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files.
Added samples.
March 1, 2009
- PRCS data from 2005-2007 have been altered to resolve small coding differences across survey years. All 1,093 cases previously coded as 2 on the ABSENT variable in the 2006 and 2007 PRCS single-year files are now coded as 3, and one individual previously coded as 899 on the RACE variable in the 2005 PRCS single-year file is now coded as 943.
- There was also a slight change in the PRCS's main immigration variable. PRCS data previously available in YRSUSA1 has been shifted to YRSPR, and the flag associated with this variable has changed from QYRIMM to QYRSPR.
Edited variables.
February 9, 2009
- Posted new versions of linked data samples for males from 1860-1880 and 1870-1880. A middle initial mismatch filter that previously had not been applied properly has now been applied, removing 303 cases from the 1860 data and 302 cases from the 1870 data.
Edited samples.
December 20, 2008
- Posted new versions of the linked data samples for males from 1850-1880, 1880-1900, and 1880-1910:
- Changes to the 1850-1880 data: the dataset was increased by 49 records. Some records were removed and some added as the result of 1) rerunning one of the classifiers and 2) properly applying a middle initial mismatch filter.
- Changes to the 1880-1900 and 1880-1910 data: 282 cases were removed from 1900 and 215 cases were removed from 1910, after applying a middle initial mismatch filter that previously had not been applied properly.
Edited samples.
December 11, 2008
- Posted remaining linked data samples. Also posted new versions of samples linking couples from 1870-1880 and 1880-1910. In the 1870 data, 10 cases that previously had a LINKWT of 0 were given the correct non-0 LINKWT values. In the 1910 data, 140 cases that had LINKWT values greater than 5 were assigned values of 5 (the maximum allowable LINKWT).
Added samples.
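The LINKWT adjustment described above for the 1910 data amounts to capping each weight at the maximum allowable value of 5. A minimal sketch, assuming the weights are available as a plain list of numbers (the function name and input layout are hypothetical):

```python
from typing import List

MAX_LINKWT = 5.0  # maximum allowable LINKWT, per the note above


def cap_linkwt(weights: List[float]) -> List[float]:
    """Replace any weight above the maximum with the maximum itself."""
    return [min(w, MAX_LINKWT) for w in weights]


# Weights at or below 5 pass through unchanged; larger ones become 5.
print(cap_linkwt([0.8, 5.0, 7.25]))
```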
November 11, 2008
- Posted new versions of samples for 1970-2007. Improvements were made to the 1970 samples to correct the variable INCOTHER. Samples from 1980-2007 were expanded to include the variable OWNCOST.
- Posted new version of the 1880 100% database. Fixed problems with the MCDSTR and PAGENO variables. Group quarters units containing more than 60 people were split into 1-person households. Researchers needing to study these units intact can use SERIAL80 and PERNUM80.
Edited variables.
October 11, 2008
- Added 1880 100% population database. This dataset was originally entered for genealogical purposes by the Church of Jesus Christ of Latter-day Saints (LDS). Data cleaning and harmonization took place at the Minnesota Population Center (MPC). Versions of this data are also available from the MPC's North Atlantic Population Project and the LDS's genealogical website FamilySearch.org.
- The IPUMS-USA version of the data contains fully integrated codes and labels, newly-constructed family inter-relationship variables, and missing data allocation for key demographic variables. Since the dataset was first constructed for genealogy, several variable groups were never entered. Excluded variables include items relating to school, literacy, unemployment, disability, month of birth, marriage within the past year, and street address. The most detailed geographic variables are MCDSTR and INCSTR.
- Added 2.5% preliminary sample of the 1900 census. This sample is "preliminary" because the final version will contain 5% of the population. The preliminary sample includes data only from odd-numbered microfilm reels. Counties on even-numbered reels are not represented in this dataset. Alaska and Hawaii are also excluded from the preliminary dataset. The final 5% dataset will be released in early 2009.
Added samples.
September 26, 2008
- Added 1% samples from the 2007 American Community Survey (ACS) and the 2007 Puerto Rico Community Survey (PRCS). The samples have approximately three million person records. The 2007 ACS is the second ACS sample to provide information on group quarters, which can be identified in the GQ variable. Researchers analyzing multiple ACS samples over time should remove group quarters cases, since they are available only in the 2006-2007 data.
- The lowest level of geographic identifier in the 2007 ACS is the PUMA; 2007 PUMAs have the same boundaries as those in the 2005-2006 ACS and the 2000 census samples. The IPUMS version of the 2007 ACS provides the following additional geographic identifiers: CITY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. These variables were constructed at the University of Minnesota and are not available via the Census Bureau.
- Note that the name of the IPUMS variable describing military service September 2001 and later has been changed from VET01X03 to VET01LTR. This name more accurately reflects the information contained in the variable.
- More information on the background of and future plans for the American Community Survey is available at the ACS information page.
Added samples.
April 11, 2008
- Posted IPUMS Version 4.0, the first major revision of the IPUMS files since 2004. Includes revised versions of all samples from 1850-1930, a new 1880 5% sample, and 13 new samples from the Puerto Rican Censuses of 1910-2000 and the Puerto Rican Community Survey.
- IPUMS 4.0 contains many new variables, including long-term Hispanic identification back to 1850 (HISPAN), a consistent single-race identification variable from 1850-2006 (RACESING), a battery of socioeconomic indices, original strings for occupation (OCCSTR) and industry (INDSTR), new detailed weight variables for the historical samples (HHWTDET and PERWTDET), and new standardized low-level geographic identifiers (MCD and INCORP). More information is available on the IPUMS 4.0 release page.
- The most recent previous version of IPUMS data and documentation (IPUMS 3.0) is still available via the IPUMS archive page at ICPSR. The archive page permits users to revise old extracts, create new extracts, and download data and documentation. The link titled "IPUMS-USA website as of March, 2008" leads to a fully-functioning mirror of the IPUMS website as it existed prior to the release of IPUMS 4.0. The archive page contains versions of the website from previous years as well.
Added samples.
Added variables.
February 14, 2008
- Posted a new version of the 1950 census sample, with a correction made to the BPL variable. Several cases that had been erroneously coded "Missing/blank" are now coded correctly as follows: 94 cases coded "Israel," 9 coded "Byelorussia," and 3 coded "Pakistan." In the 2000 census samples, changed the MIGMET5 code for Hattiesburg, MS from 3285 to 3300 to be consistent with our METAREA coding.
- Re-released VALUEH for the 2006 ACS sample; during a recent website update, VALUEH had inadvertently been removed from the data extract system.
Edited variables.
December 14, 2007
- Posted new versions of the 2005 and 2006 ACS samples: released CITYPOP for both samples. Fixed a small error in QCONDOFE and QVALUEH in the 2006 sample. Prior to the update, a small number of cases had missing values for these two variables.
- Posted new versions of the 1% and 5% census samples for 2000: fixed PUMALAND, PUMAAREA, and ACREPROP. Prior to this correction, these three variables contained incorrect data.
Edited variables.
November 15, 2007
- Posted new versions of all ACS samples; a correction was made to the BUILTYR2 variable. Previously, households built in 1939 or earlier (BUILTYR2 = 10) were grouped with those reported as being built in 2005 or later (BUILTYR2 = 1).
Edited variables.
October 15, 2007
- Added a 1% sample from the 2006 American Community Survey (ACS). The sample has approximately 2,970,000 person records. The 2006 ACS is the first ACS sample to provide information on group quarters, which can be identified in the GQ variable. Researchers analyzing multiple ACS samples over time should remove group quarters cases, since they are available only in the 2006 data.
- The lowest level of geographic identifier in the 2006 ACS is the PUMA; 2006 PUMAs have the same boundaries as those in the 2005 ACS and the 2000 census samples. The IPUMS version of the 2006 ACS provides the following additional geographic identifiers: CITY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. These variables were constructed at the University of Minnesota and are not available via the Census Bureau.
- More information on the background and future plans for the American Community Survey is available at the ACS information page.
Added samples.
August 17, 2007
- Added HHTYPE for all samples from 1940 to 2005. In the future, HHTYPE will also be made available for all samples from 1850-1930.
Expanded variables.
July 25, 2007
Expanded variables.
July 19, 2007
- Posted updated versions of all samples from 1900, 1910, and 1930. Corrections were made to the CHSURV variable in the 1900 and 1910 samples. All values were previously 0 or 1. The updated samples contain correct values. Corrections were made to the NSIBS variable in the 1900, 1910, and 1930 samples. Previously, a small number of persons identified as "siblings" in the RELATE variable (code 701) incorrectly received a value of 0 for NSIBS. This error has been corrected.
Edited variables.
July 10, 2007
- Posted updated versions of all samples from 1970 and 1980, and the ACS samples from 2000, 2001, and 2002. The updated samples include fixes to COSTELEC, COSTGAS, COSTFUEL, and COSTWATR. These variables did not properly identify cases having values of greater than 9990. All cases in this range are in the universe but have unreported values, usually because utility costs were included in rent payments. The old versions of the datasets incorrectly identified these cases as not being in the universe. COSTELEC and COSTGAS had the additional problem of presenting monthly values instead of annual values. These problems are now fixed.
- The new 1980 5% sample additionally fixes a problem in the CITY variable. In the old sample, San Francisco was incorrectly identified. It has been corrected.
Edited variables.
June 21, 2007
- Posted an updated version of the 1930 1% sample. The updated sample includes fixes of minor problems in OCCSCORE (missing occupation data was not being allocated), YRSUSA2 (some allocated values were inconsistent with YRSUSA1), and QMARST (this variable indicated that we made more logical edits than were actually made).
- Released new version of the 2005 American Community Survey sample that includes 160 replicate weight variables (see REPWT and REPWTP).
- Released CITYPOP for 1850-1930 samples. Due to a technical problem, we had not been offering CITYPOP in these samples since February 2007. The CITYPOP values that we are providing now are not different from the values that were available prior to February.
- Released new data extraction system with the "Attach Variables" feature, which allows researchers to create variables specifying characteristics of respondents' spouses, mothers, fathers, and household heads.
Edited variables.
Edited extract system.
June 7, 2007
Added samples.
April 26, 2007
- Added new occupation crosswalks (OCC to OCCSOC) for the 2000 census samples and the ACS samples; these are available via links from our Occupation and Industry documentation page. Also improved our OCC and OCCSOC code lists (available from the respective variable descriptions) for the 2000 census and ACS samples.
Edited variables.
April 24, 2007
- Posted a new version of the 2005 ACS; a correction was made to the MORTAMT1 variable.
Edited variables.
April 9, 2007
- Added Consistent PUMA variable and shapefiles (see CONSPUMA). CONSPUMA reconciles differences in low-level geographic identifiers in the 5% samples from 1980, 1990, and the 2005 ACS. Also released all new shapefiles for low-level geographic identifiers from 1970-2005. Changes to the previous shapefiles were minor: numerous "holes" in the maps were assigned to their appropriate PUMA, County Group, or SEA. All files are accessible via the links on our geographic tools page.
Added variable.
March 27, 2007
- Changed the name of the RACHIST variable to RACESING.
Edited variables.
March 21, 2007
- Added QHISPAN, the data quality flag for HISPAN, to the and ACS samples. Posted new versions of the 1940 and 1950 samples: a minor correction was made to the CHBORN variable.
- Posted new versions of the 1910 samples: we corrected a problem with SERIAL so that households within multi-household dwellings are now uniquely identified. The problem had affected less than 0.13% of households in the 1910 1.4% sample.
Edited variables.
February 15, 2007
- Created HISPAN and HISPRULE variables for the 1900 and 1930 samples. A later data release will create these variables for the 1850-1880 and 1910-1920 samples.
Added variables.
Edited variables.
January 31, 2007
- Released new harmonized occupation and industry variables for 1950-2005: OCC1990 and IND1990. The OCC1990 variable was created in collaboration with researchers at the Bureau of Labor Statistics. Both variables are available only via the IPUMS.
- Added metropolitan area designations to the 2003 ACS, in METAREA and MET2003. Metropolitan areas are also identified in the 2005 ACS IPUMS sample.
- Created HISPAN and HISPRULE variables for the 1940-1970 samples.
- Added RACHIST values to the 1950-1990 samples and the 2005 ACS IPUMS sample. RACHIST adapts an algorithm developed at the National Center for Health Statistics to assign single races to persons who reported more than one race from 2000 onward.
Added variables.
Expanded variables.
December 19, 2006
- Replaced the 1-in-250 1910 sample with two new samples: the 1910 1% sample and the 1910 1.4% sample with oversamples. The 1% sample includes a 1-in-100 national population sample, including Alaskans, Hawaiians, and persons enumerated on the American Indian Schedules. The 1.4% sample with oversamples includes a 1-in-70 national population sample that has been combined with large oversamples of Blacks, Hispanics, Alaskans, Hawaiians, and persons enumerated on the American Indian schedules. The 1910 Weighted sample must be used with weighting variables (see PERWT and HHWT).
- Replaced the 1900 General sample with two new samples: the 1900 1% sample and the 1900 1% sample with oversamples. The 1900 1% sample is a 1-in-100 national sample, including Alaskans, Hawaiians, and persons enumerated on the American Indian Schedules. This sample has the same cases as the former "1900 General sample" did, though some variables and values have been modified in minor ways. The 1900 1% sample with oversamples is a 1% national sample that has been merged with 1-in-5 oversamples of Alaskans, Hawaiians, and persons enumerated on the American Indian schedules. The 1900 1% sample with oversamples must be used with weighting variables (see PERWT and HHWT).
- More information about these samples is available in the 1900 and 1910 sections of the sample descriptions page. We expect to release a revised version of these samples in March 2007. The revised samples will include detailed geography at the minor civil division level and integrated versions of variables specific to the Alaskan, Hawaiian, and American Indian populations.
Added samples.
December 10, 2006
- Posted new version of the 2005 ACS sample that includes the following new geographic identifiers: CITY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. These variables were constructed at the University of Minnesota and are not available via the Census Bureau.
Expanded variables.
November 29, 2006
- Posted new versions of all samples from 1940 through 2005. The new samples include several minor improvements to SPLOC, MOMLOC, and POPLOC. These modifications have resulted in minor changes to the constructed household variables, the family interrelationship variables, POVERTY, and FTOTINC. Detailed information on these variables can be found in the family interrelationships documentation.
- An error was corrected in the POVERTY variable for all samples. In two-person families where one person was over age 65 and the other person was under age 65, we sometimes used slightly different poverty thresholds for each member of the family. We should have applied the same threshold to both members of the family. This resulted in several thousand cases in each sample having a poverty value that was off by an average of two percent (10 points on POVERTY's 1-500 scale). We have corrected the problem.
- The new samples also include a small number of corrected income values in the 1950, 1960, and 1970 samples. The majority of cases affected have negative income values.
Edited variables.
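The POVERTY correction described above comes down to using one threshold for every member of a family, so that all members receive the same value. The sketch below illustrates the arithmetic only. The income and threshold figures are made up for the example, and the function names and rounding are assumptions rather than the IPUMS procedure.

```python
from typing import List


def poverty_value(family_income: float, threshold: float) -> int:
    """Family income as a percentage of the poverty threshold, rounded."""
    return round(100 * family_income / threshold)


def family_poverty_values(
    family_income: float, threshold: float, n_members: int
) -> List[int]:
    """Apply the same threshold to every family member."""
    value = poverty_value(family_income, threshold)
    return [value] * n_members


# Hypothetical two-person family: with a single threshold both members match.
print(family_poverty_values(15000, 10000, 2))
# With slightly different thresholds per member (the old error), they diverge.
print(poverty_value(15000, 10000), poverty_value(15000, 10200))
```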
November 20, 2006
- Corrected a problem in the RACE variable in the 2005 ACS sample. There were approximately 3,000 cases with missing values. All of the cases were multi-racial persons. All cases are now assigned to the appropriate categories.
Edited variables.
October 11, 2006
- Posted 1% sample from the 2005 American Community Survey (ACS). The 2005 sample is the first ACS microdata to identify sub-state geography, including PUMA, MIGPUMA1, and PWPUMA00. The IPUMS version of the 2005 ACS also identifies metropolitan status (METRO). A December 2006 release of the IPUMS 2005 ACS sample will identify CITY, METAREA, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWMETRO, PWCITY, PWTYPE, and PWPUMAS. These variables are being constructed at the Minnesota Population Center and are not available via the Census Bureau.
- The base data for the IPUMS 2005 sample is the ACS data that the Census Bureau released on October 5th, 2006. The Census Bureau had originally released a version of the dataset on September 11th, 2006. The September release contained several small errors, so the Census Bureau updated the dataset in October. The erroneous dataset was never available via the IPUMS data extraction system.
- More information on the background and future plans for the American Community Survey is available at the ACS information page.
Added samples.
October 1, 2006
- Posted 0.5% sample from the 1930 census (up from the previous 0.2% 1930 sample).
Added samples.
September 6, 2006
- Posted new version of IPUMS-USA website. The website has a new design, and the content of most variable descriptions has changed at least slightly. Users can still access all extract requests made on the old website.
Edited website.
June 30, 2006
- Posted new versions of the 2000 1%, 5%, and Unweighted samples: a correction was made to the MIGPLAC5 variable.
Edited variables.
April 27, 2006
- Posted new versions of all ACS samples: a correction was made to the INCBUS00 variable.
Edited variables.
April 7, 2006
- Posted new versions of all 2000 Census samples and all ACS samples: a correction was made to the OCC variable.
Edited variables.
January 20, 2006
- Posted new versions of the 2000 1%, 5%, and Unweighted samples, as well as the 2000 ACS: a correction was made to the MARST variable.
Edited variables.
November 30, 2005
- Posted 13 new samples on the IPUMS-USA website. All samples were previously available on the IPUMS-USA Beta site, which was shut down. The new samples combined add nearly 15 million cases to the IPUMS database. For more details on this data release, see the sample information page.
Added samples.
October 7, 2005
- Posted new versions of the 2000 1%, 5%, and Unweighted samples: a correction was made to the VET55x64 variable. New versions of the 2000-2004 ACS samples were also posted. In all eight samples above, improvements were made to the INDNAICS and OCCSOC variables.
- Corrections were made to the YRIMMIG, YRSUSA1, and YRSUSA2 variables in the ACS 2001-2004 samples.
Edited variables.
September 16, 2005
- Released the 2004 American Community Survey (ACS) sample on the IPUMS Beta site.
Added samples.
September 9, 2005
- Released a new 2000 1% flat sample on the IPUMS Beta site. This is a national random sample drawn from the 2000 5% Census sample.
Added samples.
September 1, 2005
- Posted a new version of the 1930 1-in-500 sample. Corrections were made to the VET1930 variable and the AGEMARR variable.
Edited variables.
June 27, 2005
- Posted new versions of the 2000 1% and 5% samples, and the 2000-2003 ACS samples. Added the following variables: RACHIST, PROBAI, PROBAPI, PROBBLK, PROBOTH, and PROBWHT. RACHIST is an historically compatible race variable which 'bridges' multiple-race responses into their most likely single race category. The other variables give detailed probabilities of each single-race response and are best used in combination with one another.
Added variables.
Edited variables.
May 20, 2005
- Released a revised version of the 1-in-100 sample of the 1900 census (see the August 21, 2003 revision note for information on the previous version of this sample). The revised dataset includes records extracted from Alaska, Hawaii, and the American Indian 1-in-5 oversamples (the complete oversample datasets are available via the IPUMS raw data download page).
- Users should also be aware that the smaller 1900 sample previously available (the 1-in-750 "Preston" sample) will no longer be available via the IPUMS extract system. Users wishing to access this data can still download the entire dataset and SPSS command file via the IPUMS raw data download page.
Edited samples.
May 13, 2005
- Released a revised version of the preliminary 1-in-500 sample of the 1930 census. Corrected a major error in the race variable. The April 25th sample gave the "White" code (detailed race code 100) to all persons who reported their race as "Mexican." The revised sample gives these persons the new "Mexican" race code (detailed race code 140). The revised sample also corrects minor coding and labelling errors in the following variables: RENT30, GQTYPED, NUMHHTAK, FARMSCHD, ENUMMO, RADIO, HOMEMKR, VET1930, IND1950, MTONGUE, FBPL, MBPL, CITY, METRO, METAREA, URBAREA, and MDSTATUS.
Edited variables.
April 25, 2005
- Released preliminary 1-in-500 sample of the 1930 census. We expect to release a final 1-in-100 sample of the 1930 census by late 2007.
Added samples.
February 23, 2005
- Posted new versions of the 2000-2003 ACS samples: a correction was made to the STATEICP variable.
Edited variables.
February 1, 2005
- Removed the POV2000 variable from the documentation and data. POV2000 was redundant with the IPUMS POVERTY variable. Both variables use the poverty matrix developed by the Social Security Administration in 1964 (and revised twice in the years since). The Office of Management and Budget's Directive 14 prescribes this definition as the official poverty measure for federal agencies to use in their statistical work.
Edited variables.
November 23, 2004
- Released the following samples on the IPUMS Beta site: the 2003 American Community Survey (ACS) sample, the 1990 Labor Market Areas sample, the 1980 Labor Market Areas sample, and the 1980 Detailed Metro/Nonmetro sample.
Added samples.
October 13, 2004
- Posted new versions of the 2000 1% and 5% samples, and the 2000-2002 ACS samples. The following variables were improved: OCC1950, SEI, OCCSCORE, and IND1950. The new variables utilize the Census Bureau's recently published occupation and industry crosswalks between the 1990 and 2000 censuses.
- Made a slight correction to the multipliers used to construct the POVERTY variable in the 2000-2002 samples (for more information see the 1990 poverty status definition).
Edited variables.
August 27, 2004
- Posted a new version of the 2000 5% sample: a correction was made to the METAREA variable.
Edited variables.
August 6, 2004
- Posted a new version of the 2000 5% sample: a correction was made to the PWCITY variable.
Edited variables.
June 28, 2004
- Posted new versions of all the 2000 and ACS samples. The RACE variable has been expanded to incorporate all information from the new multiple-race variables. Details about multiple-race responses are now included, some value labels were clarified, and a few other categories were added. Also, CITYPOP was added to the 2000 1% and 5% samples, and corrections were made to MOBLHOME and METAREA.
Edited variables.
June 17, 2004
- Released American Community Survey (ACS) samples for 2000, 2001, and 2002 on the IPUMS Beta site.
Added samples.
May 6, 2004
- Made 2000 5% sample available via the main IPUMS-USA site.
Added samples.
May 1, 2004
- Posted new versions of all of the 2000 samples. The 2000 5% sample now includes variables for Super-PUMA of Work (PWPUMAS) and Super-PUMA of Migration (MIGPUMAS). For the 2000 1% sample, Super-PUMA information that was previously in the PWPUMA00 and MIGPUMA variables is now in the new PWPUMAS and MIGPUMAS variables. A new version of the INCRETIR variable in all three 2000 samples now includes retirement incomes of greater than $99,998 (the previous topcode). All three samples include a corrected version of the POV2000 variable. Posted new versions of all 1990 samples that account for the greater width of INCRETIR (see above).
Edited variables.
April 22, 2004
- Posted a new version of the 2000 1% sample: a correction has been made to the MIGCITY5 variable.
Edited variables.
March 10, 2004
- Posted new versions of the 2000 1% sample and the 2000 5% sample. Both samples now include the PWCITY variable. For those living in group quarters, the variable HHWT now has the PERWT value, rather than a value of 0. In addition, corrections were made to the following variables: BPL, STEPMOM, STEPPOP, MARST, and PUMASUPR.
- Posted new versions of the 1990 State, Metro, Elderly, and Unweighted samples. A problem in the MORTGAGE variable was corrected in the new samples.
Edited variables.
January 30, 2004
- Posted new versions of the 2000 1% sample, the 2000 5% sample, and the Census 2000 Supplementary Survey (C2SS). The 2000 1% and 5% samples now include variables for CITY and MIGCITY5. Minor problems in PWPUMAS, PWPUMA00, MIGPUMA, YRIMMIG, and MORTGAGE have also been corrected in the new samples. The new C2SS sample includes corrected values for INCBUS00 (all values were 0 before).
Edited variables.
September 9, 2003
Edited variables.
August 21, 2003
- Penultimate 1-in-100 version of the 1900 Minnesota sample released on the IPUMS Beta site. The dataset includes 170,438 households containing 754,631 individuals. This version has a number of flaws, listed below, that will be corrected in the final version of the 1900 Minnesota sample, which we anticipate releasing in the spring of 2004. The older 1-in-200 preliminary sample is still available via the data extract system at the main IPUMS-USA site.
- No cases from Alaska and Hawaii are included in the current sample.
- Data quality flags are not yet available.
- Detailed geographic variables are not yet available (these include MDSTATUS, METDIST, URBAREA, MCD, INCPLACE, and INCORP).
- Coding is not yet complete on the occupation variable (OCC).
- Native Americans enumerated on the special 1900 Indian Schedules are not included in the current sample (although the current version does contain Native Americans enumerated as part of the general population). The 1900 Indian Schedules contained questions not asked on the general schedule, including tribe, percentage Indian blood, and tax status, among others.
- Detailed German birthplaces in the current 1900 sample are coded according to the new scheme developed for the 1860 and 1870 samples. Users of this data should note that these codes do NOT correspond to those listed in the BPL variable description. Detailed German birthplace codes for the 1860-70 and 1900 samples are available here.
Users should also be aware that the smaller 1900 sample previously available (the 1-in-760 "Preston" sample) will no longer be available via the IPUMS extract system. Users wishing to access this data can still download the entire dataset and SPSS command file via the IPUMS raw data download page.
Added samples.
October 11, 2002
- Reposted preliminary version of 1900 Minnesota sample. The previous version had incorrect values for children ever born (CHBORN). The new dataset contains corrected values. No other variables have been changed.
Edited variables.
July 11, 2002
- Final versions of the 1860 and 1870 samples released. The final 1-in-100 1860 IPUMS sample includes 54,094 households containing 273,947 free individuals and an additional 1,343 unoccupied dwellings. The final 1-in-100 1870 IPUMS sample includes 79,023 households containing 383,308 individuals and an additional 1,447 unoccupied dwellings. Frequencies in the on-line documentation will be updated in the next few months. Both the 1860 and 1870 IPUMS samples are also available with oversamples of the black population. Sample weights for the flat and black oversamples have been adjusted to be representative of the total population.
- The final 1860 and 1870 IPUMS samples now include occupation codes based on the U.S. Census Office's 1880 classification system and detailed birthplace codes for individuals born in Germany. Several other changes have also been made, including a slightly modified urban/rural definition, minor changes in birthplace and occupation coding, and small changes in personal estate and real estate values. In addition, the final samples incorporate a few data additions and subtractions from the preliminary samples. For details of these changes and a listing of the new Germany detailed birthplace codes, click here.
Edited samples.
May 7, 2002
- Released preliminary version of the 1900 Minnesota sample. This 1900 Minnesota sample is a 1-in-200 nationally representative sample of dwellings taken from the 1900 U.S. Census of Population. The final version is scheduled to be released in 2004 and will have a 1-in-100 sampling density. Frequencies for this sample will be added to the documentation in summer 2002. Currently both the 1900 Minnesota sample and the 1-in-760 1900 Preston sample are available. Ultimately the 1900 Minnesota sample will replace the 1900 Preston sample, although the Preston sample will be available by request.
- The fundamental difference between the two 1900 samples pertains to sample design. In the 1900 Preston sample nonfamily individuals--boarders, lodgers, inmates, and military personnel--were sampled as individuals regardless of household size. In contrast, the 1900 Minnesota sample follows the general sample design used for the 1850-1880 and 1920 samples. For a discussion of issues relating to sample design see Chapter 2 of the IPUMS documentation.
Added samples.
July 11, 2001
- The IPUMS extract system upgrade was successfully installed on Wednesday, July 11, 2001. No changes were made to the IPUMS data. The new extract system will process user data requests faster than the previous system and will prevent small jobs from being continually held up behind large data requests in the queue. Since this upgrade affects only the behind-the-scenes data extraction system, users will notice little change in the request process itself. Re-registration is not required; previous jobs will be available for revision; and new jobs will begin numbering from the user's last completed job in the old system.
Edited extract system.
March 7, 2001
- Released new preliminary (penultimate) versions of the 1860 and 1870 samples. Frequencies in the documentation will not be changed until release of final versions of these datasets, scheduled for summer 2002. Two versions of the 1860 and 1870 samples are now available:
- a flat 1-in-100 sample of all dwellings, and
- a black oversample containing a 1-in-50 sample of dwellings containing one or more blacks and a 1-in-100 sample of all other dwellings.
The sample weights in both the flat and black oversamples of the preliminary 1860 and 1870 PUMS have been adjusted to be representative of the total population. Although we believe that the new samples are near their final form (we expect only minor changes in the number of cases and the coding of a few variables between the current and final versions of the samples), users are advised that the current releases have a few known problems. In particular, the occupation ("OCC") variable in 1860/1870 is not coded. Users should rely on the occupation 1950 basis ("OCC1950") variable for studying occupation and labor force participation. In addition, detailed birthplace codes are not available for individuals born in Germany. Users may still use the birthplace variable (BPL), but no detail will be returned for German birthplaces.
Edited samples.
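To illustrate why the black oversample described in the March 7, 2001 entry needs different weights for different dwellings, here is a minimal sketch that assumes plain inverse-probability weighting from the two stated sampling rates. The function names and the toy data are hypothetical, and the actual IPUMS weights may include further adjustments not shown here.

```python
# Sampling densities from the entry: 1-in-50 for dwellings containing one or
# more blacks, 1-in-100 for all other dwellings.
OVERSAMPLE_DENOMINATOR = 50
BASE_DENOMINATOR = 100


def dwelling_weight(has_black_resident):
    """Inverse-probability weight for a dwelling (assumed scheme, not IPUMS code)."""
    return OVERSAMPLE_DENOMINATOR if has_black_resident else BASE_DENOMINATOR


def weighted_population(dwellings):
    """Estimate the total population from (has_black_resident, n_persons) pairs."""
    return sum(dwelling_weight(flag) * n_persons for flag, n_persons in dwellings)


# Toy sample: two oversampled dwellings and one dwelling sampled at the base rate.
sample = [(True, 4), (True, 6), (False, 5)]
estimate = weighted_population(sample)  # 50*4 + 50*6 + 100*5
```

Under this assumption, each person in an oversampled dwelling stands for half as many people as a person in any other dwelling, which keeps weighted totals representative of the whole population.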
August 18, 2000
- The old IPUMS extract system was replaced by a new system incorporating enhanced features requested by users. One of the key features of the new system is the ability to modify and resubmit previous jobs. Data files from the two systems have been combined on a user-specific summary site. IPUMS data users previously registered in either extract system will not have to reregister to use the new extract system. Extract requests in the new system will begin numbering jobs from the highest numbered job in a user's personal extract summary.
Edited extract system.
July 1, 1999
- New geographic variables (METDIST, MDSTATUS, MCIVDIV, INCSTR, INCORP, URBAREA) were added to 1850, 1880, and 1910 samples.
- Minor fixes to OCC1950, IND1950, CITIZEN, LIT, COUNTY, SEA, GQTYPE, GQFUNDS, NATIVITY, VOTE, MARRINYR, NAMEFRST, and NAMELAST.
- Missing age allocation procedures fixed to allow age 0 to be allocated. Improved rules for spouse imputation (IMPSP).
- Added cases from Bradley County, TN that had been inadvertently dropped from the 1850 sample. PERWT adjusted slightly.
Added variables.
Edited variables.
January 22, 1999
- Major error in the November 25 version of 1860 and 1870 samples corrected. The 1860/70 samples had an error in SURSIM, which in turn created errors in all the family interrelationship variables (IMPMOM, IMPPOP, IMPSP) and in the variables constructed from them (NCHILD, NCHLT5, FAMSIZE, ELDCH, and so on). The error could also have implications for missing data allocation; we recommend tossing out any previous versions of 1860 and 1870.
Edited variables.
November 25, 1998
November 6, 1998
- Revised preliminary samples of the 1860 and 1870 censuses released. Two versions of both the 1860 and 1870 PUMS are now available: (1) a flat 1-in-200 sample of all dwellings, and (2) a black oversample containing a 1-in-100 sample of dwellings containing one or more blacks and a 1-in-200 sample of all other dwellings.
- The sample weights in both the flat and black oversamples of the preliminary 1860 and 1870 PUMS have been adjusted to be representative of the total population.
Added samples.
Edited variables.
August 20, 1998
- AGE Allocations 1850-1920. There was an error in the missing data allocation procedure for AGE affecting all pre-1940 samples. Since age is used as a predictor in many other allocations, constructed variables, and universe checks, the frequencies for many variables in the earlier samples have changed slightly from the original iteration of IPUMS-98.
- Split YRSINUSA into two separate variables--YRSUSA1 and YRSUSA2--to enhance compatibility over time. YRSUSA1 (columns 145-146 in the raw data files) contains the unrecoded continuous measure of years in the U.S. from the 1900-1920 samples. YRSUSA2 recodes 1900-1920 and 1970-1990 into five intervals compatible among all sample intervals. Users desiring greater detail on the original 1970-1990 intervals can refer to YRIMMIG, which retains all of the original detail recorded in the variable discussion. Documentation change: the universe for 1980 should have excluded foreign-born persons who were citizens at birth.
- OCCSCORE, SEI. In 1850-1870, laborers who were changed via logical edit to farm laborers (i.e., they lived on a farm) continued to receive the OCCSCORE and SEI for laborers. They will now receive the scores for farm laborers. The original 1900 sample incorrectly classified many domestics as "service workers, nec" in their original 1950 occupation classification. The IPUMS fixed the occupational code, but neglected to assign the appropriate SEI and OCCSCORE values for the new occupation. This has been rectified.
- RACE. In 1990, persons who indicated Hispanic origin were recoded out of "other race, nec" in the race variable into the category "Spanish write-in." Persons of Mexican origin were mistakenly excluded from this recode. This is now fixed.
- PERWT and HHWT in 1990. Previously, the IPUMS adjusted the 1990 weights so that the total weighted sample would yield the same population count as the published census returns. We removed this programming, since users could not reverse this change if they desired to, and because there seemed no reason to assert the accuracy of the 1990 count at this level of detail.
- CITYPOP, SIZEPL. In 1980, households in New York City received the code for "not identifiable" (codes 00000, 00) in the city population variables. New York can be identified, and we have changed the population codes accordingly.
- ANCESTR1 and ANCESTR2. An error in the 1990 PUMS documentation slipped into the IPUMS. Anyone with a code of 0324 (West German) should have been coded 0460 (Greek). This is now fixed.
- MBPL, FBPL. In 1970, recoded "U.S. possessions, n.s." to match the documentation (code 12091); it was incorrectly coded 13000 in the data.
- YRIMMIG documentation change: the universe for 1980 excludes foreign-born persons who were citizens at birth. Changed 969 code to 970; it refers to 1965-1970, not 1965-1969. Added 914, which refers to the period before 1915 in the 1970 sample. We also changed the data, recoding 969 to 970.
- EDUCREC and HIGRADE. In 1980, N/A (under age 3) and "no schooling" were combined. We have separated them.
- BPL. In 1850, some persons with a birthplace of Iowa should have been coded as being born in Indiana (a confusion over the interpretation of the abbreviation "IA"). We have added programming to separate these codes.
- CLASSWKR. Removed new workers (persons looking for work but who have never obtained their first job) from the universe for 1940 and 1950 in order to increase compatibility. In 1990, reassigned unemployed persons who last worked over five years ago to the N/A category. In all years, the relevant information is preserved in other variables (EMPSTAT and YRLASTWK).
- IND1950. The original 1940 sample contained an undocumented industry category. We determined that this is the category for "miscellaneous machinery" (code 358). The IPUMS had coded this category to "office and store machines" (code 357); we have recoded it to 358. In addition, the IND (contemporary industry classification) appendix for 1940 did not document this category. It has been added to the documentation.
Edited variables.
May 20, 1998
- OCC, OCC1950, FARM. Fixed a significant error in occupation coding in the 1860 sample (which also affected 1870, though to a much lesser degree). The missing data allocation procedure changed most persons with a blank response (no occupation) to having an occupation. This greatly overstated female occupational responses in 1860, particularly for married women. Since FARM status is inferred from occupation, and many of the allocated cases were farmers, the 1860 and 1870 samples overstated the number of farms. Both the 1860 and 1870 samples have been reconstructed to rectify this problem.
Edited variables.
March 24, 1998
- Made a significant, if somewhat subtle, change to the way the extraction system works. Altered the extraction system to zero out any variables that were "stacked" in the same column location as a requested variable. Previously, if you selected a variable that was not available in every sample chosen for extraction, the system would include whatever other variable was located in those columns in the raw IPUMS data files. For example, if you selected 1880 along with more modern samples and requested the variable Migration Status, 5 Years, the system would include the alphabetic data from the 1880 variable Last Name in those same extract columns. This caused considerable confusion among users.
Edited extract system.
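The "stacked" column behavior described in the March 24, 1998 entry can be sketched as follows. The column offsets, record contents, and the `extract_field` helper are all hypothetical; the sketch only illustrates zero-filling a requested variable for samples where it is unavailable, and is not the extract system's actual code.

```python
def extract_field(raw_line, start, width, available):
    """Return one variable's field, zero-filled when unavailable in the sample.

    `start` is a 0-based column offset; `available` says whether the requested
    variable exists in the sample the record came from.
    """
    if not available:
        # New behavior: do not leak whatever other variable shares these columns.
        return "0" * width
    return raw_line[start:start + width]


# Hypothetical layout: columns 10-14 hold a migration code in a modern sample
# but letters of a last name in the 1880 sample.
modern_record = "1990000001" + "00321"
record_1880 = "1880000001" + "SMITH"

modern_value = extract_field(modern_record, 10, 5, available=True)
old_value = extract_field(record_1880, 10, 5, available=False)
```

Before the change, the equivalent of `old_value` would have carried the alphabetic last-name data into the extract columns for the migration variable.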
March 1, 1998
- Changed weights in "small" and "tiny" samples to be representative of total population.
- Created a new Flat 1990 sample.
Edited variables.
Added samples.
February 17, 1998
- Changed the weights in the 1860 and 1870 files to account for oversample of blacks.
Edited variables.
January 1, 1998
- IPUMS-98 is available. For prior revisions, see Changes from IPUMS-95 to IPUMS-98.
Added samples.