Errata in and Revisions Made to IPUMS
Below is a list of upcoming changes to the IPUMS, along with significant changes made to the IPUMS since 1998. The IPUMS archive page contains fully-functioning versions of earlier IPUMS websites.
In addition to upcoming data releases, the following revisions will be made later in 2013:
- House numbers will become available in the STREET variable for the 1880 10% sample, and STREET will be made available without house numbers for the 1880 complete-count database.
- Surname similarity (SURSIM) will become available for the 1880 100 percent database.
- RECTYPEP and DATANUMP are required for the proper use of hierarchical extracts but are unavailable in the 1910 and 1920 Puerto Rico samples. They will become available.
Please check back later for updates, as well as any other errata and future revisions.
Revisions Made Previously
Feb. 17, 2014. Posted new 2010-2012 ACS/PRCS 3-Year data. These files include all cases in the previously-released single-year files from the 2010, 2011, and 2012 ACS/PRCS. The new 3-year file differs in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2012 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files.
The 2010-2012 ACS/PRCS are generally similar to the 2009-2011 ACS/PRCS data, with several noteworthy differences:
- For the 2012 data, the Census Bureau changed the PUMA boundaries based on the 2010 Census data. As noted below for the 2012 ACS/PRCS, most of the 2010-based PUMAs cannot be mapped directly to the 2000-based PUMAs used in prior ACS releases. As a result of these changes, the following IPUMS variables will be released at a later date: APPAL, CITY, CONSPUMA, COUNTY, HOMELAND, METAREA, METRO, PRCOUNTY, PUMASUPR, MIGMET1, MIGCITY1, MIGPUMS1, MIGTYPE1, PWCITY, PWMETRO, PWPUMAS, PWTYPE, and CITYPOP. The variables PUMA, MIGPUMA1, and PWPUMA00 are available. These variables contain the 2000 PUMA codes for the 2010 and 2011 cases, and the 2010 PUMA codes for the 2012 cases. In addition to these differences, please see the revision note for Dec. 27, 2013, for more information about coding changes to several variables that affect the 2012 cases, including PLUMBING, PHONE, FERTYR, RACE, TRIBE, OCC, ANCESTR1, ANCESTR2, BPL, MIGPLAC1, and LANGUAGE.
Dec. 27, 2013. Posted new 2012 1-year American Community Survey and Puerto Rican Community Survey data. Together, the 2012 samples contain over three million person records. The 2012 ACS is the seventh ACS sample to provide information on group quarters, which can be identified in the GQ variable. Researchers analyzing multiple ACS samples over time should remove group quarters cases, since they are available only in the 2006-2012 data.
The 2011 and 2012 ACS releases are similar, but there are a couple of notable differences:
- The Census Bureau changed the PUMA boundaries based on the 2010 Census data. Most of the 2010-based PUMAs cannot be mapped directly to the 2000-based PUMAs used in prior ACS releases. As a result of these changes, the 2012 data for the following IPUMS variables will be released at a later date: APPAL, CITY, CONSPUMA, COUNTY, HOMELAND, METAREA, METRO, PRCOUNTY, PUMASUPR, MIGMET1, MIGCITY1, MIGPUMS1, MIGTYPE1, PWCITY, PWMETRO, PWPUMAS, PWTYPE, and CITYPOP. The variables PUMA, MIGPUMA1, and PWPUMA00 are available and represent the 2010 PUMA codes.
- Due to data collection problems, several variables contain incomplete data. PLUMBING is not available on the 2012 PRCS. Data on PHONE is not available for 6 PUMAs in Georgia; the suppressed data is coded 8. FERTYR also contains a large number of suppressed cases.
- The Census Bureau continued the practice of recoding married same-sex couples from married to unmarried partners. It now provides the information needed to identify the recoded couples, which can be found in QRELATE (code 9, "Same sex spouse changed to unmarried partner").
- Several changes to the RACE and TRIBE variables should be noted. The Census Bureau added codes to RAC2P for Hopi alone, Mexican American Indian alone, Yup'ik alone, and South American Indian alone while collapsing codes for Colville, Delaware, Houma, Menominee, Paiute, Yakama, and Yuman into "Other specified American Indian tribes alone." Several other RACE codes were added, including Bhutanese, Burmese, Mongolian, Nepalese, Marshallese, and Fijian. The code for Chinese was split into two codes, one for Chinese and one for Taiwanese.
- Slight changes were made to the OCC codes, mainly the combining of several lesser-used occupation codes.
- New codes were added to ANCESTR1 and ANCESTR2, BPL, MIGPLAC1 and LANGUAGE.
May 7, 2013. New, final versions of 1930 sample data are available. These include 5% and 1% samples. The 1% sample was drawn from the 5%, but there are minor differences in allocated values. Modifications to the previous version of the 1930 sample data focused on the following areas:
Detailed Geography. Much of this work related to reassessing breaks between enumeration districts (ENUMDIST). This also resulted in corrected values for minor civil division and incorporated municipality information (MCDSTR and INCSTR).
Occupation and Industry Codes (OCC, OCC1930, OCC1950, IND, IND1930, and IND1950). Most of this work involved assigning codes to previously unclassified records. Consistency checks were also applied that resulted in the correction of some misclassified records. Changes to the occupation codes also resulted in modifications to variables that rely on the occupation codes as input (e.g., occupational standing variables such as OCCSCORE).
The new 1940 release includes corrections as well as new data. Corrections were made to 374 person records for people who had been identified as living in Missouri but actually lived in Detroit, Michigan. Necessary changes were made to the relevant geographic and migration variables.
New geographical variables were added to the 1940 1% data that are no longer restricted by confidentiality requirements: COUNTY, METDIST, CITYMETD, URBAN and URBPOP data are now available. County, city, minor civil division, ward, tract and enumeration district information has also been added as two new sets of string variables, one that contains "clean", standardized strings (STDCNTY, STDCITY, STDMCD, STDWARD, STDTRACT, STDED) and one that records the strings exactly as they were entered (CNTYSTR, MCDSTR, WARDSTR, INCSTR). Also for the household record, the string variable GQSTR has been added, which contains the original group quarters response as it was entered.
New string variables have also been entered for all but 137,588 person-level records in the 1940 1% data. The records with the missing data can be identified using the SUBS4050 variable and selecting subsamples 2 and 20. The remaining data for those two subsamples will be added in the future. The new person-level string variables are: occupation and industry (OCCSTR and INDSTR), usual occupation and industry (UOCCSTR and UINDSTR), where the respondent was living in 1935 (MST5STR, MCNY5STR and MCIT5STR), and five other demographic variables (RELSTR, BPLSTR, FBPLSTR, MBPLSTR, MTONGSTR).
Two corrections were made to other IPUMS-USA samples:
The variable EDUC has been updated to reflect corrections by the Census to ACS 2001 and 2002 single-year files. The educational attainment question changed on the 1999 ACS questionnaire, which modified the response categories and eliminated the choice of "Vocational, technical, or business school degree." Previously the 2001 and 2002 single-year IPUMS data dictionary incorrectly showed labels for categories 65, 71 and 82 as "1 or more years of college credit, no degree," "2 years of college: Associate's degree - occupational program," and "2 years of college: Associate's degree - academic program," respectively. The correct data dictionary labels for categories 65, 71, and 82 are "Some college, but less than 1 year," "1 or more years of college credit, no degree," and "2 years of college: Associate's degree, type not specified," respectively.
For the 2007-2011 American Community Survey 5-year file the variable IND had incorrect values for the cases from 2011 due to a programming error. This error has been fixed.
Feb. 4, 2013. Posted new 2007-2011 ACS/PRCS 5-Year files. These files include all cases in the previously-released single-year files from the 2007, 2008, 2009, 2010, and 2011 ACS/PRCS. The new 5-year file differs in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2011 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files.
Jan. 23, 2013. POVERTY thresholds were incorrect for individuals in families with over 7 people due to a programming error in all 1950-2011 files. This error has now been fixed.
Dec. 13, 2012. Posted new 2009-2011 ACS/PRCS 3-Year files. These files include all cases in the previously-released single-year files from the 2009, 2010, and 2011 ACS/PRCS. The new 3-year file differs in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2011 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files. The 2009-2011 ACS/PRCS are quite similar to the 2008-2010 ACS/PRCS data, except that the variables WRKLSTWK, DEGFIELD, and DEGFIELD2 are now included in the 2009-2011 ACS/PRCS files.
In addition, the supplementary health insurance variables have been added to the 2011 ACS 1-year file. These five new variables are: HIURULE, HIUFPGBASE, HIUFPGINC, HIUID, and HIUNPERS. These summary health insurance variables were constructed by SHADAC. For more detailed information, consult the variable descriptions.
Oct. 30, 2012. Posted new 2011 American Community Survey and Puerto Rican Community Survey data. Together, the 2011 samples contain over three million person records. The 2011 ACS is the sixth ACS sample to provide information on group quarters, which can be identified in the GQ variable. Researchers analyzing multiple ACS samples over time should remove group quarters cases, since they are available only in the 2006-2011 data.
The 2010 and 2011 ACS releases are remarkably similar, but there are a couple of notable differences:
- Data quality flags are now available for the following variables: FTOTINC, RENTGRS, HHINCOME, OWNCOST, INCEARN, INCTOT.
- In order to address concerns about fluctuations in the group quarters populations of small areas, in 2011 the Census Bureau supplemented the group quarters population in the ACS with a large-scale whole-person imputation. Roughly as many group quarters persons are imputed as interviewed. See the ACS Group Quarters Small Area Estimation user note for more details on this change. Although this should have little impact on weighted estimates of the group quarters population, users should note that unweighted frequencies of the group quarters population are larger, which increases the unweighted counts of people in the "NA" category of most household variables.
YRSUSA2 was wrong for a substantial number of cases in the 2006-2010 ACS file due to a programming error. This error has now been fixed.
DEGFIELD and DEGFIELD2 were updated to include new codes for 2010 ACS/PRCS. Users interested in comparing DEGFIELD or DEGFIELD2 over time should know that there may be different codes for the same field of degree across samples. For example, Neuroscience changed from code 4003 in 2009 to 3611 in 2010. In DEGFIELD and DEGFIELD2, IPUMS preserves each sample's full range of codes.
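Because DEGFIELD and DEGFIELD2 keep each sample's own codes, comparisons across years need an explicit crosswalk. The sketch below is illustrative only: it includes just the two code changes stated in these notes (Neuroscience, and Multi-Disciplinary or General Science), the function name is hypothetical, and a complete crosswalk would have to be built from the variable's codes pages.

```python
# Code changes documented in these notes, keyed by the 2009 code.
# This is not a complete crosswalk.
DEGFIELD_2009_TO_2010 = {
    4003: 3611,  # Neuroscience
    4008: 5098,  # Multi-Disciplinary or General Science
}


def to_2010_degfield(code, year):
    """Map a DEGFIELD code from the 2009 sample onto its 2010 equivalent.

    Codes from other years, and 2009 codes with no documented change,
    are returned unchanged.
    """
    if year == 2009:
        return DEGFIELD_2009_TO_2010.get(code, code)
    return code
```

For example, `to_2010_degfield(4003, 2009)` and `to_2010_degfield(3611, 2010)` both yield 3611, so Neuroscience degree holders from the two samples can be tabulated together.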
July 9, 2012. Released supplementary health insurance variables for the 2008-2010 ACS 1-year files. These five new variables are: HIURULE, HIUFPGBASE, HIUFPGINC, HIUID, and HIUNPERS. These summary health insurance variables were constructed by SHADAC. For more detailed information, consult the variable descriptions.
April 23, 2012. Some of the codes for the variable PUMARES2MIG displayed incorrect values due to a programming error. This error has now been fixed. The PUMARES2MIG codes were previously incorrect for the following states and PUMAs: Arkansas - 1000; California - 2601, 2602, 6701, 6702, 8101-8116, 8200; Kansas - 1401-1403, 1500; New Jersey - 701-703; Oklahoma - 1100, 1200; Washington - 2001-2009.
March 27, 2012. Released a preliminary version of the 1930 5% sample. Coding of string variables is still ongoing, with much of this work focused on the occupation and industry variables. We expect to release the final version in July.
Also, FARMSCHD in the 1930 1% sample had been coded incorrectly. The error has been corrected.
March 21, 2012. American Community Survey and Puerto Rican Community Survey samples from 2006-2010 have been updated to include minor revisions to the POVERTY variable. For individuals with a group quarters (GQ) code of 4, about 4.5% of individuals were incorrectly omitted from the universe. This error has been fixed.
March 13, 2012. IPUMS USA samples from 1960 to the present have been updated to include CLUSTER and STRATA variables. For the 1960-2000 samples, strata were created based on the stratification criteria used to select Public Use Microdata Samples, such as household size, age, race, ethnicity, home ownership, group quarters membership, and vacancy status. For the American Community Survey (ACS) samples, strata were created based on the lowest level of geography available in each sample. For the 2000-2004 samples, each state forms a stratum. In the 2005 onward ACS samples, strata were defined as unique Public Use Microdata Areas (PUMAs). For more information on the creation of STRATA, see this page: Construction of Strata in the IPUMS Samples.
In addition, the US and Puerto Rican 2000 1% samples had incorrect OCC codes due to a programming error. This error has now been fixed. Extracts including OCC made between November 2nd 2011 and March 12th should be revised.
- For ACS and PRCS samples from 2000-2010, several allocation flag variables displayed incorrect values. The variables QINCOTHE and QEDUC were inaccurate for all of the ACS/PRCS samples. In addition, QHISPAN was incorrect for 2000-2004, QCOSTWAT had inaccurate codes for 2006-2010, and QDIFCARE was incorrect for the 2008-2010 samples.
- For all of the 1980-2010 samples, the COUNTY codes 90 and above (except Baltimore City) displayed inaccurate codes for Maryland state.
- For the 2008-2010 ACS and PRCS samples, parents were incorrectly coded as parents in law in the RELATE variable.
Jan. 23, 2012. Added new 2006-2010 ACS/PRCS 5-Year files. These files include all cases in the previously-released single-year files from the 2006, 2007, 2008, 2009, and 2010 ACS/PRCS. The new 5-year file differs in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2010 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files. Information specific to the new 2006-2010 release follows:
Similar to the 2008-2010 ACS/PRCS, the data from the Census Bureau contained two different sets of occupation codes for the variables OCC and OCCSOC. The 2006-2009 cases contain the 2005-2009 ACS occupation codes, whereas the 2010 cases contain the 2010 ACS occupation codes (a crosswalk of these changes is available at our Occupation and Industry Variables page). We provide a harmonized version in OCC1990. The original values can be found in the OCC and OCCSOC variables, although users should note that those variables contain codes that differ by survey year, as described above. The 2006-2010 data also span minor changes made by the Census Bureau in 2008 to the classification of industries. The new classification system results in the addition of one industry code (6672), modification of one industry code (6670), and the deletion of two industry codes (6675, 6692) in the variables IND and INDNAICS.
Dec. 21, 2011. Added new 2008-2010 ACS/PRCS 3-Year files. These files include all cases in the previously-released single-year files from the 2008, 2009, and 2010 ACS/PRCS. The new 3-year file differs in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2010 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files. One notable difference compared to the 2007-2009 3-Year file is that health insurance and disability variables are now included. Also, we have enhanced the data from the Census Bureau in a couple of important ways. We have included health insurance edits to the 2008 and 2009 cases and we provide integrated occupation codes. The data from the Census Bureau contained two different sets of occupation codes for the variables OCC and OCCSOC. The 2008-2009 cases contain the 2005-2009 ACS occupation codes, whereas the 2010 cases contain the 2010 ACS occupation codes (a crosswalk of these changes is available at our Occupation and Industry Variables page). We provide a harmonized version in OCC1990. The original values can be found in the OCC and OCCSOC variables, although users should note that those variables contain codes that differ by survey year, as described above.
In addition, the following errors have been fixed:
- For the 2010 samples, a small number of values were reassigned to the variables OCC1990 and OCC1950 because of new information. This coding change affects less than one percent of cases and has a relatively minor impact on the occupational standing measures. Any extract including these measures made between November 2nd and December 20 should be requested again.
- For the US samples from 2000 to 2010, the values of QINCWAGE and QINCSS were reversed. The programming error has been fixed and the data now displays the correct values.
- For the 1940-2010 samples, incorrect values were given to the variable CPI99, which provides the CPI-U multiples to convert dollar figures to constant 1999 dollars. The programming error has been fixed, and all samples now display the correct CPI-U multiplier value. Any extracts including CPI99 made between November 2 and November 17 should be requested again.
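As a rough illustration of how the CPI99 multiplier is applied, a dollar amount is multiplied by the CPI99 value for its sample to express it in constant 1999 dollars. In the sketch below the function name is hypothetical and the multiplier is a placeholder, not a published CPI99 value; the 999999 missing-value code for INCWAGE is the one mentioned elsewhere in these notes and is left unconverted.

```python
INCWAGE_MISSING = 999999  # missing-value code for INCWAGE cited in these notes


def to_1999_dollars(amount, cpi99, missing_code=INCWAGE_MISSING):
    """Convert a dollar amount to constant 1999 dollars using CPI99.

    Returns None for the missing-value code so that it is never
    scaled as if it were a real dollar amount.
    """
    if amount == missing_code:
        return None
    return amount * cpi99


# Hypothetical inputs: 0.75 is a placeholder multiplier for illustration.
incwage = 40000
cpi99 = 0.75
incwage_1999 = to_1999_dollars(incwage, cpi99)
```

Extracts created while CPI99 was wrong would carry the error into every converted amount, which is why affected extracts should be requested again.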
Nov. 2, 2011. Posted new 2010 American Community Survey and Puerto Rican Community Survey data. Together, the 2010 samples contain over three million person records. The 2010 ACS is the fifth ACS sample to provide information on group quarters, which can be identified in the GQ variable. Researchers analyzing multiple ACS samples over time should remove group quarters cases, since they are available only in the 2006-2010 data.
The lowest level of geographic identifier in the 2010 ACS is the PUMA; 2010 PUMAs have the same boundaries as those in the 2005-2009 ACS and the 2000 census samples. The IPUMS version of the 2010 ACS provides the following additional geographic identifiers: CITY, COUNTY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. Additionally, information on unrelated subfamilies--a category not measured by the Census Bureau--is available. These variables were constructed at the University of Minnesota and are not available via the Census Bureau.
The 2009 and 2010 ACS releases are quite similar, but there are some differences:
- New codes have been added to reflect Census Bureau changes to occupation (OCC). IPUMS is working on documenting the changes to the occupation codes; until then, users can consult the original Census Bureau data dictionaries.
- New codes have also been added to reflect Census Bureau changes to the field of degree first and second entry variables (DEGFIELD, DEGFIELD2). The code for "Neuroscience" has been changed from 4003 in 2009 to 3611 in 2010 and the code for "Multi-Disciplinary or General Science" has changed from 4008 in 2009 to 5098 in 2010. New codes have been added to reflect "Multi/Interdisciplinary Studies" (4000), "Materials Science" (5008), and "Miscellaneous Fine Arts" (6099). In addition, "Precision Technologies" was previously included in code 5801 and is now included in code 5701.
- For Puerto Rico only, the maximum category available changed for several variables. Individuals who are 93 or older are coded as 93 in 2010, number of bedrooms (BEDROOMS) is top-coded at 7 bedrooms, and number of rooms (ROOMS) has a maximum category of 11.
- For occupation, 1990 basis (OCC1990), judges could not be distinguished from lawyers in the original Census Bureau data between 2005 and 2009, so they were grouped with lawyers as code 178. Code 179 for judges is available again for 2010.
- One person was coded as 67 ("Two or more races") in the original Census Bureau variable RAC2P and 18 ("Filipino alone") in the original Census Bureau variable RAC3P. We assigned a code of 883 (Filipino and 'other race' write-in) in the IPUMS (RACE) variable.
New versions of every sample have also been posted. The migration variables (MIGRATE1 and MIGRATE5) have been fundamentally revised. These variables have been available since 1940, but the original Census Bureau variables have contained progressively less information over time. For example, in the 2000-onward ACS/PRCS, individuals are simply coded "same house," "different house in the U.S.," or "different house outside the U.S." in the original census data. However, it is possible to construct additional detail about these movers from other census variables, in particular MIGPLAC1 and MIGPLAC5. In the past, users interested in additional migration detail in later samples have needed to manually recode these other variables. Users interested in comparing the migration variables across time have confronted a non-harmonized coding scheme and comparability differences across both years and samples. The current revised versions of MIGRATE1 and MIGRATE5 simplify both tasks by:
- Adding migration information by incorporating details from other variables. Without recoding or selecting other variables, users now have access in MIGRATE1 and MIGRATE5 to relevant detail from variables detailing previous place of residence (MIGPLAC1 and MIGPLAC5), current state of residence (STATEFIP), PUMAs of migration (MIGPUMA1 and MIGPUMA), and PUMAs (PUMA and PUMARES2MIG).
- Adapting a harmonized coding scheme across years and samples. Users interested in comparing the variables across time may now use general and detailed codes that are consistent across samples to the extent that information is available in given samples. For example, whereas state contiguity was previously available for only the 1950 1% PUMS, all samples now contain codes that distinguish movers between contiguous states and movers between non-contiguous states.
- Including codes for movers who moved between PUMAs or moved within PUMAs for 2000 and 2005-onward samples.
There are three other major changes to the data:
- To avoid the potential identification of individuals, the Census Bureau collapses some Public Use Microdata Areas (PUMAs) into larger PUMAs of migration (MIGPUMA1 and MIGPUMA). A new variable, PUMARES2MIG, adapts the Public Use Microdata Area codes for individuals' place of residence to the scheme for PUMA of previous residence. This allows the PUMA in which individuals lived previously (MIGPUMA/MIGPUMA1) to be compared directly to the PUMA in which individuals currently reside (PUMARES2MIG).
- Starting with the 2003 ACS, the Census Bureau began providing four-digit occupation codes (OCC). Because the first three digits replicated the previous occupation codes and the fourth digit was always a zero, the IPUMS eliminated that fourth digit for greater comparability with previous codes. The 2010 ACS/PRCS is the first sample to include substantive detail in the fourth digit. For greater comparability across ACS/PRCS samples, the IPUMS versions of the 2003-2009 ACS/PRCS data now include the fourth digit. Users who need to replicate analyses from prior extracts can safely drop the fourth digit, as it contains no necessary detail. As a reminder, the IPUMS also includes fully harmonized versions of occupation (OCC1950 and OCC1990).
- Due to a programming error, missing values of INCWAGE received codes of 996441 instead of the correct 999999 codes for all 2008 cases in the 2007-2009 and 2005-2009 multi-year data. This has been corrected.
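For users replicating earlier analyses, dropping the fourth OCC digit in the 2003-2009 samples amounts to integer division by 10. The following is a minimal sketch with a hypothetical function name and a hypothetical example code; it rejects any 2003-2009 code whose fourth digit is not zero, since the note above says that digit carries no detail in those years.

```python
def drop_fourth_occ_digit(occ, year):
    """Return the three-digit form of a four-digit OCC code for 2003-2009.

    Codes from other years are returned unchanged, because the fourth
    digit carries substantive detail starting with the 2010 samples.
    """
    if 2003 <= year <= 2009:
        if occ % 10 != 0:
            raise ValueError(f"unexpected non-zero fourth digit in OCC {occ}")
        return occ // 10
    return occ
```

With a hypothetical code of 4700, `drop_fourth_occ_digit(4700, 2008)` gives 470, while the same code in a 2010 sample is left as 4700.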
August 18, 2011. Posted new versions of the 1900 5% sample and all 2000-onward samples:
- The following errors in the 2007-2009 3-year ACS/PRCS and the 2005-2009 5-year ACS have been corrected:
- Persons with an employment status (EMPSTAT) detailed code of 14 ("Armed forces, at work") or 15 ("Armed forces, with job but not at work") should have received OCC1950 codes of 595 ("Members of the armed services") but instead received other occupational codes.
- The code for calculating POVERTY was not fully updated for these new multi-year samples. As a result, POVERTY codes were too high by about 65 percent on average in some cases. This error affected only persons who are not related to the householder, about 79 percent of whom (3 percent of the total cases) received erroneous POVERTY values.
- The sample density of the 2005-2009 5-year ACS was incorrectly specified in our metadata, and any users who customized their sample sizes in our extract system received numbers different from what they had requested.
- The 2005-2009 Puerto Rican Community Survey 5-year PUMS data are now available.
- Subfamily measures are available for households with more than 9 persons unrelated to the householder (GQ codes of 5). This affects only the 2000-onward samples, and fewer than 200 households in each sample.
- Birth year (BIRTHYR) was not made available for the 1900 5% sample. It is now available.
August 10, 2011. The 2007-2009 and 2005-2009 multi-year American Community Survey data are now available, along with the 2007-2009 multi-year Puerto Rican Community Survey data. (Technical problems prevented the release of the 2005-2009 Puerto Rican Community Survey data; it will be released by August 18, 2011.) The 2007-2009 3-year file includes all cases in the previously-released single-year files from the 2007, 2008, and 2009 ACS/PRCS; the 2005-2009 5-year file includes all cases in the previously-released single-year files from the 2005, 2006, 2007, 2008, and 2009 ACS/PRCS. Yet the new multi-year files differ in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2009 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files. Information specific to the new 2007-2009 and 2005-2009 files follows:
- The original Census Bureau variable SMOCP (the IPUMS analogue is OWNCOST) contains erroneous values for mobile homes among the 2005 and 2006 cases in the 2005-2009 multi-year ACS data. Their suggested edit has been applied in the IPUMS data.
- The Census Bureau has documented data processing errors in the 2008 single-year ACS because questionnaire changes were not reflected in their editing procedures. While they state that the multi-year data products incorporate the corrected data, this does not seem to be the case, as the Census Bureau codes for the 2008 data appear to be the same in the single-year and multi-year files (except for necessary harmonization).
- Revised 2000 census data that includes the Census Bureau's corrections for incorrect AGE data are now available via the IPUMS. For the 5 percent United States, the 5 percent Puerto Rico, and the 1 percent unweighted US sample, the correct AGE values are now part of the AGE variable, while the previous, incorrect values are contained in AGEORIG. All variables constructed by IPUMS are based on AGE, and IPUMS recommends that AGEORIG not be used for purposes other than testing the sensitivity of previous analyses. Uncorrected errors in the person identifiers of the original Census Bureau 1 percent sample for the United States and Puerto Rico prevented the revised data from being linked to the original data, and revised 2000 1 percent samples for the United States and Puerto Rico have now replaced the old files entirely. The original files with erroneous age values remain available for testing purposes.
- Birth year (BIRTHYR), previously included in only the 1900 and 1910 samples, is now available for all samples. In most cases, it is calculated simply as the difference between survey year (YEAR / MULTYEAR) and AGE, although we have refined this crude calculation where additional information on the quarter of birth (BIRTHQTR) is available. Because of these inaccuracies and the tendency of people to report their age as a round number (particularly in the older samples), IPUMS recommends caution when using this variable. It can be used quite effectively in the extract system, however, to select synthetic cohorts (e.g., all people born between 1928 and 1932) that can then be followed through multiple census years.
- Information on years residing in the United States (YRSUSA1 and YRSUSA2) is constructed from survey year (YEAR / MULTYEAR) and year of migration (YRIMMIG) for all ACS/PRCS samples. The actual year of survey for the multi-year ACS/PRCS files was contained in YEAR until January 2010, when it was shifted to MULTYEAR. Our programming for YRSUSA1 and YRSUSA2 did not account for this shift. As a result, YRSUSA1 and YRSUSA2 contained errors (and conflicted with YRIMMIG) for all 2005 and 2006 cases in the 2005-2007 multi-year files, and for all 2006 and 2007 cases in the 2006-2008 multi-year files. This has been corrected.
- In all 2000-onward samples, IPUMS applied an inappropriate universe edit to the units in structure variable (UNITSSTR). The correct universe is all housing units that are not group quarters (GQ), but all units used for commercial purposes (identified in COMMUSE) were also recoded to the N/A category (UNITSSTR codes of 0). This error has been corrected, and we now perform a simple recode of the original Census Bureau variable without any additional edits. The overall distribution of UNITSSTR is very similar whether or not units used for commercial purposes are included in the universe, although estimates of the number of each type of unit in a given area will be different.
- A programming error resulted in incorrect values of GQ for households with many in-laws and other non-relatives in the 2008 and 2009 single-year ACS/PRCS. Many of these households are actually households under the 1970 definition but were instead classified as "additional households under 1990 definition" or "additional households under 2000 definition". The error affects approximately 6,500 (unweighted) person records and has been corrected. This error also affected GQTYPE and other variables where GQ is used in programming. In particular, there were erroneous POVERTY values of 0 for approximately 200 (unweighted) children under 14 in these affected households. This has also been corrected.
- Vacant units in the 1860 and 1870 samples were erroneously coded as non-vacant in the group quarters variable (GQ). This has been corrected.
- Housing units using a miscellaneous "other" fuel type for cooking should have received FUELCOOK values of 10. Instead, they received values of 1, the code reserved for housing units that do not use cooking fuel. This has been corrected.
- We have made miscellaneous improvements and clarifications to our documentation.
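The crude BIRTHYR calculation and the synthetic cohort selection described in the list above can be sketched as follows. This is an illustration under stated assumptions rather than the IPUMS procedure: it ignores the BIRTHQTR refinement, the function names and records are hypothetical, and it takes the survey year from MULTYEAR when that variable is present, falling back to YEAR otherwise.

```python
def survey_year(rec):
    """Prefer MULTYEAR (the actual survey year in multi-year files) over YEAR."""
    return rec["MULTYEAR"] if "MULTYEAR" in rec else rec["YEAR"]


def crude_birth_year(rec):
    """Approximate birth year as survey year minus AGE."""
    return survey_year(rec) - rec["AGE"]


def in_cohort(rec, first=1928, last=1932):
    """True if the approximate birth year falls within [first, last]."""
    return first <= crude_birth_year(rec) <= last


# Hypothetical person records from several census years.
records = [
    {"YEAR": 1940, "AGE": 10},                    # born about 1930
    {"YEAR": 1950, "AGE": 20},                    # born about 1930
    {"YEAR": 1950, "AGE": 30},                    # born about 1920
    {"YEAR": 2007, "MULTYEAR": 2005, "AGE": 75},  # born about 1930
]

cohort = [rec for rec in records if in_cohort(rec)]
```

Using the actual survey year matters in multi-year files: the last record would be assigned a birth year of about 1932 if YEAR were used in place of MULTYEAR.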
Finally, our extract system has been streamlined so that users see a summary of their extract first. If you just want a standard rectangular extract with no extra features, you can make it immediately. Or you can change aspects of your extract (data structure, customized sample sizes, case selection, attached characteristics of other household members, and data quality flags).
July 12, 2011. The linked representative samples have been updated. The update primarily affects variables that were not present in the original Church of Jesus Christ of Latter-day Saints complete-count database for 1880: DEAF, BLIND, MAIMED, IDIOTIC, INSANE, SICKNESS, MARRINYR, SCHOOL, LIT, MOUNEMP, and QTRUNEMP. Previous versions of the data contained information for these variables if the record was part of the 1880 10% sample. The updated versions now contain information for these variables for all 1880 records.
June 15, 2011. The IPUMS-USA extract system now allows users to customize their sample sizes. This is useful for researchers who do not need or cannot use the large number of cases contained in some IPUMS-USA samples. It can also be used to obtain small testing datasets before running a program on a large dataset. For more information, see the FAQ.
The following errors have also been fixed:
- In all 1950-2009 samples, some individuals under age 15 and in an unrelated subfamily were erroneously given POVERTY values of 0 because of a programming error. This has been corrected, and such individuals now receive the correct poverty value based on their subfamily's total income. This affects up to 20,000 cases in the 1980-2000 decennial samples, and approximately 5,000 cases in each of the single-year ACS files.
- In all years except 1940 and 1950, SLWT contains the contents of PERWT to facilitate cross-temporal analysis of variables on the sample line in 1940 and 1950. Due to a programming error, this was not implemented correctly for the 1900 1% and 1910 1.4% samples with oversamples, along with the 1900 1% sample. This has been corrected.
Most notably, the Census Bureau's November 2010 revisions to the ACS/PRCS samples are now incorporated into the IPUMS:
- To address problems in the Census Bureau's disclosure avoidance procedures, AGE has changed in the 2003-2005 ACS/PRCS samples and for the 2005 cases in the 2005-2007 3-year file. Ages were subject to change only among people who were formerly coded as being at least 65 years old. (The exception is the 2004 ACS, in which several people who formerly had ages of less than 65 now have ages of at least 65.) Because the new ages appear to have been created via synthetic data techniques, IPUMS has marked AGE as allocated (QAGE codes of 4) for all people aged 65 and up. The former, erroneous ages are now contained in a new variable, AGEORIG, which allows users to analyze the effects of the age revisions on their own research. Please note that the age revisions for the 2006 ACS have been available via the IPUMS since January 2010, with the former, erroneous values of AGE contained in AGEORIG06. AGEORIG subsumes the variable AGEORIG06 and provides original values for all of the affected samples and years, including the original values contained in AGEORIG06. Researchers should use AGEORIG only for sensitivity analyses; AGE contains more plausible values for people's true ages.
- The Census Bureau's errors in adapting their editing procedures to the new 2008 ACS questionnaire have been corrected for the 2008 ACS/PRCS and for the 2008 cases in the 2006-2008 multi-year ACS/PRCS. (For more information about these errors, see ACS Errata note 53, note 54, and note 64.) These include the following household-level variables in the IPUMS:
- Telephone service (PHONE and its data quality flag QPHONE);
- Kitchen facilities (KITCHEN and its data quality flag QKITCHEN);
- Refrigerator presence (FRIDGE and its data quality flag QFRIDGE);
- Number of bedrooms (BEDROOMS and its data quality flag QBEDROOM); and
- Number of rooms (ROOMS).
- Because of an unspecified Census Bureau error, the variable represented by MOVEDIN contained erroneous values in the 2004 ACS. This has been fixed; the original variable is contained in MOVEDINORIG so that users can assess how their results may have changed.
- In the processing of these revisions, several other variables changed slightly in the original Census Bureau data. These changes are not due to documented Census Bureau errors, and original variables were not created to preserve the changes. These changes fall into three broader categories:
- The Census Bureau veteran variables represented by VET01LTR, VET47X50, VET90X01, VETKOREA, VETOTHER, VETSTAT, and VETWWII changed slightly for some of the samples in which the Census Bureau revised AGE, partly as a result of the age revisions.
- Topcoded values in CONDOFEE, FTOTINC, HHINCOME, INCEARN, INCINVST, INCOTHER, INCWAGE, OWNCOST, RENTGRS, and RENT changed slightly in some of the revised samples.
- The changes in the AGE values affected the IPUMS family interrelationship variables, which rely on information about household members' ages. The IPUMS also uses AGE to classify the householder's in-laws, since the original Census Bureau data does not distinguish among parents-in-law, siblings-in-law, and children-in-law. Consequently, some people's values of RELATE may differ, although only in-laws are affected. See "In-Law Classification Procedures" for details.
IPUMS now offers integrated versions of the original Census Bureau subfamily variables that parallel the subfamily variables constructed by IPUMS. Newly available variables include CBSFRELATE (relationship within the subfamily), CBSFTYPE (type of subfamily), CBSUBFAM (subfamily number), and CBNSUBFAM (number of subfamilies in the household). Users should note that the Census Bureau's procedures for classifying subfamilies have changed dramatically over time, so these variables are useful mainly for the comparability they offer with the Census Bureau's summary files. See our subfamilies page for more information.
Information on the TRIBE of American Indians has been improved as well. Most notably, persons who were previously classified (incorrectly) as "Alaska Native, tribe not reported" in all 2000-onward samples are now classified correctly as "American Indian or Alaska Native, tribe not reported." Recoding improvements were made to the 1990, 2000, and ACS samples; labeling improvements were made to the 1900 and 1910 samples.
Several improvements were made to the 1900 census 5% sample:
- Dwelling size (DWSIZE) was corrected (about 16% of all values have changed).
- ENUMDIST and SUPDIST were reversed in the last version (i.e., ENUMDIST contained data for SUPDIST, and vice versa). This has been corrected.
- About 1800 changes were made to GQ and GQTYPE.
- Some geography variables were improved. For example, about 900 household COUNTY codes and 9,500 household METRO codes changed (the METRO changes were mostly in Washington, DC). About 6% more CITY values were populated.
- Data in PFARMSCH was corrected.
- About 52,000 changes were made to OCC, OCC1950, and IND1950, reflecting more complete coding.
- PERWT values were integers in the old version. They now correctly have two decimal places.
- There were small corrections made to NSIBS.
- The allocation for OWNERSHP was corrected. In the old version, the distribution between 'owned' and 'rented' for allocated values did not follow the distribution for non-allocated values.
- The distribution for RELATE changed slightly; most notably, there are hundreds fewer foster children and hundreds more servants, "other probable domestic employees" and "other non-relatives".
Finally, because of improvements to our data construction, editing, and allocation procedures, many variables have been refined. In particular:
- In the 1940 and 1950 samples, Charleston, WV (CITY codes of 1070) was mistakenly identified as Charleston, SC (CITY codes of 1050) and was coded as having the population of Charleston, SC in CITYPOP. Charleston, WV is now identifiable in the IPUMS 1940 and 1950 samples and has the correct population in CITYPOP. Charleston, SC is not identifiable as a city in 1940 and 1950, and this is now reflected in CITY and CITYPOP.
- PUMA 00201 in Ohio was erroneously coded as being part of the Texarkana (Texas) metropolitan area in METAREA. It is now properly included in the Toledo (Ohio) metropolitan area.
- Riverside City (California) was improperly coded as Riverside County via COUNTY in the 1980 5% sample. Riverside County is not identifiable in this sample, and this is now reflected in the data.
- SURSIM contained nonsensical values in the Hispanic oversample (SAMP1910 codes of 7) in the 1910 1.4% sample with oversamples. This has been corrected, as have all variables based on SURSIM (such as the family interrelationship variables).
- QGQ and QGQTYPE were previously coded non-zero for vacant units (which are not in these variables' universes). This has been corrected.
- VETCIVWR contained nonsensical values in all 1910 samples. This has been corrected.
- CHBORN contained nonsensical values in the 1970 Puerto Rico samples. This has been corrected, as have all variables based on CHBORN (such as the family interrelationship variables).
- All values that are allocated by IPUMS-USA in the 1850-1930 samples have been assigned more accurately. In most samples, very few cases have actually changed. The improvements are most noticeable in the 1900 and 1910 samples, although typically fewer than 100 cases have different values of any one variable.
Nov. 18, 2010. Posted new 2008-2009 ACS data. Because of miscommunications with Census Bureau staff, the health insurance edit for VA (HINSVA) and Indian Health Service (HINSIHS) insurance was performed incorrectly. These variables and their accompanying flags are now correct, and documentation of the edit has been updated on the ACS health insurance page. Additionally, the edits for all health insurance variables have been applied to Puerto Rico data (this was not done before).
Nov. 10, 2010. Posted new 2008-2009 ACS data. Because of a programming error, the data posted on Nov. 9 contained incorrectly edited variables for Medicaid (HINSCAID), Medicare (HINSCARE), and military insurance (HINSTRI) coverage, which affected the summary variables for any (HCOVANY), private (HCOVPRIV), and public (HCOVPUB) coverage. These variables and their accompanying flags are now correct.
Nov. 9, 2010. Posted new 2009 American Community Survey and Puerto Rican Community Survey data, along with revised data for 2008. Together, the 2009 samples contain over three million person records. The 2009 ACS is the fourth ACS sample to provide information on group quarters, which can be identified in the GQ variable. Researchers analyzing multiple ACS samples over time should remove group quarters cases, since they are available only in the 2006-2009 data.
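As a minimal sketch of the group quarters advice above, the following Python snippet keeps only household records. It assumes that GQ codes 1, 2, and 5 denote households (under the 1970, 1990, and 2000 definitions); that mapping and the sample records are assumptions for illustration and should be checked against the GQ codes page.

```python
# Assumption: GQ codes 1, 2, and 5 denote households (1970, 1990, and 2000
# definitions). Confirm against the GQ codes page before relying on this.
HOUSEHOLD_GQ_CODES = {1, 2, 5}


def drop_group_quarters(records):
    """Return only the records whose GQ code marks a household."""
    return [r for r in records if r["GQ"] in HOUSEHOLD_GQ_CODES]


# Hypothetical person records; only the YEAR and GQ fields are shown.
sample = [
    {"YEAR": 2005, "GQ": 1},
    {"YEAR": 2009, "GQ": 1},
    {"YEAR": 2009, "GQ": 3},  # institution (group quarters)
    {"YEAR": 2009, "GQ": 4},  # other group quarters
]

households_only = drop_group_quarters(sample)
```

Filtering this way before pooling years keeps the 2006-2009 samples comparable with earlier ACS samples that have no group quarters cases.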
The lowest level of geographic identifier in the 2009 ACS is the PUMA; 2009 PUMAs have the same boundaries as those in the 2005-2008 ACS and the 2000 census samples. The IPUMS version of the 2009 ACS provides the following additional geographic identifiers: CITY, COUNTY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. Additionally, information on unrelated subfamilies--a category not measured by the Census Bureau--is available. These variables were constructed at the University of Minnesota and are not available via the Census Bureau.
The 2008 and 2009 ACS releases are quite similar, but there are some differences:
- Health insurance variables are now edited for consistency, and these edits have also been applied to the 2008 ACS/PRCS data. The original (unedited) variables are also available through our extract system. For more information, see the ACS health insurance page.
- A new RACE code (869) describing persons who identify as both Japanese and Korean is available.
- Because of several errata in the original 2008 ACS PUMS, users should exercise caution in interpreting change over time in the number of rooms (ROOMS), the number of bedrooms (BEDROOMS), telephone service (PHONE), and kitchen facilities (KITCHEN). The revised version of the 2008 ACS PUMS will be available in IPUMS-USA in January 2011.
October 20, 2010. ENUMDIST from the 1880 IPUMS complete count database was updated in all 21 of the linked representative samples. In the previous versions, ENUMDIST from 1880 had a large proportion of missing values. This has been corrected.
October 13, 2010. Weights in the linked representative sample for males, 1880-1930 have been revised. In the previous version of the male file, PERWT was constructed with erroneous age proportions for 1930. The problem was corrected and PERWT recalculated.
- Geographic identifiers are now available for selected counties (COUNTY) for 1940-1950, 1970-2000, and 2005-onward. Although they are not identified in the original Census Bureau PUMS, counties with populations of at least 100,000 can be identified via other geographic identifiers in the data (the other counties receive codes of "0000"). COUNTY thus joins CITY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS in the list of variables constructed from census data and unique to the IPUMS.
- A new variable (NHGISJOIN) provides an alternate way of identifying counties in IPUMS-USA data from 1850-1930. It can be used to link IPUMS-USA microdata with aggregate data from the National Historical Geographic Information System (NHGIS), thus making contextual analysis easier for historical data.
- A new variable describing Appalachian residence (APPAL) is now available for 1850-1950, 1980-2000, and 2005-onward. Like COUNTY, it is built from available geographic identifiers.
Several miscellaneous errors have also been corrected:
- The syntax statements for YRIMMIG contained incorrect value labels. Single years were unaffected, but single years that also represented ranges of years in some samples were off by one year. For example, the code 1931 means that the respondent came to America in 1931 for all samples in which the code appears. However, for the 2005-onward ACS, the code 1932 means that the respondent came to America in 1931 or 1932 (in addition to its standard meaning of 1932 in other samples). The value label for codes of 1932 should have read "1932 (2005-onward ACS: 1931-1932)", but a formatting error yielded "1933 (2005-onward ACS: 1931-1932)" instead. The problem affected the value labels only; YRIMMIG data remain the same. All codes that represent ranges of years rather than individual years are now visible on the YRIMMIG codes page.
- YRIMMIPR has been streamlined to mimic YRIMMIG: ranges of years are now standardized such that the 4-digit code represents the latest year in which the respondent could have moved to Puerto Rico.
- Due to a programming omission, respondents classified as having a RACE of Aleut were erroneously coded as Hawaiian in RACESING for all 1970 samples, and Eskimos were coded as Korean. Aleuts and Eskimos now appear as Alaska Natives in RACESING.
- OWNCOST contained nonsensical data for the 2003 and 2004 ACS samples. Correct data are now available via the extract system.
- Two disability variables (DIFFREM and DIFFSENS) were omitted from the IPUMS versions of the 2003 and 2004 ACS. They are now available; however, users should exercise care when comparing them to earlier surveys because of question changes.
June 8, 2010. An error in the programming for the June 4 revision resulted in PERWT values of 0 for all cases in the 1880 1% and 1880 10% samples. These samples now contain the correct PERWT values.
June 4, 2010. Posted final versions of the IPUMS Linked Representative Samples. More information is available here.
Posted new versions of all IPUMS samples. There are several new variables:
- CPI99 provides inflation factors to adjust dollar amounts into constant 1999 dollars. It is a constant value within each census year. CPI99 will be especially useful for extracts containing multiple years of data; users will need only to multiply dollar variables by a single inflation variable instead of manually multiplying different years of data by different constants. For more information, see our CPI adjustment page.
- For cases sampled from large households, NUMPERHH provides the number of people in that larger household. Several samples from 1850-1930 had errors in NUMPERHH. See the variable description for more information.
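The CPI99 adjustment described above amounts to one multiplication per record. The sketch below illustrates it in Python; the CPI99 values shown are made-up placeholders, not real IPUMS inflation factors, and the helper name `to_1999_dollars` is hypothetical.

```python
# Illustrative records only: the CPI99 values below are made-up placeholders,
# NOT real IPUMS inflation factors.
records = [
    {"YEAR": 1990, "INCWAGE": 20000, "CPI99": 1.5},
    {"YEAR": 2000, "INCWAGE": 30000, "CPI99": 1.0},
]


def to_1999_dollars(record, dollar_var):
    """Multiply one dollar-amount variable by the record's CPI99 factor."""
    return record[dollar_var] * record["CPI99"]


# One pass works across census years because CPI99 is constant within a year
# and travels with each record.
adjusted = [to_1999_dollars(r, "INCWAGE") for r in records]
```

In real extracts, special codes in dollar variables (such as N/A values) would need to be excluded before multiplying.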
Other variables have been modified:
- For users' convenience, the standard weight variables for 1850-1930 will contain the more detailed weighting information previously available only in the detailed weights. Specifically, PERWT (and, aside from the 1940 and 1950 samples, SLWT) will contain values previously available in PERWTDET, and HHWT will contain values previously available in HHWTDET. PERWTDET and HHWTDET are no longer available. See the variable descriptions for full information.
- The new versions of PERWT and HHWT provide weight values to two decimals of precision for 1850-1930 data. Default weighting procedures in SPSS and SAS can work with fractional weights. When tabulating variables in Stata, however, users will need to specify that the weights are importance weights, which allow decimals. (The default weight in Stata's tabulate command is a frequency weight, which does not allow decimals.)
- There are slight improvements to the imputation of relationship to head (IMPREL), available only for 1850-1930 samples.
- Some residents of institutions (GQ codes of 3) were coded erroneously as being in the labor force (LABFORCE codes of 2) in the 1850-1930 samples. All such cases are now coded as NIU (LABFORCE = 0) or as not in the labor force (LABFORCE = 1).
- IMPREL, IMPMOM, IMPPOP, and IMPSP are now available in the 1880 10% sample.
- Data from one missing reel of microfilm was restored to the 1880 complete-count database.
- Parental and spousal links were occasionally illogical for households in the 1900 and 1910 data due to an error in how information on surviving children (CHSURV) and children ever born (CHBORN) was handled. This has been corrected.
- Six cases in the 1930 sample had RELATE codes of 0 (not a valid relate code). They now have the correct codes.
- RESPMODE (available only in the 2005-onward ACS/PRCS) is now a household variable rather than a person variable. No data have changed, however.
- INDNAICS in the 2008 ACS contained unnecessary characters around the correct values. It is now consistent with other years.
- There is a new code (did not work last year, but did work in the past five years) for WORKEDYR.
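The two-decimal PERWT and HHWT values noted in this revision can also be tabulated outside SPSS, SAS, and Stata by summing weights rather than counting rows. The following Python sketch shows the idea; the records and weight values are hypothetical, and `weighted_tabulate` is an illustrative helper, not part of any IPUMS tool.

```python
from collections import defaultdict


def weighted_tabulate(records, var, weight="PERWT"):
    """Sum fractional weights within each category of `var`."""
    totals = defaultdict(float)
    for r in records:
        totals[r[var]] += r[weight]
    return dict(totals)


# Hypothetical person records with fractional weights.
people = [
    {"SEX": 1, "PERWT": 100.25},
    {"SEX": 2, "PERWT": 99.5},
    {"SEX": 1, "PERWT": 100.25},
]

counts = weighted_tabulate(people, "SEX")
```

Because the weights are accumulated as floating-point sums, no rounding to integer frequencies is needed.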
March 4, 2010. Posted new versions of 2003-onward files.
- Following conversations with Census Bureau staff about the calculation of dollar adjustment factors in the ACS/PRCS, the IPUMS no longer automatically applies the Census Bureau's adjustment factor to any dollar-amount variable in the ACS/PRCS. Such variables now exist in the IPUMS exactly as they were released by the Census Bureau, and IPUMS recommends that users analyze them without applying the adjustment factor. For more information, see the ACS adjustment page. Users who want to adjust dollar amounts may use the new variable ADJUST, which provides the adjustment factor as provided by the Census Bureau.
- Although adjustment factors are no longer applied automatically, users should know that there were problems in implementation for the 2005-2007 and 2006-2008 3-year files available between Jan. 12 and March 4, 2010. The adjustment factors in these samples should have varied with MULTYEAR. However, a programming error applied only the 2007 adjustment factor to all cases in the 2005-2007 3-year ACS/PRCS, and only the 2008 adjustment factor to all cases in the 2006-2008 3-year ACS/PRCS. Because these adjustment factors contained the CPI-U values to convert dollar amounts to constant dollars, dollar values were inaccurate. In the 2005-2007 3-year file, all dollar amounts should have been expressed in 2007 dollars, but dollar values for 2005 cases were too small by 6.1%, and dollar values for 2006 cases were too small by 2.7%. In the 2006-2008 3-year file, all dollar amounts should have been expressed in 2008 dollars, but dollar values for 2006 cases were too small by 6.1%, and dollar values for 2007 cases were too small by 3.5%. This error carried through to POVERTY values for nonrelatives of the householder (RELATE codes of 11, 12, and 13).
- In the 2003-onward ACS/PRCS, industry codes contain four digits of detail, but IPUMS codes (IND) formerly contained only three digits. All four digits are provided now. IND1950 and IND1990 (recoded, harmonized versions of industry) remain the same. New codes pages are available through IND and INDNAICS.
- OWNCOST was mistakenly omitted from the 2005-2007 and 2006-2008 ACS 3-year files. It is now available.
- Inspection of the revised 2006 AGE data revealed that almost all cases 65 years and older have changed in value, suggesting that the revised AGE data were imputed or created using synthetic data techniques. All persons age 65 or older in the 2006 ACS/PRCS (and all 2006 cases in the 2006-2008 3-year file) now have values of 4 in QAGE to indicate probable allocation.
Future revisions of the IPUMS data will be posted on a quarterly schedule (the beginning of March, June, September, and December). The next update will be on June 1, 2010.
February 10, 2010. Posted new versions of 1940-onward files.
- Due to a programming error, respondents in the 2000-2008 ACS/PRCS files with total family incomes (FTOTINC) of 0 or very small amounts received POVERTY codes of 0, which are reserved for N/A cases (group quarters and unrelated individuals not in a subfamily under age 15). They now receive POVERTY codes of 1.
- Respondents who were unrelated to the householder, under the age of 15, and not in an unrelated subfamily should have received POVERTY codes of 0 (N/A); instead, they were coded as 1. This has been corrected.
- Respondents in the 1970 and 1980 Puerto Rico samples who should have received GRADEATT codes of 0 instead received codes of "ZZ". This has been corrected.
Posted new versions of 1870 and 1900 datasets. In the new 1870 data, OCC and OCC1950 values are included for people under the age of 16 who reported an occupation. The previous version coded these people as "not in universe." In 1900, several cases previously having invalid SPEAKENG values are now coded properly.
January 28, 2010. Posted new versions of 1900-1910 and 1940-onward files. Two minor changes have been made:
Extra information on military service is now available in VETSTAT. Since 1990, the non-veteran category has included detail on people without military service, service members currently on active duty, and (since 2003) people whose only service is training in the National Guard or Reserves. The "Yes, Veteran" category has included detail on both service members who were previously on active duty at any time and (in 1990 only) people who were activated from the National Guard or Reserves. This detail is now contained in the new detailed version of VETSTAT; the general version coding remains the same as the previous VETSTAT coding. However, the actual frequencies in the general version differ from the previous one-digit version of VETSTAT for three reasons:
- In 1940, VETSTAT also contained valid codes for persons younger than 18, contrary to the stated universe for that year. All persons under age 18 are now classified as N/A in 1940.
- In 1950-1970, persons currently on active duty were not in the universe. They are now classified as non-veterans.
- Persons who should have been coded as non-veterans because they had undergone only training in the National Guard or the Reserves were erroneously classified as veterans in the 2000 census and 2000-2002 ACS data. As a result, the number of veterans was overestimated by 19.0 percent in the 2000 census and by 15.5 percent in the 2000-2002 ACS data (the difference comes from the fact that the ACS did not include group quarters in their sample). This has been corrected.
For more information, see the VETSTAT variable description.
- STATEFIP identifiers are now available in the 1% and 1.4% 1900 and 1910 samples for Alaska and Hawaii, and STATEICP is now available in 1900 for these two states (it has long been available in 1910).
Other changes are limited to the ACS/PRCS:
- The 2006-2008 3-year file contained incorrect data for RACESING, with the result that many RACESING codes were inconsistent with RACE responses. RACESING is now correct.
- AGE has been replaced in the 2006 ACS/PRCS file. The Census Bureau released revised 2006 data in December 2009 to fix a problem with disclosure control techniques. The original age variable is still available as AGEORIG06; for full details, see the AGEORIG06 variable description. Because the family interrelationship pointer variables (MOMLOC, POPLOC, and SPLOC) rely on age, some of these have changed as well; STEPMOM, STEPPOP, and other variables based on family units also stand to be affected.
- The DIFFCARE variable in the 2008 ACS/PRCS contained data for DIFFMOB instead; the correct variable is now available.
- QAGE was mistakenly not made available for the 2005-onward PRCS or for the 2006-2008 3-year files. It may now be downloaded.
January 12, 2010. Added 2006-2008 ACS/PRCS 3-Year files. These files include all cases in the previously-released single-year files from the 2006, 2007, and 2008 ACS/PRCS. The new 3-year file differs in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2008 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files. Another notable difference is that some values of AGE in the 2006 portion of the 3-year file differ from those in the 2006 single-year file because of a change to the Census Bureau's disclosure avoidance methods. The Census Bureau has re-released the 2006 single-year file with the revised AGE variable, and it will be added to the IPUMS database soon.
- New variables were added, including BPLSTR, DWSIZE, FBPLSTR, GQSTR, INCORP, INCSTR, LINE, MBPLSTR, MCD, METDIST, OCC, PROBAI, PROBAPI, PROBBLK, PROBOTH, PROBWHT, QNAMELST, QQTRUNEM, QSURSIM, RACAMIND, RACASIAN, RACBLK, RACOTHER, RACPACIS, RACWHT, SFRELATE, SFTYPE, STREET, SUBFAM, SURSIM, URBAN, and URBPOP.
- The total number of records changed slightly. The change is partly due to the removal of duplicate records. Enumerators or census clerks crossed out records when they found a duplicate or when the respondent was deceased on census day. For the new 100% database, we have removed these crossed out records. Also, the number of records changed for the city of St. Louis (see below). Overall the total number of households decreased by 26,089, and the total number of person records decreased by 42,572.
- The city of St. Louis was enumerated twice for the 1880 census. The previous 100% database contained data from the second enumeration. Although the second enumeration contains more records, we decided that for the final database, in the interest of consistency with the data from the rest of the country, we would release the first enumeration, as the first enumeration occurred during the U.S. Census Bureau's designated timeframe for enumeration while the second took place 6 months later. As a result, the new release contains 7,573 fewer household records and 29,310 fewer person records for St. Louis.
- Improvements were made to geographic data. An audit conducted by Professor Michael Haines (Colgate University) revealed numerous county codes that were blank, invalid, or incorrect. This resulted in approximately 21,100 changes to the county variable and to variables that derive from county information (such as METRO, METAREA, etc.).
- We were able to take advantage of some additional detail in the RELSTR variable to assign individuals' detailed RELATE codes more accurately, particularly in the 1200s (non-related individuals).
- Numerous other minor corrections were made to individual coding decisions.
Also posted new versions of all IPUMS-USA samples. There are several notable changes to variable availability and ease of use:
- The multigenerational household variable (MULTGEN), previously released for the first time in the 2008 ACS/PRCS exactly as provided by the Census Bureau, has been revised by IPUMS to contain additional detail and is now available for samples from 1880 onward. See the variable description for full information.
- A new variable, HOMELAND, identifies PUMAs that contain a Census Bureau-designated American Indian, Alaska Native, or Native Hawaiian homeland area.
- A new variable, EDUC, contains all information on educational attainment that was previously spread across four variables (HIGRADE, EDUC99, EDUC00, and EDUC08). HIGRADE contains additional detail on educational attendance and has been retained. The other three variables are now superfluous and have been removed along with EDUCREC, the previous summary variable for educational attainment. EDUC has both general and detailed versions. The general version is the equivalent of EDUCREC in that it provides the set of general categories that can be identified in each year of data, but these general categories are more detailed than those formerly contained in EDUCREC. Additionally, all samples in and after 2000 contain the detailed category "Some college, but less than 1 year". In the old EDUCREC variable, this category was classified as "1-3 years of college". Because the people in this category have completed 12th grade but not 1 year of college, they are now classified as "12th grade" in the general version of the new EDUC variable.
- The occupational standing measures EDSCOR50, EDSCOR90, NPBOSS50, and NPBOSS90, which were based on EDUCREC, are now based on the more precise variable EDUC (general version). This has resulted in some changes. The EDSCOR measures are unchanged between 1950 and 1990; in all samples 2000 and after, though, they have decreased by an average of 6 to 7 points because of the code shift. The NPBOSS measures, which rely in part on the ordering of occupations' educational composition, have also changed, but by no more than 10 points in either direction.
- GRADEATT (grade of school now attending) has been expanded to a general/detailed coding scheme to accommodate the increased detail available in the 2008 ACS/PRCS. The variable GRADE08, which previously contained this detail, has been removed. Additionally, information from HIGRADE has been used to expand GRADEATT's availability to the 1960-1980 samples.
- Migration variables have been streamlined. Because of a programming error, YRSPR contained incorrect data for all observations, and data for year of immigration to Puerto Rico was split between YRIMMIG (for 1910-1920) and YRIMMIPR (for 1980 onward). This has been corrected. Additionally, YRSPR is limited to 1910, 1920, and 2000 onward; but YRSPR2 (an intervalled version of YRSPR) is now available and makes this information available in a less detailed form for 1980 and 1990 as well.
- For the ACS/PRCS multi-year samples, YEAR previously gave the actual year of survey (e.g., 2005, 2006, or 2007 for the 2005-2007 3-year file). To ensure that the combination of YEAR, DATANUM, SERIAL, and PERNUM uniquely identifies individuals, YEAR now provides the last year of data (e.g., 2007 for the 2005-2007 3-year file). Information on the actual year of survey has been shifted to a new household-level variable called MULTYEAR, valid only for the multi-year ACS/PRCS.
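The uniqueness property described for YEAR, DATANUM, SERIAL, and PERNUM can be checked on an extract with a short script. The Python sketch below is one way to do it; the records and their DATANUM and SERIAL values are hypothetical, and `duplicate_keys` is an illustrative helper.

```python
from collections import Counter

# The four variables that together should identify one person.
KEY_VARS = ("YEAR", "DATANUM", "SERIAL", "PERNUM")


def duplicate_keys(records):
    """Return every key tuple that occurs on more than one record."""
    counts = Counter(tuple(r[v] for v in KEY_VARS) for r in records)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical cases from a 2005-2007 3-year file: YEAR holds the last year
# of data, while MULTYEAR carries the actual survey year.
rows = [
    {"YEAR": 2007, "DATANUM": 3, "SERIAL": 1, "PERNUM": 1, "MULTYEAR": 2005},
    {"YEAR": 2007, "DATANUM": 3, "SERIAL": 1, "PERNUM": 2, "MULTYEAR": 2005},
    {"YEAR": 2007, "DATANUM": 3, "SERIAL": 2, "PERNUM": 1, "MULTYEAR": 2006},
]

problems = duplicate_keys(rows)  # an empty list means the key is unique
```

Analyses that need the actual survey year would read MULTYEAR rather than YEAR for these files.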
There are also several more minor changes:
- QGCHOUSE (the data quality flag for GCHOUSE) was mistakenly not made available before; it may now be downloaded.
- QYRSPR (the data quality flag for YRSPR) contained all 0's due to a programming error. It now contains correct data.
- BUILTYR2 was previously available in 2000 only for that year's ACS sample; it is now available for all samples in that year.
- All negative values of replicate weights (REPWT and REPWTP) had been recoded to zero for ease of use in statistical software packages. However, in further discussions with StataCorp technical staff, it emerged that Stata can handle negative replicate weights. (Neither SAS nor SPSS can automatically process the kind of replicate weights included in the ACS and PRCS data.) The original replicate weight values are now provided, and IPUMS now provides an FAQ page on replicate weights that contains directions for using ACS/PRCS replicate weights in Stata.
- Because of a programming error, many cases that should have been coded as 0 on YRSUSA2 in the 2008 ACS were instead coded as 5, and many other cases that should have been coded as 1, 2, or 3 were instead coded as 0. This has been corrected. YRIMMIG and YRSUSA1 were accurate and remain unchanged, except for correcting another programming error that coded valid YRSUSA1 values of 0 as 1 in 1910-1930 samples. (YRSUSA1 codes of 0 contain both N/A cases and cases that arrived in America less than one year ago; they can be distinguished using BPL. See the YRSUSA1 codes page for more information.)
- Because of a programming error, all commutes (TRANTIME) over 99 minutes were too small by a factor of 10. This affects approximately the longest 0.5% of commutes in all ACS and PRCS samples. This has been corrected, and TRANTIME has been widened to three digits.
- QMIGRAT1 (the data quality flag for MIGRATE1) contained incorrect data for the 2006-2008 ACS samples, instead duplicating QMARST (the data quality flag for MARST). This has been corrected.
- Persons with OCC1950 values of 595 (armed services), 997 (missing/unknown), and 999 (N/A) received 59.5, 99.7, and 99.9 respectively as their NPBOSS50 scores. They now receive the appropriate N/A codes of 999.9.
- In the 2003 and 2004 ACS/PRCS samples, respondents with "some college but less than one year" were erroneously classified in EDUC (and the former EDUC99) as having an "associate's degree, occupational program", while those with "one or more years of college, no degree" were erroneously classified as having an "associate's degree, academic program". This has been corrected.
- In the 2008 ACS, three respondents with RACE codes of 827 ("White and one or more major race groups, n.e.c.") and one respondent with a code of 991 ("White race; Some other race; Black or African American race and/or American Indian and Alaska Native race and/or Asian groups and/or Native Hawaiian and Other Pacific Islander groups") received incorrect codes of "0" on RACESING. This has been corrected.
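For users who want to see why negative replicate weights need not be a problem in principle, the following sketch computes a replicate-based standard error for a weighted total by hand. It is an illustration only, not the procedure from the IPUMS FAQ: it assumes the Census Bureau's published ACS variance formula (4/80 times the sum of squared deviations of the 80 replicate estimates from the full-sample estimate), generalizes the 80 to the number of replicates supplied, and uses hypothetical function names and a toy example with only two replicates.

```python
import math


def weighted_total(values, weights):
    """Weighted sum of a variable; negative weights are handled like any other."""
    return sum(v * w for v, w in zip(values, weights))


def replicate_se(values, full_weights, replicate_weights):
    """Standard error of a weighted total from replicate weights.

    Assumes the ACS-style formula: variance = (4 / R) * sum((est_r - est)^2),
    where R is the number of replicate weight sets (80 in the ACS/PRCS).
    """
    n_reps = len(replicate_weights)
    full_estimate = weighted_total(values, full_weights)
    squared_deviations = sum(
        (weighted_total(values, rep) - full_estimate) ** 2
        for rep in replicate_weights
    )
    return math.sqrt(4.0 / n_reps * squared_deviations)


# Toy data: three persons, an indicator variable, and two made-up replicate
# weight sets, one of which contains a negative value.
indicator = [1, 0, 1]
perwt = [10, 20, 30]
repwtp = [[12, 18, -1], [8, 22, 61]]
```

With these toy inputs, `replicate_se(indicator, perwt, repwtp)` evaluates to 58.0. The negative weight simply enters the replicate estimate as a negative contribution.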
Several other variables have been widened:
- YRIMMIG and YRIMMIPR are now 4-digit years instead of 3-digit codes.
- INCTOT and INCEARN now contain seven digits to accommodate the highest incomes. For INCTOT, this affects fewer than 60 cases in each of the 2006-onward samples; for INCEARN, this affects only four cases in 2008.
- BEDROOMS and ROOMS are now two digits wide to accommodate expanded detail in the 2008 ACS/PRCS. All other samples are unaffected.
November 9, 2009. Posted new versions of the 2008 ACS/PRCS to correct erroneous data in REPWT14. Additionally, cases in the 1920 1% sample that should have been coded as "9997" (unknown) in YRNATUR instead received codes of "ZZZZ". This has been corrected.
November 6, 2009. Posted new versions of the 2008 ACS/PRCS. HHINCOME, FTOTINC, INCEARN, and INCTOT were not provided before due to errors in the original Census Bureau data; they are now available because the Census Bureau has released new data. POVERTY was previously provided just as the Census Bureau released it, with different values for each nonrelative of the householder. The POVERTY variable is now calculated as in all previous samples, where people in unrelated subfamilies have the same value.
The Census Bureau has not documented their data update, so users should know that any 2008 PUMS files downloaded from the Census Bureau's website between October 30 and November 3 have incorrect data for the four income summary variables with the Census Bureau names of FINCP, HINCP, PERNP, and PINCP. And, as of November 6, the DataFerrett data had not been updated; they still contain incorrect codes of -$59,999 (HINCP and FINCP), -$10,000 (PERNP), and -$19,999 (PINCP).
There are two other changes to IPUMS-USA data:
- YRSUSA1 was calculated incorrectly for the 2008 samples. It is now aligned with YRIMMIG.
- The documentation for MULTGEN, a new variable measuring multigenerational households that IPUMS provides without modification from the original Census Bureau data, has been updated to reflect the results of preliminary examination by IPUMS staff. We recommend that researchers use this variable with caution.
November 4, 2009. Added 1% samples from the 2008 American Community Survey (ACS) and the 2008 Puerto Rico Community Survey (PRCS). Together, the samples contain approximately three million person records. The 2008 ACS is the third ACS sample to provide information on group quarters, which can be identified in the GQ variable. Researchers analyzing multiple ACS samples over time should remove group quarters cases, since they are available only in the 2006-2008 data.
The lowest level of geographic identifier in the 2008 ACS is the PUMA; 2008 PUMAs have the same boundaries as those in the 2005-2007 ACS and the 2000 census samples. The IPUMS version of the 2008 ACS provides the following additional geographic identifiers: CITY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. These variables were constructed at the University of Minnesota and are not available via the Census Bureau.
There are several noteworthy changes to the IPUMS data that stem from the Census Bureau's modifications to the ACS/PRCS questionnaire; users should consult our page on the 2008 ACS/PRCS.
We have also made a number of other improvements to the IPUMS data. Specifically:
- Incorrect adjustment factors were used for all dollar amounts in ACS and PRCS samples. This resulted in dollar amounts that were nearly 1 percent smaller than they should have been in 2006 ACS/PRCS data and 2006 cases in the 2005-2007 3-Year ACS/PRCS files. For all other ACS/PRCS data, dollar amounts were off by less than 0.5 percent. This was due to an error in the Census Bureau documentation, which states that the adjustment variables convert dollar amounts into July dollars (for instance, see the 2005 ACS Accuracy Statement, p. 13). Through further conversations with Census Bureau staff, we have realized that these variables actually convert dollar amounts into calendar year dollars. This is true for all ACS samples since 2000. All dollar amounts in IPUMS ACS/PRCS samples are pre-adjusted to reflect calendar year dollars.
The occupational standing measures have been refined.
- NPBOSS50 and NPBOSS90, which are based on median educational attainment and median earned income, have been re-calculated using the standard formula for calculating medians from grouped data. These were previously calculated using the GMEDIAN function in the SPSS MEANS command, which appears to have a programming error that causes its calculations to diverge from its stated method. The changes are quite small.
- ERSCOR50 and ERSCOR90, previously calculated as if the income data were not grouped, have been re-calculated using the above procedure for calculating medians from grouped data. Also, IPUMS now weights occupations by the number of workers they contain when calculating the standardized median incomes on which the ERSCOR measures are based. Because this weighting was not carried out before, scores for ERSCOR50 and ERSCOR90 did not truly represent the percentage of workers in occupations having lower median earnings than a given occupation, as stated in the documentation. Rather, they represented the percentage of occupations having lower median earnings than a given occupation. Together, these two improvements have the potential to make significant changes to the ERSCOR measures: computing medians without attention to the grouped nature of the data resulted in many ties in the positioning of occupations, which were broken by the (arbitrary) occupational category numbers. The increased precision alters the relative and absolute position of the occupations in the distribution of standardized median incomes, while weighting for the size of the occupations further alters the absolute position of each occupation in this distribution.
- The EDSCORE, ERSCORE, and NPBOSS measures rely on statistics that are derived from the PUMS data (for instance, the proportion of people in a given occupation who have a college education). Samples containing data from 2006-2007 were previously based on analysis of the 2006 1-year data. Components for these variables are now calculated separately for each sample.
- Some replicate weight values (REPWT and REPWTP) in the original Census Bureau 2006 and 2007 ACS/PRCS data files are negative. However, many statistical procedures balk at negative weights. In this latest revision, IPUMS has recoded all REPWT and REPWTP values to 0 where they were less than 0 in the original Census Bureau file. This affects very few cases, typically fewer than 40 for any one set of replicate weights. According to Census Bureau staff, this introduces no bias into replicate standard errors, and tests of IPUMS data confirmed this.
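Two of the refinements above can be illustrated in code. The sketch below is not the IPUMS production procedure: it shows the standard textbook interpolation formula for a median from grouped data, and the difference between an unweighted and a worker-weighted "percent in lower-ranked occupations" score. The function names, bin layout, and example figures are hypothetical, and the treatment of ties and of the occupation's own workers in the denominator is an assumption.

```python
def grouped_median(bins):
    """Median from grouped data.

    bins: list of (lower_bound, width, count), sorted by lower_bound.
    Uses median = L + ((N/2 - F) / f) * w, where L is the lower bound of the
    median interval, F the cumulative count below it, f its count, w its width.
    """
    total = sum(count for _, _, count in bins)
    half = total / 2.0
    cumulative = 0
    for lower, width, count in bins:
        if count > 0 and cumulative + count >= half:
            return lower + (half - cumulative) / count * width
        cumulative += count
    raise ValueError("bins contain no observations")


def percent_of_occupations_below(medians, occupation):
    """Unweighted score: share of occupations with a lower median."""
    target = medians[occupation]
    below = sum(1 for value in medians.values() if value < target)
    return 100.0 * below / len(medians)


def percent_of_workers_below(medians, workers, occupation):
    """Weighted score: share of workers in occupations with a lower median."""
    target = medians[occupation]
    below = sum(n for occ, n in workers.items() if medians[occ] < target)
    return 100.0 * below / sum(workers.values())


# Made-up example: three occupations of very different sizes.
example_medians = {"A": 10.0, "B": 20.0, "C": 30.0}
example_workers = {"A": 800, "B": 100, "C": 100}
```

In this example occupation C ranks above two of three occupations (about 66.7 percent) but above 90 percent of workers, which is the kind of shift that weighting by occupation size can produce.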
We also made several changes that were not specifically related to the 2008 ACS/PRCS data release:
- In the 1850-1870 samples, blanks on the REALPROP and PERSPROP items are now coded as 0. Some of the samples previously coded blanks as 999999.
- For household fragments in the 1910 1% data that were reunited with their proper household (SAMPRULE = 6), persons that fell outside the sampling window now receive PERWT, SLWT, and PERWTDET values of 0, in accordance with IPUMS documentation. Previously, such persons were inadvertently given the same weights as everyone else in the sample. This affects 573 household fragments.
- HWSEI, PERWTDET, and HHWTDET, which have two implied decimal places, are now divided by 100 automatically in the command setup files. EDSCOR50, EDSCOR90, ERSCOR50, ERSCOR90, NPBOSS50, NPBOSS90, PRESGL, and PRENT, all of which have one implied decimal place, are now divided by 10. Previously, users were required to perform these calculations.
- Year of naturalization (YRNATUR) now indicates the full four-digit year, rather than the last three digits (as was the case previously).
- The names of some disability variables available in the ACS have been streamlined, and minor errors in the IPUMS documentation have been corrected; for more information, see our page on the 2008 ACS/PRCS.
- The coding of BUILTYR2 (age of structure) has been changed. Previously, higher values of this variable represented older structures, and buildings constructed most recently had lower values. Under that scheme, the most recent period of construction always had to be represented by 1, so each additional year of data would have required inconvenient changes to existing codes. Now, higher values of this variable represent younger structures, and future revisions will merely add (not change) codes.
- For 2005-2007 ACS and PRCS samples, the data quality flag for CONDOFEE (QCONDOFE) contained data for QCOSTGAS instead. This has been corrected.
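Users who read the raw data with their own code rather than the supplied command setup files still need to apply the implied decimal places themselves. The following is a minimal sketch under that assumption; the variable list comes from the note above, while the function name and the dictionary-based record layout are hypothetical.

```python
# Number of implied decimal places for each affected variable (from the note above).
IMPLIED_DECIMALS = {
    "HWSEI": 2, "PERWTDET": 2, "HHWTDET": 2,
    "EDSCOR50": 1, "EDSCOR90": 1, "ERSCOR50": 1, "ERSCOR90": 1,
    "NPBOSS50": 1, "NPBOSS90": 1, "PRESGL": 1, "PRENT": 1,
}


def apply_implied_decimals(row):
    """Return a copy of a raw record with implied decimals applied.

    Variables not listed in IMPLIED_DECIMALS are passed through unchanged.
    """
    return {
        name: value / 10 ** IMPLIED_DECIMALS[name] if name in IMPLIED_DECIMALS else value
        for name, value in row.items()
    }
```

For example, a raw NPBOSS50 value of 9999 becomes 999.9, the N/A code for that variable.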
October 9, 2009. Added two new variables for 1850-1930. In these samples, dwellings could include more than one household. Households have always been uniquely identified by SERIAL; the new variable DWELLING is a unique identifier for dwellings. Within each value of DWELLING, there may be more than one SERIAL. The new variable DWSEQ indicates the order in which households were enumerated within the dwelling. For more information, please see the variable descriptions.
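As an illustration of how the two new variables relate to SERIAL, the sketch below groups household records by dwelling and lists them in enumeration order. It assumes one dictionary per household record; the sample values and the function name are invented for the example.

```python
from collections import defaultdict

# Hypothetical household records (values made up).
households = [
    {"DWELLING": 100, "SERIAL": 7, "DWSEQ": 2},
    {"DWELLING": 100, "SERIAL": 6, "DWSEQ": 1},
    {"DWELLING": 101, "SERIAL": 8, "DWSEQ": 1},
]


def households_by_dwelling(rows):
    """Map each DWELLING to its SERIAL values, ordered by DWSEQ."""
    grouped = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r["DWELLING"], r["DWSEQ"])):
        grouped[row["DWELLING"]].append(row["SERIAL"])
    return dict(grouped)
```

Here dwelling 100 contains two households, enumerated in the order SERIAL 6 and then SERIAL 7.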
October 8, 2009. Added higher-density samples for 1880 and 1900. The 1880 10% sample has replaced the preliminary 1880 5% sample, while the 1900 5% sample has replaced the preliminary 1900 2.5% sample. The final samples contain all cases from the preliminary samples (which came from odd-numbered microfilm reels only) as well as new cases from even-numbered microfilm reels. For more details, see the sample description page for 1880 and 1900.
August 11, 2009. Posted new versions of all linked data samples. The LINKWT variable has been corrected in all samples. Due to a processing error, LINKWT values were too low by a factor ranging from 2 to 50. Any data downloaded previously should be replaced with these new data.
June 17, 2009. Posted new versions of 1950-2007 data.
Additional detail is now available on total family income (FTOTINC). Previously, this variable was part of the household record and described only members of the primary family, those persons related to the head (FAMUNIT=1). FTOTINC is now part of the person record and describes the family income of everyone in the household, even if they are unrelated to the head. The width of FTOTINC has been increased from 6 digits to 7 digits, which provides additional detail on very high incomes; however, users should remember that such incomes have been top-coded by the Census Bureau. Users should note two minor errors in the previous version of FTOTINC that have now been corrected; both affected only 2000-2007 ACS and 2005-2007 PRCS data:
- For household heads living with at least one nonrelative but no relatives, the IPUMS adjustment factor was erroneously applied twice, resulting in FTOTINC values that were slightly larger than they should have been for everyone in those households.
- FTOTINC values were too small by a factor of 10 for families with a total family income of $999,999 or more.
POVERTY, which is based on total family income, remains part of the person record and has not changed.
- Additional detail is also available on total household income (HHINCOME): the width of this variable has also been increased from 6 digits to 7 digits. Users should note one minor error in the previous version of HHINCOME that has now been corrected: in households where applying the IPUMS adjustment factor pushed HHINCOME values from below $999,999 to at least $999,999, HHINCOME values were too small by a factor of 10. The error affected only 2000-2007 ACS and 2005-2007 PRCS data.
- For the 2000 Census samples and ACS samples, the variables EDUC99 and EDUC00 were changed to correct errors found in the data dictionary. In EDUC99, respondents in the 2000-2004 ACS samples that were previously coded as having an 'Associate degree - occupational program' are now coded as 'Some college, no degree'. A value for 'Associate degree, type not specified' was added in EDUC99 for classification of the respondents with associate degrees in the 2000 samples and the ACS. The respondents with a value for 'Associate degree, academic program' for the ACS 2001-2004 samples are now coded as 'Associates degree, type not specified'. For EDUC00, respondents in the 2000-2004 ACS samples that were previously coded as having an 'Associate degree - occupational program' are now coded as 'One or more years of college but no degree'. The respondents with a value for 'Associate degree, academic program' are now coded as 'Associates degree'.
June 11, 2009. Minor correction to 2007 ACS/PRCS data. In the 2007 ACS (and the 2007 cases in the 2005-2007 ACS 3-year file), some cases in Florida had missing values of PROPINSR. These are now coded as 9999, which is the correct PROPINSR topcode for Florida. The documentation of topcodes has also been updated to reflect this change.
May 29, 2009. Corrected minor inaccuracies in 2000 and ACS/PRCS data.
- In the 2000 PUMS and all ACS/PRCS data, persons with OCC codes of 384 ("miscellaneous law enforcement officers") received OCC1990 codes of 405 ("housekeepers, maids, butlers, stewards, and lodging quarters cleaners"). They now receive OCC1990 codes of 423 ("Other law enforcement: sheriffs, bailiffs, correctional institution officers"). (Note that this change diverges from the BLS working paper on which OCC1990 is based.)
- In all ACS/PRCS data, persons related to the household head were erroneously coded as 0 (N/A) for POVERTY if their total family income (FTOTINC) was negative. They now receive the proper codes of 1.
- In all PRCS data, POVERTY values for all cases were based on IPUMS calculations from topcoded income data. For members of the primary family in the household, POVERTY values now reflect the original Census Bureau values (based on non-topcoded income data), in accordance with IPUMS' treatment of ACS data. The effect of this alteration is small; for 92 percent of such cases, POVERTY values change by no more than three percentage points. For unrelated individuals and members of any secondary families, POVERTY values continue to be based on IPUMS calculations (see the variable description for background).
- In the 2006 ACS (and the 2006 cases in the 2005-2007 ACS 3-year file), group-quarters residents were erroneously coded as missing for QMOVEDIN (the data quality flag for MOVEDIN). They are now coded as 0.
May 13, 2009. Added four new variables describing subfamilies to 1880-2007 IPUMS samples: SFTYPE (subfamily type), SFRELATE (relationship within subfamily), SUBFAM (subfamily membership), and NSUBFAM (total number of subfamilies in the household). For more information, see the subfamilies overview page.
Also, documentation for other family interrelationship variables has been updated to conform to longstanding IPUMS procedures:
- When linking under the third rule for MOMRULE or POPRULE, the IPUMS uses an additional condition in samples where respondents can give multiple race responses (2000, ACS, and PRCS): persons for whom a single race is listed may not be linked to potential parents of a different race. Users should note that this condition has long been applied to 2000 and ACS data, but is now applied to the PRCS for the first time.
- Persons receive STEPMOM codes of 1 when the difference in ages between them and their mother is less than 12 years or greater than 54 years--not less than 15 years or greater than 49 years, as the documentation previously stated.
- Persons receive STEPPOP codes of 1 when the difference in ages between them and their father is less than 14 years--not less than 15 years or greater than 64 years, as the documentation previously stated.
See the variable descriptions for more information.
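The corrected age-difference thresholds can be written out as simple checks. This sketch covers only the age condition as stated above, not the full logic behind STEPMOM and STEPPOP; the function names are hypothetical, and ages are assumed to be in whole years.

```python
def stepmom_age_condition(child_age, mother_age):
    """True when the mother-child age gap is under 12 or over 54 years."""
    gap = mother_age - child_age
    return gap < 12 or gap > 54


def steppop_age_condition(child_age, father_age):
    """True when the father-child age gap is under 14 years."""
    return (father_age - child_age) < 14
```

For instance, a 10-year-old linked to a 21-year-old mother meets the STEPMOM condition (a gap of 11 years), while the same child linked to a 22-year-old mother does not.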
April 21, 2009. Corrected missing values and other minor inaccuracies in several samples. First, several variables contained missing data for some cases. Missing data has been assigned to the proper codes as follows:
- In the 1880 100% database, 21 cases that were mistakenly coded as missing on ENUMDIST are now coded as "0", and SUBSAMP (formerly unavailable in these data) is now provided.
- In the 1900 1% sample (both with and without oversamples), four cases contained missing data for SUPDIST. One of these is now coded as "7", one as "14", and two as "73". Additionally, five cases contained missing data for DWSIZE. Two of these are now coded as "3", one as "5", and two as "7".
- In the 1920 Puerto Rico sample, two cases that were mistakenly coded as missing on ENUMMO are now coded as "01".
- In the 1930 1% sample, values of IND1930 and OCC1930 that contained non-numeric characters were mistakenly coded as missing; the proper values are now available.
- In the 1950 sample, missing data for WKSWORK1 has been assigned to "00" (N/A).
- In the 1980 urban/rural sample, cases that were coded as "3570" (Lexington, KY) for CITY have been switched to "3590" (Lexington-Fayette, KY) to account for the 1974 merger of Lexington and Fayette County. Additionally, city populations (CITYPOP) are now identified for this city as well as for city codes 6410 (Scranton, PA) and 6650 (Springfield, IL).
- In the 1980 labor market sample, the variables MIGCZ5 and PWCZ were mistakenly coded as missing for a large number of cases; the proper values are now available.
- In the 2000 5% and 1% Puerto Rico samples, missing data for INCWAGE, INCTOT, and FTOTINC has been assigned to "999999" (N/A).
- In the 2000 5% Puerto Rico sample, the variables PUMALAND and PUMAAREA were mistakenly coded as missing for all cases; the proper values are now available.
Second, all housing units with 10 or more persons unrelated to the household head have been re-classified as group quarters in all American Community Survey and Puerto Rican Community Survey samples, consistent with the treatment of such households in the 2000 census. For more information, see GQ. The cases in such housing units are now coded as 5 in the GQ variable and 9 in the GQTYPE variable (900 in the detailed version GQTYPED).
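The reclassification rule can be sketched as follows. This is an illustration of the rule as described above, not the IPUMS processing code: it assumes the input contains only housing-unit person records, that SERIAL identifies the unit, and that a hypothetical boolean field `related_to_head` has already been derived from the relationship variable.

```python
from collections import Counter


def reclassify_large_units(persons, threshold=10):
    """Recode units with `threshold` or more persons unrelated to the head.

    Returns new records; affected units receive GQ=5, GQTYPE=9, GQTYPED=900.
    `related_to_head` is a hypothetical pre-computed flag, not an IPUMS variable.
    """
    unrelated = Counter(p["SERIAL"] for p in persons if not p["related_to_head"])
    large_units = {serial for serial, n in unrelated.items() if n >= threshold}

    recoded = []
    for person in persons:
        row = dict(person)  # copy so the input records are left untouched
        if row["SERIAL"] in large_units:
            row.update(GQ=5, GQTYPE=9, GQTYPED=900)
        recoded.append(row)
    return recoded
```

Every person in an affected unit is recoded, including the head and any relatives, since the classification applies to the unit as a whole.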
Third, information on variable availability has been updated as follows:
- The 2006 and 2007 ACS/PRCS samples now include QMOVEDIN (the data quality flag for MOVEDIN).
- The 2000 1% Puerto Rico sample does not contain PUMA information, and all cases were coded as missing for this variable. PUMA may no longer be downloaded with this sample.
- All ACS/PRCS samples now include GQTYPE to accommodate the aforementioned change in GQ coding (see above).
April 1, 2009. Improved and updated the coding of in-laws in the 2000-2007 American Community Survey (ACS) and 2005-2007 Puerto Rican Community Survey (PRCS) samples. In these samples, the Census Bureau's relationship variable includes only a global "in-law" category. IPUMS attempts to provide a more detailed classification of parents-in-law, siblings-in-law, and children-in-law in the RELATE variable. The new release of the ACS and PRCS datasets improves the procedures for making these detailed in-law assignments. More information on the new procedures is available here. Additionally, users should take note of three coding errors in the old classification scheme that have been corrected and/or no longer apply in the new classification scheme:
- Many never-married in-laws, all of whom should have been classified as siblings-in-law under the old classification scheme, were instead classified as parents-in-law or children-in-law. This condition no longer applies.
- In households containing unmarried partners of the head, the classification of in-laws departed from the stated rules and was likely to be particularly inaccurate. This is no longer the case.
- In the 2005-2007 PRCS, all in-laws were mistakenly classified as parents-in-law. This has been corrected.
Additionally, in the 2005-2007 ACS and PRCS 3-Year samples, the person weights (PERWT) for individuals in group quarters were not copied to the household weight variable (HHWT). This has been corrected.
March 5, 2009. Posted the 2005-2007 American Community Survey/Puerto Rican Community Survey 3-year file. This file includes all cases from the previously-released single-year files from the 2005-2007 ACS/PRCS. The new 3-year file differs in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2007 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files.
March 1, 2009. PRCS data from 2005-2007 have been altered to resolve small coding differences across survey years. All 1,093 cases previously coded as 2 on the ABSENT variable in the 2006 and 2007 PRCS single-year files are now coded as 3, and one individual previously coded as 899 on the RACE variable in the 2005 PRCS single-year file is now coded as 943.
There was also a slight change in the PRCS's main immigration variable. PRCS data previously available in YRSUSA1 has been shifted to YRSPR, and the flag associated with this variable has changed from QYRIMM to QYRSPR.
February 9, 2009. Posted new versions of linked data samples for males from 1860-1880 and 1870-1880. A middle initial mismatch filter that previously had not been applied properly has now been applied, removing 303 cases from the 1860 data and 302 cases from the 1870 data.
December 20, 2008. Posted new version of the linked data sample for males from 1850-1880, 1880-1900, and 1880-1910.
Changes to the 1850-1880 data: the dataset was increased by 49 records. Some records were removed and some added as the result of 1) rerunning one of the classifiers and 2) properly applying a middle initial mismatch filter.
Changes to the 1880-1900 and 1880-1910 data: 282 cases were removed from 1900 and 215 cases were removed from 1910, after applying a middle initial mismatch filter that previously had not been applied properly.
December 11, 2008. Posted remaining linked data samples. Also posted new versions of samples linking couples from 1870-1880 and 1880-1910. In the 1870 data, 10 cases that previously had a LINKWT of 0 were given the correct non-0 LINKWT values. In the 1910 data, 140 cases that had LINKWT values greater than 5 were assigned values of 5 (the maximum allowable LINKWT).
November 11, 2008. Posted new versions of samples for 1970-2007. Improvements were made to the 1970 samples to correct the variable INCOTHER. Samples from 1980-2007 were expanded to include the variable OWNCOST.
Posted new version of the 1880 100% database. Fixed problems with the MCDSTR and PAGENO variables. Group quarters units containing more than 60 people were split into 1-person households. Researchers needing to study these units intact can use SERIAL80 and PERNUM80.
October 11, 2008. Added 1880 100% population database. This dataset was originally entered for genealogical purposes by the Church of Jesus Christ of Latter Day Saints (LDS). Data cleaning and harmonization took place at the Minnesota Population Center (MPC). Versions of this data are also available from the MPC's North Atlantic Population Project and the LDS's genealogical website FamilySearch.org.
The IPUMS-USA version of the data contains fully integrated codes and labels, newly-constructed family inter-relationship variables, and missing data allocation for key demographic variables. Since the dataset was first constructed for genealogy, several variable groups were never entered. Excluded variables include items relating to school, literacy, unemployment, disability, month of birth, marriage within the past year, and street address. The most detailed geographic variables are MCDSTR and INCSTR.
Added 2.5% preliminary sample of the 1900 census. This sample is "preliminary" because the final version will contain 5% of the population. The preliminary sample includes data only from odd-numbered microfilm reels. Counties on even-numbered reels are not represented in this dataset. Alaska and Hawaii are also excluded from the preliminary dataset. The final 5% dataset will be released in early 2009.
September 26, 2008. Added 1% samples from the 2007 American Community Survey (ACS) and the 2007 Puerto Rico Community Survey (PRCS). The samples have approximately three million person records. The 2007 ACS is the second ACS sample to provide information on group quarters, which can be identified in the GQ variable. Researchers analyzing multiple ACS samples over time should remove group quarters cases, since they are available only in the 2006-2007 data.
The lowest level of geographic identifier in the 2007 ACS is the PUMA; 2007 PUMAs have the same boundaries as those in the 2005-2006 ACS and the 2000 census samples. The IPUMS version of the 2007 ACS provides the following additional geographic identifiers: CITY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. These variables were constructed at the University of Minnesota and are not available via the Census Bureau.
Note that the name of the IPUMS variable describing military service September 2001 and later has been changed from VET01X03 to VET01LTR. This name more accurately reflects the information contained in the variable.
More information on the background of and future plans for the American Community Survey is available at the ACS information page.
April 11, 2008. Posted IPUMS Version 4.0, the first major revision of the IPUMS files since 2004. Includes revised versions of all samples from 1850-1930, a new 1880 5% sample, and 13 new samples from the Puerto Rican Censuses of 1910-2000 and the Puerto Rican Community Survey. IPUMS 4.0 contains many new variables, including long-term Hispanic identification back to 1850 (HISPAN), a consistent single-race identification variable from 1850-2006 (RACESING), a battery of socioeconomic indices, original strings for occupation (OCCSTR) and industry (INDSTR), new detailed weight variables for the historical samples (HHWTDET and PERWTDET), and new standardized low-level geographic identifiers (MCD and INCORP). More information is available on the IPUMS 4.0 release page.
The most recent previous version of IPUMS data and documentation (IPUMS 3.0) is still available via the IPUMS archive page at ICPSR. The archive page permits users to revise old extracts, create new extracts, and download data and documentation. The link titled "IPUMS-USA website as of March, 2008" leads to a fully-functioning mirror of the IPUMS website as it existed prior to the release of IPUMS 4.0. The archive page contains versions of the website from previous years as well.
February 14, 2008. Posted a new version of the 1950 census sample, with a correction made to the BPL variable. Several cases that had been erroneously coded "Missing/blank" are now coded correctly as follows: 94 cases coded "Israel," 9 coded "Byelorussia," and 3 coded "Pakistan." In the 2000 census samples, changed the MIGMET5 code for Hattiesburg, MS from 3285 to 3300 to be consistent with our METAREA coding.
Re-released VALUEH for the 2006 ACS sample; during a recent website update, VALUEH had inadvertently been removed from the data extract system.
December 14, 2007. Posted new versions of the 2005 and 2006 ACS samples: released CITYPOP for both samples. Fixed a small error in QCONDOFE and QVALUEH in the 2006 sample. Prior to the update, a small number of cases had missing values for these two variables.
November 15, 2007. Posted new versions of all ACS samples; a correction was made to the BUILTYR2 variable. Previously, households built in 1939 or earlier (BUILTYR2 = 10) were grouped with those reported as being built in 2005 or later (BUILTYR2 = 1).
October 15, 2007. Added a 1% sample from the 2006 American Community Survey (ACS). The sample has approximately 2,970,000 person records. The 2006 ACS is the first ACS sample to provide information on group quarters, which can be identified in the GQ variable. Researchers analyzing multiple ACS samples over time should remove group quarters cases, since they are available only in the 2006 data.
The lowest level of geographic identifier in the 2006 ACS is the PUMA; 2006 PUMAs have the same boundaries as those in the 2005 ACS and the 2000 census samples. The IPUMS version of the 2006 ACS provides the following additional geographic identifiers: CITY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. These variables were constructed at the University of Minnesota and are not available via the Census Bureau.
More information on the background and future plans for the American Community Survey is available at the ACS information page.
August 17, 2007. Added HHTYPE for all samples from 1940 to 2005. In the future, HHTYPE will also be made available for all samples from 1850-1930.
July 19, 2007. Posted updated versions of all samples from 1900, 1910, and 1930. Corrections were made to the CHSURV variable in the 1900 and 1910 samples. All values were previously 0 or 1. The updated samples contain correct values. Corrections were made to the NSIBS variable in the 1900, 1910, and 1930 samples. Previously, a small number of persons identified as "siblings" in the RELATE variable (code 701) incorrectly received a value of 0 for NSIBS. This error has been corrected.
July 10, 2007. Posted updated versions of all samples from 1970 and 1980, and the ACS samples from 2000, 2001, and 2002. The updated samples include fixes to COSTELEC, COSTGAS, COSTFUEL, and COSTWATR. These variables did not properly identify cases having values of greater than 9990. All cases in this range are in the universe but have unreported values, usually because utility costs were included in rent payments. The old versions of the datasets incorrectly identified these cases as not being in the universe. COSTELEC and COSTGAS had the additional problem of presenting monthly values instead of annual values. These problems are now fixed.
The new 1980 5% sample additionally fixes a problem in the CITY variable. In the old sample, San Francisco was incorrectly identified. It has been corrected.
June 21, 2007. Posted an updated version of the 1930 1% sample. The updated sample includes fixes of minor problems in OCCSCORE (missing occupation data was not being allocated), YRSUSA2 (some allocated values were inconsistent with YRSUSA1), and QMARST (this variable indicated that we made more logical edits than were actually made).
Released new data extraction system with the "Attach Variables" feature, which allows researchers to create variables specifying characteristics of respondents' spouses, mothers, fathers, and household heads.
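For readers working with a downloaded extract, the idea behind attaching a spouse's characteristics can be sketched by hand. The sketch below is an illustration in Python, not the extraction system's own code. It relies on SPLOC holding the spouse's PERNUM within the same household (SERIAL), with 0 meaning no spouse present. The function name and the "_SP" suffix are illustrative choices, and the records are made up.

```python
def attach_spouse_variable(persons, var):
    """Return copies of the person records with <var>_SP copied from the spouse."""
    # Index every person by household serial number and position in the household.
    by_position = {(p["SERIAL"], p["PERNUM"]): p for p in persons}
    out = []
    for p in persons:
        # SPLOC == 0 means no spouse is present in the household.
        spouse = by_position.get((p["SERIAL"], p["SPLOC"])) if p["SPLOC"] else None
        enriched = dict(p)
        enriched[var + "_SP"] = spouse[var] if spouse else None
        out.append(enriched)
    return out

# Made-up household: persons 1 and 2 are spouses, person 3 has no spouse present.
household = [
    {"SERIAL": 1, "PERNUM": 1, "SPLOC": 2, "AGE": 40},
    {"SERIAL": 1, "PERNUM": 2, "SPLOC": 1, "AGE": 38},
    {"SERIAL": 1, "PERNUM": 3, "SPLOC": 0, "AGE": 12},
]
with_spouse_age = attach_spouse_variable(household, "AGE")
```

The same pattern applies to mothers and fathers by substituting MOMLOC or POPLOC for SPLOC.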
Released CITYPOP for 1850-1930 samples. Due to a technical problem, we had not been offering CITYPOP in these samples since February 2007. The CITYPOP values that we are providing now are not different from the values that were available prior to February.
June 7, 2007. Posted 1930 1% sample (up from previous 0.5% sample for 1930). The new sample includes several new occupation and industry variables - OCC, OCC1930, IND, IND1930 - as well as HISPAN and RACESING.
April 26, 2007. Added new occupation crosswalks (OCC to OCCSOC) for the 2000 census samples and the ACS samples; these are available via links from our Occupation and Industry documentation page. Also improved our OCC and OCCSOC code lists (available from the respective variable descriptions) for the 2000 census and ACS samples.
April 24, 2007. Posted a new version of the 2005 ACS; a correction was made to the MORTAMT1 variable.
April 9, 2007. Added Consistent PUMA variable and shapefiles (see CONSPUMA). CONSPUMA reconciles differences in low-level geographic identifiers in the 5% samples from 1980, 1990, 2000, and the 2005 ACS. Also released all new shapefiles for low-level geographic identifiers from 1970-2005. Changes to the previous shapefiles were minor: numerous "holes" in the maps were assigned to their appropriate PUMA, County Group, or SEA. All files are accessible via the links on our geographic tools page.
March 27, 2007. Changed the name of the RACHIST variable to RACESING.
Posted new versions of the 1910 samples: we corrected a problem with SERIAL so that households within multi-household dwellings are now uniquely identified. The problem had affected less than 0.13% of households in the 1910 1.4% sample.
Created a new harmonized version of the TRIBE variable, which is now available in 1900-1910, 1990-2000, and the ACS.
Posted a new version of the 1940 sample: a minor correction was made to the VET1940 variable.
January 31, 2007. Released new harmonized occupation and industry variables for 1950-2005: OCC1990 and IND1990. The OCC1990 variable was created in collaboration with researchers at the Bureau of Labor Statistics. Both variables are available only via the IPUMS.
Added RACHIST values to the 1950-1990 samples and the 2005 ACS IPUMS sample. RACHIST adapts an algorithm developed at the National Center for Health Statistics to assign single races to persons who reported more than one race from 2000 onward.
December 19, 2006. Replaced the 1-in-250 1910 sample with two new samples: the 1910 1% sample and the 1910 1.4% sample with oversamples. The 1% sample includes a 1-in-100 national population sample, including Alaskans, Hawaiians, and persons enumerated on the American Indian Schedules. The 1.4% sample with oversamples includes a 1-in-70 national population sample that has been combined with large oversamples of Blacks, Hispanics, Alaskans, Hawaiians, and persons enumerated on the American Indian schedules. The 1910 1.4% sample with oversamples must be used with weighting variables (see PERWT and HHWT).
Replaced the 1900 General sample with two new samples: the 1900 1% sample and the 1900 1% sample with oversamples. The 1900 1% sample is a 1-in-100 national sample, including Alaskans, Hawaiians, and persons enumerated on the American Indian Schedules. This sample has the same cases as the former "1900 General sample" did, though some variables and values have been modified in minor ways. The 1900 1% sample with oversamples is a 1% national sample that has been merged with 1-in-5 oversamples of Alaskans, Hawaiians, and persons enumerated on the American Indian schedules. The 1900 1% sample with oversamples must be used with weighting variables (see PERWT and HHWT).
More information about these samples is available in the 1900 and 1910 sections of the sample descriptions page. We expect to release a revised version of these samples in March 2007. The revised samples will include detailed geography at the minor civil division level and integrated versions of variables specific to the Alaskan, Hawaiian, and American Indian populations.
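Because the samples with oversamples are not self-weighting, unweighted tabulations overstate the oversampled groups. The sketch below, in Python, shows the general pattern of weighting a share by PERWT. It is an illustration only: the function name, the GROUP field, and the weight values are all made up and do not come from the actual samples.

```python
def weighted_share(records, predicate, weight="PERWT"):
    """Share of the weighted population for which predicate(record) is true."""
    total = sum(r[weight] for r in records)
    hits = sum(r[weight] for r in records if predicate(r))
    return hits / total

# Made-up records: group "b" is oversampled, so its members carry small weights.
records = [
    {"PERWT": 70, "GROUP": "a"},
    {"PERWT": 70, "GROUP": "a"},
    {"PERWT": 5, "GROUP": "b"},
    {"PERWT": 5, "GROUP": "b"},
]
unweighted = sum(r["GROUP"] == "b" for r in records) / len(records)
weighted = weighted_share(records, lambda r: r["GROUP"] == "b")
```

In this toy data, half of the records belong to group "b", but the weighted share is far smaller, which is the distortion the weights are meant to remove.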
December 10, 2006. Posted new version of the 2005 ACS sample that includes the following new geographic identifiers: CITY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. These variables were constructed at the University of Minnesota and are not available via the Census Bureau.
November 29, 2006. Posted new versions of all samples from 1940 through 2005. The new samples include several minor improvements to SPLOC, MOMLOC, and POPLOC. These modifications have resulted in minor changes to the constructed household variables, the family interrelationship variables, POVERTY, and FTOTINC. Detailed information on these variables can be found in the family interrelationships documentation.
An error was corrected in the POVERTY variable for all samples. In two-person families where one person was over age 65 and the other person was under age 65, we sometimes used slightly different poverty thresholds for each member of the family. We should have applied the same threshold to both members of the family. This resulted in several thousand cases in each sample having a poverty value that was off by an average of two percent (10 points on POVERTY's 1-500 scale). We have corrected the problem.
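The effect of the correction can be illustrated with a small calculation. The sketch below, in Python, treats POVERTY as family income expressed as a percentage of a poverty threshold and assigns one value per family. The function names, the income, and both thresholds are hypothetical numbers chosen for illustration; they are not official thresholds, and the sketch does not reproduce the IPUMS procedure for selecting a threshold.

```python
def poverty_percent(family_income, threshold):
    """Family income as a rounded percentage of the poverty threshold."""
    return round(100 * family_income / threshold)

def assign_family_poverty(members, family_income, family_threshold):
    """Give every family member the same value, based on one shared threshold."""
    value = poverty_percent(family_income, family_threshold)
    return {member: value for member in members}

# Hypothetical numbers: two thresholds that differ by about two percent.
income = 49000
threshold_a, threshold_b = 10000, 9800

# The old behavior could, in effect, score each member against a different threshold.
inconsistent = {
    "person_under_65": poverty_percent(income, threshold_a),
    "person_over_65": poverty_percent(income, threshold_b),
}

# Corrected behavior: one threshold for the whole family.
consistent = assign_family_poverty(["person_under_65", "person_over_65"], income, threshold_a)
```

With these made-up figures the two inconsistent values differ by 10 points, while the corrected assignment gives both members the same value.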
The new samples also include a small number of corrected income values in the 1950, 1960, and 1970 samples. The majority of cases affected have negative income values.
November 20, 2006. Corrected a problem in the RACE variable in the 2005 ACS sample. There were approximately 3,000 cases with missing values. All of the cases were multi-racial persons. All cases are now assigned to the appropriate categories.
October 11, 2006. Posted 1% sample from the 2005 American Community Survey (ACS). The 2005 sample is the first ACS microdata to identify sub-state geography, including PUMA, MIGPUMA1, and PWPUMA00. The IPUMS version of the 2005 ACS also identifies metropolitan status (METRO). A December 2006 release of the IPUMS 2005 ACS sample will identify CITY, METAREA, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWMETRO, PWCITY, PWTYPE, and PWPUMAS. These variables are being constructed at the Minnesota Population Center and are not available via the Census Bureau.
The base data for the IPUMS 2005 sample is the ACS data that the Census Bureau released on October 5th, 2006. The Census Bureau had originally released a version of the dataset on September 11th, 2006. The September release contained several small errors, so the Census Bureau updated the dataset in October. The erroneous dataset was never available via the IPUMS data extraction system.
More information on the background and future plans for the American Community Survey is available at the ACS information page.
October 1, 2006. Posted 0.5% sample from the 1930 census (up from the previous 0.2% 1930 sample).
September 6, 2006. Posted new version of IPUMS-USA website. The website has a new design, and the content of most variable descriptions has changed at least slightly. Users can still access all extract requests made on the old website.
June 30, 2006. Posted new versions of the 2000 1%, 5%, and Unweighted samples: a correction was made to the MIGPLAC5 variable.
April 27, 2006. Posted new versions of all ACS samples: a correction was made to the INCBUS00 variable.
April 7, 2006. Posted new versions of all 2000 Census samples and all ACS samples: a correction was made to the OCC variable.
January 20, 2006. Posted new versions of the 2000 1%, 5%, and Unweighted samples, as well as the 2000 ACS: a correction was made to the MARST variable.
November 30, 2005. Posted 13 new samples on the IPUMS-USA website. All samples were previously available on the IPUMS-USA Beta site, which was shut down. The new samples combined add nearly 15 million cases to the IPUMS database. For more details on this data release, see the sample information page.
October 7, 2005. Posted new versions of the 2000 1%, 5%, and Unweighted samples: a correction was made to the VET55x64 variable. New versions of the 2000-2004 ACS samples were also posted. In all eight samples above, improvements were made to the INDNAICS and OCCSOC variables.
September 16, 2005. Released the 2004 American Community Survey (ACS) sample on the IPUMS Beta site.
September 9, 2005. Released a new 2000 1% flat sample on the IPUMS Beta site. This is a national random sample drawn from the 2000 5% Census sample.
June 27, 2005. Posted new versions of the 2000 1% and 5% samples, and the 2000-2003 ACS samples. Added the following variables: RACHIST, PROBAI, PROBAPI, PROBBLK, PROBOTH, and PROBWHT. RACHIST is an historically compatible race variable which 'bridges' multiple-race responses into their most likely single race category. The other variables give detailed probabilities of each single-race response and are best used in combination with one another.
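One way to use the probability variables in combination is to spread each multiple-race respondent's weight across the single-race categories instead of assigning the person to one category. The Python sketch below illustrates that idea only. It assumes the PROB variables are expressed as proportions between 0 and 1 that sum to 1 for each person; the scale in an actual extract may differ and should be checked in the variable descriptions. The function name and the records are made up.

```python
PROB_VARS = ["PROBAI", "PROBAPI", "PROBBLK", "PROBOTH", "PROBWHT"]

def expected_counts(records, weight="PERWT"):
    """Weighted expected number of persons in each single-race category."""
    totals = {v: 0.0 for v in PROB_VARS}
    for r in records:
        for v in PROB_VARS:
            # Assumption: r[v] is a proportion in [0, 1], not a percentage.
            totals[v] += r[weight] * r[v]
    return totals

# Made-up records for illustration only.
people = [
    {"PERWT": 100, "PROBAI": 0.0, "PROBAPI": 0.25, "PROBBLK": 0.25, "PROBOTH": 0.0, "PROBWHT": 0.5},
    {"PERWT": 100, "PROBAI": 0.0, "PROBAPI": 0.0, "PROBBLK": 0.0, "PROBOTH": 0.0, "PROBWHT": 1.0},
]
counts = expected_counts(people)
```

The expected counts sum to the total weight, so no person is counted more than once.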
Removed RACGEN00, RACDET00, and SPANAMER from the data and documentation. The variables RACGEN00 and RACDET00 were redundant with RACE. A variable similar to SPANAMER can be created using the IPUMS variables MTONGUE, BPL, MBPL, FBPL, SPANNAME, and STATEFIP.
May 20, 2005. Released a revised version of the 1-in-100 sample of the 1900 census (see the August 21, 2003 revision note for information on the previous version of this sample). The revised dataset includes records extracted from Alaska, Hawaii, and the American Indian 1-in-5 oversamples (the complete oversample datasets are available via the IPUMS raw data download page).
Users should also be aware that the smaller 1900 sample previously available (the 1-in-750 "Preston" sample) will no longer be available via the IPUMS extract system. Users wishing to access this data can still download the entire dataset and SPSS command file via the IPUMS raw data download page.
May 13, 2005. Released a revised version of the preliminary 1-in-500 sample of the 1930 census. Corrected a major error in the race variable. The April 25th sample gave the "White" code (detailed race code 100) to all persons who reported their race as "Mexican." The revised sample gives these persons the new "Mexican" race code (detailed race code 140). The revised sample also corrects minor coding and labelling errors in the following variables: RENT30, GQTYPED, NUMHHTAK, FARMSCHD, ENUMMO, RADIO, HOMEMKR, VET1930, IND1950, MTONGUE, FBPL, MBPL, CITY, METRO, METAREA, URBAREA, and MDSTATUS.
April 25, 2005. Released preliminary 1-in-500 sample of the 1930 census. We expect to release a final 1-in-100 sample of the 1930 census by late 2007.
February 23, 2005. Posted new versions of the 2000-2003 ACS samples: a correction was made to the STATEICP variable.
February 1, 2005. Removed the POV2000 variable from the documentation and data. POV2000 was redundant with the IPUMS POVERTY variable. Both variables use the poverty matrix developed by the Social Security Administration in 1964 (and revised twice in the years since). The Office of Management and Budget's Directive 14 prescribes this definition as the official poverty measure for federal agencies to use in their statistical work.
November 23, 2004. Released the following samples on the IPUMS Beta site: the 2003 American Community Survey (ACS) sample, the 1990 Labor Market Areas sample, the 1980 Labor Market Areas sample, and the 1980 Detailed Metro/Nonmetro sample.
October 13, 2004. Posted new versions of the 2000 1% and 5% samples, and the 2000-2002 ACS samples. The following variables were improved: OCC1950, SEI, OCCSCORE, and IND1950. The new variables utilize the Census Bureau's recently published occupation and industry crosswalks between the 1990 and 2000 censuses.
August 27, 2004. Posted a new version of the 2000 5% sample: a correction was made to the METAREA variable.
August 6, 2004. Posted a new version of the 2000 5% sample: a correction was made to the PWCITY variable.
June 28, 2004. Posted new versions of all the 2000 and ACS samples. The RACE variable has been expanded to incorporate all information from the new multiple-race variables. Details about multiple-race responses are now included, some value labels were clarified, and a few other categories were added. Also, CITYPOP was added to the 2000 1% and 5% samples, and corrections were made to MOBLHOME and METAREA.
June 17, 2004. Released American Community Survey (ACS) samples for 2000, 2001, and 2002 on the IPUMS Beta site.
May 6, 2004. Made 2000 5% sample available via the main IPUMS-USA site.
May 1, 2004. Posted new versions of all of the 2000 samples. The 2000 5% sample now includes variables for Super-PUMA of Work (PWPUMAS) and Super-PUMA of Migration (MIGPUMAS). For the 2000 1% sample, Super-PUMA information that was previously in the PWPUMA00 and MIGPUMA variables is now in the new PWPUMAS and MIGPUMAS variables. A new version of the INCRETIR variable in all three 2000 samples now includes retirement incomes of greater than $99,998 (the previous top code). All three samples include a corrected version of the POV2000 variable.
Posted new versions of all 1990 samples that account for the greater width of INCRETIR (see above).
April 22, 2004. Posted a new version of the 2000 1% sample: a correction has been made to the MIGCITY5 variable.
March 10, 2004. Posted new versions of the 2000 1% sample and the 2000 5% sample. Both samples now include the PWCITY variable. For those living in group quarters, the variable HHWT now has the PERWT value, rather than a value of 0. In addition, corrections were made to the following variables: BPL, STEPMOM, STEPPOP, MARST, and PUMASUPR.
Posted new versions of the 1990 State, Metro, Elderly, and Unweighted samples. A problem in the MORTGAGE variable was corrected in the new samples.
January 30, 2004. Posted new versions of the 2000 1% sample, the 2000 5% sample, and the Census 2000 Supplementary Survey (C2SS). The 2000 1% and 5% samples now include variables for CITY and MIGCITY5. Minor problems in PWPUMAS, PWPUMA00, MIGPUMA, YRIMMIG, and MORTGAGE have also been corrected in the new samples. The new C2SS sample includes corrected values for INCBUS00 (all values were 0 before).
September 9, 2003. Posted new versions of the 1990 State, Metro, Elderly, and Unweighted samples. FTOTINC and HHINCOME now contain negative values for families and households having a net loss of income. A problem in the PERWT variable was corrected in all samples. These were the only affected variables.
August 21, 2003. Penultimate 1-in-100 version of the 1900 Minnesota sample released on the IPUMS Beta site. The dataset includes 170,438 households containing 754,631 individuals. This version has a number of flaws that will be corrected for the final version of the 1900 Minnesota sample, which we anticipate releasing in the Spring of 2004. The older 1-in-200 preliminary sample is still available via the data extract system at the main IPUMS-USA site.
- No cases from Alaska and Hawaii are included in the current sample.
- Data quality flags are not yet available.
- Detailed geographic variables are not yet available (these include MDSTATUS, METDIST, URBAREA, MCIVDIV, INCPLACE, and INCORP).
- Coding is not yet complete on the occupation variable (OCC).
- Native Americans enumerated on the special 1900 Indian Schedules are not included in the current sample (although the current version does contain Native Americans enumerated as part of the general population). The 1900 Indian Schedules contained questions not asked on the general schedule, including tribe, percentage Indian blood, and tax status, among others.
- Detailed German birthplaces in the current 1900 sample are coded according to the new scheme developed for the 1860 and 1870 samples. Users of this data should note that these codes do NOT correspond to those listed in the BPL variable description. Detailed German birthplace codes for the 1860-70 and 1900 samples are available here.
Users should also be aware that the smaller 1900 sample previously available (the 1-in-750 "Preston" sample) will no longer be available via the IPUMS extract system. Users wishing to access this data can still download the entire dataset and SPSS command file via the IPUMS raw data download page.
October 11, 2002. Reposted preliminary version of 1900 Minnesota sample. The previous version had incorrect values for children ever born (CHBORN). The new dataset contains corrected values. No other variables have been changed.
July 11, 2002. Final versions of the 1860 and 1870 samples released. The final 1-in-100 1860 IPUMS sample includes 54,094 households containing 273,947 free individuals and an additional 1,343 unoccupied dwellings. The final 1-in-100 1870 IPUMS sample includes 79,023 households containing 383,308 individuals and an additional 1,447 unoccupied dwellings. Frequencies in the on-line documentation will be updated in the next few months. Both the 1860 and 1870 IPUMS samples are also available with oversamples of the black population. Sample weights for the flat and black oversamples have been adjusted to be representative of the total population.
The final 1860 and 1870 IPUMS samples now include occupation codes based on the U.S. Census Office's 1880 classification system and detailed birthplace codes for individuals born in Germany. Several other changes have also been made, including a slightly modified urban/rural definition, minor changes in birthplace and occupation coding, and small changes in personal estate and real estate values. In addition, the final samples incorporate a few data additions and subtractions from the preliminary samples. For details of these changes and a listing of the new Germany detailed birthplace codes, click here.
May 7, 2002. Released preliminary version of the 1900 Minnesota sample. This 1900 Minnesota sample is a 1-in-200 nationally representative sample of dwellings taken from the 1900 U.S. Census of Population. The final version is scheduled to be released in 2004 and will have a 1-in-100 sampling density. Frequencies for this sample will be added to the documentation in summer 2002. Currently both the 1900 Minnesota and the 1-in-760 1900 Preston sample are available. Ultimately the 1900 Minnesota sample will replace the 1900 Preston sample, although the Preston sample will be available by request.
The fundamental difference between the two 1900 samples pertains to sample design. In the 1900 Preston sample nonfamily individuals--boarders, lodgers, inmates, and military personnel--were sampled as individuals regardless of household size. In contrast, the 1900 Minnesota sample follows the general sample design used for the 1850-1880 and 1920 samples. For a discussion of issues relating to sample design see Chapter 2 of the IPUMS documentation.
July 11, 2001 -- The IPUMS extract system upgrade was successfully installed on Wednesday, July 11, 2001. No changes were made to the IPUMS data. The new extract system will process user data requests faster than the previous system and will prevent small jobs from being continually held up behind large data requests in the queue. Since this upgrade affects only the behind-the-scenes data extraction system, users will notice little change in the request process itself. Re-registration is not required; previous jobs will be available for revision; and new jobs will begin numbering from the user's last completed job in the old system.
March 7, 2001. Released new preliminary (penultimate) versions of the 1860 and 1870 samples. Frequencies in the documentation will not be changed until release of final versions of these datasets, scheduled for summer 2002. Two versions of the 1860 and 1870 samples are now available:
- a flat 1-in-100 sample of all dwellings, and
- a black oversample containing a 1-in-50 sample of dwellings containing one or more blacks and a 1-in-100 sample of all other dwellings.
The sample weights in both the flat and black oversamples of the preliminary 1860 and 1870 PUMS have been adjusted to be representative of the total population. We believe that the new samples are near their final form and expect only minor changes in the number of cases and the coding of a few variables between the current and final versions. Users are nevertheless advised that the current releases have a few known problems. In particular, the occupation ("OCC") variable in 1860/1870 is not coded. Users should rely on the occupation 1950 basis ("OCC1950") variable for studying occupation and labor force participation. In addition, detailed birthplace codes are not available for individuals born in Germany. Users may still use the birthplace variable (BPL), but no detail will be returned for German birthplaces.
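The reason the oversample needs weights can be seen from the sampling fractions alone. The Python sketch below uses the inverse of the sampling fraction as a base weight: 50 for dwellings sampled at 1-in-50 and 100 for dwellings sampled at 1-in-100. It is a simplified illustration with made-up counts and a hypothetical function name; the weights distributed with the samples may include further adjustments.

```python
def base_weight(dwelling_has_black_resident):
    """Inverse of the sampling fraction for the dwelling's stratum."""
    return 50 if dwelling_has_black_resident else 100

# Made-up sample: 20 persons drawn at 1-in-50 and 100 persons drawn at 1-in-100.
sampled = [True] * 20 + [False] * 100
weights = [base_weight(flag) for flag in sampled]

estimated_population = sum(weights)  # 20 * 50 + 100 * 100
unweighted_share = sum(sampled) / len(sampled)
weighted_share = sum(w for w, flag in zip(weights, sampled) if flag) / estimated_population
```

In this toy example the oversampled stratum makes up one sixth of the sampled persons but only one eleventh of the estimated population.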
Friday, August 18, 2000 -- The old IPUMS extract system was replaced by a new system incorporating enhanced features requested by users. One of the key features of the new system is the ability to modify and resubmit previous jobs. Data files from the two systems have been combined on a user-specific summary site. IPUMS data users previously registered in either extract system will not have to reregister to use the new extract system. Extract requests in the new system will begin numbering jobs from the highest numbered job in a user's personal extract summary.
January 22, 1999. Major error in the November 25 version of 1860 and 1870 samples corrected. The 1860/70 samples had an error in SURSIM, which in turn created errors in all the family interrelationship variables (IMPMOM, IMPPOP, IMPSP) and in the variables constructed from them (NCHILD, NCHLT, FAMSIZE, ELDCH, and so on). The error could also have implications for missing data allocation; we recommend tossing out any previous versions of 1860 and 1870.
- New geographic variables (METDIST, MDSTATUS, MCIVDIV, INCSTR, INCORP, URBAREA) were added to 1850, 1880, and 1910 samples.
- Minor fixes to OCC1950, IND1950, CITIZEN, LIT, COUNTY, SEA, GQTYPE, GQFUNDS, NATIVITY, VOTE, MARRINYR, NAMEFRST, and NAMELAST.
- Missing age allocation procedures fixed to allow age 0 to be allocated. Improved rules for spouse imputation (IMPSP).
- Added cases from Bradley County, TN to 1850 that had been inadvertently dropped from the 1850 sample. PERWT adjusted slightly.
November 25, 1998 -- PERWT, NUMHHTAK, and GQFUNDS fixed in the 1860 and 1870 samples.
November 6, 1998 -- Revised preliminary samples of the 1860 and 1870 census released. Two versions of both the 1860 and 1870 PUMS are now available: (1) a flat 1-in-200 sample of all dwellings, and (2) a black oversample containing a 1-in-100 sample of dwellings containing one or more blacks and a 1-in-200 sample of all other dwellings.
The sample weights in both the flat and black oversamples of the preliminary 1860 and 1870 PUMS have been adjusted to be representative of the total population.
August 20, 1998 -- Revised IPUMS-98 database released.
- AGE Allocations 1850-1920. There was an error in the missing data allocation procedure for AGE affecting all pre-1940 samples. Since age is used as a predictor in many other allocations, constructed variables, and universe checks, the frequencies for many variables in the earlier samples have changed slightly from the original iteration of IPUMS-98.
- Split YRSINUSA into two separate variables, YRSUSA1 and YRSUSA2, to enhance compatibility over time. YRSUSA1 (columns 145-146 in the raw data files) contains the unrecoded continuous measure of years in the U.S. from the 1900-1920 samples. YRSUSA2 recodes 1900-1920 and 1970-1990 into five intervals that are compatible across all of these samples. Users desiring greater detail on the original 1970-1990 intervals can refer to YRIMMIG, which retains all of the original detail recorded in the variable discussion. Documentation change: the universe for 1980 should have excluded foreign-born persons who were citizens at birth.
- OCCSCORE, SEI. In 1850-1870, laborers who were changed via logical edit to farm laborers (i.e., they lived on a farm) continued to receive the OCCSCORE and SEI for laborers. They will now receive the scores for farm laborers. The original 1900 sample incorrectly classified many domestics as "service workers, nec" in their original 1950 occupation classification. The IPUMS fixed the occupational code, but neglected to assign the appropriate SEI and OCCSCORE values for the new occupation. This has been rectified.
- RACE. In 1990, persons who indicated Hispanic origin were recoded out of "other race, nec" in the race variable into the category "Spanish write-in." Persons of Mexican origin were mistakenly excluded from this recode. This is now fixed.
- PERWT and HHWT in 1990. Previously, the IPUMS adjusted the 1990 weights so that the total weighted sample would yield the same population count as the published census returns. We removed this programming, since users could not reverse this change if they desired to, and because there seemed no reason to assert the accuracy of the 1990 count at this level of detail.
- CITYPOP, SIZEPL. In 1980, households in New York City received the code for "not identifiable" (codes 00000, 00) in the city population variables. New York can be identified, and we have changed the population codes accordingly.
- ANCESTR1 and ANCESTR2. An error in the 1990 PUMS documentation slipped into the IPUMS. Anyone with a code of 0324 (West German) should have been coded 0460 (Greek). This is now fixed.
- MBPL, FBPL. In 1970, recoded "U.S. possessions, n.s." to match the documentation (code 12091); it was incorrectly coded 13000 in the data.
- YRIMMIG documentation change: the universe for 1980 excludes foreign-born persons who were citizens at birth. Changed 969 code to 970; it refers to 1965-1970, not 1965-1969. Added 914, which refers to the period before 1915 in the 1970 sample. We also changed the data, recoding 969 to 970.
- EDUCREC and HIGRADE. In 1980, N/A (under age 3) and "no schooling" were combined. We have separated them.
- BPL. In 1850, some persons with a birthplace of Iowa should have been coded as being born in Indiana (a confusion over the interpretation of the abbreviation "IA"). We have added programming to separate these codes.
- CLASSWKR. Removed new workers (persons looking for work but who have never obtained their first job) from the universe for 1940 and 1950 in order to increase compatibility. In 1990, reassigned unemployed persons who last worked over five years ago to the N/A category. In all years, the relevant information is preserved in other variables (EMPSTAT and YRSLASTWK).
- IND1950. The original 1940 contained an undocumented industry category. We determined that this is the category for "miscellaneous machinery" (code 358). The IPUMS had coded this category to "office and store machines" (code 357); we have recoded it to 358. In addition, the IND (contemporary industry classification) appendix for 1940 did not document this category. It has been added to the documentation.
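Regarding the PERWT and HHWT item in the list above: users who do want weighted totals that match a published count can apply a proportional adjustment themselves. The Python sketch below shows that arithmetic only. The function name, the weights, and the control total are made up, and the sketch is not a reconstruction of the programming that was removed.

```python
def rescale_weights(weights, control_total):
    """Scale the weights proportionally so that they sum to control_total."""
    factor = control_total / sum(weights)
    return [w * factor for w in weights]

# Made-up weights and control total for illustration only.
original = [100, 100, 200]
adjusted = rescale_weights(original, control_total=440)
```

Because the adjustment is a single multiplicative factor, it leaves weighted proportions unchanged and can be undone by dividing by the same factor.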
May 20, 1998 -- OCC, OCC1950, FARM. Fixed a significant error in occupation coding in the 1860 sample (which also affected 1870, though to a much lesser degree). The missing data allocation procedure changed most persons with a blank response (no occupation) to having an occupation. This greatly overstated female occupational responses in 1860, particularly for married women. Since FARM status is inferred from occupation, and many of the allocated cases were farmers, the 1860 and 1870 samples overstated the number of farms. Both the 1860 and 1870 samples have been reconstructed to rectify this problem.
March 24, 1998 -- Made a significant, if somewhat subtle, change to the way the extraction system works. Altered the extraction system to zero out any variables that were "stacked" in the same column location as a requested variable. Previously, if you selected a variable that was not available in every sample chosen for extraction, the system would include whatever other variable was located in those columns in the raw IPUMS data files. For example, if you selected 1880 along with more modern samples and requested the variable Migration Status, 5 Years, the system would include the alphabetic data from the 1880 variable Last Name in those same extract columns. This caused considerable confusion among users.
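The change described above can be pictured with a toy fixed-width record. In the Python sketch below, two variables share the same columns of the raw record, and the extraction step returns zeros when the requested variable is not available in a given sample. The column positions, record contents, and function name are hypothetical; the sketch illustrates the behavior and is not the extraction system's code.

```python
# Hypothetical layout: both variables occupy the same columns of the raw record.
STACKED_START, STACKED_WIDTH = 0, 8

def extract_stacked_field(raw_line, variable_available):
    """Return the requested field, or zeros if this sample lacks the variable."""
    if not variable_available:
        # New behavior: do not pass through whatever else shares these columns.
        return "0" * STACKED_WIDTH
    return raw_line[STACKED_START:STACKED_START + STACKED_WIDTH]

# Made-up records: a last name in 1880, a migration code in a modern sample.
line_1880 = "SMITH   rest-of-record"
line_modern = "00000002rest-of-record"

field_1880 = extract_stacked_field(line_1880, variable_available=False)
field_modern = extract_stacked_field(line_modern, variable_available=True)
```

Under the old behavior the 1880 field would have come back as the alphabetic name; under the new behavior it comes back as zeros.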
Early March, 1998 -- Changed weights in "small" and "tiny" samples to be representative of total population.
Early March, 1998 -- Created a new Flat 1990 sample.
February 17, 1998 -- Changed the weights in the 1860 and 1870 files to account for oversample of blacks.
January, 1998 -- IPUMS-98 is available. For prior revisions, see Changes from IPUMS-95 to IPUMS-98.