Errata in and Revisions Made to IPUMS
Below is a list of upcoming changes to the IPUMS, along with significant changes made to the IPUMS since 1998.
The IPUMS archive page contains fully-functioning versions of earlier IPUMS websites.
Changes To Be Implemented in Future Releases
OWNCOST contains nonsensical data for the 2003 and 2004 ACS samples. The IPUMS data will be fixed with the next scheduled update; until then, users may download one of the following data files. Each contains the correct values of OWNCOST, together with ID variables (YEAR, DATANUM, and SERIAL) for merging with an IPUMS extract:
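Because OWNCOST is a household-level value, a correction file keyed on YEAR, DATANUM, and SERIAL can patch every person record in a household at once. The sketch below assumes both the extract and the correction file have already been read into lists of dictionaries keyed by IPUMS variable names; the file format and the helper name `patch_owncost` are illustrative assumptions, not part of the IPUMS distribution.

```python
from typing import Dict, List, Tuple

# (YEAR, DATANUM, SERIAL) identifies a household in an IPUMS extract.
HouseholdKey = Tuple[int, int, int]


def patch_owncost(extract: List[dict], corrections: List[dict]) -> List[dict]:
    """Return a copy of the extract with OWNCOST replaced from the correction rows."""
    fixed: Dict[HouseholdKey, int] = {
        (r["YEAR"], r["DATANUM"], r["SERIAL"]): r["OWNCOST"] for r in corrections
    }
    patched = []
    for row in extract:
        key = (row["YEAR"], row["DATANUM"], row["SERIAL"])
        new_row = dict(row)  # leave the caller's data untouched
        if key in fixed:
            new_row["OWNCOST"] = fixed[key]
        patched.append(new_row)
    return patched
```

Rows whose household key is absent from the correction file keep their existing OWNCOST value, so the same function can be run over an extract that mixes 2003-2004 and other years.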
The next regularly scheduled IPUMS update will take place on September 2, 2010. Please check back later for any other errata and future revisions.
Revisions Made Previously
June 8, 2010. An error in the programming for the June 4 revision resulted in PERWT values of 0 for all cases in the 1880 1% and 1880 10% samples. These samples now contain the correct PERWT values.
June 4, 2010. Posted final versions of the IPUMS Linked Representative Samples. More information available here.
Posted new versions of all IPUMS samples. There are several new variables:
- CPI99 provides inflation factors to adjust dollar amounts into constant 1999 dollars. It is a constant value within each census year. CPI99 will be especially useful for extracts containing multiple years of data; users will need only to multiply dollar variables by a single inflation variable instead of manually multiplying different years of data by different constants. For more information, see our CPI adjustment page.
- For cases sampled from large households, NUMPERHH provides the number of people in that larger household. Several samples from 1850-1930 had errors in NUMPERHH. See the variable description for more information.
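Converting to constant 1999 dollars with CPI99 is a single multiplication per record, regardless of how many years the extract spans. The sketch below assumes the caller supplies the set of N/A or missing codes for the dollar variable being converted (these differ by variable; the parameter `missing_codes` is a hypothetical convenience, so check the variable's codes page), and that the CPI99 values shown in any test data are placeholders rather than published factors.

```python
from typing import AbstractSet, Optional


def to_1999_dollars(
    amount: float, cpi99: float, missing_codes: AbstractSet[float] = frozenset()
) -> Optional[float]:
    """Multiply a dollar amount by its record's CPI99 value.

    Returns None for values listed in missing_codes, so that N/A or
    missing codes are not inflated as if they were real dollar amounts.
    """
    if amount in missing_codes:
        return None
    return amount * cpi99
```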
Other variables have been modified:
- For users' convenience, the standard weight variables for 1850-1930 now contain the more detailed weighting information previously available only in the detailed weights. Specifically, PERWT (and, aside from the 1940 and 1950 samples, SLWT) now contains values previously available in PERWTDET, and HHWT now contains values previously available in HHWTDET. PERWTDET and HHWTDET are no longer available. See the variable descriptions for full information.
- The new versions of PERWT and HHWT provide weight values to two decimals of precision for 1850-1930 data. Default weighting procedures in SPSS and SAS can work with fractional weights. When tabulating variables in Stata, however, users will need to specify that the weights are importance weights, which allow decimals. (The default weight in Stata's tabulate command is a frequency weight, which does not allow decimals.)
- There are slight improvements to the imputation of relationship to head (IMPREL), available only for 1850-1930 samples.
- Some residents of institutions (GQ codes of 3) were coded erroneously as being in the labor force (LABFORCE codes of 2) in the 1850-1930 samples. All such cases are now coded as NIU (LABFORCE = 0) or as not in the labor force (LABFORCE = 1).
- IMPREL, IMPMOM, IMPPOP, and IMPSP are now available in the 1880 10% sample.
- Data from one missing reel of microfilm was restored to the 1880 complete-count database.
- Parental and spousal links were occasionally illogical for households in the 1900 and 1910 data due to an error in how information on surviving children (CHSURV) and children ever born (CHBORN) was handled. This has been corrected.
- Six cases in the 1930 sample had RELATE codes of 0 (not a valid RELATE code). They now have the correct codes.
- RESPMODE (available only in the 2005-onward ACS/PRCS) is now a household variable rather than a person variable. No data have changed, however.
- INDNAICS in the 2008 ACS contained unnecessary characters around the correct values. It is now consistent with other years.
- WORKEDYR has a new code: "did not work last year, but worked in the past five years."
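With PERWT and HHWT now carrying two decimal places for 1850-1930, a weighted tabulation simply sums the fractional weights within each category; nothing is rounded to whole persons. This minimal sketch shows that arithmetic outside any statistics package; the function name `weighted_tab` is hypothetical, and in Stata the equivalent still requires declaring the weights as importance weights, as described above.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def weighted_tab(pairs: Iterable[Tuple[Hashable, float]]) -> Dict[Hashable, float]:
    """Sum (possibly fractional) weights for each category value."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for value, weight in pairs:
        totals[value] += weight
    return dict(totals)
```

For example, three persons with category values 1, 2, and 1 and PERWT values 1.25, 2.5, and 0.75 produce weighted counts of 2.0 for category 1 and 2.5 for category 2.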
March 4, 2010. Posted new versions of 2003-onward files.
- Following conversations with Census Bureau staff about the calculation of dollar adjustment factors in the ACS/PRCS, the IPUMS no longer automatically applies the Census Bureau's adjustment factor to any dollar-amount variable in the ACS/PRCS. Such variables now exist in the IPUMS exactly as they were released by the Census Bureau, and IPUMS recommends that users analyze them without applying the adjustment factor. For more information, see the ACS adjustment page. Users who want to adjust dollar amounts may use the new variable ADJUST, which provides the adjustment factor as provided by the Census Bureau.
- Although adjustment factors are no longer applied automatically, users should know that there were problems in implementation for the 2005-2007 and 2006-2008 3-year files available between Jan. 12 and March 4, 2010. The adjustment factors in these samples should have varied with MULTYEAR. However, a programming error applied only the 2007 adjustment factor to all cases in the 2005-2007 3-year ACS/PRCS, and only the 2008 adjustment factor to all cases in the 2006-2008 3-year ACS/PRCS. Because these adjustment factors contained the CPI-U values to convert dollar amounts to constant dollars, dollar values were inaccurate. In the 2005-2007 3-year file, all dollar amounts should have been expressed in 2007 dollars, but dollar values for 2005 cases were too small by 6.1%, and dollar values for 2006 cases were too small by 2.7%. In the 2006-2008 3-year file, all dollar amounts should have been expressed in 2008 dollars, but dollar values for 2006 cases were too small by 6.1%, and dollar values for 2007 cases were too small by 3.5%. This error carried through to POVERTY values for nonrelatives of the householder (RELATE codes of 11, 12, and 13).
- In the 2003-onward ACS/PRCS, industry codes contain four digits of detail, but IPUMS codes (IND) formerly contained only three digits. All four digits are provided now. IND1950 and IND1990 (recoded, harmonized versions of industry) remain the same. New codes pages are available through IND and INDNAICS.
- OWNCOST was mistakenly omitted from the 2005-2007 and 2006-2008 ACS 3-year files. It is now available.
- Inspection of the revised 2006 AGE data revealed that almost all cases 65 years and older have changed in value, suggesting that the revised AGE data were imputed or created using synthetic data techniques. All persons age 65 or older in the 2006 ACS/PRCS (and all 2006 cases in the 2006-2008 3-year file) now have values of 4 in QAGE to indicate probable allocation.
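IPUMS recommends analyzing ACS/PRCS dollar amounts without applying the adjustment factor. For users who choose to apply it anyway, the 3-year-file error described above suggests one safeguard: use each record's own ADJUST value rather than a single factor for the whole file, since the factor should vary with MULTYEAR. The sketch assumes ADJUST is a multiplicative factor stored as a decimal on each record; the helper name `apply_adjust` and any example values are hypothetical.

```python
from typing import Dict, List


def apply_adjust(rows: List[Dict[str, float]], var: str) -> List[float]:
    """Multiply each record's dollar amount by that record's own ADJUST value."""
    # Each record carries its own factor, so cases from different survey
    # years (MULTYEAR) in a multi-year file are not forced onto one factor.
    return [row[var] * row["ADJUST"] for row in rows]
```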
Future revisions of the IPUMS data will be posted on a quarterly schedule (the beginning of March, June, September, and December). The next update will be on June 1, 2010.
February 10, 2010. Posted new versions of 1940-onward files.
- Due to a programming error, respondents in the 2000-2008 ACS/PRCS files with total family incomes (FTOTINC) of 0 or very small amounts received POVERTY codes of 0, which are reserved for N/A cases (group quarters and unrelated individuals not in a subfamily under age 15). They now receive POVERTY codes of 1.
- Respondents who were unrelated to the householder, under the age of 15, and not in an unrelated subfamily should have received POVERTY codes of 0 (N/A); instead, they were coded as 1. This has been corrected.
- Respondents in the 1970 and 1980 Puerto Rico samples who should have received GRADEATT codes of 0 instead received codes of "ZZ". This has been corrected.
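The two POVERTY fixes above come down to one rule: code 0 is reserved for the N/A universe, and every in-universe person receives at least 1, even with zero, tiny, or negative family income. The sketch below encodes only that rule. It assumes the percentage of the poverty threshold has already been computed as an integer; the function names and boolean inputs are hypothetical, and the full IPUMS POVERTY derivation (thresholds, family units, topcoding) is not reproduced here.

```python
def poverty_is_na(
    in_group_quarters: bool,
    related_to_householder: bool,
    age: int,
    in_unrelated_subfamily: bool,
) -> bool:
    """True if the person falls in the N/A universe for POVERTY (code 0)."""
    if in_group_quarters:
        return True
    # Unrelated individuals under 15 who are not in an unrelated subfamily.
    return (not related_to_householder) and age < 15 and not in_unrelated_subfamily


def poverty_code(pct_of_threshold: int, is_na: bool) -> int:
    """Return 0 for N/A cases; otherwise never return a value below 1."""
    if is_na:
        return 0
    return max(1, pct_of_threshold)
```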
Posted new versions of 1870 and 1900 datasets. In the new 1870 data, OCC and OCC1950 values are included for people under the age of 16 who reported an occupation. The previous version coded these people as "not in universe." In 1900, several cases previously having invalid SPEAKENG values are now coded properly.
January 28, 2010. Posted new versions of 1900-1910 and 1940-onward files. Two minor changes have been made:
- Extra information on military service is now available in VETSTAT. Since 1990, the non-veteran category has included detail on people without military service, service members currently on active duty, and (since 2003) people whose only service is training in the National Guard or Reserves. The "Yes, Veteran" category has included detail on both service members who were previously on active duty at any time and (in 1990 only) people who were activated from the National Guard or Reserves. This detail is now contained in the new detailed version of VETSTAT; the general version coding remains the same as the previous VETSTAT coding. However, the actual frequencies in the general version differ from the previous one-digit version of VETSTAT for three reasons:
  - In 1940, VETSTAT also contained valid codes for persons younger than 18, contrary to the stated universe for that year. All persons under age 18 are now classified as N/A in 1940.
  - In 1950-1970, persons currently on active duty were not in the universe. They are now classified as non-veterans.
  - Persons who should have been coded as non-veterans because they had undergone only training in the National Guard or the Reserves were erroneously classified as veterans in the 2000 census and 2000-2002 ACS data. As a result, the number of veterans was overestimated by 19.0 percent in the 2000 census and by 15.5 percent in the 2000-2002 ACS data (the difference arises because the ACS did not include group quarters in its sample). This has been corrected.
For more information, see the VETSTAT variable description.
- STATEFIP identifiers are now available in the 1% and 1.4% 1900 and 1910 samples for Alaska and Hawaii, and STATEICP is now available in 1900 for these two states (it has long been available in 1910).
Other changes are limited to the ACS/PRCS:
- The 2006-2008 3-year file contained incorrect data for RACESING, with the result that many RACESING codes were inconsistent with RACE responses. RACESING is now correct.
- AGE has been replaced in the 2006 ACS/PRCS file. The Census Bureau released revised 2006 data in December 2009 to fix a problem with disclosure control techniques. The original age variable is still available as AGEORIG06; for full details, see the AGEORIG06 variable description. Because the family interrelationship pointer variables (MOMLOC, POPLOC, and SPLOC) rely on age, some of these have changed as well; STEPMOM, STEPPOP, and other variables based on family units may also be affected.
- The DIFFCARE variable in the 2008 ACS/PRCS contained data for DIFFMOB instead; the correct variable is now available.
- QAGE was mistakenly not made available for the 2005-onward PRCS or for the 2006-2008 3-year files. It may now be downloaded.
January 12, 2010. Added 2006-2008 ACS/PRCS 3-Year files. These files include all cases in the previously-released single-year files from the 2006, 2007, and 2008 ACS/PRCS. The new 3-year file differs in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2008 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files. Another notable difference is that some values of AGE in the 2006 portion of the 3-year file differ from those in the 2006 single-year file because of a change to the Census Bureau's disclosure avoidance methods. The Census Bureau has re-released the 2006 single-year file with the revised AGE variable, and it will be added to the IPUMS database soon.
An expanded version of the 1880 100% database is now available. The new release contains a variety of improvements over the previously-available version of this data:
- New variables were added, including BPLSTR, DWSIZE, FBPLSTR, GQSTR, INCORP, INCPLACE, LINE, MBPLSTR, MCD, MCIVIDIV, METDIST, OCC, PROBAI, PROBAPI, PROBBLK, PROBOTH, PROBWHT, QNAMELST, QQTRUNEM, QSURSIM, RACAMIND, RACASIAN, RACBLK, RACOTHER, RACPACIS, RACWHT, SFRELATE, SFTYPE, STREET, SUBFAM, SURSIM, URBAN, and URBPOP.
- The total number of records changed slightly. The change is partly due to the removal of duplicate records. Enumerators or census clerks crossed out records when they found a duplicate or when the respondent was deceased on census day. For the new 100% database, we have removed these crossed out records. Also, the number of records changed for the city of St. Louis (see below). Overall the total number of households decreased by 26,089, and the total number of person records decreased by 42,572.
- The city of St. Louis was enumerated twice for the 1880 census. The previous 100% database contained data from the second enumeration. Although the second enumeration contains more records, we decided to release the first enumeration in the final database, for consistency with the data from the rest of the country: the first enumeration occurred during the U.S. Census Bureau's designated timeframe for enumeration, while the second took place six months later. As a result, the new release contains 7,573 fewer household records and 29,310 fewer person records for St. Louis.
- Improvements were made to geographic data. An audit conducted by Professor Michael Haines (Colgate University) revealed numerous county codes that were blank, invalid, or incorrect. This resulted in approximately 21,100 changes to the county variable and to variables that derive from county information (such as METRO, METAREA, etc.).
- We were able to take advantage of some additional detail in the RELSTR variable to assign individuals' detailed RELATE codes more accurately, particularly in the 1200s (non-related individuals).
- Numerous other minor corrections were made to individual coding decisions.
Also posted new versions of all IPUMS-USA samples. There are several notable changes to variable availability and ease of use:
- The multigenerational household variable (MULTGEN), previously released for the first time in the 2008 ACS/PRCS exactly as provided by the Census Bureau, has been revised by IPUMS to contain additional detail and is now available for samples from 1880 onward. See the variable description for full information.
- A new variable, HOMELAND, identifies PUMAs that contain a Census Bureau-designated American Indian, Alaska Native, or Native Hawaiian homeland area.
- A new variable, EDUC, contains all information on educational attainment that was previously spread across four variables (HIGRADE, EDUC99, EDUC00, and EDUC08). HIGRADE contains additional detail on educational attendance and has been retained. The other three variables are now superfluous and have been removed along with EDUCREC, the previous summary variable for educational attainment. EDUC has both general and detailed versions. The general version is the equivalent of EDUCREC in that it provides the set of general categories that can be identified in each year of data, but these general categories are more detailed than those formerly contained in EDUCREC. Additionally, all samples in and after 2000 contain the detailed category "Some college, but less than 1 year". In the old EDUCREC variable, this category was classified as "1-3 years of college". Because the people in this category have completed 12th grade but not 1 year of college, they are now classified as "12th grade" in the general version of the new EDUC variable.
- The occupational standing measures EDSCOR50, EDSCOR90, NPBOSS50, and NPBOSS90, which were based on EDUCREC, are now based on the more precise variable EDUC (general version). This has resulted in some changes. The EDSCOR measures are unchanged between 1950 and 1990; in all samples 2000 and after, though, they have decreased by an average of 6 to 7 points because of the code shift. The NPBOSS measures, which rely in part on the ordering of occupations' educational composition, have also changed, but by no more than 10 points in either direction.
- GRADEATT (grade of school now attending) has been expanded to a general/detailed coding scheme to accommodate the increased detail available in the 2008 ACS/PRCS. The variable GRADE08, which previously contained this detail, has been removed. Additionally, information from HIGRADE has been used to expand GRADEATT's availability to the 1960-1980 samples.
- Migration variables have been streamlined. Because of a programming error, YRSPR contained incorrect data for all observations, and data for year of immigration to Puerto Rico was split between YRIMMIG (for 1910-1920) and YRIMMIPR (for 1980 onward). This has been corrected. Additionally, YRSPR is limited to 1910, 1920, and 2000 onward, but YRSPR2 (an intervalled version of YRSPR) is now available and provides this information in less detail for 1980 and 1990 as well.
- For the ACS/PRCS multi-year samples, YEAR previously gave the actual year of survey (e.g., 2005, 2006, or 2007 for the 2005-2007 3-year file). To ensure that the combination of YEAR, DATANUM, SERIAL, and PERNUM uniquely identifies individuals, YEAR now provides the last year of data (e.g., 2007 for the 2005-2007 3-year file). Information on the actual year of survey has been shifted to a new household-level variable called MULTYEAR, valid only for the multi-year ACS/PRCS.
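The point of the YEAR change is that YEAR, DATANUM, SERIAL, and PERNUM together form a unique person key in every sample, including the multi-year files. A quick check for duplicate keys in an extract might look like the following sketch; it assumes records are dictionaries keyed by IPUMS variable names, and the helper name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

PersonKey = Tuple[int, int, int, int]  # (YEAR, DATANUM, SERIAL, PERNUM)


def duplicate_person_keys(rows: Iterable[dict]) -> List[PersonKey]:
    """Return every person key that appears more than once in the extract."""
    counts = Counter(
        (r["YEAR"], r["DATANUM"], r["SERIAL"], r["PERNUM"]) for r in rows
    )
    return [key for key, n in counts.items() if n > 1]
```

An empty result confirms the key is unique; the actual survey year within a multi-year file should then be read from MULTYEAR, not YEAR.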
There are also several more minor changes:
- QGCHOUSE (the data quality flag for GCHOUSE) was mistakenly not made available before; it may now be downloaded.
- QYRSPR (the data quality flag for YRSPR) contained all 0's due to a programming error. It now contains correct data.
- BUILTYR2 was previously available in 2000 only for that year's ACS sample; it is now available for all samples in that year.
- All negative values of replicate weights (REPWT and REPWTP) had been recoded to zero for ease of use in statistical software packages. However, in further discussions with StataCorp technical staff, it emerged that Stata can handle negative replicate weights. (Neither SAS nor SPSS can automatically process the kind of replicate weights included in the ACS and PRCS data.) The original replicate weight values are now provided, and IPUMS now provides an FAQ page on replicate weights that contains directions for using ACS/PRCS replicate weights in Stata.
- Because of a programming error, many cases that should have been coded as 0 on YRSUSA2 in the 2008 ACS were instead coded as 5, and many other cases that should have been coded as 1, 2, or 3 were instead coded as 0. This has been corrected. YRIMMIG and YRSUSA1 were accurate and remain unchanged, except for correcting another programming error that coded valid YRSUSA1 values of 0 as 1 in 1910-1930 samples. (YRSUSA1 codes of 0 contain both N/A cases and cases that arrived in America less than one year ago; they can be distinguished using BPL. See the YRSUSA1 codes page for more information.)
- Because of a programming error, all commutes (TRANTIME) over 99 minutes were too small by a factor of 10. This affects approximately the longest 0.5% of commutes in all ACS and PRCS samples. This has been corrected, and TRANTIME has been widened to three digits.
- QMIGRAT1 (the data quality flag for MIGRATE1) contained incorrect data for the 2006-2008 ACS samples, instead duplicating QMARST (the data quality flag for MARST). This has been corrected.
- Persons with OCC1950 values of 595 (armed services), 997 (missing/unknown), and 999 (N/A) received 59.5, 99.7, and 99.9 respectively as their NPBOSS50 scores. They now receive the appropriate N/A codes of 999.9.
- In the 2003 and 2004 ACS/PRCS samples, respondents with "some college but less than one year" were erroneously classified in EDUC (and the former EDUC99) as having an "associate's degree, occupational program", while those with "one or more years of college, no degree" were erroneously classified as having an "associate's degree, academic program". This has been corrected.
- In the 2008 ACS, three respondents with RACE codes of 827 ("White and one or more major race groups, n.e.c.") and one respondent with a code of 991 ("White race; Some other race; Black or African American race and/or American Indian and Alaska Native race and/or Asian groups and/or Native Hawaiian and Other Pacific Islander groups") received incorrect codes of "0" on RACESING. This has been corrected.
- Several other variables have been widened:
  - YRIMMIG and YRIMMIPR are now 4-digit years instead of 3-digit codes.
  - INCTOT and INCEARN now contain seven digits to accommodate the highest incomes. For INCTOT, this affects fewer than 60 cases in each of the 2006-onward samples; for INCEARN, this affects only four cases in 2008.
  - BEDROOMS and ROOMS are now two digits wide to accommodate expanded detail in the 2008 ACS/PRCS. All other samples are unaffected.
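Negative replicate weights need no special handling once estimates are computed: each replicate estimate is just the statistic recomputed with one replicate weight column, negative values included. The sketch below assumes the 80-replicate successive difference formula that the Census Bureau documents for ACS PUMS, SE = sqrt((4/80) × Σ(X_r − X)²); confirm it against the IPUMS replicate weights FAQ before relying on it. Function names are hypothetical.

```python
import math
from typing import Sequence


def weighted_total(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted total; negative weights are used as-is, not clipped to zero."""
    return sum(v * w for v, w in zip(values, weights))


def replicate_se(full_estimate: float, replicate_estimates: Sequence[float]) -> float:
    """Standard error from 80 replicate estimates (assumed ACS formula)."""
    if len(replicate_estimates) != 80:
        raise ValueError("expected 80 replicate estimates")
    squared = sum((x - full_estimate) ** 2 for x in replicate_estimates)
    return math.sqrt(4.0 * squared / 80)
```

In practice, `full_estimate` would be `weighted_total(values, PERWT values)` and each replicate estimate `weighted_total(values, REPWTPn values)`.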
November 9, 2009. Posted new versions of the 2008 ACS/PRCS to correct erroneous data in REPWT14. Additionally, cases in the 1920 1% sample that should have been coded as "9997" (unknown) in YRNATUR instead received codes of "ZZZZ". This has been corrected.
November 6, 2009. Posted new versions of the 2008 ACS/PRCS. HHINCOME, FTOTINC, INCEARN, and INCTOT were not provided before due to errors in the original Census Bureau data; they are now available because the Census Bureau has released new data. POVERTY was previously provided just as the Census Bureau released it, with different values for each nonrelative of the householder. The POVERTY variable is now calculated as in all previous samples, where people in unrelated subfamilies have the same value.
The Census Bureau has not documented its data update, so users should know that any 2008 PUMS files downloaded from the Census Bureau's website between October 30 and November 3 have incorrect data for the four income summary variables with the Census Bureau names of FINCP, HINCP, PERNP, and PINCP. Also, as of November 6, the DataFerrett data had not been updated; they still contain incorrect codes of -$59,999 (HINCP and FINCP), -$10,000 (PERNP), and -$19,999 (PINCP).
There are two other changes to IPUMS-USA data:
- YRSUSA1 was calculated incorrectly for the 2008 samples. It is now aligned with YRIMMIG.
- The documentation for MULTGEN, a new variable measuring multigenerational households that IPUMS provides without modification from the original Census Bureau data, has been updated to reflect the results of preliminary examination by IPUMS staff. We recommend that researchers use this variable with caution.
November 4, 2009. Added 1% samples from the 2008 American Community Survey (ACS) and the 2008 Puerto Rico Community Survey (PRCS). Together, the samples contain approximately three million person records. The 2008 ACS is the third ACS sample to provide information on group quarters, which can be identified in the GQ variable. Researchers analyzing multiple ACS samples over time should remove group quarters cases, since they are available only in the 2006-2008 data.
The lowest level of geographic identifier in the 2008 ACS is the PUMA; 2008 PUMAs have the same boundaries as those in the 2005-2007 ACS and the 2000 census samples. The IPUMS version of the 2008 ACS provides the following additional geographic identifiers: CITY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. These variables were constructed at the University of Minnesota and are not available via the Census Bureau.
There are several noteworthy changes to the IPUMS data that stem from the Census Bureau's modifications to the ACS/PRCS questionnaire; users should consult our page on the 2008 ACS/PRCS.
We have also made a number of other improvements to the IPUMS data. Specifically:
- Incorrect adjustment factors were used for all dollar amounts in ACS and PRCS samples. This resulted in dollar amounts that were nearly 1 percent smaller than they should have been in 2006 ACS/PRCS data and 2006 cases in the 2005-2007 3-Year ACS/PRCS files. For all other ACS/PRCS data, dollar amounts were off by less than 0.5 percent. This was due to an error in the Census Bureau documentation, which states that the adjustment variables convert dollar amounts into July dollars (for instance, see the 2005 ACS Accuracy Statement, p. 13). Through further conversations with Census Bureau staff, we have realized that these variables actually convert dollar amounts into calendar year dollars. This is true for all ACS samples since 2000. All dollar amounts in IPUMS ACS/PRCS samples are pre-adjusted to reflect calendar year dollars.
- The occupational standing measures have been refined.
  - NPBOSS50 and NPBOSS90, which are based on median educational attainment and median earned income, have been re-calculated using the standard formula for calculating medians from grouped data. These were previously calculated using the GMEDIAN function in the SPSS MEANS command, which appears to have a programming error that causes its calculations to diverge from its stated method. The changes are quite small.
  - ERSCOR50 and ERSCOR90, previously calculated as if the income data were not grouped, have been re-calculated using the above procedure for calculating medians from grouped data. Also, IPUMS now weights occupations by the number of workers they contain when calculating the standardized median incomes on which the ERSCOR measures are based. Because this weighting was not carried out before, scores for ERSCOR50 and ERSCOR90 did not truly represent the percentage of workers in occupations having lower median earnings than a given occupation, as stated in the documentation. Rather, they represented the percentage of occupations having lower median earnings than a given occupation. Together, these two improvements have the potential to make significant changes to the ERSCOR measures: computing medians without attention to the grouped nature of the data resulted in many ties in the positioning of occupations, which were broken by the (arbitrary) occupational category numbers. The increased precision alters the relative and absolute position of the occupations in the distribution of standardized median incomes, while weighting for the size of the occupations further alters the absolute position of each occupation in this distribution.
- The EDSCOR, ERSCOR, and NPBOSS measures rely on statistics that are derived from the PUMS data (for instance, the proportion of people in a given occupation who have a college education). Samples containing data from 2006-2007 were previously based on analysis of the 2006 1-year data. Components for these variables are now calculated separately for each sample.
- Some replicate weight values (REPWT and REPWTP) in the original Census Bureau 2006 and 2007 ACS/PRCS data files are negative. However, many statistical procedures balk at negative weights. In this latest revision, IPUMS has recoded all REPWT and REPWTP values to 0 where they were less than 0 in the original Census Bureau file. This affects very few cases, typically fewer than 40 for any one set of replicate weights. According to Census Bureau staff, this introduces no bias into replicate standard errors, and tests of IPUMS data confirmed this.
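The "standard formula for calculating medians from grouped data" referred to above interpolates linearly within the class that contains the midpoint of the distribution: median = L + ((N/2 − F) / f) × w, where L is the lower bound of the median class, F the cumulative frequency below it, f its frequency, and w its width. The sketch below is a generic implementation of that textbook formula, not the IPUMS production code; frequencies may be weighted counts (for example, numbers of workers), and the input layout is an assumption.

```python
from typing import Sequence, Tuple


def grouped_median(groups: Sequence[Tuple[float, float, float]]) -> float:
    """Median of grouped data.

    groups: (lower_bound, upper_bound, frequency) tuples, sorted ascending
    and contiguous. Frequencies may be weighted counts.
    """
    total = sum(freq for _, _, freq in groups)
    half = total / 2
    cumulative = 0.0
    for lower, upper, freq in groups:
        if freq > 0 and cumulative + freq >= half:
            # Linear interpolation within the median class.
            return lower + (half - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("distribution has no positive frequencies")
```

For classes 0-10, 10-20, and 20-30 with frequencies 10, 20, and 10, the median falls halfway through the second class, at 15.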
We also made several changes that were not specifically related to the 2008 ACS/PRCS data release:
- In the 1850-1870 samples, blanks on the REALPROP and PERSPROP items are now coded as 0. Some of the samples previously coded blanks as 999999.
- For household fragments in the 1910 1% data that were reunited with their proper household (SAMPRULE = 6), persons who fell outside the sampling window now receive PERWT, SLWT, and PERWTDET values of 0, in accordance with IPUMS documentation. Previously, such persons were inadvertently given the same weights as everyone else in the sample. This affects 573 household fragments.
- HWSEI, PERWTDET, and HHWTDET, which have two implied decimal places, are now divided by 100 automatically in the command setup files. EDSCOR50, EDSCOR90, ERSCOR50, ERSCOR90, NPBOSS50, NPBOSS90, PRESGL, and PRENT, all of which have one implied decimal place, are now divided by 10. Previously, users were required to perform these calculations.
- Year of naturalization (YRNATUR) now indicates the full four-digit year, rather than the last three digits (as was the case previously).
- The names of some disability variables available in the ACS have been streamlined, and minor errors in the IPUMS documentation have been corrected; for more information, see our page on the 2008 ACS/PRCS.
- The coding of BUILTYR2 (age of structure) has been changed. Previously, higher values of this variable represented older structures, and buildings constructed most recently had lower values. Because the most recent year of construction needs to be represented by 1, each additional year of data would otherwise have required frequent and inconvenient code changes. Now, higher values of this variable represent younger structures, and future revisions will merely add (not change) codes.
- For 2005-2007 ACS and PRCS samples, the data quality flag for CONDOFEE (QCONDOFE) contained data for QCOSTGAS instead. This has been corrected.
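The command setup files now perform the implied-decimal division automatically. Users who read the raw fixed-width data with their own code would need to apply the same scaling themselves; the sketch below encodes the divisors listed above. The lookup table and helper name are hypothetical, and variables not in the table are returned unscaled.

```python
# Implied decimal places, per the variable list above.
IMPLIED_DECIMALS = {
    "HWSEI": 2, "PERWTDET": 2, "HHWTDET": 2,
    "EDSCOR50": 1, "EDSCOR90": 1, "ERSCOR50": 1, "ERSCOR90": 1,
    "NPBOSS50": 1, "NPBOSS90": 1, "PRESGL": 1, "PRENT": 1,
}


def scale(var: str, raw: int) -> float:
    """Convert a raw integer field to its decimal value."""
    return raw / 10 ** IMPLIED_DECIMALS.get(var, 0)
```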
October 9, 2009. Added two new variables for 1850-1930. In these samples, dwellings could include more than one household. Households have always been uniquely identified by SERIAL; the new variable DWELLING is a unique identifier for dwellings. Within each value of DWELLING, there may be more than one SERIAL. The new variable DWSEQ indicates the order in which households were enumerated within the dwelling. For more information, please see the variable descriptions.
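The relationship among DWELLING, SERIAL, and DWSEQ can be made concrete by grouping household records by dwelling and ordering them by enumeration sequence. This sketch assumes one dictionary per household with those three variables; the helper name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def households_by_dwelling(households: Iterable[dict]) -> Dict[int, List[int]]:
    """Map each DWELLING to its household SERIALs in enumeration (DWSEQ) order."""
    groups: Dict[int, List[dict]] = defaultdict(list)
    for hh in households:
        groups[hh["DWELLING"]].append(hh)
    return {
        dwelling: [h["SERIAL"] for h in sorted(members, key=lambda h: h["DWSEQ"])]
        for dwelling, members in groups.items()
    }
```

A dwelling whose list has more than one SERIAL contained multiple households.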
October 8, 2009. Added higher-density samples for 1880 and 1900. The 1880 10% sample has replaced the preliminary 1880 5% sample, while the 1900 5% sample has replaced the preliminary 1900 2.5% sample. The final samples contain all cases from the preliminary samples (which came from odd-numbered microfilm reels only) as well as new cases from even-numbered microfilm reels. For more details, see the sample description page for 1880 and 1900.
August 11, 2009. Posted new versions of all linked data samples. The LINKWT variable has been corrected in all samples. Due to a processing error, LINKWT values were too low by factors ranging from 2 to 50. Any linked data downloaded previously should be replaced with these new data.
June 17, 2009. Posted new versions of 1950-2007 data.
June 11, 2009. Minor correction to 2007 ACS/PRCS data. In the 2007 ACS (and the 2007 cases in the 2005-2007 ACS 3-year file), some cases in Florida had missing values of PROPINSR. These are now coded as 9999, which is the correct PROPINSR topcode for Florida. The documentation of topcodes has also been updated to reflect this change.
May 29, 2009. Corrected minor inaccuracies in 2000 and ACS/PRCS data.
- In the 2000 PUMS and all ACS/PRCS data, persons with OCC codes of 384 ("miscellaneous law enforcement officers") received OCC1990 codes of 405 ("housekeepers, maids, butlers, stewards, and lodging quarters cleaners"). They now receive OCC1990 codes of 423 ("Other law enforcement: sheriffs, bailiffs, correctional institution officers"). (Note that this change diverges from the BLS working paper on which OCC1990 is based.)
- In all ACS/PRCS data, persons related to the household head were erroneously coded as 0 (N/A) for POVERTY if their total family income (FTOTINC) was negative. They now receive the proper codes of 1.
- In all PRCS data, POVERTY values for all cases were based on IPUMS calculations from topcoded income data. For members of the primary family in the household, POVERTY values now reflect the original Census Bureau values (based on non-topcoded income data), in accordance with IPUMS' treatment of ACS data. The effect of this alteration is small; for 92 percent of such cases, POVERTY values change by no more than three percentage points. For unrelated individuals and members of any secondary families, POVERTY values continue to be based on IPUMS calculations (see the variable description for background).
- In the 2006 ACS (and the 2006 cases in the 2005-2007 ACS 3-year file), group-quarters residents were erroneously coded as missing for QMOVEDIN (the data quality flag for MOVEDIN). They are now coded as 0.
May 13, 2009. Added four new variables describing subfamilies to 1880-2007 IPUMS samples: SFTYPE (subfamily type), SFRELATE (relationship within subfamily), SUBFAM (subfamily membership), and NSUBFAM (total number of subfamilies in the household). For more information, see the subfamilies overview page.
Also, documentation for other family interrelationship variables has been updated to conform to longstanding IPUMS procedures:
- When linking under the third rule for MOMRULE or POPRULE, the IPUMS uses an additional condition in surveys where respondents can give multiple responses (2000, ACS, and PRCS): persons for whom a single race is listed may not be linked to potential parents of a different race. Users should note that this condition has long been applied to 2000 and ACS data, but is now applied to the PRCS for the first time.
- Persons receive STEPMOM codes of 1 when the difference in ages between them and their mother is less than 12 years or greater than 54 years--not less than 15 years or greater than 49 years, as the documentation previously stated.
- Persons receive STEPPOP codes of 1 when the difference in ages between them and their father is less than 14 years--not less than 15 years or greater than 64 years, as the documentation previously stated.
See the variable descriptions for more information.
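The corrected age-gap rules for STEPMOM and STEPPOP can be written as simple predicates. The Python sketch below covers only the age-difference test stated in the documentation fix; the function names are hypothetical, and the full IPUMS procedure for assigning STEPMOM and STEPPOP codes involves other conditions not shown here.

```python
def stepmom_age_flag(child_age: int, mother_age: int) -> bool:
    """Age-gap test for a STEPMOM code of 1 (hypothetical helper).

    True when the mother is less than 12 or more than 54 years older
    than the child, per the corrected documentation.
    """
    gap = mother_age - child_age
    return gap < 12 or gap > 54


def steppop_age_flag(child_age: int, father_age: int) -> bool:
    """Age-gap test for a STEPPOP code of 1 (hypothetical helper).

    True when the father is less than 14 years older than the child;
    unlike STEPMOM, the corrected rule has no upper bound.
    """
    return father_age - child_age < 14
```

For example, a 10-year-old linked to a 20-year-old mother (a 10-year gap) satisfies the STEPMOM age test, while the same child linked to a 40-year-old mother does not.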
April 21, 2009. Corrected missing values and other minor inaccuracies in several samples. First, several variables contained missing data for some cases. Missing data has been assigned to the proper codes as follows:
- In the 1880 100% database, 21 cases that were mistakenly coded as missing on ENUMDIST are now coded as "0", and SUBSAMP (formerly unavailable in these data) is now provided.
- In the 1900 1% sample (both with and without oversamples), four cases contained missing data for SUPDIST. One of these is now coded as "7", one as "14", and two as "73". Additionally, five cases contained missing data for DWSIZE. Two of these are now coded as "3", one as "5", and two as "7".
- In the 1920 Puerto Rico sample, two cases that were mistakenly coded as missing on ENUMMO are now coded as "01".
- In the 1930 1% sample, values of IND1930 and OCC1930 that contained non-numeric characters were mistakenly coded as missing; the proper values are now available.
- In the 1950 sample, missing data for WKSWORK1 has been assigned to "00" (N/A).
- In the 1980 urban/rural sample, cases that were coded as "3570" (Lexington, KY) for CITY have been switched to "3590" (Lexington-Fayette, KY) to account for the 1974 merger of Lexington and Fayette County. Additionally, city populations (CITYPOP) are now identified for this city as well as for city codes 6410 (Scranton, PA) and 6650 (Springfield, IL).
- In the 1980 labor market sample, the variables MIGCZ5 and PWCZ were mistakenly coded as missing for a large number of cases; the proper values are now available.
- In the 2000 5% and 1% Puerto Rico samples, missing data for INCWAGE, INCTOT, and FTOTINC has been assigned to "999999" (N/A).
- In the 2000 5% Puerto Rico sample, the variables PUMALAND and PUMAAREA were mistakenly coded as missing for all cases; the proper values are now available.
Second, all housing units with 10 or more persons unrelated to the household head have been re-classified as group quarters in all American Community Survey and Puerto Rican Community Survey samples, consistent with the treatment of such households in the 2000 census. For more information, see GQ. The cases in such housing units are now coded as 5 in the GQ variable and 9 in the GQTYPE variable (900 in the detailed version GQTYPED).
Third, information on variable availability has been updated as follows:
- The 2006 and 2007 ACS/PRCS samples now include QMOVEDIN (the data quality flag for MOVEDIN).
- The 2000 1% Puerto Rico sample does not contain PUMA information, and all cases were coded as missing for this variable. PUMA may no longer be downloaded with this sample.
- All ACS/PRCS samples now include GQTYPE to accommodate the aforementioned change in GQ coding (see above).
Finally, the IPUMS variables FDSTPAMT and OWNCOST are now adjusted to calendar-year dollars in all ACS/PRCS samples; see the ACS income variables note for more information.
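The group-quarters reclassification described in this entry amounts to counting the persons in each ACS/PRCS housing unit who are unrelated to the head and recoding the unit when that count reaches 10. The sketch below assumes a hypothetical in-memory layout (a `related_to_head` flag per person, with the head flagged as related); it illustrates the stated rule only and is not the IPUMS production code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    # True for the head and anyone related to the head (hypothetical field).
    related_to_head: bool


@dataclass
class HousingUnit:
    people: List[Person] = field(default_factory=list)
    gq: Optional[int] = None      # None = not yet assigned (placeholder)
    gqtype: Optional[int] = None
    gqtyped: Optional[int] = None


UNRELATED_THRESHOLD = 10  # 10 or more persons unrelated to the head


def reclassify_as_group_quarters(unit: HousingUnit) -> HousingUnit:
    """Recode a unit as group quarters under the 2000-census-style rule."""
    unrelated = sum(1 for p in unit.people if not p.related_to_head)
    if unrelated >= UNRELATED_THRESHOLD:
        # Codes stated in the April 21, 2009 entry: GQ 5, GQTYPE 9, GQTYPED 900.
        unit.gq, unit.gqtype, unit.gqtyped = 5, 9, 900
    return unit
```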
April 1, 2009. Improved and updated the coding of in-laws in the 2000-2007 American Community Survey (ACS) and 2005-2007 Puerto Rican Community Survey (PRCS) samples. In these samples, the Census Bureau's relationship variable includes only a global "in-law" category. IPUMS attempts to provide a more detailed classification of parents-in-law, siblings-in-law, and children-in-law in the RELATE variable. The new release of the ACS and PRCS datasets improves the procedures for making these detailed in-law assignments. More information on the new procedures is available here. Additionally, users should take note of three coding errors in the old classification scheme that have been corrected and/or no longer apply in the new classification scheme:
- Many never-married in-laws, all of whom should have been classified as siblings-in-law under the old classification scheme, were instead classified as parents-in-law or children-in-law. This condition no longer applies.
- In households containing unmarried partners of the head, the classification of in-laws departed from the stated rules and was likely to be particularly inaccurate. This is no longer the case.
- In the 2005-2007 PRCS, all in-laws were mistakenly classified as parents-in-law. This has been corrected.
Additionally, in the 2005-2007 ACS and PRCS 3-Year samples, the person weights (PERWT) for individuals in group quarters were not copied to the household weight variable (HHWT). This has been corrected.
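The HHWT correction means that each group-quarters person's HHWT now carries that person's PERWT. A minimal sketch of the rule follows, assuming records are plain dictionaries with `GQ`, `PERWT`, and `HHWT` keys; the set of GQ codes treated as group quarters is passed in as a parameter because it is an assumption to check against the GQ variable description, not something this entry specifies.

```python
def copy_perwt_to_hhwt(records, gq_codes):
    """Set HHWT equal to PERWT for every group-quarters person.

    records:  list of dicts with "GQ", "PERWT", and "HHWT" keys
              (hypothetical layout).
    gq_codes: set of GQ values treated as group quarters (an assumption;
              verify against the GQ variable description).
    """
    for rec in records:
        if rec["GQ"] in gq_codes:
            rec["HHWT"] = rec["PERWT"]
    return records
```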
March 5, 2009. Posted the 2005-2007 American Community Survey/Puerto Rican Community Survey 3-year file. This file includes all cases from the previously-released single-year files from the 2005-2007 ACS/PRCS. The new 3-year file differs in several ways from the single-year files. Most importantly, weights have been re-calculated, incomes and other dollar amounts have been standardized to 2007 dollars, and different topcodes have been applied. For more information, please see this FAQ on multi-year PUMS files.
March 1, 2009. PRCS data from 2005-2007 have been altered to resolve small coding differences across survey years. All 1,093 cases previously coded as 2 on the ABSENT variable in the 2006 and 2007 PRCS single-year files are now coded as 3, and one individual previously coded as 899 on the RACE variable in the 2005 PRCS single-year file is now coded as 943.
There was also a slight change in the PRCS's main immigration variable. PRCS data previously available in YRSUSA1 has been shifted to YRSPR, and the flag associated with this variable has changed from QYRIMM to QYRSPR.
February 9, 2009. Posted new versions of linked data samples for males from 1860-1880 and 1870-1880. After applying a middle-initial mismatch filter that previously had not been applied properly, 303 cases were removed from the 1860 data and 302 cases from the 1870 data.
December 20, 2008. Posted new versions of the linked data samples for males from 1850-1880, 1880-1900, and 1880-1910.
Changes to the 1850-1880 data: the dataset was increased by 49 records. Some records were removed and some added as the result of 1) rerunning one of the classifiers and 2) properly applying a middle initial mismatch filter.
Changes to the 1880-1900 and 1880-1910 data: 282 cases were removed from 1900 and 215 cases were removed from 1910, after applying a middle initial mismatch filter that previously had not been applied properly.
December 11, 2008. Posted remaining linked data samples. Also posted new versions of samples linking couples from 1870-1880 and 1880-1910. In the 1870 data, 10 cases that previously had a LINKWT of 0 were given the correct non-0 LINKWT values. In the 1910 data, 140 cases that had LINKWT values greater than 5 were assigned values of 5 (the maximum allowable LINKWT).
November 11, 2008. Posted new versions of samples for 1970-2007. Improvements were made to the 1970 samples to correct the variable INCOTHER. Samples from 1980-2007 were expanded to include the variable OWNCOST.
Posted a new version of the 1880 100% database. Fixed problems with the MCDSTRNG and PAGENO variables. Group quarters units containing more than 60 people were split into 1-person households. Researchers needing to study these units intact can use SERIAL80 and PERNUM80.
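To reassemble the split group quarters units, person records can be grouped on SERIAL80 and ordered by PERNUM80. The sketch below assumes SERIAL80 identifies the original intact unit and PERNUM80 gives each person's position within it, and that records are dictionaries with those keys; the function name is hypothetical.

```python
from collections import defaultdict


def regroup_by_serial80(records):
    """Group 1880 person records into their original units.

    Returns a dict mapping each SERIAL80 value to that unit's records,
    sorted by PERNUM80.
    """
    units = defaultdict(list)
    for rec in records:
        units[rec["SERIAL80"]].append(rec)
    for members in units.values():
        members.sort(key=lambda r: r["PERNUM80"])
    return dict(units)
```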
October 11, 2008. Added the 1880 100% population database. This dataset was originally entered for genealogical purposes by the Church of Jesus Christ of Latter-day Saints (LDS). Data cleaning and harmonization took place at the Minnesota Population Center (MPC). Versions of this data are also available from the MPC's North Atlantic Population Project and the LDS's genealogical website FamilySearch.org.
The IPUMS-USA version of the data contains fully integrated codes and labels, newly-constructed family inter-relationship variables, and missing data allocation for key demographic variables. Since the dataset was first constructed for genealogy, several variable groups were never entered. Excluded variables include items relating to school, literacy, unemployment, disability, month of birth, marriage within the past year, and street address. The most detailed geographic variables are MCDSTRNG and INCSTRNG.
Added 2.5% preliminary sample of the 1900 census. This sample is "preliminary" because the final version will contain 5% of the population. The preliminary sample includes data only from odd-numbered microfilm reels. Counties on even-numbered reels are not represented in this dataset. Alaska and Hawaii are also excluded from the preliminary dataset. The final 5% dataset will be released in early 2009.
September 26, 2008. Added 1% samples from the 2007 American Community Survey (ACS) and the 2007 Puerto Rico Community Survey (PRCS). The samples have approximately three million person records. The 2007 ACS is the second ACS sample to provide information on group quarters, which can be identified in the GQ variable. Researchers analyzing multiple ACS samples over time should remove group quarters cases, since they are available only in the 2006-2007 data.
The lowest level of geographic identifier in the 2007 ACS is the PUMA; 2007 PUMAs have the same boundaries as those in the 2005-2006 ACS and the 2000 census samples. The IPUMS version of the 2007 ACS provides the following additional geographic identifiers: CITY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. These variables were constructed at the University of Minnesota and are not available via the Census Bureau.
Note that the name of the IPUMS variable describing military service September 2001 and later has been changed from VET01X03 to VET01LTR. This name more accurately reflects the information contained in the variable.
More information on the background of and future plans for the American Community Survey is available at the ACS information page.
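For trend analysis across pooled ACS years, the advice to remove group quarters cases can be applied as a simple filter on GQ. The set of GQ codes that denote group quarters is passed in rather than hard-coded; the `{3, 4}` used below is an assumption for illustration only and should be checked against the GQ variable description for your extract.

```python
def drop_group_quarters(records, gq_codes):
    """Return only the records whose GQ value is not in gq_codes."""
    return [rec for rec in records if rec["GQ"] not in gq_codes]


# Hypothetical pooled extract of 2005, 2006, and 2007 ACS person records.
pooled = [
    {"YEAR": 2005, "GQ": 1},
    {"YEAR": 2006, "GQ": 3},
    {"YEAR": 2007, "GQ": 1},
]
# {3, 4} is an assumed set of group-quarters codes; verify against GQ.
comparable = drop_group_quarters(pooled, {3, 4})
```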
April 11, 2008. Posted IPUMS Version 4.0, the first major revision of the IPUMS files since 2004. Includes revised versions of all samples from 1850-1930, a new 1880 5% sample, and 13 new samples from the Puerto Rican Censuses of 1910-2000 and the Puerto Rican Community Survey. IPUMS 4.0 contains many new variables, including long-term Hispanic identification back to 1850 (HISPAN), a consistent single-race identification variable from 1850-2006 (RACESING), a battery of socioeconomic indices, original strings for occupation (OCCSTRNG) and industry (INDSTRNG), new detailed weight variables for the historical samples (HHWTDET and PERWTDET), and new standardized low-level geographic identifiers (MCD and INCORP). More information is available on the IPUMS 4.0 release page.
The most recent previous version of IPUMS data and documentation (IPUMS 3.0) is still available via the IPUMS archive page at ICPSR. The archive page permits users to revise old extracts, create new extracts, and download data and documentation. The link titled "IPUMS-USA website as of March, 2008" leads to a fully-functioning mirror of the IPUMS website as it existed prior to the release of IPUMS 4.0. The archive page contains versions of the website from previous years as well.
February 14, 2008. Posted a new version of the 1950 census sample, with a correction made to the BPL variable. Several cases that had been erroneously coded "Missing/blank" are now coded correctly as follows: 94 cases coded "Israel," 9 coded "Byelorussia," and 3 coded "Pakistan." In the 2000 census samples, changed the MIGMET5 code for Hattiesburg, MS from 3285 to 3300 to be consistent with our METAREA coding.
Re-released VALUEH for the 2006 ACS sample; during a recent website update, VALUEH had inadvertently been removed from the data extract system.
December 14, 2007. Posted new versions of the 2005 and 2006 ACS samples: released CITYPOP for both samples. Fixed a small error in QCONDOFE and QVALUEH in the 2006 sample. Prior to the update, a small number of cases had missing values for these two variables.
Posted new versions of the 1% and 5% census samples for 2000: fixed PUMALAND, PUMAAREA, and ACREPROP. Prior to this correction, these three variables contained incorrect data.
November 15, 2007. Posted new versions of all ACS samples; a correction was made to the BUILTYR2 variable. Previously, housing units built in 1939 or earlier (BUILTYR2 = 10) were grouped with those reported as built in 2005 or later (BUILTYR2 = 1).
October 15, 2007. Added a 1% sample from the 2006 American Community Survey (ACS). The sample has approximately 2,970,000 person records. The 2006 ACS is the first ACS sample to provide information on group quarters, which can be identified in the GQ variable. Researchers analyzing multiple ACS samples over time should remove group quarters cases, since they are available only in the 2006 data.
The lowest level of geographic identifier in the 2006 ACS is the PUMA; 2006 PUMAs have the same boundaries as those in the 2005 ACS and the 2000 census samples. The IPUMS version of the 2006 ACS provides the following additional geographic identifiers: CITY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. These variables were constructed at the University of Minnesota and are not available via the Census Bureau.
More information on the background and future plans for the American Community Survey is available at the ACS information page.
August 17, 2007. Added HHTYPE for all samples from 1940 to 2005. In the future, HHTYPE will also be made available for all samples from 1850-1930.
July 25, 2007. Added RACESING to all samples from 1900, 1910, and 1940. Added HISPAN and HISPRULE to all samples from 1910.
July 19, 2007. Posted updated versions of all samples from 1900, 1910, and 1930. Corrections were made to the CHSURV variable in the 1900 and 1910 samples. All values were previously 0 or 1. The updated samples contain correct values. Corrections were made to the NSIBS variable in the 1900, 1910, and 1930 samples. Previously, a small number of persons identified as "siblings" in the RELATE variable (code 701) incorrectly received a value of 0 for NSIBS. This error has been corrected.
July 10, 2007. Posted updated versions of all samples from 1970 and 1980, and the ACS samples from 2000, 2001, and 2002. The updated samples include fixes to COSTELEC, COSTGAS, COSTFUEL, and COSTWATR. These variables did not properly identify cases having values greater than 9990. All cases in this range are in the universe but have unreported values, usually because utility costs were included in rent payments. The old versions of the datasets incorrectly identified these cases as not being in the universe. COSTELEC and COSTGAS had the additional problem of presenting monthly values instead of annual values. These problems are now fixed.
The new 1980 5% sample additionally fixes a problem in the CITY variable. In the old sample, San Francisco was incorrectly identified. It has been corrected.
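When analyzing COSTELEC, COSTGAS, COSTFUEL, or COSTWATR, values greater than 9990 should not be treated as dollar amounts: per the entry above, they mark cases that are in the universe but unreported. The sketch below, with a hypothetical helper name, maps such values to `None` and passes other values through; any other special codes the variables may use are outside the scope of this sketch.

```python
UNREPORTED_ABOVE = 9990  # values above this are codes, not dollar amounts


def annual_utility_cost(value: int):
    """Return an annual dollar amount, or None when the cost is unreported.

    Values greater than 9990 are in the universe but unreported (often
    because utilities were included in rent). Other special codes are
    not handled here.
    """
    if value > UNREPORTED_ABOVE:
        return None
    return value
```

Returning `None` rather than dropping the case keeps these households in the universe, which is exactly the distinction the corrected variables restore.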
June 21, 2007. Posted an updated version of the 1930 1% sample. The updated sample includes fixes of minor problems in OCCSCORE (missing occupation data was not being allocated), YRSUSA2 (some allocated values were inconsistent with YRSUSA1), and QMARST (this variable indicated that we made more logical edits than were actually made).
Released new data extraction system with the "Attach Variables" feature, which allows researchers to create variables specifying characteristics of respondents' spouses, mothers, fathers, and household heads.
Released new version of the 2005 American Community Survey sample that includes 160 replicate weight variables (see REPWT and REPWTP).
Released CITYPOP for 1850-1930 samples. Due to a technical problem, we had not been offering CITYPOP in these samples since February 2007. The CITYPOP values that we are providing now are not different from the values that were available prior to February.
June 7, 2007. Posted 1930 1% sample (up from previous 0.5% sample for 1930). The new sample includes several new occupation and industry variables (OCC, OCC1930, IND, and IND1930) as well as HISPAN and RACESING.
April 26, 2007. Added new occupation crosswalks (OCC to OCCSOC) for the 2000 census samples and the ACS samples; these are available via links from our Occupation and Industry documentation page. Also improved our OCC and OCCSOC code lists (available from the respective variable descriptions) for the 2000 census and ACS samples.
April 24, 2007. Posted a new version of the 2005 ACS; a correction was made to the MORTAMT1 variable.
April 9, 2007. Added a Consistent PUMA variable and shapefiles (see CONSPUMA). CONSPUMA reconciles differences in low-level geographic identifiers in the 5% samples from 1980, 1990, , and the 2005 ACS. Also released all new shapefiles for low-level geographic identifiers from 1970-2005. Changes to the previous shapefiles were minor: numerous "holes" in the maps were assigned to their appropriate PUMA, County Group, or SEA. All files are accessible via the links on our geographic tools page.
March 27, 2007. Changed the name of the RACHIST variable to RACESING.
March 21, 2007. Added QHISPAN, the data quality flag for HISPAN, to the and ACS samples. Posted new versions of the 1940 and 1950 samples: a minor correction was made to the CHBORN variable.
Posted new versions of the 1910 samples: we corrected a problem with SERIAL so that households within multi-household dwellings are now uniquely identified. The problem had affected less than 0.13% of households in the 1910 1.4% sample.
February 15, 2007. Created HISPAN and HISPRULE variables for the 1900 and 1930 samples. A later data release will create these variables for the 1850-1880 and 1910-1920 samples.
Created a new harmonized version of the TRIBE variable, which is now available in 1900-1910, 1990-2000, and the ACS.
Posted a new version of the 1940 sample: a minor correction was made to the VET1940 variable.
January 31, 2007. Released new harmonized occupation and industry variables for 1950-2005: OCC1990 and IND1990. The OCC1990 variable was created in collaboration with researchers at the Bureau of Labor Statistics. Both variables are available only via the IPUMS.
Added metropolitan area designations to the 2003 ACS, in METAREA and MET2003. Metropolitan areas are also identified in the 2005 ACS IPUMS sample.
Created HISPAN and HISPRULE variables for the 1940-1970 samples.
Added RACHIST values to the 1950-1990 samples and the 2005 ACS IPUMS sample. RACHIST adapts an algorithm developed at the National Center for Health Statistics to assign single races to persons who reported more than one race from 2000 onward.
December 19, 2006. Replaced the 1-in-250 1910 sample with two new samples: the 1910 1% sample and the 1910 1.4% sample with oversamples. The 1% sample includes a 1-in-100 national population sample, including Alaskans, Hawaiians, and persons enumerated on the American Indian Schedules. The 1.4% sample with oversamples includes a 1-in-70 national population sample that has been combined with large oversamples of Blacks, Hispanics, Alaskans, Hawaiians, and persons enumerated on the American Indian schedules. The 1910 Weighted sample must be used with weighting variables (see PERWT and HHWT).
Replaced the 1900 General sample with two new samples: the 1900 1% sample and the 1900 1% sample with oversamples. The 1900 1% sample is a 1-in-100 national sample, including Alaskans, Hawaiians, and persons enumerated on the American Indian Schedules. This sample has the same cases as the former "1900 General sample" did, though some variables and values have been modified in minor ways. The 1900 1% sample with oversamples is a 1% national sample that has been merged with 1-in-5 oversamples of Alaskans, Hawaiians, and persons enumerated on the American Indian schedules. The 1900 1% sample with oversamples must be used with weighting variables (see PERWT and HHWT).
More information about these samples is available in the 1900 and 1910 sections of the sample descriptions page. We expect to release a revised version of these samples in March 2007. The revised samples will include detailed geography at the minor civil division level and integrated versions of variables specific to the Alaskan, Hawaiian, and American Indian populations.
December 10, 2006. Posted new version of the 2005 ACS sample that includes the following new geographic identifiers: CITY, METAREA, METRO, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWTYPE, PWMETRO, PWCITY, and PWPUMAS. These variables were constructed at the University of Minnesota and are not available via the Census Bureau.
November 29, 2006. Posted new versions of all samples from 1940 through 2005. The new samples include several minor improvements to SPLOC, MOMLOC, and POPLOC. These modifications have resulted in minor changes to the constructed household variables, the family interrelationship variables, POVERTY, and FTOTINC. Detailed information on these variables can be found in the family interrelationships documentation.
An error was corrected in the POVERTY variable for all samples. In two-person families where one person was over age 65 and the other person was under age 65, we sometimes used slightly different poverty thresholds for each member of the family. We should have applied the same threshold to both members of the family. This resulted in several thousand cases in each sample having a poverty value that was off by an average of two percent (10 points on POVERTY's 1-500 scale). We have corrected the problem.
The new samples also include a small number of corrected income values in the 1950, 1960, and 1970 samples. The majority of cases affected have negative income values.
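POVERTY expresses a family's total income as a percentage of the applicable poverty threshold, and the November 2006 fix amounts to using one threshold per family and applying it to every member. The sketch below assumes the family's threshold has already been looked up (the lookup by family size and composition is omitted, and the threshold values used are hypothetical) and clamps results to the 1-500 scale mentioned above; the actual IPUMS top- and bottom-coding may differ.

```python
def poverty_pct(family_income: float, threshold: float) -> int:
    """Family income as a percent of the threshold, clamped to 1-500.

    Uses Python's round(), which rounds exact halves to the even integer.
    """
    pct = round(100 * family_income / threshold)
    return max(1, min(500, pct))


def assign_family_poverty(members, family_income, family_threshold):
    """Give every family member the same value: one threshold per family."""
    value = poverty_pct(family_income, family_threshold)
    return {member: value for member in members}
```

Computing the value once per family, rather than once per person with person-specific thresholds, guarantees that both members of a two-person family with one member over 65 receive identical POVERTY values.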
November 20, 2006. Corrected a problem in the RACE variable in the 2005 ACS sample. There were approximately 3,000 cases with missing values. All of the cases were multi-racial persons. All cases are now assigned to the appropriate categories.
October 11, 2006. Posted 1% sample from the 2005 American Community Survey (ACS). The 2005 sample is the first ACS microdata to identify sub-state geography, including PUMA, MIGPUMA1, and PWPUMA00. The IPUMS version of the 2005 ACS also identifies metropolitan status (METRO). A December 2006 release of the IPUMS 2005 ACS sample will identify CITY, METAREA, PUMASUPR, MIGTYPE1, MIGMET1, MIGCITY1, MIGPUMS1, PWMETRO, PWCITY, PWTYPE, and PWPUMAS. These variables are being constructed at the Minnesota Population Center and are not available via the Census Bureau.
The base data for the IPUMS 2005 sample is the ACS data that the Census Bureau released on October 5th, 2006. The Census Bureau had originally released a version of the dataset on September 11th, 2006. The September release contained several small errors, so the Census Bureau updated the dataset in October. The erroneous dataset was never available via the IPUMS data extraction system.
More information on the background and future plans for the American Community Survey is available at the ACS information page.
October 1, 2006. Posted 0.5% sample from the 1930 census (up from the previous 0.2% 1930 sample).
September 6, 2006. Posted new version of IPUMS-USA website. The website has a new design, and the content of most variable descriptions has changed at least slightly. Users can still access all extract requests made on the old website.
June 30, 2006. Posted new versions of the 2000 1%, 5%, and Unweighted samples: a correction was made to the MIGPLAC5 variable.
April 27, 2006. Posted new versions of all ACS samples: a correction was made to the INCBUS00 variable.
April 7, 2006. Posted new versions of all 2000 Census samples and all ACS samples: a correction was made to the OCC variable.
January 20, 2006. Posted new versions of the 2000 1%, 5%, and Unweighted samples, as well as the 2000 ACS: a correction was made to the MARST variable.
November 30, 2005. Posted 13 new samples on the IPUMS-USA website. All samples were previously available on the IPUMS-USA Beta site, which was shut down. The new samples combined add nearly 15 million cases to the IPUMS database. For more details on this data release, see the sample information page.
October 7, 2005. Posted new versions of the 2000 1%, 5%, and Unweighted samples: a correction was made to the VET55x64 variable. New versions of the 2000-2004 ACS samples were also posted. In all eight samples above, improvements were made to the INDNAICS and OCCSOC variables.
Corrections were made to the YRIMMIG, YRSUSA1, and YRSUSA2 variables in the ACS 2001-2004 samples.
September 16, 2005. Released the 2004 American Community Survey (ACS) sample on the IPUMS Beta site.
September 9, 2005. Released a new 2000 1% flat sample on the IPUMS Beta site. This is a national random sample drawn from the 2000 5% Census sample.
September 1, 2005. Posted a new version of the 1930 1-in-500 sample. Corrections were made to the VET1930 variable and the AGEMARR variable.
June 27, 2005. Posted new versions of the 2000 1% and 5% samples, and the 2000-2003 ACS samples. Added the following variables: RACHIST, PROBAI, PROBAPI, PROBBLK, PROBOTH, and PROBWHT. RACHIST is an historically compatible race variable which 'bridges' multiple-race responses into their most likely single race category. The other variables give detailed probabilities of each single-race response and are best used in combination with one another.
Removed RACGEN00, RACDET00, and SPANAMER from the data and documentation. The variables RACGEN00 and RACDET00 were redundant with RACE. A variable similar to SPANAMER can be created using the IPUMS variables MTONGUE, BPL, MBPL, FBPL, SPANNAME, and STATEFIP.
May 20, 2005. Released a revised version of the 1-in-100 sample of the 1900 census (see the August 21, 2003 revision note for information on the previous version of this sample). The revised dataset includes records extracted from Alaska, Hawaii, and the American Indian 1-in-5 oversamples (the complete oversample datasets are available via the IPUMS raw data download page).
Users should also be aware that the smaller 1900 sample previously available (the 1-in-750 "Preston" sample) will no longer be available via the IPUMS extract system. Users wishing to access this data can still download the entire dataset and SPSS command file via the IPUMS raw data download page.
May 13, 2005. Released a revised version of the preliminary 1-in-500 sample of the 1930 census. Corrected a major error in the race variable. The April 25th sample gave the "White" code (detailed race code 100) to all persons who reported their race as "Mexican." The revised sample gives these persons the new "Mexican" race code (detailed race code 140). The revised sample also corrects minor coding and labelling errors in the following variables: RENT30, GQTYPED, NUMHHTAK, FARMSCHD, ENUMMO, RADIO, HOMEMKR, VET1930, IND1950, MTONGUE, FBPL, MBPL, CITY, METRO, METAREA, URBAREA, and MDSTATUS.
April 25, 2005. Released preliminary 1-in-500 sample of the 1930 census. We expect to release a final 1-in-100 sample of the 1930 census by late 2007.
February 23, 2005. Posted new versions of the 2000-2003 ACS samples: a correction was made to the STATEICP variable.
February 1, 2005. Removed the POV2000 variable from the documentation and data. POV2000 was redundant with the IPUMS POVERTY variable. Both variables use the poverty matrix developed by the Social Security Administration in 1964 (and revised twice in the years since). The Office of Management and Budget's Directive 14 prescribes this definition as the official poverty measure for federal agencies to use in their statistical work.
November 23, 2004. Released the following samples on the IPUMS Beta site: the 2003 American Community Survey (ACS) sample, the 1990 Labor Market Areas sample, the 1980 Labor Market Areas sample, and the 1980 Detailed Metro/Nonmetro sample.
October 13, 2004. Posted new versions of the 2000 1% and 5% samples, and the 2000-2002 ACS samples. The following variables were improved: OCC1950, SEI, OCCSCORE, and IND1950. The new variables utilize the Census Bureau's recently published occupation and industry crosswalks between the 1990 and 2000 censuses.
Made a slight correction to the multipliers used to construct the POVERTY variable in the 2000-2002 samples (for more information see the 1990 poverty status definition).
August 27, 2004. Posted a new version of the 2000 5% sample: a correction was made to the METAREA variable.
August 6, 2004. Posted a new version of the 2000 5% sample: a correction was made to the PWCITY variable.
June 28, 2004. Posted new versions of all the 2000 and ACS samples. The RACE variable has been expanded to incorporate all information from the new multiple-race variables. Details about multiple-race responses are now included, some value labels were clarified, and a few other categories were added. Also, CITYPOP was added to the 2000 1% and 5% samples, and corrections were made to MOBLHOME and METAREA.
June 17, 2004. Released American Community Survey (ACS) samples for 2000, 2001, and 2002 on the IPUMS Beta site.
May 6, 2004. Made 2000 5% sample available via the main IPUMS-USA site.
May 1, 2004. Posted new versions of all of the 2000 samples. The 2000 5% sample now includes variables for Super-PUMA of Work (PWPUMAS) and Super-PUMA of Migration (MIGPUMAS). For the 2000 1% sample, Super-PUMA information that was previously in the PWPUMA00 and MIGPUMA variables is now in the new PWPUMAS and MIGPUMAS variables. A new version of the INCRETIR variable in all three 2000 samples now includes retirement incomes of greater than $99,998 (the previous top code). All three samples include a corrected version of the POV2000 variable.
Posted new versions of all 1990 samples that account for the greater width of INCRETIR (see above).
April 22, 2004. Posted a new version of the 2000 1% sample: a correction has been made to the MIGCITY5 variable.
March 10, 2004. Posted new versions of the 2000 1% sample and the 2000 5% sample. Both samples now include the PWCITY variable. For those living in group quarters, the variable HHWT now has the PERWT value, rather than a value of 0. In addition, corrections were made to the following variables: BPL, STEPMOM, STEPPOP, MARST, and PUMASUPR.
Posted new versions of the 1990 State, Metro, Elderly, and Unweighted samples. A problem in the MORTGAGE variable was corrected in the new samples.
January 30, 2004. Posted new versions of the 2000 1% sample, the 2000 5% sample, and the Census 2000 Supplementary Survey (C2SS). The 2000 1% and 5% samples now include variables for CITY and MIGCITY5. Minor problems in PWPUMAS, PWPUMA00, MIGPUMA, YRIMMIG, and MORTGAGE have also been corrected in the new samples. The new C2SS sample includes corrected values for INCBUS00 (all values were 0 before).
September 9, 2003. Posted new versions of the 1990 State, Metro, Elderly, and Unweighted samples. FTOTINC and HHINCOME now contain negative values for families and households having a net loss of income. A problem in the PERWT variable was corrected in all samples. These were the only affected variables.
August 21, 2003. Penultimate 1-in-100 version of the 1900 Minnesota sample released on the IPUMS Beta site. The dataset includes 170,438 households containing 754,631 individuals. The older 1-in-200 preliminary sample is still available via the data extract system at the main IPUMS-USA site. This version has the following known flaws, which will be corrected in the final version of the 1900 Minnesota sample that we anticipate releasing in the spring of 2004:
- No cases from Alaska and Hawaii are included in the current sample.
- Data quality flags are not yet available.
- Detailed geographic variables are not yet available (these include MDSTATUS, METDIST, URBAREA, MCIVDIV, INCPLACE, and INCORP).
- Coding is not yet complete on the occupation variable (OCC).
- Native Americans enumerated on the special 1900 Indian Schedules are not included in the current sample (although the current version does contain Native Americans enumerated as part of the general population). The 1900 Indian Schedules contained questions not asked on the general schedule, including tribe, percentage Indian blood, and tax status, among others.
- Detailed German birthplaces in the current 1900 sample are coded according to the new scheme developed for the 1860 and 1870 samples. Users of this data should note that these codes do NOT correspond to those listed in the BPL variable description. Detailed German birthplace codes for the 1860-70 and 1900 samples are available here.
Users should also be aware that the smaller 1900 sample previously available (the 1-in-750 "Preston" sample) will no longer be available via the IPUMS extract system. Users wishing to access this data can still download the entire dataset and SPSS command file via the IPUMS raw data download page.
October 11, 2002. Reposted the preliminary version of the 1900 Minnesota sample. The previous version had incorrect values for children ever born (CHBORN). The new dataset contains corrected values. No other variables have been changed.
July 11, 2002. Final versions of the 1860 and 1870 samples released. The final 1-in-100 1860 IPUMS sample includes 54,094 households containing 273,947 free individuals and an additional 1,343 unoccupied dwellings. (A preliminary sample of the nation’s slave inhabitants is available separately at the 1860 Slave Schedule page.) The final 1-in-100 1870 IPUMS sample includes 79,023 households containing 383,308 individuals and an additional 1,447 unoccupied dwellings. Frequencies in the on-line documentation will be updated in the next few months. Both the 1860 and 1870 IPUMS samples are also available with oversamples of the black population. Sample weights for the flat and black oversamples have been adjusted to be representative of the total population.
The final 1860 and 1870 IPUMS samples now include occupation codes based on the U.S. Census Office’s 1880 classification system and detailed birthplace codes for individuals born in Germany. Several other changes have also been made, including a slightly modified urban/rural definition, minor changes in birthplace and occupation coding, and small changes in personal estate and real estate values. In addition, the final samples incorporate a few data additions and subtractions from the preliminary samples. For details of these changes and a listing of the new Germany detailed birthplace codes, click here.
May 7, 2002. Released a preliminary version of the 1900 Minnesota sample. This 1900 Minnesota sample is a 1-in-200 nationally representative sample of dwellings taken from the 1900 U.S. Census of Population. The final version is scheduled to be released in 2004 and will have a 1-in-100 sampling density. Frequencies for this sample will be added to the documentation in summer 2002. Currently both the 1900 Minnesota sample and the 1-in-760 1900 Preston sample are available. Ultimately the 1900 Minnesota sample will replace the 1900 Preston sample, although the Preston sample will remain available by request.
The fundamental difference between the two 1900 samples pertains to sample design. In the 1900 Preston sample, nonfamily individuals (boarders, lodgers, inmates, and military personnel) were sampled as individuals regardless of household size. In contrast, the 1900 Minnesota sample follows the general sample design used for the 1850-1880 and 1920 samples. For a discussion of issues relating to sample design, see Chapter 2 of the IPUMS documentation.
July 11, 2001 -- The IPUMS extract system upgrade was successfully installed on Wednesday, July 11, 2001. No changes were made to the IPUMS data. The new extract system will process user data requests faster than the previous system and will prevent small jobs from being continually sidetracked by large data requests in the queue. Since this upgrade affects only the behind-the-scenes data extraction system, users will notice little change in the request process itself. Re-registration is not required; previous jobs will be available for revision; and new jobs will begin numbering from the user's last completed job in the old system.
March 7, 2001. Released new preliminary (penultimate) versions of the 1860 and 1870 samples. Frequencies in the documentation will not be changed until release of final versions of these datasets, scheduled for summer 2002. Two versions of the 1860 and 1870 samples are now available:
- a flat 1-in-100 sample of all dwellings, and
- a black oversample containing a 1-in-50 sample of dwellings containing one or more blacks and a 1-in-100 sample of all other dwellings.
The sample weights in both the flat and black oversamples of the preliminary 1860 and 1870 PUMS have been adjusted to be representative of the total population. Although we believe that the new samples are near their final form (we expect only minor changes in the number of cases and the coding of a few variables between the current and final versions of the samples), users are advised that the current releases have a few known problems. In particular, the occupation ("OCC") variable in 1860/1870 is not coded. Users should rely on the occupation 1950 basis ("OCC1950") variable for studying occupation and labor force participation. In addition, detailed birthplace codes are not available for individuals born in Germany. Users may still use the birthplace variable (BPL), but no detail will be returned for German birthplaces.
Friday, August 18, 2000 -- The old IPUMS extract system was replaced by a new system incorporating enhanced features requested by users. One of the key features of the new system is the ability to modify and resubmit previous jobs. Data files from the two systems have been combined on a user-specific summary site. IPUMS data users previously registered in either extract system will not have to reregister to use the new extract system. Extract requests in the new system will begin numbering jobs from the highest numbered job in a user's personal extract summary.
January 22, 1999. Major error in the November 25 version of the 1860 and 1870 samples corrected. The 1860/70 samples had an error in SURSIM, which in turn created errors in all the family interrelationship variables (IMPMOM, IMPPOP, IMPSP) and in the variables constructed from them (NCHILD, NCHLT, FAMSIZE, ELDCH, and so on). The error could also have implications for missing data allocation; we recommend discarding any earlier versions of the 1860 and 1870 samples.
July 1, 1999. Released new versions of 1850, 1860, 1870, 1880, 1900, and 1910 samples, containing the following enhancements and corrections:
- New geographic variables (METDIST, MDSTATUS, MCIVDIV, INCPLACE, INCORP, URBAREA) were added to 1850, 1880, and 1910 samples.
- Minor fixes to OCC1950, IND1950, CITIZEN, LIT, COUNTY, SEA, GQTYPE, GQFUNDS, NATIVITY, VOTE, MARRINYR, NAMEFRST, and NAMELAST.
- Missing age allocation procedures fixed to allow age 0 to be allocated. Improved rules for spouse imputation (IMPSP).
- Added cases from Bradley county, TN to 1850 that had been inadvertently dropped from the 1850 sample. PERWT adjusted slightly.
November 25, 1998 -- PERWT, NUMHHTAK, and GQFUNDS fixed in the 1860 and 1870 samples.
November 6, 1998 -- Revised preliminary samples of the 1860 and 1870 census released. Two versions of both the 1860 and 1870 PUMS are now available: (1) a flat 1-in-200 sample of all dwellings, and (2) a black oversample containing a 1-in-100 sample of dwellings containing one or more blacks and a 1-in-200 sample of all other dwellings.
The sample weights in both the flat and black oversamples of the preliminary 1860 and 1870 PUMS have been adjusted to be representative of the total population.
August 20, 1998 -- Revised IPUMS-98 database released.
- AGE Allocations 1850-1920. There was an error in the missing data allocation procedure for AGE affecting all pre-1940 samples. Since age is used as a predictor in many other allocations, constructed variables, and universe checks, the frequencies for many variables in the earlier samples have changed slightly from the original iteration of IPUMS-98.
- Split YRSINUSA into two separate variables, YRSUSA1 and YRSUSA2, to enhance compatibility over time. YRSUSA1 (columns 145-146 in the raw data files) contains the unrecoded continuous measure of years in the U.S. from the 1900-1920 samples. YRSUSA2 recodes 1900-1920 and 1970-1990 into five intervals that are compatible across all samples. Users desiring greater detail on the original 1970-1990 intervals can refer to YRIMMIG, which retains all of the original detail recorded in the variable discussion. Documentation change: the universe for 1980 should have excluded foreign-born persons who were citizens at birth.
- OCCSCORE, SEI. In 1850-1870, laborers who were changed via logical edit to farm laborers (i.e., they lived on a farm) continued to receive the OCCSCORE and SEI for laborers. They will now receive the scores for farm laborers. The original 1900 sample incorrectly classified many domestics as "service workers, nec" in their original 1950 occupation classification. The IPUMS fixed the occupational code but neglected to assign the appropriate SEI and OCCSCORE for the new occupation. This has been rectified.
- RACE. In 1990, persons who indicated Hispanic origin were recoded out of "other race, nec" in the race variable into the category "Spanish write-in." Persons of Mexican origin were mistakenly excluded from this recode. This is now fixed.
- PERWT and HHWT in 1990. Previously, the IPUMS adjusted the 1990 weights so that the total weighted sample would yield the same population count as the published census returns. We removed this programming because users could not reverse the adjustment if they wished to, and because there seemed no reason to assert the accuracy of the 1990 count at this level of detail.
- CITYPOP, SIZEPL. In 1980, households in New York City received the code for "not identifiable" (codes 00000, 00) in the city population variables. New York can be identified, and we have changed the population codes accordingly.
- ANCESTR1 and ANCESTR2. An error in the 1990 PUMS documentation slipped into the IPUMS. Anyone with a code of 0324 (West German) should have been coded 0460 (Greek). This is now fixed.
- MBPL, FBPL. In 1970, recoded "U.S. possessions, n.s." to match the documentation (code 12091); it was incorrectly coded 13000 in the data.
- YRIMMIG documentation change: the universe for 1980 excludes foreign-born persons who were citizens at birth. Changed 969 code to 970; it refers to 1965-1970, not 1965-1969. Added 914, which refers to the period before 1915 in the 1970 sample. We also changed the data, recoding 969 to 970.
- EDUCREC and HIGRADE. In 1980, N/A (under age 3) and "no schooling" were combined. We have separated them.
- BPL. In 1850, some persons with a birthplace of Iowa should have been coded as being born in Indiana (a confusion over the interpretation of the abbreviation "IA"). We have added programming to separate these codes.
- CLASSWKR. Removed new workers (persons looking for work who have never obtained their first job) from the universe for 1940 and 1950 in order to increase compatibility. In 1990, reassigned unemployed persons who last worked more than five years ago to the N/A category. In all years, the relevant information is preserved in other variables (EMPSTAT and YRSLASTWK).
- IND1950. The original 1940 sample contained an undocumented industry category. We determined that this is the category for "miscellaneous machinery" (code 358). The IPUMS had coded this category to "office and store machines" (code 357); we have recoded it to 358. In addition, the IND (contemporary industry classification) appendix for 1940 did not document this category. It has been added to the documentation.
May 20, 1998 -- OCC, OCC1950, FARM. Fixed a significant error in occupation coding in the 1860 sample (which also affected 1870, though to a much lesser degree). The missing data allocation procedure changed most persons with a blank response (no occupation) to having an occupation. This greatly overstated female occupational responses in 1860, particularly for married women. Since FARM status is inferred from occupation, and many of the allocated cases were farmers, the 1860 and 1870 samples overstated the number of farms. Both the 1860 and 1870 samples have been reconstructed to rectify this problem.
March 24, 1998 -- Made a significant, if somewhat subtle, change to the way the extraction system works. Altered the extraction system to zero out any variables that were "stacked" in the same column location as a requested variable. Previously, if you selected a variable that was not available in every sample chosen for extraction, the system would include whatever other variable was located in those columns in the raw IPUMS data files. For example, if you selected 1880 along with more modern samples and requested the variable Migration Status, 5 Years, the system would include the alphabetic data from the 1880 variable Last Name in those same extract columns. This caused considerable confusion among users.
Early March, 1998 -- Changed weights in "small" and "tiny" samples to be representative of total population.
Early March, 1998 -- Created a new Flat 1990 sample.
February 17, 1998 -- Changed the weights in the 1860 and 1870 files to account for oversample of blacks.
January, 1998 -- IPUMS-98 is available. For prior revisions, see Changes from IPUMS-95 to IPUMS-98.