What's new in IPUMS-98?
IPUMS-98 is a comprehensive revision of the Integrated Public Use Microdata Series. We have expanded the documentation threefold, added new datasets and variables, and revised dozens of variables.
New Documentation
Alongside the User's Guide, IPUMS-98 has two new documentation volumes designed to make any reference to the original PUMS codebooks unnecessary. The User's Guide Supplement (Vol. 2) contains geographic tools such as maps, tables of historical changes in metropolitan area composition, descriptions of data allocation procedures, and detailed supplementary variable information. Volume 3, Counting the Past, has all enumerator instructions, copies of enumeration forms, capsule histories of the censuses, and descriptions of PUMS sample creation procedures. Two further volumes detailing how original PUMS variables were assigned IPUMS values are in preparation. We will make on-line hypertext versions of the documentation available soon.
New Datasets
- New version of the 1920 sample - The new 1-in-200 1920 sample contains 66% more cases than IPUMS-95. Major revisions to the occupation and industry coding greatly improve the utility of the sample. We consider the new 1920 sample to be of similar quality to previously completed PUMS. The final version is scheduled for completion in 1999.
- 1860 and 1870 samples - We have begun work on the creation of the 1860 and 1870 PUMS. A preliminary 1-in-500 version of each PUMS is now available. The final 1-in-100 samples will be completed in 2001.
New Variables
- Labor force status - constructed with a consistent age universe for all years. Based on occupation before 1940 and on employment status from 1940 to 1990.
- Poverty status - uses 1990 poverty thresholds and criteria to calculate percent of poverty level for all families, secondary families, and individuals (1950-1990).
- City of residence five years ago (1980 and 1990) - based on county group and PUMA five years ago.
- Metro of residence five years ago (1980 and 1990) - based on county group and PUMA five years ago.
- Place of work: city - based on place of work county group and PUMA.
- Place of work: metropolitan area - based on place of work county group and PUMA.
- Imputed mother, father, and spouse linking variables - constructed identically for 1850, 1860, 1870, 1880, and 1910 to enable consistent comparisons with 1850, for which relationship to head is not available.
- Number of households sampled - in years with dwelling-level samples, identifies the number of separate households from the dwelling that are included in the data. Enables the analysis of multi-household dwellings.
Because of the additional variables, we were forced to revise the layout of the raw data files. This will not affect users of our automated data extraction system.
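As an illustration of the poverty status variable described above, the following minimal sketch expresses family income as a percent of a poverty threshold. The threshold table, the function name, and the choice to key thresholds on family size alone are assumptions made for the example; the values are placeholders, not the official 1990 thresholds, and the sketch assumes income is already in dollars comparable with those thresholds.

```python
# Placeholder thresholds keyed by family size.
# These are NOT the official 1990 poverty thresholds; they are
# made-up round numbers used only to show the arithmetic.
EXAMPLE_THRESHOLDS = {1: 6000, 2: 8000, 3: 10000, 4: 12000}


def percent_of_poverty(family_income, family_size, thresholds=EXAMPLE_THRESHOLDS):
    """Return family income as a whole-number percent of the threshold.

    Hypothetical helper: a value of 100 means income equals the
    threshold, and values below 100 mean income is under it.
    """
    threshold = thresholds[family_size]
    # Integer arithmetic keeps the result a whole percent.
    return 100 * family_income // threshold


# A three-person family with 15,000 in income against a 10,000 threshold.
print(percent_of_poverty(15000, 3))  # 150
```

The actual variable may handle secondary families, unrelated individuals, and top or bottom codes differently; consult the variable description for the rules applied.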
Revised Variables
- Family interrelationship variables - significantly improved the variables linking husbands to wives and parents to children (MOMLOC, POPLOC, SPLOC).
- Metropolitan area - changed the classification to match 1990 FIPS codes, with 1 extra digit for historical change.
- City - fixed several errors and added more identifiable 1980 and 1990 cities.
- Occupation, 1950 - much improved coding in the 1920 sample.
- Industry - recoded 1910 to numeric from alphabetic format.
- Industry, 1950 - inferred industry from occupation responses in 1850, 1860, 1870, and 1880; improved 1920 dramatically.
- Weights - the various weights now reflect the actual total population counts rather than theoretical sample densities.
- Group quarters - reclassified some households to enhance comparability.
- Group quarters type - significantly revised the classification to make categories more consistent.
- Relationship - significantly revised.
- Birthplace - rationalized the coding scheme, changing some foreign-born codes.
- Spanish surname - identified in all census years that contain surname.
- Migration and place of work variables - renamed, added, and changed numerous variables.
In addition, we developed more consistent classifications of missing, "not applicable," and "not specified" categories. Wherever it was possible to impose a consistent universe without losing information, we did so.
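The family interrelationship variables listed above (MOMLOC, POPLOC, SPLOC) can be used to attach a mother's, father's, or spouse's characteristics to a person's record. The sketch below assumes each pointer holds the within-household person number of the linked individual and that 0 means no link is present. The PERNUM and AGE field names, the dictionary record layout, and the helper name are assumptions for illustration.

```python
def linked_person(household, person, pointer):
    """Return the record that `pointer` refers to, or None if there is no link.

    `household` is a list of person records (dicts) from one household.
    `pointer` is one of "MOMLOC", "POPLOC", or "SPLOC".
    Assumes the pointer stores the linked person's PERNUM, with 0 = no link.
    """
    target = person[pointer]
    if target == 0:
        return None
    by_pernum = {p["PERNUM"]: p for p in household}
    return by_pernum.get(target)


# A made-up household: two spouses and their child.
household = [
    {"PERNUM": 1, "AGE": 40, "SPLOC": 2, "MOMLOC": 0, "POPLOC": 0},
    {"PERNUM": 2, "AGE": 38, "SPLOC": 1, "MOMLOC": 0, "POPLOC": 0},
    {"PERNUM": 3, "AGE": 12, "SPLOC": 0, "MOMLOC": 2, "POPLOC": 1},
]

child = household[2]
mother = linked_person(household, child, "MOMLOC")
print(mother["AGE"])  # 38
```

Building the PERNUM lookup once per household, rather than once per call, would be the natural optimization when processing a full extract.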
Data Quality Flags
We have added a number of data quality flags previously excluded, made new flags corresponding to IPUMS allocation procedures, and corrected errors in flag coding and record layout.
Missing Data Allocation
We have allocated missing and illegible cases for all substantive variables from 1850 to 1920. Corresponding data quality flags are constructed to indicate where the data have been changed. In addition to the variables allocated in IPUMS-95, missing cases have been allocated for county, urban status, city, city population, metropolitan area, farm, ownership, mortgage, duration of marriage, children ever born, children surviving, citizenship, year naturalized, year immigrated, years in U.S., mother tongue, language, school, literacy, employment status, occupation, industry, class of worker, period unemployed, age in months, and month of birth.
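Because each allocated value has a corresponding data quality flag, an analysis can be restricted to values as originally reported. The sketch below is one way to do that, assuming a flag value of 0 means the value was not changed and any other value means it was allocated. The flag name OCC_FLAG, the flag coding, and the record layout are hypothetical and should be checked against the flag documentation.

```python
def reported_only(records, flag):
    """Keep records whose quality flag shows the value was not allocated.

    Assumes flag == 0 means "not changed"; any other code is treated
    as allocated. This coding is an assumption for illustration.
    """
    return [r for r in records if r[flag] == 0]


# Made-up records; OCC_FLAG is a hypothetical flag name.
records = [
    {"OCC": 100, "OCC_FLAG": 0},
    {"OCC": 250, "OCC_FLAG": 1},  # allocated value
    {"OCC": 310, "OCC_FLAG": 0},
]

kept = reported_only(records, "OCC_FLAG")
print(len(kept))  # 2
```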
Error Correction
We have made many corrections to the data since IPUMS-95 that are not documented above. Some were errors in the original census samples; others were introduced in the creation of IPUMS-95. We thank the researchers who helped us identify a number of these data problems.