What's new in IPUMS-98?
IPUMS-98 is a comprehensive revision of the Integrated Public
Use Microdata Series. We have expanded the documentation threefold,
added new datasets and variables, and revised dozens of variables.
New Documentation
In addition to the User's Guide, IPUMS-98 has two new documentation
volumes designed to make reference to the original PUMS codebooks
unnecessary. The User's Guide Supplement (Vol. 2) contains geographic
tools such as maps, tables of historical changes in metropolitan
area composition, descriptions of data allocation procedures,
and detailed supplementary variable information. Volume 3, Counting
the Past, has all enumerator instructions, copies of enumeration
forms, capsule histories of the censuses, and descriptions of
PUMS sample creation procedures. Two further volumes detailing
how original PUMS variables were assigned IPUMS values are in
preparation. We will make on-line hypertext versions of the documentation
available soon.
New Datasets
-
New version of the 1920 sample - The new 1-in-200
1920 sample contains 66% more cases than the IPUMS-95 version. Major
revisions to the occupation and industry coding greatly improve the utility
of the sample. We consider the new 1920 sample to be of similar
quality to previously completed PUMS. The final version is scheduled
for completion in 1999.
-
1860 and 1870 samples - We have begun work on the creation of the 1860 and
1870 PUMS. A preliminary 1-in-500 version of each PUMS is now available. The
final 1-in-100 samples will be completed in 2001.
New Variables
-
Labor force status - constructed with a consistent
age universe for all years. Based on occupation before 1940 and
on employment status from 1940 to 1990.
-
Poverty status - uses 1990 poverty thresholds and criteria to calculate percent
of poverty level for all families, secondary families, and individuals (1950-1990).
-
City of residence five years ago (1980 and 1990) - based on county group
and PUMA 5 years ago.
-
Metro of residence five years ago (1980 and 1990) - based on county group
and PUMA 5 years ago.
-
Place of work: city - based on place of work county group and PUMA.
-
Place of work: metropolitan area - based on place of work county group and
PUMA.
-
Imputed mother, father, and spouse linking variables - constructed identically
for 1850, 1860, 1870, 1880, and 1910 to enable consistent comparisons with
1850, when relationship to head is not available.
-
Number of households sampled - in years with dwelling-level samples, identifies
the number of separate households from the dwelling that are included in the
data. Enables the analysis of multi-household dwellings.
Because of the additional variables, we were forced to revise
the layout of the raw data files. This will not affect users of
our automated data extraction system.
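As an illustration of the Poverty status variable listed above, the following minimal sketch expresses family income as a percent of a poverty threshold. It assumes income and threshold are in the same dollars and that the result is truncated to a whole number; the function name, the truncation rule, the handling of negative income, and the example figures are all hypothetical and are not taken from the IPUMS or Census procedures.

```python
def percent_of_poverty(family_income, threshold):
    """Return income as a whole-number percent of the poverty threshold.

    Assumption: both arguments are in the same dollars, and the
    result is truncated (not rounded) to an integer.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # Assumption: negative incomes (losses) are reported as 0 percent.
    return max(0, 100 * family_income // threshold)


# Hypothetical threshold and income; these are not Census figures.
example_threshold = 10_000
example_income = 15_500
print(percent_of_poverty(example_income, example_threshold))  # prints 155
```

A value of 100 means income exactly at the threshold; values below 100 indicate income below the poverty level.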
Revised Variables
-
Family interrelationship variables - significantly improved the variables
linking husbands to wives and parents to children (MOMLOC, POPLOC, SPLOC).
-
Metropolitan area - changed the classification to match 1990 FIPS codes,
with 1 extra digit for historical change.
-
City - fixed several errors and added more identifiable 1980 and 1990 cities.
-
Occupation, 1950 - much improved coding in the 1920 sample.
-
Industry - recoded 1910 to numeric from alphabetic format.
-
Industry, 1950 - inferred industry from occupation responses in 1850, 1860,
1870, and 1880; improved 1920 dramatically.
-
Weights - the various weights now reflect the actual total population counts
rather than theoretical sample densities.
-
Group quarters - reclassified some households to enhance comparability.
-
Group quarters type - significantly revised the classification to make categories
more consistent.
-
Relationship - significantly revised.
-
Birthplace - rationalized the coding scheme, changing some foreign-born codes.
-
Spanish surname - identified in all census years that contain surname.
-
Migration and place of work variables - renamed, added, and changed numerous
variables.
In addition, we developed more consistent classifications of missing, "not
applicable," and "not specified" categories. Wherever
it was possible to impose a consistent universe without losing
information, we did so.
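The Weights revision listed above distinguishes weights based on theoretical sample density from weights based on actual population counts. The sketch below illustrates that distinction only; the function names and all figures are hypothetical, and the source does not describe how the IPUMS weights are actually computed.

```python
def nominal_weight(one_in_n):
    """Weight implied by the theoretical density of a 1-in-N sample."""
    return float(one_in_n)


def population_weight(population_count, sample_cases):
    """Weight that makes the weighted cases sum to the population count."""
    if sample_cases <= 0:
        raise ValueError("sample_cases must be positive")
    return population_count / sample_cases


# Hypothetical figures: a nominal 1-in-200 sample that yielded
# 4,900 cases from a population of 1,000,000.
print(nominal_weight(200))                  # theoretical density
print(population_weight(1_000_000, 4_900))  # population-based
```

When a sample falls short of or exceeds its nominal density, only the population-based weight reproduces the total population count when summed over all cases.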
Data Quality Flags
We have added a number of data quality flags that were previously excluded,
created new flags corresponding to IPUMS allocation procedures,
and corrected errors in flag coding and record layout.
Missing Data Allocation
We have allocated missing and illegible cases for all substantive
variables from 1850 to 1920. Corresponding data quality flags
are constructed to indicate where the data have been changed.
In addition to the variables allocated in IPUMS-95, missing cases
have been allocated for county, urban status, city, city population,
metropolitan area, farm, ownership, mortgage, duration of marriage,
children ever born, children surviving, citizenship, year naturalized,
year immigrated, years in U.S., mother tongue, language, school,
literacy, employment status, occupation, industry, class of worker,
period unemployed, age in months, and month of birth.
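The pairing of allocated values with data quality flags described above can be sketched as follows. The allocation rule used here (copying the most recent reported value) is only a stand-in chosen to keep the example short; it is not the IPUMS allocation procedure, and the function name and flag coding (1 = allocated, 0 = as reported) are hypothetical.

```python
def allocate_missing(values):
    """Fill missing entries (None) and return (filled_values, flags).

    flags[i] is 1 where the value was allocated and 0 where it was
    left as reported. Stand-in rule: copy the most recent reported
    value. This is NOT the IPUMS procedure.
    """
    filled, flags = [], []
    donor = None
    for value in values:
        if value is not None:
            donor = value
            filled.append(value)
            flags.append(0)
        elif donor is not None:
            filled.append(donor)
            flags.append(1)
        else:
            # No donor seen yet: leave the case missing and unflagged.
            filled.append(None)
            flags.append(0)
    return filled, flags


print(allocate_missing([3, None, 5, None, None]))
# prints ([3, 3, 5, 5, 5], [0, 1, 0, 1, 1])
```

Keeping the flag alongside the value lets an analyst drop or separately examine allocated cases.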
Error Correction
We have made many corrections to the data since IPUMS-95 that are not
documented above. Some fixed errors in the original census samples; others
fixed errors introduced in the creation of IPUMS-95. We thank the
researchers who helped us identify a number of these data problems.