1910
Sampling Procedures
The 1910 IPUMS samples draw on seven component
samples. This document describes the sampling strategy employed
in each of the component samples.
The 1910 1% sample consists of all cases from the 1-in-100
national sample and 1-in-20 cases from the 20% samples of Alaska,
Hawaii, and the American Indian schedules.
The 1910 1.4% sample with oversamples consists of all cases from all seven component
samples listed below.
Sample design for the 1-in-100
national sample (SAMP1910 = 1)
The sample was drawn systematically from each of the 1,784 microfilm
reels of the 1910 census, ordinarily at intervals of five pages.
On each selected census page, one line was randomly selected and
designated as the sample point. Any valid sample unit beginning
at the sample point or within four subsequent lines was included
in the sample, yielding a 1-in-100 sample with equal probabilities
of inclusion for all individuals and households. Valid sample units
are defined as follows:
- Dwellings: structures containing fewer than
31 residents, with or without multiple families.
- Households: census families with fewer than
31 members in dwellings containing 31+ residents.
- Related groups in large units: groups related
by blood or marriage in census families with 31+ members. Family
relationships are inferred from relationship to head information
and surnames.
- Individuals in large units: unrelated individuals
in census families with 31+ members.
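The window rule described above can be sketched in Python. This is an illustrative sketch only, not the software used to draw the sample: the function names are hypothetical, and the figure of 100 lines per page is taken from the description of the 1910 schedules in the 1-in-250 section of this document. The sketch ignores windows that run past the end of a page.

```python
import random
from fractions import Fraction

LINES_PER_PAGE = 100  # stated for the 1910 schedules elsewhere in this document
PAGE_INTERVAL = 5     # "ordinarily at intervals of five pages"
WINDOW_LINES = 5      # the sample point plus four subsequent lines


def draw_sample_point(rng: random.Random) -> int:
    """Pick one line (1-based) at random on a selected page (hypothetical helper)."""
    return rng.randint(1, LINES_PER_PAGE)


def unit_is_sampled(unit_start_line: int, sample_point: int) -> bool:
    """A valid sample unit is taken if it begins inside the five-line window."""
    return sample_point <= unit_start_line <= sample_point + WINDOW_LINES - 1


def inclusion_rate() -> Fraction:
    """Window lines divided by the lines covered by one page interval."""
    return Fraction(WINDOW_LINES, PAGE_INTERVAL * LINES_PER_PAGE)
```

Under these assumptions, five window lines out of every 500 lines give the 1-in-100 rate.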
The sample also includes 3,023 records that have a PERWT
of 0. These records are part of fragmentary households. Some members
of these households were located in sample windows and were originally
sampled as household fragments (see SAMPRULE);
these records received a non-zero PERWT. Other members of
these households were enumerated elsewhere on the original manuscripts
(i.e., not contiguous with the sample window). When possible, we located
these individuals, reunited them with the remainder of their household,
and assigned them a PERWT of 0. Adding these records makes it possible
to construct household-level variables that require information about
all members of a given household.
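A minimal sketch of how the zero-weight records behave in practice is shown here. The records and their values are made up for illustration, and the use of SERIAL as the household identifier is an assumption about the extract layout. Zero-weight records drop out of weighted person counts but still contribute to household-level measures such as household size.

```python
# Hypothetical person records; values are invented for illustration.
# SERIAL (assumed household identifier) groups persons into households.
persons = [
    {"SERIAL": 1, "PERWT": 100, "AGE": 40},
    {"SERIAL": 1, "PERWT": 100, "AGE": 38},
    {"SERIAL": 1, "PERWT": 0, "AGE": 12},  # reunited member enumerated elsewhere
]


def weighted_person_count(records):
    """Sum of person weights; PERWT = 0 records add nothing."""
    return sum(r["PERWT"] for r in records)


def household_sizes(records):
    """Count every member, including PERWT = 0 records, per household."""
    sizes = {}
    for r in records:
        sizes[r["SERIAL"]] = sizes.get(r["SERIAL"], 0) + 1
    return sizes
```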
Sample design for the 1-in-5 samples for Alaska,
Hawaii, and American Indians (SAMP1910 = 2, 3, and 4)
The sample design followed the same rules as the 1-in-100 national
sample described above, with the following exceptions:
The schedules for Alaska and Hawaii have 50 lines per page number,
numbered 1-25 on the A side and 26-50 on the B side. Five-line sample
windows were randomly generated for every side of the census page.
The schedules used to enumerate the American Indian population
had 40 lines per page. Eight-line sample windows were randomly generated
for every side of the census page.
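The sampling fractions implied by these window sizes can be checked with a short calculation. This is a sketch under stated assumptions: the Alaska and Hawaii figure follows directly from a five-line window on each 25-line side, while the American Indian figure assumes that each sampled side holds 40 lines, which is one reading of the description above.

```python
from fractions import Fraction

# Alaska and Hawaii: a five-line window on each 25-line side.
alaska_hawaii_rate = Fraction(5, 25)

# American Indian schedules: an eight-line window per side.
# ASSUMPTION: each sampled side holds 40 lines.
american_indian_rate = Fraction(8, 40)

# The 1% sample keeps 1-in-20 cases from these 1-in-5 samples.
subsample_rate = Fraction(1, 20)
rate_in_one_percent_sample = alaska_hawaii_rate * subsample_rate
```

Both window designs work out to 1-in-5 under these assumptions, and the 1-in-20 subsample of a 1-in-5 sample yields the 1-in-100 rate of the national sample.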
Sample design for the 1-in-250 national sample
(SAMP1910 = 5)
The 1-in-250 national sample was created at the University of Pennsylvania
in the 1980s. The sample comprised two separately drawn 1-in-500
samples. The first was funded by the National Institute of Child Health
and Human Development, and the second was funded by the National Science
Foundation. Households that appeared in the first 1-in-500 sample
were eligible for inclusion in the second.
For each of the 1-in-500 samples, one sample point was chosen for
every 5 pages. Since there were 100 lines per page, a sample point
was chosen for every 500 lines. If the examination line was blank
or if it was impossible to read, data entry operators rejected the
point, after recording it as blank on their entry file, and moved
on to the next examination line.
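The arithmetic behind the 1-in-500 rate, and the handling of rejected examination lines, can be sketched as follows. The helper name and the `is_readable` callback are hypothetical; the original data entry software is not described at this level of detail.

```python
from fractions import Fraction

LINES_PER_PAGE = 100
PAGES_PER_POINT = 5
lines_per_point = LINES_PER_PAGE * PAGES_PER_POINT  # one point per 500 lines

single_rate = Fraction(1, lines_per_point)
# Two separately drawn 1-in-500 samples, with overlap allowed,
# give an expected rate of 2-in-500.
combined_rate = 2 * single_rate


def screen_points(points, is_readable):
    """Split examination lines into accepted ones and ones recorded as blank."""
    accepted, rejected = [], []
    for line in points:
        (accepted if is_readable(line) else rejected).append(line)
    return accepted, rejected
```

For example, `screen_points([3, 503, 1003], lambda line: line != 503)` keeps lines 3 and 1003 and records line 503 as rejected.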
The 1910 1-in-250 national sample is a household sample. The creators'
definition of a household differed from that of the census family,
the Census Bureau's equivalent unit in 1910. A census family was
"a group of persons living together in the same dwelling place."
According to the instructions to the census enumerators, the people
constituting a census family need not be related by ties of kinship
if they lived together forming one household. Thus, all occupants
and employees of a hotel, boarding house, or lodging house were
considered to be a census family if that was their usual place of
abode; the same was true for officials and inmates of an institution
if they slept in the same building or group of buildings. A census
family could also be a single individual if the person lived alone.
Except where there was no head of household or head
equivalent (e.g., superintendent of an orphanage), or where there
were more than 20 individuals unrelated to the head, the census
family and the 1910 PUS household are the same. Where there was
no head or head equivalent, each of the individuals in the group
was treated as a single-person household. Similarly, a census family
that included more than 20 non-family members was viewed as a collection
of several households, one of which is the group of individuals
related to the head, and the others of which are individual non-family
members. Non-family members were identified through their relation-to-head-of-household
code. Thus, for example, a boarder in a large family-run boarding
house was considered a single-person household, whereas the original
census regarded that individual as a member of the large census
family made up of people related to the proprietor and all the boarders.
This distinction helped to avoid including in the sample all of
the individuals in an especially large group quarters or institution,
some of which housed many hundreds of people.
Census families with 20 or fewer non-family members and a head or
head equivalent were included in the PUS sample as a household only
if the head or head equivalent (in most cases, the first person
enumerated in the census family) appeared on the examination line.
When that happened the entire census family, including all non-family
members (e.g., companions, boarders, servants, hired hands), was
entered into the sample and became a PUS household. If any other
member of this type of census family appeared on the examination
line, the census family was rejected and the data entry operators
proceeded to the next examination line. To illustrate, if the individual
on the examination line were the son of the head in a six-member
census family, the census family was passed over. If, however, the
head was on the examination line, the data entry operator entered
all information for the head and the five other members of the census
family, located on the lines immediately below the head.
A census family with 21 or more non-family members and a head or
a head equivalent was regarded as a collection of smaller households.
One of the households contained all family members related to the
head. This household was entered into the PUS sample if the head
appeared on the examination line; non-family members were excluded.
If a relative of the head appeared on the examination line, the
household was excluded from the sample.
Each of the non-family members in such a census family was considered
a separate, one-person household and was included in the sample if
he or she appeared on the examination line. Only that person was
taken, even if he or she had relatives within the census family.
For example, if the examination line contained a male boarder with
a wife and two children, only the male boarder was included in the
sample; if it contained one of the boarder's children, only the
child was taken.
When there was no head or head equivalent present in a census family,
all individuals in the unit were considered one-person households
to be sampled individually, irrespective of the size of the group.
This situation was characteristic of trailing household fragments,
parts of census families that were initially missed by enumerators
and later entered on enumerator forms, usually at the end of the
enumeration district. Typically, this situation occurred when lodgers
or boarders who were absent from the census family on the first
canvass were enumerated on a subsequent canvass of the district.
They were not integrated into the body of the household by the census
enumerators but were listed separately. The PUS treated individuals
in these trailing household fragments as one-person households,
in part because there was often too little information to link these
individuals back to the rest of the household.
A decision on whether an examination line was a legitimate sample
point was made as the data entry operators answered questions flashed
on their computer monitors about the size and characteristics of
the relevant household. The logic employed can be summarized in
the following way:
I. Is there a HEAD or HEAD EQUIVALENT present?
   A. If no, enter the individual.
   B. If yes, does this dwelling contain 21 or more persons unrelated
      to the head?
      1. If no, is this person a head of a census family or head equivalent?
         a. If yes, enter the entire census family.
         b. If no, proceed to the next examination point.
      2. If yes, is this person a head or head equivalent?
         a. If yes, enter the individual and his or her related
            family members.
         b. If no, is this person a relative of the head?
            1. If yes, proceed to the next examination point.
            2. If no, enter the individual as a one-person household.
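The decision logic above can be expressed as a small function. This is a sketch for illustration, not the original data entry program: the function name, argument names, and `Action` labels are hypothetical, and the four inputs stand in for the answers operators gave to the on-screen questions.

```python
from enum import Enum


class Action(Enum):
    ENTER_INDIVIDUAL = "enter the individual as a one-person household"
    ENTER_CENSUS_FAMILY = "enter the entire census family"
    ENTER_HEAD_AND_RELATIVES = "enter the head and related family members"
    NEXT_POINT = "proceed to the next examination point"


def classify_examination_line(has_head, unrelated_count, is_head, is_relative_of_head):
    """Apply the outline: I.A, then I.B.1 (a, b), then I.B.2 (a, b.1, b.2)."""
    if not has_head:                      # I.A: no head or head equivalent
        return Action.ENTER_INDIVIDUAL
    if unrelated_count < 21:              # I.B.1: 20 or fewer unrelated persons
        return Action.ENTER_CENSUS_FAMILY if is_head else Action.NEXT_POINT
    if is_head:                           # I.B.2.a
        return Action.ENTER_HEAD_AND_RELATIVES
    if is_relative_of_head:               # I.B.2.b.1
        return Action.NEXT_POINT
    return Action.ENTER_INDIVIDUAL        # I.B.2.b.2


# The son of the head in a six-member census family is passed over.
son = classify_examination_line(True, 0, False, True)
# A boarder in a large boarding house is taken as a one-person household.
boarder = classify_examination_line(True, 40, False, False)
```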
A more complete description of the creation of the 1910 1-in-250
national sample can be found in the dataset's original codebook.
Sample design for the Black oversample
(SAMP1910 = 6)
The 1910 Black oversample followed the same general sampling strategy
as the 1-in-250 national sample, except that households were only
included if the individual on the sample point was an African-American
household head. In order to reduce the costs of searching for African-American
heads of household in areas having a very small Black population,
the oversample was drawn only from counties where at least 10 percent
of the population was African American (negro, black or mulatto),
and only from states where a reasonably large number of counties
had this proportion of African Americans. In addition, states with
large concentrations of Blacks (Alabama, Georgia, South Carolina,
Mississippi) were not oversampled, because existing samples provided
ample cases for study. As a result, the Black oversample sampled
a total of 467 counties in 9 states.
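The eligibility rules described above can be sketched as two checks. This is a hedged illustration: the function names are hypothetical, and because the text does not quantify what counts as "a reasonably large number of counties", the state-level criterion is passed in as a precomputed flag rather than calculated.

```python
# States named above as not oversampled.
EXCLUDED_STATES = {"Alabama", "Georgia", "South Carolina", "Mississippi"}
MIN_COUNTY_SHARE = 0.10  # at least 10 percent of the county population
CENSUS_RACE_CATEGORIES = {"negro", "black", "mulatto"}  # 1910 census terms


def county_in_oversample(state, black_share, state_qualifies):
    """state_qualifies is an ASSUMED input for the unquantified state rule."""
    return (
        state not in EXCLUDED_STATES
        and state_qualifies
        and black_share >= MIN_COUNTY_SHARE
    )


def household_taken(is_head, enumerated_race, county_ok):
    """A household is taken only if the sample point is an eligible head."""
    return county_ok and is_head and enumerated_race.lower() in CENSUS_RACE_CATEGORIES
```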
More detailed information about the 1910 Black oversample is available via the Black Oversample User's Guide and via Gutmann and Ewbank's Historical Methods article on the topic.
Sample design for the Hispanic oversample
(SAMP1910 = 7)
The Hispanic oversample was created by a team headed by Myron
Gutmann and Steven Ruggles at the Universities of Texas and Minnesota
between 1995 and 1998. The overall sample was designed to study
those of Mexican, Spanish, and Cuban origin in the United States
in 1910, as well as a significant number of their non-Hispanic neighbors.
Samples were drawn from 57 counties in six states: Arizona, California,
Florida, Kansas, New Mexico, and Texas. In the two Florida counties,
the sample design selected half of the households headed by Hispanics;
in all the other counties, the sample design produced a twenty percent
sample of households headed by Hispanics in each county. The overall
sample includes roughly ten percent of the 845,000 Hispanics living
in the United States in 1910 (Gutmann, Frisbie, and Blanchard 1999).
In addition, one household not headed by a Hispanic was selected
for every two Hispanic-headed households in the sample counties,
except where the non-Hispanic population was so small as to render
this selection impossible.
Defining an individual as Hispanic is one of the complexities involved
in the creation of the Hispanic oversample. For the purposes of
sample creation, heads of household were defined as Hispanic if
they were born in Mexico, Cuba, or Spain, if one of their parents
was born in one of those countries, if they spoke Spanish, if they
had a name that was judged to be Hispanic by the data entry operator,
or if the census enumerator in 1910 enumerated their race as "Mexican."
The HISPRUL2 variable indicates why each household was selected
for inclusion in the Hispanic oversample.
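The definition of a Hispanic head of household, together with the county sampling rates given earlier in this section, can be sketched as follows. The function and argument names are hypothetical and do not correspond to IPUMS variable names or to HISPRUL2 codes; the name-based criterion is represented as a flag because it rested on the data entry operator's judgment.

```python
from fractions import Fraction

HISPANIC_BIRTHPLACES = {"Mexico", "Cuba", "Spain"}


def head_is_hispanic(birthplace, father_birthplace, mother_birthplace,
                     speaks_spanish, name_judged_hispanic, enumerated_race):
    """Any one of the listed criteria is sufficient."""
    return (
        birthplace in HISPANIC_BIRTHPLACES
        or father_birthplace in HISPANIC_BIRTHPLACES
        or mother_birthplace in HISPANIC_BIRTHPLACES
        or speaks_spanish
        or name_judged_hispanic
        or enumerated_race == "Mexican"
    )


def hispanic_sampling_rate(state):
    """Half of Hispanic-headed households in Florida, one fifth elsewhere."""
    return Fraction(1, 2) if state == "Florida" else Fraction(1, 5)
```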
More detailed information about the 1910 Hispanic oversample is available via the Hispanic Oversample User's Guide and via Gutmann and Ewbank's Historical Methods article on the topic.