1910 Sampling Procedures


The 1910 IPUMS samples draw from seven component samples. This document describes the sampling strategy employed in each of the component samples.

The 1910 1% sample consists of all cases from the 1-in-100 national sample and 1-in-20 cases from the 20% samples of Alaska, Hawaii, and the American Indian schedules.
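The fractions for Alaska, Hawaii, and the American Indian schedules combine as a simple product. A minimal arithmetic check, with illustrative variable names that are not IPUMS identifiers:

```python
from fractions import Fraction

# Alaska, Hawaii, and the American Indian schedules were sampled at 1-in-5 (20%).
component_fraction = Fraction(1, 5)
# The 1% sample keeps 1-in-20 cases from each of those 20% samples.
subsample_fraction = Fraction(1, 20)

overall_fraction = component_fraction * subsample_fraction
print(overall_fraction)  # 1/100
```

The product matches the 1-in-100 density of the national component, so the 1% sample has a uniform nominal sampling fraction.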

The 1910 1.4% sample with oversamples consists of all cases from all seven component samples listed below.

1-in-100 national sample
1-in-5 Alaskan sample
1-in-5 Hawaiian sample
1-in-5 sample of the American Indian schedules
1-in-250 national sample
Black oversample of selected counties
Hispanic oversample of selected counties

Sample design for the 1-in-100 national sample (SAMP1910 = 1)

The sample was drawn systematically from each of the 1,784 microfilm reels of the 1910 census, ordinarily at intervals of five pages. On each selected census page, one line was randomly selected and designated as the sample point. Any valid sample unit beginning at the sample point or within the four subsequent lines was included in the sample. If a sample unit (household or dwelling) ends before the fourth line after the sample point, the sample units that follow it, up to that fourth subsequent line, are also included; this corrects for the undersampling of small households.

For example, if the randomly selected sample point is the second person in a 4-person sample unit and a 2-person sample unit begins on the fourth subsequent line, the 2-person sample unit is also included in the sample. This yields a 1-in-100 sample with equal probabilities of inclusion for all individuals and households. Valid sample units are defined as follows:

Dwellings: structures containing fewer than 31 residents, with or without multiple families.
Households: census families with fewer than 31 members in dwellings containing 31+ residents.
Related groups in large units: groups related by blood or marriage in census families with 31+ members. Family relationships are inferred from relationship to head information and surnames.
Individuals in large units: unrelated individuals in census families with 31+ members.
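The window rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's actual software: it applies the rule as worded above (a unit is taken when its first line falls on the sample point or within the four subsequent lines), the function and variable names are hypothetical, and the probability check assumes 100 lines per page, a figure this document gives only for the 1-in-250 sample.

```python
from fractions import Fraction


def select_units(unit_start_lines, sample_point, extra_lines=4):
    """Return the start lines of units that begin inside the sample window."""
    window_end = sample_point + extra_lines
    return [s for s in unit_start_lines if sample_point <= s <= window_end]


# Hypothetical page: sample units begin on lines 1, 7, 10, 12, and 20.
starts = [1, 7, 10, 12, 20]
selected = select_units(starts, sample_point=8)
print(selected)  # [10, 12]

# Inclusion probability for a unit starting mid-page (assumed 100 lines per page).
LINES_PER_PAGE = 100  # assumption, see lead-in
PAGE_INTERVAL = 5     # one page in five is selected
unit_start = 50
covering_points = sum(
    1 for p in range(1, LINES_PER_PAGE + 1) if p <= unit_start <= p + 4
)
inclusion = Fraction(covering_points, LINES_PER_PAGE) / PAGE_INTERVAL
print(inclusion)  # 1/100
```

Because selection depends only on where a unit begins, a unit's chance of inclusion does not grow with its size. Units starting on the first few lines of a page are covered by fewer sample points in this sketch; how page boundaries were handled is not described here, so the check ignores them.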

The sample also includes 3,023 records that have a PERWT of 0. These records are part of fragmentary households. Some members of these households were located in sample windows and were originally sampled as household fragments (see SAMPRULE). These records received a non-zero PERWT. However, other members of these households were enumerated elsewhere on the original manuscripts (i.e., not contiguous with the sample window). When possible, we located these individuals, reunited them with the remainder of their household, and assigned them a PERWT of 0. Adding these records makes it possible to construct household-level variables that require information about all members of a given household.
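A small sketch of why the zero-weight records matter. The records and weight values below are hypothetical, and only PERWT is an actual variable name: person-weighted totals ignore the reunited members, while household-level counts include them.

```python
# Hypothetical records: (household id, person number, PERWT).
records = [
    ("H1", 1, 100),  # sampled inside the window
    ("H1", 2, 100),  # sampled inside the window
    ("H1", 3, 0),    # enumerated elsewhere, reunited with the household
    ("H2", 1, 100),
]

# Person-level estimate: zero-weight records contribute nothing.
weighted_population = sum(perwt for _, _, perwt in records)

# Household-level variable: every member counts, whatever the weight.
household_size = {}
for household_id, _, _ in records:
    household_size[household_id] = household_size.get(household_id, 0) + 1

print(weighted_population)  # 300
print(household_size)       # {'H1': 3, 'H2': 1}
```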

Sample design for the 1-in-5 samples for Alaska, Hawaii, and American Indians (SAMP1910 = 2, 3, and 4)

The sample design followed the same rules as the 1-in-100 national sample described above, with the following exceptions:

The schedules for Alaska and Hawaii have 50 lines per page number, numbered 1-25 on the A side and 26-50 on the B side. Five-line sample windows were randomly generated for every side of the census page.

The schedules used to enumerate the American Indian population had 40 lines per page. Eight-line sample windows were randomly generated for every side of the census page.
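The window sizes can be checked against the 1-in-5 target with a short calculation. For Alaska and Hawaii this follows directly from the 25 lines per side. For the American Indian schedules it treats the stated 40 lines as the span from which each eight-line window is drawn, which is an assumption, since the text does not say how those lines divide between sides. The function name is hypothetical.

```python
from fractions import Fraction


def window_fraction(window_lines, lines_covered):
    """Share of lines that fall inside one sample window."""
    return Fraction(window_lines, lines_covered)


# Alaska and Hawaii: five-line window on each 25-line side.
alaska_hawaii = window_fraction(5, 25)
# American Indian schedules: eight-line window over 40 lines (assumed span).
american_indian = window_fraction(8, 40)

print(alaska_hawaii, american_indian)  # 1/5 1/5
```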

Sample design for the 1-in-250 national sample (SAMP1910 = 5)

The 1-in-250 national sample was created at the University of Pennsylvania in the 1980s. The sample comprised two separately drawn 1-in-500 samples. The first was funded by the National Institute of Child Health and Human Development, and the second was funded by the National Science Foundation. Households that appeared in the first 1-in-500 sample were eligible for inclusion in the second.

For each of the 1-in-500 samples, one sample point was chosen for every 5 pages. Since there were 100 lines per page, a sample point was chosen for every 500 lines. If the examination line was blank or if it was impossible to read, data entry operators rejected the point, after recording it as blank on their entry file, and moved on to the next examination line.
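The arithmetic behind the sampling interval, as a brief sketch with illustrative constant names. Because households in the first draw remained eligible for the second, the two nominal rates simply add.

```python
from fractions import Fraction

LINES_PER_PAGE = 100
PAGES_PER_SAMPLE_POINT = 5

# One examination line for every 5 pages of 100 lines each.
lines_per_sample_point = LINES_PER_PAGE * PAGES_PER_SAMPLE_POINT
single_sample_rate = Fraction(1, lines_per_sample_point)

# Two separately drawn 1-in-500 samples give a nominal 1-in-250 rate.
combined_rate = 2 * single_sample_rate

print(lines_per_sample_point)  # 500
print(combined_rate)           # 1/250
```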

The 1910 1-in-250 national sample is a household sample. The creators' definition of a household differed from that of the census family, the Census Bureau's equivalent unit in 1910. A census family was "a group of persons living together in the same dwelling place." According to the instructions to the census enumerators, the people constituting a census family need not be related by ties of kinship if they lived together forming one household. Thus, all occupants and employees of a hotel, boarding house, or lodging house were considered to be a census family if that was their usual place of abode; the same was true for officials and inmates of an institution if they slept in the same building or group of buildings. A census family could also be a single individual if the person lived alone.

Except in situations where there was no head of household, head equivalent (e.g., superintendent of an orphanage), or where there were more than 20 individuals unrelated to the head, the census family and the 1910 PUS household are the same. Where there was no head or head equivalent, each of the individuals in the group was treated as a single-person household. Similarly, a census family that included more than 20 non-family members was viewed as a collection of several households, one of which is the group of individuals related to the head, and the others of which are individual non-family members. Non-family members were identified through their relation-to-head-of-household code. Thus, for example, a boarder in a large family-run boarding house was considered a single-person household, whereas the original census regarded that individual as a member of the large census family made up of people related to the proprietor and all the boarders. This distinction helped to avoid including in the sample all of the individuals in an especially large group quarters or institution, some of which housed many hundreds of people.

Census families with 20 or fewer non-family members and a head or head equivalent were included in the PUS sample as a household only if the head or head equivalent (in most cases, the first person enumerated in the census family) appeared on the examination line. When that happened, the entire census family, including all non-family members (e.g., companions, boarders, servants, hired hands), was entered into the sample and became a PUS household. If any other member of this type of census family appeared on the examination line, the census family was rejected and the data entry operators proceeded to the next examination line. To illustrate, if the individual on the examination line were the son of the head in a six-member census family, the census family was passed over. If, however, the head was on the examination line, the data entry operator entered all information for the head and the five other members of the census family, located on the lines immediately below the head.

A census family with 21 or more non-family members and a head or a head equivalent was regarded as a collection of smaller households. One of the households contained all family members related to the head. This household was entered into the PUS sample if the head appeared on the examination line; non-family members were excluded. If a relative of the head appeared on the examination line, the household was excluded from the sample.

Each of the non-family members in such a census family was considered a separate, one-person household and was included in the sample if he or she appeared on the examination line. Only that person was taken, even if he or she had relatives within the census family. For example, if the examination line contained a male boarder with a wife and two children, only the male boarder was included in the sample; if it contained one of the boarder's children, only the child was taken.

When there was no head or head equivalent present in a census family, all individuals in the unit were considered one-person households to be sampled individually, irrespective of the size of the group. This situation was characteristic of trailing household fragments, parts of census families that were initially missed by enumerators and later entered on enumerator forms, usually at the end of the enumeration district. Typically, this situation occurred when lodgers or boarders who were absent from the census family on the first canvass were enumerated on a subsequent canvass of the district. They were not integrated into the body of the household by the census enumerators but were listed separately. The PUS treated individuals in these trailing household fragments as one-person households, in part because there was often too little information to link these individuals back to the rest of the household.
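The household definition described in the preceding paragraphs can be sketched as a function that splits one census family into PUS households. This is an illustration only: the record layout, the relation labels ("head" standing for head or head equivalent, "relative", "nonfamily"), and the function name are hypothetical, not taken from the original data entry software.

```python
def pus_households(members, max_nonfamily=20):
    """Split one census family into PUS households (lists of member names).

    members: list of dicts with 'name' and 'relation', where relation is
    'head' (head or head equivalent), 'relative', or 'nonfamily'.
    """
    has_head = any(m["relation"] == "head" for m in members)
    if not has_head:
        # No head or head equivalent: every individual is a one-person household.
        return [[m["name"]] for m in members]

    nonfamily = [m for m in members if m["relation"] == "nonfamily"]
    if len(nonfamily) <= max_nonfamily:
        # Census family and PUS household coincide.
        return [[m["name"] for m in members]]

    # More than 20 non-family members: the head's kin form one household,
    # and each non-family member is a household of one.
    family = [m["name"] for m in members if m["relation"] != "nonfamily"]
    return [family] + [[m["name"]] for m in nonfamily]


# Hypothetical boarding house: a head, a wife, and 21 boarders.
boarding_house = (
    [{"name": "head", "relation": "head"}, {"name": "wife", "relation": "relative"}]
    + [{"name": f"boarder{i}", "relation": "nonfamily"} for i in range(21)]
)
households = pus_households(boarding_house)
print(len(households))  # 22
```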

A decision on whether an examination line was a legitimate sample point was made as the data entry operators answered questions flashed on their computer monitors about the size and characteristics of the relevant household. The logic employed can be summarized in the following way:

Is there a HEAD or HEAD EQUIVALENT present?
1. If no, enter the individual
2. If yes, does this dwelling contain 21 or more persons unrelated to the head?
  1. If no, is this person a head of a census family or head equivalent?
    1. If yes, enter the entire census family
    2. If no, proceed to next examination point.
  2. If yes, is this person a head or head equivalent?
    1. If yes, enter the individual and his or her related family members.
    2. If no, is this person a relative of the head?
      1. If yes, proceed to the next examination point.
      2. If no, enter the individual as a one-person household.
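The decision logic above translates directly into a small function. This is a sketch of the summarized rules, not the original data entry program; the function name, argument names, and returned strings are hypothetical.

```python
def examination_decision(has_head, unrelated_count, person_role):
    """Decide what to do with the person found on an examination line.

    has_head: True if a head or head equivalent is present.
    unrelated_count: number of persons unrelated to the head.
    person_role: 'head' (head or head equivalent), 'relative', or 'nonfamily'.
    """
    if not has_head:
        return "enter individual"
    if unrelated_count < 21:
        if person_role == "head":
            return "enter entire census family"
        return "next examination point"
    # 21 or more persons unrelated to the head.
    if person_role == "head":
        return "enter head and related family members"
    if person_role == "relative":
        return "next examination point"
    return "enter individual as one-person household"


# The son of the head in a six-member census family is passed over.
print(examination_decision(True, 0, "relative"))    # next examination point
# A boarder in a large boarding house is taken alone.
print(examination_decision(True, 30, "nonfamily"))  # enter individual as one-person household
```

Under these rules each household can enter the sample through exactly one person (its head, or the individual in a one-person household), so a household's chance of selection does not depend on its size.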

A more complete description of the creation of the 1910 1-in-250 national sample can be found in the dataset's original codebook.

Sample design for the Black oversample (SAMP1910 = 6)

The 1910 Black oversample followed the same general sampling strategy as the 1-in-250 national sample, except that households were only included if the individual on the sample point was an African-American household head. In order to reduce the costs of searching for African-American heads of household in areas having a very small Black population, the oversample was drawn only from counties where at least 10 percent of the population was African American (negro, black or mulatto), and only from states where a reasonably large number of counties had this proportion of African Americans. In addition, states with large concentrations of Blacks (Alabama, Georgia, South Carolina, Mississippi) were not oversampled, because existing samples provided ample cases for study. As a result, the Black oversample sampled a total of 467 counties in 9 states.
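The county selection criteria can be sketched as a filter. The text gives no numeric threshold for "a reasonably large number of counties", so the sketch takes that state-level judgment as an input flag; the function name, the flag, and the example values are hypothetical.

```python
# States named in the text as not oversampled.
EXCLUDED_STATES = {"Alabama", "Georgia", "South Carolina", "Mississippi"}


def county_eligible(state, black_share, state_qualifies, min_share=0.10):
    """Return True if a county would fall within the oversample's scope.

    black_share: African-American share of the county population (0 to 1).
    state_qualifies: whether the state had enough counties above the
    threshold (a judgment the text does not quantify).
    """
    if state in EXCLUDED_STATES:
        return False
    return state_qualifies and black_share >= min_share


print(county_eligible("State X", 0.25, True))   # True
print(county_eligible("State X", 0.05, True))   # False
print(county_eligible("Georgia", 0.40, True))   # False
```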

More detailed information about the 1910 Black oversample is available via the Black Oversample User's Guide and via Gutmann and Ewbank's Historical Methods article on the topic.

Sample design for the Hispanic oversample (SAMP1910 = 7)

The Hispanic Oversample was created by a team headed by Myron Gutmann and Steven Ruggles at the Universities of Texas and Minnesota between 1995 and 1998. The overall sample was designed to study those of Mexican, Spanish, and Cuban origin in the United States in 1910, as well as a significant number of their non-Hispanic neighbors.

Samples were drawn from 57 counties in six states: Arizona, California, Florida, Kansas, New Mexico, and Texas. In the two Florida counties, the sample design selected half of the households headed by Hispanics; in all the other counties, the sample design produced a twenty percent sample of households headed by Hispanics in each county. The overall sample includes roughly ten percent of the 845,000 Hispanics living in the United States in 1910 (Gutmann, Frisbie, and Blanchard 1999). In addition, one household not headed by a Hispanic was selected for every two Hispanic-headed households in the sample counties, except where the non-Hispanic population was so small as to render this selection impossible.
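The sampling rates imply expected household counts per county, which the sketch below works out. It assumes the one-for-two ratio applies to the sampled Hispanic-headed households and does not model the exception for counties with very few non-Hispanic residents; the function name and the county size are hypothetical.

```python
from fractions import Fraction


def expected_households(hispanic_headed, florida_county):
    """Expected sampled households for one county.

    Returns (Hispanic-headed households sampled, non-Hispanic households sampled).
    """
    rate = Fraction(1, 2) if florida_county else Fraction(1, 5)
    hispanic_sampled = hispanic_headed * rate
    # One non-Hispanic-headed household for every two Hispanic-headed ones.
    non_hispanic_sampled = hispanic_sampled / 2
    return hispanic_sampled, non_hispanic_sampled


# Hypothetical county with 1,000 Hispanic-headed households.
print(expected_households(1000, florida_county=False))
print(expected_households(1000, florida_county=True))
```

For the hypothetical county this gives 200 Hispanic-headed and 100 non-Hispanic households at the twenty percent rate, and 500 and 250 at the Florida rate.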

Defining an individual as Hispanic is one of the complexities involved in the creation of the Hispanic oversample. For the purposes of sample creation, heads of household were defined as Hispanic if they were born in Mexico, Cuba, or Spain, if one of their parents was born in one of those countries, if they spoke Spanish, if they had a name that was judged to be Hispanic by the data entry operator, or if the census enumerator in 1910 enumerated their race as "Mexican." The HISPRUL2 variable indicates why each household was selected for inclusion in the Hispanic oversample.
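The selection criteria can be expressed as a rule that returns the first matching reason. This is a sketch only: the field names, the order in which criteria are checked, and the reason labels are assumptions, and the labels are not the actual HISPRUL2 codes.

```python
HISPANIC_BIRTHPLACES = {"Mexico", "Cuba", "Spain"}


def hispanic_reason(head):
    """Return why a household head counts as Hispanic, or None if no rule applies.

    head: dict with optional keys 'birthplace', 'father_birthplace',
    'mother_birthplace', 'speaks_spanish', 'hispanic_name', and 'race'.
    """
    if head.get("birthplace") in HISPANIC_BIRTHPLACES:
        return "birthplace"
    parents = {head.get("father_birthplace"), head.get("mother_birthplace")}
    if parents & HISPANIC_BIRTHPLACES:
        return "parent birthplace"
    if head.get("speaks_spanish"):
        return "language"
    if head.get("hispanic_name"):
        # Judged by the data entry operator.
        return "name"
    if head.get("race") == "Mexican":
        return "enumerated race"
    return None


print(hispanic_reason({"birthplace": "Texas", "mother_birthplace": "Mexico"}))  # parent birthplace
print(hispanic_reason({"birthplace": "Ohio"}))                                  # None
```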

More detailed information about the 1910 Hispanic oversample is available via the Hispanic Oversample User's Guide and via Gutmann and Ewbank's Historical Methods article on the topic.