1880 Sampling Procedures¹
All individuals in the 1880 census were assigned to a "family," a term that the census defined more broadly than it does today. A family was an individual or group of individuals who jointly occupied a dwelling place or part of a dwelling place. Census instructions defined dwelling places as any occupied structure; they included both wigwams and tenement houses. Within dwelling places, the number of separate families was generally determined by the number of separate eating tables. However, there were exceptions to this criterion. All the permanent occupants of hotels, institutions, and military barracks constituted single families, provided they slept in the same building. Census enumerators likewise counted boarders, lodgers, and servants as part of the family occupying the dwelling place where they slept, regardless of their eating arrangements.
The analytic power of the public use microdata samples derives in large measure from their hierarchical organization: they are simultaneously samples of households (or families) and of individuals, and within households the relationships among individuals are known. This complex structure allows the creation of an almost limitless number of variables. The public use microdata samples for the censuses of 1940 through 1980 are samples of households, and those for 1900 and 1910 are samples of "families."
We have added another level of hierarchy by creating a sample of dwellings rather than a sample of families. There are several advantages to sampling at the level of dwellings. First, there is the matter of definitional differences between the late-nineteenth century family and the household and group quarters of the post-1940 period. Families were distinguished by a separate eating table; although the exact definition of a household has varied, households and group quarters generally have required either complete cooking facilities or a separate entrance. It is likely, therefore, that some households or group quarters under current census definitions would have been considered to be two or more families in the late nineteenth century. The nineteenth-century definition of dwelling, on the other hand, is clearly broader than the current definition of household or group quarters. By providing information at the level of both dwellings and families, we maximize the potential for consistent comparisons.
As well as maximizing comparability, sampling by dwellings provides additional information that would not otherwise be available. The sample indicates that some 20 percent of the total population resided in multifamily dwellings. The high frequency of such living arrangements makes them worthy of study in their own right. Even if analysis is carried out at the level of the family, it may be useful to incorporate some variables constructed from the characteristics of the dwelling as a whole. For example, analysis of surnames allows identification of kin who resided in the same dwelling but in different families, a pattern that seems to have been common in nineteenth-century cities.
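The surname analysis mentioned above might start with a screening step like the following. This is a minimal sketch that assumes a hypothetical record layout (`dwelling_id`, `family_id`, `surname`) rather than the variable names used in the actual file. It flags surnames that appear in more than one family within the same dwelling. Because married daughters and sisters usually carried a different surname, and common surnames can match by chance, the output is only a list of candidates for closer inspection.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass
class Person:
    dwelling_id: int  # hypothetical identifier, not an actual file variable
    family_id: int    # hypothetical identifier, not an actual file variable
    surname: str


def shared_surnames(people: Iterable[Person]) -> Dict[int, Dict[str, Set[int]]]:
    """Map dwelling -> {surname: families} for surnames found in 2+ families."""
    by_dwelling: Dict[int, Dict[str, Set[int]]] = defaultdict(lambda: defaultdict(set))
    for p in people:
        # Normalize case and whitespace so "Olsen" and "olsen " compare equal.
        by_dwelling[p.dwelling_id][p.surname.strip().upper()].add(p.family_id)
    result = {}
    for dwelling_id, surnames in by_dwelling.items():
        shared = {s: fams for s, fams in surnames.items() if len(fams) > 1}
        if shared:
            result[dwelling_id] = shared
    return result
```

For example, a dwelling in which an "Olsen" heads one family and another "Olsen" heads a second family would be returned, while a dwelling whose families share no surname would not.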
The chief liability of sampling by dwellings instead of families is that it reduces the number of independent observations in the file. Since census microdata files are cluster samples (ordinarily clustered by household), standard errors depend both on the number of clusters and on the homogeneity of variables within clusters. Calculation of standard errors for samples of this type is quite complicated (U.S. Bureau of the Census 1972; Kish 1965). In the worst case, with perfect homogeneity within clusters, the standard errors for variables would be inversely proportional to the square root of the number of clusters rather than the number of individuals. Even for variables that are not very homogeneous within clusters, such as age, there is some loss of precision when the total number of clusters is reduced. However, the increase in error is small.
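One common way to approximate this effect is Kish's design effect for clusters of equal size m: deff = 1 + (m - 1)ρ, where ρ is the intraclass correlation of the variable within clusters. The standard error of a clustered estimate is then roughly the simple-random-sample standard error multiplied by the square root of deff. With ρ = 1 (perfect homogeneity), deff = m, which reproduces the worst case described above: the standard error depends on the number of clusters, not the number of individuals. The sketch below assumes equal cluster sizes, which real dwellings do not have, so it is an illustration of the reasoning rather than the procedure used to compute standard errors for this sample; the input values are arbitrary.

```python
import math


def design_effect(mean_cluster_size: float, rho: float) -> float:
    """Kish's approximate design effect for equal-sized clusters."""
    return 1.0 + (mean_cluster_size - 1.0) * rho


def clustered_se(srs_se: float, mean_cluster_size: float, rho: float) -> float:
    """Inflate a simple-random-sample standard error by sqrt(deff)."""
    return srs_se * math.sqrt(design_effect(mean_cluster_size, rho))


# Illustrative values only: a proportion p estimated from n persons in
# clusters of 5, for a nearly heterogeneous variable (rho = 0.05).
p, n, m = 0.3, 1000, 5
srs = math.sqrt(p * (1 - p) / n)
print(clustered_se(srs, m, 0.05) / srs)  # ratio of clustered to SRS standard error
```

A small ρ gives a ratio only slightly above 1, which is consistent with the observation that the loss of precision for weakly clustered variables is small.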
The public use microdata samples for 1900 and 1910 substituted the modern census term "household" for the contemporary term "family." To avoid confusion, this documentation follows that precedent.
The manuscript census for 1880 consists of about 1.2 million enumeration pages, with 50 persons per page. These records are contained on 1,454 reels of microfilm. Each reel contains the census pages for several enumeration districts.
Our sampling strategy was based on the census page. Each pair of census pages contains 100 persons; since our sample density is 1 in 100, we needed an average of one person every two pages. We therefore randomly generated one sample point for every two pages. To ensure that dwellings had an equal probability of being included in the sample regardless of their size, they were entered only if the sample point fell on the line containing the first person in the dwelling. When the sample point fell on any other dwelling member, the dwelling was skipped. Accordingly, for example, if the sample point fell within a dwelling with 5 members, there was only a 1 in 5 chance that the dwelling would be included in the sample, but if it was included, all five members were entered. Under this procedure each dwelling, family, and individual in the population had a 1 in 100 probability of inclusion.
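The first-person rule can be checked with a small simulation. This is a minimal sketch under simplifying assumptions: every page holds exactly 50 lines, enumeration lines run continuously across pages, and a dwelling is represented only by its size. One sample line is drawn uniformly within each 100-line block (two pages), and a dwelling is accepted only when that line is its first listed member. Because every line, including each dwelling's first line, has a 1 in 100 chance of being the drawn line, large and small dwellings end up with the same inclusion probability.

```python
import random
from typing import List, Set

LINES_PER_PAGE = 50
PAGES_PER_POINT = 2
LINES_PER_POINT = LINES_PER_PAGE * PAGES_PER_POINT  # 100 lines -> 1-in-100 density


def draw_sample_points(total_lines: int, rng: random.Random) -> Set[int]:
    """Draw one random sample line within each consecutive two-page block."""
    points = set()
    for start in range(0, total_lines, LINES_PER_POINT):
        line = start + rng.randrange(LINES_PER_POINT)
        if line < total_lines:  # a partial final block keeps the 1-in-100 density
            points.add(line)
    return points


def sample_dwellings(dwelling_sizes: List[int], rng: random.Random) -> List[int]:
    """Return indices of dwellings whose first listed person is a sample point."""
    points = draw_sample_points(sum(dwelling_sizes), rng)
    selected = []
    first_line = 0
    for i, size in enumerate(dwelling_sizes):
        if first_line in points:
            selected.append(i)  # accept the whole dwelling
        # A point on any later member of the dwelling is simply skipped.
        first_line += size
    return selected
```

Running `sample_dwellings` many times over a mix of one-person and ten-person dwellings should select each size class at close to the same rate, 1 dwelling in 100.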
We modified this procedure for persons residing in institutions and large group quarters. The previous public use microdata samples incorporated a variety of sampling strategies for handling such cases. In general, members of large units have been sampled on an individual basis, simply by treating each member as if they lived in their own one-person household. This procedure increases the efficiency of the sample by raising the number of observations while at the same time maintaining representativeness.
Unfortunately, since the criteria for designating units to be sampled on an individual basis have varied, the samples are incompatible for some applications. In the 1980 Public Use Microdata Sample, all units with 9 or more members unrelated to the householder were classified as group quarters, and members of group quarters were sampled on an individual basis (U.S. Bureau of the Census 1982). For the public use microdata samples of the period 1940-1970, the procedure was similar, except that units with 5 or more secondary individuals or secondary family members were classified as group quarters and sampled individually (U.S. Bureau of the Census 1972, 1984a, 1984b). In the 1910 sample up to 20 members of a family could be unrelated to the head before the members were sampled at the individual level (Strong 1988). The higher threshold for individual level sampling in 1910 allows detailed study of the small boarding houses that were characteristic of the period; once again, however, there is a tradeoff between sampling error and the richness of the data. In the case of the 1900 data file, all boarders, lodgers, and the institutionalized were sampled as individuals or as secondary families, a strategy that maximizes precision at considerable cost in terms of lost information (Graham 1980). For example, the 1900 system makes it impossible to create a precise analog of the group quarters concept used in recent census years.
To ensure definitional comparability of the 1880 sample with all existing public use microdata samples, the number of persons allowed before sampling the unit at the individual level had to be at least as large as in the 1910 sample. We decided to expand the threshold to 30, which allows study of many boarding houses as intact units. The following set of inclusion rules assured compatibility with the sample designs of the previous public use microdata samples, while at the same time enriching the data. These rules result in equal probabilities of inclusion, regardless of dwelling size, family size, or the number of coresident relatives.
- If the dwelling contains 30 or fewer residents:
  - accept the entire dwelling if the sample point falls on the first listed individual in the dwelling.
  - reject the entire dwelling if the sample point falls on any other dwelling resident.
- If the dwelling contains 31 or more residents and the family contains 30 or fewer persons:
  - accept the entire family if the sample point falls on the family head. Also enter data on overall dwelling size and the number of families in the dwelling.
  - reject the entire family if the sample point falls on any other family member.
- If the dwelling contains 31 or more residents and the family contains 31 or more persons and the sample point falls within any group of related persons within the family (in 1960 census usage, within a primary or secondary family):
  - accept the group of related persons if the sample point falls on the first listed individual within the related group. Also enter data on overall dwelling size, family size, and the number of families in the dwelling.
  - reject the entire related group if the sample point falls on any other member of the related group.
- If the dwelling contains 31 or more residents and the family contains 31 or more persons and the sample point falls on an individual with no relatives in the family:
  - accept the individual. Also enter data on overall dwelling size, family size, and the number of families in the dwelling.
- Fragments were also identified. Fragments are individuals or groups of individuals who were enumerated separately from their household or group quarters. Most often this occurred when, at the end of a district, an enumerator added the names of individuals who had been missed. For these individuals, the unit's status as household versus group quarters cannot be classified.
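The four inclusion rules can be expressed as a single decision function. This is a minimal sketch: the `SamplePoint` fields are hypothetical descriptions of the person on whom a sample point fell, not variables in the file, and fragments are not handled. Following the text, rule 1 is bypassed when dwelling numbers are missing, so those cases fall through to rules 2 through 4. The returned rule number follows the order of the list above and is not necessarily the coding used by SAMPRULE.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

THRESHOLD = 30  # largest dwelling or family kept intact under these rules


@dataclass
class SamplePoint:
    """Hypothetical description of the person on whom a sample point fell."""
    dwelling_size: Optional[int]  # None when dwelling numbers are missing
    family_size: int
    is_first_in_dwelling: bool
    is_family_head: bool          # the head is the first listed family member
    related_group_size: int       # 1 if the person has no relatives in the family
    is_first_in_related_group: bool


def apply_rules(pt: SamplePoint) -> Tuple[int, Optional[str]]:
    """Return (rule number, unit to enter); a unit of None means reject."""
    # Rule 1: dwellings of 30 or fewer residents are taken whole.
    if pt.dwelling_size is not None and pt.dwelling_size <= THRESHOLD:
        return 1, "dwelling" if pt.is_first_in_dwelling else None
    # Rule 2: large (or unknown) dwelling, family of 30 or fewer.
    if pt.family_size <= THRESHOLD:
        return 2, "family" if pt.is_family_head else None
    # Rule 3: large family, person belongs to a group of related persons.
    if pt.related_group_size > 1:
        return 3, "related_group" if pt.is_first_in_related_group else None
    # Rule 4: large family, person has no relatives in it.
    return 4, "individual"
```

In every branch a unit is accepted only when the sample point lands on one designated member (the first listed person, the head, or the individual alone), which is why each unit keeps the same 1 in 100 chance of inclusion.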
These sampling rules may seem excessively complex, but their implementation is straightforward. All but a few percent of the cases fall under the first rule. The second rule for the most part comes into play in the case of the large tenement houses of the Eastern cities. The third and fourth rules apply to institutions, military barracks, hotels, dormitories, and the like. In some cases, we were unable to determine the breaks between dwellings because the enumerator failed to provide dwelling numbers. We then sampled at the level of the family, using rules 2 through 4. Most of these cases were probably single-family dwellings, but their dwelling size was coded as missing. The variable SAMPRULE indicates which sampling rule was employed for each case, distinguishing those cases in which sampling was carried out at the family level because of missing dwelling numbers.
Divisions among families within dwellings were determined both by the family relationship codes and by family numbers. Enumerators were instructed to write a new family number at the beginning of each family, and to assess family relationships from the perspective of the first person, or head, of each family. Occasionally, however, the two fields provide conflicting information: either there is a new head of household with no new family number, or there are continuous family relationships with a new family number. In such cases, we generally used family relationships to determine the breaks between households, except in cases where we judged the family relationship to be unreliable. When two apparently unrelated families, as determined by the family relationship variable, RELATE, appeared in the same household, the unit was divided into two households. These are cases in which the enumerator neglected to write a new household number; these errors were edited by hand. Similarly, there were cases where two households in a dwelling were merged into one. Some enumerators were overzealous when it came to identifying separate households and split every dwelling into multiple families. We adopted the policy of letting the relationship field take priority over the family numbers. In cases where the first person in the second family had a valid relationship to the head of the first family (daughter, servant, etc.), the second family number was replaced with that of the prior family.
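The split and merge policy can be sketched as a pass over the persons of one dwelling in enumeration order. This is a minimal sketch under stated assumptions: relationships are hypothetical text labels rather than actual RELATE codes, and `None` stands in for a relationship judged missing or unreliable, in which case the enumerator's family number is trusted instead. The hand editing described above is not modeled.

```python
from typing import List, Optional, Tuple

# Each record: (family number written by the enumerator, relationship to head).
# The relationship strings are hypothetical labels, not the file's RELATE codes.
Record = Tuple[int, Optional[str]]


def reconcile_families(records: List[Record]) -> List[int]:
    """Assign a reconciled family index to each person in one dwelling."""
    labels: List[int] = []
    current = -1
    prev_number: Optional[int] = None
    for number, relate in records:
        new_number = number != prev_number
        if current < 0 or relate == "head" or (relate is None and new_number):
            # Start a new family. A new head always splits the unit, even when
            # the enumerator did not write a new family number.
            current += 1
        # Otherwise a valid relationship to the current head keeps the person in
        # the prior family, even under a new family number (the merge case).
        labels.append(current)
        prev_number = number
    return labels
```

For the records `[(1, "head"), (1, "wife"), (1, "head"), (1, "son"), (2, "daughter"), (3, "head"), (4, None)]`, the function returns `[0, 0, 1, 1, 1, 2, 3]`: the second head splits family 1, the daughter listed under family number 2 is merged back into the preceding family, and the person with no usable relationship follows the new family number.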
- Adapted from the following text: Steven Ruggles and Russell R. Menard, Public Use Microdata Sample of the 1880 United States Census of Population: User's Guide and Technical Documentation, Minneapolis: Social History Research Laboratory, 1995, pp. 4-7.