All individuals in the 1850 census were assigned to a "family," a term that the census defined more broadly than it does today. A family was an individual or group of individuals who jointly occupied a dwelling place or part of a dwelling place. Census instructions defined a dwelling place as any occupied structure, including both wigwams and tenement houses. The term family was defined as either (1) one person living separately in a house, or a part of a house, and providing for him or herself; or (2) several persons living together in a house, or in part of a house, upon one common means of support, and separately from others in similar circumstances. Thus, a widow living alone and separately providing for herself, or 200 individuals living together and provided for by a common head, were each considered to be one family. The resident inmates of a hotel, jail, garrison, hospital, asylum, or other similar institution were also considered one family.

The analytic power of the public use microdata samples derives in large measure from their hierarchical organization: they are simultaneously samples of households (or families) and of individuals, and within households the relationships among individuals are known. This complex structure allows the creation of an almost limitless number of variables. Sampling units, however, have varied among the samples. The public use microdata samples for the censuses of 1940 through 1980 are samples of households, those for 1900 and 1910 are samples of "families," and the public use microdata sample for 1880 is a sample of dwellings.

The 1850 Public Use Microdata Sample follows the precedent set by the 1880 sample: it is also a sample of dwellings. Compared with a sample of families, this sampling strategy adds another level of hierarchy. There are several advantages to sampling at the level of dwellings. First, there are definitional differences between the nineteenth-century family and the household and group quarters of the post-1940 period. Families were distinguished by a common means of support and separate residence; although the exact definition of a household has varied, households and group quarters generally have required either complete cooking facilities or a separate entrance. It is likely, therefore, that some households or group quarters under current census definitions were considered to be two or more families in the late nineteenth century. The nineteenth-century definition of dwelling, on the other hand, is clearly broader than the current definition of household or group quarters. By providing information at the level of both dwellings and families, we maximize the potential for consistent comparisons.

As well as maximizing comparability, sampling by dwellings provides additional information that would not otherwise be available. The sample indicates that over 10 percent of the total population resided in multi-family dwellings. The high frequency of such living arrangements makes them worthy of study in their own right. Multi-household dwellings can be identified by means of the variable NUMHH in the 1850-1930 census samples. Even if analysis is carried out at the level of the family, it may be useful to incorporate some variables constructed from the characteristics of the dwelling as a whole. For example, analysis of surnames allows identification of kin who resided in the same dwelling but in different families, a pattern that seems to have been common in nineteenth-century cities.
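Because every person record carries both a dwelling and a family identifier, dwelling-level characteristics can be derived by grouping. The sketch below is a minimal illustration, assuming hypothetical records of the form `(dwelling_id, family_id, surname)`; these field names are illustrative and are not the actual variable names in the file. It counts families per dwelling (similar in spirit to NUMHH) and flags dwellings in which the same surname occurs in more than one family, the simple surname-matching idea described above.

```python
from collections import defaultdict

# Hypothetical person records: (dwelling_id, family_id, surname).
# These field names are illustrative, not actual sample variable names.
persons = [
    (1, 1, "Kelly"), (1, 1, "Kelly"), (1, 2, "Kelly"), (1, 2, "Kelly"),
    (2, 3, "Smith"), (2, 3, "Smith"),
    (3, 4, "Brown"), (3, 5, "Jones"),
]

def families_per_dwelling(records):
    """Count the distinct families enumerated in each dwelling."""
    families = defaultdict(set)
    for dwelling, family, _surname in records:
        families[dwelling].add(family)
    return {d: len(f) for d, f in families.items()}

def shared_surname_dwellings(records):
    """Return dwellings where one surname appears in two or more families."""
    by_dwelling = defaultdict(lambda: defaultdict(set))
    for dwelling, family, surname in records:
        by_dwelling[dwelling][surname].add(family)
    return sorted(
        d for d, surnames in by_dwelling.items()
        if any(len(fams) > 1 for fams in surnames.values())
    )

counts = families_per_dwelling(persons)
multi_family = sorted(d for d, n in counts.items() if n > 1)
print(counts)                             # {1: 2, 2: 1, 3: 2}
print(multi_family)                       # [1, 3]
print(shared_surname_dwellings(persons))  # [1]
```

A shared surname is only a crude indicator of kinship; in practice one would combine it with other evidence such as age and birthplace.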

The chief liability of sampling by dwellings instead of families is that it reduces the number of independent observations in the file. Since census microdata files are cluster samples (ordinarily clustered by household), standard errors depend both on the number of clusters and on the homogeneity of variables within clusters. Calculation of standard errors for samples of this type is quite complicated (U.S. Bureau of the Census 1972; Kish 1965). In the worst case, with perfect homogeneity within clusters, the standard errors for variables would be inversely proportional to the square root of the number of clusters rather than to the square root of the number of individuals. Even for variables that are not very homogeneous within clusters, such as age, there is some loss of precision when the total number of clusters is reduced. However, the increase in error is small.
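The trade-off between cluster size and within-cluster homogeneity can be illustrated with Kish's approximate design effect for equal-sized clusters, deff = 1 + (m - 1)ρ, where m is the mean cluster size and ρ the intraclass correlation. This is a simplified sketch under the assumption of equal cluster sizes and a simple random sample of clusters; it is not the procedure used to compute standard errors for this file, and the numbers below are purely illustrative.

```python
import math

def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Kish's approximation for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (mean_cluster_size - 1.0) * icc

def se_of_mean(sd: float, n_individuals: int,
               mean_cluster_size: float, icc: float) -> float:
    """Standard error of a sample mean, inflated by sqrt(design effect)."""
    deff = design_effect(mean_cluster_size, icc)
    return sd * math.sqrt(deff / n_individuals)

# Illustrative values: 1,000 people in dwellings of 5, variable with sd = 10.
no_clustering = se_of_mean(10, 1000, 5, icc=0.0)       # same as 10 / sqrt(1000)
perfect_homogeneity = se_of_mean(10, 1000, 5, icc=1.0) # same as 10 / sqrt(200 clusters)
mild = se_of_mean(10, 1000, 5, icc=0.05)               # weakly clustered variable
print(round(no_clustering, 4), round(mild, 4), round(perfect_homogeneity, 4))
```

With ρ = 1 the design effect equals the cluster size, so the standard error depends only on the number of clusters, which is the worst case described above. For a weakly clustered variable the inflation is modest.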

The public use microdata samples for 1880, 1900, and 1910 substituted the modern census term "household" for the contemporary term "family." To avoid confusion, this documentation follows that precedent.


The manuscript census of the free population in 1850 consists of about 560,000 enumeration pages with 42 persons per page. These records are contained on 976 reels of microfilm. Each reel contains the census pages for several enumeration districts.

Our sampling strategy involved randomly generating one sample point, on average, for every hundred persons in the census. To ensure that dwellings had an equal probability of being included in the sample regardless of their size, a dwelling was entered only if the randomly generated sample point fell on the line containing the first person in the dwelling. When the sample point fell on any other dwelling member, the dwelling was skipped. For example, if the sample point fell within a dwelling with 5 members, there was only a 1 in 5 chance that the dwelling was included in the sample, but if it was taken, all five members were entered. Under this procedure each dwelling, family, and individual in the population had a 1 in 100 probability of inclusion.
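The first-person rule can be checked with a small simulation. This sketch assumes that each census line independently becomes a sample point with probability 1/100, which yields one sample point per hundred persons on average; the actual random-number procedure used for the sample is not described here, and the function name is hypothetical.

```python
import random
from itertools import accumulate

def sample_dwellings(dwelling_sizes, rate=0.01, seed=None):
    """Return indices of dwellings whose first listed member is a sample point.

    Assumption: every census line is independently a sample point with
    probability `rate`. Points falling on later members are ignored.
    """
    rng = random.Random(seed)
    # Line number of the first listed member of each dwelling.
    starts = [0, *accumulate(dwelling_sizes)][:-1]
    total_lines = sum(dwelling_sizes)
    sample_points = {line for line in range(total_lines) if rng.random() < rate}
    return [i for i, start in enumerate(starts) if start in sample_points]

# 100,000 one-person dwellings followed by 100,000 five-person dwellings.
sizes = [1] * 100_000 + [5] * 100_000
selected = sample_dwellings(sizes, seed=1850)
rate_small = sum(1 for i in selected if i < 100_000) / 100_000
rate_large = sum(1 for i in selected if i >= 100_000) / 100_000
print(rate_small, rate_large)  # both close to 0.01
```

Both inclusion rates come out near 1 in 100: a five-person dwelling attracts about five times as many sample points, but only the one on its first line counts, which offsets the larger target.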

We modified this procedure for persons residing in institutions and large group quarters. The previous public use samples incorporated a variety of sampling strategies for handling such cases. In general, members of large units were sampled on an individual basis, simply by treating each member as if he or she lived in a one-person household. This procedure increased the efficiency of the sample by raising the number of observations while maintaining representativeness.

Unfortunately, the criteria for designating units to be sampled on an individual basis have varied, making the samples incompatible for some applications. In the 1980 Public Use Microdata Sample, all units with 9 or more members unrelated to the householder were classified as group quarters and members of group quarters were sampled on an individual basis (U.S. Bureau of the Census 1982). For the public use microdata samples of the period 1940-1970, the procedure was similar, except that units with 5 or more secondary individuals or secondary family members were classified as group quarters and sampled individually (U.S. Bureau of the Census 1972, 1984a, 1984b). In the 1910 sample up to 20 members of a family could be unrelated to the head before the members were sampled at the individual level (Strong 1988). The higher threshold for individual level sampling in 1910 allows detailed study of the small boarding houses that were characteristic of the period; once again, however, there is a tradeoff between sampling error and the richness of the data. In the case of the 1900 data file, all boarders, lodgers, and the institutionalized were sampled as individuals or as secondary families, a strategy that maximizes precision at considerable cost in terms of lost information (Graham 1980). For example, the 1900 system makes it impossible to create a precise analog of the group quarters concept used in recent census years.

To ensure definitional comparability of the 1850 sample with all existing public use microdata samples, the number of persons allowed before sampling the unit at the individual level had to be at least as large as in the 1910 sample. We followed the precedent established by the 1880 Public Use Microdata Sample and expanded the threshold to 31, which allows study of many boarding houses as intact units. The following set of inclusion rules assured compatibility with the sample designs of the previous public use microdata samples, while at the same time enriching the data. These rules result in equal probabilities of inclusion, regardless of dwelling size, family size, or the number of coresident relatives.

  1. If the dwelling contains 30 or fewer residents:
    1. accept the entire dwelling if the sample point falls on the first listed individual in the dwelling.
    2. reject the entire dwelling if the sample point falls on any other dwelling resident.
  2. If the dwelling contains 31 or more residents and the family contains 30 or fewer persons:
    1. accept the entire family if the sample point falls on the family head; also enter data on overall dwelling size and the number of families in the dwelling.
    2. reject the entire family if the sample point falls on any other family member.
  3. If the dwelling contains 31 or more residents and the family contains 31 or more persons and the sample point falls within any group of related persons within the family (in 1960 census usage, within a primary or secondary family):
    1. accept the group of related persons if the sample point falls on the first listed individual within the related group; also enter data on overall dwelling size, family size, and the number of families in the dwelling.
    2. reject the entire related group if the sample point falls on any other member of the related group.
  4. If the dwelling contains 31 or more residents and the family contains 31 or more persons and the sample point falls on an individual with no relatives in the family:
    1. accept the individual; also enter data on overall dwelling size, family size, and the number of families in the dwelling.
  5. Fragments were also identified. Fragments are individuals or groups of individuals who were enumerated separately from their household or group quarters. Most often this occurred when, at the end of a district, an enumerator added the names of individuals who had been missed. For these individuals, the unit's status as household versus group quarters cannot be classified.
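The decision logic of rules 1 through 4 can be summarized as a short classifier. This is a minimal sketch: the function names and arguments are hypothetical, a missing dwelling number is modeled as `dwelling_size=None` (so rule 1 is skipped and sampling falls to the family level), and `related_group_size=1` means the sampled person has no relatives in the family. Fragments (rule 5) are identified separately and are not modeled here.

```python
from typing import Optional

THRESHOLD = 31  # units with 31 or more residents are sampled at a finer level

def sampling_rule(dwelling_size: Optional[int], family_size: int,
                  related_group_size: int) -> int:
    """Return which of rules 1-4 governs a sample point (hypothetical helper)."""
    if dwelling_size is not None and dwelling_size < THRESHOLD:
        return 1  # accept or reject the whole dwelling
    if family_size < THRESHOLD:
        return 2  # accept or reject the whole family
    if related_group_size > 1:
        return 3  # accept or reject the group of related persons
    return 4      # an individual with no relatives in the family

def accepts(rule: int, point_on_first_member: bool) -> bool:
    """Rules 1-3 accept only when the point falls on the first listed member
    of the unit (for rule 2, the family head); rule 4 always accepts."""
    if rule == 4:
        return True
    return point_on_first_member

print(sampling_rule(6, 6, 6))        # 1: ordinary dwelling
print(sampling_rule(45, 8, 8))       # 2: family in a large tenement
print(sampling_rule(None, 8, 8))     # 2: dwelling number missing
print(sampling_rule(120, 120, 3))    # 3: related group in an institution
print(sampling_rule(120, 120, 1))    # 4: unrelated resident
```

The order of the tests mirrors the rules: dwelling size is checked first, then family size, and only within large families does the presence of relatives decide between rules 3 and 4.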

These sampling rules may seem complex, but their implementation is straightforward. All but a few percent of the cases fell under the first rule. The second rule comes into play mostly in the case of large tenement houses of the Eastern cities. The third and fourth rules apply to institutions, military barracks, hotels, dormitories, and the like. In some cases, we were unable to determine the breaks between dwellings because the marshal failed to provide dwelling numbers. We then sampled at the level of the family, using rules 2 through 4. Most of these cases were probably single-family dwellings, but their dwelling size was coded as missing. The variable SAMPRULE indicates which sampling rule was employed for each case, distinguishing those cases in which sampling was carried out at the family level because of missing dwelling numbers.


