1850 Sampling Procedures [1]
SAMPLE UNITS
All individuals in the 1850 census were assigned to a "family," a term
that the census defined more broadly than it does today. A family was an
individual or group of individuals who jointly occupied a dwelling place
or part of a dwelling place. Census instructions defined dwelling places
as any occupied structure; they included both wigwams and tenement houses.
The term family was defined as either (1) one person living separately
in a house, or a part of a house, and providing for him or herself; or
(2) several persons living together in a house, or in part of a house,
upon one common means of support, and separately from others in similar
circumstances. Thus, a widow living alone and separately providing for
herself, or 200 individuals living together and provided for by a common
head, were each considered to be one family. The resident inmates of a
hotel, jail, garrison, hospital, asylum, or other similar institutions
were also considered one family.
The analytic power of the public use microdata samples derives in large
measure from their hierarchical organization: they are simultaneously samples
of households (or families) and of individuals, and within households the
relationships among individuals are known. This complex structure allows
the creation of an almost limitless number of variables. Sampling units,
however, have varied among the samples. The public use microdata samples
for the censuses of 1940 through 1980 are samples of households, those
for 1900 and 1910 are samples of "families," and the public use microdata
sample for 1880 is a sample of dwellings.
The 1850 Public Use Microdata Sample follows the precedent set by the
1880 sample: it is also a sample of dwellings. This sample strategy adds
another level of hierarchy compared to a sample of families. There are
several advantages to sampling at the level of dwellings. First, there
are definitional differences between the nineteenth century family and
the household and group quarters of the post-1940 period. Families were
distinguished by a common means of support and separate residence; although
the exact definition of a household has varied, households and group quarters
generally have required either complete cooking facilities or a separate
entrance. It is likely, therefore, that some households or group quarters
under current census definitions were considered to be two or more families
in the late nineteenth century. The nineteenth-century definition of dwelling,
on the other hand, is clearly broader than the current definition of household
or group quarters. By providing information at the level of both dwellings
and families, we maximize the potential for consistent comparisons.
As well as maximizing comparability, sampling by dwellings provides
additional information that would not otherwise be available. The sample
indicates that over 10 percent of the total population resided in multi-family
dwellings. The high frequency of such living arrangements makes them worthy
of study in their own right. Multi-household dwellings can be identified
by means of the variable NUMHH in the 1850-1930 census samples. Even if analysis is carried out at the
level of the family, it may be useful to incorporate some variables constructed
from the characteristics of the dwelling as a whole. For example, analysis
of surnames allows identification of kin who resided in the same dwelling
but in different families, a pattern that seems to have been common in
nineteenth-century cities.
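As a minimal sketch of the surname comparison described above, the following Python groups person records by dwelling and reports surnames that appear in more than one family of the same dwelling. The field names "dwelling", "family", and "surname" are hypothetical and are not the variable names of the data file. A shared surname is only a signal of possible kinship, not proof of it.

```python
from collections import defaultdict

def shared_surnames_across_families(records):
    """Map each dwelling to the surnames found in two or more of its families."""
    # families_by_name[dwelling][surname] -> set of family identifiers
    families_by_name = defaultdict(lambda: defaultdict(set))
    for rec in records:
        # Normalize case and spacing so "walsh" and "Walsh " compare equal.
        surname = rec["surname"].strip().upper()
        families_by_name[rec["dwelling"]][surname].add(rec["family"])

    result = {}
    for dwelling, names in families_by_name.items():
        shared = sorted(n for n, fams in names.items() if len(fams) > 1)
        if shared:
            result[dwelling] = shared
    return result

# Illustrative records only; these are not drawn from the sample.
records = [
    {"dwelling": 1, "family": 1, "surname": "Walsh"},
    {"dwelling": 1, "family": 2, "surname": "walsh"},
    {"dwelling": 1, "family": 2, "surname": "Keane"},
    {"dwelling": 2, "family": 3, "surname": "Walsh"},
]
print(shared_surnames_across_families(records))
```

Exact matching after normalization will miss spelling variants, so a real analysis would probably need a more tolerant comparison.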
The chief liability of sampling by dwellings instead of families is
that it reduces the number of independent observations in the file. Since
census microdata files are cluster samples (ordinarily clustered by household),
standard errors depend on both the number of clusters and on the homogeneity
of variables within clusters. Calculation of standard errors for samples
of this type is quite complicated (U.S. Bureau of the Census 1972; Kish
1965). In the worst case, with perfect homogeneity within clusters, the
standard errors for variables would be inversely proportional to the square
root of the number of clusters rather than the number of individuals. Even
for variables that are not very homogeneous within clusters, such as age,
there is some loss of precision when the total number of clusters is reduced.
However, the increase in error is small.
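The worst case described above can be illustrated with the common design-effect approximation, deff = 1 + (m - 1) * rho, where m is the cluster size and rho is the within-cluster correlation. This is a sketch under simplifying assumptions (equal-size clusters, a simple proportion), not the procedure used to compute errors for the sample. The function name and the figures are illustrative only.

```python
import math

def clustered_standard_error(p, n_individuals, cluster_size, rho):
    """Approximate SE of a proportion, assuming equal-size clusters."""
    # Variance under simple random sampling of individuals.
    srs_variance = p * (1 - p) / n_individuals
    # Design effect: inflation caused by homogeneity within clusters.
    design_effect = 1 + (cluster_size - 1) * rho
    return math.sqrt(srs_variance * design_effect)

n, m, p = 10_000, 5, 0.5  # illustrative values, not taken from the sample

se_independent = clustered_standard_error(p, n, m, rho=0.0)
se_worst_case = clustered_standard_error(p, n, m, rho=1.0)

# With perfect homogeneity the SE depends on the number of clusters (n / m).
se_by_clusters = math.sqrt(p * (1 - p) / (n / m))

print(se_independent, se_worst_case, se_by_clusters)
```

With rho equal to 1, the result matches the standard error computed from the number of clusters alone. With rho near 0, as for a variable such as age, the inflation is slight.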
The public use microdata samples for 1880, 1900, and 1910 substituted
the modern census term "household" for the contemporary term "family."
To avoid confusion, this documentation follows that precedent.
SAMPLE DESIGN
The manuscript census of the free population in 1850 consists of about
560,000 enumeration pages with 42 persons per page. These records are contained
on 976 reels of microfilm. Each reel contains the census pages for several
enumeration districts.
Our sampling strategy involved randomly generating one sample point,
on average, for every hundred persons in the census. To ensure that dwellings
had equal probability of being included in the sample regardless of their
size, they were only entered if the randomly generated sample point fell
on the line containing the first person in the dwelling. When the sample
point fell on any other dwelling member, the dwelling was skipped. For
example, if the sample point fell within a dwelling with 5 members, there
was only a 1 in 5 chance that the dwelling was included in the sample,
but if it was taken, all five members were entered. Under this procedure
each dwelling, family, and individual in the population had a 1 in 100
probability of inclusion.
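The equal-probability property can be checked with a short sketch. It assumes a simplified model in which the census is one continuous list of lines and a dwelling is taken only when a sample point lands on its first line. The function name and the dwelling sizes are hypothetical.

```python
def select_dwellings(dwelling_sizes, sample_lines):
    """Return indexes of dwellings whose first-listed person is on a sample line."""
    selected = []
    first_line = 0  # line number of the first person in the current dwelling
    for index, size in enumerate(dwelling_sizes):
        if first_line in sample_lines:
            selected.append(index)
        first_line += size  # the next dwelling starts after this one ends
    return selected

sizes = [1, 5, 3, 12]  # illustrative dwelling sizes
total_lines = sum(sizes)

# Try every line in turn as the single sample point and count how often
# each dwelling is taken.
hits = [0] * len(sizes)
for line in range(total_lines):
    for index in select_dwellings(sizes, {line}):
        hits[index] += 1

print(hits)
```

Every dwelling is taken by exactly one of the possible sample points, so a large dwelling is no more likely to enter the sample than a small one. Because all members are entered when a dwelling is taken, each person shares the inclusion probability of his or her dwelling.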
We modified this procedure for persons residing in institutions and
large group quarters. The previous public use samples incorporated a variety
of sampling strategies for handling such cases. In general, members of
large units were sampled on an individual basis, simply by treating each
member as if they lived in their own one-person household. This procedure
increased the efficiency of the sample by raising the number of observations
while maintaining representativeness.
Unfortunately, the criteria for designating units to be sampled on an
individual basis have varied, making the samples incompatible for some
applications. In the 1980 Public Use Microdata Sample, all units with 9
or more members unrelated to the householder were classified as group quarters
and members of group quarters were sampled on an individual basis (U.S.
Bureau of the Census 1982). For the public use microdata samples of the
period 1940-1970, the procedure was similar, except that units with
5 or more secondary individuals or secondary family members were classified
as group quarters and sampled individually (U.S. Bureau of the Census 1972,
1984a, 1984b). In the 1910 sample up to 20 members of a family could be
unrelated to the head before the members were sampled at the individual
level (Strong 1988). The higher threshold for individual level sampling
in 1910 allows detailed study of the small boarding houses that were characteristic
of the period; once again, however, there is a tradeoff between sampling
error and the richness of the data. In the case of the 1900 data file,
all boarders, lodgers, and the institutionalized were sampled as individuals
or as secondary families, a strategy that maximizes precision at considerable
cost in terms of lost information (Graham 1980). For example, the 1900
system makes it impossible to create a precise analog of the group quarters
concept used in recent census years.
To ensure definitional comparability of the 1850 sample with all existing
public use microdata samples, the number of persons allowed before sampling
the unit at the individual level had to be at least as large as in the
1910 sample. We followed the precedent established by the 1880 Public Use
Microdata Sample and expanded the threshold to 31, which allows study of
many boarding houses as intact units. The following set of inclusion rules
assured compatibility with the sample designs of the previous public use
microdata samples, while at the same time enriching the data. These rules
result in equal probabilities of inclusion, regardless of dwelling size,
family size, or the number of coresident relatives.
1. If the dwelling contains 30 or fewer residents:
a) accept the entire dwelling if the sample point falls on the first
listed individual in the dwelling.
b) reject the entire dwelling if the sample point falls on any other
dwelling resident.
2. If the dwelling contains 31 or more residents and the family contains
30 or fewer persons:
a) accept the entire family if the sample point falls on the family
head; also enter data on overall dwelling size and the number of families
in the dwelling.
b) reject the entire family if the sample point falls on any other
family member.
3. If the dwelling contains 31 or more residents and the family contains
31 or more persons and the sample point falls within any group of related
persons within the family (in 1960 census usage, within a primary or secondary
family):
a) accept the group of related persons if the sample point falls on
the first listed individual within the related group; also enter data on
overall dwelling size, family size, and the number of families in the dwelling.
b) reject the entire related group if the sample point falls on any
other member of the related group.
4. If the dwelling contains 31 or more residents and the family contains
31 or more persons and the sample point falls on an individual with no
relatives in the family:
a) accept the individual; also enter data on overall dwelling size,
family size, and the number of families in the dwelling.
5. Fragments were also identified. Fragments are individuals or groups of individuals who were enumerated separately from their household or group quarters. Most often this occurred when, at the end of a district, an enumerator added the names of individuals who had been missed. For these individuals, the unit's status as household versus group quarters cannot be classified.
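Rules 1 through 4 above can be read as a small decision procedure. The following sketch is one way to express it. The function and argument names are hypothetical, a related group of size 1 stands for an individual with no relatives in the family, and the returned rule numbers are the list numbers above, not necessarily the coded values of any variable in the data file. Fragments (rule 5) are not modeled.

```python
def sampling_decision(dwelling_size, family_size, related_group_size,
                      first_in_dwelling, first_in_family,
                      first_in_related_group, threshold=30):
    """Return (rule number, unit sampled, accepted) for one sample point."""
    if dwelling_size <= threshold:
        # Rule 1: small dwellings are taken or rejected as a whole.
        return 1, "dwelling", first_in_dwelling
    if family_size <= threshold:
        # Rule 2: large dwelling, small family; sample the family.
        return 2, "family", first_in_family
    if related_group_size > 1:
        # Rule 3: large family; sample the group of related persons.
        return 3, "related group", first_in_related_group
    # Rule 4: an individual with no relatives in a large family is always taken.
    return 4, "individual", True

# A point on the first person of a 6-person dwelling: the dwelling is accepted.
print(sampling_decision(6, 6, 6, True, True, True))
# A point on a later member of a 12-person family in a 45-person tenement.
print(sampling_decision(45, 12, 12, False, False, False))
# A point on an unrelated inmate of a 200-person institution.
print(sampling_decision(200, 200, 1, False, False, True))
```

In each branch the unit is accepted only from its first-listed member, which is what keeps the probability of inclusion equal across units of different sizes.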
These sampling rules may seem complex, but their implementation is straightforward.
All but a few percent of the cases fell under the first rule. The second
rule comes into play mostly in the case of large tenement houses of the
Eastern cities. The third and fourth rules apply for institutions, military
barracks, hotels, dormitories, and the like. In some cases, we were unable
to determine the breaks between dwellings because the marshal failed to
provide dwelling numbers. We then sampled at the level of the family, using
rules 2 through 4. Most of these cases were probably single-family dwellings,
but their dwelling size was coded as missing. The variable SAMPRULE indicates
which sampling rule was employed for each case, distinguishing those cases
in which sampling was carried out at the family level because of missing
dwelling numbers.
ENDNOTES
1. Adapted from the following text: Steven Ruggles and Russell R. Menard, Public Use
Microdata Sample of the 1850 United States Census of Population: User's
Guide and Technical Documentation, Minneapolis: Social History Research
Laboratory, 1995, pp. 7-10.