1880
Sampling Procedures [1]
SAMPLING UNITS
All individuals in the 1880 census were assigned to a "family," a term
that the census defined more broadly than it does today. A family was an
individual or group of individuals who jointly occupied a dwelling place
or part of a dwelling place. Census instructions defined dwelling places
as any occupied structure; they included both wigwams and tenement houses.
Within dwelling places, the number of separate families was generally determined
by the number of separate eating tables. However, there were exceptions
to this criterion. All the permanent occupants of hotels, institutions,
and military barracks constituted single families, provided they slept
in the same building. Census enumerators likewise counted boarders, lodgers,
and servants as part of the family occupying the dwelling place where they
slept, regardless of their eating arrangements.
The analytic power of the public use microdata samples derives in large
measure from their hierarchical organization: they are simultaneously samples
of households (or families) and of individuals, and within households the
relationships among individuals are known. This complex structure allows
the creation of an almost limitless number of variables. The public use
microdata samples for the censuses of 1940 through 1980 are samples of
households, and those for 1900 and 1910 are samples of "families."
We have added another level of hierarchy by creating a sample of dwellings
rather than a sample of families. There are several advantages to sampling
at the level of dwellings. First, there is the matter of definitional differences
between the late-nineteenth century family and the household and group
quarters of the post-1940 period. Families were distinguished by a separate
eating table; although the exact definition of a household has varied,
households and group quarters generally have required either complete cooking
facilities or a separate entrance. It is likely, therefore, that some households
or group quarters under current census definitions would have been considered
to be two or more families in the late nineteenth century. The nineteenth-century
definition of dwelling, on the other hand, is clearly broader than the
current definition of household or group quarters. By providing information
at the level of both dwellings and families, we maximize the potential
for consistent comparisons.
As well as maximizing comparability, sampling by dwellings provides
additional information that would not otherwise be available. The sample
indicates that some 20 percent of the total population resided in multifamily
dwellings. The high frequency of such living arrangements makes them worthy
of study in their own right. Even if analysis is carried out at the level
of the family, it may be useful to incorporate some variables constructed
from the characteristics of the dwelling as a whole. For example, analysis
of surnames allows identification of kin who resided in the same dwelling
but in different families, a pattern that seems to have been common in
nineteenth-century cities.
The chief liability of sampling by dwellings instead of families is
that it reduces the number of independent observations in the file. Since
census microdata files are cluster samples (ordinarily clustered by household),
standard errors depend on both the number of clusters and the homogeneity
of variables within clusters. Calculation of standard errors for samples
of this type is quite complicated (U.S. Bureau of the Census 1972; Kish
1965). In the worst case, with perfect homogeneity within clusters, the
standard errors for variables would be inversely proportional to the square
root of the number of clusters rather than the number of individuals. Even
for variables that are not very homogeneous within clusters, such as age,
there is some loss of precision when the total number of clusters is reduced.
However, the increase in error is small.
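The worst-case statement above can be illustrated with Kish's approximate design effect, deff = 1 + (m - 1) * rho, where m is the average cluster size and rho is the intraclass correlation. The following is a minimal sketch under the assumption of equal-sized clusters. The function names and the numbers in the usage note are illustrative and are not taken from the 1880 file.

```python
import math


def design_effect(avg_cluster_size, rho):
    """Kish's approximate design effect for equal-sized clusters.

    rho is the intraclass correlation: 0 means members of a cluster are
    no more alike than strangers, 1 means perfect homogeneity.
    """
    return 1.0 + (avg_cluster_size - 1.0) * rho


def clustered_se(sd, n_individuals, avg_cluster_size, rho):
    """Approximate standard error of a mean under cluster sampling."""
    # Standard error under simple random sampling of individuals.
    srs_se = sd / math.sqrt(n_individuals)
    # Inflate by the square root of the design effect.
    return srs_se * math.sqrt(design_effect(avg_cluster_size, rho))
```

With rho = 1, clustered_se(1.0, 1000, 5, 1.0) equals 1 / sqrt(200): the standard error depends on the 200 clusters rather than on the 1,000 individuals. With rho = 0, the result reduces to the simple random sampling value.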
The public use microdata samples for 1900 and 1910 substituted the modern
census term "household" for the contemporary term "family." To avoid confusion,
this documentation follows that precedent.
SAMPLE DESIGN
The manuscript census for 1880 consists of about 1.2 million enumeration
pages, with 50 persons per page. These records are contained on 1,454 reels
of microfilm. Each reel contains the census pages for several enumeration
districts.
Our sampling strategy was based on the census page. Each pair of census
pages contains 100 persons; since our sample density is 1 in 100, we needed
an average of one person every two pages. We therefore randomly generated
one sample point for every two pages. To ensure that dwellings had an equal
probability of being included in the sample regardless of their size, they
were only entered if the sample point fell on the line containing the first
person in the dwelling. When the sample point fell on any other dwelling
member, the dwelling was skipped. For example, if the sample
point fell within a dwelling with five members, there was only a 1 in 5 chance
that the dwelling would be included in the sample, but if it was included,
all five members were entered. Under this procedure each dwelling, family,
and individual in the population had a 1 in 100 probability of inclusion.
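The equal-probability claim can be checked with a small sketch. It assumes a simplified setting in which a pair of pages holds exactly 100 lines and no dwelling spans a page-pair boundary. The function names and the dwelling sizes in the usage note are hypothetical.

```python
LINES_PER_PAGE_PAIR = 100  # two census pages of 50 persons each


def first_lines(dwelling_sizes):
    """Return the 0-based line holding the first person of each dwelling.

    Assumes every dwelling has at least one resident.
    """
    starts = []
    line = 0
    for size in dwelling_sizes:
        starts.append(line)
        line += size
    return starts


def select_dwelling(dwelling_sizes, sample_point):
    """Return the index of the dwelling accepted for this sample point.

    A dwelling is accepted only if the point falls on its first listed
    person; otherwise None is returned and the dwelling is skipped.
    """
    starts = first_lines(dwelling_sizes)
    return starts.index(sample_point) if sample_point in starts else None


def selection_counts(dwelling_sizes):
    """Count how many of the possible sample points accept each dwelling."""
    counts = [0] * len(dwelling_sizes)
    for point in range(sum(dwelling_sizes)):
        hit = select_dwelling(dwelling_sizes, point)
        if hit is not None:
            counts[hit] += 1
    return counts
```

For dwellings of sizes [5, 1, 30, 14, 50], which fill one page pair, selection_counts returns one accepting point per dwelling. Each dwelling is therefore chosen by exactly 1 of the 100 equally likely sample points, whatever its size.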
We modified this procedure for persons residing in institutions and
large group quarters. The previous public use microdata samples incorporated
a variety of sampling strategies for handling such cases. In general, members
of large units have been sampled on an individual basis, simply by treating
each member as if they lived in their own one-person household. This procedure
increases the efficiency of the sample by raising the number of observations
while at the same time maintaining representativeness.
Unfortunately, since the criteria for designating units to be sampled
on an individual basis have varied, the samples are incompatible for some
applications. In the 1980 Public Use Microdata Sample, all units with 9
or more members unrelated to the householder were classified as group quarters,
and members of group quarters were sampled on an individual basis (U.S.
Bureau of the Census 1982). For the public use microdata samples of the
period 1940-1970, the procedure was similar, except that units with 5 or
more secondary individuals or secondary family members were classified
as group quarters and sampled individually (U.S. Bureau of the Census 1972,
1984a, 1984b). In the 1910 sample, up to 20 members of a family could be
unrelated to the head before the members were sampled at the individual
level (Strong 1988). The higher threshold for individual-level sampling
in 1910 allows detailed study of the small boarding houses that were characteristic
of the period; once again, however, there is a tradeoff between sampling
error and the richness of the data. In the case of the 1900 data file,
all boarders, lodgers, and the institutionalized were sampled as individuals
or as secondary families, a strategy that maximizes precision at considerable
cost in terms of lost information (Graham 1980). For example, the 1900
system makes it impossible to create a precise analog of the group quarters
concept used in recent census years.
To ensure definitional comparability of the 1880 sample with all existing
public use microdata samples, the number of persons allowed before sampling
the unit at the individual level had to be at least as large as in the
1910 sample. We decided to expand the threshold to 30, which allows study
of many boarding houses as intact units. The following set of inclusion
rules assured compatibility with the sample designs of the previous public
use microdata samples, while at the same time enriching the data. These
rules result in equal probabilities of inclusion, regardless of dwelling
size, family size, or the number of coresident relatives.
1. If the dwelling contains 30 or fewer residents:
a) accept the entire dwelling if the sample point falls on the first
listed individual in the dwelling.
b) reject the entire dwelling if the sample point falls on any other
dwelling resident.
2. If the dwelling contains 31 or more residents and the family contains
30 or fewer persons:
a) accept the entire family if the sample point falls on the family
head. Also enter data on overall dwelling size and the number of families
in the dwelling.
b) reject the entire family if the sample point falls on any other
family member.
3. If the dwelling contains 31 or more residents and the family contains
31 or more persons and the sample point falls within any group of related
persons within the family (in 1960 census usage, within a primary or secondary
family):
a) accept the group of related persons if the sample point falls on
the first listed individual within the related group. Also enter data on
overall dwelling size, family size, and the number of families in the dwelling.
b) reject the entire related group if the sample point falls on any
other member of the related group.
4. If the dwelling contains 31 or more residents and the family contains
31 or more persons and the sample point falls on an individual with no
relatives in the family:
a) accept the individual. Also enter data on overall dwelling size,
family size, and the number of families in the dwelling.
5. Fragments were also identified. Fragments are individuals or groups of individuals who were enumerated separately from their household or group quarters. Most often this occurred when, at the end of a district, an enumerator added the names of individuals who had been missed. For these individuals, the unit's status as household versus group quarters cannot be classified.
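Rules 1 through 4 can be read as a decision procedure applied to each sample point. The sketch below is one possible rendering. The SamplePoint fields and the apply_rules function are hypothetical names introduced for illustration. Fragments (rule 5) and cases with missing dwelling numbers are not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

THRESHOLD = 30  # largest unit that is still sampled intact


@dataclass
class SamplePoint:
    """Facts about the person on whom a sample point falls (hypothetical)."""
    dwelling_size: int
    family_size: int
    related_group_size: int        # 1 if the person has no relatives in the family
    first_in_dwelling: bool
    family_head: bool
    first_in_related_group: bool


def apply_rules(p: SamplePoint) -> Tuple[int, Optional[str]]:
    """Return (rule number, unit accepted), with None when the unit is rejected."""
    if p.dwelling_size <= THRESHOLD:
        # Rule 1: small dwellings are taken or rejected as a whole.
        return 1, ("dwelling" if p.first_in_dwelling else None)
    if p.family_size <= THRESHOLD:
        # Rule 2: large dwelling, small family.
        return 2, ("family" if p.family_head else None)
    if p.related_group_size > 1:
        # Rule 3: large family, point falls within a group of related persons.
        return 3, ("related group" if p.first_in_related_group else None)
    # Rule 4: large family, person has no relatives in the family.
    return 4, "individual"
```

For instance, a point falling on the head of a 6-person family in a 45-person tenement yields (2, "family"), while a point on an unrelated inmate of a 200-person institution yields (4, "individual").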
These sampling rules may seem excessively complex, but their implementation
is straightforward. All but a few percent of the cases fall under the first
rule. The second rule for the most part comes into play in the case of the
large tenement houses of the Eastern cities. The third and fourth rules
apply for institutions, military barracks, hotels, dormitories, and the
like. In some cases, we were unable to determine the breaks between dwellings,
because the enumerator failed to provide dwelling numbers. We then sampled
at the level of the family, using rules 2 through 4. Most of these cases
were probably single-family dwellings, but their dwelling size was coded
as missing. The variable SAMPRULE indicates which sampling rule was employed
for each case, distinguishing those cases in which sampling was carried
out at the family level because of missing dwelling numbers.
Divisions among families within dwellings were determined both by the
family relationship codes and family numbers. Enumerators were instructed
to write a new family number at the beginning of each family, and to assess
family relationships from the perspective of the first person, or head,
of each family. Occasionally, however, the two fields provide conflicting
information; either there is a new head of household with no new family
number or there are continuous family relationships with a new family number.
In such cases, we generally used family relationships to determine the
breaks between households, except in cases where we judged the family relationship
to be unreliable.
Users can consult the variables QHHSPLIT and QHHJOIN for more information on family relationships. QHHSPLIT is a household-level flag that identifies cases in which a family was split into two. This occurred when two apparently unrelated families, as determined by the family relationship variable RELATE, appeared in the same household; the unit was then divided into two households. These are cases in which the enumerator clearly put a new household number on the wrong line, and the errors were edited by hand.
The related person-level variable, QHHJOIN, identifies cases in which two households in a dwelling were merged into one. Some enumerators were overzealous when it came to identifying separate households, and split every dwelling into multiple families. We adopted the policy of letting the relationship field take priority over the family numbers. In cases where the first person in the second family had a valid relationship to the head of the first family (daughter, servant, etc.), the second family number was replaced with that of the prior family.
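The merging policy can be sketched as a single pass over the persons in a dwelling, in enumeration order. This is an illustration only, not the procedure actually used to code QHHJOIN. It assumes that any relationship other than "head" counts as a valid relationship to the prior head. The function name and the returned flag are hypothetical.

```python
def join_households(persons, head_label="head"):
    """Merge families whose first listed person is not a head.

    persons: list of (family_number, relationship) in enumeration order.
    Returns a list of (family_number, relationship, joined), where joined
    is True for persons whose family number was replaced.
    """
    merged = []
    effective = None   # family number currently in force
    last_seen = None   # family number as written by the enumerator
    joining = False
    for number, relation in persons:
        if number != last_seen:
            # First person listed under a new family number: merge into
            # the prior family unless this person is recorded as a head.
            joining = effective is not None and relation != head_label
            if not joining:
                effective = number
            last_seen = number
        merged.append((effective, relation, joining))
    return merged
```

Given a dwelling in which family 2 begins with a daughter rather than a head, the members of family 2 are reassigned to family 1 and flagged, while a later family 3 that begins with a head is left separate.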
ENDNOTES
1. Adapted from the following text: Steven Ruggles and Russell R. Menard, Public Use
Microdata Sample of the 1880 United States Census of Population: User's
Guide and Technical Documentation, Minneapolis: Social History Research
Laboratory, 1995, pp. 4-7.