1980 Sample Design [1]
This chapter discusses the selection procedure for the public-use microdata
samples in terms of three major operations: (1) the selection of the full
1980 census sample, (2) the estimation procedure for the full census sample,
and (3) the selection of the public-use microdata samples from the persons
and housing units included in the full 1980 census sample, using weights
derived from the full sample estimation procedure.
1980 Census Sample Design and Estimation Procedure
While every person and housing unit in the United States was enumerated
on a questionnaire that requested certain basic demographic information
(e.g., age, race, relationship), a sample of persons and housing units was
enumerated on a questionnaire that requested additional information. The
basic sampling unit for the 1980 census was the housing unit, including
all occupants. For persons living in group quarters, the sampling unit
was the person. Two sampling rates were employed. In counties, incorporated
places, and minor civil divisions estimated to have fewer than 2,500 persons
(based on precensus estimates), one-half of all housing units and persons
in group quarters were to be included in the sample. In all other areas,
one-sixth of the housing units or persons in group quarters were sampled.
The purpose of this scheme was to provide relatively more reliable estimates
for small places. With both sampling rates taken into account across the
nation, approximately 19 percent of the nation's housing units were
included in the census sample.
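The national figure of about 19 percent is a weighted average of the two sampling rates. The sketch below assumes, for illustration only, that every housing unit lies in either a 1-in-2 or a 1-in-6 area; the function names are hypothetical, and the share it derives is an arithmetic consequence of the rounded 19 percent figure, not a published Census Bureau statistic.

```python
def overall_rate(share_in_half_areas: float) -> float:
    """Overall sampling fraction when a given share of housing units lies in
    1-in-2 areas and the remainder lies in 1-in-6 areas."""
    return share_in_half_areas * (1 / 2) + (1 - share_in_half_areas) * (1 / 6)


def implied_share(overall: float) -> float:
    """Invert overall_rate: the share of housing units in 1-in-2 areas that
    would produce the given overall sampling fraction."""
    return (overall - 1 / 6) / (1 / 2 - 1 / 6)


# Under this two-rate model, an overall fraction of about 0.19 implies that
# roughly 7 percent of housing units were in 1-in-2 areas.
print(round(implied_share(0.19), 3))
```

Because 1-in-2 areas are small places, a small share of housing units at the higher rate is enough to lift the national fraction above one-sixth (about 16.7 percent).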
The sample designation method depended on the data collection procedures.
In about 95 percent of the country, the census was taken by the mailout/mailback
procedure. For these areas, the Bureau of the Census either purchased a
commercial mailing list, which was updated and corrected by Census Bureau
field staff, or prepared a mailing list by canvassing and listing each
address in the area prior to Census Day. These lists were computerized,
and every sixth unit (for 1-in-6 areas) or every second unit (for 1-in-2
areas) was designated as a sample unit by computer. Both types of lists
were also corrected by the Post Office.
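Designating "every sixth unit" or "every second unit" on a computerized list is systematic sampling with a fixed interval. This is a minimal sketch; the function name is hypothetical, and the `start` position is an assumption, since this chapter does not say which unit in the first interval was designated.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def designate_sample(address_list: Sequence[T], interval: int, start: int) -> List[T]:
    """Designate every `interval`-th unit of a computerized address list.

    interval: 6 for 1-in-6 areas, 2 for 1-in-2 areas.
    start: 1-based position of the first designated unit (1..interval);
           how the start was chosen is an assumption of this sketch.
    """
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and interval")
    return list(address_list[start - 1::interval])


# Twelve listed addresses in a 1-in-6 area, starting at position 6.
print(designate_sample(list(range(1, 13)), 6, 6))
```

With this layout, `designate_sample(list(range(1, 13)), 6, 6)` designates addresses 6 and 12.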
In non-mailout/mailback areas, a blank listing book with designated
sample lines (every sixth or every second line) was prepared for the enumerator.
Beginning on about Census Day, the enumerator systematically canvassed the
area and listed all housing units in the listing book in the order they
were encountered. Completed questionnaires, including sample information
for any housing unit listed on a designated sample line, were
collected.
In both types of areas, an enumerator was
responsible for a small geographic area known as an enumeration district,
or ED. An ED usually represented the average workload for one enumerator.
To reduce the cost of processing the full census sample, a scheme was
designed to select, while the sample questionnaires were being processed,
a sample of questionnaires on which the travel time to work, place of work,
and migration items would be coded (hereafter referred to as the POW/MIG
items). The sample questionnaires were processed in work units consisting
of 1980 census EDs. In work units (EDs) where these items had not yet been
coded, every second sample questionnaire within the work unit was selected
for coding. In work units where the POW/MIG items had already been coded,
all sample questionnaires were included in tabulations.
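The per-ED rule can be sketched as a function that returns the questionnaires carrying POW/MIG codes together with the ED's nominal coding rate (1 or 1/2), the rate whose reciprocal later enters the POW/MIG weights. The function name is hypothetical, and starting with the first questionnaire in the work unit is an assumption.

```python
from typing import List, Sequence, Tuple


def select_for_powmig_coding(ed_questionnaires: Sequence[str],
                             already_coded: bool) -> Tuple[List[str], float]:
    """Return the questionnaires in one ED (work unit) that carry POW/MIG codes,
    together with the nominal ED coding rate."""
    if already_coded:
        # POW/MIG items were already coded: every sample questionnaire is used.
        return list(ed_questionnaires), 1.0
    # Otherwise every second questionnaire is coded; beginning with the first
    # one is an assumption of this sketch.
    return list(ed_questionnaires[::2]), 0.5


print(select_for_powmig_coding(["a", "b", "c", "d", "e"], already_coded=False))
```

With an odd number of questionnaires, the realized fraction (3 of 5 here) differs slightly from the nominal rate of 1/2.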
Estimation Procedure for Published Sample Data
The estimates that appear in census sample publications were obtained
from an iterative ratio estimation procedure that assigned a weight to
each sample person or housing unit record. For any given tabulation area,
a characteristic total was estimated by summing the weights assigned to
the persons or housing units in the tabulation area that possessed the
characteristic. Estimates of family characteristics were based on the
weights assigned to the family members designated as householders. Each
sample person or housing unit record was assigned one weight to be used
to produce estimates of all characteristics. (Persons with the migration,
travel time to work, and place of work characteristics received an additional
weight.) For example, if the weight given to a sample person or housing
unit had the value of five, all characteristics of that person or housing
unit would be tabulated with a weight of five. The estimation procedure
did, however, assign weights that vary from person to person and from
housing unit to housing unit.
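Estimating a characteristic total therefore reduces to summing the weights of the records that have the characteristic. The sketch below assumes a hypothetical record layout (dictionaries with `weight`, `age`, and `householder` fields), not the actual census file format.

```python
from typing import Any, Callable, Dict, Iterable

Record = Dict[str, Any]  # hypothetical record layout


def estimate_total(records: Iterable[Record],
                   has_characteristic: Callable[[Record], bool]) -> float:
    """Sum the weights of the records in a tabulation area that have the characteristic."""
    return sum(float(r["weight"]) for r in records if has_characteristic(r))


persons = [
    {"weight": 6, "age": 70, "householder": True},
    {"weight": 5, "age": 30, "householder": False},
    {"weight": 7, "age": 80, "householder": True},
]

# Persons aged 65 or older: 6 + 7 = 13.
print(estimate_total(persons, lambda r: r["age"] >= 65))
# Family characteristics use the weights of householders only.
print(estimate_total(persons, lambda r: r["householder"]))
```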
The estimation procedure used to assign the weights was performed in
geographically defined "weighting areas." Weighting areas were generally
formed of adjoining portions of geography that closely agreed with census
tabulation areas within counties. Weighting areas were required to have
a minimum sample of 400 persons and were never allowed to cross state or
county boundaries. In small counties with a sample of fewer than 400 persons,
the minimum sample requirement was relaxed so that the entire county could
become a weighting area.
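The stated constraints on weighting areas can be expressed as a validity check. This sketch covers only the rules given here (no crossing of state or county lines; at least 400 sample persons unless the area is an entire small county); adjacency and agreement with tabulation areas are not modeled, and the function name and tuple layout are hypothetical.

```python
from typing import Sequence, Tuple

MIN_SAMPLE_PERSONS = 400  # minimum sample stated in the text


def is_valid_weighting_area(pieces: Sequence[Tuple[str, str, int]],
                            is_entire_county: bool) -> bool:
    """pieces: (state, county, sample_persons) for each piece of geography
    in a proposed weighting area."""
    # A weighting area may never cross a state or county boundary.
    if len({(state, county) for state, county, _ in pieces}) != 1:
        return False
    total = sum(n for _, _, n in pieces)
    if total >= MIN_SAMPLE_PERSONS:
        return True
    # Small-county exception: a whole county with fewer than 400 sample
    # persons may still form a single weighting area.
    return is_entire_county
```

For example, two pieces of the same county with 250 and 200 sample persons qualify, while an undersized area qualifies only if it is the entire county.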
Within a weighting area, the ratio estimation procedure for persons
was performed in three stages. For persons, the first stage employed seventeen
household-type groups. The second stage used two groups: householders and
nonhouseholders. The third stage could potentially use 160 age-sex-race-Spanish
origin groups. The stages were as follows:
Stage I - Type of Household

Group    Description
         Persons in Housing Units with a Family with Own Children Under 18:
1          2 persons in housing unit
2          3 persons in housing unit
3          4 persons in housing unit
4          5 to 7 persons in housing unit
5          8-or-more persons in housing unit
         Persons in Housing Units with a Family without Own Children Under 18:
6-10       2 persons in housing unit through 8-or-more persons in housing unit
         Persons in All Other Housing Units:
11         1 person in housing unit
12-16      2 persons in housing unit through 8-or-more persons in housing unit
17       Persons in Group Quarters
Stage II - Householder/Nonhouseholder

Group    Description
1        Householder
2        Nonhouseholder (including persons in group quarters)
Stage III - Age/Sex/Race/Spanish Origin

Group     Description
          White Race:
            Persons of Spanish Origin:
              Male:
1               0 to 4 years of age
2               5 to 14 years of age
3               15 to 19 years of age
4               20 to 24 years of age
5               25 to 34 years of age
6               35 to 44 years of age
7               45 to 64 years of age
8               65 years of age or older
              Female:
9-16            Same age categories as groups 1 to 8
            Persons Not of Spanish Origin:
17-32         Same age and sex categories as groups 1 to 16
          Black Race:
33-64       Same age/sex/Spanish origin categories as groups 1 to 32
          Asian, Pacific Islander Race:
65-96       Same age/sex/Spanish origin categories as groups 1 to 32
          Indian (American) or Eskimo or Aleut Race:
97-128      Same age/sex/Spanish origin categories as groups 1 to 32
          Other Race (includes those races not listed above):
129-160     Same age/sex/Spanish origin categories as groups 1 to 32
Within a weighting area, the first step in the estimation procedure
was to assign each sample person record an initial weight. This weight
was approximately equal to the inverse of the probability of selecting
a person for the census sample, for example 6 in a 1-in-6 area.
The next step in the estimation procedure was to combine, if necessary,
the groups within each of the three stages prior to the repeated ratio
estimation, in order to increase the reliability of the ratio estimation
procedure. For the first and second stages, any group that did not meet
certain criteria concerning the unweighted sample count or the ratio of
the complete count to the initially weighted sample count was combined,
or collapsed, with another group in the same stage according to a specified
collapsing pattern. At the third stage, the "Other" race category was collapsed
with the "White" race category before the above collapsing criteria were
applied, along with an additional criterion concerning the number of
complete-count persons in each category.
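The collapsing step can be illustrated generically. The actual criteria values and collapsing patterns are not given in this chapter, so `min_sample=20` and `max_ratio=2.0` below are hypothetical placeholders, and merging each failing group forward into the next group of an ordered pattern (with any leftover merged into the previous pool) is only one assumed reading of "a specified collapsing pattern."

```python
from typing import Dict, Hashable, List, Sequence


def collapse_groups(pattern: Sequence[Hashable],
                    sample: Dict[Hashable, int],
                    complete: Dict[Hashable, float],
                    weighted: Dict[Hashable, float],
                    min_sample: int = 20,        # hypothetical threshold
                    max_ratio: float = 2.0       # hypothetical threshold
                    ) -> Dict[Hashable, int]:
    """Map each group to a collapsed-group index, walking `pattern` in order."""
    def ok(members: List[Hashable]) -> bool:
        n = sum(sample[g] for g in members)
        wt = sum(weighted[g] for g in members)
        cc = sum(complete[g] for g in members)
        # Unweighted sample count and complete-count / weighted-count ratio tests.
        return n >= min_sample and wt > 0 and 1 / max_ratio <= cc / wt <= max_ratio

    pools: List[List[Hashable]] = []
    current: List[Hashable] = []
    for g in pattern:
        current.append(g)
        if ok(current):
            pools.append(current)
            current = []
    if current:  # trailing groups that never met the criteria on their own
        if pools:
            pools[-1].extend(current)
        else:
            pools.append(current)
    return {g: i for i, members in enumerate(pools) for g in members}
```

For example, with sample counts of 30, 5, and 25 for groups 1, 2, and 3, group 2 fails the (hypothetical) minimum and is pooled with group 3.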
As a final step, the initial weights underwent three stages of ratio
adjustment using the groups listed above. At the first stage, the ratio
of the complete census count to the sum of the initial weights of the
sample persons was computed for each stage I group. The initial weight
assigned to each person in a group was then multiplied by the stage I group
ratio to produce an adjusted weight. In stage II, the stage I adjusted
weights were again adjusted by the ratio of the complete census count to
the sum of the stage I weights for sample persons in each stage II group.
Finally, the stage II weights were adjusted at stage III by the ratio of
the complete census count to the sum of the stage II weights for sample
persons in each stage III group. The three stages of adjustment were performed
twice (two iterations) in the order given above. The weights obtained from
the second iteration for stage III were assigned to the sample person records.
However, to avoid complications in rounding tabulated data, only whole-number
weights were assigned. For example, if the final weight for the
persons in a particular group was 7.2, then one-fifth of the sample persons
in this group were randomly assigned a weight of 8 and the remaining four-fifths
received a weight of 7.
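The repeated three-stage adjustment is an instance of the general technique often called raking, or iterative proportional fitting. The sketch below is a simplified illustration, not the Census Bureau's production code: `rake` applies each stage's ratio adjustment in turn for two iterations, and `integer_weights` converts a group's common fractional weight to whole numbers as in the 7.2 example. It assumes groups have already been collapsed so that no group's weighted sum is zero; the function names and the data layout (parallel lists of group labels, dictionaries of complete counts) are hypothetical, and rounding the count of "next-higher" records is an assumption for cases where the fraction does not divide evenly.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rake(initial: Sequence[float],
         stage_groups: Sequence[Sequence[Hashable]],
         controls: Sequence[Dict[Hashable, float]],
         iterations: int = 2) -> List[float]:
    """Iterative ratio adjustment of sample weights.

    stage_groups[s][i] is the group of record i at stage s;
    controls[s][g] is the complete-count total for group g at stage s.
    """
    w = list(initial)
    for _ in range(iterations):                      # two passes, as in the text
        for groups, totals in zip(stage_groups, controls):
            sums: Dict[Hashable, float] = defaultdict(float)
            for wi, g in zip(w, groups):
                sums[g] += wi
            # Multiply each weight by (complete count / weighted sum) of its group.
            w = [wi * totals[g] / sums[g] for wi, g in zip(w, groups)]
    return w


def integer_weights(weight: float, n: int, rng: random.Random) -> List[int]:
    """Replace a common fractional weight shared by n records with whole
    numbers, randomly giving the next-higher integer to the fractional share."""
    base = math.floor(weight)
    k = round((weight - base) * n)                   # e.g. 7.2 with n=5 -> 1 record gets 8
    ups = set(rng.sample(range(n), k))
    return [base + 1 if i in ups else base for i in range(n)]


# Four records with initial weight 6 (a 1-in-6 area), two stages of controls.
print(rake([6, 6, 6, 6], [[0, 0, 1, 1], [0, 1, 0, 1]],
           [{0: 14, 1: 10}, {0: 12, 1: 12}]))
print(integer_weights(7.2, 5, random.Random(0)))
```

Applying the stages repeatedly, rather than once, brings the weighted sample closer to agreement with the complete counts at every stage simultaneously.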
Separate weights were derived for tabulating the travel time to work,
place of work, and migration items. For persons on questionnaires selected
for coding, these weights were obtained by multiplying the weight derived
above by the reciprocal of the ED coding rate and then applying a ratio
adjustment to ensure that the sum of the weights agreed with the
complete-count total population figure.
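A minimal sketch of the POW/MIG weight derivation follows. It assumes a single ratio-adjustment factor for the whole set of records passed in (for example, one weighting area); the chapter does not specify the adjustment groups, and the function name and tuple layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def powmig_weights(records: Sequence[Tuple[float, float, bool]],
                   total_population: float) -> List[float]:
    """records: (full_sample_weight, ed_coding_rate, coded) for each sample person.

    Persons on coded questionnaires have their weight divided by the ED coding
    rate (i.e. multiplied by its reciprocal); everyone else gets 0. One ratio
    adjustment then makes the weights sum to the complete-count total.
    """
    prelim = [w / rate if coded else 0.0 for w, rate, coded in records]
    factor = total_population / sum(prelim)
    return [p * factor for p in prelim]


# Two persons in a half-coded ED (one coded) and two in a fully coded ED.
print(powmig_weights([(6, 0.5, True), (6, 0.5, False), (6, 1.0, True), (6, 1.0, True)], 24))
```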
The ratio estimation procedure for housing units was essentially the
same as that for persons. The major difference was that the occupied housing
unit ratio estimation procedure was done in two stages and the vacant housing
unit ratio estimation procedure was done in one stage. The first stage
for occupied housing units employed sixteen household-type categories, and
the second stage could potentially use 190 tenure/race/Spanish origin/value-or-rent
groups. For vacant housing units, three groups were used. The stages
for the ratio estimation for housing units were as follows:
Occupied Housing Units

Stage I - Type of Household

Group    Description
         Housing Units with a Family with Own Children Under 18:
1          2 persons in housing unit
2          3 persons in housing unit
3          4 persons in housing unit
4          5 to 7 persons in housing unit
5          8-or-more persons in housing unit
         Housing Units with a Family without Own Children Under 18:
6-10       2 persons in housing unit through 8-or-more persons in housing unit
         All Other Housing Units:
11         1 person in housing unit
12-16      2 persons in housing unit through 8-or-more persons in housing unit
Stage II - Tenure/Race and Origin of Householder/Value or Rent
Note: the table directly below applies only to the United States samples. The Puerto Rico samples used a slightly different matrix, available below this table.

Group     Description
          Owner:
            White Race (Householder):
              Persons of Spanish Origin (Householder):
                Value of house:
1                 $1 - $9,999
2                 $10,000 - $19,999
3                 $20,000 - $24,999
4                 $25,000 - $49,999
5                 $50,000 - $99,999
6                 $100,000 - $149,999
7                 $150,000 +
8               Other Owners
              Persons Not of Spanish Origin:
9-16            Same value categories as groups 1 to 8
            Black Race:
17-32         Same value/Spanish origin categories as groups 1 to 16
            Asian, Pacific Islander Race:
33-48         Same value/Spanish origin categories as groups 1 to 16
            Indian (American) or Eskimo or Aleut Race:
49-64         Same value/Spanish origin categories as groups 1 to 16
            Other Race (includes those races not listed above):
65-80         Same value/Spanish origin categories as groups 1 to 16
          Renter:
            White Race:
              Persons of Spanish Origin:
                Rent categories:
81                $1 - $59
82                $60 - $99
83                $100 - $149
84                $150 - $199
85                $200 - $249
86                $250 - $299
87                $300 - $399
88                $400 - $499
89                $500 +
90              Other Renter
91              No Cash Rent
              Persons Not of Spanish Origin:
92-102          Same rent categories as groups 81 to 91
            Black Race:
103-124       Same rent/Spanish origin categories as groups 81 to 102
            Asian, Pacific Islander Race:
125-146       Same rent/Spanish origin categories as groups 81 to 102
            Indian (American) or Eskimo or Aleut Race:
147-168       Same rent/Spanish origin categories as groups 81 to 102
            Other Race (includes those not listed above):
169-190       Same rent/Spanish origin categories as groups 81 to 102
Stage II - Tenure/Value or Rent [Puerto Rico only]
Note: the table directly below applies only to the Puerto Rico samples. The United States samples used a slightly different version of this table, available above.

Group     Description
          Owner:
            Value of house:
1             $0 - $1,999
2             $2,000 - $4,999
3             $5,000 - $9,999
4             $10,000 - $19,999
5             $20,000 - $49,999
6             $50,000 - $74,999
7             $75,000 +
8           Other Owners
          Renter:
            Rent categories:
9             $1 - $29
10            $30 - $59
11            $60 - $99
12            $100 - $149
13            $150 - $199
14            $200 - $249
15            $250 - $299
16            $300 - $399
17            $400 +
18          Other Renter
19          No Cash Rent
Vacant Housing Units

Group    Description
1        Vacant for Rent
2        Vacant for Sale
3        Other Vacant
The estimates produced by this procedure realize some of the gains in
sampling efficiency that would have resulted if the population had been
stratified into the ratio estimation groups before sampling, and the sampling
rate had been applied independently to each group. The net effect is a
reduction in both the standard error and the possible bias of most estimated
characteristics to levels below what would have resulted from simply using
the initial (unadjusted) weight. A by-product of this estimation procedure
is that the estimates from the sample will, for the most part, be consistent
with the complete-count figures for the population and housing unit groups
used in the estimation procedure.
Selection of the Public-Use Microdata Samples
A stratified systematic selection procedure with probability proportional
to a measure of size was used to select each public-use microdata sample.
The sampling elements were the occupied housing unit (including all occupants),
the person in group quarters, and the vacant housing unit. The measure of
size was the full sample weight that resulted from the 1980 census ratio
estimation procedure described above.
It was also necessary to employ a subsampling scheme to yield microdata
samples with a consistent proportion of cases, from area to area, for which
place of work, travel time, and migration were coded. The subsampling scheme
resulted in the occasional designation of selected microdata sample elements
for which the place of work, travel time, and migration information was
blanked. This subsampling scheme was instituted so that POW/MIG data
would be uniformly available for one-half of all microdata cases, rather than
for one-half in most areas and more than one-half in others. Thus, each 1-percent
microdata sample gives a 0.5-percent sample of records containing POW/MIG data, and
the 5-percent microdata sample gives a 2.5-percent sample for POW/MIG data.
The subsampling scheme was also based on probability-proportional-to-size
selection, using measures of size based on both the POW/MIG
half-sample and full sample weights.
The sample selection procedures were as follows. First, the sample units
were stratified during the selection process. This stratification was intended
to improve the reliability of the 5-percent, 1-percent, and 0.1-percent
samples by defining strata within which there is an appreciable degree
of homogeneity among the census sample households with respect to characteristics
of major interest.
A total of 102 strata were defined: 72 strata for persons living in
occupied housing units, 24 strata for persons in group quarters (GQ), and
6 strata for vacant housing units. The strata are shown in Figures
1, 2, and 3.
The sample selection procedures were applied on a state-by-state basis
to obtain the microdata samples. Briefly, for any particular state, the
procedure consisted of creating a set of cells in the computer, one
corresponding to each of the strata defined above. A random value was
assigned to each cell, and the sample edited detail file (i.e., the
internal-use microdata from the full census sample) was then read, with
the appropriate weight of each sample housing unit or GQ person cumulated
into the cell corresponding to that unit's or person's stratum. For occupied
housing units, the full sample person weight assigned to the householder
of the unit was used. For GQ persons, the full sample person weight was
used, while for vacant housing units, the full sample housing unit weight
was used.
For a given 1-percent sample, when a unit/person caused the cumulation
to exceed 100, that unit/person was designated for the sample, and the
value of the cell was reset. The procedure was then repeated. For the 5-percent
sample selection, the procedure was the same except that the cumulation
cut-off was 20 instead of 100. The starting value of each cell was set
so as to minimize the likelihood that any one case would be selected into
more than one public-use microdata sample, and the overlap among the samples
may be considered negligible. There is a small probability that a given
individual unit (one with a high census weight) may have been selected
into the 5-percent sample more than once, but this duplication should not
have any particularly undesirable consequences.
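The cumulation procedure is a form of systematic sampling with probability proportional to size, run separately in each stratum's cell. In this sketch, "the value of the cell was reset" is read as subtracting the interval (100 or 20) rather than zeroing the cell; that reading is an assumption, but it keeps each unit's expected number of selections proportional to its weight and it reproduces the possibility, noted above, of a high-weight unit being selected more than once. The function name and tuple layout are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def pps_systematic(units: Iterable[Tuple[int, Hashable, float]],
                   interval: float,
                   start_values: Dict[Hashable, float]) -> List[int]:
    """Stratified systematic selection with probability proportional to weight.

    units: (unit_id, stratum, full_sample_weight) in file order.
    interval: 100 for a 1-percent sample, 20 for a 5-percent sample.
    start_values: starting value of each stratum's cell.
    Returns selected unit_ids; a unit with a large weight can appear twice.
    """
    cells = dict(start_values)
    selected: List[int] = []
    for unit_id, stratum, weight in units:
        cells[stratum] += weight
        # Each time the cell exceeds the interval, select the unit once and
        # "reset" the cell by subtracting the interval (an assumed reading).
        while cells[stratum] > interval:
            selected.append(unit_id)
            cells[stratum] -= interval
    return selected


# Ten units of weight 6 in one stratum, 5-percent interval of 20, start 0.
print(pps_systematic([(i, "A", 6) for i in range(10)], 20, {"A": 0}))
```

Here units 3 and 6 are selected; a single unit of weight 45 would be selected twice.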
The POW/MIG subsampling operation was performed by first assigning
each selected microdata unit from the POW/MIG coded strata a measure
equal to the ratio of the POW/MIG half-sample weight to the full sample
weight for that unit. These measures were cumulated across the selected
microdata sample units until the cumulation exceeded 2. The POW/MIG data
for the units that caused the cumulation to exceed 2 were retained; for
all other units, the POW/MIG information was blanked.
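The retention rule uses the same cumulate-and-reset idea with an interval of 2. Roughly, units from EDs where all questionnaires were coded have a ratio near 1, so about every second one keeps its POW/MIG data, while coded units from half-coded EDs have a ratio near 2 and are nearly always kept; either way about half of all cases end up with POW/MIG data. As before, treating "reset" as subtracting 2 and the starting value are assumptions, and the function name is hypothetical.

```python
from typing import List, Sequence


def retain_powmig(ratios: Sequence[float], start: float) -> List[bool]:
    """ratios: POW/MIG half-sample weight / full sample weight for each selected
    microdata unit, in file order. Returns True where POW/MIG data are kept."""
    cum = start
    keep: List[bool] = []
    for r in ratios:
        cum += r
        if cum > 2:
            keep.append(True)
            cum -= 2  # assumption: "reset" means subtracting the interval of 2
        else:
            keep.append(False)  # POW/MIG items blanked for this unit
    return keep


print(retain_powmig([1, 1, 1, 1], start=0.5))
print(retain_powmig([2, 2, 2], start=0.5))
```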
Selection of One-in-One-Thousand and Other Subsamples
During the sample selection operation, consecutive two-digit subsample
numbers from 00 to 99 were assigned to each sample case in the 5-percent
and 1-percent samples to allow for the designation of subsamples of various
sizes and for the calculation of standard errors. For example, in a B or C
1-percent public-use microdata sample, choosing the records whose subsample
numbers share the same "units" digit (e.g., a units digit of 1 covers
subsample numbers 01, 11, 21, ..., 91) will provide a one-in-one-thousand
subsample.
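The numbering and units-digit selection can be sketched as follows. Each units digit covers 10 of the 100 subsample numbers, so selecting one digit yields a 1-in-10 subsample of the file, and 1-in-10 of a 1-percent sample is a 1-in-1,000 sample. The function names are hypothetical, and the number given to the first case (`first`) is an assumption.

```python
from typing import List


def assign_subsample_numbers(n_cases: int, first: int = 0) -> List[int]:
    """Consecutive two-digit subsample numbers 00-99, cycling, in selection order.
    The number given to the first case is an assumption of this sketch."""
    return [(first + i) % 100 for i in range(n_cases)]


def units_digit_subsample(numbers: List[int], digit: int) -> List[int]:
    """Indices of cases whose subsample number ends in `digit` (a 1-in-10 subsample)."""
    return [i for i, num in enumerate(numbers) if num % 10 == digit]


nums = assign_subsample_numbers(200)
print(len(units_digit_subsample(nums, 1)))  # 20 of 200 cases
```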
The Bureau has chosen one one-in-one-thousand subsample from each of
the A, B, and C public-use microdata samples. The one-in-one-thousand subsample
from the A Sample was obtained by selecting those records with a subsample
number of 13 or 63. The one-in-one-thousand subsamples from the B and C
Samples were obtained by selecting those records with a subsample number
having a units digit of 4 on the B Sample, or a units digit of 9 on the
C Sample, ignoring the tens digit of the two-digit subsample number.
Samples of any size between 1/20 and 1/10,000 may be selected in a similar
manner by using the appropriate two-digit subsample numbers assigned to
the A, B, or C microdata samples. Care must be exercised when selecting
such samples. If only one "units" digit is required, the "units"
digit should be randomly selected. If two "units" digits are required,
the first should be randomly selected and the second should be either
five more or five less than the first. Failure to use this procedure,
e.g., selecting records with the same "tens" digit instead of records
with the same "units" digit, would still provide a one-in-ten subsample,
but one that would be somewhat more clustered and, as a result, subject
to larger sampling error.
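The digit-choice rule can be written down directly. Because subsample numbers are assigned consecutively, spacing the two units digits five apart spreads the selected records evenly through the numbering; two units digits cover 20 of the 100 subsample numbers, i.e. a 1-in-5 subsample of the A, B, or C file. Modulo 10, "five more" and "five less" always name the same digit. The function name is hypothetical.

```python
import random
from typing import List


def choose_units_digits(count: int, rng: random.Random) -> List[int]:
    """Pick the units digit(s) for a subsample, following the rule in the text."""
    if count == 1:
        return [rng.randrange(10)]
    if count == 2:
        first = rng.randrange(10)
        # "Five more or five less" than the first digit: modulo 10 these coincide.
        return [first, (first + 5) % 10]
    raise ValueError("this sketch covers one or two units digits only")


print(choose_units_digits(2, random.Random(42)))
```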
ENDNOTES:
[1] Originally published as "Chapter 4, Sample Design for the Public-Use
Microdata Samples," Census of Population and Housing, 1980: Public-Use
Microdata Samples Technical Documentation, U.S. Department of Commerce,
Bureau of the Census, Washington, DC, 1983, pp. 35-42.