1980 Sample Design [1]

This chapter discusses the selection procedure for the public-use microdata samples in terms of three major operations: (1) the selection of the full 1980 census sample, (2) the estimation procedure for the full census sample, and (3) the selection of the public-use microdata samples from the persons and housing units included in the full 1980 census sample, using weights derived from the full sample estimation procedure.

1980 Census Sample Design and Estimation Procedure

While every person and housing unit in the United States was enumerated on a questionnaire that requested certain basic demographic information (e.g., age, race, and relationship), a sample of persons and housing units was enumerated on a questionnaire that requested additional information. The basic sampling unit for the 1980 census was the housing unit, including all of its occupants; for persons living in group quarters, the sampling unit was the person. Two sampling rates were employed. In counties, incorporated places, and minor civil divisions estimated to have fewer than 2,500 persons (based on precensus estimates), one-half of all housing units and persons in group quarters were to be included in the sample. In all other places, one-sixth of the housing units and persons in group quarters were sampled. The purpose of this scheme was to provide relatively more reliable estimates for small places. Taking both sampling rates into account, approximately 19 percent of the nation's housing units were included in the census sample.

The sample designation method depended on the data collection procedure. In about 95 percent of the country, the census was taken by the mailout/mailback procedure. For these areas, the Bureau of the Census either purchased a commercial mailing list, which was then updated and corrected by Census Bureau field staff, or prepared a mailing list by canvassing and listing each address in the area prior to Census Day. These lists were computerized, and every sixth unit (in 1-in-6 areas) or every second unit (in 1-in-2 areas) was designated as a sample unit by computer. Both types of list were also corrected by the Post Office.
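The rate rule and the computerized designation can be sketched in a few lines of Python. This is a minimal illustration, not Census Bureau code: the function names are hypothetical, and the `start` argument (the list position of the first designated unit) is an assumption, since the text does not say where the count began.

```python
def sampling_interval(precensus_population: int) -> int:
    """Return k for a 1-in-k sample: 1-in-2 below 2,500 persons, else 1-in-6."""
    return 2 if precensus_population < 2500 else 6


def designate_sample(addresses: list, k: int, start: int = 0) -> list:
    """Designate every k-th address on a computerized list as a sample unit.

    `start` is the 0-based list position of the first sample unit
    (a hypothetical parameter; the source does not specify it).
    """
    if not 0 <= start < k:
        raise ValueError("start must lie in [0, k)")
    return addresses[start::k]


# Usage: a place estimated at 1,800 persons falls under 1-in-2 sampling.
k = sampling_interval(1800)
units = [f"unit-{i}" for i in range(1, 13)]
print(designate_sample(units, k, start=1))
# ['unit-2', 'unit-4', 'unit-6', 'unit-8', 'unit-10', 'unit-12']
```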

In non-mailout/mailback areas, a blank listing book with designated sample lines (every sixth or every second line) was prepared for the enumerator. Beginning about Census Day, the enumerator systematically canvassed the area and listed all housing units in the listing book in the order they were encountered. Completed questionnaires, including sample information for any housing unit listed on a designated sample line, were collected.

In both types of data collection areas, each enumerator was responsible for a small geographic area known as an enumeration district (ED). An ED usually represented the average workload for one enumerator.

To reduce the cost of processing the full census sample, a scheme was designed to select, while the sample questionnaires were being processed, a subsample of questionnaires on which the travel time to work, place of work, and migration data items would be coded (hereafter referred to as the POW/MIG items). The sample questionnaires were processed in work units consisting of 1980 census EDs. In work units (EDs) where these data items had not yet been coded, every second sample questionnaire within the work unit was selected for these coding operations. In work units where the POW/MIG data items had already been coded, all sample questionnaires were included in tabulations.
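A sketch of the per-ED rule, assuming (hypothetically) that an ED's sample questionnaires arrive as a list and that the first one taken in an uncoded ED is at position `start`. The returned value is the nominal coding rate, whose reciprocal is used when the POW/MIG weights are derived.

```python
def pow_mig_coding_selection(questionnaires: list, already_coded: bool, start: int = 0):
    """Return (questionnaires whose POW/MIG items are tabulated, nominal coding rate).

    If the ED's POW/MIG items had already been coded, every sample
    questionnaire is kept (rate 1.0); otherwise every second one is selected
    for coding (rate 0.5). `start` (0 or 1) is a hypothetical choice of which
    questionnaire is taken first.
    """
    if already_coded:
        return list(questionnaires), 1.0
    if start not in (0, 1):
        raise ValueError("start must be 0 or 1")
    return list(questionnaires)[start::2], 0.5


selected, rate = pow_mig_coding_selection(["q1", "q2", "q3", "q4", "q5"], already_coded=False)
print(selected, rate)  # ['q1', 'q3', 'q5'] 0.5
```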

Estimation Procedure For Published Sample Data

The estimates that appear in census sample publications were obtained from an iterative ratio estimation procedure that assigned a weight to each sample person or housing unit record. For any given tabulation area, a characteristic total was estimated by summing the weights assigned to the persons or housing units in the tabulation area that possessed the characteristic. Estimates of family characteristics were based on the weights assigned to the family members designated as householders. Each sample person or housing unit record was assigned one weight to be used to produce estimates of all characteristics. (Persons with the migration, travel time to work, and place of work characteristics received an additional weight.) For example, if the weight given to a sample person or housing unit had the value five, all characteristics of that person or housing unit would be tabulated with a weight of five. The estimation procedure did, however, assign weights that varied from person to person and from housing unit to housing unit.
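Estimation by summing weights is easy to make concrete. The record layout below (a `weight` field, an `age` field, and a hypothetical `family_householder` flag) is an assumption for illustration only.

```python
# Hypothetical sample person records for one tabulation area.
persons = [
    {"weight": 5, "age": 34, "family_householder": True},
    {"weight": 5, "age": 8, "family_householder": False},
    {"weight": 7, "age": 71, "family_householder": True},
    {"weight": 6, "age": 40, "family_householder": False},
]


def estimate_total(records, has_characteristic):
    """Estimated total = sum of the weights of records possessing the characteristic."""
    return sum(r["weight"] for r in records if has_characteristic(r))


def estimate_families(records, has_characteristic):
    """Family estimates use only the weights of family householders."""
    return estimate_total(records, lambda r: r["family_householder"] and has_characteristic(r))


print(estimate_total(persons, lambda r: r["age"] >= 18))     # 18
print(estimate_families(persons, lambda r: r["age"] >= 65))  # 7
```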

The estimation procedure used to assign the weights was performed in geographically defined "weighting areas." Weighting areas were generally formed from adjoining portions of geography that closely agreed with census tabulation areas within counties. Each weighting area was required to have a minimum sample of 400 persons, and weighting areas were never allowed to cross state or county boundaries. In small counties with a sample count of fewer than 400 persons, the minimum sample requirement was relaxed so that the entire county could become a weighting area.
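The text gives the constraints on weighting areas but not the algorithm used to form them. The sketch below is one hypothetical greedy way to satisfy those constraints within a single county, assuming the tabulation areas are supplied in an order in which consecutive entries adjoin; it is not the Bureau's procedure.

```python
MIN_SAMPLE = 400  # minimum sample persons per weighting area (from the text)


def form_weighting_areas(county_areas):
    """Group one county's tabulation areas into weighting areas.

    county_areas: list of (area_id, sample_person_count), ordered so that
        consecutive entries adjoin (a simplifying assumption).
    Returns a list of weighting areas, each a list of area ids. Working one
    county at a time means no weighting area crosses a county (or state) line.
    """
    if sum(n for _, n in county_areas) < MIN_SAMPLE:
        # Small county: the requirement is relaxed and the county is one area.
        return [[a for a, _ in county_areas]]
    areas, current, count = [], [], 0
    for area_id, n in county_areas:
        current.append(area_id)
        count += n
        if count >= MIN_SAMPLE:
            areas.append(current)
            current, count = [], 0
    if current:
        # Leftover areas below the minimum join the last weighting area.
        areas[-1].extend(current)
    return areas


print(form_weighting_areas([("A", 250), ("B", 200), ("C", 100)]))  # [['A', 'B', 'C']]
print(form_weighting_areas([("A", 420), ("B", 390), ("C", 30)]))   # [['A'], ['B', 'C']]
```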

Within a weighting area, the ratio estimation procedure for persons was performed in three stages. The first stage employed seventeen household-type groups, the second stage used two groups (householders and nonhouseholders), and the third stage could potentially use 160 age-sex-race-Spanish origin groups. The stages were as follows:

Stage I - Type of Household

Group
Persons in Housing Units with a Family with Own Children Under 18:
1 2 persons in housing unit
2 3 persons in housing unit
3 4 persons in housing unit
4 5 to 7 persons in housing unit
5 8-or-more persons in housing unit
Persons in Housing Units with a Family without Own Children under 18:
6-10 2 persons in housing unit through 8-or-more persons in housing unit
Persons in All Other Housing Units:
11 1 person in housing unit
12-16 2 persons in housing unit through 8-or-more persons in housing unit
17 Persons in Group Quarters
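The stage I numbering follows a regular pattern, so it can be computed rather than looked up. A sketch; the household-type label strings are hypothetical names for the table's headings.

```python
def size_index(persons_in_unit):
    """Index 0-4 for 2, 3, 4, 5 to 7, and 8-or-more persons in the unit."""
    if persons_in_unit < 2:
        raise ValueError("these size classes start at 2 persons")
    if persons_in_unit <= 4:
        return persons_in_unit - 2
    return 3 if persons_in_unit <= 7 else 4


def stage1_person_group(household_type, persons_in_unit=0):
    """Stage I group (1-17) for a sample person."""
    if household_type == "group_quarters":
        return 17
    if household_type == "family_with_own_children":
        return 1 + size_index(persons_in_unit)
    if household_type == "family_without_own_children":
        return 6 + size_index(persons_in_unit)
    if household_type == "other":
        return 11 if persons_in_unit == 1 else 12 + size_index(persons_in_unit)
    raise ValueError(f"unknown household type: {household_type}")


print(stage1_person_group("family_with_own_children", 6))  # 4
```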

Stage II - Householder/Nonhouseholder

Group
1 Householder
2 Nonhouseholder (including persons in group quarters)

Stage III - Age/Sex/Race/Spanish Origin

Group
White Race
Persons of Spanish Origin
Male
1 0 to 4 years of age
2 5 to 14 years of age
3 15 to 19 years of age
4 20 to 24 years of age
5 25 to 34 years of age
6 35 to 44 years of age
7 45 to 64 years of age
8 65 years of age or older
Female
9-16 Same age categories as groups 1 to 8
Persons Not of Spanish origin
17-32 Same age and sex categories as groups 1 to 16
Black Race
33-64 Same age/sex/Spanish origin categories as groups 1 to 32
Asian, Pacific Islander Race
65-96 Same age/sex/Spanish origin categories as groups 1 to 32
Indian (American) or Eskimo or Aleut Race
97-128 Same age/sex/Spanish origin categories as groups 1 to 32
Other Race (includes those races not listed above)
129-160 Same age/sex/Spanish origin categories as groups 1 to 32
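Likewise, the stage III group number is a sum of block offsets: 32 groups per race, 16 per Spanish origin category, 8 per sex, and one per age category. A sketch with hypothetical label strings (it encodes the table as printed, before any collapsing):

```python
from bisect import bisect_right

RACES = ("white", "black", "asian_pacific_islander",
         "american_indian_eskimo_aleut", "other")
AGE_LOWER_BOUNDS = (0, 5, 15, 20, 25, 35, 45, 65)  # starts of age categories 1-8


def stage3_group(race: str, spanish_origin: bool, sex: str, age: int) -> int:
    """Stage III group number (1-160) from the table above."""
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    if age < 0:
        raise ValueError("age must be non-negative")
    race_block = RACES.index(race) * 32                 # 32 groups per race
    origin_block = 0 if spanish_origin else 16          # Spanish origin listed first
    sex_block = 0 if sex == "male" else 8               # males listed first
    age_category = bisect_right(AGE_LOWER_BOUNDS, age)  # 1..8
    return race_block + origin_block + sex_block + age_category


print(stage3_group("black", False, "female", 30))  # 61
```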

Within a weighting area, the first step in the estimation procedure was to assign each sample person record an initial weight. This weight was approximately equal to the inverse of the probability of selecting a person for the census sample: for example, 6 in a 1-in-6 area or 2 in a 1-in-2 area.

The next step in the estimation procedure was to combine, where necessary, the groups within each of the three stages before the repeated ratio estimation, in order to increase the reliability of the ratio estimation procedure. For the first and second stages, any group that did not meet certain criteria concerning the unweighted sample count or the ratio of the complete count to the initially weighted sample count was combined, or collapsed, with another group in the same stage according to a specified collapsing pattern. At the third stage, the "Other" race category was first collapsed with the "White" race category; the collapsing criteria above were then applied, together with an additional criterion concerning the number of complete-count persons in each category.
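Because the text names the quantities tested but not their thresholds, the sketch below uses made-up values (`min_count=10`, ratio bounds of 0.5 to 2.0) and a caller-supplied collapsing pattern. It only illustrates the mechanics of pooling a failing group into its partner and redirecting that group's members; it does not model the stage III pre-collapse of "Other" into "White".

```python
def collapse_groups(stats, partner, min_count=10, ratio_bounds=(0.5, 2.0)):
    """Return {group id: id of the group it is finally combined with}.

    stats:   {group: (unweighted sample count, complete count,
                      initially weighted sample count)}
    partner: {group: group it collapses into}, i.e. the collapsing pattern.
    min_count and ratio_bounds are HYPOTHETICAL thresholds.
    """
    stats = {g: list(v) for g, v in stats.items()}
    target = {g: g for g in stats}
    lo, hi = ratio_bounds

    def fails(g):
        n, complete, weighted = stats[g]
        return n < min_count or weighted == 0 or not lo <= complete / weighted <= hi

    changed = True
    while changed:
        changed = False
        for g in list(stats):
            if g not in partner or not fails(g):
                continue
            p = target[partner[g]]  # the partner may itself have been collapsed already
            if p == g:
                continue
            # Pool the failing group's counts into its partner and drop it.
            for i in range(3):
                stats[p][i] += stats[g][i]
            del stats[g]
            for k, v in target.items():
                if v == g:
                    target[k] = p
            changed = True
    return target


stats = {1: (50, 300, 290), 2: (4, 30, 24), 3: (60, 310, 300)}
print(collapse_groups(stats, partner={2: 1}))  # {1: 1, 2: 1, 3: 3}
```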

As a final step, the initial weights underwent three stages of ratio adjustment using the groups listed above. At the first stage, for each stage I group, the ratio of the complete census count to the sum of the initial weights of the sample persons in the group was computed. The initial weight assigned to each person in a group was then multiplied by the stage I group ratio to produce an adjusted weight. At stage II, the stage I adjusted weights were again adjusted by the ratio of the complete census count to the sum of the stage I weights of the sample persons in each stage II group. Finally, at stage III, the stage II weights were adjusted by the ratio of the complete census count to the sum of the stage II weights of the sample persons in each stage III group. The three stages of adjustment were performed twice (two iterations) in the order given above, and the weights obtained from the second iteration of stage III were assigned to the sample person records. However, to avoid complications in rounding tabulated data, only whole-number weights were assigned. For example, if the final weight for the persons in a particular group was 7.2, then one-fifth of the sample persons in this group were randomly assigned a weight of 8 and the remaining four-fifths received a weight of 7, so that the group's average weight remained 7.2.
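This adjustment is what is now usually called raking (iterative proportional fitting) over three margins, followed by an integerizing step. A minimal sketch, assuming person records are dicts with hypothetical keys `weight`, `stage1`, `stage2`, and `stage3`, and assuming that all persons in a stage I by stage II by stage III cell share one final weight (true when the initial weights in the weighting area are equal):

```python
import math
import random
from collections import defaultdict

STAGES = ("stage1", "stage2", "stage3")


def rake(persons, complete_counts, iterations=2):
    """Iterative three-stage ratio adjustment of person weights, in place.

    persons: dicts holding an initial "weight" and a group id for each stage.
    complete_counts: {stage: {group id: complete census count}}.
    """
    for _ in range(iterations):
        for stage in STAGES:
            # Sum the current weights of the sample persons in each group of this stage.
            sums = defaultdict(float)
            for p in persons:
                sums[p[stage]] += p["weight"]
            # Multiply each weight by (complete count / weighted sample sum) for its group.
            for p in persons:
                p["weight"] *= complete_counts[stage][p[stage]] / sums[p[stage]]
    return persons


def integerize(persons, rng):
    """Replace fractional weights by whole numbers within each cell.

    A fraction (w - floor(w)) of a cell's members, chosen at random,
    gets floor(w) + 1; the rest get floor(w).
    """
    cells = defaultdict(list)
    for p in persons:
        cells[tuple(p[s] for s in STAGES)].append(p)
    for members in cells.values():
        w = members[0]["weight"]
        base = math.floor(w)
        n_up = round((w - base) * len(members))
        rng.shuffle(members)
        for i, p in enumerate(members):
            p["weight"] = base + 1 if i < n_up else base
    return persons


# Usage with the text's example: ten persons in one cell whose final weight is 7.2.
cell = [{"weight": 7.2, "stage1": 1, "stage2": 1, "stage3": 1} for _ in range(10)]
integerize(cell, random.Random(1980))
print(sorted(p["weight"] for p in cell))  # [7, 7, 7, 7, 7, 7, 7, 7, 8, 8]
```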

Separate weights were derived for tabulating the travel time to work, place of work, and migration data items. For persons on questionnaires selected for coding, these weights were obtained by multiplying the weight derived above by the reciprocal of the ED coding rate and then applying a ratio adjustment so that the sum of the weights would agree with the complete-count total population figure.
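A sketch of the POW/MIG weight derivation for one weighting area. The scope of the ratio adjustment is an assumption; the text only says the weights were made to agree with the complete-count total population.

```python
def pow_mig_weights(coded_persons, complete_count_total):
    """POW/MIG weights for one weighting area (assumed scope).

    coded_persons: (full_sample_weight, ed_coding_rate) for each person on a
        questionnaire selected for POW/MIG coding.
    complete_count_total: complete-count total population the weights must sum to.
    """
    # Inflate each full-sample weight by the reciprocal of its ED's coding rate.
    inflated = [w / rate for w, rate in coded_persons]
    # Ratio-adjust so that the weights sum to the complete-count total.
    factor = complete_count_total / sum(inflated)
    return [x * factor for x in inflated]


# Two persons from a half-coded ED and two from a fully coded ED.
weights = pow_mig_weights([(6, 0.5), (6, 0.5), (6, 1.0), (6, 1.0)], complete_count_total=54)
print(weights)  # [18.0, 18.0, 9.0, 9.0]
```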

The ratio estimation procedure for housing units was essentially the same as that for persons. The major difference was that the ratio estimation procedure for occupied housing units was done in two stages and that for vacant housing units in one stage. The first stage for occupied housing units employed sixteen household-type groups, and the second stage could potentially use 190 tenure/race/Spanish origin/value or rent groups. For vacant housing units, three groups were used. The stages of the ratio estimation for housing units were as follows:

Occupied Housing Units

Stage I - Type of Household

Group
Housing Units with a Family with Own Children Under 18:
1 2 persons in housing unit
2 3 persons in housing unit
3 4 persons in housing unit
4 5 to 7 persons in housing unit
5 8-or-more persons in housing unit
Housing Units with a Family Without Own Children Under 18
6-10 2 persons in housing unit through 8-or-more persons in housing unit
All Other Housing Units
11 1 person in housing unit
12-16 2 persons in housing unit through 8-or-more persons in housing unit

Stage II - Tenure/Race and Origin of Householder/Value or Rent

Note: the table directly below applies only to the United States samples. The Puerto Rico samples used a slightly different matrix, available below this table.

Group
Owner:
White race (Householder):
Persons of Spanish origin (Householder)
Value of house
1 $ 1 - $ 9,999
2 $10,000 - $ 19,999
3 $20,000 - $ 24,999
4 $25,000 - $ 49,999
5 $50,000 - $ 99,999
6 $100,000 - $ 149,999
7 $150,000 +
8 Other Owners
Persons not of Spanish Origin
9-16 Same value categories as groups 1 to 8
Black Race:
17-32 Same value - Spanish origin categories as groups 1 to 16
Asian, Pacific Islander Race:
33-48 Same value - Spanish origin categories as groups 1 to 16
Indian (American) or Eskimo or Aleut Race:
49-64 Same value - Spanish origin categories as groups 1 to 16
Other Race (includes those races not listed above):
65-80 Same value - Spanish origin categories as groups 1 to 16
Renter:
White Race:
Persons of Spanish origin
Rent categories
81 $ 1 - $ 59
82 $ 60 - $ 99
83 $ 100 - $ 149
84 $ 150 - $ 199
85 $ 200 - $ 249
86 $ 250 - $ 299
87 $ 300 - $ 399
88 $ 400 - $ 499
89 $ 500 +
90 Other Renter
91 No Cash Rent
Persons not of Spanish origin
92-102 Same rent categories as groups 81 to 91
Black Race:
103-124 Same rent - Spanish origin categories as groups 81 to 102
Asian, Pacific Islander Race:
125-146 Same rent - Spanish origin categories as groups 81 to 102
Indian (American) or Eskimo or Aleut Race:
147-168 Same rent - Spanish origin categories as groups 81 to 102
Other Race (includes those not listed above):
169-190 Same rent - Spanish origin categories as groups 81 to 102
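The United States stage II numbering above can also be computed from block offsets: 16 owner groups per race and 22 renter groups per race, with renters starting after group 80. The sketch below uses hypothetical label strings and lets `None` stand for the "Other Owners" and "Other Renter" categories, whose definitions the table does not give.

```python
from bisect import bisect_right

RACES = ("white", "black", "asian_pacific_islander",
         "american_indian_eskimo_aleut", "other")
VALUE_LOWER_BOUNDS = (1, 10_000, 20_000, 25_000, 50_000, 100_000, 150_000)  # categories 1-7
RENT_LOWER_BOUNDS = (1, 60, 100, 150, 200, 250, 300, 400, 500)              # categories 1-9


def owner_group(race, spanish_origin, value=None):
    """Owner group (1-80); value=None stands for 'Other Owners'."""
    if value is None:
        category = 8
    elif value >= 1:
        category = bisect_right(VALUE_LOWER_BOUNDS, value)
    else:
        raise ValueError("value must be at least $1")
    return RACES.index(race) * 16 + (0 if spanish_origin else 8) + category


def renter_group(race, spanish_origin, rent=None, no_cash_rent=False):
    """Renter group (81-190); rent=None stands for 'Other Renter'."""
    if no_cash_rent:
        category = 11
    elif rent is None:
        category = 10
    elif rent >= 1:
        category = bisect_right(RENT_LOWER_BOUNDS, rent)
    else:
        raise ValueError("rent must be at least $1")
    return 80 + RACES.index(race) * 22 + (0 if spanish_origin else 11) + category


print(owner_group("white", True, 15_000), renter_group("black", False, 175))  # 2 117
```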

Stage II - Tenure/Value or Rent [Puerto Rico only]

Note: the table directly below applies only to the Puerto Rico samples. The United States samples used a slightly different version of this table, available above.

Group
Owner:
Value of house
1 $ 0 - $ 1,999
2 $2,000 - $ 4,999
3 $5,000 - $ 9,999
4 $10,000 - $ 19,999
5 $20,000 - $ 49,999
6 $50,000 - $ 74,999
7 $75,000 +
8 Other Owners
Renter:
Rent categories
9 $ 1 - $ 29
10 $ 30 - $ 59
11 $ 60 - $ 99
12 $ 100 - $ 149
13 $ 150 - $ 199
14 $ 200 - $ 249
15 $ 250 - $ 299
16 $ 300 - $ 399
17 $ 400 +
18 Other Renter
19 No Cash Rent

Vacant Housing Units

Group
1 Vacant for Rent
2 Vacant for Sale
3 Other Vacant

The estimates produced by this procedure realize some of the gains in sampling efficiency that would have resulted if the population had been stratified into the ratio estimation groups before sampling and the sampling rate had been applied independently to each group. The net effect is a reduction in both the standard error and the possible bias of most estimated characteristics, to levels below those that would have resulted from simply using the initial (unadjusted) weight. A by-product of this estimation procedure is that the estimates from the sample will, for the most part, be consistent with the complete-count figures for the population and housing unit groups used in the estimation procedure.

Selection of the Public-Use Microdata Samples

A stratified systematic selection procedure with probability proportional to a measure of size was used to select each public-use microdata sample. The sampling elements were the occupied housing unit (including all occupants), the person in group quarters, and the vacant housing unit. The measure of size was the full sample weight that resulted from the 1980 census ratio estimation procedure described above.

It was also necessary to employ a subsampling scheme to yield microdata samples with a consistent proportion of cases, from area to area, for which place of work, travel time, and migration were coded. The subsampling scheme occasionally designated selected microdata sample elements for which the place of work, travel time, and migration information was blanked. It was instituted so that the POW/MIG data would be uniformly available for one-half of all microdata cases, rather than for one-half in most areas but more than one-half in others (the areas where all sample questionnaires had been coded). Thus, each 1-percent microdata sample gives a 0.5-percent sample of records containing POW/MIG data, and the 5-percent microdata sample gives a 2.5-percent sample for POW/MIG data. The subsampling scheme was also based on probability-proportional-to-size sampling, using measures of size based on both the POW/MIG half-sample weights and the full sample weights.

The sample selection procedures were as follows. First, the sample units were stratified during the selection process. This stratification was intended to improve the reliability of the 5-percent, 1-percent, and 0.1-percent samples by defining strata within which there is an appreciable degree of homogeneity among the census sample households with respect to characteristics of major interest.

A total of 102 strata were defined: 72 strata for persons living in occupied housing units, 24 strata for persons in group quarters (GQ), and 6 strata for vacant housing units. The strata are shown in Figures 1, 2, and 3.

The sample selection procedures were applied on a state-by-state basis to obtain the microdata samples. Briefly, for any particular state, the procedure created a number of cells in the computer, one corresponding to each of the strata defined above, and assigned a random starting value to each cell. The sample edited detail file (i.e., the internal-use microdata from the full census sample) was then passed, and the appropriate weight of each sample housing unit or GQ person was cumulated into the cell corresponding to that unit's or person's stratum. For occupied housing units, the full sample person weight assigned to the householder of the unit was used; for GQ persons, the full sample person weight was used; and for vacant housing units, the full sample housing unit weight was used.

For a given 1-percent sample, when a unit/person caused the cumulation to exceed 100, that unit/person was designated for the sample, and the value of the cell was reset. The procedure was then repeated. For the 5-percent sample selection, the procedure was the same except that the cumulation cut-off was 20 instead of 100. The starting value of each cell was set so as to minimize the likelihood that any one case would be selected into more than one public-use microdata sample, and the overlap among the samples may be considered negligible. There is a small probability that a given individual unit (one with a high census weight) may have been selected into the 5-percent sample more than once, but this duplication should not have any particularly undesirable consequences.
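The cumulation procedure is a form of systematic probability-proportional-to-size selection run separately in each stratum cell. In the sketch below, "reset" is read as subtracting the cut-off from the cell (an assumption; the text does not define it), which is what lets a unit whose weight exceeds the cut-off be selected more than once. The record layout and starting values are hypothetical (in practice each start would be a random value below the cut-off), and the Bureau's choice of starting values to limit overlap between samples is not modeled.

```python
def select_microdata(records, cutoff, starts):
    """Select units by cumulating weights into stratum cells.

    records: (unit_id, stratum, weight) in file order for one state. The weight
        is the householder's person weight (occupied units), the person weight
        (GQ persons), or the housing unit weight (vacant units).
    cutoff:  100 for a 1-percent sample, 20 for the 5-percent sample.
    starts:  starting value for each stratum cell.
    """
    cells = dict(starts)
    selected = []
    for unit_id, stratum, weight in records:
        cells[stratum] += weight
        # Each time the cell exceeds the cut-off, the unit that caused it is selected.
        while cells[stratum] > cutoff:
            selected.append(unit_id)
            cells[stratum] -= cutoff  # assumed meaning of "reset"
    return selected


# Ten units of weight 6 in stratum "A" and one heavy unit (weight 45) in stratum "B".
records = [(f"u{i}", "A", 6) for i in range(1, 11)] + [("big", "B", 45)]
print(select_microdata(records, cutoff=20, starts={"A": 0, "B": 0}))
# ['u4', 'u7', 'big', 'big']
```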

The POW/MIG subsampling operation was performed by first assigning each selected microdata unit from the POW/MIG coded strata a measure equal to the ratio of its POW/MIG half-sample weight to its full sample weight. These measures were cumulated over the selected microdata sample units until the cumulation exceeded 2. The POW/MIG data were retained for the units that caused the cumulation to exceed 2; for all other units, the POW/MIG information was blanked.
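The sketch below illustrates why this yields a uniform one-half. In an ED where only half the questionnaires were coded, a coded unit's half-sample weight is roughly twice its full sample weight (measure near 2), so it is nearly always retained; in an ED where everything was coded, the two weights are roughly equal (measure near 1), so about every second unit is retained. Treating an uncoded unit's half-sample weight as 0, subtracting 2 after each retention, and the starting value are all assumptions.

```python
def pow_mig_keep(units, start=0.0, cutoff=2.0):
    """Return ids of selected microdata units whose POW/MIG data are retained.

    units: (unit_id, pow_mig_half_sample_weight, full_sample_weight) for
        selected units in the POW/MIG coded strata, in file order.
    start: hypothetical starting value of the cumulation.
    """
    total = start
    kept = []
    for unit_id, half_weight, full_weight in units:
        total += half_weight / full_weight  # the unit's measure
        if total > cutoff:
            kept.append(unit_id)  # POW/MIG data retained
            total -= cutoff       # assumed restart of the cumulation
        # otherwise the unit's POW/MIG items would be blanked
    return kept


units = [("a1", 6, 6), ("a2", 6, 6), ("a3", 6, 6), ("a4", 6, 6),  # fully coded ED
         ("b1", 12, 6), ("b2", 0, 6), ("b3", 12, 6), ("b4", 0, 6)]  # half-coded ED
print(pow_mig_keep(units, start=0.5))  # ['a2', 'a4', 'b1', 'b3']
```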

Selection of One-In-One-Thousand and Other Subsamples

During the sample selection operation, consecutive two-digit subsample numbers from 00 to 99 were assigned to each sample case in the 5-percent and 1-percent samples, to allow for the designation of subsamples of various sizes and for the calculation of standard errors. For example, for a B or C 1-percent public-use microdata sample, choosing the records whose subsample numbers have the same "units" digit (e.g., "units" digit 1 covers subsample numbers 01, 11, 21, ... 91) will provide a one-in-one-thousand subsample.

The Bureau has chosen one one-in-one-thousand subsample from each of the A, B, and C public-use microdata samples. The one-in-one-thousand subsample from the A Sample was obtained by selecting the records with a subsample number of 13 or 63. The one-in-one-thousand subsamples from the B and C Samples were obtained by selecting the records whose subsample number has a "units" digit of 4 (B Sample) or 9 (C Sample), ignoring the tens digit of the two-digit subsample number.
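The selection rules above translate directly into filters on the subsample number. Starting the 00-99 cycle at 00 is an assumption, and the function names are hypothetical. Keeping 1 of 10 "units" digits from a 1-percent sample gives 0.1 percent; keeping 2 of the 100 numbers (1 in 50) from the A Sample gives 0.1 percent only if the A Sample is the 5-percent sample, which this arithmetic implies.

```python
def assign_subsample_numbers(records):
    """Pair each record, in selection order, with a cycling number 00-99.

    Starting the cycle at 00 is an assumption.
    """
    return [(rec, i % 100) for i, rec in enumerate(records)]


def one_in_one_thousand(sample_name, numbered):
    """Return records of the Bureau's one-in-one-thousand subsample of sample A, B, or C."""
    rules = {
        "A": lambda n: n in (13, 63),  # 2 of every 100 numbers: 1/50 of the A Sample
        "B": lambda n: n % 10 == 4,    # units digit 4: 1/10 of a 1-percent sample
        "C": lambda n: n % 10 == 9,    # units digit 9
    }
    keep = rules[sample_name]
    return [rec for rec, n in numbered if keep(n)]


numbered = assign_subsample_numbers(range(1000))
print(len(one_in_one_thousand("A", numbered)), len(one_in_one_thousand("B", numbered)))  # 20 100
```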

Samples of any size between 1/20 and 1/10,000 may be selected in a similar manner by using appropriate two-digit subsample numbers assigned to the A, B, or C microdata samples. Care must be exercised when selecting such samples. If only one "units" digit is required, it should be selected at random. If two "units" digits are required, the first should be selected at random and the second should be either five more or five less than the first. Failure to use this procedure, e.g., selecting records with the same "tens" digit instead of records with the same "units" digit, would still provide a one-in-ten subsample, but one that would be more clustered, because cases with consecutive subsample numbers share the same "tens" digit, and that would as a result be subject to larger sampling error.
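A sketch of the digit-selection rule, using Python's `random.Random` for the random choice; the function names are hypothetical. For any digit 0 through 9, exactly one of "five more" and "five less" is itself a digit, so the second digit is `(first + 5) % 10`.

```python
import random


def choose_units_digits(how_many, rng):
    """Pick one or two units digits: the first at random, the second five away."""
    if how_many not in (1, 2):
        raise ValueError("the rule covers one or two units digits")
    first = rng.randrange(10)
    if how_many == 1:
        return [first]
    # Exactly one of first + 5 and first - 5 lies in 0..9; (first + 5) % 10 is that one.
    return [first, (first + 5) % 10]


def subsample(numbered, digits):
    """Keep records whose two-digit subsample number ends in one of `digits`."""
    return [rec for rec, n in numbered if n % 10 in digits]


rng = random.Random(7)
digits = choose_units_digits(2, rng)
numbered = [(i, i % 100) for i in range(1000)]
print(len(subsample(numbered, digits)))  # 200
```

Applied to a 1-percent sample, two digits chosen this way give a 0.2-percent subsample spread evenly across the selection order.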

ENDNOTES:

  1. Originally published as "Chapter 4, Sample Design for the Public-Use Microdata Samples," Census of Population and Housing, 1980: Public-use Microdata Samples Technical Documentation, U.S. Department of Commerce, Bureau of the Census, Washington, DC, 1983, pp. 35-42.
