1990 Sample Design and Estimation [1]
This chapter discusses the selection procedure for the public use microdata samples in terms of four major operations: (1) the selection of the full 1990 census sample, (2) the estimation procedure for the full census sample, (3) the selection of the public use microdata samples from the persons and housing units included in the full 1990 census sample, and (4) estimation for the PUMS samples.
Producing Estimates or Tabulations
Estimation of totals and percentages. The 1980 Public Use Microdata Samples (PUMS) were self-weighted. It is very important to note that the 1990 PUMS samples are not self-weighted. To produce estimates or tabulations of 100-percent characteristics from the PUMS files, simply add the weights of all persons or housing units that possess the characteristic of interest. For instance, if the characteristic of interest is the total number of Hispanic males aged 5-17, simply determine the sex, age, and Hispanic origin of all persons and cumulate the weights of those who match the characteristic of interest. The PUMS weight is a function of the full census sample weight and the PUMS sample design.
To get estimates of proportions, simply divide the weighted estimate of persons or housing units with a given characteristic by the weighted estimate of the base. For example, the proportion of owner-occupied housing units with plumbing is obtained by dividing the PUMS estimate of owner-occupied housing units with plumbing facilities by the PUMS estimate of total housing units.
To get estimates of characteristics such as the total number of related children in households (for housing unit level aggregates), simply multiply the PUMS weight by the value of the characteristic and sum across all household records. If the desired estimate is the number of households with at least one related child in the household, add the PUMS householder weight for all households with a value not equal to zero for the characteristic.
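The three kinds of estimates described above (totals, proportions, and household-level aggregates) can be sketched as follows. This is a minimal illustration, not the Bureau's tabulation software: the record layout and the field names `weight`, `owner_occupied`, `has_plumbing`, and `related_children` are hypothetical stand-ins for the corresponding PUMS items, and the four records are invented.

```python
from dataclasses import dataclass

@dataclass
class HousingRecord:
    # Hypothetical field names; real PUMS records use their own item names.
    weight: int            # PUMS housing unit (householder) weight
    owner_occupied: bool
    has_plumbing: bool
    related_children: int  # number of related children in the household

def weighted_total(records, predicate):
    """Sum the weights of the records that possess the characteristic."""
    return sum(r.weight for r in records if predicate(r))

def weighted_proportion(records, predicate):
    """Weighted estimate with the characteristic divided by the weighted base."""
    return weighted_total(records, predicate) / weighted_total(records, lambda r: True)

def weighted_aggregate(records, value):
    """Multiply each weight by the value of the characteristic and sum."""
    return sum(r.weight * value(r) for r in records)

# Invented example data: four household records.
households = [
    HousingRecord(20, True, True, 2),
    HousingRecord(25, True, False, 0),
    HousingRecord(15, False, True, 1),
    HousingRecord(40, False, True, 0),
]

def is_owner_with_plumbing(r):
    return r.owner_occupied and r.has_plumbing

owner_with_plumbing = weighted_total(households, is_owner_with_plumbing)
share = weighted_proportion(households, is_owner_with_plumbing)
children = weighted_aggregate(households, lambda r: r.related_children)
households_with_children = weighted_total(households, lambda r: r.related_children != 0)
```

With these invented records, the estimated number of owner-occupied units with plumbing is 20 out of a weighted base of 100 housing units, the estimated number of related children is 20×2 + 15×1 = 55, and the estimated number of households with at least one related child is 20 + 15 = 35.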
Sample Design
Every person and housing unit in the United States was asked certain basic demographic questions (for example, race, age, relationship, housing value, or rent). A sample of these persons and housing units was asked more detailed questions about such items as income, occupation, and housing costs in addition to the basic demographic and housing information. The primary sampling unit for the 1990 census was the housing unit, including all occupants. For persons living in group quarters, the sampling unit was the person. Persons in group quarters were sampled at a 1-in-6 rate.
The sample designation method depended on the data collection procedures. Approximately 95 percent of the population was enumerated by the mailback procedure. In these areas, the Bureau of the Census either purchased a commercial mailing list, which was updated by the United States Postal Service and Census Bureau field staff, or prepared a mailing list by canvassing and listing each address in the area prior to Census Day. These lists were computerized and the appropriate units were electronically designated as sample units. The questionnaires were either mailed or hand delivered to the addresses with instructions on how to complete and mail back the form.
Housing units in governmental units with a precensus (1988) estimated population of fewer than 2,500 persons were sampled at 1-in-2. Governmental units were defined for sampling purposes as all incorporated places, all counties, all county equivalents such as parishes in Louisiana, and all minor civil divisions in Connecticut, Maine, Massachusetts, Michigan, Minnesota, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, Vermont, and Wisconsin. Housing units in census tracts and block numbering areas (BNAs) with a precensus count of fewer than 2,000 housing units were sampled at 1-in-6 for those portions not in small governmental units. Housing units in census tracts and BNAs with 2,000 or more housing units were sampled at 1-in-8 for those portions not in small governmental units.
In list/enumerate areas (about 5 percent of the population) each enumerator was given a blank address register with designated sample lines. Beginning about Census Day, the enumerator systematically canvassed the area and listed all housing units in the address register in the order they were encountered. Completed questionnaires, including sample information for any housing unit listed on a designated sample line, were collected. For all governmental units with fewer than 2,500 persons in list/enumerate areas, a 1-in-2 sampling rate was used. All other list/enumerate areas were sampled at 1-in-6.
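The housing unit sampling rates described above can be summarized as a small decision rule. The sketch below is only an illustration under simplifying assumptions: the function name and parameters are hypothetical, group quarters (sampled at 1-in-6 by person) are not covered, and the special rules for American Indian reservations, Alaska Native villages, trust lands, and Hawaii are not modeled.

```python
def sampling_rate_denominator(gov_unit_population, tract_housing_units,
                              list_enumerate=False):
    """Return k for a 1-in-k housing unit sampling rate.

    gov_unit_population: precensus (1988) population estimate of the
        governmental unit containing the housing unit.
    tract_housing_units: precensus housing unit count of the tract or BNA.
    list_enumerate: True for list/enumerate areas, False for mailback areas.
    """
    # Small governmental units are sampled at 1-in-2 in both kinds of area.
    if gov_unit_population < 2500:
        return 2
    # All other list/enumerate areas are sampled at 1-in-6.
    if list_enumerate:
        return 6
    # Mailback areas: small tracts/BNAs at 1-in-6, the remainder at 1-in-8.
    return 6 if tract_housing_units < 2000 else 8

rate_small_town = sampling_rate_denominator(1800, 500)
rate_small_tract = sampling_rate_denominator(50000, 1500)
rate_large_tract = sampling_rate_denominator(50000, 2400)
rate_list_enumerate = sampling_rate_denominator(50000, 2400, list_enumerate=True)
```

The four calls illustrate the four cases: a small governmental unit (1-in-2), a small tract in a larger governmental unit (1-in-6), a large tract (1-in-8), and a list/enumerate area outside a small governmental unit (1-in-6).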
Housing units in American Indian reservations, Tribal Jurisdiction Statistical Areas, and Alaska Native villages were sampled according to the same criteria as other governmental units, except that the sampling rates were based on the size of the American Indian and Alaska Native population in those areas as measured in the 1980 census. Trust lands were sampled at the same rate as their associated American Indian reservations. Census designated places in Hawaii were sampled at the same rate as governmental units because the Census Bureau does not recognize incorporated places in Hawaii.
The purpose of using variable sampling rates was to provide relatively more reliable estimates for small areas and decrease respondent burden in more densely populated areas while maintaining data reliability. When all sampling rates were taken into account across the nation, approximately one out of every six housing units in the nation was included in the 1990 census sample.
Estimation Procedure
Estimates from the census sample were obtained from an iterative ratio estimation procedure (iterative proportional fitting) resulting in the assignment of a weight to each sample person or housing unit record. For any given tabulation area, a characteristic total was estimated by summing the weights assigned to the persons or housing units possessing the characteristic in the tabulation area. Estimates of family or household characteristics were based on the weight assigned to the person designated as householder. Each sample person or housing unit record was assigned exactly one weight to be used to produce estimates of all characteristics. For example, if the weight given to a sample person or housing unit had the value 6, all characteristics of that person or housing unit would be tabulated with the weight of 6. The estimation procedure, however, did assign weights varying from person to person or housing unit to housing unit. The estimation procedure used to assign the weight was performed in geographically defined "weighting areas". Weighting areas were generally formed of contiguous portions of geography which closely agreed with census tabulation areas within counties. Weighting areas were never allowed to cross state or county boundaries. In small counties with a sample count below 400 persons, the minimum required sample condition was relaxed to permit the entire county to become a weighting area.
Within a weighting area, the ratio estimation procedure for persons was performed in four stages. For persons, the first stage applied 17 household-type groups. The second stage used two groups: sampling rate of 1-in-2 and sampling rate below 1-in-2. The third stage used the dichotomy householder/non-householder. The fourth stage applied 180 aggregate age/sex/Hispanic origin/race categories.
The stages were as follows:
Stage I. Type of Household | |
---|---|
Group | Persons in Housing Units with a Family with Own Children Under 18 |
1 | 2 persons in housing unit |
2 | 3 persons in housing unit |
3 | 4 persons in housing unit |
4 | 5 to 7 persons in housing unit |
5 | 8 or more in housing unit |
Persons in Housing Units with a Family without Own Children Under 18 | |
6-10 | 2 through 8 or more persons in housing unit |
Persons in All Other Housing Units | |
11 | 1 person in housing unit |
12-16 | 2 through 8 or more persons in housing unit |
Persons in Group Quarters | |
17 | Persons in Group Quarters |
Stage II. Sampling Rates | |
1 | Sampling rate of 1-in-2 |
2 | Sampling rate less than 1-in-2 |
Stage III. Householder/Nonhouseholder | |
1 | Householder |
2 | Nonhouseholder |
Stage IV. Age/Sex/Hispanic Origin/Race | |
Group | White: |
Persons of Hispanic Origin | |
Male | |
1 | 0 to 4 years of age |
2 | 5 to 14 years of age |
3 | 15 to 19 years of age |
4 | 20 to 24 years of age |
5 | 25 to 34 years of age |
6 | 35 to 54 years of age |
7 | 55 to 64 years of age |
8 | 65 to 74 years of age |
9 | 75 years of age or older |
Female | |
10-18 | Same age categories as groups 1 through 9 |
Persons Not of Hispanic Origin | |
19-36 | Same age and sex categories as groups 1 through 18 |
Black: | |
37-72 | Same age/sex/Hispanic origin categories as groups 1 through 36 |
Asian or Pacific Islander: | |
73-108 | Same age/sex/Hispanic origin categories as groups 1 through 36 |
American Indian, Eskimo or Aleut: | |
109-144 | Same age/sex/Hispanic origin categories as groups 1 through 36 |
Other Race (includes those races not listed above): | |
145-180 | Same age/sex/Hispanic origin categories as groups 1 through 36 |
Within a weighting area, the first step in the estimation procedure was to assign an initial weight to each sample person record. This weight was approximately equal to the inverse of the probability of selecting a person for the census sample.
The next step in the estimation procedure, prior to iterative proportional fitting, was to combine categories in each of the four estimation stages, when needed, to increase the reliability of the ratio estimation procedure. For each stage, any group that did not meet certain criteria for the unweighted sample count or for the ratio of the 100-percent count to the initially weighted sample count was combined or collapsed with another group in the same stage according to a specified collapsing pattern. At the fourth stage, an additional criterion concerning the number of 100-percent persons in each race/Hispanic origin category was applied.
As the final step, the initial weights underwent four stages of ratio adjustment applying the grouping procedures described above. At the first stage, the ratio of the 100-percent count to the sum of the initial weights of the sample persons was computed for each stage I group. The initial weight assigned to each person in a group was then multiplied by the stage I group ratio to produce an adjusted weight.
In stage II, the stage I adjusted weights were again adjusted by the ratio of the 100-percent count to the sum of the stage I weights for sample persons in each stage II group. Next, at stage III, the stage II weights were adjusted by the ratio of the 100-percent count to the sum of the stage II weights for sample persons in each stage III group. Finally, at stage IV, the stage III weights were adjusted by the ratio of the 100-percent count to the sum of the stage III weights for sample persons in each stage IV group. The four stages of ratio adjustment were performed two times (two iterations) in the order given above. The weights obtained from the second iteration for stage IV were assigned to the sample person records. However, to avoid complications in rounding for tabulated data, only whole number weights were assigned. For example, if the final weight of the persons in a particular group was 7.25, then 1/4 of the sample persons in this group were randomly assigned a weight of 8, while the remaining 3/4 received a weight of 7.
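The staged ratio adjustment and the whole-number weight assignment can be sketched as below. This is a simplified illustration rather than the Bureau's production procedure: group collapsing is omitted, only two stages are used for brevity, and the record keys (`household_type`, `age_sex`, `weight`), the group labels, and the 100-percent counts are all hypothetical. The rounding function reproduces the 7.25 example from the text; the exact randomization the Bureau used is not described here and is an assumption.

```python
import math
import random
from collections import defaultdict

def ratio_adjust(records, controls, stages, iterations=2):
    """Adjust weights in place, stage by stage, for the given number of iterations."""
    for _ in range(iterations):
        for stage in stages:
            # Sum the current weights of the sample persons in each group.
            weight_sums = defaultdict(float)
            for rec in records:
                weight_sums[rec[stage]] += rec["weight"]
            # Multiply each weight by (100-percent count / weighted sample count).
            for rec in records:
                group = rec[stage]
                rec["weight"] *= controls[stage][group] / weight_sums[group]
    return records

def whole_number_weights(weight, n_records, rng):
    """Split a fractional weight into floor and floor + 1, preserving the total."""
    low = math.floor(weight)
    n_high = round((weight - low) * n_records)  # records that receive low + 1
    weights = [low + 1] * n_high + [low] * (n_records - n_high)
    rng.shuffle(weights)  # random assignment of the larger weight
    return weights

# Hypothetical weighting area: four sample persons, initial weight 6 each.
sample = [
    {"household_type": "A", "age_sex": "X", "weight": 6.0},
    {"household_type": "A", "age_sex": "Y", "weight": 6.0},
    {"household_type": "B", "age_sex": "X", "weight": 6.0},
    {"household_type": "B", "age_sex": "Y", "weight": 6.0},
]
# Hypothetical 100-percent counts for each group in each stage.
controls = {
    "household_type": {"A": 14, "B": 10},
    "age_sex": {"X": 13, "Y": 11},
}
ratio_adjust(sample, controls, stages=["household_type", "age_sex"])

age_sex_totals = defaultdict(float)
for rec in sample:
    age_sex_totals[rec["age_sex"]] += rec["weight"]

rounded = whole_number_weights(7.25, 4, random.Random(0))
```

After the final stage, the weighted totals for the last stage's groups match their 100-percent counts exactly (up to floating-point error), while the earlier stages agree only approximately. For four persons sharing a final weight of 7.25, one receives a weight of 8 and three receive 7, so the group total of 29 is preserved.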
The ratio estimation procedure for housing units was essentially the same as that for persons except that vacant units were treated differently. The occupied housing unit ratio estimation procedure was done in four stages, and the vacant housing unit ratio estimation procedure was done in a single stage. The first stage for occupied housing units applied 16 household type categories, while the second stage used the two sampling categories described above for persons. The third stage applied three units-in-building categories, i.e., single units, multiunit with fewer than 10 units, and multiunit with 10 or more units. The fourth stage could potentially use 200 tenure/race/Hispanic origin/value or rent groups. The stages for ratio estimation for housing units were as follows:
Occupied Housing Units | |
---|---|
Stage I. Type of Household | |
Group | Housing Units with a Family with Own Children Under 18 |
1 | 2 persons in housing unit |
2 | 3 persons in housing unit |
3 | 4 persons in housing unit |
4 | 5 to 7 persons in housing unit |
5 | 8 or more in housing unit |
Housing Units with a Family without Own Children Under 18 | |
6-10 | 2 through 8 or more persons in housing unit |
All Other Housing Units | |
11 | 1 person in housing unit |
12-16 | 2 persons in housing unit through 8 or more persons in housing unit |
Stage II. Sampling Rates | |
1 | Sampling rate of 1-in-2 |
2 | Sampling rate less than 1-in-2 |
Stage III. Units in Building | |
1 | Single unit structure |
Multiunit consisting of: | |
2 | Fewer than 10 individual units |
3 | 10 or more individual units |
Stage IV. Tenure/Race & Origin of Householder/Value or Rent | |
Group | OWNER |
White Householder: | |
Householder of Hispanic Origin | |
Value of Housing Unit: | |
1 | Less than $20,000 |
2 | $20,000 to $39,999 |
3 | $40,000 to $59,999 |
4 | $60,000 to $79,999 |
5 | $80,000 to $99,999 |
6 | $100,000 to $149,999 |
7 | $150,000 to $249,999 |
8 | $250,000 to $299,999 |
9 | $300,000+ |
10 | Other |
Householder Not of Hispanic Origin | |
11-20 | Same value categories as groups 1 through 10 |
Black Householder: | |
21-40 | Same Hispanic origin/value categories as groups 1 through 20 |
Asian or Pacific Islander Householder: | |
41-60 | Same Hispanic origin/value categories as groups 1 through 20 |
American Indian, Eskimo or Aleut Householder: | |
61-80 | Same Hispanic origin/value categories as groups 1 through 20 |
Other Race Householder (includes those races not listed above): | |
81-100 | Same Hispanic origin/value categories as groups 1 through 20 |
Group | RENTER |
White Householder: | |
Householder of Hispanic Origin | |
Rent Categories: | |
101 | Less than $100 |
102 | $100 to $199 |
103 | $200 to $299 |
104 | $300 to $399 |
105 | $400 to $499 |
106 | $500 to $599 |
107 | $600 to $749 |
108 | $750 to $999 |
109 | $1,000+ |
110 | No cash rent |
Householder Not of Hispanic Origin | |
111-120 | Same rent categories as groups 101 through 110 |
Black Householder: | |
121-140 | Same Hispanic origin/rent categories as groups 101 through 120 |
Asian or Pacific Islander Householder: | |
141-160 | Same Hispanic origin/rent categories as groups 101 through 120 |
American Indian, Eskimo or Aleut Householder: | |
161-180 | Same Hispanic origin/rent categories as groups 101 through 120 |
Other Race Householder: | |
181-200 | Same Hispanic origin/rent categories as groups 101 through 120 |
Vacant Housing Unit | |
Group | |
1 | Vacant for Rent |
2 | Vacant for Sale |
3 | Other Vacant |
The estimates produced by this procedure realize some of the gains in sampling efficiency that would have resulted if the population had been stratified into the ratio-estimation groups before sampling, and the sampling rate had been applied independently to each group. The net effect is a reduction in both the standard error and the possible bias of most estimated characteristics to levels below what would have resulted from simply using the initial (unadjusted) weight. A by-product of this estimation procedure is that the estimates from the sample will for the most part be consistent with the 100-percent figures for the population and housing unit groups used in the estimation procedure.
Selection of the Public Use Microdata Samples
A stratified systematic selection procedure with equal probability was used to select each of the public use microdata samples. The sampling universe was defined as all occupied housing units including all occupants, vacant housing units, and GQ persons in the census sample. The sample units were stratified during the selection process. The stratification was intended to improve the reliability of estimates derived from the public use microdata samples by defining strata within which there is a high degree of homogeneity among the census sample households with respect to characteristics of major interest.
A total of 1,049 strata were defined: 936 household strata, 104 strata for GQ persons, and 9 strata for vacant housing units. First, the units were divided into three major groups: households, vacant housing units, and GQ population. The household universe was stratified by family type and non-family, race/Hispanic origin of the householder, tenure, and age within sampling stratum. For the census sample selection, the population was stratified by geographic size into three sampling strata: units in small governmental units were sampled at 1-in-2, units in small tracts/BNAs were sampled at 1-in-6, and the remainder of the units were sampled at 1-in-8.
The vacant housing units universe was stratified by vacancy status and sampling rate. Finally, the GQ population was stratified by GQ type (institutions, non-institutions), race, Hispanic origin, and age. The stratification matrices are defined in tables 1, 2, and 3.
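The stratum counts quoted above can be checked against the layout of Tables 1, 2, and 3. The factorization below is an inference from those tables, not a breakdown stated in the text: it assumes 13 race/origin groups (White/Other split by Hispanic origin, the ten Asian or Pacific Islander columns, and Black/American Indian, Eskimo or Aleut), two GQ types, and four age groups. All variable names are hypothetical.

```python
# Factors read off the stratification matrices (an interpretation of the tables).
household_types = 3      # family with children, family without, non-family
age_groups = 4           # 0-59, 60-74, 75-89, 90+
race_origin_groups = 13  # assumed count of race/origin columns
tenure_classes = 2       # owner (O), renter (R)
sampling_rates = 3       # 1-in-2, 1-in-6, 1-in-8
gq_types = 2             # the two GQ type columns of Table 3
vacancy_statuses = 3     # for sale, for rent, other

household_strata = (household_types * age_groups * race_origin_groups
                    * tenure_classes * sampling_rates)
gq_strata = gq_types * race_origin_groups * age_groups
vacant_strata = vacancy_statuses * sampling_rates
total_strata = household_strata + gq_strata + vacant_strata
```

Under these assumptions the products reproduce the published figures: 936 household strata, 104 GQ strata, and 9 vacant housing unit strata, for a total of 1,049.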
Subsampling the PUMS Files
The sample selection procedures were as follows: The number of 1-percent subsamples for a given state was determined by the full census sample size for that state. For instance, if the full census sample for a state was 20 percent, then the census sample was divided into 20 subsamples of equal size. The 1-percent public use microdata sample was designated at random from the 20 subsamples. From the remaining 19 subsamples, five 1-percent subsamples were designated at random and merged to produce the 5-percent public use microdata sample. The 3-percent elderly public use microdata sample was produced in the same way as the 5-percent sample but required an extra step. Three 1-percent subsamples were merged, and the elderly household and person records (households with at least one person aged 60 years or older, and GQ persons aged 60 years or older) were selected and designated as the elderly PUMS file.
During the sample selection operation, consecutive two-digit subsample numbers from 00 to 99 were assigned to each sample case in the 5-percent and 1-percent samples to allow for the designation of various size subsamples and to allow for the calculation of standard errors. As an example, for a 1-percent public use microdata sample, the choice of records having subsample numbers with the same "units" digit (e.g., the "units" digit 2 includes subsample numbers 02, 12, 22, ..., 92) will provide a 1-in-1,000 subsample.
Samples of any size between 1/20 and 1/10,000 may be selected in a similar manner by using appropriate two-digit subsample numbers assigned to either of the microdata samples. Care must be exercised when selecting such samples. If only one "units" digit is required, the units digit should be randomly selected. If two "units" digits are required, the first should be randomly selected and the second should be either 5 more or 5 less than the first. Failure to use this procedure, e.g., selection of records with the same "tens" digit instead of records with the same "units" digit, would provide a 1-in-10 subsample but one that would be somewhat more clustered and as a result subject to larger sampling error.
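The random designation of subsamples for the 1-percent and 5-percent files can be sketched as follows. This is an illustration only: the function name, the use of integer subsample identifiers, and the seed are hypothetical, and the elderly file is left out because the text does not say from which subsamples its three were drawn.

```python
import random

def designate_pums_subsamples(n_subsamples, rng):
    """Pick one subsample for the 1-percent file and five others for the 5-percent file."""
    if n_subsamples < 6:
        raise ValueError("need at least six 1-percent subsamples")
    subsample_ids = list(range(n_subsamples))
    # The 1-percent sample is one subsample designated at random.
    one_percent = rng.choice(subsample_ids)
    # The 5-percent sample merges five subsamples drawn from the remainder.
    remaining = [s for s in subsample_ids if s != one_percent]
    five_percent = sorted(rng.sample(remaining, 5))
    return one_percent, five_percent

# A state with a 20-percent census sample is split into 20 subsamples.
one_percent, five_percent = designate_pums_subsamples(20, random.Random(1990))
```

Because the five subsamples are drawn only from the remainder, the 1-percent and 5-percent files never share a subsample in this sketch.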
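The units-digit rule for drawing smaller subsamples can be sketched as below. The function names are hypothetical, and subsample numbers are represented as integers 0 through 99 rather than as two-digit strings.

```python
import random

def pick_units_digits(count, rng):
    """Choose one or two units digits following the rule in the text."""
    first = rng.randrange(10)  # the first digit is selected at random
    if count == 1:
        return {first}
    if count == 2:
        # The second digit is 5 more or 5 less than the first.
        return {first, (first + 5) % 10}
    raise ValueError("this sketch only handles one or two units digits")

def select_by_units_digit(subsample_numbers, digits):
    """Keep the subsample numbers whose units digit is in the chosen set."""
    return [n for n in subsample_numbers if n % 10 in digits]

all_numbers = range(100)  # subsample numbers 00 to 99

# One units digit keeps 10 of the 100 subsample numbers: a 1-in-10 subsample
# of the file, hence 1-in-1,000 overall when drawn from a 1-percent sample.
one_digit = select_by_units_digit(all_numbers, {2})

# Two units digits, five apart, keep 20 of the 100 subsample numbers.
two_digits = pick_units_digits(2, random.Random(7))
one_in_five = select_by_units_digit(all_numbers, two_digits)
```

Selecting on the units digit spreads the chosen records evenly through the consecutively numbered file, which is why it is preferred over selecting on the tens digit.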
Table 1. PUMS Stratification Matrix -- Households | |||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Household Type | Age | Race & Origin/Household Type/Tenure | |||||||||||||||||||||||||
White/Other | CH | FI | HA | KO | VI | JA | AI | SA | GU | OT | Black/ American Indian, Eskimo or Aleut | ||||||||||||||||
Hispanic Origin | Non-Hispanic Origin | ||||||||||||||||||||||||||
Family with own children under 18 | 0-59 | O | R | O | R | O | R | O | R | O | R | O | R | O | R | O | R | O | R | O | R | O | R | O | R | O | R |
60-74 | |||||||||||||||||||||||||||
75-89 | |||||||||||||||||||||||||||
90+ | |||||||||||||||||||||||||||
Family without own children under 18 | 0-59 | ||||||||||||||||||||||||||
60-74 | |||||||||||||||||||||||||||
75-89 | |||||||||||||||||||||||||||
90+ | |||||||||||||||||||||||||||
Other Household (non-family) | 0-59 | ||||||||||||||||||||||||||
60-74 | |||||||||||||||||||||||||||
75-89 | |||||||||||||||||||||||||||
90+ |
Table 2. PUMS Stratification Matrix -- Vacant Housing Units | | | |
---|---|---|---|
Vacancy Status | Sampling Rate: 1-in-2 | Sampling Rate: 1-in-6 | Sampling Rate: 1-in-8 |
Vacant, for sale | | | |
Vacant, for rent | | | |
Vacant, other | | | |
Table 3. PUMS Stratification Matrix -- Group Quarters | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GQ Type/ Race/ Hispanic Origin/ Age | Institutional/Military | Non-Institutional/ Military | ||||||||||||
White/Other | Black/ American Indian/ Eskimo or Aleut | CH | FI | HA | KO | VI | JA | AI | SA | GU | OT | Repeat Race/ Origin Groups | ||
Hispanic | Non- Hispanic | |||||||||||||
0-59 | ||||||||||||||
60-74 | ||||||||||||||
75-89 | ||||||||||||||
90+ |
ENDNOTES:
- Originally published as "Chapter 4, Sample Design and Estimation," 1990 Census of Population and Housing: Public-use Microdata Samples Technical Documentation, U.S. Department of Commerce, Bureau of the Census, Washington, DC, 1992, pp. 4-1 to 4-7.