1990 Sample Design and Estimation¹

This chapter discusses the selection procedure for the public use microdata samples in terms of four major operations: (1) the selection of the full 1990 census sample, (2) the estimation procedure for the full census sample, (3) the selection of the public use microdata samples from the persons and housing units included in the full 1990 census sample, and (4) estimation for the PUMS samples.

Producing Estimates or Tabulations

Estimation of totals and percentages. The 1980 Public Use Microdata Samples (PUMS) were self-weighted. It is very important to note that the 1990 PUMS samples are not self-weighted. To produce estimates or tabulations of 100-percent characteristics from the PUMS files, simply add the weights of all persons or housing units that possess the characteristic of interest. For instance, if the characteristic of interest is the total number of Hispanic males aged 5-17, simply determine the sex, age, and Hispanic origin of all persons and cumulate the weights of those who match the characteristic of interest. The PUMS weight is a function of the full census sample weight and the PUMS sample design.

To get estimates of proportions, simply divide the weighted estimate of persons or housing units with a given characteristic by the base sample estimate. For example, the proportion of owner-occupied housing units with plumbing is obtained by dividing the PUMS estimate of owner-occupied housing units with plumbing facilities by the PUMS estimate of total housing units.
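The two rules above (sum the weights for a total, divide by a weighted base for a proportion) can be sketched as follows. This is a minimal illustration, not Census Bureau code: the record layout, the field names (`sex`, `age`, `hispanic`, `weight`), and the sample values are all hypothetical.

```python
# Hypothetical person records; field names and values are illustrative only.
persons = [
    {"sex": "M", "age": 9,  "hispanic": True,  "weight": 18},
    {"sex": "M", "age": 16, "hispanic": True,  "weight": 22},
    {"sex": "F", "age": 12, "hispanic": True,  "weight": 20},
    {"sex": "M", "age": 40, "hispanic": False, "weight": 25},
]


def weighted_total(records, predicate):
    """Sum the PUMS weights of the records that satisfy the predicate."""
    return sum(r["weight"] for r in records if predicate(r))


def weighted_proportion(records, predicate, base=lambda r: True):
    """Weighted estimate with the characteristic, divided by the weighted base estimate."""
    denominator = weighted_total(records, base)
    if denominator == 0:
        raise ValueError("base estimate is zero")
    numerator = weighted_total(records, lambda r: base(r) and predicate(r))
    return numerator / denominator


def is_hispanic_male_5_17(r):
    return r["hispanic"] and r["sex"] == "M" and 5 <= r["age"] <= 17


total = weighted_total(persons, is_hispanic_male_5_17)
share = weighted_proportion(persons, is_hispanic_male_5_17)
```

Passing a different `base` predicate changes the denominator, which is how a proportion within a subgroup (rather than within all records) would be computed under this sketch.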

To get estimates of characteristics such as the total number of related children in households (for housing unit level aggregates), simply multiply the PUMS weight by the value of the characteristic and sum across all household records. If the desired estimate is the number of households with at least one related child in the household, add the PUMS householder weight for all households with a value not equal to zero for the characteristic.
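A short sketch of the two household-level estimates just described, assuming hypothetical household records with a `related_children` count and a householder `weight` (both names are illustrative and not taken from the PUMS record layout):

```python
# Hypothetical household records; field names and values are illustrative only.
households = [
    {"related_children": 2, "weight": 15},
    {"related_children": 0, "weight": 20},
    {"related_children": 3, "weight": 10},
]


def weighted_aggregate(records, field):
    """Multiply each record's weight by the value of the characteristic and sum."""
    return sum(r["weight"] * r[field] for r in records)


def weighted_count_nonzero(records, field):
    """Sum the weights of records whose value for the characteristic is not zero."""
    return sum(r["weight"] for r in records if r[field] != 0)


total_children = weighted_aggregate(households, "related_children")
households_with_children = weighted_count_nonzero(households, "related_children")
```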

Sample Design

Every person and housing unit in the United States was asked certain basic demographic questions (for example, race, age, relationship, housing value, or rent). A sample of these persons and housing units was asked more detailed questions about such items as income, occupation, and housing costs in addition to the basic demographic and housing information. The primary sampling unit for the 1990 census was the housing unit, including all occupants. For persons living in group quarters, the sampling unit was the person. Persons in group quarters were sampled at a 1-in-6 rate.

The sample designation method depended on the data collection procedures. Approximately 95 percent of the population was enumerated by the mailback procedure. In these areas, the Bureau of the Census either purchased a commercial mailing list, which was updated by the United States Postal Service and Census Bureau field staff, or prepared a mailing list by canvassing and listing each address in the area prior to Census Day. These lists were computerized and the appropriate units were electronically designated as sample units. The questionnaires were either mailed or hand delivered to the addresses with instructions on how to complete and mail back the form.

Housing units in governmental units with a precensus (1988) estimated population of fewer than 2,500 persons were sampled at 1-in-2. Governmental units were defined for sampling purposes as all incorporated places, all counties, all county equivalents such as parishes in Louisiana, and all minor civil divisions in Connecticut, Maine, Massachusetts, Michigan, Minnesota, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, Vermont, and Wisconsin. Housing units in tracts and block numbering areas (BNAs) with a precensus housing unit count below 2,000 housing units were sampled at 1-in-6 for those portions not in small governmental units. Housing units within census tracts and BNAs with 2,000 or more housing units were sampled at 1-in-8 for those portions not in small governmental units.

In list/enumerate areas (about 5 percent of the population) each enumerator was given a blank address register with designated sample lines. Beginning about Census Day, the enumerator systematically canvassed the area and listed all housing units in the address register in the order they were encountered. Completed questionnaires, including sample information for any housing unit listed on a designated sample line, were collected. For all governmental units with fewer than 2,500 persons in list/enumerate areas, a 1-in-2 sampling rate was used. All other list/enumerate areas were sampled at 1-in-6.
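The housing unit sampling rates for mailback and list/enumerate areas can be summarized as a small decision function. This is a sketch under stated assumptions: the function name, the `area_type` labels, and the idea of passing in the population of the smallest governmental unit containing the housing unit are illustrative choices. It covers only the two cases described so far and does not model the special rules for American Indian and Alaska Native areas or the person-level sampling of group quarters.

```python
from fractions import Fraction


def housing_unit_sampling_rate(area_type, gov_unit_population, tract_housing_units=None):
    """Return the census sample rate for a housing unit.

    area_type: "mailback" or "list_enumerate" (hypothetical labels).
    gov_unit_population: precensus population of the governmental unit
        containing the housing unit (assumed input).
    tract_housing_units: precensus housing unit count of the tract or BNA;
        needed only for mailback areas outside small governmental units.
    """
    if area_type not in ("mailback", "list_enumerate"):
        raise ValueError(f"unknown area type: {area_type!r}")
    # Small governmental units are sampled at 1-in-2 in both kinds of area.
    if gov_unit_population < 2500:
        return Fraction(1, 2)
    # All other list/enumerate areas are sampled at 1-in-6.
    if area_type == "list_enumerate":
        return Fraction(1, 6)
    # Mailback areas: the rate depends on the size of the tract or BNA.
    if tract_housing_units is None:
        raise ValueError("tract_housing_units is required for mailback areas")
    return Fraction(1, 6) if tract_housing_units < 2000 else Fraction(1, 8)
```

For example, `housing_unit_sampling_rate("mailback", 50000, 1500)` returns `Fraction(1, 6)`, and the reciprocal of the returned rate approximates the initial weight of a sampled unit.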

Housing units in American Indian reservations, Tribal Jurisdiction Statistical areas, and Alaska Native villages were sampled according to the same criteria as other governmental units, except the sampling rates were based on the size of the American Indian and Alaska native population in those areas as measured in the 1980 census. Trust lands were sampled at the same rate as their associated American Indian reservations. Census designated places in Hawaii were sampled at the same rate as governmental units because the Census Bureau does not recognize incorporated places in Hawaii.

The purpose of using variable sampling rates was to provide relatively more reliable estimates for small areas and decrease respondent burden in more densely populated areas while maintaining data reliability. When all sampling rates were taken into account across the nation, approximately one out of every six housing units in the nation was included in the 1990 census sample.

Estimation Procedure

Estimates from the census sample were obtained from an iterative ratio estimation procedure (iterative proportional fitting) resulting in the assignment of a weight to each sample person or housing unit record. For any given tabulation area, a characteristic total was estimated by summing the weights assigned to the persons or housing units possessing the characteristic in the tabulation area. Estimates of family or household characteristics were based on the weight assigned to the person designated as householder. Each sample person or housing unit record was assigned exactly one weight to be used to produce estimates of all characteristics. For example, if the weight given to a sample person or housing unit had the value 6, all characteristics of that person or housing unit would be tabulated with the weight of 6. The estimation procedure, however, did assign weights varying from person to person or housing unit to housing unit.

The estimation procedure used to assign the weight was performed in geographically defined "weighting areas". Weighting areas were generally formed of contiguous portions of geography which closely agreed with census tabulation areas within counties. Weighting areas were never allowed to cross state or county boundaries. In small counties with a sample count below 400 persons, the minimum required sample condition was relaxed to permit the entire county to become a weighting area.

Within a weighting area, the ratio estimation procedure for persons was performed in four stages. The first stage applied 17 household-type groups. The second stage used two groups: sampling rate of 1-in-2 and sampling rate below 1-in-2. The third stage used the dichotomy householder/non-householder. The fourth stage applied 180 aggregate age/sex/Hispanic origin/race categories.

The stages were as follows:

Stage I. Type of Household

Persons in Housing Units with a Family

with Own Children Under 18

1: 2 persons in housing unit
2: 3 persons in housing unit
3: 4 persons in housing unit
4: 5 to 7 persons in housing unit
5: 8 or more persons in housing unit

without Own Children Under 18

6-10: 2 through 8 or more persons in housing unit

Persons in All Other Housing Units

11: 1 person in housing unit
12-16: 2 through 8 or more persons in housing unit

Persons in Group Quarters

17: Persons in group quarters

Stage II. Sampling Rates

1: Sampling rate of 1-in-2
2: Sampling rate of less than 1-in-2

Stage III. Householder/Nonhouseholder

1: Householder
2: Nonhouseholder

Stage IV. Age/Sex/Hispanic Origin/Race

White

Persons of Hispanic Origin

Male

1: 0 to 4 years of age
2: 5 to 14 years of age
3: 15 to 19 years of age
4: 20 to 24 years of age
5: 25 to 34 years of age
6: 35 to 54 years of age
7: 55 to 64 years of age
8: 65 to 74 years of age
9: 75 years of age or older

Female

10-18: Same age categories as Males of Hispanic Origin

Persons Not of Hispanic Origin

Male

19-27: Same age categories as Males of Hispanic Origin

Female

28-36: Same age categories as Males of Hispanic Origin

Black

Persons of Hispanic Origin

Male

37-45: Same age categories as Males of Hispanic Origin

Female

46-54: Same age categories as Males of Hispanic Origin

Persons Not of Hispanic Origin

Male

55-63: Same age categories as Males of Hispanic Origin

Female

64-72: Same age categories as Males of Hispanic Origin

Asian or Pacific Islander

Persons of Hispanic Origin

Male

73-81: Same age categories as Males of Hispanic Origin

Female

82-90: Same age categories as Males of Hispanic Origin

Persons Not of Hispanic Origin

Male

91-99: Same age categories as Males of Hispanic Origin

Female

100-108: Same age categories as Males of Hispanic Origin

American Indian, Eskimo, or Aleut

Persons of Hispanic Origin

Male

109-117: Same age categories as Males of Hispanic Origin

Female

118-126: Same age categories as Males of Hispanic Origin

Persons Not of Hispanic Origin

Male

127-135: Same age categories as Males of Hispanic Origin

Female

136-144: Same age categories as Males of Hispanic Origin

Other Race (includes those races not listed above)

Persons of Hispanic Origin

Male

145-153: Same age categories as Males of Hispanic Origin

Female

154-162: Same age categories as Males of Hispanic Origin

Persons Not of Hispanic Origin

Male

163-171: Same age categories as Males of Hispanic Origin

Female

172-180: Same age categories as Males of Hispanic Origin
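The Stage IV numbering above follows a regular pattern: five race groups, each split by Hispanic origin, then sex, then nine age groups, giving 5 x 2 x 2 x 9 = 180 categories. A minimal sketch of that mapping is shown below. The short race labels and the function name are hypothetical, chosen only for illustration.

```python
import bisect

# Race groups in the order listed above (labels are illustrative shorthand).
RACES = ["white", "black", "api", "aiea", "other"]
# Lower bounds of the nine age groups: 0-4, 5-14, 15-19, 20-24, 25-34,
# 35-54, 55-64, 65-74, 75+.
AGE_LOWER_BOUNDS = [0, 5, 15, 20, 25, 35, 55, 65, 75]


def stage4_group(race, hispanic, sex, age):
    """Return the Stage IV category number (1-180) for a person."""
    race_idx = RACES.index(race)              # 0..4
    origin_idx = 0 if hispanic else 1         # Hispanic origin is listed first
    sex_idx = {"M": 0, "F": 1}[sex]           # males are listed first
    age_idx = bisect.bisect_right(AGE_LOWER_BOUNDS, age) - 1
    if age_idx < 0:
        raise ValueError("age must be non-negative")
    # 36 categories per race, 18 per origin within race, 9 per sex within origin.
    return race_idx * 36 + origin_idx * 18 + sex_idx * 9 + age_idx + 1
```

For instance, a Black male of Hispanic origin under age 5 maps to category 37, the first number in the 37-45 range listed above.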

Within a weighting area, the first step in the estimation procedure was to assign an initial weight to each sample person record. This weight was approximately equal to the inverse of the probability of selecting a person for the census sample.

The next step in the estimation procedure, prior to iterative proportional fitting, was to combine categories in each of the four estimation stages, when needed, to increase the reliability of the ratio estimation procedure. For each stage, any group that did not meet certain criteria for the unweighted sample count or for the ratio of the 100-percent count to the initially weighted sample count was combined or collapsed with another group in the same stage according to a specified collapsing pattern. At the fourth stage, an additional criterion concerning the number of 100-percent persons in each race/Hispanic origin category was applied.

As the final step, the initial weights underwent four stages of ratio adjustment applying the grouping procedures described above. At the first stage, the ratio of the 100-percent count to the sum of the initial weights for the sample persons was computed for each stage I group. The initial weight assigned to each person in a group was then multiplied by the stage I group ratio to produce an adjusted weight.
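A single stage of ratio adjustment, and the repetition of all stages in order, might look like the sketch below. It is an illustration of iterative proportional fitting in general, not the Census Bureau's production procedure: the function names, the toy groups, and the control totals are assumptions, and the collapsing of sparse groups is omitted. The default of two iterations mirrors the number of passes this chapter describes.

```python
from collections import defaultdict


def ratio_adjust(weights, groups, controls):
    """One stage: scale weights so each group's weight sum equals its 100-percent count."""
    sums = defaultdict(float)
    for w, g in zip(weights, groups):
        sums[g] += w
    # Each weight is multiplied by (100-percent count / current weighted count).
    return [w * controls[g] / sums[g] for w, g in zip(weights, groups)]


def rake(initial_weights, stages, iterations=2):
    """Apply the stages in order, repeating the whole pass `iterations` times."""
    weights = list(initial_weights)
    for _ in range(iterations):
        for groups, controls in stages:
            weights = ratio_adjust(weights, groups, controls)
    return weights


# Toy example: four sample persons, each with an initial weight of 6
# (the inverse of a 1-in-6 sampling rate), and two hypothetical stages.
initial = [6.0, 6.0, 6.0, 6.0]
stage_a = (["x", "x", "y", "y"], {"x": 14, "y": 10})
stage_b = (["p", "q", "p", "q"], {"p": 13, "q": 11})
final = rake(initial, [stage_a, stage_b])
```

After the last stage, the weighted counts match that stage's control totals exactly, while earlier stages are matched only approximately; this is why the text says the estimates are consistent with the 100-percent figures "for the most part".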

In stage II, the stage I adjusted weights were again adjusted by the ratio of the 100-percent count to the sum of the stage I weights for sample persons in each stage II group. Next, at stage III, the stage II weights were adjusted by the ratio of the 100-percent count to the sum of the stage II weights for sample persons in each stage III group. Finally, at stage IV, the stage III weights were adjusted by the ratio of the 100-percent count to the sum of the stage III weights for sample persons in each stage IV group. The four stages of ratio adjustment were performed two times (two iterations) in the order given above. The weights obtained from the second iteration for stage IV were assigned to the sample person records. However, to avoid complications in rounding for tabulated data, only whole number weights were assigned. For example, if the final weight of the persons in a particular group was 7.25, then 1/4 of the sample persons in this group were randomly assigned a weight of 8, while the remaining 3/4 received a weight of 7.
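The whole-number weight assignment in the 7.25 example can be sketched as follows. The function name is hypothetical, and rounding the number of persons who receive the higher weight to the nearest integer is an assumption; the source does not say how fractional counts were handled.

```python
import math
import random


def integer_weights(n_persons, final_weight, rng=None):
    """Assign whole-number weights to a group whose common final weight is fractional.

    A share of persons equal to the fractional part receives the next higher
    integer; the rest receive the integer part.
    """
    rng = rng or random.Random()
    base = math.floor(final_weight)
    # Number of persons rounded up (assumption: nearest whole person).
    n_up = round(n_persons * (final_weight - base))
    rounded_up = set(rng.sample(range(n_persons), n_up))
    return [base + 1 if i in rounded_up else base for i in range(n_persons)]


# Eight persons with a final weight of 7.25: two receive 8 and six receive 7.
weights = integer_weights(8, 7.25, random.Random(0))
```

In this example the integer weights sum to 58, the same as 8 x 7.25, so the group total is preserved.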

The ratio estimation procedure for housing units was essentially the same as that for persons except that vacant units were treated differently. The occupied housing unit ratio estimation procedure was done in four stages, and the vacant housing unit ratio estimation procedure was done in a single stage. The first stage for occupied housing units applied 16 household type categories, while the second stage used the two sampling categories described above for persons. The third stage applied three units-in-building categories, i.e., single units, multiunit less than 10, and multiunit 10 or more. The fourth stage could potentially use 200 tenure/race/Hispanic origin/value or rent groups. The stages for ratio estimation for housing units were as follows:

Occupied Housing Units

Stage I. Type of Household

Housing Units with a Family with Own Children Under 18

1: 2 persons in housing unit
2: 3 persons in housing unit
3: 4 persons in housing unit
4: 5 to 7 persons in housing unit
5: 8 or more persons in housing unit

Housing Units with a Family without Own Children Under 18

6-10: 2 through 8 or more persons in housing unit

Persons in All Other Housing Units

11: 1 person in housing unit
12-16: 2 through 8 or more persons in housing unit

Stage II. Sampling Rates

1: Sampling rate of 1-in-2
2: Sampling rate of less than 1-in-2

Stage III. Units in Building

1: Single unit structure
2: Multiunit consisting of < 10 individual units
3: Multiunit consisting of >= 10 individual units

Stage IV. Tenure/Race & Origin of Householder/Value or Rent

OWNER

White Householder

Householder of Hispanic Origin

Value of Housing unit

1: Less than $ 20,000
2: $ 20,000 to $ 39,999
3: $ 40,000 to $ 59,999
4: $ 60,000 to $ 79,999
5: $ 80,000 to $ 99,999
6: $ 100,000 to $ 149,999
7: $ 150,000 to $ 249,999
8: $ 250,000 to $ 299,999
9: $ 300,000 +
10: Other

Householder Not of Hispanic Origin

Value of Housing unit

11-20: Same value categories as White Householder of Hispanic Origin

Black Householder

Householder of Hispanic Origin

Value of Housing unit

21-30: Same value categories as White Householder of Hispanic Origin

Householder Not of Hispanic Origin

Value of Housing unit

31-40: Same value categories as White Householder of Hispanic Origin

Asian or Pacific Islander Householder

Householder of Hispanic Origin

Value of Housing unit

41-50: Same value categories as White Householder of Hispanic Origin

Householder Not of Hispanic Origin

Value of Housing unit

51-60: Same value categories as White Householder of Hispanic Origin

American Indian, Eskimo, or Aleut Householder

Householder of Hispanic Origin

Value of Housing unit

61-70: Same value categories as White Householder of Hispanic Origin

Householder Not of Hispanic Origin

Value of Housing unit

71-80: Same value categories as White Householder of Hispanic Origin

Other Race Householder

Householder of Hispanic Origin

Value of Housing unit

81-90: Same value categories as White Householder of Hispanic Origin

Householder Not of Hispanic Origin

Value of Housing unit

91-100: Same value categories as White Householder of Hispanic Origin

RENTER

White Householder

Householder of Hispanic Origin

Rent Categories

101: Less than $100
102: $ 100 to $ 199
103: $ 200 to $ 299
104: $ 300 to $ 399
105: $ 400 to $ 499
106: $ 500 to $ 599
107: $ 600 to $ 749
108: $ 750 to $ 999
109: $ 1,000 +
110: No cash rent

Householder Not of Hispanic Origin

Rent Categories

111-120: Same value categories as White Householder of Hispanic Origin

Black Householder

Householder of Hispanic Origin

Rent Categories

121-130: Same value categories as White Householder of Hispanic Origin

Householder Not of Hispanic Origin

Rent Categories

131-140: Same value categories as White Householder of Hispanic Origin

Asian or Pacific Islander Householder

Householder of Hispanic Origin

Rent Categories

141-150: Same value categories as White Householder of Hispanic Origin

Householder Not of Hispanic Origin

Rent Categories

151-160: Same value categories as White Householder of Hispanic Origin

American Indian, Eskimo, or Aleut Householder

Householder of Hispanic Origin

Rent Categories

161-170: Same value categories as White Householder of Hispanic Origin

Householder Not of Hispanic Origin

Rent Categories

171-180: Same value categories as White Householder of Hispanic Origin

Other Race Householder

Householder of Hispanic Origin

Rent Categories

181-190: Same value categories as White Householder of Hispanic Origin

Householder Not of Hispanic Origin

Rent Categories

191-200: Same value categories as White Householder of Hispanic Origin

VACANT HOUSING UNIT

1: Vacant for rent
2: Vacant for sale
3: Other vacant

The estimates produced by this procedure realize some of the gains in sampling efficiency that would have resulted if the population had been stratified into the ratio-estimation groups before sampling, and the sampling rate had been applied independently to each group. The net effect is a reduction in both the standard error and the possible bias of most estimated characteristics to levels below what would have resulted from simply using the initial (unadjusted) weight. A by-product of this estimation procedure is that the estimates from the sample will for the most part be consistent with the 100-percent figures for the population and housing unit groups used in the estimation procedure.

Selection of the Public Use Microdata Samples

A stratified systematic selection procedure with equal probability was used to select each of the public use microdata samples. The sampling universe was defined as all occupied housing units including all occupants, vacant housing units, and GQ persons in the census sample. The sample units were stratified during the selection process. The stratification was intended to improve the reliability of estimates derived from the public use microdata samples by defining strata within which there is a high degree of homogeneity among the census sample households with respect to characteristics of major interest.
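One common way to carry out an equal-probability stratified systematic selection is to sort the units by stratum, choose a random start, and take every k-th unit. The sketch below illustrates that general technique only; the source does not spell out the exact mechanics used for the PUMS, and the function name, the `stratum` field, and the toy data are assumptions.

```python
import random


def systematic_sample(units, stratum_key, interval, rng=None):
    """Select every `interval`-th unit after sorting by stratum, from a random start.

    Each unit has the same 1-in-`interval` chance of selection, and sorting by
    stratum spreads the sample across strata in proportion to their size.
    """
    rng = rng or random.Random()
    ordered = sorted(units, key=stratum_key)   # group units by stratum
    start = rng.randrange(interval)            # random start in [0, interval)
    return ordered[start::interval]


# Toy universe: 100 units in four equal-size hypothetical strata.
units = [{"id": i, "stratum": i % 4} for i in range(100)]
sample = systematic_sample(units, lambda u: u["stratum"], 10, random.Random(42))
per_stratum = [sum(1 for u in sample if u["stratum"] == s) for s in range(4)]
```

With strata of 25 units and an interval of 10, each stratum contributes two or three units regardless of the random start, which is the sense in which stratification controls the sample composition.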

A total of 1,049 strata were defined: 936 household strata, 104 strata for GQ persons, and 9 strata for vacant housing units. First, the units were divided into three major groups: households, vacant housing units, and GQ population. The household universe was stratified by family type and non-family, race/Hispanic origin of the householder, tenure, and age within sampling stratum. For the census sample selection, the population was stratified by geographic size into three sampling strata, i.e., units in small governmental units were sampled at 1-in-2, units in small tracts/BNAs were sampled at 1-in-6, and the remainder of the units were sampled at 1-in-8.

The vacant housing units universe was stratified by vacancy status and sampling rate. Finally, the GQ population was stratified by GQ type (institutions, non-institutions), race, Hispanic origin, and age. The stratification matrices are defined in tables 1, 2, and 3.
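The stratum counts can be reconciled with the stratification matrices by simple multiplication. The factor counts below are read off tables 1, 2, and 3 (13 race/origin groups, 2 tenure columns, 3 household types, 4 age groups, 2 GQ types, 3 vacancy statuses); crossing the household matrix with the three sampling strata is an interpretation of "within sampling stratum" rather than something the text states outright.

```python
# Factor counts read from the stratification matrices (see lead-in for assumptions).
RACE_ORIGIN_GROUPS = 13   # White/Other Hispanic, White/Other Non-Hispanic, ..., OT
TENURE = 2                # the O and R columns
HOUSEHOLD_TYPES = 3       # family with children, family without, non-family
AGE_GROUPS = 4            # 0-59, 60-74, 75-89, 90+
SAMPLING_STRATA = 3       # 1-in-2, 1-in-6, 1-in-8
GQ_TYPES = 2              # the two group quarters columns
VACANCY_STATUSES = 3      # for sale, for rent, other

household_strata = (RACE_ORIGIN_GROUPS * TENURE * HOUSEHOLD_TYPES
                    * AGE_GROUPS * SAMPLING_STRATA)
gq_strata = GQ_TYPES * RACE_ORIGIN_GROUPS * AGE_GROUPS
vacant_strata = VACANCY_STATUSES * SAMPLING_STRATA
total_strata = household_strata + gq_strata + vacant_strata
```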

Subsampling the PUMS Files

The sample selection procedures were as follows: The number of 1-percent public use microdata samples for a given state was determined by the full census sample size for that state. For instance, if the full census sample for a state was 20 percent, then the census sample was divided into 20 subsamples of equal size. The 1-percent public use microdata sample was designated at random from the 20 subsamples. From the remaining 19 subsamples, five 1-percent subsamples were designated at random and merged to produce the 5-percent public use microdata sample. The 3-percent elderly public use microdata sample was produced in the same way as the 5-percent but required an extra step. The three subsamples were merged, and the elderly household and person records (households with at least one person age 60 years or more, or GQ persons 60 years and older) were selected and designated as the elderly PUMS file.
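The designation of the 1-percent and 5-percent files from equal-size subsamples can be sketched as below. This is a simplified illustration: it splits the census sample by a random shuffle, whereas the actual split used the stratified systematic procedure described earlier, and the function name and toy data are hypothetical.

```python
import random


def designate_pums(sample_units, n_subsamples, rng=None):
    """Split the census sample into equal subsamples and designate PUMS files.

    Returns (one_percent, five_percent): one subsample chosen at random, and
    five of the remaining subsamples chosen at random and merged.
    """
    if n_subsamples < 6:
        raise ValueError("need at least six subsamples")
    rng = rng or random.Random()
    shuffled = list(sample_units)
    rng.shuffle(shuffled)  # stand-in for the stratified systematic split
    subsamples = [shuffled[i::n_subsamples] for i in range(n_subsamples)]
    order = list(range(n_subsamples))
    rng.shuffle(order)     # random designation of subsamples
    one_percent = subsamples[order[0]]
    five_percent = [u for i in order[1:6] for u in subsamples[i]]
    return one_percent, five_percent


# Toy census sample of 2,000 units, standing for a 20 percent sample of a state.
census_sample = list(range(2000))
one_pct, five_pct = designate_pums(census_sample, 20, random.Random(7))
```

Because the 5-percent file is drawn only from subsamples not used for the 1-percent file, the two files never share a record in this sketch.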

During the sample selection operation, consecutive two-digit subsample numbers from 00 to 99 were assigned to each sample case in the 5-percent and 1-percent samples to allow for the designation of various size subsamples and to allow for the calculation of standard errors. As an example, for a 1-percent public use microdata sample, the choice of records having subsample numbers with the same "units" digit (e.g., the "units" digit 2 includes subsample numbers 02, 12, 22, ..., 92) will provide a 1-in-1000 subsample.

Samples of any size between 1/20 and 1/10,000 may be selected in a similar manner by using appropriate two-digit subsample numbers assigned to either of the microdata samples. Care must be exercised when selecting such samples. If only one "units" digit is required, the units digit should be randomly selected. If two "units" digits are required, the first should be randomly selected and the second should be either 5 more or 5 less than the first. Failure to use this procedure, e.g., selection of records with the same "tens" digit instead of records with the same "units" digit, would provide a 1-in-10 subsample but one that would be somewhat more clustered and as a result subject to larger sampling error.
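The units-digit rule can be sketched as a small filter over the two-digit subsample numbers. The `subsample` field name and the function name are hypothetical; the logic follows the rule above, with the second digit taken as the first plus 5 modulo 10, which is always either 5 more or 5 less than the first.

```python
import random


def subsample_by_units_digit(records, n_digits=1, rng=None):
    """Keep records whose subsample number ends in the randomly chosen units digit(s)."""
    if n_digits not in (1, 2):
        raise ValueError("this sketch supports one or two units digits")
    rng = rng or random.Random()
    first = rng.randrange(10)          # randomly selected units digit
    digits = {first}
    if n_digits == 2:
        digits.add((first + 5) % 10)   # 5 more or 5 less than the first
    return [r for r in records if r["subsample"] % 10 in digits]


# Toy file of 1,000 records with subsample numbers 00-99 assigned consecutively.
records = [{"id": i, "subsample": i % 100} for i in range(1000)]
tenth = subsample_by_units_digit(records, 1, random.Random(3))
fifth = subsample_by_units_digit(records, 2, random.Random(3))
```

Applied to a 1-percent file, the one-digit selection keeps a tenth of the records and so yields roughly a 1-in-1000 sample of the population.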

Table 1. PUMS Stratification Matrix -- Households

Race & Origin/Tenure columns (each group is split into an O and an R column): White/Other Hispanic Origin; White/Other Non-Hispanic Origin; CH; FI; HA; KO; VI; JA; AI; SA; GU; OT; Black/American Indian, Eskimo or Aleut

Household Type	Age
Family with own children under 18	0-59
	60-74
	75-89
	90+
Family without own children under 18	0-59
	60-74
	75-89
	90+
Other Household (non-family)	0-59
	60-74
	75-89
	90+

Table 2. PUMS Stratification Matrix -- Vacant Housing Units

Vacancy Status	1-in-2	1-in-6	1-in-8
Vacant, for sale
Vacant, for rent
Vacant, other

Table 3. PUMS Stratification Matrix -- Group Quarters

GQ Type columns: Institutional/Military; Non-Institutional/Military (repeat race/origin groups)

Race/Origin columns within each GQ type: White/Other - Hispanic; White/Other - Non-Hispanic; Black/American Indian/Eskimo or Aleut; CH; FI; HA; KO; VI; JA; AI; SA; GU; OT

Age
0-59
60-74
75-89
90+

ENDNOTES:

¹ Originally published as "Chapter 4, Sample Design and Estimation," 1990 Census of Population and Housing: Public-use Microdata Samples Technical Documentation, U.S. Department of Commerce, Bureau of the Census, Washington, DC, 1992, pp. 4-1 to 4-7.
