1960 Sample Design and Sampling Variability1
A. Sample Design
The one-in-one-hundred sample is a subsample of the 5-percent sample used for some of the tabulations of the 1960 Census of Population. The 5-percent sample was selected from the original 25-percent sample households using a stratified, systematic sample design. The one-in-one-hundred, one-in-one-thousand, and one-in-ten-thousand samples are designed to take advantage of the selection and estimation processes used for the two larger samples. Selection and estimation at each of these levels is described in more detail below:
The 25-Percent Sample
Selection of the 25-Percent Sample. For persons living in housing units at the time of the 1960 Census, the sampling unit was a housing unit along with all of its occupants; for persons in group quarters, it was the person. On the first visit to an address the enumerator assigned a sample key letter (A, B, C, or D) to each housing unit sequentially in the order that he first visited the unit -- whether or not he completed the interview. Each interviewer was given a random key letter to start his assignment. The order of canvassing was indicated in advance, although the instructions allowed some latitude in the order of visiting addresses. Each housing unit to which the key letter "A" was assigned was designated as a sample unit and all persons enumerated in the unit were included in the sample. In group quarters, the sample consisted of every fourth person in the order listed. Although the sampling procedures did not automatically insure an exact 25-percent sample of persons, the sample design was unbiased if carried through according to instructions. Biases may have arisen, however, when the enumerator failed to follow his listing and sample instructions exactly.
Estimation Procedure for the 1960 Census 25-Percent Sample. Statistics based on the 25-percent sample were estimated through the use of a ratio-estimation procedure. This procedure was carried out for each of the following 44 groups of persons in each of the smallest weighting areas (SWA's).
Person Categories for Smallest Weighting Areas (SWA's)
|Group||Sex and Color||Age||Relationship and Tenure|
|1||Male, White||Under 5|
|3||14-24||Head of owner household|
|4||14-24||Head of renter household|
|5||14-24||Not head of household|
|6||25-44||Head of owner household|
|7||25-44||Head of renter household|
|8||25-44||Not head of household|
|9||45 and over||Head of owner household|
|10||45 and over||Head of renter household|
|11||45 and over||Not head of household|
|12-22||Male, Nonwhite||(repeat age categories)||(repeat relationship categories)|
|23-33||Female, White||(repeat age categories)||(repeat relationship categories)|
|34-44||Female, Nonwhite||(repeat age categories)||(repeat relationship categories)|
The SWA's established for the 25-percent sample were the smallest separate geographic areas for which statistics had to be prepared in order to produce data for any of the geographic tabulations planned for the Census. Estimates for such areas were combined so as to produce all of the geographical detail required for the publication program of the Census, such as places of 2,500 inhabitants or more, Urbanized Areas, Standard Metropolitan Statistical Areas, or census tracts. Typical examples of SWA's are census tracts (in tracted cities), complete cities in smaller urban places, urban fringe areas defined outside of large cities, the rural balance in a minor civil division, etc. There were roughly 33,000 SWA's involved in the estimation procedure used for the 25-percent sample.
Estimates of characteristics from the sample for a given weighting area are produced using the formula:
$$x' = \sum_{i=1}^{44} \frac{x_i}{y_i} \, Y_i$$
where $x'$ is the estimate of the characteristic for a weighting area obtained through the use of the ratio-estimation procedure;
$x_i$ is the count of sample persons with the characteristic in one of the 44 groups (group i) within the SWA;
$y_i$ is the count of all sample persons for the area in the same one of the 44 groups;
and $Y_i$ is the complete Census count of persons in the same one of the 44 groups in the SWA.
For each of the 44 groups, the ratio of the complete Census count to the sample count of the population in the group was determined. Each specific sample person in the group was assigned an integral weight so that the sum of the weights would equal the complete count for the group. For example, if the ratio for a group was 4.1, one-tenth of the persons (selected at random) within the group were assigned a weight of 5, and the remaining nine-tenths a weight of 4. The use of such a combination of integral weights rather than a single fractional weight was adopted to avoid the complications involved in rounding in the final tables. Where there were fewer than 50 persons in the complete count in a group or where the resulting weight was over 16, groups were combined in a specific order to satisfy both of these two conditions and a common weight used for the combined groups.
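The integral-weight rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and the counts are hypothetical, and the rule for combining small groups (fewer than 50 persons, or a weight over 16) is not modeled.

```python
import random


def assign_integral_weights(complete_count, sample_size, rng=None):
    """Return one integer weight per sample person, summing to complete_count.

    Hypothetical helper: mixes two adjacent integer weights so that the
    weighted sample total equals the complete count for the group.
    """
    rng = rng or random.Random(0)
    low = complete_count // sample_size           # e.g. 4 when the ratio is 4.1
    n_high = complete_count - low * sample_size   # persons who receive low + 1
    weights = [low + 1] * n_high + [low] * (sample_size - n_high)
    rng.shuffle(weights)  # the persons receiving the higher weight are chosen at random
    return weights


# A group with a complete count of 410 and 100 sample persons (ratio 4.1).
weights = assign_integral_weights(complete_count=410, sample_size=100)
print(sum(weights), weights.count(5), weights.count(4))  # 410 10 90
```

With a ratio of 4.1, one-tenth of the sample persons receive a weight of 5 and the rest a weight of 4, matching the example in the text.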
These ratio estimates reduce the component of sampling error arising from the variation in the size of household and achieve many of the gains of stratification in the selection of the sample with the strata being the groups for which separate ratio estimates are computed. The net effect is a reduction in the sampling error and bias of most statistics below what would be obtained by weighting the results of the 25-percent sample by a uniform factor of four. A by-product of this type of estimation procedure is that estimates for the sample are generally consistent with the complete count with respect to the total population and for the subdivisions used as groups in the estimation procedure.
The 5-Percent Population Sample
Selection of the 5-Percent Population Sample. For some tabulations, a subsample of one-fifth of the original 25-percent sample schedules was selected by the computer using a stratified, systematic sample design. The strata were made up as follows: for persons in regular housing units, there were 36 strata -- 9 household-size groups (1, 2, 3, 4, 5, 6, 7, 8, and 9 or more persons per household) by two tenure groups, by two color groups (white or nonwhite); for persons in group quarters there were two strata (two color groups). Within each of these 38 strata the computer selected the sample by cumulating the weight assigned (by the ratio-estimation process for the 25-percent sample) to the household head and selecting the household that caused the cumulative weight to pass a multiple of 20.
Estimation Procedure for the 5-Percent Sample. Statistics based on the 5-percent sample were estimated through the use of a ratio-estimation procedure. The procedure used for this sample was similar to the estimation process described for the 25-percent sample, with two important differences. First, larger SWA's were used for the 5-percent sample than for the 25-percent sample. They were defined as the combined total of areas within a state comprising central cities of urbanized areas, the remaining portions of urbanized areas, urban places not in urbanized areas, and rural areas. However, each urbanized area of more than 1,000,000 inhabitants made up two SWA's: the central city and the balance. Second, although groups were sometimes combined in a specific order during the ratio-estimation process as in the 25-percent sample, the thresholds differed. For the 5-percent sample, this was done when there were fewer than 275 persons in the complete count in a group, or where the resulting weight was over 80.
Public Use Samples
Selection of the One-in-One-Hundred Sample. The one-in-one-hundred sample was selected as a subsample of the 5-percent population sample, using a systematic selection of one-in-five within each of 38 strata. The strata are shown in the table on the next page; they differ slightly from the strata used in the selection of the 5-percent sample. The subsampling was done in such a way as to take into consideration the weights assigned in the ratio-estimation procedures used in the 5-percent population sample described above. Within each stratum, using random-start numbers in the range 0-99 (i.e., those in the table), the 5-percent weights for each household head (or group quarters person) were accumulated and the entire household (or group quarters person) was selected each time the sum passed a multiple of 100.
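The cumulative-weight selection described above can be illustrated with a short sketch. The function name and the household weights are hypothetical, and treating a running total that lands exactly on a multiple of 100 as having "passed" it is an assumption, since the text does not say how ties were handled.

```python
def select_by_cumulative_weight(weights, start, interval=100):
    """Return the positions of units selected within one stratum.

    A unit is selected when adding its weight carries the running total
    past (or, by assumption, onto) a new multiple of `interval`.
    """
    selected = []
    total = start  # random start number for the stratum, in the range 0-99
    for position, weight in enumerate(weights):
        multiples_before = total // interval
        total += weight
        if total // interval > multiples_before:
            selected.append(position)
    return selected


# Ten households in one stratum, each with a 5-percent weight near 20,
# using the start number 76 as an example.
household_weights = [20, 19, 21, 20, 20, 22, 18, 20, 20, 20]
print(select_by_cumulative_weight(household_weights, start=76))  # [1, 6]
```

Because the weights average about 20 and the interval is 100, roughly one household in five is selected, which is the one-in-five subsampling rate stated in the text.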
Selection of the One-in-One-Thousand Sample. At the time a household or group quarters person was selected in the one-in-one-hundred sample, it was assigned a subsample number within each of the 38 strata. These subsample numbers range from 00 to 99. A sample of one-in-one-thousand was selected using the units digit of the subsample numbers. Thus, the one-in-one-thousand sample is a stratified systematic subsample of the one-in-one-hundred sample. Each record selected contains the subsample units digit 2 from Item H93-94 (Subsample Number).
Selection of the One-in-Ten-Thousand Sample. The one-in-ten-thousand sample is a subsample of the one-in-one-hundred sample consisting of those households and group quarters persons assigned a single one of the subsample numbers 00 to 99. The one-in-ten-thousand sample is not a subsample of the one-in-one-thousand sample. Records selected contain the subsample number 79, from Item H93-94 (Subsample Number).
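A small sketch of the two subsample rules, using only the values stated in the text (units digit 2 for the one-in-one-thousand sample, subsample number 79 for the one-in-ten-thousand sample). The function names are illustrative.

```python
def in_one_in_thousand(subsample_number):
    """True when the units digit of the two-digit subsample number is 2."""
    return subsample_number % 10 == 2


def in_one_in_ten_thousand(subsample_number):
    """True when the subsample number is exactly 79."""
    return subsample_number == 79


all_numbers = range(100)  # subsample numbers 00 through 99
thousand = [n for n in all_numbers if in_one_in_thousand(n)]
ten_thousand = [n for n in all_numbers if in_one_in_ten_thousand(n)]

print(len(thousand), len(ten_thousand))    # 10 1
print(set(ten_thousand) <= set(thousand))  # False
```

Ten of the hundred subsample numbers fall in the one-in-one-thousand sample and one falls in the one-in-ten-thousand sample. Since 79 does not end in 2, the smaller sample is not contained in the one-in-one-thousand sample.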
Strata Used in Selecting the Sample. The records in the source file were grouped by households, each household record carrying a separate weight. These weights ranged around a value of 20. The weight of each household head or group quarters person was added to the start number or to the previous accumulation for the applicable stratum. Every time the total in the stratum accumulated to an additional hundred, the household head and all members of his household were selected for inclusion in the sample. Group quarters persons were selected without regard to other members of the group quarters whenever the accumulated total passed an additional hundred. Thirty-eight strata with their random start numbers are shown on the next page.
Selection of Intermediate Samples. Samples in the range between one-in-one-hundred and one-in-ten-thousand may be selected by using an appropriate combination of the other samples; for instance, a stratified systematic three-in-one-thousand sample would consist of the households and group quarters comprising three systematically-selected one-in-one-thousand samples. Such intermediate samples are also stratified systematic subsamples of the next larger sample.
Since some of the housing items were collected as systematic four-fifths or one-fifth subsamples of the initial 25-percent sample, the resulting sampling rates for these items are: for the one-in-one-hundred sample, subsamples of one-in-125 and one-in-500, respectively; for the one-in-one-thousand sample, subsamples of one-in-1,250 and one-in-5,000, respectively; and for the one-in-ten-thousand sample, subsamples of one-in-12,500 and one-in-50,000, respectively.
Estimates of true values may be made from sample data by using a simple inflation estimate, that is, by multiplying the sample figure by the reciprocal of the sampling rate. For example, to estimate the total number of persons with a certain population characteristic from a one-in-one-hundred sample, multiply the sample total by 100. Sampling rates for housing data are given in the preceding section. The sampling rate for an intermediate-sized sample is the number of smaller tabulated samples used multiplied by the sampling rate for the smaller sample. Table I at the end of this section provides estimates of total population, black population and other groups from the one-in-one-hundred sample, compared with the full 1960 Census count.
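As a minimal sketch of the inflation estimate, the following multiplies a sample count by the reciprocal of the sampling rate. The sample counts are made-up numbers for illustration, and the helper name is hypothetical.

```python
from fractions import Fraction


def inflation_estimate(sample_count, sampling_rate):
    """Multiply a sample count by the reciprocal of the sampling rate."""
    return sample_count / sampling_rate


one_in_100 = Fraction(1, 100)
one_in_1000 = Fraction(1, 1000)

# Population item from the one-in-one-hundred sample (hypothetical count).
print(inflation_estimate(1793, one_in_100))      # 179300

# Intermediate sample: three one-in-one-thousand samples combined.
three_in_1000 = 3 * one_in_1000
print(inflation_estimate(540, three_in_1000))    # 180000

# Housing item collected on the four-fifths subsample: rate is one-in-125.
housing_rate = one_in_100 * Fraction(4, 5)
print(housing_rate)                              # 1/125
print(inflation_estimate(80, housing_rate))      # 10000
```

Exact fractions are used so that rates such as one-in-125 stay exact rather than being rounded.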
Stratification Cells and Random Start Numbers
1960 Census Public Use Sample
|Size of Household||White||Nonwhite|
|Owner head||Renter head||Owner head||Renter head|
|1 person household||76||96||13||36|
|2 person household||04||36||84||48|
|3 person household||33||05||17||05|
|4 person household||89||43||47||44|
|5 person household||77||48||43||07|
|6 person household||51||16||09||01|
|7 person household||05||73||60||40|
|8 to 11 person household||66||64||95||01|
|12 or more person household||02||97||17||84|
|Group quarters person||53||60|
C. Sampling Variability
Estimates derived from the sample tabulations are subject to sampling variability, which can be measured by the standard error. The chances are two out of three that the difference (due to sampling variability) between the sample estimate and the figure that would have been obtained from a complete count of the population is less than the standard error. The chances are about 19 out of 20 that the difference is less than twice the standard error and about 99 out of 100 that it is less than 2-1/2 times the standard error. The amount by which the estimated standard error must be multiplied to obtain other odds can be found in most statistical textbooks.
Approximate Standard Errors Using Tables A - D and Tables E-G
Grouping Items by Type of Standard Error. The standard error of an estimate depends on sample size, method of sampling, and the estimation process. In a cluster sample as was used for population items (clustering was by household, and persons in each household comprised the cluster), the standard errors may vary from one type of statistic to another, depending on the homogeneity of the item within a cluster. The effectiveness of stratification also tends to vary among items. As a result, a simple table containing the exact standard errors cannot be constructed for sophisticated designs of this type. Instead, approximations to the standard errors are produced for classes of items which behave in a fairly similar manner. Empirical studies show that for the samples considered here, three classes are sufficient for most analyses. The grouping of items into these three classes is shown in Table A.
Standard errors for the one-in-one-hundred, one-in-one-thousand, and one-in-ten-thousand samples have been produced for three classes of characteristics. The characteristics within the three classes differ mainly in the way they are affected by the homogeneity of persons within a household. The standard errors given in Tables B and E (and described as for Type I) are for characteristics having relatively low homogeneity among persons within households. These are for characteristics which are little affected by the fact that the entire household was selected as the sampling unit; that is, for items such as housing data and population data which would tend to describe only one person in the household (for example, married women, age 25 - 34). Standard errors for characteristics of Type III are given in Tables D and G, and apply to characteristics which tend to cluster by household (i.e., the characteristics tend to describe all or nearly all persons in the household), such as mobility, color, and agricultural residence. Standard errors for characteristics of Type II are shown in Tables C and F, and apply to characteristics which are moderately affected by the sampling of households -- that is, characteristics which tend to lie between the two extremes: items such as number of males and number of employed.
Table A displays examples of how tabulations of various code categories would relate to the three types of standard errors. The table shows, for example, that "highest grade attended" may be considered as a Type I or as a Type II characteristic. Highest grade attended of the household head and highest grade attended for all children aged 6-16 would be Type I characteristics, since the first would be limited to one person-per-household and the second would show a low degree of homogeneity within a household. However, highest grade attended for all persons over 25 would be a Type II characteristic since you would expect some homogeneity among the adults of a household.
Table A also shows that "race" is classified as a characteristic of Type III. The nature of the code categories for race, however used, would usually define households in which either all household members or none would satisfy the condition. Note, however, that a cross-tabulation of item race and single years of age would require the resulting statistics to be treated as a characteristic of Type I.
Many items can be used either to obtain counts of households having a characteristic or to obtain total counts of persons in households with that characteristic. For the former purpose, the characteristic would be treated as a Type I; for the latter, a Type III. For example, the number of households with contract rent under $150 would be a Type I characteristic, while the number of persons in households in this rent category would be a Type III.
Standard Errors of Estimated Numbers and Percentages. Tables B through D contain approximations to the standard errors for the data produced by multiplying the sample count by the reciprocal of the probability of selection -- that is, a factor of 100 for the one-in-one-hundred sample, 1,000 for the one-in-one-thousand sample and 10,000 for the one-in-ten-thousand sample. Tables E through G contain standard errors for percentages calculated from these data.
Standard Errors of Differences. The tables of standard errors are to be applied differently in the three following situations:
- For a difference between the sample figure and one based on a complete count (e.g., arising from comparisons between one-in-one-hundred sample statistics and complete-count statistics for 1950 or 1970), the standard error is identical with the standard error of the sample estimate alone.
- For a difference between two sample figures (e.g., one from the one-in-one-hundred sample and the other based on a different sample or for a different area based on the same sample), the standard error is approximately the square root of the sum of the squares of the standard error of each estimate considered separately. This formula will represent the actual standard error quite accurately for the difference between estimates of the same characteristic in two different areas, or for the difference between separate and uncorrelated characteristics in the same area. If, however, there is a high positive correlation between the two characteristics, the formula will overestimate the true standard error.
When the sample estimate is a difference between two sample estimates, one of which is a subclass of the other with respect to the same universe, the previous method of finding the standard error of the difference should not be used. Instead, the tables should be used directly by locating the appropriate standard error for an estimate equal to the size of the difference.
For example, it might be desired to estimate the total number of persons who completed junior high school but did not complete high school. The sample estimate for total persons who completed high school is available as is the sample estimate for persons who completed junior-high school. The standard error of this difference should be found by locating the size of the estimated difference in the appropriate table and sample size column.
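For the second situation (a difference between two sample figures), the rule is the square root of the sum of the squared standard errors. A minimal sketch, with hypothetical standard errors standing in for values read from the tables:

```python
import math


def standard_error_of_difference(se_first, se_second):
    """Approximate standard error of the difference of two sample estimates."""
    return math.sqrt(se_first ** 2 + se_second ** 2)


# Hypothetical standard errors for the same characteristic in two areas.
print(standard_error_of_difference(300.0, 400.0))  # 500.0
```

This applies only when the two estimates are for different areas or are uncorrelated. For a difference between a class and one of its own subclasses, the text says to look up the size of the difference in the tables directly.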
Sampling Variability of Medians. The sampling variability of a median depends on the size of the base and on the distribution on which the median is based. An approximate method for measuring the reliability of an estimated median is to determine an interval about the estimated median such that there is a stated degree of confidence that the true median lies within the interval. As the first step in estimating the upper and lower limits of the interval (that is, the confidence limits) about the median, compute one-half the number on which the median is based (designated N/2). Look up or compute the standard error of N/2; subtract this standard error from N/2. Cumulate the frequencies (in the table on which the median is based) up to the interval containing the difference between N/2 and its standard error, and by linear interpolation obtain a value corresponding to this number. In a corresponding manner, add the standard error to N/2, cumulate the frequencies in the table, and obtain a value in the table on which the median is based corresponding to the sum of N/2 and its standard error. The chances are two out of three that the median would lie between these two values. The range for 19 chances out of 20 and for 99 in 100 can be computed in a similar manner by multiplying the standard error by the appropriate factors before subtracting and adding to N/2. Interpolation to obtain the values corresponding to these numbers gives the confidence limits for the medians.
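The median procedure can be sketched as follows. The frequency table, the bracket boundaries, and the standard error of N/2 are all hypothetical; in practice the standard error would be read from the appropriate table. Frequencies are assumed to be spread evenly within each interval, which is what linear interpolation implies.

```python
def value_at_cumulative(intervals, target):
    """Interpolate linearly to the value at which the cumulative frequency reaches target.

    intervals is a list of (lower_bound, upper_bound, frequency) tuples
    in ascending order.
    """
    cumulative = 0.0
    for lower, upper, frequency in intervals:
        if frequency > 0 and cumulative + frequency >= target:
            fraction = (target - cumulative) / frequency
            return lower + fraction * (upper - lower)
        cumulative += frequency
    raise ValueError("target exceeds the total frequency")


def median_confidence_limits(intervals, se_half, multiplier=1.0):
    """Return (lower, upper) limits for the median.

    se_half is the standard error of N/2. A multiplier of 1 gives the
    two-out-of-three range; 2 gives about 19 out of 20.
    """
    half = sum(frequency for _, _, frequency in intervals) / 2
    lower = value_at_cumulative(intervals, half - multiplier * se_half)
    upper = value_at_cumulative(intervals, half + multiplier * se_half)
    return lower, upper


# Hypothetical distribution: (lower bound, upper bound, estimated persons).
table = [(0, 2000, 30000), (2000, 4000, 40000),
         (4000, 6000, 20000), (6000, 10000, 10000)]

median = value_at_cumulative(table, 50000)
low, high = median_confidence_limits(table, se_half=2000)
print(round(median), round(low), round(high))  # 3000 2900 3100
```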
Sampling Variability of Estimated Means. The sampling variability of a mean, such as the number of children ever born per 1,000 women, depends on the variability of the distribution on which the mean is based, the size of the sample, the sample design, and the use of ratio estimates. An approximation to the variability of the mean may be obtained as follows: Compute the standard deviation of the distribution on which the mean is based; then multiply this figure by a factor of .8 for characteristics of Type I, 1.1 for Type II, and 1.7 for Type III. Divide this product by the square root of the sample size.
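A minimal sketch of the approximation for means, using the three factors given in the text. The data are invented, and the use of the sample standard deviation (n - 1 divisor) is an assumption, since the text does not specify which form of the standard deviation is meant.

```python
import math
import statistics

# Factors from the text, keyed by characteristic type.
TYPE_FACTORS = {"I": 0.8, "II": 1.1, "III": 1.7}


def standard_error_of_mean(values, characteristic_type):
    """Approximate standard error of a mean: factor * std. deviation / sqrt(n)."""
    deviation = statistics.stdev(values)  # assumption: n - 1 divisor
    factor = TYPE_FACTORS[characteristic_type]
    return factor * deviation / math.sqrt(len(values))


# Hypothetical sample: children ever born for 100 sample women.
children_ever_born = [0, 1, 2, 3, 4] * 20
print(round(standard_error_of_mean(children_ever_born, "II"), 3))  # 0.156
```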
Nonsampling Errors. Sampling error is only one of the components of the total error of a survey. Further contributions may come, for example, from biases in sample selection, from errors introduced by imputations for non-reporting, and from errors introduced in the coding and other processing of the questionnaires. For estimates of totals representing relatively small proportions of the population, the major component of the survey error tends to be the sampling error. As the estimated totals approach the level of the total population, the sampling errors decrease. This is not necessarily true of the nonsampling errors and they assume a relatively larger role in the total survey error. For this reason, standard errors of totals are not shown for all estimated levels even though the standard errors are actually present. For example, for regional estimates from these samples, the presence of nonsampling errors should be recognized in increasingly larger proportions for estimates close to the total population of the region.
Sampling Variability for the Housing Subsamples. Data obtained from the 20-percent and 5-percent housing subsamples are subject to greater sampling variability than data from the 25-percent sample. An approximation to the standard errors of data from these samples can be made by obtaining the standard errors from the appropriate tables and adjusting them for the smaller sample size. For this adjustment, multiply the standard error of the 25-percent sample by 1.1 if the data are from the 20-percent subsample, or by 2.2 if the data are from the 5-percent subsample. Data in the 5-percent sample are items H59, H61-H63 (air conditioning, number of bedrooms, clothes washing machine and clothes dryer), and H65-H68 (home food freezer, television set, house heating fuel, and radio). Items H26, H53, H56-H58, and H64 (basement, number of units in structure, source of water, sewage disposal, number of bathrooms and stories and elevator) are 20-percent sample items. Note that data on item H60 (automobile) were collected as a 20-percent item in large urban areas and as a 5-percent item in other parts of the country. Tabulations of this item for areas encompassing both types of areas, therefore, will require a factor somewhere between 1.1 and 2.2.
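The factors 1.1 and 2.2 are consistent with scaling the standard error by the square root of the ratio of sample sizes. The sketch below is an interpretation rather than something the text states, and the function name is hypothetical.

```python
import math


def housing_adjustment_factor(subsample_percent, base_percent=25.0):
    """Square root of the ratio of the base sampling rate to the subsample rate."""
    return math.sqrt(base_percent / subsample_percent)


print(round(housing_adjustment_factor(20), 1))  # 1.1
print(round(housing_adjustment_factor(5), 1))   # 2.2

# Applying the factor to a hypothetical tabulated standard error of 1,000.
print(round(1000 * housing_adjustment_factor(5)))  # 2236
```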
Standard errors are usually not computed for every possible statistic of interest produced in a survey, partly because of the high cost of the necessary calculations and because the estimated standard error is a sample statistic also subject to sampling variability. The sampling variability of the estimated standard errors is often relatively large, but more stable estimates can frequently be obtained by combining the estimates of the sampling errors for several statistics known from previous experience to have sampling errors with similar behavior. One way of generalizing estimated sampling errors is to select representative characteristics, calculate the relvariances (the squares of the coefficients of variation) of these characteristics and use them to represent the relvariances of similar characteristics, using a relationship which typically holds concerning the relvariance and the size of the estimate2. Investigation of this relationship has led to the development of a set of regression functions which, experience has shown, represent the behavior of the relvariances fairly well. Curves taking the form of
$$V_{x'}^2 = a + \frac{b}{x'}$$
have been used to produce the tables of standard errors for this report. The value $V_{x'}^2$ is the relvariance of the estimate $x'$, and $a$ and $b$ are parameters of the regression curve. The values of $a$ and $b$ were determined by using a method of successive approximations. The standard errors appearing in Tables B through G were then obtained from the regression curve for the one-in-one-thousand sample. The standard errors for the others were obtained by adjusting the one-in-one-thousand figures to reflect the different sample sizes. To calculate the relvariances required for the above generalization technique, a point estimate of the variance was determined for each of the selected items using the random group method as follows:
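To show how a relvariance curve turns into a tabulated standard error, here is a sketch that assumes the curve has the common generalized-variance form, relvariance = a + b / estimate. The parameter values are purely illustrative and are not the ones fitted for this report.

```python
import math


def generalized_standard_error(estimate, a, b):
    """Standard error implied by a relvariance curve of the assumed form a + b / estimate."""
    relvariance = a + b / estimate
    return estimate * math.sqrt(relvariance)


# Illustrative parameters only (not the fitted values behind Tables B-G).
a_example, b_example = -0.0000056, 1000.0
for size in (10_000, 100_000, 1_000_000):
    print(size, round(generalized_standard_error(size, a_example, b_example)))
```

Because the standard error is the estimate times the square root of the relvariance, the absolute standard error grows with the size of the estimate while the relative error shrinks.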
The sample was divided into ten randomly determined mutually-exclusive and exhaustive subsamples (random groups). Each of the random groups is a subsample selected in the same manner as the full sample. In order to simplify the variance calculations, the number of strata was reduced from 38 to 17. The reduction in the number of strata probably results in a slight overstatement of the variance. The ten random groups were selected by using the terminal digit of the subsample number. Since these subsample numbers were assigned sequentially at the time of selection, the systematic selection process is incorporated in each random group. The formula used to estimate the variance of a total for each Census region follows (the variances for U.S. totals were obtained by adding the variances for the four regions):
$$s_{x_r}^2 = \sum_{i=1}^{17} \frac{t}{t-1} \sum_{j=1}^{t} \left( x_{rij} - \bar{x}_{ri} \right)^2$$
where $x_{rij}$ is the weighted total for the jth random group in the ith stratum in the rth Census region for characteristic X,
$\bar{x}_{ri}$ is the average x characteristic per random group in the ith stratum of the rth region, and
$t$ is the number of random groups (t = 10 in this instance).
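The random group calculation for a regional total can be sketched as below. It assumes the usual estimator for a total formed as the sum of t random-group totals: within each stratum, t/(t - 1) times the sum of squared deviations of the group totals from their mean, summed over strata. The data and the function name are hypothetical.

```python
def random_group_variance_of_total(stratum_group_totals):
    """Variance of a regional total from per-stratum lists of random-group totals.

    stratum_group_totals holds one list per stratum; each list holds the
    t weighted totals for that stratum's random groups.
    """
    variance = 0.0
    for group_totals in stratum_group_totals:
        t = len(group_totals)
        mean = sum(group_totals) / t
        squared_deviations = sum((x - mean) ** 2 for x in group_totals)
        variance += t / (t - 1) * squared_deviations
    return variance


# Two hypothetical strata, each with t = 10 random-group totals.
region = [
    [100, 110, 90, 105, 95, 100, 100, 110, 90, 100],
    [50] * 10,
]
variance = random_group_variance_of_total(region)
print(round(variance), round(variance ** 0.5, 1))  # 500 22.4
```

Variances for several regions would then be added to obtain the variance of a U.S. total, as the text describes.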
The 17 strata used in the estimation of the standard errors are:
- White owner households by size groups:
1 and 2 persons
8 or more persons
- White renter households by size groups:
7 or more persons,
- Nonwhite owner households,
- Nonwhite renter households, and
- All group-quarters persons.
Comparison of Generalized and Computed Standard Errors
Table H shows the relationship between the standard error as computed from the one-in-one-thousand sample and the approximate standard error given by Tables B, C, or D. Column (1) of Table H shows the estimate of the characteristic; column (2) shows the standard error computed for the item by the method of random groups; column (3) shows the standard error as read from the appropriate Table B, C, or D; and column (4) shows the characteristic type. The table also shows further examples of characteristics in Types I, II, and III. The effect of the composition of the typical household on the standard error is also illustrated. Note, for example, that female non-heads under age 24 are classed as Type II, while for ages 25 and above (i.e., typically the housewife) the characteristic is treated as Type I.
Note: The figures in this table were calculated using data from an earlier one-in-one-thousand sample selected using the same sample design employed here. While the figures may not be in exact agreement with figures from the present sample, they accurately reflect the comparisons which the table was designed to illustrate.
Standard Errors for Intermediate-Sized Samples
The tables provided for the one-in-one-hundred, one-in-one-thousand, and one-in-ten-thousand samples may be used to measure sampling variability for intermediate-sized samples by adjusting for sample size. This is done by dividing the tabulated standard error for the next smallest tabulated sample size by the square root of the number of smaller samples used. For example, to find the standard error for an estimate based on a three-in-one-thousand sample, first find the tabulated standard error for the next smallest sample size; that is, locate the size of estimate in the appropriate table and read the standard error for the one-in-one-thousand sample. Divide this figure by the square root of 3 to get the standard error for data from a three-in-one-thousand sample.
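The adjustment for an intermediate-sized sample is a single division. A minimal sketch, with a hypothetical tabulated standard error:

```python
import math


def intermediate_standard_error(tabulated_se, samples_combined):
    """Divide the tabulated standard error by the square root of the number of samples combined."""
    return tabulated_se / math.sqrt(samples_combined)


# Hypothetical: the one-in-one-thousand table gives 3,000 for this estimate,
# and three one-in-one-thousand samples were combined.
print(round(intermediate_standard_error(3000.0, 3)))  # 1732
```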
Estimation of Standard Error for Types of Statistics Not Shown in Tables
The tabulated values for standard errors will be adequate for data giving a level or percentage. In some instances, however, it may be necessary to have a rough estimate of sampling variability for a statistic not covered by the tables (e.g., an index). The random group method may be used directly in such a situation by applying the following formula:
$$s_q^2 = \frac{1}{t(t-1)} \sum_{j=1}^{t} \left( q_j - q \right)^2$$
where $q$ is the statistic of interest computed from the full sample;
$q_j$ is the statistic computed from the jth random group; and
$t$ is the number of random groups.
The statistic q may be an index, a regression or correlation coefficient, or other measure for which the following approximate relationship holds:
$$E(q_j) = E(q)$$
where $E(q_j)$ and $E(q)$ indicate the expected values of the measure for the jth random group and the full sample respectively. Thus, for example, the formula does not apply for q defined as an estimated total number of persons or housing units in a category.
The procedure specified by the formula for $s_q^2$ requires the following steps:
- Use t = 50 when dealing with the 1-in-100 sample or t = 10 with the 1-in-1,000 sample. (See note below for use of other values of t.)
- Assign each of the records to one of the t random groups based on the subsample number.
- To construct 50 random groups, assign all records in which the subsample number is 01 or 51 to the first random group; all records in which the subsample number is 02 or 52 to the second random group, etc. Finally, assign all records in which the subsample number is 50 or 00 to random group number 50.
- To construct 10 random groups of the 1-in-1,000 sample, assign all records with a 1 in the tens digit to random group 1; those with a 2 in the tens digit to random group 2, etc.
- Independently within each random group, calculate the desired measure ($q_j$) as if the records for that random group were the entire sample.
- Compute the desired measure ($q$) on the entire sample.
- Compute the t values $(q_j - q)^2$ and apply in the above formula.
- The square root of the result is the desired standard error.
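The steps above can be sketched for the 1-in-100 sample with t = 50. The sketch assumes the random group estimator has the form: sum of squared deviations of the group statistics from the full-sample statistic, divided by t(t - 1). The records are invented (subsample number, value) pairs and the function names are hypothetical. The tens-digit grouping for the 1-in-1,000 sample is not covered.

```python
import math
import statistics


def random_group_number(subsample_number, t=50):
    """Map subsample numbers 00-99 to groups 1..t (01 and 51 -> 1, 50 and 00 -> 50 when t = 50)."""
    remainder = subsample_number % t
    return remainder if remainder != 0 else t


def random_group_standard_error(records, statistic, t=50):
    """Standard error of statistic by the random group method.

    records is a list of (subsample_number, value) pairs; statistic is a
    function applied to a list of values (e.g. a mean or an index).
    """
    groups = {group: [] for group in range(1, t + 1)}
    for subsample_number, value in records:
        groups[random_group_number(subsample_number, t)].append(value)
    q = statistic([value for _, value in records])                 # full-sample measure
    group_values = [statistic(groups[g]) for g in range(1, t + 1)]  # one measure per group
    variance = sum((q_j - q) ** 2 for q_j in group_values) / (t * (t - 1))
    return math.sqrt(variance)


# Hypothetical records: three per subsample number, with invented values.
records = [(n, float((n * 7) % 13 + k)) for n in range(100) for k in range(3)]
print(random_group_standard_error(records, statistics.mean))
```

Every random group must contain at least one record for the statistic to be computable, which is one reason a smaller t may be preferable for small areas.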
Note: The user is cautioned that the standard error given by this procedure is itself subject to sampling error. Some control can be put on the reliability of the estimated $s_q^2$ by the number of random groups used. The total sample can be separated into different numbers of random groups by different schemes involving the subsample numbers which for the 1-in-100 sample range from 00 to 99. Ideally, each random group should represent the total sample.
The number of random groups should be chosen to maximize the reliability of the estimate $s_q^2$. The reliability of the estimated $s_q^2$ will depend on the number of degrees of freedom in the estimate (i.e., on the choice of t, the number of random groups), and the number of units in each of the resulting random groups. As a general rule, when estimating $s_q^2$ using the 1-in-100 sample, choosing t = 50 would produce reasonably reliable estimates of $s_q^2$ for most statistics. One might choose a smaller value for t to estimate the sampling errors for statistics for a given State. The considerations leading to the proper choice of t are described in several statistical texts (for example, Hansen, Hurwitz and Madow, Sample Survey Methods and Theory, Vol. I, Chapter 10, Section 16, page 440 ff.)
For the 1-in-1,000 sample, the tens digit of the subsample number ranges from 0 to 9 and can be used to define 10 random groups (the units digit is fixed). When this sample is being used, the choice of t is limited to a maximum of t = 10.
- Originally published as "Section IV. Sample Design and Sampling Variability," Technical Documentation for the 1960 Public Use Sample, PUS - 1960, Prepared by National Data Use and Access Laboratories (DUALabs), January 1973, pp. 53-77.
- Some of the standard errors used in the generalization here are displayed in Table H.