1960 Sample Design and Sampling Variability^{1}
A. Sample Design
General
The one-in-one-hundred sample is a subsample of the 5-percent sample used for some of the tabulations of the 1960 Census of Population. The 5-percent sample was selected from the original 25-percent sample households using a stratified, systematic sample design. The one-in-one-hundred, one-in-one-thousand, and one-in-ten-thousand samples are designed to take advantage of the selection and estimation processes used for the two larger samples. Selection and estimation at each of these levels are described in more detail below.
The 25-Percent Sample
Selection of the 25-Percent Sample. For persons living in housing units at the time of the 1960 Census, the sampling unit was a housing unit along with all of its occupants; for persons in group quarters, it was the person. On the first visit to an address the enumerator assigned a sample key letter (A, B, C, or D) to each housing unit sequentially in the order that he first visited the unit, whether or not he completed the interview. Each interviewer was given a random key letter to start his assignment. The order of canvassing was indicated in advance, although the instructions allowed some latitude in the order of visiting addresses. Each housing unit to which the key letter "A" was assigned was designated as a sample unit and all persons enumerated in the unit were included in the sample. In group quarters, the sample consisted of every fourth person in the order listed. Although the sampling procedures did not automatically ensure an exact 25-percent sample of persons, the sample design was unbiased if carried through according to instructions. Biases may have arisen, however, when the enumerator failed to follow his listing and sample instructions exactly.
Estimation Procedure for the 1960 Census 25-Percent Sample. Statistics based on the 25-percent sample were estimated through the use of a ratio estimation procedure. This procedure was carried out for each of the following 44 groups of persons in each of the smallest weighting areas (SWA's).
Person Categories for Smallest Weighting Areas (SWA's)
Group    Sex and Color      Age                       Relationship and Tenure
1        Male, White        Under 5
2        Male, White        5-13
3        Male, White        14-24                     Head of owner household
4        Male, White        14-24                     Head of renter household
5        Male, White        14-24                     Not head of household
6        Male, White        25-44                     Head of owner household
7        Male, White        25-44                     Head of renter household
8        Male, White        25-44                     Not head of household
9        Male, White        45 and over               Head of owner household
10       Male, White        45 and over               Head of renter household
11       Male, White        45 and over               Not head of household
12-22    Male, Nonwhite     (repeat age categories)   (repeat relationship categories)
23-33    Female, White      (repeat age categories)   (repeat relationship categories)
34-44    Female, Nonwhite   (repeat age categories)   (repeat relationship categories)
The SWA's established for the 25-percent sample were the smallest separate geographic areas for which statistics had to be prepared in order to produce data for any of the geographic tabulations planned for the Census. Estimates for such areas were combined so as to produce all of the geographical detail required for the publication program of the Census, such as places of 2,500 inhabitants or more, Urbanized Areas, Standard Metropolitan Statistical Areas, or census tracts. Typical examples of SWA's are census tracts (in tracted cities), complete cities in smaller urban places, urban fringe areas defined outside of large cities, the rural balance in a minor civil division, etc. There were roughly 33,000 SWA's involved in the estimation procedure used for the 25-percent sample.
Estimates of characteristics from the sample for a given weighting area are produced using the formula:
x' = \sum_{i=1}^{44} \frac{x_{i}}{y_{i}} Y_{i}
where x' is the estimate of the characteristic for a weighting area obtained through the use of the ratio-estimation procedure;
x_{i} is the count of sample persons with the characteristic in one of the 44 groups (group i) within the SWA;
y_{i} is the count of all sample persons for the area in the same one of the 44 groups;
and Y_{i} is the complete Census count of persons in the same one of the 44 groups in the SWA.
For each of the 44 groups, the ratio of the complete Census count to the sample count of the population in the group was determined. Each specific sample person in the group was assigned an integral weight so that the sum of the weights would equal the complete count for the group. For example, if the ratio for a group was 4.1, one-tenth of the persons (selected at random) within the group were assigned a weight of 5, and the remaining nine-tenths a weight of 4. The use of such a combination of integral weights rather than a single fractional weight was adopted to avoid the complications involved in rounding in the final tables. Where there were fewer than 50 persons in the complete count in a group or where the resulting weight was over 16, groups were combined in a specific order to satisfy both of these two conditions and a common weight used for the combined groups.
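The integral-weight rule can be illustrated with a short Python sketch. The function name `integral_weights` and the counts used are hypothetical, and the sketch only reproduces the arithmetic of the 4.1 example; it makes no claim about how the Census Bureau's program was actually written.

```python
import random

def integral_weights(complete_count, sample_count, rng):
    """Give each sample person an integer weight so the weights sum to complete_count."""
    base = complete_count // sample_count          # lower of the two integer weights
    extra = complete_count - base * sample_count   # persons who receive base + 1
    weights = [base + 1] * extra + [base] * (sample_count - extra)
    rng.shuffle(weights)  # random choice of who receives the higher weight
    return weights

# Hypothetical group: 410 persons in the complete count, 100 in the sample (ratio 4.1).
weights = integral_weights(complete_count=410, sample_count=100, rng=random.Random(0))
```

With these inputs, ten persons carry a weight of 5 and ninety a weight of 4, so the weights add up to the complete count of 410.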
These ratio estimates reduce the component of sampling error arising from the variation in the size of household and achieve many of the gains of stratification in the selection of the sample with the strata being the groups for which separate ratio estimates are computed. The net effect is a reduction in the sampling error and bias of most statistics below what would be obtained by weighting the results of the 25-percent sample by a uniform factor of four. A by-product of this type of estimation procedure is that estimates for the sample are generally consistent with the complete count with respect to the total population and for the subdivisions used as groups in the estimation procedure.
The 5-Percent Population Sample
Selection of the 5-Percent Population Sample. For some tabulations, a subsample of one-fifth of the original 25-percent sample schedules was selected by the computer using a stratified, systematic sample design. The strata were made up as follows: for persons in regular housing units, there were 36 strata, formed from 9 household-size groups (1, 2, 3, 4, 5, 6, 7, 8, and 9 or more persons per household) by two tenure groups, by two color groups (white or nonwhite); for persons in group quarters there were two strata (two color groups). Within each of these 38 strata the computer selected the sample by cumulating the weight assigned (by the ratio estimation process for the 25-percent sample) to the household head and selecting the household that caused the cumulative weight to pass a multiple of 20.
Estimation Procedure for the 5-Percent Sample. Statistics based on the 5-percent sample were estimated through the use of a ratio estimation procedure. The procedure used for this sample was similar to the estimation process described for the 25-percent sample, with two important differences. First, larger SWA's were used for the 5-percent sample than for the 25-percent sample. They were defined as the combined total of areas within a state comprising central cities of urbanized areas, the remaining portions of urbanized areas, urban places not in urbanized areas, and rural areas. However, each urbanized area of more than 1,000,000 inhabitants made up two SWA's: the central city and the balance. Second, although groups were sometimes combined in a specific order during the ratio estimation process as in the 25-percent sample, the thresholds differed. For the 5-percent sample, groups were combined when there were fewer than 275 persons in the complete count in a group, or where the resulting weight was over 80.
Public Use Samples
Selection of the One-in-One-Hundred Sample. The one-in-one-hundred sample was selected as a subsample of the 5-percent population sample, using a systematic selection of one-in-five within each of 38 strata. The strata are shown in the table of stratification cells and random start numbers below; they differ slightly from the strata used in the selection of the 5-percent sample. The subsampling was done in such a way as to take into consideration the weights assigned in the ratio-estimation procedures used in the 5-percent population sample described above. Within each stratum, using random-start numbers in the range 00-99 (i.e., those in the table), the 5-percent weights for each household head (or group quarters person) were accumulated and the entire household (or group quarters person) was selected each time the sum passed a multiple of 100.
Selection of the One-in-One-Thousand Sample. At the time a household or group quarters person was selected in the one-in-one-hundred sample, it was assigned a subsample number within each of the 38 strata. These subsample numbers range from 00 to 99. A sample of one-in-one-thousand was selected using the units digit of the subsample numbers. Thus, the one-in-one-thousand sample is a stratified systematic subsample of the one-in-one-hundred sample. Each record selected contains the subsample units digit 2 from Item H93-94 (Subsample Number).
Selection of the One-in-Ten-Thousand Sample. The one-in-ten-thousand sample is a subsample of the one-in-one-hundred sample consisting of those households and group quarters persons assigned a single one of the subsample numbers 00 to 99. The one-in-ten-thousand sample is not a subsample of the one-in-one-thousand sample. Records selected contain the subsample number 79, from Item H93-94 (Subsample Number).
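The two selection rules just described can be written as simple filters on the subsample number. This is a minimal sketch that assumes the subsample number has already been read from the record as an integer from 0 to 99; the function names are hypothetical.

```python
def in_one_in_thousand(subsample_number):
    """One-in-one-thousand sample: records whose subsample units digit is 2."""
    return subsample_number % 10 == 2

def in_one_in_ten_thousand(subsample_number):
    """One-in-ten-thousand sample: records whose subsample number is 79."""
    return subsample_number == 79

all_numbers = range(100)  # subsample numbers 00 to 99
thousand = [n for n in all_numbers if in_one_in_thousand(n)]
ten_thousand = [n for n in all_numbers if in_one_in_ten_thousand(n)]
```

Ten of the hundred subsample numbers fall in the first filter and one in the second. Because 79 does not end in 2, the one-in-ten-thousand records are not contained in the one-in-one-thousand sample.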
Strata Used in Selecting the Sample. The records in the source file were grouped by households, each household record carrying a separate weight. These weights ranged around a value of 20. The weight of each household head or group quarters person was added to the start number or to the previous accumulation for the applicable stratum. Every time the total in the stratum accumulated to an additional hundred, the household head and all members of his household were selected for inclusion in the sample. Group quarters persons were selected without regard to other members of the group quarters whenever the accumulated total passed an additional hundred. The thirty-eight strata with their random start numbers are shown in the table of stratification cells below.
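The accumulation rule can be sketched in Python as follows. The function name `systematic_select` is hypothetical, the weights are made-up values of exactly 20, and the sketch assumes that a running total landing exactly on a multiple of 100 counts as passing it, which the text does not specify.

```python
def systematic_select(weights, start, step=100):
    """Return positions of units selected within one stratum.

    A unit is selected when adding its weight carries the running total
    past (or, by assumption, onto) a new multiple of `step`.
    """
    selected = []
    total = start
    for index, weight in enumerate(weights):
        hundreds_before = total // step
        total += weight
        if total // step > hundreds_before:
            selected.append(index)
    return selected

# Hypothetical stratum of 25 households, each with a 5-percent weight of 20,
# using the random start 76 (white owner, 1-person households).
picked = systematic_select([20] * 25, start=76)
```

Here every fifth household is selected (positions 1, 6, 11, 16, and 21), which matches the one-in-five subsampling rate applied to the 5-percent sample.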
Selection of Intermediate Samples. Samples in the range between one-in-one-hundred and one-in-ten-thousand may be selected by using an appropriate combination of the other samples; for instance, a stratified systematic three-in-one-thousand sample would consist of the households and group quarters comprising three systematically selected one-in-one-thousand samples. Such intermediate samples are also stratified systematic subsamples of the next larger sample.
Housing Subsamples
Since some of the housing items were collected as systematic four-fifths or one-fifth subsamples of the initial 25-percent sample, the resulting sampling rates for these items are: for the one-in-one-hundred sample, subsamples of one-in-125 and one-in-500, respectively; for the one-in-one-thousand sample, subsamples of one-in-1,250 and one-in-5,000, respectively; and for the one-in-ten-thousand sample, subsamples of one-in-12,500 and one-in-50,000, respectively.
B. Estimation
Estimates of true values may be made from sample data by using a simple inflation estimate, that is, by multiplying the sample figure by the reciprocal of the sampling rate. For example, to estimate the total number of persons with a certain population characteristic from a one-in-one-hundred sample, multiply the sample total by 100. Sampling rates for housing data are given in the preceding section. The sampling rate for an intermediate-sized sample is the number of smaller tabulated samples used multiplied by the sampling rate for the smaller sample. Table I, at the end of this section, provides estimates of total population, black population and other groups from the one-in-one-hundred sample, compared with the full 1960 Census count.
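A small Python sketch of the inflation estimate follows. The sample counts are invented for illustration and the function name is hypothetical; exact fractions are used only to keep the arithmetic free of rounding.

```python
from fractions import Fraction

def inflation_estimate(sample_count, sampling_rate):
    """Multiply the sample figure by the reciprocal of the sampling rate."""
    return sample_count / sampling_rate

# One-in-one-hundred sample: multiply by 100.
persons = inflation_estimate(1234, Fraction(1, 100))

# Intermediate sample built from three one-in-one-thousand samples:
# rate = number of smaller samples used x rate of the smaller sample.
three_in_thousand = 3 * Fraction(1, 1000)
persons_intermediate = inflation_estimate(600, three_in_thousand)

# Housing item collected on the four-fifths subsample, one-in-one-hundred file:
# the resulting rate is one-in-125.
housing_units = inflation_estimate(40, Fraction(1, 125))
```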
Stratification Cells and Random Start Numbers
1960 Census Public Use Sample
Size of Household              White                       Nonwhite
                               Owner head   Renter head    Owner head   Renter head
1 person household             76           96             13           36
2 person household             04           36             84           48
3 person household             33           05             17           05
4 person household             89           43             47           44
5 person household             77           48             43           07
6 person household             51           16             09           01
7 person household             05           73             60           40
8 to 11 person household       66           64             95           01
12 or more person household    02           97             17           84
Group quarters person          53                          60
C. Sampling Variability
General
Estimates derived from the sample tabulations are subject to sampling variability, which can be measured by the standard error. The chances are two out of three that the difference (due to sampling variability) between the sample estimate and the figure that would have been obtained from a complete count of the population is less than the standard error. The chances are about 19 out of 20 that the difference is less than twice the standard error and about 99 out of 100 that it is less than 2 1/2 times the standard error. The amount by which the estimated standard error must be multiplied to obtain other odds can be found in most statistical textbooks.
Approximate Standard Errors Using Tables A-D and Tables E-G
Grouping Items by Type of Standard Error. The standard error of an estimate depends on sample size, method of sampling, and the estimation process. In a cluster sample as was used for population items (clustering was by household, and persons in each household comprised the cluster), the standard errors may vary from one type of statistic to another, depending on the homogeneity of the item within a cluster. The effectiveness of stratification also tends to vary among items. As a result, a simple table containing the exact standard errors cannot be constructed for sophisticated designs of this type. Instead, approximations to the standard errors are produced for classes of items which behave in a fairly similar manner. Empirical studies show that for the samples considered here, three classes are sufficient for most analyses. The grouping of items into these three classes is shown in Table A.
Standard errors for the one-in-one-hundred, one-in-one-thousand, and one-in-ten-thousand samples have been produced for three classes of characteristics. The characteristics within the three classes differ mainly in the way they are affected by the homogeneity of persons within a household. The standard errors given in Tables B and E (and described as for Type I) are for characteristics having relatively low homogeneity among persons within households. These are characteristics which are little affected by the fact that the entire household was selected as the sampling unit; that is, items such as housing data and population data which would tend to describe only one person in the household (for example, married women, age 25-34). Standard errors for characteristics of Type III are given in Tables D and G, and apply to characteristics which tend to cluster by household (i.e., the characteristics tend to describe all or nearly all persons in the household), such as mobility, color, and agricultural residence. Standard errors for characteristics of Type II are shown in Tables C and F, and apply to characteristics which are moderately affected by the sampling of households; that is, characteristics which tend to lie between the two extremes, such as number of males and number of employed.
Table A displays examples of how tabulations of various code categories would relate to the three types of standard errors. The table shows, for example, that "highest grade attended" may be considered as a Type I or as a Type II characteristic. Highest grade attended of the household head and highest grade attended for all children aged 6-16 would be Type I characteristics, since the first would be limited to one person per household and the second would show a low degree of homogeneity within a household. However, highest grade attended for all persons over 25 would be a Type II characteristic since you would expect some homogeneity among the adults of a household.
Table A also shows that "race" is classified as a characteristic of Type III. The nature of the code categories for race, however used, would usually define households in which either all household members or none would satisfy the condition. Note, however, that a cross-tabulation of race and single years of age would require the resulting statistics to be treated as a characteristic of Type I.
Many items can be used either to obtain counts of households having a characteristic or to obtain total counts of persons in households with that characteristic. For the former purpose, the characteristic would be treated as a Type I; for the latter, a Type III. For example, the number of households with contract rent under $150 would be a Type I characteristic, while the number of persons in households in this rent category would be a Type III.
Standard Errors of Estimated Numbers and Percentages. Tables B through D contain approximations to the standard errors for the data produced by multiplying the sample count by the reciprocal of the probability of selection; that is, a factor of 100 for the one-in-one-hundred sample, 1,000 for the one-in-one-thousand sample and 10,000 for the one-in-ten-thousand sample. Tables E through G contain standard errors for percentages calculated from these data.
Standard Errors of Differences. The tables of standard errors are to be applied differently in the three following situations:
 For a difference between the sample figure and one based on a complete count (e.g., arising from comparisons between one-in-one-hundred sample statistics and complete-count statistics for 1950 or 1970), the standard error is identical with the standard error of the sample estimate alone.
 For a difference between two sample figures (e.g., one from the one-in-one-hundred sample and the other based on a different sample or for a different area based on the same sample), the standard error is approximately the square root of the sum of the squares of the standard error of each estimate considered separately. This formula will represent the actual standard error quite accurately for the difference between estimates of the same characteristic in two different areas, or for the difference between separate and uncorrelated characteristics in the same area. If, however, there is a high positive correlation between the two characteristics, the formula will overestimate the true standard error.
 When the sample estimate is a difference between two sample estimates, one of which is a subclass of the other with respect to the same universe, the previous method of finding the standard error of the difference should not be used. Instead, the tables should be used directly by locating the appropriate standard error for an estimate equal to the size of the difference.
For example, it might be desired to estimate the total number of persons who completed junior high school but did not complete high school. The sample estimate for total persons who completed high school is available as is the sample estimate for persons who completed junior high school. The standard error of this difference should be found by locating the size of the estimated difference in the appropriate table and sample size column.
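For the second situation (a difference between two independent sample figures), the rule reduces to a one-line calculation, sketched below in Python. The function name and the two standard errors are hypothetical and are not taken from Tables B through D.

```python
import math

def se_of_difference(se_first, se_second):
    """Approximate standard error of a difference between two sample estimates.

    Applies only when the two estimates are roughly uncorrelated; it is not
    the right rule when one estimate is a subclass of the other.
    """
    return math.sqrt(se_first ** 2 + se_second ** 2)

# Hypothetical standard errors read from the tables for two different areas.
se_diff = se_of_difference(300.0, 400.0)
```

With these made-up inputs the result is 500, larger than either standard error taken alone.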
Sampling Variability of Medians. The sampling variability of a median depends on the size of the base and on the distribution on which the median is based. An approximate method for measuring the reliability of an estimated median is to determine an interval about the estimated median such that there is a stated degree of confidence that the true median lies within the interval. As the first step in estimating the upper and lower limits of the interval (that is, the confidence limits) about the median, compute one-half the number on which the median is based (designated N/2). Look up or compute the standard error of N/2; subtract this standard error from N/2. Cumulate the frequencies (in the table on which the median is based) up to the interval containing the difference between N/2 and its standard error, and by linear interpolation obtain a value corresponding to this number. In a corresponding manner, add the standard error to N/2, cumulate the frequencies in the table, and obtain a value in the table on which the median is based corresponding to the sum of N/2 and its standard error. The chances are two out of three that the median would lie between these two values. The range for 19 chances out of 20 and for 99 in 100 can be computed in a similar manner by multiplying the standard error by the appropriate factors before subtracting and adding to N/2. Interpolation to obtain the values corresponding to these numbers gives the confidence limits for the medians.
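The interpolation procedure for medians can be sketched in Python as follows. The grouped distribution, the standard error of N/2, and the function names are all hypothetical; in practice the standard error would be looked up in the appropriate table.

```python
def value_at_cumulative(intervals, target):
    """Linear interpolation within the interval where the cumulated frequency reaches target.

    intervals: list of (lower_bound, upper_bound, frequency) in ascending order.
    """
    cumulative = 0.0
    for lower, upper, frequency in intervals:
        if frequency > 0 and cumulative + frequency >= target:
            fraction = (target - cumulative) / frequency
            return lower + fraction * (upper - lower)
        cumulative += frequency
    raise ValueError("target exceeds the total frequency")

def median_limits(intervals, se_half, factor=1.0):
    """Confidence limits for a median; factor 1 gives about two chances out of three."""
    half = sum(frequency for _, _, frequency in intervals) / 2   # N/2
    lower = value_at_cumulative(intervals, half - factor * se_half)
    upper = value_at_cumulative(intervals, half + factor * se_half)
    return lower, upper

# Hypothetical income distribution (N = 4,000) and a made-up standard error of N/2.
distribution = [(0, 2000, 1000), (2000, 4000, 2000), (4000, 6000, 1000)]
low, high = median_limits(distribution, se_half=100.0)
```

For this made-up distribution the limits come out at roughly 2,900 and 3,100 around a median of 3,000. Passing `factor=2.0` would give the limits for about 19 chances out of 20.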
Sampling Variability of Estimated Means. The sampling variability of a mean, such as the number of children ever born per 1,000 women, depends on the variability of the distribution on which the mean is based, the size of the sample, the sample design, and the use of ratio estimates. An approximation to the variability of the mean may be obtained as follows: Compute the standard deviation of the distribution on which the mean is based; then multiply this figure by a factor of 0.8 for characteristics of Type I, 1.1 for Type II, and 1.7 for Type III. Divide this product by the square root of the sample size.
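As a rough illustration, the approximation for a mean can be coded directly from the three factors given above. The function name, the standard deviation, and the sample size below are hypothetical.

```python
import math

# Design factors quoted in the text for the three characteristic types.
TYPE_FACTORS = {"I": 0.8, "II": 1.1, "III": 1.7}

def se_of_mean(std_dev, sample_size, characteristic_type):
    """Approximate standard error of a mean: (std_dev x type factor) / sqrt(sample size)."""
    return std_dev * TYPE_FACTORS[characteristic_type] / math.sqrt(sample_size)

# Hypothetical Type II item: standard deviation 2.0 over 400 sample cases.
se_mean = se_of_mean(2.0, 400, "II")
```

Here the result is about 0.11, since 2.0 times 1.1 is divided by the square root of 400.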
Nonsampling Errors. Sampling error is only one of the components of the total error of a survey. Further contributions may come, for example, from biases in sample selection, from errors introduced by imputations for nonreporting, and from errors introduced in the coding and other processing of the questionnaires. For estimates of totals representing relatively small proportions of the population, the major component of the survey error tends to be the sampling error. As the estimated totals approach the level of the total population, the sampling errors decrease. This is not necessarily true of the nonsampling errors and they assume a relatively larger role in the total survey error. For this reason, standard errors of totals are not shown for all estimated levels even though the standard errors are actually present. For example, for regional estimates from these samples, the presence of nonsampling errors should be recognized in increasingly larger proportions for estimates close to the total population of the region.
Sampling Variability for the Housing Subsamples. Data obtained from the 20-percent and 5-percent housing subsamples are subject to greater sampling variability than data from the 25-percent sample. An approximation to the standard errors of data from these samples can be made by obtaining the standard errors from the appropriate tables and adjusting them for the smaller sample size. For this adjustment, multiply the standard error of the 25-percent sample by 1.1 if the data are from the 20-percent subsample, or by 2.2 if the data are from the 5-percent subsample. Data in the 5-percent sample are items H59, H61-H63 (air conditioning, number of bedrooms, clothes washing machine and clothes dryer), and H65-H68 (home food freezer, television set, house heating fuel, and radio). Items H26, H53, H56-H58, and H64 (basement, number of units in structure, source of water, sewage disposal, number of bathrooms and stories and elevator) are 20-percent sample items. Note that data on item H60 (automobile) were collected as a 20-percent item in large urban areas and as a 5-percent item in other parts of the country. Tabulations of this item for areas encompassing both types of areas, therefore, will require a factor somewhere between 1.1 and 2.2.
Derivation of the Estimated Standard Errors in Tables B-G
Standard errors are usually not computed for every possible statistic of interest produced in a survey, partly because of the high cost of the necessary calculations and partly because the estimated standard error is a sample statistic also subject to sampling variability. The sampling variability of the estimated standard errors is often relatively large, but more stable estimates can frequently be obtained by combining the estimates of the sampling errors for several statistics known from previous experience to have sampling errors with similar behavior. One way of generalizing estimated sampling errors is to select representative characteristics, calculate the relvariances (the squares of the coefficients of variation) of these characteristics and use them to represent the relvariances of similar characteristics, using a relationship which typically holds concerning the relvariance and the size of the estimate^{2}. Investigation of this relationship has led to the development of a set of regression functions which, experience has shown, represent the behavior of the relvariances fairly well. Curves taking the form of
V_{x'}^{2} = a + \frac{b}{x'}
have been used to produce the tables of standard errors for this report. The value V_{x'}^{2} is the relvariance of the estimate x', and a and b are parameters of the regression curve. The values of a and b were determined by using a method of successive approximations. The standard errors appearing in Tables B through G were then obtained from the regression curve for the one-in-one-thousand sample. The standard errors for the others were obtained by adjusting the one-in-one-thousand figures to reflect the different sample sizes. To calculate the relvariances required for the above generalization technique, a point estimate of the variance was determined for each of the selected items using the random group method as follows:
The sample was divided into ten randomly determined, mutually exclusive and exhaustive subsamples (random groups). Each of the random groups is a subsample selected in the same manner as the full sample. In order to simplify the variance calculations, the number of strata was reduced from 38 to 17. The reduction in the number of strata probably results in a slight overstatement of the variance. The ten random groups were selected by using the terminal digit of the subsample number. Since these subsample numbers were assigned sequentially at the time of selection, the systematic selection process is incorporated in each random group. The formula used to estimate the variance of a total for each Census region follows (the variances for U.S. totals were obtained by adding the variances for the four regions):
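s_{x_{r}}^{2} = \sum_{i=1}^{17} \frac{t}{t-1} \sum_{j=1}^{t} (x_{rij} - \bar{x}_{ri})^{2}
where
x_{rij} is the weighted total for the j^{th} random group in the i^{th} stratum in the r^{th} Census region for characteristic X;
\bar{x}_{ri} is the average of the x_{rij} per random group in the i^{th} stratum of the r^{th} region; and
t is the number of random groups (t = 10 in this instance).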
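The random group calculation for a regional total can be sketched in Python. The sketch assumes the standard random-group form, in which each stratum contributes t/(t-1) times the sum of squared deviations of the random-group totals from their average; this form is an assumption consistent with the definitions above. The function name and the data are hypothetical, and only three random groups per stratum are used to keep the example short.

```python
def region_variance(group_totals_by_stratum):
    """Random-group variance estimate of a regional total.

    group_totals_by_stratum: one list per stratum, holding the weighted
    total of the characteristic in each of the t random groups.
    """
    variance = 0.0
    for totals in group_totals_by_stratum:
        t = len(totals)
        average = sum(totals) / t
        squared_deviations = sum((x - average) ** 2 for x in totals)
        variance += t / (t - 1) * squared_deviations
    return variance

# Hypothetical region with two strata and three random groups each.
region = [[10.0, 12.0, 14.0], [5.0, 5.0, 5.0]]
variance = region_variance(region)

# A U.S. variance would be the sum of the four regional variances.
us_variance = sum(region_variance(r) for r in [region, region, region, region])
```

The first stratum contributes 1.5 times 8, the second contributes nothing, so the regional variance in this made-up case is 12.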
The 17 strata used in the estimation of the standard errors are:
 White owner households by size groups:
   1 and 2 persons
   3 persons
   4 persons
   5 persons
   6 persons
   7 persons
   8 or more persons
 White renter households by size groups:
   1 person
   2 persons
   3 persons
   4 persons
   5 persons
   6 persons
   7 or more persons
 Nonwhite owner households,
 Nonwhite renter households, and
 All group-quarters persons.
Comparison of Generalized and Computed Standard Errors
Table H shows the relationship between the standard error as computed from the one-in-one-thousand sample and the approximate standard error given by Tables B, C, or D. Column (1) of Table H shows the estimate of the characteristic; column (2) shows the standard error computed for the item by the method of random groups; column (3) shows the standard error as read from the appropriate Table B, C, or D; and column (4) shows the characteristic type. The table also shows further examples of characteristics in Types I, II, and III. The effect of the composition of the typical household on the standard error is also illustrated. Note, for example, that female nonheads ages under 24 are classed as Type II while for ages 25 and above (i.e., typically the housewife) the characteristic is treated as Type I.
Note: The figures in this table were calculated using data from an earlier one-in-one-thousand sample selected using the same sample design employed here. While the figures may not be in exact agreement with figures from the present sample, they accurately reflect the comparisons which the table was designed to illustrate.
Standard Errors for Intermediate-Sized Samples
The tables provided for the one-in-one-hundred, one-in-one-thousand, and one-in-ten-thousand samples may be used to measure sampling variability for intermediate-sized samples by adjusting for sample size. This is done by dividing the tabulated standard error for the next smaller tabulated sample size by the square root of the number of smaller samples used. For example, to find the standard error for an estimate based on a three-in-one-thousand sample, first find the tabulated standard error for the next smaller sample size; that is, locate the size of estimate in the appropriate table and read the standard error for the one-in-one-thousand sample. Divide this figure by the square root of 3 to get the standard error for data from a three-in-one-thousand sample.
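The adjustment for an intermediate-sized sample is a single division, sketched here in Python. The tabulated standard error of 600 is an invented value, not one read from the tables, and the function name is hypothetical.

```python
import math

def intermediate_se(tabulated_se, samples_combined):
    """Standard error for a sample built from several copies of the next smaller tabulated sample."""
    return tabulated_se / math.sqrt(samples_combined)

# Hypothetical standard error of 600 read from the one-in-one-thousand column,
# adjusted for a three-in-one-thousand sample.
se_three_in_thousand = intermediate_se(600.0, 3)
```

With these inputs the adjusted standard error is about 346, that is, 600 divided by the square root of 3.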
Estimation of Standard Error for Types of Statistics Not Shown in Tables
The tabulated values for standard errors will be adequate for data giving a level or percentage. In some instances, however, it may be necessary to have a rough estimate of sampling variability for a statistic not covered by the tables (e.g., an index). The random group method may be used directly in such a situation by applying the following formula:
s_{q}^{2} = \frac{1}{t(t-1)} \sum_{j=1}^{t} (q_{j} - q)^{2}
where
q is the statistic of interest computed from the full sample;
q_{j} is the statistic computed from the j^{th} random group; and
t is the number of random groups.
The statistic q may be an index, a regression or correlation coefficient, or other measure for which the following approximate relationship holds:
E(q_{j}) = E(q)
where E(q_{j}) and E(q) indicate the expected values of the measure for the j^{th} random group and the full sample respectively. Thus, for example, the formula does not apply for q defined as an estimated total number of persons or housing units in a category.
The procedure specified by the formula for s_{q}^{2} requires the following steps:
 Use t = 50 when dealing with the 1-in-100 sample or t = 10 with the 1-in-1,000 sample. (See note below for use of other values of t.)
 Assign each of the records to one of the t random groups based on the subsample number.
   To construct 50 random groups, assign all records in which the subsample number is 01 or 51 to the first random group; all records in which the subsample number is 02 or 52 to the second random group, etc. Finally, assign all records in which the subsample number is 50 or 00 to random group number 50.
   To construct 10 random groups of the 1-in-1,000 sample, assign all records with a 1 in the tens digit to random group 1; those with a 2 in the tens digit to random group 2, etc.
 Independently within each random group, calculate the desired measure (q_{j}) as if the records for that random group were the entire sample.
 Compute the desired measure (q) on the entire sample.
 Compute the t values (q_{j} - q)^{2} and apply in the above formula.
 The square root of the result is the desired standard error.
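The steps above can be sketched in Python. The sketch assumes the random-group formula takes the standard form, the sum of the (q_j - q)^2 divided by t(t - 1); the function names, the record layout (a subsample number paired with a value), and the data are all hypothetical.

```python
import math
from collections import defaultdict

def random_group_50(subsample_number):
    """Map subsample numbers 00-99 to groups 1-50 (01 and 51 -> 1, ..., 50 and 00 -> 50)."""
    group = subsample_number % 50
    return group if group != 0 else 50

def random_group_se(records, statistic, group_of):
    """Standard error of `statistic` by the random group method.

    records: list of (subsample_number, value) pairs.
    statistic: function taking a list of values, e.g. a mean or an index.
    """
    groups = defaultdict(list)
    for subsample_number, value in records:
        groups[group_of(subsample_number)].append(value)
    t = len(groups)                                   # number of random groups in use
    q = statistic([value for _, value in records])    # full-sample statistic
    total = sum((statistic(values) - q) ** 2 for values in groups.values())
    return math.sqrt(total / (t * (t - 1)))

def mean(values):
    return sum(values) / len(values)

# Tiny made-up file that falls into only two random groups.
records = [(1, 1.0), (51, 3.0), (2, 5.0), (52, 7.0)]
se_q = random_group_se(records, mean, random_group_50)
```

In this toy case the group means are 2 and 6 against a full-sample mean of 4, giving a standard error of 2. A real application would use all 50 groups, and the statistic should be one, such as a mean or ratio, whose expected value is about the same in a random group as in the full sample.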
Note: The user is cautioned that the standard error given by this procedure is itself subject to sampling error. Some control can be put on the reliability of the estimated s_{q}^{2} by the number of random groups used. The total sample can be separated into different numbers of random groups by different schemes involving the subsample numbers, which for the 1-in-100 sample range from 00 to 99. Ideally, each random group should represent the total sample.
The number of random groups should be chosen to maximize the reliability of the estimate s_{q}^{2}. The reliability of the estimated s_{q}^{2} will depend on the number of degrees of freedom in the estimate (i.e., on the choice of t, the number of random groups), and the number of units in each of the resulting random groups. As a general rule, when estimating s_{q}^{2} using the 1-in-100 sample, choosing t = 50 would produce reasonably reliable estimates of s_{q}^{2} for most statistics. One might choose a smaller value for t to estimate the sampling errors for statistics for a given State. The considerations leading to the proper choice of t are described in several statistical texts (for example, Hansen, Hurwitz and Madow, Sample Survey Methods and Theory, Vol. I, Chapter 10, Section 16, page 440 ff.)
For the 1-in-1,000 sample, the tens digit of the subsample number ranges from 0 to 9 and can be used to define 10 random groups (the units digit is fixed). When this sample is being used, the choice of t is limited to a maximum of t = 10.
ENDNOTES:
 1. Originally published as "Section IV. Sample Design and Sampling Variability," Technical Documentation for the 1960 Public Use Sample, PUS-1960, Prepared by National Data Use and Access Laboratories (DUALabs), January 1973, pp. 53-77.
 2. Some of the standard errors used in the generalization here are displayed in Table H.