1960 Sample Design and Sampling Variability [1]
A. Sample Design
General
The one-in-one-hundred sample is a subsample of the 5-percent sample
used for some of the tabulations of the 1960 Census of Population. The
5-percent sample was selected from the original 25-percent sample households
using a stratified, systematic sample design. The one-in-one-hundred, one-in-one-thousand,
and one-in-ten-thousand samples are designed to take advantage of the selection
and estimation processes used for the two larger samples. Selection and
estimation at each of these levels are described in more detail below.
The 25-Percent Sample
Selection of the 25-Percent Sample. For persons living
in housing units at the time of the 1960 Census, the sampling unit was
a housing unit along with all of its occupants; for persons in group quarters,
it was the person. On the first visit to an address the enumerator assigned
a sample key letter (A, B, C, or D) to each housing unit sequentially in
the order in which he first visited the unit, whether or not he completed
the interview. Each enumerator was given a random key letter to start
his assignment. The order of canvassing was indicated in advance, although
the instructions allowed some latitude in the order of visiting addresses.
Each housing unit to which the key letter "A" was assigned was designated
as a sample unit, and all persons enumerated in the unit were included in
the sample. In group quarters, the sample consisted of every fourth person
in the order listed. Although the sampling procedures did not automatically
ensure an exact 25-percent sample of persons, the sample design was unbiased
if carried through according to instructions. Biases may have arisen, however,
when the enumerator failed to follow his listing and sampling instructions
exactly.
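The key-letter rule amounts to a systematic one-in-four selection with a random start. The sketch below illustrates it under stated assumptions: units are listed in visiting order, the helper names (`assign_key_letters`, `select_housing_units`, `select_group_quarters`) are hypothetical, and the group-quarters start position (0 to 3) is an assumption, since the text says only "every fourth person in the order listed."

```python
import random

KEY_LETTERS = "ABCD"


def assign_key_letters(n_units: int, start_letter: str) -> list[str]:
    """Assign A, B, C, D cyclically in visiting order, beginning at start_letter."""
    offset = KEY_LETTERS.index(start_letter)
    return [KEY_LETTERS[(offset + k) % len(KEY_LETTERS)] for k in range(n_units)]


def select_housing_units(units: list, start_letter: str) -> list:
    """Keep every unit that received key letter "A" (with all of its occupants)."""
    letters = assign_key_letters(len(units), start_letter)
    return [unit for unit, letter in zip(units, letters) if letter == "A"]


def select_group_quarters(persons: list, start: int) -> list:
    """Every fourth person in listing order; start (0-3) is a hypothetical random start."""
    return persons[start::4]


if __name__ == "__main__":
    rng = random.Random(1960)
    start = rng.choice(KEY_LETTERS)  # random key letter given to the enumerator
    units = [f"unit-{k}" for k in range(12)]
    print(start, select_housing_units(units, start))
```

Whatever the start letter, about one unit in four is taken, which is why the design is unbiased only when the listing order is followed as instructed.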
Estimation Procedure for the 1960 Census 25-Percent Sample. Statistics
based on the 25-percent sample were estimated through the use of a ratio-estimation
procedure. This procedure was carried out for each of the following
44 groups of persons in each of the smallest weighting areas (SWA's).
Person Categories for Smallest Weighting Areas (SWA's)

| Group | Sex and Color | Age | Relationship and Tenure |
|---|---|---|---|
| 1 | Male, White | Under 5 | |
| 2 | Male, White | 5-13 | |
| 3 | Male, White | 14-24 | Head of owner household |
| 4 | Male, White | 14-24 | Head of renter household |
| 5 | Male, White | 14-24 | Not head of household |
| 6 | Male, White | 25-44 | Head of owner household |
| 7 | Male, White | 25-44 | Head of renter household |
| 8 | Male, White | 25-44 | Not head of household |
| 9 | Male, White | 45 and over | Head of owner household |
| 10 | Male, White | 45 and over | Head of renter household |
| 11 | Male, White | 45 and over | Not head of household |
| 12-22 | Male, Nonwhite | (repeat age categories) | (repeat relationship categories) |
| 23-33 | Female, White | (repeat age categories) | (repeat relationship categories) |
| 34-44 | Female, Nonwhite | (repeat age categories) | (repeat relationship categories) |
The SWA's established for the 25-percent sample were the smallest separate
geographic areas for which statistics had to be prepared in order to produce
data for any of the geographic tabulations planned for the Census. Estimates
for such areas were combined so as to produce all of the geographical detail
required for the publication program of the Census, such as places of 2,500
inhabitants or more, Urbanized Areas, Standard Metropolitan Statistical
Areas, or census tracts. Typical examples of SWA's are census tracts (in
tracted cities), complete cities in smaller urban places, urban fringe
areas defined outside of large cities, the rural balance in a minor civil
division, etc. There were roughly 33,000 SWA's involved in the estimation
procedure used for the 25-percent sample.
Estimates of characteristics from the sample for a given weighting area
are produced using the formula:

$$ x' = \sum_{i=1}^{44} \frac{x_i}{y_i} Y_i $$

where $x'$ is the estimate of the characteristic for a weighting area
obtained through the use of the ratio-estimation procedure; $x_i$ is the
count of sample persons with the characteristic in one of the 44 groups
(group i) within the SWA; $y_i$ is the count of all sample persons for the
area in the same one of the 44 groups; and $Y_i$ is the complete Census
count of persons in the same one of the 44 groups in the SWA.
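As a numerical illustration of the formula, the sketch below computes x' for a weighting area from per-group counts. The function name and the two-group input are hypothetical. The documented procedure uses the 44 groups listed above, after any required combining of groups.

```python
def ratio_estimate(x: list[float], y: list[float], Y: list[float]) -> float:
    """Ratio estimate x' = sum_i (x_i / y_i) * Y_i over the weighting groups of one SWA.

    x[i]: sample persons in group i with the characteristic
    y[i]: all sample persons in group i
    Y[i]: complete-count persons in group i
    """
    if not (len(x) == len(y) == len(Y)):
        raise ValueError("x, y and Y must have one entry per group")
    estimate = 0.0
    for x_i, y_i, Y_i in zip(x, y, Y):
        if y_i == 0:
            # Sparse groups are combined before estimation in the documented procedure.
            raise ValueError("group has no sample persons; combine groups first")
        estimate += x_i / y_i * Y_i
    return estimate


# Two hypothetical groups: 10 of 40 sample persons (complete count 164)
# and 5 of 20 sample persons (complete count 80) have the characteristic.
print(ratio_estimate(x=[10, 5], y=[40, 20], Y=[164, 80]))  # 61.0
```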
For each of the 44 groups, the ratio of the complete Census count to
the sample count of the population in the group was determined. Each specific
sample person in the group was assigned an integral weight so that the
sum of the weights would equal the complete count for the group. For example,
if the ratio for a group was 4.1, one-tenth of the persons (selected at
random) within the group were assigned a weight of 5, and the remaining
nine-tenths a weight of 4. The use of such a combination of integral weights
rather than a single fractional weight was adopted to avoid the complications
involved in rounding in the final tables. Where there were fewer than 50
persons in the complete count in a group, or where the resulting weight
was over 16, groups were combined in a specific order to satisfy both
conditions, and a common weight was used for the combined groups.
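The integral-weight rule can be stated compactly: every sample person in the group gets the integer part of the ratio, and a randomly chosen subset gets one more, just enough for the weights to add up to the complete count. The following is a minimal sketch under one assumption: "weight over 16" is tested on the unrounded ratio. The function names are hypothetical. The combining thresholds are parameters because the 5-percent sample uses 275 and 80 instead of 50 and 16.

```python
import random


def integral_weights(sample_count: int, census_count: int, rng: random.Random) -> list[int]:
    """Integer weights for one group's sample persons that sum to the complete count."""
    base = census_count // sample_count
    n_high = census_count - base * sample_count  # persons who receive base + 1
    weights = [base + 1] * n_high + [base] * (sample_count - n_high)
    rng.shuffle(weights)  # random choice of who gets the larger weight
    return weights


def needs_combining(sample_count: int, census_count: int,
                    min_count: int = 50, max_weight: float = 16) -> bool:
    """Combine groups if the complete count is too small or the weight too large."""
    return census_count < min_count or census_count / sample_count > max_weight


rng = random.Random(0)
w = integral_weights(sample_count=100, census_count=410, rng=rng)  # ratio 4.1
print(sum(w), w.count(5), w.count(4))  # 410 10 90
```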
These ratio estimates reduce the component of sampling error arising
from the variation in the size of household and achieve many of the gains
of stratification in the selection of the sample with the strata being
the groups for which separate ratio estimates are computed. The net effect
is a reduction in the sampling error and bias of most statistics below
what would be obtained by weighting the results of the 25-percent sample
by a uniform factor of four. A by-product of this type of estimation procedure
is that estimates for the sample are generally consistent with the complete
count with respect to the total population and for the subdivisions used
as groups in the estimation procedure.
The 5-Percent Population Sample
Selection of the 5-Percent Population Sample. For some tabulations,
a subsample of one-fifth of the original 25-percent sample schedules was
selected by the computer using a stratified, systematic sample design.
The strata were made up as follows: for persons in regular housing units,
there were 36 strata, formed by 9 household-size groups (1, 2, 3, 4, 5, 6,
7, 8, and 9 or more persons per household) by two tenure groups, by two
color groups (white or nonwhite); for persons in group quarters there were
two strata (two color groups). Within each of these 38 strata the computer
selected the sample by accumulating the weights assigned (by the
ratio-estimation process for the 25-percent sample) to household heads and
selecting each household that caused the cumulative weight to pass a
multiple of 20.
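The cumulative-weight selection is a systematic pass through one stratum in which a unit's chance of selection is roughly proportional to its weight. The function below is a hypothetical illustration. The starting value of the accumulation is not stated for the 5-percent selection, so it defaults to 0 here as an assumption.

```python
def select_by_cumulative_weight(weights: list[float], interval: float,
                                start: float = 0) -> list[int]:
    """Indices of units selected when the running total of weights passes a multiple of interval.

    Units are taken in file order within one stratum. Reaching a multiple exactly is
    treated as passing it (an assumption; the text says only "pass a multiple").
    """
    selected = []
    running = start
    for index, weight in enumerate(weights):
        before = running // interval
        running += weight
        if running // interval > before:
            selected.append(index)
    return selected


# Household heads with 25-percent weights of about 4: roughly one in five is selected.
print(select_by_cumulative_weight([4, 4, 5, 4, 3, 4, 4, 4, 4, 4], interval=20))  # [4, 9]
```

Because combining keeps 25-percent weights at 16 or less, no household can pass two multiples of 20 at once, so each selected household is taken exactly once.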
Estimation Procedure for the 5-Percent Sample. Statistics based
on the 5-percent sample were estimated through the use of a ratio-estimation
procedure. The procedure used for this sample was similar to the estimation
process described for the 25-percent sample, with two important differences.
First, larger SWA's were used for the 5-percent sample than for the 25-percent
sample. They were defined as the combined total of areas within a state
comprising central cities of urbanized areas, the remaining portions of
urbanized areas, urban places not in urbanized areas, and rural areas.
However, each urbanized area of more than 1,000,000 inhabitants made up
two SWA's: the central city and the balance. Second, groups were sometimes
combined in a specific order during the ratio-estimation process, as in the
25-percent sample, but under different thresholds: for the 5-percent sample,
groups were combined when there were fewer than 275 persons in the complete
count in a group, or where the resulting weight was over 80.
Public Use Samples
Selection of the One-in-One-Hundred Sample. The one-in-one-hundred
sample was selected as a subsample of the 5-percent population sample,
using a systematic selection of one-in-five within each of 38 strata. The
strata are shown in the table of stratification cells and random start numbers below; they differ slightly from
the strata used in the selection of the 5-percent sample. The subsampling
was done in such a way as to take into consideration the weights assigned
in the ratio-estimation procedures used in the 5-percent population sample
described above. Within each stratum, using random-start numbers in the
range 0-99 (i.e., those in the table), the 5-percent weights for each household
head (or group quarters person) were accumulated and the entire household
(or group quarters person) was selected each time the sum passed a multiple
of 100.
Selection of the One-in-One-Thousand Sample. At the time a household
or group quarters person was selected in the one-in-one-hundred sample,
it was assigned a subsample number within each of the 38 strata. These
subsample numbers range from 00 to 99. A sample of one-in-one-thousand
was selected using the units digit of the subsample numbers. Thus, the
one-in-one-thousand sample is a stratified systematic subsample of the
one-in-one-hundred sample. Each record selected contains the units digit 2
in Item H93-94 (Subsample Number).
Selection of the One-in-Ten-Thousand Sample. The one-in-ten-thousand
sample is a subsample of the one-in-one-hundred sample consisting of those
households and group quarters persons assigned a single one of the subsample
numbers 00 to 99. The one-in-ten-thousand sample is not a subsample of the
one-in-one-thousand sample. Records selected contain the subsample number 79
in Item H93-94 (Subsample Number).
Strata Used in Selecting the Sample. The records in the source
file were grouped by households, each household record carrying a separate
weight. These weights ranged around a value of 20. The weight of each household
head or group quarters person was added to the start number or to the previous
accumulation for the applicable stratum. Every time the total in the stratum
accumulated to an additional hundred, the household head and all members
of his household were selected for inclusion in the sample. Group quarters
persons were selected without regard to other members of the group quarters
whenever the accumulated total passed an additional hundred. The thirty-eight
strata with their random start numbers are shown in the table of stratification
cells and random start numbers below.
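Putting the start numbers and the accumulation rule together gives a sketch of the one-in-one-hundred selection. The record layout (dictionaries with `kind`, `color`, `tenure`, `size` and `weight` fields) and all function names are hypothetical. Only the start numbers and the 38-stratum structure come from the stratification table. Records are assumed to arrive in source-file order.

```python
SIZE_GROUPS = ["1", "2", "3", "4", "5", "6", "7", "8-11", "12+"]

# Random start numbers from the stratification table, listed by household size group.
HOUSEHOLD_STARTS = {
    ("white", "owner"): [76, 4, 33, 89, 77, 51, 5, 66, 2],
    ("white", "renter"): [96, 36, 5, 43, 48, 16, 73, 64, 97],
    ("nonwhite", "owner"): [13, 84, 17, 47, 43, 9, 60, 95, 17],
    ("nonwhite", "renter"): [36, 48, 5, 44, 7, 1, 40, 1, 84],
}
GROUP_QUARTERS_STARTS = {"white": 53, "nonwhite": 60}


def size_group(persons: int) -> str:
    if persons >= 12:
        return "12+"
    if persons >= 8:
        return "8-11"
    return str(persons)


def stratum_and_start(record: dict) -> tuple[tuple, int]:
    """Return (stratum key, random start) for a household head or group quarters person."""
    if record["kind"] == "gq":
        return ("gq", record["color"]), GROUP_QUARTERS_STARTS[record["color"]]
    group = size_group(record["size"])
    start = HOUSEHOLD_STARTS[(record["color"], record["tenure"])][SIZE_GROUPS.index(group)]
    return (record["color"], record["tenure"], group), start


def select_one_in_100(records: list[dict]) -> list[dict]:
    """Accumulate 5-percent weights by stratum; select whenever a multiple of 100 is passed."""
    running: dict[tuple, float] = {}
    selected = []
    for record in records:
        key, start = stratum_and_start(record)
        before = running.get(key, start)
        after = before + record["weight"]
        running[key] = after
        if after // 100 > before // 100:
            selected.append(record)  # whole household, or the single group quarters person
    return selected
```

Since 5-percent weights are kept at 80 or less, a household can pass at most one multiple of 100, so the loop never needs to select the same record twice.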
Selection of Intermediate Samples. Samples in the range between
one-in-one-hundred and one-in-ten-thousand may be selected by using an
appropriate combination of the other samples; for instance, a stratified
systematic three-in-one-thousand sample would consist of the households
and group quarters persons comprising three systematically selected
one-in-one-thousand samples. Such intermediate samples are also stratified
systematic subsamples of the next larger sample.
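The subsample-number rules can be expressed as simple filters on Item H93-94. This sketch assumes you are working from the one-in-one-hundred file, where subsample numbers run from 00 to 99 and are held as integers. The function names and the choice of units digits {2, 5, 8} for a three-in-one-thousand sample are hypothetical.

```python
def in_one_in_1000(subsample_number: int) -> bool:
    """One-in-one-thousand sample: units digit 2 of the subsample number (00-99)."""
    return subsample_number % 10 == 2


def in_one_in_10000(subsample_number: int) -> bool:
    """One-in-ten-thousand sample: the single subsample number 79."""
    return subsample_number == 79


def in_k_in_1000(subsample_number: int, units_digits: set[int]) -> bool:
    """Intermediate k-in-1,000 sample built from k one-in-one-thousand samples,
    each identified by one units digit (the digits chosen are up to the user)."""
    return subsample_number % 10 in units_digits


numbers = range(100)  # subsample numbers carried on one-in-one-hundred records
print(sum(in_one_in_1000(n) for n in numbers))           # 10 of 100 -> 1 in 1,000
print(sum(in_one_in_10000(n) for n in numbers))          # 1 of 100 -> 1 in 10,000
print(sum(in_k_in_1000(n, {2, 5, 8}) for n in numbers))  # 30 of 100 -> 3 in 1,000
```

Because 79 has units digit 9, the one-in-ten-thousand records are disjoint from the units-digit-2 records, consistent with the statement that one sample is not a subsample of the other.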
Housing Subsamples
Since some of the housing items were collected as systematic four-fifths
or one-fifth subsamples of the initial 25-percent sample, the resulting
sampling rates for these items are: for the one-in-one-hundred sample,
subsamples of one-in-125 and one-in-500, respectively; for the one-in-one-thousand
sample, subsamples of one-in-1,250 and one-in-5,000, respectively; and
for the one-in-ten-thousand sample, subsamples of one-in-12,500 and one-in-50,000, respectively.
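These rates follow from multiplying each public use sampling rate by 4/5 or 1/5. The short check below uses exact fractions. The function name is hypothetical.

```python
from fractions import Fraction

FOUR_FIFTHS = Fraction(4, 5)  # items collected from four-fifths of the 25-percent sample
ONE_FIFTH = Fraction(1, 5)    # items collected from one-fifth of the 25-percent sample


def housing_rates(population_rate: Fraction) -> tuple[Fraction, Fraction]:
    """Effective sampling rates for four-fifths and one-fifth housing items."""
    return population_rate * FOUR_FIFTHS, population_rate * ONE_FIFTH


for rate in (Fraction(1, 100), Fraction(1, 1000), Fraction(1, 10000)):
    high, low = housing_rates(rate)
    print(f"{rate}: {high} and {low}")
```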
B. Estimation
Estimates of true values may be made from sample data by using a simple
inflation estimate, that is, by multiplying the sample figure by the reciprocal
of the sampling rate. For example, to estimate the total number of persons
with a certain population characteristic from a one-in-one-hundred sample,
multiply the sample total by 100. Sampling rates for housing data are given
in the preceding section. The sampling rate for an intermediate-sized sample
is the number of smaller tabulated samples used multiplied by the sampling
rate for the smaller sample. Table I, at the end of this section, provides
estimates of total population, black population and other groups from the
one-in-one-hundred sample, compared with the full 1960 Census count.
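The following is a sketch of the simple inflation estimate. Exact fractions keep rates such as 3-in-1,000 exact. The sample counts and function names are hypothetical.

```python
from fractions import Fraction


def intermediate_rate(n_samples: int, smaller_rate: Fraction) -> Fraction:
    """Rate of a sample built from n_samples smaller tabulated samples."""
    return n_samples * smaller_rate


def inflation_estimate(sample_count: int, rate: Fraction) -> Fraction:
    """Sample count multiplied by the reciprocal of the sampling rate."""
    return sample_count / rate


print(inflation_estimate(1234, Fraction(1, 100)))                       # 123400
print(inflation_estimate(57, intermediate_rate(3, Fraction(1, 1000))))  # 19000
print(inflation_estimate(40, Fraction(1, 500)))                         # 20000 (one-fifth housing item)
```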
Stratification Cells and Random Start Numbers
1960 Census Public Use Sample

| Size of Household | White, Owner head | White, Renter head | Nonwhite, Owner head | Nonwhite, Renter head |
|---|---|---|---|---|
| 1 person household | 76 | 96 | 13 | 36 |
| 2 person household | 04 | 36 | 84 | 48 |
| 3 person household | 33 | 05 | 17 | 05 |
| 4 person household | 89 | 43 | 47 | 44 |
| 5 person household | 77 | 48 | 43 | 07 |
| 6 person household | 51 | 16 | 09 | 01 |
| 7 person household | 05 | 73 | 60 | 40 |
| 8 to 11 person household | 66 | 64 | 95 | 01 |
| 12 or more person household | 02 | 97 | 17 | 84 |
| Group quarters person | 53 | | 60 | |

Group quarters persons form one stratum per color group (start number 53 for white, 60 for nonwhite); tenure does not apply.
C. Sampling Variability
General
Estimates derived from the sample tabulations are subject to sampling
variability, which can be measured by the standard error. The chances are
two out of three that the difference (due to sampling variability) between
the sample estimate and the figure that would have been obtained from a
complete count of the population is less than the standard error. The
chances are about 19 out of 20 that the difference is less than twice the
standard error and about 99 out of 100 that it is less than 2-1/2 times
the standard error. The amount by which the estimated standard error must
be multiplied to obtain other odds can be found in most statistical textbooks.
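The rule of thumb multiplies the standard error by 1, 2, or 2.5. The last two are rounded forms of the normal-distribution values 1.96 and 2.58. The following is a minimal sketch, assuming the estimate is approximately normally distributed. The function names are hypothetical. `statistics.NormalDist` is part of the Python standard library.

```python
from statistics import NormalDist


def confidence_interval(estimate: float, standard_error: float,
                        multiplier: float = 1.0) -> tuple[float, float]:
    """Interval: estimate plus or minus multiplier times the standard error."""
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width


def multiplier_for(coverage: float) -> float:
    """Normal-theory multiplier for a two-sided interval with the given coverage."""
    return NormalDist().inv_cdf(0.5 + coverage / 2)


print(confidence_interval(12000, 800, multiplier=2))             # about 19 chances in 20
print(round(multiplier_for(0.95), 2), round(multiplier_for(0.99), 2))  # 1.96 2.58
```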
Approximate Standard Errors Using Tables A-D and Tables E-G
Grouping Items by Type of Standard Error. The standard error
of an estimate depends on sample size, method of sampling, and the estimation
process. In a cluster sample, such as was used for population items (clustering
was by household, and the persons in each household comprised the cluster),
the standard errors may vary from one type of statistic to another, depending
on the homogeneity of the item within a cluster. The effectiveness of stratification
also tends to vary among items. As a result, a simple table containing
the exact standard errors cannot be constructed for sophisticated designs
of this type. Instead, approximations to the standard errors are produced
for classes of items which behave in a fairly similar manner. Empirical
studies show that for the samples considered here, three classes are sufficient
for most analyses. The grouping of items into these three classes is shown
in Table A.
Standard errors for the one-in-one-hundred, one-in-one-thousand,
and one-in-ten-thousand samples have been produced for three classes
of characteristics. The characteristics within the three classes
differ mainly in the way they are affected by the homogeneity of
persons within a household. The standard errors given in Tables
B and E (and described
as for Type I) are for characteristics having relatively low homogeneity
among persons within households. These are for characteristics which
are little affected by the fact that the entire household was selected
as the sampling unit; that is, for items such as housing data and
population data which would tend to describe only one person in
the household (for example, married women aged 25-34). Standard
errors for characteristics of Type III are given in Tables
D and G, and apply to characteristics
which tend to cluster by household (i.e., the characteristics tend
to describe all or nearly all persons in the household), such as
mobility, color, and agricultural residence. Standard errors for
characteristics of Type II are shown in Tables
C and F, and apply to characteristics
which are moderately affected by the sampling of households -- that
is, characteristics which tend to lie between the two extremes:
items such as number of males and number of employed.
Table A displays examples of how
tabulations of various code categories would relate to the three types
of standard errors. The table shows, for example, that "highest grade attended"
may be considered as a Type I or as a Type II characteristic. Highest grade
attended of the household head and highest grade attended for all children
aged 6-16 would be Type I characteristics, since the first would be limited
to one person per household and the second would show a low degree of homogeneity
within a household. However, highest grade attended for all persons over
25 would be a Type II characteristic since you would expect some homogeneity
among the adults of a household.
Table A also shows that "race" is classified as a characteristic of
Type III. The code categories for race, however used, would
usually define households in which either all household members or none
would satisfy the condition. Note, however, that a cross-tabulation of
the race item by single years of age would require the resulting statistics
to be treated as a characteristic of Type I.
Many items can be used either to obtain counts of households having
a characteristic or to obtain total counts of persons in households with that characteristic.
For the former purpose, the characteristic would be treated as a Type I;
for the latter, a Type III. For example, the number of households with
contract rent under $150 would be a Type I characteristic, while the number
of persons in households in this rent category would be a Type III.
Standard Errors of Estimated Numbers and Percentages. Tables
B through D contain approximations
to the standard errors for the data produced by multiplying the sample
count by the reciprocal of the probability of selection, that is, a factor
of 100 for the one-in-one-hundred sample, 1,000 for the one-in-one-thousand
sample and 10,000 for the one-in-ten-thousand sample. Tables
E through G contain standard errors for percentages calculated from
these data.
Standard Errors of Differences. The tables of standard errors
are to be applied differently in the three following situations:
- For a difference between the sample figure and one based on a complete
count (e.g., arising from comparisons between one-in-one-hundred sample
statistics and complete-count statistics for 1950 or 1970), the standard
error is identical with the standard error of the sample estimate alone.
- For a difference between two sample figures (e.g., one from the one-in-one-hundred
sample and the other based on a different sample or for a different area
based on the same sample), the standard error is approximately the square
root of the sum of the squares of the standard error of each estimate considered
separately. This formula will represent the actual standard error quite
accurately for the difference between estimates of the same characteristic
in two different areas, or for the difference between separate and uncorrelated
characteristics in the same area. If, however, there is a high positive
correlation between the two characteristics, the formula will overestimate
the true standard error.
- When the sample estimate is a difference between two sample estimates,
one of which is a subclass of the other with respect to the same universe,
the previous method of finding the standard error of the difference should
not be used. Instead, the tables should be used directly by locating the
appropriate standard error for an estimate equal to the size of the difference.
For example, it might be desired to estimate the total number of
persons who completed junior high school but did not complete high school.
The sample estimate for total persons who completed high school is available,
as is the sample estimate for persons who completed junior high school.
The standard error of this difference should be found by locating the size
of the estimated difference in the appropriate table and sample size column.
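The three cases can be written as three small functions. This is a sketch: `se_for_size` stands in for reading the appropriate Table B-D column (or any equivalent lookup) and is hypothetical, as are the other names.

```python
import math
from typing import Callable


def se_vs_complete_count(se_sample: float) -> float:
    """Sample figure minus a complete-count figure: only the sample term varies."""
    return se_sample


def se_two_sample_figures(se_a: float, se_b: float) -> float:
    """Square root of the sum of squares; overstates the SE if the two are positively correlated."""
    return math.hypot(se_a, se_b)


def se_subclass_difference(estimate_a: float, estimate_b: float,
                           se_for_size: Callable[[float], float]) -> float:
    """One estimate is a subclass of the other: look up the SE for the size of the difference."""
    return se_for_size(abs(estimate_a - estimate_b))


print(se_two_sample_figures(300, 400))  # 500.0
```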
Sampling Variability of Medians. The sampling variability of a median
depends on the size of the base and on the distribution on which the median
is based. An approximate method for measuring the reliability of an estimated
median is to determine an interval about the estimated median such that
there is a stated degree of confidence that the true median lies within
the interval. As the first step in estimating the upper and lower limits
of the interval (that is, the confidence limits) about the median, compute
one-half the number on which the median is based (designated N/2). Look
up or compute the standard error of N/2; subtract this standard error from
N/2. Cumulate the frequencies (in the table on which the median is based)
up to the interval containing the difference between N/2 and its standard
error, and by linear interpolation obtain a value corresponding to this
number. In a corresponding manner, add the standard error to N/2, cumulate
the frequencies in the table, and obtain a value in the table on which
the median is based corresponding to the sum of N/2 and its standard error.
The chances are two out of three that the median would lie between these
two values. The range for 19 chances out of 20 and for 99 in 100 can be
computed in a similar manner by multiplying the standard error by the appropriate
factors before subtracting and adding to N/2. Interpolation to obtain the
values corresponding to these numbers gives the confidence limits for the
medians.
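The interpolation steps above can be sketched in Python as follows. The sketch assumes the distribution is given as class limits plus class frequencies. The standard error of N/2 is passed in, as if already looked up in the tables. The function names and the example distribution are hypothetical.

```python
def value_at_cumulative(bounds: list[float], freqs: list[float], target: float) -> float:
    """Linearly interpolate the value at which the cumulative frequency reaches target.

    bounds holds the class limits and has one more entry than freqs.
    """
    cumulative = 0.0
    for k, freq in enumerate(freqs):
        if freq > 0 and cumulative + freq >= target:
            lower, upper = bounds[k], bounds[k + 1]
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("target exceeds the total frequency")


def median_confidence_limits(bounds: list[float], freqs: list[float],
                             se_of_half_n: float, multiplier: float = 1.0) -> tuple[float, float]:
    """Limits from interpolating at N/2 minus and plus multiplier * SE(N/2)."""
    half_n = sum(freqs) / 2
    spread = multiplier * se_of_half_n
    return (value_at_cumulative(bounds, freqs, half_n - spread),
            value_at_cumulative(bounds, freqs, half_n + spread))


# Hypothetical distribution: classes 0-10, 10-20, 20-30, 30-40 with N = 1,000.
bounds = [0, 10, 20, 30, 40]
freqs = [100, 200, 300, 400]
print(value_at_cumulative(bounds, freqs, sum(freqs) / 2))        # estimated median
print(median_confidence_limits(bounds, freqs, se_of_half_n=50))  # two-in-three limits
```

Passing `multiplier=2` or `multiplier=2.5` gives the 19-in-20 and 99-in-100 limits described in the text.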
Sampling Variability of Estimated Means. The sampling variability
of a mean, such as the number of children ever born per 1,000 women, depends
on the variability of the distribution on which the mean is based, the
size of the sample, the sample design, and the use of ratio estimates.
An approximation to the variability of the mean may be obtained as follows:
Compute the standard deviation of the distribution on which the mean is
based; then multiply this figure by a factor of .8 for characteristics
of Type I, 1.1 for Type II, and 1.7 for Type III. Divide this product
by the square root of the sample size.
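The following is a sketch of the approximation for the standard error of a mean. It assumes `sample_size` is the unweighted number of sample cases in the base of the mean. The function name is hypothetical. The type factors are those given in the text.

```python
import math

DESIGN_FACTORS = {"I": 0.8, "II": 1.1, "III": 1.7}


def approx_se_of_mean(std_dev: float, sample_size: int, characteristic_type: str) -> float:
    """Standard deviation times the type factor, divided by the square root of the sample size."""
    return DESIGN_FACTORS[characteristic_type] * std_dev / math.sqrt(sample_size)


print(round(approx_se_of_mean(std_dev=2.0, sample_size=400, characteristic_type="III"), 3))  # 0.17
```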
Nonsampling Errors. Sampling error is only one of the components
of the total error of a survey. Further contributions may come, for example,
from biases in sample selection, from errors introduced by imputations
for non-reporting, and from errors introduced in the coding and other processing
of the questionnaires. For estimates of totals representing relatively
small proportions of the population, the major component of the survey
error tends to be the sampling error. As the estimated totals approach
the level of the total population, the sampling errors decrease. This is
not necessarily true of the nonsampling errors and they assume a relatively
larger role in the total survey error. For this reason, standard errors
of totals are not shown for all estimated levels even though the standard
errors are actually present. For example, for regional estimates from these
samples, the presence of nonsampling errors should be recognized in increasingly
larger proportions for estimates close to the total population of the region.
Sampling Variability for the Housing Subsamples. Data obtained
from the 20-percent and 5-percent housing subsamples are subject to greater sampling
variability than data from the 25-percent sample. An approximation to the
standard errors of data from these samples can be made by obtaining the
standard errors from the appropriate tables and adjusting them for the
smaller sample size. For this adjustment, multiply the standard error of
the 25-percent sample by 1.1 if the data are from the 20-percent subsample,
or by 2.2, if the data are from the 5-percent subsample. Data in the 5-percent
sample are items H59, H61-H63 (air conditioning, number of bedrooms, clothes
washing machine and clothes dryer), and H65-H68 (home food freezer, television
set, house heating fuel, and radio). Items H26, H53, H56-H58, and H64 (basement,
number of units in structure, source of water, sewage disposal, number
of bathrooms, and stories and elevator) are 20-percent sample items. Note
that data on item H60 (automobile) were collected as a 20-percent item
in large urban areas and as a 5-percent item in other parts of the country.
Tabulations of this item for areas encompassing both types of areas, therefore,
will require a factor somewhere between 1.1 and 2.2.
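The adjustment factors are close to the square root of the ratio of sampling rates, sqrt(25/20) and sqrt(25/5). This sketch applies the factors given in the text. The function name is hypothetical. The mixed factor for item H60 is left out because the text gives only a range.

```python
import math

SUBSAMPLE_FACTORS = {20: 1.1, 5: 2.2}  # multipliers given in the text


def housing_subsample_se(se_from_table: float, subsample_percent: int) -> float:
    """Scale a tabulated 25-percent-sample standard error for a 20- or 5-percent housing item."""
    return se_from_table * SUBSAMPLE_FACTORS[subsample_percent]


# The factors approximate sqrt(25 / p), the usual square-root adjustment for sample size.
print(round(math.sqrt(25 / 20), 3), round(math.sqrt(25 / 5), 3))  # 1.118 2.236
```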
Derivation of the Estimated Standard Errors in Tables B-G
Standard errors are usually not computed for every possible statistic
of interest produced in a survey, partly because of the high cost of the
necessary calculations and partly because the estimated standard error is a sample
statistic also subject to sampling variability. The sampling variability
of the estimated standard errors is often relatively large, but more stable
estimates can frequently be obtained by combining the estimates of the sampling
errors for several statistics known from previous experience to have sampling
errors with similar behavior. One way of generalizing estimated sampling
errors is to select representative characteristics, calculate the relvariances
(the squares of the coefficients of variation) of these characteristics
and use them to represent the relvariances of similar characteristics,
using a relationship which typically holds concerning the relvariance and
the size of the estimate.[2] Investigation
of this relationship has led to the development of a set of regression
functions which, experience has shown, represent the behavior of the relvariances
fairly well. Curves taking the form

$$ V_{x'}^2 = a + \frac{b}{x'} $$

have been used to produce the tables of standard errors for this report.
The value $V_{x'}^2$ is the relvariance of the estimate
$x'$, and $a$ and $b$ are parameters of the regression curve. The values of $a$
and $b$ were determined by using a method of successive approximations. The
standard errors appearing in Tables B
through G were then obtained from the
regression curve for the one-in-one-thousand sample. The standard errors
for the others were obtained by adjusting the one-in-one-thousand figures
to reflect the different sample sizes. To calculate the relvariances required
for the above generalization technique, a point estimate of the variance
was determined for each of the selected items using the random group method
as follows:
The sample was divided into ten randomly determined, mutually exclusive
and exhaustive subsamples (random groups). Each of the random groups is
a subsample selected in the same manner as the full sample. In order to
simplify the variance calculations, the number of strata was reduced from
38 to 17. The reduction in the number of strata probably results in a slight
overstatement of the variance. The ten random groups were selected by using
the terminal digit of the subsample number. Since these subsample numbers
were assigned sequentially at the time of selection, the systematic selection
process is incorporated in each random group. The formula used to estimate
the variance of a total for each Census region follows (the variances for
U.S. totals were obtained by adding the variances for the four regions):

$$ \sigma^2_{X_r} = \sum_{i=1}^{17} \frac{t}{t-1} \sum_{j=1}^{t} \left( X_{rij} - \bar{X}_{ri} \right)^2 $$

where $X_{rij}$ is the weighted total for characteristic X in the jth random
group of the ith stratum in the rth Census region; $\bar{X}_{ri}$ is the
average of $X_{rij}$ over the random groups in the ith stratum of the rth
region; and t is the number of random groups (t = 10 in this instance).
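The following sketch implements the stratified random-group estimator t/(t-1) Σ_j (X_rij − X̄_ri)², summed over strata. It assumes each random-group total is weighted with full-sample weights, so the t group totals in a stratum add up to the stratum total. That reading is an assumption about the original formula. The function names are hypothetical.

```python
def random_group_variance_of_total(strata: list[list[float]]) -> float:
    """Variance of a regional total from t random groups in each stratum.

    Each inner list holds the weighted random-group totals X_rij for one stratum.
    """
    variance = 0.0
    for group_totals in strata:
        t = len(group_totals)
        mean = sum(group_totals) / t
        variance += t / (t - 1) * sum((x - mean) ** 2 for x in group_totals)
    return variance


def us_variance(regional_variances: list[float]) -> float:
    """U.S. variance obtained by adding the variances of the four regions."""
    return sum(regional_variances)


print(random_group_variance_of_total([[10, 12, 8, 10], [5, 6, 5, 4]]))
```

Scaling each group total up by t and applying the usual variance of a mean of t estimates gives the same result, which is why the factor is t/(t-1) rather than 1/(t(t-1)).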
The 17 strata used in the estimation of the standard errors are:
- White owner households, by size group (7 strata):
  - 1 and 2 persons
  - 3 persons
  - 4 persons
  - 5 persons
  - 6 persons
  - 7 persons
  - 8 or more persons
- White renter households, by size group (7 strata):
  - 1 person
  - 2 persons
  - 3 persons
  - 4 persons
  - 5 persons
  - 6 persons
  - 7 or more persons
- Nonwhite owner households
- Nonwhite renter households
- All group-quarters persons
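The generalization step in this subsection reduces each table entry to x'·sqrt(a + b/x'). The sketch below evaluates that curve. For illustration only, it fits a and b by ordinary least squares on 1/x'. That fitting method substitutes for the successive-approximation procedure, which is not described here. Any parameter values used with it are hypothetical.

```python
import math


def se_from_curve(x: float, a: float, b: float) -> float:
    """Standard error implied by the relvariance curve V^2 = a + b / x."""
    return x * math.sqrt(a + b / x)


def fit_relvariance_curve(estimates: list[float], relvariances: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of V^2 = a + b * (1 / x); returns (a, b)."""
    u = [1.0 / x for x in estimates]
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(relvariances) / n
    sxy = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, relvariances))
    sxx = sum((ui - u_bar) ** 2 for ui in u)
    b = sxy / sxx
    a = v_bar - b * u_bar
    return a, b


# Hypothetical relvariances computed for a few representative characteristics.
a, b = fit_relvariance_curve([1000.0, 10000.0, 100000.0], [0.89995, 0.08995, 0.00895])
print(a, b, se_from_curve(10000.0, a, b))
```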
Comparison of Generalized and Computed Standard Errors
Table H shows the relationship between
the standard error as computed from the one-in-one-thousand sample and
the approximate standard error given by Tables
B, C, or D. Column (1) of Table
H shows the estimate of the characteristic; column (2) shows the standard
error computed for the item by the method of random groups; column (3)
shows the standard error as read from the appropriate Table
B, C, or D; and column (4) shows the characteristic type. The table
also shows further examples of characteristics in Types I, II, and III.
The effect of the composition of the typical household on the standard
error is also illustrated. Note, for example, that female non-heads aged under
24 are classed as Type II, while for ages 25 and above (i.e., typically
the housewife) the characteristic is treated as Type I.
Note: The figures in this table were calculated using data from an earlier
one-in-one-thousand sample selected using the same sample design employed
here. While the figures may not be in exact agreement with figures from
the present sample, they accurately reflect the comparisons which the table
was designed to illustrate.
Standard Errors for Intermediate-Sized Samples
The tables provided for the one-in-one-hundred, one-in-one-thousand,
and one-in-ten-thousand samples may be used to measure sampling variability
for intermediate-sized samples by adjusting for sample size. This is done
by dividing the tabulated standard error for the next smallest tabulated
sample size by the square root of the number of smaller samples used. For
example, to find the standard error for an estimate based on a three-in-one-thousand
sample, first find the tabulated standard error for the next smallest sample
size; that is, locate the size of estimate in the appropriate table and
read the standard error for the one-in-one-thousand sample. Divide this
figure by the square root of 3 to get the standard error for data from
a three-in-one-thousand sample.
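The adjustment is a single division. In this sketch, the tabulated standard error of 1,200 is a hypothetical value, as is the function name.

```python
import math


def intermediate_se(tabulated_se: float, n_smaller_samples: int) -> float:
    """SE for a sample built from n copies of the next smallest tabulated sample."""
    return tabulated_se / math.sqrt(n_smaller_samples)


# Three-in-one-thousand sample, given a one-in-one-thousand SE of 1,200.
print(round(intermediate_se(1200.0, 3), 1))  # 692.8
```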
Estimation of Standard Error for Types of Statistics Not Shown in Tables
The tabulated values for standard errors will be adequate for data giving
a level or percentage. In some instances, however, it may be necessary
to have a rough estimate of sampling variability for a statistic not covered
by the tables (e.g., an index). The random group method may be used directly
in such a situation by applying the following formula:

$$ s_q^2 = \frac{1}{t(t-1)} \sum_{j=1}^{t} \left( q_j - q \right)^2 $$

where $q$ is the statistic of interest computed from the full sample;
$q_j$ is the statistic computed from the jth random group; and
$t$ is the number of random groups.
The statistic $q$ may be an index, a regression
or correlation coefficient, or other measure for which the following approximate
relationship holds:

$$ E(q_j) = E(q) $$

where $E(q_j)$ and $E(q)$ indicate
the expected values of the measure for the jth random
group and the full sample, respectively. Thus, for example, the formula
does not apply for $q$ defined as an estimated
total number of persons or housing units in a category.
The procedure specified by the formula for $s_q^2$
requires the following steps:
- Use t = 50 when dealing with the 1-in-100 sample or t = 10 with the 1-in-1,000
sample. (See note below for use of other values of t.)
- Assign each of the records to one of the t random groups based on the subsample
number.
- To construct 50 random groups, assign all records in which the subsample
number is 01 or 51 to the first random group, all records in which the subsample
number is 02 or 52 to the second random group, etc. Finally, assign all
records in which the subsample number is 50 or 00 to random group number
50.
- To construct 10 random groups of the 1-in-1,000 sample, assign all records
with a 1 in the tens digit to random group 1, those with a 2 in the tens
digit to random group 2, etc.
- Independently within each random group, calculate the desired measure ($q_j$)
as if the records for that random group were the entire sample.
- Compute the desired measure ($q$) on the entire sample.
- Compute the t values $(q_j - q)^2$ and apply them in the above formula.
- The square root of the result is the desired standard error.
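The steps above can be sketched in Python as follows. Several details are assumptions: the record layout (a `subsample` field holding Item H93-94 as an integer 0-99), the helper names, and the mapping of tens digit 0 to group 10. `statistic` is any function that computes q from a list of records. Every random group is assumed to be non-empty.

```python
import math
from typing import Callable, Sequence


def group_of_50(subsample_number: int) -> int:
    """01 or 51 -> 1, 02 or 52 -> 2, ..., 50 or 00 -> 50."""
    group = subsample_number % 50
    return 50 if group == 0 else group


def group_of_10(subsample_number: int) -> int:
    """Tens digit 1 -> group 1, ..., 9 -> group 9; tens digit 0 -> group 10 (assumed)."""
    tens = (subsample_number // 10) % 10
    return 10 if tens == 0 else tens


def random_group_se(records: Sequence[dict],
                    statistic: Callable[[Sequence[dict]], float],
                    group_of: Callable[[int], int], t: int) -> float:
    """sqrt(sum_j (q_j - q)^2 / (t (t - 1))), with q from the full sample and q_j per group."""
    q_full = statistic(records)
    groups: dict[int, list[dict]] = {g: [] for g in range(1, t + 1)}
    for record in records:
        groups[group_of(record["subsample"])].append(record)
    squared = sum((statistic(members) - q_full) ** 2 for members in groups.values())
    return math.sqrt(squared / (t * (t - 1)))
```

For the 1-in-100 sample, call `random_group_se(records, statistic, group_of_50, 50)`. For the 1-in-1,000 sample, use `group_of_10` with t = 10.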
Note: The user is cautioned that the standard error given by this procedure
is itself subject to sampling error. Some control can be put on the reliability
of the estimated $s_q^2$
by the number of random groups used. The total sample can be separated
into different numbers of random groups by different schemes involving
the subsample numbers, which for the 1-in-100 sample range from 00 to 99.
Ideally, each random group should represent the total sample.
The number of random groups should be chosen to maximize the reliability
of the estimate $s_q^2$.
The reliability of the estimated $s_q^2$
will depend on the number of degrees of freedom in the estimate (i.e.,
on the choice of t, the number of random groups) and on the number of units
in each of the resulting random groups. As a general rule, when estimating
$s_q^2$ using
the 1-in-100 sample, choosing t = 50 would produce reasonably reliable
estimates of $s_q^2$ for most statistics.
One might choose a smaller value for t to estimate the sampling errors
for statistics for a given State. The considerations leading to the proper
choice of t are described in several statistical texts (for example, Hansen,
Hurwitz and Madow, Sample Survey Methods and Theory, Vol. I, Chapter
10, Section 16, page 440 ff.)
For the 1-in-1,000 sample, the tens digit of the subsample number ranges
from 0 to 9 and can be used to define 10 random groups (the units digit
is fixed). When this sample is being used, the choice of t is limited to
a maximum of t = 10.
ENDNOTES:
1. Originally published as "Section IV. Sample Design and Sampling Variability," Technical Documentation for the 1960 Public Use Sample, PUS-1960, prepared by National Data Use and Access Laboratories (DUALabs), January 1973, pp. 53-77.
2. Some of the standard errors used in the generalization here are displayed in Table H.