Replicate Weights in the American Community Survey / Puerto Rico Community Survey
- What are replicate weights?
- Why might I want to use replicate weights?
- Does using replicate weights make any substantive difference?
- How do I obtain replicate standard errors from IPUMS-USA data?
- Is there any way to do this automatically in major statistical packages?
- Can I simply divide the full sample into 80 random subsamples from the full sample and calculate replicate standard errors manually?
- How are the ACS/PRCS replicate weights calculated?
What are replicate weights?
Replicate weights are currently available for American Community Survey (ACS) and Puerto Rico Community Survey (PRCS) data from 2005 onward. In the ACS and PRCS, there are 80 separate replicate weights at the household and person levels that allow users to generate empirically derived standard error estimates. These standard errors can then be used in hypothesis testing and in the construction of confidence intervals around the sample estimate of interest.
Why might I want to use replicate weights?
In theory, the standard error of an estimate measures the variation of a statistic across multiple samples of a given population. Thus the true standard error of any characteristic calculated from a single sample can never be known with certainty; sample standard errors are simply estimated. Replicate weights allow a single sample to simulate multiple samples, thus generating more informed standard error estimates that mimic the theoretical basis of standard errors while retaining all information about the complex sample design. These standard errors can then be used to obtain more accurate confidence intervals and significance tests.
Does using replicate weights make any substantive difference?
In IPUMS testing of ACS/PRCS data, replicate weights usually increase standard errors. This increase is generally not large enough to alter the significance level of coefficients, though marginally significant coefficients may become clearly nonsignificant. The more obvious effect of using replicate weights is on the width of confidence intervals, which can change substantially.
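To make the effect on significance and interval width concrete, here is a minimal sketch. The coefficient and both standard errors are invented for illustration, and the 95% interval assumes a normal approximation (z = 1.96); neither the numbers nor the function names come from IPUMS.

```python
def confidence_interval(estimate, std_error, z=1.96):
    """Return (lower, upper) bounds of a normal-approximation interval."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


def is_significant(estimate, std_error, z=1.96):
    """True when the estimate differs from zero at the level implied by z."""
    return abs(estimate / std_error) > z


# Hypothetical values: one coefficient, two competing standard errors.
coefficient = 0.50
conventional_se = 0.25  # assumed SE ignoring replicate weights
replicate_se = 0.30     # assumed (larger) SE from replicate weights

for label, se in [("conventional", conventional_se), ("replicate", replicate_se)]:
    lower, upper = confidence_interval(coefficient, se)
    print(f"{label}: CI = ({lower:.3f}, {upper:.3f}), "
          f"significant = {is_significant(coefficient, se)}")
```

With these assumed numbers the coefficient is marginally significant under the smaller standard error and nonsignificant under the larger one, and the interval widens to include zero.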
How do I obtain replicate standard errors from IPUMS-USA data?
There are 3 main steps:
- Run your analysis using the full-sample weights (PERWT and HHWT are the main IPUMS-USA weights). Record the statistic you are interested in (e.g., the mean income of veterans, or the coefficient describing the relationship between income and whether one has health insurance coverage).
- Run your analysis again using each set of replicate weights. First, run the analysis using REPWTP1, then again using REPWTP2, then again using REPWTP3, and so on up to the final set of replicate weights. After each set, record the statistic you are interested in. (N.B.: If you are analyzing a household-only file, be sure to use REPWT1, REPWT2, etc.)
- Insert the above results into the following formula:
$$SE(X) = \sqrt{\frac{4}{80} \sum_{r=1}^{80} \left(X_r - X\right)^2}$$
where X is the result from the analysis using the full-sample weight and X_r is the result from the analysis using the r-th set of replicate weights.
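The manual procedure can be sketched in a few lines of Python. This is an illustration, not IPUMS code. It assumes the statistic is a weighted mean and that the formula is SE = sqrt((4/R) * sum of (X_r - X)^2), where R is the number of replicate weights (80 in the ACS/PRCS, which gives the 4/80 = 0.05 factor cited for Stata and SAS). The function names and the toy data are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of `values` under one set of weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def replicate_standard_error(values, full_weights, replicate_weights):
    """Return (X, SE) where SE = sqrt((4 / R) * sum((X_r - X)^2)).

    `replicate_weights` is a list of R weight columns, each the same
    length as `values` (R = 80 for the ACS/PRCS).
    """
    full_estimate = weighted_mean(values, full_weights)  # X
    replicate_estimates = [weighted_mean(values, w) for w in replicate_weights]  # X_r
    n_replicates = len(replicate_weights)
    sum_sq = sum((x_r - full_estimate) ** 2 for x_r in replicate_estimates)
    return full_estimate, math.sqrt(4.0 / n_replicates * sum_sq)


# Toy data invented for illustration: 3 cases and only 4 replicate columns.
incomes = [10.0, 20.0, 30.0]
perwt = [1.0, 1.0, 2.0]
repwtp = [
    [2.0, 1.0, 1.0],
    [1.0, 2.0, 1.0],
    [1.0, 1.0, 2.0],
    [1.0, 1.0, 1.0],
]
estimate, std_error = replicate_standard_error(incomes, perwt, repwtp)
print(estimate, std_error)
```

For a regression coefficient, the same loop applies: refit the model once per replicate weight and pass the 80 coefficients through the same sum of squared differences.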
Is there any way to do this automatically in major statistical packages?
Yes. The replicate weights contained in the IPUMS-USA data are calculated using the successive difference replication method, which differs from the types of replicate weights that most statistical software packages can handle. Nevertheless, Stata can process IPUMS-USA replicate weights automatically as of version 11.1 (released June 3, 2010).
To use IPUMS-USA replicate weights in Stata, you must first svyset the data.
. svyset [pweight=perwt], vce(brr) brrweight(repwtp1-repwtp80) fay(.5) mse
- The sample should be treated as a single stratum (the weights contain the relevant information from the sample design), so no PSU should be specified.
- The full-sample weight must be specified.
- You then specify the replicate weights in the brrweight() option. Note that specifying the variable list with a wildcard character (repwtp*) rather than with a range of variables (repwtp1-repwtp80) will not produce correct results because IPUMS-USA data contain a variable called REPWTP, which merely indicates the presence of replicate weights and is coded 1 for every case. The fpc() suboption should not be specified.
- You must also specify the
Earlier versions of Stata can also handle successive difference replicate weights. Correspondence with StataCorp statisticians and IPUMS testing revealed that successive difference replicate weights can be treated as jackknife replicate weights if the options are specified correctly.
The svyset command for Stata versions 11.0 and earlier is slightly different:
. svyset [iw=perwt], jkrweight(repwtp1-repwtp80, multiplier(.05)) ///
- As above, the sample should be treated as a single stratum (the weights contain the relevant information from the sample design), so no PSU should be specified.
- Also as above, the full-sample weight must be specified; some replicate weights in the ACS/PRCS are negative, which is why iweights are specified instead of
- You must place the replicate weight variables in the jkrweight() option. Note that specifying the variable list with a wildcard character (repwtp*) rather than with a range of variables (repwtp1-repwtp80) will not produce correct results because IPUMS-USA data contain a variable called REPWTP, which merely indicates the presence of replicate weights and is coded 1 for every case.
- The multiplier() suboption gives the quotient from the above formula (4/80 = 0.05). If you are not using ACS/PRCS data and have a different number of replicate weights, you will need to adjust the multiplier accordingly.
- Neither the
fpc() suboptions should be specified.
- You must also specify the
After svysetting the data, you run the command using the svy: prefix, which passes along the options you defined above.
. svy: command
Stata will execute this command using the full-sample weights and again for each set of replicate weights. There are two important things to note:
- Not all Stata commands can be run with the
. help svy_estimation to see a list of valid commands.
- If you want to limit your replicate analyses to a subset of the sample (for example, all persons aged 25-64 or all African Americans), you should not use if or in. Instead, use the subpop() option before the colon, as in
. gen byte age25_64 = age>=25 & age<=64
. svy, subpop(age25_64): command
In SAS, instead of declaring a set of survey options all at once, you must declare them in each statistical procedure. Replicate weights can be applied in any statistical procedure that accepts the jackknife method of variance estimation by using the JKCOEFS option to set the coefficients for the jackknife replicate weights to be 0.05 (the result of 4/80 in the formula above). In regression, for example, type:
PROC SURVEYREG data=dataset VARMETHOD=jackknife;
MODEL dependent_variable=independent_variable(s);
WEIGHT perwt;
REPWEIGHTS repwtp1-repwtp80 / JKCOEFS=0.05;
run;
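The value 0.05 appears in three places: Stata's multiplier(.05), SAS's JKCOEFS=0.05, and implicitly in Stata's fay(.5). The short check below shows why they agree. It assumes the usual scaling for Fay's method, 1 / (R * (1 - fay)^2), which is general background rather than something stated in this document. The function names are hypothetical.

```python
def jackknife_multiplier(n_replicates):
    """Coefficient 4 / R for successive difference replicate weights."""
    return 4.0 / n_replicates


def fay_brr_multiplier(n_replicates, fay):
    """Scaling assumed for Fay's BRR: 1 / (R * (1 - fay)^2)."""
    return 1.0 / (n_replicates * (1.0 - fay) ** 2)


# ACS/PRCS: 80 replicate weights.
print(jackknife_multiplier(80))     # 0.05
print(fay_brr_multiplier(80, 0.5))  # 0.05

# A survey with a different number of replicate weights needs a different value.
print(jackknife_multiplier(160))    # 0.025
```

With fay = 0.5 the Fay scaling reduces to 4 / R for any R, which is why either specification yields the same standard errors under these assumptions.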
See also the Census Bureau's "Estimating ASEC Variances with Replicate Weights" document for sample SAS code that can be adapted to calculate replicate standard errors manually. (Although this document describes replicate weights in the Current Population Survey, the material on using replicate weights applies to ACS/PRCS replicate weights as well, with some adaptation.)
As of September 2011, SPSS (version 19.0) cannot handle successive difference replicate weights. SPSS does not allow for replicate-based variance estimation unless it performs the resampling itself.
Can I simply divide the full sample into 80 random subsamples from the full sample and calculate replicate standard errors manually?
No. Replicate weights contain full information about the complex sample design of the ACS/PRCS, and this information would be lost when drawing random subsamples. Furthermore, replicate samples incorporate information from all cases in the full sample. In contrast, random subsamples would each be 1/80th the size of a single replicate subsample.
How are the ACS/PRCS replicate weights calculated?
As mentioned, replicate weights in the ACS and PRCS are constructed using the successive difference replication method. This involves creating a k x k Hadamard matrix (where k is the number of replicate weights desired), assigning sample cases to rows in the matrix and calculating a replicate factor from the row values, and finally multiplying the full-sample weight by these replicate factors. For more details, see the Census Bureau's "Estimating ASEC Variances with Replicate Weights" document, written for the CPS, as well as the following:
- Fay, Robert, and George Train. 1995. "Aspects of Survey and Model-Based Postcensal Estimation of Income and Poverty Characteristics for States and Counties." Proceedings of the Section on Government Statistics, American Statistical Association, Alexandria, VA, pp. 154-159.
- Wolter, Kirk. 2007. Introduction to Variance Estimation, 2nd ed. New York: Springer.
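The construction described above can be illustrated with a toy sketch. Several assumptions are made here that are not in this document. The Hadamard matrix is built with the Sylvester construction, which only works for powers of 2 (so it cannot produce the 80 x 80 matrix the ACS/PRCS would need). Each case is assigned two rows. The replicate factor is taken to be 1 + 2^(-3/2) * h[row_a][r] - 2^(-3/2) * h[row_b][r], the form commonly given for successive difference replication. The row assignment and the full-sample weight are invented.

```python
def sylvester_hadamard(order):
    """Build a +1/-1 Hadamard matrix; `order` must be a power of 2."""
    if order < 1 or order & (order - 1):
        raise ValueError("Sylvester construction needs a power-of-2 order")
    matrix = [[1]]
    while len(matrix) < order:
        # Double the matrix: [[H, H], [H, -H]].
        matrix = ([row + row for row in matrix] +
                  [row + [-v for v in row] for row in matrix])
    return matrix


def replicate_factors(hadamard, row_a, row_b):
    """One factor per replicate r, from the two rows assigned to a case."""
    c = 2.0 ** -1.5
    return [1.0 + c * a - c * b
            for a, b in zip(hadamard[row_a], hadamard[row_b])]


def replicate_weights(full_weight, factors):
    """Multiply the full-sample weight by each replicate factor."""
    return [full_weight * f for f in factors]


H = sylvester_hadamard(4)            # toy k = 4 instead of 80
factors = replicate_factors(H, 1, 2) # hypothetical row assignment
weights = replicate_weights(100.0, factors)
print([round(f, 4) for f in factors])  # [1.0, 0.2929, 1.7071, 1.0]
```

Under these assumptions every factor is 1, 1 - 1/sqrt(2), or 1 + 1/sqrt(2), so each replicate weight perturbs the full-sample weight rather than dropping cases.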