Replicate Weights in the American Community Survey / Puerto Rico Community Survey


Summary

Why might I want to use replicate weights?

Replicate weights allow researchers to compute confidence intervals and run significance tests using the same balanced half-sample approach that Census Bureau statisticians use. By dividing surveyed strata into multiple balanced half-samples, a single sample is made to simulate a range of possible alternative samples. The variation of an estimate across these half-samples yields a standard error that reflects the complex sample design rather than assuming simple random sampling.

What are replicate weights?

Calculating standard errors with balanced half-samples requires knowledge of survey design parameters, specifically the Primary Sampling Unit (PSU) and stratum for each respondent. However, these are typically kept confidential to safeguard individuals' identities. Replicate weights solve this dilemma by containing the necessary information to derive the Census Bureau’s standard error calculations without requiring access to PSU or strata identifiers. Each set of weights represents a balanced half-sample created by the Census Bureau using Successive Difference Replication, allowing researchers to replicate calculations without compromising the identity of respondents.

Replicate weights are currently available for the American Community Survey and Puerto Rico Community Survey samples from 2005 onward. Each ACS and PRCS sample includes 80 household-level replicate weights (REPWT1-REPWT80) and 80 person-level replicate weights (REPWTP1-REPWTP80).

Does using replicate weights make any substantive difference?

In IPUMS testing of ACS/PRCS data, replicate weights usually produce larger standard errors than those computed without them. The increase is generally not large enough to change the significance level of coefficients, though marginally significant coefficients may become clearly nonsignificant. The more visible effect is on the width of confidence intervals, which can change substantially.

How do I obtain replicate standard errors from IPUMS-USA data?

There are 3 main steps:

  1. Run your analysis using the full-sample weights (PERWT and HHWT are the main IPUMS-USA weights). Record the statistic you are interested in (e.g., the mean income of veterans, or the coefficient describing the relationship between income and whether one has health insurance coverage).
  2. Run your analysis again using each set of replicate weights. First, run the analysis using REPWTP1, then again using REPWTP2, then again using REPWTP3, and so on up to REPWTP80. After each run, record the statistic you are interested in. (N.B.: If you are analyzing a household-only file, use the household replicate weights REPWT1, REPWT2, etc., instead.)
  3. Insert the above results into the following formula:
    $$ SE(X) = \sqrt{\frac{4}{80} \sum_{r=1}^{80} \left( X_r - X \right)^2} $$
    where X is the result from the analysis using the full-sample weight and X_r is the result from the analysis using the r-th set of replicate weights.
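The three steps can be sketched in base R. This is a minimal illustration, not IPUMS code: it builds a small synthetic data frame whose column names mimic an IPUMS extract (PERWT, REPWTP1-REPWTP80, and an income column INCTOT), and the replicate weights are random perturbations of PERWT used only so the example runs. The statistic is a weighted mean, and the helper `replicate_se()` is a hypothetical name for the formula SE(X) = sqrt(4/80 * sum((X_r - X)^2)).

```r
# Base-R sketch of the three steps on a SYNTHETIC data frame.
# Column names mimic IPUMS (PERWT, REPWTP1-REPWTP80); all values are made up.
set.seed(1)
n <- 200
toy <- data.frame(INCTOT = round(rlnorm(n, meanlog = 10, sdlog = 1)),
                  PERWT  = runif(n, 50, 150))

# Hypothetical replicate weights: random perturbations of PERWT,
# used only so the example runs (real ones come from your extract).
repcols <- paste0("REPWTP", 1:80)
for (col in repcols) toy[[col]] <- toy$PERWT * runif(n, 0.5, 1.5)

# Step 1: full-sample estimate X (a weighted mean here)
X <- weighted.mean(toy$INCTOT, toy$PERWT)

# Step 2: one estimate X_r per replicate weight
Xr <- vapply(repcols,
             function(col) weighted.mean(toy$INCTOT, toy[[col]]),
             numeric(1))

# Step 3: SE(X) = sqrt(4/80 * sum((X_r - X)^2)); length(Xr) is 80 here
replicate_se <- function(X, Xr) sqrt(4 / length(Xr) * sum((Xr - X)^2))
se <- replicate_se(X, Xr)
c(estimate = X, se = se)
```

With a real extract, replace `toy` with your data and the weighted mean with whatever statistic you need; the loop over replicate columns and the final formula stay the same.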

Is there any way to do this automatically in major statistical packages?

Yes. Although the replicate weights in the IPUMS-USA data are calculated using the successive difference replication method, statistical packages offer several equivalent ways to declare them as survey design parameters.

R

To use IPUMS-USA replicate weights in R, one convenient option is the srvyr package, which provides a dplyr-style interface to the survey package.

install.packages("srvyr")
library("srvyr")

Next, you'll create a survey object using the replicate weights.

svy <- as_survey_rep(data, weight = PERWT,
        repweights = matches("REPWTP[0-9]+"),
        type = "ACS", mse = TRUE)

Any calculations you'd like to make with the replicate weights should be done with the object 'svy' instead of the object 'data'.

# Mean age by race, with confidence intervals based on the replicate weights
svy %>%
  group_by(RACE) %>%
  summarize(mean_age = survey_mean(AGE, vartype = "ci"))
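As one of the equivalent alternatives, the same replicate design can be declared directly with the survey package, on which srvyr is built. This is a sketch under stated assumptions: `data` is a synthetic stand-in for an IPUMS extract (made-up AGE and weight values), and it uses `svrepdesign()` with `type = "other"` and an explicit `scale = 4/80` rather than a built-in ACS type, so the variance formula is visible in the call. It should agree with the srvyr specification, but check against your installed package versions.

```r
# Sketch: successive-difference replicate design declared with the survey
# package. `data` is a SYNTHETIC stand-in for an IPUMS extract.
library(survey)

set.seed(2)
n <- 100
data <- data.frame(AGE   = sample(18:90, n, replace = TRUE),
                   PERWT = runif(n, 50, 150))
# Hypothetical replicate weights, for illustration only
for (r in 1:80) data[[paste0("REPWTP", r)]] <- data$PERWT * runif(n, 0.5, 1.5)

# scale = 4/80 and rscales = 1 give variance = 4/80 * sum((X_r - X)^2);
# mse = TRUE centers the squared deviations on the full-sample estimate X.
svy2 <- svrepdesign(data = data,
                    weights = ~PERWT,
                    repweights = "REPWTP[0-9]+",
                    type = "other", scale = 4 / 80, rscales = rep(1, 80),
                    mse = TRUE)

svymean(~AGE, svy2)
```

Other survey functions, such as `svyglm()` for regression coefficients, can then be called on `svy2` in the same way.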

Stata

To use IPUMS-USA replicate weights in Stata, you must first svyset the data.

. svyset [pweight=perwt], vce(sdr) sdrweight(repwtp1-repwtp80) dof(79) mse

Earlier versions of Stata (versions 11.0 and before) can also handle successive difference replicate weights. Correspondence with StataCorp statisticians and IPUMS testing showed that successive difference replicate weights can be treated as jackknife replicate weights if the multiplier is set to .05 (that is, 4/80) and the mse option is specified:

. svyset [pw=perwt], jkrweight(repwtp1-repwtp80, multiplier(.05)) ///
    vce(jackknife) dof(79) mse
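A short check of why this jackknife specification reproduces the successive difference formula, assuming Stata's replicate-weight jackknife with the mse option (and no strata or finite population correction) computes the multiplier m times the sum of squared deviations of each replicate estimate X_r from the full-sample estimate X:

$$
\hat{V}_{\mathrm{JK}}(X) = m \sum_{r=1}^{80} \left( X_r - X \right)^2
= 0.05 \sum_{r=1}^{80} \left( X_r - X \right)^2
= \frac{4}{80} \sum_{r=1}^{80} \left( X_r - X \right)^2
$$

Taking the square root gives SE(X) = sqrt(4/80 * sum((X_r - X)^2)), the replicate standard error from step 3, while dof(79) corresponds to the 80 replicates minus one.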