Excerpts from the User's Guide for the Oversample of Black-Headed Households, 1910 United States Census of Population¹
[NOTE: The Black Oversample has been reformatted using IPUMS codes. Variable-specific information is available in the Data Dictionary in IPUMS-98 Volume 1: User's Guide.]
In order to facilitate analysis of the status of African Americans around the turn of the century in the U.S., a sample of African American-headed households was taken from the 1910 census manuscripts. This sample complements the 1/250 sample of the 1910 census manuscripts gathered and produced at the University of Pennsylvania. The African American oversample was collected by the same personnel using the same procedures as the general 1/250 sample (Strong et al., 1989). The oversample was funded by a grant to Douglas Ewbank (NICHD 1-R01-HD18651); the creation of the Public Use Tape was funded by grants to S. Philip Morgan from the Research Foundation of the University of Pennsylvania and from NICHD (NICHD 1-R01- 0-25856).
Sampling of the 1910 census manuscripts was carried out using computer-generated reel, sequence, page and line numbers. If the person at the selected microfilm location was the head of a household (and, in the case of the oversample, an African American household head), the household was included in the sample. Ewbank drew the African American oversample in order to make infant mortality estimates by state. The oversample was drawn only from counties in which at least 10 percent of the population was African American (negro, black or mulatto), and only from states where a reasonably large number of counties met this threshold. (The 10 percent restriction was imposed to reduce the cost of searching for sampling points that rarely produced usable, i.e., African American-headed, households.) In addition, the states with the largest concentrations of Blacks (Alabama, Georgia, South Carolina, Mississippi) were not oversampled, because the 1/250 1910 Public Use Sample (PUS) already provides sufficient cases for most analyses.
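The selection rule described above can be sketched in Python. This is an illustration under stated assumptions, not the project's software: the `Person` record, its field names, the `county_black_share` lookup and the microfilm dimensions passed to `draw_location` are all hypothetical. For simplicity the county restriction is written as an eligibility check; presumably, in practice, sampling points were generated only within qualifying counties, and the state restriction is omitted here.

```python
import random
from dataclasses import dataclass

# Race categories counted as African American in the text.
BLACK_CATEGORIES = {"negro", "black", "mulatto"}


@dataclass
class Person:
    """Hypothetical record for one line of a census manuscript page."""
    is_household_head: bool
    race: str
    county: str


def draw_location(n_reels, n_sequences, n_pages, n_lines, rng):
    """Draw a computer-generated (reel, sequence, page, line) sampling point."""
    return (rng.randrange(n_reels), rng.randrange(n_sequences),
            rng.randrange(n_pages), rng.randrange(n_lines))


def household_selected(person, county_black_share, oversample=False):
    """Apply the inclusion rule to the person found at a sampling point."""
    if not person.is_household_head:
        return False          # only heads bring their household into the sample
    if not oversample:
        return True           # general 1/250 sample: any household head
    # Oversample: African American head in a county that is at least
    # 10 percent African American.
    return (person.race.lower() in BLACK_CATEGORIES
            and county_black_share[person.county] >= 0.10)


rng = random.Random(1910)
print(draw_location(n_reels=1000, n_sequences=4, n_pages=50, n_lines=50, rng=rng))
shares = {"County A": 0.42, "County B": 0.03}   # hypothetical county shares
print(household_selected(Person(True, "Mulatto", "County A"), shares, oversample=True))  # True
```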
The oversample can be combined with the 1/250 PUS by differentially weighting households (or individuals) according to county of enumeration, as described below.
Number of African American households in the 1910 PUS and in the oversamples
The tables below show the distribution of households by race and "sample area". The sample area indicates whether the household was located in a county that was included only in the 1/250 sample (1/250), in the 1/250 sample and oversample 1 (over 1), or in the 1/250 sample and oversample 2 (over 2). There are three tables. The first shows the frequencies in the African American oversample, i.e., the households (N=5533) added by the oversamples. The second shows the frequencies in the 1/250 PUS; these households are available from the earlier-released, nationally representative sample. The third shows the frequencies in the combined data sets.
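[Table: African American oversample households, by race of head and sample area (1/250, Over 1, Over 2, Total)]
[Table: 1/250 PUS households, by race of head and sample area (1/250, Over 1, Over 2, Total)]
[Table: Combined data sets, households by race of head and sample area (1/250, Over 1, Over 2, Total)]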
Items H05 to H07 in the codebook show the states from which the oversample was taken: Maryland, Virginia, North Carolina, Florida, Kentucky, Tennessee, Arkansas, Louisiana and Texas. The four states with the largest Black populations (South Carolina, Alabama, Mississippi and Georgia) were excluded from the oversample. Counties with at least 10 percent African American population in Maryland, Kentucky and Texas were sampled using a 0.01 sampling fraction. Such counties in the other states (Virginia, North Carolina, Florida, Tennessee, Arkansas) were sampled using a 0.005 sampling fraction. Louisiana has some counties sampled at 0.005 and some at 0.01 because it was used to test optimum sampling fractions.
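The state-level design above can be summarized as a lookup table. In this sketch the dictionary and the helper `combined_fraction` are hypothetical names. Following the weighting rule given later in this guide, a household's chance of entering the combined file is taken as the 1/250 PUS fraction plus the state's oversample fraction (.004 + .005 = .009, or .004 + .01 = .014). Louisiana has no single state value because its fraction varies by county, and the county assignments are not listed in the text.

```python
PUS_FRACTION = 0.004  # the general 1/250 Public Use Sample

# Oversample fraction applied to counties that were at least 10 percent
# African American, by state (from the text).
OVERSAMPLE_FRACTION = {
    "Maryland": 0.01, "Kentucky": 0.01, "Texas": 0.01,
    "Virginia": 0.005, "North Carolina": 0.005, "Florida": 0.005,
    "Tennessee": 0.005, "Arkansas": 0.005,
    "Louisiana": None,  # some counties at 0.005, some at 0.01
}


def combined_fraction(state, county_is_oversampled=True, county_fraction=None):
    """Chance that an African American-headed household in this county is in
    the PUS or the oversample, using the additive convention of the text."""
    if state not in OVERSAMPLE_FRACTION or not county_is_oversampled:
        return PUS_FRACTION            # only the 1/250 PUS applies
    fraction = OVERSAMPLE_FRACTION[state]
    if fraction is None:               # Louisiana: county-specific fraction
        if county_fraction is None:
            raise ValueError(f"{state} needs a county-level oversample fraction")
        fraction = county_fraction
    return PUS_FRACTION + fraction


print(round(combined_fraction("Maryland"), 3))   # 0.014
print(round(combined_fraction("Virginia"), 3))   # 0.009
print(round(combined_fraction("Alabama"), 3))    # 0.004 (not oversampled)
```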
The table below shows the sample produced by combining the 1910 PUS with the oversamples provided on this tape. Maryland is represented by 191 African American-headed households in the 1910 PUS. These households are in counties that were also included in oversample 2, which provides an additional 489 households. There are 13 Virginia households in the 1910 PUS from counties that were not oversampled (counties with fewer than 10% African Americans) and 574 from counties that were oversampled; the oversample provides data on an additional 528 households. In total, oversample 1 adds 3870 households and oversample 2 adds 1663.
[Table: Households by state or region in the combined sample (columns: not oversampled, oversample 1, oversample 2, White, oversample 1, oversample 2, Total)]
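The counts quoted for Maryland, Virginia and the two oversamples can be tallied directly. This is only arithmetic on the numbers in the text; the dictionary keys are descriptive labels chosen for this sketch. The last sum reproduces the N of the oversample given earlier.

```python
# Household counts quoted in the text.
maryland = {"PUS, oversample-2 counties": 191, "added by oversample 2": 489}
virginia = {"PUS, counties not oversampled": 13,
            "PUS, oversampled counties": 574,
            "added by oversample 1": 528}
added_by_oversamples = {"oversample 1": 3870, "oversample 2": 1663}

print(sum(maryland.values()))              # 680 households in the combined file
print(sum(virginia.values()))              # 1115
print(sum(added_by_oversamples.values()))  # 5533, the oversample N
```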
The original 1/250 PUS file was a self-weighted sample of all households in the U.S., whereas this sample is an oversample of only some counties. The variable H26, which appears on both this tape and the 1910 PUS tape, indicates which households (regardless of the race of the household head) were in counties that were oversampled. Note that this variable is not a weighting variable and, in the case of the 1910 PUS, does not imply that the household (or individual) was chosen for the oversample. It can, however, be used to assign appropriate weights, as described below.
As stated above, the oversample was taken with two different sampling fractions: 0.005 and 0.01. Each household is assigned a value indicating whether it comes from an oversampled county and, if so, which oversample applies to that county. Naturally, the main PUS contains black-headed households from oversampled counties that were not selected in either part of the oversample. Nevertheless, the same weight (based on the inverse of the sampling fraction) should be assigned to all African American households in a given county, regardless of which sample produced the household record. The table below shows the frequencies of households in the main sample (by race) and in the oversamples, by the CNTYWT variable.
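[Table: Households in the main sample (by race) and in the oversamples, by CNTYWT (columns: 1/250, Over 1, Over 2, 1/250, Total)]
[Table: Households by CNTYWT (columns: 1/250, Over 1, Over 2, Overall)]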
If the household head is not African American, a weight of 1.0 should be assigned, regardless of the value of CNTYWT. If the household head is African American, the sampling fractions for counties coded 1, 2 and 3 are .004, .009 and .014, respectively, and the weights should be set by the ratio of these sampling fractions, adjusted by some constant 'x'.
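The weighting rule can be written compactly. Here f_k denotes the combined sampling fraction for African American-headed households in counties with CNTYWT = k, w_k their weight and x the free scaling constant; these symbols are notation introduced for illustration, while the fractions follow the text (1/250 alone, plus .005, plus .01).

```latex
f_1 = 0.004, \qquad f_2 = 0.004 + 0.005 = 0.009, \qquad f_3 = 0.004 + 0.01 = 0.014

w_k = x \cdot \frac{f_1}{f_k} \quad (k = 1, 2, 3), \qquad w = 1 \text{ for heads who are not African American}

x = 1: \quad w_1 = 1, \quad w_2 = \frac{0.004}{0.009} \approx 0.444, \quad w_3 = \frac{0.004}{0.014} \approx 0.286
```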
If we let the weight for CNTYWT 1 equal 1, the weight for CNTYWT 2 is (.004/.009 =) .444 and the weight for CNTYWT 3 is (.004/.014 =) .286. These ratios must be preserved in order to maintain "equal probability of selection", but the weights can be multiplied by different constants to achieve different aims. To produce a nationally representative sample of African Americans (with a mean weight of 1.0) from the 1910 PUS and these oversamples, the weights 1.633, .726 and .466 should be assigned to CNTYWT values 1, 2 and 3, respectively. To provide a nationally representative sample of all races, these weights must be adjusted so that the weighted N equals the observed N in the 1910 PUS (multiply the above weights by .6236). Weighting could also be done at the state or regional level.
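The arithmetic above can be reproduced with a short sketch. The function names are hypothetical, and `scale_factor` reflects our reading of how such constants arise: x is chosen so that the weights of a set of African American-headed households sum to a target total (their own count for a mean weight of 1.0, or the count observed in the 1910 PUS alone for the all-races version). The published 1.633 and .6236 are used directly because the underlying household counts are not listed in the text.

```python
# Combined sampling fraction for African American-headed households, by CNTYWT
# (1 = 1/250 PUS only; 2 = PUS + 0.005 oversample; 3 = PUS + 0.01 oversample).
COMBINED_FRACTION = {1: 0.004, 2: 0.009, 3: 0.014}
PUS_FRACTION = COMBINED_FRACTION[1]


def relative_weight(cntywt):
    """Weight proportional to 1/f, normalized so that CNTYWT 1 gets 1.0."""
    return PUS_FRACTION / COMBINED_FRACTION[cntywt]


def scale_factor(cntywt_codes, target_total):
    """Constant x that makes the weights of these households sum to target_total."""
    return target_total / sum(relative_weight(c) for c in cntywt_codes)


def household_weight(head_is_black, cntywt, x):
    """Heads who are not African American get 1.0 regardless of CNTYWT."""
    if not head_is_black:
        return 1.0
    return x * relative_weight(cntywt)


# Published constants from the text.
x_black = 1.633           # mean weight 1.0 among African American households
x_all = x_black * 0.6236  # weighted N matches the observed 1910 PUS N

for k in (1, 2, 3):
    print(k, round(household_weight(True, k, x_black), 3),
          round(household_weight(True, k, x_all), 3))
```

With x = 1.633 the computed weights are 1.633, .726 and .467, which match the published 1.633, .726 and .466 to within the rounding of x.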
Note also that these are samples of households, and that a few whites are found in black-headed households. As a result, some white individuals will have weights other than 1.0.
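Because weights attach to households, each person carries the weight of the household in which he or she was enumerated. The minimal sketch below, with a hypothetical record type and identifiers, shows how a white member of a black-headed household ends up with a weight other than 1.0.

```python
from dataclasses import dataclass


@dataclass
class Person:
    household_id: int
    race: str


def person_weights(people, household_weight):
    """Give each person the weight of his or her household."""
    return [household_weight[p.household_id] for p in people]


# Hypothetical example: household 1 has an African American head in a
# CNTYWT 2 county (weight .726); household 2 has a white head (weight 1.0).
household_weight = {1: 0.726, 2: 1.0}
people = [Person(1, "black"), Person(1, "white"), Person(2, "white")]
print(person_weights(people, household_weight))  # [0.726, 0.726, 1.0]
```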
1. Excerpted from Mark Hereward, S. Philip Morgan and Douglas Ewbank, User's Guide, Oversample of Black-Headed Households, 1910 United States Census of Population, Philadelphia, PA: University of Pennsylvania, Population Studies Center, 1990.
Michael A. Strong et al., "Occupation, Industry and Class of Worker," User's Guide: Public Use Sample, 1910 United States Census of Population, Philadelphia, PA: University of Pennsylvania, Population Studies Center, 1989.