Excerpts from The User's Guide for the Oversample of Black-Headed Households 1910 United States Census of Population [1]

[NOTE: The Black Oversample has been reformatted using IPUMS codes. Variable-specific information is available in the Data Dictionary in IPUMS-98 Volume 1: User's Guide.]

Introduction

In order to facilitate analysis of the status of African Americans around the turn of the century in the U.S., a sample of African American-headed households was taken from the 1910 census manuscripts. This sample complements the 1/250 sample of the 1910 census manuscripts gathered and produced at the University of Pennsylvania. The African American oversample was collected by the same personnel using the same procedures as the general 1/250 sample (Strong et al., 1989). The oversample of African Americans was funded by a grant to Douglas Ewbank (NICHD 1-R01-HD18651); the creation of the Public Use Tape was funded by grants to S. Philip Morgan from the Research Foundation of the University of Pennsylvania and from NICHD (NICHD 1-R01- 0-25856).

Sampling of the 1910 census manuscripts was carried out using a computer-generated reel, sequence, page and line number. If the person at this microfilm location was the head of a household (and, in the case of the oversample, an African American household head), then the household was included in the sample. Ewbank drew the African American oversample in order to make infant mortality estimates by state. The oversample was drawn only from counties where at least 10 percent of the population was African American (negro, black or mulatto), and only from states where a reasonably large number of counties had this proportion of African Americans. (The 10 percent restriction was imposed to reduce the cost of searching sampling points that would infrequently produce usable, i.e. African American-headed, households.) In addition, states with large concentrations of Blacks (Alabama, Georgia, South Carolina, Mississippi) were not oversampled because the 1/250 1910 PUS provides sufficient cases for most analyses.

The oversample can be combined with the 1/250 PUS sample by differential weighting of households (or individuals) by county of enumeration as described in a following section.

Number of African American households in the 1910 PUS and in the oversamples

The tables below show the distribution of households by race and "sample area". The sample area indicates whether the household was located in a county that was included only in the 1/250 sample (1/250), in the 1/250 sample and oversample 1 (over 1), or in the 1/250 sample and oversample 2 (over 2). There are three tables. The first shows the frequencies in the African American oversample, that is, the households (N=5,533) added by these oversamples. The second shows the frequencies in the 1/250 PUS sample; these are the households available from the earlier-released, nationally representative sample. The third shows the frequencies in the combined data sets.

Table 1
Frequencies of Households in the Oversample
                   African American         Others
Race of Head      1/250   Over 1   Over 2    1/250    Total
-3 Unknown            -        -        -        -        -
-2 Illegible          -        -        -        -        -
-1 Blank              -        -        -        -        -
 0 White              -        -        -        -        -
 1 Black              -    3,160    1,390        -    4,550
 2 Mulatto            -      710      273        -      983
 3 Indian             -        -        -        -        -
 4 Japanese           -        -        -        -        -
 5 Chinese            -        -        -        -        -
 6 Hawaiian           -        -        -        -        -
 7 Other              -        -        -        -        -
Overall               -    3,870    1,663        -    5,533

Table 2
Frequencies of Households Only in the 1/250 PUS
                   African American         Others
Race of Head      1/250   Over 1   Over 2    1/250    Total
-3 Unknown            -        -        -        1        1
-2 Illegible          -        -        -       13       13
-1 Blank              -        -        -       38       38
 0 White              -        -        -   78,721   78,721
 1 Black          4,244    2,587      623        -    7,454
 2 Mulatto          893      666      152        -    1,711
 3 Indian             -        -        -      274      274
 4 Japanese           -        -        -      232      232
 5 Chinese            -        -        -      160      160
 6 Hawaiian           -        -        -       45       45
 7 Other              -        -        -      165      165
Overall           5,137    3,253      775   79,649   88,814

Table 3
Frequencies of Households in the Combined PUS and Oversample
                   African American         Others
Race of Head      1/250   Over 1   Over 2    1/250    Total
-3 Unknown            -        -        -        1        1
-2 Illegible          -        -        -       13       13
-1 Blank              -        -        -       38       38
 0 White              -        -        -   78,721   78,721
 1 Black          4,244    5,747    2,013        -   12,004
 2 Mulatto          893    1,376      425        -    2,694
 3 Indian             -        -        -      274      274
 4 Japanese           -        -        -      232      232
 5 Chinese            -        -        -      160      160
 6 Hawaiian           -        -        -       45       45
 7 Other              -        -        -      165      165
Overall           5,137    7,123    2,438   79,649   94,347
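
The relationship among the three tables can be checked with a few lines of Python. This is only a sketch: the counts are transcribed from Tables 1 and 2, the column each count belongs to is an assumption inferred from the row and column totals, and the names `pus`, `oversample` and `combine` are illustrative rather than part of any released file.

```python
# African American household counts transcribed from Tables 1 and 2.
# Keys are (race of head, sample area); column placement is inferred from the totals.
oversample = {
    ("Black", "Over 1"): 3160, ("Black", "Over 2"): 1390,
    ("Mulatto", "Over 1"): 710, ("Mulatto", "Over 2"): 273,
}
pus = {
    ("Black", "1/250"): 4244, ("Black", "Over 1"): 2587, ("Black", "Over 2"): 623,
    ("Mulatto", "1/250"): 893, ("Mulatto", "Over 1"): 666, ("Mulatto", "Over 2"): 152,
}
OTHERS_IN_PUS = 79649  # households with non-African American heads (Table 2)


def combine(pus_counts, over_counts):
    """Add the oversample counts to the PUS counts, cell by cell."""
    combined = dict(pus_counts)
    for cell, n in over_counts.items():
        combined[cell] = combined.get(cell, 0) + n
    return combined


combined = combine(pus, oversample)
total_households = sum(combined.values()) + OTHERS_IN_PUS

for cell in sorted(combined):
    print(cell, combined[cell])
print("all households:", total_households)
```

Each combined cell should agree with the corresponding entry of Table 3, and the grand total with its "Overall" row.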

Sampling areas

Items H05 to H07 in the codebook show the states from which this oversample was taken: Maryland, Virginia, North Carolina, Florida, Kentucky, Tennessee, Arkansas, Louisiana and Texas. The four states with the largest population of Blacks were South Carolina, Alabama, Mississippi and Georgia, and these were excluded from the oversample. Counties with over 10 percent African American populations in Maryland, Kentucky and Texas were sampled using a 0.01 sampling fraction. Such counties in the other states (Virginia, North Carolina, Florida, Tennessee, Arkansas) were sampled using a 0.005 sampling fraction. Louisiana has some counties sampled at the .005 fraction and some at the .01 fraction because it was used to test optimum sampling fractions.

The table below shows the sample produced by combining the 1910 PUS with the oversamples provided on this tape. For example, Maryland is represented by 191 African American-headed households in the 1910 PUS, all in counties that were also included in oversample 2; the oversample provides an additional 489 households. Virginia has 13 African American-headed households in the 1910 PUS from counties that were not oversampled (counties that were less than 10 percent African American) and 574 from oversampled counties; the oversample provides an additional 528 households. In total, oversample 1 adds 3,870 households and oversample 2 adds 1,663.

Table 4
Number of Households in 1910 P.U.S. and Oversample, by State
                             1/250 P.U.S.                  Oversample
State or Region Not oversamp   Over 1   Over 2    White   Over 1   Over 2    Total
Missg/Milit                4        -        -      209        -        -      213
North East                76        -        -    6,436        -        -    6,512
New York                 157        -        -    9,014        -        -    9,171
New Jersey                82        -        -    2,278        -        -    2,360
Pennsylvania             203        -        -    6,813        -        -    7,016
Mid-West                 622        -        -   28,548        -        -   29,170
Delaware                  25        -        -      180        -        -      205
Maryland                   -        -      191    1,011        -      489    1,691
D.C.                      84        -        -      248        -        -      332
Virginia                  13      574        -    1,216      528        -    2,331
West Virginia             54        -        -      990        -        -    1,044
North Carolina            14      573        -    1,229      695        -    2,511
South Carolina           766        -        -      551        -        -    1,317
Georgia                1,034        -      106    1,205        -        -    2,345
Florida                    -      297        -      398      401        -    1,096
Kentucky                  21        -      235    1,827        -      565    2,648
Tennessee                 34      410        -    1,535      546        -    2,525
Alabama                  808        -        -      974        -        -    1,782
Mississippi              898        -        -      644        -        -    1,542
Arkansas                  11      388        -      906      441        -    1,746
Louisiana                  -      405      243      833      512      609    2,602
Oklahoma                 125        -        -    1,400        -        -    1,525
Texas                     36      606        -    2,721      747        -    4,110
West                      70        -        -    8,483        -        -    8,553
ALL                    5,137    3,253      775   79,649    3,870    1,663   94,347
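
As a cross-check on the state rows, the oversample counts can be summed by oversample. The sketch below is illustrative only: the counts are transcribed from Table 4, and the assignment of each state's count to oversample 1 or oversample 2 is an assumption, chosen because it is the split that reproduces the stated totals of 3,870 and 1,663.

```python
# Households added by each oversample, by state, transcribed from Table 4.
# Which oversample each count belongs to is inferred from the column totals.
oversample_1 = {
    "Virginia": 528, "North Carolina": 695, "Florida": 401, "Tennessee": 546,
    "Arkansas": 441, "Louisiana": 512, "Texas": 747,
}
oversample_2 = {"Maryland": 489, "Kentucky": 565, "Louisiana": 609}

total_1 = sum(oversample_1.values())
total_2 = sum(oversample_2.values())

# Maryland's row: African American PUS households, other PUS households, oversample.
maryland_total = 191 + 1011 + 489

print("oversample 1:", total_1)
print("oversample 2:", total_2)
print("both oversamples:", total_1 + total_2)
print("Maryland, all sources:", maryland_total)
```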

Weighting strategy

The original 1/250 PUS file was a self-weighted sample of all households of the U.S., whereas this sample is an oversample of only some counties. The variable H26 is on both this tape and the 1910 PUS tape and indicates which households (regardless of race of household head) were in counties that were oversampled. Note that this variable is not itself a weighting variable and that, for 1910 PUS records, it does not imply that the household (or individual) was selected in the oversample. It can, however, be used to assign appropriate weights, as described below.

The oversample, as stated above, was taken with two different sampling fractions: 0.005 and 0.01. A value is assigned to each household to indicate whether it was selected from a county that was oversampled and, if so, which oversample that county belonged to. There will be Black-headed households in the main PUS that come from oversampled counties but were not themselves drawn in either oversample. Nevertheless, the same weight (based on the inverse of the sampling fraction) should be assigned to all African American households in a given county, regardless of which sample produced the household record. The table below shows the frequencies of households in the main sample (by race) and in the oversamples, by the CNTYWT variable.

Table 5
Frequency of Households in the PUS and Oversample, by Race and County Weight
                   African American         Others
CNTYWT            1/250   Over 1   Over 2    1/250    Total
1                 5,137        -        -   71,207   76,344
2                 3,253    3,870        -    5,882   13,005
3                   775        -    1,663    2,560    4,998
ALL               9,165    3,870    1,663   79,649   94,347

Table 6
Sampling Fractions
                    (1)      (2)      (3)      (4)
CNTYWT            1/250   Over 1   Over 2  Overall
1                  .004        -        -     .004
2                  .004     .005        -     .009
3                  .004        -     .010     .014

If the household head is not African American, then a weight of 1.0 should be assigned (regardless of the value of CNTYWT). If the household head is African American, then the overall sampling fractions for counties coded 1, 2 and 3 are .004, .009 and .014, and households should be weighted by the ratio of these sampling fractions, adjusted by some constant 'x'.

If we let the weight for CNTYWT 1 equal 1, then the weight for CNTYWT 2 is .004/.009 = .444 and the weight for CNTYWT 3 is .004/.014 = .286. The ratios between these weights must be preserved in order to maintain "equal probability of selection", but the weights can all be scaled by a common factor to achieve different aims. To produce a nationally representative sample of African Americans (where the mean weight is 1.0) using the 1910 PUS and these oversamples, weights of 1.633, .726 and .466 should be assigned to CNTYWT values 1, 2 and 3, respectively. To provide a nationally representative sample of all races, these weights must be adjusted so that the weighted N of African American households equals the N observed in the 1910 PUS (multiply the above weights by a factor of .6236). Weighting could also be done at the state or regional level.
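
The weights quoted above can be reproduced, to rounding, from the household counts in Table 5 and the sampling fractions in Table 6. The following Python is a minimal sketch of that arithmetic, not the authors' own program; the function and variable names are illustrative assumptions.

```python
# African American household counts by CNTYWT (Table 5, PUS plus oversample).
aa_households = {1: 5137, 2: 3253 + 3870, 3: 775 + 1663}
# Overall sampling fractions by CNTYWT (Table 6, column 4).
fraction = {1: 0.004, 2: 0.009, 3: 0.014}


def relative_weights(fractions, base=1):
    """Weight of each CNTYWT value relative to the base value (base weight = 1)."""
    return {k: fractions[base] / f for k, f in fractions.items()}


def scale_to_mean_one(rel, counts):
    """Multiply all weights by the constant x that makes the mean weight 1.0."""
    n = sum(counts.values())
    weighted_n = sum(rel[k] * counts[k] for k in counts)
    x = n / weighted_n
    return {k: x * w for k, w in rel.items()}


rel = relative_weights(fraction)
weights = scale_to_mean_one(rel, aa_households)

# Factor that brings the weighted African American N down to the 1910 PUS count.
pus_aa_households = 5137 + 3253 + 775
all_race_factor = pus_aa_households / sum(aa_households.values())

for k in sorted(weights):
    print(f"CNTYWT {k}: relative {rel[k]:.3f}, scaled {weights[k]:.3f}")
print(f"all-race adjustment factor: {all_race_factor:.4f}")
```

Under these inputs the scaled weights come out at about 1.633, .726 and .467 (the text gives .466 for the last), and the adjustment factor at about .6236.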

Note also that these samples are of households and that a few whites are found in black-headed households. As a result some white individuals will have weights other than 1.0.
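
A sketch of how such weights might be attached to records follows. It assumes a hypothetical record layout (a dictionary with `head_race`, `cntywt` and `person_races` keys, none of which is a documented variable name), treats race codes 1 (Black) and 2 (Mulatto) as African American, and uses the weights and the .6236 factor quoted in the text.

```python
AFRICAN_AMERICAN_CODES = {1, 2}               # race codes for Black and Mulatto
AA_WEIGHTS = {1: 1.633, 2: 0.726, 3: 0.466}   # by CNTYWT, mean weight 1.0
ALL_RACE_FACTOR = 0.6236                      # adjustment for an all-race sample


def household_weight(head_race, cntywt, all_races=True):
    """Weight for a household, determined by the race of its head and CNTYWT."""
    if head_race not in AFRICAN_AMERICAN_CODES:
        return 1.0
    weight = AA_WEIGHTS[cntywt]
    return weight * ALL_RACE_FACTOR if all_races else weight


def person_weights(household):
    """Every person inherits the household weight, whatever their own race."""
    w = household_weight(household["head_race"], household["cntywt"])
    return [w for _ in household["person_races"]]


# Hypothetical household: Black head in a CNTYWT 2 county, with one white member (code 0).
example = {"head_race": 1, "cntywt": 2, "person_races": [1, 1, 0]}
print(person_weights(example))
print(household_weight(0, 3))
```

Because the weight is assigned at the household level, the white member of the example household receives the same weight as the other members rather than 1.0.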


1. Excerpted from Mark Hereward, S. Philip Morgan and Douglas Ewbank, User's Guide, Oversample of Black-Headed Households, 1910 United States Census of Population, Philadelphia, PA: University of Pennsylvania, Population Studies Center, 1990.

Works Cited:

Michael A. Strong, et al., "Occupation, Industry and Class of Worker," User's Guide: Public Use Sample, 1910 United States Census of Population, Philadelphia: Population Studies Center, University of Pennsylvania, 1989.