1940 Sampling Procedures [1]
Sample Selection from Microfilmed Population Schedules
Population schedules from the 1940 census are preserved on 4,576 one-hundred-foot reels of 35mm microfilm. The original schedules of the 1940 Census of Population and Housing and the punch cards produced from them were destroyed. Copies of the microfilm are stored at the National Archives in Washington, D.C., and at the Personal Records Service Branch of the Bureau of the Census in Pittsburg, Kansas. All microfilm processing for the public use sample project was performed by the Census Bureau at its Pittsburg, Kansas, facility. The 4,576 microfilm reels were randomly assigned to 20 subsamples. Final sampling of cases for transcription was conducted independently within each of the 20 subsamples. This was done because cost estimates were uncertain, and working with 20 independent replicates provided a means of coping with a premature end to sampling and data collection. The microfilm is organized alphabetically by state, within states alphabetically by county, and within counties numerically by enumeration district. Thus, each subsample is a representative, albeit clustered, sample of the United States population.
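The random assignment of reels to subsamples can be illustrated with a short sketch. The source states only that the 4,576 reels were randomly assigned to 20 subsamples; the shuffle-and-deal method, the near-equal subsample sizes, the seed, and the function name below are assumptions made for illustration.

```python
import random


def assign_reels_to_subsamples(n_reels=4576, n_subsamples=20, seed=1940):
    """Randomly assign reel numbers 1..n_reels to subsamples 1..n_subsamples.

    Assumption: reels are shuffled and dealt round-robin, which yields
    subsamples of near-equal size. The source does not describe the
    actual randomization mechanism.
    """
    rng = random.Random(seed)
    reels = list(range(1, n_reels + 1))
    rng.shuffle(reels)
    # Deal the shuffled reels to subsamples 1, 2, ..., n_subsamples in turn.
    return {reel: (i % n_subsamples) + 1 for i, reel in enumerate(reels)}


assignment = assign_reels_to_subsamples()
```

Because each subsample is drawn independently of the others, any subset of completed subsamples could still be used as a smaller sample if processing stopped early.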
The 20 subsamples were processed separately, and public use sample records are sequenced by subsample. The household record item SUBSAMP identifies each subsample. The value in SUBSAMP does not indicate the order of processing; the original subsample numbers have been reassigned to protect the confidentiality of the microfilm reels. On the microfilm, the population schedules within each enumeration district are arranged by sheet number, with the "A" side followed by the "B" side. Sheets numbered 1 through 60 list households that were contacted during the original enumeration canvass. Sheets 61 through 80 contain individuals and entire households that were missed during the original tour. Sheets 81 through 100 were used to enumerate persons living in transient types of dwellings (hotels, tourist facilities, flophouses). (See chapter 1 of the technical documentation for a summary of enumeration procedures and chapter III of the Procedural History of the 1940 Census for a detailed description.)
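The sheet numbering scheme lends itself to a small lookup. The following is a minimal sketch; the function names, the category labels, and the "12A"-style sheet identifier format are assumptions for illustration rather than part of the original processing system.

```python
def classify_sheet(sheet_number):
    """Return the kind of enumeration recorded on a given sheet number."""
    if 1 <= sheet_number <= 60:
        return "original canvass"
    if 61 <= sheet_number <= 80:
        return "missed on original canvass"
    if 81 <= sheet_number <= 100:
        return "transient dwellings"
    raise ValueError(f"sheet number outside the documented 1-100 range: {sheet_number}")


def sheet_sort_key(sheet_id):
    """Sort key for identifiers such as '12A' (hypothetical format).

    Orders by sheet number first, then by side, so 'A' precedes 'B'.
    """
    return (int(sheet_id[:-1]), sheet_id[-1].upper())


ordered = sorted(["2B", "1B", "10A", "1A", "2A"], key=sheet_sort_key)
```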
Within an enumeration district, households were numbered in order of visitation. This number is recorded in column 3 ("household visitation number") of the population schedule. Persons listed on sheets 61 through 80 who were members of households listed on sheets 1 through 60 were given the same household visitation number as the originally listed household.
Sampling Procedures for Household Selection
The sampling procedure was designed to produce a household sample in which each sample household contained one member who answered the supplementary questions at the bottom of the population schedule. A systematic random sample of population schedules within an enumeration district was drawn to select a particular population schedule. One of the two supplementary question lines at the bottom of that population schedule was then selected at random. This systematic random sample of 1 in 5 of all supplementary question lines, i.e., 20% of the 5% census sample, provided an overall sample of 1 in 100 supplementary question lines. The household of the person listed on the selected supplementary line was made the "target" household for selection.
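The overall line sampling rate follows from multiplying the two stages. A minimal check using exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

# Share of enumerated persons who fell on a supplementary question line
# in the 1940 census (the 5% census sample).
CENSUS_SUPPLEMENTARY_RATE = Fraction(5, 100)

# Share of supplementary question lines chosen for the public use sample.
LINE_SAMPLING_RATE = Fraction(1, 5)

overall_line_rate = CENSUS_SUPPLEMENTARY_RATE * LINE_SAMPLING_RATE
print(overall_line_rate)  # 1/100
```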
The probability of including the target household was calculated as the inverse of the number of persons included in the original listing of the household, i.e., a target household of size "h" was selected for the sample with probability 1/h for h = 1, 2, ... 7. Households with eight or more persons were selected with a probability of one in seven to ensure an adequate number of observations of large households. Since the chance that a household had one of its members listed on the supplementary question line is proportional to the household's size, this 1 in h selection probability provided an overall 1 in 100 sample of households and their members for households with seven or fewer persons. As an illustration of the selection procedure, five-person households that contained a person on a selected supplementary question line were retained in the public use sample at a rate of one in five after a random start for the first selection. For example, if the random start value were two, then the second, seventh, twelfth, etc. five-person households were included in the sample.
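The 1 in h rule with a random start can be sketched as follows. This is an illustration under stated assumptions, not the Census Bureau's program: the class and function names are hypothetical, one counter is kept per sampling interval (so households of size seven and of size eight or more share a counter), and the overall rate treats a household's chance of being targeted as exactly h/100, ignoring the small chance that two members both fall on selected lines.

```python
import random
from fractions import Fraction

MAX_INTERVAL = 7  # households of eight or more persons are sampled 1 in 7


def sampling_interval(household_size):
    """Interval k such that 1 in k target households of this size is kept."""
    return min(household_size, MAX_INTERVAL)


class SystematicHouseholdSelector:
    """Keeps every k-th target household of a given interval after a random start."""

    def __init__(self, rng=None, starts=None):
        self._rng = rng or random.Random()
        self._starts = dict(starts or {})  # interval -> random start (1..k)
        self._seen = {}                    # interval -> target households seen so far

    def select(self, household_size):
        k = sampling_interval(household_size)
        if k not in self._starts:
            self._starts[k] = self._rng.randint(1, k)
        self._seen[k] = self._seen.get(k, 0) + 1
        # Selected when the running count equals start, start + k, start + 2k, ...
        return (self._seen[k] - self._starts[k]) % k == 0


def overall_household_rate(household_size):
    """Approximate overall inclusion probability for a household of this size."""
    targeted = Fraction(household_size, 100)  # chance a member is on a selected line
    return targeted * Fraction(1, sampling_interval(household_size))


# The example from the text: five-person households with a random start of two.
selector = SystematicHouseholdSelector(starts={5: 2})
kept = [i for i in range(1, 16) if selector.select(5)]
print(kept)  # [2, 7, 12]
```

Under these assumptions `overall_household_rate` returns 1/100 for every size from one to seven, while a ten-person household comes out at 1/70, which reflects the deliberate oversampling of large households.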
All single-person target households were selected with a probability of one. If the target supplementary line person lived not in a household but in "group quarters" (persons living in institutions, transient-type dwelling units, and persons living in households with five or more persons who are unrelated to the household head), the selection probability was also one.
Operation of the sampling procedure was directed by a computer program which provided instructions to the sampling clerks. The clerks, sitting at stations with a microfilm reader and video display terminal, were instructed to find a designated population schedule (based on the enumeration district number and the "sheet number" entry in the heading section of the population schedule) and a designated supplementary line. The clerk then determined the type of household of the person on the supplementary line. If there were five or more unrelated individuals in the household listing, it was designated a group quarters. If the target supplementary question line person lived in an institution (based on the entry in the "institution" item in the heading of the population schedule or the relationship description in column 7), or was a transient (listed on population schedules numbered 81 through 100), the person was designated as a resident in group quarters. For target line persons in group quarters, the person record was automatically selected for inclusion in the public use sample. For persons in regular private households, the clerk entered into the computer the number of lines used to list the private household. The computer calculated the number of persons listed in the household and determined whether the household was selected for inclusion in the sample. If not, the computer instructed the clerk to proceed to another designated population schedule and supplementary line.
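The decision sequence the program walked the clerk through can be outlined in code. This is a sketch only: the record layout, field names, and instruction strings are hypothetical, the household size is assumed to equal the number of lines keyed by the clerk, and the household selection rule is passed in as a function rather than reproduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TargetLine:
    sheet_number: int        # sheet on which the target person is listed
    in_institution: bool     # from the heading "institution" item or column 7
    unrelated_persons: int   # persons unrelated to the household head
    lines_in_household: int  # lines used to list the household, keyed by the clerk


def is_group_quarters(target: TargetLine) -> bool:
    """Transients (sheets 81-100), institutions, and 5+ unrelated persons."""
    return (
        81 <= target.sheet_number <= 100
        or target.in_institution
        or target.unrelated_persons >= 5
    )


def instruction_for(target: TargetLine, household_selected: Callable[[int], bool]) -> str:
    """Return the instruction shown to the clerk for this target line."""
    if is_group_quarters(target):
        # Group quarters residents are always kept, as single person records.
        return "transcribe target person"
    household_size = target.lines_in_household  # assumption: one line per person
    if household_selected(household_size):
        return "transcribe household"
    return "proceed to next designated line"
```

Passing the selection rule as `household_selected` keeps the household-type logic separate from the sampling arithmetic, so either can be checked on its own.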
If the target supplementary question line person was selected into the sample, the computer then instructed the clerk to transcribe the items from the population schedule for all members of a selected household, or for the single person record in the case of a nonregular household (group quarters, institutions, transients).
Upon reaching the end of the microfilmed population schedules for an entire enumeration district, the clerk was instructed to make a second pass through the population schedule sheets numbered 61 through 80. The purpose of the second pass was to identify and transcribe data for persons from selected households who were not enumerated with the main body of the household on sheets 1 through 60. Household visitation numbers (or surnames, where the household visitation number was missing) for sample households on sheets 1 through 60 were matched against those on sheets 61 through 80 to determine whether any persons listed on sheets 61 through 80 were part of selected households. If persons on sheets 61 through 80 were found to be members of previously selected households, the data for these persons were transcribed and later merged with the data for the rest of the household.
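The second-pass match can be sketched as a lookup on household visitation number with a surname fallback. The dictionary keys, the function name, and the exact fallback rule (surnames compared case-insensitively, and only when a visitation number is missing on either side) are assumptions; the source does not spell out how ties or misspellings were handled.

```python
def find_household(person, households):
    """Return the index of the selected household a person belongs to, or None.

    `person` and each entry of `households` are dicts with the hypothetical
    keys 'visitation_number' (int or None) and 'surname' (str or None).
    """
    number = person.get("visitation_number")
    if number is not None:
        for i, household in enumerate(households):
            if household.get("visitation_number") == number:
                return i

    surname = (person.get("surname") or "").strip().lower()
    if not surname:
        return None
    for i, household in enumerate(households):
        number_missing = number is None or household.get("visitation_number") is None
        household_surname = (household.get("surname") or "").strip().lower()
        # Fall back to surname only where a visitation number is unavailable.
        if number_missing and household_surname == surname:
            return i
    return None


selected = [
    {"visitation_number": 12, "surname": "Smith"},
    {"visitation_number": None, "surname": "Jones"},
]
late_listed = {"visitation_number": None, "surname": "jones"}
match_index = find_household(late_listed, selected)
```

Persons matched this way would then be appended to the transcribed household before the records are merged.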
[1] U.S. Department of Commerce, Bureau of the Census, "Chapter 2, Sample Selection and Data Processing Procedures," Census of Population, 1940: Public Use Microdata Sample, Technical Documentation, Washington, D.C., 1983, pp. 2.1-2.3.