Coding for Occupation, Industry, and Class of Worker in the 1910 PUMS
- P25 OCC10 - Occupation 1910 classification
- P26 IND10 - Industry 1910 classification
- P27 OCC80 - Occupation 1980 classification
- P28 IND80 - Industry 1980 classification
- P29 EMPSTAT - Class of worker [1]
The 1910 Census was the first in the United States to provide space for separate entries for the activities now designated as occupation, industry, and class of worker. As the schedule reproduced in Volume III shows, queries on these activities, and two additional questions on unemployment, were all placed under the general heading "occupation" (a practice continued in the 1920 Census) but the nature of the distinction sought is clear from the headings of columns 18, 19, and 20, and the instructions to enumerators specifically use the terms "occupation" for column 18 and "industry" for column 19. No general term is used to describe column 20, which is headed "whether an employer, employee, or working on own account" and is discussed similarly in the instructions to enumerators; the term "class of worker" was first introduced in the 1930 Census.
In 1910, as in all censuses up until 1940, there were no preceding "sorter" questions, to determine whether or not the respondent was (or had ever been) economically active, before the questions on occupation, industry, and class of worker were asked. The instructions to enumerators in 1910 (see Volume III, 1910 Enumerator Instructions) make clear that this sorting function was to be performed by column 18, the occupation question: "An entry should be made in this column for every person enumerated," with the entry falling in one of three categories, (1) the occupation pursued, (2) an entry of "own income" for persons following no specific occupation but having an independent income upon which they live, or (3) an entry of "none" for persons not falling in either of the two previous categories. Columns 19 and 20 were to be left blank when the entry in 18 was either "own income" or "none." Further instructions specified that retired persons were to be included in either the "own income" or "none" categories but that for those temporarily unemployed, "the occupation followed when the person is employed" should be entered.
Noteworthy in the general instruction is the stress on every person. The 1900 instructions had specified that the occupation item was to be filled only for persons 10 years of age and over with an occupation [2], and instructions for other items in 1910 (for example, whether able to read or write) specify 10 as a lower age limit, so it is clear that the intent was to have comprehensive coverage in item 18. Published data for 1910 present gainful worker tabulations only for persons 10 years of age and over, but a considerable literature has developed on the effect of the broad character of the 1910 instructions, particularly on the proportion of women counted as economically active [3].
Despite the instructions, however, it is apparent that most enumerators continued to regard the occupation item as relevant only for the gainfully occupied population. Our sample shows item 18 as blank for about half of all respondents, with a small proportion of these cases having an entry recorded in item 19.
Coding in 1910
Two indexes presenting the coding structure used by the Census Bureau in processing the 1910 returns are available [4]. Figures F1 and F2 are sample pages from these indexes. The "punch symbol" shown in Figure F1 gives the code to be used for the listed entry when the occupational title shown occurs in the industry specified. Figure F2 shows the rearrangement of these codes by industry, referred to as the Classified Index. Thus the first line on the right-hand side of Figure F1, "0-5 x-0 Trucker...Boat company," is found in Figure F2 in the industry "x-0 WATER TRANSPORTATION" as one of several titles included in "0-5 Stevedores."
The system differs markedly from those currently developed by the Bureau of the Census because the symbol 0-5 cannot be used without its accompanying industry symbol; 0-5 refers to an entirely different activity when accompanied by another industry symbol (for example, 0-5 73 is the symbol for "Starchers...Cotton mill").
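The dependence of the occupation symbol on its industry symbol can be illustrated with a lookup keyed on the pair rather than on the occupation symbol alone. This is only a sketch: the two entries are the ones quoted above, and the names `SYMBOL_PAIRS` and `describe` are hypothetical, not part of any census or PUS software.

```python
# Illustrative only: the two (occupation, industry) symbol pairs quoted in the text.
SYMBOL_PAIRS = {
    ("0-5", "x-0"): "Stevedores",  # industry x-0, water transportation
    ("0-5", "73"): "Starchers",    # industry 73, cotton mill
}


def describe(occ_symbol, ind_symbol):
    """Return the title for a symbol pair, or None if the pair is not listed."""
    return SYMBOL_PAIRS.get((occ_symbol, ind_symbol))


print(describe("0-5", "x-0"))  # Stevedores
print(describe("0-5", "73"))   # Starchers
```

Keying on the pair makes the point directly: the same occupation symbol resolves to unrelated titles under different industry symbols.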
An intriguing aspect of the 1910 Census is that, except for data for the country as a whole, the results of this elaborate classification scheme were never published in the form in which they were coded. Described below are the procedures we have used for "translating" the coded data into the format published for all sub-national areas in 1910.
There is some dispute in the historical sources about the exact nature of the coding operation. According to Truesdale (1965, p.134) the punch card operators were given a list of codes for common titles and were to punch these directly onto the Hollerith card for the individual respondent. Entries not appearing on the list were to be skipped, for later processing by expert coders. The introduction to the alphabetical index, however, contains "Instructions to coders" which imply a different procedure... . Coders were to enter the code for each schedule return corresponding to a line in the coding index and to refer the item to a supervisor if unable to find an exact match for the occupation/industry title in the index. Codes were to be entered at a specified schedule location. Although it is conceivable that both procedures could have been followed, examination of the schedules suggests that coding was probably a separate operation that preceded punching. About half of the occupation/industry entries have codes written on the census manuscripts and, in general, the uncoded lines appear to be the more difficult coding problems.
Coding in the Public Use Sample
Occupation and industry returns for the 1910 Census have been coded by both the 1910 and the 1980 classification systems for the Public Use Sample. The decision to use the 1980 systems, rather than the 1950 systems used for other historical public use samples, was made in accordance with the recommendation of the 1983 joint SSRC Census Subcommittee on Comparability of Occupation Measurement, which concluded that the most desirable procedure in preparing historical series is to code forward to the most recent system.
Some minor differences in work force status occur under the two classification systems. In terms of numbers, the largest of these is the manner in which activities of the institutionalized population are counted. According to 1910 rules, if an activity in the institution was reported for an inmate, for example, prison work on a road gang, that activity was to be coded and the person counted in the gainfully occupied category. In 1980, the institutionalized population is, by definition, not in the labor force. A second (very small) difference is in the treatment of women members of religious orders. If such a person reported an occupation (as "teacher," "nurse") in 1910, she was coded to that occupation and considered gainfully occupied; if, however, only the term "nun" (or similar) appeared on the schedule, the person was excluded from the gainfully occupied. For 1980, cases of the first type are treated as in 1910, but a return of "nun" also qualifies as an economic activity, receiving the code for religious worker.
We have followed the rules governing each system, a procedure that has resulted in small discrepancies in the count of total economically active persons as between variables P25 OCC10 and P26 IND10, on the one hand, and P27 OCC80 and P28 IND80, on the other.
Two further points may be of interest to users who wish to compare P27 and P28 with occupation and industry data for recent censuses. In the 1980 Census reports, although members of the armed forces are included in the labor force, they do not appear in the occupation or industry tables. For simplicity of processing, we have added a code (940) to both 1980 classification systems and assigned this to entries indicating armed forces personnel. A second departure from the standard census classification affects only P28: we have added a code (038) for persons in mining where the type of mine could not be determined.
Except as noted below, coding under the two systems (1910 and 1980) was carried out independently for each. The basic items used in both operations, however, are identical: the written entries in column 18, in column 19, and in column 20 were keyed in exactly as they appeared on the schedule for each respondent. Where available, the hand written codes appearing on the schedule were also keyed on the archival tape but these codes were not used in the general processing of the PUS. In our judgment it was desirable to use a consistent set of rules for all entries; moreover, as noted above, the status of the hand written codes (i.e. whether they were, in fact, the codes used for the 1910 tabulations) is unclear.
A series of coding "dictionaries" was produced for entry into the computer. These were derived from actual schedule entries rather than from any prior expectations of what those entries might be. The computer was instructed to search each dictionary in a specified order until an exact match for the particular case under search was found. Once a match was found, the proper codes were entered for that case and the search moved on to the next case. If no exact match was found, the schedule entries for that case were placed in a special file of residuals for subsequent coding. As these residuals were coded they were, in turn, entered in a dictionary for use in further processing.
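The search procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the software actually used: the function name `code_cases` and the data layout are hypothetical, the "Carpenter" entry and its codes are taken from the example given in this section, and the class-of-worker entry and the codes for the second case are placeholders.

```python
def code_cases(cases, dictionaries):
    """Assign codes by exact match, searching the dictionaries in order.

    cases: list of (col18, col19, col20) entries as keyed from the schedule.
    dictionaries: list of dicts mapping such entries to code tuples.
    Returns (coded, residuals); residuals are entries with no exact match.
    """
    coded, residuals = [], []
    for entry in cases:
        for dictionary in dictionaries:
            codes = dictionary.get(entry)
            if codes is not None:
                coded.append((entry, codes))
                break  # stop at the first dictionary with an exact match
        else:
            residuals.append(entry)  # no dictionary matched this entry
    return coded, residuals


# First pass with a one-entry 1910 dictionary.
dict_1910 = {("Carpenter", "Building", "O"): ("122", "98", "2")}
cases = [
    ("Carpenter", "Building", "O"),
    ("Trucker", "Boat company", "W"),  # the "W" here is an assumed entry
]
coded, residuals = code_cases(cases, [dict_1910])

# Residuals coded by hand are entered in a further dictionary for later passes.
# The codes below are placeholders, not actual 1910 codes.
residual_dict = {residuals[0]: ("??", "??", "?")}
coded_again, still_residual = code_cases(residuals, [dict_1910, residual_dict])
```

Because matching is exact, any variation in spelling or wording sends an entry to the residual file, which is why the dictionaries had to be built from the schedule entries themselves.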
In developing these dictionaries, only the three items of concern were used. For example, an entry "Carpenter" (col. 18), "Building" (col. 19), "O" (col. 20) received the codes 122 98 2 in the 1910 dictionary and 567 060 2 in the 1980 dictionary. In principle, of course, this is the proper way to code; that is, other information on the schedule should not be consulted because it may provide an opportunity for preconceptions to bias coding decisions. A commonly cited instance of this is the general instruction in some early censuses to exercise caution in coding women to occupations not usually pursued by them (see, for example, U.S. Bureau of the Census 1933, p.9), a practice that some scholars believe may have biased the published data. In the event, however, information relevant to certain coding issues was lost by this procedure, a situation necessitating additional manipulation of the data after the computer coding was completed. These additional procedures are described below.
One dictionary included special codes for persons whose entries indicated that they were not economically active (for example, "retired"). Initially, a number of categories were distinguished in the expectation of presenting a distribution of major noneconomic activities similar to that available for the 1940 and subsequent censuses. As we proceeded, however, it became obvious that, in view of the widespread omission of any entry, the frequencies yielded by these special codes would not generally be helpful. We therefore have retained only three of these "noneconomic" codes for variables P25, P26, P27, and P28:
a-2: Because of the particular interest in the classification of women's activities, this code has been attached to entries of "housework," "housekeeper," etc. where, as a consequence of reviewing other items on the schedule (see below), we have judged the activity to be "noneconomic" under the definitions used in census procedures. The user should remember, however, that for the great majority of women engaged in the "noneconomic" activity of taking care of their own households, there was no entry in items 18, 19, or 20 of the schedule and they are therefore not included in this category.
b-5: This code distinguishes inmates of correctional institutions. The frequencies shown differ as between variables P25 and P26, on the one hand, and variables P27 and P28, on the other, because of varied treatment of institutionalized persons in the 1910 and 1980 classification systems as noted above.
c-5: A few schedules carried the term "Ration Indian" in items 18 and/or 19. We have assumed the reference was to a special type of dole and that these persons were not engaged in an "economic" activity; it seemed wise, however, to distinguish them separately.
All other persons judged not to be economically active, either because of the nature of the entry in items 18/19 or because those items were blank, were assigned the code -1 for variables P25, P26, P27, and P28.
After both the 1910 and the 1980 codes had been assigned to all sample cases by the computer dictionaries certain special "edit" tabulations were performed in order to check and further refine the coding. In these procedures, codes assigned under both classification systems, as well as the written entries, were displayed jointly since the situations under examination were usually common to both systems.
1. All persons with codes of 5000 or above in variable P06 REL, Relationship to head, were listed by the written REL term, with their O and I codes and entries, Type of Group Quarters if any (H13 GQPUS), and various other relevant and identifying information. The procedure revealed a number of instances in which items 18 and 19 were blank on the schedule and where, therefore, the computer assigned codes were -1. If REL indicated an economic activity, these cases were assigned O and I codes appropriate to that activity (modified by GQPUS where relevant) in both systems. For example, REL entry of "Servant," with blanks in items 18 and 19, and with GQPUS -1 (not a group quarters) received codes 03 7X for 1910 and 407 761 for 1980. If GQPUS indicated that the dwelling unit was a hotel, the 1980 codes became 469 762; the 1910 codes, however, were again 03 7X since the 1910 classification system does not distinguish servants in private households from other servants.
In cases where REL indicated that the respondent was an inmate of an institution and the computer dictionaries had placed the person among the economically active, the 1980 codes were altered to the appropriate noneconomic designation. REL was also useful in distinguishing hired housekeepers from family members performing home household tasks. The computer dictionaries coded entries of "Housework at home" as -2 in variables P25-P28; when REL showed an employee relationship the O and I codes were changed to the appropriate economically active designation. Similarly, REL became an indicator of hired vs. family farm laborers in cases with an industry entry of "Home farm" (see below for further discussion of this).
In general, however, where items 18/19 were not blank we retained the codes appropriate to those entries even if REL appeared to be in conflict with them (for example, REL "Servant," items 18/19 "Tailoress Shop").
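The "Servant" example in this edit can be written out as a small rule. The sketch below is illustrative only: the function `edit_servant`, the constant names, and the `HOTEL` marker are hypothetical (the actual GQPUS value for a hotel is not given here), while the occupation and industry codes are the ones quoted above.

```python
NOT_GQ = -1        # GQPUS value for "not a group quarters", as given in the text
HOTEL = "hotel"    # hypothetical stand-in; the real GQPUS hotel code is not shown
UNCODED = ("-1", "-1")  # O and I codes assigned when items 18 and 19 are blank


def edit_servant(rel_term, col18, col19, gqpus, codes_1910, codes_1980):
    """Return (codes_1910, codes_1980) after the REL-based edit for servants."""
    if col18 or col19:
        # Non-blank entries keep their dictionary codes even if REL conflicts.
        return codes_1910, codes_1980
    if rel_term != "Servant":
        return codes_1910, codes_1980
    new_1910 = ("03", "7X")  # 1910 system does not separate household servants
    if gqpus == HOTEL:
        return new_1910, ("469", "762")
    if gqpus == NOT_GQ:
        return new_1910, ("407", "761")
    # Other group quarters types are not covered by this sketch.
    return codes_1910, codes_1980


print(edit_servant("Servant", "", "", NOT_GQ, UNCODED, UNCODED))
print(edit_servant("Servant", "", "", HOTEL, UNCODED, UNCODED))
```

The first branch encodes the general rule that written entries in items 18/19 take precedence over REL.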
2. A second tabulation took H13 (Type of Group Quarters) as the primary sort and listed persons with REL of less than 5000 (including 1, -2, -3) who were resident in Group Quarters. The major purpose here was to identify inmates of correctional facilities (or other institutions) and members of the armed forces where these statuses had not been revealed by REL or by the entries in items 18/19 [5].
The tabulation was also helpful in enabling us to code industry in some cases where item 19 had been blank (for example, "Laborer" with no industry but with GQPUS "Construction site").
3. The classification system and enumeration procedures in 1910 do not provide a way of distinguishing unpaid family workers from wage and salary workers. As the instructions to enumerators make clear (Volume III: 1910 Enumerator Instructions, paragraph 176), both types of work attachment were to be entered as "W" in column 20. For farm workers, however, a distinction was to be made between women and children working on the "home farm" and those "working out" (Volume III: 1910 Enumerator Instructions, paragraphs 154-155). This distinction was carried forward to the coding index, which gave different codes for "Laborer Farm" and "Laborer Home Farm"; the published report from the 1910 Census also shows separate lines for "Farm Laborers (home farm)" and "Farm Laborers (working out)." Unfortunately, some ambiguity occurs in the instructions to coders on how to handle returns of "Home farm." Page 5 of the introductory text to the alphabetical index ... implies that the entry "Hired man-home farm" should receive the "Laborer-home farm" code; the classified index, on the other hand, makes it clear that only members of the family should receive this code. Moreover, the manuscript schedules suggest that enumerators did not use the "Home farm," "Working out" distinction consistently. Faced with these assorted problems, we elected to emphasize the distinction between family members and nonfamily members. Although this distinction is clearly not entirely comparable with the later census category of unpaid family workers in agriculture, it is probably closer to it than any other coding rule that might be used. As noted above, then, we used the REL tabulation to recode farm laborers not related to the household head if the computerized dictionary had placed them in the family worker category. Where the entry specified that a family member was "Working out" we of course continued to classify the person in the nonfamily worker category.
Since the 1980 occupation classification does not distinguish between family and nonfamily farm laborers (the published distinction is based on the class of worker rather than the occupation code), the issues discussed above do not affect variable P27.
4. The several censuses prior to 1910 carried a substantial category labeled "Laborers, not specified" for returns on which only the term "Laborer" appeared. This category was particularly troublesome for analysts interested in tracing the shift from agricultural to nonagricultural activities [6]. In an attempt to deal with the problem, the 1910 coders were instructed to assign such entries to "farm laborer" for persons "living in an unincorporated place" (Section C, p.6). We adopted this procedure. The computerized dictionaries coded all not specified laborers to the category "General and not specified laborers" under "Laborers, Building and hand trades" (155 98) for 1910 and to "Laborers, except construction" (889) in "Industry not reported" (990) for 1980. Subsequently, the editing procedure moved those in unincorporated places to the farm laborer category in each system, with the residue remaining in the dictionary specified categories.
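The two-stage handling of "Laborer" returns can be sketched as below. The dictionary-assigned codes are those quoted above; the farm laborer codes are not given in this section, so the sketch uses clearly labeled placeholders. The function name `edit_laborer` and the boolean flag for incorporated status are assumptions made for illustration.

```python
# Codes assigned by the dictionaries to "Laborer" with no further detail.
NOT_SPECIFIED_1910 = ("155", "98")
NOT_SPECIFIED_1980 = ("889", "990")

# Placeholders: the actual farm laborer codes are not given in this section.
FARM_LABORER_1910 = ("FARM-LABORER-1910-OCC", "FARM-LABORER-1910-IND")
FARM_LABORER_1980 = ("FARM-LABORER-1980-OCC", "FARM-LABORER-1980-IND")


def edit_laborer(codes_1910, codes_1980, in_incorporated_place):
    """Move not specified laborers in unincorporated places to farm laborer."""
    if in_incorporated_place:
        return codes_1910, codes_1980  # residue stays in the dictionary category
    if codes_1910 == NOT_SPECIFIED_1910:
        codes_1910 = FARM_LABORER_1910
    if codes_1980 == NOT_SPECIFIED_1980:
        codes_1980 = FARM_LABORER_1980
    return codes_1910, codes_1980


print(edit_laborer(NOT_SPECIFIED_1910, NOT_SPECIFIED_1980, False))
```

Cases whose codes are anything other than the not specified pair pass through unchanged, wherever the person lived.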
Translating Coded Data into Occupation Data Published for 1910
As noted above, the coding structure used in processing returns for 1910 was published in full array only for the country as a whole (U.S. Bureau of the Census 1914, Table VI) [7]. For all subnational areas, the general format used in prior and succeeding censuses (through 1930) was followed; that is, a list of specified occupations grouped in broad sector categories is presented. This format, displayed in the Appendix, represents an extensive rearrangement and condensation of the data as originally coded. An occupation coding index for this arrangement was published in 1915, but no documentation on how the original (1910) codes were translated into the published arrangement is available. Since, however, national data are available in both formats, it is possible to produce a relatively complete translation by comparing the two sets for the United States, a laborious and time-consuming process, much of which, fortunately, had been done a number of years ago (Palmer and Ratner 1949).
An example may help to clarify the procedure: Figure F1 shows that a return of "Truckers" in column 18 and "Boat company" in column 19 is to be coded 05 X0; Figure F2 shows that items coded 05 in industry X0 (water transportation) are in the more general rubric "Stevedores." Figure F3, which is from the index published in 1915 (but in the published 1910 format), shows that "Stevedore" is in code category 504. The coded data for P25 are in this format.
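The chain of lookups in this example can be written out step by step. The sketch below holds only the single entry discussed here; the table names `FIGURE_F1`, `FIGURE_F2`, `FIGURE_F3` and the function `translate` are hypothetical labels for illustration and do not describe the full indexes or the programs actually used.

```python
# One-entry stand-ins for the three indexes (illustrative only).
FIGURE_F1 = {("Trucker", "Boat company"): ("05", "X0")}  # alphabetical index
FIGURE_F2 = {("05", "X0"): "Stevedores"}                 # classified index
FIGURE_F3 = {"Stevedores": "504"}  # 1915 index; listed there as "Stevedore"


def translate(col18, col19):
    """Return (published-format occupation code, industry symbol), or None."""
    symbols = FIGURE_F1.get((col18, col19))
    if symbols is None:
        return None
    title = FIGURE_F2.get(symbols)
    if title is None:
        return None
    occ_code = FIGURE_F3.get(title)
    if occ_code is None:
        return None
    # The industry punch symbol is carried through unchanged.
    return occ_code, symbols[1]


print(translate("Trucker", "Boat company"))  # ('504', 'X0')
```

Returning `None` at any missing step mirrors the practical situation: a pair absent from any one index cannot be translated mechanically and would need separate review.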
The index reproduced in Figure F1 is the basic instrument that we (and the Bureau's coders in 1910) used in processing the 1910 PUS, and the "punch symbol" shown for industry there (in our example, X0) is the code shown as the 1910 industry code in the Appendix and appearing as P26 on the PUS tape [8]. It corresponds to the industry order in which data for the United States were published (U.S. Bureau of the Census 1914, Table VI).
P29 EMPSTAT has been coded directly from the schedule entry, without editing. Examination of the schedules suggests certain problems that users should be aware of. The use of "employee" in the heading of column 20 and the extensive reference to employees in the instructions to enumerators (and coders) appear to have led some enumerators to use E or Emp instead of W as the entry for such workers, with the consequence that the number of employers is probably overstated and the number of wage and salary workers understated. Other problems of interpretation can also be expected with this variable, since this was the first attempt at its measurement. Later analysis may provide better insight into the success with which class of worker was distinguished.
1. Michael A. Strong, et al., "Occupation, Industry and Class of Worker," User's Guide: Public Use Sample, 1910 United States Census of Population, Philadelphia: Population Studies Center, University of Pennsylvania, 1989, pp. 62-72.
2. Another instructional item in 1900 amended this to include entries of "at school" for persons attending school regardless of age.
3. For some early discussions see A. M. Edwards, "Comparative Occupation Statistics for the United States, 1870 to 1940," Sixteenth Census of the United States: 1940, Population, Washington: Government Printing Office, 1943, pp. 137-138; and A. J. Jaffe, "Trends in the Participation of Women in the Working Force," Monthly Labor Review, 1956, 79(5):559-565.
4. As far as we know, the only extant copies of these indexes are those in the Bureau of the Census Library. The Library kindly loaned these to us so that we might make xerox copies. Exact references are given on Figures F1 and F2.
5. Insofar as inmates were contract workers and reported as resident on the work site, the procedures do not assure that we obtained a complete (sample) count of the institutionalized population. Moreover, with regard to the 1910 classification, it is certainly possible that some inmates reported a pre-institutional activity and therefore are also misclassified.
6. See Edwards (1943, pp. 141-143) for discussion.
7. Even in this table certain combinations of residual categories were made.
8. The alpha designation appearing as a third "digit" for certain codes (for example, 00A) was added by the Bureau, after publication of the 1910 Indexes, for technical reasons.