Occupational Coding in the 1880 Public Use Microdata Sample
Francis Walker, Superintendent of the Censuses of 1870 and 1880, considered the question of employment "one of the most important questions of the schedule," and cautioned enumerators to "make a study of it." Enumerators were instructed to report "the occupation of each person ten years of age and upward," and Walker included a series of directions to ensure that enumerators recorded occupations reliably and consistently. Occupations were to be reported in specific rather than vague or general terms. "Call no one a 'factory hand' or 'mill operative,'" he instructed. "State the kind of mill or factory." Women's occupations were also to be reported, but women doing domestic labor for themselves or their families without receiving a wage were to be classified as "keeping house." Children "too young to take part in production" or at school were not considered gainfully employed, but children who earned money "regularly by labor, contributing to the family support, or appreciably assisting in mechanical or agricultural industry" were to be reported with an occupation.
The modern labor force concept, defined by work within a specific reference week, was not implemented until 1940. In 1880, the looser idea of "gainful employment" was the rule. The nominal minimum age for reporting an occupation was ten years, but exceptions were allowed. The occupation question was to be answered for every person, whether employed or not. This produced a variety of non-occupational responses, which the Census Office did not distinguish in its tabulations.
The instructions provided by the Census Office did not guarantee consistent reporting in every case. Recorded occupations were often vague or inconsistent, sometimes listing an industry without an occupation (e.g., "Cotton mill") or an occupation without an industry (e.g., "Molder"). Other information, such as health or relationship status, was sometimes entered in the occupation field as well. Because of these irregularities, classifying and coding occupations proved challenging.
Our method of coding occupations consisted of three main stages: data entry, sorting, and coding. In the data entry stage, data entry operators recorded each occupation exactly as it was reported, including enumerator spelling errors or abbreviations. These titles were then copied into a separate file and sorted alphabetically. Finally, each title was assigned a numeric code based on the Census Office's 1880 detailed occupational coding scheme. In addition, each response was assigned a 1950 occupational code and a detailed occupation code distinguishing individual job titles within the 1950 categories.
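A minimal Python sketch of the sorting and coding stages follows. It assumes each keyed record stores the verbatim title under a hypothetical key `occ_string` and that the finished dictionary is a simple mapping from verbatim title to 1880 code; the function names are illustrative, and the codes (204 for mill and factory operatives, 266 for employed, occupation unspecified) are borrowed from the category notes below.

```python
def build_title_file(records):
    """Stage 2: collect each distinct verbatim title and sort alphabetically.

    Titles are kept exactly as keyed in stage 1, so "Mill hand" and
    "mill hand" would remain separate entries.
    """
    return sorted({r["occ_string"] for r in records})


def code_records(records, dictionary):
    """Stage 3: attach the code assigned to each verbatim title.

    Titles missing from the dictionary get None so they can be flagged
    for review rather than silently dropped.
    """
    return [{**r, "occ1880": dictionary.get(r["occ_string"])} for r in records]


# Stage 1 output: titles recorded exactly as the enumerator wrote them.
records = [
    {"id": 1, "occ_string": "Mill hand"},
    {"id": 2, "occ_string": "Supervisor"},
    {"id": 3, "occ_string": "Steam mill"},
    {"id": 4, "occ_string": "Mill hand"},
]

titles = build_title_file(records)
# Coders assign one code per distinct title (illustrative assignments).
dictionary = {"Mill hand": "204", "Steam mill": "204", "Supervisor": "266"}
coded = code_records(records, dictionary)
print(titles)                         # ['Mill hand', 'Steam mill', 'Supervisor']
print([r["occ1880"] for r in coded])  # ['204', '266', '204', '204']
```

Because each distinct spelling is coded once in the dictionary and then joined back to every record, a coding decision applies uniformly to all persons who gave that exact response.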
1880 Occupational Classification
Coding an occupation according to the standards the 1880 Census Office used in tabulating returns was often difficult. No detailed coding instructions survive, and many of the enumerated responses are vague or incomplete. The only evidence we had of how the Census Office grouped the detailed occupational responses into its 1880 classification scheme of 265 categories was an 1880 Occupational Index provided by Margo Anderson. Although it proved helpful in some cases, the listing was incomplete and was compiled at an early stage in the tabulation process, before many categories were dropped or combined. For these reasons we designed a standard procedure to ensure consistent coding. Five general rules covered many of the difficult coding problems:
1). In cases where more than one occupation was listed, we coded according to the first occupation. However, when the first occupation was a non-occupational response (e.g., "keeping house") and the second gave an actual occupation, we coded according to the second occupation.
2). When the response listed both occupation and industry, we gave preference to the industry over the occupation if that industry was explicitly named in the 1880 classification. The rationale for this procedure is the "industrial" classification system used by the 1880 Census Office, which placed greater importance on locating persons within sectors of the economy than on describing their specific tasks. Thus, for example, the response "Blacksmith on Railroad" was coded as "Employee on railroad" rather than as "Blacksmith."
3). If the occupation response gave only a place of employment or an industry within the manufacturing sector (e.g., "Iron mills"), we coded the individual as an employee of that industry. If the response referred to a "Shop," it was coded among the manufacturing occupations; if it referred to a "Store," it was coded within trade and transportation. If the response contained only a type of store without further qualification (e.g., "Dry goods store" or "Grocery"), we coded the person as a trader and dealer in that line of trade.
4). If the status of the worker was clear but no further information was provided, we checked the manuscript reel for additional information. The response "Chores," for example, was coded as "Agricultural laborer" when the individual lived in a rural area, but it was not treated as an occupation if other household and locality information suggested that the individual simply performed household chores. We resorted to this procedure in relatively few cases.
5). The final step in the coding procedure was to compare the results of our classification with the published tabulations of the 1880 Census Office. We examined instances where large discrepancies existed between the Census Office returns and our results. The comparison sometimes suggested that certain responses were not coded by the Census Office into a particular category. This procedure was particularly helpful in dealing with some of the following problematic occupations.
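Rules 1 through 3 are mechanical enough to sketch in code. The following Python fragment is an illustration only: the separator convention (commas or semicolons), the small set of non-occupational responses, and the one-entry industry table are hypothetical stand-ins for the much larger, judgment-based lists the coders actually worked from. Rules 4 and 5 depend on the manuscript reel and the published tables and are not modeled.

```python
import re

# Hypothetical subset of non-occupational responses (rule 1).
NON_OCCUPATIONAL = {"keeping house", "at home", "at school"}

# Hypothetical fragment: industry keywords named explicitly in the 1880
# scheme, mapped to the industry category that takes precedence (rule 2).
INDUSTRY_CATEGORIES = {"railroad": "Employee on railroad"}


def first_occupation(response: str) -> str:
    """Rule 1: use the first listed occupation, skipping leading
    non-occupational entries such as "keeping house".

    Assumes multiple occupations are separated by commas or semicolons.
    """
    parts = [p.strip() for p in re.split(r"[;,]", response) if p.strip()]
    for part in parts:
        if part.lower() not in NON_OCCUPATIONAL:
            return part
    # Every entry is non-occupational: keep the first one as recorded.
    return parts[0] if parts else ""


def industry_override(title: str):
    """Rule 2: prefer an explicitly named 1880 industry over the occupation."""
    lower = title.lower()
    for keyword, category in INDUSTRY_CATEGORIES.items():
        if re.search(rf"\b{re.escape(keyword)}\b", lower):
            return category
    return None


def workplace_sector(title: str):
    """Rule 3: a bare "shop" points to manufacturing, a "store" to trade.

    Word boundaries keep titles such as "Storekeeper" from matching.
    """
    lower = title.lower()
    if re.search(r"\bshop\b", lower):
        return "Manufacturing"
    if re.search(r"\bstore\b", lower):
        return "Trade and transportation"
    return None


print(first_occupation("Keeping house, Dressmaker"))  # Dressmaker
print(industry_override("Blacksmith on Railroad"))    # Employee on railroad
print(workplace_sector("Works in lamp shop"))         # Manufacturing
```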
001-Agricultural Laborers. Our initial figures for persons working in agriculture were much lower than those of the 1880 Census Office. The difference lay in our much smaller number of agricultural laborers. Conversely, our number of laborers was much higher than that reported in the 1880 tabulations. The Census Office recognized the confusion of agricultural and common labor as a perennial problem of the census. There is no evidence that the 1880 figures were adjusted after tabulation, but the Census Office apparently inferred agricultural laborer status from the characteristics of the household or locality. We recoded and flagged cases from the "Laborers (not specified)" category to "Agricultural Laborers" when a laborer lived in a household headed by a farmer. After this adjustment, the PUMS figures for agricultural labor approximated the published numbers.
023-065-Clerks. After initial coding, we had far fewer clerks coded as "Clerks in stores" than the 1880 Census Office reported, and many more coded as "Clerks and copyists (not otherwise described)." In fact, more persons were unambiguously enumerated simply as "Clerks" (or some spelling variation thereof) than the published returns placed in the category "Clerks and copyists (not otherwise described)." It appears that the 1880 tabulators judged a large portion of clerks to have been working in stores on the basis of other personal or household characteristics. We have no way of knowing on what basis this classification was made. The issue is an important one because it determines the economic sector in which these persons are classified: Trade and Transportation versus Professional and Personal Services. We chose to code all persons returned simply as "Clerks" as "Clerks in stores," since in the published data that category is 14 times larger than "Clerks and copyists (not otherwise described)." Our final figures for the two categories are close to the published 1880 figures, but we cannot be sure that the same persons were classified into the same categories in the two schemes. The residual category "Clerks and copyists (not otherwise described)" now contains persons returned as clerks who worked in a specified setting not described in the other clerk categories.
046-Nurses. A response of "Nurse" without qualification was ambiguous as to whether the person was a medical nurse or a domestic servant. In some cases extra information in the occupation field (e.g., "Child's Nurse") provided what was needed for proper classification. But over 300 persons responded only "nurse," when the published 1880 returns suggested we should encounter fewer than half that number. We coded as domestic servants all nurses whose relationship to the household head was "Resident employee." A resident medical nurse would therefore be coded as a domestic servant. We note the recoded cases with a data quality flag.
029-Housekeepers. The most significant coding issue we faced concerned housekeepers. The 1880 enumerator instructions stated that
The term "housekeeper" will be reserved for such persons as receive distinct wages or salary for the service. Women keeping house for their own families or for themselves, without any other gainful occupation, will be entered as "keeping house." Grown daughters assisting them will be reported without occupation.
It is clear, however, that the Census Office did not believe the enumerators had adhered to these directions. When we interpreted the occupation responses according to these rules, we had 80 percent more persons coded as domestic servants than the 1880 tabulations reported, and this was one of the largest occupational categories to begin with. Coded in this way, moreover, the level of married women's employment in 1880 would be much higher than other sources imply, suggesting a large temporary spike in the historical trend of married women's work. We therefore recoded to a non-occupational response category those women who responded "housekeeper" and were related to the head of the household. We gave the recoded housekeepers a distinct non-occupational response (in addition to a data quality flag) so that researchers could readily identify them.
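The three household-based recodes (laborers, nurses, and housekeepers) can be expressed as a single pass over person records after the occupation field has been coded in isolation. In this Python sketch the category labels, the `Person` fields, and the set of relationships counted as "related to the head" are hypothetical; the actual PUMS stores numeric codes and its own flag variables.

```python
from dataclasses import dataclass, field

# Hypothetical relationship labels counted as "related to the head".
RELATIVES = {"Wife", "Daughter", "Mother", "Sister", "Niece", "Mother-in-law",
             "Sister-in-law", "Granddaughter", "Other relative"}


@dataclass
class Person:
    occ1880: str        # category assigned from the occupation field alone
    relationship: str   # relationship to the household head
    head_occ1880: str   # category assigned to the household head
    sex: str
    flags: set = field(default_factory=set)


def apply_logical_recodes(p: Person) -> Person:
    """Apply the household-based recodes, flagging every changed case
    so that researchers can identify it."""
    if p.occ1880 == "Laborers (not specified)" and p.head_occ1880 == "Farmers and planters":
        p.occ1880 = "Agricultural laborers"
        p.flags.add("laborer_recode")
    elif p.occ1880 == "Nurses" and p.relationship == "Resident employee":
        p.occ1880 = "Domestic servants"
        p.flags.add("nurse_recode")
    elif p.occ1880 == "Housekeepers" and p.sex == "F" and p.relationship in RELATIVES:
        # A distinct non-occupational category, kept separate from "keeping house".
        p.occ1880 = "Housekeeper, relative of head (non-occupational)"
        p.flags.add("housekeeper_recode")
    return p


son = apply_logical_recodes(Person("Laborers (not specified)", "Son", "Farmers and planters", "M"))
print(son.occ1880, son.flags)  # Agricultural laborers {'laborer_recode'}
```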
We coded more persons within the "other" groupings than did the 1880 tabulators. We had no guidance as to whom the 1880 Census Office put in these categories. Some of the distinct groups and general rules of classification we followed are:
058-Others in Professional and Personal Services. This grouping contains 74 prostitutes and 12 gamblers, among other titles.
089-Porters and Laborers in Stores and Warehouses. Includes the numerically significant group of stevedores and longshoremen. Anyone reported as "Works in [some type of store]" was also classified here.
72-Employees in Manufacturing Establishments (not specified). Persons reporting a manufacturing occupation that suggested employee status but did not include reference to a mill or factory (e.g., "Works in lamp shop," and "Pressman").
204-Mill and Factory Operative (not specified). Persons whose title suggested employee or operative status while also mentioning a mill or factory workplace. Some of the titles include "Mill hand," "In pencil factory," and "Steam mill."
210-Officials of Manufacturing and Mining Companies. Includes the following terms in the title in combination with some reference to manufacturing: keeps, owner, proprietor, manager, running, superintendent, president, treasurer.
265-Others in Manufacturing, Mechanical, and Mining Industries. Titles that suggested manufacturing occupations but that gave no intimation of the status of the person. Included here are many persons described simply as "makers" of certain items not specified among the other occupational categories.
266-Employed, Occupation Unspecified. A category we created for the sample. Persons coded here gave a response that clearly indicated they were employed (e.g., supervisor), but there was no way to determine even in which economic sector to place the person. This is an occupational response.
We differentiated among the non-occupational responses we encountered in the data and coded them into a number of categories above the numeric range of the occupational responses (301 to 310). We grouped the responses in the way we felt would be most useful to researchers.
1950 Occupational Classification
We coded occupations into the 1950 Census Bureau occupational classification in addition to the 1880 scheme. The 1950 classification was carried out in a similar manner to the 1880 coding (rules 1, 3, and 4 above). In coding into the 1950 system we did not favor industry as we did for 1880. The procedure for 1950 also differed because we had no published Census Office statistics against which to compare our final figures. The 1950 classification was greatly simplified by a published Census Bureau Index of Occupations and Industries that the Bureau used for its own 1950 tabulations. The vast majority of 1880 occupations were contained in this index, which supplied the appropriate 1950 code for particular job titles, sometimes providing different codes for the same occupational title where the industry differed. The status of certain occupations may have changed since 1880 with respect to the occupational grouping in which they belong (e.g., "Craftsmen" or "Operative"), but we adhered strictly to the letter and logic of the 1950 Index. We leave it to the individual researcher to resolve such issues.
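Coding against the 1950 Index amounts to a lookup in which an industry-specific entry, when present, takes precedence over the general entry for the same title. The excerpt below is hypothetical and uses category labels rather than codes; the retail-agent entry mirrors the rule given for code 300, and the override table holds the one deliberate departure from the Index, the treatment of teamsters. Titles with no entry return `None`, standing in for manual assignment to a residual category.

```python
from typing import Optional

# Hypothetical excerpt of the 1950 Index, keyed by (title, industry);
# industry None is the entry used when the industry does not matter.
INDEX_1950 = {
    ("agent", None): "Agents (n.e.c.)",
    ("agent", "retail"): "Salesmen and sales clerks (n.e.c.)",
    ("painter", None): "Painters, construction and maintenance",
    ("coachman", None): "Private household workers (n.e.c.)",
    ("teamster", None): "Teamsters",
}

# The single deliberate departure from the Index (teamsters, code 683).
OVERRIDES = {"teamster": "Truck and tractor drivers"}


def code_1950(title: str, industry: Optional[str] = None) -> Optional[str]:
    key = title.strip().lower()
    if key in OVERRIDES:
        return OVERRIDES[key]
    # Prefer an industry-specific entry, then fall back to the general one.
    code = INDEX_1950.get((key, industry))
    if code is None:
        code = INDEX_1950.get((key, None))
    return code  # None: left for manual assignment to a residual category


print(code_1950("Agent", "retail"))  # Salesmen and sales clerks (n.e.c.)
print(code_1950("Teamster"))         # Truck and tractor drivers
```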
Some occupations proved difficult to code because of ambiguity, lack of the necessary industry information, or because the particular occupation disappeared, or fell out of usage, between 1880 and 1950. If no appropriate category suggested itself, we classified the occupation within one of the residual categories such as "Operatives and kindred workers (n.e.c.)." The following occupations proved problematic or contain subgroups that bear pointing out:
300-Agents (not elsewhere classified). If the title suggested the person was an agent in retail, as opposed to wholesale or manufacturing, then the person was coded in "Salesmen and sales clerks (not elsewhere classified)."
564-Painters, Construction and Maintenance. There are two categories of painters in 1950, the other being "Painters, except construction or maintenance." We used the construction category as the default code. Persons listed as "Painter" or "House painter" were coded in construction painting.
594-Craftsmen and Kindred Workers (not elsewhere classified). This includes persons returned as coopers, brewers and wagonwrights, among others.
625-Bus Drivers. Includes bus, coach and stage drivers. A person returned as a "Coachman" was coded in "Private household workers (not elsewhere classified)."
682-Taxicab Drivers and Chauffeurs. Includes carriage and hack drivers.
683-Truck and Tractor Drivers. Includes cartmen, expressmen, and persons listed only as "driver." We also classified "teamsters" here rather than coding them in the 1950 category "Teamsters." This was the only point where we consciously broke from the 1950 occupational index. Our rationale was that teamsters in 1950 were an insignificant and marginal occupation classified in the larger grouping "Laborers, except farm and mine," whereas teamsters in 1880 were a mainstream occupation performing the function of 1950 truck drivers. For other occupations, we did not let mechanization or changes in method or setting alter their classification.
690-Operatives and Kindred Workers (not elsewhere classified). A large residual category containing harnessmakers, tanners, wagon makers, cigar makers, and persons reported as "working" in a mill or ship.
700-Housekeepers, Private Household. This occupation was subject to the same logical change as the 1880 domestic servant category, whereby women related to the head of household were coded into a non-occupational category.
710-Laundresses, Private Household. Includes persons returned variously as "Washer" or "Washerwoman" or some variant thereof. Persons with a response of "Laundress" without a qualifier suggesting a private household were coded as "Laundry and dry cleaning operatives."
731-Attendants, Professional, and Personal Services (not elsewhere classified). Includes prostitutes.
781-Practical Nurses. This occupation was subject to the same logical change as the 1880 nurse category, coding persons related to the head of the household as "Private household workers (n.e.c.)." Only if the title specified that this was a professional nurse was the person coded in "Nurses, professional" (code 058).
970-Laborers (not elsewhere classified). Subject to the same logical recoding as the 1880 laborer category whereby persons in households headed by a farmer were recoded as "Farm laborers, wage workers." The laborer category contains persons identified as "Hostler."
975-Employed, Occupation Unspecified. A category we created for the PUMS. Persons coded here gave a response that clearly indicated they were employed (e.g., "supervisor"), but there was no way to determine even the economic sector in which to place the person. This is an occupational response.
Non-occupational responses were grouped into categories and given codes above the range of legitimate 1950 occupational responses (981 to 990).
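Because non-occupational responses sit above the occupational ranges in both schemes, users can separate them with simple range tests. The Python helper below is a sketch: it treats every number from 001 to 266 (1880) or 000 to 975 (1950) as occupational, which is a simplification, since not every number in those ranges is an assigned code, and the lower bound of 000 for the 1950 scheme is an assumption.

```python
def classify_code(code: int, scheme: str) -> str:
    """Return 'occupational', 'non-occupational', or 'invalid' for a code
    in the 1880 or 1950 scheme, using the numeric ranges described above."""
    if scheme == "1880":
        if 1 <= code <= 266:          # 001-265 plus the added 266
            return "occupational"
        if 301 <= code <= 310:
            return "non-occupational"
    elif scheme == "1950":
        if 0 <= code <= 975:          # lower bound assumed; 975 is the added category
            return "occupational"
        if 981 <= code <= 990:
            return "non-occupational"
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return "invalid"


print(classify_code(266, "1880"))  # occupational
print(classify_code(985, "1950"))  # non-occupational
```

A researcher estimating the share of persons with a reported occupation could, for instance, keep only records for which this helper returns "occupational".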
Detailed Occupational Classification
Some historians may wish to focus on particular occupations at a finer level of detail than that offered by either the 1880 or 1950 Census Bureau classifications. For example, researchers particularly interested in mining may wish to compare gold miners and coal miners. The Census Bureau occupational classification systems do not provide such detail, instead coding thousands of occupational titles into two or three hundred categories. A number of distinctive occupations like "prostitute" were grouped with other titles (in 1950, "Attendants, personal and professional service") and cannot be separated out again. However, giving researchers a complete listing of all occupations as originally recorded from the schedules would provide an unmanageable level of detail preserving meaningless distinctions such as those between "gold minors" and "ogld miners" or between "c. miners" and "col miners." To accommodate more exacting research needs while eliminating mere spelling variations, we created a supplementary detailed occupational coding scheme based on the 1950 system. The detailed occupation codes are basically addenda to the 1950 classification, extending the occupation codes from three to seven digits if read as a single field. The first three digits provide the 1950 occupation code and the last four distinguish specific job titles while removing spelling variations.
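The seven-digit layout can be composed and decomposed as shown in this short Python sketch (the function names are hypothetical). Keeping the code as a zero-padded string preserves the leading zeros of low 1950 codes; if the field is read as an integer, dividing by 10,000 recovers the same two parts.

```python
from typing import Tuple


def make_detailed(occ1950: int, suffix: int) -> str:
    """Join a three-digit 1950 code and a four-digit title suffix."""
    if not (0 <= occ1950 <= 999 and 0 <= suffix <= 9999):
        raise ValueError("code out of range")
    return f"{occ1950:03d}{suffix:04d}"


def split_detailed(detailed: str) -> Tuple[int, int]:
    """Recover the 1950 code and title suffix from a seven-digit string."""
    if len(detailed) != 7 or not detailed.isdigit():
        raise ValueError(f"not a seven-digit code: {detailed!r}")
    return int(detailed[:3]), int(detailed[3:])


def split_detailed_int(n: int) -> Tuple[int, int]:
    """Same split when the field was read as an integer (leading zeros lost)."""
    return divmod(n, 10000)


print(make_detailed(564, 12))      # 5640012
print(split_detailed("0580003"))   # (58, 3)
print(split_detailed_int(580003))  # (58, 3)
```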
To generate the detailed codes, we began with the completed data dictionary listing every occupational response as originally recorded, in alphabetical order, along with the occupation codes that we assigned. We sorted the file by our 1950 codes. A unique number was assigned to each valid variation of an occupation within a 1950 category, collapsing distinctions that were unambiguously spelling variations or abbreviations. Thus "railroad contractor" and "rr contractor" were given the same detailed code, while "railroad man" and "railroad porter" were given different codes. Many occupational responses received different detailed codes even though they seemed logically similar, for example "railroad man" and "railroad worker." Our goal was to provide researchers with as much detail as possible in case such differences turned out to be significant. Differences in grammatical form were likewise preserved; for example, "bookkeeper" and "bookkeeping" were given separate detailed codes. Some of these distinctions may be ephemeral or useless, but we wished to err on the side of caution.
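The numbering step can be sketched in Python under stated assumptions: the variant table is a hypothetical stand-in for the coders' case-by-case judgments, differences of capitalization are treated as spelling variants, and suffixes are assigned in alphabetical order within each 1950 code. The example titles come from the notes on codes 564 and 683; the misspellings are invented for illustration. Titles absent from the variant table, such as "bookkeeper" and "bookkeeping," keep separate codes.

```python
from collections import defaultdict

# Hypothetical hand-built table of pure spelling variants; in practice
# each collapse was a coder's judgment, not an automatic rule.
SPELLING_VARIANTS = {"paintr": "painter", "teemster": "teamster"}


def assign_detailed_codes(entries):
    """entries: (verbatim_title, occ1950) pairs from the data dictionary.
    Returns a dict mapping each verbatim title to a seven-digit code."""
    def canonical(title):
        t = title.strip().lower()  # capitalization treated as a variant
        return SPELLING_VARIANTS.get(t, t)

    # Group distinct canonical titles under their 1950 code.
    by_code = defaultdict(set)
    for title, occ in entries:
        by_code[occ].add(canonical(title))

    # Number titles 1, 2, ... within each 1950 code, in sorted order.
    suffix = {}
    for occ in sorted(by_code):
        for i, canon in enumerate(sorted(by_code[occ]), start=1):
            suffix[(occ, canon)] = i

    return {title: f"{occ:03d}{suffix[(occ, canonical(title))]:04d}"
            for title, occ in entries}


entries = [("Painter", 564), ("Paintr", 564), ("House painter", 564),
           ("Teamster", 683), ("Teemster", 683), ("Driver", 683)]
codes = assign_detailed_codes(entries)
print(codes["Painter"], codes["Paintr"], codes["House painter"])  # 5640002 5640002 5640001
```

Numbering in sorted order makes the result reproducible from the dictionary alone, but adding a title later would shift the suffixes of titles that sort after it, so a scheme meant to be extended would likely freeze suffixes once issued.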
One of the key points to note about our occupational coding procedure is our reliance (with few exceptions) on the occupation field considered in isolation. Only in cases of severe disjuncture between our figures and those of the Census Office (for the 1880 classification) did we apply a computerized logical recode that relied on other characteristics of the individual, household, or locality. We used this procedure only to reassign codes for a portion of housekeepers, laborers, and nurses. Our experience suggests that the Census Office regularly used other personal and household characteristics to make some coding decisions. Comparing the published 1880 figures with our own for women and children in certain occupations also shows that the Census Office edited "unlikely" or "impossible" responses, as some historians have argued. The Census Office method has advantages in terms of error control. It does, however, build a certain correlation between characteristics into the coding scheme itself. Our classification method leaves the occupation variable independent. It does not superimpose our own notions of social reality, but treats such issues as open questions subject to empirical investigation.
- Steven Ruggles and Russell R. Menard. "Occupational Coding," Public Use Microdata Sample of the 1880 United States Census Population: User's Guide and Technical Documentation. Minneapolis: Social History Research Laboratory, 1994, pp. 24-29.