The Classification of Work: Applying 1950 Census Occupation and Industry Codes to 1920 Responses [1]

by Chad Ronnander

The Historical Census Projects created the IPUMS variables OCC1950 (Occupation, 1950 basis) and IND1950 (Industry, 1950 basis) to facilitate analysis of the United States' occupational structure across the entire 150-year period covered by the data series. Construction of these variables involved applying the occupation and industry codes used in the 1950 census to occupation and industry responses for other census years (Sobek and Dillon 1995; Ruggles and Sobek 1998). This article describes how we tailored OCC1950 and IND1950 codes to the 1920 data and how we handled the most difficult cases.[2] More technical discussion can be found in the Procedural History of the 1920 Public Use Microdata Sample (Goeken and Ruggles 1999) and the IPUMS documentation (Ruggles and Sobek 1998a, 1998b, 1998c), available on the IPUMS website.

The essential task for the Census Projects' coding staff was to integrate 1920 responses for occupation and industry with those of other census years already captured in OCC1950 and IND1950. The 1920 procedure is significantly different from that used in the 1850 and 1880 samples produced by the Historical Census Projects because more information is available in this census than in the earlier years. Prior to 1910, the census contained no separate industry question. Although industry or place of work can usually be inferred from the occupation response in those years, there are a significant number of cases for which it is not clear. This limits coding precision for the earlier censuses.

For each 1920 case, there are four complementary pieces of information that can be used to code occupation and industry in the 1950 classification system. The two most important are the individual's responses for occupation (activity performed at work) and industry (place of work), both of which directly correspond to their 1950 counterparts. Class of worker, which differentiates among employers, employees and people working on their own account, was also used, although the definition used in 1920 differs slightly from that used in 1950. By examining these three items in combination, we could code the great majority of responses according to the instructions found in the Alphabetical Index of Occupations and Industries: 1950 - the Census Bureau's handbook and coding dictionary for 1950 census clerks (U.S. Bureau of the Census, 1951). The fourth piece of occupational information is the individual's 1920 occupation code, which was written in the right-hand margin of the census form by the clerks who tabulated that census. These 1920 codes allowed us to interpret many responses that would otherwise have been unclear under the 1950 rules.

Once a given combination of occupation/industry/class of worker/1920 occupation code became part of the 1920 variable dictionary, new cases with that exact combination of responses were automatically assigned the same OCC1950 and IND1950 codes. The process of constructing a variable dictionary is explained elsewhere in this issue (see Goeken, Bryer and Lucas). Thus, the most common responses, representing the majority of cases, required only a small portion of research assistant time. For example, the following string represents thousands of individual cases but only had to be coded once:

Farmer General Farming O 000
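
The lookup just described can be sketched in a few lines of Python. This is only an illustration of the idea, not the software the Historical Census Projects used: the class and function names are hypothetical, and the OCC1950 and IND1950 values attached to the farmer string are placeholders assumed for the example.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# A response combination as keyed from the census form:
# (occupation, industry, class of worker, 1920 occupation code).
Combination = Tuple[str, str, str, str]
# The codes chosen by a research assistant: (OCC1950, IND1950).
Codes = Tuple[str, str]


class VariableDictionary:
    """Hypothetical stand-in for the 1920 variable dictionary."""

    def __init__(self) -> None:
        self._entries: Dict[Combination, Codes] = {}

    def add(self, combination: Combination, codes: Codes) -> None:
        # A combination is hand-coded once; later cases reuse the decision.
        self._entries[combination] = codes

    def lookup(self, combination: Combination) -> Optional[Codes]:
        # Only an exact match on all four items counts.
        return self._entries.get(combination)


def split_cases(cases: List[Combination], dictionary: VariableDictionary):
    """Separate automatically coded cases from combinations awaiting hand coding."""
    coded = []
    pending: Counter = Counter()  # unseen combination -> number of cases
    for case in cases:
        codes = dictionary.lookup(case)
        if codes is None:
            pending[case] += 1
        else:
            coded.append((case, codes))
    return coded, pending


farmer = ("Farmer", "General Farming", "O", "000")
dictionary = VariableDictionary()
# "100" and "105" are placeholder values assumed for this example.
dictionary.add(farmer, ("100", "105"))
```

Calling `split_cases` on a batch of records returns the cases that matched an existing entry together with a tally of the combinations no one has coded yet, which is where the research assistants' time went.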

By contrast, the least common responses, representing far fewer cases, accounted for the majority of the coding time. The entire 1920 IPUMS sample, which contains just over a million person records, generated some 100,000 different combinations, the majority of them occurring only once in the data. Nearly all of the most frequently occurring responses - and most of the less common ones - could be adequately coded using the basic 1950 procedures. Many could not. As a general rule, the odder the response, the more likely it was that the alternative coding methods described below would be used.

Clarifying Ambiguities in 1920 Occupation Responses

From the time the Census Bureau first asked workers to state their occupation in 1850, eliciting sufficiently detailed responses posed something of a problem. Although enumerators were repeatedly instructed to collect as much detail as reasonably possible, some continued to return vague or ambiguous responses. While much progress had been made by 1920, the 1950 census enumerator instructions identified more than twenty potentially ambiguous responses that the 1920 instructions did not discuss (United States Bureau of the Census 1919, 32-37; 1949?, 39-43, 45-46).[3] For example, in 1950 it was not adequate to enter simply "nurse" or "trucker" - enumerators were to indicate whether the nurse was professionally trained or not and they were to distinguish hand-truckers from those who drove motor trucks. The 1950 instructions defined such terms as "contractor," "housekeeper" and "machinist" whereas 1920 instructions did not. While the elaboration of instructions in 1950 undoubtedly reflects changes in the nature of work, it also suggests an attempt to remedy the coding problems encountered with vague responses from previous censuses.

For most occupation categories, differences in instructions only minimally affect comparability between the 1920 and 1950 versions of OCC1950. Enumerators in 1920 often voluntarily asked for - and obtained - the detail requested by the 1950 instructions. Even if they did not, many of the less precise 1920 responses can still be coded with a great deal of certainty when they are considered in tandem with the industry responses as the 1950 coding rules require. It is also likely that 1950 enumerators sometimes failed to obtain the detail suggested by their instructions. Since the same coding guidelines were used in 1920 as in 1950, it is probable that ambiguous responses in both years received comparable codes. Unfortunately, confidentiality rules prevent access to the 1950 census returns so we cannot directly compare 1950 responses with their codes.

Several frequently-occurring ambiguous occupations could be coded more precisely by using 1920 occupation codes. Since the original coding clerks were contemporaries of the persons enumerated and had access to all census forms for a particular geographical area, they were often in the best position to interpret the responses. Among the most frequently occurring occupation responses coded using 1920 census clerk decisions were "agent," "clerk," "farm laborer," "laborer," and "secretary" - all of which could be coded in different ways depending upon industry, region or other contextual information.[4]

Clarifying Ambiguities in 1920 Industry Responses

Industry posed more coding problems than did occupation. Industry responses are less precise than those for occupation in 1920 and the Alphabetical Index is less clear about how to code ambiguous industry responses. The 1920 enumerator instructions gave only limited attention to defining industry. Enumerators were to write down "the name of the industry, or the business, or the place in which this person works, as cotton mill, general farm, dry-goods store, insurance office, bank, etc.," and were to avoid "such indefinite terms as 'mill,' 'farm,' 'store,' 'Jones and Company,' etc." (United States Bureau of the Census 1919, 32-36). Merchants were to be identified both by product and as either retailers or wholesalers. Beyond this, the 1920 instructions are essentially silent. As a result, many industries are recorded in vague terms such as "company," "shop" or "business." Although the product is usually given in these cases, it is often difficult to tell whether that product is being made, bought, sold, repaired, or handled in some other way. "Shoe company" could denote manufacturing, wholesale or retail trade, or even repair service. The term "oil company" could include crude oil extraction, either of two types of manufacturing, pipeline distribution, wholesale trade, or either of two types of retail trade. It is telling that when it came time to compile the 1920 census, industry was not tabulated separately despite the key role it plays in the 1920 occupation coding scheme.

By 1950, the Census Bureau had refined the industry question considerably. Enumerators were instructed to distinguish carefully between mines, factories, wholesale and retail trade, and various levels of government employment. Besides these general instructions, the Bureau issued specific instructions aimed at potentially troublesome responses, just as they had for Occupation (United States Bureau of the Census 1950). Furthermore, the 1950 census coding clerks were assisted by research specialists who collected information on industrial activities in particular geographic areas (U.S. Bureau of the Census 1955, 60). The net result was greater precision than had been the case thirty years previously.

Since this greater level of precision is assumed in the Alphabetical Index, it offers little help in coding ambiguous responses. We devised several procedures for assigning codes to ambiguous industry responses. For most cases, the 1920 coding scheme for occupation also indicates the individual's employment sector (agricultural, mining, construction, manufacturing, transportation, commercial, governmental, or service sector). Although the 1920 coding clerks had to rely on their own judgment when a response was not clear, we assumed that - as with occupation - they were likely to make correct decisions in coding vague industry responses. For example, we relied on their judgment in deciding whether the industry response "electric company" referred to a manufacturer or a public utility. Checks of the original microfilm reels confirmed that 1920 clerks generally interpreted industry responses sensibly, so we used their codes to assist us in assigning IND1950 codes.

The 1920 codes were not always helpful in coding IND1950. Some 1920 codes - most importantly those for professional and many clerical workers - yield no information about industry. In addition, comparison of the 1920 codes with the 1950 Alphabetical Index indicates that the conceptions of industry in the two years do not match perfectly. For such cases, Historical Census Projects staff devised coding rules designed to ensure general consistency where absolute certainty was not possible. In most cases, for example, professionals and clerks who reported their industry as a "company" (e.g., "hardware company," "drug company") were assumed to work in the manufacturing sector unless the 1920 code clearly indicated otherwise.

Note that this method was not necessary if the 1920 industry response was blank or unintelligible. For these cases, the Alphabetical Index generally provides adequate solutions by suggesting default Industry codes. The only two major occupation responses for which there are no adequate default IND1950 codes to use when Industry was not reported are "stationary engineer" and "stationary fireman" - either of which could be found working in almost any industry. Nor do the 1920 codes really help to assign occupations to these responses. When their industry is blank or unintelligible, we assigned the temporary IND1950 code "996" (unclassifiable). These cases will later be allocated or assigned a default industry value.

Taken in combination, these procedures allowed Historical Census Projects staff to assign generally accurate IND1950 codes to the 1920 data and to assign consistent IND1950 codes when perfect accuracy could not be assured. Unfortunately, given the tremendous variety of responses and occupation-industry combinations found in the data, it is not feasible to specify which rule was applied to every single case. Users will be able to check our logic (and alter it to suit their needs) using a detailed version of OCC1950, entitled DETOCC (Detailed Occupation, 1950 Basis). Table 1 lists some of the more frequently occurring ambiguous industries and indicates which IND1950 codes they are currently assigned.[5]

Responses Not Included in The Alphabetical Index

Despite its near-comprehensive character, the Alphabetical Index does not contain codes for all possible responses to the occupation and industry questions. When we could not find a given response in the Alphabetical Index, we checked the U.S. Employment Service's 1939 Dictionary of Occupational Titles, which offers more than 1,000 pages of descriptions and synonyms for occupational titles and for some industrial terminology. If the term was found, we read the job description and attempted to find a sensible match with an entry contained in the Alphabetical Index. We also checked the 1920 occupation code for the contemporary interpretation of the response. Occasionally we used a standard dictionary. When none of these procedures worked, the 1920 code was used alone to define the worker's skill level and one of the residual OCC1950 codes was entered (such as "690" - operatives and kindred workers, not elsewhere classified, or "970" - laborers, not elsewhere classified). Industries were coded similarly. If the 1920 code was unknown or did not offer a clear solution, the occupation and/or industry in question received the code "996" (unclassifiable response). When hand coding is complete, all unclassifiable occupation and industry responses will receive codes from comparable donor records in the allocation procedure.
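
The order of these fallbacks can be summarized as a short Python sketch. It is a simplification made under stated assumptions: the function name, the two lookup tables passed in, and the skill-level labels are hypothetical, and the judgment a coder exercised when reading a Dictionary of Occupational Titles description is reduced here to a precomputed mapping. Only the codes "690," "970" and "996" come from the text.

```python
from typing import Mapping, Optional

UNCLASSIFIABLE = "996"  # unclassifiable response

# Residual OCC1950 codes named in the text, keyed by the skill level that
# the 1920 code implies. The skill-level labels are assumptions.
RESIDUAL_BY_SKILL = {
    "operative": "690",  # operatives and kindred workers, not elsewhere classified
    "laborer": "970",    # laborers, not elsewhere classified
}


def code_occupation(
    response: str,
    alphabetical_index: Mapping[str, str],
    dot_equivalents: Mapping[str, str],
    skill_from_1920_code: Optional[str],
) -> str:
    """Return an OCC1950 code for a response, trying each source in turn."""
    title = response.strip().lower()

    # 1. The response appears in the Alphabetical Index.
    if title in alphabetical_index:
        return alphabetical_index[title]

    # 2. The Dictionary of Occupational Titles led a coder to a matching
    #    Index entry (represented here as title -> Index title).
    equivalent = dot_equivalents.get(title)
    if equivalent is not None and equivalent in alphabetical_index:
        return alphabetical_index[equivalent]

    # 3. The 1920 code alone indicates the skill level, so a residual
    #    category is used.
    if skill_from_1920_code in RESIDUAL_BY_SKILL:
        return RESIDUAL_BY_SKILL[skill_from_1920_code]

    # 4. Nothing helps: mark the response for the later allocation procedure.
    return UNCLASSIFIABLE
```

Keeping "996" as an explicit last resort, rather than guessing, leaves those cases identifiable so that they can later receive codes from donor records.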

Data-Entry Errors, Real And Suspected

Census Projects data-entry operators occasionally misinterpreted or miskeyed responses from the hand-written census forms. Although data-entry errors affected only a minute percentage of cases, the identification and correction of errors made these data even more reliable than they already were. Operators entered the occupation and industry responses written by enumerators and copied the codes assigned by clerks in 1920. Errors were identified by comparing the entered occupation and industry responses with each other as well as with the categories corresponding to the 1920 occupation codes. Numbers are often easier to read than words, as the following example shows. A data-entry operator had entered:[5]

488 Cloak Maker Cloak FCT

The 1920 code "488" designates semiskilled operatives working in clock and watch factories, not cloak factories. For reasons explained previously, it is likely that the 1920 clerk made the more accurate interpretation of the enumerator's writing. In this particular case, the research assistant considered the worker a "clock maker" at a "clock factory." Examination of these types of cases on the microfilm revealed that such an assumption was correct far more often than not. By using the 1920 codes, we identified and corrected several errors of this type, such as "foundry" for "laundry," "sawyer" for "lawyer," and "tire" for "fire."
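
A consistency check of this kind can be sketched in Python. The sketch is illustrative only: the function name, the keyword table and the similarity cutoff are assumptions, and the only entry taken from the text is the meaning of 1920 code "488." The suggestion it returns is a flag for a research assistant to review against the microfilm, not an automatic correction.

```python
import difflib
from typing import Dict, List, Optional

# 1920 occupation code -> words expected in the keyed response.
# Only the meaning of "488" (clock and watch factories) comes from the
# text; representing it as a keyword list is an assumption.
EXPECTED_TERMS: Dict[str, List[str]] = {
    "488": ["clock", "watch"],
}


def suggest_correction(code_1920: str, entered_text: str) -> Optional[str]:
    """Suggest a re-reading of the keyed text when it conflicts with the 1920 code.

    Returns None when the code is not in the table, when the text already
    agrees with the code, or when no word is a near-miss spelling.
    """
    expected = EXPECTED_TERMS.get(code_1920)
    if not expected:
        return None

    words = entered_text.lower().split()
    if any(word in expected for word in words):
        return None  # text and code already agree

    corrected = []
    changed = False
    for word in words:
        # Look for an expected term that differs by only a letter or so.
        close = difflib.get_close_matches(word, expected, n=1, cutoff=0.75)
        if close:
            corrected.append(close[0])
            changed = True
        else:
            corrected.append(word)
    return " ".join(corrected) if changed else None
```

For the record above, `suggest_correction("488", "Cloak Maker")` proposes "clock maker," while a record keyed as "Clock Maker" passes without a flag.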

Identifying Unpaid Family Farm Workers

The OCC1950 codes include a special category (code "830") for unpaid family farm workers - those who did substantial work on a farm operated by a member of the household to whom they were related and for which they received no pay other than room and board. In 1950, these workers all received a separate class of worker code that has no equivalent in the 1920 census. The 1920 enumerators were to identify these workers by assigning the occupation/industry combination "farm laborer, home farm;" the 1920 occupation codes identify such workers. Since the 1920 enumerator instructions do not say what class of worker code they were to receive, we assigned them the OCC1950 unpaid family farm worker code regardless of their 1920 class of worker entry. We did the same with other responses that the 1920 coding clerks interpreted as "farm laborer, home farm."
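
Expressed as a rule, the decision looks roughly like the following Python sketch. The function and parameter names are hypothetical, and the clerk's interpretation is represented by a simple flag because the specific 1920 code for home farm laborers is not given here.

```python
UNPAID_FAMILY_FARM_WORKER = "830"  # OCC1950 code named in the text


def assign_farm_occ1950(
    occupation: str,
    industry: str,
    class_of_worker: str,
    clerk_coded_home_farm: bool,
    otherwise: str,
) -> str:
    """Return "830" for home farm laborers, else the code chosen by other rules.

    clerk_coded_home_farm is a hypothetical flag: True when the 1920
    occupation code shows that the clerk read the response as
    "farm laborer, home farm".
    """
    literal_match = (
        occupation.strip().lower() == "farm laborer"
        and industry.strip().lower() == "home farm"
    )
    # class_of_worker is deliberately not consulted: the 1920 instructions
    # did not say which entry these workers should receive.
    if literal_match or clerk_coded_home_farm:
        return UNPAID_FAMILY_FARM_WORKER
    return otherwise
```

The `class_of_worker` argument is kept in the signature only to make explicit that its value does not change the result.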

Changing Workplace Terminology, 1920 to 1950

With few exceptions, we did not attempt to compensate for possible changes in the meaning of occupational titles between 1920 and 1950. Since it cannot be ascertained whether or not the census experts of 1950 would have coded some occupational titles differently had they been assigned the task of classifying 1920 data, we usually had to assume that terminology was similar for both years. Whether or not an "engineer" or a "seamstress" or a "saddler" or a "nurse" performed work of similar skill, remuneration, and/or status in 1920 as in 1950 - and therefore would have been classified similarly - is left to the researcher to answer along with other questions: How many "deliverymen" in 1920 might have considered themselves "truck drivers" in 1950? How many "nurses" might later have been called "registered nurses?" How many "machinists" would have said they were "machine operators?" Or how many 1920 "drivers" were actually "teamsters" as opposed to "truck drivers?"

The 1920 occupation codes served as a check on absolute literal-mindedness on our part. Some 1920 "agents" are considered "sales agents" if the 1920 clerks coded them as such, just as some 1920 "nurses" are treated as "trained nurses" for the same reason. Only with the response of "chauffeur" did Historical Census Projects staff break significantly from the 1950 coding system. In this case it became evident from the combined occupation-industry responses that the term in 1920 was frequently used to describe people who would clearly have been considered "truck drivers" in 1950. Many of them are so coded in the 1920 version of OCC1950.


This article has described solutions to the difficulties of applying a set of occupation and industry codes designed for the 1950 census to responses recorded in the 1920 census thirty years earlier. Much changed in the world of work in those thirty years but much stayed pretty much the same. It is worth repeating that in most cases, similarities between 1920 and 1950 occupation, industry and class of worker questions and responses allowed for quick and accurate coding. Providing researchers with a means to measure accurately how much has changed and how much remained the same has been the impetus for creating the OCC1950 and IND1950 variables for the 1920 public use sample.


  1. This article appeared in Historical Methods, Volume 32, Number 3, Pages 151-155, Summer 1999. Reprinted with Permission of the Helen Dwight Reid Educational Foundation. Published by Heldref Publications, 1319 18th St. N.W. Washington, D.C. 20036-1802. Copyright 1999.
  2. The 1920 public use sample project was funded by the National Institute of Child Health and Human Development, grant HD29015.
  3. Enumerator instructions for all census years included in the IPUMS (1850-1990) can be found in Ruggles and Sobek (1998c), available at the IPUMS website (
  4. Researchers can refine the method further using the OCC1950 variable in combination with a second IPUMS variable, OCC1920 (Occupation, 1920 Manuscript Edits), which preserves the 1920 Occupation codes.
  5. Since 1920 occupation and industry coding is still in process, codes are sometimes altered when new information comes to light. Changes in the data are catalogued on the Revisions History page of the IPUMS website.


Table 1. Examples of Ambiguous 1920 Industry Responses and Solutions

RESPONSE | THE PROBLEM: to distinguish between... | OUR SOLUTION (if 1920 code is unclear)
Bag co., mfg., etc. | paper, cloth, and other bags | Code 458 (Paper bag mfg.)
Bakery | retail only stores and other bakeries | Code 416 (Bakery products mfg.)
Belt co., mfg., etc. | apparel and belts for machinery | Code 399 (Misc. mfg.)
Box co., mfg., etc. | paper, wooden, and other boxes | Code 457 (Paper box mfg.)
Car shops | railroad car manufacturing and railroad car repair | Code 379 (Railroad car mfg.)
Club | various types of clubs | Code 859 (Misc. entertainment and recreation)
Electric(al) co. | manufacturing and public utilities | Code 367 (Electrical mfg.)
Elevator co. | elevator manufacturing and grain elevators | Code 619 (Grain elevator)
Film co. | film mfg. and the motion picture industry | Code 387 (Film mfg.)
Foundry | iron and other metals | Code 337 (Iron foundries)
Fur co., mfg., etc. | fur processing and fur apparel | Code 448 (Apparel mfg.)
Furnace | ferrous & non-ferrous metals, as well as furnace mfg. | Code 348 (Metal mfg., metal type not specified)
Glove co., mfg., etc. | leather and non-leather gloves | Code 448 (Apparel mfg.)
Government | federal, state, and local government | Code 946 - our own code for gov't, level not spec'd
Lamp co., mfg., etc. | metal components and electrical components | Code 347 (Nonferrous metal mfg.)
Motor co. | various types of motors and automobile mfg. | Code 358 (Machinery mfg.)
Oil co., mfg., etc. | oil wells, petroleum refineries, and other types of oil | Code 476 (Petroleum refining)
Packing co. | meat packing and other food packing | Code 426 temporarily (Food mfg., type not specified)
Pipe co., mfg., etc. | various types of pipes | Code 399 (Misc. mfg.)
Railway | steam and street railways | Code 506 (Steam railroads)
Refinery | oil, sugar, and other refineries | Code 399 (Misc. mfg.)
Tailor shop | retail, manufacturing, and repair | Code 656 (Retail) - IGNORE 1920 code
Wire co., mfg., etc. | various types of wire | Code 348 (Metal mfg., metal type not specified)
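
Table 1 can be read as a set of defaults that apply only when the 1920 code does not settle the question, with "tailor shop" as the one row where the 1920 code is set aside. A minimal Python sketch of that reading follows. The function name and the idea of passing in an IND1950 code already derived from the 1920 code are assumptions made for illustration; the four rows shown are copied from the table.

```python
from typing import Dict, Optional, Tuple

# response -> (default IND1950 code, ignore the 1920 code?)
# A few rows copied from Table 1; the remaining rows follow the same pattern.
TABLE_1_DEFAULTS: Dict[str, Tuple[str, bool]] = {
    "bakery": ("416", False),        # Bakery products mfg.
    "electric co.": ("367", False),  # Electrical mfg.
    "railway": ("506", False),       # Steam railroads
    "tailor shop": ("656", True),    # Retail; 1920 code ignored
}


def resolve_ind1950(response: str, ind_from_1920_code: Optional[str]) -> Optional[str]:
    """Choose an IND1950 code for an ambiguous industry response.

    ind_from_1920_code is a hypothetical input: the IND1950 code implied by
    the 1920 clerk's code, or None when that code is unclear or absent.
    """
    entry = TABLE_1_DEFAULTS.get(response.strip().lower())
    if entry is None:
        # Not a Table 1 response: fall back on whatever the 1920 code implies.
        return ind_from_1920_code

    default_code, ignore_1920_code = entry
    if ignore_1920_code or ind_from_1920_code is None:
        return default_code
    # The 1920 clerk's interpretation takes precedence over the default.
    return ind_from_1920_code
```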

