Description

Overview of "Language" Variables:
In addition to variables reporting respondents' ability to speak English (see LINGISOL and SPEAKENG), the IPUMS contains four variables that yield language information for the following years:

Variable   Description   Years available  
MTONGUE   Mother tongue   1910, 1920, 1930, 1940, 1960, 1970  
MMTONGUE   Mother's mother tongue   1910, 1920  
FMTONGUE   Father's mother tongue   1910, 1920  
LANGUAGE   Language spoken   1910, 1980, 1990, 2000, ACS  

All four variables share the same four-digit IPUMS coding scheme. Each code consists of a two-digit general code followed by a two-digit detailed code. The general codes, which identify broad language categories, are as comparable across years as possible. The detailed codes usually identify language subgroups and dialects that were classified separately in some samples but grouped with a larger general category in others. In general, the IPUMS codes are numbered according to Census Bureau practice and linguistic similarity.
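The two-part structure of the codes can be illustrated with a minimal sketch. The function name `split_language_code` is hypothetical, and the sketch assumes the code may arrive either as a string or as an integer that has lost its leading zeros; how a given extract actually stores the value is not stated here.

```python
def split_language_code(code):
    """Split a four-digit language code into (general, detailed) two-digit parts.

    Hypothetical helper: assumes `code` is a string or a non-negative integer.
    """
    text = str(code).zfill(4)  # restore leading zeros dropped by integer storage
    if len(text) != 4 or not text.isdigit():
        raise ValueError(f"expected a four-digit code, got {code!r}")
    return text[:2], text[2:]


general, detailed = split_language_code("3703")
print(general, detailed)
```

Grouping records by the first element of the returned pair gives the broad language categories; the second element distinguishes subgroups within a category.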

Users who wish to check the logic of the groupings should consult the codes and frequencies and the detailed category composition for each year (see Codes, below). The most complicated coding problems involved the most infrequently occurring languages, since the Census Bureau established consistent patterns for handling the most common languages. Thus, "judgment calls" affect relatively few cases. Furthermore, all of the original sample categories are preserved in the IPUMS detailed codes.

The following year-by-year discussion describes the different classification systems and their integration into the IPUMS codes.

  • 1910-1930: All languages were given a unique code in the PUMS, and they were easily incorporated into the IPUMS codes.
  • 1960 and 1970: These two years followed a similar classification system, but they preserve less detail than later years. The 1970 sample provides an exhaustive list of the responses grouped together for a single code. This list contains many responses (either languages or alternative names for languages) that are not mentioned in the documentation for other samples, probably because they rarely or never occurred in other years. In these cases, the IPUMS identified the most common language(s) within each 1970 category (which was usually what the category was called in 1960) and used these to guide the placement of the 1970 (and 1960) category.
  • 1940, 1980, 1990, 2000, and the ACS: The most recent samples use the most detailed classification systems for MTONGUE and/or LANGUAGE. The 1940 sample followed the 1980 coding scheme. Native American languages are grouped under a single code in 1940 and in 1980. The samples for the 1980-2000 censuses and the ACS also include a list of most of the responses grouped under each code (see Codes, below). These coding schemes contain quite a few categories that the earlier samples do not. Usually these new categories were variations or dialects of a more common language, in which case they were coded as details of a general IPUMS code/category. If not, they were given a separate general code or coded as details of a general IPUMS code/category, based on geographic clues.

Sometimes languages classified one way in one sample were classified differently in other samples. In these cases, the IPUMS usually created one larger general code for all languages included in the various categories and then used detail codes for each different grouping. For example, the code for English (01-00) includes six detail codes (Jamaican Creole, Krio, Hawaiian Pidgin, Pidgin, Gullah, and Saramaca). Because these languages share the same general code, users can assume that, in at least one sample, they were (or would probably have been) subsumed under that general code.

Sometimes the above process created an unwieldy coding scheme, so some small subgroups appear under different codes in different years. The following categories were affected; what is true for the 1970 sample is probably true for 1960 as well, and what is true for 1980 is always true for 1940 and probably for 1990:

Language subgroup   Code (1970)   Code (1980)  
Dano-Norwegian   06-00   07-00  
Cossack/Kazakh   19-00   37-03  
Slavonia   19-00   23-00  
Pushton   30-00   29-00  
Livonian/Votic   03-00   35-10  
Uighur   36-00   37-06  
Min   43-10   43-01  
Jordanian, Lebanese   58-00   57-00  
Chaldean   57-30   58-10  
Yao   44-00   63-07  
Chamorro   04-00   55-03  
Sudanic   63-06 and 63-07  
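
A few rows of the table above can be held in a small lookup so that a subgroup's code is chosen by the coding scheme of the sample in use. This is only a sketch: the names `SUBGROUP_CODES`, `code_for`, and `general_code_changes` are hypothetical, and only four rows are copied in for illustration.

```python
# A few rows copied from the table above: subgroup -> (1970 code, 1980 code).
SUBGROUP_CODES = {
    "Cossack/Kazakh": ("19-00", "37-03"),
    "Uighur": ("36-00", "37-06"),
    "Min": ("43-10", "43-01"),
    "Chaldean": ("57-30", "58-10"),
}


def code_for(subgroup, scheme):
    """Return the code a subgroup falls under in the '1970' or '1980' scheme."""
    code_1970, code_1980 = SUBGROUP_CODES[subgroup]
    if scheme == "1970":
        return code_1970
    if scheme == "1980":
        return code_1980
    raise ValueError(f"unknown scheme: {scheme!r}")


def general_code_changes(subgroup):
    """True if the two-digit general code differs between the two schemes."""
    code_1970, code_1980 = SUBGROUP_CODES[subgroup]
    return code_1970.split("-")[0] != code_1980.split("-")[0]


print(code_for("Uighur", "1980"), general_code_changes("Min"))
```

A check like `general_code_changes` flags the subgroups for which a comparison on general codes alone would place the same language in different categories across years.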

Details about MTONGUE variable:
MTONGUE reports the respondent's mother tongue. In 1910 (U.S. Census), 1920 (U.S. and Puerto Rican Censuses), 1930, and 1960, this was asked only of foreign-born persons; in 1940, this was asked of all sample-line persons; in 1970, this was asked of all persons. In 1910, 1920, 1930, and 1960, mother tongue meant the language spoken in the home prior to immigration. In 1940 and 1970, mother tongue meant the language spoken in the home as a child.
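
Because both the universe and the meaning of mother tongue change across years, it can help to keep them in one lookup when pooling samples. The following is a sketch only; the names `MTONGUE_UNIVERSE`, `MTONGUE_DEFINITION`, and `describe_mtongue` are hypothetical, and the strings simply restate the paragraph above.

```python
PRE_IMMIGRATION = "language spoken in the home prior to immigration"
CHILDHOOD = "language spoken in the home as a child"

# Who was asked the mother tongue question in each year.
MTONGUE_UNIVERSE = {
    1910: "foreign-born persons",
    1920: "foreign-born persons",
    1930: "foreign-born persons",
    1940: "sample-line persons",
    1960: "foreign-born persons",
    1970: "all persons",
}

# What "mother tongue" meant in each year.
MTONGUE_DEFINITION = {
    1910: PRE_IMMIGRATION,
    1920: PRE_IMMIGRATION,
    1930: PRE_IMMIGRATION,
    1940: CHILDHOOD,
    1960: PRE_IMMIGRATION,
    1970: CHILDHOOD,
}


def describe_mtongue(year):
    """Return (universe, definition) for a year; raise KeyError if MTONGUE is unavailable."""
    return MTONGUE_UNIVERSE[year], MTONGUE_DEFINITION[year]


print(describe_mtongue(1940))
```

Years whose universe or definition differ are not directly comparable, so a pooled analysis would normally restrict to a common universe first.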

For 1910 (U.S. Census) and 1920 (U.S. and Puerto Rican Censuses), the mother tongue(s) of the respondent's parents, if they were foreign-born, are available in MMTONGUE and FMTONGUE. Further language information is available for 1910, 1980, 1990, 2000, and the ACS in LANGUAGE (Language spoken). All the language variables (MTONGUE, MMTONGUE, FMTONGUE, and LANGUAGE) follow the same IPUMS coding scheme.

Additionally, MTONGUE (and the corresponding flag variable QMTONGUE) was removed from the 1910 full count sample because of a data transcription error. MTONGUE information from the original Census forms was not transcribed into our digital version of the 1910 complete count data file. As a result, the derivation of this variable led to incorrectly high rates of English as the MTONGUE value. Only the 1910 full count file was affected by this error.