Description

Overview of "Language" Variables:
In addition to variables reporting respondents' ability to speak English (see LINGISOL and SPEAKENG), the IPUMS contains four variables that yield language information for the following years:

MTONGUE   Mother tongue   1910, 1920, 1930, 1940, 1960, 1970  
MMTONGUE   Mother's mother tongue   1910, 1920  
FMTONGUE   Father's mother tongue   1910, 1920  
LANGUAGE   Language spoken   1910, 1980, 1990, 2000, ACS  

All four variables share the same four-digit IPUMS coding scheme. Each code consists of a two-digit general code followed by a two-digit detailed code. The general codes, which identify broad language categories, are as comparable across years as possible, while the detailed codes usually identify language subgroups and dialects that were classified separately in some samples but grouped into a larger general category in others. In general, the IPUMS codes are numbered according to the Census Bureau's practice and linguistic similarity.
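The general/detailed structure can be illustrated with a minimal Python sketch. It assumes the code arrives in an extract as an integer (so 01 00 would appear as 100); the function names are hypothetical and not part of any IPUMS tool.

```python
from typing import Tuple


def split_language_code(code: int) -> Tuple[int, int]:
    """Split a four-digit language code into (general, detailed) parts.

    Assumes the code is stored as an integer, so leading zeros are lost
    (for example, 01 00 is stored as 100).
    """
    if not 0 <= code <= 9999:
        raise ValueError(f"expected a code between 0 and 9999, got {code}")
    # The first two digits are the general code, the last two the detail.
    return divmod(code, 100)


def format_language_code(code: int) -> str:
    """Render a code in the general-detail form used in the tables here."""
    general, detail = split_language_code(code)
    return f"{general:02d}-{detail:02d}"


print(split_language_code(100))    # (1, 0)
print(format_language_code(3703))  # 37-03
```

Grouping on the first element of the pair is one way to analyze the broad categories that are most comparable across years.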

Users who wish to check the logic of the groupings should consult the codes and frequencies and the detailed category composition for each year (see Codes, below). The most complicated coding problems involved the most infrequently occurring languages, since the Census Bureau established consistent patterns for handling the most common languages. Thus, "judgment calls" affect relatively few cases. Furthermore, all of the original sample categories are preserved in the IPUMS detailed codes.

The following year-by-year discussion describes the different classification systems and their integration into the IPUMS codes.

  • 1910-1930: All languages were given a unique code in the PUMS, and they were easily incorporated into the IPUMS codes.
  • 1960 and 1970: These two years followed a similar classification system, but they preserve less detail than later years. The 1970 sample provides an exhaustive list of the responses grouped together for a single code. This list contains many responses (either languages or alternative names for languages) that are not mentioned in the documentation for other samples, probably because they rarely or never occurred in other years. In these cases, the IPUMS identified the most common language(s) within each 1970 category (which was usually what the category was called in 1960) and used these to guide the placement of the 1970 (and 1960) category.
  • 1940, 1980, 1990, 2000 and the ACS: The most recent samples use the most detailed classification systems for MTONGUE and/or LANGUAGE. The 1940 sample followed the 1980 coding scheme. Native American languages are grouped under a single code in 1940 and in 1980. The samples for the 1980-2000 censuses and the ACS also include a list of most responses grouped into one code (see Codes, below). These coding schemes contain quite a few categories that the earlier samples do not. Usually these new categories were variations or dialects of a more common language, in which case they were coded as details of a general IPUMS code/category. If not, they were given a separate general code or coded as details of a general IPUMS code/category, based on geographic clues.

Sometimes languages classified one way in one sample were classified differently in other samples. In these cases, the IPUMS usually created one larger general code for all languages included in the various categories and then used detail codes for each different grouping. For example, the code for English (01 00) includes six detail codes (Jamaican Creole, Krio, Hawaiian Pidgin, Pidgin, Gullah, and Saramaca). Because they share the same general code, users can assume that, in at least one sample, they were (or would probably have been) subsumed under that general code.

Sometimes the above process created an unwieldy coding scheme, so some small subgroups appear under different codes in different years. The following categories were affected; what is true for the 1970 sample is probably true for 1960 as well, and what is true for 1980 is always true for 1940 and probably for 1990:

Language subgroup   Code (1970)   Code (1980)  
Dano-Norwegian   06-00   07-00  
Cossack/Kazakh   19-00   37-03  
Slavonia   19-00   23-00  
Pushton   30-00   29-00  
Livonian/Votic   03-00   35-10  
Uighur   36-00   37-06  
Min   43-10   43-01  
Jordanian, Lebanese   58-00   57-00  
Chaldean   57-30   58-10  
Yao   44-00   63-07  
Chamorro   04-00   55-03  
Sudanic   63-06 and 63-07  
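For users who need to track these subgroups across years, the table above can be held as a small lookup. The sketch below is illustrative only: the names are hypothetical, codes are stored as (general, detailed) integer pairs, and the Sudanic row is omitted because the table lists its codes without assigning them to a year. Applying the 1970 column to 1960 and the 1980 column to 1940 and 1990 follows the text above, which describes the 1960 and 1990 matches only as probable.

```python
from typing import Dict, List, Tuple

Code = Tuple[int, int]  # (general, detailed)

# Subgroup -> (code in 1970, code in 1980), copied from the table above.
SUBGROUP_CODES: Dict[str, Tuple[Code, Code]] = {
    "Dano-Norwegian": ((6, 0), (7, 0)),
    "Cossack/Kazakh": ((19, 0), (37, 3)),
    "Slavonia": ((19, 0), (23, 0)),
    "Pushton": ((30, 0), (29, 0)),
    "Livonian/Votic": ((3, 0), (35, 10)),
    "Uighur": ((36, 0), (37, 6)),
    "Min": ((43, 10), (43, 1)),
    "Jordanian, Lebanese": ((58, 0), (57, 0)),
    "Chaldean": ((57, 30), (58, 10)),
    "Yao": ((44, 0), (63, 7)),
    "Chamorro": ((4, 0), (55, 3)),
}

# Assumption taken from the text: 1960 probably follows 1970;
# 1940 follows 1980, and 1990 probably does as well.
YEARS_FOLLOWING_1970 = {1960, 1970}
YEARS_FOLLOWING_1980 = {1940, 1980, 1990}


def code_for_subgroup(name: str, year: int) -> Code:
    """Return the code under which a subgroup appears in a given year."""
    code_1970, code_1980 = SUBGROUP_CODES[name]
    if year in YEARS_FOLLOWING_1970:
        return code_1970
    if year in YEARS_FOLLOWING_1980:
        return code_1980
    raise ValueError(f"no placement recorded for {year}")


def subgroups_changing_general_code() -> List[str]:
    """List subgroups whose general (two-digit) code differs by year."""
    return [
        name
        for name, (old, new) in SUBGROUP_CODES.items()
        if old[0] != new[0]
    ]
```

In this table every subgroup except Min moves to a different general code, so comparisons at the general level are affected for those subgroups.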

Details about MTONGUE variable:
MTONGUE reports the respondent's mother tongue. In 1910 (U.S. Census), 1920 (U.S. and Puerto Rican Censuses), 1930, and 1960, this was asked only of foreign-born persons; in 1940, this was asked of all sample-line persons; in 1970, this was asked of all persons. In 1910, 1920, 1930, and 1960, mother tongue meant the language spoken in the home prior to immigration. In 1940 and 1970, mother tongue meant the language spoken in the home as a child.

For 1910 (U.S. Census) and 1920 (U.S. and Puerto Rican Censuses), the mother tongue(s) of the respondent's parents, if they were foreign-born, are available in MMTONGUE and FMTONGUE. Further language information is available for 1910, 1980, 1990, 2000, and the ACS in LANGUAGE (Language spoken). All the language variables (MTONGUE, MMTONGUE, FMTONGUE, and LANGUAGE) follow the same IPUMS coding scheme.

Additionally, MTONGUE (and the corresponding flag variable QMTONGUE) was removed from the 1910 full count sample because of a data transcription error. MTONGUE information from the original Census forms was not transcribed into our digital version of the complete count 1910 data file, and the derivation of this variable led to incorrectly high rates of English as a MTONGUE value. Only the 1910 full count file was affected by this error.

Codes and Frequencies





Comparability

Only the 1940 and 1970 censuses set out to record the mother tongue of native-born persons. However, in 1910 some enumerators recorded the mother tongue of non-English-speaking native-born persons, even though they were instructed not to. The 1910 sample (and thus the IPUMS) preserves these responses, but since the enumerators recorded information beyond the instructions, the responses available for native-born persons are not representative. Users must exclude the native-born to create a comparable, foreign-born-only universe across all years for which MTONGUE is available.

In 1910, some enumerators entered countries instead of mother tongues. In cases where the country entered was home to only one major language, the 1910 sample creators entered that language as the person's mother tongue. In other cases, they checked the person's response for "language spoken." If this was a non-English language and was known to be spoken in the country entered under mother tongue, mother tongue was coded the same as "language spoken." Otherwise, mother tongue was coded "unknown."
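The decision rule described above can be sketched in Python. This is a hedged illustration of the logic only: the lookup tables and their entries are hypothetical placeholders, not the tables the 1910 sample creators actually used, and the function name is invented for this example.

```python
from typing import Dict, Optional, Set

UNKNOWN = "unknown"

# Hypothetical lookups, for illustration only.
# Countries treated as home to a single major language.
SINGLE_LANGUAGE_COUNTRIES: Dict[str, str] = {"Sweden": "Swedish"}
# Languages known to be spoken in countries with more than one.
LANGUAGES_BY_COUNTRY: Dict[str, Set[str]] = {
    "Switzerland": {"German", "French", "Italian"},
}


def resolve_mother_tongue(
    country_entered: str, language_spoken: Optional[str]
) -> str:
    """Resolve a country written in the mother tongue field."""
    # Step 1: a country with only one major language maps directly.
    if country_entered in SINGLE_LANGUAGE_COUNTRIES:
        return SINGLE_LANGUAGE_COUNTRIES[country_entered]
    # Step 2: otherwise fall back on "language spoken", but only if it
    # is non-English and known to be spoken in the country entered.
    known = LANGUAGES_BY_COUNTRY.get(country_entered, set())
    if (
        language_spoken is not None
        and language_spoken != "English"
        and language_spoken in known
    ):
        return language_spoken
    # Step 3: anything else is coded as unknown.
    return UNKNOWN
```

Under these placeholder tables, a person with "Switzerland" entered and French as the language spoken would be coded French, while the same entry with English as the language spoken would be coded unknown.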

In 1920 and 1930, mother tongue was not asked of people who were born in the outlying territories of the United States (Alaska, Hawaii, Puerto Rico, the Philippines, Guam, American Samoa, the Panama Canal Zone, and the U.S. Virgin Islands). In 1910 and 1960, those born in outlying territories were included among the foreign-born and were asked about their mother tongue. (In 1940 and 1970, all persons were asked their mother tongue.)

In 1960, if a person reported more than one language, the code assigned was the mother tongue reported by the largest number of immigrants from that person's native country in the 1940 census. (Prior to 1960, enumerators were to list only one language; in 1970, respondents were instructed to indicate only the principal language.)

Universe

  • 1910: Foreign-born persons. Not available in Alaska and Hawaii; not available for Puerto Rico.
  • 1920: Foreign-born persons. Not available in Alaska and Hawaii.
  • 1930: Foreign-born persons. Not available in Alaska.
  • 1940: Sample-line persons.
  • 1960: Foreign-born persons.
  • 1970: All persons; not available for Puerto Rico.
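The universe rules listed above, together with the foreign-born restriction recommended under Comparability, can be expressed as a short Python sketch. The function and parameter names are hypothetical, and the sketch deliberately ignores the geographic exclusions (Alaska, Hawaii, Puerto Rico); how foreign-born and sample-line status are derived from an extract is left to the user.

```python
FOREIGN_BORN_ONLY_YEARS = {1910, 1920, 1930, 1960}


def in_mtongue_universe(
    year: int, foreign_born: bool, sample_line: bool = False
) -> bool:
    """Return True if a person falls in the MTONGUE universe for a year.

    Geographic exclusions (Alaska, Hawaii, Puerto Rico) are not modeled.
    """
    if year in FOREIGN_BORN_ONLY_YEARS:
        return foreign_born
    if year == 1940:
        return sample_line
    if year == 1970:
        return True
    # MTONGUE is not available in any other year.
    return False


def in_comparable_universe(
    year: int, foreign_born: bool, sample_line: bool = False
) -> bool:
    """Restrict to the foreign-born so all MTONGUE years are comparable."""
    return foreign_born and in_mtongue_universe(year, foreign_born, sample_line)
```

Filtering on `in_comparable_universe` drops native-born persons in 1940 and 1970, as well as the unrepresentative native-born responses preserved in 1910.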

Availability

United States
  • 2022: --
  • 2021: --
  • 2020: --
  • 2019: --
  • 2018: --
  • 2017: --
  • 2016: --
  • 2015: --
  • 2014: --
  • 2013: --
  • 2012: --
  • 2011: --
  • 2010: --
  • 2009: --
  • 2008: --
  • 2007: --
  • 2006: --
  • 2005: --
  • 2004: --
  • 2003: --
  • 2002: --
  • 2001: --
  • 2000: --
  • 1990: --
  • 1980: --
  • 1970: 1% state fm2; 1% metro fm2; 1% neigh fm2
  • 1960: All samples
  • 1950: --
  • 1940: All samples
  • 1930: All samples
  • 1920: All samples
  • 1910: 1%; 1.4% ovrsmp
  • 1900: --
  • 1880: --
  • 1870: --
  • 1860: --
  • 1850: --
Puerto Rico
  • 2022: --
  • 2021: --
  • 2020: --
  • 2019: --
  • 2018: --
  • 2017: --
  • 2016: --
  • 2015: --
  • 2014: --
  • 2013: --
  • 2012: --
  • 2011: --
  • 2010: --
  • 2009: --
  • 2008: --
  • 2007: --
  • 2006: --
  • 2005: --
  • 2000: --
  • 1990: --
  • 1980: --
  • 1970: --
  • 1930: All samples
  • 1920: All samples
  • 1910: --

Flags

QMTONGUE 

Editing Procedure

There is no editing procedure available for this variable.