IPUMS USA

MTONGUE

Mother tongue

Description

Overview of "Language" Variables:
In addition to variables reporting respondents' ability to speak English (see LINGISOL and SPEAKENG), the IPUMS contains four variables that yield language information for the following years:

MTONGUE	Mother tongue	1910, 1920, 1930, 1940, 1960, 1970
MMTONGUE	Mother's mother tongue	1910, 1920
FMTONGUE	Father's mother tongue	1910, 1920
LANGUAGE	Language spoken	1910, 1980, 1990, 2000, ACS

All four variables share the same four-digit IPUMS coding scheme. Each code consists of a two-digit general code followed by a two-digit detailed code. The general codes, which identify broad language categories, are as comparable across years as possible. The detailed codes usually identify language subgroups and dialects that were classified separately in some samples but grouped with a larger general category in others. In general, the IPUMS codes are numbered according to the Census Bureau's practice and linguistic similarity.
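The two-part structure of the codes can be illustrated with a minimal sketch. It assumes the code is held as an integer or as a string of up to four digits; the function name `split_language_code` is illustrative and is not part of any IPUMS tool.

```python
def split_language_code(code):
    """Split a four-digit language code into (general, detailed) parts.

    Assumption: `code` is an int or a digit string of at most four
    characters; shorter values are left-padded with zeros (100 -> "0100").
    """
    text = str(code).strip().zfill(4)
    if len(text) != 4 or not text.isdigit():
        raise ValueError(f"not a four-digit language code: {code!r}")
    # First two digits: general category. Last two digits: detail.
    return int(text[:2]), int(text[2:])


# "0100" is read as general code 01 with detailed code 00.
general, detailed = split_language_code("0100")
```

Grouping records on the first element of the returned pair gives the broad categories, while the full pair retains whatever detail a given sample provides.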

Users who wish to check the logic of the groupings should consult the codes and frequencies and the detailed category composition for each year (see Codes, below). The most complicated coding problems involved the most infrequently occurring languages, since the Census Bureau established consistent patterns for handling the most common languages. Thus, "judgment calls" affect relatively few cases. Furthermore, all of the original sample categories are preserved in the IPUMS detailed codes.

The following year-by-year discussion describes the different classification systems and their integration into the IPUMS codes.

1910-1930: All languages were given a unique code in the PUMS, and they were easily incorporated into the IPUMS codes.
1960 and 1970: These two years followed a similar classification system, but they preserve less detail than later years. The 1970 sample provides an exhaustive list of the responses grouped together for a single code. This list contains many responses (either languages or alternative names for languages) that are not mentioned in the documentation for other samples, probably because they rarely or never occurred in other years. In these cases, the IPUMS identified the most common language(s) within each 1970 category (which was usually what the category was called in 1960) and used these to guide the placement of the 1970 (and 1960) category.
1940, 1980, 1990, 2000 and the ACS: The most recent samples use the most detailed classification systems for MTONGUE and/or LANGUAGE. The 1940 sample followed the 1980 coding scheme. Native American languages are grouped under a single code in 1940 and in 1980. The samples for the 1980-2000 censuses and the ACS also include a list of most responses grouped into one code (see Codes, below). These coding schemes contain quite a few categories that the earlier samples do not. Usually these new categories were variations or dialects of a more common language, in which case they were coded as details of a general IPUMS code/category. If not, they were given a separate general code or coded as details of a general IPUMS code/category, based on geographic clues.

Sometimes languages classified one way in one sample were classified differently in other samples. In these cases, the IPUMS usually created one larger general code for all languages included in the various categories and then used detail codes for each different grouping. For example, the code for English (01 00) includes six detail codes (Jamaican Creole, Krio, Hawaiian Pidgin, Pidgin, Gullah, and Saramaca). Because they share the same general code, users can assume that, in at least one sample, they were (or would probably have been) subsumed under that general code.

Sometimes the above process created an unwieldy coding scheme, so some small subgroups appear under different codes in different years. The following categories were affected; what is true for the 1970 sample is probably true for 1960 as well, and what is true for 1980 is always true for 1940 and probably for 1990:

Language subgroup	Code (1970)	Code (1980)
Dano-Norwegian	06-00	07-00
Cossack/Kazakh	19-00	37-03
Slavonia	19-00	23-00
Pushton	30-00	29-00
Livonian/Votic	03-00	35-10
Uighur	36-00	37-06
Min	43-10	43-01
Jordanian, Lebanese	58-00	57-00
Chaldean	57-30	58-10
Yao	44-00	63-07
Chamorro	04-00	55-03
Sudanic	63-06 and 63-07
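For users who need to track these subgroups across samples, the table can be held as a small lookup. This is only a sketch: the names `SUBGROUP_CODES` and `code_for` are hypothetical, the assignment of 1960 to the 1970 column and of 1940 and 1990 to the 1980 column follows the "probably" and "always" wording above, and the Sudanic row is left out because the table does not say which column its codes belong to.

```python
# Subgroup -> (code in the 1970 scheme, code in the 1980 scheme),
# copied from the table above with codes zero-padded to "GG-DD".
SUBGROUP_CODES = {
    "Dano-Norwegian": ("06-00", "07-00"),
    "Cossack/Kazakh": ("19-00", "37-03"),
    "Slavonia": ("19-00", "23-00"),
    "Pushton": ("30-00", "29-00"),
    "Livonian/Votic": ("03-00", "35-10"),
    "Uighur": ("36-00", "37-06"),
    "Min": ("43-10", "43-01"),
    "Jordanian, Lebanese": ("58-00", "57-00"),
    "Chaldean": ("57-30", "58-10"),
    "Yao": ("44-00", "63-07"),
    "Chamorro": ("04-00", "55-03"),
}

# Assumption: 1960 follows the 1970 column; 1940 and 1990 follow 1980.
YEARS_1970_SCHEME = {1960, 1970}
YEARS_1980_SCHEME = {1940, 1980, 1990}


def code_for(subgroup, year):
    """Return the code under which `subgroup` appears in a given year."""
    code_1970, code_1980 = SUBGROUP_CODES[subgroup]
    if year in YEARS_1970_SCHEME:
        return code_1970
    if year in YEARS_1980_SCHEME:
        return code_1980
    raise ValueError(f"the table above does not cover {year}")
```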

Details about MTONGUE variable:
MTONGUE reports the respondent's mother tongue. In 1910 (U.S. Census), 1920 (U.S. and Puerto Rican Censuses), 1930, and 1960, this was asked only of foreign-born persons; in 1940, this was asked of all sample-line persons; in 1970, this was asked of all persons. In 1910, 1920, 1930, and 1960, mother tongue meant the language spoken in the home prior to immigration. In 1940 and 1970, mother tongue meant the language spoken in the home as a child.

For 1910 (U.S. Census) and 1920 (U.S. and Puerto Rican Censuses), the mother tongue(s) of the respondent's parents, if they were foreign-born, are available in MMTONGUE and FMTONGUE. Further language information is available for 1910, 1980, 1990, 2000, and the ACS in LANGUAGE (Language spoken). All the language variables (MTONGUE, MMTONGUE, FMTONGUE, and LANGUAGE) follow the same IPUMS coding scheme.

Additionally, MTONGUE (and the corresponding flag variable QMTONGUE) was removed from the 1910 full count sample because of a data transcription error. MTONGUE information from the original Census forms was not transcribed into our digital version of the 1910 complete count data file, and the procedure used to derive the variable produced incorrectly high rates of English as the MTONGUE value. Only the 1910 full count file was affected by this error.

Codes and Frequencies

Codes

General codes
Detailed codes

Comparability

Only the 1940 and 1970 censuses set out to record the mother tongue of native-born persons. However, in 1910 some enumerators recorded the mother tongue of non-English-speaking native-born persons, even though they were instructed not to. The 1910 sample (and thus the IPUMS) preserves these responses, but since the enumerators recorded information beyond the instructions, the responses available for native-born persons are not representative. Users must exclude the native-born to create a comparable, foreign-born-only universe across all years for which MTONGUE is available.
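Restricting an extract to the comparable universe might look like the following sketch. The record layout and the `foreign_born` flag are hypothetical: in practice the flag would have to be derived from a nativity or birthplace variable in the extract, and the `mtongue` values shown are placeholders, not real codes.

```python
def foreign_born_only(records, is_foreign_born):
    """Keep only the records for which `is_foreign_born(record)` is true."""
    return [r for r in records if is_foreign_born(r)]


# Toy records with a made-up boolean flag and placeholder codes.
people = [
    {"year": 1910, "foreign_born": True, "mtongue": 200},
    {"year": 1910, "foreign_born": False, "mtongue": 200},
    {"year": 1970, "foreign_born": False, "mtongue": 100},
]

# Drop native-born persons in every year, including 1940 and 1970,
# so that all years share a foreign-born-only universe.
comparable = foreign_born_only(people, lambda r: r["foreign_born"])
```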

In 1910, some enumerators entered countries instead of mother tongues. In cases where the country entered was home to only one major language, the 1910 sample creators entered that language as the person's mother tongue. In other cases, they checked the person's response for "language spoken." If this was a non-English language and was known to be spoken in the country entered under mother tongue, mother tongue was coded the same as "language spoken." Otherwise, mother tongue was coded "unknown."
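The decision rule described above can be sketched as a small function. The two lookup tables are illustrative placeholders invented for this example, not the tables the 1910 sample creators used, and the function name `resolve_mother_tongue` is likewise hypothetical.

```python
UNKNOWN = "unknown"


def resolve_mother_tongue(country, language_spoken,
                          single_language, languages_by_country):
    """Resolve a country entered in place of a mother tongue (1910 rule).

    single_language:      country -> its only major language
    languages_by_country: country -> set of languages spoken there
    """
    # Case 1: the country is home to only one major language.
    if country in single_language:
        return single_language[country]
    # Case 2: use "language spoken" when it is non-English and known
    # to be spoken in the country that was entered.
    if (
        language_spoken is not None
        and language_spoken != "English"
        and language_spoken in languages_by_country.get(country, set())
    ):
        return language_spoken
    # Otherwise the mother tongue is coded "unknown".
    return UNKNOWN


# Placeholder lookup tables (assumptions for illustration only).
single_language = {"Sweden": "Swedish"}
languages_by_country = {"Switzerland": {"German", "French", "Italian"}}

result = resolve_mother_tongue(
    "Switzerland", "French", single_language, languages_by_country
)
```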

In 1920 and 1930, mother tongue was not asked of people who were born in the outlying territories of the United States (Alaska, Hawaii, Puerto Rico, the Philippines, Guam, American Samoa, the Panama Canal Zone, and the U.S. Virgin Islands). In 1910 and 1960, those born in outlying territories were included among the foreign-born and were asked about their mother tongue. (In 1940 and 1970, all persons were asked their mother tongue.)

In 1960, if a person reported more than one language, the code assigned was the mother tongue reported by the largest number of immigrants from that person's native country in the 1940 census. (Prior to 1960, enumerators were to list only one language; in 1970, respondents were instructed to indicate only the principal language.)
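One literal reading of the 1960 rule is sketched below: when several languages were reported, the assigned language is the one most often reported in 1940 by immigrants from the person's native country, whether or not it was among the person's own answers. That reading, the function name `assign_1960_mother_tongue`, and the counts are all assumptions made for illustration; the counts are invented, not 1940 census figures.

```python
def assign_1960_mother_tongue(reported, native_country, counts_1940):
    """Choose a single mother tongue under the 1960 multiple-response rule.

    reported:       languages the person reported, in any order
    native_country: the person's country of birth
    counts_1940:    country -> {language: number of 1940 immigrants}
    """
    if not reported:
        return None
    if len(reported) == 1:
        # A single response needs no tie-breaking.
        return reported[0]
    # Multiple responses: take the most common 1940 mother tongue
    # among immigrants from the same native country.
    country_counts = counts_1940[native_country]
    return max(country_counts, key=country_counts.get)


# Invented counts for a placeholder country.
counts_1940 = {"Country A": {"Language X": 500, "Language Y": 120}}

assigned = assign_1960_mother_tongue(
    ["Language Y", "Language X"], "Country A", counts_1940
)
```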

Universe

1910: Foreign-born persons. Not available in Alaska and Hawaii; not available for Puerto Rico.
1920: Foreign-born persons. Not available in Alaska and Hawaii.
1930: Foreign-born persons. Not available in Alaska.
1940: Sample-line persons.
1960: Foreign-born persons.
1970: All persons; not available for Puerto Rico.

Availability

United States

2024: --
2023: --
2022: --
2021: --
2020: --
2019: --
2018: --
2017: --
2016: --
2015: --
2014: --
2013: --
2012: --
2011: --
2010: --
2009: --
2008: --
2007: --
2006: --
2005: --
2004: --
2003: --
2002: --
2001: --
2000: --
1990: --
1980: --
1970: 1% state fm2; 1% metro fm2; 1% neigh fm2
1960: All samples
1950: --
1940: All samples
1930: All samples
1920: All samples
1910: 1%; 1.4% ovrsmp
1900: --
1880: --
1870: --
1860: --
1850: --

Puerto Rico

2024: --
2023: --
2022: --
2021: --
2020: --
2019: --
2018: --
2017: --
2016: --
2015: --
2014: --
2013: --
2012: --
2011: --
2010: --
2009: --
2008: --
2007: --
2006: --
2005: --
2000: --
1990: --
1980: --
1970: --
1930: All samples
1920: All samples
1910: --

Flags

QMTONGUE

Editing Procedure

There is no editing procedure available for this variable.
