Description
Overview of "Language" Variables:
In addition to variables reporting respondents' ability to speak English (see LINGISOL and SPEAKENG), the IPUMS contains four variables that yield language information for the following years:
- MTONGUE (Mother tongue): 1910, 1920, 1930, 1940, 1960, 1970
- MMTONGUE (Mother's mother tongue): 1910, 1920
- FMTONGUE (Father's mother tongue): 1910, 1920
- LANGUAGE (language spoken): 1910, 1980, 1990, 2000, ACS
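The availability list above can be held as a small lookup table when deciding which language variable to request for a given sample. The following is only an illustrative sketch: the names `AVAILABILITY` and `variables_for` are hypothetical and not part of any IPUMS tool, and the ACS is represented by the plain string `"ACS"` because the list does not give individual ACS years.

```python
# Samples in which each language variable is available, copied from the list above.
# "ACS" is a placeholder label, not a specific survey year.
AVAILABILITY = {
    "MTONGUE": (1910, 1920, 1930, 1940, 1960, 1970),
    "MMTONGUE": (1910, 1920),
    "FMTONGUE": (1910, 1920),
    "LANGUAGE": (1910, 1980, 1990, 2000, "ACS"),
}


def variables_for(sample):
    """Return the language variables available in a sample, sorted by name."""
    return sorted(name for name, samples in AVAILABILITY.items() if sample in samples)


print(variables_for(1910))
print(variables_for("ACS"))
```

Under these assumptions, 1910 is the only sample for which all four variables are returned, and a sample that appears in none of the lists returns an empty list.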
All four variables share the same four-digit IPUMS coding scheme. Each code consists of a two-digit general code followed by a two-digit detailed code. The general codes, which identify broad language categories, are as comparable across years as possible. The detailed codes usually identify language subgroups and dialects that were classified separately in some samples but grouped with a larger general category in others. In general, the IPUMS codes are numbered following Census Bureau practice and linguistic similarity.
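As a rough illustration of the general/detailed structure described above, a four-digit code can be split into its two halves. The helper names below (`split_language_code`, `same_general_category`) are hypothetical, and the sketch assumes codes arrive either as integers or as strings such as `"37-03"`, with any leading zero possibly dropped.

```python
def split_language_code(code):
    """Split a four-digit language code into (general, detailed) integers.

    Accepts an int (e.g. 3703) or a string with optional separators
    (e.g. "37-03" or "37 03"). Missing leading zeros are restored first.
    """
    digits = "".join(ch for ch in str(code) if ch.isdigit())
    if not digits or len(digits) > 4:
        raise ValueError(f"expected at most four digits, got {code!r}")
    digits = digits.zfill(4)
    return int(digits[:2]), int(digits[2:])


def same_general_category(code_a, code_b):
    """True when two codes share the same two-digit general code."""
    return split_language_code(code_a)[0] == split_language_code(code_b)[0]


print(split_language_code("37-03"))
print(same_general_category(3703, 3706))
```

Comparing only the first element of each pair is one way to work at the general level, where the codes are intended to be most comparable across years.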
Users who wish to check the logic of the groupings should consult the codes and frequencies and the detailed category composition for each year (see Codes, below). The most complicated coding problems involved the most infrequently occurring languages, since the Census Bureau established consistent patterns for handling the most common languages. Thus, "judgment calls" affect relatively few cases. Furthermore, all of the original sample categories are preserved in the IPUMS detailed codes.
The following year-by-year discussion describes the different classification systems and their integration into the IPUMS codes.
- 1910-1930: All languages were given a unique code in the PUMS, and they were easily incorporated into the IPUMS codes.
- 1960 and 1970: These two years followed a similar classification system, but they preserve less detail than later years. The 1970 sample provides an exhaustive list of the responses grouped together for a single code. This list contains many responses (either languages or alternative names for languages) that are not mentioned in the documentation for other samples, probably because they rarely or never occurred in other years. In these cases, the IPUMS identified the most common language(s) within each 1970 category (which was usually what the category was called in 1960) and used these to guide the placement of the 1970 (and 1960) category.
- 1940, 1980, 1990, 2000 and the ACS: These samples use the most detailed classification systems for MTONGUE and/or LANGUAGE. The 1940 sample followed the 1980 coding scheme. Native American languages are grouped under a single code in 1940 and in 1980. The samples for the 1980-2000 censuses and the ACS also include a list of most of the responses grouped together under each code (see Codes, below). These coding schemes contain quite a few categories that the earlier samples do not. Usually these new categories were variations or dialects of a more common language, in which case they were coded as details of a general IPUMS code/category. If not, they were either given a separate general code or coded as details of a general IPUMS code/category, based on geographic clues.
Sometimes languages classified one way in one sample were classified differently in other samples. In these cases, the IPUMS usually created one larger general code for all languages included in the various categories and then used detail codes for each different grouping. For example, the code for English (01-00) includes six detail codes (Jamaican Creole, Krio, Hawaiian Pidgin, Pidgin, Gullah, and Saramaca). Because these detail categories share the same general code, users can assume that, in at least one sample, they were (or would probably have been) subsumed under that general code.
Sometimes the above process created an unwieldy coding scheme, so some small subgroups appear under different codes in different years. The following categories were affected; what is true for the 1970 sample is probably true for 1960 as well, and what is true for 1980 is always true for 1940 and probably for 1990:
| Language subgroup | Code (1970) | Code (1980) |
|---|---|---|
| Dano-Norwegian | 06-00 | 07-00 |
| Cossack/Kazakh | 19-00 | 37-03 |
| Slavonia | 19-00 | 23-00 |
| Pushton | 30-00 | 29-00 |
| Livonian/Votic | 03-00 | 35-10 |
| Uighur | 36-00 | 37-06 |
| Min | 43-10 | 43-01 |
| Jordanian, Lebanese | 58-00 | 57-00 |
| Chaldean | 57-30 | 58-10 |
| Yao | 44-00 | 63-07 |
| Chamorro | 04-00 | 55-03 |
| Sudanic | 63-06 and 63-07 |
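Users who need to track these subgroups across samples could encode the table as a lookup. This is a minimal sketch, not an IPUMS utility: the names `CODE_BY_SCHEME`, `SCHEME_FOR_YEAR`, `code_in_year`, and `general_code_changes` are hypothetical, and codes are stored as (general, detailed) integer pairs. The Sudanic row is left out because the table gives only one entry for it. The mapping of 1960 to the 1970 codes and of 1990 to the 1980 codes follows the "probably true" caveat stated above and should be verified against the codes for each sample.

```python
# (general, detailed) codes for each subgroup under the 1970 and 1980 schemes,
# transcribed from the table above. Sudanic is omitted (incomplete row).
CODE_BY_SCHEME = {
    "Dano-Norwegian": {"1970": (6, 0), "1980": (7, 0)},
    "Cossack/Kazakh": {"1970": (19, 0), "1980": (37, 3)},
    "Slavonia": {"1970": (19, 0), "1980": (23, 0)},
    "Pushton": {"1970": (30, 0), "1980": (29, 0)},
    "Livonian/Votic": {"1970": (3, 0), "1980": (35, 10)},
    "Uighur": {"1970": (36, 0), "1980": (37, 6)},
    "Min": {"1970": (43, 10), "1980": (43, 1)},
    "Jordanian, Lebanese": {"1970": (58, 0), "1980": (57, 0)},
    "Chaldean": {"1970": (57, 30), "1980": (58, 10)},
    "Yao": {"1970": (44, 0), "1980": (63, 7)},
    "Chamorro": {"1970": (4, 0), "1980": (55, 3)},
}

# Which column applies to which sample year. The 1960 and 1990 entries are
# assumptions based on the text ("probably true"), not confirmed facts.
SCHEME_FOR_YEAR = {1960: "1970", 1970: "1970", 1940: "1980", 1980: "1980", 1990: "1980"}


def code_in_year(subgroup, year):
    """Return the (general, detailed) code a subgroup falls under in a sample year."""
    return CODE_BY_SCHEME[subgroup][SCHEME_FOR_YEAR[year]]


def general_code_changes(subgroup):
    """True when the subgroup sits under different general codes in the two schemes."""
    codes = CODE_BY_SCHEME[subgroup]
    return codes["1970"][0] != codes["1980"][0]


print(code_in_year("Yao", 1970), code_in_year("Yao", 1940))
print([name for name in CODE_BY_SCHEME if not general_code_changes(name)])
```

A check like `general_code_changes` flags the subgroups for which even the general codes are not comparable between the two schemes; in this table, only Min keeps the same general code.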