Description
The concept of race has changed over the more than 150 years represented in IPUMS. Currently, the Census Bureau and others consider race to be a sociopolitical construct, not a scientific or anthropological one. Many detailed RACE categories consist of national origin groups. With the exception of the 1970-1990 Puerto Rican censuses, RACE was asked of every person in all years.
Beginning in 2000, the race question changed substantially to allow respondents to report as many races as they felt necessary to describe themselves. In earlier years, only one race response was coded. Beginning in 2020, the Census Bureau updated the questionnaire text, processing, and coding of the race and Hispanic origin questions, resulting in major changes to the distribution of race and Hispanic origin categories. As a result, users should proceed with caution when comparing RACE and HISPAN in 2019-prior samples with 2020-onward samples. Further changes building on the 2020 improvements were implemented in 2023. See the Comparability section for more details.
IPUMS offers several variables describing the answer(s) to the race question. RACE provides the full detail given by the respondent and/or released by the Census Bureau; it is not always historically compatible (see comparability discussion below). Users primarily interested in historical compatibility should consider using RACHSING. RACHSING codes race and Hispanic origin responses into a simple, historically compatible scheme that includes only federally defined race and Hispanic origin groups. Please note that RACESING, an earlier version of RACHSING, is also available on the IPUMS website.
In addition, specific combinations of major races can be discerned using the following binary (yes/no) indicators of whether a particular race group was reported: RACAMIND, RACASIAN, RACBLK, RACOTHER, RACPACIS, and RACWHT. RACNUM indicates the total number of major race groups reported for an individual. The information contained in these indicators and in RACNUM is integrated into the detailed version of RACE.
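The following sketch illustrates how the indicators relate to RACNUM by counting the major race groups flagged for one person. It makes two assumptions. First, each indicator is coded 1 = "No" and 2 = "Yes". Second, a record is a plain Python dictionary keyed by IPUMS variable names. Check the codes for each variable in your extract before relying on either assumption.

```python
# Major-race indicator variables named in the text.
INDICATORS = ["RACAMIND", "RACASIAN", "RACBLK", "RACOTHER", "RACPACIS", "RACWHT"]

# Assumption: each indicator is coded 1 = "No", 2 = "Yes".
YES = 2


def reported_groups(record: dict) -> list:
    """Names of the indicators flagged "Yes" for one person."""
    return [name for name in INDICATORS if record.get(name) == YES]


def count_major_races(record: dict) -> int:
    """Number of major race groups reported (comparable in spirit to RACNUM)."""
    return len(reported_groups(record))


person = {"RACAMIND": 1, "RACASIAN": 2, "RACBLK": 1,
          "RACOTHER": 1, "RACPACIS": 1, "RACWHT": 2}
print(reported_groups(person))    # ['RACASIAN', 'RACWHT']
print(count_major_races(person))  # 2
```

If those assumptions hold, the count should agree with RACNUM. That makes it a quick consistency check on an extract.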
Prior to 1960, the census enumerator was responsible for categorizing persons and was not specifically instructed to ask the individual his or her race. In 1970 and later years, an individual's race was reported by someone in the household or group quarters. In the 1990 U.S. census, the 2000 U.S. and Puerto Rican censuses, the ACS, and the PRCS, respondents were specifically asked what race the person "considers himself/herself" to be, although such self-description had been more or less in effect since 1960.
User Note: Race questions were not asked in the Puerto Rican censuses of 1970, 1980, and 1990. They were asked in the 1910 and 1920 Puerto Rican censuses, the 2000-2010 Puerto Rican censuses, and the PRCS.
Codes and Frequencies
Comparability
The RACE codes are comparable for 2019 and prior years, with three important exceptions: (1) the residual "other race" category is different each year, (2) there have been fluctuations in the way Hispanics are coded, and (3) multiple-race responses (allowed since 2000) have divided major race groups into different RACE codes. In addition, the level of detail has been increasing. In 2020, the Census Bureau implemented major revisions to the race and Hispanic origin questions, which significantly impacted the comparability of the 2020-onward samples with 2019-prior years. In 2023, the Census Bureau implemented more revisions to the race question, which impacted the comparability of 2023-onward samples with 2022-prior years. Impacts and explanations of these changes are described below.
In general, Codes and Frequencies for RACE provide information about which categories were not used in a particular year. The 5 percent sample of census 2000, the ACS and the PRCS samples contain less detail than the 1 percent sample of census 2000 (which is shown in the Codes and Frequencies and is used for the small and tiny IPUMS data sets). In the 5 percent sample, the ACS and the PRCS, any category representing fewer than 10,000 people was combined with another category. See 2000 Race Codes for a comparison of the race categories used in the 2000 samples, the ACS and the PRCS.
The residual category "other race, n.e.c." (general RACE code 6) contains any race not listed in the available data in a given year. In all years, certain races were specified as choices on the form and so were especially likely to be reported. For example, "Chinese" and/or "Japanese" may have absorbed other Asian responses when other choices were not available. In 1970, the form for Alaska was different from that in the rest of the United States: the categories "Aleut" and "Eskimo" were substituted for "Hawaiian" and "Korean". Hawaiians and Koreans are therefore classified as "other race" in Alaska, and Aleuts and Eskimos are "other race" outside of Alaska.
Hispanics/Latinos represent an important exception to the practice of recoding "other race" write-in responses into a listed race group. The race(s) of people of Hispanic/Latino origin have been coded in a variety of ways for two reasons: the Census Bureau does not consider Hispanic/Latino to be a race group (for more discussion see HISPAN), and "other race" is a very common response for Hispanics in recent years. In most years before 1970, the majority of Hispanics were probably classified as white by enumerators, as was specified in the enumerator instructions for 1940 and 1950. Mexicans in 1930 had their own category and thus were an exception to this rule. In 1970, "Mexican" and "Puerto Rican" write-in responses to "other race" were recoded to "white". In 1980, the Census Bureau noted whether an "other race" response indicated Hispanic origin but did not recode Hispanic "other race" responses. Other details of the "other race" write-in responses have not been included in the census samples.
The mixed-race population poses even more serious problems of historical comparability because its members have been enumerated and/or coded inconsistently over the years (see enumerator instructions for 1930-1950). Comparability issues are particularly acute in years when multiple race responses are allowed (2000, ACS and PRCS). Because of multiple-race responses, all major race groups are divided between two general RACE codes in these years. For example, single-race Asians have a general RACE code of 4, while multiple-race Asians have a general RACE code of 7. Because researchers' needs vary, IPUMS has taken two approaches to this issue. First, the RACE variable prioritizes full information about responses to the race question, including national origin or tribal affiliation. The accompanying race indicator variables (RACAMIND, RACASIAN, RACBLK, RACOTHER, RACPACIS, RACWHT, and RACNUM) provide a simple way to identify everyone who reported a particular major race group. Second, the RACHSING variable prioritizes historical compatibility by providing bridged estimates of the likely single-race response of each multiple-race person in 2000, the ACS, and the PRCS (also see PREDWHT, PREDBLK, PREDAI, PREDAPI, and PREDHISP). Thus, RACHSING does not have a multiple-race category.
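The first approach yields two different counts for any major race group. One counts people who reported the group alone. The other counts people who reported it alone or in combination with other groups. The following is a minimal sketch using RACASIAN, RACNUM, and the person weight PERWT. It assumes the indicators are coded 1 = "No" and 2 = "Yes" and that records are dictionaries. The function name is hypothetical.

```python
from collections import Counter

# Assumption: IPUMS race indicators are coded 1 = "No", 2 = "Yes".
YES = 2


def alone_and_in_combination(records, indicator="RACASIAN"):
    """Weighted totals for one major race group.

    "alone" counts people who reported only that group (RACNUM == 1);
    "alone_or_in_combination" counts everyone who reported it at all.
    """
    totals = Counter()
    for r in records:
        if r.get(indicator) != YES:
            continue
        totals["alone_or_in_combination"] += r["PERWT"]
        if r["RACNUM"] == 1:
            totals["alone"] += r["PERWT"]
    return dict(totals)


people = [
    {"RACASIAN": 2, "RACNUM": 1, "PERWT": 10},  # Asian only
    {"RACASIAN": 2, "RACNUM": 2, "PERWT": 5},   # Asian and another race
    {"RACASIAN": 1, "RACNUM": 1, "PERWT": 20},  # not Asian
]
print(alone_and_in_combination(people))
# {'alone_or_in_combination': 15, 'alone': 10}
```

Neither count reproduces a pre-2000 single-race tabulation. RACHSING remains the variable intended for that kind of historical comparison.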
In 2021, the Census Bureau changed the label for people originating from Guam from 'Guamanian or Chamorro' to simply 'Chamorro.'
In 2020, the Census Bureau updated the questionnaire text, processing, and coding of the race and Hispanic origin questions, resulting in major changes to the frequency distributions of the race and Hispanic origin categories. These updates were first implemented in the 2020 decennial census and were then implemented in the 2020-onward ACS and PRCS samples. These changes reflect a major revision to the race and Hispanic origin questions, and as a result, users should proceed with caution when comparing RACE and HISPAN in 2019-prior samples with 2020-onward samples.
The race question was updated in several ways, including new write-in response areas for the "White" and "Black or African Am." racial categories and the addition of six example groups for the "White," "Black or African American," and "American Indian or Alaska Native" racial categories. The term "Negro" was removed from the question by changing the category from "Black, African Am., or Negro" to "Black or African Am." Response processing and coding for the race and Hispanic origin (HISPAN) questions were also updated in three ways. The number of coded write-in responses recorded in each write-in area increased from two to six. The number of characters coded in each write-in area increased from 30 to 200. A single code list is now used for both the Hispanic origin and race questions rather than separate code lists.
For more detail about these changes, see the Census Bureau's blog post on changes to the questionnaire text, processing, and coding as well as compliance with standards set by the U.S. Office of Management and Budget.
In 2023, the Census Bureau implemented further changes to the race question that build on the 2020 improvements. These updates create the following comparability issues:
- Additional detail for Sioux tribes.
Three specific Sioux tribes are named in 2023 data; however, they are grouped under the Sioux code (323) in RACE. Specific Sioux tribes can be identified through the variable TRIBE. Note that for 2023-onward RACE data, the code for Sioux does not include all "Sioux alone" codes, but rather a subset of Sioux tribes that were identified in the data.
- New codes for Latin American tribes.
More detailed codes, including Aztec, Inca, Maya, Mixtec, Taino, Tarasco (Purepecha), and All other Latin American Indian alone (364), impact the comparability of the general code for American Indian or Alaska Native (3) between 2023-onward and 2022-prior years. These detailed codes are also available in TRIBE.
- New Asian codes.
The addition of codes for Mien, Sikh, Kazakh, and Uzbek, as well as new Micronesian codes including Chuukese, Guamanian, and Marshallese, impacts the comparability of the general code for Other Asian or Pacific Islander (6) between 2023-onward and 2022-prior years.
- Note:
Some detailed codes changed in 2023 to make room for new RACE codes. The code for "Fijian" was changed from 690 to 695, the code for "Other Melanesian" was changed from 691 to 696, and the code for "One or more Melanesian races" was changed from 692 to 697.
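The renumbering was done to make room for new codes, so codes 690-692 may not carry their old meaning in 2023-onward data. One way to pool years is to move pre-2023 detailed codes forward to the 2023 numbering rather than moving newer data back. The sketch below takes that approach. The function name is hypothetical. It assumes the detailed RACE code is available as an integer, here taken to be the detailed variable RACED.

```python
# Detailed RACE codes renumbered in 2023 (old code -> new code).
RENUMBERED_2023 = {
    690: 695,  # Fijian
    691: 696,  # Other Melanesian
    692: 697,  # One or more Melanesian races
}


def to_2023_scheme(year: int, race_detailed: int) -> int:
    """Express a detailed RACE code in the 2023-onward numbering."""
    if year < 2023:
        return RENUMBERED_2023.get(race_detailed, race_detailed)
    return race_detailed  # already in the 2023 numbering


print(to_2023_scheme(2022, 690))  # 695
print(to_2023_scheme(2023, 690))  # 690
```

This aligns only the three renumbered codes. The other 2023 changes listed above still affect comparability.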
In 1910, the Census Bureau began giving instructions for enumerating people of other races besides those listed. Occasionally, enumerators wrote in other race responses in years before 1910. When possible, IPUMS has used the RACE variable to retain the extra information provided by the enumerator. The Census Bureau has not been consistent in its handling of information about the race of individuals who do not fit into the listed categories. Respondents who marked "other race" and wrote in a response (such as "Jamaican") that seemed to indicate membership in a listed race group (such as "black") were recoded into that group by the Census Bureau in 1940 through 1990. In 2000 and later, "other race" responses were not recoded, regardless of the written-in response.
Universe
- 1850-2010: All persons.
- ACS, PRCS: All persons.
Availability
United States
- 2023: All samples
- 2022: All samples
- 2021: All samples
- 2020: All samples
- 2019: All samples
- 2018: All samples
- 2017: All samples
- 2016: All samples
- 2015: All samples
- 2014: All samples
- 2013: All samples
- 2012: All samples
- 2011: All samples
- 2010: All samples
- 2009: All samples
- 2008: All samples
- 2007: All samples
- 2006: All samples
- 2005: All samples
- 2004: All samples
- 2003: All samples
- 2002: All samples
- 2001: All samples
- 2000: All samples
- 1990: All samples
- 1980: All samples
- 1970: All samples
- 1960: All samples
- 1950: All samples
- 1940: All samples
- 1930: All samples
- 1920: All samples
- 1910: All samples
- 1900: All samples
- 1880: All samples
- 1870: All samples
- 1860: All samples
- 1850: All samples
Puerto Rico
- 2023: All samples
- 2022: All samples
- 2021: All samples
- 2020: All samples
- 2019: All samples
- 2018: All samples
- 2017: All samples
- 2016: All samples
- 2015: All samples
- 2014: All samples
- 2013: All samples
- 2012: All samples
- 2011: All samples
- 2010: All samples
- 2009: All samples
- 2008: All samples
- 2007: All samples
- 2006: All samples
- 2005: All samples
- 2000: All samples
- 1990: --
- 1980: --
- 1970: --
- 1930: All samples
- 1920: All samples
- 1910: All samples
Editing Procedure
HISPAN (Hispanic origin) and RACE (Race)
ACS Years: 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015
ACS editing procedure:
There are two questions on race and ethnicity in the ACS. The first asks respondents whether they are Spanish/Hispanic/Latino. It has a "No" checkbox and three subgroups identified in separate checkboxes: (1) Mexican, Mexican-American, Chicano; (2) Puerto Rican; and (3) Cuban. A final checkbox for "other Spanish/Hispanic/Latino" comes with a box to write in a specific group.
The question on race has multiple checkboxes for different races, and four of these checkboxes also have a write-in box. The "American Indian or Alaska Native" checkbox includes a write-in box for the enrolled or principal tribe. The "Other Asian," "Other Pacific Islander," and "Other race" checkboxes likewise include a write-in box for a more specific race.
IPUMS reports both detailed codes for RACE and broader codes. The edits below refer to the detailed codes. The flag variables (QHISPAN and QRACE) will indicate when the values were edited or allocated.
Donald Duck cases
If a respondent checks all race boxes (a "Donald Duck" case), RACE will be replaced with a missing value and later allocated. If all of the Hispanic checkboxes are selected, they will be blanked.
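In effect, the rule is a subset test. The sketch below is illustrative only and is not the Census Bureau's edit program. The checkbox labels follow the Hispanic-origin question described above, and None stands for a blanked value. For RACE, that blank would later be filled by allocation.

```python
from typing import Optional

# Hispanic-origin checkboxes as described for the ACS question.
HISPANIC_BOXES = frozenset({
    "No",
    "Mexican, Mexican-American, Chicano",
    "Puerto Rican",
    "Cuban",
    "Other Spanish/Hispanic/Latino",
})


def edit_all_boxes_checked(checked: set, all_boxes: frozenset) -> Optional[set]:
    """Return None (blank/missing) when every box on the question was checked."""
    if checked >= all_boxes:
        return None
    return checked


print(edit_all_boxes_checked(set(HISPANIC_BOXES), HISPANIC_BOXES))  # None
print(edit_all_boxes_checked({"Cuban"}, HISPANIC_BOXES))            # {'Cuban'}
```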
Geographically dependent editing
Some edits of race and Hispanic origin depend on where the respondent currently lives. For example, if a person lives in South Carolina and reports their race as "Turk," it will be replaced with "White." If a person reports their race as "Wales" and does not live in Alaska, it will be replaced with "White." If a person in Humboldt County, CA reports being "Trinidad," their race will be replaced with "American Indian." In 2010 and later, if a person in Humboldt County, CA reports being "Tobago," their race will be replaced with "American Indian."
If a person reports their race as "Moor" and American Indian or Alaska Native and lives in Cumberland County, New Jersey, RACE will be replaced with "Moor" as an American Indian tribe. If they do not live in Delaware or New Jersey, RACE will be replaced with "Black."
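These geographic rules can be expressed as a small lookup function. The sketch is hypothetical. It assumes geography arrives as a two-letter state code plus a county name. The text does not say whether the "Black" rule for "Moor" also requires an American Indian or Alaska Native report, so the sketch applies that rule to any "Moor" write-in outside Delaware and New Jersey.

```python
from typing import Optional


def geographic_race_edit(write_in: str, state: str, county: str, year: int,
                         reported_aian: bool = False) -> Optional[str]:
    """Return the replacement race for a write-in, or None if no rule applies."""
    w = write_in.strip().lower()
    if w == "turk" and state == "SC":
        return "White"
    if w == "wales" and state != "AK":
        return "White"
    if state == "CA" and county == "Humboldt":
        if w == "trinidad" or (w == "tobago" and year >= 2010):
            return "American Indian"
    if w == "moor":
        if reported_aian and state == "NJ" and county == "Cumberland":
            return "Moor (American Indian tribe)"
        # Assumption: applied whether or not AIAN was also reported.
        if state not in ("DE", "NJ"):
            return "Black"
    return None


print(geographic_race_edit("Tobago", "CA", "Humboldt", 2009))  # None
print(geographic_race_edit("Tobago", "CA", "Humboldt", 2010))  # American Indian
```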
American Indian
If a person reports being American Indian, also writes in Cajun or Wesort in one of the write-in boxes, and has none of the "Other race," "Other Asian," or "Other Pacific Islander" checkboxes checked, RACE will be replaced with just "American Indian."
If a person reports being American Indian and Mexican, RACE will be replaced with "Mexican American Indian."
If a person selects "American Indian" or "Indian" but offers no specific tribe or ethnicity, other household members are used to distinguish between Asian Indian and American Indian. If no other members of the household are American Indian, and all other members of the household are Asian, that person's RACE will be replaced with "Asian Indian." If the person reports being "Indian," no other members of the household are Asian Indian, and all other members of the household are American Indian, that person's RACE will be replaced with "American Indian." If the person reports being "American Indian," no other members of the household are either Asian Indian or American Indian, and all other members of the household are Indian, that person's RACE will be replaced with "Indian."
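The first two household rules can be sketched as follows. The sketch makes three assumptions. Each other household member is summarized by a single race label. The set of Asian labels is hypothetical. The function returns None when no rule applies, including one-person households, which the text does not address. The third rule is not covered.

```python
from typing import List, Optional

# Hypothetical labels treated as "Asian" for this illustration.
ASIAN_LABELS = {"Asian Indian", "Chinese", "Filipino", "Other Asian"}


def resolve_indian(reported: str, others: List[str]) -> Optional[str]:
    """Resolve an "Indian"/"American Indian" report with no tribe given,
    using the races of the other household members."""
    if not others:
        return None  # no household evidence to use
    # Rule 1: no other American Indians, and everyone else is Asian.
    if "American Indian" not in others and all(o in ASIAN_LABELS for o in others):
        return "Asian Indian"
    # Rule 2: "Indian", no other Asian Indians, everyone else American Indian.
    if (reported == "Indian" and "Asian Indian" not in others
            and all(o == "American Indian" for o in others)):
        return "American Indian"
    return None


print(resolve_indian("Indian", ["Asian Indian", "Chinese"]))  # Asian Indian
print(resolve_indian("Indian", ["American Indian"]))          # American Indian
```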
If a person reports their race as "Half-breed," it will be blanked.
General responses will be blanked when a detailed response is available
If there are more detailed responses given in the write-in boxes, the checkbox for "Other race" will be blanked. For non-relatives (RELATE), the checkboxes for "American Indian or Alaska Native," "Other Asian," and "Other Pacific Islander" will be blanked when there is a more detailed write-in value given.
When a checkbox appears to contradict the write-in value, the checkbox will be blanked. For example, if the "White" checkbox is selected but a black ethnicity is written in the write-in box, the "White" checkbox will be blanked. Likewise, if the "Black" checkbox is selected but a white ethnicity is written in, the "Black" checkbox will be blanked.
If the write-in value is more detailed than the checkbox, the checkbox will also be blanked (e.g., the "White" checkbox is selected and a specific white ethnicity is reported in the write-in box). This also applies to the HISPAN variable. For example, if a person selects the "Mexican" checkbox and then reports a more specific Mexican value in the write-in box, the checkbox will be blanked and the more specific value will be used.
If a person reports two write-in values and one is a more general version of the other, the less specific value will be blanked. For example, if a person reports two American Indian or Alaska Native write-in values and one is specific while the other is not, the less specific value will be blanked. This also applies to the HISPAN variable. For example, if a person writes in a specific Hispanic ethnicity and also a general version of the same ethnicity, the general version will be blanked.
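The specificity rules in the last two paragraphs amount to dropping any response that is an ancestor of another response in a code hierarchy. The hierarchy below is a hypothetical fragment for illustration only, since the Census Bureau's code lists are far larger. The sketch does not cover the contradiction rule for checkboxes.

```python
from typing import Dict, List, Set

# Hypothetical fragment of a code hierarchy: detailed response -> more general response.
PARENT: Dict[str, str] = {
    "Irish": "White",
    "Navajo": "American Indian or Alaska Native",
}


def ancestors(label: str) -> Set[str]:
    """Every more general label above `label` in the hierarchy."""
    found = set()
    while label in PARENT:
        label = PARENT[label]
        found.add(label)
    return found


def drop_less_specific(responses: List[str]) -> List[str]:
    """Blank any response that is a more general version of another response."""
    general = set()
    for r in responses:
        general |= ancestors(r)
    return [r for r in responses if r not in general]


print(drop_less_specific(["White", "Irish"]))  # ['Irish']
```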
If a person includes multiple write-in codes that are all white ethnicities, RACE will be replaced with "Multiple white." If a person includes multiple write-in codes that are all black ethnicities, RACE will be replaced with "Multiple black." If a person includes multiple write-in codes that are all other races, RACE will be replaced with "Multiple other." If a person has multiple checkboxes and a "Multiple..." value, the "Multiple..." value will be removed.
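The two "Multiple..." rules can be sketched as below. The sketch assumes each write-in has already been classified into a major group ("white," "black," or "other"). The labels and function names are hypothetical.

```python
from typing import List, Optional

# Hypothetical labels for the collapsed values.
MULTIPLE_LABEL = {"white": "Multiple white", "black": "Multiple black",
                  "other": "Multiple other"}


def collapse_write_ins(groups: List[str]) -> Optional[str]:
    """Collapse several write-ins from one major group into a "Multiple ..." value.

    `groups` holds the major group of each write-in code; None means no rule applies.
    """
    if len(groups) > 1 and len(set(groups)) == 1 and groups[0] in MULTIPLE_LABEL:
        return MULTIPLE_LABEL[groups[0]]
    return None


def drop_multiple_if_several_boxes(checkboxes: List[str], values: List[str]) -> List[str]:
    """Remove any "Multiple ..." value when more than one checkbox is marked."""
    if len(checkboxes) > 1:
        return [v for v in values if not v.startswith("Multiple")]
    return values


print(collapse_write_ins(["white", "white"]))  # Multiple white
```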
Too many values
If a person has more than 8 races selected, RACE will be replaced with missing and later allocated.
If a person reports not being Hispanic and writes in a value in the write-in box that is also not Hispanic, the write-in value will be blanked.
If a person reports multiple Hispanic values, HISPAN will be replaced with "Multiple Hispanic." If there are mixed Hispanic and non-Hispanic values and the person's surname is Hispanic, HISPAN will be replaced with "Mixed Hispanic." If there are mixed Hispanic and non-Hispanic values and the person's surname is not Hispanic, HISPAN will be randomly replaced with either "Mixed Hispanic" or "Non-Hispanic."
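The following sketches the HISPAN rules for several reported values. Each value is represented only by whether it is Hispanic, and surname status is treated as a simple yes/no. The text does not give probabilities for the random replacement, so the sketch assumes equal odds. The random generator is passed in so results are reproducible.

```python
import random
from typing import List, Optional


def edit_multiple_hispanic(is_hispanic: List[bool], hispanic_surname: bool,
                           rng: random.Random) -> Optional[str]:
    """Apply the multiple/mixed Hispanic-origin rules; None means no rule applies."""
    if len(is_hispanic) < 2:
        return None
    if all(is_hispanic):
        return "Multiple Hispanic"
    if any(is_hispanic):  # a mix of Hispanic and non-Hispanic values
        if hispanic_surname:
            return "Mixed Hispanic"
        # Assumption: 50/50 odds; the actual probabilities are not documented here.
        return rng.choice(["Mixed Hispanic", "Non-Hispanic"])
    return None


rng = random.Random(0)
print(edit_multiple_hispanic([True, True], False, rng))  # Multiple Hispanic
print(edit_multiple_hispanic([True, False], True, rng))  # Mixed Hispanic
```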
Starting in 2010, among those with more than 8 race codes, the reported races will be prioritized and certain race groups will be kept. For example, suppose a person reports more than 8 races and more than 3 of them are American Indian or Alaska Native races. In that case, two detailed race groups will be prioritized and kept, and the next will be replaced with "Multiple American Indian and Alaska Native responses." The remaining American Indian or Alaska Native values will be blanked. A similar process occurs for other racial groups until everyone has 8 or fewer race codes.
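The trimming step can be sketched as below. Two choices are assumptions because the text does not document them: the order in which groups are visited, and which two detailed codes are kept (the sketch keeps the first reported). The placeholder label is a simplified stand-in for the Census Bureau's wording.

```python
MAX_CODES = 8  # more than this many race codes triggers trimming


def trim_race_codes(codes):
    """Trim a list of (major_group, detailed_code) pairs to at most MAX_CODES.

    Groups are visited in the order first reported (an assumption). A group with
    more than three codes keeps its first two plus one "Multiple ... responses"
    placeholder; its remaining codes are dropped.
    """
    by_group = {}
    for group, code in codes:
        by_group.setdefault(group, []).append(code)

    def total():
        return sum(len(members) for members in by_group.values())

    if total() <= MAX_CODES:
        return list(codes)
    for group in list(by_group):
        if total() <= MAX_CODES:
            break
        members = by_group[group]
        if len(members) > 3:
            by_group[group] = members[:2] + [f"Multiple {group} responses"]
    return [(g, c) for g, members in by_group.items() for c in members]


codes = [("AIAN", f"t{i}") for i in range(5)] + [
    ("White", "w1"), ("Black", "b1"), ("Asian", "a1"), ("Asian", "a2")]
print(len(trim_race_codes(codes)))  # 7
```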
Missing or inconsistent Hispanic Origin and Race
If a person is missing a value for HISPAN but has a detailed value for RACE that indicates a Hispanic origin, HISPAN will take on the detailed race value. For example, if a person reports their race as "Argentinian" and has a missing value for Hispanic origin, HISPAN will be replaced with "Argentinian."
Starting in 2010, if a person reports their detailed race (RACE) as a Hispanic group, but selects "Non-Hispanic" for HISPAN, HISPAN will be replaced with the value from RACE. For example, if a person reports their race as "Argentinian" but their Hispanic origin as "Non-Hispanic," HISPAN will be replaced with "Argentinian."
If a person has a value for Hispanic origin but not for race, RACE will take the value of HISPAN. The exception is when the reported Hispanic-origin value is a white ethnicity, in which case RACE will be replaced with "White." This covers many values, for example "Portuguese," "Azorean," "Andalusian," and "Asturian."
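The three consistency rules above can be sketched together. The sets of Hispanic groups and white ethnicities are small hypothetical subsets, and None marks a missing value. The sketch does not copy a "Non-Hispanic" origin into RACE. That is an assumption, since the text only discusses Hispanic values.

```python
from typing import Optional, Tuple

# Hypothetical subsets; the real code lists are far longer.
HISPANIC_GROUPS = {"Argentinian", "Mexican", "Cuban"}
WHITE_VALUES_IN_HISPAN = {"Portuguese", "Azorean", "Andalusian", "Asturian"}


def cross_fill(race: Optional[str], hispan: Optional[str],
               year: int) -> Tuple[Optional[str], Optional[str]]:
    """Apply the RACE/HISPAN consistency edits; None marks a missing value."""
    if race in HISPANIC_GROUPS:
        # Missing HISPAN takes the Hispanic race value.
        if hispan is None:
            hispan = race
        # From 2010 on, a Hispanic race value overrides "Non-Hispanic".
        elif hispan == "Non-Hispanic" and year >= 2010:
            hispan = race
    # Missing RACE takes the HISPAN value, or "White" for white ethnicities.
    # Assumption: a "Non-Hispanic" origin is not copied into RACE.
    if race is None and hispan not in (None, "Non-Hispanic"):
        race = "White" if hispan in WHITE_VALUES_IN_HISPAN else hispan
    return race, hispan


print(cross_fill(None, "Azorean", 2012))  # ('White', 'Azorean')
```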
If, after the above edits, RACE or HISPAN is still missing, the missing value will be replaced with the value from another household member. If RACE is missing but Hispanic origin is not, RACE will be replaced with the value of another household member who has the same value for HISPAN. If HISPAN is missing but RACE is not, HISPAN will be replaced with the value of another household member who has the same value for RACE.
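The household fallback can be sketched for records stored as dictionaries, with None marking a missing value. When several members match, the sketch takes the first one. This is an assumption, because the text does not say how the donor member is chosen.

```python
def fill_from_household(person: dict, household: list) -> dict:
    """Fill a missing RACE or HISPAN from another household member.

    A missing RACE is copied from a member with the same HISPAN, and a missing
    HISPAN from a member with the same RACE. The first matching member is used.
    """
    filled = dict(person)
    for target, match_on in (("RACE", "HISPAN"), ("HISPAN", "RACE")):
        if filled[target] is not None or filled[match_on] is None:
            continue  # nothing to fill, or nothing to match on
        for other in household:
            if other is person:
                continue  # skip the person being edited
            if other[match_on] == filled[match_on] and other[target] is not None:
                filled[target] = other[target]
                break
    return filled


p = {"RACE": None, "HISPAN": "Mexican"}
hh = [p, {"RACE": "White", "HISPAN": "Mexican"},
      {"RACE": "Black", "HISPAN": "Non-Hispanic"}]
print(fill_from_household(p, hh))  # {'RACE': 'White', 'HISPAN': 'Mexican'}
```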
Write-in value inconsistent with checkbox
If a person selects "American Indian or Alaska Native," "Other Asian," or "Other Pacific Islander" but has a write-in value that is inconsistent with the checkbox, other household members will be used to infer the correct RACE. If no relative in the household has a value for RACE that is consistent with the checkbox, it will be blanked.
Allocation of missing values
For those who still have a missing value for RACE and HISPAN after the above edits, the values will be allocated. For the reference person (RELATE), the allocated values will be drawn from another reference person with a similar Spanish-surname status (whether or not the surname is Spanish) and a similar AGE. If the surname is missing or is not clearly Hispanic or non-Hispanic, the values will be drawn from another reference person with a similar age.
If a person is missing RACE but has a value for HISPAN, the allocated value of RACE will be drawn from someone with a similar value for HISPAN and AGE.
If a person is missing HISPAN but has a value for RACE, the allocated value of HISPAN will be drawn from someone with a similar Spanish-surname status, RACE, and AGE. If the person is Filipino or American Indian, the value will be drawn from another person with a similar race and age. If the surname is missing or is not clearly Hispanic or non-Hispanic, the value will likewise be drawn from another person with a similar race and age.
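Allocation of this kind is commonly implemented as a hot deck. The sketch below substitutes a simple sequential hot deck for the rule on missing RACE when HISPAN is present. Two parameters are assumptions rather than documented Census Bureau settings: 10-year age bands stand in for "similar age," and records are processed in list order.

```python
def age_band(age: int) -> int:
    """Assumption: 'similar age' approximated by 10-year bands."""
    return age // 10


def allocate_race(records: list) -> list:
    """Sequential hot deck for RACE within cells of (HISPAN, age band).

    Records are processed in order. Each record with a known RACE becomes the
    current donor for its cell. A record missing RACE (but with HISPAN present)
    takes the most recent donor's RACE, or stays missing if the cell has no donor yet.
    """
    donors = {}
    allocated = []
    for rec in records:
        rec = dict(rec)  # copy so the input is not modified
        cell = (rec["HISPAN"], age_band(rec["AGE"]))
        if rec["RACE"] is not None:
            donors[cell] = rec["RACE"]
        elif rec["HISPAN"] is not None:
            rec["RACE"] = donors.get(cell)
        allocated.append(rec)
    return allocated


recs = [
    {"RACE": "White", "HISPAN": "Mexican", "AGE": 34},
    {"RACE": None, "HISPAN": "Mexican", "AGE": 38},  # same cell as the first record
    {"RACE": None, "HISPAN": "Cuban", "AGE": 30},    # no donor in this cell yet
]
print([r["RACE"] for r in allocate_race(recs)])  # ['White', 'White', None]
```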
If a person has only "Some other race" as their race, but another family member with the same value for HISPAN has a more detailed value, that family member's value will replace "Some other race."