IPUMS Full Count Data (1790-1950)
IPUMS USA makes full count U.S. Census microdata through 1950 freely available to researchers worldwide. The collection includes over 800 million individual-level records (1850-1940) and 7.5 million household-level records (1790-1840). The microdata are the result of longstanding collaborations between IPUMS and the nation's two largest genealogical organizations, Ancestry.com and FamilySearch, to leverage genealogical data for scientific purposes. The collection is possible because both organizations donated digitized census data on an unprecedented scale.
Person-level microdata are available for 1850-1950. Full count data for 1790-1840 are available at the household level only. In contrast to the 1850-1950 U.S. censuses, the 1790-1840 censuses named only the head of the household and attached tallied household totals to that record.
IPUMS is continually improving the quality of these datasets by removing duplicate records, improving identification of household groups, applying numeric codes to the transcribed responses, etc. These improvements to the underlying data are periodically released via IPUMS USA. The variable VERSIONHIST identifies the public-release version of a dataset. Researchers can merge data between versions of the dataset using HISTID and identify updates, corrections, or improvements that may have been applied to the data. Users can review notes about major revisions for more information about specific data versions.
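As a rough illustration of merging on HISTID, the sketch below compares two releases of the same census year using only the Python standard library. The rows, the VERSIONHIST values, and the helper name `compare_versions` are invented for the example; real extracts would be read from downloaded files, and real HISTID values look different.

```python
# Hypothetical rows from two releases of the same census year.
# HISTID and VERSIONHIST are IPUMS variables; every value here is invented.
old_release = [
    {"HISTID": "A1", "VERSIONHIST": 2, "AGE": 34, "BPL": 4000},
    {"HISTID": "B2", "VERSIONHIST": 2, "AGE": 7, "BPL": 3600},
    {"HISTID": "C3", "VERSIONHIST": 2, "AGE": 51, "BPL": 4900},
]
new_release = [
    {"HISTID": "A1", "VERSIONHIST": 3, "AGE": 34, "BPL": 4010},
    {"HISTID": "B2", "VERSIONHIST": 3, "AGE": 7, "BPL": 3600},
    {"HISTID": "D4", "VERSIONHIST": 3, "AGE": 22, "BPL": 600},
]


def compare_versions(old_rows, new_rows, variables):
    """Match records on HISTID and report differences between releases."""
    old_by_id = {row["HISTID"]: row for row in old_rows}
    new_by_id = {row["HISTID"]: row for row in new_rows}

    changed = {}
    # Records present in both releases: collect variables whose values differ.
    for histid in old_by_id.keys() & new_by_id.keys():
        diffs = {
            var: (old_by_id[histid][var], new_by_id[histid][var])
            for var in variables
            if old_by_id[histid][var] != new_by_id[histid][var]
        }
        if diffs:
            changed[histid] = diffs

    return {
        "changed": changed,
        # Records dropped from the newer release (e.g. removed duplicates).
        "only_in_old": sorted(old_by_id.keys() - new_by_id.keys()),
        # Records that appear only in the newer release.
        "only_in_new": sorted(new_by_id.keys() - old_by_id.keys()),
    }


result = compare_versions(old_release, new_release, ["AGE", "BPL"])
print(result)
```

Records that appear only in the older release are candidates for removed duplicates, while the `changed` entries show which variables were revised for a given person.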
All data are coded and anonymous and can be downloaded via the IPUMS USA extract system. Restricted versions of these datasets are also available. A list of publications and grants that have used Full Count IPUMS data can be found on the Full Count Bibliography page.
Dataset Provenance
1900-1950 Full Count Data:
The result of a collaboration with Ancestry.com.
1880 Full Count Data:
The result of a collaboration with the Church of Jesus Christ of Latter-Day Saints. These data are also available via IPUMS International.
1860-1870 Full Count Data:
The result of a collaboration with Ancestry.com.
1850 Full Count Data:
The result of a collaboration with the Church of Jesus Christ of Latter-Day Saints. These data are also available via IPUMS International.
1790-1840 Household-level Full Count Data:
The result of collaboration with, and donations from, both Ancestry.com and FamilySearch. For more information, see here.
Restricted Versions
IPUMS can also grant a limited number of licenses for access to the original restricted data, which include names and string variables. Users can also access the 1850 and 1880 datasets via IPUMS International.
Revision History
Census Year | Release Information |
---|---|
1950 | Jan 2024 - Preliminary Release (Version 1) |
1850-1880 | Oct 2023 - 19th Century Update (Versions 2/3) |
January 2024 - 1950 Full Count PRELIMINARY Release
This is the first IPUMS release of the 1950 full-count decennial census data. This data set contains microdata on over 152 million individuals in more than 46 million households. IPUMS continues to work with Ancestry to refine these data, identify and remove duplicate records, and fix transcription errors. Users should consider the status of this work while using these data, and be cognizant that this is a preliminary release only.
If you identify data issues not described below or have suggestions that would make the 1950 database more useful, we would welcome your feedback.
Users are cautioned to be aware of the following information:
- The raw transcribed data contain excess records representing duplicate persons, vacant dwellings, or enumerator notes. We removed a large number of these cases, but were fairly conservative to avoid eliminating true entries. We estimate that just under one million errant records remain in this preliminary dataset (roughly 0.65% of the file). We will focus on removing these records in our next data release.
- Categories for non-response are not always consistent with the universe definition. Because the universe-defining variables were not subjected to comprehensive editing, they were not always suitable for differentiating between values of "not applicable" and "missing". In general, blank responses are interpreted as "not in universe" regardless of what other variables may indicate.
- Values for personal income are often erroneous (INCTOT, INCWAGE, INCBUSFM, INCOTHER). Multiple values were written in the allotted space on the census forms, which sometimes confused the data capture process. Values might be expressed in single dollars or hundreds, sometimes producing orders-of-magnitude errors which we tried to correct via targeted truncation. We estimate the current total and wage income values exactly match the intent of the form roughly 70-75% of the time. The magnitude of the differences ranges widely. Wages are underestimated twice as frequently as they are overestimated. Researchers should be cautious when using these variables and should screen for outliers. We hope to perform supplemental data capture for these fields in the future.
- Educational attainment is underestimated. Data transcription had trouble distinguishing between the letters used on the census form to identify level of schooling, and some responses were truncated, losing the last digit for grade within level. The result is too many people with one year or less of schooling and too few in the higher grades. Persons who completed high school or attended college are underrepresented by approximately 15%. (Problem was identified May 2024.)
- The identification of institutions is uneven, due to a peculiarity in how these dwellings were recorded on the census forms. We depend on dwelling size and non-family relationship entries to identify group quarters. Institutions (GQ = 3) are further identified by the presence of any person with an "inmate" relationship response.
- The data capture process often incorrectly transcribed the ages of infants, resulting in a population of zero-year-olds only 54% of the expected size. The transcription frequently misinterpreted the month of birth. We flagged and edited (QAGE) cases we suspect are children under the age of 14, most of which (75%) are likely infants. This still results in an undercount of children under the age of 1, with the total equaling 88% of the published infant population count. We will focus on improving the identification of infants in the next release.
- RACE data underrepresent Japanese and Chinese persons in the population. The problem is especially acute in California, where the number of Japanese and Chinese is roughly half the number reported in the published 1950 census results. The undercount is in the range of 10% elsewhere in the contiguous 48 states outside of California. Hawaii race data match the published counts fairly well. Non-Asians are not noticeably affected by the undercount. (Problem was identified November 2024.)
- Many census questions in 1950 were asked only of sample-line persons: every fifth person on the census form (20% of the population). These sample-line persons can be identified using SLREC, and all analyses using sample-line variables should be weighted with SLWT. The final sample-line person on each page, representing 1-in-30 persons in the population, was asked a small subset of additional questions. At this time, IPUMS does not provide either an indicator variable or a separate weight for these limited sample-line individuals. We intend to incorporate proper identification and weighting for the 1-in-30 persons in a future release. Variables in this subset are MARRNO, DURMARR, and CHBORN.
- Alaska and Hawaii were not states in 1950. They each used distinct census forms and therefore lack some variables. Children-ever-born (CHBORN) is missing for Hawaii, and no sample-line questions were included for Alaska. The universe statement for each variable notes when one of these territories lacks the census item.
- Some variables are not included in this preliminary data release. These include family income, institution type, residence one year ago, and some geography variables, such as city population and metropolitan area.
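The cautions above about sample-line weighting and income outliers can be combined in a short sketch. It assumes that SLREC code 2 marks a sample-line record and that SLWT is already scaled as a usable weight; both assumptions should be checked against the variable documentation. The rows and the 10,000-dollar ceiling are invented for illustration, and real extracts may also carry special codes for missing or not-applicable values that need to be dropped first.

```python
# Assumed SLREC code for a sample-line record; confirm in the codebook.
SAMPLE_LINE = 2

# Hypothetical person records; all values are invented.
persons = [
    {"SLREC": 2, "SLWT": 5.0, "INCWAGE": 2000},
    {"SLREC": 1, "SLWT": 0.0, "INCWAGE": 0},        # not a sample-line person
    {"SLREC": 2, "SLWT": 5.0, "INCWAGE": 3000},
    {"SLREC": 2, "SLWT": 5.0, "INCWAGE": 900000},   # implausible value
]


def weighted_mean(rows, variable, max_value=None):
    """SLWT-weighted mean of a sample-line variable.

    Rows that are not sample-line records are skipped. If max_value is
    given, values above it are treated as outliers and excluded.
    """
    total = 0.0
    weight_sum = 0.0
    for row in rows:
        if row["SLREC"] != SAMPLE_LINE:
            continue
        value = row[variable]
        if max_value is not None and value > max_value:
            continue
        total += row["SLWT"] * value
        weight_sum += row["SLWT"]
    if weight_sum == 0:
        return None  # no eligible records
    return total / weight_sum


unscreened = weighted_mean(persons, "INCWAGE")
# The ceiling is an arbitrary illustrative threshold, not an IPUMS rule.
screened = weighted_mean(persons, "INCWAGE", max_value=10000)
print(unscreened, screened)
```

A single order-of-magnitude error dominates the unscreened mean here, which is why screening before estimation matters for the 1950 income variables. A fixed ceiling is only one possible screen; percentile-based rules are another option.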
October 2023 - 19th Century Full Count Files
This release constitutes a major revision of the 19th century full-count decennial census files from 1850-1880. It includes updates and improvements to household definition, geography, nativity, occupational and demographic variables, new slave-related variables, and improvements to imputation and allocation processes. In general, efforts to increase standardization and improve harmonization have resulted in minor changes throughout the data, but users should take special note of some specific and more consequential updates, listed below:
- DWSIZE: The variable DWSIZE (Dwelling Size) has been removed from the full-count files. Users looking for comparable information should use the variable FAMSIZE.
- GQ/GQTYPE/GQFUNDS: Significant cleanup and standardization of coding and universe enforcement were done to group quarters variables across all files.
- BPL/FBPL/MBPL: Previous versions of these data had inconsistent coding for persons born within certain US Territories. Wherever possible, the detailed codes for these variables have been updated to reflect the Territory status of some states. These are:
State Code | State | Territory Code | Territory |
---|---|---|---|
1600 | Idaho | 1610 | Idaho Territory |
3500 | New Mexico | 3510 | New Mexico Territory |
4000 | Oklahoma | 4010 | Indian Territory |
3800/4600 | North/South Dakota | 4610 | Dakota Territory |
4900 | Utah | 4910 | Utah Territory |
5300 | Washington | 5310 | Washington Territory |
5600 | Wyoming | 5610 | Wyoming Territory |
- CITYPOP: A change was made to CITY to identify only cities within the stated universe. This adjustment suppressed populations for some small cities that were previously available. This change also affects SIZEPL.
- REALPROP/PERSPROP: Erroneous very high values have been corrected.
- IMPREL: A small improvement to relationship imputation may result in changes to IMPREL/RELATE and ensuing dependent variables. These changes are not statistically significant, but users may see adjustments to individual records. These changes also apply to all 1/5/10% sample files where relationships are imputed rather than recorded.
- Allocation updates: Adjustments have been made to improve and refine hot-deck allocation for missing data. The new criteria exclude persons with outlier values from donor eligibility. Users should expect some changes in birthplace, occupation, and age distributions.
- Added variables, with availability:
Variable | Label | 1850 | 1860 | 1870 | 1880 |
---|---|---|---|---|---|
MCDPOP | Minor Civil Division Population | x | x | x | x |
QRPROP | Real Estate Value Editing Flag | | x | x | |
QPPROP | Personal Estate Value Editing Flag | | x | x | |
OCCDUPE | Duplicate Occupation Flag | | | x | |
SLAVEHH | Slave Inhabitants in Household | x | x | | |
SLAVEHOLDINGS | Number of Slaveholdings Associated With Individual | x | x | | |
SLAVENUM | Number of Slaves Associated With Individual | x | x | | |
SLAVEOWN | Number of Slaves Owned By Individual | | x | | |
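Users who need birthplace codes that are comparable with earlier versions of the data may want to collapse the territory codes listed for BPL/FBPL/MBPL back to their state codes. The following is a minimal sketch of one way to do that; the function name is hypothetical, and the mapping uses only the code pairs given in this release note. Dakota Territory is deliberately left unchanged because it corresponds to two state codes and cannot be resolved from the code alone.

```python
# Territory code -> state code, taken from the BPL/FBPL/MBPL list in this note.
TERRITORY_TO_STATE = {
    1610: 1600,  # Idaho Territory -> Idaho
    3510: 3500,  # New Mexico Territory -> New Mexico
    4010: 4000,  # Indian Territory -> Oklahoma
    4910: 4900,  # Utah Territory -> Utah
    5310: 5300,  # Washington Territory -> Washington
    5610: 5600,  # Wyoming Territory -> Wyoming
}

# Dakota Territory maps to either North Dakota (3800) or South Dakota (4600),
# so it is intentionally not included in the mapping above.
DAKOTA_TERRITORY = 4610


def collapse_territory(detailed_code):
    """Return the state code for a territory code; other codes pass through."""
    return TERRITORY_TO_STATE.get(detailed_code, detailed_code)


for code in (4010, 4610, 3600):
    print(code, "->", collapse_territory(code))
```

Whether to collapse at all is an analytic choice: keeping the territory codes preserves the historical status recorded in the updated files.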
Single-Year File Updates
1850
- CITY: two new codes have been added. Code 3397 is for historical Lafayette, LA (different from present-day Lafayette), and 4612 has been added within New York City for the Williamsburgh area.
- SLAVEHH, SLAVENUM, and SLAVEHOLDINGS are now available. These variables describe whether any household member was linked to a slave holding and the number of slaves and slave holdings linked to each individual.
1860
- RACE: Persons in Dakota Territory who were previously misclassified as Mulatto have been corrected to American Indian.
- STATEFIP: Households located in Oklahoma/Indian Lands Territory were mistakenly coded as "Overseas Military." This error has been corrected.
- CITY: New codes for Lockport, NY (3680), Gloucester, MA (2510), and North Providence, RI (4839) have been added.
- SLAVEHH, SLAVENUM, SLAVEHOLDINGS, and SLAVEOWN are now available. In addition to the variables added in 1850, SLAVEOWN indicates how many slaves were specifically "owned" by an individual.
- An alignment error that assigned incorrect values of OCC, PERSPROP, and REALPROP to some individuals in several counties has been fixed. These counties are:
- Dickinson, Kansas
- West Baton Rouge, Louisiana
- Itasca, Minnesota
- Washington, Nebraska
- Hyde, North Carolina
- San Augustine, Texas
- Corrections were made to REALPROP and PERSPROP for some very high values. These records are flagged with QRPROP and QPPROP, respectively.
1870
- Corrections were made to REALPROP and PERSPROP for some very high values. These records are flagged with QRPROP and QPPROP, respectively.
- OCCDUPE: A number of individuals were erroneously assigned the OCC value of the head of their household. The error has been corrected and affected individuals are flagged in the new variable OCCDUPE. Their occupations have been logically edited based on age and relationship to the household head. These persons now have occupation codes of "Other non-occupational response" (OCC=310 and OCC1950=995).
1880
- Significant revisions were made to URBAN, especially in Massachusetts and Rhode Island. Some persons who were living in urbanized fringe areas (MDSTATUS) were erroneously coded as "rural." This has been corrected.
- SICKNESS: A new code for appendicitis (912) was added.
- CITY: New codes for Austin, TX (0490) and Lynchburg, VA (3790) have been added.