IPUMS Full Count Data (1790-1950)

IPUMS USA makes full count U.S. Census microdata through 1950 freely available to researchers worldwide. This dataset includes over 800 million individual-level records (1850-1950) and 7.5 million household-level records (1790-1840). The microdata represent the fruition of longstanding collaborations between IPUMS and the nation's two largest genealogical organizations, Ancestry.com and FamilySearch, to leverage genealogical data for scientific purposes. This collection is possible because both Ancestry.com and FamilySearch donated digitized census data on an unprecedented scale.

Person-level microdata are available from 1850-1950. Full count data for 1790-1840 are available at the household-level. In contrast to the 1850-1950 U.S. censuses, the 1790-1840 historical censuses named only the head of the household and tallied household totals attached to that record.

IPUMS is continually improving the quality of these datasets by removing duplicate records, improving identification of household groups, applying numeric codes to the transcribed responses, etc. These improvements to the underlying data are periodically released via IPUMS USA. The variable VERSIONHIST identifies the public-release version of a dataset. Researchers can merge data between versions of the dataset using HISTID and identify updates, corrections, or improvements that may have been applied to the data. Users can review notes about major revisions for more information about specific data versions.
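As an illustration of the version comparison described above, here is a minimal sketch in Python. It assumes two extracts of the same census year have already been loaded as lists of dictionaries keyed by variable name. The function name `diff_versions` and the sample records are hypothetical; only HISTID and VERSIONHIST come from the text.

```python
def diff_versions(old_rows, new_rows, variables):
    """Compare two releases of the same dataset, matching records on HISTID.

    Returns (changed, removed, added):
      changed - {HISTID: {variable: (old_value, new_value)}}
      removed - HISTIDs present only in the old release
      added   - HISTIDs present only in the new release
    """
    old = {row["HISTID"]: row for row in old_rows}
    new = {row["HISTID"]: row for row in new_rows}

    changed = {}
    for histid in old.keys() & new.keys():
        diffs = {
            var: (old[histid][var], new[histid][var])
            for var in variables
            if old[histid][var] != new[histid][var]
        }
        if diffs:
            changed[histid] = diffs

    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    return changed, removed, added


# Hypothetical records; real HISTID values are longer identifier strings.
old_release = [
    {"HISTID": "A1", "VERSIONHIST": 1, "AGE": 34, "BPL": 11},
    {"HISTID": "A2", "VERSIONHIST": 1, "AGE": 7, "BPL": 53},
    {"HISTID": "A3", "VERSIONHIST": 1, "AGE": 61, "BPL": 36},
]
new_release = [
    {"HISTID": "A1", "VERSIONHIST": 2, "AGE": 34, "BPL": 53},
    {"HISTID": "A2", "VERSIONHIST": 2, "AGE": 7, "BPL": 53},
]

changed, removed, added = diff_versions(old_release, new_release, ["AGE", "BPL"])
print(changed)  # {'A1': {'BPL': (11, 53)}}
print(removed)  # ['A3']
```

Records that appear only in the old release may correspond to removed duplicates. For files of full count size the same matching logic would normally be run in a data-frame or database tool rather than in plain dictionaries.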

All data are coded and anonymous and can be downloaded via the IPUMS USA extract system. Restricted versions of these datasets are also available. A list of publications and grants that have used Full Count IPUMS data can be found on the Full Count Bibliography page.

Dataset Provenance

1900-1950 Full Count Data:

The result of a collaboration with Ancestry.com.

1880 Full Count Data:

The result of a collaboration with the Church of Jesus Christ of Latter-Day Saints. These data are also available via IPUMS International.

1860-1870 Full Count Data:

The result of a collaboration with Ancestry.com.

1850 Full Count Data:

The result of a collaboration with the Church of Jesus Christ of Latter-Day Saints. These data are also available via IPUMS International.

1790-1840 Household-level Full Count Data:

The result of a collaboration and donations from both Ancestry.com and FamilySearch. See more information on the 1790-1840 household-level full count data.

Restricted Versions

IPUMS also has a limited set of licenses to grant for access to the original restricted data with names and string variables. Users can also access the 1850 and 1880 datasets via IPUMS International.

Revision History

Census Year	Release Information
1950	June 2026 - Improved Income and Education Variables (Version 1)
1850-1950	July 2025 - MLP Version 2.0
1940	Jan 2025 - 1940 Update (Version 2)
1950	Jan 2025 - Updates and Corrections (Version 1)
1950	Jan 2024 - Preliminary Release (Version 1)
1850-1880	Oct 2023 - 19th Century Update (Versions 2/3)

June 2026 - Improved Income and Education Variables in 1950

INCTOT, EDUC, and HIGRADE have been updated for the 1950 full count data. Improvements to these variables are the result of new data capture by our partners at Ancestry and further review by IPUMS staff. For this version, staff at Ancestry bypassed the optical character recognition process and hand-entered total income amounts. Education variables were reviewed by hand and corrected where necessary. IPUMS staff audited these changes by making statistical comparisons to the Census Bureau's public use 1% sample and examining the value distributions for each variable.

Improvements to EDUC and HIGRADE address the underestimation of educational attainment from the January 2024 release of 1950. These new variables largely correct the underestimation of secondary and tertiary attainment in the previous version and bring the variable distributions closer to those of the 1% sample. This version still slightly overstates the population with no schooling, and slightly understates the completion of secondary education (particularly Grade 12), compared to the 1% sample.

Both variables show changes to 2.12% of the population when comparing the detailed versions of the variables (EDUCD and HIGRADED).

INCTOT is vastly improved from the previous version. Median income has risen from $1550 in the preliminary data to $1850 in this release, and income distribution is highly consistent with the 1% sample. Changes to INCTOT affect approximately 6.44% of the total population.

At this time, corrections have only been applied to INCTOT and not the three component income variables. Users may find discrepancies between INCTOT and the summed values of INCWAGE, INCBUSFM and INCOTHER. We encourage users to use INCTOT wherever possible and exercise caution if they require the component income variables in their analyses. Fixes to these other income variables are planned for a future release.
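To see how often INCTOT and its components disagree in a given extract, a check along the following lines may help. This is a minimal sketch that assumes not-applicable and missing codes have already been recoded to `None` (consult the variable descriptions for the actual codes). The function name `income_discrepancy` and the sample values are hypothetical.

```python
COMPONENTS = ("INCWAGE", "INCBUSFM", "INCOTHER")


def income_discrepancy(row):
    """Return INCTOT minus the summed components, or None if any value is missing."""
    names = ("INCTOT",) + COMPONENTS
    if any(row.get(name) is None for name in names):
        return None
    return row["INCTOT"] - sum(row[name] for name in COMPONENTS)


# Hypothetical persons; missing values are assumed to be recoded to None.
people = [
    {"INCTOT": 1850, "INCWAGE": 1850, "INCBUSFM": 0, "INCOTHER": 0},
    {"INCTOT": 2400, "INCWAGE": 240, "INCBUSFM": 0, "INCOTHER": 0},
    {"INCTOT": None, "INCWAGE": None, "INCBUSFM": None, "INCOTHER": None},
]

gaps = [income_discrepancy(person) for person in people]
print(gaps)  # [0, 2160, None]
```

A nonzero gap marks a record where the corrected total and the uncorrected components no longer agree. The second record shows the kind of order-of-magnitude difference that could arise if a component was captured with a dropped digit.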

Users should also note that there have been no improvements to the INCTOT universe in this version. Data may still exist for non-sample-line persons and children under the age of 14, or may be missing for in-universe respondents. Users should decide whether to impose the stated universe on the data as part of their analyses.
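Users who choose to impose the universe themselves could do so along these lines. This is a minimal sketch that treats the INCTOT universe as sample-line persons aged 14 and over, as implied by the paragraph above. The SLREC code used to identify sample-line persons is an assumption and should be checked against the codebook; the function names and sample records are hypothetical.

```python
SAMPLE_LINE_CODE = 2  # assumption: verify the SLREC codes in your extract's codebook
MIN_AGE = 14          # minimum age implied by the universe described in the text


def in_inctot_universe(row, sample_line_code=SAMPLE_LINE_CODE):
    """True if the person is a sample-line respondent aged 14 or over."""
    return row["SLREC"] == sample_line_code and row["AGE"] >= MIN_AGE


def impose_universe(rows, na_value=None):
    """Return copies of rows with INCTOT blanked for out-of-universe persons."""
    result = []
    for row in rows:
        fixed = dict(row)  # copy so the original records are left untouched
        if not in_inctot_universe(row):
            fixed["INCTOT"] = na_value
        result.append(fixed)
    return result


# Hypothetical records.
rows = [
    {"SLREC": 2, "AGE": 40, "INCTOT": 1850},
    {"SLREC": 1, "AGE": 40, "INCTOT": 900},  # non-sample-line person with data
    {"SLREC": 2, "AGE": 9, "INCTOT": 50},    # child under 14 with data
]

cleaned = impose_universe(rows)
print([row["INCTOT"] for row in cleaned])  # [1850, None, None]
```

This only blanks values that fall outside the universe. It does not recover values that are missing for in-universe respondents.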

July 2025 - MLP Version 2.0

The Multigenerational Longitudinal Panel has released a new version of the linked person database. This version utilizes new methods and data sources to improve the linking algorithm. Records from the Social Security Administration allow the linking of women across birth and married surnames, resulting in a 15-20% increase in the number of links. Links to 1950 are now available through the extract system, and the system has been improved to allow users to create custom datasets linked across any combination of censuses.

January 2025 - 1940 Full Count File

A new version of the 1940 full count data is available. This release incorporates updated and improved data capture on key demographic and identification variables by Ancestry. Users should see a general improvement in data quality, with an emphasis on refinements to RACE, AGE, and BPL. We observed changes of 1-2% from the original version of the data.

In addition, we have merged alternative string variables for NAMEFRST and NAMELAST into our database. Name variables are restricted in the public IPUMS data, but users may see adjustments to SURSIM and dependent variables as a result.

This version also implements a number of small corrections.

Approximately 1500 duplicate person records have been removed. Most of these duplications were in Benton, Washington (178 households), and Washington, D.C. (162 households).
SEA codes were corrected in a small number of counties, predominantly in the Western region.
CITY codes were corrected for Lake Charles, LA (3400), Lakeland, FL (3405), and Uniontown, PA (7081).
Approximately 100,000 cases of URBAN in Connecticut were corrected from "rural" to "urban."
An allocation error in FARM has been fixed: residents of large group quarters were incorrectly assigned FARM=2 (Farm). Group quarters are not in universe for FARM and should have a value of 1 (Non-farm).

January 2025 - 1950 Full Count New Variables and Corrections

This update introduces two new variables and addresses issues in the data uncovered since the initial preliminary release.

Data Improvements

The variables MBPL and FBPL are now available. Questions about parental birthplace were only asked of sample-line respondents in 1950, and users should use SLWT in their analyses. These variables are not available for persons living in Alaska, and are not currently available for persons living in Hawaii.
A coding error in BPL mistakenly assigned most people born in Washington State as having a birthplace of Washington, D.C. Using the (restricted) string version of this variable as a reference, we have corrected the coding on slightly more than 1.3 million records. Users whose analyses include BPL should re-submit their extract requests and download this new data. In addition, variables that rely on the value of BPL to select a donor for allocation (CITIZEN, HISPAN, MBPL, FBPL) may be affected.

Continuing Known Issues

CHBORN: There are currently too few women reporting zero births. Non-sample-line women and sample-line women with missing data both have a value of "not applicable" (code 00). We are developing a solution to differentiate between these two groups so that allocation can be accurately applied to in-universe persons with missing values.
FARM: As described in the notes for 1940 above, the issue of farm status being incorrectly allocated to residents of group quarters extends across many of the full count data sets. At this time, the 1940 data are the only ones that have been corrected. Users may implement their own programmatic fix to the FARM variable using instructions provided in the variable description.
MBPL and FBPL: For a number of records, there is a discrepancy between the values of MBPL/FBPL and the values for parental birthplaces provided by the Attached Characteristics extract feature. Users should be aware of the potential for conflicting data and use their best judgement when deciding which cases to include in their analyses.
Issues with duplicate records, universe discrepancies, GQ, AGE, RACE, income variables, education variables, and sample-line variables remain as described in the January 2024 release notes.
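For the FARM issue noted in the list above, a user-side correction might look like the following. This is only a sketch based on the rule stated in the 1940 notes (group quarters residents are not in universe for FARM and should have a value of 1). The GQ codes treated as group quarters are an assumption, and the instructions in the FARM variable description take precedence. The function name `fix_farm` and the sample records are hypothetical.

```python
GROUP_QUARTERS_CODES = frozenset({3, 4})  # assumption: confirm against the GQ codebook
NON_FARM = 1                              # FARM=1 (Non-farm), as stated in the 1940 notes


def fix_farm(rows, gq_codes=GROUP_QUARTERS_CODES):
    """Return copies of rows with FARM reset to Non-farm for group quarters residents."""
    fixed_rows = []
    for row in rows:
        fixed = dict(row)  # copy so the original records are left untouched
        if fixed["GQ"] in gq_codes:
            fixed["FARM"] = NON_FARM
        fixed_rows.append(fixed)
    return fixed_rows


# Hypothetical records.
records = [
    {"GQ": 1, "FARM": 2},  # household on a farm: left unchanged
    {"GQ": 3, "FARM": 2},  # institution resident wrongly allocated to Farm
    {"GQ": 4, "FARM": 1},  # already Non-farm
]

corrected = fix_farm(records)
print([row["FARM"] for row in corrected])  # [2, 1, 1]
```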

January 2024 - 1950 Full Count PRELIMINARY Release

This is the first IPUMS release of the 1950 full-count decennial census data. This data set contains microdata on over 152 million individuals in more than 46 million households. IPUMS continues to work with Ancestry to refine these data, identify and remove duplicate records, and fix transcription errors. Users should consider the status of this work while using these data, and be cognizant that this is a preliminary release only.

If you identify data issues not described below or have suggestions that would make the 1950 database more useful, we would welcome your feedback.

Users are cautioned to be aware of the following information:

The raw transcribed data contain excess records representing duplicate persons, vacant dwellings, or enumerator notes. We removed a large number of these cases, but were fairly conservative out of concern for eliminating true entries. We estimate that just under one million errant records remain in this preliminary dataset (roughly 0.65% of the file). We will focus on removing these records in our next data release.
Categories for non-response are not always consistent with the universe definition. Because the universe-defining variables were not subjected to comprehensive editing, they were not always suitable for differentiating between values of "not applicable" and "missing". In general, blank responses are interpreted as "not in universe" regardless of what other variables may indicate.
Values for personal income are often erroneous (INCTOT, INCWAGE, INCBUSFM, INCOTHER). Multiple values were written in the allotted space on the census forms, which sometimes confused the data capture process. Values might be expressed in single dollars or hundreds, sometimes producing orders-of-magnitude errors which we tried to correct via targeted truncation. We estimate the current total and wage income values exactly match the intent of the form roughly 70-75% of the time. The magnitude of the differences ranges widely. Wages are underestimated twice as frequently as they are overestimated. Researchers should be cautious when using these variables and should screen for outliers. We hope to perform supplemental data capture for these fields in the future.
Educational attainment is underestimated. Data transcription had trouble distinguishing between the letters used on the census form to identify level of schooling, and some responses were truncated, losing the last digit for grade within level. The result is too many people with one year or less of schooling and too few in the higher grades. Persons who completed high school or attended college are underrepresented by approximately 15%. (Problem was identified May 2024.)
The identification of institutions is uneven, due to a peculiarity in how these dwellings were recorded on the census forms. We depend on dwelling size and non-family relationship entries to identify group quarters. Institutions (GQ = 3) are further identified by the presence of any person with an "inmate" relationship response.
The data capture process often incorrectly transcribed the ages of infants, resulting in a population of zero-year-olds only 54% of the expected size. The transcription frequently misinterpreted the month of birth. We flagged and edited (QAGE) cases we suspect are children under the age of 14, most of which (75%) are likely infants. This still results in an undercount of children under the age of 1, with the total equaling 88% of the published infant population count. We will focus on improving the identification of infants in the next release.
RACE data underrepresent Japanese and Chinese persons in the population. The problem is especially acute in California, where the number of Japanese and Chinese is roughly half the number reported in the published 1950 census results. In the rest of the contiguous 48 states, the undercount is in the range of 10%. Hawaii race data match the published counts fairly well. Non-Asians are not noticeably affected by the undercount. (Problem was identified November 2024.)
Many census questions in 1950 were asked only of sample-line persons: every fifth person on the census form (20% of the population). These sample-line persons can be identified using SLREC and all analyses using sample-line variables should be weighted with SLWT. The final sample-line person on each page, representing 1-in-30 persons in the population, was asked a small subset of additional questions. At this time, IPUMS does not provide either an indicator variable or a separate weight for these limited sample-line individuals. We intend to incorporate proper identification and weighting for the 1-in-30 persons in a future release. Variables in this subset are MARRNO, DURMARR, and CHBORN.
Alaska and Hawaii were not states in 1950. They each used distinct census forms and therefore lack some variables. Children-ever-born (CHBORN) is missing for Hawaii, and no sample-line questions were included for Alaska. The universe statement for each variable notes when one of these territories lacks the census item.
Some variables are not included in this preliminary data release. These include family income, institution type, residence one year ago, and some geography variables, such as city population and metropolitan area.
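The sample-line weighting described in the list above can be sketched as follows. The example computes a weighted mean of a sample-line variable using SLWT, restricted to sample-line persons identified by SLREC. The SLREC code for sample-line persons is an assumption to check against the codebook; the function name `weighted_mean`, the example variable, and all values are hypothetical.

```python
SAMPLE_LINE_CODE = 2  # assumption: verify the SLREC codes in your extract's codebook


def weighted_mean(rows, variable, weight="SLWT", sample_line_code=SAMPLE_LINE_CODE):
    """Weighted mean of a sample-line variable over sample-line persons only."""
    weighted_total = 0.0
    weight_total = 0.0
    for row in rows:
        if row["SLREC"] != sample_line_code or row[variable] is None:
            continue  # skip non-sample-line persons and missing values
        weighted_total += row[weight] * row[variable]
        weight_total += row[weight]
    if weight_total == 0:
        raise ValueError("no sample-line records with a usable value")
    return weighted_total / weight_total


# Hypothetical records; the variable and weights are illustrative only.
persons = [
    {"SLREC": 2, "SLWT": 5.0, "HIGRADE": 12},
    {"SLREC": 2, "SLWT": 5.0, "HIGRADE": 8},
    {"SLREC": 2, "SLWT": 6.0, "HIGRADE": 10},
    {"SLREC": 1, "SLWT": 0.0, "HIGRADE": None},  # non-sample-line person
]

print(weighted_mean(persons, "HIGRADE"))  # 10.0
```

This weighting applies to the 1-in-5 sample-line variables. As noted above, no separate weight is yet provided for the 1-in-30 subset (MARRNO, DURMARR, and CHBORN).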

October 2023 - 19th Century Full Count Files

This release constitutes a major revision of the 19th century full-count decennial census files from 1850-1880. It includes updates and improvements to household definition, geography, nativity, occupational and demographic variables, new slave-related variables, and improvements to imputation and allocation processes. In general, efforts to increase standardization and improve harmonization have resulted in minor changes throughout the data, but users should take special note of some specific and more consequential updates, listed below:

DWSIZE: The variable DWSIZE (Dwelling Size) has been removed from the full-count files. Users looking for comparable information should use the variable FAMSIZE.
GQ/GQTYPE/GQFUNDS: Significant cleanup and standardization of coding and universe enforcement were done to group quarters variables across all files.

BPL/FBPL/MBPL: Previous versions of these data had inconsistent coding for persons born within certain US Territories. Wherever possible, the detailed codes for these variables have been updated to reflect the Territory status of some states. These are:

State Code	State	Territory Code	Territory
1600	Idaho	1610	Idaho Territory
3500	New Mexico	3510	New Mexico Territory
4000	Oklahoma	4010	Indian Territory
3800/4600	North/South Dakota	4610	Dakota Territory
4900	Utah	4910	Utah Territory
5300	Washington	5310	Washington Territory
5600	Wyoming	5610	Wyoming Territory
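Users who need birthplaces that are comparable with earlier versions, or with analyses that ignore territory status, may want to fold the territory codes back into their state codes. Below is a minimal sketch using the codes from the table above. The function name `collapse_territory` and the `dakota_as` parameter are hypothetical. Dakota Territory is handled separately because it corresponds to two present-day states and cannot be assigned to one without further information.

```python
# Detailed territory code -> state code, taken from the table above.
TERRITORY_TO_STATE = {
    1610: 1600,  # Idaho Territory -> Idaho
    3510: 3500,  # New Mexico Territory -> New Mexico
    4010: 4000,  # Indian Territory -> Oklahoma
    4910: 4900,  # Utah Territory -> Utah
    5310: 5300,  # Washington Territory -> Washington
    5610: 5600,  # Wyoming Territory -> Wyoming
}
DAKOTA_TERRITORY = 4610  # maps to North Dakota (3800) or South Dakota (4600)


def collapse_territory(code, dakota_as=None):
    """Map a detailed territory birthplace code to its state code.

    Dakota Territory is ambiguous, so it is returned unchanged unless the
    caller supplies an explicit choice in `dakota_as`.
    """
    if code == DAKOTA_TERRITORY:
        return code if dakota_as is None else dakota_as
    return TERRITORY_TO_STATE.get(code, code)


print(collapse_territory(5310))                  # 5300
print(collapse_territory(4610))                  # 4610
print(collapse_territory(4610, dakota_as=4600))  # 4600
```

Codes that are not in the table pass through unchanged.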

CITYPOP: A change was made to CITYPOP to identify only cities within the stated universe. This adjustment suppressed populations for some small cities that were previously available. This change also affects SIZEPL.
REALPROP/PERSPROP: Erroneous very high values have been corrected.
IMPREL: A small improvement to relationship imputation may result in changes to IMPREL/RELATE and ensuing dependent variables. These changes are not statistically significant, but users may see adjustments to individual records. These changes also apply to all 1/5/10% sample files where relationships are imputed rather than recorded.
Allocation updates: Adjustments have been made to improve and refine hot-deck allocation for missing data. The new criteria exclude persons with outlier values from donor eligibility. Users should expect some changes in birthplace, occupation, and age distributions.

Added variables, with availability:

Variable	Label	1850	1860	1870	1880
MCDPOP	Minor Civil Division Population	x	x	x	x
QRPROP	Real Estate Value Editing Flag		x	x
QPPROP	Personal Estate Value Editing Flag		x	x
OCCDUPE	Duplicate Occupation Flag			x
SLAVEHH	Slave Inhabitants in Household	x	x
SLAVEHOLDINGS	Number of Slaveholdings Associated With Individual	x	x
SLAVENUM	Number of Slaves Associated With Individual	x	x
SLAVEOWN	Number of Slaves Owned By Individual		x

Single-Year File Updates

1850

CITY: Two new codes have been added. Code 3397 is for historical Lafayette, LA (different from present-day Lafayette), and 4612 has been added within New York City for the Williamsburgh area.
SLAVEHH, SLAVENUM, and SLAVEHOLDINGS are now available. These variables describe whether any household member was linked to a slave holding and the number of slaves and slave holdings linked to each individual.

1860

RACE: Persons in Dakota Territory who were previously misclassified as Mulatto have been corrected to American Indian.
STATEFIP: Households located in Oklahoma/Indian Lands Territory were mistakenly being coded as "Overseas Military." This error has been corrected.
CITY: New codes for Lockport, NY (3680), Gloucester, MA (2510), and North Providence, RI (4839) have been added.
SLAVEHH, SLAVENUM, SLAVEHOLDINGS, and SLAVEOWN are now available. In addition to the variables added in 1850, SLAVEOWN indicates how many slaves were specifically "owned" by an individual.
An alignment error that assigned incorrect values of OCC, PERSPROP, and REALPROP to some individuals in several counties has been fixed. These counties are:
- Dickinson, Kansas
- West Baton Rouge, Louisiana
- Itasca, Minnesota
- Washington, Nebraska
- Hyde, North Carolina
- San Augustine, Texas
Corrections were made to REALPROP and PERSPROP for some very high values. These records are flagged with QRPROP and QPPROP, respectively.

1870

Corrections were made to REALPROP and PERSPROP for some very high values. These records are flagged with QRPROP and QPPROP, respectively.
OCCDUPE: A number of individuals were erroneously assigned the OCC value of the head of their household. The error has been corrected and affected individuals are flagged in the new variable OCCDUPE. Their occupations have been logically edited based on age and relationship to the household head. These persons now have occupation codes of "Other non-occupational response" (OCC=310 and OCC1950=995).

1880

Significant revisions were made to URBAN, especially in Massachusetts and Rhode Island. Some persons who were living in urbanized fringe areas (MDSTATUS) were erroneously coded as "rural." This has been corrected.
SICKNESS: A new code for appendicitis (912) was added.
CITY: New codes for Austin, TX (0490) and Lynchburg, VA (3790) have been added.