IPUMS Complete Count Data

By 2020, IPUMS-USA will make 100% count U.S. Census microdata through 1940 freely available to researchers worldwide. The collection will include over 650 million individual-level records (1850-1940) and 7.5 million household-level records (1790-1840). These microdata are the fruition of longstanding collaborations between IPUMS and the nation's two largest genealogical organizations, Ancestry.com and FamilySearch, to leverage genealogical data for scientific purposes. The new collection is made possible by donations of digitized census data on an unprecedented scale from both Ancestry.com and FamilySearch.

Preliminary data for 1900-1940 are currently available, as are full count files for 1850 and 1880. All data are coded and anonymous and can be downloaded via the IPUMS-USA extract system. IPUMS also has a limited number of licenses to grant for access to the original restricted data, which include names and string variables. Please email ipums@umn.edu for more information.

Completed Datasets

1940 Preliminary Complete Count Data: This file is the first public 100% 1940 file released through the IPUMS extract system. New versions of the data with additional variables and various improvements will be released over the next three years. A list of publications and grants that have used the 1940 IPUMS data can be found here. Below are a few notes about the preliminary 1940 data:

  • Households with more than 60 people in the original data were broken up for processing purposes. Each person in these large households is treated as being in his or her own household. The original large households can be identified using the variable SPLIT40 and reconstructed using the variable SERIAL40; the original count is found in the variable NUMPREC40.
  • Some variables are missing from this data set for specific enumeration districts. The enumeration districts with missing data can be identified using the variable EDMISS. These variables will be added in a future release.
  • Coded variables derived from string variables are still in progress. These variables include: occupation, industry and migration status.
  • We have allocated missing observations and edited some inconsistencies for the following variables: SURSIM, SEX, SCHOOL, RELATE, RACE, OCC1950, MTONGUE, MBPL, FBPL, BPL, MARST, EMPSTAT, CITIZEN, OWNERSHP. The flag variables indicating an allocated observation for the associated variables can be included in your extract by clicking the 'Select data quality flags' box on the extract summary page.
  • Most inconsistent information was not edited for this release, so there are observations outside of the universe for many variables. In particular, the variables GQ and GQTYPE have known inconsistencies and will be improved in the next release.
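
The household reconstruction described in the first note can be sketched as follows. This is a minimal illustration, not IPUMS code. It assumes each person record has been loaded as a dict, that SPLIT40 is nonzero for people taken from a split household, that SERIAL40 holds the same value for everyone who shared the original household, and that NUMPREC40 holds the original count. These codings are assumptions to check against the variable descriptions, and the sample records are invented.

```python
from collections import defaultdict


def rebuild_split_households(persons, split="SPLIT40", hh_id="SERIAL40", count="NUMPREC40"):
    """Group people from split households back into their original households.

    Returns (households, mismatches). households maps each original household
    id to its list of person records; mismatches lists the ids whose regrouped
    size differs from the recorded original count.
    """
    households = defaultdict(list)
    for person in persons:
        # Assumption: a nonzero split flag marks a person from a large household.
        if person[split]:
            households[person[hh_id]].append(person)

    mismatches = [
        hid for hid, members in households.items()
        if any(m[count] != len(members) for m in members)
    ]
    return dict(households), mismatches


# Toy records with invented values, not real census data.
records = [
    {"AGE": 41, "SPLIT40": 1, "SERIAL40": 900, "NUMPREC40": 3},
    {"AGE": 38, "SPLIT40": 1, "SERIAL40": 900, "NUMPREC40": 3},
    {"AGE": 12, "SPLIT40": 1, "SERIAL40": 900, "NUMPREC40": 3},
    {"AGE": 70, "SPLIT40": 0, "SERIAL40": 17, "NUMPREC40": 1},
]
households, mismatches = rebuild_split_households(records)
print(len(households[900]), mismatches)
```

The column names are parameters, so the same function could be pointed at SPLIT, SPLITHID, and SPLITNUM for the other census years, under the same assumptions about their coding.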

1930 Preliminary Complete Count Data: The result of a recent collaboration between the Minnesota Population Center and Ancestry.com, the 1930 complete count database is now available through IPUMS-USA. A few notes about this complete count database:

  • Households with more than 60 people in the original data were broken up for processing purposes. Each person in these large households is treated as being in his or her own household. The original large households can be identified using the variable SPLIT and reconstructed using the variable SPLITHID; the original count is found in the variable SPLITNUM.
  • Coded variables derived from string variables are still in progress. These variables include: occupation and industry.
  • We have allocated missing observations and edited some inconsistencies for the following variables: SPEAKENG, YRIMMIG, CITIZEN, AGEMARR, AGE, BPL, MBPL, FBPL, LIT, SCHOOL, OWNERSHP, FARM, EMPSTAT, OCC1950, IND1950, MTONGUE, MARST, RACE, SEX, RELATE, CLASSWKR. The flag variables indicating an allocated observation for the associated variables can be included in your extract by clicking the 'Select data quality flags' box on the extract summary page.
  • Most inconsistent information was not edited for this release, so there are observations outside of the universe for some variables. In particular, the variables GQ and GQTYPE have known inconsistencies and will be improved in the next release.
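
One way to use the data quality flags mentioned above is to set allocated values aside before analysis. The sketch below assumes an extract already loaded as a list of dicts, a flag column named QAGE paired with AGE, and a flag value of 0 meaning "not allocated". The flag name and its coding are assumptions to verify against the extract's codebook, and the rows are invented.

```python
def split_by_allocation(rows, flag, reported_code=0):
    """Partition rows into (reported, allocated) using one quality flag."""
    reported, allocated = [], []
    for row in rows:
        # Assumption: reported_code marks a value that was not allocated.
        (reported if row[flag] == reported_code else allocated).append(row)
    return reported, allocated


# Toy rows with invented values; QAGE is an assumed flag name for AGE.
rows = [
    {"AGE": 34, "QAGE": 0},
    {"AGE": 7, "QAGE": 4},
    {"AGE": 61, "QAGE": 0},
]
reported, allocated = split_by_allocation(rows, flag="QAGE")
mean_reported_age = sum(r["AGE"] for r in reported) / len(reported)
print(len(reported), len(allocated), mean_reported_age)
```

Keeping the allocated rows in a separate list, rather than discarding them, makes it easy to compare results with and without allocated values.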

1920 Preliminary Complete Count Data: The result of a recent collaboration between the Minnesota Population Center and Ancestry.com, the 1920 complete count database is now available through IPUMS-USA. A few notes about this complete count database:

  • Households with more than 60 people in the original data were broken up for processing purposes. Each person in these large households is treated as being in his or her own household. The original large households can be identified using the variable SPLIT and reconstructed using the variable SPLITHID; the original count is found in the variable SPLITNUM.
  • Coded variables derived from string variables are still in progress. These variables include: occupation and industry.
  • We have allocated missing observations and edited some inconsistencies for the following variables: SPEAKENG, YRIMMIG, CITIZEN, AGE, BPL, MBPL, FBPL, LIT, SCHOOL, OWNERSHP, MORTGAGE, FARM, CLASSWKR, MARST, RACE, SEX, RELATE, MTONGUE. The flag variables indicating an allocated observation for the associated variables can be included in your extract by clicking the 'Select data quality flags' box on the extract summary page.
  • Most inconsistent information was not edited for this release, so there are observations outside of the universe for some variables. In particular, the variables GQ and GQTYPE have known inconsistencies and will be improved in the next release.

1910 Preliminary Complete Count Data: The result of a recent collaboration between the Minnesota Population Center and Ancestry.com, the 1910 complete count database is now available through IPUMS-USA. A few notes about this complete count database:

  • Households with more than 60 people in the original data were broken up for processing purposes. Each person in these large households is treated as being in his or her own household. The original large households can be identified using the variable SPLIT and reconstructed using the variable SPLITHID; the original count is found in the variable SPLITNUM.
  • Coded variables derived from string variables are still in progress. These variables include: occupation and industry.
  • We have allocated missing observations and edited some inconsistencies for the following variables: SPEAKENG, LANGUAGE, YRIMMIG, CITIZEN, AGE, BPL, MBPL, FBPL, LIT, SCHOOL, OWNERSHP, MORTGAGE, FARM, EMPSTAT, CLASSWKR, DURMARR, MARST, RACE, SEX, RELATE, CHBORN, CHSURV, MTONGUE. The flag variables indicating an allocated observation for the associated variables can be included in your extract by clicking the 'Select data quality flags' box on the extract summary page.
  • Most inconsistent information was not edited for this release, so there are observations outside of the universe for some variables. In particular, GQ has known inconsistencies and will be improved in the next release.

1900 Preliminary Complete Count Data: The result of a recent collaboration between the Minnesota Population Center and Ancestry.com, the 1900 complete count database is now available through IPUMS-USA. A few notes about this complete count database:

  • Households with more than 60 people in the original data were broken up for processing purposes. Each person in these large households is treated as being in his or her own household. The original large households can be identified using the variable SPLIT and reconstructed using the variable SPLITHID; the original count is found in the variable SPLITNUM.
  • Coded variables derived from string variables are still in progress. These variables include: occupation and industry.
  • We have allocated missing observations and edited some inconsistencies for the following variables: YRSUSA1, YRIMMIG, CITIZEN, AGE, AGEMONTH, BIRTHMO, BPL, MBPL, FBPL, LIT, SCHOOL, OWNERSHP, MORTGAGE, FARM, DURMARR, MARST, RACE, SEX, RELATE, CHBORN, CHSURV, GQFUNDS, GQTYPE, QTRUNEMP. The flag variables indicating an allocated observation for the associated variables can be included in your extract by clicking the 'Select data quality flags' box on the extract summary page.
  • Most inconsistent information was not edited for this release, so there are observations outside of the universe for some variables. In particular, GQ has known inconsistencies and will be improved in the next release.

1880: This dataset was developed through a collaboration between the Minnesota Population Center and the Church of Jesus Christ of Latter-Day Saints; these complete count data have been available since 2008. Data are available through the North Atlantic Population Project (with names) and IPUMS-USA (without names).

1880 Linked Representative Samples: This database links records from the 1880 complete count dataset to 1% samples of the 1850 to 1930 U.S. Censuses. We have data samples for seven pairs of years: 1850-1880, 1860-1880, 1870-1880, 1880-1900, 1880-1910, 1880-1920, and 1880-1930. Each of these contains three independent linked samples: one of men, one of women, and one of married couples. Go to the Linked Samples page.

1850: The result of a recent collaboration between the Minnesota Population Center and the Church of Jesus Christ of Latter-Day Saints, the 1850 complete count database is now available through IPUMS-USA. These data, with names, will also be available from the North Atlantic Population Project (18 states and the District of Columbia are available now). A few notes about this complete count database:

  • Households with more than 60 people in the original data were broken up for processing purposes. Each person in these large households is treated as being in his or her own household. The original large households can be identified using the variable SPLIT and reconstructed using the variable SPLITHID; the original count is found in the variable SPLITNUM.
  • We have allocated missing observations and edited some inconsistencies for the following variables: SEX, RACE, OCC1950, OCC, BPL, AGE. The flag variables indicating an allocated observation for the associated variables can be included in your extract by clicking the 'Select data quality flags' box on the extract summary page.
  • Most inconsistent information was not edited for this release, so there are observations outside of the universe for some variables.

Datasets in Progress

1790-1910: Our largest new microdata collection will capitalize on donations of digitized census data on an unprecedented scale from both Ancestry.com and FamilySearch. The 1790-1910 microdata include a core set of variables for every census year, including geographic location, age, sex, race, and name. Birthplace information is available in all but a few of the early years, and from 1880 forward the data include marital status, the relationship of each individual to the household head, and the birthplace of each individual's mother and father, allowing the identification of second-generation Americans. Other key variables such as year of immigration, duration of marriage, literacy, occupation, children ever born, children surviving, and disability are available sporadically.
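
As a rough illustration of how parental birthplace supports this kind of identification, the sketch below flags a person as second-generation when the person is native-born and at least one parent is foreign-born. It assumes the person's, mother's, and father's birthplaces are in BPL, MBPL, and FBPL, as in the variable lists above. The numeric codes and the set of "foreign" codes are hypothetical placeholders, not the actual IPUMS coding, and missing or unknown birthplaces would need separate handling.

```python
def is_second_generation(person, foreign_codes):
    """True if the person is native-born with at least one foreign-born parent."""
    native = person["BPL"] not in foreign_codes
    foreign_parent = (
        person["MBPL"] in foreign_codes or person["FBPL"] in foreign_codes
    )
    return native and foreign_parent


# Hypothetical birthplace codes for illustration only; take real codes from the codebook.
FOREIGN = {400, 410}
people = [
    {"BPL": 27, "MBPL": 400, "FBPL": 27},    # native-born, foreign-born mother
    {"BPL": 27, "MBPL": 27, "FBPL": 36},     # native-born, native-born parents
    {"BPL": 410, "MBPL": 410, "FBPL": 410},  # foreign-born person
]
flags = [is_second_generation(p, FOREIGN) for p in people]
print(flags)
```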