IPUMS Linked Data

IPUMS disseminates full count census enumerations for nine census years from 1850 to 1940. These full count data, covering almost 700 million individual records, are the fruit of collaboration between IPUMS and the world's two largest genealogical organizations—Ancestry.com and FamilySearch—to leverage genealogical data for scientific purposes. Full count data have opened the possibility of automated record linkages across census years to construct millions of individual life histories and trace millions of families over multiple generations.

Multigenerational Longitudinal Panel

The IPUMS Multigenerational Longitudinal Panel (MLP) project links individuals' records between censuses. Our first IPUMS MLP data release consists of a set of crosswalks between pairs of adjacent censuses from 1850 to 1940. We plan to build on this work in future releases. We expect that IPUMS MLP will eventually serve as a general framework that can incorporate records from a wide range of sources.
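As an illustration of how a crosswalk between two adjacent censuses might be used, the sketch below joins two small, made-up census files through a table of paired record identifiers. The column names (histid, histid_a, histid_b), the function name link_records, and the sample rows are all assumptions chosen for this example; they are not taken from the MLP documentation, and the released files may be organized differently.

```python
# Minimal sketch of joining two census files through a crosswalk.
# All column names and sample values here are hypothetical.

def link_records(crosswalk, census_a, census_b,
                 key_a="histid_a", key_b="histid_b", id_col="histid"):
    """Return pairs of records that the crosswalk links across two censuses."""
    # Index each census by its record identifier for fast lookup.
    by_id_a = {row[id_col]: row for row in census_a}
    by_id_b = {row[id_col]: row for row in census_b}

    linked = []
    for pair in crosswalk:
        rec_a = by_id_a.get(pair[key_a])
        rec_b = by_id_b.get(pair[key_b])
        # Keep a link only when both records are present in the files at hand.
        if rec_a is not None and rec_b is not None:
            linked.append({"earlier": rec_a, "later": rec_b})
    return linked


# Made-up example data for two adjacent census years.
census_1900 = [
    {"histid": "A1", "age": 10, "state": "MN"},
    {"histid": "A2", "age": 34, "state": "WI"},
]
census_1910 = [
    {"histid": "B7", "age": 20, "state": "MN"},
    {"histid": "B9", "age": 51, "state": "IA"},
]
crosswalk_1900_1910 = [{"histid_a": "A1", "histid_b": "B7"}]

linked = link_records(crosswalk_1900_1910, census_1900, census_1910)
for pair in linked:
    print(pair["earlier"]["age"], "->", pair["later"]["age"])
```

Records that do not appear in the crosswalk are simply left out of the result, so the linked file contains only the individuals found in both censuses.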

Next steps

Our next step is to incorporate the linking keys into the IPUMS data extract system; we expect to complete that work in spring 2022. We also plan several enhancements to increase the number of high-quality linkages: adding links across adjacent census years, linking individuals in non-adjacent census years, and increasing links using administrative records. In addition, we are developing data dissemination tools that will let users easily create customized datasets of linked data.

Project team

Jonas Helgertz, Lead Research Scientist
Steven Ruggles, Principal Investigator (MPI)
John Robert Warren, Principal Investigator (MPI)
Catherine Fitch, Co-investigator
J. David Hacker, Co-investigator
Evan Roberts, Co-investigator
Matthew Sobek, Co-investigator
Matt A. Nelson, Research Scientist
Jacob Wellington, Hlink Software Developer
Martha Bailey (University of California, Los Angeles), Consultant
Joseph Ferrie (Northwestern University), Consultant
Joseph Price (Brigham Young University), Consultant

Linked Representative Samples

Between 2008 and 2010, IPUMS released linked representative samples (LRS) of the population spanning the period 1850 to 1930. These samples linked the full count 1880 database to samples of the population from 1850, 1860, 1870, 1900, 1910, 1920, and 1930. Data and documentation are archived and available for analysis and replication. The LRS files are much smaller than the MLP crosswalks because the LRS linked the 1880 full count census to 1% samples in other census years. The MLP linking strategy also differs from the LRS strategy.

Funding

IPUMS MLP is funded by the National Institute on Aging grant R01AG057679.

Contact us

For questions about IPUMS linked data, contact ipumsres@umn.edu.