Data Extracts

The IPUMS dissemination system produces data files designed for cross-census linking. On the sample selection tab for full-count data, check the "link census data" box to create a data extract that identifies individuals across multiple censuses from 1850 to 1940. The extract system automatically includes the Historical Identification Key (HIK), a variable that uniquely identifies each person across census years. The census links were created by the IPUMS Multigenerational Longitudinal Panel (MLP) project. Roughly one-quarter to one-third of individuals are linked between censuses. See the hyperlinks at the top of this page for more information on the MLP project and the construction of the census links.

Data format. The linked data extract is produced in standard IPUMS format: all person records for census year X, followed by all person records for census year Y, and so on. Users can sort on HIK to produce a long (person-by-year) file, reshape the data into a wide file with one row per person, or otherwise manipulate the data as appropriate for their application.
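As an illustration of the long and wide layouts, the following is a minimal sketch, assuming the extract has already been read into memory as a list of dictionaries with YEAR, HIK, and AGE fields. The record values are invented for the example, and the helper names `to_long` and `to_wide` are hypothetical, not part of any IPUMS tool.

```python
from collections import defaultdict

# Hypothetical stacked extract: all 1900 records, then all 1910 records.
# Field names and values are illustrative only.
records = [
    {"YEAR": 1900, "HIK": "B", "AGE": 30},
    {"YEAR": 1900, "HIK": "A", "AGE": 5},
    {"YEAR": 1910, "HIK": "A", "AGE": 15},
    {"YEAR": 1910, "HIK": "B", "AGE": 40},
]

def to_long(rows):
    """Sort stacked records into a long panel ordered by HIK, then census YEAR."""
    return sorted(rows, key=lambda r: (r["HIK"], r["YEAR"]))

def to_wide(rows, variables=("AGE",)):
    """Reshape to one entry per HIK, with year-suffixed columns such as AGE_1900."""
    wide = defaultdict(dict)
    for r in rows:
        for var in variables:
            wide[r["HIK"]][f"{var}_{r['YEAR']}"] = r[var]
    return dict(wide)

long_rows = to_long(records)
wide_rows = to_wide(records)
print([(r["HIK"], r["YEAR"]) for r in long_rows])
print(wide_rows["A"])
```

The long layout keeps one record per person-year, which suits most panel methods; the wide layout is convenient when comparing a small number of variables across years.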

Sample selection. Invoking the "link census data" option on the sample selection screen constrains your choice of censuses to the full-count datasets. Any other datasets in your data cart will be dropped and cannot subsequently be added.

Case selection. Linked extracts, like all full-count extracts, are large. By default, the extract system applies case selection to linked census extracts so that only persons who are linked across ALL of the selected censuses are included. For example, if you select the 1900, 1910, and 1920 censuses for linking, only persons who are linked across all three censuses will be included in your data extract. You can edit these automated case selections on the final screen of the extract process by clicking the "Select Cases" button. Note that removing any of them can yield significantly larger data extracts that may prove challenging to process.
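The default "linked across all selected censuses" rule can be sketched in a few lines. This is only an illustration of the logic, not the extract system's implementation; it assumes in-memory records with YEAR and HIK fields and, as an assumption of this sketch, uses `None` to mark a person with no link. The function name `linked_in_all` is hypothetical.

```python
from collections import defaultdict

def linked_in_all(rows, years):
    """Keep only records whose HIK appears in every selected census year."""
    wanted = set(years)
    years_by_hik = defaultdict(set)  # HIK -> census years in which that HIK appears
    for r in rows:
        if r["HIK"] is not None:  # in this sketch, None marks a person with no link
            years_by_hik[r["HIK"]].add(r["YEAR"])
    keep = {hik for hik, ys in years_by_hik.items() if wanted <= ys}
    return [r for r in rows if r["HIK"] in keep and r["YEAR"] in wanted]

# Illustrative records: A is linked in all three years, B only in 1900 and 1910,
# and the last person has no link.
rows = [
    {"YEAR": 1900, "HIK": "A"}, {"YEAR": 1910, "HIK": "A"}, {"YEAR": 1920, "HIK": "A"},
    {"YEAR": 1900, "HIK": "B"}, {"YEAR": 1910, "HIK": "B"},
    {"YEAR": 1920, "HIK": None},
]
selected = linked_in_all(rows, [1900, 1910, 1920])
print([(r["HIK"], r["YEAR"]) for r in selected])
```

Person B is dropped entirely because the 1920 link is missing, which is why the default selection shrinks the extract so much compared with keeping every linked record.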

You can also change the default case selection from including only linked persons to also including everyone who resided with a linked person. Including these non-linked household members provides more contextual information about the linked persons, at the cost of a more complicated dataset for analysis.
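Conceptually, the household option keeps every record that shares a household with a retained linked person in the same census year. A minimal sketch follows, assuming households are identified by the pair (YEAR, SERIAL), where SERIAL is the household serial number; that keying, the sample records, and the function name `with_household_members` are assumptions for illustration, not a description of how the extract system works internally.

```python
def with_household_members(rows, linked_hiks):
    """Keep linked persons plus everyone sharing their household in the same year."""
    # Households are keyed by (YEAR, SERIAL) in this sketch (an assumption).
    households = {(r["YEAR"], r["SERIAL"]) for r in rows if r["HIK"] in linked_hiks}
    return [r for r in rows if (r["YEAR"], r["SERIAL"]) in households]

rows = [
    {"YEAR": 1900, "SERIAL": 1, "HIK": "A"},   # linked person
    {"YEAR": 1900, "SERIAL": 1, "HIK": None},  # unlinked co-resident, same household
    {"YEAR": 1900, "SERIAL": 2, "HIK": None},  # unrelated household
    {"YEAR": 1910, "SERIAL": 7, "HIK": "A"},   # same linked person, new household
    {"YEAR": 1910, "SERIAL": 7, "HIK": None},  # unlinked co-resident in 1910
]
result = with_household_members(rows, {"A"})
print([(r["YEAR"], r["SERIAL"], r["HIK"]) for r in result])
```

Because household membership is resolved separately in each year, the co-residents can differ from one census to the next, which is part of what makes the resulting dataset more complicated to analyze.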

Users should be cautious about adding case selections beyond those the system applies automatically to identify linked individuals across census years. Selecting cases on time-variant characteristics, such as age, marital status, or state of residence, risks excluding some observations for a person: an individual may be linked in a given census year, but that observation will be dropped if the person does not meet the additional selection criteria in that specific census.
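The pitfall is easiest to see with a small example. The sketch below, using invented records and hypothetical function names, contrasts applying an age condition in every year with one possible workaround (not an IPUMS procedure): choose persons by their characteristics in a single reference year, then keep all of their linked records.

```python
def filter_every_year(rows, min_age):
    """Naive selection: applies the age condition separately in each census year."""
    return [r for r in rows if r["AGE"] >= min_age]

def filter_on_reference_year(rows, ref_year, min_age):
    """Choose persons by age in one reference year, then keep all of their records."""
    keep = {r["HIK"] for r in rows if r["YEAR"] == ref_year and r["AGE"] >= min_age}
    return [r for r in rows if r["HIK"] in keep]

rows = [
    {"YEAR": 1900, "HIK": "A", "AGE": 12},
    {"YEAR": 1910, "HIK": "A", "AGE": 22},
    {"YEAR": 1900, "HIK": "B", "AGE": 30},
    {"YEAR": 1910, "HIK": "B", "AGE": 40},
]
naive = filter_every_year(rows, 18)                 # drops A's 1900 record
by_ref = filter_on_reference_year(rows, 1910, 18)   # keeps both records for A and B
print(len(naive), len(by_ref))
```

With the naive filter, person A is still linked in 1900 but the 1900 observation disappears, leaving a one-year "panel" for that person.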

Manual linked extracts. You can also create extracts that include the HIK linking key without checking the "link census data" box on the sample selection screen, and therefore without its associated automated case selection. The HIK variable and the flags identifying linked persons in each dataset are in the Linking Tools group, available from the drop-down list in the variable browsing system.

Crosswalk Files

Alternatively, crosswalk files can be used to link IPUMS full-count census extracts outside the dissemination system. The files provide HISTIDs (persistent person IDs) for each adjacent pair of censuses. For most users, creating linked data extracts through the dissemination system will be more convenient than using the crosswalks. The crosswalks contain the same links as the data available through the extract system, and they will be updated to remain in sync.
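For users who do work with the crosswalks, linking amounts to a join on HISTID. The sketch below assumes each crosswalk has been read as a list of (HISTID in the earlier year, HISTID in the later year) pairs and each extract as a list of records with a HISTID field; that layout, the sample IDs, and the helper names `link_pair` and `chain` are hypothetical. Chaining two adjacent crosswalks to reach a non-adjacent year assumes one-to-one links, and this sketch does not claim to reproduce how the extract system assigns HIKs.

```python
def link_pair(extract_a, extract_b, crosswalk):
    """Join two extracts through (HISTID in year A, HISTID in year B) pairs."""
    by_a = {r["HISTID"]: r for r in extract_a}
    by_b = {r["HISTID"]: r for r in extract_b}
    # Keep only crosswalk pairs whose persons are present in both extracts.
    return [(by_a[a], by_b[b]) for a, b in crosswalk if a in by_a and b in by_b]

def chain(crosswalk_ab, crosswalk_bc):
    """Compose adjacent crosswalks (e.g. 1900-1910 and 1910-1920) into A-to-C pairs."""
    b_to_c = dict(crosswalk_bc)  # assumes each year-B HISTID links to at most one year-C HISTID
    return [(a, b_to_c[b]) for a, b in crosswalk_ab if b in b_to_c]

# Illustrative IDs and records only.
ext_1900 = [{"HISTID": "h00-1", "AGE": 10}, {"HISTID": "h00-2", "AGE": 50}]
ext_1910 = [{"HISTID": "h10-9", "AGE": 20}, {"HISTID": "h10-8", "AGE": 33}]
cw_1900_1910 = [("h00-1", "h10-9")]
cw_1910_1920 = [("h10-9", "h20-4"), ("h10-8", "h20-5")]

pairs = link_pair(ext_1900, ext_1910, cw_1900_1910)
print([(a["AGE"], b["AGE"]) for a, b in pairs])
print(chain(cw_1900_1910, cw_1910_1920))
```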

Linked Representative Samples

Between 2008 and 2010, IPUMS released linked representative samples (LRS) of the population spanning the period 1850 to 1930. These samples linked the full-count 1880 database to samples of the population from 1850, 1860, 1870, 1900, 1910, 1920, and 1930. Data and documentation are archived and available for analysis and replication. The LRS files are much smaller than the MLP crosswalks because the links ran from the 1880 full-count census to 1% samples in other census years. The MLP linking strategy also differs from the LRS strategy.