MLP Version History
Version 2.0 (July 2025): The current version of MLP, available through the IPUMS data extract system. This version differs from version 1.2 in four main ways: it adds women to the Step 1 (individual) linking stage, uses the Social Security Numident data to link single women to their records as married women (new Step 3), replaces logistic regression with the XGBoost algorithm for matching records, and removes conflicts between 10-, 20-, and 30-year census pairings. Some features and training data were also altered. The IPUMS extract system was improved to allow creation of custom datasets containing records linked across any combination of censuses from 1850 to 1950. For more details on the linking process used in version 2.0, see MLP linking methods.
Version 1.2 (April 2024): This version differs from version 1.1 by including the 1950 census, using revised 19th-century census files, and adding more customization to the linking model to address changing variable availability across censuses. Additionally, this release provides links between 20- and 30-year census pairings. Links are available as crosswalk files and were never offered through the IPUMS data extract system. Version 1.2 also provides, for the first time, crosswalks to the Social Security Numident data, in addition to World War II enlistment data linked to the 1940 census. The enlistment data were prepared by the CenSoc team at UC Berkeley.
Version 1.1 (March 2023): This version differs from version 1.0 in using revised census files for some years. In addition, small changes were implemented in the linking code, including (i) an updated name standardization file and (ii) census-specific inter-county and state distance files. The software that calculates name similarity was also updated, producing additional minor changes. This version of MLP was the first to be made available through the IPUMS data extract system. It also introduced Historical Identification Keys (HIKs), which give each individual a common ID across all census years in which that individual is identified. Links are also available as crosswalk files.
Version 1.0 (September 2020): The original MLP data release. The first release included links for the 1900-1940 censuses. Crosswalks were added for 1850-1880 in September 2021, and for the 20-year 1880-1900 census pairing in March 2022. Links are available as crosswalk files.
Linked Representative Samples: Between 2008 and 2010, IPUMS released linked representative samples (LRS) of the U.S. population spanning the period 1850 to 1930. These samples linked the full-count 1880 database to samples of the population from 1850, 1860, 1870, 1900, 1910, 1920, and 1930. Data and documentation are archived and available for analysis and replication. The LRS files are much smaller than MLP crosswalks because the linkages ran from the 1880 full-count census to 1% samples in other census years. The MLP linking strategy also differs from the LRS strategy.
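As an illustration of the Historical Identification Keys (HIKs) introduced in version 1.1, the sketch below shows how a shared key lets records from several census years be gathered into one history per person. It is a minimal example under stated assumptions: the tuple layout, the record IDs, the HIK values, and the helper name group_by_hik are all hypothetical and do not reflect the actual MLP file format or any IPUMS tool.

```python
from collections import defaultdict

# Hypothetical records: (census_year, record_id, hik).
# Field layout and values are illustrative only, not the real MLP layout.
records = [
    (1900, "r1900-a", "HIK001"),
    (1910, "r1910-x", "HIK001"),
    (1920, "r1920-q", "HIK001"),
    (1900, "r1900-b", "HIK002"),
    (1910, "r1910-y", None),  # record with no identified link, so no key
    (1920, "r1920-z", "HIK002"),
]


def group_by_hik(rows):
    """Collect records that share a key into {hik: {year: record_id}}."""
    histories = defaultdict(dict)
    for year, record_id, hik in rows:
        if hik is None:
            continue  # unlinked records carry no common ID
        histories[hik][year] = record_id
    return dict(histories)


histories = group_by_hik(records)
for hik, by_year in sorted(histories.items()):
    # List the census years in which each person was identified.
    print(hik, sorted(by_year))
```

Because the key is the same in every year in which a person is identified, any combination of census years can be assembled from it without chaining pairwise crosswalks. A person can also be absent from an intermediate year, as the second hypothetical individual is in 1910.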