Social Security to Census Crosswalk
- MLP version 2.0: released July 2025 -
IPUMS links Social Security Administration records to the 1880-1950 censuses. The data source is the Social Security Numident (Numerical Identification files). The public version of the Numident includes all Social Security registrants with verified death records and persons who would have reached the age of 110 by December 31, 2007. The earliest records date to 1936. We combine data from the Social Security applications file (SS-5) and death records file to link individuals to the censuses.
IPUMS uses a subset of Numident links to enhance census-to-census linking for married women. We also provide a crosswalk of all Numident links to the censuses. The crosswalk includes the Social Security Number (SSN) and the HISTID of each census record to which that SSN is linked across years. We are able to offer SSNs because the Numident data, which consist only of deceased persons, are fully public information available without restriction from the National Archives.
Users should note that the links of an individual from the Numident across census years do not always agree with the census-to-census links. In most cases we believe the census-to-census links are superior, but there are exceptions.
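One way to see where the two sets of links diverge is to compare, for each SSN, the pair of census records implied by the crosswalk against the census-to-census links. The sketch below is illustrative only. The column names (ssn, histid_1900, histid_1910), the sample rows, and the CENSUS_LINKS mapping are assumptions made for the example, not the documented layout of the crosswalk files, so check the actual variable names before adapting it.

```python
import csv
import io

# Hypothetical layout: one row per SSN, one HISTID column per census year.
NUMIDENT_CSV = """ssn,histid_1900,histid_1910
001,A1,B1
002,A2,B9
003,A3,
004,A4,B4
"""

# Hypothetical census-to-census links: 1900 HISTID -> 1910 HISTID.
CENSUS_LINKS = {"A1": "B1", "A2": "B2"}


def compare_links(numident_rows, census_links, year_a, year_b):
    """Label each SSN linked to both years as agree, disagree, or no_census_link."""
    result = {}
    for row in numident_rows:
        a = row[f"histid_{year_a}"]
        b = row[f"histid_{year_b}"]
        if not a or not b:
            continue  # this SSN is not linked to both censuses
        linked = census_links.get(a)
        if linked is None:
            result[row["ssn"]] = "no_census_link"
        elif linked == b:
            result[row["ssn"]] = "agree"
        else:
            result[row["ssn"]] = "disagree"
    return result


rows = list(csv.DictReader(io.StringIO(NUMIDENT_CSV)))
status = compare_links(rows, CENSUS_LINKS, 1900, 1910)
print(status)
```

Records labeled disagree are the cases where a choice between the two link sources has to be made.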
DOWNLOAD CROSSWALK
Download Stata (1.2 GB)
Download CSV (1.1 GB)
Download Parquet (1.0 GB)
The crosswalk described above is Version 2.0, released July 2025. Older Version 1.2 MLP Numident links are also available.
Linking Method
We use the Social Security applications file (SS-5) combined with the death records file to link individuals to the censuses. For each Social Security Number, multiple entries may exist due to changes or replacements of SSNs or because of corrections to the information. To fully leverage the available information, all recorded first names, last names, birth dates, birthplaces, and parents' names are employed in the linking process. The linking procedure is similar to that used for census-to-census linking, modified to account for the lack of household contextual information and for variables not present in the Social Security records.
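As a rough illustration of pooling multiple entries per SSN, the sketch below groups entries by SSN and keeps every distinct recorded value for each identifying field. The field names and the uppercase normalization are assumptions for the example; they are not the actual Numident variable names or the cleaning rules used by IPUMS.

```python
from collections import defaultdict

# Hypothetical field names for illustration; not the actual Numident layout.
FIELDS = (
    "first_name", "last_name", "birth_date",
    "birthplace", "father_name", "mother_name",
)


def collect_variants(entries):
    """Group entries by SSN and keep every distinct non-empty value per field."""
    variants = defaultdict(lambda: {field: set() for field in FIELDS})
    for entry in entries:
        for field in FIELDS:
            # Normalize lightly so trivially different spellings collapse.
            value = (entry.get(field) or "").strip().upper()
            if value:
                variants[entry["ssn"]][field].add(value)
    return dict(variants)


entries = [
    {"ssn": "001", "first_name": "Wm", "last_name": "Smith",
     "birth_date": "1901-02-03", "birthplace": "Ohio"},
    {"ssn": "001", "first_name": "William", "last_name": "Smith",
     "birth_date": "1901-02-03", "birthplace": "Ohio",
     "father_name": "John Smith"},
]
variants = collect_variants(entries)
print(sorted(variants["001"]["first_name"]))
```

Keeping all variants, rather than only the most recent entry, lets any recorded spelling of a name contribute to a match.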
We link women and men separately, because name changes at marriage complicate linking for women. For women, we use the father's last name as a proxy for the woman's maiden (birth) surname. We link the Numident only to single women in the census, when they are likely to be living with their parents. Parental names in the Numident greatly aid in accurately linking these women. We determined that Numident links to ever-married females were of low quality, because we cannot tell from the Numident when a woman changed marital status and began appearing under her married surname in the census. Remarriages pose an additional surname problem, which we avoid by focusing on single women. Once we have linked a woman as single, we use these links for census-to-census linking to married women in Step 3.
For men, we do not apply restrictions based on marital status, since men rarely change their last names. We link men in the Social Security file to census records using invariant identifying information.
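The sex-specific rules above can be summarized in a short sketch. The function names, the "F"/"M" codes, and the "single" marital status label are hypothetical, and returning None when no father's surname is recorded is an assumption of this example rather than a documented rule.

```python
from typing import Optional


def linking_surname(sex: str, own_last: str, father_last: Optional[str]) -> Optional[str]:
    """Surname compared against the census.

    Women: father's last name, as a proxy for the birth surname.
    Men: their own last name.
    """
    if sex == "F":
        return father_last or None  # assumption: no proxy available, no surname
    return own_last


def census_record_eligible(sex: str, marital_status: str) -> bool:
    """Women are linked only to census records where they are single."""
    if sex == "F":
        return marital_status == "single"
    return True  # no marital status restriction for men


print(linking_surname("F", "JONES", "SMITH"))
print(census_record_eligible("F", "married"))
```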
In the linking process for both men and women, we first create a set of potential matches from the census. Potential matches must share the same birthplace and sex, and birth years must be within three years of each other. To narrow down the search further, we only consider census records where the Jaro-Winkler score of last/maiden name is greater than 0.80 and where at least one of the first or middle names also has a Jaro-Winkler score above 0.80. Next, we compare detailed characteristics between each Numident record and its potential census matches. The comparison features include age, race, similarity scores of first name, last name, mother and father names, as well as the distance between the birthplace and the state of residence.
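The candidate filter described above can be sketched as follows. This is a minimal illustration, not the IPUMS implementation: the Person record and its field names are hypothetical, the Jaro-Winkler function is a standard textbook version (prefix scale 0.1, prefix length up to 4), and comparing first name to first name and middle name to middle name is an assumption about how the given-name condition is applied.

```python
from dataclasses import dataclass


def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity between two strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * scale * (1 - base)


@dataclass
class Person:  # hypothetical record layout
    first: str
    middle: str
    last: str  # birth surname (father's last name for women)
    sex: str
    birthplace: str
    birth_year: int


def is_candidate(num: Person, cen: Person, name_min: float = 0.80, max_gap: int = 3) -> bool:
    """Apply the blocking rules: sex, birthplace, birth year, then names."""
    if num.sex != cen.sex or num.birthplace != cen.birthplace:
        return False
    if abs(num.birth_year - cen.birth_year) > max_gap:
        return False
    if jaro_winkler(num.last, cen.last) <= name_min:
        return False
    given = [(num.first, cen.first), (num.middle, cen.middle)]
    return any(bool(a and b) and jaro_winkler(a, b) > name_min for a, b in given)


numident = Person("JOHN", "A", "SMITH", "M", "Ohio", 1900)
census = Person("JON", "", "SMYTH", "M", "Ohio", 1902)
print(is_candidate(numident, census))  # True
```

Blocking on exact fields first keeps the number of string comparisons small, since the name similarity check runs only on records that already agree on sex, birthplace, and approximate birth year.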
A machine learning model, trained separately for men and women, uses these features to assign a match score to each potential link. To finalize the links, we apply a two-step thresholding process. A link is kept only if its match score exceeds a minimum threshold and the ratio of the best score to the second-best score exceeds a second threshold. The ratio test ensures that the chosen match is clearly better than the alternatives.
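The two-step rule can be sketched as below. The threshold values (0.5 and 1.5) are placeholders chosen for the example, since the actual thresholds are not stated here, and accepting a sole candidate without a ratio test is an assumption of this sketch.

```python
def select_link(scored, min_score=0.5, min_ratio=1.5):
    """Return the ID of the accepted match, or None if no link is kept.

    scored: list of (census_id, match_score) pairs for one Numident record.
    min_score, min_ratio: placeholder thresholds, not the values used by IPUMS.
    """
    if not scored:
        return None
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    best_id, best_score = ranked[0]
    # Step 1: the best score must exceed the minimum score threshold.
    if best_score <= min_score:
        return None
    # Step 2: the best score must clearly beat the runner-up.
    if len(ranked) > 1:
        second_score = ranked[1][1]
        if second_score > 0 and best_score / second_score <= min_ratio:
            return None
    return best_id


print(select_link([("a", 0.9), ("b", 0.3)]))  # a
print(select_link([("a", 0.9), ("b", 0.8)]))  # None
```

In the second call the best candidate scores well on its own, but the runner-up is too close, so the record is left unlinked rather than risk a false match.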
Differences from Previous Version
- We now include links to the 1880 census for both men and women.
- We created a newly cleaned version of the Numident file, including improved versions of first name, middle name, and birthplace. First and middle names are now used as blocking variables during linking. Additionally, we exclude entries with conflicting information, such as significantly different birth years or multiple, inconsistent birthplaces.
- Lastly, we no longer link Numident women directly to married women in the census. That part of the process has been moved to the census-to-census linking step, as described above.
Underlying Numident Data
The original Numident files used by IPUMS are public-use data disseminated by the National Archives and Records Administration. The full Numident data contain a number of variables not included in the crosswalk files, including women's surnames prior to marriage. MLP uses this information to link the Social Security records to the census. The more extensive Numident information used by IPUMS is available to researchers who have obtained access to the restricted IPUMS data. Contact ipumsres@umn.edu with inquiries.
Citation and terms of use
To download these data, you must agree to IPUMS USA terms of use.
1. Cite both IPUMS MLP and the Full Count IPUMS Ancestry data:
Steven Ruggles, Nesile Ozder, Catherine A. Fitch, Matthew Sobek, Julia A. Rivera Drew, J. David Hacker, Jonas Helgertz, Cheyenne Lonobile, Matt A. Nelson, Evan Roberts, and John Robert Warren. IPUMS Multigenerational Longitudinal Panel: Version 2.0 [dataset]. Minneapolis, MN: IPUMS, 2025. https://doi.org/10.18128/D016.V2.0
Steven Ruggles, Matt A. Nelson, Matthew Sobek, Catherine A. Fitch, Ronald Goeken, J. David Hacker, Evan Roberts, and J. Robert Warren. IPUMS Ancestry Full Count Data: Version 4.0 [dataset]. Minneapolis, MN: IPUMS, 2024. https://doi.org/10.18128/D014.V4.0
2. Redistribution: You will not redistribute the data without permission.
Publications and research reports making use of IPUMS should be added to our Bibliography.
Contact us
For questions about IPUMS MLP, contact ipumsres@umn.edu.