Social Security to Census Crosswalks
- MLP version 1.2: released April 2024 -
IPUMS links Social Security Administration records to the 1900-1950 censuses. The data source is the Social Security NUMIDENT (Numerical Identification files). The public version of the NUMIDENT includes all Social Security registrants with verified death records and persons who would have reached the age of 110 by December 31, 2007. The earliest records date to 1936. We combine data from the Social Security applications file (SS-5) and death records file to link individuals to the 1900-1950 censuses.
The crosswalk files contain the variable HISTID to link to the relevant census record, month and year of birth, month and year of death (when available), and the state in which the Social Security number was issued. We also include a consistent SSID variable to identify the same person across different NUMIDENT-to-census crosswalks.
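As a minimal sketch of how a crosswalk could be attached to a census extract, the example below joins two tiny in-memory tables on HISTID. Only HISTID and SSID are variable names taken from the description above. The other column names (BIRTH_YEAR, BIRTH_MONTH, AGE, STATE), the sample values, and the helper functions are hypothetical and may not match the distributed files.

```python
import csv
import io

# Hypothetical miniature files. Only HISTID and SSID are documented variable
# names; the remaining columns and all values are invented for illustration.
CROSSWALK_CSV = """SSID,HISTID,BIRTH_YEAR,BIRTH_MONTH
1001,AAA-1,1898,3
1002,BBB-2,1905,11
"""

CENSUS_CSV = """HISTID,AGE,STATE
AAA-1,42,Ohio
BBB-2,35,Iowa
CCC-3,50,Utah
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_numident(census_rows, crosswalk_rows):
    """Keep census records that have a crosswalk entry and add its fields."""
    by_histid = {row["HISTID"]: row for row in crosswalk_rows}
    merged = []
    for person in census_rows:
        link = by_histid.get(person["HISTID"])
        if link is not None:
            merged.append({**person, **link})
    return merged


linked = attach_numident(read_rows(CENSUS_CSV), read_rows(CROSSWALK_CSV))
for record in linked:
    print(record["HISTID"], record["SSID"], record["BIRTH_YEAR"])
```

Census records without a crosswalk entry (CCC-3 here) are dropped, which corresponds to an inner join. A left join that keeps unlinked census records would be an equally reasonable choice, depending on the analysis.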
Users should note that the links to an individual across different NUMIDENT-to-census crosswalks do not always agree with the census-to-census links for that person. In the next version of MLP links, we will resolve such contradictions. In most cases we believe the census-to-census links are superior, but there are exceptions.
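One way to see where the two kinds of links disagree is to follow each SSID through two NUMIDENT-to-census crosswalks and compare the resulting pair of census records with the census-to-census link. The sketch below assumes each crosswalk has already been reduced to a plain dictionary. The function name, the dictionary layout, the census years, and the sample identifiers are all hypothetical.

```python
def find_conflicts(numident_early, numident_late, census_links):
    """Return cases where NUMIDENT links and census-to-census links disagree.

    numident_early: dict mapping SSID -> HISTID in the earlier census
    numident_late:  dict mapping SSID -> HISTID in the later census
    census_links:   dict mapping earlier-census HISTID -> later-census HISTID

    Only one direction is checked: an earlier record whose census-to-census
    link points somewhere other than the NUMIDENT-implied later record.
    """
    conflicts = []
    for ssid, histid_early in numident_early.items():
        histid_late = numident_late.get(ssid)
        if histid_late is None:
            continue  # person is not linked in the later crosswalk
        expected = census_links.get(histid_early)
        if expected is not None and expected != histid_late:
            conflicts.append((ssid, histid_early, histid_late, expected))
    return conflicts


# Invented identifiers, for illustration only.
numident_1910 = {"S1": "A10", "S2": "B10"}
numident_1920 = {"S1": "A20", "S2": "X20"}
census_1910_1920 = {"A10": "A20", "B10": "B20"}

for conflict in find_conflicts(numident_1910, numident_1920, census_1910_1920):
    print(conflict)
```

How a flagged case should be resolved is left to the user. As noted above, the census-to-census link is usually, but not always, the better one.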
Download Data
Linking Method
We use the Social Security applications file (SS-5) combined with the death records file to link individuals to the censuses. For each Social Security Number, multiple entries may exist due to changes or replacements of SSNs or because of corrections to the information. To fully leverage the available information, all recorded first names, last names, birth dates, birthplaces, and parents' names are employed in the linking process. The linking procedure is similar to that used for census-to-census linking, modified to account for the lack of household contextual information (stage 2 linking) and for variables not present in the Social Security records.
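The idea of using every recorded value for a Social Security Number can be illustrated by pooling the distinct values of each field across all entries for that number. This is only a sketch of that pooling step, not the procedure used by MLP. The field names, the record layout, and the normalization (trimming and upper-casing) are assumptions.

```python
from collections import defaultdict

# Hypothetical field names; the actual NUMIDENT layout may differ.
FIELDS = ("first_name", "last_name", "birth_date", "birthplace",
          "father_name", "mother_name")


def collect_variants(entries, fields=FIELDS):
    """Group entries by SSN and gather the distinct values seen per field."""
    variants = defaultdict(lambda: {field: set() for field in fields})
    for entry in entries:
        for field in fields:
            value = entry.get(field)
            if value:  # skip missing or empty values
                variants[entry["ssn"]][field].add(value.strip().upper())
    return dict(variants)


# Invented entries: the same person recorded under two surnames.
entries = [
    {"ssn": "000-00-0001", "first_name": "Mary", "last_name": "Olson",
     "father_name": "Peter Olson"},
    {"ssn": "000-00-0001", "first_name": "Mary", "last_name": "Berg",
     "birthplace": "Minnesota"},
]

pooled = collect_variants(entries)
print(sorted(pooled["000-00-0001"]["last_name"]))
```

Each pooled set can then supply alternative values when candidate census records are compared, so that a correction or name change in a later entry is not lost.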
The linking process distinguishes between men, never-married women, and ever-married women. For men and never-married women, linking is based on last names. For ever-married women, the father's last name is used as the maiden name, which makes it possible to link a woman to census records that predate her marriage. Separate training datasets are used for each group.
A potential match universe is established in the census by blocking on birthplace, sex, birth year, and first and last (or maiden) name. Only potential matches whose first and last name Jaro-Winkler similarity scores exceed 0.80 are considered. Characteristics of the individual in the NUMIDENT are then compared with those of each potential match in the census. The comparison features include age, race, similarity scores for first name, last name, and mother's and father's names, and the distance between the birthplace and the state of residence.
We train a machine learning algorithm to generate a score for each potential match based on these features. To establish confident links, we apply two thresholds. Potential matches with scores that exceed the first threshold are kept. Among those, the scores of the most probable and second-most probable matches are compared, and the ratio of the two scores must be greater than the second threshold.
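The name filter and the two-threshold rule described above can be sketched as follows. The Jaro-Winkler function is a standard textbook formulation and not necessarily the exact variant used by MLP. The 0.80 cutoff comes from the text, but applying it to first and last name separately is an assumption. The threshold values in the usage lines, the direction of the ratio (best score divided by second-best), and the choice to accept a candidate that is the only one above the first threshold are likewise assumptions.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def passes_name_filter(first_a, last_a, first_b, last_b, minimum=0.80):
    """Assumed reading of the 0.80 rule: both name scores must exceed it."""
    return (jaro_winkler(first_a, first_b) > minimum
            and jaro_winkler(last_a, last_b) > minimum)


def select_link(scored, score_threshold, ratio_threshold):
    """Apply the two thresholds to (candidate_id, score) pairs.

    Scores are assumed to be positive and score_threshold non-negative.
    Returns the chosen candidate_id, or None if no confident link exists.
    """
    kept = sorted((c for c in scored if c[1] > score_threshold),
                  key=lambda c: c[1], reverse=True)
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0][0]  # assumption: a lone survivor is accepted
    best, runner_up = kept[0], kept[1]
    if best[1] / runner_up[1] > ratio_threshold:
        return best[0]
    return None


# Hypothetical threshold values, chosen only for illustration.
print(passes_name_filter("JOHN", "SMITH", "JON", "SMYTH"))
print(select_link([("a", 0.9), ("b", 0.3)], 0.5, 1.5))
print(select_link([("a", 0.9), ("b", 0.8)], 0.5, 1.5))
```

The ratio test is what rejects ambiguous cases: in the last call both candidates clear the first threshold, but their scores are too close for either to be chosen.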
Underlying NUMIDENT Data
The original NUMIDENT files used by IPUMS are public-use data disseminated by the National Archives and Records Administration. The full NUMIDENT data contain a number of variables not included in the crosswalk files, including women's surnames prior to marriage. MLP uses this information to link the Social Security records to the census. The more extensive NUMIDENT information used by IPUMS is available to researchers who have obtained access to the restricted IPUMS data. Contact ipumsres@umn.edu with inquiries.
Citation and terms of use
To download these data, you must agree to IPUMS USA terms of use.
1. Cite both IPUMS MLP and the Full Count IPUMS Ancestry data:
Jonas Helgertz, Nesile Ozder, Steven Ruggles, John Robert Warren, Catherine A. Fitch, J. David Hacker, Matt A. Nelson, Joseph P. Price, Evan Roberts, and Matthew Sobek. IPUMS Multigenerational Longitudinal Panel: Version 1.2 [dataset]. Minneapolis, MN: IPUMS, 2024. https://doi.org/10.18128/D016.V1.2
Steven Ruggles, Matt A. Nelson, Matthew Sobek, Catherine A. Fitch, Ronald Goeken, J. David Hacker, Evan Roberts, and J. Robert Warren. IPUMS Ancestry Full Count Data: Version 4.0 [dataset]. Minneapolis, MN: IPUMS, 2024. https://doi.org/10.18128/D014.V4.0
2. Redistribution: You will not redistribute the data without permission.
Publications and research reports making use of IPUMS should be added to our Bibliography.
Contact us
For questions about IPUMS MLP, contact ipumsres@umn.edu.