

Social Security to Census Crosswalks

- MLP version 1.2: released April 2024 -

IPUMS links Social Security Administration records to the 1900-1950 censuses. The data source is the Social Security NUMIDENT (Numerical Identification files). The public version of the NUMIDENT includes all Social Security registrants with verified death records and persons who would have reached the age of 110 by December 31, 2007. The earliest records date to 1936. We combine data from the Social Security applications file (SS-5) and the death records file to link individuals to the 1900-1950 censuses.

The crosswalk files contain the variable HISTID to link to the relevant census record, month and year of birth, month and year of death (when available), and the state in which the Social Security number was issued. We also include a consistent SSID variable to identify the same person across different NUMIDENT-to-census crosswalks.
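A minimal sketch of how the two identifiers might be used once the files are downloaded: HISTID joins a crosswalk to a census extract, and SSID follows one person across several crosswalks. Only HISTID and SSID come from this page. The "age" column, the record values, and the helper names (read_rows, attach_census, histids_by_ssid) are hypothetical, and small in-memory CSV strings stand in for the real files.

```python
import csv
import io

# Hypothetical data. HISTID and SSID are the documented crosswalk variables;
# the "age" column and all values are invented for illustration.
CROSSWALK_1930 = "SSID,HISTID\n1001,H30-A\n1002,H30-B\n"
CROSSWALK_1940 = "SSID,HISTID\n1001,H40-A\n1003,H40-C\n"
CENSUS_1940 = "HISTID,age\nH40-A,24\nH40-B,31\nH40-C,19\n"


def read_rows(text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_census(crosswalk_rows, census_rows):
    """Inner-join crosswalk rows to census rows on HISTID."""
    census_by_histid = {row["HISTID"]: row for row in census_rows}
    return [
        {**census_by_histid[xw["HISTID"]], **xw}
        for xw in crosswalk_rows
        if xw["HISTID"] in census_by_histid  # drop links absent from the census extract
    ]


def histids_by_ssid(crosswalks):
    """Map each SSID to {census_year: HISTID} across several crosswalks."""
    out = {}
    for year, rows in crosswalks.items():
        for row in rows:
            out.setdefault(row["SSID"], {})[year] = row["HISTID"]
    return out


merged = attach_census(read_rows(CROSSWALK_1940), read_rows(CENSUS_1940))
print([(row["SSID"], row["age"]) for row in merged])
# [('1001', '24'), ('1003', '19')]

panel = histids_by_ssid({1930: read_rows(CROSSWALK_1930), 1940: read_rows(CROSSWALK_1940)})
print(panel["1001"])
# {1930: 'H30-A', 1940: 'H40-A'}
```

With real files, csv.DictReader (or a dataframe library) would read the downloaded CSV crosswalk instead of the inline strings; all values stay as strings unless converted.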

Users should note that the links to an individual across different NUMIDENT-to-census crosswalks do not always agree with the census-to-census links for that person. In the next version of MLP links, we will resolve such contradictions. In most cases we believe the census-to-census links are superior, but there are exceptions.
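One way a user might flag these contradictions is sketched below, assuming the NUMIDENT links have already been gathered per SSID and a census-to-census crosswalk is available as a HISTID-to-HISTID mapping. Both mappings, their values, and the function name find_disagreements are hypothetical. The check looks in one direction only (from the earlier census year forward) and simply reports conflicts; it does not decide which link is correct.

```python
# Hypothetical NUMIDENT-to-census links gathered by SSID: SSID -> {census_year: HISTID}.
numident_links = {
    "1001": {1930: "H30-A", 1940: "H40-A"},
    "1002": {1930: "H30-B", 1940: "H40-B"},
    "1003": {1940: "H40-C"},
}

# Hypothetical census-to-census links for 1930 -> 1940: HISTID in 1930 -> HISTID in 1940.
census_links_1930_1940 = {
    "H30-A": "H40-A",  # agrees with SSID 1001
    "H30-B": "H40-Z",  # contradicts SSID 1002
}


def find_disagreements(numident_links, census_links, year_a, year_b):
    """Return (SSID, HISTID_a, HISTID_b, census-linked HISTID_b) for each contradiction."""
    conflicts = []
    for ssid, by_year in sorted(numident_links.items()):
        if year_a not in by_year or year_b not in by_year:
            continue  # person not linked to both censuses through the NUMIDENT
        expected = census_links.get(by_year[year_a])
        if expected is not None and expected != by_year[year_b]:
            conflicts.append((ssid, by_year[year_a], by_year[year_b], expected))
    return conflicts


for ssid, h_a, h_b, expected in find_disagreements(
    numident_links, census_links_1930_1940, 1930, 1940
):
    print(f"SSID {ssid}: NUMIDENT links {h_a} -> {h_b}; census-to-census links {h_a} -> {expected}")
# SSID 1002: NUMIDENT links H30-B -> H40-B; census-to-census links H30-B -> H40-Z
```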

Number of cases per NUMIDENT-census crosswalk

Census year    Linked persons
1900           2,079,166
1910           3,172,147
1920           7,772,057
1930           14,767,780
1940           15,620,514
1950           13,662,071

Download Data

Crosswalk files are available in Stata and CSV formats for each census year: 1900, 1910, 1920, 1930, 1940, and 1950.

Linking Method

We use the Social Security applications file (SS-5) combined with the death records file to link individuals to the censuses. A single Social Security Number may have multiple entries because the SSN was changed or replaced or because the information was corrected. To make full use of the available information, all recorded first names, last names, birth dates, birthplaces, and parents' names are used in the linking process. The linking procedure is similar to that used for census-to-census linking, modified to account for the lack of household contextual information (stage 2 linking) and for variables not present in the Social Security records.
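As an illustration of pooling multiple entries under one SSN, the sketch below collects every distinct recorded value of each field. The page does not say how the pooled variants enter the comparison; scoring a census value against all variants and keeping the best score is an assumption made here. The field names, records, and helper names are hypothetical, parents' names and birthplaces are omitted for brevity, and the exact-match function is only a placeholder for a real string similarity.

```python
from collections import defaultdict

# Hypothetical NUMIDENT entries; field names and values are invented.
# Several entries can share one SSN (replacement cards, corrections).
entries = [
    {"ssn": "A", "first": "WILLIAM", "last": "JOHNSON", "birth_year": 1912},
    {"ssn": "A", "first": "BILL", "last": "JOHNSON", "birth_year": 1912},
    {"ssn": "A", "first": "WILLIAM", "last": "JOHNSTON", "birth_year": 1913},
    {"ssn": "B", "first": "MARY", "last": "SMITH", "birth_year": 1920},
]

FIELDS = ("first", "last", "birth_year")


def pool_variants(entries):
    """Collect every distinct value of each field recorded under the same SSN."""
    pooled = defaultdict(lambda: {field: set() for field in FIELDS})
    for entry in entries:
        for field in FIELDS:
            pooled[entry["ssn"]][field].add(entry[field])
    return dict(pooled)


def best_match(variants, census_value, similarity):
    """Compare a census value with every recorded variant and keep the highest score."""
    return max(similarity(variant, census_value) for variant in variants)


def exact(a, b):
    """Placeholder similarity: 1.0 for identical values, else 0.0."""
    return 1.0 if a == b else 0.0


pooled = pool_variants(entries)
print(sorted(pooled["A"]["first"]))  # ['BILL', 'WILLIAM']
print(best_match(pooled["A"]["last"], "JOHNSTON", exact))  # 1.0
```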

The linking process distinguishes between men, never-married women, and ever-married women. For men and ever-married women, linking is based on last names, while for never-married women, the father's last name is used as the maiden name. Separate training datasets are used for each group. A potential match universe is established in the census by blocking on birthplace, sex, birth year, first name, and last name. Only potential matches with first-name and last-name Jaro-Winkler similarity scores exceeding 0.80 are considered.

Characteristics of the NUMIDENT individual to be linked are compared to those of each potential match in the census. The comparison features include age, race, similarity scores for first name, last name, and mother's and father's names, and the distance between the birthplace and the state of residence. We train a machine learning algorithm to generate a score for each potential match from these features. To establish confident links, we apply two thresholds. Potential matches whose scores exceed the first threshold are kept. Among those, the score of the most probable match is divided by the score of the second-most probable match, and this ratio must exceed the second threshold.
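The candidate filter and the two-threshold rule can be sketched as follows. This is not the MLP code: the record fields (bpl, sex, birthyr, first, last, histid), the sample records, the model scores, and the threshold values t1 and t2 are all hypothetical, and the trained model is replaced by scores supplied by hand. Blocking here uses exact birthplace, sex, and birth year, while names enter only through the 0.80 Jaro-Winkler filter, since the page does not say how names are coded for blocking. The Jaro-Winkler function uses the common defaults (prefix weight 0.1, prefix capped at 4 characters). When only one candidate clears the first threshold, the sketch accepts it, which is an assumption about that case.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings (0.0 to 1.0)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order; half of them are transpositions.
    out_of_order, k = 0, 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Boost the Jaro score for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def candidates(person, census, min_jw=0.80):
    """Census records in the same block whose first and last names both exceed min_jw."""
    key = (person["bpl"], person["sex"], person["birthyr"])
    return [
        rec for rec in census
        if (rec["bpl"], rec["sex"], rec["birthyr"]) == key
        and jaro_winkler(person["first"], rec["first"]) > min_jw
        and jaro_winkler(person["last"], rec["last"]) > min_jw
    ]


def accept_link(scored, t1, t2):
    """Apply both thresholds to (histid, score) pairs; return the accepted HISTID or None."""
    kept = sorted((pair for pair in scored if pair[1] > t1), key=lambda pair: pair[1], reverse=True)
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0][0]  # no competitor above t1 (assumed to be accepted)
    (best_id, best), (_, second) = kept[0], kept[1]
    return best_id if best / second > t2 else None


person = {"first": "MARTHA", "last": "OLSEN", "bpl": "MN", "sex": 2, "birthyr": 1915}
census = [
    {"histid": "H1", "first": "MARHTA", "last": "OLSON", "bpl": "MN", "sex": 2, "birthyr": 1915},
    {"histid": "H2", "first": "MARY", "last": "OLSEN", "bpl": "MN", "sex": 2, "birthyr": 1915},
    {"histid": "H3", "first": "MARTHA", "last": "OLSEN", "bpl": "WI", "sex": 2, "birthyr": 1915},
]
print([rec["histid"] for rec in candidates(person, census)])  # ['H1', 'H2']
print(accept_link([("H1", 0.92), ("H2", 0.30)], t1=0.2, t2=2.0))  # H1
print(accept_link([("H1", 0.60), ("H2", 0.50)], t1=0.2, t2=2.0))  # None
```

The ratio test is what separates the last two calls: both candidates clear the first threshold in each case, but only a clear winner (0.92 / 0.30 > 2.0) is accepted, while a close call (0.60 / 0.50 = 1.2) is left unlinked.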

Underlying NUMIDENT Data

The original NUMIDENT files used by IPUMS are public-use data disseminated by the National Archives and Records Administration. The full NUMIDENT data contain a number of variables not included in the crosswalk files, including women's surnames prior to marriage. MLP uses this information to link the Social Security records to the census. The more extensive NUMIDENT information used by IPUMS is available to researchers who have obtained access to the restricted IPUMS data. Contact ipumsres@umn.edu with inquiries.

Citation and terms of use

To download these data, you must agree to the IPUMS USA terms of use.

1. Cite both IPUMS MLP and the Full Count IPUMS Ancestry data:

Jonas Helgertz, Nesile Ozder, Steven Ruggles, John Robert Warren, Catherine A. Fitch, J. David Hacker, Matt A. Nelson, Joseph P. Price, Evan Roberts, and Matthew Sobek. IPUMS Multigenerational Longitudinal Panel: Version 1.2 [dataset]. Minneapolis, MN: IPUMS, 2024. https://doi.org/10.18128/D016.V1.2

Steven Ruggles, Matt A. Nelson, Matthew Sobek, Catherine A. Fitch, Ronald Goeken, J. David Hacker, Evan Roberts, and J. Robert Warren. IPUMS Ancestry Full Count Data: Version 4.0 [dataset]. Minneapolis, MN: IPUMS, 2024. https://doi.org/10.18128/D014.V4.0

2. Redistribution: You will not redistribute the data without permission:

You may not redistribute any IPUMS Full Count data that are linked to the MLP crosswalks. You may request permission to redistribute IPUMS MLP crosswalks.

Publications and research reports making use of IPUMS should be added to our Bibliography.

Contact us

For questions about IPUMS MLP, contact ipumsres@umn.edu.
