MLP Crosswalk Data User Guide
The MLP crosswalk file is designed to be used in combination with data extracts created by the IPUMS dissemination system.
Note that the IPUMS extract system can also create linked census files directly. These will serve most users better than the crosswalks, which require more file manipulation.
In the IPUMS extract system
- On the "Select Samples" screen, deselect the default samples and choose the appropriate 100% data file on the "USA full count" tab. Make a separate extract for each census year you wish to link via the crosswalk.
- Select variables as parsimoniously as possible. Full-count extracts can be very large, which can cause problems for statistical software during merging. The preselected variables YEAR and HISTID must be included in your extract, as they constitute the unique identifier needed to merge the data onto the crosswalk file.
- Use the "Select Cases" option with caution. Restricting cases in the extract can drop records that do not match the expected value in the adjoining census year but that the algorithm linked on the basis of a holistic assessment of personal and household characteristics. For example, we sometimes link individuals whose age difference between the two censuses departs from the expected 10 years. See the linking method for a description.
If you intend to use a pre-existing data extract, ensure that you have the most recent version of the data by referring to the IPUMS USA revision history. Prospective users who are not familiar with the IPUMS data extract system can refer to these written instructions or video tutorials.
On your desktop: General instructions
- Download your IPUMS extracts and the corresponding MLP crosswalk to the same directory on your local file system. Uncompress the files.
- Each record in the crosswalk file contains the HISTID values that identify a linked person in each census year in which that person can be identified. Keep only the HISTID variables for the years you need to merge to your extracts. Drop any cases that are missing a HISTID value for one of those years, since those persons have no linkable record in the corresponding extract. The crosswalk file also contains the HIK variable: the common ID that identifies a person linked across census years.
- The HIK variable is unstable: its value can change between data releases as new links are made or old ones are dropped. In contrast, HISTID is stable and will never change. For this reason, we recommend that you keep including HISTID in your extracts in case you need to update your data in the future.
- Rename the variables in each census extract file to incorporate the census year in the name. For example, in an extract of the 1900 census, rename AGE to AGE_1900, MARST to MARST_1900, etc.
- Merge each census file onto the crosswalk using the HISTID variable for that year (HISTID_1900, HISTID_1910, etc.). Because the extract's HISTID was renamed in the previous step, the merge key has the same name in both files.
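The general steps above can be sketched in Python using only the standard library. This is a minimal, hypothetical illustration, not an IPUMS tool: the in-memory rows stand in for real extract and crosswalk files, and the HIK and HISTID values, the AGE and MARST codes, and the helper names are all invented for the example. It assumes a missing HISTID is stored as an empty value (None here).

```python
# Hypothetical sketch of the crosswalk workflow. All IDs, codes, and helper
# names are illustrative assumptions, not real IPUMS data or tools.

YEARS = (1900, 1910)  # census years to link (assumption for the example)

# Crosswalk rows: one linked person each; None marks "not linked in that year".
crosswalk = [
    {"HIK": "A1", "HISTID_1900": "h00-01", "HISTID_1910": "h10-07", "HISTID_1920": None},
    {"HIK": "A2", "HISTID_1900": "h00-02", "HISTID_1910": None, "HISTID_1920": "h20-03"},
    {"HIK": "A3", "HISTID_1900": "h00-03", "HISTID_1910": "h10-09", "HISTID_1920": "h20-05"},
]

# One extract per census year, each containing YEAR and HISTID plus a few variables.
extracts = {
    1900: [
        {"YEAR": 1900, "HISTID": "h00-01", "AGE": 4, "MARST": 6},
        {"YEAR": 1900, "HISTID": "h00-02", "AGE": 30, "MARST": 1},
        {"YEAR": 1900, "HISTID": "h00-03", "AGE": 22, "MARST": 6},
    ],
    1910: [
        {"YEAR": 1910, "HISTID": "h10-07", "AGE": 14, "MARST": 6},
        {"YEAR": 1910, "HISTID": "h10-09", "AGE": 32, "MARST": 1},
    ],
}


def select_linked(rows, years):
    """Keep HIK plus the HISTID_<year> columns needed; drop rows missing any of them."""
    cols = ["HIK"] + [f"HISTID_{y}" for y in years]
    return [
        {c: row[c] for c in cols}
        for row in rows
        if all(row.get(f"HISTID_{y}") for y in years)
    ]


def add_year_suffix(rows, year):
    """Rename every variable NAME to NAME_<year> (e.g. HISTID -> HISTID_1900)."""
    return [{f"{k}_{year}": v for k, v in row.items()} for row in rows]


def merge_year(linked, renamed_rows, year):
    """Attach one census year's variables to each linked person via HISTID_<year>."""
    key = f"HISTID_{year}"
    lookup = {row[key]: row for row in renamed_rows}  # HISTID is unique within a year
    # Inner merge: persons without a matching extract record are dropped.
    return [{**rec, **lookup[rec[key]]} for rec in linked if rec[key] in lookup]


linked = select_linked(crosswalk, YEARS)
for year in YEARS:
    linked = merge_year(linked, add_year_suffix(extracts[year], year), year)

for rec in linked:
    print(rec)
```

The merge here is an inner merge: linked persons whose HISTID does not appear in an extract (for example, because "Select Cases" excluded them) are dropped, as are extract records that are not on the crosswalk. If you need to see which links failed, keep the unmatched rows instead of discarding them.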
On your desktop: Stata users
- Stata users can use this .do file to merge IPUMS extracts onto the crosswalk. Customize the .do file with the directory where the data are located and the names of your IPUMS extract files. You can also specify whether to select whole households of linked persons or to reshape the data from wide format (one person per record) to long format (one person-year per record).
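For users working outside Stata, the wide-to-long reshape offered by the .do file can be approximated as follows. This is a hypothetical Python sketch, not a translation of the .do file: it assumes every year-specific variable carries a four-digit _<year> suffix, as produced by the renaming step, that HIK identifies the person, and that the example values are invented.

```python
import re

# Hypothetical wide records: one linked person per row, variables suffixed _<year>.
wide = [
    {"HIK": "A1", "HISTID_1900": "h00-01", "AGE_1900": 4, "HISTID_1910": "h10-07", "AGE_1910": 14},
    {"HIK": "A3", "HISTID_1900": "h00-03", "AGE_1900": 22, "HISTID_1910": "h10-09", "AGE_1910": 32},
]

# Matches NAME_<four-digit year>, e.g. "AGE_1910" -> ("AGE", "1910").
SUFFIX = re.compile(r"^(?P<name>.+)_(?P<year>\d{4})$")


def wide_to_long(rows, id_var="HIK"):
    """Split NAME_<year> columns into one record per person-year."""
    long_rows = []
    for row in rows:
        by_year = {}
        for col, value in row.items():
            match = SUFFIX.match(col)
            if match is None:
                continue  # unsuffixed columns (such as the person ID) are not year-specific
            year = int(match.group("year"))
            by_year.setdefault(year, {})[match.group("name")] = value
        for year in sorted(by_year):
            long_rows.append({id_var: row[id_var], "YEAR": year, **by_year[year]})
    return long_rows


person_years = wide_to_long(wide)
for rec in person_years:
    print(rec)
```

Each output record carries the person identifier, a YEAR value taken from the column suffix, and that year's variables under their original (unsuffixed) names.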