MLP Crosswalk Data User Guide
The MLP crosswalk files are designed to be used in combination with data extracts created from the IPUMS dissemination system. The basic model we envision is to build extracts for two adjacent census years and merge those data onto the crosswalk to create one linked dataset.
Note that the IPUMS extract system can create linked census files, and for many users these are more convenient than the crosswalks, which require more file manipulation. However, the "link census data" feature has not been updated to work with the most recent version of the crosswalks, which includes the 1950 census as well as 20- and 30-year links. Instead, you should make conventional extracts that include all cases in the relevant population from each census you wish to link.
In the IPUMS extract system
- On the "Select Samples" screen, deselect the default samples and choose an appropriate 100% data file on the "USA full count" tab. Make two extracts: one for the initial census year and one for the terminal census year.
- In selecting variables, be as parsimonious as possible. The full-count files can be very large, which can cause problems for statistical software during merging. The preselected variables YEAR and HISTID must be included in your extract, as they constitute the unique identifier needed to merge the data onto the crosswalk file. Avoid adding linking variables other than HISTID (e.g., LINK1930), as these have not yet been updated to accommodate MLP version 1.2.
- Use the "Select Cases" option with caution. Restricting cases can drop records whose values do not match what you would expect in the adjoining census year, but which the algorithm linked based on a holistic assessment of personal and household characteristics. For example, we sometimes link individuals whose reported ages in the two censuses differ by more than the expected 10 or 20 years. See the linking method for a description.
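The risk with case selection can be illustrated with a small sketch. The ages and age ranges below are made up for illustration and are not drawn from the MLP data; the point is only that filtering each extract on its own expected range silently removes a pair that the crosswalk treats as linked.

```python
# Hypothetical linked pairs: (age in 1900, age in 1910). Values are made up.
linked_pairs = [(25, 35), (25, 37), (29, 41), (22, 31)]


def in_range(age, low, high):
    """Return True if age falls inside the inclusive range [low, high]."""
    return low <= age <= high


# Case selection as it might be applied separately to each extract:
# ages 20-29 in the first census, ages 30-39 in the second.
kept = [
    (age_first, age_second)
    for age_first, age_second in linked_pairs
    if in_range(age_first, 20, 29) and in_range(age_second, 30, 39)
]

# Linked pairs that no longer appear once both filters are applied.
lost = [pair for pair in linked_pairs if pair not in kept]

print("kept:", kept)
print("lost:", lost)
```

Here the pair with ages 29 and 41 is dropped even though it is a valid link, because the reported ages are 12 years apart rather than 10. Selecting cases after the merge, rather than in the extract system, avoids this.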
If you intend to use a pre-existing data extract, ensure that you have the most recent version of the data by referring to the IPUMS USA revision history. Prospective users who are not familiar with the IPUMS data extract system can refer to these written instructions or video tutorials.
On your desktop: General instructions
- Download your IPUMS extracts and the corresponding MLP crosswalk to the same directory on your local file system. Uncompress the files.
- Each record in the crosswalk file contains the HISTID values that identify a linked person in that pair of censuses. There is also a STEP (sometimes named ROUND) variable that indicates the stage of the linking process in which the link was made.
- Rename the variables in each census extract file to incorporate the census year in the name. For example, in an extract of the 1900 census, rename AGE to AGE_1900, MARST to MARST_1900, etc. Do not rename the YEAR variable.
- Merge each census file onto the crosswalk using YEAR and the appropriate HISTID variable for that year (HISTID_1900, HISTID_1910, etc.). You will probably have to sort the files by YEAR and HISTID to perform the merge.
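The rename and merge steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the project's own procedure: the tiny in-memory CSV strings, the HISTID values, and the helper names load_extract and merge_onto_crosswalk are all hypothetical, and the sketch assumes the crosswalk has columns named HISTID_1900, HISTID_1910, and STEP. With real files you would read from disk instead of from strings.

```python
import csv
import io

# Tiny in-memory stand-ins for the real files (all values are made up).
CROSSWALK_CSV = """HISTID_1900,HISTID_1910,STEP
A1,B7,1
A2,B9,2
"""
EXTRACT_1900_CSV = """YEAR,HISTID,AGE,MARST
1900,A1,25,6
1900,A2,40,1
1900,A3,12,6
"""
EXTRACT_1910_CSV = """YEAR,HISTID,AGE,MARST
1910,B7,35,1
1910,B9,51,1
1910,B8,3,6
"""


def load_extract(text, year):
    """Index one extract by (YEAR, HISTID), adding a year suffix to other variables."""
    index = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (int(row["YEAR"]), row["HISTID"])
        # AGE becomes AGE_1900, MARST becomes MARST_1900, and so on.
        index[key] = {
            f"{name}_{year}": value
            for name, value in row.items()
            if name not in ("YEAR", "HISTID")
        }
    return index


def merge_onto_crosswalk(crosswalk_text, extracts):
    """Attach each year's renamed variables to every crosswalk record."""
    linked = []
    for link in csv.DictReader(io.StringIO(crosswalk_text)):
        record = dict(link)
        for year, index in extracts.items():
            key = (year, link[f"HISTID_{year}"])
            # Unmatched keys add nothing; the crosswalk record is still kept.
            record.update(index.get(key, {}))
        linked.append(record)
    return linked


extracts = {
    1900: load_extract(EXTRACT_1900_CSV, 1900),
    1910: load_extract(EXTRACT_1910_CSV, 1910),
}
linked = merge_onto_crosswalk(CROSSWALK_CSV, extracts)
for record in linked:
    print(record)
```

Because the lookup uses a dictionary keyed on YEAR and HISTID, this sketch does not need the files to be sorted; statistical packages that merge sequentially generally do. Holding a full-count extract in memory this way may not be practical, so treat the sketch as a description of the logic rather than a recipe for files of that size.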
On your desktop: Stata users
- Stata users can use this .do file to merge their IPUMS extracts onto the crosswalk. Customize the .do file to specify the directory where the data are located and the names of your IPUMS extract files.