MLP Linked Data User Guide
The MLP crosswalk files are designed to be used in combination with data extracts created from the IPUMS dissemination system. The basic model we envision is to build extracts for two adjacent census years and merge those data onto the crosswalk to create one linked dataset.
In the IPUMS extract system
- On the "Select Samples" screen, deselect the default samples and choose an appropriate 100% data file on the "USA full count" tab (1900, 1910, 1920, 1930 or 1940). Make two extracts: one for the initial census year and one for the terminal census year.
- In selecting variables, be as parsimonious as possible. These can be very large files, posing potential problems for statistical software during merging. The preselected variables YEAR and HISTID must be included in your extract, as they constitute the unique identifier needed to merge the data onto the crosswalk file.
- Use the "Select Cases" option with caution. Restricting an extract to particular values can drop individuals whose value in the adjoining census year does not match what you expect, even though the algorithm linked them based on a holistic assessment of personal and household characteristics. For example, we sometimes link individuals whose reported ages in the two censuses are more than the expected 10 years apart. See the linking method for a description.
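The "Select Cases" caveat can be illustrated with a small sketch. The linked pairs below are invented for illustration and are not MLP data. The age ranges (20 to 29 in 1900, 30 to 39 in 1910) are a hypothetical case selection applied separately to each extract.

```python
# Hypothetical linked individuals (made-up values, not real MLP records).
linked_pairs = [
    {"AGE_1900": 25, "AGE_1910": 35},  # ages 10 years apart, as expected
    {"AGE_1900": 29, "AGE_1910": 41},  # ages 12 years apart
    {"AGE_1900": 19, "AGE_1910": 31},  # ages 12 years apart
]

def in_range(age, low, high):
    """Return True if age falls within the inclusive range [low, high]."""
    return low <= age <= high

# Applying a case selection to each extract keeps a linked pair only if
# BOTH records survive their own extract's filter.
kept = [
    p for p in linked_pairs
    if in_range(p["AGE_1900"], 20, 29) and in_range(p["AGE_1910"], 30, 39)
]
lost = [p for p in linked_pairs if p not in kept]

print(f"kept {len(kept)} linked pair(s), lost {len(lost)}")
```

In this sketch the second pair is lost because the 1910 record falls outside the 1910 selection, and the third because the 1900 record falls outside the 1900 selection, even though both pairs were linked.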
If you intend to use a pre-existing data extract, ensure that you have the most recent version of the data by referring to the IPUMS USA revision history. Prospective users who are not familiar with the IPUMS data extract system can refer to these written instructions or video tutorials.
On your desktop
- Download your IPUMS extracts and the corresponding MLP crosswalk to the same directory on your local file system. Uncompress the files.
- Rename the variables in each census extract file to incorporate the census year in the name. For example, in an extract of the 1900 census, rename AGE to AGE_1900, MARST to MARST_1900, etc. Do not rename the YEAR variable.
- Merge each census file onto the crosswalk using YEAR and the appropriate HISTID variable for that year (HISTID_1900, HISTID_1910, etc.). You will probably have to sort all files by YEAR and HISTID to perform the merge.
- The crosswalk file includes some basic variables that let you reduce the size of the file before the merge, depending on your population of interest (but note the caveat above regarding case selection and inconsistent values between censuses):
  - HISTID_1900/1910: Stable individual identifier linking the individual to public-use versions of census 1900/1910 (as described above).
  - SERIAL_1900/1910: Household identifier linking the individual to their household in public-use versions of census 1900/1910.
  - AGE_1900/1910: Age of the individual according to census 1900/1910.
  - SEX_1900/1910: Sex of the individual according to census 1900/1910 (variable codes).
  - BPLD_1900/1910: Detailed place of birth of the individual according to census 1900/1910 (variable codes – select "detailed codes" option).
  - STATEFIP_1900/1910: State of residence of the individual according to census 1900/1910 (variable codes).
- Stata users can use this .do file to perform the merge. Customize the file to provide the directory where the data are located and the name of your IPUMS extracts.
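For users working outside Stata, the desktop steps above (rename, optionally reduce the crosswalk, merge) can be sketched in Python using only the standard library. This is an illustrative sketch, not the procedure in the .do file. The function names `load_extract` and `link` are hypothetical, and the tiny CSV contents are invented. The sketch assumes that extracts are CSV files with a header row, that each extract holds a single census year, and that HISTID is renamed along with the other variables so it matches the crosswalk's HISTID_1900 and HISTID_1910 columns.

```python
import csv
import io

def load_extract(handle, year):
    """Read a single-year extract and index its rows by HISTID_<year>.

    Every column except YEAR gets the census year appended to its name.
    """
    rows = {}
    for row in csv.DictReader(handle):
        if int(row["YEAR"]) != year:
            continue  # skip rows from any other census year
        renamed = {
            (name if name == "YEAR" else f"{name}_{year}"): value
            for name, value in row.items()
        }
        rows[renamed[f"HISTID_{year}"]] = renamed
    return rows

def link(crosswalk_handle, first, last, first_year, last_year,
         keep=lambda row: True):
    """Attach both extracts to each crosswalk row that passes `keep`.

    Crosswalk rows without a match in both extracts are skipped.
    """
    linked = []
    for xw in csv.DictReader(crosswalk_handle):
        if not keep(xw):
            continue  # reduce the crosswalk before merging
        a = first.get(xw[f"HISTID_{first_year}"])
        b = last.get(xw[f"HISTID_{last_year}"])
        if a is None or b is None:
            continue
        merged = dict(xw)
        for source in (a, b):
            for name, value in source.items():
                if name != "YEAR":  # a linked row spans two census years
                    merged.setdefault(name, value)  # crosswalk values win
        linked.append(merged)
    return linked

# Invented stand-ins for the downloaded files (assumption: CSV format).
extract_1900 = io.StringIO("YEAR,HISTID,AGE,MARST\n1900,A1,30,1\n1900,A2,5,6\n")
extract_1910 = io.StringIO("YEAR,HISTID,AGE,MARST\n1910,B1,40,1\n1910,B2,16,6\n")
crosswalk = io.StringIO(
    "HISTID_1900,HISTID_1910,SEX_1900,SEX_1910\nA1,B1,1,1\nA2,B2,2,2\n"
)

first = load_extract(extract_1900, 1900)
last = load_extract(extract_1910, 1910)
# Keep only crosswalk rows with SEX_1900 code 2 before merging.
linked = link(crosswalk, first, last, 1900, 1910,
              keep=lambda row: row["SEX_1900"] == "2")
print(linked)
```

Holding both extracts in memory, as this sketch does, is only practical for small subsets. For full count files, reducing the crosswalk first and requesting as few variables as possible matters all the more.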