FAQ on the final release of the Linked Representative Samples

Does IPUMS ever remove links based on low household similarity between two census years?

No. Identifying low-similarity links would itself require household information, which would violate our decision not to use household information in the linking process.

Can researchers use the household links (i.e., not the primary links) for analysis?

Yes, but the household links are not weighted and are subject to selection bias. For example, household links exist only where the primary-linked person co-resided with the same individuals in both census years.

Why does the final version of the data have more variables than the preliminary version did? Why do some cases lack data for the additional variables?

The original 1880 data from the North Atlantic Population Project (NAPP) did not include a number of variables from the 1880 census form, and thus these variables were not included in the preliminary release. For cases that were linked as a part of the preliminary Linked Representative Samples, however, we have entered the additional information and released it as part of the final version. These variables include LIT, SCHOOL, MAIMED, INSANE, IDIOTIC, DEAF, BLIND, SICKNESS, MOUNEMP, MARRINYR, and BIRTHMO. These variables are not available for cases that appear in the final version of the Linked Representative Samples but did not appear in the preliminary version.
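Because the additional variables exist only for some cases, an analysis that relies on them may need to restrict the sample first. The following Python sketch shows one way to do that. It assumes each case is a dictionary keyed by variable name and that an unavailable variable appears as None; the actual missing-data coding in the released files is not described here and may differ.

```python
# Variables entered for cases linked in the preliminary release (from the list above).
ADDED_1880_VARIABLES = (
    "LIT", "SCHOOL", "MAIMED", "INSANE", "IDIOTIC", "DEAF",
    "BLIND", "SICKNESS", "MOUNEMP", "MARRINYR", "BIRTHMO",
)


def has_added_variables(record):
    """Return True if every additional 1880 variable is present on the record.

    Assumption: a variable that was not entered is stored as None (or is absent).
    """
    return all(record.get(name) is not None for name in ADDED_1880_VARIABLES)


def restrict_to_complete_cases(records):
    """Keep only the cases that carry all of the additional 1880 variables."""
    return [rec for rec in records if has_added_variables(rec)]
```

Restricting the sample this way drops the cases that appear only in the final version, so the existing weights may no longer represent the full linked population.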

How did you make weights for the pre-Civil War African American population?

The linkable population would be African Americans in the 1850 and 1860 censuses who were also alive and enumerated in 1880. However, the population enumerated in 1850 and 1860 consisted of free blacks, and there is no way to identify the comparable population in 1880 (i.e., blacks in 1880 who had been free in 1850 or 1860). We therefore assign the average weight for native-born whites to comparable African American groups in the 1850 and 1860 linked samples.
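The weighting rule described above can be illustrated with a short Python sketch. It is not the production procedure. The field names (race, native_born, age_group, region, weight) and the choice of grouping variables are hypothetical, since the characteristics that make groups "comparable" are not specified here.

```python
from collections import defaultdict


def assign_comparable_weights(records, group_keys=("age_group", "region")):
    """Give each African American record the mean weight of the native-born
    white records that share the same values for group_keys.

    Assumption: group_keys stands in for whatever defines a "comparable" group.
    """
    # Accumulate [sum of weights, count] for native-born whites in each group.
    totals = defaultdict(lambda: [0.0, 0])
    for rec in records:
        if rec["race"] == "white" and rec["native_born"]:
            key = tuple(rec[k] for k in group_keys)
            totals[key][0] += rec["weight"]
            totals[key][1] += 1

    result = []
    for rec in records:
        out = dict(rec)  # copy so the input records are left unchanged
        if rec["race"] == "black":
            key = tuple(rec[k] for k in group_keys)
            if key in totals:
                weight_sum, count = totals[key]
                out["weight"] = weight_sum / count
            else:
                out["weight"] = None  # no comparable white group to borrow from
        result.append(out)
    return result
```

For example, if two native-born white records in the same group have weights 2.0 and 4.0, an African American record in that group receives a weight of 3.0.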

Are the contents of the datasets consistent over time?

Not entirely. The 1900 and 1910 IPUMS samples both include records from Alaska, Hawaii, and the American Indian population schedules. We included these records in the 1880-1900 linked samples but not in the 1880-1910 linked samples (the 1880-1900 linked records from these samples can be identified with the SAMP1900 variable). We anticipate adding 1880-1910 linked records for Alaska, Hawaii, and the American Indian schedules in a subsequent release.

Both 1900 and 1910 include links to overseas military records. These can be identified with the variable STATEFIP.

Why are there no primary links for Chinese or American Indians?

We do not include any primary links for races other than black and white. Although there were a fair number of Chinese in the United States in 1880, the high level of name homogeneity (and assumed age imprecision) made linking this population impractical. Few American Indians were enumerated in the 1880 census, so we would have been able to make relatively few links to subsequent censuses.