Family Interrelationships

IPUMS-Constructed Family Interrelationship Variables

In 2017 IPUMS updated the family interrelationship variables to increase comparability across data projects and over time, to include same-sex couples and other diverse family types, and to communicate clearly how links were identified so that researchers can choose to include or exclude certain types of links. These new versions of the family interrelationship variables are available for samples from 1970 on and are described in detail here. For samples prior to 1970, the original family interrelationship variables are still used. Users should note that the two versions of the family interrelationship variables are constructed using different rules, and that information on how links were formed is therefore captured in two different sets of variables: those for samples prior to 1970 have a _HIST suffix. Pre-2017 versions of the IPUMS family interrelationship variables are available here.

The IPUMS contains a consistent, versatile, and reliable set of constructed variables that describe a variety of family interrelationships among individuals within the same household. Researchers can use them to easily link characteristics of one family member to another - spouses to spouses, children to either or both parents, and so on - thereby speeding up analyses of family structures and characteristics.

Basic Family Interrelationship Variables: SPLOC, MOMLOC, and POPLOC

Each of the samples for the period since 1880 contains a variable indicating the relationship of each household member to the head of household as it was listed on the census form. The general codes for the IPUMS version of this variable, called RELATE (Relationship to household head/householder), are reasonably compatible across all census, ACS, and PRCS years for which this variable is available (see the variable description, RELATE, for more information on comparability across samples). While RELATE provides basic family relationship information, it cannot identify all family relationships and is therefore often inconvenient as a tool for constructing new family variables.

Consider the household in Table 1. RELATE sufficiently establishes that the two daughters are both children of the household head/householder, but to identify the other family interrelationships we must look to other characteristics of the household members. We can infer that the son-in-law is married to the second daughter rather than the first one because they share the same surname and are both listed as married, whereas the first daughter is listed as single. For analogous reasons, we know that the grandchild is probably the child of the second daughter listed. It is also safe to assume that the two boarders are married to one another because they are both married, they share the same surname, they are both adults close to the same age, and they are listed adjacently.

Table 1. Family Relationships to Household Head
Surname    Relationship    Age    Sex    Marital Status
Mulcahy    Head             61    F      W
Mulcahy    Daughter         32    F      S
Ryden      Son-in-law       32    M      M
Ryden      Daughter         27    F      M
Ryden      Grandchild        4    M      S
Salerno    Boarder          26    M      M
Salerno    Boarder          22    F      M

To allow users to identify relationships among spouses, parents, and children without forcing them to use multiple variables and complicated logic, the IPUMS includes a set of pointers called SPLOC, MOMLOC, and POPLOC. These pointers identify the location within the household of each individual's own spouse, mother, and father, respectively. Table 2 illustrates these variables. PERNUM (Person number in unit) is an IPUMS variable that indicates each individual's position within the household as listed on the original census form. SPLOC shows the PERNUM of each individual's own spouse. In Table 2, the son-in-law is married to the second daughter, and her PERNUM is 04. Therefore, the son-in-law's SPLOC is 04, the same as his wife's PERNUM. MOMLOC and POPLOC show the PERNUMs of own mothers and own fathers; for example, the mother and father of the grandchild are in positions 04 (MOMLOC) and 03 (POPLOC). Of course, many persons do not have a spouse, mother, or father living in the household with them; these cases are assigned a code of "00" for the appropriate variable(s).

Table 2. Family Relationships with SPLOC, MOMLOC, and POPLOC
Surname    Relationship    PERNUM    SPLOC    MOMLOC    POPLOC
Mulcahy    Head            01        00       00        00
Mulcahy    Daughter        02        00       01        00
Ryden      Son-in-law      03        04       00        00
Ryden      Daughter        04        03       01        00
Ryden      Grandchild      05        00       04        03
Salerno    Boarder         06        07       00        00
Salerno    Boarder         07        06       00        00

SPLOC, MOMLOC, and POPLOC can be used to identify conjugal units, to attach characteristics of spouses or parents, to develop specialized own-child measures, or to serve as building blocks for more elaborate measures of family composition. In most cases, users will be able to manipulate these variables to construct their own measures within a statistical package and will not be forced to resort to higher-level programming.1
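Here is a minimal sketch of attaching a relative's characteristics through the pointers. It assumes the records are held as Python dictionaries. Each pointer is resolved by looking up the PERNUM it names within the same household. The SERIAL value of 1 and the helper `attach_from_pointer` are hypothetical and serve only as illustration.

```python
# Hypothetical extract of the Table 1/Table 2 household (SERIAL = 1 is assumed).
COLUMNS = ("SERIAL", "PERNUM", "RELATE", "AGE", "SEX", "SPLOC", "MOMLOC", "POPLOC")
ROWS = [
    (1, 1, "Head", 61, "F", 0, 0, 0),
    (1, 2, "Daughter", 32, "F", 0, 1, 0),
    (1, 3, "Son-in-law", 32, "M", 4, 0, 0),
    (1, 4, "Daughter", 27, "F", 3, 1, 0),
    (1, 5, "Grandchild", 4, "M", 0, 4, 3),
    (1, 6, "Boarder", 26, "M", 7, 0, 0),
    (1, 7, "Boarder", 22, "F", 6, 0, 0),
]
persons = [dict(zip(COLUMNS, row)) for row in ROWS]


def attach_from_pointer(persons, pointer, source_var, new_var):
    """Copy source_var from the person a pointer refers to into new_var.

    Pointers are only meaningful within a household, so the lookup key is
    (SERIAL, PERNUM). A pointer value of 0 (relative absent) yields None.
    """
    by_key = {(p["SERIAL"], p["PERNUM"]): p for p in persons}
    for p in persons:
        loc = p[pointer]
        p[new_var] = by_key[(p["SERIAL"], loc)][source_var] if loc != 0 else None
    return persons


attach_from_pointer(persons, "SPLOC", "AGE", "SPOUSE_AGE")
attach_from_pointer(persons, "MOMLOC", "AGE", "MOM_AGE")
attach_from_pointer(persons, "POPLOC", "AGE", "POP_AGE")
```

With these data the son-in-law's SPOUSE_AGE becomes 27, and the grandchild's MOM_AGE and POP_AGE become 27 and 32. The lookup is keyed on (SERIAL, PERNUM) rather than PERNUM alone because PERNUM restarts at 1 in every household.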

Most scholarly family classification schemes are built up from information on the presence of immediate kin. The basic Census Bureau classifications focus on the presence of spouses and children of the household head/householder; the Laslett scheme widely used by historians is based on a count of "conjugal family units" consisting of parents and children or married couples.2 SPLOC, MOMLOC, and POPLOC make it relatively simple to construct such classifications.

Family historians are increasingly moving from household-level schemes of family classification toward individual-level measures of family structure. For example, instead of measuring the proportion of households headed by a single female parent, we might assess the proportion of women who were single parents or the proportion of children residing with mothers only. Such individual-level analyses offer a variety of advantages that have been detailed elsewhere.3 The individual-level IPUMS pointer variables are especially suited for creating these kinds of measures.
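As an illustration of an individual-level measure, the sketch below computes the proportion of children living with their mother but not their father. It uses only MOMLOC and POPLOC. Several details are assumptions: the cutoff of age 17 for "children", the helper name, and the second single-mother household (SERIAL 2), which is invented so that the result is not trivially zero.

```python
# Hypothetical records: the Table 2 household (SERIAL 1) plus an invented
# single-mother household (SERIAL 2) so the measure is not trivially zero.
COLUMNS = ("SERIAL", "PERNUM", "AGE", "SEX", "SPLOC", "MOMLOC", "POPLOC")
ROWS = [
    (1, 1, 61, "F", 0, 0, 0),
    (1, 2, 32, "F", 0, 1, 0),
    (1, 3, 32, "M", 4, 0, 0),
    (1, 4, 27, "F", 3, 1, 0),
    (1, 5, 4, "M", 0, 4, 3),
    (1, 6, 26, "M", 7, 0, 0),
    (1, 7, 22, "F", 6, 0, 0),
    (2, 1, 35, "F", 0, 0, 0),
    (2, 2, 10, "M", 0, 1, 0),
]
persons = [dict(zip(COLUMNS, row)) for row in ROWS]


def share_children_with_mother_only(persons, max_child_age=17):
    """Proportion of children (age <= max_child_age) whose mother, but not
    father, is present in the household, judged from MOMLOC and POPLOC."""
    children = [p for p in persons if p["AGE"] <= max_child_age]
    if not children:
        return 0.0
    mother_only = [c for c in children if c["MOMLOC"] != 0 and c["POPLOC"] == 0]
    return len(mother_only) / len(children)
```

Only two persons qualify as children here: the Ryden grandchild, who lives with both parents, and the invented ten-year-old. The function therefore returns 0.5.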

Additional Constructed Family Variables

In addition to SPLOC, MOMLOC, and POPLOC, the IPUMS provides a variety of other constructed variables to aid researchers in creating new family variables. These are described in Table 3. Four of the constructed variables apply to entire households. NFAMS counts the number of families present in the household. For this purpose a family is defined as any group of persons with identifiable relationships by blood, marriage, or adoption. A single individual residing without any relatives is considered a separate family. Thus, a household consisting of an elderly widow residing with a servant would count as two families, and a large extended family with multiple generations but no boarders, lodgers, or servants would count as a single family. NCOUPLES (Number of married couples), NMOTHERS (Number of mothers), and NFATHERS (Number of fathers) are based on counts of non-zero values in SPLOC, MOMLOC, and POPLOC, respectively.
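The text states only that NCOUPLES, NMOTHERS, and NFATHERS are based on counts of non-zero pointer values. One plausible reading is sketched here as an assumption, not as the IPUMS production logic. Under that reading, each couple contributes two non-zero SPLOC values, and each mother or father is counted once however many of their children are present.

```python
from collections import defaultdict

COLUMNS = ("SERIAL", "PERNUM", "SPLOC", "MOMLOC", "POPLOC")
ROWS = [  # pointers from Table 2; SERIAL = 1 is assumed
    (1, 1, 0, 0, 0),
    (1, 2, 0, 1, 0),
    (1, 3, 4, 0, 0),
    (1, 4, 3, 1, 0),
    (1, 5, 0, 4, 3),
    (1, 6, 7, 0, 0),
    (1, 7, 6, 0, 0),
]
persons = [dict(zip(COLUMNS, row)) for row in ROWS]


def household_counts(persons):
    """Derive NCOUPLES, NMOTHERS, and NFATHERS per household from pointers."""
    households = defaultdict(list)
    for p in persons:
        households[p["SERIAL"]].append(p)
    counts = {}
    for serial, members in households.items():
        counts[serial] = {
            # each couple contributes two persons with a non-zero SPLOC
            "NCOUPLES": sum(1 for p in members if p["SPLOC"] != 0) // 2,
            # a parent is counted once however many own children are present
            "NMOTHERS": len({p["MOMLOC"] for p in members if p["MOMLOC"] != 0}),
            "NFATHERS": len({p["POPLOC"] for p in members if p["POPLOC"] != 0}),
        }
    return counts


counts = household_counts(persons)
```

For the Table 2 household this gives:

- two couples (the Rydens and the Salernos);
- two mothers (the head and the second daughter);
- one father (the son-in-law).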

Table 3. IPUMS Family Interrelationship Variables
Household Record
NFAMS: Number of families in household
NCOUPLES: Number of married couples present in household
NMOTHERS: Number of women with own child present in household
NFATHERS: Number of men with own child present in household
Person Record
PERNUM: Sequence number of person within household
RELATE: Relationship of person to household head/householder
FAMSIZE: Number of household members related to person
FAMUNIT: Family unit membership
SPLOC: Location of own spouse within household
MOMLOC: Location of own mother within household
POPLOC: Location of own father within household
NCHILD: Number of own children in household
NCHLT5: Number of own children under age five in household
ELDCH: Age of eldest own child in household
YNGCH: Age of youngest own child in household
NSIBS: Number of own siblings in household

The additional individual-level constructed variables on family and household relationships listed in Table 3 are fully illustrated in Table 4. FAMSIZE (Number of own family members) and FAMUNIT (Family unit membership) use the same definition of family employed for NFAMS. FAMSIZE is useful for creating a variety of family measures. For example, to determine if a family contains extended kin beyond spouse and children, one can subtract the size of the immediate family (spouse and children) from FAMSIZE; if the result is greater than one, there are other kin present. More complex measures of extended family configurations can be constructed using FAMUNIT, which in combination with SERIAL provides a unique identifier for each related group in the census.
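FAMUNIT groups household members into related sets. This can be approximated by treating every non-zero SPLOC, MOMLOC, and POPLOC value as an edge and finding connected components with a union-find structure. This is a hypothetical approximation. The IPUMS family definition also counts relatives linked only through relationship codes (for example, a sibling of the head whose parents are absent), and pointer edges alone miss these. The second helper applies the FAMSIZE test for extended kin described above.

```python
# Hypothetical records holding the Table 2 pointers (one household).
COLUMNS = ("PERNUM", "SPLOC", "MOMLOC", "POPLOC")
ROWS = [(1, 0, 0, 0), (2, 0, 1, 0), (3, 4, 0, 0), (4, 3, 1, 0),
        (5, 0, 4, 3), (6, 7, 0, 0), (7, 6, 0, 0)]
persons = [dict(zip(COLUMNS, row)) for row in ROWS]


def assign_family_units(persons):
    """Approximate FAMUNIT and FAMSIZE for one household (pointer links only)."""
    root_of = {p["PERNUM"]: p["PERNUM"] for p in persons}

    def find(x):
        # Follow links to the representative PERNUM, halving the path as we go.
        while root_of[x] != x:
            root_of[x] = root_of[root_of[x]]
            x = root_of[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            root_of[max(ra, rb)] = min(ra, rb)

    # Spouse, mother, and father pointers all join people into one family.
    for p in persons:
        for var in ("SPLOC", "MOMLOC", "POPLOC"):
            if p[var] != 0:
                union(p["PERNUM"], p[var])

    # Number units in order of first appearance in the listing, then size them.
    unit_number, unit_size = {}, {}
    for p in persons:
        r = find(p["PERNUM"])
        unit_number.setdefault(r, len(unit_number) + 1)
        unit_size[r] = unit_size.get(r, 0) + 1
    for p in persons:
        r = find(p["PERNUM"])
        p["FAMUNIT"], p["FAMSIZE"] = unit_number[r], unit_size[r]
    return persons


def has_extended_kin(person, persons):
    """FAMSIZE minus own spouse and own children; more than one left means
    relatives other than the immediate family are present."""
    spouse = 1 if person["SPLOC"] != 0 else 0
    children = sum(1 for q in persons
                   if person["PERNUM"] in (q["MOMLOC"], q["POPLOC"]))
    return person["FAMSIZE"] - (spouse + children) > 1


assign_family_units(persons)
```

For the Table 2 household this reproduces the FAMUNIT and FAMSIZE values in Table 4: units 1 and 2, with sizes 5 and 2. The head has extended kin, since 5 minus 2 children leaves 3. Each Salerno boarder does not, since 2 minus 1 spouse leaves 1.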

Table 4. Family Relationships with Additional Constructed Family Variables
Surname    Relationship    FAMSIZE    FAMUNIT    NCHILD    NCHLT5    ELDCH    YNGCH    NSIBS
Mulcahy    Head            05         1          2         0         32       27       0
Mulcahy    Daughter        05         1          0         0         99       99       1
Ryden      Son-in-law      05         1          1         1         04       04       0
Ryden      Daughter        05         1          1         1         04       04       1
Ryden      Grandchild      05         1          0         0         99       99       0
Salerno    Boarder         02         2          0         0         99       99       0
Salerno    Boarder         02         2          0         0         99       99       0
(ELDCH and YNGCH are coded 99 when no own child is present.)

The IPUMS also includes the four most commonly requested measures of own children, all derived from MOMLOC, POPLOC, and AGE: NCHILD (Number of own children), ELDCH (Age of eldest own child), YNGCH (Age of youngest own child), and NCHLT5 (Number of own children under age 5). Finally, there is NSIBS (Number of own siblings), which counts the number of persons within the household who share a common parent or who have family relationship codes that imply a sibling relationship.
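The own-child measures can be derived from MOMLOC, POPLOC, and AGE roughly as follows. This sketch makes two assumptions. First, its NSIBS counts only persons sharing a non-zero parent pointer, whereas the IPUMS variable also counts siblings implied by relationship codes. Second, the code 99 for "no own child" is taken from the values shown in Table 4.

```python
# Hypothetical records: Table 2 pointers plus ages from Table 1.
COLUMNS = ("PERNUM", "AGE", "MOMLOC", "POPLOC")
ROWS = [(1, 61, 0, 0), (2, 32, 1, 0), (3, 32, 0, 0), (4, 27, 1, 0),
        (5, 4, 4, 3), (6, 26, 0, 0), (7, 22, 0, 0)]
persons = [dict(zip(COLUMNS, row)) for row in ROWS]

NO_CHILD = 99  # value shown in Table 4 when no own child is present


def add_own_child_measures(persons):
    """Derive NCHILD, NCHLT5, ELDCH, YNGCH, and a pointer-only NSIBS."""
    for p in persons:
        ages = [q["AGE"] for q in persons
                if p["PERNUM"] in (q["MOMLOC"], q["POPLOC"])]
        p["NCHILD"] = len(ages)
        p["NCHLT5"] = sum(1 for a in ages if a < 5)
        p["ELDCH"] = max(ages) if ages else NO_CHILD
        p["YNGCH"] = min(ages) if ages else NO_CHILD
        # Siblings here are other persons sharing a non-zero MOMLOC or POPLOC.
        p["NSIBS"] = sum(
            1 for q in persons
            if q is not p
            and ((p["MOMLOC"] != 0 and q["MOMLOC"] == p["MOMLOC"])
                 or (p["POPLOC"] != 0 and q["POPLOC"] == p["POPLOC"])))
    return persons


add_own_child_measures(persons)
```

Run on the Table 2 household, it reproduces the NCHILD, NCHLT5, ELDCH, YNGCH, and NSIBS columns of Table 4.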

Creation of MOMLOC and POPLOC

Assigning links between parents and their children is usually straightforward. In about 97 percent of cases the census information on family relationships is sufficient to establish parent-child links. For example, if an individual is listed as a child of the household head/householder, his or her parents should always be listed as the household head/householder or wife of head; there is little ambiguity because each household has one head and no more than one wife. Similarly, the parents of persons listed as the household head/householder or a sibling of the head are always listed as mother or father of the head, and each household contains no more than one person listed as mother and no more than one listed as father.4 Parentage is almost as clear-cut for persons listed as wife or sibling-in-law, since households ordinarily do not include multiple mothers-in-law or fathers-in-law.

For persons who have family relationships other than head, wife, child, sibling, or sibling-in-law, the relationship information does not identify parental relationships with as much precision. For example, we know that a person listed as a grandchild of the head is the child of one of the household head's children and/or children-in-law. However, if the family contains more than one person listed as child or child-in-law, we cannot always be sure which one(s) are the grandchild's parent(s). Even if there is only one child present, there is still room for error, since a grandchild could be the offspring of persons not living in the household. In some cases - such as secondary families consisting of boarders - the relationship codes may provide no information for linking parents and children.

Whenever the family relationship codes are unclear, we must turn to other information to identify parent-child relationships. Every census from 1880 to 2000 and the 2000-onward ACS and PRCS samples contain three additional pieces of information that can be used to clarify ambiguities: age, marital status, and the order in which individuals are listed on the census form.5 For example, if a household contains a widowed daughter followed immediately by a grandchild who is twenty years younger than the daughter, we may reasonably infer a mother-child relationship even if other daughters are present. Each census year also includes other information that can be used to distinguish parental relationships, but the availability of this information is irregular. For example, in the census years 1880, 1910, 1920, 1940, and 1950, we can identify persons who share the same surname (see SURSIM). For 1960, 1970, 1980, and 1990, on the other hand, we can identify the number of children ever born to every adult woman (see CHBORN). Neither SURSIM nor CHBORN is available for the 2000-2010 censuses or the 2000-onward ACS and PRCS samples.

Our procedure for linking parents and children attempted to reconcile two competing goals. The first goal was to create fully compatible links by using only information available across all census, ACS, and PRCS years. The second was to identify, as accurately as possible, as many of the total links as we could for each census, ACS, and PRCS year. To accomplish the latter, we had to use all information available for any given year. The IPUMS programming rules for MOMLOC and POPLOC therefore represent a compromise between the conflicting goals of compatibility and completeness.

The linking rules are described in detail in the Appendix to this chapter. Compatibility is our first priority, so we begin by establishing all parental relationships that could be plausibly identified using only relationship, age, sex, marital status, and sequence in the household listing, that is, information available in all of the component samples. The first three rules (see the Appendix to this chapter) reflect this part of the process. The next four rules use information available only in some census years to comb out the few parent-child links not identified by the first three rules. We developed these rules through trial-and-error experimentation, continuously checking the results of the programming against our own interpretations of a collection of the most problematic household records selected from several census years, and then fine-tuning the rules until we were satisfied. The variables MOMRULE_HIST and POPRULE_HIST identify which particular logical rule was used to establish a parental relationship in any given case. For analyses comparing multiple census years, users can ensure full compatibility by using only those links that were established under rules 1 through 3.
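For cross-year comparisons, restricting to rules 1 through 3 amounts to discarding any link whose rule number is higher. The sketch below assumes that MOMRULE_HIST and POPRULE_HIST hold the rule number (1 to 7) for a link and 0 where no link exists. Check the actual variable codes in an extract before relying on this. The records and helper name are hypothetical.

```python
# Hypothetical child records; MOMRULE_HIST/POPRULE_HIST are assumed to hold
# the rule number (1-7) for each link and 0 when there is no link.
children = [
    {"PERNUM": 2, "MOMLOC": 1, "MOMRULE_HIST": 1, "POPLOC": 0, "POPRULE_HIST": 0},
    {"PERNUM": 5, "MOMLOC": 4, "MOMRULE_HIST": 2, "POPLOC": 3, "POPRULE_HIST": 2},
    {"PERNUM": 8, "MOMLOC": 6, "MOMRULE_HIST": 4, "POPLOC": 7, "POPRULE_HIST": 3},
]

LINK_VARS = (("MOMLOC", "MOMRULE_HIST"), ("POPLOC", "POPRULE_HIST"))


def restrict_to_compatible_rules(persons, max_rule=3):
    """Return copies in which links made by rules above max_rule are dropped."""
    restricted = []
    for p in persons:
        q = dict(p)  # leave the original records untouched
        for loc_var, rule_var in LINK_VARS:
            if q[loc_var] != 0 and q[rule_var] > max_rule:
                q[loc_var] = 0
        restricted.append(q)
    return restricted


compatible = restrict_to_compatible_rules(children)
```

The third child keeps the father link made under rule 3 but loses the mother link made under rule 4.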

In practice, the additional information available in particular census years does not make a great deal of difference. For the censuses of 1880 through 1960, 99.5 percent or more of parental links were established by means of the first three logical rules.6 In recent sample years, the percentage of cases requiring additional information has risen because marital status has become less of a determinant of parenthood - by 1990, "only" 97.9 percent of maternal links were established by means of the first three rules.

Identifying Stepparents

The logical rules used to create MOMLOC and POPLOC link parents to adopted children and stepchildren as well as biological children. This may be appropriate for the study of topics such as the family economy, but for some topics - such as fertility analysis - adopted children and stepchildren should be eliminated whenever possible. Researchers who wish to limit their analysis to biological children can use the variables STEPMOM and STEPPOP to identify probable biological relationships. The values for these variables are defined in Table 5.

Table 5. STEPMOM/STEPPOP Values
Code Explanation
0 Probable biological parent.
1 Age difference between parent and child improbable (outside the range 15-49 for mothers and 15-64 for fathers).
2 The link was only established because the parent was married to another parent (see note a).
3 Detailed relationship codes explicitly specify a step-parent relationship - information not available in 1850-1870, 1960, 1970, 1980, or in the ACS and PRCS samples.
4 Mother has zero children surviving; only available in 1900 and 1910.
5 Detailed relationship codes specify that the child is adopted - information available only for 1880-1930 and the 2000 census.
6 Child was born before marriage of parent and there is a mismatch between parental birthplace on child's record and birthplace of parent. Available only for 1900 and 1910.
7 For STEPMOM: Number of children present exceeds number ever born (or, for 1900 and 1910, number surviving), and child was born before marriage of mother (current marriage in 1900, 1910, and 1950; first marriage in other years) - information not available for 1850-1880, 1920-1950, 2000, the ACS, the PRCS, or the 1970 Form 2 samples.
7 For STEPPOP: Surname differs and child is a male or never-married female (female under age 15, 1850-1870) - information not available for the 1960-2000 censuses and the ACS and the PRCS samples.

a. See Appendix. The frequency of value 2 for STEPMOM is lower than the frequency of value 7 for MOMRULE_HIST only because most mothers assigned under rule 7 have an improbable age difference and are therefore assigned a STEPMOM value of 1.

Where more than one value for STEPMOM or STEPPOP was valid, the lower value was assigned. To analyze biological children, one can eliminate links with a value greater than zero on STEPMOM or STEPPOP. When comparing successive census years, one should use only values 1 and 2 of STEPMOM and STEPPOP to eliminate links, since they are the only ones consistently available.
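The two filtering strategies just described can be expressed as a small function. It is a hypothetical helper, not one supplied by IPUMS. By default any non-zero STEPMOM or STEPPOP code drops the link; in comparable mode only codes 1 and 2 do. Because the lowest valid code is assigned, a link coded 3 or higher could not also have qualified for code 1 or 2. The comparable mode therefore applies the same criteria in every year.

```python
def probable_biological(person, loc_var="MOMLOC", step_var="STEPMOM",
                        comparable_across_years=False):
    """Return the pointer value if the link is kept as probably biological, else 0.

    By default any non-zero STEPMOM/STEPPOP code drops the link. For comparisons
    across census years only codes 1 and 2, which are consistently available,
    are used to drop links.
    """
    loc = person[loc_var]
    if loc == 0:
        return 0
    code = person[step_var]
    if comparable_across_years:
        return 0 if code in (1, 2) else loc
    return 0 if code > 0 else loc


# Hypothetical records illustrating the two modes.
kids = [
    {"MOMLOC": 4, "STEPMOM": 0},  # probable biological mother
    {"MOMLOC": 4, "STEPMOM": 1},  # improbable age difference
    {"MOMLOC": 4, "STEPMOM": 3},  # step-relationship from detailed codes
]
strict = [probable_biological(k) for k in kids]
comparable = [probable_biological(k, comparable_across_years=True) for k in kids]
```

The step-relationship link (code 3) is dropped in the default mode but kept in the comparable mode, because code 3 is not available in every year.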

With the exception of the 1900 and 1910 census years, 2 percent or less of children can be identified as stepchildren or adopted children. The frequency of identifiable stepchildren is somewhat higher in 1900 and 1910, which is not surprising since those census years provide more relevant information than any others. In particular, they are the only years that indicate the number of surviving children for each woman.

The true percentage of stepchildren and adopted children is no doubt higher in all census years than STEPMOM and STEPPOP indicate. Because we cannot identify all biological children, own-child fertility estimates derived from the census will be slightly biased. In particular, we would expect that estimates of mothers' ages at childbirth may be a bit low, because second and third wives are on average younger than first wives.

Creation of SPLOC

Spousal links are much easier to establish than are parental links, except for the 2010 census when marital status was not asked. Most households have only one married couple; when more than one married couple is present, proximity is a reliable indicator of who goes with whom. In all census years, about 99 percent of married couples are listed adjacently on the census forms, and the few exceptions can almost all be resolved through relationship codes.

The spouse links were carried out by means of seven IPUMS programming rules described in the Appendix to this chapter. These rules use only information that is available in all census years and are therefore fully compatible. Even though the spousal rules ignore much relevant information available in particular census years - such as surname, marriage duration, and age at first marriage - we nevertheless consider them to be much more reliable than the rules governing parental links.

Comparing IPUMS linking procedures with those developed for older census samples

The only previous large census dataset to incorporate family relationship variables similar to MOMLOC and SPLOC was the original 1910 Public Use Microdata Sample and Black oversample (Strong et al. 1989). These datasets are now two of the seven component samples that comprise the IPUMS "1.4% 1910 sample with oversamples." They can be identified using the SAMP1910 variable: cases with a SAMP1910 value of 5 come from the original 1-in-250 national sample developed prior to the IPUMS; cases with a SAMP1910 value of 6 come from the Black oversample that was released along with the 1-in-250 dataset.

These samples were developed before the IPUMS existed, and they used linking strategies different from those employed in the IPUMS. We have retained the samples' original linking variables, and we have also created IPUMS-style linking variables for these samples. Users interested in the creation of family interrelationship variables may want to compare the IPUMS variables with those developed for these earlier 1910 samples.

The original 1910 linking variables are contained in the IPUMS as MOMNO (Mother number) and WIFENO (Wife number). The creators of the early 1910 sample adopted a much more elaborate procedure for creating links between mothers and children and between husbands and wives. In the most obvious cases - those with completely unambiguous relationship codes and no other hint of ambiguity - the 1910 system relied on logical rules to assign linkages. In all other cases, however, the 1910 system turned to a complicated point system based on probabilities. Each characteristic that could be used to identify potential mother-child or husband-wife links - such as similar surnames, relationship codes, age differences, and so on - was assigned a point value based on its power to predict "correct" links in a small hand-linked subset of the data. The sum of these points was then calculated for all potential links in the sample - the IPUMS preserves these sums in MOMLNKWT (Weight of mother link) and WIFLNKWT (Weight of wife link). If the sum exceeded a prespecified upper threshold, the link was accepted; if it fell below a prespecified lower threshold, the link was rejected. When the sum of weights fell in the gray zone between the two thresholds, the link was decided by hand by reexamining the case on the original microfilm. The final rationale for each link is recorded in the MOMLKREA and WIFLKREA variables.
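To make the contrast concrete, here is a minimal sketch of an additive point system with two thresholds, like the one used for the original 1910 sample. The feature names, point values, and thresholds are all hypothetical. The actual weights were derived from a hand-linked subset and are not given here.

```python
# Hypothetical point values; the actual 1910 weights were estimated from a
# hand-linked subset and are not reproduced in this chapter.
POINTS = {
    "same_surname": 3.0,
    "consistent_relationship": 4.0,
    "plausible_age_gap": 2.0,
    "listed_adjacently": 1.0,
}


def classify_candidate_link(features, accept_above=7.0, reject_below=3.0):
    """Sum the points for the features present and apply two thresholds.

    Returns ("accept" | "reject" | "review by hand", score); scores between
    the thresholds fall in the gray zone that was resolved manually.
    """
    score = sum(POINTS[name] for name, present in features.items() if present)
    if score > accept_above:
        return "accept", score
    if score < reject_below:
        return "reject", score
    return "review by hand", score
```

Each feature adds a fixed number of points regardless of context. The score therefore cannot express that surname matters only when the relationship codes are ambiguous, which is the weakness discussed next.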

We experimented extensively with similar probability-based point systems for assigning links, but we found them unsatisfactory. The importance of any particular characteristic depends on its context. For example, surnames assume great significance when the relationship codes are ambiguous, but they should otherwise be ignored. A simple additive point system proved incapable of such distinctions.

The 1910 procedure ran into similar difficulties. Despite the complexity of the probability-based linking system, it was sufficient to identify only the most straightforward links. More than 20 percent of individuals in the sample - some 75,000 - fell into the gray zone and had to be reexamined by hand. If we had adopted a similar procedure for the IPUMS, it would have meant looking up about 10 million cases individually, which would have multiplied the cost of the IPUMS manyfold.

The logical rules described in the Appendix to this chapter produce results very similar to the 1910 procedure at a fraction of the cost. Excluding stepchildren, the maternal links obtained through each method differed in 0.66 percent of cases. When the two methods differed, we examined each case and found that in many cases the 1910 links were clearly correct. In most cases, however, the census listings are truly ambiguous, and the links are a matter of guesswork. The spousal links are more clear-cut: the IPUMS and the 1910 procedures produce identical results in over 99.9 percent of cases, even though the IPUMS method ignores all variables that are not available for the entire period from 1880 to 1990.

Imputing Relationships and Interrelationships for 1850, 1860, and 1870: IMPREL, IMPMOM, IMPPOP, and IMPSP

The 1850-1870 censuses did not record relationship to head of household or marital status - the two most important variables for identifying parental and spousal relationships - so the variables RELATE, MOMLOC, POPLOC, and SPLOC are not available for 1850-1870. However, the 1850-1870 censuses do contain sufficient information to reliably impute most family relationships, and we have designed variables for the IPUMS that take advantage of this fortunate fact. The resulting variables are comparable to the key family relationship and interrelationship variables for other years, allowing researchers to incorporate 1850-1870 into analyses of change over time.

The 1850-1870 census instructions to marshals specified that within each household, "the names are to be written beginning with the father and mother; or, if either, or both, be dead, begin with some other ostensible head of the family; to be followed, as far as practicable, with the name of the oldest child residing at home, then the next oldest, and so on to the youngest, then the other inmates, lodgers and boarders, laborers, domestics, and servants." In addition to this sequential information, the 1850-1870 censuses provide other valuable clues to family relationship: surname, age, sex, occupation, and birthplace. These form the bases for the IPUMS variable IMPREL (Imputed relationship), which in turn is used to create IMPMOM (Imputed location of mother), IMPPOP (Imputed location of father), and IMPSP (Imputed location of spouse).

Imputed relationship (IMPREL): Most of the time, 1850-1870 relationships could be inferred using a rather simple set of logical rules. However, about a quarter of the cases were too ambiguous to determine in this way. For these, we designed a probabilistic "hot deck" imputation procedure similar to the procedures that the Census Bureau uses to allocate missing and inconsistent information.

Logical rules: Seventy-five percent of cases were assigned by the following logical rules:

  1. Spouses: The first individual listed for each household was always assigned as household head. If the first person was male, the second was female, both were age 21+, both had the same surname, and the woman was no more than 15 years younger or 20 years older than the head, she was assigned the relationship of wife. (In 1880, when relationship was listed, over 99 percent of such women were, in fact, listed as wives of the head.)
  2. Children: A person listed immediately after the head and wife was assigned as a child of the head if s/he had the same surname and was 16 to 50 years younger than the head and 15 to 45 years younger than the wife. Individuals following the first child were assigned as additional children if they met the same conditions and were listed in descending age order, with no more than 15 years of age separating adjacent children. (In 1880, over 99 percent of persons meeting these criteria were, in fact, listed as children of the head.)
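The two logical rules can be sketched as follows for a single household, with records in listing order. Beyond what the rules state, the sketch makes three assumptions. Children are assigned only when a wife was identified. Equal ages (for example, twins) count as descending order. The run of children ends at the first person who fails the conditions. The function name and example household are hypothetical.

```python
def apply_logical_rules(household):
    """Assign Head/Wife/Child labels to one 1850-1870 household.

    household: records in listing order with SURNAME, AGE, and SEX ("M"/"F").
    Returns one label per person; None means the case is left for the
    hot-deck imputation.
    """
    labels = [None] * len(household)
    if not household:
        return labels
    labels[0] = "Head"  # the first person listed is always the head
    head = household[0]
    if len(household) < 2:
        return labels

    # Rule 1: the second person is the wife if both are 21+, share a surname,
    # and she is no more than 15 years younger or 20 years older than the head.
    second = household[1]
    head_minus_wife = head["AGE"] - second["AGE"]
    if not (head["SEX"] == "M" and second["SEX"] == "F"
            and head["AGE"] >= 21 and second["AGE"] >= 21
            and head["SURNAME"] == second["SURNAME"]
            and -20 <= head_minus_wife <= 15):
        return labels
    labels[1] = "Wife"
    wife = second

    # Rule 2: following same-surname persons of suitable age are children, as
    # long as they run in descending age order with gaps of 15 years or less.
    previous_child = None
    for i in range(2, len(household)):
        p = household[i]
        ok = (p["SURNAME"] == head["SURNAME"]
              and 16 <= head["AGE"] - p["AGE"] <= 50
              and 15 <= wife["AGE"] - p["AGE"] <= 45)
        if previous_child is not None:
            ok = ok and 0 <= previous_child["AGE"] - p["AGE"] <= 15
        if not ok:
            break  # assumption: the run of children ends at the first misfit
        labels[i] = "Child"
        previous_child = p
    return labels


example = [
    {"SURNAME": "Smith", "AGE": 45, "SEX": "M"},
    {"SURNAME": "Smith", "AGE": 40, "SEX": "F"},
    {"SURNAME": "Smith", "AGE": 18, "SEX": "M"},
    {"SURNAME": "Smith", "AGE": 15, "SEX": "F"},
    {"SURNAME": "Jones", "AGE": 30, "SEX": "F"},
]
labels = apply_logical_rules(example)
```

In this invented household the Jones woman is left unassigned (None) and would go to the imputation procedure described next.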

Imputation of remaining cases: Twenty-five percent of the cases could not be assigned according to the above logical rules. These cases were imputed via a hot deck procedure. We identified nineteen key individual characteristics available in the 1850-1870 samples that were strong predictors of family relationship in 1880:

  1. Surname compared to household head (same or different?)
  2. Surname compared to immediately preceding individual
  3. Sequence of listing within household
  4. Age
  5. Difference between age and the age of the head or his wife
  6. Difference between age and the age of the preceding individual
  7. Presence and location of probable own spouse (defined as an adjacent person of the opposite sex age 16+ with the same surname and an appropriate age)
  8. Number of probable siblings (defined as same-surname persons no more than 15 years younger or older who were not probable spouses)
  9. Number of probable parents (defined as same-surname persons more than 15 years older)
  10. Number of probable children (defined as same-surname persons more than 15 years younger)
  11. Imputed relationship to head of immediately preceding person
  12. Occupation
  13. Head's occupation
  14. Migration status (non-migrant in household with a non-migrant head; migrant in household with a non-migrant head; non-migrant in household with a migrant head; migrant in household with a head who migrated from the same place; or migrant in household with a head who migrated from a different place)
  15. Household type (heads are either married men, single men, or single women)
  16. Urban/rural status
  17. Farm status
  18. State of residence
  19. Region of residence

The IPUMS program classifies each individual case from 1850-1870 according to these nineteen characteristics. It then searches the 1880 sample to find the most geographically proximate individual who shared all nineteen characteristics, and assigns that 1880 person's RELATE (Relationship to head) code to the 1850-1870 case for IMPREL. In some cases, no perfect match can be found. For these cases, each of the nineteen characteristics is weighted according to its predictive power, based on a regression analysis of the 1880 sample. The program searches the 1880 sample for the best match, as determined by the sum of the weights.
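A simplified version of the donor search is sketched below. It assumes each donor carries a precomputed geographic DISTANCE to the case (a hypothetical field). It also uses three invented characteristics and weights in place of the nineteen. Scoring every donor by the summed weights of its matching characteristics covers both steps in the text. A donor matching on all characteristics always has the highest possible score, and among equal scores the nearest donor wins. Breaking ties by distance for imperfect matches is an assumption.

```python
def impute_relationship(case, donors, weights):
    """Return the RELATE value of the best-matching donor.

    case: characteristic name -> value for one 1850-1870 person.
    donors: 1880 records with the same characteristics, a RELATE value, and a
        DISTANCE to the case (a hypothetical precomputed geographic distance).
    weights: characteristic name -> predictive weight.
    """
    def score(donor):
        return sum(w for name, w in weights.items() if donor[name] == case[name])

    # Highest score first; among equal scores, the geographically nearest donor.
    best = min(donors, key=lambda d: (-score(d), d["DISTANCE"]))
    return best["RELATE"]


# Three stand-in characteristics instead of nineteen; all values are invented.
weights = {"same_surname_as_head": 5.0, "age_group": 3.0, "sequence": 2.0}
case = {"same_surname_as_head": True, "age_group": "0-9", "sequence": 3}
donors = [
    {"same_surname_as_head": True, "age_group": "0-9", "sequence": 3,
     "RELATE": "Child", "DISTANCE": 50.0},
    {"same_surname_as_head": True, "age_group": "0-9", "sequence": 3,
     "RELATE": "Grandchild", "DISTANCE": 10.0},
    {"same_surname_as_head": True, "age_group": "0-9", "sequence": 5,
     "RELATE": "Boarder", "DISTANCE": 1.0},
]
imputed = impute_relationship(case, donors, weights)
```

Here the two perfect matches tie on score, so the nearer donor supplies "Grandchild". If only the imperfect donor were available, its "Boarder" code would be used.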

Note that we did not include race as a predictor, although it is customarily included in such allocation procedures. In the context of the other nineteen variables, race was an insignificant predictor of relationship to head. Moreover, the 1880 black population, which contained many freed slaves, probably differed significantly from the free black population of the 1850 and 1860 samples. (Slaves are not included in the 1850 sample because that census did not collect enough information about them.) This would make the 1880 variable RACE an unreliable predictor for 1850 and 1860.

We tested the imputation procedure by applying it to the 1880 sample, matching each person to another 1880 person. (We instructed the program not to match a person to him/herself or to any other person in the same household.) We also imputed relationships for the 1910 sample, matching persons to 1880 donors in order to see if thirty years of change in household composition would introduce unacceptable biases. Both tests yielded satisfactory results. For 1880, the imputed relationship matched the relationship listed on the census form (and included in RELATE) 95 percent of the time. For heads, wives, and their children, this figure rises to nearly 99 percent. For 1910, the results were virtually the same. Furthermore, the method is unbiased; it yields the correct distribution of family relationships for both samples. Given that the 1910 imputed relationships are just as unbiased and reliable as the 1880 ones, the imputed relationships for 1850-1870 should also be unbiased and reliable.

Nevertheless, as with any imputed variable, users should exercise reasonable caution. In particular, they should note that any differentials in household structure between population subgroups (e.g., by class, race, or ethnicity) are likely to be slightly understated due to random error in the imputation.

Imputed family interrelationships (IMPMOM, IMPPOP, and IMPSP): Just as the IPUMS uses RELATE (Relationship to head) to create MOMLOC, POPLOC, and SPLOC for 1880 through 1990, it uses IMPREL to create IMPMOM (Imputed location of mother), IMPPOP (Imputed location of father), and IMPSP (Imputed location of spouse) for 1850-1870. The rules are described in the Appendix to this chapter.

The 1850-1870 samples also contain imputed versions of the constructed variables described above: NFAMS, NCOUPLES, NMOTHERS, NFATHERS, FAMUNIT, NCHILD, ELDCH, YNGCH, NCHLT5, and NSIBS. These are all located in the same IPUMS columns as their respective later-year counterparts.



Appendix

In the great majority of cases, parents and children can be unambiguously linked via a simple set of rules applied to five pieces of information available in every census year from 1880 through 1990: relationship to household head, age, sex, marital status, and sequence of listing on the census form. For 1850-1870, if imputed relationship (described above) is substituted for relationship to household head, the same rules can be applied.

In a few instances, however, it is necessary to use additional information available only for a subset of census years. The IPUMS linking procedure is designed to allow users to use only links based on information available across all census years, or to use extra information available in a particular census year to make the additional links.

Parental Links

Parental links were established through seven basic rules. If a link could be established through more than one rule, the lower-numbered rule was used, as the lower-numbered rules are generally the least ambiguous. The rule used in any particular case is identified in the variables MOMRULE_HIST and POPRULE_HIST.

Rule 1. Unambiguous relationships

  1. If the relationship (see note 7) of an individual to the household head/householder is son, daughter, or child, then establish parental links to persons listed as head or wife;
  2. if the relationship of an individual to the household head/householder is head, brother, or sister, then establish parental links to persons listed as mother or father; or
  3. if the relationship of an individual to the household head/householder is wife, brother-in-law, or sister-in-law, then establish parental links to persons listed as mother-in-law or father-in-law.

Rule 2. Grandchildren
If the relationship of an individual to the household head/householder is listed as grandson, granddaughter, or grandchild, then establish a parental link to a listed ever-married child and/or child-in-law with a plausible age difference. Plausible age differences are defined as 12 to 54 years for women and 15 to 74 years for men. If there is more than one eligible parent, choose the most proximate.
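The Rule 2 age test and the choice among several eligible parents can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the IPUMS implementation: the `Person` record and its string relationship labels (`"child"`, `"child-in-law"`) are hypothetical, the age ranges are treated as inclusive, and "most proximate" is read as nearest in listing order (PERNUM).

```python
from dataclasses import dataclass
from typing import List, Optional

# Rule 2 ranges for (parent age - child age), assumed inclusive.
PLAUSIBLE_GAP = {"F": (12, 54), "M": (15, 74)}


@dataclass
class Person:
    pernum: int         # order of listing on the census form
    sex: str            # "M" or "F" (illustrative coding)
    age: int
    ever_married: bool
    relate: str         # illustrative relationship labels, not IPUMS RELATE codes


def plausible_gap(parent: Person, child: Person) -> bool:
    """True if the parent-child age difference is in the Rule 2 range for the parent's sex."""
    lo, hi = PLAUSIBLE_GAP[parent.sex]
    return lo <= parent.age - child.age <= hi


def link_grandchild(grandchild: Person, household: List[Person], parent_sex: str) -> Optional[int]:
    """Return the PERNUM of the closest eligible parent of the given sex, or None."""
    eligible = [
        p for p in household
        if p.relate in ("child", "child-in-law")
        and p.sex == parent_sex
        and p.ever_married
        and plausible_gap(p, grandchild)
    ]
    if not eligible:
        return None
    # Assumption: "most proximate" means nearest in listing order.
    return min(eligible, key=lambda p: abs(p.pernum - grandchild.pernum)).pernum
```

For a household listing a head, his wife, their married daughter, her husband (a child-in-law), and a grandchild, the grandchild's mother link goes to the daughter and the father link to the child-in-law, provided both age gaps fall in range.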

Rule 3. All other relatives and nonrelatives via household position
Link relatives and nonrelatives not mentioned above to any preceding ever-married person with a plausible age difference as defined in rule 2, as long as there are no persons listed between them besides children or spouses of the potential parent. Links between relatives and nonrelatives are prohibited.
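Rule 3 can be sketched under similar assumptions. The code walks backward from the child, so the nearest qualifying person is chosen (the text says only "any preceding" person, so this tie-breaking is an assumption), and it accepts people listed in between only if they are already linked as that candidate's spouse or child. The `is_relative` flag is a hypothetical stand-in for the relative/nonrelative distinction, PERNUMs are assumed to run 1..n without gaps, and the sketch does not itself exclude the persons already handled by Rules 1 and 2.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Rule 2 ranges for (parent age - child age), assumed inclusive.
PLAUSIBLE_GAP = {"F": (12, 54), "M": (15, 74)}


@dataclass
class Person:
    pernum: int
    sex: str            # "M" or "F" (illustrative coding)
    age: int
    ever_married: bool
    is_relative: bool   # hypothetical flag: relative of the head or not
    sploc: int = 0      # spouse's PERNUM, 0 if none
    momloc: int = 0     # parent links already made for people listed earlier
    poploc: int = 0


def plausible_gap(parent: Person, child: Person) -> bool:
    lo, hi = PLAUSIBLE_GAP[parent.sex]
    return lo <= parent.age - child.age <= hi


def only_family_between(cand: Person, child: Person, by_pernum: Dict[int, Person]) -> bool:
    """True if everyone listed between cand and child is cand's spouse or child."""
    for n in range(cand.pernum + 1, child.pernum):
        b = by_pernum[n]
        if b.sploc != cand.pernum and cand.pernum not in (b.momloc, b.poploc):
            return False
    return True


def rule3_parent(child: Person, household: List[Person], parent_sex: str) -> Optional[int]:
    by_pernum = {p.pernum: p for p in household}
    # Walk backward from the child so the nearest qualifying person is found first.
    for n in range(child.pernum - 1, 0, -1):
        cand = by_pernum[n]
        if (cand.sex == parent_sex
                and cand.ever_married
                and cand.is_relative == child.is_relative  # no relative/nonrelative links
                and plausible_gap(cand, child)
                and only_family_between(cand, child, by_pernum)):
            return cand.pernum
    return None
```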

Rule 4. All other relatives and nonrelatives via surname
Same as Rule 3, except that surname similarity is substituted for the requirement that there are no persons listed between the parent and child. If more than one eligible parent is found, the most proximate is linked. This rule can only be applied in years with surname codes: 1850, 1860, 1870, 1880, 1910, 1920, 1940, and 1950.

Rule 5. Grandchildren via children-ever-born or surviving
Same as Rule 2, except evidence on children-ever-born (or children surviving in 1900 and 1910) is substituted for marital status of parent. This rule cannot be used for 1850, 1860, 1870, 1880, 1920, 1940, or 1950.

Rule 6. All other relatives and nonrelatives via children-ever-born or surviving
Same as Rule 3, except evidence on children-ever-born (or children surviving in 1900 and 1910) is substituted for marital status of parent. Again, this rule cannot be used for 1850-1880, 1920, 1940, or 1950.
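Because Rules 4 through 6 depend on items collected only in some years, the set of usable rules varies by census year. The sketch below encodes the year lists given in Rules 4 through 6 and picks the lowest-numbered usable rule, as described at the start of this section. Treating Rule 7 as usable in every year is an assumption, and the `candidates` mapping (rule number to the PERNUM that rule would link) is a hypothetical input.

```python
from typing import Dict, Optional, Set, Tuple

# Years listed in Rule 4 as having surname codes.
SURNAME_YEARS = {1850, 1860, 1870, 1880, 1910, 1920, 1940, 1950}
# Years listed in Rules 5 and 6 as lacking children-ever-born/surviving data.
NO_FERTILITY_YEARS = {1850, 1860, 1870, 1880, 1920, 1940, 1950}
# Rules the text describes as fully compatible across census years.
COMPARABLE_RULES = {1, 2, 3}


def available_rules(year: int, comparable_only: bool = False) -> Set[int]:
    """Parental-link rules that can be applied to a given census year."""
    if comparable_only:
        return set(COMPARABLE_RULES)
    rules = {1, 2, 3, 7}  # assumption: Rule 7 can run in every year
    if year in SURNAME_YEARS:
        rules.add(4)
    if year not in NO_FERTILITY_YEARS:
        rules.update({5, 6})
    return rules


def choose_link(candidates: Dict[int, int], year: int,
                comparable_only: bool = False) -> Optional[Tuple[int, int]]:
    """Return (rule, pernum) for the lowest-numbered usable rule, or None.

    The rule number is the kind of value recorded in MOMRULE_HIST / POPRULE_HIST.
    """
    usable = sorted(r for r in candidates if r in available_rules(year, comparable_only))
    if not usable:
        return None
    return usable[0], candidates[usable[0]]
```

Passing `comparable_only=True` mirrors the option, noted under Rule 7, of ignoring links that cannot be made in every census year.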

Rule 7. Spouse of linked parent
If one parent is linked and has a spouse present, that spouse is linked as a stepparent.

Users who want to limit their analysis to links that could be recognized in all census years can simply ignore links based on Rules 4 through 7. In each year, over 95 percent of links were established on the basis of Rule 1. For the period before 1980, over 99 percent of cases were linked on the basis of Rules 1, 2, or 3, which are fully compatible across census years. With the increase in births to never-married women in recent census years, however, Rules 5 and 6 have become increasingly important, since they substitute information on children-ever-born for information on marital status.

We performed two basic checks for inconsistent family links. First, if two parents were linked but were not married to each other, we unlinked the father. Second, if both partners in a married couple were linked to the same parent, we chose the best parental link based on detailed relationship code, surname, and proximity within the household.
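The stepparent rule and the first consistency check can be sketched as two passes over a household, assuming each person record already carries SPLOC-style spouse pointers and any links made by Rules 1 through 6. The `momrule`/`poprule` fields are hypothetical stand-ins for MOMRULE_HIST and POPRULE_HIST. The second check (both spouses linked to the same parent) depends on detailed relationship codes and surnames and is not sketched.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Person:
    pernum: int
    sploc: int = 0     # spouse's PERNUM (0 = none)
    momloc: int = 0    # mother's PERNUM (0 = none)
    poploc: int = 0    # father's PERNUM (0 = none)
    momrule: int = 0   # hypothetical stand-ins for MOMRULE_HIST / POPRULE_HIST
    poprule: int = 0


def apply_rule7(people: Dict[int, Person]) -> None:
    """Rule 7: if only one parent is linked and that parent has a spouse, link the spouse."""
    for p in people.values():
        if p.momloc and not p.poploc:
            spouse = people[p.momloc].sploc
            if spouse and spouse != p.pernum:
                p.poploc, p.poprule = spouse, 7
        elif p.poploc and not p.momloc:
            spouse = people[p.poploc].sploc
            if spouse and spouse != p.pernum:
                p.momloc, p.momrule = spouse, 7


def unlink_unmarried_fathers(people: Dict[int, Person]) -> None:
    """First consistency check: drop the father link if the two parents are not spouses."""
    for p in people.values():
        if p.momloc and p.poploc and people[p.momloc].sploc != p.poploc:
            p.poploc, p.poprule = 0, 0
```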

Spousal Links

Spousal links (SPLOC/IMPSP) were made based on the following seven rules, identified in the variable SPRULE_HIST:

Rule 1. Link married women to previous adjacent married men with an appropriate relationship (see note 8). Appropriate relationships are defined as follows:

Man's relationship to head    Woman's relationship to head
--------------------------    ----------------------------
Head of household             Wife
Son                           Daughter-in-law
Father                        Mother
Father-in-law                 Mother-in-law
Brother                       Sister-in-law
Brother-in-law                Sister

Rule 2. Link married women to following adjacent married men with an appropriate relationship as defined in Rule 1.
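Spousal Rules 1 and 2 amount to a table lookup plus an adjacency test. A minimal Python sketch, assuming relationships are coded as plain strings matching the table of appropriate relationships (these are illustrative labels, not IPUMS RELATE codes) and that the `Person` record is hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (man's relationship, woman's relationship) pairs from the Rule 1 table.
APPROPRIATE_PAIRS = {
    ("head", "wife"),
    ("son", "daughter-in-law"),
    ("father", "mother"),
    ("father-in-law", "mother-in-law"),
    ("brother", "sister-in-law"),
    ("brother-in-law", "sister"),
}


@dataclass
class Person:
    pernum: int
    sex: str        # "M" or "F" (illustrative coding)
    relate: str     # illustrative relationship labels
    married: bool


def adjacent_spouse(woman: Person, household: List[Person]) -> Optional[Tuple[int, int]]:
    """Return (rule, man's PERNUM) if spousal Rule 1 or Rule 2 applies, else None."""
    if not woman.married:
        return None
    by_pernum = {p.pernum: p for p in household}
    # Rule 1 checks the person listed just before her; Rule 2 the person just after.
    for rule, offset in ((1, -1), (2, 1)):
        man = by_pernum.get(woman.pernum + offset)
        if (man is not None and man.sex == "M" and man.married
                and (man.relate, woman.relate) in APPROPRIATE_PAIRS):
            return rule, man.pernum
    return None
```

A complete pass would presumably apply Rule 1 to every woman in the household before trying Rule 2, and would skip men who are already linked; both refinements are omitted here.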

Rule 3. Link married women to nonadjacent married men with an appropriate relationship as defined in Rule 1, provided the woman was at least 13, the man was at least 15, the man was not more than 25 years older than the woman, and the woman was not more than 10 years older than the man.
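The Rule 3 age conditions translate directly into a predicate. In the sketch below, choosing the nearest candidate when several non-adjacent men qualify is an assumption, since the rule does not say how ties are broken; the `Person` fields and relationship labels are illustrative, not IPUMS codes.

```python
from dataclasses import dataclass
from typing import List, Optional

# (man's relationship, woman's relationship) pairs from the Rule 1 table.
APPROPRIATE_PAIRS = {
    ("head", "wife"), ("son", "daughter-in-law"), ("father", "mother"),
    ("father-in-law", "mother-in-law"), ("brother", "sister-in-law"),
    ("brother-in-law", "sister"),
}


@dataclass
class Person:
    pernum: int
    sex: str        # "M" or "F" (illustrative coding)
    age: int
    relate: str     # illustrative relationship labels
    married: bool


def plausible_couple_ages(woman_age: int, man_age: int) -> bool:
    """Age conditions stated in spousal Rule 3."""
    return (woman_age >= 13
            and man_age >= 15
            and man_age - woman_age <= 25   # man at most 25 years older
            and woman_age - man_age <= 10)  # woman at most 10 years older


def rule3_spouse(woman: Person, household: List[Person]) -> Optional[int]:
    """Link a married woman to a non-adjacent married man with an appropriate relationship."""
    if not woman.married:
        return None
    candidates = [
        m for m in household
        if m.sex == "M" and m.married
        and abs(m.pernum - woman.pernum) > 1  # non-adjacent only
        and (m.relate, woman.relate) in APPROPRIATE_PAIRS
        and plausible_couple_ages(woman.age, m.age)
    ]
    if not candidates:
        return None
    # Assumption: break ties by distance in listing order.
    return min(candidates, key=lambda m: abs(m.pernum - woman.pernum)).pernum
```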

Rule 4. Link married women with a relationship not specified in Rule 1 to previous adjacent married men with appropriate ages as defined in Rule 3, provided the resulting link does not link a non-relative to a relative.

Rule 5. Same as Rule 4, but link married women to subsequent adjacent married men.

Rule 6. Same as Rule 4, but link married women to non-adjacent married men.

Rule 7. This rule applies only to pairings in which at least one person has an allocated marital status (MARST). As in Rule 4, the pair's ages must satisfy parts b, c, and d of Rule 3, and the link must not join a non-relative to a relative. Unlike Rule 4, however, one of the two has an allocated marital status other than 'married.' In this case, the spouse link is made and the allocated marital status is set to 1 ('married').
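A sketch of Rule 7 follows. The lettered parts of Rule 3 are not reproduced in this text, so the code simply applies all four age conditions stated there. That choice, the requirement that the other partner be either married or also have an allocated status, and the `marst_allocated` flag itself are assumptions made for illustration.

```python
from dataclasses import dataclass

MARRIED = 1  # MARST value that allocated statuses are reset to, per the text


@dataclass
class Person:
    pernum: int
    age: int
    marst: int
    marst_allocated: bool   # hypothetical flag: True if MARST was allocated
    is_relative: bool       # hypothetical flag: relative of the head or not
    sploc: int = 0


def ages_fit(woman: Person, man: Person) -> bool:
    # Assumption: apply every age condition from Rule 3.
    return (woman.age >= 13 and man.age >= 15
            and man.age - woman.age <= 25
            and woman.age - man.age <= 10)


def try_rule7(woman: Person, man: Person) -> bool:
    """Link an otherwise-eligible pair when an allocated MARST is not 'married'."""
    if woman.sploc or man.sploc:
        return False                      # already linked by an earlier rule
    if woman.is_relative != man.is_relative:
        return False                      # never link a relative to a non-relative
    if not ages_fit(woman, man):
        return False
    pair = (woman, man)
    # Assumption: each partner must be married or have an allocated status...
    if not all(p.marst == MARRIED or p.marst_allocated for p in pair):
        return False
    # ...and at least one must have an allocated status other than married.
    to_fix = [p for p in pair if p.marst_allocated and p.marst != MARRIED]
    if not to_fix:
        return False
    for p in to_fix:
        p.marst = MARRIED
    woman.sploc, man.sploc = man.pernum, woman.pernum
    return True
```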


  1. For example, users frequently wish to attach the characteristics of immediate family members. The following SPSS-X command file uses SPLOC to attach the spouse's occupation to the record of each married person. SERIAL is an IPUMS variable, Household serial number, which uniquely identifies each household within a sample; the examples therefore match on YEAR as well as SERIAL. First we obtain an active file with YEAR, SERIAL, OCC (Occupation), and SPLOC. SPLOC is renamed PERNUM (Person number in unit), and OCC is renamed SPOCC (Spouse's occupation), the variable we are constructing for our own use. We sort the file by YEAR, SERIAL, and PERNUM, and then match it to the original file. Because the PERNUM we are matching was originally SPLOC, we are actually matching spousal occupations.

    If you are working with a file called "test.sav" in the main directory of the C drive, for instance, the following set of commands works in SPSS format:

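    GET FILE='C:\test.sav'.
    SORT CASES BY YEAR SERIAL PERNUM.
    SAVE OUTFILE='C:\test.sav'.
    GET FILE='C:\test.sav' /KEEP=YEAR SERIAL SPLOC OCC.
    SELECT IF (SPLOC>0).
    RENAME VARIABLES (SPLOC=PERNUM) (OCC=SPOCC).
    SORT CASES BY YEAR SERIAL PERNUM.
    SAVE OUTFILE='C:\temp.sav'.
    MATCH FILES FILE='C:\test.sav'
      /TABLE='C:\temp.sav'
      /BY YEAR SERIAL PERNUM.
    SAVE OUTFILE='C:\newtest.sav'.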

    The same set of commands in Stata format would be:

    cd C:\
    use test.dta
    sort year serial pernum
    save test.dta, replace
    use test.dta
    keep year serial sploc occ
    keep if sploc>0
    rename sploc pernum
    rename occ spocc
    sort year serial pernum
    save temp.dta, replace
    use test.dta
    merge 1:1 year serial pernum using temp.dta
    save newtest.dta, replace

    The same set of commands in SAS format would be:

    PROC SORT DATA= test ;
    BY year serial pernum ;
    RUN ;
    PROC SQL ;
    CREATE TABLE temp
    AS SELECT year, serial, sploc AS pernum, occ AS spocc
    FROM test
    WHERE sploc > 0
    ORDER BY year, serial, pernum ;
    CREATE TABLE merged
    AS SELECT a.*, b.spocc label="Spouse's occupation"
    FROM test AS a LEFT JOIN temp AS b
    ON (a.year=b.year AND a.serial=b.serial AND a.pernum=b.pernum) ;
    QUIT ;

    The same set of commands in R format would be:

    library(dplyr)

    # data: the IPUMS extract, already loaded as a data frame
    test <- data %>%
      select(YEAR, SERIAL, PERNUM, SPLOC, OCC)
    temp <- data %>%
      filter(SPLOC > 0) %>%
      select(YEAR, SERIAL, PERNUM = SPLOC, SPOCC = OCC)
    newtest <- left_join(test, temp, by = c("YEAR", "SERIAL", "PERNUM"))

    It is almost as easy to use MOMLOC and POPLOC to attach characteristics of own children. The following SPSS command file uses similar logic together with the "AGGREGATE" command to count the number of own children under the age of 10 for each woman:

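    GET FILE='C:\test.sav'.
    SORT CASES BY YEAR SERIAL PERNUM.
    SAVE OUTFILE='C:\test.sav'.
    GET FILE='C:\test.sav' /KEEP=YEAR SERIAL MOMLOC AGE.
    RENAME VARIABLES (MOMLOC=PERNUM).
    SELECT IF (AGE<10 & PERNUM>0).
    AGGREGATE OUTFILE='C:\temp.sav'
      /BREAK=YEAR SERIAL PERNUM
      /NCHLT10 = NU(YEAR).
    MATCH FILES FILE='C:\test.sav'
      /TABLE='C:\temp.sav'
      /BY YEAR SERIAL PERNUM.
    SAVE OUTFILE='C:\newtest.sav'.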

    The same set of commands in Stata format would be:

    cd C:\
    use test.dta
    keep year serial momloc age
    rename momloc pernum
    keep if age<10 & pernum>0
    sort year serial pernum
    collapse (count) age, by(year serial pernum)
    rename age nchlt10
    sort year serial pernum
    save temp.dta, replace
    use test.dta
    sort year serial pernum
    merge 1:1 year serial pernum using temp.dta
    save newtest.dta, replace

    The same set of commands in SAS format would be:

    PROC SORT DATA=test ;
    BY year serial pernum ;
    RUN ;
    PROC SQL ;
    CREATE TABLE temp
    AS SELECT year, serial, momloc AS pernum, COUNT(*) AS nchlt10
    LABEL="Number of children under 10"
    FROM test
    WHERE age < 10 AND momloc > 0
    GROUP BY year, serial, momloc
    ORDER BY year, serial, momloc ;
    CREATE TABLE merged AS
    SELECT a.nchlt10, b.* FROM
    temp AS a RIGHT JOIN test AS b
    ON (a.year=b.year AND a.serial=b.serial AND a.pernum=b.pernum) ;
    QUIT ;

    The same set of commands in R format would be:

    library(dplyr)

    # Count own children under 10 for each mother (MOMLOC holds her PERNUM)
    temp <- data %>%
      filter(AGE < 10 & MOMLOC > 0) %>%
      group_by(YEAR, SERIAL, PERNUM = MOMLOC) %>%
      summarise(NCHLT10 = n(), .groups = "drop")
    newtest <- left_join(data, temp, by = c("YEAR", "SERIAL", "PERNUM"))
  2. See Peter Laslett, "Introduction," in Laslett and R. Wall, eds., Household and Family in Past Time, 1972.
  3. See Miriam King and Samuel Preston, "Who Lives with Whom?: Individual Versus Household Measures," Journal of Family History, 15 (1990): 117-132; Steven Ruggles, Prolonged Connections: The Rise of the Extended Family in Nineteenth-Century England and America, 1987; Ruggles, "The Transformation of American Family Structure," American Historical Review, 99 (1994): 103-128; and Ruggles, "The Origins of African-American Family Structure," American Sociological Review, 59 (1994): 136-151.
  4. Households in the IPUMS files occasionally include more than one head, wife, mother of head, or father of head, usually because of enumerator or data-entry error. Such cases are corrected in the IPUMS version of the data by means of a consistency checking program prior to assigning parentage. We encountered true polygamous marriages very rarely; where these could be identified, we assigned the wives a detailed relationship code identifying them as wives of a polygamous husband.
  5. Before 1960, census enumeration instructions specified the sequence in which various relatives and nonrelatives should be listed on the census form. Since the introduction of self-enumeration in 1960, the instructions have become less explicit; nevertheless, respondents still have a strong tendency to list children immediately following their parents and to list married couples adjacently.
  6. Even a small percentage of missed parental links can have significant consequences for some kinds of analyses, such as the study of young children residing without parents.
  7. For 1850-1870, Imputed Relationship (IMPREL) is used as a substitute for Relationship (RELATE).
  8. Again, for 1850-1870, Imputed Relationship (IMPREL) is used as a substitute for Relationship (RELATE).
