1910 Hispanic Oversample: Introduction to the Data Dictionary

The data dictionary describes each variable in the Hispanic Oversample. The first section presents the household variables; the second describes the person variables. Within each section, the variables are grouped thematically. For each variable, the dictionary provides a universe statement and a variable description. In some cases, additional user notes warn researchers about problems that certain uses of the variable may entail.


For consistency, this user's guide retains the variable availability boxes found in the IPUMS-98 User's Guide. However, only variables included in the Hispanic Oversample are presented in this data dictionary. Most of the variables presented here are available for both the 1910 PUMS and the Hispanic Oversample. There are three types of exceptions:


The universe statement defines the population at risk for the given variable for each census year (i.e., who was asked the question, or to whom the question applies).

Codes and Frequencies

For most variables, a frequency table gives the value label for each code. The numbers in each column of the table show the number of cases (frequencies) in each category for that column's sample. A blank indicates that the category is not available for that sample. Because the original PUMS and the Hispanic Oversample preserved virtually all available detail, blanks usually mean that the response did not occur in the population. For example, the Hispanic Oversample has frequencies for only six categories of state of residence because only six states were included in the oversample "population."
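As a rough illustration of how such a table relates to the underlying data, the Python sketch below counts raw codes against a list of value labels and leaves a blank (None) for any category with no cases. The codes and labels in VALUE_LABELS are hypothetical placeholders, not actual codes from this dictionary.

```python
from collections import Counter

# Hypothetical value labels; real codes and labels come from the data dictionary.
VALUE_LABELS = {
    1: "Category A",
    2: "Category B",
    3: "Category C",
}

def frequency_table(values, labels):
    """Return (code, label, count) rows in label order.

    count is None when the code never occurs in the sample, which plays
    the role of a blank cell in the printed frequency table.
    """
    counts = Counter(values)
    return [(code, label, counts.get(code)) for code, label in labels.items()]

if __name__ == "__main__":
    sample = [1, 1, 2, 1]  # illustrative codes for four cases
    for code, label, n in frequency_table(sample, VALUE_LABELS):
        print(f"{code:>3}  {label:<12} {'' if n is None else n}")
```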

Some IPUMS variables have both general and detailed codes. These are variables for which some PUMS provided greater detail than others, but where the extra detail could be subsumed under a common set of more general categories. Detailed codes are usually presented in the documentation with a gap separating the general and detailed components. The gap only improves readability; no blank spaces exist in the dataset. To save space, the documentation provides frequencies only for the detailed version of some variables.
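The Python sketch below shows one way to go from a code as printed in the documentation to the form stored in the data, and to recover the general component. It assumes that the general component is the leading part of the detailed code; the width of that leading part (general_width) is a hypothetical parameter that must be read from the documentation for each variable.

```python
def strip_display_gap(documented_code: str) -> str:
    """Remove the readability gap, e.g. '10 01' -> '1001' as stored in the data."""
    return documented_code.replace(" ", "")

def general_component(detailed_code: str, general_width: int) -> str:
    """Return the leading general component of a detailed code.

    Assumption: the general code occupies the first general_width digits;
    this width varies by variable and is not fixed by this section.
    """
    if general_width <= 0 or general_width > len(detailed_code):
        raise ValueError("general_width must fit within the detailed code")
    return detailed_code[:general_width]

if __name__ == "__main__":
    stored = strip_display_gap("10 01")  # hypothetical documented code
    print(stored, general_component(stored, 2))
```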

The frequency counts in the data dictionaries are unweighted; therefore, they do not necessarily reflect the distribution in the general population accurately. When the appropriate weights are applied, all samples can be made representative of the general population.
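The difference between unweighted and weighted distributions can be sketched in Python as below. The weights here are hypothetical numbers chosen for illustration; this section does not name the weight variables, so consult the weighting documentation for the actual ones.

```python
from collections import defaultdict

def distribution(values, weights=None):
    """Share of cases in each category.

    With weights=None every case counts once (an unweighted distribution,
    like the dictionary's frequency counts); otherwise each case counts
    by its weight.
    """
    if weights is None:
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("values and weights must have the same length")
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    return {value: total / grand_total for value, total in totals.items()}

if __name__ == "__main__":
    codes = ["A", "A", "B"]
    hypothetical_weights = [1.0, 1.0, 2.0]  # illustrative only
    print(distribution(codes))                        # unweighted shares
    print(distribution(codes, hypothetical_weights))  # weighted shares
```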

Indentations in the value labels column of the data dictionary are meaningful. Any item indented beneath another is a subset of the larger category. Generally, if a subcategory is not available in a given year, its cases are coded into the larger category.
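Because indentation encodes the category hierarchy, a list of value labels can be turned into parent links and used to roll subcategories up into their larger categories. The Python sketch below assumes a hypothetical plain-text layout of "<indent><code> <label>" with a fixed number of spaces per level; the sample codes and labels are placeholders.

```python
def parent_map(label_lines, indent_unit=2):
    """Map each code to its immediate parent code based on indentation.

    Assumes each line looks like '<indent><code> <label>' and that every
    level of nesting adds indent_unit spaces (a hypothetical layout).
    """
    parents = {}
    stack = []  # (depth, code) pairs for the current chain of ancestors
    for line in label_lines:
        stripped = line.lstrip(" ")
        depth = (len(line) - len(stripped)) // indent_unit
        code = stripped.split(maxsplit=1)[0]
        # Drop siblings and deeper items so the top of the stack is the parent.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        parents[code] = stack[-1][1] if stack else None
        stack.append((depth, code))
    return parents

def collapse_to_top(code, parents):
    """Follow parent links up to the top-level (unindented) category."""
    while parents.get(code) is not None:
        code = parents[code]
    return code

if __name__ == "__main__":
    lines = ["100 Category A", "  110 Sub A1", "  120 Sub A2",
             "    121 Sub A2a", "200 Category B"]
    parents = parent_map(lines)
    print(collapse_to_top("121", parents))
```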

Frequently Used Categories:

N/A Not included in the universe (e.g., those under age 5 or older than 22 for "school attendance")
n.s. Not specified (e.g., "United States, n.s.")
n.e.c. Not elsewhere classified (e.g., "Other race, n.e.c.")
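Cases coded N/A fall outside the variable's universe and are usually excluded before tabulating. The Python sketch below applies the school attendance example above (excluding persons under age 5 or older than 22); the record field names "age" and "school" are hypothetical, not actual variable names from this dictionary.

```python
from collections import Counter

def in_universe(age):
    # Universe from the N/A example: not under age 5 and not older than 22.
    return 5 <= age <= 22

def tabulate_school(records):
    """Count school attendance responses among persons in the universe only.

    Each record is a dict with hypothetical keys 'age' and 'school'.
    """
    return Counter(r["school"] for r in records if in_universe(r["age"]))

if __name__ == "__main__":
    people = [
        {"age": 3, "school": "N/A"},
        {"age": 10, "school": "yes"},
        {"age": 22, "school": "no"},
        {"age": 23, "school": "N/A"},
    ]
    print(dict(tabulate_school(people)))
```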

