## Demographic Variables

The microsimulation model captures a large array of basic demographic variables (see Appendix 1 for a detailed list). The transition structure of these variables is calibrated from a variety of datasets. During the simulation, they evolve jointly according to the estimated transition structure, interacting with each other in potentially complex ways to capture important demographic shifts in the US population. (Other demographic variables, such as marriage, are simulated in a similar way, but their transition structure is modeled using estimates from structural models.) These demographic shifts then inform the microsimulation model's forecasts of macroeconomic variables of interest. The key demographic variables include (on the individual level):

## Initial Distributions of Attributes

### Fertility

As noted earlier, fertility for women aged between 15 and 49 is conditioned on female ethnicity, education, and marital status. The age-specific rates are benchmarked to fertility rates published by the Social Security Administration (SSA). These rates exhibit two critical features: first, a sizable decline since 1996 in fertility rates among very young women and, second, a postponement of fertility among more educated women to later ages. Both features are implemented in PWBMsim, as demonstrated in Figure 1. Panels A and B of Figure 1 show fertility rates (probability of giving birth) among non-White and non-Asian females (Black/Hispanic/Other ethnicity) with at most a high-school education. The dotted line refers to 1996 and the unbroken line to 2015. The SSA target fertility rates are shown in Panel A and the PWBMsim outcomes in Panel B. The decline in fertility among younger females is evident in both panels.

Figure 1: U.S. Fertility Rates for 1996 (dotted line) and 2015 (unbroken line)

Panels C and D of Figure 1 show fertility rates among women with at least a college degree. As before, the dotted line refers to year 1996, the solid line to year 2015; SSA target fertility rates are shown in Panel C and the PWBMsim outcomes are shown in Panel D. The SSA rates show that educated women have postponed fertility to older ages. PWBMsim’s outcome fertility rates of Panel D show that this feature of the fertility experience among educated women is captured appropriately. (See footnote 2.)

### Disability

Disability prevalence rates are implemented by estimating frequencies from CPS questionnaire items on health status and on whether health impairments affect effectiveness in market work. The frequencies of disability impairments are calculated based on affirmative responses to the latter question and if individuals report having poor health. Figure 2 shows PWBMsim’s implementation (output) of disability status for non-immigrant white married males by 5 education groups and 14 age groups compared to the CPS 1996 micro-survey frequencies.

Figure 2: Disability Prevalence among Nonimmigrants, White, Single Males 1996.

### Education

PWBMsim calibrates education prevalence in 1996 according to prevalence rates of individuals with different education levels by immigrant status, ethnicity, and age. For females, the male spouse’s education is also a conditioning variable among married couples. It is noteworthy, as Panels A and B of Figure 3 show, that education levels are correlated across spouses among married couples. This reflects the significant degree of assortative mating among individuals in the United States. Figure 3 shows that females married to high-school dropouts are themselves more likely to be high-school dropouts than females married to high-school graduates. Such assortative mating is also a feature of PWBM’s marriage modules (see below).

Figure 3: Marriage Prevalence with Both Spouses Being High-School Dropouts (Panel A) and between High School Dropouts and High School Educated Spouses (Panel B) Involving Single White Females in 1996 by Female Age

### Immigration

Immigration rates are calibrated according to data on prevalence rates of foreign-born individuals in the Current Population Survey. Figure 4 shows these prevalence rates by age in the CPS (Panel A) and the PWBMsim outcome (Panel B).

Figure 4: Shares of Male Immigrants in Total Population by Year and Age - PWBMsim 1996 and 2015

## Annual Attribute Transitions After 1996

### Linked Data Creation and Weighting

To identify individual-level transitions, we use the panel data structure of the CPS to link survey responses together across years using a Linking ID.

Unfortunately, attrition through nonresponse is a serious concern when linking CPS data. Although the Census Bureau tries to survey all respondents twice, it cannot always succeed: individuals might change residence, die, or refuse to complete the survey the second time. As a result, our linking leaves out individuals from the original CPS dataset who responded only once. Because some types of people are much more likely than others to fall out of the survey through attrition, our linked data are not a proportional representation of the overall population. In particular, a naive linking of survey respondents is biased toward individuals who are less likely to move, such as individuals who are married, have children, or are Caucasian. To correct for this bias, we reweight linked individuals so that the linked population distributions match the original processed distributions along attributes such as Age, Race, Legal Status, and Gender.
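The reweighting idea can be illustrated with a minimal post-stratification sketch. This is an illustration only, not the actual PWBMsim procedure: the record layout, the `married` attribute, the `weight` field, and the function name `poststratify` are all hypothetical, and the real adjustment may use a different method (for example, raking over several margins).

```python
from collections import defaultdict

def poststratify(original, linked, keys):
    """Rescale linked-sample weights so that each cell's weighted total
    matches the corresponding cell total in the original sample.

    original, linked: lists of dicts holding attributes plus a "weight" field.
    keys: attribute names that define the cells (hypothetical names here).
    """
    def cell_totals(records):
        totals = defaultdict(float)
        for rec in records:
            totals[tuple(rec[k] for k in keys)] += rec["weight"]
        return totals

    target = cell_totals(original)
    observed = cell_totals(linked)
    reweighted = []
    for rec in linked:
        cell = tuple(rec[k] for k in keys)
        factor = target[cell] / observed[cell]
        reweighted.append({**rec, "weight": rec["weight"] * factor})
    return reweighted

# Illustrative numbers: singles are under-represented after linking.
original = [
    {"married": True, "weight": 50.0},
    {"married": False, "weight": 50.0},
]
linked = [
    {"married": True, "weight": 40.0},
    {"married": False, "weight": 10.0},
]
adjusted = poststratify(original, linked, keys=["married"])
```

After the adjustment, the weighted married and single totals in the linked sample again equal those in the original sample, so the group that dropped out more often is weighted up.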

### Aging

The Aging transition is the simplest: People grow older by one year each year. Those who survive undergo additional attribute transitions as described in the sub-sections below.

### Fertility

PWBMsim’s fertility calibration is based on the Social Security Administration’s historical and projected fertility rates among women aged between 15 and 49. Figure 5 shows how fertility rates have changed over the decades since the 1990s and SSA projections for future years. It shows that fertility has declined among younger women but increased for older ones. The pattern suggests that many women have postponed fertility to older ages, a trend that is projected to continue under SSA’s fertility assumptions.

Figure 5: Historical and Projected Fertility Rates by Female Age for Selected years.

Whereas SSA provides historical and projected fertility rates by female age, PWBMsim demographers have further decomposed them by female ethnicity, education level, current marital status, and the number of prior births (top-coded at 2). The decomposition is based on data from the CPS June Surveys during 1996-2016. These data provide relative fertility rates by female ethnicity, education level, current marital status, and the number of prior births. CPS March Surveys over the same years are used to obtain the distribution of the female population over the fertile age range (15-49) by these categories. Finally, using March CPS relative population weights by age and education, the decomposed fertility rates are benchmarked to those of the Social Security Trustees by female age. Figure 6 shows one example of the outcome of decomposing fertility by female marital status and educational attainment. At younger ages, fertility rates among high school dropouts are much higher than among those with a college degree or more. Moreover, fertility rates declined rapidly between 1997 and 2015 among younger high school dropouts. Among those with a college degree or more, fertility rates have been shifting toward older ages among both married and single women.
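One simple way to picture the benchmarking step is proportional scaling: relative rates for each subgroup are rescaled so that their population-weighted average reproduces the SSA rate for that age. The sketch below assumes this scaling rule and uses made-up group names, relative rates, and population shares; it is not the documented PWBMsim procedure.

```python
def benchmark_rates(target_rate, relative_rates, pop_shares):
    """Scale subgroup rates so their share-weighted mean equals target_rate.

    relative_rates: subgroup -> relative fertility rate (any common scale).
    pop_shares: subgroup -> share of women at this age (shares sum to 1).
    """
    weighted_mean = sum(relative_rates[g] * pop_shares[g] for g in relative_rates)
    scale = target_rate / weighted_mean
    return {g: r * scale for g, r in relative_rates.items()}

# Hypothetical inputs for a single age.
relative = {"hs_dropout": 2.0, "hs_grad": 1.0, "college": 0.5}
shares = {"hs_dropout": 0.2, "hs_grad": 0.5, "college": 0.3}
rates = benchmark_rates(0.10, relative, shares)

# The share-weighted aggregate reproduces the age-level target rate.
aggregate = sum(rates[g] * shares[g] for g in rates)
```

The scaling preserves the ratios between subgroups (here, dropouts remain four times as likely to give birth as college graduates) while matching the age-level target.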

PWBMsim incorporates these fertility differences and trends over time. Projected fertility rates assume the continuation of these trends at average historical rates, consistent with the overall fertility rates projected by the Social Security Trustees.

Figure 6: Differences and Changes Over Time in Fertility Rates Across Women by Age, Marital Status, and Educational Attainment: 1997, 2006, and 2015.

### Education

PWBMsim implements a new method for modeling education acquisition rates as individuals grow older. Initial (year 1996) years-of-completed-education are assigned to PWBMsim individuals based on micro-data from the Current Population Survey, 1996. The years-of-completed-education in year $t$, $e_t$, range between 0 and 18. The data exhibit considerable bunching at 12 (high school completion) and 16 (college completion). Assignments in PWBMsim’s initial population are based on rates of schooling completion conditional on the person’s age, $a_t$, current value of education, $e_t$, gender, $g_t$, and ethnicity, $r_t$.

Annual transition probabilities are computed based on the American Community Survey (ACS) 2001-2015. The ACS provides schooling attainment and school attendance information for all sampled persons aged three years and over. (The ACS variable EDUC indicates respondents’ educational attainment, as measured by the highest year of school or degree completed. In addition, the variable SCHOOL indicates whether the respondent attended school during the past 3 months. A table of conditional probabilities of advancing schooling attainment each year is constructed for subgroups by age, race, gender, and current schooling attainment. These transition rates are assumed to remain constant in years after 2015 for making future projections of educational attainment. See Appendix 3 for more details.)

Annual transitions of educational attainment (years of education completed) are based on estimates of probabilities of acquiring an additional year of education in year $t+1$ conditional on age, gender, ethnicity, and education attained in the previous year. The conditional probabilities are estimated from the American Community Survey (as described above). That is, if

$$p(e_{t+1} = 14 \mid e_t = 13, a_t = 18, r_t = \text{white}, g_t = \text{female}) = 0.85,$$

then, in $t+1$, the simulation randomly assigns $e_{t+1} = 14$ to, on average, eighty-five percent of white females aged 18 with 13 years of education attained in year $t$.
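The random assignment in this example can be sketched as a Bernoulli draw per person. The lookup table `ADVANCE_PROB`, its key layout, and the function name are hypothetical; only the 0.85 probability and the 18-year cap on completed education come from the text above.

```python
import random

# Hypothetical lookup table:
# (education, age, race, gender) -> probability of gaining one more year.
ADVANCE_PROB = {(13, 18, "white", "female"): 0.85}

def next_education(e_t, age, race, gender, rng):
    """Return next year's completed education for one simulated person."""
    p = ADVANCE_PROB.get((e_t, age, race, gender), 0.0)
    # Advance by one year with probability p; years of education are capped at 18.
    return min(e_t + 1, 18) if rng.random() < p else e_t

rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = [next_education(13, 18, "white", "female", rng) for _ in range(100_000)]
share_advanced = sum(e == 14 for e in draws) / len(draws)
```

Across a large number of simulated individuals in this cell, `share_advanced` settles close to 0.85, which is what "on average" means in the sentence above.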

Figure 7 shows the closeness of fit of the simulated population over years 1996-2016 by five education categories: Less than High School, High School, Some College, College Degree, and Advanced Degree.

Figure 7: Share of the U.S. population by education category, 1996-2020.

Figure 8 shows the closeness of fit between the U.S. actual and simulated populations by age group for those with a college degree.

Figure 8: Share of those with a College Degree level of education by age group, 1996-2020.

### Marriage and Divorce

It is important for PWBMsim to estimate and project family formation and dissolution outcomes that are consistent with trends observed in the past before attempting to project those trends into the future. PWBMsim implements a new, stochastic modeling method for generating marriage and divorce transitions. This approach is especially useful for capturing partner selection and separation patterns that differ by individual attributes -- ethnicity in particular. PWBMsim’s marriage formation process involves two steps: first, among participants in the 'marriage market' composed of single females and males, a female and a male pairing is selected based on the participants’ race affiliations (meeting step), and next, whether the pairing results in a marriage is determined according to the races, education levels and ages of the two potential partners (acceptance step). The introduction of the meeting step distinguishes PWBMsim’s method from conventional stochastic approaches used in other simulations.

PWBMsim divides the marriage and family formation process into two steps: meetings between eligible singles and acceptances of matches (by getting married). PWBMsim also calibrates the divorce process to determine which couples divorce and when. PWBMsim’s family formations, dissolutions, and the resulting evolution of family structures over time are conditioned upon fertility (history), mortality, ethnic heterogeneity, and distributions of eligible singles by education. Micro-data information from the Panel Study of Income Dynamics (PSID) and other surveys is used to calculate parameters (conditional probabilities) that control mate matching and separations within the simulation. An accurate representation of marital history and current marital status of individuals is developed in PWBMsim’s initial population to set the stage for executing marriage and divorce transitions over time. Simulation outcomes are validated against historical data before making future projections.
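The meeting and acceptance steps can be sketched as follows. Everything numeric here is an assumption made for illustration: the meeting weights, the acceptance probabilities, the record fields, and the rule of one meeting per single woman per year are hypothetical and are not taken from the PSID-based parameters described above.

```python
import random

def simulate_marriages(women, men, meet_weight, accept_prob, rng):
    """One simulated year: each single woman meets one candidate, then may marry."""
    available = list(men)
    couples = []
    for woman in women:
        if not available:
            break
        # Meeting step: candidate chosen with weights based on the pair's races.
        weights = [meet_weight(woman["race"], man["race"]) for man in available]
        if sum(weights) == 0:
            continue
        man = rng.choices(available, weights=weights, k=1)[0]
        # Acceptance step: marriage depends on races, education levels, and ages.
        if rng.random() < accept_prob(woman, man):
            couples.append((woman["id"], man["id"]))
            available.remove(man)
    return couples

def meet_weight(race_f, race_m):
    # Hypothetical: same-race meetings are three times as likely.
    return 3.0 if race_f == race_m else 1.0

def accept_prob(woman, man):
    # Hypothetical acceptance rule favoring similar education and age.
    prob = 0.10
    if woman["edu"] == man["edu"]:
        prob += 0.20
    if abs(woman["age"] - man["age"]) <= 5:
        prob += 0.10
    return prob

def make_singles(prefix, n):
    races = ["white", "black", "hispanic"]
    return [
        {"id": f"{prefix}{i}", "race": races[i % 3],
         "edu": 12 + 4 * (i % 2), "age": 22 + i % 15}
        for i in range(n)
    ]

rng = random.Random(1)
couples = simulate_marriages(make_singles("f", 200), make_singles("m", 200),
                             meet_weight, accept_prob, rng)
```

Separating the two steps lets race shape who meets whom, while education and age shape whether a meeting becomes a marriage, which is how assortative mating can emerge in the simulated population.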

Constructing such a detailed process of family formation and dissolution is important because of the well-known correlation of family structures with other individual attributes such as labor force participation, productivity, earnings, tax bases, and future eligibility to benefits from many federal programs such as Social Security, Medicare, Medicaid and other welfare programs. Appendix 5 provides details of the marriage and divorce procedures implemented in PWBMsim.

### Mortality

Mortality rates conditional on age (by single year), gender, and ethnicity, are first estimated from information from the National Center for Health Statistics and benchmarked to total mortality by age and gender published in the 2016 Annual Report of the Social Security Trustees. These rates control the incidence of death for different population sub-groups.

Mortality rates after 2016 are maintained at SSA’s mortality rate projections by age and gender through the year 2090. Figure 9 shows SSA mortality-rate projections for selected years by age and gender to which PWBMsim’s mortality rates are benchmarked.

Figure 9: Mortality Rates by Age and Gender consistent with 2016 Annual Report of the Social Security Trustees.

Figure 10: Male 2015 Mortality Rates by Ethnicity and Age

Better medical technology and healthier lifestyles led to a steady decline in U.S. mortality rates overall, as seen in Figure 10.

However, PWBMsim’s mortality rates are also distinguished by ethnicity groupings (White, Black, Hispanic, Asian, and Other), educational attainment, and marital status. The decomposition is executed sequentially: first by ethnicity, then by education, and lastly by marital status. The parameters used for decomposing mortality rates are described below, and Appendix 5 describes the procedure.

#### Calibrating Mortality Rates by Ethnicity

It is well known that mortality rates differ systematically with ethnicity. Failure to incorporate these differences in a micro-simulation would result in overrepresentation of population groups at older ages that experience higher mortality rates in reality. In addition, it would affect the population distributions of many other attributes as well including education, family structures through marriage and divorce, labor force participation and so on, since the incidence and prevalence rates of those outcomes are dependent on ethnicity. Finally, the eligibility and distribution of government taxes and transfers and the estimation of many macro-economic and budget aggregates would be affected if systematic mortality differences by ethnic affiliation were ignored.

PWBMsim decomposes SSA’s historical and projected mortality rates by ethnicity using micro-data from the National Center for Health Statistics. These data provide information on the populations and total deaths by age, gender, and ethnicity. Appendix 5 describes the procedures adopted in applying NCHS information to obtain the decomposition.

Figure 11: Mortality Decline Over Time. White Women (Panel A) and Hispanic Women (Panel B).

SSA’s overall mortality rates by age exhibit a steady decline over time as shown in Panel A (for whites) and Panel B (for Hispanics) of Figure 11.

#### Calibrating Mortality Differentials by Education

As demographers know well, mortality outcomes vary significantly across individuals with different education levels and across married and single individuals. This section describes how PWBMsim adjusts mortality rates based on educational attainment and marital status in addition to the other demographic variables (gender, race, age) discussed earlier. Indeed, adding education and marital status to the determinants of mortality helps to improve the historical simulation’s match with observed data on education and marital status distributions and overall population growth. A higher level of educational attainment is associated with lower mortality risk. Researchers highlight four sets of mechanisms through which educational attainment may influence health and mortality risks: socioeconomic attainment, health behaviors, social psychological resources, and access to and greater utilization of health care services. (Not all of the correlation between education and mortality is causal. It is also possible that selection into different levels of education is associated with health and risk factors.) Table 1 shows estimates of relative mortality differences by educational attainment, stratified by gender, race, and age group. See Appendix 5 for details about the calculation procedure.

Table 1: Relative Mortality Rates by Educational Attainment

Table 1 shows relative mortality differentials by ethnicity and educational attainment. For example, non-Hispanic black males aged 45-64 with eight or fewer years of education are 32 percent more likely to die than those in the same age-ethnicity group with 12 years of education. Note that the relative mortality differentials across the entire educational attainment range are considerably more compressed for older than for younger individuals. It is also notable that relative differentials are especially wide for younger non-Hispanic black males and females.

#### Calibrating Mortality Differentials by Marital Status

Married individuals have lower mortality rates than singles for several possible reasons. Single people may lack social and moral support, have inadequate mechanisms for relieving stress, and may be less motivated to ensure adequate personal care. (It is also noteworthy that mortality differentials by marital status may not necessarily be caused by that status. It is difficult to control for selection into marriage. The factors that obstruct marriage may also negatively affect health outcomes or risk behaviors.) Table 2 presents the estimated relative risk of death for four marital statuses: married, widowed, divorced/separated, and never married. (The estimates are based on a Cox model fitted separately by gender and race, controlling for age, income, education, and labor force status.)

Table 2: Relative Mortality Rates by Marital Status

Table 2 indicates that mortality rates are higher for non-married individuals. For white males and females, mortality rates are at least as large for the never-married category. For black individuals, however, mortality rates are higher for widowed and divorced/separated individuals. Appendix 4 describes the procedure for implementing the dependence of mortality rates on marital status.

Finally, Figure 12 shows the closeness of fit for individuals by marital status:

Figure 12: Married Share by Age Group, 1996-2020.

### Disability Transitions

Annual transitions into and out of disabled status are calibrated based on bi-annually linked micro-data from the Current Population Surveys 1996-2015.

Linking CPS micro-data samples for adjacent years is implemented according to the procedures indicated in the CPS documentation. (See [https://cps.ipums.org/cps-action/variables/group?id=p-linking].) Only about one-half of sampled persons can be linked in this manner across any two adjacent years. Such linked data samples during 1996-97 through 2014-15 are used to estimate rates of transition into and out of disabled status. Those who answer affirmatively to the CPS question about whether the respondent’s health status imposes a work-ability-related impairment are classified as “disabled”. (The CPS variable “DISABWRK” identifies persons who had "a health problem or a disability which prevents him/her from working or which limits the kind or amount of work." Respondents were to ignore short, acute illnesses and temporary episodes of poor health. Note that this definition of “disabled” status does not necessarily imply receipt of disability benefits from any source. It is useful, however, as an explanatory variable for estimating labor income in each year.)

The transition rates are calculated separately by age group, gender, and ethnic affiliation, and are used to assign disability status (disabled or not disabled) to individuals in each year, given (simulated) disability status in the previous year. (To overcome paucity of observations for some groups, only two ethnic groupings were constructed out of five possible affiliations identifiable in CPS micro-data: “White and Asian” and “Black, Hispanic, and Other.”) The results can be assessed by comparing simulated disability prevalence rates by age group, gender, and ethnicity with prevalence rates estimated directly from the CPS micro-data during the same historical time span.
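Assigning status from transition rates amounts to a two-state Markov chain for each age, gender, and ethnicity cell. The sketch below uses made-up entry and exit probabilities for a single cell and ignores aging across cells; it only illustrates how annual transition rates translate into a prevalence rate that can then be compared with survey data.

```python
import random

# Hypothetical annual transition probabilities for one demographic cell.
P_ENTER = 0.02  # not disabled -> disabled
P_EXIT = 0.30   # disabled -> not disabled

def next_status(disabled, rng, p_enter=P_ENTER, p_exit=P_EXIT):
    """Draw next year's disability status given this year's status."""
    if disabled:
        return rng.random() >= p_exit  # remains disabled unless an exit occurs
    return rng.random() < p_enter

rng = random.Random(42)
population = [False] * 50_000  # everyone starts not disabled
for _ in range(30):            # simulate 30 annual transitions
    population = [next_status(d, rng) for d in population]

prevalence = sum(population) / len(population)
# Long-run prevalence implied by constant rates: enter / (enter + exit).
steady_state = P_ENTER / (P_ENTER + P_EXIT)
```

With constant rates, simulated prevalence converges to `P_ENTER / (P_ENTER + P_EXIT)`, here 0.0625, which is the kind of implied prevalence one would check against rates estimated directly from micro-data.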

Figure 13 shows disability prevalence rates by age group for both ethnic groupings during 2006 and 2015. Disability prevalence rates are flat during youth, begin to increase for those in their mid-to-late thirties, and are considerably higher for those approaching retirement age. The simulated disability rates are smoother than the rates estimated for each year in CPS micro-data because transition rates are calculated as averages over micro-data samples grouped for the years 1996-2000, 2000-2005, 2005-2010, and 2010-2015. Hence, they generate smoother rates of disability prevalence by age over the years relative to year-to-year sampling variability by age in annual CPS micro-data samples. As Figure 13 shows, simulated disability prevalence rates for both ethnic groupings are quite close to the rates directly estimated from CPS micro-data.

Figure 13: Disability prevalence rates by ethnic and age groups for 2006 (Panel A) and 2015 (Panel B).

### Weeks Employed

The metric used to measure work effort is “full-time-equivalent weeks worked” (FTE weeks) during the year. It is measured from CPS micro-surveys as the product of “hours worked per week” and “weeks worked per year.” The result is hours worked per year. (The IPUMS-CPS variable UHRSWORKLY reports the number of hours per week that respondents usually worked if they worked during the previous calendar year. Individuals were asked this question if: 1) they reported working at a job or business at any time during the previous year or 2) they acknowledged doing "any temporary, part-time, or seasonal work even for a few days" during the previous year. The IPUMS-CPS variable WKSWORK1 reports the number of weeks, in single weeks, that the respondent worked for profit, pay, or as an unpaid family worker during the preceding calendar year. Respondents were prompted to count weeks in which they worked for even a few hours and to include paid vacation and sick leave as work.) To obtain the FTE-weeks metric, total hours worked per year are divided by 40. Some CPS respondents report working many more than 40 hours per week, on average, for almost the entire year, so their FTE weeks worked exceed 52. An upper limit of 104 weeks is imposed on the FTE metric. For display purposes only, FTE weeks are placed into one of four categories: 0 FTE weeks, 1-26 FTE weeks, 27-52 FTE weeks, and more than 52 FTE weeks. Figure 14 shows the initial simulated distribution (percentage of workers in each FTE category) of the four FTE weeks worked categories in 1996 by gender and age.
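The FTE-weeks arithmetic can be written out in a few lines. The function names are hypothetical, and the handling of fractional values at the category boundaries (for example, 26.5 FTE weeks) is an assumption, since the text does not specify it.

```python
def fte_weeks(hours_per_week, weeks_worked, cap=104.0):
    """Annual hours divided by a 40-hour week, capped at 104 FTE weeks."""
    annual_hours = hours_per_week * weeks_worked
    return min(annual_hours / 40.0, cap)

def fte_category(fte):
    """Display category; fractional boundary handling is an assumption."""
    if fte == 0:
        return "0"
    if fte <= 26:
        return "1-26"
    if fte <= 52:
        return "27-52"
    return "52+"

full_time = fte_weeks(40, 52)    # a standard full-time, full-year worker
long_hours = fte_weeks(60, 52)   # 60-hour weeks all year exceed 52 FTE weeks
part_year = fte_weeks(60, 16)    # 60-hour weeks for 16 calendar weeks
extreme = fte_weeks(90, 52)      # would be 117, so the 104-week cap binds
```

Here `full_time` is 52.0, `long_hours` is 78.0, `part_year` is 24.0, and `extreme` is held at the 104-week cap.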

Figure 14: FTE weeks worked per year, PWBMsim 1996, by age and gender.

Figure 14 exhibits several well-known stylized facts about labor-force participation and work intensity by age and gender. Young males enter the work force earlier than young females. During their prime working years (ages 25-60), males generally work more FTE weeks than females. Many more males than females work more than 52 FTE weeks. Moreover, females begin to enter retirement (FTE=0) earlier than males. The simulated FTE weeks distributions closely follow those calculated from the CPS 1996 (survey year 1997) micro-data.

Figure 15: FTE weeks worked per year, 2015 (PWBMsim and based on CPS survey year 2016), by age and gender.

PWBMsim calculates the probability distribution across next period’s FTE-week values conditional on FTE weeks of the current period and other demographic and economic controls. The resulting probabilities are compared against a random draw from a uniform distribution on [0, 1] to determine the current simulation year’s FTE-weeks value. Figure 15 shows, as an example, the distribution of FTE-weeks categories by age and gender for the 2015 simulated population (Panel A). Figure 15 also compares the simulated distribution to that computed directly from the CPS 2016 survey data (which reports employment weeks during 2015). Evaluated on the features mentioned for the 1996 distributions of FTE weeks, the simulated distributions reflect the actual CPS distributions quite closely.
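Comparing probabilities against a uniform draw is commonly done by accumulating the conditional probabilities and picking the first category whose cumulative value exceeds the draw. The sketch below assumes that approach and works on the four display categories for brevity; the conditional distribution shown is invented for illustration.

```python
import random
from itertools import accumulate

def draw_category(probs, u):
    """Return the first category whose cumulative probability exceeds u.

    probs: category -> probability, in a fixed order, summing to 1.
    u: a uniform draw on [0, 1).
    """
    categories = list(probs)
    cumulative = accumulate(probs[c] for c in categories)
    for category, cum in zip(categories, cumulative):
        if u < cum:
            return category
    return categories[-1]  # guard against rounding in the cumulative sum

# Hypothetical distribution of next year's category for a person
# currently in the "27-52" category.
next_year_probs = {"0": 0.05, "1-26": 0.10, "27-52": 0.70, "52+": 0.15}

rng = random.Random(7)
simulated = draw_category(next_year_probs, rng.random())
```

For example, a draw of 0.03 maps to "0", 0.12 to "1-26", 0.50 to "27-52", and 0.90 to "52+", so each category is selected with its stated probability.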

### Weeks Unemployed

Weeks unemployed is not the same thing as weeks not worked because unemployment is defined as involuntary non-work. The distinction hinges on whether a non-working person was looking for work during that time. Fortunately, it is relatively straightforward to measure weeks unemployed from CPS micro-data, which includes two questions pertaining to weeks spent looking for work. One of them is asked of those who were employed between 1 and 51 calendar weeks during the year, and the other is asked of persons who did not work for even a single calendar week during the year. Thus, both groups – those who worked and those who didn’t during the year – could have positive weeks-unemployed values.

Figure 16 shows a scatter chart of the joint distribution of the FTE-weeks employed and (calendar) weeks unemployed for respondents in 2014 (data taken from CPS micro-data survey year 2015). Charts for other years look quite similar.

Figure 16: Weeks Employed and Unemployed in 2014 (CPS survey year 2015).

As noted earlier, FTE weeks employed range between 0 and 104. Calendar weeks unemployed are obviously limited to 52. Figure 16 shows that many individuals have positive values of both – for example, those who worked 60-hour weeks for 4 months of the year would report FTE-weeks as 24. If they looked for work during another 4 months, they would simultaneously report weeks unemployed during the year as 16.

PWBMsim estimates the probabilities of weeks unemployed conditional on an adult’s demographic characteristics, the prior year’s FTE-weeks employed and calendar weeks unemployed. Those probabilities are applied to obtain a transition of weeks unemployed over all adults’ lifetimes.

Figure 17 shows PWBMsim’s distribution of weeks unemployed for the initial simulation year (1996) by age and gender across four categories of annual unemployment weeks: zero, 1-8 weeks, 9-26 weeks, and 27 weeks and more. Again, Figure 17 exhibits well-known patterns of unemployment. A high fraction of younger adults, especially females, experience high levels of unemployment (above 9 weeks per year). Most of it is probably frictional as young adults attempt their first entry into the labor force. Unemployment rates decline with age as middle-aged adults are largely working and they transit directly into retirement (which corresponds to zero unemployment weeks).

Figure 17: Simulated unemployment weeks per year, 1996, by age and gender.

The simulation outcomes for each successive year after 1996 are obtained by applying cross-year probabilities of transiting across different unemployment weeks given current and prior demographic characteristics and current and prior levels of FTE-weeks employed. The annual simulation sequence results in PWBMsim's distribution across weeks unemployed in 2015. Figure 18 compares these distributions to those calculated directly from the CPS for the same year (taken from survey year 2016).

Figure 18: Unemployment weeks per year, 2015 (PWBMsim and based on CPS survey year 2016), by age and gender.

Figure 18 shows, as an example, the distribution of unemployment-weeks categories by age and gender for the 2015 simulated population (Panel A). Figure 18 also shows the same distributions computed directly using micro-data from the CPS 2016 survey (which informs about unemployment weeks during 2015). The distributions by gender in the two panels show similar patterns: unemployment weeks are large for younger individuals and decline with age for both genders.

### Immigration

As described earlier, population shares of the foreign-born are calibrated according to data on their prevalence rates in the Current Population Survey, 1996, after adjustments to remove under-representation of certain ethnic groups in that Survey. In subsequent years, net new immigration into the United States is implemented in PWBM microsimulation by adding new immigrants and subtracting emigrants. The gross immigration and emigration flows are calibrated to the Social Security Administration’s estimates. The distribution across legal status (native, naturalized, legal, and unauthorized) is based on information from the Social Security Administration (split between permanent and non-permanent immigrants), the U.S. Census Bureau (annual immigration flows), and DHS and the Pew Hispanic Center (stocks of unauthorized immigrants). Immigration and emigration may be implemented for individuals as well as entire families.

In addition, the CPS asks foreign-born individuals when they entered the country. This information is used to identify other immigrants’ characteristics by year of immigration. The attributes assigned include age, gender, race (source country), work status, disability, marital status and associated spousal characteristics, the number of children and associated children’s attributes, and so on. Figure 19 shows prevalence rates of the foreign-born by age in the CPS (Panel A) and PWBM microsimulation (Panel B). We project that future immigrants will have a composition similar to that of immigrants in the recent past.

Figure 19: Shares of Male Immigrants in Total Population by Year and Age - PWBM_SIM 1996 and 2015