Appendix 4: Education Transition Calibrations - Technical Details

This Appendix provides details about the calibration of annual transitions for the “years of schooling completed” $e$ variable.

Let $f[e\mid a,y,g]$ be the probability distribution function (pdf) of $e$ for a particular age $a$, year $y$, and gender $g$. For example, $f[e=12\mid 18,2001,\text{female}]$ measures the share of 18-year-old females in 2001 who completed high school but did not obtain further education. We can compute each year's transition rates by comparing the $f[e\mid a,y,g]$ distribution with the $f[e\mid a+1,y+1,g]$ distribution, that is, by following the same cohort one year later. (Education transitions are constructed separately for 10 subgroups, defined by gender and five races, each based on its own education distribution.)

\begin{equation} (F[e \mid a,y,g]-f[e \mid a,y,g]\ast t[e \mid a,y,g] ) = F[e\mid a+1,y+1,g] \end{equation}

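In Equation 1, the cumulative distribution $F$ of $e$, evaluated at a given education level, decreases as a cohort moves from age $a$ in year $y$ to age $a+1$ in year $y+1$, because some people advance to higher education levels. For example, for females,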

$$ F[e = 12 \mid 18,2001,\text{female}] = f[e \leq 12 \mid 18, 2001, \text{female}] $$

is greater than

$$ F[e = 12 \mid 19, 2002, \text{female}] = f[e \leq 12 \mid 19, 2002, \text{female}] $$

and the gap measures the people in $f[e=12 \mid 18,2001,\text{female}]$ who attain one more year of education and move to $e=13$, at the transition rate $t[e=12 \mid 18,2001,\text{female}]$. Thus, the transition rate is computed as

$$ t[e=12\mid 18,2001,\text{female}]=\frac{f[e\le 12\mid 18,2001,\text{female}]-f[e\le 12\mid 19,2002,\text{female}]}{f[e=12\mid 18,2001,\text{female}]} $$

and, in general,

$$ t[e\mid a,y,g]=\frac{F[e\mid a,y,g]-F[e\mid a+1,y+1,g]}{f[e\mid a,y,g]} $$
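The general formula can be sketched in Python with NumPy, assuming the pdf of one subgroup at a given age and year is stored as an array indexed by $e$. The function name `transition_rates` and the toy distributions are hypothetical illustrations, not survey values. The sketch returns NaN where $f[e\mid a,y,g]=0$, since the rate is undefined when nobody starts at that level.

```python
import numpy as np


def transition_rates(f_now: np.ndarray, f_next: np.ndarray) -> np.ndarray:
    """Sketch: t[e | a, y, g] from two cross-sectional pdfs of one cohort.

    f_now[e]  -- f[e | a, y, g],      e = 0, 1, ..., E
    f_next[e] -- f[e | a+1, y+1, g],  on the same education grid
    """
    F_now = np.cumsum(f_now)    # F[e | a, y, g]
    F_next = np.cumsum(f_next)  # F[e | a+1, y+1, g]
    t = np.full(f_now.shape, np.nan)
    has_mass = f_now > 0        # t is undefined where f[e | a, y, g] = 0
    t[has_mass] = (F_now[has_mass] - F_next[has_mass]) / f_now[has_mass]
    return t


# Toy example on a four-level education grid (illustrative numbers only).
f_now = np.array([0.2, 0.3, 0.5, 0.0])
f_next = np.array([0.1, 0.3, 0.4, 0.2])
print(transition_rates(f_now, f_next))
```

For the toy arrays, the first rate is $(0.2-0.1)/0.2=0.5$, matching Equation 1 rearranged for $t$; the last entry is NaN because no one starts at the top level.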

The $e$ distribution above requires some adjustments to $f[e\mid a,y,g]$. First, we set the maximum $e$ possible at each age $a$ to $\min(a-5,18)$. Thus, $e$ can advance at a given age only if $e<\min(a-5,18)$. This means $\max(e)=0$ for those aged 5, $\max(e)=1$ for those aged 6, and so on. Children aged 6 have a positive probability of acquiring an additional year of education, that is, of moving from $e=0$ at age 5 to $e=1$ in the current year. Some of those aged 6 may end up with $e=1$, with the rest remaining at $e=0$.
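The age cap on schooling can be expressed as two small helper functions. This is a minimal sketch: the names `max_schooling` and `can_advance` are hypothetical, and raising an error below age 5 is our assumption, since the text only describes the cap from age 5 onward.

```python
def max_schooling(age: int) -> int:
    """Maximum years of schooling completed possible at `age`: min(age - 5, 18)."""
    if age < 5:
        # Assumption: the cap is only described for ages 5 and over.
        raise ValueError("max_schooling is defined here for ages 5 and over")
    return min(age - 5, 18)


def can_advance(e: int, age: int) -> bool:
    """A person at level e can gain a year of schooling at `age` only if e < min(age - 5, 18)."""
    return e < max_schooling(age)


# A 6-year-old can move from e = 0 to e = 1, but not beyond.
print(max_schooling(6), can_advance(0, 6), can_advance(1, 6))  # prints: 1 True False
```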

Second, for people aged 18, 19, and 20 with $e=13,14,15$ (some college), pdf values are further adjusted to obtain transition rates consistent with observed education prevalence rates in the American Community Survey (ACS) micro-data. Before adjustment, $f[e=13\mid 18,2008,g]=0.204$ while $f[e=12\mid 17,2007,g]=0.077$ for white males. (The gender subscript is dropped in the rest of this description for convenience.) Because 13 (12) is the maximum $e$ possible at age 18 (17), the 0.204 should be the outcome of transitions from $e=12$ at age 17 in the previous year. This is impossible given the measured 0.077, which is smaller than 0.204. We therefore cap $f[e=13\mid 18,y]$ at $f[e=12\mid 17,y-1]$ and increase $f[e=12\mid 18,y]$ by the excess, $f[e=13\mid 18,y]-f[e=12\mid 17,y-1]$. Similarly, we cap $f[e=14\mid 19,y]$ at $f[e=12\mid 17,y-2]$ and add the excess to $f[e=13\mid 19,y]$. Likewise, $f[e=15\mid 20,y]$ is capped at $f[e=16\mid 21,y+1]$, and the excess is added to $f[e=14\mid 20,y]$. With these adjustments, $e=13$ for these age groups (18-20) is strictly defined as people who have ‘completed’ one year of college and are actively obtaining a college education. If lagged data are unavailable (for example, the first year of data has no year $y-1$ information), we use data from the same year.
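The three adjustments share one operation: cap a pdf value at a bound and move the excess down one level, so total probability mass is preserved. The sketch below assumes the pdfs of one race-gender subgroup are stored in a hypothetical dictionary keyed by `(age, year)`. Two further assumptions are ours, not the text's: each step uses the unadjusted input pdfs, and the same-year fallback is also applied when the year $y+1$ lead needed at age 20 is missing.

```python
import numpy as np


def cap_and_shift(f, e_capped: int, bound: float, e_receiver: int) -> np.ndarray:
    """Cap f[e_capped] at `bound` and add any excess to f[e_receiver].

    The total probability mass is unchanged.
    """
    out = np.array(f, dtype=float)          # work on a copy
    excess = max(out[e_capped] - bound, 0.0)
    out[e_capped] -= excess
    out[e_receiver] += excess
    return out


def adjust_some_college(pdf: dict, y: int) -> dict:
    """Adjust the pdfs for ages 18-20 in year y.

    pdf: hypothetical mapping (age, year) -> array f[e | age, year], e = 0..18,
         for one race-gender subgroup.
    """
    def get(age: int, year: int) -> np.ndarray:
        # Fall back to same-year data if the other year is unavailable.
        return pdf[(age, year)] if (age, year) in pdf else pdf[(age, y)]

    return {
        18: cap_and_shift(pdf[(18, y)], 13, get(17, y - 1)[12], 12),
        19: cap_and_shift(pdf[(19, y)], 14, get(17, y - 2)[12], 13),
        20: cap_and_shift(pdf[(20, y)], 15, get(21, y + 1)[16], 14),
    }


# Illustrative check with the white-male values quoted above (0.204 and 0.077);
# the value 0.30 for f[e=12 | 18, y] is made up for the example.
f18 = np.zeros(19)
f18[12], f18[13] = 0.30, 0.204
adjusted = cap_and_shift(f18, 13, 0.077, 12)
print(adjusted[12], adjusted[13])
```

In the example, $f[e=13\mid 18,y]$ drops to 0.077 and the excess of 0.127 moves to $e=12$.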

Further Adjustments

By averaging the annual transition rates $t[e\mid a,y]$ across years, we obtain $t[e\mid a]$. (For $e\le 8$, we use data from 2008-2014, since surveys from 2001-2007 combine grades 1-8 into a single category. Transitions at $e\ge 9$ are computed from the 2001-2015 surveys.) Because the data are cross-sectional rather than panel data, we bound the transition rates to limit the effect of sampling error. First, each year's transition rate is bounded to $(-0.999,0.999)$. (Negative annual rates are allowed; they lower the final, averaged transition rate.) Second, lower and upper bounds for the average transition rates are based on school attendance rates over the survey period. Let $a[e,a]$ (the attendance rate) be the fraction of the subgroup population at a given age and education level who are currently attending school. It is reasonable to assume that transition rates should not exceed attendance rates. Thus, the upper bounds for transition rates are set by attendance rates, $t[e\mid a]\le a[e,a]$, and the lower bounds are set to $0.1\times a[e,a]$.
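The two-stage bounding can be sketched as follows for a single $(e, a)$ cell. The function name `average_rate` is hypothetical, and treating missing survey years as NaN values that the average skips is our assumption. The optional `lower` and `upper` arguments leave room for the exceptions listed below; by default the bounds are $[0.1\times a[e,a],\ a[e,a]]$.

```python
from typing import Optional, Sequence

import numpy as np


def average_rate(annual_rates: Sequence[float], attendance: float,
                 lower: Optional[float] = None,
                 upper: Optional[float] = None) -> float:
    """Average the annual rates t[e | a, y] for one (e, a) cell into t[e | a].

    attendance:  a[e, a], the share of the cell currently attending school.
    lower/upper: optional overrides; by default the bounds are
                 [0.1 * a[e, a], a[e, a]].
    """
    # Step 1: bound each year's rate; negative annual rates are kept.
    yearly = np.clip(np.asarray(annual_rates, dtype=float), -0.999, 0.999)
    # Step 2: average across survey years (assumption: missing years are NaN
    # and are skipped).
    t = np.nanmean(yearly)
    # Step 3: bound the average using the attendance rate.
    lo = 0.1 * attendance if lower is None else lower
    hi = attendance if upper is None else upper
    return float(np.clip(t, lo, hi))


print(average_rate([0.2, 0.4, -0.3], attendance=0.5))
print(average_rate([1.5, 0.9], attendance=0.5))  # prints 0.5
```

In the second call, the annual rate of 1.5 is first clipped to 0.999, and the resulting average of 0.9495 is then capped at the attendance rate of 0.5.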

Several minor adjustments are listed below:

  • For ages 39 and under, upper bounds are set to 0.999 instead of attendance rates. For ages 40 and over, attendance rates are computed for 10-year age groups (40-49, 50-59, 60-69, 70-79, and 80-89).
  • Transitions to $e=17$ and $e=18$ (post-college education) are restricted to a lower bound of $0.0$ and upper bounds of $0.5\times a[e=16,a]$ and $0.33\times a[e=17,a]$, respectively, because advancing to these levels requires more than one year of education. For transitions to $e=12$ and $e=16$ (obtaining a high school diploma or a bachelor's degree), the lower bounds are $0.3\times a[e=11,a]$ and $0.3\times a[e=15,a]$, respectively, and the upper bounds are $3\times a[e=11,a]$ and $3\times a[e=15,a]$. These exceptions for $e=11,15,16,17$ also improve the fit of simulated results to historical data.
  • If estimation of a transition rate is not feasible for a particular group (by age, race, and gender) due to small samples, the transition rate for whites of the same gender and age is used. The same procedure is adopted for imputing attendance rates in small sample cases.
  • The order of transitions in the simulation matters: aging is executed before all other transitions. Thus, the final transition rates take the values estimated for people one year younger, $t^{\text{final}}[e\mid a]=t[e\mid a-1]$. For example, the estimated $t[e\mid a=5]$ is applied to individuals who turn six in a simulation year.
  • Education transitions are executed for each simulated person aged six through 89. Education transition rates are assumed to be 0 at ages 90 and over, because small sample sizes make it difficult to compute transition rates at those ages. This is a reasonable assumption, given that the attendance rate of people aged 90 and over is 0.0043 in the ACS surveys (2001-2015). In addition, changes in education at older ages have little effect on other economic or demographic outcomes: most people at these ages are not economically active, and mortality rates do not depend on education level above age 84.
  • Test simulations starting in 1997 revealed that some groups in the initial simulated population exhibit high persistence at low education levels ($e=0$ and $e=8$), worsening the match with ACS micro-data distributions of education prevalence rates in subsequent years. A special correction is applied to these groups' transition rates to make them consistent with the rates for younger groups in the initial simulated population for 1997-2007.
  • Test simulations also overestimated the “some college” education level and underestimated the “college graduate” level compared to ACS micro-data. Corrections are applied to the transition rates at $e=15$ by setting the applicable bound on these rates to $0.8\times a[e=15,a]$ at ages 21-29 through the year 2014.
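The bound exceptions, the one-year age shift, the age 6-89 window, and the fallback to rates for whites can be combined into a sketch like the one below. All names are hypothetical. Three simplifications are our assumptions: the education-level exceptions take precedence over the age-39 rule, the $e=15$ correction for ages 21-29 is omitted because the text does not say which bound it replaces, and the fallback to whites is applied at lookup time rather than during estimation.

```python
from typing import Dict, Tuple


def rate_bounds(e_from: int, age: int, attendance: float) -> Tuple[float, float]:
    """(lower, upper) bounds on the averaged rate t[e_from | age].

    attendance: a[e_from, age]; for ages 40+ this is the rate of the
    10-year age group (40-49, ..., 80-89).
    Assumption: education-level exceptions take precedence over the
    age-39 rule; the e = 15 correction for ages 21-29 is not modeled.
    """
    if e_from == 16:                    # transition to e = 17
        return 0.0, 0.5 * attendance
    if e_from == 17:                    # transition to e = 18
        return 0.0, 0.33 * attendance
    if e_from in (11, 15):              # transition to e = 12 or e = 16
        return 0.3 * attendance, 3.0 * attendance
    upper = 0.999 if age <= 39 else attendance
    return 0.1 * attendance, upper


def final_rate(t_est: Dict[Tuple[str, int, int], float],
               race: str, e: int, age: int) -> float:
    """Rate applied to a simulated person who has just aged to `age`.

    t_est: hypothetical table (race, e, age) -> averaged rate t[e | age]
           for one gender.
    """
    if not 6 <= age <= 89:              # transitions run for ages 6-89 only
        return 0.0
    key = (race, e, age - 1)            # aging runs first: use t[e | age - 1]
    if key not in t_est:                # small sample: use the rate for whites
        key = ("white", e, age - 1)
    return t_est[key]
```

For example, a simulated person who has just turned six draws the rate estimated at age 5, consistent with $t^{\text{final}}[e\mid a]=t[e\mid a-1]$.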