Estimating Confidence Intervals in a Tax Microsimulation Model
Abstract
Since the creation of microsimulation models, the need to measure uncertainty has been recognized, but little research has been conducted in spite of the widespread use of these models. In this article, we calculate confidence intervals for a large tax microsimulation model, comparing a normal approximation to a bootstrap estimator. We estimate confidence intervals for five proposed changes to tax law. We explore the relationship between the size of proposals’ point-estimated impacts and their confidence intervals by considering high, central, and low scenarios for three proposals. To explore uncertainty around small-magnitude estimates, we consider a proposal designed to keep tax revenue unchanged. Finally, we consider a proposal that affects a small number of taxpayers but has a large heterogeneous effect. We find that, overall, confidence intervals in the model fit tightly around the point estimates, but there are exceptions. We also find that in many cases the normal approximation is close to the bootstrap estimator but may differ for policy changes that affect a small number of taxpayers.
1. Introduction
In 1957, Guy Orcutt introduced microsimulation modeling by making a radical proposition: rather than using aggregate data to predict the behavior of populations, researchers could model the decisions made by households and aggregate the results.^{1} Data on households would be drawn from surveys to create a sample that would represent the population; the decisions of each household would be separately modeled and the aggregate decisions would predict the behavior of the population. Further, researchers could learn how various assumptions and parameters affect the model’s outcomes by recalculating the results under different assumptions and parameters. The benefit of this approach for studying government policies was clear: by altering model parameters, the difference in outcomes attributable to different policy choices could be measured. Importantly, uncertainty could be estimated by repeatedly running the model on random subsamples and calculating standard errors and confidence intervals.
Scholars quickly embraced this new approach and by the early 1960s, Joseph Pechman and colleagues at the Brookings Institution had developed a microsimulation model of individual taxation that calculated both total federal revenue and the distribution of taxes across income groups. Since then, the microsimulation approach has become the standard method for evaluating the United States individual income tax system, and it is used by organizations such as the Congressional Budget Office (CBO), the Treasury Department’s Office of Tax Analysis, the Joint Committee on Taxation (JCT), and the Urban-Brookings Tax Policy Center (TPC).
In microsimulation analyses, measures of uncertainty, such as standard errors or confidence intervals, can play at least two important roles. First, they can help us understand whether economically meaningful differences in estimates by different organizations reflect different modeling choices or mere sampling variation. Second, they can help us infer whether apparent differences in outcomes among tax units reflect differences in the population or just sampling variation.^{2} For example, a proposed change in tax law may appear progressive when the average tax cut for a lower income group is larger than the tax cut for those in a higher income group in a sample. Yet if the estimates are not significantly different in a statistical sense, the policy change may not be progressive if applied to the population. Unfortunately, neither standard errors nor confidence intervals are routinely calculated or published in the U.S., although some recent research using EUROMOD, the tax and benefit microsimulation model for the European Union, has included asymptotic standard errors.^{3}
For many years, measures of uncertainty were not routinely estimated because they required faster computers than were available. For example, Citro et al. (1991) surveyed the state of microsimulation modeling and recommended the use of “new computer technologies to provide enhanced capabilities, such as the ability for a wider group of analysts to apply the models; conduct timely and cost-effective validation studies, including variance estimation and sensitivity analyses.” Computational power has since improved substantially, yet the few attempts at measuring uncertainty either use simplifying distributional assumptions (Pudney and Sutherland, 1994), or simple policy changes that allow for the use of normal approximations (Goedemé et al., 2013).
In this paper, we demonstrate the use of bootstrapping to create standard errors and confidence intervals for estimates from TPC’s individual income tax model. We also rely on a fundamental insight that may have escaped prior researchers: resampling does not require recalculating taxes for each draw. Instead, in each resample the weights are adjusted and applied to the original sample of tax units. These resamples’ weights only need to be produced once and can be used to compute confidence intervals for point estimates of any policy proposal. We also calculate confidence intervals using a normal approximation and compare the results to our bootstrapped intervals. To this end, we estimate five potential changes in tax policy. For (i) a partial restoration of the personal exemption, (ii) an increase in the standard deduction, and (iii) an increase in the top marginal tax rate, we estimate the effects of high, central, and low scenarios. In addition, we consider a revenue-neutral proposal to replace the current-law farm loss deduction with a refundable tax credit based on the total farm loss that should create confidence intervals around point estimates of zero dollars for the average change in tax burden. Finally, we examine a rescission of the tax-free treatment of interest accruing from certain government bonds that should only affect a small group of heterogeneous tax units.
Our results are as follows. First, in most cases the standard errors of the estimated change in taxes for most income groups are small and the 95 percent confidence intervals fit tightly around the point estimates. This means that differences across income groups are statistically significant and, if the same holds true for other tax models, economically meaningful differences among models are also statistically significant. Further, the normal approximation is quite close to the bootstrapped estimates. Second, for the policy alternatives that have high, central and low scenarios, the scenarios leading to larger effects also lead to wider confidence intervals. Finally, when a reform affects only a small number of tax units, the average change in taxes may be estimated imprecisely. In these cases, it may be difficult to determine if a proposed policy change is progressive or regressive. In addition, the normal approximation to a confidence interval may differ from the bootstrapped version.
2. Literature review
Orcutt (1957) is the foundational paper on microsimulation. He hypothesized that simulations could be calculated on “a large electronic machine, such as the IBM 704 or the UNIVAC II, or some improved successor to these powerful giants.” Both the IBM and the UNIVAC II used vacuum tubes and the IBM could calculate up to 12,000 floating point additions per second.^{4} He also noted that for such models “All predictions could be obtained in the form of expected values plus some measure of uncertainty. Or, if desired, they could be in the form of confidence interval estimates.”
Pudney and Sutherland (1994) test the finite sample properties of the Central Limit Theorem by estimating the sampling variation of microsimulation models under the assumption of normality. They conclude that “the baseline simulations are reasonably accurate, but that some widely-used measures of the effects of policy changes may be very imprecise estimates of population effects.” For example, when estimating the number of winners and losers from a revenue-neutral reform of family benefits, the confidence interval for the population as a whole is plus or minus four percent, but the confidence interval for single parent families is plus or minus 29 percent. We follow Pudney and Sutherland (1994) and calculate confidence intervals under the normality assumption.
However, our research is closest to Fiorio (2003), who runs a tax microsimulation model “backwards” on a small dataset. The author starts with after-tax income and estimates pre-tax income by adding calculated taxes. The author then uses a bootstrap to estimate the 90 percent confidence intervals due to sampling variation and concludes that, while the confidence interval for the entire sample is small, the distribution is very asymmetric (and therefore badly approximated by the normal distribution) and the interval can be wide in subpopulations.
Creedy et al. (2007) use a parametric bootstrap to sample from the estimated distribution of labor supply elasticities. Their algorithm runs extremely slowly, however, and even with a sample of only 7,000, calculations take weeks to complete. They address the problem by using a small sample to estimate the mean and standard deviation and then create confidence intervals by assuming the parameters are normally distributed.
Goedemé et al. (2013) suggest that if there are no behavioral parameters, or if they are ignored, many changes in the output of tax microsimulations are nearly linear functions of income. In this case, it can be straightforward to calculate the variance of a difference between a baseline distribution of taxes and an alternative scenario using a Taylor first-order linearization of a variance estimate.^{5} However, even absent behavioral parameters such as labor supply elasticities, decisions involving whether or not to use various credits or deductions can create difficult-to-map nonlinear functions of income.
3. Data
3.1. Public use file
We use TPC’s microsimulation individual income tax model in this research. The model is based on a base-year data set composed of a sample of tax-filing ‘tax units’, which were constructed to be representative of the tax-filing population at the national and state levels for the 2011 tax year, and a sample of nonfilers in 2011. The tax-filing sample was a projection of the 2006 public-use file (PUF) produced by the Statistics of Income (SOI) Division of the Internal Revenue Service (IRS). The 2006 PUF contains 145,858 tax unit records with detailed information from federal individual income tax returns filed in calendar year 2006. Nonfilers were derived by matching the projected 2011 tax-filing tax units with the March 2012 Current Population Survey (CPS) of the U.S. Census Bureau.^{6} Additional variables were imputed for analyzing a wide range of tax policy proposals such as education, health, retirement, estate, and consumption taxes and tax benefits.
To perform revenue and distribution analyses for future years, the 2011 data are further extrapolated for years beyond 2011 using a two-step process based on forecasts and projections from CBO, JCT, the IRS, and the U.S. Census Bureau. First, the dollar amounts of income, adjustments, deductions, and credits on each record are inflated by their appropriate forecasted per capita growth rates. Second, TPC uses a linear programming algorithm to adjust the weights on each record so that the major income items, adjustments, and deductions match aggregate targets. TPC also adjusts the overall distribution of adjusted gross income (AGI) to match published information from SOI for available years and projections from CBO for further years.
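The spirit of the second step can be illustrated with a toy raking-style routine in Python. This is only a simplified stand-in for TPC's actual linear-programming adjustment, and the function name and data shapes are hypothetical:

```python
def rake_weights(weights, records, targets, iters=100):
    """Iteratively scale record weights so that the weighted totals of each
    income item hit their aggregate targets (a raking-style stand-in for
    the linear-programming reweighting described in the text)."""
    w = list(weights)
    for _ in range(iters):
        for j, target in enumerate(targets):
            # weighted total of item j under the current weights
            total = sum(wi * rec[j] for wi, rec in zip(w, records))
            if total > 0:
                ratio = target / total
                # scale only the records that contribute to item j
                w = [wi * ratio if rec[j] else wi for wi, rec in zip(w, records)]
    return w
```

For example, with two records holding $10 of one income item and $20 of another, targets of $30 and $40 scale their weights to 3 and 2. TPC's actual algorithm additionally constrains how far weights may move and targets many items jointly.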
3.2. TPC tax model
Using the extrapolated 2011 data, TPC simulates policy options using a detailed tax calculator that captures most features of the federal individual income tax system. The model reads the data for each tax unit, calculates its tax liability, and outputs the same data with the liability appended. Importantly, this includes the weight for that tax unit. The model’s current law baseline reflects major income tax legislation enacted through early 2019, including the Tax Cuts and Jobs Act of 2017 (TCJA).
In its analysis of the distributional effects of a simulated policy change, TPC includes the following federal taxes in its calculation of effective tax rates: individual and corporate income taxes; payroll taxes for Social Security and Medicare; excise taxes; and the estate tax. TPC calculates effective tax rates using a broad measure of income called Expanded Cash Income (ECI), which is defined as adjusted gross income (AGI) plus: above-the-line adjustments (e.g., IRA deduction, student loan interest deduction, self-employed health insurance deduction, etc.), employer-paid health insurance and other nontaxable fringe benefits, employee and employer contributions to tax-deferred retirement savings plans, tax-exempt interest, nontaxable Social Security benefits, nontaxable pension and retirement income, accruals within defined benefit pension plans, inside buildup within defined contribution retirement accounts, cash and cash-like (e.g., SNAP) transfer income, the employer’s share of payroll taxes, and imputed corporate income tax liability.
The effects of tax policy are reported by ECI group. We focus here on groups based on dollar values of ECI. (TPC also reports groups based on percentiles.) ECI is a broad measure of pre-tax income which serves as a proxy for tax units’ economic well-being and their ability to pay taxes. In this exercise, we first separate out tax units with either negative ECI or negative AGI. Then, we categorize the remaining tax units into 11 groups by ECI level. Table 1 shows these ECI groups and their sample sizes.
4. Method
The bootstrap is a common method for estimating the sampling variation and confidence interval of nonlinear functions applied to a set of observations. It is particularly useful when the distribution of these observations, such as the change in tax liability from a tax policy proposal, is asymmetric. We start by following a description of the bootstrap given in Kolenikov (2010). We then describe the bootstrapping method used in this paper.
We start with a population distribution function $F\left(x\right)$ and a sample of n independently and identically distributed observations (x_{1},…,x_{n}). We wish to estimate a statistic $\theta $ as a function of the distribution, $\theta =T\left(F\right)$. For example, $\theta $ could be the average change in after-tax income for tax units in a given range of income, and $T\left(F\right)$ could be a function of the tax code for those tax units. We estimate this statistic using our sample, ${\hat{\theta}}_{n}=T\left({F}_{n}\right)$, where ${F}_{n}$ is the empirical distribution. In a stratified data set such as the one used in this paper, ${\hat{\theta}}_{n}$ is created using a weighted sample. Nevertheless, we do not need to assume that the point estimate ${\hat{\theta}}_{n}$ is an unbiased estimate of $\theta $ in finite samples.
Of course, the exact estimate depends on our sample. If other samples had been used, the estimate would be different. In general, the estimate will vary across those samples so that we cannot be certain about the distance between the estimate and the statistic in the population. If $T\left(F\right)$ is a linear function, we can calculate the variance of the estimate using $V\left(\hat{\theta}\right)=E{\left(\hat{\theta}-E\hat{\theta}\right)}^{2}$, where $E$ is the expected value operator.
The Central Limit Theorem implies that if a large number of independent tax units are affected, the distribution of an average tax change can be approximated using the normal distribution. We calculate confidence intervals under this assumption, as $\hat{\theta}\pm C\times \sqrt{V\left(\hat{\theta}\right)}$ , where C is the appropriate twosided critical value for the normal distribution.
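As a sketch, this normal-approximation interval can be computed in Python using the standard library's `NormalDist` for the critical value; the function name is illustrative:

```python
from statistics import NormalDist

def normal_ci(theta_hat, se, level=0.95):
    """Two-sided normal-approximation confidence interval:
    theta_hat +/- C * se, with C the normal critical value."""
    c = NormalDist().inv_cdf(0.5 + level / 2)  # C ~ 1.96 for a 95% interval
    return theta_hat - c * se, theta_hat + c * se
```

For an average tax change of $100 with a standard error of $10, this gives an interval of roughly [$80.4, $119.6].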
But under the U.S. individual income tax system, taxes owed are a highly nonlinear function of income. In addition, if the sample size is small, confidence intervals may be poorly estimated by methods that assume asymptotic normality of the statistics.
As an alternative, the bootstrapping method estimates various aspects of the distribution of ${\hat{\theta}}_{n}$ by repeatedly drawing subsamples. The steps are: (i) draw with replacement R samples of size m, $\left({x}_{1}^{\ast},...,{x}_{m}^{\ast}\right)$, from the original observations (x_{1},…,x_{n}), (ii) calculate the statistic ${\hat{\theta}}_{m}^{\ast}=T\left({F}_{m}^{\ast}\right)$ on each bootstrap sample, where ${F}_{m}^{\ast}$ is the empirical distribution function of the bootstrapped sample, and (iii) calculate the appropriate moments of the R statistics ${\hat{\theta}}_{m}^{\ast}$. In the first step, the size of the bootstrap sample m is usually set to n, the size of our sample. The number of samples R can be set to any number, although Kolenikov (2010) indicates that R is usually set to be between 100 and 1,000. In the second step the statistic is calculated all R times.
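Steps (i)-(iii) can be sketched in Python as follows, with m set to n. This is an illustrative unweighted implementation with hypothetical names, not the weighted procedure used later in the paper:

```python
import random

def bootstrap_replicates(sample, stat, R=1000, seed=0):
    """(i) Draw R resamples of size m = n with replacement,
    (ii) evaluate the statistic on each resample, and
    (iii) return the R replicate values for moment calculations."""
    rng = random.Random(seed)
    n = len(sample)
    return [stat([sample[rng.randrange(n)] for _ in range(n)])
            for _ in range(R)]
```

The list of replicate values can then be summarized with whatever moments or quantiles are of interest.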
The third step relies on the insight that simulated sampling via bootstraps mimics the original sampling process, so that ${\hat{\theta}}_{m}^{\ast}$ relates to ${\hat{\theta}}_{n}$ as ${\hat{\theta}}_{n}$ relates to $\theta $. For example, the potential bias of ${\hat{\theta}}_{n}$, defined as $E\left({\hat{\theta}}_{n}-\theta \right)$, can be estimated as ${\hat{b}}_{n}={E}^{\ast}\left({\hat{\theta}}_{m}^{\ast}-{\hat{\theta}}_{n}\right)$, where ${E}^{\ast}$ is an unweighted average across the R samples. Similarly, the variance of ${\hat{\theta}}_{n}$ is estimated as ${E}^{\ast}{\left({\hat{\theta}}_{m}^{\ast}-{E}^{\ast}{\hat{\theta}}_{m}^{\ast}\right)}^{2}$, and the calculation of the standard error of ${\hat{\theta}}_{n}$ follows naturally.
As with the bias and variance, we can estimate the cumulative distribution function of the estimated parameter, $Prob\left({\hat{\theta}}_{n}-\theta <t\right)$, with its analogue based on the bootstrapped sample, $Prob\left({\hat{\theta}}_{m}^{\ast}-{\hat{\theta}}_{n}<t\right)$. A naïve estimate of a confidence interval would be based on ${E}^{\ast}I\left({\hat{\theta}}_{m}^{\ast}-{\hat{\theta}}_{n}\le t\right)$, where $I$ is an indicator function equal to 1 if the condition holds and 0 otherwise. It has long been recognized that the bootstrap can also be used to estimate a bias-corrected confidence interval as ${E}^{\ast}I\left(2{\hat{\theta}}_{n}-{\hat{\theta}}_{m}^{\ast}\le t\right)$.^{7}
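Given the R replicate values, the bias, variance, naïve percentile interval, and reflected bias-corrected interval described above can be sketched as follows (illustrative names; `theta_n` is the full-sample estimate):

```python
from statistics import mean, pvariance

def bootstrap_summaries(theta_n, replicates, alpha=0.05):
    """Bias E*(theta*_m) - theta_n, variance E*(theta*_m - E* theta*_m)^2,
    the naive percentile interval, and the bias-corrected interval
    built from 2*theta_n - theta*_m."""
    reps = sorted(replicates)
    bias = mean(reps) - theta_n
    var = pvariance(reps)
    lo = reps[int(alpha / 2 * len(reps))]
    hi = reps[int((1 - alpha / 2) * len(reps)) - 1]
    naive = (lo, hi)
    corrected = (2 * theta_n - hi, 2 * theta_n - lo)
    return bias, var, naive, corrected
```

Note that reflecting the percentile endpoints around the point estimate reverses their order, which is why the corrected interval uses `hi` for its lower bound.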
Our goal is to estimate confidence intervals for the point estimates ${\hat{\theta}}_{n}$, such as average tax changes and percent changes in after-tax income, of the proposed changes in the individual income tax from TPC’s microsimulation model. To obtain confidence intervals of these point estimates, we bootstrap Tax Model runs in two steps: (i) producing a series of alternative analytical weights of tax units in the Tax Model and (ii) calculating relevant bootstrapped statistics using these alternative weights.
4.1. Producing a series of alternative analytical weights
Similar to a standard bootstrap exercise, we want to draw observations with replacement from the original data set after tax liabilities have been calculated. The idea is that this set of alternative analytical weights can be used to infer the population characteristics of the original Tax Model sampled tax units. However, because the Tax Model observations carry different weights (and these observations represent 184 million US tax units in 2019), the process is not straightforward. Here we rely on the Stata bsweight command formulated in Kolenikov (2010) to produce a series of replicated observations’ weights. In particular, each set of alternative analytical weights is constructed by sampling observations with replacement within each ECI group h, setting m_{h}=n_{h}−1, and scaling the observation weights w_{i} within each ECI group so that totals by ECI group match the totals derived using the original weights.^{8}
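The construction can be sketched in Python as follows. This is an illustrative re-implementation of the idea, not Stata's bsweight, and the function name is hypothetical:

```python
import random
from collections import Counter

def replicate_weights(weights, strata, seed=0):
    """One set of alternative analytical weights: within each stratum h,
    draw m_h = n_h - 1 observations with replacement, then rescale so the
    stratum's total weight matches the original total."""
    rng = random.Random(seed)
    new_w = [0.0] * len(weights)
    for h in set(strata):
        idx = [i for i, s in enumerate(strata) if s == h]
        draws = Counter(rng.choice(idx) for _ in range(len(idx) - 1))
        raw = {i: weights[i] * draws.get(i, 0) for i in idx}
        total = sum(weights[i] for i in idx)
        raw_total = sum(raw.values())
        for i in idx:
            new_w[i] = raw[i] * total / raw_total if raw_total else 0.0
    return new_w
```

Observations never drawn in a replicate receive a weight of zero, while weighted totals within each ECI group are preserved by construction.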
In the current exercise, we produce 201 sets of alternative analytical weights.^{9} Producing a large set of alternative analytical weights is time-consuming, although they only need to be produced once and can be used to compute confidence intervals for point estimates of any policy proposal. In addition, we can use a smaller number of these alternative analytical weights to speed up the calculations of bootstrapped statistics (more on this below).
4.2. Calculating relevant statistics
In this exercise, we calculate confidence intervals using both the normal approximation and the bootstrapping approach. A complication occurs with the normal approximation because there are several formulas for calculating standard errors of weighted averages. We address this by following Gatz and Smith (1995) and use the standard errors calculated from our bootstrap procedure. The normal approximation of the 95 percent confidence interval is thus calculated as the average change in taxes, plus or minus 1.96 times the bootstrapped standard error (BSE). Gatz and Smith also describe several formulas for approximating normal standard errors of weighted averages, and for comparison we use one of their formulas, GSSE = $\sqrt{\frac{1}{n}\left(\frac{1}{\sum_{i}{w}_{i}}\right)\sum_{i}{w}_{i}{\left({x}_{i}-\overline{x}\right)}^{2}}$. We then calculate confidence intervals as the average change in taxes plus or minus 1.96 times the GSSE.
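As a sketch, a GSSE-style standard error and its normal interval might be computed as follows. This follows one reading of the weighted-average formula in the text (the printed version is ambiguous), uses hypothetical names, and with equal weights reduces to the familiar SD/√n:

```python
from math import sqrt

def gsse(x, w):
    """Normal-approximation standard error of a weighted mean:
    sqrt((1/n) * sum_i w_i (x_i - xbar)^2 / sum_i w_i)."""
    wtot = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / wtot
    return sqrt(sum(wi * (xi - xbar) ** 2
                    for wi, xi in zip(w, x)) / (wtot * len(x)))

def gsse_ci(x, w):
    """95 percent interval: weighted mean +/- 1.96 * GSSE."""
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    se = gsse(x, w)
    return xbar - 1.96 * se, xbar + 1.96 * se
```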
To calculate the bootstrap standard errors and confidence intervals we rely on built-in bias-corrections (BC) available in the Stata bootstrap command bs4rw.^{10} In particular, we use bs4rw with the summarize command to obtain bootstrapped statistics for totals and averages and with the ratio command to obtain bootstrapped statistics for percentages.
In practice, the program runs progressively more slowly as the set of alternative analytical weights becomes larger. In this exercise, we use a set of 201 alternative weights, which allows us to theoretically rank the bootstrapped estimates at intervals of 0.5% from the 0^{th} to the 100^{th} percentile.^{11}
As a caution, it is good practice to inspect the derived bootstrapped statistics in detail. For example, when a policy proposal only affects a few tax units in an income group, many of these 201 simulations may not pick up any tax units affected by the policy, and as a result the bootstrapped statistics will be calculated based on only the subset of simulations where affected tax units were drawn. In such cases, it may be useful to focus as well on bootstrapped statistics of the fraction of tax units affected by the proposal and the average tax change per tax unit affected.
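A small diagnostic of this kind can be sketched as follows (hypothetical names; the per-replicate averages and counts of affected tax units are assumed to have been collected already):

```python
def inspect_replicates(avg_changes, n_affected):
    """Report the share of bootstrap replicates that drew no affected
    tax units, plus the per-replicate averages that are actually usable."""
    usable = [a for a, k in zip(avg_changes, n_affected) if k > 0]
    share_empty = 1 - len(usable) / len(avg_changes)
    return share_empty, usable
```

A large `share_empty` signals that summaries rest on only a subset of the replicates and should be interpreted with care.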
5. Results
We apply the bootstrapping methodology described in the previous section to examine TPC’s microsimulation distributional estimates of five alternative tax policy proposals if implemented in 2019. These proposals are (i) a partial restoration of the personal exemption eliminated by TCJA, (ii) an increase in the standard deduction, (iii) an increase in the top statutory income tax rate, (iv) replacing the current-law farm loss deduction with a revenue-neutral refundable tax credit based on the total farm loss, and (v) a rescission of the tax-free treatment of interest accruing from certain government bonds. We chose these policy proposals because point estimates should be relatively more uncertain when proposals affect fewer, more diverse tax units. In addition, to allow for confidence intervals to vary with the point estimates, we also varied the relevant parameters of the first three proposals across three levels: high, central and low scenarios. In our discussion, we only present a subset of the estimates for tractability. The complete estimates are available in an online appendix.^{12}
5.1. Partially restore the personal exemption
Before 2018, tax units could claim a personal exemption for every adult and qualified dependent. In 2017 the exemption amount was $4,050. For many lowincome households, this exemption eliminated their tax liability.^{13} The exemption phased out at high levels of adjusted gross income (AGI), with the 2017 phaseout beginning at an AGI of $261,500 for singles, $287,650 for heads of household, $313,800 for married couples filing a joint return and $156,900 for married couples filing separately. Personal exemptions were completely phased out when AGI reached $384,000 for singles, $410,150 for heads of households, $436,300 for married couples and $218,150 for married couples filing separately. TCJA eliminated the personal exemption and increased the standard deduction and the child tax credit between 2018 and 2025.^{14}
We consider a scenario that would restore the personal exemption (and phaseout) partially in 2019 to $4,000, and estimate how taxes would change. We then repeat this exercise with scenarios that would restore the personal exemption to $2,000 or $1,000. In each scenario, each tax unit’s tax change under the proposal is calculated as the difference between the tax unit’s tax liability with the exemption restored and its tax liability under current law. We then apply the bootstrapping procedure and calculate estimates, such as the average change in income for tax units in a given range of income, their standard errors, and confidence intervals.
Recall that we examine uncertainties around a point estimate due to sampling variation. For this proposal, sampling variation comes from three sources: (i) variation in the number of exemptions across tax units, (ii) variation in tax units’ tax rates, and (iii) variation in the amount of the exemption each tax unit can claim. For (ii), a full $4,000 per-person exemption reduces taxes by only $480 for a single tax unit with no dependents in the 12 percent tax bracket but by $1,280 for a similar tax unit in the 32 percent bracket. For (iii), low-income tax units whose taxable income is less than the available exemptions under the proposal ($4,000 multiplied by the number of personal exemptions) will only be able to claim an exemption amount up to their taxable income, while high-income tax units may face a phaseout and can only use part or none of the available exemptions.
To conserve space, we present our results for a small number of ECI groups. Here we use three: a low-income group of $75,000 to $100,000; a middle-income group of $200,000 to $500,000; and a high-income group of more than $1 million. The results for all income groups are available in the online appendix. Because ECI accounts for components of income that are not part of AGI and tax proposals generally depend on AGI, a proposal may affect tax units in an ECI group in a way that at first glance seems counterintuitive. For example, a small number of tax units with ECIs in excess of $1 million have AGI below the phaseout limits of the personal exemption and, as a result, may receive a tax cut under this proposal.
The point estimates in the last column of Table 2 indicate that restoring the personal exemption at $4,000 would reduce the average tax burden of tax units in the low-income group by $1,023, in the middle-income group by $2,459, and in the high-income group by $8.^{15} The middle-income group gets the largest average tax cut. The corresponding widths of each point estimate’s confidence interval are $23, $43, and $10, respectively.
But the first column of Table 2 shows that only a very small number of tax units in the high-income group (0.6 percent) receive a tax cut. High-income tax units whose AGI is larger than the phaseout limit do not benefit from the proposal because their available personal exemptions are completely phased out.
As the second column of Table 2 shows, the average tax cut among tax units receiving a tax cut under the proposal is $1,085, $2,497 and $1,214 for the low-, middle-, and high-income groups, respectively. The average tax cut for the middle-income group is more than twice the size of the cut for the low-income group mainly because the middle-income group faces a higher income tax rate and a greater share of tax units in the middle-income group file as married-filing-jointly, allowing them one more exemption than those filing as single or as head of household. In contrast, the smaller benefit for high-income tax units, even with their higher average tax rate, occurs because some of them are in the phaseout range. As a result, their tax cut is only half of the middle-income group’s average based on the point estimates.
Below the point estimates and standard errors in Table 2 are the confidence intervals around these point estimates of average tax changes among tax units receiving a tax cut. The confidence interval around the low-income group’s tax cut of $1,085 is [$1,077, $1,094], the confidence interval around the middle-income group’s tax cut of $2,497 is [$2,475, $2,515], while the confidence interval around the high-income group’s tax cut of $1,214 is [$789, $1,681]. Note that the point estimate for the low-income group is within the confidence interval of the high-income group. That is, the estimated effects for the high- and low-income groups are not statistically different. In the addendum, we compare confidence intervals using a normal approximation with BSE to the confidence intervals created with the bootstrap. Overall, the normal approximation works well, although its confidence interval for the high-income group is not as wide as the bootstrap interval.
The high-income tax units’ point estimate is much less precise because the proposal only affects a very small number of sample observations in this income group and the change among the few observations is large. Table 3 shows that 40,100 tax unit observations represent 836,500 tax units in the top income group. However, only 71 observations, representing 5,190 tax units, receive a tax cut under the $4,000 personal exemption proposal. In other words, the proposal affects 0.62% of tax units in the high-income population but only 0.18% of those in the sample. Tax units’ tax cuts vary from $15 to $3,840, so the weighted average tax cuts from different mixes of observations can be quite different. In contrast, Table 3 shows that out of 35,400 tax unit observations representing 14.0 million tax units in the middle-income group, 32,220 observations representing 13.8 million tax units would receive a tax cut under the $4,000 personal exemption proposal. Their tax cuts vary from $14 to $11,000, but because most of the sampled tax units in the middle-income group receive a tax cut, the weighted average tax cuts from different mixes of observations should be quite similar.
5.2. Increase the standard deduction
We next consider a more complicated change in tax liability. When filing tax returns, tax units can reduce their taxable income using either itemized deductions or the standard deduction. Because tax units with the same taxable income and filing status may have different amounts of expenses eligible for itemized deductions, some will choose to itemize and others will not.
In 2019, the current-law standard deduction amounts are $12,200 for single filers and married couples filing separately, $18,350 for heads of household, and $24,400 for married couples filing jointly. We consider scenarios that would increase the standard deduction amounts by $4,000, $2,000 and $1,000.^{16} In each scenario we compare the itemized deductions to the standard deduction for each tax unit and select the deduction that results in the lowest tax liability. We describe the effects on tax units with incomes between $50,000 and $75,000 (low-income group), between $200,000 and $500,000 (middle-income group), and greater than $1 million (high-income group).
Sampling variation in estimating the effects of this change comes from three sources: (i) variation in the amount of the increased standard deduction that can be used, (ii) variation in the amount of itemized deductions, and (iii) variation in the tax rates. The first occurs because married couples filing jointly receive twice the amount of single filers, but they may not be able to deduct the full additional amount available if the increase makes the total deduction greater than their taxable income. The second comes from variation in the amount of expenses that can be deducted. Tax units that were originally claiming the standard deduction before the increase will continue to do so. Tax units with itemized deductions greater than the increased standard deduction will still itemize. Those with itemized deductions greater than the old level of standard deductions but less than the new level will stop itemizing.
Because there are more potential sources of variation from raising the standard deduction than from the partial restoration of the personal exemption, this proposal may generate more variation in average benefits for each income group, and possibly wider confidence intervals. In particular, there may be more variation among tax units with similar incomes.
The variation across income groups can be seen in Table 4. Under the high scenario, in which the standard deduction is increased by $4,000, tax units in the low-income group see an average tax cut of $501, those in the middle-income group see an average tax cut of $1,286, and those in the high-income group see an average tax cut of $491.
This pattern reflects two offsetting factors. On the one hand, high-income tax units are more likely to have enough deductions to itemize even when the standard deduction is increased. As shown in the first column of Table 4, nearly 80 percent of tax units in the low- and middle-income groups benefit from a higher standard deduction, but only 23 percent of tax units with incomes greater than $1 million benefit.^{17} On the other hand, among units that take the standard deduction, higher-income tax units face higher marginal rates, so an increased deduction reduces tax liabilities more for higher-income tax units than for lower-income tax units. This is shown in the second column of Table 4: those in the low-income group that use the standard deduction see an average tax reduction of $626, while those in the middle-income group see an average tax reduction of $1,622. Those in the high-income group see an average reduction of $2,131.
Unlike the personal exemption proposal, this proposal benefits a sizable share of each income group, and the confidence intervals are small for all groups, although the interval is wider for the high-income group, as with the personal exemption proposal. The confidence interval for the average tax cut ranges from $494 to $506 for the low-income group, from $1,269 to $1,297 for the middle-income group, and from $467 to $521 for the high-income group. Overall, the confidence intervals using the normal approximation with BSE are similar to the bootstrapped confidence intervals.
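The two interval constructions compared throughout this section can be illustrated with simulated replicate estimates. The replicate values, seed, and scale below are invented; only the constructions — point estimate ± 1.96 × BSE versus percentiles of the replicates — follow the text.

```python
# Illustrative comparison of the two interval constructions used in the paper:
# a normal approximation built from the bootstrap standard error (BSE) versus
# a percentile interval read directly from the replicate estimates.
import random
import statistics

random.seed(0)
replicates = [500 + random.gauss(0, 3) for _ in range(201)]  # 201 replicate estimates

mean = statistics.mean(replicates)
bse = statistics.stdev(replicates)  # bootstrap standard error

# Normal approximation: point estimate +/- 1.96 * BSE
normal_ci = (mean - 1.96 * bse, mean + 1.96 * bse)

# Percentile bootstrap: 2.5th and 97.5th percentiles of the replicates
ordered = sorted(replicates)
lo = ordered[int(0.025 * (len(ordered) - 1))]
hi = ordered[int(0.975 * (len(ordered) - 1))]

print(normal_ci, (lo, hi))
```

With roughly symmetric replicate estimates like these, the two intervals nearly coincide, which is the pattern the paper reports for proposals affecting many tax units.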
5.3. Increase in the top tax rate
TCJA lowered the top rate from 39.6 to 37 percent starting in 2018. The top tax rate applies to taxable income over $500,000 for single filers and over $600,000 for married couples filing jointly. Tax brackets are adjusted annually for inflation. We consider an increase in the top rate to 41, 39, or 38 percent. Increasing the top rate to 41 percent would only affect about 0.6 percent of all tax units, almost exclusively those with expanded cash income of over $500,000. We therefore describe the effects on tax units with incomes between $200,000 and $500,000 (low-income group), between $500,000 and $1 million (middle-income group), and with incomes greater than $1 million (high-income group).
Sampling variation comes from four sources: (i) from the mix of single tax filers and married couples filing jointly in an income group, (ii) from the share of tax units in each income group to which the top rate applies, (iii) from the amount of income above the threshold, which determines the amount of the tax increase, and (iv) from the small number of tax units that will be affected by the complex interaction of the regular income tax system and the Individual Alternative Minimum Tax (AMT).^{18}
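Source (iii) can be made concrete with a small sketch. The function below is illustrative only: it uses the 2018 thresholds cited above, assumes all income above the threshold was already taxed at the old top rate, and ignores the AMT interaction in source (iv).

```python
# Hypothetical sketch of source (iii): the tax increase from raising the top
# rate depends on how much taxable income lies above the top-bracket threshold.
# Thresholds follow the brackets cited in the text; the AMT is ignored.

def top_rate_increase(taxable_income, married, new_rate, old_rate=0.37):
    """Additional tax from raising the top rate, for income above the threshold."""
    threshold = 600_000 if married else 500_000
    return (new_rate - old_rate) * max(0.0, taxable_income - threshold)

# Single filer with $700,000 of taxable income under a 41 percent top rate:
# only the $200,000 above the threshold faces the extra 4 percentage points.
print(top_rate_increase(700_000, married=False, new_rate=0.41))
```

The mix of filing statuses (source (i)) matters because the same income can sit above the single threshold but below the married one.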
Successively higher income groups have greater shares of affected tax units and higher average changes among those affected, as shown in Table 5. Only 0.06 percent of tax units in the low-income group would see a tax increase, while about 18 percent of tax units in the middle-income group and more than three-quarters of those in the high-income group would pay more taxes. The average increase among units with a tax change in the low-income group is just over $2,000. This increases to over $4,000 for the middle-income group and to more than $52,000 in the high-income group.
The small number of affected records in the low-income group results in relatively large confidence intervals. The interval around the share of affected tax units ranges from 0.038 to 0.091, while the interval around the average tax increase among those affected ranges from $1,349 to $2,956. The confidence interval calculated from the normal approximation with BSE, ranging from $1,173 to $2,916, is slightly wider than the bootstrapped interval. The other income groups have much smaller confidence intervals, and the normal approximation is much closer to the bootstrapped confidence intervals. Overall, the confidence intervals using the normal approximation with BSE are similar to the bootstrapped confidence intervals.
5.4. Revenue-neutral replacement of a deduction for farm losses with a refundable tax credit
Having found generally small confidence intervals in the above three cases, we examine two additional cases in which the confidence intervals might be more sensitive to policy changes. In this section, we estimate a revenue-neutral change that replaces the current-law farm loss deduction with a refundable tax credit based on total farm losses in 2019. Under current law, tax units with a farm loss can deduct the loss up to a limit.^{19} The proposal would repeal this farm loss deduction but allow tax units to claim a refundable tax credit equal to 82.5% of the product of their statutory income tax rates and their total (i.e., unlimited) farm losses; the 82.5% factor makes the proposal revenue-neutral. Because the proposal is revenue-neutral, it would generate both winners and losers. Specifically, it would help tax units with large farm losses at the expense of those with small losses. However, it is not clear how it would affect tax units in different income groups.
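One way to see where a factor like 82.5% comes from is to solve for the scale that makes the credit's total cost equal the revenue raised by repealing the capped deduction. The records, rates, losses, and caps below are invented; this is a hedged sketch of the revenue-neutrality condition, not TPC's calculation.

```python
# Hedged sketch of deriving a revenue-neutral scaling factor: the credit is
# scaled so that its total cost equals the revenue gained from repealing the
# capped farm loss deduction. All records below are invented examples.

records = [  # (statutory rate, total farm loss, deductible portion under the cap)
    (0.22, 30_000, 30_000),
    (0.35, 400_000, 250_000),
    (0.37, 1_000_000, 250_000),
]

# Revenue raised by repealing the deduction: rate times the capped loss.
revenue_from_repeal = sum(rate * capped for rate, _, capped in records)
# Cost of an unscaled credit: rate times the full, uncapped loss.
cost_of_full_credit = sum(rate * loss for rate, loss, _ in records)

# The factor that makes the refundable credit exactly revenue-neutral.
factor = revenue_from_repeal / cost_of_full_credit
print(round(factor, 3))
```

Because the credit is based on uncapped losses while the repealed deduction was capped, the factor is below one; 82.5% is the value the paper reports for the actual model data.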
We examine a proposal involving farm losses because of three sources of large sampling variation: (i) few tax units report a farm loss; (ii) farm losses vary widely across observations; and (iii) the tax rate has a tangible impact on those with a farm loss.^{20}
As shown in Table 6, this proposal would on average minimally reduce the tax burden of tax units with income between $40,000 and $50,000, slightly increase the tax burden of tax units with income between $500,000 and $1 million, and slightly reduce the tax burden of tax units with income of more than $1 million. By construction, the proposal would not change the overall tax burden of tax units. Although the bootstrapped confidence intervals and the intervals based on the normal approximation are quite close in most cases, there are several exceptions.
Based on the bootstrapped confidence intervals, it is inconclusive whether the average change in the tax burden is positive or negative for tax units with incomes between $40,000 and $50,000 or between $500,000 and $1 million. In each of these income groups, some tax units would see an increase in their tax burden and others would see a decrease, leading to confidence intervals for the average tax change that include zero. On the other hand, the confidence interval for the average change in the tax burden for tax units with incomes of $1 million or more is strictly less than zero.
In contrast, for tax units with incomes between $500,000 and $1 million, the confidence interval assuming a normal distribution with BSE is strictly greater than zero while the bootstrapped confidence interval straddles zero. Although the bootstrapped and normal confidence intervals are similar for average tax increases, they diverge for tax units facing a tax cut. For example, among tax units in the $500,000 to $1 million income group facing a tax cut, the bootstrapped confidence interval is [−8,022, −1,417] but the normal confidence interval is [−6,541, −391]. This leads to a bootstrapped confidence interval for the group's overall change of [−2.7, 32.6], which includes zero, but a normal confidence interval of [1.0, 36.5], which excludes zero.
5.5. Rescind the tax-free treatment of interest accruing from certain government bonds
Finally, we estimate the effect of rescinding the tax-exempt status of interest from government bonds. Sampling variation in this case comes from three sources: (i) few tax units have tax-exempt interest; (ii) there is wide variation in interest income across observations; and (iii) the tax rate will have a tangible impact on those with tax-exempt interest income.
Municipal (i.e., state and local) bond interest is exempt from federal income tax. In 2018, there were $3.8 trillion in municipal bonds outstanding.^{21} The exemption primarily benefits higher-income individuals. In our dataset, 17% of the approximately 256,000 records receive tax-exempt interest, representing 3% of the 175 million tax units in the population.^{22} The unweighted distribution of tax-exempt interest income is extremely wide and asymmetric, with a median of $1,647 and a mean of $12,605.
As shown in Table 7, rescinding the tax-exempt status of interest from government bonds would increase tax liabilities of the top income group the most – an average of $13,545 for those with ECI of more than $1 million. We find that the confidence intervals are tightly wrapped around the point estimates for most income groups and that the normal approximation with BSE is very similar to the bootstrapped confidence interval.
However, for the group with income between $40,000 and $50,000, the confidence interval is not as tight and the results are asymmetric: the average tax increase among tax units with a tax increase is $383, with a confidence interval ranging from $174 to $816. In the addendum, the confidence interval calculated with the normal approximation is noticeably different, ranging from $104 to $662. The asymmetry of the bootstrapped confidence interval is caused by the small number of affected records and the skewness of the underlying distribution of interest income from tax-exempt bonds. While more than 95 percent of records in this group have no tax-exempt interest income, several records have substantial amounts. If these records are included in the bootstrapped samples, the resulting tax changes are extremely large and the normal approximation may not be accurate.
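A small simulation makes this mechanism plain: when most records are zero and a few are very large, the percentile bootstrap interval for the mean is asymmetric around the point estimate, while a normal-approximation interval is symmetric by construction. The data here are synthetic, and the naive with-replacement resampling stands in for the paper's replicate-weight procedure.

```python
# Synthetic demonstration of asymmetric bootstrap intervals under skewness:
# most records have no tax-exempt interest; a few have large amounts.
import random
import statistics

random.seed(1)
sample = [0.0] * 95 + [random.expovariate(1 / 20_000) for _ in range(5)]

means = []
for _ in range(1000):  # naive resampling bootstrap of the mean
    resample = random.choices(sample, k=len(sample))
    means.append(statistics.mean(resample))

means.sort()
boot_lo, boot_hi = means[25], means[974]  # 2.5th / 97.5th percentiles
point = statistics.mean(sample)

# The distances from the point estimate to the two endpoints differ,
# unlike a symmetric normal-approximation interval.
print(point - boot_lo, boot_hi - point)
```

Resamples that happen to include several of the large records pull the upper tail of the bootstrap distribution far out, which is exactly why the paper's interval for this group stretches further above $383 than below it.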
5.6. Discussion
Although the bootstrapped confidence intervals show that most point estimates of our analyses of the distributional impact of policy alternatives are precise, they also indicate that the model estimates some policy analyses more precisely than others. For example, for tax units with income between $200,000 and $500,000, two policy alternatives would have almost the same magnitude of impact: rescinding tax-exempt interest would increase average taxes by $2,693 among those with tax-exempt interest income, and restoring the personal exemption by $4,000 would decrease average taxes by $2,497. However, with bootstrapped standard errors of $124 and $11, respectively, the confidence interval of the former is much wider than that of the latter.
Differences in the number of tax units affected in each income group explain only some of the differences in confidence intervals among the proposals. Differences in the source of income facing a change in tax treatment also play a role, as do differences in demographic characteristics, such as the number of dependents, and interactions with other parts of the tax system, such as the Alternative Minimum Tax.
In most cases, the confidence interval constructed using the normal approximation with BSE is close to the bootstrapped confidence interval. However, it is not clear whether this finding holds if we calculate the standard error using a formula rather than a bootstrap. To explore this issue, Table 8 shows the bootstrapped confidence intervals and the confidence intervals calculated using the GSSE for each proposal's average tax change across all tax units, as well as both the BSE and the GSSE. The GSSE is comparable to the BSE for proposals that affect many tax units, such as restoring the personal exemption and increasing the standard deduction. The GSSE is noticeably larger than the BSE for proposals that affect few tax units, such as increasing the top tax rate, rescinding the tax-free treatment of certain bonds, and replacing the farm loss deduction with a credit. Similarly, confidence intervals calculated from the GSSE are comparable to the bootstrapped confidence intervals for proposals that affect many tax units and noticeably wider for proposals that affect few tax units.
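The contrast between a formula-based standard error and a bootstrap standard error can be sketched for a simple weighted mean. The linearization-style formula and the with-replacement resampling below are illustrative assumptions; TPC's actual GSSE formula and replicate-weight procedure are not reproduced here.

```python
# Rough sketch contrasting a formula-based SE (in the spirit of the GSSE)
# with a bootstrap SE (BSE) for a weighted mean. Data are synthetic.
import random
import statistics

random.seed(2)
values = [random.gauss(1_000, 300) for _ in range(500)]
weights = [random.uniform(50, 150) for _ in range(500)]

def wmean(vals, wts):
    return sum(v * w for v, w in zip(vals, wts)) / sum(wts)

point = wmean(values, weights)

# Formula SE: a simple linearization approximation for a weighted mean.
wsum = sum(weights)
formula_se = (sum((w * (v - point)) ** 2 for v, w in zip(values, weights)) ** 0.5) / wsum

# Bootstrap SE: standard deviation of the weighted mean across 201 resamples.
reps = []
for _ in range(201):
    idx = [random.randrange(len(values)) for _ in range(len(values))]
    reps.append(wmean([values[i] for i in idx], [weights[i] for i in idx]))
bootstrap_se = statistics.stdev(reps)

print(round(formula_se, 1), round(bootstrap_se, 1))
```

With many affected records, as here, the two estimates agree closely; the divergence the paper reports arises when only a handful of records carry the statistic.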
6. Conclusion
The need to measure the uncertainty of estimates from microsimulation modeling has been understood since the field's inception, yet little research exists on the topic. In this article, we use a bootstrap technique to estimate the sampling variation in TPC's microsimulation model of the U.S. individual income tax system. The bootstrap exercises show that point estimates produced by the model are generally very precise. This supports the standard practice of not reporting the associated standard errors and confidence intervals of point estimates. However, there are exceptions, and the methodology demonstrated in this paper can be used to address uncertainty in the point estimates, with the goal of providing adequate information for better policy formation.
Calculating confidence intervals using a normal approximation works best when many tax units are affected and the point estimates are precisely estimated. When calculating confidence intervals for income categories with a small number of impacted tax units, the normal approximation may differ from the bootstrapped confidence intervals.
Sampling variation is not the only source of uncertainty in microsimulation modeling. Cohen (1991) discusses other sources of uncertainty, such as the use of “various control totals and regression equations, from the use of imputation and statistical matching, from the use of demographic and macroeconomic projections, and from the use of aging modules.” Estimating the uncertainty arising from some of these sources is substantially more difficult than from sampling variation, and is a subject for future research.
Footnotes
1.
See Orcutt (1957). The concept applies to studies of not only households but also individuals or firms.
2.
A tax unit is an individual, or a married couple, that files a tax return or would file a tax return if their income were high enough, along with all dependents of that individual or married couple.
3.
See, for example, Paulus et al. (2019).
4.
According to Wikipedia, accessed June 19, 2019.
5.
This is also known as the delta method.
6.
TPC uses published tax data to calculate per-return average growth rates for income, deduction, and other items between 2006 and 2011 by adjusted gross income (AGI) class. These growth rates are used to adjust the dollar amounts on each PUF record. Then, it uses a constrained optimization algorithm to reweight the records to match an extensive set of about 100 national targets and 39 to 51 state targets, depending on AGI classes, for both return counts and dollar amounts. The resulting file is referred to as the 2011 “Look Alike Public Use File” (LAPUF). Afterward, TPC adds information on other demographic characteristics and unreported sources of income by matching the LAPUF with data from the March 2012 Current Population Survey (CPS) of the U.S. Census Bureau. That match also generates a sample of individuals who do not file individual income tax returns (“nonfilers”). Finally, the tax model database contains imputations for wealth, education, consumption, health, and retirement-related variables. The full tax model database is a representative national sample of the US population for calendar year 2011.
7.
This correction is made under the assumption that the bias is a constant. For biases that are a function of θ, a “bias-corrected and accelerated” confidence interval may be used.
8.
If other values of m_{h} are used, the weights must be scaled to correct for a bias. This is of greater concern for very small datasets. See Rao and Wu (1988) for more information.
9.
We use 201 replicate weights because this allows percentiles to be calculated precisely. For example, calculating whole percentiles (0 to 100) precisely requires 101 points; likewise, calculating half-percentiles (0 to 200) precisely requires 201 points.
10.
As described in Kolenikov (2010), bs4rw “is an analogue of the official bootstrap command that uses the replicate weights instead of actually resampling the data in Stata memory”. The command produces two types of bias corrections for confidence intervals, bias-corrected (BC) and bias-corrected and accelerated (BCA). We opt for the BC option and hence implicitly assume that the bias is not a function of the parameter of interest. It also helps that we use 201 alternative weights, a relatively large number of replicates, which helps ensure approximate normality.
11.
It took about 80 minutes to obtain bootstrapped statistics for all point estimates in the Tax Model’s standard summary distribution table using 201 alternative weights. In contrast, it took slightly more than ten hours (approximately 7.5 times longer) when using 1,001 alternative weights.
12.
The complete online appendix for this paper, including tables and links to Stata code, is available for downloading at the Urban Institute’s Data Catalog: https://datacatalog.urban.org/dataset/estimatingconfidenceintervalstaxmicrosimulationmodel
13.
14.
If the personal exemption had been entirely restored in 2019, it would be $4,200 using the chained CPI to index the exemption and $4,250 using the CPI-U to index the exemption.
15.
In all the tables we present, tax cuts are shown as negative tax changes, and tax increases are shown as positive tax changes.
16.
These increases are for single filers and married couples filing separately. Heads of household receive 1.5 times the singles’ increase, and married couples filing jointly receive double the singles’ increase.
17.
The TPC table TM180001 shows that under current law, the share of itemizers in 2018 is only 7 percent among those with income between $50,000 and $75,000, 47 percent among those with income between $200,000 and $500,000, and 82 percent for those who earn more than $1 million. https://www.taxpolicycenter.org/modelestimates/impactitemizeddeductionstaxcutsandjobsactjan2018/t180001impactnumber
18.
Some tax units are required to calculate their liability under the rules for the regular income tax and under the AMT rules and then pay the higher amount. TCJA reduced the number of affected tax units to only 200,000 filers in 2018. For more on the Alternative Minimum Tax, see http://www.taxpolicycenter.org/briefingbook/whatamt
19.
There are three relevant limits: (a) at-risk limits, which limit deductions for losses from most business or income-producing activities, including farming; (b) passive activity limits, which generally prevent deductions for losses from passive activities from exceeding income from passive activities; and (c) the excess business loss limitation, which limits deductions for losses of businesses, including farming (losses above this limit can be carried over to the next tax year). See https://taxmap.irs.gov/taxmap/pubs/p225017.htm for more detail.
20.
In our dataset, 5,380 (2%) out of approximately 256,000 records reported a farm loss, representing 1.4 million (0.8%) out of 175 million tax units in the population. The losses range from $4 to $23 million.
21.
22.
Discussing tax incidence is beyond the scope of this paper, but Galper et al. (2013) show that the current distributional methodology does not capture the shift of benefits of tax-exempt interest. Since the tax exemption affects the relative prices of taxable and tax-exempt bonds, making yields on tax-exempt bonds fall and yields on taxable bonds rise, the benefit of the exemption for holders of tax-exempt bonds is overstated, while some of the benefit accrues to holders of taxable bonds in the form of higher yields.
References

1. Improving Information for Social Policy Decisions – The Uses of Microsimulation Modeling: Volume I, Review and Recommendations. Washington, DC: The National Academies Press. https://doi.org/10.17226/1835
2. Cohen (1991). Variance estimation of microsimulation models through sample reuse. Improving Information for Social Policy Decisions: The Uses of Microsimulation Modeling 2:237–254.
3. Confidence intervals for policy reforms in behavioral tax microsimulation modelling. Bulletin of Economic Research 59:37–65.
4. Assessing the reliability of microsimulation models using the bootstrap: an analysis of the sampling error when population is not infinite. European Economic Association and Econometric Society European Meeting, London.
5. Galper et al. (2013). Who Benefits From Tax-Exempt Bonds? An Application of the Theory of Tax Incidence. Working Paper. Washington, DC: Urban-Brookings Tax Policy Center.
6. The standard error of a weighted mean concentration—I. Bootstrapping vs other methods. Atmospheric Environment 29:1185–1193. https://doi.org/10.1016/1352-2310(94)00210-C
7. Testing the statistical significance of microsimulation results: a plea. International Journal of Microsimulation 6:50–77.
8. Kolenikov (2010). Resampling variance estimation for complex survey data. The Stata Journal 10:165–199. https://doi.org/10.1177/1536867X1001000201
9. Orcutt (1957). A new type of socio-economic system. The Review of Economics and Statistics 39:116–123. https://doi.org/10.2307/1928528
10. Paulus et al. (2019). Indexing out of poverty? Fiscal drag and benefit erosion in cross-national perspective. EUROMOD Working Paper Series EM 3/19. https://www.euromod.ac.uk/sites/default/files/workingpapers/em319.pdf
11. How reliable are microsimulation results? An analysis of the role of sampling error in a UK tax-benefit model. Journal of Public Economics 53:327–365.
12. Rao and Wu (1988). Resampling inference with complex survey data. Journal of the American Statistical Association 83:231–241. https://doi.org/10.1080/01621459.1988.10478591
Article and author information
Author details
Funding
The authors are grateful for the support of this research by the Alfred P. Sloan Foundation grant G20179845.
Acknowledgements
We would like to thank Mark Mazur, Eric Toder and other staff at the Tax Policy Center for their help in this research.
Publication history
Version of Record published: August 31, 2020 (version 1)
Copyright
© 2020, McClelland et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.