1. Dynamic microsimulation
  2. Taxes and benefits

Dynamic Simulation of Taxes and Welfare Benefits by Database Imputation

  1. Justin van de Ven (corresponding author)
  2. Patryk Bronka
  3. Matteo Richiardi
  1. National Institute of Economic and Social Research, United Kingdom
  2. University of Essex, United Kingdom
Research article
Cite this article as: J. van de Ven, P. Bronka, M. Richiardi; 2025; Dynamic Simulation of Taxes and Welfare Benefits by Database Imputation; International Journal of Microsimulation; 18(2); 124-155. doi: 10.34196/ijm.00326

Abstract

This paper proposes a new method for describing a realistic tax-benefit system in a dynamic microsimulation model. The suggested approach draws upon existing third-party tax-benefit calculators and provides an alternative to integration of tax functions within a dynamic structure. It has the advantages of adapting to the degree of household sector detail described by a model and permits taxes and benefits to be imputed directly from commonly available micro-data sources. A practical application to a dynamic programming model designed to reflect the contemporary UK policy context indicates that the proposed method for projecting taxes and benefits implies a comparable computational burden and generates qualitatively comparable results to a detailed functional description of fiscal policy, while simplifying model development.

1. Introduction

The scope and sophistication of models adopting a micro-approach for projecting household-sector dynamics has grown substantially during the past six decades. A noteworthy feature of such models is that they tend to use highly stylised functional forms to approximate the complexity of modern tax and benefit systems. Although this type of abstraction helps to simplify model specification, it also limits scope for contributing to public policy debate and exposes model projections to distortions that may be difficult to anticipate. This study demonstrates a new method that simplifies the task of projecting realistic tax and benefit payments in a dynamic heterogeneous household model.

The modelling approach described in this study delegates the task of reflecting tax and benefit policy to widely available third-party data sources. This task can represent a very significant developmental hurdle in context of complex tax and benefit regulations, particularly in dynamic contexts where such regulations are also subject to frequent change. The suggested approach allows the modeller to leverage the expertise of policy specialists to obtain a detailed approximation of a tax and benefit system, which may be readily modified and shared between diverse model structures.

While our imputation method can be used in conjunction with any degree of household heterogeneity that is simulated, the precision of imputed taxes and benefits increases with simulated household detail. It is our hope that simplifying the problem of modelling the incidence of a tax and benefit system will facilitate increasingly detailed descriptions for households, improving the realism of simulated projections. Such a transition may prompt parallel advances in economic modelling, from macro agent-based models to life-cycle models, dynamic microsimulations, and dynamic programming models.

The suggested approach is demonstrated via practical application to a structural dynamic microsimulation model designed to reflect the UK policy context. Results are reported that show the method is capable of imposing a comparable computational burden, and generating similar simulated projections, to a detailed functional description of contemporary UK tax and benefits policy.

1.1. Scope for application

Economic studies based on dynamic projections of heterogeneity in the household sector often abstract from demographic detail. For instance, macroeconomic agent-based models (Axtell and Farmer, 2022) typically explore household-sector interactions with firms and banks via labour, capital, and product markets. In this context, agents in the household sector are typically distinguished by their skills/earnings potential, capital/savings, and the firms with which they interact; demographic characteristics are ignored.1 With household demographics ignored, these studies often focus on after-tax incomes, or adopt highly stylised representations of existing tax and benefits policy.2

Household demographic detail is usually omitted from dynamic models with heterogeneous agents to mitigate computational complexity and focus on the principal subjects of interest. Yet omitting this detail comes at a cost. Household characteristics that have been identified by the literature as helping to explain empirical regularities of labour/leisure and consumption/savings decisions (in addition to education, skills and wealth) include age, relationship status, dependent children and health.3 Omitting such demographic characteristics can consequently detract from the practical relevance of results, particularly in context of seminal trends in cohabitation, fertility, and healthy life-expectancy.

Given the potential for demographics to influence key margins of economic decision-making, demographic characteristics are likely to feature more prominently in prospective dynamic micro-analytic studies of the household. This trend was anticipated by Huggett (1996), who outlines a research programme (attributed to Atkinson, 1971), in which a ‘basic’ life-cycle framework is augmented to account for “(1) earnings, health and longevity uncertainty, (2) household structure, (3) institutional features such as social security, income taxation, and social insurance, and (4) market features such as borrowing constraints and the absence of some insurance markets” (p. 470). The order of the research programme outlined by Huggett (1996) is important, with “household structure” added before “institutional features”, thereby supplying the demographic detail upon which tax and benefit payments commonly depend.4

The method for projecting tax and benefits payments outlined in this study is designed to facilitate point (3) of the research programme set out by Huggett (1996). The objective is to simplify the task of reflecting realistic tax and benefit systems in a way that can be adapted to increasing micro-detail reflected by a simulation model (points 1 and 2 of Huggett). Taken together, these features can then be used to improve our understanding of “market features” referred to in point (4) by Huggett, extending ultimately to the workings of the macro-economy. How do caring responsibilities feed into economic insecurity? What are the implications of alternative funding models for tertiary education on inequality of lifetime incomes? What bearing has the evolution of tax and transfer policy on employment incentives? Realistic descriptions of tax and benefit policy lie at the heart of many questions such as the above, that are of both academic interest and of vital public importance.

1.2. Integration with third-party tax-benefit calculators

The method for projecting tax and benefit payments proposed in this paper was devised to permit a dynamic heterogeneous household-sector model to draw upon existing third-party tax-benefit calculators (sometimes referred to as static, non-behavioural microsimulation models), but it applies more generally to any micro-data source (including household surveys) that contains information on both gross and net incomes. Well-maintained tax-benefit calculators are now publicly available for many countries,5 and their development involves an appreciably different skill-set to the development of dynamic micro-analytic models more generally. Although the concept of combining complementary model functionality is far from new (Richiardi, 2017), we are aware of just a handful of examples where a static tax-benefit calculator has been integrated with a dynamic microsimulation context (O’Donoghue, 2001; Blundell et al., 2016; Blundell et al., 2021; Spielauer et al., 2020; Liégeois and Dekkers, 2014; Liégeois, 2021). At least two considerations may help to explain this lack of integration.

First, relying on third-party inputs for key model components implies a loss of control. For instance, a static microsimulation model for taxes and benefits may not be equipped to reflect intertemporal policy variation or explore potential subjects of interest in a dynamic model. Likewise, concerns may arise regarding the on-going maintenance of the candidate static simulation model that exaggerate risks associated with long-term research investments in dynamic model structures.

Secondly, integrating a dynamic model with an existing static model of taxes and benefits introduces practical difficulties concerning communications between possibly diverse model frameworks. These problems may appear technically insurmountable, especially where simulation times are a pressing concern (as in the dynamic programming literature – see Rust, 2008).

In a series of papers, Blundell and co-authors integrate dynamic programming models of household decision making with a stand-alone tax benefit calculator (Blundell et al., 2016; Blundell et al., 2021). The tax benefit calculator (Fortax – see Shaw, 2011) provides a functional description of the UK tax and benefit system. The authors directly “embed” this calculator within their structural models to explore the incentive effects of changes to tax and benefits policy.

Embedding one model structure within another, as described by Blundell et al., is arguably the most obvious way of integrating functionality from alternative model frameworks. However, the feasibility of such an approach depends upon model architecture(s) and programming language(s).

Liégeois (2021) introduces an interesting alternative perspective to this approach by exploring the feasibility of augmenting a tax-benefit calculator (EUROMOD, Sutherland and Figari, 2013) with dynamic model functionality based on the LIAM2 framework (de Menten et al., 2014). In the case of Liégeois (2021), interactions between the two considered model structures are facilitated by architectural similarities shared by EUROMOD and LIAM2. Unfortunately, complementarities of this sort can be difficult to identify ex ante, and difficult to maintain ex post. Furthermore, even where two models share similar programming architecture, they may differ along other dimensions – including periodicity of projected income flows and projected characteristics – which complicate associated integrations between them (see Liégeois for further discussion).

Spielauer et al. (2020) describe an approach for combining functionality from disparate models that avoids the complications discussed above. Specifically, Spielauer et al. document a continuous time dynamic microsimulation model that projects a range of individual-specific (non-financial) characteristics, including age, education, relationship status, fertility, and health. Spielauer et al. note that the dynamic population projections generated by their model can be used to define weights for imputing associated financial statistics from the tax benefit calculator EUROMOD via static aging methods.

Our approach shares with Spielauer et al. (2020) the objective of integrating third-party inputs for projecting tax and benefit payments within a dynamic framework. In common with Spielauer et al. (2020), the approach that we propose requires no direct (i.e. run-time) communication between alternative model structures. Furthermore, in common with Spielauer et al., the approach that we propose uses one model structure to generate a database that is read as input by another model structure.

However, whereas Spielauer et al. (2020) use a dynamic model to “age” the microdata generated by a static tax-benefit calculator (by adjusting sample weights), we envisage using the output of a tax-benefit calculator to impute transfer payments for each agent at each point in time projected by a dynamic model. More specifically, we propose using a third-party source to obtain a reference database that describes “family specific” characteristics, including private (pre-tax and benefit) and disposable (post-tax and benefit) income. This database is loaded into the dynamic heterogeneous household sector model and used in a similar fashion to a look-up table for evaluating transfer payments whenever needed, given assumptions about future policy developments.6

The approach that we suggest has a number of practical benefits. Relative to the approach of Spielauer et al., our approach facilitates the production of panel data rather than temporally dynamic cross-sectional snap-shots. As discussed above, the approach allows the model developer to delegate the task of capturing fiscal policy to dedicated specialists. Furthermore, the micro-data that we suggest be loaded as input into a dynamic model takes a standard form, facilitating reference to alternative static microsimulation models, in addition to commonly available survey microdata with the relevant income information.7 The approach can also be adapted to reflect temporal transitions, including growth of tax and benefit rates and thresholds, which may be important in dynamic contexts: most simply, changes in the tax-benefit rules over time can be accommodated by loading multiple year-specific look-up tables.

The remainder of the paper is organised as follows. Section 2 describes and justifies principal features of the proposed modelling approach. A practical implementation of the proposed approach is described in Section 3, and an empirical evaluation of the practical implementation is reported in Section 4. Section 5 concludes. Source code that implements the proposed approach is provided in the Appendix, as are details of how to replicate the reported analysis.

2. Proposed Methodology

The need to impute tax and benefit payments in a simulation model is comparable to a missing variable problem. “Hot deck” imputation is a common approach used to replace “missing values of one or more variables for a non-respondent (called the recipient) with observed values from a respondent (the donor) that is similar to the non-respondent with respect to characteristics observed by both cases” (Andridge and Little, 2010). Part of the appeal of hot deck imputation is that it delivers imputed values that are guaranteed to be feasible (with respect to the survey population) and mitigates some of the risks of misspecification associated with regression-based methods.8 It also facilitates ex post validation via comparison of the characteristics of donors and their recipients.

2.1. Theoretical framework

Consider the functioning of a real-world tax-benefit system as a collection of mappings from the characteristics of the fiscal unit, $X_i$, to some (positive or negative) transfer, $p_i$:

(1) $p_i = f(X_i)$

where $i$ is the fiscal unit (time subscripts are omitted to simplify notation). Modelling tax-benefit payments typically involves proxying Equation 1 by:

(2) $p_i = \tilde{f}(Z_i)$

where $Z$ represents the model proxy for characteristics $X$ and $\tilde{f}(.)$ is a simulation proxy for the real-world application of fiscal rules $f(.)$.

Projecting tax-benefit payments in a dynamic simulation model, $D$, involves coding Equation 2, where the arguments of the function are limited to those included in the model: $p_{D,i} = f_D(Z_{D,i})$. Similarly, a (static) tax-benefit calculator involves coding: $p_{S,i} = f_S(Z_{S,i})$.

Static tax-benefit calculators typically enjoy two key advantages, relative to dynamic simulation models, in reflecting real-world transfer policy. First, the set of characteristics used to impute transfer payments can generally be more flexibly chosen to reflect features of transfer policy, so that $Z_D \subseteq Z_S \subseteq Z$.9 This point must be qualified, however, as dynamic models may generate characteristics, like contributions histories, that have an important bearing on transfer payments but are poorly described by cross-sectional micro-data.10 Secondly, greater effort can be expended in a static tax-benefit setting in specifying assumed functions to capture real-world policy detail, such that: $|f_S(.) - f(.)| \leq |f_D(.) - f(.)|$.

Our approach envisages using a tax-benefit calculator to compute $p_{S,i}$ for a reference dataset, and then assigning tax-benefit payments in a dynamic model by using matching methods to draw on the reference dataset based on the common variables $Z_C \equiv (Z_D \cap Z_S)$. It should be clear from the above discussion that the ability of the suggested approach to capture realistic transfer payments will depend upon:

i) The comprehensiveness of the set of input variables considered by the static tax-benefit calculator, $Z_S$;

ii) The set of common variables used for imputation, $Z_C = Z_S \cap Z_D$;

iii) The realism of the static functional description, $f_S(.)$;

iv) The matching methods considered for imputation.

Concerning points (i) and (ii), where $Z_D \subseteq Z_S$ (as discussed above), then $Z_C \equiv Z_D$ so that the approach we envisage suffers no loss of detail in reflecting the incidence of transfer payments. In this case, the envisaged approach will benefit to the extent that the tax-benefit calculator implements a more realistic function than would otherwise be implemented in the dynamic model, $|f_S(.) - f(.)| \leq |f_D(.) - f(.)|$, addressing point (iii). The final point (iv) concerns the realism associated with methods for imputing missing data. Each of the steps of the suggested method is discussed in turn below.

2.2. Specification of the reference database

A common starting point for static tax-benefit models is a publicly available microdata set. This dataset describes the set of input variables denoted $Z_S$ in the preceding subsection. The input data include all of the information used by a static tax-benefit model to distinguish transfer payments between individual agents. “Agents” in this context typically refers to people, where the dataset also includes links defining a range of demographic relationships between them. These relationships commonly permit identification of nuclear families (benefit units) – between dependent children and their parents and between cohabiting spouses – and may also indicate where multiple nuclear families share the same accommodation (households).

In the UK, for example, both the Intra-Governmental Tax and Benefit Model (IGOTM, HM Treasury) and TAXBEN (Institute for Fiscal Studies) load in microdata for variables derived from the Living Costs and Food Survey, whereas the Policy Simulation Model (PSM, Department for Work and Pensions) and UKMOD (based on the EUROMOD framework) load in data from the Family Resources Survey. Static tax-benefit models typically generate a range of tax and benefit statistics for each agent represented in the starting microdata set, $I$, and report results in the form of an augmented data set, $(Z_{S,i}, p_i)_{i \in I}$.

The method we propose involves importing the augmented datasets generated by static microsimulation models – or equivalent survey data – for imputing tax and benefit payments in a dynamic model.11 The augmented datasets generated by static tax benefit microsimulation models tend to share common structural features, which facilitates switching between them. Furthermore, most static models of taxes and benefits include tools that permit analysis of user-defined case-studies (Hufkens et al., 2019), which may be used to tailor the starting database to key subjects of interest.

2.3. Matching methods

The econometric literature presents a range of alternative approaches that could be used to impute individual specific proxies for tax and benefit payments from a reference database.12 The approach proposed here uses matching methods to identify (donor) individuals in the reference database that possess comparable characteristics to (recipient) simulated agents, in common with hot deck imputation and the KNN (K-Nearest Neighbours) machine learning algorithm.

Best practice requires that the assumed matching method be tailored to the subject of interest (Stuart, 2010). Allocation of the treatment (evaluation of transfer payments by the static microsimulation model) is universal in the current context, which precludes use of propensity score matching.

The matching method should ensure that there is a strict delineation with respect to forms of population heterogeneity that have a substantial bearing on tax and benefit payments (e.g., adult marriage/cohabitation status, health status, and retirement), suggesting the use of exact matching methods over selected dimensions. Nevertheless, the method should also guarantee that a match is obtained for all conceivable simulated circumstances, which motivates interest in coarsened exact matching (Iacus et al., 2009).

Furthermore, the method should seek to limit differences in relation to a set of selected continuous variables. This last point recognises the need to approximate incomes, while acknowledging the impossibility of exact matches to all conceivable measures of income.

Finally, factors that affect tax and benefit payments but are not included in a model – such as imperfect benefits take-up, tax avoidance and tax evasion (whether sampled or simulated) – can result in noisy observations reported by a reference database. The matching method should consequently limit exposure to “high influence” observations (outliers).

In response to the above considerations, Spielauer et al. (2020) apply coarsened exact matching of discrete valued characteristics to adjust the weights of a database derived from a static tax-benefit model. This involves matching each observation in a dynamic simulation to an observation in the reference database that has the same set of discrete characteristics.

An advantage of this approach is that it can be readily tailored to the level of detail described by a dynamic model, and adapted as the simulated detail is expanded.

Furthermore, the approach permits considerable pre-processing to be conducted on the considered database, which helps to limit the computational overhead associated with the matching process. Specifically, it is possible to “pre-categorise” each observation in the reference database with respect to the feasible combinations of the discrete valued characteristics considered for matching. This implies that matching a dynamically generated individual involves identifying their respective subgroup, and then selecting an observation from the pre-categorised database observations of the respective group.
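The pre-categorisation step can be illustrated with a minimal sketch. The record layout, field names and values below are hypothetical and chosen only for illustration; they are not taken from any specific tax-benefit calculator.

```python
from collections import defaultdict

# Hypothetical reference database: one record per benefit unit.
# "transfer" is the net payment reported by the static model (GBP per week).
reference = [
    {"couple": False, "children": 0, "private_income": 310.0, "transfer": 45.0},
    {"couple": False, "children": 0, "private_income": 535.0, "transfer": -60.0},
    {"couple": True, "children": 2, "private_income": 720.0, "transfer": -35.0},
]

# Discrete characteristics used for exact matching (an assumption).
KEYS = ("couple", "children")


def pre_categorise(records, keys):
    """Group donor records by their combination of discrete characteristics."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[k] for k in keys)].append(record)
    return dict(groups)


# Done once, before the dynamic simulation starts.
groups = pre_categorise(reference, KEYS)

# At run time, matching a simulated unit reduces to one dictionary look-up.
simulated = {"couple": False, "children": 0, "private_income": 531.0}
candidates = groups[tuple(simulated[k] for k in KEYS)]
```

Because the grouping is built once, the cost of locating a simulated unit's subgroup does not grow with the size of the reference database.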

Here we propose extending on Spielauer et al. (2020) in two ways. We require a matching process that is guaranteed to identify a match for all individuals projected in a dynamic simulation while at the same time not wanting to sacrifice the degree of detail used for the matching process. Our first innovation to Spielauer et al. is designed to achieve this balance by using stratified coarsened exact matching over discrete valued characteristics. That is, we suggest use of multiple “levels” of exact matching, starting at the most fine-grained (discrete) set of matching criteria, and then only proceeding to more coarse grained matching criteria if a match is not otherwise identified.
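One way to read the stratified scheme is as an ordered list of matching levels that is searched from finest to coarsest. The sketch below is illustrative only: the characteristics, the number of levels and the function names are assumptions rather than the specification used in the paper.

```python
# Matching levels ordered from finest to coarsest (all names hypothetical).
LEVELS = [
    ("couple", "children", "age_band", "disabled"),
    ("couple", "children", "age_band"),
    ("couple", "children"),
    ("couple",),
    (),  # coarsest level: every donor is a candidate, so a match always exists
]


def build_index(donors, levels):
    """For each level, group donors by the characteristics used at that level."""
    index = []
    for keys in levels:
        groups = {}
        for donor in donors:
            groups.setdefault(tuple(donor[k] for k in keys), []).append(donor)
        index.append(groups)
    return index


def stratified_match(unit, index, levels):
    """Return (level, candidates) for the finest level that has any donor."""
    for level, (keys, groups) in enumerate(zip(levels, index)):
        candidates = groups.get(tuple(unit[k] for k in keys))
        if candidates:
            return level, candidates
    raise LookupError("the reference database is empty")


donors = [
    {"id": 1, "couple": False, "children": 0, "age_band": 2, "disabled": False},
    {"id": 2, "couple": True, "children": 2, "age_band": 3, "disabled": False},
]
index = build_index(donors, LEVELS)

# No donor shares the disability flag, so the match falls back to level 1.
unit = {"couple": True, "children": 2, "age_band": 3, "disabled": True}
level, candidates = stratified_match(unit, index, LEVELS)
```

Recording the level at which each match was found gives a simple diagnostic of how often the simulation relies on the coarser criteria.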

Our second innovation to Spielauer et al. (2020) recognises that continuous valued variables, like private income, are important for determining tax benefit payments in addition to discrete valued variables. Spielauer et al. do not account for continuous variables as their model did not project these. That is not the case in dynamic microsimulation models more generally, and we recommend nearest-neighbour matching where continuous variables that are relevant for determining tax and benefit payments are projected by a dynamic model.

Where the set of continuous variables that could be included for matching extends beyond a single characteristic (e.g. formal childcare costs in addition to private income, and/or income splitting), then we envisage use of a method to reduce the dimensionality of the problem to a single distance. Here we suggest use of Mahalanobis distance matching (MDM – see Rubin, 1980), although it may be noted that Euclidean distance is a more popular method considered elsewhere in the literature (e.g. the KNN algorithm).
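To make the distance calculation concrete, the following sketch evaluates squared Mahalanobis distances for the case of two continuous variables, using the closed-form inverse of a 2x2 covariance matrix. The donor values are hypothetical, and estimating the covariance from the candidate donors themselves is an assumption made for illustration.

```python
def covariance_2d(rows):
    """Sample variances and covariance of two columns (n - 1 denominator)."""
    n = len(rows)
    mean_x = sum(r[0] for r in rows) / n
    mean_y = sum(r[1] for r in rows) / n
    s_xx = sum((r[0] - mean_x) ** 2 for r in rows) / (n - 1)
    s_yy = sum((r[1] - mean_y) ** 2 for r in rows) / (n - 1)
    s_xy = sum((r[0] - mean_x) * (r[1] - mean_y) for r in rows) / (n - 1)
    return s_xx, s_xy, s_yy


def mahalanobis_sq(a, b, cov):
    """Squared Mahalanobis distance between two 2-vectors."""
    s_xx, s_xy, s_yy = cov
    det = s_xx * s_yy - s_xy ** 2  # must be non-zero for the inverse to exist
    dx, dy = a[0] - b[0], a[1] - b[1]
    # d' S^-1 d, with S^-1 = (1 / det) * [[s_yy, -s_xy], [-s_xy, s_xx]]
    return (s_yy * dx * dx - 2.0 * s_xy * dx * dy + s_xx * dy * dy) / det


# Hypothetical donors: (private income, childcare cost), both GBP per week.
donors = [(525.0, 0.0), (535.0, 40.0), (900.0, 10.0), (300.0, 80.0)]
target = (531.0, 35.0)

cov = covariance_2d(donors)
distances = [mahalanobis_sq(target, d, cov) for d in donors]
best = min(range(len(donors)), key=distances.__getitem__)  # index of nearest donor
```

Unlike Euclidean distance, this measure rescales each variable by its variance and accounts for correlation between variables, so the result does not depend on the units in which income and childcare costs are expressed.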

The computational overhead associated with matching continuous valued variables can also be economised in the pre-processing stage that is referred to above. Specifically, the tax database observations within pre-categorised subgroups can be stored in ascending order of the continuous variables included in the matching process. Further details concerning the practical implementation of the proposed matching procedure are provided for the example that is discussed in Section 3 (see Section 3.2.2).
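Where a single continuous variable is used, storing each subgroup in ascending order allows the nearest neighbour to be found by binary search rather than by scanning every donor. A minimal sketch using Python's standard `bisect` module follows; the income values and the function name are hypothetical.

```python
from bisect import bisect_left


def nearest_index(sorted_values, target):
    """Index of the value in an ascending list that is closest to `target`."""
    pos = bisect_left(sorted_values, target)
    if pos == 0:
        return 0
    if pos == len(sorted_values):
        return len(sorted_values) - 1
    below, above = sorted_values[pos - 1], sorted_values[pos]
    # Ties are resolved in favour of the lower value.
    return pos if (above - target) < (target - below) else pos - 1


# Private incomes (GBP per week) of donors in one pre-categorised subgroup,
# sorted once during pre-processing.
incomes = [310.0, 525.0, 535.0, 720.0]
idx = nearest_index(incomes, 531.0)
```

Each look-up then requires a number of comparisons that grows with the logarithm of the subgroup size.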

2.4. Imputing taxes and benefits

The problem of imputing taxes and benefits for a dynamic model is complicated by the potential for temporal variation of the policy environment. In context of trend (real) wage growth, for example, holding the description of taxes and benefits fixed through time can result in widespread bracket creep (Bohanon, 1983), and declining relevance of welfare safety-nets. This section begins by discussing issues relating to imputation of taxes and benefits at a point in time, before proceeding to discuss how the approach can be adapted to reflect alternative forms of temporal policy variation. The section finishes with discussion of how the method can be adapted to account for policy variation beyond the scope of a reference database.

2.4.1. Imputations at a point in time

Having matched the characteristics of a benefit unit in a dynamic model to those in a reference database, as discussed in Section 2.3, the objective is to impute taxes and benefits in a way that takes into consideration the potential for inexact matching of continuous variables, including private (pre-tax and benefit) incomes. Suppose, for example, that the set of continuous variables is limited to private income, and that a benefit unit in the dynamic model, denoted A, is projected to have private income worth £531 per week. Suppose also that the closest observation to this income described by the reference database (after coarsened exact matching), denoted B, was £535 per week. How should disposable income be imputed from the reference database to account for the £4 per week difference in private incomes?

One approach would be to match each benefit unit in the dynamic model to multiple units in the reference database and use interpolation methods for imputation. In the above example, the next closest observation described by the reference database, denoted C, might have private income of £525 per week (£6 per week less than our target individual). In this case, using linear interpolation would equate disposable income for unit A to 0.6 times disposable income of observation B plus 0.4 times disposable income of observation C.13
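The interpolation weights in this example follow directly from the relative positions of the three private incomes. The sketch below reproduces the arithmetic; the disposable incomes assigned to B and C are hypothetical values, since the text does not report them.

```python
def interpolate_disposable(x_a, x_b, y_b, x_c, y_c):
    """Linearly interpolate disposable income for unit A between donors B and C.

    x_* are private incomes and y_* are disposable incomes.
    Returns the interpolated value and the weight placed on donor B.
    """
    weight_b = (x_a - x_c) / (x_b - x_c)
    return weight_b * y_b + (1.0 - weight_b) * y_c, weight_b


# Private incomes from the text: A = 531, B = 535, C = 525 (GBP per week).
# Disposable incomes of 480 (B) and 474 (C) are assumed for illustration.
y_a, weight_b = interpolate_disposable(531.0, 535.0, 480.0, 525.0, 474.0)
```

Here the weight on B is (531 - 525) / (535 - 525) = 0.6, leaving 0.4 for C, as stated above.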

We reject the interpolation approach described above in favour of an approach that permits each simulated unit to be matched to a single unit in the reference database. The motivation for this preference is that it facilitates post-simulation diagnosis and validation of imputed transfer payments.14

Given the above, the problem is how to impute taxes and benefits for a simulated unit from the taxes and benefits reported by a reference database for a (single) matched unit in a way that adjusts for potential differences in private incomes. Where the private incomes of both matched observations (simulated and database) are non-zero, transfer payments can be imputed by assuming that the ratio of transfer payments to private incomes is the same for matched individuals. That is, if $x_i$ and $p_i$ denote, respectively, the private income and transfer payment of individual $i$, then $p_D = \frac{p_S}{x_S} x_D$, with $D$ denoting the unit simulated in the dynamic model and $S$ the respective matched unit from the reference database.

As $x$ tends toward zero, the fact that $p/x$ tends toward infinity complicates use of the ratio suggested above for imputation. We consequently propose a two-step process for matching individuals by private income. In the first step, coarsened exact matching is used to group individuals by broad income bands; e.g. low, middle, and high private incomes. In the second step, weighted nearest-neighbour matching is employed. For individuals in the lowest private income grouping, transfer payments of simulated individuals are set equal to those of the matched individuals identified from the reference database, appropriately uprated to consider time differences between the static and dynamic models; that is, $p_D = p_S$. Otherwise, the ratio method described in the preceding paragraph is used.
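The resulting imputation rule can be sketched as a short function. The threshold that separates the lowest income band, the function name and the example values are all assumptions made for illustration, and uprating between the static and dynamic models is omitted for brevity.

```python
def impute_transfer(x_d, x_s, p_s, low_income_threshold):
    """Impute the transfer payment of a simulated unit from its matched donor.

    x_d: private income of the simulated unit
    x_s: private income of the matched donor in the reference database
    p_s: transfer payment reported for the donor
    low_income_threshold: upper bound of the lowest income band (assumed)
    """
    if x_d <= low_income_threshold or x_s <= low_income_threshold:
        # Lowest band: copy the donor's payment to avoid an unstable ratio.
        return p_s
    # Otherwise assume the ratio of transfers to private income is shared.
    return p_s / x_s * x_d


# Simulated unit with 531 per week matched to a donor with 535 per week who
# pays 107 per week net (a negative transfer); the values are hypothetical.
p_ratio = impute_transfer(531.0, 535.0, -107.0, low_income_threshold=50.0)

# Simulated unit with no private income: the donor's payment is copied.
p_copy = impute_transfer(0.0, 10.0, 150.0, low_income_threshold=50.0)
```

Checking the donor's income as well as the simulated unit's is a defensive choice in this sketch; under coarsened exact matching by income band, both units would normally fall in the same band.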

2.4.2. Accommodating policy variation through time

If the policy environment is assumed constant, the use of a single tax database as described above is sufficient to model the evolution of taxes and benefits through time.15 However, our method also accommodates (real) policy variation over time. It is (conceptually) possible to consider a separate reference database for an arbitrarily large number of discrete time periods. This approach provides the analyst with tight control over the assumed temporal evolution of tax and benefit policy.16

Between the two extremes of using the same reference database for all simulated years and a different reference database for each year, intermediate solutions are also possible that trade off flexibility for speed of execution. In simulation contexts where execution speed remains a concern, changes between policy systems can be implemented as discrete jumps (i.e. the most recent policy system is used, with a step change when a new policy system is introduced), or smoothed using interpolation methods.17 Suppose, for example, that $y_{i,t}$ denotes disposable (post-tax and benefit) income of unit $i$ at time $t$. Two databases are used to impute disposable income, one that would apply at time $t=0$, and another at time $t=T$. Then assuming geometric interpolation between the two databases, we have:

$y_{i,t} = y_{i,0} \left( \frac{y_{i,T}}{y_{i,0}} \right)^{t/T}$

This approach can be implemented without increasing the computational demands associated with evaluation of database matching, by limiting variation between the two reference databases to the reported taxes and benefits.18 An added advantage of this limitation is that it would facilitate post-simulation diagnosis of imputed transfer payments, as each unit in the dynamic model would be matched to the same observation in both reference databases.
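The geometric interpolation between two reference databases can be sketched as follows. The function name and the numerical values are hypothetical, and the sketch assumes that disposable income is strictly positive in both databases.

```python
def geometric_interpolation(y_0, y_T, t, T):
    """Disposable income at time t, interpolated geometrically between
    the values imputed from the databases for t = 0 and t = T.

    Assumes y_0 > 0 and y_T > 0, so that the ratio and power are well defined.
    """
    return y_0 * (y_T / y_0) ** (t / T)


# The same matched donor reports 400 per week under the t = 0 system
# and 484 per week under the t = 2 system (illustrative values).
y_mid = geometric_interpolation(400.0, 484.0, t=1, T=2)
```

In this example the implied growth factor is 1.1 per period, so the interpolated value at t = 1 is 440 per week, and the function returns the database values themselves at t = 0 and t = T.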

2.4.3. Accommodating policy variation beyond the scope of a reference database

Many dynamic simulations are currently concerned with features of the evolving economy that may not be accommodated in a reference database. The incidence and costs of long-term care in an aging population, costs of aging public infrastructure, and environmental degradation are all currently subjects of keen public interest, and effects of interest are largely beyond the scope of most existing static tax-benefit models.

Where analysis focusses on fiscal policy that is beyond the scope of a reference database, then it remains possible to augment transfer payments imputed from the database with internally programmed functions that represent the subjects of interest. Suppose, for example, that we are interested in the fiscal reforms necessary to support the costs of long-term care. We have access to a static tax-benefit model (e.g. EUROMOD) that provides a detailed description of the direct tax and transfer system, but omits long-term care costs. Suppose also that we are able to obtain functions that provide plausible projections for the costs of long-term care.

In this context, we can use a database derived from the static tax-benefit model to project much of the tax and transfer system, and supplement this with the functional descriptions for the costs of long-term care. Given the projected costs of long-term care, we can then explore alternative fiscal reform packages capable of supporting the long-term care costs. These reform packages could be explored by obtaining alternative counterfactual databases from the static tax-benefit model, or from additional functions devised specifically to reflect the fiscal reforms of interest. The point is that use of a database to impute taxes and benefits does not preclude the use of dedicated functions to reflect particular fiscal policies of interest.

3. Practical Implementation

LINDA is a dynamic microsimulation model designed to reflect the contemporary UK policy context. The model includes a detailed functional description of the UK tax-benefit system in April 2016 including assumptions for how benefit rates and tax thresholds evolve through time.19 The current section describes how LINDA was adapted to undertake a validation analysis of the proposed method for projecting tax and benefit payments, which is reported in Section 4.

Following a standard validation strategy, we assume that the functional description of tax and benefit policy included in LINDA is correctly specified, and that the distributions of taxes and benefits simulated using the model are “pseudo-true”. We then used data generated by the model based on the functional description for taxes and benefits to create a tax database. This database reflects the type of data that could be obtained from a standard tax-benefit microsimulation model, such as UKMOD. Referring to Equations 1 and 2 in Section 2.1, $f(X_i) = f_D(Z_{D,i}) = f_S(Z_{S,i}) = p$, and $X_i = Z_{D,i} = Z_{S,i}$. Finally, LINDA was amended to project tax and benefit payments based upon the tax database using the methodology outlined in Section 2.

The remainder of this section provides some further technical detail of LINDA and how the methodology described in Section 2 was practically implemented. Section 4 returns to describe how the implementation described here was used to validate the proposed method for projecting transfer payments.

3.1. Model overview

LINDA presents a useful case-study for considering the proposed method for imputing tax and benefit payments because the model is expressly designed to explore behavioural responses to policy alternatives, and associated computational burden is a pressing issue of concern.20 As LINDA is not the focus of the current paper, a brief overview of relevant aspects of the model is provided here, and the interested reader is referred to van de Ven (2017a) for technical detail.

LINDA projects panel data at annual intervals for an evolving sample of simulated adults. The decision-making unit assumed by the model is the “benefit unit”, defined as a single adult or partner couple and their dependent children. The specification of the model considered for the current study projects consumption, private pension scheme participation, and labour supply decisions as though these are made to maximise expected lifetime utility, where utility takes a nested Constant Elasticity of Substitution form.

The decision of whether to contribute to a private pension in each year is limited to individuals who choose to work, and pension contributions are defined as a fixed proportion of labour income. Labour supply is selected from three discrete alternatives for each simulated adult in each year, representing full-time, part-time, and non-employment. The decision to supply labour involves a trade-off between leisure time and cash available for (non-durable) consumption. Consumption is chosen with respect to a standard budget constraint, with the upper bound to (unsecured) loans defined by each individual’s minimum net present value of all potential future streams of disposable (post-tax and benefit) income.

The model projects relationship status, fertility, and mortality for each adult in each year, all of which are considered to evolve with uncertainty from one year to the next. Wage potential is assumed to evolve as a random walk with drift, and pension and non-pension wealth both evolve with certainty based on standard accounting identities.

Starting with data reported by the Wealth and Assets Survey for the UK population cross-section in 2017, the model is designed to generate panel data forward and backward through time. Entry and exit from the simulated population are designed to reflect forward projections for the evolving population cross-section. The model was parameterised to survey data reported up to 2017, following the method described in van de Ven (2017b).

3.2. Practical implementation of proposed approach

LINDA was amended for the current study to allow taxes and benefits to be projected using the database approach described in Section 2. The revised code permits taxes and benefits to be projected by LINDA using a wide range of data sources, including databases supplied by IGOTM and UKMOD. Details of the implementation are provided below, and associated programming code can be accessed as described in Appendix A.2.

3.2.1. Importing the tax-benefit database

LINDA was adapted with parameters and routines that permit data to be imported for up to two reference databases. Data for each database can be supplied in up to three files, which may distinguish between alternative units of analysis (household, tax unit, and individual level data). The files are assumed to be saved in comma-separated values (CSV) format, as this format is supported by a wide range of statistical software packages. The first row of each file is assumed to provide variable names, which are used by LINDA to manipulate the data.
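A minimal sketch of reading one such file is given below, assuming numeric columns throughout. The variable names and values in the sample are invented for illustration and do not correspond to the variables used by LINDA, IGOTM, or UKMOD.

```python
import csv
import io


def load_reference_file(handle):
    """Read a CSV file whose first row holds variable names.

    Returns the list of variable names and one dict of floats per record.
    """
    reader = csv.DictReader(handle)
    records = [{name: float(value) for name, value in row.items()} for row in reader]
    return reader.fieldnames, records


# Hypothetical benefit-unit level file (made-up names and values).
sample = io.StringIO(
    "unit_id,private_income,net_tax\n"
    "1,250.0,31.5\n"
    "2,710.0,160.0\n"
)
names, records = load_reference_file(sample)
```

Keying each record by the header names means that later routines can refer to variables by name rather than by column position.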

3.2.2. Matching methods

As discussed in Section 2.3, multiple “levels” of matching are used to balance the countervailing objectives of obtaining matches between simulated and database observations that are as similar as possible, while also guaranteeing that a match is obtained for all observations generated by a dynamic microsimulation. LINDA uses benefit unit specific characteristics to identify three nested “coarsened exact matching levels”. The first matching level, which is most detailed, distinguishes between:

  • three age categories for the benefit unit reference person: under 45 (child-rearing), under state pension age, and post state pension age

  • relationship status of the benefit unit: single, couple

  • number of children under 5 (schooling age) in the benefit unit (maximum 2)

  • number of children aged 5+ in the benefit unit (maximum 3)

  • renter status (distinguishing renters from non-renters)

  • three labour categories for each adult in the benefit unit (not employed, part-time employed, full-time employed)

  • and three private income categories for the benefit unit: under £225 per week (low), under £710 per week (middle), £710+ per week (high).

The second (intermediate) matching level considers the same population division as the first, but aggregates the two lower age categories, considers a maximum of one child under schooling age, does not distinguish between renters and non-renters, and distinguishes only between employed and not employed adults. This reduction in detail considered for the matching process helps to match a wider set of observations projected by the dynamic model, at the expense of accepting greater diversity between matched observations. Finally, the third (most coarse) matching level considers the same population categorisation as the second but does not distinguish between child age (subject to a maximum of 3 children) and ignores differences by employment status.
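The three nested levels can be expressed as functions that map a benefit unit to a subgroup key. The sketch below is one possible reading of the description above: the field names are hypothetical, the state pension age of 67 follows the notes to Table 1, and the cap of three children aged 5+ is assumed to carry over unchanged to the second level.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BenefitUnit:
    age: int                  # age of the reference person
    couple: bool
    children_under5: int
    children_5plus: int
    renter: bool
    labour: Tuple[int, ...]   # per adult: 0 not employed, 1 part-time, 2 full-time
    private_income: float     # pounds per week


def income_band(income):
    """0 = low (under 225), 1 = middle (under 710), 2 = high (710+)."""
    return 0 if income < 225 else (1 if income < 710 else 2)


def age_band(age, state_pension_age=67):
    """0 = under 45, 1 = under state pension age, 2 = state pension age or over."""
    return 0 if age < 45 else (1 if age < state_pension_age else 2)


def matching_key(unit, level):
    """Subgroup key at matching level 1 (most detailed), 2, or 3 (most coarse)."""
    band = income_band(unit.private_income)
    if level == 1:
        return (age_band(unit.age), unit.couple,
                min(unit.children_under5, 2), min(unit.children_5plus, 3),
                unit.renter, unit.labour, band)
    past_pension_age = age_band(unit.age) == 2   # two lower age bands merged
    if level == 2:
        employed = tuple(status > 0 for status in unit.labour)
        return (past_pension_age, unit.couple,
                min(unit.children_under5, 1), min(unit.children_5plus, 3),
                employed, band)
    if level == 3:
        children = min(unit.children_under5 + unit.children_5plus, 3)
        return (past_pension_age, unit.couple, children, band)
    raise ValueError("level must be 1, 2 or 3")
```

Two single adults who differ only in age band, renter status, and hours worked receive different keys at the first level but share a key at the second, which is how the coarser levels widen the set of admissible donors.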

It may be noted that the matching method described above makes no allowance for a range of factors that commonly influence transfer payments, including contribution and wage histories, health status, care needs, and disability.21 The set of characteristics included for the coarsened exact matching reflects those that are explicitly distinguished by the specification of LINDA considered here (discussed in Section 3.1). As discussed in Section 2, any influence that omissions from the matching routine have on tax and benefit payments described by a reference database will appear as random innovations in the transfer payments simulated by the model. The effects of these features will consequently be indistinguishable from (unexplained) imperfect benefits take-up, tax avoidance, and tax evasion.

Step 1: Starting with the first matching level described above, LINDA checks whether the respective subgroup implied by an observation’s characteristics, as reported by the reference database, is non-empty. If so, then the model proceeds to the next step in the procedure; if not, then the second, and finally the third matching level is consulted to find a matching sub-population. The third matching level is designed to be sufficiently crude that all implied subgroups are non-empty.

Step 2: LINDA searches through the matching subgroup identified in step (1) to select a “candidate pool” of observations that share the closest taxable incomes to the targeted individual. This step identifies a pool of candidates, rather than the single closest individual, to mitigate difficulties that might otherwise arise due to disparate measures of taxes and benefits reported for otherwise similar individuals by the reference database (due, for example, to imperfect take-up rates, tax avoidance, or tax evasion).

Candidates are drawn from the subgroup identified in step (1) based on the proximity of their taxable incomes with respect to the targeted individual. Up to four measures of private income are accommodated for individuals within the “candidate pool”. Each candidate included in the pool is associated with a selection weight that combines their sample weight in the reference database with a factor that is inversely proportional to the difference between their taxable income and that of the targeted individual (subject to an income disregard), and to the total number of candidates.

Consider, for example, a targeted individual with taxable income of 10, where the coarsened exact matched subgroup from the reference database (identified in step 1) is comprised of individuals with taxable incomes in the set (6, 7, 9, 11, 12, 12, 14). Then, only individuals in the matched subgroup with incomes in the set of four income values (7, 9, 11, 12) would be included in the candidate pool.22 In this case, with an income disregard of 1 and equal sample weights, the selection weight would be 0.1 for the candidate with income 7, 0.3 for each of the candidates with incomes 9 and 11, and 0.15 for the two candidates with income of 12.23
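The weights in this example can be reproduced with a short calculation. The sketch below rests on two assumptions that are consistent with the reported figures but are not confirmed by the text shown here: the pool keeps the four distinct income values nearest to the target, and each raw weight equals the sample weight divided by the larger of the absolute income gap and the income disregard, normalised to sum to one.

```python
def candidate_pool(target_income, subgroup_incomes, n_values=4):
    """Keep candidates whose income is among the n_values distinct values nearest the target."""
    distinct = sorted(set(subgroup_incomes), key=lambda x: (abs(x - target_income), x))
    keep = set(distinct[:n_values])
    return [x for x in subgroup_incomes if x in keep]


def selection_weights(target_income, pool_incomes, sample_weights=None, disregard=1.0):
    """Weight = sample weight / max(|income gap|, disregard), normalised to sum to one."""
    if sample_weights is None:
        sample_weights = [1.0] * len(pool_incomes)
    raw = [w / max(abs(x - target_income), disregard)
           for x, w in zip(pool_incomes, sample_weights)]
    total = sum(raw)
    return [r / total for r in raw]


pool = candidate_pool(10, [6, 7, 9, 11, 12, 12, 14])
weights = selection_weights(10, pool)
print(pool)                            # [7, 9, 11, 12, 12]
print([round(w, 2) for w in weights])  # [0.1, 0.3, 0.3, 0.15, 0.15]
```

The raw weights are 1/3, 1, 1, 1/2 and 1/2, which sum to 10/3; dividing through gives the selection weights quoted in the text.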

Step 3: Given the candidate pool and associated selection weights as described in step (2), LINDA is adapted to use one of two methods to impute transfer payments. It can include all candidates for the imputation, with the contribution of each candidate defined by their respective weight. Alternatively, it can select a single donor by random draw for each individual at each point in time, with probabilities of selection reflecting each candidate’s selection weight.
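The two imputation options in step (3) might be sketched as follows, given the transfer payments reported for each candidate and the associated selection weights. The function names are hypothetical, and the weights are assumed to sum to one.

```python
import random


def impute_weighted(transfers, weights):
    """Option 1: weighted average of transfer payments over all candidates."""
    return sum(t * w for t, w in zip(transfers, weights))


def impute_random_donor(transfers, weights, rng):
    """Option 2: draw a single donor with probability given by its selection weight."""
    return rng.choices(transfers, weights=weights, k=1)[0]


# Hypothetical transfer payments for a pool of five candidates.
transfers = [10.0, 20.0, 20.0, 30.0, 30.0]
weights = [0.1, 0.3, 0.3, 0.15, 0.15]
average = impute_weighted(transfers, weights)
donor_value = impute_random_donor(transfers, weights, random.Random(0))
```

The weighted average smooths over differences between candidates, whereas the random draw preserves the dispersion of payments observed in the reference database at the cost of adding simulation noise.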

3.2.3. Optimising the matching method

LINDA starts by distinguishing, for each “matching level” (of three, see Section 3.2.2), the coarsened exact matching subgroup of each observation described by data loaded in for a reference database. Within each subgroup, observations are also ranked by their respective private (pre-tax and benefit) incomes. These results are stored in a three-dimensional matrix, $M(k,g,r)$, distinguishing the matching category, $k \in \{1,2,3\}$, the category subgroup, $g$, and the income rank, $r$.

Consider any observation projected by LINDA, $i$, for which tax and benefit payments need to be imputed. Starting from the most detailed matching level, $k=1$, and the vector of observation $i$’s characteristics (age, relationship status, dependent children, etc.), $v_i$, a function is used to identify the respective subgroup, $g_{k,i}=f(k,v_i)$. If the respective matching subgroup is empty, $M(k,g_{k,i},\cdot)=0$, then the next level, $k+1$, is considered. Otherwise, starting from $j=1$, LINDA compares the private income of observation $i$, $x_i$, with the private income of the database observation $M(k,g_{k,i},j)$, $x_j$. If $x_i>x_j$, then LINDA proceeds to observation $j+1$. Nearest neighbours to include in the candidate pool are selected about the first rank where $x_i \le x_j$ or where the matching subgroup $M(k,g_{k,i},\cdot)$ is exhausted (whichever comes first).
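A compact sketch of this lookup is given below. It departs from the description in two labelled ways: the matrix is represented by nested dictionaries rather than a three-dimensional array, and the rank is located by binary search rather than by the sequential comparison described above (both return the first rank at which the target income does not exceed the donor income). All names, including the `income` field, are hypothetical.

```python
from bisect import bisect_left


def build_index(records, key_functions):
    """index[k][g] -> (incomes, members), both ordered by private income."""
    ordered = sorted(records, key=lambda record: record["income"])
    index = {}
    for level, key_of in key_functions.items():
        groups = {}
        for record in ordered:
            groups.setdefault(key_of(record), []).append(record)
        index[level] = {g: ([m["income"] for m in members], members)
                        for g, members in groups.items()}
    return index


def locate(index, unit, key_functions):
    """Return (level, members, rank) for the first non-empty matching subgroup.

    rank is the first position j with unit income <= donor income, or
    len(members) when the subgroup is exhausted.
    """
    for level in sorted(index):
        incomes, members = index[level].get(key_functions[level](unit), ([], []))
        if members:
            return level, members, bisect_left(incomes, unit["income"])
    raise LookupError("no non-empty subgroup at any matching level")
```

Because the subgroups and income ranks are computed once when the reference database is loaded, each imputation during the simulation reduces to a dictionary lookup and a search within a single ordered subgroup.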

4. Validation

As previously mentioned, we assume that the specification of LINDA based on a functional description of tax and benefit payments is correct, and that associated simulated data derived from the model are “pseudo-true”. Given these assumptions, we use projections from LINDA to create a tax database as described in Section 2.1. This tax database mimics information that could be derived from survey data or a standard tax-benefit calculator, such as UKMOD.

4.1. Analytical approach

The starting point for the current analysis is a functional description of tax and benefit payments that was implemented in LINDA. This function was designed to reflect UK policy applicable in April 2016, focussing on variation of payments with respect to the number and age of household members. Tax thresholds and retirement benefits were also assumed to evolve through time in line with real wage growth (1.29% per annum, see Appendix A.1 for details). Given this specification for tax and benefit payments, LINDA was used to simulate panel data for adults in the evolving UK population cross-section between 2017 and 2081. This simulation is referred to below as the “function base scenario”.

After evaluating the function base scenario, a “function reform scenario” was simulated. The function reform scenario was specified to be identical to the base scenario, except that all rates of income tax were increased by 10 percentage points. Effects of this reform were then evaluated by subtracting summary statistics evaluated for the function base scenario from the same statistics evaluated for the function reform scenario.

The above analysis was then repeated under identical assumptions, apart from replacing the internally programmed tax and benefit function with database approximations as described in Sections 2 and 3. This involved identifying reference databases for two arbitrarily selected years (2017 and 2057) for each tax and benefit function (base and reform). Two reference databases were required for each tax (and benefit) function to capture the real growth assumed for tax thresholds and retirement benefits, following the approach described in Section 2.3.2.

All four reference databases (base and reform, 2017 and 2057) were populated from the panel data generated assuming the “function scenarios” described above. For example, the database for 2017 used to approximate the function base scenario was populated with data generated for 2017 under the function base scenario.

This analysis was carefully designed to facilitate evaluation of the database method for imputing tax and benefit payments. First, imputing tax and benefit payments using databases populated with data derived from specific tax and benefit functions permits clear conclusions to be drawn. That is, any differences between projections generated using the function and database descriptions for tax and benefit payments can be interpreted unambiguously as distortions associated with the database method. Such differences arise because of imperfect matches between the simulated agents and records described by the database, and because the variables used for matching donors and recipients do not include all of the arguments of the considered tax functions (Section 2.1, point iv).

Secondly, the fact that the assumed tax function allows for trend variation in tax thresholds and retirement benefits allows the current analysis to assess the efficacy of the method outlined in Section 2.3.2 for accommodating temporal variation using the database approach.

Finally, use of a structural model of decision-making to consider the effects of a policy reform permits an assessment of the ability of the database method to capture behavioural incentives. Such incentives are interesting to consider here because they are likely to reflect margins of policy detail of interest.

The current validation exercise consequently focusses on the ability of matching methods to reflect a known functional description for transfer payments. This focus concerns the key weakness of the proposed methodology, relative to functional descriptions for transfer policy. It is consequently important to bear in mind that the current analysis does not concern the relative advantages that motivate the proposed methodology: greater flexibility in defining input data for a dedicated static tax-benefit model (points i and ii in Section 2.1), and delegation of functional representation of evolving tax and benefit policy to policy specialists (iii in Section 2.1).

Simulation run-times

Relative to the functional description for transfer payments, the database method for imputing net transfer payments required almost identical run-times, for both the base and reform scenarios. Related statistics are reported in Appendix C.1.

4.2. Transfer payments for a population cross-section

A series of test statistics for alternative population cross-sections were evaluated to explore the correspondence between net tax and benefit payments imputed from the functional and database descriptions for policy. Representative statistics from this analysis are reported in Table 1, focussing here on simulated data projected by LINDA for 2037. The year 2037 is interesting, because it is intermediate to the two years for which database descriptions for policy were explicitly supplied (2017 and 2057). The correspondence between the functional and database descriptions for tax and benefit payments evaluated for 2037 consequently reflects both the description of policy in the considered reference years (2017 and 2057), and the (geometric) interpolation methods applied between them (described in Section 2.3.2).
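To illustrate why an intermediate year is a demanding test, the sketch below shows a generic geometric interpolation between the two reference years. It is not the routine described in Section 2.3.2, which is not reproduced here; the function name and the threshold of 100 are hypothetical, and the growth rate of 1.29% per annum is taken from Section 4.1. The formula applies to strictly positive quantities only.

```python
def geometric_interpolate(value_start, value_end, year, year_start=2017, year_end=2057):
    """Interpolate along a constant-growth path between two reference years."""
    share = (year - year_start) / (year_end - year_start)
    return value_start ** (1.0 - share) * value_end ** share


growth = 0.0129                      # real wage growth per annum (Section 4.1)
threshold_2017 = 100.0               # hypothetical tax threshold, pounds per week
threshold_2057 = threshold_2017 * (1.0 + growth) ** 40
threshold_2037 = geometric_interpolate(threshold_2017, threshold_2057, 2037)
```

For a quantity that grows at a constant rate, such as the hypothetical threshold here, the interpolated value for 2037 coincides with the value obtained by compounding growth for 20 years, so any remaining discrepancy in 2037 would originate elsewhere.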

Table 1
Correspondence between net tax burden imputed using functional and database descriptions for transfer payments of simulated population cross-section in 2037.

| | mean | correlation coefficient | mean difference | standard deviation of difference | mean absolute difference |
|---|---|---|---|---|---|
| base policy scenario | | | | | |
| full population | 84.26 | 0.930 | -12.07 | 98.68 | 36.93 |
| single working aged no children | 44.63 | 0.960 | 1.01 | 58.10 | 23.27 |
| single working aged with children | 84.73 | 0.971 | 3.50 | 104.34 | 55.34 |
| couple working aged no children | 212.75 | 0.816 | -72.01 | 191.77 | 85.24 |
| couple working aged with children | 307.99 | 0.957 | -0.32 | 128.66 | 57.83 |
| single pensioner | 2.39 | 0.807 | -13.52 | 45.88 | 22.00 |
| couple pensioner | 33.36 | 0.786 | -22.34 | 70.08 | 26.08 |
| reform scenario (all rates of income tax increased by 10 percentage points) | | | | | |
| full population | 105.64 | 0.936 | -17.34 | 112.70 | 43.10 |
| single working aged no children | 57.57 | 0.947 | -1.67 | 66.99 | 26.96 |
| single working aged with children | 121.36 | 0.974 | -2.28 | 119.98 | 62.81 |
| couple working aged no children | 245.93 | 0.868 | -87.95 | 214.95 | 101.98 |
| couple working aged with children | 384.35 | 0.961 | -10.82 | 153.67 | 68.49 |
| single pensioner | 5.71 | 0.860 | -14.67 | 47.00 | 23.86 |
| couple pensioner | 41.68 | 0.847 | -27.79 | 82.25 | 32.27 |

Source: Authors’ calculations using simulated data.

Notes: working age adults are aged between 18 and 66 years; pensioners are aged 67 and over; children include all dependents aged 0 to 17 years; all income is measured in £ per week. "mean" reports sample arithmetic averages for net taxes evaluated using the functional description for transfer payments. "correlation coefficient" reports correlations between net tax payments evaluated using the functional and database descriptions for transfer payments. "mean difference" reports sample averages for net taxes evaluated using the database description for transfer payments less net taxes evaluated using the functional description for transfer payments. "standard deviation of difference" reports sample standard deviations of the same differences. "mean absolute difference" reports sample averages for absolute differences between net taxes evaluated using the functional and database descriptions for transfer payments.
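The statistics defined in the notes to Table 1 can be computed as in the sketch below. It assumes unweighted observations and a sample standard deviation with an n - 1 divisor; whether the reported figures apply sample weights is not stated, and the function name is hypothetical.

```python
from math import sqrt


def correspondence_stats(functional, database):
    """Summary statistics comparing net taxes under the two descriptions of policy."""
    n = len(functional)
    diffs = [d - f for f, d in zip(functional, database)]   # database less functional
    mean_f = sum(functional) / n
    mean_d = sum(database) / n
    mean_diff = sum(diffs) / n
    cov = sum((f - mean_f) * (d - mean_d) for f, d in zip(functional, database))
    var_f = sum((f - mean_f) ** 2 for f in functional)
    var_d = sum((d - mean_d) ** 2 for d in database)
    return {
        "mean": mean_f,
        "correlation": cov / sqrt(var_f * var_d),
        "mean_difference": mean_diff,
        "sd_difference": sqrt(sum((x - mean_diff) ** 2 for x in diffs) / (n - 1)),
        "mean_abs_difference": sum(abs(x) for x in diffs) / n,
    }
```

Differences of opposite sign cancel in the mean difference but not in the mean absolute difference, which is why the latter can be sizeable even where the former is close to zero.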

The statistics reported in Table 1 indicate a reasonably close correspondence between the measures of net transfer payments imputed using functional and database descriptions for the policy environment, for both the base and reform scenarios. Correlation coefficients between the measures of net transfer payments are between 0.93 and 0.98 for the full population cross-section and all population subgroups, except for those over state pension age and working couples with no children.24

Correlation coefficients are smaller for the population over state pension age (67) primarily due to the smaller net transfer payments identified for this population subgroup. In the case of working couples without children, the matching methods considered here are complicated by income splitting that is important in the UK, where taxes are levied on individual incomes. In related work, we have consequently extended the matching routine considered here to account for individual incomes via the incorporation of Mahalanobis distance metrics, as discussed in Section 2.3 (see Bronka et al., 2023).

The mean and (sample) standard deviations for differences between the database and functional measures for transfer payments indicate that these differences are not significantly different from zero for any of the population subgroups at any conventional confidence level. Although mean absolute differences are higher for the reform scenario, relative to the base, they are smaller when the respective statistics are expressed as percentages of the sample means for net taxes evaluated under the functional description for policy (leftmost column of the table).
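As a worked illustration of the last point, the calculation below expresses the mean absolute difference as a percentage of the mean, using the full-population rows of Table 1 only.

```python
# (mean, mean absolute difference) for the full population in Table 1, pounds per week
rows = {"base": (84.26, 36.93), "reform": (105.64, 43.10)}

relative = {name: 100.0 * mad / mean for name, (mean, mad) in rows.items()}
for name, value in relative.items():
    print(f"{name}: {value:.1f}%")
# base: 43.8%
# reform: 40.8%
```

The absolute gap widens under the reform, but the mean net tax burden rises by proportionally more, so the gap shrinks in relative terms for the full population.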

4.3. Base policy projections

Simulating a base policy context, representing a “status-quo” scenario, is a common starting point considered for analysing the effects of a policy reform. Although the statistics reported in Table 1 indicate that, all else equal, the two alternative methods for imputing tax and benefit payments considered here are similar, they are not identical. Here we explore the practical bearing that the alternative methods for projecting taxes and benefits had on simulated profiles under the base policy scenario.

Figure 4.1 displays age specific arithmetic averages for selected characteristics of simulated adults born between 1980 and 1989. Consideration of multiple birth cohorts increases the sample of individuals contributing to the reported statistics. This helps to filter out noise in the projected profiles: projections obtained using data of single-year birth cohorts are qualitatively similar to those reported here. To further clarify, averages displayed in Figure 4.1 for age 40 are based on simulated data for year 2020 for the oldest contributing birth cohort (born in 1980), and 2029 for the youngest birth cohort.

Selected age-specific simulated population averages projected under the base policy scenario for cohorts born between 1980 and 1989, by method of transfer payment imputation. Source: Authors’ calculations using simulated data. Notes: Age specific population averages evaluated for population cohorts born between 1980 and 1989. All financial figures reported in 2017 prices. The two series reported in each panel are distinguished by the method used to impute tax and benefit payments. The “base policy scenario” refers to an assumed status-quo policy environment.

The panels of Figure 4.1, taken together, display close similarities between projections based on functional and database methods for projecting tax and benefit payments. There are, nevertheless, some noticeable differences between the two sets of projections reported in Figure 4.1, which it is useful to discuss.

The top panel of Figure 4.1 displays a close correspondence between the labour supply projected using the database and functional descriptions for tax and benefit payments for the ten birth cohorts born between 1980 and 1989. The most obvious disparities between the two series are reported from age 40: between ages 40 and 59 projected labour supply is appreciably higher using functional descriptions for transfer policy, and vice versa at higher ages. The differences between the two series peak at age 50, when labour supply projected under the functional description for transfer policy averages 36.9 hours per week, 2.5 hours higher than under the database description.

The differences in labour supply projections discussed above are also reflected by the statistics reported for disposable incomes in the middle panel of Figure 4.1. Like labour, projected disposable income is almost identical under the two methods for projecting transfer payments until age 40, is appreciably higher between ages 40 and 59 under the functional method, and vice-versa from age 60. In contrast, measures of consumption projected using the two methods for projecting transfer payments (bottom panel of Figure 4.1) are almost identical until age 70. From age 70, projected consumption reflects disposable income, with measures based on the functional description for transfer policy systematically lower than those based on database imputations.

Both panels of Figure 5.1 indicate slightly higher wealth accrual to age 60 under the functional description for transfer payments, relative to the database imputations, followed by a faster decline in pension wealth at higher ages. These statistics help to complete an underlying narrative: distortions attributable to imperfections in the database description of functional tax and benefit payments slightly weaken savings and work incentives during peak working years (40-60). Under the functional description, the higher savings permit slightly earlier retirement, resulting in a faster drawdown of wealth, which is particularly evident for wealth held in private pensions. The faster drawdown of wealth, in turn, results in lower disposable incomes, which support lower consumption later in life. Nevertheless, it is important to recognise that the scale of distortions identified here is generally slight.

Selected age-specific simulated population averages projected under the base policy scenario for cohorts born between 1980 and 1989, by method of transfer payment imputation. Source: Authors’ calculations using simulated data. Notes: See notes to Figure 4.1.

As previously noted, these distortions arise from the fact that some variables considered by the tax functions are not included in, or are imperfectly approximated by, the matching procedure. Greater accuracy can be achieved by increasing the individual-specific detail considered for the matching procedure. In the limit, when all variables considered by the tax functions are perfectly reflected by the matching procedure used to identify donors in the tax database, no distortions will arise. Note, however, that this last conclusion depends upon a sufficiently large and diverse tax database, so that associated small sample issues are avoided.

It is also noteworthy that the tax-benefit calculators and survey datasets that we envisage as sources for compiling a tax database typically describe at least as many individual-specific characteristics as a functional representation of taxes and benefits programmed within a dynamic heterogenous household model. This is important because it permits flexibility in specification of the matching methods used to impute taxes and benefits from the reference database. Given sufficient population diversity in the tax database, the implication is that the imputation method outlined in this study should not suffer from a loss of policy detail, relative to functional methods for representing taxes and benefits.

4.4. Projected effects of a policy reform

This section reports the effects of a simulated policy reform, in which all rates of income tax are increased by 10 percentage points from 2017. This policy experiment is designed to clarify two related points. First, that the projected profiles reported in Section 4.3 for the “base policy” environment are sensitive to assumed tax and benefit policy parameters; and second, that the extent of this sensitivity is similar when tax and benefit payments are simulated using function and database methods. These related points carry over to more complex reforms implemented over alternative time horizons.25 The current discussion focusses on comparisons between the respective methods for projecting tax and benefit payments, rather than the effects of the simulated reform scenario per se.

Comparisons of statistics for the reform projections “in levels”, as reported for the base policy projections in Section 4.3, are reported in Appendix C (Figure C1 and Figure C2). These do not provide qualitatively useful information that adds to the discussion in Section 4.3. The current analysis consequently focusses on the effects of the reform, as described by the difference between the reform and base projections for policy, simulated using the two alternative methods for imputing tax and benefit payments.

Figure 6.1 displays projected effects of a 10-percentage point increase in all income tax rates for the same 10 birth cohorts considered in Section 4.3. These 10 birth cohorts, who were born between 1980 and 1989, were aged between 28 and 37 when the policy reform is assumed to take effect (2017). Echoing discussion in Section 4.3, Figure 6.1 displays age specific averages projected under the reform scenario, less the same averages projected under the base policy scenario.

Selected age-specific population average effects of a 10-percentage point increase in income tax rates on cohorts born between 1980 and 1989, by method of transfer payment imputation. Source: Authors’ calculations using simulated data. Notes: Age specific population averages for population cohorts born between 1980 and 1989 projected under the policy reform, less the same statistics projected under the base simulation scenario. All financial figures reported as percentage changes relative to the base simulation scenario. The two series reported in each panel are distinguished by the method used to impute tax and benefit payments.
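The reform effects plotted in this way amount to a simple difference between two sets of age-specific averages, optionally scaled by the base. A minimal sketch follows; the function name is hypothetical and the figures in the usage lines are invented for illustration, not simulation output.

```python
def reform_effects(base, reform, percent=False):
    """Age-specific reform effect: reform average less base average.

    With percent=True the difference is expressed relative to the base average.
    """
    effects = {}
    for age, base_value in base.items():
        change = reform[age] - base_value
        effects[age] = 100.0 * change / base_value if percent else change
    return effects


# Invented averages, keyed by age, for illustration only.
base = {40: 500.0, 41: 520.0}
reform = {40: 470.0, 41: 494.0}
effects = reform_effects(base, reform, percent=True)
```

The same function would be applied once to the pair of functional simulations and once to the pair of database simulations, so that the two resulting series can be compared directly.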

As discussed in Section 3.1, LINDA projects savings and employment decisions as though they are made to maximise expected lifetime utility. The rise in income tax rates consequently implies income and substitution effects underlying the projected behavioural responses. The income effects of the considered policy reform tend to reduce consumption and increase labour supply. The substitution effects, in contrast, tend to reduce labour supply and have ambiguous implications for consumption, as leisure is made less expensive and saving more expensive relative to immediate (non-durable) consumption.

The top panel of Figure 6.1 indicates that the substitution effects tend to dominate labour/leisure responses, as employment is projected to fall under the higher income tax rates. Importantly, this is true for simulations based on both functional and database methods for projecting tax and benefit payments.

Nevertheless, it is notable that the projected falls in employment between ages 40 and 55 are appreciably larger when transfer payments are imputed using the database method, suggesting exaggerated substitution effects for labour supply. The larger falls in peak-working age employment projected using the database description for policy are, however, offset by a more pronounced delay in the timing of retirement under the reform, relative to the functional description for policy. The overall impact is a widening of the disparities of labour supply projected using the functional and database descriptions for transfer policy, which are identified for the base simulation in Section 4.3.

The middle panel of Figure 6.1 reveals very similar age profiles for the effects of the policy reform on disposable income generated using the functional and database descriptions for policy. Disposable income is projected to fall appreciably from age 25 to 50, reflecting the increase in taxes implied by the rise in income tax rates, the projected declines in employment, and – later in life – falls in savings (discussed below). Although maintaining very similar profiles, disposable income is seen to fall slightly more when simulated using database descriptions for tax and benefit payments between ages 40 and 60, reflecting the more pronounced falls for employment described above. From state pension age, both methods for projecting fiscal transfers suggest similar declines in disposable income, equivalent to approximately 4% relative to the simulation base.

The bottom panel of Figure 6.1 also indicates very similar age profiles for the effects of the policy reform projected using the two methods for reflecting transfer policy; this time for consumption. It is striking that both methods for projecting transfer payments suggest cohort-average declines of consumption between 5 and 8 percent from age 40. This is slightly lower than the 10 percentage point increase in all income tax rates considered for analysis, despite the coincident fall in employment, as not all income is subject to income tax.

Figure 7.1 also indicates broadly similar effects on wealth of the rise in income tax rates projected using the functional and database descriptions for policy. Both figures indicate more prominent declines of wealth to age 55 when transfer payments are projected using database methods, with the differences between the simulated profiles closing later in life. These differences reflect the more pronounced falls in employment projected to age 55 followed by delayed retirement when transfer payments are projected using database methods. Nevertheless, it is important to recognise that in all cases the signs of projected behavioural responses are independent of the method used to project transfer payments, and the scale of differences between the two methods is limited to a few percentage points.

Selected age-specific population average effects of a 10-percentage point increase in income tax rates on cohorts born between 1980 and 1989, by method of transfer payment imputation. Source: Authors’ calculations using simulated data. Notes: See notes to Figure 4.1.

The overall impression made by Figure 6.1 is consequently that projected responses to the policy reform derived from the functional and database descriptions for transfer policy are qualitatively similar. These qualitative similarities are complemented with broadly similar quantitative effects projected using the two methods for simulating transfer payments, where the latter are subject to some discernible differences in precise scale and timing.

5. Conclusions

This paper describes a new method for imputing taxes and benefits in dynamic heterogenous agent models that draws upon existing third-party data sources, including (static microsimulation) tax-benefit calculators. A practical implementation of the approach is described for a model parameterised to the UK policy context, and simulated statistics indicate that the proposed method can generate similar results – both qualitatively and quantitatively – to functional methods for imputing tax and benefit payments. These results were also obtained in similar simulation run-times.

Notably, a comparison against a functional representation of the tax and benefit system can be a more demanding test than applying the method to real-world data. The functional form reflects an “ideal” version of the tax and benefit system, which excludes the noise and errors stemming from issues such as tax-evasion or incomplete benefit take-up potentially found in real-world data. Even small matching discrepancies become apparent when compared to the precise functional-form outputs, but might be otherwise indistinguishable from natural variation present in real-world data. Validating against the functional form thus requires the proposed method to precisely replicate an idealised system, making it a more demanding test than one based on real-world data.

The proposed method offers the possibility of outsourcing the technical challenges associated with reflecting the complexity and fluidity of modern tax and transfer systems, either to statistical agencies responsible for the collection of widely available survey micro-data, or specialists in the construction of tax-benefit calculators. It is hoped that this will facilitate the development of more realistic dynamic heterogenous agent models (including dynamic microsimulation and agent-based models), thereby mitigating a potentially important source of simulation bias.

An advantage of the proposed method is that it can be easily adapted to reflect the influence of an expanded set of simulated characteristics on tax and benefit payments. Consider, for example, an existing dynamic model that used the proposed method to project transfer payments. Suppose that this dynamic model did not distinguish health status, but the reference database used to impute tax and benefit payments did. Then, capturing the influence on transfer payments of adding health status to the dynamic model would involve updating the matching routine used to impute transfer payments to distinguish by health status: it would not be necessary for the details of health-related tax and benefit programmes to be defined explicitly.

A caveat associated with the proposed method is that it cannot make up for missing detail in a dynamic model structure. Specifically, a tax or benefit policy will only be suitable for analysis using the proposed approach if (1) it is reflected by the tax-benefit calculator, and its incidence is distinguished by both (2) the matching method adopted for imputation and (3) the wider dynamic model. It is important that all three of these conditions are met for a policy to be eligible for analysis. In this regard, particular care should be exercised in relation to the matching methods employed, as these may be especially opaque to model users in some contexts.

Consider, again, the case where a dynamic model did not distinguish health status, but the reference database used to impute tax and benefit payments did. In this case, health-related benefits described by a tax-benefit calculator will appear as (conditionally) random variations in the tax and benefit payments projected by the dynamic model. These random variations will be indistinguishable from all other features that influence transfer payments in the tax-benefit calculator but are unaccounted for by the assumed matching methods.

The emphasis on outsourcing in this paper contributes to a modelling paradigm that focusses on modular functionality, and toward the development of hybrid models more generally. Although the current study exclusively considers the problem of simulating tax and benefit payments, the suggested approach could conceivably be adapted to address a wide range of simulation problems, from relationship transitions, to fertility, and the evolution of health status.

Footnotes

1.

See, for example, Assenza et al. (2018), Ashraf et al. (2016), Neveu (2013), Dawid et al. (2019), Teglio et al. (2019), Dosi et al. (2015), Botta et al. (2021), and Caiani et al. (2019). Demographic characteristics (other than age) are also often abstracted in the literature that accounts for precautionary savings incentives in life-cycle frameworks; Imrohoroglu et al. (1995), Hubbard et al. (1995), Huggett (1996), Gourinchas and Parker (2002), Low (2005), Aydilek (2013), and Conesa et al. (2020).

2.

Where a government sector is included for analysis, most macro-economic agent-based models adopt simple linear functions for net fiscal transfers with households; see Assenza et al. (2018), Ashraf et al. (2016), Teglio et al. (2019), Dosi et al. (2015), Caiani et al. (2019). In contrast, Neveu (2013) and Botta et al. (2021) allow for income tax progressivity. Similarly, for microsimulation models that account for precautionary incentives and adopt linear transfer functions, see Albertini et al. (2021), Low and Pistaferri (2015), Huggett (1996), Imrohoroglu et al. (1995), and French (2005); in contrast, Keane and Wasi (2016), Conesa et al. (2020), De Nardi et al. (2017) allow for progressive income taxes. These models can be contrasted with those designed expressly to explore tax and benefits policy: see Orcutt et al. (1976), Caldwell (1997), Morrison (1998), Flood (2007), Nelissen (1993); Nelissen (1991), Spielauer (2013), and Dekkers and Bosch (2016) for a review.

3.

See Attanasio and Weber (2010) and Browning and Lusardi (1996) for discussion of stylised facts underlying the life-cycle framework, including the role of household demographics. See Attanasio and Weber (1995), Fernández-Villaverde and Krueger (2007), and Gustman and Steinmeier (2005) for the importance of household demographics in explaining consumption patterns.

4.

An example is the contemporary literature that accounts for life-cycle precautionary savings incentives. French (2005), Low and Pistaferri (2015), Keane and Wasi (2016), de Nardi et al. (2018) and Albertini et al. (2021) accommodate health status, Attanasio et al. (2018) reflect education, age, and family composition, and van de Ven (2017a) accounts for uncertainty over a relatively wide range of demographics, including relationship status and number and age of dependent children.

5.

For example, static tax-benefit calculators for the UK policy context include UKMOD (Richiardi et al., 2021), the Intra-Governmental Tax and Benefit Model administered by HM Treasury (IGOTM), Policy Simulation Model (PSM, Department for Work and Pensions), and TAXBEN (Giles and McCrae, 1995).

6.

Schofield et al. (2022) also follow a similar approach by matching simulated individuals in their dynamic microsimulation model of mitochondrial disease in adults with simulated individuals in STINMOD+, a tax-benefit model for Australia, based on a limited set of observable characteristics (age, sex, highest education attained and state of residence). Their objective, however, is to use the microsimulation model as a baseline to evaluate the economic impact of the disease, rather than informing the evolution of the dynamic structure itself.

7.

Nevertheless, use of a tax-benefit calculator does facilitate consideration of policy reform scenarios.

8.

Hot deck imputation does, however, depend upon important assumptions concerning the matching methods used to identify “donors” for “recipients”.

9.

Note that refers exclusively to variables endogenous to the dynamic model that are of use in projecting tax and benefit payments.

10.

One way to mitigate this type of issue is to adopt a hybrid approach for imputing transfer payments, where a database is used to impute “non-contributory” benefits, and functions are used to project the contributory component of a transfer system.

11.

Using a static tax-benefit calculator to impute transfer payments rather than drawing directly from a suitable survey micro-dataset has the advantage of facilitating consideration of policy counterfactuals that are often a focus of interest for dynamic microsimulation models.

12.

These methods range from functional regression specifications, through non-parametric descriptions, to pair-wise matching methods. For studies that apply regression methods to impute net tax and benefit payments, see Bargain et al. (2013), Biewen and Juhasz (2012), and Frenette et al. (2007).

13.

Alternative interpolation methods could be used. Cubic interpolation, for example, has the advantage of implying a continuous differential at the expense of requiring four database observations for computation (e.g. Keys, 1981).

14.

The idea is that the output for each individual at each point in time projected by a dynamic model should include identifiers for matched individuals used to impute tax and benefit payments from the reference database. Validation might also take into account the anticipated incidence of referencing alternative database observations with respect to their associated weights (as pointed out by an anonymous reviewer).

15.

This includes when the system is fixed in either real or nominal terms.

16.

For instance, EUROMOD-based tax-benefit models facilitate creation of policy scenarios for future years, including automated uprating of input data. The resulting output datasets can then be imported in the dynamic model, and the model directed to use the most recent dataset for any given simulated year.

17.

Computation time remains a significant barrier to some classes of dynamic microsimulation model, including those that use dynamic programming methods to project behaviour through time. In such contexts, tax and benefit payments may need to be evaluated for many more sets of circumstances than are actually represented in simulation output, due to the methods used to identify utility maximising decisions.

18.

That is, each reference database should be derived from a tax-benefit calculator using the same input data.

19.

Associated programming code can be accessed as described in Appendix A.1.

20.

The model is publicly accessible, and free to download from www.simdynamics.org. See also van de Ven (2017c).

21.

In more recent work, the authors take into consideration a wider range of characteristics for the matching process, including carer status, receipts and costs of social care, and income splitting; see van de Ven et al. (2024) for details.

22.

Those with incomes of 6 and 14 are omitted because their income values differ more from that of the target individual than four other income values observed in the group identified from step 1.

23.

Weights are obtained by evaluating for each candidate a test statistic equal to the absolute difference between the candidate’s income and the reference income, subject to a minimum defined by the income disregard. The maximum of all candidate test statistics is then identified (3 = abs(10 - 7) in the example, where abs(.) denotes the absolute value operator), and intermediate weights are calculated by dividing the maximum by each candidate’s test statistic (for the candidates with income 12, this would equal 1.5 = 3 / abs(10 - 12)). Weights summing to one are then obtained by rescaling all candidate weights; in the example, this involves dividing each intermediate weight by the sum of all candidates’ intermediate weights, equal to 10 = 1 + 3 + 3 + 1.5 + 1.5.

24.

Note that contributory benefits, which may be poorly reflected by the assumed matching methods, do play a role in the UK tax benefit system, but this role has declined appreciably over recent decades.

25.

More complex reforms associated with additional intermediate policy configurations have been considered using the SimPaths model; see Bronka et al. (2023).

26.

The base simulation involves projecting data 65 years forward and 65 years backward through time – 130 years in total – starting from the reference cross-section reported for the UK in 2017. In contrast, the reform simulation starts from the data generated under the base simulation and re-evaluates data only for the 65 years forward from 2017. This explains the longer computation time for population projections under the “base” policy scenario than for the “reform” scenario.

Appendix A: Code

Appendix A.1: Functional description of UK tax and benefit payments applicable in April 2016

Fortran code is provided in text file UK2016.f90.

Appendix A.2: Database imputations for taxes and benefits

Fortran code is provided in text file taxdb_comms.f90.
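The weighting step described in footnote 23 can be illustrated with a minimal sketch. It is written in Python for readability and is not a transcription of taxdb_comms.f90. The function name, the candidate incomes (7, 9, 11, 12, 12) and the income disregard of 1 are assumptions, chosen only because they reproduce the figures quoted in the footnote.

```python
def imputation_weights(reference_income, candidate_incomes, income_disregard=1.0):
    """Return weights summing to one, larger for candidates closer in income.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    if not candidate_incomes:
        raise ValueError("at least one candidate is required")
    if income_disregard <= 0:
        raise ValueError("income_disregard must be positive to avoid division by zero")

    # Test statistic: absolute income gap, subject to a minimum set by the disregard.
    stats = [max(abs(income - reference_income), income_disregard)
             for income in candidate_incomes]

    # Intermediate weight: the largest test statistic divided by the candidate's own.
    largest = max(stats)
    intermediate = [largest / stat for stat in stats]

    # Rescale so that the weights sum to one.
    total = sum(intermediate)
    return [value / total for value in intermediate]


# Assumed example: reference income 10, five candidates identified by the match.
weights = imputation_weights(10, [7, 9, 11, 12, 12], income_disregard=1.0)
```

Under these assumed inputs the intermediate weights are 1, 3, 3, 1.5 and 1.5, which sum to 10 as in footnote 23, so the candidates with income 12 each receive a final weight of 0.15.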

Appendix B: Analysis Walk-through

  1. Download the LINDA quick-start guide from: https://www.simdynamics.org/index_htm_files/quick%20start%20guide.docx

    1. Note that LINDA is based on a model framework called SIDD

    2. LINDA can be downloaded from:

      1. https://www.simdynamics.org/download.html

  2. Work through Sections 1.2 (Loading the model onto a new computer) and 1.3 (Extracting base data from the Wealth and Assets Survey) of the quick-start guide.

  3. Implement parameters to project transfer payments based on internally programmed functions for the UK in 2016

    1. Make a copy of MODEL\Job File.xls with name “Job File original.xls”

    2. Open MODEL\Job File.xls

    3. Worksheet “tax params”

      1. Copy columns AJ to AO

    4. Worksheet “input”

      1. Paste to columns AH to AM

      2. Set cell AO2 to 7

      3. Set cell AO3 to 8

      4. Set cell AO5 to 0

  4. Run base simulation assuming internally programmed functions for the UK in 2016:

    1. Job File.xls, worksheet “input”

      1. Revise preference parameters to adjust for altered tax system

        1. Set cell Y3 to 0.982

        2. Set cell Y4 to 0.972

        3. Set cell Y10 to 2.3

      2. Set cell A2 to “base2017_fn” (without quotation marks)

      3. Set cell BQ35 to 1 (generate cohort statistics reported in Section 4.4)

    2. Save

    3. Run SIDD.exe

      1. Note that this simulation takes appreciably longer to run than most others described below.

  5. Create database for functional description of policy in base specification

    1. Open MODEL\ANALYSIS_FILES\tax_test3.xls

    2. Worksheet “analysis”

      1. Set cell C2 to 2017

      2. Clear all data from row 7 in columns A to AA

    3. Add input data in columns A to W with values generated by the model for 2017 under the “base2017_fn” simulation, implemented in step (4)

      1. TIP: The steps described under 5.c.ii below require Excel to open large data files. Depending on your system, this may not be possible, and a short Stata do file is consequently provided with the appendix materials to facilitate extraction of the required data – the Stata file copies slightly more data than described below.

      2. Collate required data

        1. Open a new (temporary) Excel file

        2. Open file MODEL\SIMULATIONS\base2017_fn\age.csv

        3. Copy rows 1 to 40000 from column 66

          1. In Excel, you can change to R1C1 format to see column numbers via the File > Options > Formulas > Working with formulas menu

        4. Paste the data to cell A1 of Sheet1 of the temporary Excel file

        5. Close file MODEL\SIMULATIONS\base2017_fn\age.csv

        6. Repeat for data in columns B to Y of Sheet1 of the temporary Excel file, where:

          1. Data for column B are from file na.csv

          2. Data for column C are from file nk.csv

          3. Data for column D are from file nk_all1.csv

          4. Data for column E are from file nk_all2.csv

          5. Data for column F are from file nk_all3.csv

          6. Set Cells G1 to G40000 to 0

          7. Set Cells H1 to H40000 to 0

          8. Set Cells I1 to I40000 to 1

          9. Set Cells J1 to J40000 to 1

          10. Data for column K are from file emp1.csv

          11. Data for column L are from file emp2.csv

          12. Data for column M are from file labinc.csv

          13. Data for column N are from file peninc.csv

          14. Data for column O are from file cpinc.csv

          15. Data for column P are from file ppc.csv

          16. Data for column Q are from file ninvinc.csv

          17. Data for column R are from file w.csv

          18. Data for column S are from file hsgw.csv

          19. Data for column T are from file hsgmd.csv

          20. Data for column U are from file hsgret.csv

          21. Data for column V are from file hsgmr.csv

          22. Data for column W are from file comexhs.csv

          23. Data for column X are from file psnno.csv (column 1)

          24. Data for column Y are from file ben_unit.csv

          25. Keep only rows where column X is equal to column Y

            1. TIP: You should be left with a sample of approximately 25000 observations.

        7. Delete data in columns X and Y of the temporary Excel file

      3. Copy all remaining data from the temporary Excel file (or the Stata do file) to columns A to W of tax_test3.xls, analysis worksheet, starting at row 7.

      4. Set cell C3 of tax_test3.xls, analysis worksheet to the number of rows of data copied from the temporary Excel file

        1. Note that the last row of data in tax_test3.xls, analysis worksheet should be equal to the value in cell C3 + 6

      5. Close the temporary Excel file without saving (if necessary)

      6. Save tax_test3.xls and close file

    4. Open MODEL\Job File.xls

    5. Worksheet “input”

      1. Set cell A2 to “temp” (without quotation marks)

      2. Set cell BQ55 to 1

    6. Save

    7. Run SIDD.exe

    8. Open MODEL\SIMULATIONS\temp\tax_test3.xls

    9. Extend the “reference database” worksheet to reference all data in the “analysis” worksheet

      1. Copy cells A6 to AA6 down to reference all rows of data included in the “analysis” worksheet

        1. Note that the last row of data in the “reference database” worksheet should be two rows above the last row in the “analysis” worksheet

    10. Save tax_test3.xls

    11. Copy all data in the “reference database” worksheet, from row 4 to the last row in the worksheet, and from column A to AA

    12. Open new workbook

    13. Paste values to Cell A1

    14. Create subdirectory: MODEL\TAX_DATABASE\base_fn

    15. Save the new workbook as MODEL\TAX_DATABASE\base_fn\base_2017.csv

      1. CSV format (not UTF8)

        1. Note: if you receive a warning, proceed by accepting the format

    16. Close all files

    17. Copy and paste the file tax_test3.xls from MODEL\SIMULATIONS\temp\ to MODEL\ANALYSIS_FILES\

    18. Open MODEL\ANALYSIS_FILES\tax_test3.xls

    19. Worksheet “analysis”

      1. Set cell C2 to 2057

    20. Save file and close

    21. Run SIDD.exe

    22. Open MODEL\SIMULATIONS\temp\tax_test3.xls

    23. Worksheet “reference database”

      1. Copy all data in the “reference database” worksheet, from row 4 to the last row in the worksheet, and from column A to Z

    24. Open new workbook

    25. Paste values to Cell A1

    26. Save file as MODEL\TAX_DATABASE\base_fn\base_2057.csv

      1. CSV format (not UTF8)

    27. Close all files

  6. Set simulation from (4) as base

    1. Open MODEL\Job File.xls

    2. Alt+F8

    3. Run SIDD macro

    4. Form 0 – specify new base using simulation “base2017_fn”

      1. Enter “base2017_fn” in the two empty text boxes at the bottom of the form

    5. Press the “CONVERT RUN TO NEW BASE” button

  7. Run reform of 10 percentage point increase in all income tax rates using functional description for policy

    1. Open MODEL\Job File.xls

    2. Worksheet “input”

      1. Set cell A2 to “10pp_fn” (without quotation marks)

      2. Set cell AI12 to 0.3

      3. Set cell AI13 to 0.5

      4. Set cell AI14 to 0.55

      5. Set cell AI106 to 0.55

      6. Set cell AI107 to 0.55

      7. Set cell AI108 to 0.55

      8. Set cell BQ35 to 1 (to generate statistics reported in Section 4.5)

    3. Save

    4. Run SIDD.exe

  8. Create database for reform of 10 percentage point increase in all income tax rates using functional description for policy

    1. Open MODEL\ANALYSIS_FILES\tax_test3.xls

    2. Worksheet “analysis”

      1. Set cell C2 to 2017

    3. Save file and close

    4. Open MODEL\Job File.xls

    5. Worksheet “input”

      1. Set cell A2 to “temp” (without quotation marks)

      2. Set cell BQ55 to 1

    6. Save

    7. Run SIDD.exe

    8. Open MODEL\SIMULATIONS\temp\tax_test3.xls

    9. Worksheet “reference database”

      1. Copy columns A to Z from row 4 to the end

        1. TIP – Ensure that all instances from “analysis” worksheet are included in copied data

    10. Open new workbook

    11. Paste values to Cell A1

    12. Save file as MODEL\TAX_DATABASE\base_fn\10pp_2017.csv

      1. CSV format (not UTF8)

    13. Close all files

    14. Open MODEL\ANALYSIS_FILES\tax_test3.xls

    15. Worksheet “analysis”

      1. Set cell C2 to 2057

    16. Save file and close

    17. Run SIDD.exe

    18. Open MODEL\SIMULATIONS\temp\tax_test3.xls

    19. Worksheet “reference database”

      1. Copy columns A to Z from row 4 to the end

    20. Open new workbook

    21. Paste values to Cell A1

    22. Save file as MODEL\TAX_DATABASE\base_fn\10pp_2057.csv

      1. CSV format (not UTF8)

    23. Close all files

  9. Implement parameters to project transfer payments based on database description for internally programmed functions for the UK in 2016

    1. Replace MODEL\Job File.xls with “Job File database.xls” included with this appendix

    2. Run SIDD.exe

      1. Note that this simulation takes appreciably longer to run than most others described here.

  10. Set simulation from (9) as base

    1. Alt+F8

    2. Run SIDD macro

    3. Form 0 – specify new base using simulation “base2017_db”

    4. Press the “CONVERT RUN TO NEW BASE” button

  11. Run reform using database description for policy

    1. Open MODEL\Job File.xls

    2. Worksheet “input”

      1. Set cell A2 to “10pp_db” (without quotation marks)

      2. Set cell E38 to “10pp_2017.csv” (without quotation marks)

      3. Set cell E48 to “10pp_2057.csv” (without quotation marks)

      4. Set cell BQ35 to 1 (generate quintile statistics reported in Section 3.3.2)

    3. Save

    4. Run SIDD.exe
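The data collation described under step 5.c.ii can also be scripted. The following is a minimal Python sketch of that step, offered as an alternative illustration and not as a replacement for the Stata do file supplied with the appendix materials. It assumes that each model output file is a comma-separated file without a header row, holding one row per simulated individual in the same order across files, and that the identifiers in psnno.csv and ben_unit.csv are numeric. The function names are hypothetical.

```python
import csv
from pathlib import Path

# Columns A to W of the temporary sheet, in order. A string names a model
# output file; a number is a constant written to every row (columns G to J).
COLUMN_SOURCES = [
    "age.csv", "na.csv", "nk.csv", "nk_all1.csv", "nk_all2.csv", "nk_all3.csv",
    0, 0, 1, 1,
    "emp1.csv", "emp2.csv", "labinc.csv", "peninc.csv", "cpinc.csv", "ppc.csv",
    "ninvinc.csv", "w.csv", "hsgw.csv", "hsgmd.csv", "hsgret.csv", "hsgmr.csv",
    "comexhs.csv",
]


def read_column(path, column, n_rows=40000):
    """Return the first n_rows values of a 1-based column of a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        return [row[column - 1] for _, row in zip(range(n_rows), reader)]


def collate(sim_dir, year_column=66, n_rows=40000):
    """Assemble columns A to W, keeping rows where psnno equals ben_unit."""
    sim_dir = Path(sim_dir)
    person = read_column(sim_dir / "psnno.csv", 1, n_rows)          # column X
    unit = read_column(sim_dir / "ben_unit.csv", year_column, n_rows)  # column Y

    columns = []
    for source in COLUMN_SOURCES:
        if isinstance(source, str):
            columns.append(read_column(sim_dir / source, year_column, n_rows))
        else:
            columns.append([str(source)] * len(person))

    # Transpose to rows, then apply the filter; columns X and Y are not kept.
    rows = zip(*columns)
    return [list(row) for row, p, u in zip(rows, person, unit)
            if float(p) == float(u)]
```

Calling `collate(r"MODEL\SIMULATIONS\base2017_fn")` would return the rows to paste into columns A to W of the analysis worksheet from row 7 onward; the number of rows returned is the value required for cell C3.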

Appendix C: Analysis Supplementary Statistics

Appendix C.1: Simulation run times

Each policy context projected using LINDA involves two discrete stages: evaluation of utility maximising behaviour for any feasible combination of individual specific characteristics, and population projections based on the evaluated behavioural solutions. Tax and benefit imputations are important for each of these stages. Table A1 reports disaggregated computation times for the base (status-quo) and reform scenarios, for each method of imputing net transfer payments.

Table A1 indicates that, relative to the functional description for transfer payments, the database method for imputing net transfer payments required almost identical run-times, both to evaluate behavioural solutions, and to project panel data for the population, for both the base and reform scenarios.26

Table A1
Simulation run-times by policy scenario and method of transfer payment imputation.
          behavioural solution    population projection    total
net transfer payments imputed using database
Base      54.5                    90.9                     145.4
Reform    55.7                    52.0                     107.7
net transfer payments calculated using function
Base      54.3                    87.0                     141.3
Reform    54.4                    53.3                     107.7
  1. Source: Authors’ calculations using simulated data. Notes: "base" defines reference policy context; "reform" defines policy context that is identical to "base" in all respects, except that all income tax rates are increased by 10 percentage points. All times reported in minutes. Simulations run on workstation with dual Xeon E5-2670 processors and 96GB of RAM.
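The arithmetic behind the comparison of run-times can be checked with a short Python sketch. The figures are copied from Table A1; the dictionary layout and the function name are illustrative assumptions.

```python
# Run-times in minutes from Table A1:
# (behavioural solution, population projection, total).
RUN_TIMES = {
    ("database", "base"): (54.5, 90.9, 145.4),
    ("database", "reform"): (55.7, 52.0, 107.7),
    ("function", "base"): (54.3, 87.0, 141.3),
    ("function", "reform"): (54.4, 53.3, 107.7),
}


def relative_overhead(scenario):
    """Percentage by which the database total exceeds the functional total."""
    db_total = RUN_TIMES[("database", scenario)][2]
    fn_total = RUN_TIMES[("function", scenario)][2]
    return 100.0 * (db_total - fn_total) / fn_total


# Each reported total should equal the sum of its two stages, up to rounding.
for solve, project, total in RUN_TIMES.values():
    assert abs(solve + project - total) < 0.05

print(round(relative_overhead("base"), 1))    # 2.9
print(round(relative_overhead("reform"), 1))  # 0.0
```

On these figures the database method takes about 2.9% longer in total for the base scenario and the same total time for the reform scenario.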

Selected age-specific population averages projected under the reform policy scenario of a 10-percentage point increase in income tax rates on cohorts born between 1980 and 1989, by method of transfer payment imputation. Source: Authors’ calculations using simulated data. Notes: Age specific population averages evaluated for population cohorts born between 1980 and 1989. All financial figures reported in 2017 prices. The two series reported in each panel are distinguished by the method used to impute tax and benefit payments.

Appendix C.2: Supplementary figures for policy counterfactual

Selected age-specific population averages projected under the reform policy scenario of a 10-percentage point increase in income tax rates on cohorts born between 1980 and 1989, by method of transfer payment imputation. Source: Authors’ calculations using simulated data. Notes: Age specific population averages evaluated for population cohorts born between 1980 and 1989. All financial figures reported in 2017 prices. The two series reported in each panel are distinguished by the method used to impute tax and benefit payments.

References

  1. 1
    Health, wealth, and informality over the life cycle
    1. J Albertini
    2. X Fairise
    3. A Terriau
    (2021)
    Journal of Economic Dynamics and Control 129:104170.
    https://doi.org/10.1016/j.jedc.2021.104170
  2. 2
    A review of hot deck imputation for survey non-response
    1. RR Andridge
    2. RJA Little
    (2010)
    International Statistical Review = Revue Internationale de Statistique 78:40–64.
    https://doi.org/10.1111/j.1751-5823.2010.00103.x
  3. 3
    How inflation affects macroeconomic performance: an agent-based computational investigation
    1. Q Ashraf
    2. B Gershman
    3. P Howitt
    (2016)
    Macroeconomic Dynamics 20:558–581.
    https://doi.org/10.1017/S1365100514000303
  4. 4
    Does fiscal policy matter? Tax, transfer, and spend in a macro ABM with capital and credit
    1. T Assenza
    2. P Colzani
    3. D Delli Gatti
    4. J Grazzini
    (2018)
    Industrial and Corporate Change 27:1069–1090.
    https://doi.org/10.1093/icc/dty017
  5. 5
    The distribution of wealth and the individual life-cycle
    1. AB Atkinson
    (1971)
    Oxford Economic Papers 23:239–254.
    https://doi.org/10.1093/oxfordjournals.oep.a041192
  6. 6
    Is consumption growth consistent with intertemporal optimization
    1. OP Attanasio
    2. G Weber
    (1995)
    Journal of Political Economy 103:1121–1157.
  7. 7
    Consumption and saving: models of intertemporal allocation and their implications for public policy
    1. OP Attanasio
    2. G Weber
    (2010)
    Journal of Economic Literature 48:693–751.
    https://doi.org/10.1257/jel.48.3.693
  8. 8
  9. 9
    Agent-based modeling in economics and finance: past, present, and future
    1. RL Axtell
    2. JD Farmer
    (2022)
    Journal of Economic Literature.
  10. 10
    Habit formation and housing over the life cycle
    1. A Aydilek
    (2013)
    Economic Modelling 33:858–866.
    https://doi.org/10.1016/j.econmod.2013.05.012
  11. 11
  12. 12
    Understanding Rising Income Inequality in Germany, 1999/2000–2005/2006
    1. M Biewen
    2. A Juhasz
    (2012)
    Review of Income and Wealth 58:622–647.
    https://doi.org/10.1111/j.1475-4991.2012.00514.x
  13. 13
    Female labor supply, human capital, and welfare reform
    1. R Blundell
    2. M Costa Dias
    3. C Meghir
    4. J Shaw
    (2016)
    Econometrica 84:1705–1753.
    https://doi.org/10.3982/ECTA11576
  14. 14
    Wages, experience, and training of women over the life cycle
    1. R Blundell
    2. M Costa-Dias
    3. D Goll
    4. C Meghir
    (2021)
    Journal of Labor Economics 39:S275–S315.
    https://doi.org/10.1086/711400
  15. 15
    The tax-price implications of bracket-creep
    1. CE Bohanon
    (1983)
    National Tax Journal 36:535–538.
    https://doi.org/10.1086/NTJ41862549
  16. 16
    Inequality and finance in a rent economy
    1. A Botta
    2. E Caverzasi
    3. A Russo
    4. M Gallegati
    5. JE Stiglitz
    (2021)
    Journal of Economic Behavior & Organization 183:998–1029.
    https://doi.org/10.1016/j.jebo.2019.02.013
  17. 17
    SimPaths: an open-source microsimulation model for life course analysis
    1. P Bronka
    (2023)
    CeMPA Working Paper Series, CEMPA6/23.
  18. 18
    Household saving: micro theories and macro facts
    1. M Browning
    2. A Lusardi
    (1996)
    Journal of Economic Literature 34:1797–1855.
  19. 19
    Does inequality hamper innovation and growth? An AB-SFC analysis
    1. A Caiani
    2. A Russo
    3. M Gallegati
    (2019)
    Journal of Evolutionary Economics 29:177–228.
    https://doi.org/10.1007/s00191-018-0554-8
  20. 20
    Corsim 3.0 User and Technical Documentation
    1. S Caldwell
    (1997)
    Cornell University.
  21. 21
    Welfare implications of switching to consumption taxation
    1. JC Conesa
    2. B Li
    3. Q Li
    (2020)
    Journal of Economic Dynamics and Control 120:103991.
    https://doi.org/10.1016/j.jedc.2020.103991
  22. 22
    Macroeconomics with heterogeneous agent models: fostering transparency, reproducibility and replication
    1. H Dawid
    2. P Harting
    3. S van der Hoog
    4. M Neugart
    (2019)
    Journal of Evolutionary Economics 29:467–538.
    https://doi.org/10.1007/s00191-018-0594-0
  23. 23
    Applications of Microsimulation Modelling
    1. G Dekkers
    2. K Bosch
    (2016)
    Prospective microsimulation of pensions in European Member States, Applications of Microsimulation Modelling, Central Administration of National Pension Insurance: Budapest.
  24. 24
    LIAM2: a new open source development tool for discrete-time dynamic microsimulation models
    1. G de Menten
    2. G Dekkers
    3. G Bryon
    4. P Liégeois
    5. C O’Donoghue
    (2014)
    Journal of Artificial Societies and Social Simulation 17:9.
    https://doi.org/10.18564/jasss.2574
  25. 25
    The lifetime costs of bad health
    1. M de Nardi
    (2018)
    NBER Working Paper 23963.
  26. 26
    Fiscal and monetary policies in complex evolving economies
    1. G Dosi
    2. G Fagiolo
    3. M Napoletano
    4. A Roventini
    5. T Treibich
    (2015)
    Journal of Economic Dynamics and Control 52:166–189.
    https://doi.org/10.1016/j.jedc.2014.11.014
  27. 27
    Consumption over the life cycle: facts from consumer expenditure survey data
    1. J Fernández-Villaverde
    2. D Krueger
    (2007)
    Review of Economics and Statistics 89:552–565.
    https://doi.org/10.1162/rest.89.3.552
  28. 28
    Modelling Our Future: Population Ageing, Social Security and Taxation
    1. L Flood
    (2007)
    Can We Afford the Future? An Evaluation of the New Swedish Pension System, Modelling Our Future: Population Ageing, Social Security and Taxation, Elsevier, p. 33.
  29. 29
    The effects of health, wealth, and wages on labour supply and retirement behaviour
    1. E French
    (2005)
    The Review of Economic Studies 72:395–427.
    https://doi.org/10.1111/j.1467-937X.2005.00337.x
  30. 30
    The tale of the tails: Canadian income inequality in the 1980s and 1990s
    1. M Frenette
    2. DA Green
    3. K Milligan
    (2007)
    Canadian Journal of Economics/Revue Canadienne d’économique 40:734–764.
    https://doi.org/10.1111/j.1365-2966.2007.00429.x
  31. 31
    TAXBEN: the IFS microsimulation tax and benefit model
    1. C Giles
    (1995)
    IFS Working Paper W95/19.
  32. 32
    Consumption over the life cycle
    1. PO Gourinchas
    2. JA Parker
    (2002)
    Econometrica 70:47–89.
    https://doi.org/10.1111/1468-0262.00269
  33. 33
    The social security early entitlement age in a structural model of retirement and wealth
    1. AL Gustman
    2. TL Steinmeier
    (2005)
    Journal of Public Economics 89:441–463.
    https://doi.org/10.1016/j.jpubeco.2004.03.007
  34. 34
    Precautionary Saving and Social Insurance
    1. RG Hubbard
    2. J Skinner
    3. SP Zeldes
    (1995)
    Journal of Political Economy 103:360–399.
    https://doi.org/10.1086/261987
  35. 35
    The Hypothetical Household Tool (HHoT) in EUROMOD: a new instrument for comparative research on tax-benefit policies in Europe
    1. T Hufkens
    2. T Goedemé
    3. K Gasior
    4. C Leventi
    5. K Manios
    6. O Rastrigina
    7. P Recchia
    8. H Sutherland
    9. NV Mechelen
    10. G Verbist
    (2019)
    JRC Working Papers on Taxation and Structural Reforms No 05/2019.
  36. 36
    Wealth distribution in life-cycle economies
    1. M Huggett
    (1996)
    Journal of Monetary Economics 38:469–494.
    https://doi.org/10.1016/S0304-3932(96)01291-3
  37. 37
    CEM: Software for coarsened exact matching
    1. SM Iacus
    2. G King
    3. G Porro
    (2009)
    Journal of Statistical Software 30:1–27.
    https://doi.org/10.18637/jss.v030.i09
  38. 38
    A life cycle analysis of social security
    1. A Imrohoroglu
    2. S Imrohoroglu
    3. DH Joines
    (1995)
    Economic Theory 6:83–114.
    https://doi.org/10.1007/BF01213942
  39. 39
    Labour supply: the roles of human capital and the extensive margin
    1. MP Keane
    2. N Wasi
    (2016)
    The Economic Journal 126:578–617.
    https://doi.org/10.1111/ecoj.12362
  40. 40
    Cubic convolution interpolation for digital image processing
    1. RG Keys
    (1981)
    IEEE Transactions on Acoustics, Speech, and Signal Processing 29:1153–1160.
    https://doi.org/10.1109/TASSP.1981.1163711
  41. 41
    New Pathway in Microsimulation
    1. P Liégeois
    2. G Dekkers
    (2014)
    Combining EUROMOD and LIAM Tools for the Development of Dynamic Cross-sectional Microsimulation Models: a Sneak Preview, New Pathway in Microsimulation, Burlington: Ashgate.
  42. 42
    Combination of EUROMOD and LIAM2 tools for the development of dynamic microsimulation models: feasibility, example and conditions for sustainability of the linkage
    1. P Liégeois
    (2021)
    InGRID Final Report D8.9.
  43. 43
    Self-insurance in a life-cycle model of labour supply and savings
    1. HW Low
    (2005)
    Review of Economic Dynamics 8:945–975.
    https://doi.org/10.1016/j.red.2005.03.002
  44. 44
    Disability insurance and the dynamics of the incentive insurance trade-off
    1. H Low
    2. L Pistaferri
    (2015)
    American Economic Review 105:2986–3029.
    https://doi.org/10.1257/aer.20110108
  45. 45
    Overview of DYNACAN
    1. R Morrison
    (1998)
    https://www.actuaries.org/CTTEES_SOCSEC/Documents/dynacan.pdf, accessed 16 February 2022.
  46. 46
    Household and education projections by means of a microsimulation model
    1. JHM Nelissen
    (1991)
    Economic Modelling 8:480–521.
    https://doi.org/10.1016/0264-9993(91)90029-N
  47. 47
    Labour market, income formation and social security in the microsimulation model NEDYMAS
    1. JHM Nelissen
    (1993)
    Economic Modelling 10:225–272.
    https://doi.org/10.1016/0264-9993(93)90019-C
  48. 48
    Fiscal policy and business cycle characteristics in a heterogeneous agent macro model
    1. AR Neveu
    (2013)
    Journal of Economic Behavior & Organization 92:224–240.
    https://doi.org/10.1016/j.jebo.2013.06.006
  49. 49
    Redistribution over the lifetime in the Irish tax-benefit system: an application of a prototype dynamic microsimulation model for Ireland
    1. C O’Donoghue
    (2001)
    Economic and Social Review 32:191–216.
  50. 50
    Policy Exploration through Microanalytic Simulation
    1. GH Orcutt
    2. S Caldwell
    3. RF Wertheimer
    (1976)
    Washington: Urban Institute.
  51. 51
    The future of agent-based modeling
    1. MG Richiardi
    (2017)
    Eastern Economic Journal 43:271–287.
    https://doi.org/10.1057/s41302-016-0075-9
  52. 52
    UKMOD – A new tax-benefit model for the four nations of the UK
    1. M Richiardi
    2. D Collado
    3. D Popova
    (2021)
    International Journal of Microsimulation 14:92–101.
    https://doi.org/10.34196/IJM.00231
  53. 53
    Bias Reduction Using Mahalanobis-Metric Matching
    1. DB Rubin
    (1980)
    Biometrics 36:293–298.
    https://doi.org/10.2307/2529981
  54. 54
    New Palgrave Dictionary of Economics
    1. J Rust
    (2008)
    Dynamic programming, New Palgrave Dictionary of Economics, Palgrave Macmillan: New York.
    https://doi.org/10.1057/978-1-349-95121-5_1932-1
  55. 55
  56. 56
    Fortax: UK tax and benefit system documentation
    1. J Shaw
    (2011)
    IFS Working Paper W11/08.
  57. 57
    The LifePaths Microsimulation Model: An Overview
    1. M Spielauer
    (2013)
    WIFO.
  58. 58
    microWELT: A Dynamic Microsimulation Model for the Study of Welfare Transfer Flows in Ageing Societies from A Comparative Welfare State Perspective
    1. M Spielauer
    (2020)
    WIFO Working Papers.
  59. 59
    Matching methods for causal inference: A review and a look forward
    1. EA Stuart
    (2010)
    Statistical Science 25:1–21.
    https://doi.org/10.1214/09-STS313
  60. 60
    EUROMOD: the European Union tax-benefit microsimulation model
    1. H Sutherland
    2. F Figari
    (2013)
    International Journal of Microsimulation 6:4–26.
    https://doi.org/10.34196/ijm.00075
  61. 61
    Budgetary rigour with stimulus in lean times: Policy advices from an agent-based model
    1. A Teglio
    2. A Mazzocchetti
    3. L Ponta
    4. M Raberto
    5. S Cincotti
    (2019)
    Journal of Economic Behavior & Organization 157:59–83.
    https://doi.org/10.1016/j.jebo.2017.09.016
  62. 62
  63. 63
    Parameterising a detailed dynamic programming model of savings and labour supply using cross-sectional data
    1. J van de Ven
    (2017b)
    International Journal of Microsimulation 10:135–166.
  64. 64
    Parameterising a detailed dynamic programming model of savings and labour supply using cross-sectional data
    1. J van de Ven
    (2017c)
    International Journal of Microsimulation 10:134–164.
    https://doi.org/10.34196/ijm.00152
  65. 65
    The life course effects of care
    1. J van de Ven
    (2024)
    CeMPA Working Paper CEMPA7/24.

Article and author information

Author details

  1. Justin van de Ven

    1. National Institute of Economic and Social Research, London, United Kingdom
    2. University of Essex, UK, Colchester, United Kingdom
    For correspondence
    j.vandeven@essex.ac.uk
    ORCID: 0000-0003-4965-0218
  2. Patryk Bronka

    University of Essex, UK, Colchester, United Kingdom
    ORCID: 0000-0003-1068-3186
  3. Matteo Richiardi

    University of Essex, UK, Colchester, United Kingdom
    ORCID: 0000-0002-3749-7386

Funding

This study benefited from financial support from JPI More Years Better Lives and the UKRI Economic and Social Research Council, grant number ES/W001543/1. Van de Ven also acknowledges financial support from HM Treasury, UK.

Acknowledgements

The views presented in this paper are those of the authors alone.

Publication history

  1. Version of Record published: August 20, 2025 (version 1)

Copyright

© 2025, Ven et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
