# Accounting for tax evasion profiles and tax expenditures in microsimulation modelling. The Betamod model for personal income taxes in Italy

1. Ca’ Foscari University, Italy
Research article
Cite this article as: A. Albarea, M. Bernasconi, C. Di Novi, A. Marenzi, D. Rizzi, F. Zantomio; 2015; Accounting for tax evasion profiles and tax expenditures in microsimulation modelling. The Betamod model for personal income taxes in Italy; International Journal of Microsimulation; 8(3); 99-136. doi: 10.34196/ijm.00123

## Abstract

The paper presents the main characteristics of Betamod, a static microsimulation model that reproduces the Italian personal income tax (Irpef), as well as local income taxes, namely the regional and municipal surtaxes, building on a detailed reconstruction of tax legislation. With respect to the vast majority of existing tax microsimulation models, the peculiarities of Betamod concern two aspects: the inclusion of a detailed set of tax expenditures, and the estimation of individual-specific tax evasion rates, which account for the total individual income level, its composition in terms of income sources, and the geographical area of residence.

## 1. Introduction

Tax-benefit microsimulation models have become a standard tool for the design and the evaluation of public policies in many countries (see, among others, Bourguignon & Spadaro, 2006; Mitton, Sutherland & Weeks, 2000; Sutherland & Figari, 2013). Indeed, devising effective policy interventions requires appropriate ex-ante evaluation instruments, not only informative about the macro-level revenue consequences, but also about the distributional outcomes of specific interventions. In this respect, and particularly in a single-country framework, the accuracy of a particular model in accounting for aspects that are more salient in the national context, or the object of planned reform interventions, is key to its predictive power and, therefore, relevance.

In the Italian context, two aspects currently deserve particular attention. The first is tax evasion, which is extremely high: unreported incomes are estimated in the range of 18–25% of Gdp (Giovannini, 2011). Various political leaders, as well as a significant share of public opinion, seem to justify tax evaders on the grounds that tax rates are too high and the tax schedule far too progressive. At the same time, the distributional consequences of tax evasion are often neglected in the public discourse, or dismissed with generic statements based more on anecdotal evidence than on grounded empirical analysis. For example, little is known about the distinct effects that tax evasion may have on progressivity (vertical effect) versus horizontal equity and re-ranking (Aronson & Lambert, 1994; Urban & Lambert, 2008). Microsimulation models have major potential in this respect. The second aspect is that of tax expenditures. Over the last decade, tax expenditures in Italy have consistently increased as a share of Gdp. Recently, the Italian Ministry of Economy and Finance identified 720 measures of tax expenditures that account for about the per cent of Gdp (Keen et al., 2012). Among these, the individual income tax expenditures are the largest (4.84% of Gdp). Because of the entailed reduction in tax revenues, and the induced distortions in taxpayers’ behavior, there is an increasing debate, both at the national and international level, on the use of tax expenditures as an alternative to direct expenditures (e.g. see Avram, 2014; Burman, 2003; Burman et al., 2008; Poterba, 2011; Tyson, 2014), also on the grounds of their regressive effect (e.g. Matsaganis & Flevotomou, 2007).

This paper presents a new microsimulation model, called Betamod, for the Italian personal income tax (Irpef) and for local income taxes, namely the regional and municipal surtaxes, which tackles these two aspects. In more detail, Betamod improves on the existing Italian models1 by estimating a distribution of individual tax evasion rates, based on total individual income level, its composition in terms of sources, and geographical area. Unlike other Italian models, where tax evasion rates are assumed to be constant within population subgroups (e.g. by income source type, by income classes), Betamod assigns a tax evasion rate to each individual. This allows us to evaluate more accurately how tax evasion may alter the redistributive effect of personal income taxation, and to measure the horizontal, vertical and re-ranking effects, each of which is possibly altered by tax evasion. Moreover, Betamod accounts thoroughly for a detailed set of tax allowances and tax credits. Compared to the majority of current microsimulation models for Italy, Betamod includes all kinds of individual income tax expenditures and allows us to estimate the distributional effects of all tax expenditures simultaneously and of specific tax reliefs or categories of expenditure in turn.

The paper is organized as follows. Section 2 describes the data set and the preliminary data adjustments and imputations required to simulate the Italian personal income taxes accurately. Section 3 illustrates, in detail, the process of constructing Betamod, focusing in particular on its innovative aspects. With reference to the 2010 fiscal year, Section 4 tests the robustness of the model by comparing the baseline simulation of personal income tax and local income taxes with official figures provided by tax returns data. Finally, Section 5 provides novel distributional evidence on tax evasion and on its profile, as well as on individuals’ re-ranking between income classes resulting from it2.

## 2. The micro database and related imputations

Betamod runs on the Italian national version of the Survey of Income and Living Conditions (It-Silc), which represents, with a few exceptions, the micro-database currently chosen by most tax-benefit microsimulation models for Italy3. With respect to the alternative Survey on Households Income and Wealth (Shiw), It-Silc has the advantage of a larger sample size (19,399 households in It-Silc versus 7,951 households in Shiw), which allows analyses on geographical area subsamples; the drawback of this choice is the lack of information on households’ assets and tax-relevant expenditures.

We use the cross-sectional component of It-Silc 2011, which features a considerably larger sample size than the rotating longitudinal component. The interview is structured into a household level questionnaire, collecting information on household composition, accommodation, housing costs, and economic circumstances (including savings, debts, receipt of family-related and means-tested benefits and children’s incomes); and an individual level questionnaire, which is administered to all household members aged 16 years or above. Besides information on education, health and occupation, the individual level questionnaire covers detailed information on the individual’s income from the various sources relevant for tax base assessment (employment, self-employment, old age and disability pensions, incapacity and disability benefits, rents from properties, investment income and other incomes). For income components subject to taxation4, the amount net of taxes (and of social insurance contributions, where applicable) is collected, because net amounts are generally regarded as less exposed to measurement error and recall bias than gross ones. Reflecting the structure of the Italian fiscal system, where incomes earned in calendar year t are taxed in the following year (t+1), the reference period in income-related questions is the previous fiscal year, that is 2010. This represents a mismatch with respect to demographic information, which reflects the situation of households at the time when the fieldwork was carried out (i.e. March and April 2011), and which has therefore been brought backward to 2010.

Still, an accurate simulation of the Italian personal income tax requires information beyond the topics covered by It-Silc. Most notably, the personal income tax base includes not only employment and self-employment income, replacement income, profits from non-corporate enterprises and a marginal part of investment income, but also figurative income on immovable properties, valued as cadastral rent5, which is not covered in the survey. Besides, information on specific items of expenditure (e.g. healthcare, house refurbishments, etc.) that are relevant for specific tax reliefs is not available in It-Silc. Missing information has therefore been imputed, drawing from other population-representative surveys covering the subject domains of interest.

We use the 2010 Survey on Households Income and Wealth released by the Bank of Italy (Bank of Italy, 2012), for imputing information on the self-reported asset value6 of the main residence, and other immovable properties, used to compute cadastral values. Drawing from the same survey, we also impute insurance premiums and house refurbishments expenditures, relevant for the computation of specific, and quantitatively important, tax reliefs. Imputation from Shiw has been performed using statistical matching techniques, where Shiw individuals have acted as ‘donors’ of the otherwise missing information for It-Silc observed ‘recipients’. Matching aims at selecting, for each It-Silc recipient, the Shiw donor that is closest to observational identity, i.e. the most similar in terms of characteristics, observed in both surveys that are predictive of the variable to be imputed. The quality of the matching procedure relies crucially on a so-called common support requirement of overlapping in the distribution of predictive characteristics in the donors’ and recipients’ samples, which has been empirically tested. The matching procedure we have adopted is based on a combination of stratification and Mahalanobis distance nearest neighbor algorithm (Rubin, 1980). Matching has been performed at the household level and with replacement, that is allowing the same Shiw household to act as donor for multiple It-Silc households, if deemed as the most adequate, rather than being discarded after having served once as donor. The donors’ and recipients’ samples have been stratified by main residence homeownership, other properties homeownership and geographical area, so that exact matching on these variables is ensured; then, within each stratum, the donor household has been selected based on the Mahalanobis distance metric, measured on other predictive variables. 
These include equivalent household income, the percentage of household members with more than upper secondary educational qualification, a set of household composition dummies, and the main earner’s employment status. The quality of matching has been gauged by investigating the balance (e.g. in terms of equality in means) in predictive variables between recipients and matched donors, and the procedure has been adjusted for as long as the achieved balance was deemed unsatisfactory. More detail on the implementation of the matching-based imputation, and the achieved balance, is reported in Appendix A.
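The matching step described above can be illustrated with a minimal sketch. It assumes that strata are encoded as one label per household and that the predictive covariates are numeric; the function name `mahalanobis_match` and all toy values are hypothetical and are not taken from the Betamod implementation.

```python
import numpy as np

def mahalanobis_match(recipients, donors, rec_strata, don_strata):
    """For each recipient, return the index of the nearest donor in the same stratum.

    Matching is with replacement: one donor may serve several recipients.
    Recipients whose stratum contains no donor are left unmatched (-1).
    """
    recipients = np.asarray(recipients, dtype=float)
    donors = np.asarray(donors, dtype=float)
    rec_strata = np.asarray(rec_strata)
    don_strata = np.asarray(don_strata)
    # The inverse covariance of the pooled covariates defines the metric
    # (pooling both samples is an assumption of this sketch).
    pooled = np.vstack([recipients, donors])
    inv_cov = np.linalg.pinv(np.atleast_2d(np.cov(pooled, rowvar=False)))
    matched = np.full(len(recipients), -1, dtype=int)
    for stratum in np.unique(rec_strata):
        rec_idx = np.flatnonzero(rec_strata == stratum)
        don_idx = np.flatnonzero(don_strata == stratum)
        if don_idx.size == 0:
            continue  # no common support in this stratum
        diff = recipients[rec_idx, None, :] - donors[None, don_idx, :]
        # Squared Mahalanobis distance for every recipient-donor pair.
        d2 = np.einsum("rdk,kl,rdl->rd", diff, inv_cov, diff)
        matched[rec_idx] = don_idx[np.argmin(d2, axis=1)]
    return matched

# Toy data: two covariates (e.g. scaled income, share of highly educated members).
recipients = np.array([[1.0, 0.2], [3.0, 0.8], [2.0, 0.5], [1.2, 0.3]])
rec_strata = np.array(["owner", "owner", "tenant", "owner"])
donors = np.array([[1.1, 0.25], [2.9, 0.7], [2.0, 0.5], [5.0, 0.1]])
don_strata = np.array(["owner", "owner", "tenant", "tenant"])
donor_values = np.array([150.0, 300.0, 90.0, 400.0])  # variable to impute

donor_index = mahalanobis_match(recipients, donors, rec_strata, don_strata)
imputed_value = donor_values[donor_index]
```

In this toy example the first and the last recipients receive the value of the same donor, which is what matching with replacement allows.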

As a result of the matching-based imputation from Shiw data, information on the asset value of owned properties is integrated into It-Silc. However, for fiscal purposes, properties are valued in terms of cadastral rental values. Therefore, we build on existing Land Registry data, which provide information on the distribution of property values and corresponding cadastral incomes (separately by gender, by age group, by household composition, by marital status, by geographical area and by main residence/secondary property) to derive a measure of cadastral income from the available information on property asset values, integrated into It-Silc. For assessing the cadastral value of the main residence, Land Registry information on the ratio between asset value and cadastral income is first expanded, using the Ras methodology7, to obtain marginal distributions across 300 subgroups, defined in terms of the above mentioned variables. After the It-Silc sample has been correspondingly stratified, the cadastral value for It-Silc households is computed as the asset value of the main residence imputed from Shiw, divided by the corresponding asset-value-to-cadastral-income ratio8, drawn from the expanded Land Registry statistics. A similar procedure is followed for imputing the cadastral value of secondary properties, where appropriate9.
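In symbols, the last step can be written as below. The notation is introduced here only for illustration: $V_h$ is the imputed asset value of the main residence of household $h$, $g(h)$ is the Land Registry subgroup the household falls into, and $\rho_g$ is the asset-value-to-cadastral-income ratio of subgroup $g$.

$$
CI_h = \frac{V_h}{\rho_{g(h)}}
$$

As a purely hypothetical numerical example, an imputed asset value of 200,000 euros in a subgroup with a ratio of 250 would yield a cadastral income of 200,000 / 250 = 800 euros.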

As an additional micro-data source, we use the 201310 Multiscopo Survey on Health Conditions and the Use of Health Services, released by the National Statistical Office (Istat, 2014), to impute information on healthcare expenditures, necessary to compute other major tax reliefs, namely those that involve the largest share of taxpayers. The Multiscopo survey has been used to estimate, at the individual level, the conditional probability of incurring tax-relevant healthcare expenditures, such as specialist visits, drug purchases, medical tests and treatments, as a function of predictive characteristics observed both in Multiscopo and in It-Silc. These include gender, six age groups, self-assessed health, reported chronic conditions, limitations in activities of daily living, geographical regions, marital status, occupation, education, presence of dependent children and household size11. The estimated parameters have then been used to predict the probability of incurring health expenditures for individuals observed in the It-Silc sample, based on their characteristics. As illustrated in Section 3.1, the estimated probability of healthcare spending is then flexibly used, together with fiscal data on tax reliefs, to identify beneficiaries of healthcare tax reliefs, and to impute related expenditure amounts.
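The estimate-then-predict logic of this imputation can be sketched as follows. The text does not state which binary response model is used, so a logit fitted by Newton-Raphson is assumed here purely for illustration; the two covariates, the synthetic donor data and the function names `fit_logit` and `predict_proba` are likewise hypothetical.

```python
import numpy as np

def fit_logit(X, y, n_iter=25, ridge=1e-8):
    """Fit a logit model by Newton-Raphson; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Hessian of the negative log-likelihood, with a tiny ridge for stability.
        hessian = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta += np.linalg.solve(hessian, X.T @ (y - p))
    return beta

def predict_proba(beta, X):
    """Predicted probability of incurring the expenditure."""
    X = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-X @ beta))

# Synthetic "donor survey": poor self-assessed health (0/1) and a scaled age group.
rng = np.random.default_rng(0)
n = 2000
poor_health = rng.integers(0, 2, size=n)
age_group = rng.integers(0, 6, size=n) / 5.0
true_p = 1.0 / (1.0 + np.exp(-(-1.0 + 1.5 * poor_health + 1.0 * age_group)))
spends = (rng.random(n) < true_p).astype(float)

beta = fit_logit(np.column_stack([poor_health, age_group]), spends)

# "Recipient survey": same covariates, outcome unobserved.
recipients = np.array([[0, 0.4], [1, 0.4]])
p_recipients = predict_proba(beta, recipients)
```

The predicted probabilities would then be combined with fiscal data to select beneficiaries, a step that is not reproduced in this sketch.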

## 3. The construction of Betamod

After the preliminary data adjustments and the imputations, the model has been built through 4 modules, integrated in an iterative procedure where outer loops (identifying recipients of tax expenditures, building calibration weights, computing individual- and income-specific tax evasion rates) feed into the inner loop of net-to-gross conversion, as depicted in Figure 1.

Figure 1

In more detail, based on family and personal characteristics relevant for eligibility, Module 1 identifies round-specific beneficiaries and expenditure amounts for all of the non-simulated tax reliefs, calibrating them to reproduce the totals and the income distribution of beneficiaries and expenditures resulting from administrative tax returns data. Module 2 deals with the net-to-gross income conversion, through a standard iterative algorithm. Once gross and reported income measures have been obtained, Module 3 estimates calibration weights in order to match both population totals and administrative taxpayer counts. By comparing the grossed-up income measures obtained with disaggregated administrative tax returns data, Module 4 produces an individual tax evasion rate which accounts for the composition of the individual’s income by source (employment income, pensions, self-employment income12, rental income from immovable property), the income level, and geographical residence (North-West, North-East, Center, South). After Module 4, round-specific convergence, measured in terms of equality between reported incomes as estimated by the model13 and as resulting from official tax returns data14 (both at the aggregate level, and by subgroups defined by main source of income and by geographical area), is assessed.

The overall iterative procedure continues until convergence is achieved. Specifically, the iterations stop when the reported levels of income estimated by the model, reflecting estimated tax allowances, tax evasion rates and calibration weights, do not differ significantly from official tax returns data, both at the aggregate level and by subgroups (defined by main source of income and geographical area). The overall procedure generates a battery of individual level variables, including true gross income, tax evasion rate, reported income, tax relevant expenditures and calibration weights, for later use in policy simulation modelling. The following sections describe in more detail the four modules and the innovative aspects of the model construction.

### 3.1 Deductions and tax credits module

In the Italian fiscal system there are different kinds of deductions and tax credits. The most sizeable, collectively worth over 5 per cent of Gdp, are listed in Table 1, while a comprehensive list of tax reliefs, and their quantitative importance, is reported in Tables 7 and 8. In terms of design, all deductions and tax credits are non-refundable, with the only exception of the tax credit granted to families with four children. Typically, an upper threshold applies to most tax expenditures (mortgage interest payments, rent paid by tenants, education), while the healthcare tax credit is allowed on expenses in excess of a lower threshold. Also, a withdrawal rate often applies, so that the fiscal benefit is decreasing in individual’s gross income15.

Table 1

Among deductions, the most relevant in terms of number of recipients and lost revenue are social insurance contributions paid by self-employed individuals16, the cadastral value of the main residence and voluntary contributions to private pension plans. Other deductions are granted for specific expenditures, including legal alimony payments to spouses, donations to religious institutions, personal care services and disability aids for the disabled, and social insurance contributions paid for domestic help.

Among tax credits, the largest single item is a universal tax credit granted for specific income sources: the tax credit is applicable for either employment income, or self-employment income, or pension income, with a withdrawal rate resulting in a decreasing credit as gross income increases. This tax credit contributes to the progressivity of the income tax design, even more so given the absence of a legislated no tax area or legal zero rate tax bracket. Another set of tax credits aims at accounting for the individual’s ability to pay, given her/his household composition (i.e. presence of dependent household members) and her/his children’s characteristics, such as age and disability. These tax credits are decreasing in individual gross income and become zero above a certain income threshold. The children tax credit amount and income threshold depend also on the number of children, and increase for each child aged three years or below and for disabled children. An additional refundable tax relief is granted for taxpayers with at least four children. Further tax credits are granted for specific expenditures, and amount to 19 per cent of such expenditures: these include mainly healthcare, mortgage interest payments on both the main residence and other properties, life insurance premiums, secondary and tertiary education, childcare and charitable donations. Finally, tax credits of up to 55 per cent of the expenses incurred for energy conservation interventions and house refurbishments, and a lump-sum tax credit for rent paid by low-income tenants, are allowed.
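The mechanics of an expenditure-based, non-refundable tax credit can be sketched as follows. Only the 19 per cent rate comes from the text; the lower threshold, the upper threshold and all amounts are hypothetical placeholders, and applying the cap after the floor is an assumption of this sketch rather than a statement of the legislated rules.

```python
def expenditure_credit(expense, rate=0.19, floor=0.0, cap=None):
    """Credit equal to `rate` times the eligible part of an expense.

    `floor` is a lower threshold (only the excess counts) and `cap` an
    upper threshold on the eligible amount; both are hypothetical here.
    """
    eligible = max(expense - floor, 0.0)
    if cap is not None:
        eligible = min(eligible, cap)
    return rate * eligible

def net_tax(gross_tax, credits):
    """Non-refundable credits reduce the liability to zero at most."""
    return max(gross_tax - sum(credits), 0.0)

# Hypothetical taxpayer: healthcare expenses above a floor, capped mortgage interest.
health_credit = expenditure_credit(1000.0, floor=100.0)
mortgage_credit = expenditure_credit(5000.0, cap=4000.0)
liability = net_tax(800.0, [health_credit, mortgage_credit])
# Part of the credit is lost because it exceeds the gross liability.
lost_credit = max(health_credit + mortgage_credit - 800.0, 0.0)
```

The `lost_credit` quantity illustrates why non-refundable credits can be partly lost by taxpayers with a low gross tax liability.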

As standard in other tax-benefit models for the Italian system, and reflecting data availability constraints, Betamod fully simulates the deduction for main residence cadastral value, and the tax credits by income source and for dependent family members17. However, with respect to other Italian models, which typically18 impute tax expenditures through calibration with aggregate fiscal data by income classes, Betamod calibrates not only expenditure amounts, but also beneficiaries. In particular, we aim at achieving a more realistic identification of beneficiaries, for each specific type of tax expenditure item, based on household and personal characteristics relevant for eligibility. Table 2 and Table 3 report the individual and family characteristics we used to identify the potential beneficiaries of deductions and tax credits. Simulated and non-simulated tax reliefs include all of the current categories provided by tax rules, namely, 8 deductions and different types of tax credits, the latter grouped into 17 main categories. Thus, the model offers a complete picture of the wide array of tax reliefs that are part of the Italian income tax.

Table 2
Table 3

Once potential beneficiaries have been identified, calibration of amounts and beneficiaries to fiscal data has been carried out for each tax relief type. Calibration accounts not only for income classes, as standard in other experiences, but also, building on the availability of additional ad hoc data obtained from the Ministry of Economy and Finance, for specific relief beneficiaries distribution across occupational status (employee, self-employed and pensioner) and number of dependent household members (none, one, two or more).

Overall, in the light of the importance of tax expenditures in current and future tax reform discussion (Burman, 2003; Burman et al., 2008; Mef, 2011; Poterba, 2011; Tyson, 2014), Betamod can be used to estimate more accurately the revenue and distributional effects of all tax expenditures simultaneously and of specific tax reliefs or categories of expenditure.

### 3.2 Net-to-gross conversion module

To derive gross incomes, we follow a widely used procedure based on an iterative algorithm (see, for instance, Immervoll and O’Donoghue, 2001), represented in Figure 2.

Figure 2

For each taxpayer the procedure estimates an initial true gross income based on an average tax rate applied to net income as collected in the survey19, then applies an individual tax evasion rate, and then simulates the appropriate 2010 tax rules to produce a net income measure20, to be compared with the It-Silc one. If they differ, a new estimate of the true gross income is computed applying a correction factor, equal to the ratio between the original and the estimated net income, to the previous round true gross income and a new iteration is run. When equality between the two values is achieved (up to 1 euro of difference), the iteration ends and the data are sent to Module 3 for the reweighting procedure. The output for each individual, feeding into the following modules, includes true gross income, tax evasion rate, estimated reported income, deductions and tax credits, and gross and net income tax liability.
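The loop described above can be sketched as follows. The bracket schedule is a hypothetical stand-in for the simulated 2010 rules (deductions and tax credits are ignored), the starting average tax rate of 25 per cent is arbitrary, and the sketch assumes that the survey net income equals true gross income minus the tax paid on reported income.

```python
def tax(reported, brackets=((15000.0, 0.23), (28000.0, 0.27), (float("inf"), 0.38))):
    """Hypothetical progressive schedule given as (upper limit, marginal rate) pairs."""
    due, lower = 0.0, 0.0
    for upper, rate in brackets:
        if reported > lower:
            due += (min(reported, upper) - lower) * rate
        lower = upper
    return due

def net_to_gross(net_survey, evasion_rate, avg_rate=0.25, tol=1.0, max_iter=100):
    """Iterate on true gross income until simulated net income matches the survey."""
    gross = net_survey / (1.0 - avg_rate)            # initial guess
    for _ in range(max_iter):
        reported = gross * (1.0 - evasion_rate)      # income declared to the tax authority
        net_est = gross - tax(reported)              # simulated net income
        if abs(net_est - net_survey) <= tol:         # equality up to 1 euro
            return gross, reported
        gross *= net_survey / net_est                # correction factor
    raise RuntimeError("net-to-gross iteration did not converge")

gross, reported = net_to_gross(20000.0, evasion_rate=0.10)
```

With a progressive schedule the correction factor shrinks at every round, so the toy example stops after a few iterations.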

Table 4 below compares descriptive statistics for the components of gross income by income source obtained from our model with those available in It-Silc. For self-employment income, it is interesting to observe that Betamod produces a lower estimated gross income, which reflects the higher tax evasion rates yielded by our tax evasion modelling. Differences observed for other income sources plausibly reflect our reweighting procedure, illustrated in the following section, which aims at obtaining a sample that is representative of both population totals and the number of taxpayers.

Table 4

### 3.3 Reweighting module

Calibration weighting is a general technique for adjusting probability-sampling weights, such as those of It-Silc, so that model estimates are consistent with external official data sources (among others, see Atkinson et al., 1988; D’Amuri and Fiorio, 2006). As external data sources, we consider both population counts (from Istat official statistics) and official fiscal data (Mef).

While It-Silc weights are built to match population totals (Istat), we adjust them to achieve consistency with fiscal data as well, so that the model estimates are reconciled with both the entire population and taxpayer counts. The variables used for performing the individual level grossing-up are reported in Table 5. In addition to the standard socio-demographic variables, we also consider the number of taxpayers with dependent family members because of the important discrepancies between the sample distribution of household composition and official tax returns data. We obtain the joint distribution across those variables using the marginal distributions in a Ras-like iterative proportional fitting. The household weights are then computed by averaging the individual weights of household members21. As apparent in the last column of Table 5, the differences between totals based on the Betamod recalibrated weights and external official data are small, which supports the representativeness of the model estimates.

Table 5

### 3.4 The tax evasion module

Following the previous empirical literature on tax evasion at the micro level in Italy (Bernasconi & Marenzi, 1997; Fiorio & D’Amuri, 2006), we apply the “discrepancy method” to estimate tax evasion rates. The method, based on the assumption that individuals report a more truthful income in an anonymous interview than to fiscal authorities, computes tax evasion by comparing the tax returns and income survey responses of similar individuals.

In the above mentioned studies the comparison is made in terms of after-tax income. This choice has two main drawbacks. Firstly, it overestimates the tax evasion rates, since it computes them as the ratio of evaded income to net income, instead of to true gross income. Secondly, when taxpayers are compared by quantiles of net incomes, a problem of re-ranking may arise. In fact, with respect to the distribution of after-tax income recorded in the survey, tax evasion shifts individuals downwards in the distribution of net income in the official data, so that, especially in low-income classes, the tax evasion rates are over-estimated. To overcome these drawbacks, Betamod estimates tax evasion rates as the percentage differences between the true gross incomes (as resulting from the net-to-gross conversion module) and the reported incomes declared to fiscal authorities. Clearly, since the true gross income is unknown and is the result of the net-to-gross procedure, tax evasion may be affected by approximations that depend on the estimation method.
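The first drawback can be made explicit with some notation introduced here only for illustration: let $Y$ denote true gross income, $R$ the income reported to fiscal authorities and $N$ the net income. The rate used in Betamod and the rate computed on net income are then

$$
e = \frac{Y - R}{Y}, \qquad e^{net} = \frac{Y - R}{N}, \qquad N < Y \;\Rightarrow\; e^{net} > e .
$$

As a purely hypothetical example, with $Y = 30{,}000$, $R = 27{,}000$ and $N = 24{,}000$ euros, the evaded income is 3,000 euros, so that $e = 10\%$ while $e^{net} = 12.5\%$.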

Tax evasion rates are estimated in three steps (see Appendix B). In the first step, aggregate tax evasion rates, stratified by area and main income source type, are computed comparing simulated true gross incomes with administrative tax data on reported income. As administrative data are provided in aggregates, by main income source type and, separately, by geographical area, we first apply a Ras technique to obtain the joint distribution of reported income by both dimensions. As a result, a 4×4 matrix of average evasion rates, by income type and geographical area, is obtained (see Table 12).
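The first step can be illustrated with a minimal Ras (iterative proportional fitting) sketch. A 2×2 toy matrix is used instead of the 4×4 one, all figures are invented, and the choice of the simulated true gross income as the seed matrix is an assumption of this sketch, since the text does not state which seed is used.

```python
import numpy as np

def ras(seed, row_totals, col_totals, rtol=1e-9, max_iter=1000):
    """Scale a positive seed matrix until its margins match the given totals."""
    m = np.asarray(seed, dtype=float).copy()
    row_totals = np.asarray(row_totals, dtype=float)
    col_totals = np.asarray(col_totals, dtype=float)
    for _ in range(max_iter):
        m *= (row_totals / m.sum(axis=1))[:, None]   # fit row margins
        m *= (col_totals / m.sum(axis=0))[None, :]   # fit column margins
        if np.allclose(m.sum(axis=1), row_totals, rtol=rtol, atol=0.0):
            return m
    raise RuntimeError("Ras procedure did not converge")

# Rows: main income source type; columns: geographical area (toy values).
true_gross = np.array([[100.0, 80.0],
                       [60.0, 40.0]])       # simulated true gross income by cell
reported_by_type = [160.0, 90.0]            # administrative totals by income type
reported_by_area = [150.0, 100.0]           # administrative totals by area

joint = ras(true_gross, reported_by_type, reported_by_area)
rates = 1.0 - joint / true_gross            # average evasion rate by cell
```

The two sets of administrative totals must sum to the same grand total, otherwise the margins cannot both be matched.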

In the second step a distributional income profile of tax evasion is estimated for each area-by-income type stratum. We refine the stratification by expanding the 16 strata to account for the profile of tax evasion by income classes. In more detail, each area-by-income type stratum is expanded into 13 classes of true gross income, so that 16 income profiles of tax evasion are obtained. The design of each evasion-by-income profile results from an optimizing procedure, which aims at minimizing the distance between simulated and administrative reported income. The result is a 16×13 matrix of tax evasion rates by main income source type, geographical area and true gross income level.

Finally, in the third step, a tax evasion rate is assigned to each individual for each type of income source, to overcome the standard procedure of assigning the same tax evasion rate to all individuals in each matrix cell. Betamod randomly selects, within each cell, individuals to be identified as tax compliers and those to be identified as tax evaders, then assigns individual tax evasion rates by using a beta distribution whose mean value is equal to the average tax evasion rate of the cell. Namely, individual tax evasion rates are calibrated so that the sum of individual evaded incomes is equal to the total income evaded in the class. This represents an advancement with respect to other models, where tax evasion rates are assumed to be constant within population subgroups (e.g. by income source type, by income classes). This feature allows assessing the relevance of re-ranking between taxpayers due to the presence of tax evasion.
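A minimal sketch of this third step for a single cell is given below. The share of evaders, the concentration of the beta distribution and the function name `assign_evasion_rates` are hypothetical; in addition, the sketch sets the mean of the beta draws for evaders to the cell rate divided by the share of evaders, so that the cell average including compliers equals the cell rate, which is one possible reading of the procedure rather than the documented one.

```python
import numpy as np

def assign_evasion_rates(income, cell_rate, evader_share, concentration, rng):
    """Assign individual evasion rates within one cell of the evasion matrix."""
    income = np.asarray(income, dtype=float)
    n = income.size
    n_evaders = int(round(evader_share * n))
    if n_evaders == 0 or not cell_rate < evader_share:
        raise ValueError("the cell rate must be reachable by the selected evaders")
    rates = np.zeros(n)                                   # compliers keep a zero rate
    evaders = rng.choice(n, size=n_evaders, replace=False)
    mean = cell_rate / evader_share                       # mean rate among evaders
    rates[evaders] = rng.beta(mean * concentration,
                              (1.0 - mean) * concentration,
                              size=n_evaders)
    # Calibrate so that total evaded income equals the cell total.
    target = cell_rate * income.sum()
    rates *= target / (rates * income).sum()
    if rates.max() > 1.0:
        raise ValueError("calibration pushed a rate above 100 per cent")
    return rates

rng = np.random.default_rng(42)
income = rng.uniform(10000.0, 60000.0, size=1000)         # true gross incomes in the cell
rates = assign_evasion_rates(income, cell_rate=0.15, evader_share=0.6,
                             concentration=20.0, rng=rng)
evaded_total = (rates * income).sum()
```

Because individuals with the same true gross income now receive different rates, their reported incomes differ, which is what makes re-ranking measurable.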

## 4. Validation and main results

The ability of Betamod to reproduce each measure (gross income, taxable income, deductions, tax credits and net tax liability) relevant for personal income tax and local income taxes is validated through a comparison with official figures provided by tax returns data for the relevant fiscal year, that is 2010. To do this, we first compare the aggregate tax figures simulated by Betamod with the official fiscal statistics. Results are shown in Tables 6, 7 and 8.

Table 6
Table 7
Table 8

First, it should be noted that tax evasion reduces true gross income by about 61 billion euros, corresponding to an average tax evasion rate of 7.2 per cent. The estimated tax evasion rate might seem relatively low in a country, like Italy, where tax evasion is a widespread phenomenon (among others, Marino and Zizza, 2012; Fiorio and D’Amuri, 2006). However, the figure reflects the fact that employment income and pensions taken as a whole account for more than 80% of total reported income (53% and 29%, respectively) and that the estimated average tax evasion rates for these two types of income are, respectively, 2.9 per cent and zero. As apparent in Table 6, Betamod output and official fiscal data present negligible (i.e. lower than 1%) differences in most figures, so that the model performs very well in simulating revenue amounts and taxpayer counts22.

The largest difference arises in the number of individuals with positive gross tax liability. This seems mostly driven by the model imputation of tax deductions, resulting in a larger number of individuals with positive taxable income in Betamod. This is because tax deductions have been imputed as a percentage of reported income, thus constraining their amount to be lower than reported income, and therefore taxable income to be positive, by construction. In addition, tax rules require some taxpayers to report a zero gross tax liability even if it is in fact positive: this applies for example to pensioners with gross income (excluding the cadastral return on main residence) lower than 7.5 thousand euros, or to taxpayers whose only income, if lower than 500 euros, is that from buildings.

Focussing on deductions, Table 7 reports the number of beneficiaries and the amount for each type. Again, no significant differences are found between Betamod results and tax returns data; in particular, Betamod replicates well the largest deduction (the social insurance contributions).

Some discrepancies can be observed only in simulating the number of deduction beneficiaries for donations to religious institutions (-8.7%) and for alimony payments to the spouse (-9.5%). In both cases the number of tax relief claimants is anyway negligible. Table 8 considers tax credits, covering both the model-simulated and the imputed ones. In general, Betamod estimates provide a good approximation of the tax returns figures.

The number of beneficiaries and the amount of the income-source tax credit are overestimated by about 3.9% and 6.2% respectively. This is mainly due to the fact that estimated reported incomes are denser at the bottom of the distribution in Betamod than in tax data. Since the tax credit is decreasing in income, the Betamod tax credit is greater than in tax returns data. As to the dependent family members tax credit, the striking similarity in the number of beneficiaries is explained by this variable having been taken into account in the weighting design, while the simulated amount of tax credit is 4.0% lower than the official figure, presumably reflecting the sample distribution of household composition, relevant for identification of dependants. The other most sizeable tax credits, namely healthcare expenditures, house refurbishment, energy interventions and mortgage interest tax credits, are remarkably close to the administrative figures. As expected, the main discrepancies arise in the numbers of beneficiaries of the less sizeable tax credits23.

Besides assessing the model's validity at the aggregate level, no less attention should be devoted to validating the distributional patterns of the different components of the model output, as the model is mainly a tool for carrying out distributional analyses. First, we compare the distribution of taxpayers (Figure 3) and of simulated reported income (Figure 4) with official statistics. Overall, the Betamod distributions are strikingly similar to those from fiscal data, especially in the classes of reported income where most taxpayers fall (12–26 thousand euros). This pattern of similarity is confirmed when considering the distribution of average gross and net tax liabilities across income classes (Table 9). Figures 5 and 6 represent the progressive design of the income-source and the dependent family members tax credits, as arising from Betamod and from tax returns data. Again, the similarity between the two is striking, and is also confirmed for other tax allowances (the related figures are reported in Appendix C). Interestingly, both tax credits are partly lost by taxpayers in the bottom income class, due to their low level of taxable income/gross tax liability and to the non-refundable nature of these tax credits.

Figure 3
Figure 4
Figure 5
Figure 6
Table 9

Although comparison with other Italian microsimulation studies (e.g. Fiorio and D’Amuri, 2005; Tomarelli and Acciari, 2010; Di Nicola et al., 2015) is hindered by differences in the fiscal years considered, as well as by different modelling choices, the redistributive impact and progressivity design estimated by Betamod are broadly in line with those of these studies.

Further insight into the distributional effect of the different personal income tax components can be gained by decomposing the overall progressivity impact, as measured by the Kakwani index shown in Table 10. This reflects both the tax design (i.e. provisions for tax exemptions, deductions, the tax rate schedule, tax credits) and the effect of tax evasion. We build on a reinterpretation of the Pfähler (1990) decomposition of the Kakwani index25, in the spirit of Verbist and Figari (2013). In more detail, the total Kakwani progressivity index $\pi_{T_n}^{K}$ can be expressed as a weighted sum26 of gross tax liability progressivity $\pi_{Tg}^{K}$ and tax credits progressivity $\pi_{K}^{K}$, as in:

(1) ${\pi }_{{T}_{n}}^{K}=\frac{{t}_{g}}{{t}_{n}}{\pi }_{Tg}^{K}+\frac{k}{{t}_{n}}{\pi }_{K}^{K}$
(2) ${\pi }_{Tg}^{K}={\pi }_{R}^{K}+\frac{ev}{\left(1-ev-e-d\right)}{\pi }_{EV}^{K}+\frac{e}{\left(1-ev-e-d\right)}{\pi }_{E}^{K}+\frac{d}{\left(1-ev-e-d\right)}{\pi }_{D}^{K}$
Table 10

In other words, the progressivity of gross tax liabilities ($\pi_{Tg}^{K}$) is further decomposed into a direct progressivity effect resulting from the tax rate schedule ($\pi_{R}^{K}$) and an indirect progressivity effect depending on the amounts of the various exemptions and deductions from gross income ($\pi_{E}^{K}$, $\pi_{D}^{K}$). Our decomposition also directly measures the contribution of tax evasion to progressivity ($\pi_{EV}^{K}$). Each Kakwani index shows the degree of disproportionality of the corresponding tax component, relative to the distribution of gross income. Results are shown in Table 11, with Kakwani indices reported in the last column.

Table 11
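As a rough illustration of equation (1), the sketch below computes concentration coefficients and Kakwani indices for a handful of made-up taxpayers and checks that the weighted sum reproduces the net-tax index. The data, the function name `concentration` and the sign convention for the credit index (Gini of gross income minus the concentration coefficient of credits) are assumptions made for the example, not Betamod output.

```python
def concentration(values, ranking):
    """Concentration coefficient of `values` with units ordered by `ranking`.

    When `ranking` is `values` itself this is the Gini index.
    Assumes equal weights and no ties in `ranking`.
    """
    order = sorted(range(len(values)), key=lambda i: ranking[i])
    total = sum(values)
    n = len(values)
    cumulative, area = 0.0, 0.0
    for i in order:
        previous = cumulative
        cumulative += values[i] / total
        area += (previous + cumulative) / (2 * n)  # trapezoid under the curve
    return 1 - 2 * area


# Hypothetical taxpayers (thousand euros), ranked by gross income.
gross_income = [10.0, 20.0, 30.0, 50.0, 90.0]
gross_tax = [1.0, 3.0, 6.0, 13.0, 30.0]
credits = [1.0, 1.5, 1.5, 1.0, 0.5]
net_tax = [g - c for g, c in zip(gross_tax, credits)]

gini = concentration(gross_income, gross_income)
pi_tg = concentration(gross_tax, gross_income) - gini   # Kakwani, gross tax
pi_tn = concentration(net_tax, gross_income) - gini     # Kakwani, net tax
pi_k = gini - concentration(credits, gross_income)      # credits (assumed sign)

y = sum(gross_income)
t_g, t_n, k = sum(gross_tax) / y, sum(net_tax) / y, sum(credits) / y
weighted_sum = (t_g / t_n) * pi_tg + (k / t_n) * pi_k
print(round(pi_tn, 6), round(weighted_sum, 6))  # the two numbers should coincide
```

The identity holds because concentration coefficients computed under a common ranking are linear in the underlying amounts; an actual application would also need sample weights.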

Tax evasion and exemptions (namely the cadastral value of the main residence) enhance progressivity, whereas deductions are wholly regressive. The effect of tax evasion is mainly due to its negative income gradient, which reduces gross income more at the lower end of the distribution. The exemption of imputed rent increases progressivity since this figurative income component is proportionally more sizeable for lower income taxpayers. On the other hand, deductions have an inequality-enhancing impact, plausibly because the proportional effect of social insurance contributions on the self-employed is offset by the pro-rich pattern of personal expenses. Not surprisingly, the tax schedule exhibits a major progressivity effect. It is tax credits, though, that are the most important determinant of progressivity, their contribution amounting to about 58% of overall progressivity. This is mostly driven by the income-source and dependent family members tax credits, whose design entails positive withdrawal rates as taxable income increases. Other tax credits, subsidizing personal spending on a wide range of goods and services, including housing, healthcare and education, while less sizeable, do display a regressive effect.

Clearly, the overall progressivity impact of each component depends on its size relative to gross income. For instance, the Kakwani index for the dependent family members tax credit (0.5329) is remarkably higher than that for the income-source tax credit (0.3989), but the contribution to progressivity of the latter exceeds that of the former because of the relative weights.

## 5. Tax evasion and its distributional profile

To showcase Betamod's potential for analysis, in this section we provide some distributional evidence on tax evasion. According to our estimates, in aggregate €61 billion of gross income escapes the tax authorities, corresponding to a tax revenue loss of about €16 billion27. Unsurprisingly, tax evasion arises mostly from self-employment income and, to a lesser extent, rental income from property: overall, 85% of evaded income is attributable to these two sources (65% and 20% respectively). The remaining 15% of evaded income is attributable to employment income, as pension income, representing a public transfer, can hardly be hidden from tax authorities.
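The aggregate figures quoted above can be turned into a quick back-of-the-envelope breakdown. The snippet only rearranges numbers stated in the text; the variable names and the notion of an "implied average tax rate" are introduced here for illustration.

```python
evaded_income_bn = 61.0          # total evaded gross income, from the text
revenue_loss_bn = 16.0           # total tax revenue loss, from the text
shares = {"self_employment": 0.65, "rental": 0.20, "employment": 0.15}

evaded_by_source = {source: share * evaded_income_bn
                    for source, share in shares.items()}
implied_tax_rate = revenue_loss_bn / evaded_income_bn

for source, amount in evaded_by_source.items():
    print(f"{source}: about {amount:.1f} billion euros")
print(f"implied average tax rate on evaded income: {implied_tax_rate:.1%}")
```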

Average tax evasion rates, by income source and geographical area, are reported in Table 12. The figures reveal that tax evasion on employment income, while not negligible, is low (2.9%), and that the largest tax evasion rates are registered on rental income from immovable property (33.6%) and self-employment income (24%). Relevant differences also arise between geographical areas: in particular, our results identify individuals living in the South of Italy as those displaying systematically higher tax evasion rates, followed by those in the North East. The Betamod estimated average values are slightly lower than, yet not inconsistent with, the estimates derived by the above-mentioned studies on tax evasion in Italy.

Table 12

As Figure 7 (a, b, c) shows, the distribution of tax evasion rates varies across income sources. Among individuals who hide employment income from tax authorities, low tax evasion rates are the most frequent. By contrast, more than half of self-employment income tax evaders display a tax evasion rate higher than 60%. A similar distribution arises for rental income evasion: about 50% of rental income tax evaders display tax evasion rates between 60 and 80%. Figure 7d plots the full distribution of estimated individual tax evasion rates by true gross income, i.e. the ‘true’ amount individuals would report to tax authorities under full compliance. The figure reveals that individuals’ tax evasion rates cluster around an upper and a lower level, reflecting the underlying composition of individual income sources, i.e. the prevalence of employment income (relatively low level of tax evasion) versus self-employment and rental incomes (high level of tax evasion). The evidently negative gross income gradient of tax evasion rates clearly reflects the tax evasion estimation procedure, which accounts for evasion-by-income profiles28.

Figure 7

Figure 8, which shows tax evasion rates by income class, provides further evidence on the negative gross income gradient of tax evasion rates, and also allows a better assessment of the income profile of tax evasion behaviour by income source. In relative terms, both for each income source and for their aggregate, Betamod yields tax evasion rates that generally decrease in income29, consistently with previous studies (Bernasconi & Marenzi, 1997; Fiorio & D’Amuri, 2006). With respect to those works, Betamod yields a flatter income gradient for tax evasion by employees in the lower income classes. This plausibly follows from our tax evasion rate being computed over gross income, while their figures use net incomes as the denominator.

Figure 8

Figure 9 shows the total amount of unreported income. Despite the decreasing profile of tax evasion rates, most evaded income is attributable to taxpayers with gross income in the range 12,000–50,000 euros, and stems mainly from self-employment income.

Figure 9

Tax evasion, by reducing reported income, causes a sizeable downward shift in the distribution of taxpayers by reported income with respect to that by (true) gross income. To begin with, tax evasion may modify the relative position (in terms of reported income) of fully compliant taxpayers and evaders with the same true gross income, generating a horizontal inequity effect in income taxation. Indeed, while horizontal inequity is one of the major consequences of tax evasion, very few studies measure it. Betamod evidence is provided in Figure 10, where the two cumulative distributions of taxpayers, by (true) gross and by reported income respectively, are shown. The distribution of individuals by reported income is thicker in the left tail than the distribution by gross income, suggesting a downward movement along the income distribution of taxpayers who “benefit” from tax evasion.

Figure 10

The Betamod transition matrix, reporting the share of taxpayers in each true gross income class who are found in different reported income classes as a result of tax evasion, is shown in Table 13. As a result of non-compliance, the share of taxpayers in the bottom income class, for instance, rises from about 10% when considering true gross income to about 14% when considering the reported income relevant for taxation. Evaders who enter the bottom income class come from the 2nd to the 8th income class (up to 29 thousand euros), rather than from higher income classes, reflecting the decreasing income profile of tax evasion rates. Moving to the upper classes, we observe a similar pattern of shifts across income classes, although the number of shifts is progressively reduced, again because of the negative income gradient in tax evasion. While Table 13 only reports between-class shifts, the availability of individual tax evasion rates allows Betamod to also detect shifts happening within each income class.

Table 13
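A transition matrix of this kind can be sketched in a few lines. The class limits, incomes and evasion rates below are hypothetical and unweighted, and the function name `transition_matrix` is ours; the actual Betamod classes and sample weights are not reproduced here.

```python
from bisect import bisect_right


def transition_matrix(true_income, reported_income, upper_bounds):
    """Row r, column c: share of taxpayers in true-income class r
    who fall in reported-income class c."""
    n_classes = len(upper_bounds) + 1
    counts = [[0] * n_classes for _ in range(n_classes)]
    for true_y, reported_y in zip(true_income, reported_income):
        row = bisect_right(upper_bounds, true_y)
        col = bisect_right(upper_bounds, reported_y)
        counts[row][col] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical class limits and taxpayers (thousand euros).
bounds = [10, 20, 30]
true_income = [8.0, 12.0, 15.0, 25.0, 40.0]
evasion_rate = [0.0, 0.5, 0.0, 0.3, 0.1]
reported = [y * (1 - e) for y, e in zip(true_income, evasion_rate)]

for row in transition_matrix(true_income, reported, bounds):
    print(row)
```

Off-diagonal entries to the left of the main diagonal correspond to taxpayers who move to a lower class once evasion is taken into account.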

Once taxation applies to reported income, shifts along the income distribution give rise not only to horizontal inequities, but also to a re-ranking effect, i.e. a reversal of taxpayers’ relative positions before (reflecting the true gross income position) and after personal income taxation (based on the reported income position), which the model also allows us to study. Although preliminary, the novel empirical evidence showcased here bears major implications for the accurate measurement of the actual redistributive effect of personal income taxation and its decomposition into the horizontal, vertical and re-ranking effects, each of which is possibly altered by tax evasion.

## Footnotes

### 1.

In the past decade several microsimulation models were developed in Italy. For instance: the Siena microsimulation model (SM2) for net-gross conversion of Eu-Silc income variables (Betti et al. 2011); the Mapp model for studying the effects of taxes and transfers (in cash and in kind) on the level of poverty and inequality (Baldini et al., 2011); the Tabeita model that reproduces the Italian personal income tax (Ceriani et al., 2013), and the microsimulation model developed by Pellegrino et al. (2011) for the analysis of housing taxation.

### 2.

Further material is provided in three Appendices. Appendix A describes the statistical matching between the It-Silc dataset and the Bank of Italy’s Survey on Households Income and Wealth (Shiw); Appendix B illustrates the methodology used for the estimation of individual tax evasion rates, and Appendix C shows the incidence of tax reliefs on reported income.

### 3.

For instance, the SM2 model (Betti et al., 2011), the Mapp model (Baldini et al., 2011), and the Euromod module for Italy (Sutherland and Figari, 2011) use It-Silc data, while the Tabeita model (Ceriani et al., 2013) and the microsimulation model developed by Pellegrino et al. (2011) use as input data those provided by the Bank of Italy in the Survey on Households Income and Wealth (Shiw).

### 4.

Non-taxable incomes and benefits are taken from the survey, rather than simulated, in order to obtain the disposable income measure.

### 5.

While cadastral income on the main residence is de facto exempted from personal income taxation through a tax deduction, it is anyway relevant for other components of the tax-benefit system, such as the means test for family benefits. For other properties, the actual rent received or the cadastral income is used in tax base assessment, according to whether they are rented or left unoccupied.

### 6.

The Shiw question asks respondents to assess subjectively the value of each of their properties.

### 7.

The Ras algorithm is an iterative proportional fitting procedure that estimates the joint distribution of two or more variables given their marginal distributions. See Bacharach (1965).

### 8.

More precisely, the ratio has been multiplied by a 1.05 correction factor, to reflect a legislated uprating adjustment.

### 9.

When secondary properties are rented, the actual rent received, as collected in It-Silc, rather than cadastral income, enters in the tax base definition.

### 10.

The Multiscopo Survey did not take place in 2010. Even though the time distance between the interviews in It-Silc 2010 and the Multiscopo Survey 2013 seems quite large, this does not constitute an issue since we only used qualitative information that is actually comparable between the two datasets.

### 11.

We have not included income among the control variables since the Multiscopo Survey does not provide any information about it. However, research findings have suggested that, while at aggregate level there exists a positive and significant relationship between healthcare expenditure and Gdp (Newhouse, 1977), at individual level, there is not a significant association between healthcare expenditure and income (especially when the health system provides universal coverage free of charge as the Italian healthcare system does). Indeed, full insurance coverage would remove the individual budget constraint and reduce or eliminate the influence of cost of care on patients’ decisions of how much care to use. Typically, income elasticity of individual healthcare expenditure under full insurance coverage regime tends to be near zero (for details see Getzen, 2000).

### 12.

We consider as self-employed members of the arts and/or professions, sole proprietors, freelancers, owners or members of a family business, and persons receiving profits from non-corporate enterprises.

### 13.

By ‘reported incomes as estimated by the model’ we mean the portion of true gross income that we estimate the individual will declare, given his tax evasion rate. In what follows, this will be referred to as ‘estimated reported income’, as opposed to ‘reported income’, which refers to official tax returns data.

### 14.

The tax returns of the entire population of taxpayers are available on the website of the Italian Revenue Agency (Ministry of Economy and Finance) only in tabulated form (e.g. by type of income source, by income classes, by area of residence, etc.). Additional ad-hoc data were required for better modelling of tax reliefs.

### 15.

The gross income qualified for tax reliefs is net of cadastral income on the main residence.

### 16.

Employees’ social contributions are not listed among deductions as they are excluded from taxable employment income.

### 17.

The simulation of the tax credit for dependants required the construction of a fiscal family, which may not coincide with the definition of household adopted in It-Silc. In fact, fiscal family members include the spouse, children and other relatives living with the reference person and having a personal gross income (before deductions) below €2,840.

### 18.

A notable exception is the Siena microsimulation model (SM2) which, building on an exact record linkage between survey and fiscal administration data (Consolini et al., 2006, Consolini et al., 2009, Donatiello et al., 2009), is able to account for the full set of tax expenditures as observed from fiscal data.

### 19.

The measurement of net labour earnings accounts for specific pay components, namely: net salary and additional compensations including the thirteenth/fourteenth monthly pay (a peculiarity of the Italian institutional setting), income from temporary project-based employment contracts, which are fiscally equivalent to employment income, and taxable unemployment benefits.

### 20.

In computing individual tax liabilities, deductions are subtracted from reported income, to obtain taxable income. The gross tax is calculated applying the tax schedule to taxable income. Then net tax is obtained subtracting tax credits from gross tax.
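The sequence described in this footnote can be sketched as follows. The bracket schedule is an illustrative assumption for the example, not a statement of the rules coded in Betamod, and the function names are ours. Credits are treated as non-refundable, in line with the discussion of Figures 5 and 6.

```python
# Illustrative bracket schedule: (upper limit of bracket, marginal rate).
# These values are assumptions for the example, not taken from the paper.
BRACKETS = [(15_000, 0.23), (28_000, 0.27), (55_000, 0.38),
            (75_000, 0.41), (float("inf"), 0.43)]


def gross_tax(taxable_income, brackets=BRACKETS):
    """Apply a piecewise-linear tax schedule to taxable income."""
    tax, lower = 0.0, 0.0
    for upper, rate in brackets:
        if taxable_income <= lower:
            break
        tax += (min(taxable_income, upper) - lower) * rate
        lower = upper
    return tax


def net_tax(reported_income, deductions, tax_credits):
    """Reported income -> taxable income -> gross tax -> net tax."""
    taxable_income = max(reported_income - deductions, 0.0)
    gross = gross_tax(taxable_income)
    # Credits treated as non-refundable: net tax cannot fall below zero.
    return max(gross - tax_credits, 0.0)


print(net_tax(30_000, 2_000, 1_000))
```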

### 21.

An appropriate factor of correction is applied to ensure representativeness of households by geographical area.

### 22.

The regional income tax is simulated by Betamod while the municipal income tax is imputed.

### 23.

Simulating the correct number of beneficiaries in the quantitatively less important tax credits is, in fact, one of the most common challenges in microsimulation modelling due to the lack of information relevant for identification of potential claimants in the survey data, as well as to the small number of individuals involved.

### 24.

The household equivalent income is obtained by applying the OECD-modified equivalence scale. We compute the household’s net income by adding all true gross incomes earned by the family members and subtracting the personal tax liabilities.
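A minimal sketch of this computation is given below, using the usual OECD-modified weights (1.0 for the first adult, 0.5 for each further person aged 14 or over, 0.3 for each child under 14). The function names and the example household are hypothetical.

```python
def oecd_modified_scale(n_adults, n_children_under_14):
    """1.0 for the first adult, 0.5 for each further person aged 14 or over,
    0.3 for each child under 14."""
    return 1.0 + 0.5 * (n_adults - 1) + 0.3 * n_children_under_14


def equivalent_income(gross_incomes, tax_liabilities, n_adults, n_children_under_14):
    """Household net income divided by the equivalence scale."""
    net_income = sum(gross_incomes) - sum(tax_liabilities)
    return net_income / oecd_modified_scale(n_adults, n_children_under_14)


# Hypothetical household: two earners, two children under 14.
print(equivalent_income([30_000, 12_000], [6_000, 1_500], 2, 2))
```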

### 25.

The Kakwani index measures the departure from proportionality as the difference between the concentration coefficient of the tax and the Gini index of gross income.

### 26.

The weight for the gross tax liability progressivity ($\pi_{Tg}^{K}$) is the ratio between the gross tax rate ($t_g$) and the net tax rate ($t_n$); the weight for tax credits progressivity ($\pi_{K}^{K}$) is the ratio between tax credits as a proportion of gross income ($k$) and the net tax rate ($t_n$).

### 27.

The tax revenue loss refers to the personal income tax (15 billions), regional and municipal additional income taxes (800 and 160 millions respectively).

### 28.

As previously explained in Section 3.4, the decreasing aggregate profile results from the comparison between Betamod simulated gross income and the income reported to tax authorities.

### 29.

Results must be considered taking into account that they are based on the income distribution which directly emerges from the It-Silc survey. However, the survey does not guarantee representation of the true income distribution. Previous studies, although based on the Bank of Italy’s survey (e.g. Cannari and D’Alessio, 1992), have in particular identified two major biases, which are indeed common to surveys conducted in other countries. The first is the selectivity bias, due to the fact that not all families are equally willing to participate in the survey; the second is known as under-reporting, and arises when the respondent reports a disposable income below the true income. Both selectivity bias and under-reporting can be explained by the fear that some people have that their files could be accessed by the tax authorities. Evidence indicates that this fear is more pronounced among individuals in the upper tail of the distribution. A third, though less relevant, bias originates from some over-reporting by people in the lower tail. Clearly, all three biases contribute to making the sample distribution less unequal than the real distribution.

## A. Statistical matching between the IT-SILC and SHIW datasets

We describe here how the statistical matching between the IT-SILC dataset and the Bank of Italy's Survey on Households Income and Wealth (SHIW) was performed at the household level. First, two constraints need to be satisfied to make matching feasible: (i) the two surveys must be random samples from the same population; (ii) there must be a common set of conditioning variables. In our case, the first condition is met by design, since both the IT-SILC 2011 and the SHIW 2012 data are representative of the Italian population. As far as the second constraint is concerned, the variables (X) common to both datasets and chosen for the imputation of the self-reported asset value of the main residence, insurance premiums and house refurbishment expenditures are: equivalent household income, the percentage of household members with more than an upper secondary educational qualification, a set of household composition dummies, and the main earner’s employment status. The final sample is made up of 7,951 households from the SHIW survey and 19,399 households from the IT-SILC survey.

The integrated IT-SILC/SHIW dataset was created using the Mahalanobis Distance Matching Method (MDMM), a statistical method which allows units with similar characteristics but from different datasets to be paired (Rosenbaum & Rubin, 1983). In order to obtain a more precise matching, the sample was stratified into cells according to main residence homeownership, ownership of other properties and geographical area, so that exact matching on these variables is ensured; then, within each stratum, the donor household was selected based on the Mahalanobis distance metric, measured on the other X variables. The Mahalanobis metric is a measure of dissimilarity between observations: it measures the distance between unit i from the recipient dataset (IT-SILC) and unit j from the donor dataset (SHIW), weighting each coordinate of X in inverse proportion to the variance of that coordinate:
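In its textbook form the Mahalanobis distance is $d(i,j)=(X_i-X_j)^{\prime}S^{-1}(X_i-X_j)$, with $S$ the covariance matrix of X. A minimal sketch of nearest-donor selection within one stratum, under that form, is given below; the function name, the pooled-sample covariance and the toy households are assumptions made for illustration rather than the authors' implementation.

```python
import numpy as np


def mahalanobis_match(recipients, donors):
    """Return, for each recipient row, the index of the closest donor row.

    Donors can be reused (matching with replacement). The covariance matrix
    is estimated on the pooled sample, which is an assumption of this sketch.
    """
    recipients = np.asarray(recipients, dtype=float)
    donors = np.asarray(donors, dtype=float)
    cov = np.cov(np.vstack([recipients, donors]), rowvar=False)
    cov_inv = np.linalg.inv(cov)
    diff = recipients[:, None, :] - donors[None, :, :]
    d2 = np.einsum("rdk,kl,rdl->rd", diff, cov_inv, diff)  # squared distances
    return d2.argmin(axis=1), d2


# Hypothetical households within one stratum: two matching variables each.
recipients = [[1.0, 0.5], [9.0, 9.5]]
donors = [[0.0, 0.0], [10.0, 10.0], [5.0, 1.0]]
matches, distances = mahalanobis_match(recipients, donors)
print(matches)
```

In practice the function would be applied separately to each stratum, so that the stratifying variables are matched exactly.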

Matching has been performed at the household level and with replacement, that is, allowing the same SHIW household to act as donor for multiple IT-SILC households whenever it is the closest match, rather than being discarded after having served once as donor. Once the matching procedure was complete, we checked its quality in terms of how well it preserves the distributions of the asset value of the main residence, insurance premiums and house refurbishment expenditures, as well as the pre-existing relations between the variables of interest.

The check involved (i) the comparison between the distributions of the asset value of the main residence, insurance premiums and house refurbishment expenditures in the integrated dataset and in the original SHIW one, and (ii) the calculation of the correlations between these three variables and the X vector, to verify that the signs recorded in the donor set are maintained. The correlations between the common and the fused variables in the SHIW dataset were well preserved in the fused IT-SILC dataset for most variables. For the sake of brevity, tables showing distributions and correlations are not included, but they are available on request.

Finally, the quality of the matching has been evaluated by means of a balancing test: we compared the mean covariate values of recipients and matched donors, checking whether each observable covariate has the same average value in the two groups. Before matching we expect differences; after matching the variables should be balanced in both groups and significant differences should not persist. The covariate balancing test, reported in Table A.1, shows that the matching is effective in removing differences in observable characteristics between the recipients and matched donors. In particular, the median absolute bias is reduced by approximately 82%–98%. The pseudo R-squared after matching is always close to zero, suggesting that the covariates have no explanatory power in the matched samples. The chi-square test conducted before and after matching indicates that the propensity score removed the bias due to differences in covariates between the recipients and matched donors.

Table A.1
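One common way to quantify covariate balance is the standardized percentage bias, i.e. the difference in means expressed as a percentage of the average standard deviation of the two groups. Whether this is exactly the statistic behind Table A.1 is an assumption; the sketch below, with made-up values and a hypothetical function name, only illustrates the before/after comparison.

```python
from statistics import mean, variance


def standardized_bias(recipient_values, donor_values):
    """Difference in means as a percentage of the average standard deviation."""
    pooled_sd = ((variance(recipient_values) + variance(donor_values)) / 2) ** 0.5
    return 100 * (mean(recipient_values) - mean(donor_values)) / pooled_sd


# Hypothetical covariate (e.g. equivalent household income, thousand euros).
recipients = [1.0, 2.0, 3.0, 4.0, 5.0]
all_donors = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
matched_donors = [1.0, 2.0, 3.0, 4.0, 6.0]

bias_before = standardized_bias(recipients, all_donors)
bias_after = standardized_bias(recipients, matched_donors)
print(round(bias_before, 1), round(bias_after, 1))
```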

## B. Estimation of tax evasion rate

### B.1 Tax evasion rates by income source type and geographical area

From official tax returns data (Mef) we know the total amount of reported income and the number of taxpayers by four main income source types (Emp = employment income, Pen = pensions, Imm = rental income from immovable property, Self = self-employment income) and, separately, by four geographical areas. From the Betamod simulated true gross incomes we compute the total amount of true gross income and the number of taxpayers for the same characteristics. Then, it is possible to compute two sets of average tax evasion rates, by main source of income i:

(B1) ${\overline{e}}_{i}^{S}=\frac{{\overline{y}}_{i}^{S}-{\overline{y}}_{i}^{S,MEF}}{{\overline{y}}_{i}^{S}}$

and by geographical area j:

(B2) ${\overline{e}}_{j}^{A}=\frac{{\overline{y}}_{j}^{A}-{\overline{y}}_{j}^{A,MEF}}{{\overline{y}}_{j}^{A}}$

To convert the tax evasion rates by main income source type of taxpayers (${\overline{e}}_{i}^{S}$) into rates referred to the types of income received (${\overline{e}}_{j}^{R}$), we use the Betamod estimated true gross incomes to build a 4×4 matrix B, in which each element $B_{ji}$ is the total amount of true gross income of type j (j = 1,…,4) received by taxpayers with main source of income of type i (i = 1,…,4). The total amount of unreported income by main source of income type i is computed as:

(B3) ${U}_{i}^{S}=\left({\overline{y}}_{i}^{S}-{\overline{y}}_{i}^{S,MEF}\right){N}_{i}^{S}$

where ${N}_{i}^{S}$ is the number of taxpayers with main source of income type i.

With this information it is possible to compute the tax evasion rates by income source by solving the linear system:

(B4) $\left[\begin{array}{llll}{B}_{EMP,EMP}\hfill & {B}_{PENS,EMP}\hfill & {B}_{IMM,EMP}\hfill & {B}_{SELF,EMP}\hfill \\ {B}_{EMP,PENS}\hfill & {B}_{PENS,PENS}\hfill & {B}_{IMM,PENS}\hfill & {B}_{SELF,PENS}\hfill \\ {B}_{EMP,IMM}\hfill & {B}_{PENS,IMM}\hfill & {B}_{IMM,IMM}\hfill & {B}_{SELF,IMM}\hfill \\ {B}_{EMP,SELF}\hfill & {B}_{PENS,SELF}\hfill & {B}_{IMM,SELF}\hfill & {B}_{SELF,SELF}\hfill \end{array}\right]\left[\begin{array}{l}{\overline{e}}_{EMP}^{R}\hfill \\ {\overline{e}}_{PENS}^{R}\hfill \\ {\overline{e}}_{IMM}^{R}\hfill \\ {\overline{e}}_{SELF}^{R}\hfill \end{array}\right]=\left[\begin{array}{l}{U}_{EMP}^{S}\hfill \\ {U}_{PENS}^{S}\hfill \\ {U}_{IMM}^{S}\hfill \\ {U}_{SELF}^{S}\hfill \end{array}\right]$

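so ${\overline{e}}^{R}=\left({\overline{e}}_{EMP}^{R},{\overline{e}}_{PENS}^{R},{\overline{e}}_{IMM}^{R},{\overline{e}}_{SELF}^{R}\right)={B}^{-1}U$.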
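The linear system can be solved numerically in one call. In the sketch below the entries of the matrix and the rates used to build the right-hand side are hypothetical; rows refer to the taxpayer's main income source and columns to the type of income received, as in (B4).

```python
import numpy as np

# Rows: main income source of the taxpayer (EMP, PENS, IMM, SELF).
# Columns: type of income received (same order). Hypothetical totals, billion euros.
B = np.array([
    [400.0,   5.0, 10.0,  8.0],
    [ 10.0, 220.0,  6.0,  2.0],
    [  3.0,   2.0, 25.0,  1.0],
    [  6.0,   1.0,  9.0, 90.0],
])

# Rates used only to generate a consistent right-hand side for the example.
assumed_rates = np.array([0.03, 0.0, 0.30, 0.25])
U_S = B @ assumed_rates          # unreported income by main source of taxpayer

e_R = np.linalg.solve(B, U_S)    # evasion rates by type of income received
print(np.round(e_R, 4))
```

Using `np.linalg.solve` rather than forming the inverse explicitly is numerically preferable, although for a 4×4 system the difference is immaterial.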

The amount of unreported income by source type is:

(B5) ${U}_{i}^{R}={\overline{e}}_{i}^{R}{Y}_{i}^{R}$

where ${Y}_{i}^{R}$ is the total amount of received income of type i. The amount of unreported income by geographical area is instead:

(B6) ${U}_{j}^{A}={\overline{e}}_{j}^{A}{Y}_{j}^{A}$

where ${Y}_{j}^{A}$ is the total amount of income received in area j.

From the Betamod simulated true gross incomes we compute the total amount of individual incomes by main income source type (i = 1,…,4) and by geographical area (j = 1,…,4), obtaining the 4×4 matrix $Y=\{{y}_{ij}\}$.

Using matrix Y and the marginal distributions of unreported income by source type, ${U}^{R}$, and by geographical area, ${U}^{A}$, the RAS technique first yields the joint distribution of total unreported income by income source and by area, $U=\{{u}_{ij}\}$, and then the 4×4 matrix of average tax evasion rates, $\overline{e}=\{{\overline{e}}_{ij}\}$, by source type of received income and by geographical area:

(B7) ${\overline{e}}_{ij}=\frac{{u}_{ij}}{{y}_{ij}}$

The matrix $\overline{e}$ is shown in Table 12.
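The RAS step can be sketched as a plain iterative proportional fitting loop. The 3×3 seed matrix and the two margins below are hypothetical (the model uses 4×4 matrices), the function name `ras` is ours, and the sketch assumes a strictly positive seed matrix and margins with equal totals.

```python
import numpy as np


def ras(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Scale a strictly positive seed matrix until its row and column sums
    match the target margins (iterative proportional fitting)."""
    m = np.array(seed, dtype=float)
    for _ in range(max_iter):
        m *= (row_targets / m.sum(axis=1))[:, None]   # fit row margins
        m *= (col_targets / m.sum(axis=0))[None, :]   # fit column margins
        if np.abs(m.sum(axis=1) - row_targets).max() < tol:
            break
    return m


# Hypothetical true gross income by source (rows) and area (columns).
Y = np.array([[300.0, 200.0, 150.0],
              [ 40.0,  25.0,  20.0],
              [ 90.0,  60.0,  70.0]])
U_R = np.array([19.0, 28.0, 55.0])    # unreported income by source (assumed)
U_A = np.array([45.0, 27.0, 30.0])    # unreported income by area (assumed)

U = ras(Y, U_R, U_A)
e_bar = U / Y                          # average evasion rate per cell, as in (B7)
print(np.round(e_bar, 3))
```

A row with a zero margin, such as pensions when no evasion is assumed, would need to be handled separately to avoid dividing by zero.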

### B.2 Tax evasion profiles by classes of true gross income

Each average tax evasion rate ${\overline{e}}_{ij}$ is then modulated in order to obtain a profile of tax evasion by classes of true gross income, i.e. a vector of tax evasion rates associated with 13 income classes (see Table 9). Define ${\overline{e}}_{ijk}$ as the average tax evasion rate for income class k, source type i and area j, given by the following function:

(B8) ${\overline{e}}_{ijk}=\frac{{k}_{i}^{e}{\overline{e}}_{ij}}{1+\left({k}_{i}^{e}-1\right){\left(\frac{{y}_{ijk}}{{k}_{i}^{y}{\overline{y}}_{ij}}\right)}^{{z}_{i}}}$

where:

${y}_{ijk}$ = mean true gross income of class k, source type i and area j;

${\overline{y}}_{ij}$ = mean true gross income of source type i and area j;

${\overline{e}}_{ij}$ = average tax evasion rate for source type i and area j;

and the parameters to estimate are:

${k}_{i}^{e}$, which determines the ordinate intercept;

${k}_{i}^{y}$, which determines the level of income for which ${\overline{e}}_{ijk}={\overline{e}}_{ij}$;

${z}_{i}$, which determines the curvature of the function.
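The role of the three parameters can be checked numerically. The sketch below evaluates the functional form in (B8) for hypothetical parameter values and verifies the two properties just described: the rate equals $k_i^e\,\overline{e}_{ij}$ at zero income and $\overline{e}_{ij}$ at income $k_i^y\,\overline{y}_{ij}$.

```python
def evasion_profile(y_class, y_mean, e_mean, k_e, k_y, z):
    """Evasion rate for an income class with mean true gross income `y_class`,
    following the functional form in (B8)."""
    scaled_income = y_class / (k_y * y_mean)
    return k_e * e_mean / (1 + (k_e - 1) * scaled_income ** z)


# Hypothetical parameter values for one source type and area.
e_mean, y_mean = 0.24, 30.0
k_e, k_y, z = 2.0, 1.2, 1.5

at_zero = evasion_profile(0.0, y_mean, e_mean, k_e, k_y, z)
at_pivot = evasion_profile(k_y * y_mean, y_mean, e_mean, k_e, k_y, z)
print(at_zero, at_pivot)   # k_e * e_mean at zero income, e_mean at the pivot
```

With $k_i^e>1$ and $z_i>0$ the profile is decreasing in income, which is the shape reported in Figure 8.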

With this formulation we need to estimate 12 parameters: ${k}_{i}^{e}$, ${k}_{i}^{y}$, ${z}_{i}$ with i = 1,…,4. As we assume that pensions cannot be concealed, the number of parameters reduces to 9. The method used by Betamod is a numerical optimization procedure that randomly assigns values to the 9 parameters and chooses the combination that minimizes the distance function

(B9) $D={\sum }_{k}|{Y}_{k}^{MEF}-{Y}_{k}^{BETAMOD}|$

where ${Y}_{k}^{MEF}$ and ${Y}_{k}^{BETAMOD}$ are, respectively, the official tax returns and the Betamod total amounts of reported income by class. The profiles obtained are shown in Figure 8.
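A stripped-down version of such a random search is sketched below for a single income source with three parameters instead of nine. The income classes, the search ranges, the number of draws and all function names are assumptions of the example; the "official" totals are generated from known parameters so that the search has something to recover.

```python
import random


def profile(y, y_mean, e_mean, k_e, k_y, z):
    """Functional form in (B8) for one source type and area."""
    return k_e * e_mean / (1 + (k_e - 1) * (y / (k_y * y_mean)) ** z)


def simulated_reported(true_totals, class_means, y_mean, e_mean, params):
    """Reported income by class implied by a parameter triple (k_e, k_y, z)."""
    return [total * (1 - profile(y, y_mean, e_mean, *params))
            for total, y in zip(true_totals, class_means)]


def distance(official, simulated):
    """Sum of absolute deviations by class, as in (B9)."""
    return sum(abs(a - b) for a, b in zip(official, simulated))


def random_search(official, true_totals, class_means, y_mean, e_mean,
                  n_draws=5000, seed=0):
    rng = random.Random(seed)
    best_d, best_params = float("inf"), None
    for _ in range(n_draws):
        # Search ranges are assumptions of this sketch.
        params = (rng.uniform(1.0, 4.0), rng.uniform(0.5, 2.0), rng.uniform(0.5, 3.0))
        d = distance(official, simulated_reported(
            true_totals, class_means, y_mean, e_mean, params))
        if d < best_d:
            best_d, best_params = d, params
    return best_d, best_params


# Hypothetical single income source with five classes.
class_means = [5.0, 15.0, 30.0, 60.0, 120.0]
true_totals = [50.0, 150.0, 300.0, 250.0, 100.0]
y_mean, e_mean = 30.0, 0.24
official = simulated_reported(true_totals, class_means, y_mean, e_mean, (2.0, 1.0, 1.5))

best_d, best_params = random_search(official, true_totals, class_means, y_mean, e_mean)
flat_d = distance(official, simulated_reported(
    true_totals, class_means, y_mean, e_mean, (1.0, 1.0, 1.0)))
print(best_d, flat_d)   # distance of the best draw vs. a flat profile
```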

### B.3 Assignment of individual tax evasion rates

The average tax evasion rate ${\overline{e}}_{ijk}$ (for the i-th income source type, the j-th geographical area and the k-th class of true gross income) is defined as the ratio between the unreported income ${U}_{ijk}$ and the true income ${Y}_{ijk}$ of the cell:

(B10) ${\overline{e}}_{ijk}=\frac{{U}_{ijk}}{{Y}_{ijk}}$

and can be seen as the product:

(B11) ${\overline{e}}_{ijk}=\frac{{U}_{ijk}}{{Y}_{ijk}}=\frac{{U}_{ijk}}{{Y}_{Eijk}}\frac{{Y}_{Eijk}}{{Y}_{ijk}}={\overline{e}}_{Eijk}{H}_{Yijk}$

where:

${Y}_{Eijk}$ is the total amount of tax evaders’ income in cell i,j,k;

${\overline{e}}_{Eijk}=\frac{{U}_{ijk}}{{Y}_{Eijk}}$ is the average tax evasion rate of tax evaders in cell i,j,k;

${H}_{Yijk}=\frac{{Y}_{Eijk}}{{Y}_{ijk}}$ is the share of tax evaders’ income in cell i,j,k.

The values of ${\overline{e}}_{Eijk}$ and ${H}_{Yijk}$ are unknown, but we know their product ${\overline{e}}_{ijk}$ and their maximum value (i.e. 100%). In the absence of further information, we assume that the two values are equal, so:

(B12) ${H}_{Yijk}={\overline{e}}_{Eijk}=\sqrt{{\overline{e}}_{ijk}}$

For instance, if the average tax evasion rate in a cell is 0.25, then we assume that ${H}_{Yijk}={\overline{e}}_{Eijk}=\sqrt{0.25}=50\%$, i.e. tax evaders own 50% of the true gross income in the cell and their tax evasion rate is 50%.

To assign individual tax evasion in Betamod we proceed in the following way:

1. for each value ${\overline{e}}_{ijk}$ we compute the two values ${\overline{e}}_{Eijk}$ and ${H}_{Yijk}$;

2. we randomly assign a probability of being a tax evader to each taxpayer in the sample (by means of a uniform distribution) and we sort taxpayers in decreasing order of probability;

3. starting with the taxpayer with the highest probability, we assign a random tax evasion rate drawn from a beta distribution with mean ${\overline{e}}_{Eijk}$ and a standard deviation varying with the mean;

4. we proceed to assign tax evasion rates to taxpayers with progressively lower probability until we reach the total amount of unreported income of the cell, ${U}_{ijk}$.
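The four steps can be sketched for a single cell as follows. The beta parameters follow the parameterisation the appendix gives for this distribution, with θ = 5; trimming the last evader's rate so that the cell total is met exactly, the function name and the toy cell of identical incomes are assumptions of this sketch.

```python
import math
import random


def assign_evasion_rates(incomes, cell_rate, theta=5.0, seed=0):
    """Assign individual evasion rates within one (i, j, k) cell.

    `incomes` are positive true gross incomes; `cell_rate` is the average
    evasion rate of the cell, strictly between 0 and 1.
    """
    rng = random.Random(seed)
    e_evaders = math.sqrt(cell_rate)               # step 1
    target = cell_rate * sum(incomes)              # unreported income of the cell
    alpha = theta * e_evaders / (1 - e_evaders)

    scores = [rng.random() for _ in incomes]       # step 2
    order = sorted(range(len(incomes)), key=scores.__getitem__, reverse=True)

    rates = [0.0] * len(incomes)
    hidden = 0.0
    for t in order:                                # steps 3 and 4
        if target - hidden <= 1e-12:
            break
        rate = rng.betavariate(alpha, theta)
        # Assumption: trim the last evader's rate so the cell total is met exactly.
        rate = min(rate, (target - hidden) / incomes[t])
        rates[t] = rate
        hidden += rate * incomes[t]
    return rates


incomes = [20.0] * 200                             # hypothetical cell
rates = assign_evasion_rates(incomes, cell_rate=0.25)
hidden_total = sum(r * y for r, y in zip(rates, incomes))
print(round(hidden_total, 6), sum(r > 0 for r in rates))
```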

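The beta distribution used to assign a tax evasion rate ${e}_{tijk}$ to the income of source type i of taxpayer t with characteristics j,k is then ${e}_{tijk}\sim \mathrm{Beta}\left(\theta \frac{{\overline{e}}_{Eijk}}{1-{\overline{e}}_{Eijk}},\theta \right)$. This beta distribution has expected value $E({e}_{tijk})={\overline{e}}_{Eijk}$ and standard deviation $sd({e}_{tijk})=\sqrt{\frac{{\overline{e}}_{Eijk}{\left(1-{\overline{e}}_{Eijk}\right)}^{2}}{\theta +1-{\overline{e}}_{Eijk}}}$.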

The standard deviation is close to zero when there is no tax evasion (${\overline{e}}_{Eijk}=0$) or when all income is concealed (${\overline{e}}_{Eijk}=1$), and decreases as the parameter θ increases. We assigned to θ a value of 5 in order to obtain a maximum value of the standard deviation approximately equal to 1/6 when the average tax evasion rate is about 1/3.
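These moments can be verified from the general mean and variance formulas of a Beta(a, b) distribution, with a = θ·e/(1 − e) and b = θ. The sketch below does so for θ = 5 and an evaders' rate of 1/3; the function names are ours.

```python
import math


def beta_mean_sd(e_evaders, theta):
    """Mean and standard deviation of Beta(theta * e / (1 - e), theta)."""
    a = theta * e_evaders / (1 - e_evaders)
    b = theta
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


def sd_closed_form(e_evaders, theta):
    """Same standard deviation written directly in terms of e and theta."""
    return math.sqrt(e_evaders * (1 - e_evaders) ** 2 / (theta + 1 - e_evaders))


mean, sd = beta_mean_sd(1 / 3, 5.0)
print(mean, sd, sd_closed_form(1 / 3, 5.0))   # sd is roughly 0.16, close to 1/6
```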

Figure C.1
Figure C.2
Figure C.3
Figure C.4

## Article and author information

### Author details

1. #### Andrea Albarea

Department of Economics, Ca’ Foscari University, Italy
##### For correspondence
andrea.albarea@unive.it
2. #### Michele Bernasconi

Department of Economics, Ca’ Foscari University, Italy
##### For correspondence
bernasconi@unive.it
3. #### Cinzia Di Novi

Department of Economics, Ca’ Foscari University, Italy
##### For correspondence
cinzia.dinovi@unive.it
4. #### Anna Marenzi

Department of Economics, Ca’ Foscari University, Italy
##### For correspondence
anna.marenzi@unive.it
5. #### Dino Rizzi

Department of Economics, Ca’ Foscari University, Italy
##### For correspondence
rizzid@unive.it
6. #### Francesca Zantomio

Department of Economics, Ca’ Foscari University, Italy
##### For correspondence
francesca.zantomio@unive.it

### Acknowledgements

We thank Cesare Dosi, Devis Geron, Luciano Greco, Simone Pellegrino, Vincenzo Rebba, Tiziano Vecchiato, Michele Zanette, the editor and two anonymous referees for their useful comments. We are indebted to Fabrizia Lapecorella and Paolo Acciari (Ministry of Economy and Finance) for providing official tax returns data. Financial support from the Department of Economics at the Ca’ Foscari University of Venice is gratefully acknowledged. Data from the Survey of Income and Living Conditions, obtained through the Italian Office for National Statistics, have been used with permission.

### Publication history

1. Version of Record published: December 31, 2015 (version 1)