A new database on the contents and comparability of the income variables in EU-SILC: MetaSILC 2015

The MetaSILC 2015 database is a new addition to the toolbox of EU-SILC users and producers. It documents how individual income components are aggregated into the EU-SILC target variables. Even though general and country-specific descriptions of income target variables are available in the EU-SILC methodological guidelines and in the national quality reports, it is often not clear how exactly each of the national income components is classified and aggregated into a target variable in practice. On the basis of a survey among national statistical institutes, we compiled a database which maps the exact classification of income components onto the EU-SILC target variables. The focus of the database is on EU-SILC 2015, covering 26 EU-SILC countries. The database contains information on the composition of variables regarding total income before and after transfers; income from benefits, work and capital; social contributions and taxes; as well as on recent and planned changes to the composition of the variables.


Introduction
The EU Statistics on Income and Living Conditions (EU-SILC) is one of the principal datasets for studying income and living conditions in Europe. The data are widely used for studying poverty and inequality in the EU and serve as the data source for official poverty reduction targets. Furthermore, EU-SILC is often used for monitoring the social situation both within and across countries, and for carrying out both ex-ante and ex-post policy evaluations. It is also the main dataset for European countries covered by EUROMOD, the tax-benefit microsimulation model for the European Union. The quality and comparability of the income variables in EU-SILC are therefore of paramount importance. The data collection of EU-SILC follows a so-called ex-ante output harmonization model: target variables that countries must collect are specified, but the exact way in which these variables are collected is up to the participating countries. Furthermore, the fact that tax-benefit systems differ considerably across countries sometimes makes it difficult to aggregate income components in a consistent way into the relatively generic target variables of EU-SILC. As a result, the collection and categorisation of specific income components vary considerably across countries.
Although general and country-specific descriptions of income target variables are available in the EU-SILC methodological guidelines and in the national quality reports, it is often not clear how exactly each of the national income components is classified and aggregated into a target variable. This implies that researchers sometimes do not realise the wide variety in types of income sources that some variables cover, potentially undermining the validity of the conclusions of their analysis. Also, this constitutes an important hurdle for researchers who want to use EU-SILC for tax-benefit microsimulations, given that it is essential to know which income sources are categorised under which variable in order to simulate tax liabilities and social security contributions, or to validate simulated benefits and allowances. Often, it can be a cumbersome process to identify the exact classification of each income component.
Therefore, in the context of Net-SILC3 [1], we created the MetaSILC 2015 database that documents how individual income components are aggregated into the EU-SILC target variables (Goedemé and Zardo Trindade, 2020a). MetaSILC 2015 allowed us to produce a report with a detailed analysis of compliance with the Eurostat guidelines and definitions, looking into inconsistent classifications of income components across countries and changes to the variables across time (Goedemé and Zardo Trindade, 2020b). Additionally, in a paper published by Eurostat we formulated recommendations about how to improve the quality and comparability of the income variables in EU-SILC (Zardo Trindade and Goedemé, 2020). In what follows we present MetaSILC 2015 in some more detail and discuss the main outputs as well as future updates.

The database
MetaSILC 2015 focuses on the 2015 wave of EU-SILC and covers 26 EU-SILC countries. The database contains information on the composition of the variables regarding total income before and after transfers, and of the more disaggregated variables on income from benefits, work and capital, social contributions and taxes (see Table A.1). It contains additional information on the measurement of income in kind associated with a company car, imputed rent and the measurement of self-employment income. It also documents changes to variables in the period 2010-2015 and prospective changes in subsequent years, and contains more information on outlier detection and treatment. The database is freely available in Excel format to EU-SILC users and other interested parties (Goedemé and Zardo Trindade, 2020a).
The main source of the database consists of two questionnaires that were sent out to National Statistical Institutes (NSIs) between July 2016 and January 2017. The NSIs of Austria, Belgium, Bulgaria, Croatia, Cyprus, Czechia, Denmark, Estonia, France, Germany, Greece, Hungary, Italy, Latvia, Luxembourg, Malta, the Netherlands, Poland, the Republic of Serbia, Slovakia, Slovenia, Spain, Sweden and the United Kingdom responded to both questionnaires, while the NSIs of Finland and Portugal provided partial information [2]. Responding NSIs were asked to list all income sources covered in EU-SILC, at the most detailed level, and how they were classified into the EU-SILC target variables. With regard to total income before and after transfers and income from benefits, they were asked to report whether the equation used to compute each of the four variables (HY010, HY020, HY022 and HY023) was in accordance with the Eurostat guidelines (Eurostat, 2016). If it was not, they were asked to provide additional information on the deviations. With respect to income from benefits, income from work and from other sources, social contributions and taxes, respondents were asked to provide more details about the composition of each target variable. For each of the income components used to compute the target variables, we collected information on the official name (national language) and code in the national EU-SILC survey; the equivalent name in English; the target variable code and name; the question in the EU-SILC questionnaire; the source of the income information used; the level of aggregation when it was collected; and information on whether it was collected gross and/or net.
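As a rough illustration of the information collected per income component, the record can be sketched as a small data structure, together with a helper that lists which components feed each target variable. This is only a sketch under stated assumptions: the class name, field names, the country code "AA" and the three example components are hypothetical and do not reproduce the layout or contents of the actual Excel file. Only the target variable codes (HY050G, PY090G) are genuine EU-SILC codes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IncomeComponent:
    """One national income component (field names are hypothetical)."""
    country: str             # country code of the reporting NSI
    national_name: str       # official name in the national language
    national_code: str       # code in the national EU-SILC survey
    english_name: str        # equivalent name in English
    target_variable: str     # EU-SILC target variable code
    questionnaire_item: str  # question in the national questionnaire
    source: str              # e.g. "survey" or "register"
    collection_level: str    # e.g. "person" or "household"
    collected_gross: bool
    collected_net: bool


def components_by_target(components):
    """Group English component names by (country, target variable)."""
    grouped = defaultdict(list)
    for comp in components:
        grouped[(comp.country, comp.target_variable)].append(comp.english_name)
    return dict(grouped)


# Invented example rows for a fictitious country "AA".
EXAMPLE = [
    IncomeComponent("AA", "nat. name 1", "X01", "Child benefit", "HY050G",
                    "Q12", "register", "household", True, False),
    IncomeComponent("AA", "nat. name 2", "X02", "Birth grant", "HY050G",
                    "Q13", "survey", "household", True, True),
    IncomeComponent("AA", "nat. name 3", "X03",
                    "Unemployment insurance benefit", "PY090G",
                    "Q20", "register", "person", True, False),
]

if __name__ == "__main__":
    for key, names in components_by_target(EXAMPLE).items():
        print(key, names)
```

Grouping by country and target variable mirrors the question the database is meant to answer: which national components are included in a given target variable in a given country.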
Complementary information to reported income components was collected over the course of 2017, 2018 and 2019. In addition to the information from the survey among national statistical institutes, the accompanying report was supplemented with information from the national quality reports, the comparative quality reports, the national questionnaires and other sources [3]. NSIs were asked to answer follow-up questions for clarification and to review a preliminary version of the database and report. All comments received were included in the report and in the dataset when appropriate.
1. The Third Network for the Analysis of EU-SILC, funded by Eurostat, see https://ec.europa.eu/eurostat/cros/content/third-eu-silc-network-income-and-living-conditions-net-silc3_en.
2. Ireland, Lithuania, Romania, Norway, Iceland and Switzerland did not take part, even though they have data available for Eurostat (2016).
3. Some of the information was also cross-checked with information from the European System of Integrated Social Protection Statistics (ESSPROS), the Mutual Information System on Social Protection (MISSOC), as well as the EUROMOD country reports.

The various outputs
Using the MetaSILC 2015 database, a report (Goedemé and Zardo Trindade, 2020b) and a working paper (Zardo Trindade and Goedemé, 2020) were compiled. In the report, for each income variable used to construct total disposable household income (see Table A.1), we discuss compliance with Eurostat guidelines, misclassifications and omitted income sources that could undermine cross-national comparability. The sections for each target variable contain a summary with main findings, the variable definition in the Eurostat guidelines, general results on cross-national comparability, and a section with detailed remarks by country. In the working paper, we bring together the main findings from the report, highlight prospective changes to EU-SILC in response to our report, and formulate some recommendations regarding how comparability can be improved in the future. Apart from updating and expanding the MetaSILC 2015 database, our main recommendations include the establishment of an expert panel which would help Eurostat to improve the description of the target variables and help NSIs to aggregate all income sources into the EU-SILC target variables in a way that is consistent across countries. Also, the ex-ante output harmonization model could be strengthened by providing concrete examples of a 'preferred way' to ask questions regarding income in the EU-SILC questionnaires, especially with regard to the level of detail with which questions are asked and the examples that are given as income sources to be included (e.g., when asking about income from capital), in line with what is already done for the variables on material deprivation. Other recommendations include setting up a repository for research that documents the impact on the income variables of moving from survey data to register data, and creating more consistency and transparency regarding the way in which NSIs convert net incomes to gross incomes and vice versa.
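To give a sense of how a component-to-target-variable mapping can be screened for inconsistent classifications across countries, the following minimal sketch flags component types that end up in different target variables in different countries. It is not the procedure used for the report: the harmonised component labels, the country codes "AA" and "BB" and all rows are invented for illustration, and in practice matching comparable components across countries requires institutional knowledge rather than a simple label comparison.

```python
from collections import defaultdict

# Invented rows: (country, harmonised component label, target variable).
ROWS = [
    ("AA", "child benefit", "HY050G"),
    ("BB", "child benefit", "HY050G"),
    ("AA", "heating allowance", "HY070G"),
    ("BB", "heating allowance", "HY060G"),
]


def inconsistent_classifications(rows):
    """Return labels that map to more than one target variable."""
    targets = defaultdict(set)
    for _country, label, target in rows:
        targets[label].add(target)
    return {label: sorted(codes)
            for label, codes in targets.items() if len(codes) > 1}


if __name__ == "__main__":
    print(inconsistent_classifications(ROWS))
```

A flagged label is only a candidate for closer inspection: two similarly named benefits may legitimately belong to different target variables if their function differs.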

Future updates
It is hard to find detailed information on the exact implementation of EU-SILC in each country, and to document the comparability of the income variables. In our opinion, the MetaSILC 2015 database, set up as a one-time survey among the data producers, proved to be very helpful for collecting detailed metadata on how specific income components were aggregated into the EU-SILC income target variables. Therefore, we believe it would be useful to repeat and expand MetaSILC in the future, preferably under the responsibility of Eurostat, as it provides essential information that is available nowhere else. Ideally, this would be done during the data production process, as at that point all required information is already available to NSIs. We are confident that, for future updates, it should be possible to limit the required time investment on the part of NSIs. One can also think about ways to make it easier for data users to share more information on the comparability of specific variables for specific research purposes.
A more ambitious expansion that could be considered, and that would be very useful to many users, is to complement the MetaSILC dataset with an institutional description of each income component (especially benefits) covered by the target variables. This would allow individual users or ex-post harmonisers like Eurostat and the Luxembourg Income Study (LIS) to save considerable time when looking for institutional information before creating aggregates out of the detailed national components.

Funding
This work has been supported by the Third Network for the Analysis of EU-SILC (Net-SILC3), funded by Eurostat.