# On Estimation and Other Problems of Statistical Inference in the Micro Simulation Approach

Anders Klevmarken, University of Gothenburg, Institute of Statistics, Sweden
Research article
Cite this article as: A. Klevmarken; 2022; On Estimation and Other Problems of Statistical Inference in the Micro Simulation Approach; International Journal of Microsimulation; 15(1); 107-110. doi: 10.34196/ijm.00254

The micro simulation approach to economic analysis is still at the beginning of its development. Although "numbers" are involved in the simulations, much of the work is largely theoretical in character, one step away from empirical applications. This is so partly because of data shortage, but also because there is a need to use the simulation approach to learn about the properties of one’s theoretical constructs. The ultimate goal must, however, be to make an inference about the economy, whether on a macro or a micro level. To do this, adequate micro data are needed as well as a basis for the inference.

The general principles of statistical inference apply to the micro simulation approach as well as to other research in econometrics. As a matter of fact, it is hard to find any useful alternative. This does not exclude, however, that there are methodological problems which are more or less specific to this approach. In the following I will first give a few comments on the analysis of micro data in general and then turn to some problems more specific to the micro simulation approach.

### Analysis of micro data, some common problems

Micro data, and in particular longitudinal micro data, certainly offer new possibilities to obtain a better understanding of micro and macro behaviour, but nothing is for free. The use of micro data makes it necessary to solve problems we tend to neglect at the macro level.

1. There is usually a large individual variability in micro data, which shows up in low $R^2$ values. To explain this variability we will probably have to use models which involve more parameters than is typically the case at an aggregate level. For instance, an analysis of household consumption would not only involve household income and lagged consumption but also measures of household characteristics.

2. Partly because of the large range of variability, micro relations are frequently non-linear, which makes the statistical inference difficult.

3. Measurement errors become relatively important. Sometimes we will work with proxy or indicator variables which "suggest" models with latent structures (cf. Aigner and Goldberger (1977), Wold (1973; 1974; 1975)).

4. There are selectivity problems in micro data which may be difficult to handle. In panel data in particular, self-selectivity may demand a separate treatment. One promising approach is to incorporate the selection mechanism into the basic model and estimate both at the same time (cf. Heckman (1976), Maddala (1977)).

5. Although micro data are expected to be a rich source of information there will most certainly remain unmeasurable individual characteristics. In panel data these have sometimes been taken care of by a variance-components approach.

6. The relationships between cross-section, cohort and time series data deserve more attention. We do not only need to know how macro activities influence micro units and how micro units should be aggregated to macro. Because the increasing demand for personal integrity will limit our possibilities to obtain micro data, and in particular panel data, we will often also have to investigate if cross-sectional data could be used for an inference about longitudinal behaviour.

We already have statistical methods which can be used to treat some of these problems, but the new emphasis on micro data will have to "generate" new methods. To indicate the nature of these methods I would like to give a few key words:

1. Although macro theory usually has a micro theoretical foundation it is not always good enough for empirical studies of micro behaviour. Our methods will thus have to be exploratory.

2. Because the sample size will be relatively large it is possible to emphasize consistency rather than efficiency. In traditional macro econometrics consistency is a completely uninteresting property because of the short time-series usually available. Frequently, however, we only know the asymptotic properties of our estimators. For this reason, I agree with those who claim that one should not give much credence to confidence intervals computed in macro econometric models. On the other hand, it does not follow from this that statistical inference is useless.

3. One should also emphasize robustness of methods. There is usually a conflict between our desire to have robust and efficient methods. With large samples of micro data, however, we will not have to be overly concerned about the loss in efficiency.

4. In traditional econometrics we concentrate on mean relationships, while with micro data the distributional aspects will be more emphasized. For this purpose, we will probably have to develop better statistical methods than those available now.

5. There will be a need for methods which require neither linearity nor assumptions of particular non-linear forms, but rather admit data to determine the functional form of the relationships estimated.

### Problems in the micro simulation approach

Next I would like to comment on a few problems which are more specific to the micro simulation method. The size of the models contributes to many of the practical difficulties. It is important to know the properties of an estimated model and the predictions produced by this model.

It has been suggested that these properties could be explored by tracing out "reaction surfaces" under alternative assumptions about model structure and parameter values (sensitivity analysis). This is a good idea for small or medium sized models or for exploring particular features, but it cannot be used to evaluate a large micro simulation model. The sources of uncertainty in the predictions are the same as in most other econometric predictions. There will be genuine residual variation as well as measurement errors. Parameters will be unknown but estimated. Exogenous variables are not known but predicted. There will be specification errors, etc. The multitude of these errors cannot be explored in "reaction surfaces" because it would be unmanageable to analyze the large amount of computer printout required. With these large models it is not feasible to simulate all possible implications of a model and discover unrealistic features.

Also, such an approach would not give the probability of the occurrence of a simulated event. For these reasons it is very important that each detail (assumption) in the model be tested by statistical methods. It is also important to test the model carefully to balance what I would like to call the "size law", namely that the vested interest in our own model is proportional to its size.

Large models also make simulations expensive. Methods have to be found which quickly trace out the distributions for strategic variables. Although the simulation methods will depend on the model structure, there are general, efficient Monte Carlo methods, and there are also powerful computer languages for simulations, such as SIMULA. Experts on numerical methods and computer simulations could undoubtedly contribute to a more efficient use of the computer.

Another major problem in micro simulation studies is the lack of data. A typical feature of some micro analytic studies is that the objective function which is maximized (or minimized) to obtain estimates of the micro parameters is formulated in macro variables because micro data are not available. For instance, with respect to the micro parameters one might attempt to minimize some quadratic function of the residuals between observed and predicted GNP, consumption expenditures, investment expenditures, rate of unemployment, rate of increase in consumer prices etc. This procedure might easily lead to identification problems. To illustrate by a simple example, if we only know the sum of two variables, each of which is linearly related to two other variables, it is not possible to identify the two intercepts. In a more complex model, it might be difficult to see if the model is identified or not. If not, the search for a maximum (minimum) may go on forever. Even if the model is formally identified there may be cases analogous to multicollinearity in ordinary linear models, i.e. the surface of the objective function in the neighborhood of the extremum is flat.

It might then be possible to change some parameter values with but a very small change in the value of the objective function.
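The identification example can be made concrete with a minimal sketch. It assumes one reading of the example: two micro variables, $y_1 = a_1 + b_1 x_1$ and $y_2 = a_2 + b_2 x_2$, of which only the sum is observed. All names, parameter values and the noise level are hypothetical and chosen only for illustration.

```python
import random

def aggregate_ssr(params, x1, x2, s):
    """Sum of squared residuals when only the aggregate s = y1 + y2 is observed."""
    a1, b1, a2, b2 = params
    return sum((si - (a1 + b1 * u) - (a2 + b2 * v)) ** 2
               for u, v, si in zip(x1, x2, s))

random.seed(0)
n = 200
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [random.gauss(0.0, 1.0) for _ in range(n)]

# Hypothetical "true" micro parameters (a1, b1, a2, b2).
true_params = (1.0, 0.5, 2.0, -0.3)
s = [true_params[0] + true_params[1] * u
     + true_params[2] + true_params[3] * v
     + random.gauss(0.0, 0.1)
     for u, v in zip(x1, x2)]

# Move 5 units from one intercept to the other: a1 + a2 is unchanged.
shifted_params = (true_params[0] + 5.0, true_params[1],
                  true_params[2] - 5.0, true_params[3])
# Change a slope instead: this does alter the fit.
slope_params = (true_params[0], true_params[1] + 0.5,
                true_params[2], true_params[3])

ssr_true = aggregate_ssr(true_params, x1, x2, s)
ssr_shifted = aggregate_ssr(shifted_params, x1, x2, s)
ssr_slope = aggregate_ssr(slope_params, x1, x2, s)

print("SSR at true parameters:     ", ssr_true)
print("SSR with shifted intercepts:", ssr_shifted)
print("SSR with changed slope:     ", ssr_slope)
```

Under these assumptions the objective function is constant along the direction that trades one intercept against the other, so a search routine has no information with which to settle on a particular pair; only the sum $a_1 + a_2$ is pinned down by the macro data.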

Gunnar Eliasson in his paper "How does inflation affect growth - Experiments on the Swedish Model" presented a slightly different data problem. He wanted to investigate if the "over shooting" response of his model to an external shock is a realistic feature. The problem is that so far, we have not observed such an "over shooting" in the economy, which makes it difficult to put this property of the model to a direct test.

First, we would like to know if this particular property is the result of the general model structure or the particular parameter estimates obtained. Suppose we can write the model

$\mathrm{M1}: F(y, \theta) = 0; \quad \theta \in S;$

where $y$ is a vector of variables and $\theta$ a vector of unknown parameters which belong to the set $S$. These relations define our maintained hypothesis. If $F$ has the over shooting property for every $\theta$ in $S$, no sample would be able to reject this property, i.e. no test is possible. In this case there is no support for the property, and one would like to consider a more general model which would include M1.

Even if there are values of $\theta$ in $S$ which do not imply "over shooting", one might think of cases when this property is "almost" untestable. Suppose our data are generated by another (stochastic) model M2 which does not have the "over shooting" property, and that the distribution of $y$ is such that we will, with a probability close to 1, obtain estimates of $\theta$ in M1 which give over shooting. Then the probability of rejecting this property will be close to 0. To obtain some protection against this possibility one would like to investigate if theoretically plausible models different from M1 with about the same fit would also give the over shooting property. If they do, some support for overshooting is obtained.

In general, I can see no other way to solve the testing problem than to test each part of the model against micro data by statistical methods. If micro data are unavailable, we will most certainly encounter difficulties in discriminating between model structures. Suppose our data are generated by M1 but there are many parameter vectors $\theta$ which give almost the same fit to the observed (macro) data and some give "over shooting" while others do not. This result neither gives support to the over shooting property nor rejects it. Equivalently, if one estimate of $\theta$ implies overshooting but it is possible to find another $\theta$ which gives almost the same fit but no overshooting, then there is no support.

Eliasson discovered the over shooting property of his model by deterministic simulation. But assigning the value zero to the random errors does not always give unbiased predictions, cf. the case of log-normally distributed errors.
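The log-normal case can be illustrated with a short sketch. It assumes a relation of the form $y = \exp(x\beta + \epsilon)$ with $\epsilon \sim N(0, \sigma^2)$, for which the mean of $y$ is $\exp(x\beta + \sigma^2/2)$. The values of `xb` and `sigma` are hypothetical and serve only as an illustration.

```python
import math
import random

rng = random.Random(42)
sigma = 0.5        # hypothetical standard deviation of the error term
xb = 2.0           # hypothetical value of the systematic part (log scale)
n_draws = 100_000

# Deterministic simulation: the random error is set to zero.
deterministic = math.exp(xb)

# Stochastic simulation: average over many draws of the error.
stochastic = sum(math.exp(xb + rng.gauss(0.0, sigma))
                 for _ in range(n_draws)) / n_draws

# Mean of a log-normal variable with these parameters.
theoretical = math.exp(xb + sigma ** 2 / 2)

print("deterministic prediction:", deterministic)
print("mean of stochastic runs: ", stochastic)
print("theoretical mean:        ", theoretical)
```

In this sketch the deterministic prediction falls short of the mean by the factor $\exp(\sigma^2/2)$, so the bias grows with the error variance.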

Depending on the structure of the model it might also generate random shocks which would counteract the over shooting. If the random errors implicit in the behavioral relations are taken into account by stochastic simulations one might thus obtain different results vis à vis over shooting.

Finally, I would like to comment on what is called "the dynamic approach" to estimation. Let us take the following simple example:

Minimization of

$\sum_{t=1}^{T}\left(y_t - \hat{y} \mid y_{t-1}\right)^2 = \sum_{t=1}^{T}\left(y_t - \alpha - \beta y_{t-1}\right)^2$

gives the Ordinary Least Squares estimates which are maximum likelihood estimates and they are consistent, asymptotically unbiased and asymptotically efficient. In the dynamic approach the following residual sum of squares is minimized

$\sum_{t=1}^{T}\left(y_t - \hat{y} \mid \hat{y}_{t-1}\right)^2 = \sum_{t=1}^{T}\left(y_t - \alpha \sum_{i=2}^{t} \beta^{i-2} - \beta^{t-1} y_1\right)^2;$

where $y_1$ is the first y-observation. It remains to be shown that the estimates obtained have any desirable properties.
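The difference between the two criteria can be sketched in a few lines. This is a minimal illustration on simulated data, not an implementation from any actual micro simulation model: the function names, the parameter values and the choice to start the one-step sum at the second observation (since no value before $y_1$ is observed) are assumptions.

```python
import random

def simulate_ar1(alpha, beta, sigma, T, y1, rng):
    """Generate y_t = alpha + beta * y_{t-1} + eps_t, starting from y1."""
    y = [y1]
    for _ in range(T - 1):
        y.append(alpha + beta * y[-1] + rng.gauss(0.0, sigma))
    return y

def one_step_ssr(alpha, beta, y):
    """OLS criterion: each prediction conditions on the OBSERVED y_{t-1}."""
    return sum((y[t] - alpha - beta * y[t - 1]) ** 2 for t in range(1, len(y)))

def dynamic_path(alpha, beta, y1, T):
    """Predictions that condition on the previous PREDICTION, started at y1."""
    path = [y1]
    for _ in range(T - 1):
        path.append(alpha + beta * path[-1])
    return path

def dynamic_ssr(alpha, beta, y):
    """'Dynamic approach' criterion: residuals around the dynamic path."""
    path = dynamic_path(alpha, beta, y[0], len(y))
    return sum((obs - pred) ** 2 for obs, pred in zip(y, path))

def ols_ar1(y):
    """Closed-form OLS estimates from regressing y_t on y_{t-1}."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    beta = sxz / sxx
    return mz - beta * mx, beta

rng = random.Random(1)
y = simulate_ar1(alpha=1.0, beta=0.8, sigma=1.0, T=100, y1=5.0, rng=rng)
a_hat, b_hat = ols_ar1(y)

print("OLS estimates:", a_hat, b_hat)
print("one-step SSR at OLS estimates:", one_step_ssr(a_hat, b_hat, y))
print("dynamic SSR at OLS estimates: ", dynamic_ssr(a_hat, b_hat, y))
```

The two criteria are different functions of $(\alpha, \beta)$, so their minimizers will in general differ. Minimizing `dynamic_ssr` would require a numerical search, which is left out of this sketch.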

If the OLS estimates are used for "dynamic predictions", i.e. only the first y-observation is used to start the forecasting, and if all $\epsilon_t$ are set equal to zero, one would probably obtain a sequence of y-predictions which deviates from the observed series in a seemingly non-random way. Is this result an indication of a bad model? Not necessarily! In a mean-square sense the prediction was the best possible given that we only knew the first y-value. The random number generator which we call the economy will generate a y-series with all $\epsilon_t$ set equal to zero only with a probability close to zero. The probability that our random number generator would be able to generate the same series of $\epsilon_t$ values as generated by the economy is also almost zero. To simulate only one future y-path is thus almost useless. What is of interest is to simulate the whole distribution of y-paths. Our interest must then be concentrated on building models which yield distributions with small variances.
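A minimal sketch of what simulating the whole distribution of y-paths could look like for the simple first-order example follows. The parameter values, the number of replications and the horizon are hypothetical, and normally distributed errors are an assumption made only for the illustration.

```python
import random
import statistics

def simulate_paths(alpha, beta, sigma, y1, horizon, n_paths, rng):
    """Draw n_paths stochastic y-paths, all started from the same y1."""
    paths = []
    for _ in range(n_paths):
        y = [y1]
        for _step in range(horizon):
            y.append(alpha + beta * y[-1] + rng.gauss(0.0, sigma))
        paths.append(y)
    return paths

alpha, beta, sigma = 1.0, 0.8, 1.0   # hypothetical parameter values
y1, horizon, n_paths = 2.0, 20, 2000
rng = random.Random(7)

# Single deterministic path: all errors set to zero.
deterministic = [y1]
for _step in range(horizon):
    deterministic.append(alpha + beta * deterministic[-1])

paths = simulate_paths(alpha, beta, sigma, y1, horizon, n_paths, rng)

# Distribution of y at the end of the horizon.
final_values = [p[-1] for p in paths]
mean_final = statistics.mean(final_values)
sd_final = statistics.stdev(final_values)
cuts = statistics.quantiles(final_values, n=20)  # 5%, 10%, ..., 95%
lower, upper = cuts[0], cuts[-1]

print("deterministic end value:", deterministic[-1])
print("mean of simulated paths:", mean_final)
print("standard deviation:     ", sd_final)
print("5% and 95% quantiles:   ", lower, upper)
```

In this linear case the deterministic path tracks the mean of the simulated paths, but it says nothing about their spread. The quantiles and the standard deviation are what indicate how much weight a single predicted path can bear.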

## Article and author information

### Author details

1. #### Anders Klevmarken

University of Gothenburg, Institute of Statistics, Gothenburg, Sweden
##### For correspondence
anders@klevmarken.nu
##### Competing interests
No competing interests reported

### Acknowledgements

This article has been previously published as Part II in G. Eliasson (ed.) A Micro-to-Macro Model of the Swedish Economy. Papers on the Swedish Model from the Symposium on Micro Simulation Methods in Stockholm Sept. 19-22, 1977. IUI conference reports 1978:1, ISBN 91-7204-086-6, Stockholm.

### Publication history

1. Version of Record published: April 30, 2022 (version 1)

© 2022, Anders Klevmarken