
Explaining the Size and Nature of Response in a Survey on Health Status and Economic Standard

  1. Fredrik Johansson-Tormod  Is a corresponding author
  2. Anders Klevmarken  Is a corresponding author
  1. National Institute of Economic Research, Sweden
  2. Department of Economics, Sweden
Research article
Cite this article as: F. Johansson-Tormod, A. Klevmarken; 2022; Explaining the Size and Nature of Response in a Survey on Health Status and Economic Standard; International Journal of Microsimulation; 15(1); 63-77. doi: 10.34196/ijm.00250

Abstract

Using rich register data to analyze response behavior in a survey on health and economic standard, a model to explain contact and participation probabilities is estimated. A main result is that both probabilities are lower among respondents who are less well off, out of the labor market, on benefits and among immigrants. We also find a significant time-cost effect on participation. Previous findings that the probability of contact is low in urban areas and among singles are confirmed.

1. Introduction and motivation

Nonresponse is probably the most severe problem in survey research. Today it is not unusual to find surveys with a response rate around or even below 50 percent. Nonresponse this high not only decreases the sample size and correspondingly increases the variances of estimates from these data, but may also bias the results if response is selective. There is a large literature on methods to compensate for nonresponse, ranging from calibration methods (including standard post-stratification) and imputation to more sophisticated model-based methods. The key to successful compensation is to understand the causes of nonresponse. This is also important because of its relevance to survey design, where resources have to be allocated between the possibly conflicting goals of increasing the precision of estimates and reducing nonresponse biases.

There is a growing literature on the causes of nonresponse, with more or less successful attempts to build models explaining response behavior. These attempts have been constrained by the usually very limited information available in the sampling frames. Researchers have then resorted to comparisons between the responding part of the sample and results from smaller but intensive studies of nonresponding sample members, using the assumption that those who could be converted tell us something about those who belong to the hard core of nonresponders. They have also made comparisons with larger surveys with more reliable measures and with population statistics. Only rarely has it been possible to match individual survey records to reliable register data for the same individuals. The situation is somewhat different in panel surveys, because in a panel one can use information given by the respondents in a previous wave of data collection to explain response behavior in a more recent wave; see, for instance, Brose and Klevmarken (1993), Lepkowski and Couper (2002) and Nicoletti and Peracchi (2005). The results from these studies are interesting and important, but they do not necessarily carry over to a cross-sectional survey or the first wave of a panel survey.

It is well known that response is usually much lower in the first wave of a panel survey than in successive waves and that attrition thus takes place in an already selected sample. People who are notoriously difficult to trace and convince have already been eliminated from the sample in the first wave, see Laurie et al. (1999).1 Lepkowski and Couper (2002) argue that the response process in the first wave is fundamentally different from that of subsequent waves. This is both because of self-selection of the sample units and because of the extra information and organizational experience gained by the survey agencies at each successive wave. Fitzgerald et al. (1998) reported the same experience from the Panel Study of Income Dynamics (PSID). Attrition between the first and the second wave was 12 percent; over the next 20 waves it averaged between 2.5 and 3.0 percent.

Another problem with using data from a previous survey wave to explain response behavior is that survey data always have measurement errors and other types of nonsampling errors. Depending on the variables used this might become a problem when estimating a response model.

In this study we have the advantage of having exceptionally good sample frame data that can be used to explain response behavior. The sample frame was the 2001 wave of the longitudinal register-based data set LINDA of Statistics Sweden. LINDA is a random sample including a few hundred thousand individuals from the Swedish population. Register data include population censuses, schooling, income, wealth and tax data, etc. The sources of these register data are various administrative and statistical registers of Statistics Sweden such as the register of educational achievements, the income register, the wage rate register and registers from the Swedish social security system and labor market authority.

From LINDA we selected by simple random sampling a smaller sample of 1,430 individuals 50-84 years old, to whom CATI interviews were administered by Statistics Sweden.2 These telephone interviews included sequences of questions taken from the U.S. Health and Retirement Study (HRS) survey and the European Survey of Health, Ageing and Retirement in Europe (SHARE) and adapted to Swedish circumstances.3,4 There were thus questions about health, labor force participation, wages, incomes and wealth. Most of these questions were about “facts”, not about feelings, perceptions and attitudes. The average interviewing time was less than 30 minutes. The field work was done in the period April 3–May 11, 2003, with nonresponse follow-up June 2–June 22. In this period most Swedes completed their self-assessment for income taxation, so the information needed to answer questions about incomes, assets and taxes should have been timely.

Prior to the field work the questionnaire was tested in the questionnaire laboratory of Statistics Sweden and in a small pretest. The interviewers were experienced telephone interviewers. They received a four-hour training session focusing specifically on our survey, and they were afterwards asked to practice on the questionnaire before they were allowed to work in the field. The nonresponse follow-up was done by a few of the most experienced interviewers.

The contribution of this paper is thus an analysis of the response behavior in a cross-sectional survey with standard questions about health, incomes, taxes and assets using unusually rich sampling frame data from the registers of Statistics Sweden and a model which simultaneously explains contact and cooperation.

2. Reasons for Nonresponse

2.1. A literature review

Singer (2006) gives a brief but interesting review of general trends in research about nonresponse in household surveys. Groves and Couper (1998) summarized and evaluated the literature on unit nonresponse in household cross-sectional surveys prior to the mid-1990s, analysing contact and cooperation separately. Contactability is primarily a function of physical barriers to accessing the respondents, the household's at-home pattern, the interview mode and the contact schedule of the interviewers. The explanation of cooperation is more complex: it involves the interaction of survey design, the survey topics covered, the organization behind the survey and its perceived motives for carrying out the survey, interviewer behaviour, and demographic, socioeconomic and psychological influences on the respondent.

The design properties are known, data about the respondents might be obtainable from the sampling frame, while it is even more difficult to get detailed information about the interaction between interviewer and respondent. Process data from the field work sometimes give information about the number of contact attempts, reasons for noncontact/refusal, and perhaps also data on the interviewer’s experience. In some surveys the interviewers are asked to summarize their experiences from each interview, but this is not common practice, and usually the important interaction between interviewer and respondent becomes a black box.

In this paper we report the results from a study of the outcome of one particular survey, which was fielded by one survey organization and used only one mode of data collection. Unfortunately, we have no data about the interviewers or their interaction with the respondents. We thus focus on the characteristics of the respondents and how they determine response. The brief literature review that follows has the same focus.

Groves and Couper (1998) found that contactability was lower in urban than in rural areas, a finding also replicated in many other studies. It is not clear why this is so. One explanation is that there are many multi-family houses in urban areas with access limited by entry barriers of various kinds. Another explanation is that people spend less time at home in urban areas. Commuting takes time and there is a greater supply of out-of-home events. It is also possible that crime rates are higher in urban areas and that the trust in other people is weaker. One is thus more reluctant to let an interviewer into one’s home. Still another potential explanation is that the share of singles and small families is higher in urban areas than in rural. Households with more adults and with children are easier to contact because the probability that someone will be at home is higher. Elderly adults also tend to be at home more frequently than young adults.

Groves and Couper (1998) note that previous studies have shown that cooperation rates are lower among lower socio-economic groups, among racial/ethnic minority groups and among the elderly. However, they find that once contacted these poorer groups appear no different than other groups when they control for social environment as measured by urbanicity, population density, crime rate and population share under 20 years old. One might however note that among these variables only population density comes out significant in their own study and the only indicator of poverty is the house value, which is likely to pick up differences in the degree of urbanicity.5

Another conclusion from the Groves and Couper (1998) study is that young and old respondents have a higher cooperation rate than the middle-aged. The authors speculate that there are different forces driving young and old. Young persons may have more experience of “standardized information seeking” from schools and jobs and be more curious about such efforts than elderly respondents, while the elderly may “maintain norms of civic duty regarding requests from government” and academia (p. 150). These results were obtained after controlling for whether the household was a single-person household. It is well known that it is more difficult to gain the cooperation of persons who live alone than that of persons who live in multi-person households. Many elderly are singles, and according to Groves and Couper (1998) old age does not decrease the probability of cooperation once one has controlled for household size. The lower cooperation rate of single-adult households is interpreted as a result of less social integration of these households. The authors also conclude that once socioeconomic status is controlled for (primarily measured by house value) the cooperation rates of minority groups are much closer to those of the majority group.

Socio-economic status is an elusive concept and can be operationalized in many different ways using information on, for instance, income, wealth, education, occupation, etc. It thus comes as no surprise that the literature on nonresponse shows a diversity of results.6 We just note a result from a previous Swedish study, Lindström (1983), which found that respondents tended to have higher incomes and less social assistance benefits than nonrespondents.

In their literature review Särndal and Lundström (2005) concluded that the response rate is usually expected to be lower among metropolitan residents, single persons, members of childless households, older persons, divorced or widowed persons, persons with low educational attainment, and self-employed persons.

One conclusion that came out of the Groves and Couper (1998) study was that the probabilities of contact and cooperation had distinctly different explanations. Lynn et al. (2002) also distinguished between the difficulty of contacting sample members and the difficulty of obtaining cooperation once contact is made. In a descriptive analysis based on various health and socio-economic surveys from the UK they found that the probability of participation did not depend on the number of calls until contact. They also tested the hypothesis that households that were hard to contact differ in their characteristics from households that were easy to contact. Their main results were that respondents who were hard to contact were more likely to be smokers and drinkers, to have lower blood pressure, be less likely to have a severe illness, be younger, more likely to be employed and less likely to be white.

While many of the studies of nonresponse in cross-sectional surveys (first waves of panel surveys) are constrained by the usually limited information available about all sample members from the sampling frames or other sources, studies of attrition in panel surveys offer richer model specifications with more explanatory variables. (The smaller number of studies of cultural differences in nonresponse to cross-sectional surveys compared to the much larger number of studies of attrition in panel surveys in Tables 4.1 and 4.2 in Johnson et al. (2002) is suggestive.) For this reason it is of interest also to review some of the results from studies of attrition in panel surveys, even if these results do not necessarily immediately carry over to cross-sectional surveys.

Previous empirical research has suggested that attrition from a panel is more likely for individuals who are on welfare, unmarried, older and nonwhite. Also, attritors have less education, work fewer hours, have lower labor income, and are more likely to rent their homes than the average respondent (Fitzgerald et al., 1998). Zabel (1998) concluded that attritors were more likely to live in urban areas, be nonwhite and unmarried, have fewer children and rent their homes. Campanelli et al. (1997) analyzed attrition both on a household level and on an individual level – their main results are in line with the ones above, i.e., respondents who are economically less well off are less likely to be included in the survey.

In decomposing attrition into noncontact and refusal Campanelli et al. (1997) found, in line with previous research, that these two groups have different socio-economic characteristics. Nonwhites were harder to contact than whites, as were unmarried respondents compared to married. It was harder to establish contact with young respondents than with old, but once contacted they were generally cooperative. For the elderly it is the other way around. Households with no children were more likely to refuse, as were households with many working members, and households consisting of couples.

The sample of the American Time Use Survey (ATUS) was drawn from the eighth wave of the Current Population Survey (CPS), and nonresponse in its first wave can for this reason be seen as attrition rather than initial nonresponse. In their study of response behaviour in the first wave of ATUS, Abraham et al. (2006) tested the hypothesis that “busy” people were difficult to contact and also less willing to cooperate. Their multivariate analysis gave some but not very much support to this hypothesis. People who worked long hours had a somewhat higher probability of noncontact, but there was no significant difference in the probability of refusal once contacted. Married people with a working spouse were not more difficult to contact than others, and if the spouse worked long hours the probability of cooperation was even higher than average. The presence of children had no significant effect on contact and cooperation for married sample members, but for unmarried sample members, children aged 6-17 increased the probability of contact. Other results were very much in line with those of previous studies: renters and sample members living in big cities had relatively high noncontact rates, while they did not differ from average in cooperation. Households with low or missing incomes were both difficult to contact and unwilling to cooperate. The more schooling, the higher the contact and cooperation rates.

Finally, Nicoletti and Peracchi (2005) modelled response behaviour using a bivariate probit model that distinguished between contact and cooperation. They used data from the European Community Household Panel (ECHP). Most of their results are in line with what is expected from previous research. They found that the number of children and home ownership increased the probability of contact, while the effects of the number of adults in the household and of equivalised household income (household income divided by the number of household members) were not significantly different from zero. They also found that being out of the labor market increased the probability of cooperation, whereas being single decreased it. There was no significant effect of the age or education of the respondent.
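To make the distinction between contact and cooperation concrete, a generic bivariate probit with sample selection can be written as below. This is a textbook sketch and not the exact specification of any of the studies cited; the symbols (the covariate vectors x and z, the coefficients β and γ, and the error correlation ρ) are notation assumed here for illustration only.

```latex
\[
\begin{aligned}
c_i^{*} &= x_i'\beta + u_i, \qquad c_i = \mathbf{1}[c_i^{*} > 0] \quad \text{(contact)} \\
r_i^{*} &= z_i'\gamma + v_i, \qquad r_i = \mathbf{1}[r_i^{*} > 0] \quad \text{(cooperation, observed only if } c_i = 1\text{)} \\
(u_i, v_i) &\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right) \\
\Pr(c_i = 1,\, r_i = 1) &= \Phi_2\!\left(x_i'\beta,\; z_i'\gamma;\; \rho\right)
\end{aligned}
\]
```

Under these assumptions a sample member responds only if both latent indices are positive. If ρ = 0 the joint probability reduces to the product of two univariate probit probabilities, so contact and cooperation could then be estimated separately.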

2.2. Our survey

In the remainder of this section, we discuss the contact and cooperation difficulties arising in our survey. Statistics Sweden had a mailing address for everyone (the address at which the respondent had registered with the tax authorities), and through computerized telephone directories it could obtain telephone numbers for most of the respondents. However, it is possible to be registered at one address and live somewhere else. For instance, old people might have kept their old home while they in fact stay for a longer or shorter period in a nursing home. In this case they might not even have a private telephone. Many Swedes have secondary homes, and when they are retired they sometimes live there for longer or shorter periods, not only in the summer. Cell phones have become very common and should in principle increase the chances of reaching people, but the telephone directories have not always had full coverage of all cell phone numbers. Some people opt for having only a cell phone and no conventional phone, but this is not as common among elderly people. According to the surveys of the Swedish National Post and Telecom Agency, 95 percent of the Swedish population aged 16-75 have a regular telephone and 3 percent have no telephone. About 90 percent have a cell phone.7 In our survey, contacting people meant obtaining the right telephone number and then getting them on the phone. As usual, many attempts were made at varying times of the day and on different days of the week. At the end of the fieldwork telephone numbers were still missing for 70 respondents and one respondent had a protected number. This is about what one could have expected given the telephone coverage in Sweden.

After a contact has been established it is very much dependent on the interviewer if it is successful or not. Unfortunately, our survey data do not have any information about the interviewers, so it is impossible to estimate any interviewer effects on response. All interviewing was done from the Örebro office of Statistics Sweden and interviewers thus called to all areas of the country. The area in which the respondent lives is thus not confounded with interviewer. Because the CATI system allocated respondents to the interviewers without knowing “the track record of the interviewer” it is a plausible hypothesis that any interviewer effects are independent of effects depending on the characteristics of the respondents. There is though one exception: The more difficult cases, which remained after the main field period had ended, were in the nonresponse follow up turned over to the most skilled interviewers.

What is possible to do in this study is to model response as a function of the characteristics of the respondents. In explaining the probability of contact, we need variables that capture entry barriers, the fact that some people are more mobile than others, and the fact that very old people are difficult to contact because of old age and sickness, for instance dementia. The decision to participate in an interview once contacted depends on the time cost of the respondent and the presence of any competing activities. It also depends on the respondent’s understanding of and interest in the issues brought up in the interview and the general purpose of the survey. There is also the concern about invasion of privacy. Even if people are interested in contributing to a health survey, many respondents are reluctant to reveal information about wages, incomes and, in particular, wealth.

3. Explanatory variables and descriptive analysis of response frequencies

At the end of May 2003, the response rate was 56.5 percent and the share of refusals 19.6 percent. After the conversion attempts in June the total response rate increased to 61.6 percent, the share of not found was reduced by 2.9 percentage points and the share of refusals by 3.0 percentage points. In the end 22.6 percent of the sample members refused and 15.8 percent could not be found. The latter figure, however, includes 12 individuals who were classified as over-coverage and should have been eliminated. If this is done the response rate increases to 62.1 percent.8
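The over-coverage adjustment is simple arithmetic, sketched below. The sample size and the number of over-coverage cases come from the text and the number of completed interviews from Table 1; the variable names are ours, and we assume the adjustment simply removes the 12 over-coverage cases from the denominator.

```python
# Figures quoted in the text and in Table 1; variable names are illustrative.
sample_size = 1430     # individuals sampled from LINDA
responded = 881        # completed interviews after the June follow-up
over_coverage = 12     # sample members classified as over-coverage

# Unadjusted rate: respondents over the full sample.
raw_rate = responded / sample_size

# Adjusted rate: over-coverage cases are dropped from the denominator (assumption).
adjusted_rate = responded / (sample_size - over_coverage)

print(f"unadjusted response rate: {raw_rate:.1%}")
print(f"adjusted response rate:   {adjusted_rate:.1%}")
```

With these inputs the two rates round to the 61.6 and 62.1 percent reported above.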

Before proceeding to a multivariate analysis we start by motivating our choice of explanatory variables and analyzing a number of tables showing the association between response and the selected variables. With each table there is a chi-square statistic for a test of independence and the corresponding P-value. A significant test suggests that an association is stronger than one could expect by chance. These tables, however, only display bivariate relations, and any association or lack of association could well change in a multivariate analysis. For instance, as shown in Table 1 there is virtually no difference in the response behavior of males and females, but we still prefer to include this variable in our multivariate analysis: gender might be confounded with other variables, and since it is often available in sampling frames it is of interest to find out if there is any partial effect. We expect to find that females have a higher probability of contact than males, because they are less mobile and more frequently at home. Their probability of cooperation might, however, be lower. Even if the time-cost of working females usually is somewhat lower than for males, because females have lower wages, females tend to be more sensitive to the issues of invasion of privacy and this effect might dominate.

Table 1
Response rates by gender.
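| Status | Male N | Male % | Female N | Female % | Total N | Total % |
|---|---|---|---|---|---|---|
| Responded | 412 | 61.2 | 469 | 62.0 | 881 | 61.6 |
| Refusals | 149 | 22.2 | 174 | 23.0 | 323 | 22.6 |
| Not reached | 112 | 16.6 | 114 | 15.0 | 226 | 15.8 |
| Sample size | 673 | 100.0 | 757 | 100.0 | 1430 | 100.0 |

Chi2(2) = 0.709 (p = 0.702)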
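As an illustration of the test of independence reported with each table, the following sketch recomputes the Pearson chi-square statistic from the counts in Table 1. It uses only the Python standard library. The function name is ours, and the closed-form P-value used here is valid only for two degrees of freedom; other tables would need a general chi-square distribution function.

```python
import math

# Observed counts from Table 1: rows are response status, columns are (male, female).
observed = [
    [412, 469],  # responded
    [149, 174],  # refusals
    [112, 114],  # not reached
]

def pearson_chi2(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

chi2, dof = pearson_chi2(observed)

# With exactly 2 degrees of freedom the chi-square survival function is exp(-x/2).
p_value = math.exp(-chi2 / 2)

print(f"Chi2({dof}) = {chi2:.3f}, p = {p_value:.3f}")
```

The result should agree with the statistic and P-value reported under Table 1 up to rounding.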

Even though not all studies have found a clear relation between the age of the respondent and the frequency of contact and cooperation, we expect to find one. Young people are more mobile than the elderly and thus more difficult to contact. In our case the youngest cohorts are excluded from the study, but we expect to find that those who are at the peak of their career are more difficult to contact than those who are retired. However, many retirees in their sixties and early seventies might also be mobile, going on vacation trips, spending time at their vacation houses, visiting children, etc. We might also find that some of the oldest old are relatively difficult to contact because of the increased prevalence of health problems in these age groups, but as already mentioned our survey does not include people older than 84.

The relation between the probability of cooperation and age is more difficult to anticipate. Before retirement, time cost is at its peak for many respondents, and for this reason one might expect to find a higher probability of cooperation among the elderly. However, the elderly might be more sensitive to the issue of invasion of privacy than younger respondents; they might also find it tiring to spend half an hour on the telephone and be more reluctant to bring out any documentation needed to give good answers.

The estimated age effects will also depend on other variables we choose to include. Some of them might pick up what otherwise would be interpreted as an age effect. It is difficult a priori to assume any particular functional form for the relation between response and age. For this reason, we have chosen to work with age group effects which will permit data to determine the shape of the relationship.

Table 2 shows that response rates are smallest among the youngest (50-55) and the oldest. But this result hides reversed age trends among refusals and not reached. Refusals increase with age while not found seems to be a bigger problem among people below the age of 70.

Table 2
Distribution of response status by age.
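| Status | 50-55 | 56-60 | 61-65 | 66-70 | 71-75 | 76-80 | >80 | All |
|---|---|---|---|---|---|---|---|---|
| Responded | 211 (59.60) | 198 (63.46) | 133 (65.84) | 108 (61.71) | 97 (62.18) | 90 (59.60) | 44 (55.00) | 881 (61.6) |
| Refusal | 77 (21.75) | 48 (15.38) | 46 (22.77) | 39 (22.29) | 42 (26.92) | 45 (29.80) | 26 (32.50) | 323 (22.6) |
| Not reached | 66 (18.64) | 66 (21.15) | 23 (11.39) | 28 (16.00) | 17 (10.90) | 16 (10.60) | 10 (12.50) | 226 (15.8) |
| All | 354 | 312 | 202 | 175 | 156 | 151 | 80 | 1430 |

Chi2(12) = 32.686 (p = 0.001).

Note: Columns are age groups. Column percentages in parentheses.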

Schooling is expected to influence response behavior directly as well as indirectly as an indicator of other variables. A higher education might increase the understanding of the research issues involved in our project and make the respondent more sympathetic towards research. Schooling is also an indicator of labor market career and pay and thus of availability and time-cost. Also, after retirement respondents with long schooling are expected to be relatively more mobile, if for no other reason than that they tend to have higher incomes. We thus expect schooling to have a negative influence on contact, while there are countervailing factors determining cooperation. There is no reason to believe that the effects of schooling are linear or take any particular nonlinear functional form, so we will work with three dummy variables: compulsory schooling, high school and university.

Register data on schooling are not ideal. They originate from census data, examination registers for all levels of education and surveys of immigrants. The information on education obtained abroad is incomplete, because the surveys of immigrants only provide part of the information. In all, data on schooling are missing for about 1.5 percent of the population covered. The major problem with these data is, however, that they only cover people in the age bracket 16-74. There are no register data for those who are 75 and above. In our sample the register schooling variable is missing for 265 respondents. For 150 of these we have interview data; most of them, 73 percent, fall into the group with the lowest education. In our multivariate analysis we have chosen to use the survey information for the 150 respondents and code the remaining 115 as having missing schooling data. We believe that most of these 115 respondents have at most basic schooling. The share of immigrants among them is twice that of the whole sample, 8 percent compared to 4.
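The coding rule for schooling described above can be sketched as follows. The function name, the category labels and the dummy-variable names are hypothetical and chosen only for illustration; the rule itself (register data first, interview data as fallback, otherwise a missing-data category) follows the text.

```python
from typing import Optional

# Hypothetical labels for the three schooling levels used in the analysis.
CATEGORIES = ("compulsory", "high_school", "university")

def code_schooling(register: Optional[str], survey: Optional[str]) -> dict:
    """Return schooling dummies, falling back from register data to survey data."""
    # Use the register value when it exists, otherwise the interview answer.
    level = register if register is not None else survey
    dummies = {f"school_{c}": int(level == c) for c in CATEGORIES}
    # Separate indicator for sample members with no schooling data from either source.
    dummies["school_missing"] = int(level is None)
    return dummies

# Register value present: used as is.
print(code_schooling("university", None))
# Register value missing, interview value present: the survey answer is used.
print(code_schooling(None, "compulsory"))
# Neither source available: coded as missing.
print(code_schooling(None, None))
```

In the sample described above, the second case corresponds to the 150 respondents with interview data and the third to the remaining 265 - 150 = 115 sample members.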

Table 3 shows that there is no strong association between schooling and response behavior. (The high chi2-value is generated by the missing schooling category.) The response rate is a little higher among respondents with university education, and respondents with only compulsory schooling are harder to convince than respondents with more schooling. None in the group with missing schooling data responded; most of them refused to participate. This result strengthens our belief that most of them are immigrants with limited knowledge of Swedish.

Table 3
Distribution of response status by education.
| Status | Compulsory schooling | High school and at most 2 years of university | More than 2 years of university | Missing value | All |
|---|---|---|---|---|---|
| Responded | 339 (66.3) | 355 (66.5) | 187 (69.3) | 0 (0.0) | 881 (61.6) |
| Refusal | 98 (19.2) | 95 (17.8) | 47 (17.4) | 83 (72.2) | 323 (22.6) |
| Not reached | 74 (14.5) | 84 (15.7) | 36 (13.3) | 32 (27.8) | 226 (15.8) |
| All | 511 | 534 | 270 | 115 | 1430 |

Chi2(6) = 225.92 (p = 0.000).

Note 1: Column percentages in parentheses. Note 2: Register data on schooling are missing for respondents older than 75 years. Survey information was used for 150 respondents.

Household size and being married are expected to have a positive effect on the probability of contact, while any effect on cooperation is less obvious. Table 4 confirms this for marital status. Unmarried persons are less likely to respond than married persons. Most of this difference comes from a higher frequency of not reached among the unmarried, while the difference in refusal rate is small.9

Table 4
Distribution of response status by marital status.
| Status | Married | Unmarried | All |
|---|---|---|---|
| Responded | 561 (66.63) | 320 (54.42) | 881 (61.6) |
| Refusal | 194 (23.04) | 129 (21.94) | 323 (22.6) |
| Not reached | 87 (10.33) | 139 (23.64) | 226 (15.8) |
| All | 842 | 588 | 1430 |

Chi2(2) = 47.349 (p = 0.000).

Note: Column percentages in parentheses.
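One way to see how much of the gap between married and unmarried sample members arises at the contact stage is to split each response rate into a contact rate and a cooperation rate conditional on contact. The sketch below does this with the counts in Table 4. It assumes that every refusal implies a successful contact and that "not reached" equals noncontact; the function and variable names are ours.

```python
# Counts from Table 4: (responded, refusal, not reached) by marital status.
counts = {
    "married": (561, 194, 87),
    "unmarried": (320, 129, 139),
}

def two_stage_rates(responded, refused, not_reached):
    """Split the response rate into a contact rate and a cooperation rate."""
    total = responded + refused + not_reached
    contacted = responded + refused           # assumption: a refusal implies contact
    contact_rate = contacted / total
    cooperation_rate = responded / contacted  # conditional on contact
    return contact_rate, cooperation_rate

rates = {group: two_stage_rates(*c) for group, c in counts.items()}

for group, (contact, cooperation) in rates.items():
    print(f"{group:9s} contact {contact:.1%}  cooperation given contact {cooperation:.1%}")
```

Under these assumptions the contact rates differ by about 13 percentage points (roughly 90 versus 76 percent), while the conditional cooperation rates differ by only about 3 points. The product of the two rates gives back the response rate in each column.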

Table 5 gives similar information for household size. The larger the household, the lower the frequency of not reached, while there is no major difference in refusal frequency.

Table 5
Distribution of response status by household size
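| Status | 1 | 2 | 3 or more | All |
|---|---|---|---|---|
| Responded | 284 (55.47) | 430 (64.76) | 167 (65.75) | 881 (61.6) |
| Refusal | 113 (22.07) | 154 (19.65) | 56 (22.05) | 323 (22.6) |
| Not reached | 115 (22.46) | 80 (14.45) | 31 (12.20) | 226 (15.8) |
| All | 512 | 664 | 254 | 1430 |

Chi2(14) = 38.25 (p = 0.000).

Note: Columns are household size (number of persons). Column percentages in parentheses.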

The most frequently used time-cost measure is the hourly wage rate. This variable is unfortunately not included in our register data, but we have a measure of a monthly wage rate. Because many sample members are retired, they do not have any wage rate. One approach to obtain a time-cost measure for those who do not work, is to estimate a wage rate for them had they worked. This can be achieved if a labor supply and an earnings function are estimated jointly with the contact (cooperation) function; for an application to panel data see Brose and Klevmarken (1993). In our case it is probably not very meaningful to estimate such wage rates for people who have retired, some many years ago. Very few Swedes work after the age of 65. Instead, we have chosen to use the wage rate measure only for those who have a wage. It is expected to have a negative effect on cooperation. If there is any contact effect it might also be negative. People with high wage rates tend to work long hours and might be difficult to reach. In addition, we introduce a dummy variable that takes the value one if the respondent has no wage income. We expect that it will have a negative effect on contact, because those who have no job tend on average to be more mobile, while we have no prediction as to its effect on cooperation.

There is no clear association between the monthly wage rate and response behavior, see Table 6. The refusal rate increases a little with increasing wage rate, while there is no trend in the share of not reached. We thus only find a very weak indication of a time-cost effect on cooperation.

Table 6
Distribution of response status by the monthly wage rate (SEK)
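| Status | I ≤ 10,000 | 10,000 < I ≤ 20,000 | I > 20,000 | All |
|---|---|---|---|---|
| Responded | 155 (69.8) | 167 (69.9) | 202 (65.6) | 524 (68.1) |
| Refusal | 33 (14.9) | 42 (17.6) | 61 (19.8) | 136 (17.7) |
| Not reached | 34 (15.3) | 30 (12.6) | 45 (14.6) | 109 (14.2) |
| All | 222 | 239 | 308 | 769 |

Chi2(4) = 2.964 (p = 0.564).

Note: Column percentages in parentheses.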

Table 7, however, suggests that having a paid job influences response behaviour. Those who do not work are both more difficult to contact and to convince to cooperate. We note though that the employment indicator probably is confounded with age, schooling and other variables.

Table 7
Distribution of response status by employment

| Status | If wage rate | No wage rate | All |
|---|---|---|---|
| Responded | 524 (68.14) | 357 (54.01) | 881 (61.6) |
| Refusal | 136 (17.69) | 187 (28.29) | 323 (22.6) |
| Not reached | 109 (14.17) | 117 (17.70) | 226 (15.8) |
| All | 769 | 661 | 1430 |

Chi2(2) = 32.018 (p = 0.000).

Note: Column percent in parentheses.
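Because the three-way status mixes contact and cooperation, it can help to separate the two. The sketch below uses the counts of Table 7 and assumes that everyone who responded or refused counts as contacted; the function and variable names are illustrative only.

```python
def contact_and_cooperation(responded, refused, not_reached):
    """Split a three-way response status into a contact rate and a cooperation rate."""
    sampled = responded + refused + not_reached
    contacted = responded + refused
    contact_rate = contacted / sampled
    cooperation_rate = responded / contacted  # conditional on contact
    return contact_rate, cooperation_rate

# Counts from Table 7, ordered as (responded, refusal, not reached).
groups = {
    "wage rate": (524, 136, 109),
    "no wage rate": (357, 187, 117),
}
rates = {name: contact_and_cooperation(*counts) for name, counts in groups.items()}
for name, (contact, cooperation) in rates.items():
    print(f"{name}: contact {contact:.1%}, cooperation given contact {cooperation:.1%}")
```

Both rates come out lower for those without a wage rate, which is the pattern described in the text.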

Previous results suggest that those who are relatively less well off are more difficult both to contact and to get to cooperate. For this reason, we have included in our analysis disposable income per capita and the indicators: if on welfare, if unemployed and if immigrant. Table 8 confirms the finding that respondents in low-income families have a much lower response rate. They are harder both to contact and to convince to give an interview. There are no large differences between people with average incomes and those who have high incomes.

Table 8
Distribution of response status by disposable income (Y).

| Status | Y ≤ 90,000 | 90,000 < Y ≤ 120,000 | 120,000 < Y ≤ 180,000 | 180,000 < Y | All |
|---|---|---|---|---|---|
| Responded | 136 (52.1) | 181 (52.5) | 305 (69.9) | 259 (66.8) | 881 (61.6) |
| Refusal | 72 (27.6) | 99 (28.7) | 74 (17.0) | 78 (20.1) | 323 (22.6) |
| Not reached | 53 (20.3) | 65 (18.8) | 57 (13.1) | 51 (13.1) | 226 (15.8) |
| All | 261 | 345 | 436 | 388 | 1430 |

Chi2(6) = 39.99 (p = 0.000).

Note 1: Column percent in parentheses. Note 2: Disposable income in SEK per capita.

Although there are few respondents that have received any welfare, Table 9 suggests that those who are on welfare are difficult to find and also difficult to recruit for an interview.

Table 9
Distribution of response status if on welfare or not.

| Status | No welfare | Welfare | All |
|---|---|---|---|
| Responded | 875 (62.54) | 6 (19.35) | 881 (61.6) |
| Refusal | 312 (22.30) | 11 (35.48) | 323 (22.6) |
| Not reached | 212 (15.15) | 14 (45.16) | 226 (15.8) |
| All | 1399 | 31 | 1430 |

Chi2(2) = 28.795 (p = 0.000).

Note: Column percent in parentheses.

Whether a respondent had been unemployed in 2002 had no significant effect on response, while immigrants are difficult both to contact and to recruit for an interview; see Table 10.

Table 10
Distribution of response status by nationality

| Status | Swedish | Non-Swedish | All |
|---|---|---|---|
| Responded | 860 (62.82) | 21 (34.43) | 881 (61.6) |
| Refusal | 306 (22.35) | 17 (27.87) | 323 (22.6) |
| Not reached | 203 (14.83) | 23 (37.70) | 226 (15.8) |
| All | 1369 | 61 | 1430 |

Chi2(2) = 27.766 (p = 0.000).

Note: Column percent in parentheses.

Previous studies have also found that it is more difficult to contact respondents in urban areas than in rural. We thus include indicators of the degree of urbanization. Living in one of the three major metropolitan areas is the standard of comparison in our multivariate analysis below, while we have dummies for other urban areas and rural areas. We also use a dummy indicator if the respondent has a secondary home and expect that the probability of contact will be relatively less for this group.

Table 11 confirms that it is more difficult to reach people in urban areas than in rural, while interestingly the refusal rate is higher in the rural areas. The leisure home indicator does not give the expected result (not shown). There is no difference in contact rate between those who have and do not have a leisure home, while people with a leisure home more frequently cooperate.

Table 11
Distribution of response status by urbanization

| Status | Major city | Other urban | Rural | All |
|---|---|---|---|---|
| Responded | 282 (59.87) | 474 (62.04) | 125 (64.10) | 881 (61.6) |
| Refusal | 94 (19.96) | 182 (23.82) | 47 (24.10) | 323 (22.6) |
| Not reached | 95 (20.17) | 108 (14.14) | 23 (11.80) | 226 (15.8) |
| All | 471 | 764 | 195 | 1430 |

Chi2(4) = 11.61 (p = 0.020).

Note: Column percent in parentheses.

We use three indicators of the health status of the respondent: if the respondent got any sickness benefits in the survey year, if the respondent had stayed for at least one night in hospital during the year, and if the respondent had any psychiatric diagnosis in 1997-2002 (see footnote 10). None of these indicators necessarily shows that the respondent is sick or in hospital at the time of the interview, but given that we know that the respondent has been sick, the probability of contact should be relatively high because the respondent is at home and, if not too ill, able to answer the phone. The probability of cooperation might however be low. Similarly, if the respondent was admitted to a hospital, the probability of contact is likely to be low. Respondents with a psychiatric diagnosis might have both a reduced contact probability and a reduced cooperation probability.

Looking at raw data we found no significant difference in contact and cooperation frequencies between those who had collected sickness benefits and those who had not. Respondents who had stayed in hospital in the survey year had a lower contact frequency, while there was almost no difference in cooperation frequency. The P-value of the chi2-test was only 0.06.

Register data from the Centre for Epidemiology at the National Board of Health and Welfare include historical information about past psychiatric diagnoses for the period 1984-2002. Using all this historical information would increase the share with a psychiatric diagnosis from 1.3 percent in 2002 to approximately 8 percent. However, it is not obvious that all years contribute useful information. Some of those who got a diagnosis in, for instance, 1984 might have recovered by 2002. For this reason, we have only used data for a shorter period, 1997-2002.

A total of 3.4 percent of our sample frame had a psychiatric diagnosis at least once in this period and many of these individuals had a psychiatric diagnosis for more than one year, and some of them also had other problems diagnosed.

In Table 12 we see the association between response and having at least one psychiatric diagnosis in 1997-2002. The response rate was 39 percent for this group; 41 percent could not be reached and 20 percent refused to participate. People with psychiatric problems are thus much more difficult to reach and, once reached, somewhat more difficult to get to cooperate than an average respondent.

Table 12
Distribution of response status by having a psychiatric diagnosis 1997-2002

| Status | No diagnosis | Diagnosis | All |
|---|---|---|---|
| Responded | 862 (62.42) | 19 (38.78) | 881 (61.6) |
| Refusal | 313 (22.66) | 10 (20.41) | 323 (22.6) |
| Not reached | 206 (14.92) | 20 (40.82) | 226 (15.8) |
| All | 1381 | 49 | 1430 |

Chi2(2) = 24.485 (p = 0.000).

Note: Column percent in parentheses.

Just by looking at univariate distributions it is difficult to assess which variables are the most important to explain response, because many are confounded. We get, however, a very clear message from these tables, namely that response rates are much lower among low skilled and low-income people, many of whom are found among the oldest in the sample. We also confirm findings from previous studies that the contact frequencies are higher in large households and among households living in rural areas.

4. A sequential bivariate probit model with univariate selection

The sequence of events we wish to model is first the contact and, if contact is established, the event of giving an interview. Following Nicoletti and Peracchi (2005) we will use a bivariate probit model. Let $Y_1$ be a dummy variable that takes the value one if a contact is established and $Y_2$ another dummy variable that takes the value one if an interview is obtained. Assume the following model:

$$
\begin{aligned}
Y_1^* &= \beta_1 X_1 + \varepsilon_1; \qquad Y_2^* = \beta_2 X_2 + \varepsilon_2;\\
Y_1 &= 1 \ \text{if}\ Y_1^* > 0; \ \text{otherwise}\ Y_1 = 0;\\
Y_2 &= 1 \ \text{if}\ Y_1^* > 0 \ \text{and}\ Y_2^* > 0; \ \text{otherwise}\ Y_2 = 0;
\end{aligned}
$$

where $Y_1^*$ and $Y_2^*$ are bivariate normal latent variables, while $\varepsilon_1$ and $\varepsilon_2$ are bivariate standard normal. The X-vectors are vectors of exogenous explanatory variables uncorrelated with the $\varepsilon$'s.
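The model implies three observable outcomes for each sample member. As a sketch under the stated normality assumption, and with notation introduced here for illustration ($\rho$ for the correlation between $\varepsilon_1$ and $\varepsilon_2$, $\Phi$ for the standard normal distribution function and $\Phi_2(\cdot,\cdot;\rho)$ for the standard bivariate normal distribution function), the outcome probabilities can be written as

$$
\begin{aligned}
P(\text{not reached}) &= P(Y_1 = 0) = \Phi(-\beta_1 X_1),\\
P(\text{refusal}) &= P(Y_1 = 1,\, Y_2 = 0) = \Phi_2(\beta_1 X_1,\, -\beta_2 X_2;\, -\rho),\\
P(\text{responded}) &= P(Y_1 = 1,\, Y_2 = 1) = \Phi_2(\beta_1 X_1,\, \beta_2 X_2;\, \rho).
\end{aligned}
$$

The three probabilities sum to one, and the log-likelihood is the sum over sample members of the log of the probability of the observed outcome.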

The parameters of the censored bivariate probit model have to satisfy certain constraints to make the model identifiable. If the covariates in the contact and the participation equations are the same, then the model is not identified. Identification becomes possible if X1 and X2 are not identical, i.e. exclusion restrictions are needed. In this respect we were guided by previous results and common sense. For instance, the variables “if having a leisure home” and “if having stayed in a hospital” were assumed to determine the probability of contact rather than the probability of cooperation. In the final specification a few insignificant variables were deleted from either equation. The model was estimated by maximum likelihood.
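To make the likelihood of such a model concrete, the sketch below evaluates the three outcome probabilities (not reached, refusal, responded) and a log-likelihood for given linear indices. It is not the estimation code used for this paper: the function names, the numerical integration of the bivariate normal distribution function and the toy indices are all assumptions made for illustration.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bvn_cdf(a, b, rho, steps=2000):
    """P(Z1 <= a, Z2 <= b) for a standard bivariate normal with correlation rho, |rho| < 1.

    Integrates phi(z) * Phi((b - rho * z) / sqrt(1 - rho^2)) over z <= a with
    Simpson's rule; the lower limit is truncated far in the left tail.
    """
    lower = min(a, 0.0) - 8.0
    scale = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / steps  # steps must be even for Simpson's rule
    total = 0.0
    for k in range(steps + 1):
        z = lower + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * norm_pdf(z) * norm_cdf((b - rho * z) / scale)
    return total * h / 3.0

def outcome_probabilities(index_contact, index_participation, rho):
    """Probabilities of (not reached, refusal, responded) for one sample member."""
    not_reached = norm_cdf(-index_contact)
    refusal = bvn_cdf(index_contact, -index_participation, -rho)
    responded = bvn_cdf(index_contact, index_participation, rho)
    return not_reached, refusal, responded

def log_likelihood(sample, rho):
    """Sum of log outcome probabilities; sample holds (index_contact, index_participation, status)."""
    position = {"not reached": 0, "refusal": 1, "responded": 2}
    return sum(
        math.log(outcome_probabilities(xb1, xb2, rho)[position[status]])
        for xb1, xb2, status in sample
    )

# Hypothetical linear indices beta_1 X_1 and beta_2 X_2 (not estimates from the paper).
toy_sample = [
    (0.5, 0.8, "responded"),
    (0.5, 0.8, "refusal"),
    (-0.2, 0.3, "not reached"),
    (1.1, 0.1, "responded"),
]
probs = outcome_probabilities(0.5, 0.8, rho=0.233)
ll = log_likelihood(toy_sample, rho=0.233)
print("P(not reached), P(refusal), P(responded):", [round(p, 4) for p in probs])
print("log-likelihood of the toy sample:", round(ll, 4))
```

In an actual estimation the indices would be functions of the parameters and the covariates, and the log-likelihood would be maximized numerically over the coefficients and the correlation.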

Table 13 gives summary descriptive statistics. Because the descriptive statistics suggested that the relation with age was not exactly the same for the contacts as for the response once contacted, two different age classifications were used, one in the contact equation and one in the response equation.

Table 13
Descriptive statistics of independent variables.

| Variable | Mean | S.D. |
|---|---|---|
| Age1 (≤55) | 0.247 | 0.432 |
| Age2 (56-75) | 0.591 | 0.491 |
| Age3 (76-) | 0.162 | 0.368 |
| Age4 (≤60) | 0.466 | 0.499 |
| Age5 (61-70) | 0.263 | 0.440 |
| Age6 (71-) | 0.271 | 0.444 |
| If female | 0.530 | 0.499 |
| If compulsory school | 0.357 | 0.479 |
| If high school | 0.373 | 0.484 |
| If university | 0.189 | 0.380 |
| If schooling missing | 0.080 | 0.391 |
| Wage (monthly) | 9,490 | 12,870 |
| If no wage | 0.462 | 0.498 |
| Disposable income | 159,995 | 309,114 |
| If sickness benefit | 0.111 | 0.314 |
| If social security | 0.022 | 0.146 |
| If major city | 0.330 | 0.470 |
| If urban area | 0.534 | 0.498 |
| If rural area | 0.136 | 0.343 |
| If leisure home | 0.141 | 0.348 |
| Household size | 1.894 | 0.895 |
| If unemployed | 0.046 | 0.210 |
| If married | 0.588 | 0.492 |
| If immigrant | 0.042 | 0.202 |
| If hospital stay | 0.111 | 0.314 |

The maximum likelihood estimates are presented in Table 14. These results show that the probability of contact increases with age. Elderly people are more frequently at home to answer the telephone. There is no significant difference between males and females, while couples are easier to contact than singles. People with high school or university are somewhat more difficult to contact than people with only compulsory schooling, but these estimates are uncertain. The group with missing schooling data has a small probability of contact. When this group was deleted from the analysis, the effect of the immigrant dummy became stronger. This suggests that there is a positive correlation between having no schooling data and being an immigrant. The missing schooling variable thus picks up part of the immigrant effect.

Table 14
ML estimates of a bivariate probit model.

| Variable | Estimate | S.D. | p-value |
|---|---|---|---|
| Participation given contact | | | |
| Constant | 0.839 | 0.298 | 0.005 |
| Age2 (56-75) | 0.209 | 0.118 | 0.076 |
| Age3 (76-) | 6.689 | 0.161 | 0.000 |
| If female | 0.009 | 0.092 | 0.924 |
| If schooling missing | -13.991 | 0.248 | 0.000 |
| If high school* | -0.194 | 0.136 | 0.153 |
| If university† | 0.026 | 0.125 | 0.837 |
| Wage rate | -11.4e-06 | 5.72e-06 | 0.045 |
| If no wage | -0.535 | 0.125 | 0.000 |
| Disposable income | 7.88e-07 | 6.87e-07 | 0.251 |
| If welfare benefits | -0.815 | 0.374 | 0.000 |
| Household size | -0.087 | 0.067 | 0.193 |
| If married | 0.165 | 0.145 | 0.254 |
| If immigrant | -0.360 | 0.250 | 0.150 |
| Contact | | | |
| Constant | 0.428 | 0.206 | 0.038 |
| Age5 (61-70) | 0.494 | 0.123 | 0.000 |
| Age6 (71-) | 1.154 | 0.183 | 0.000 |
| If female | 0.078 | 0.087 | 0.366 |
| If schooling missing | -0.996 | 0.202 | 0.000 |
| If high school* | -0.168 | 0.131 | 0.201 |
| If university† | -0.044 | 0.124 | 0.727 |
| If no wage | -0.316 | 0.131 | 0.015 |
| Disposable income | 1.66e-07 | 3.32e-07 | 0.616 |
| If sickness benefits | 0.152 | 0.151 | 0.314 |
| If welfare benefits | -0.366 | 0.255 | 0.152 |
| If urban area | 0.208 | 0.093 | 0.025 |
| If rural area | 0.334 | 0.143 | 0.019 |
| If leisure home | -0.184 | 0.132 | 0.161 |
| Household size | 0.065 | 0.067 | 0.332 |
| If unemployed | 0.110 | 0.207 | 0.595 |
| If married | 0.438 | 0.119 | 0.000 |
| If immigrant | -0.403 | 0.189 | 0.033 |
| If hospital stay | -0.252 | 0.133 | 0.057 |
| Residual correlation | 0.233 | 0.423 | 0.596 |
| Log pseudolikelihood | -1080.595 | | |

* Includes individuals with a high school degree or individuals who studied at the university for less than two years.

† Includes individuals with more than two years at university.

The estimate of having sickness benefits is positive (sick people tend to be at home) but insignificant, while having stayed in hospital reduces the probability of contact (see footnote 11). We have also tried alternative specifications using the data on respondents having a psychiatric diagnosis. Replacing the hospital stay indicator with this variable in the contact equation also gave a negative and significant effect. If both variables were included, both point estimates became negative, but the P-values increased. The P-value of the hospital stay variable became 0.06, while that of the psychiatric diagnosis variable increased to 0.16. For this reason, we only kept the hospital stay variable in our preferred specification. The other parameter estimates were robust to these changes in the specification.

People who do not work for pay and immigrants are much more difficult to contact than the average person, and the probability of contact is smaller in the big cities than in other urban and rural areas. The point estimate suggests that those who are on welfare are more difficult to contact than average, but this effect is not well determined. There is no significant effect of being unemployed in addition to not working. The estimate for the wage rate variable was small and insignificant and thus dropped from the equation. Household disposable income had no significant effect either.
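The size of these contact effects is easier to judge on the probability scale. The following sketch plugs point estimates from the contact equation of Table 14 into the marginal contact probability, $\Phi(\beta_1 X_1)$. The baseline profile is hypothetical and chosen only for illustration, and sampling uncertainty in the estimates is ignored.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Point estimates copied from the contact equation of Table 14 (subset only).
CONTACT_COEF = {
    "constant": 0.428,
    "no_wage": -0.316,
    "disposable_income": 1.66e-07,
    "urban_area": 0.208,
    "rural_area": 0.334,
    "household_size": 0.065,
    "married": 0.438,
    "immigrant": -0.403,
}

def contact_probability(profile):
    """Phi(beta_1 X_1) for a profile given as {variable: value}; omitted variables are zero."""
    index = CONTACT_COEF["constant"]
    for name, value in profile.items():
        index += CONTACT_COEF[name] * value
    return norm_cdf(index)

# Hypothetical baseline (an assumption, not a profile from the paper): a single,
# Swedish person aged 60 or younger with a wage, living in a major city, with
# disposable income at the sample mean reported in Table 13.
baseline = {"disposable_income": 159_995, "household_size": 1}
profiles = {
    "baseline": baseline,
    "immigrant": {**baseline, "immigrant": 1},
    "no wage": {**baseline, "no_wage": 1},
    "other urban area": {**baseline, "urban_area": 1},
}
probabilities = {label: contact_probability(p) for label, p in profiles.items()}
for label, prob in probabilities.items():
    print(f"{label}: {prob:.3f}")
```

Under these assumptions the baseline contact probability is roughly 0.70, and being an immigrant lowers it by about 15 percentage points.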

The point estimate for those who have a secondary home has the expected negative sign, but the standard error is relatively high.

There are a few variables which we have tried but then dropped from the model. One of them is the number of children in the household, which in previous studies has been shown to explain response, in particular (many) children increase the probability of contact. In our case this variable became insignificant both in the contact and in the cooperation equations. This is perhaps not so strange, because in the age groups included in our study there are relatively few families with children and if they have children they are in the upper teens.

We have also experimented with a few interactions, namely gender x schooling, wage rate x schooling and disposable income x schooling, but they were all insignificant in both equations.

The probability of a successful interview increases with the age of the respondent. In particular those above 75 are willing to grant an interview. We do not find any gender effect in this case either. Household size and marital status are insignificant too.

Not having a job, being on welfare and being an immigrant all reduce the willingness to cooperate. The immigrant effect is, however, rather uncertain. Disposable income does not contribute to the explanation of cooperation in addition to these variables.

There is a significant time-cost effect, as the estimated effect of the wage rate variable is negative and significant. If we drop the income variable from the equation, the wage rate effect moves closer to zero and becomes insignificant. In this case the wage rate variable thus picks up some of the positive effect of the income variable. According to a conventional economic time allocation model both variables should be included.
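The significance statements in this section can be checked against the estimates and standard errors of Table 14. The sketch below assumes that the reported p-values come from two-sided tests based on a standard normal approximation, which the table does not state explicitly; the function name is illustrative.

```python
import math

def wald_test(estimate, std_error):
    """Return (z, two-sided p-value) under a standard normal approximation."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Estimates and standard errors copied from the participation equation of Table 14.
rows = {
    "Wage rate": (-11.4e-06, 5.72e-06),
    "If no wage": (-0.535, 0.125),
    "Disposable income": (7.88e-07, 6.87e-07),
}
results = {name: wald_test(b, se) for name, (b, se) in rows.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.2f}, p = {p:.3f}")
```

For these three rows the recomputed p-values agree with the table to about two decimals, so the wage rate effect is significant at the 5 percent level but only narrowly.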

The schooling variable does not contribute much to the explanation of cooperation with the exception that respondents with missing data on this variable have a much lower probability of cooperation than everyone else. As already suggested this result might mask effects that are unrelated to schooling.

In previous model runs the variables capturing the degree of urbanization became insignificant in the participation equation, and for this reason we dropped these variables.

Our model allows for a correlation between the contact and participation equation whereas Lepkowski and Couper (2002) assumed independence. They thus assumed that omitted variables have no joint impact on contact and participation. A more general assumption is to allow for unobservables influencing both the probability of contact and that of participation and thus creating a correlation between the contact and participation equations. Results are not conclusive. The correlation is moderately positive but insignificant.

5. Concluding remarks

From an economist’s perspective it might be reasonable to believe that time cost has a strong influence on the probability of contact and participation, and consequently that high wage earners and high-income people are difficult to convince to participate in surveys. Confirming previous results about nonresponse in cross-sectional surveys and attrition in panel studies, this study shows that this notion is largely false. It is true that we have found a significant time-cost effect on participation, but the major finding is that nonresponse primarily comes from the left tail of the income distribution. Respondents without work, on welfare, and immigrants are those who are difficult both to contact and to convince to participate. People at the peak of their career are also difficult to contact and to convince. Similar to many other studies we have also found that the probability of contact is relatively low in the big cities and among singles, but we could not find any significant decrease in cooperation.

This result would seem to have implications both for survey design and for post-survey compensation measures. The characteristics of the respondents that contribute to nonresponse suggest that this is a group which is rather uninterested in the research purpose of our survey and that measures should be taken to stimulate greater interest. The properties of those who do not respond also suggest that this is a group in an economic situation such that they should be sensitive to economic incentives, even if these are rather small.

In addition to the major group of nonrespondents, contact efforts should also focus on people who live in urban areas, who are single, have more than basic training, are at the peak of their work career and have a secondary home.

Recent nonresponse research has focused on the circumstances under which nonresponse damages inference to the target population and results in biased estimates of population entities (Singer, 2006). Groves (2006) demonstrated that high nonresponse does not necessarily result in biased estimates. In addition to the response ratio, the magnitude of the bias depends on the correlation between the propensity to respond and the attributes the survey researcher is measuring. If there is no or only a very weak correlation, there is no or only a small bias even if the survey is burdened by nonresponse. This implies that researchers who in their analysis focus on variables that explain the contact and cooperation probabilities or on measures that are highly correlated with these variables will suffer from a biased inference unless proper compensation measures are taken. Our literature review and our own results are suggestive as to the nature of these variables. It also follows that calibration methods which try to compensate for nonresponse should use variables and population information that explain response. According to our findings gender is not such a variable while, for instance, age, marital status, labor force participation, if immigrant, health status, family income and population density of the area are such variables (see footnote 12).
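The dependence of the bias on both the response ratio and the correlation can be summarized by a standard approximation from the response-propensity framework discussed by Groves (2006). The notation is introduced here for illustration: $p$ is the individual propensity to respond with population mean $\bar{p}$, $y$ is the survey variable, $\sigma_{yp}$ is their population covariance, $\rho_{yp}$ their correlation, and $\bar{y}_r$ is the unadjusted respondent mean.

$$
\operatorname{Bias}(\bar{y}_r) \approx \frac{\sigma_{yp}}{\bar{p}} = \frac{\rho_{yp}\,\sigma_y\,\sigma_p}{\bar{p}}
$$

When $\rho_{yp}$ is close to zero the bias is small whatever the response rate, while a low $\bar{p}$ magnifies any given covariance.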

Footnotes

1.

Depending on design one might try to recruit those who did not participate in the first wave to participate in a second wave, but in many surveys this is never attempted.

2.

For this age group Linda included 137,557 individuals and the population size was 3,026,499.

3.

The web address to HRS is https://hrsonline.isr.umich.edu/, and to SHARE http://www.share-project.org/home0.html

4.

The 50-84 age cohorts were used because both the HRS and the SHARE surveys cover the population 50+. The restriction to people below the age of 85 was enforced to avoid the response problems that arise when respondents are demented or have other types of old age-related illnesses.

5.

The variables used are not clean measures of socio-economic status. It is a mixture of the monthly rent for renters and a self-estimate of house value for house owners.

6.

See the review in Groves and Couper (1998).

7.

Svensk telemarknad.

8.

These individuals had either died or moved abroad between the day of selection and the day of the interview.

9.

The group unmarried includes people that are cohabiting but not legally married. A similar table but classified by “singles” and “couples”, where the group couples includes married and cohabiting with common children, gave virtually the same result. In the age group 50+ most couples are married.

10.

Sickness benefits are only paid to people who have not retired.

11.

Sickness benefits were insignificant in the cooperation equation and dropped.

12.

Using the same data as in this study Johansson (2007), Chapter 4, compares the calibration approach applied to an earnings function to a model-based approach.

References

  1. 1
  2. P Brose, A Klevmarken (1993) Modeling Response in a Panel Survey. Working Papers of the European Scientific Network on Household Panel Studies, Paper 81. Colchester: University of Essex. (Presented at the 8th session of the International Statistical Institute, Cairo.)
  3. P Campanelli, S Purdon, P Sturgis (1997) Can You Hear Me Knocking: An Investigation into the Impact of Interviewers on Survey Response Rates. London: Social and Community Planning Research.
  4. 4
  5. RM Groves (2006) Nonresponse Rates and Nonresponse Bias in Household Surveys. Public Opinion Quarterly 70:646–675. https://doi.org/10.1093/poq/nfl033
  6. RM Groves, MP Couper (1998) Nonresponse in Household Interview Surveys. Hoboken, NJ, USA: John Wiley & Sons, Inc. https://doi.org/10.1002/9781118490082
  7. F Johansson (2007) Essays on Measurement Error and Nonresponse. Ph.D. thesis, Economic Studies 103. Department of Economics, Uppsala University.
  8. TP Johnson, D O’Rourke J, L Owens (2002) Culture and Survey Nonresponse. Chapter 4 in Survey Nonresponse. New York: John Wiley & Sons.
  9. H Laurie, R Smith, L Scott (1999) Strategies for reducing nonresponse in a longitudinal panel survey. Journal of Official Statistics 15:269–282.
  10. JM Lepkowski, MP Couper (2002) Nonresponse in the Second Wave of Longitudinal Household Surveys. In Survey Nonresponse. New York: John Wiley & Sons, Inc.
  11. HL Lindström (1983) Non-Response Errors in Sample Surveys, Urval No 16. Örebro: Statistics Sweden.
  12. P Lynn, J Clarke, J Martin, P Sturgis (2002) The Effects of Extended Interviewer Efforts on Nonresponse Bias. In Survey Nonresponse. New York: John Wiley & Sons, Inc.
  13. C Nicoletti, F Peracchi (2005) Survey response and survey characteristics: Microlevel evidence from the European Community Household Panel. Journal of the Royal Statistical Society, Series A 168:763–781. https://doi.org/10.1111/j.1467-985X.2005.00369.x
  14. C-E Särndal, S Lundström (2005) Estimation in Surveys with Nonresponse. Chichester, UK: John Wiley & Sons, Ltd. https://doi.org/10.1002/0470011351
  15. E Singer (2006) Introduction. Nonresponse bias in household surveys. Public Opinion Quarterly 70:637–645. https://doi.org/10.1093/poq/nfl034
  16. 16

Article and author information

Author details

  1. Fredrik Johansson-Tormod

    National Institute of Economic Research, Stockholm, Sweden
    For correspondence
    fredrik.johansson-tormod@konj.se
    Competing interests
    No competing interests reported
  2. Anders Klevmarken

    Department of Economics, Uppsala, Sweden
    For correspondence
    anders@klevmarken.nu
    Competing interests
    No competing interests reported

Funding

This paper is part of an NIA (R03AG21780) and FAS (2001-2830) funded project (Comparison of Survey and Register Data: The Swedish Case) in collaboration with Arie Kapteyn and Susann Rohwedder (RAND).

Acknowledgements

The authors are grateful for constructive suggestions from two anonymous referees and suggestions on a previous version from Arie Kapteyn, Susann Rohwedder and Jelmer Yeb Ypma. Thanks also to seminar participants at the Department of Information Science, Division of Statistics, Uppsala University.

This essay was previously published in the Journal of Official Statistics, Vol. 24, No 3, pp. 431-449, 2008

The views expressed in the article belong solely to the authors, and not to the National Institute of Economic Research.

Publication history

  1. Version of Record published: April 30, 2022 (version 1)

Copyright

© 2022, Johansson and Klevmarken

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
