FAQs
1. Data Access
1.1 Who can use the data?
1.2 Where can I download the data?
SHARE data is distributed by Centerdata, which is located on the campus of Tilburg University in the Netherlands. The download procedure and the conditions for data access are described here.
1.3 Which data formats are available?
SHARE data is provided in Stata and SPSS format; easySHARE is additionally available for R. To use the data with other statistical software, users have to convert it themselves.
1.4 What can I do if I have lost my username and/or password for downloading the data?
If you have lost your username and/or password, please enter your email address here and you will receive a reminder email with your password. If you do not remember the email address you used for registration, please contact the SHARE Research Data Center.
2. Documentation
2.1 Which documentation files exist?
A set of documentation files is offered to facilitate the use of the SHARE data. The data resource profile published in the International Journal of Epidemiology provides a compact overview of SHARE. In addition to the wave- and country-specific questionnaires, Release Guide 9.0.0 is specifically directed to researchers working with the data.
Except for Wave 2, wave-specific methodology volumes are also available; methodological changes in Wave 2 are briefly summarized in chapter 8 of the First Results Book (FRB) of Wave 2. Furthermore, a tool that gives an overview of deviations between Waves 1, 2, 4, 5, 6, 7, 8 and 9 is provided. Table 1 contains the links to the essential documentation files of SHARE.
Table 1: Overview of documentation files for Waves 1 to 8 and the two SHARE Corona Surveys (SCS1, SCS2)
Document type | Wave 1 | Wave 2 | Wave 3 | Wave 4 | Wave 5 | Wave 6 | Wave 7 | Wave 8 | SCS1 | SCS2 |
Release Guides | ||||||||||
Questionnaires | X | X | X | X | X | X | X | X | X | X |
Cross-Wave Comparison | X | |||||||||
Methodology | X | X | X | X | X | X | X | X | ||
Scales Manual | Scales and Multi-Item Indicators (PDF) | |||||||||
Data Resource Profile | Börsch-Supan A. et al. (2013): Data Resource Profile: The Survey of Health, Ageing and Retirement in Europe (SHARE), Int J of Epidemiology | |||||||||
Data & Documentation Tool | Web interface for browsing and searching the SHARE (meta)data |
2.2 Which types of questionnaires are used in SHARE?
SHARE applies a concept of ex-ante harmonisation: there is one common generic questionnaire that is translated into 40 national languages (some countries use more than one language) using an internet-based translation tool, and processed automatically in a common CAPI instrument. The generic questionnaire and the country-specific questionnaire versions can be downloaded from the SHARE website (see the links in Table 1). However, some variables that differ greatly between countries require country-specific measurement and ex-post harmonisation, for example in the areas of education (ISCED) or occupation (ISCO, NACE).
Apart from generic and country-specific questionnaires, there are also special questionnaire types such as the coverscreen, drop-off, vignettes and end-of-life questionnaires. The coverscreen is the first module of each interview. It collects basic demographic information about every person currently living in the household and is completed by only one household member on behalf of all household members. The interview usually ends with the self-completion of a paper & pencil questionnaire, the so-called drop-off (see question 4.7). Another special self-completion questionnaire is the vignettes questionnaire (see question 4.8), which was collected in Waves 1 and 2. Vignettes are intended to improve cross-national comparability. If a respondent dies between waves, SHARE tries to conduct an end-of-life interview (see question 4.10) with a proxy respondent. The end-of-life questionnaire mainly contains information on the respondent's life circumstances in the year before death and on the circumstances of death.
3. Methodology
3.1 How are the data collected?
SHARE data collection is based on computer-assisted personal interviewing (CAPI). The interviewers conduct face-to-face interviews using a laptop computer on which the CAPI instrument is installed. Personal interviews are necessary for SHARE because they make it possible to carry out physical tests and collect biomarkers. Exceptions are the drop-off and vignettes questionnaires, which are completed on paper, and the end-of-life interviews, which can also be conducted via CATI (computer-assisted telephone interviewing). For more details on SHARE data collection, see Börsch-Supan, A. and H. Jürges (2005).
An additional exception is the SHARE Corona Survey (see question 4.11 for further details). Personal interviewing was not possible after the outbreak of the SARS-CoV-2 pandemic. After careful consideration of the feasibility of different alternatives for SHARE's target population, it was decided that SHARE would resume interviewing with computer-assisted telephone interviews (CATI).
3.2 Who is eligible?
The SHARE target population consists of all persons aged 50 years and over at the time of sampling who have their regular domicile in the respective SHARE country. A person is excluded if she or he is incarcerated, hospitalized or out of the country during the entire survey period, is unable to speak the country's language(s), or has moved to an unknown address. In Wave 1, all household members born in 1954 or earlier were eligible for an interview. From the second wave onwards, only one respondent per household is selected in new countries and refreshment samples; this respondent must have been born in 1956 or earlier in Wave 2, 1960 or earlier in Wave 4, 1962 or earlier in Wave 5, 1964 or earlier in Wave 6, 1966 or earlier in Wave 7, 1969 or earlier in Wave 8, and 1971 or earlier in Wave 9. In addition, in all waves, current partners living in the same household are interviewed regardless of their age.
All SHARE respondents who were interviewed in any previous wave are part of the longitudinal sample. If they have a new partner living in the household, the new partner is also eligible for an interview, regardless of age. Age-eligible respondents who participated are traced and re-interviewed if they move within the country, and end-of-life interviews are conducted if they die. Younger partners, new partners and partners who never participated in SHARE are not traced and are not eligible for an end-of-life interview.
3.3 Why are there different types of respondents?
To save time and reduce the respondents' interview burden, the CAPI main questionnaire is designed so that not every eligible household member is asked every questionnaire module. Household respondents answer questions on housing, household income and consumption on behalf of all household members. Financial respondents answer questions on financial transfers and assets, and family respondents answer questions on children and social support, both on behalf of the couple. The respondent types are indicated by the variables hou_resp (household respondent), fin_resp (financial respondent) and fam_resp (family respondent) in the cv_r module as well as in the technical variables module. The SHARELIFE questionnaire does not differentiate between respondent types.
3.4 What are proxy interviews?
If physical and/or cognitive limitations make it too difficult for a respondent to complete the interview her-/himself, the sample respondent can be assisted by a so-called proxy respondent (“partly proxy” interview). If the proxy respondent answers the entire questionnaire in lieu of the respondent, the interview is referred to as a “fully proxy” interview. Examples of conditions under which proxy interviewing is allowed are hearing loss, speech problems, Alzheimer's disease and difficulty concentrating for the whole duration of the interview. Proxy respondents are also interviewed for end-of-life interviews after a respondent has died. Some questionnaire modules are defined as non-proxy sections because they cannot be answered by other persons: cognitive functioning, mental health (partly), grip strength, walking speed, activities, and expectations. At the end of each of the other modules, a variable records who answered the section: (1) respondent only, (2) respondent and proxy, or (3) proxy only.
3.5 How are response rates calculated in SHARE?
Depending on the available sampling frame, some countries might need a screening procedure to determine the eligibility status of the respondents while others need no initial screening. Based on this differentiation, there are several ways in which final response rates can be calculated, depending on how cases of unknown eligibility are handled (see AAPOR guidelines for further information):
$$\mathrm{RR1} = \frac{I}{(I+P) + (R+NC+O) + (UH+UO)}$$
$$\mathrm{RR3} = \frac{I}{(I+P) + (R+NC+O) + e\,(UH+UO)}$$
$$\mathrm{RR5} = \frac{I}{(I+P) + (R+NC+O)}$$
I: | number of completed interviews |
P: | number of partial interviews |
R: | number of refusals and break-offs |
NC: | number of non-contacts |
O: | number of other non-interviews |
UH: | number of cases with unknown eligibility (unknown if housing unit exists) |
UO: | number of cases with unknown eligibility (unknown, other) |
e: | fraction of eligible units among the cases with known eligibility |
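The three formulas differ only in how the cases of unknown eligibility enter the denominator. The following minimal sketch computes all three from the counts defined above. The counts in the example are invented for illustration and are not SHARE results.

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    """Final case counts, using the disposition categories defined above."""
    I: int   # completed interviews
    P: int   # partial interviews
    R: int   # refusals and break-offs
    NC: int  # non-contacts
    O: int   # other non-interviews
    UH: int  # unknown eligibility: unknown if housing unit exists
    UO: int  # unknown eligibility: other


def response_rates(c: Outcomes, e: float) -> dict:
    """Return RR1, RR3 and RR5.

    e is the fraction of eligible units among cases with known eligibility,
    applied as an estimate to the cases with unknown eligibility.
    """
    known = (c.I + c.P) + (c.R + c.NC + c.O)
    unknown = c.UH + c.UO
    return {
        "RR1": c.I / (known + unknown),      # all unknown cases counted as eligible
        "RR3": c.I / (known + e * unknown),  # only the share e of unknown cases counted
        "RR5": c.I / known,                  # unknown cases counted as ineligible
    }


# Invented counts, for illustration only
rates = response_rates(Outcomes(I=600, P=50, R=200, NC=100, O=50, UH=80, UO=20), e=0.5)
print({k: round(v, 3) for k, v in rates.items()})
# {'RR1': 0.545, 'RR3': 0.571, 'RR5': 0.6}
```

Because 0 ≤ e ≤ 1, the rates always satisfy RR1 ≤ RR3 ≤ RR5. This is why the choice of rate matters most when the number of unknown-eligibility cases is large.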
Further information on the calculation as well as the final response rates in SHARE on household and individual level by wave, country, and certain subgroups can be found in this Technical Paper.
3.6 How are retention rates calculated in SHARE?
After several waves, various types of retention rates can be calculated conditional on previous participation; these rates may differ between countries because of differences in sample composition. In SHARE we differentiate between the following concepts:
- Individual-level retention excluding recovery
- Individual-level retention including recovery of former respondents
- Individual-level retention including recovery of former respondents and new/missing partners
More detailed information about the participation of respondents in their first (baseline/refreshment) interview and the longitudinal development of the survey including response and retention rates can be found in this Technical Paper.
3.7 How are issues of attrition dealt with?
Sample attrition means that respondents drop out of the survey over time, for many possible reasons. For a longitudinal sample that was drawn randomly at the beginning of data collection, attrition would not pose a problem if it occurred randomly, but in practice it does not. Besides refreshing the sample in several countries (which also depends on funding), SHARE deals with attrition by dedicating special effort to re-interviewing respondents who participated in previous waves and by providing calibrated weights. Under certain conditions, these weights may help to reduce the potential selectivity bias generated by sample attrition and unit nonresponse.
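The release guide documents SHARE's own calibration procedure. The sketch below does not reproduce it. It uses post-stratification, the simplest form of calibration, to show the basic idea: design weights are rescaled so that weighted respondent counts match known population totals within cells. The cells and totals are hypothetical. The correction only removes bias if, within each cell, dropping out is unrelated to the outcome. This is one of the "certain conditions" referred to above.

```python
def poststratify(design_weights, cells, population_totals):
    """Rescale design weights so that weighted counts match known population totals per cell."""
    # Sum the design weights of the responding units in each cell
    weighted = {}
    for w, cell in zip(design_weights, cells):
        weighted[cell] = weighted.get(cell, 0.0) + w
    # Multiply each weight by (population total / weighted respondent total) of its cell
    return [w * population_totals[cell] / weighted[cell] for w, cell in zip(design_weights, cells)]


# Hypothetical example: after attrition, the 65+ group is under-represented among respondents
design = [100.0, 100.0, 100.0, 100.0]
cells = ["50-64", "50-64", "50-64", "65+"]
totals = {"50-64": 600.0, "65+": 400.0}
calibrated = poststratify(design, cells, totals)
print(calibrated)  # [200.0, 200.0, 200.0, 400.0]
```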
3.8 Sample cleaning rules in SHARE
SHARE makes every effort to interview and recover (panel) respondents, and is quite successful in doing so (see recovery rates in Bergmann et al., 2019, section 5). However, after several waves of non-participation, the chance of recovering respondents from households that have not participated for a very long time is negligible. Furthermore, a high percentage of unpromising cases is demotivating for interviewers and can affect the overall wave response/retention rates and fieldwork quality. Since Wave 7, we therefore apply sample cleaning rules similar to those of most other large panel studies.
Sample cleaning rules in SHARE:
- Households in which none of the eligible members has participated for three or more consecutive waves are dropped from the longitudinal sample. Non-participation can be due to non-contact, refusal, an unknown address or any other justified reason.
- Eligible individual sample members who have died, and for whom no proxy could be found to conduct an end-of-life interview for two or more waves, are dropped from the longitudinal sample.
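Both rules count the most recent run of unsuccessful waves and drop a case once that run reaches a threshold: three waves for households, two for deceased members. The minimal sketch below applies them to hypothetical per-wave histories. The data layout and all names are illustrative assumptions, not SHARE's internal implementation.

```python
def trailing_misses(history):
    """Count the consecutive unsuccessful waves at the end of a per-wave history."""
    count = 0
    for success in reversed(history):
        if success:
            break
        count += 1
    return count


def drop_case(history, max_misses):
    """Return True once the most recent run of unsuccessful waves reaches max_misses."""
    return trailing_misses(history) >= max_misses


# Hypothetical histories: True = at least one eligible household member was interviewed
households = {
    "H1": [True, True, False, False, False],
    "H2": [True, False, False, True, False],
    "H3": [True, False, False],
}
# Rule 1: three or more consecutive waves without any participating eligible member
dropped_households = sorted(h for h, hist in households.items() if drop_case(hist, 3))

# Hypothetical histories of deceased members: True = end-of-life interview obtained
deceased = {"P1": [False, False], "P2": [False]}
# Rule 2: two or more waves without an end-of-life interview
dropped_deceased = sorted(p for p, hist in deceased.items() if drop_case(hist, 2))

print(dropped_households, dropped_deceased)  # ['H1'] ['P1']
```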
3.9 How is mortality documented in SHARE?
SHARE classifies the vital status of respondents as either “alive”, “dead” or “unknown”. Note that due to the lack of a national mortality register in most European countries, we cannot reliably ascertain the vital status of non-respondents. More information can be found in this Technical Paper.
3.10 Is there a data set that links administrative data and the SHARE data?
Survey data can cover a wide range of topics. However, a survey cannot cover every topic of interest, and information provided by respondents may be incomplete or inaccurate. Administrative data is more accurate but usually limited to specific topics. Linking survey data with administrative data combines the best of both worlds. With respondents' written consent, administrative data of the German Pension Fund can be linked to the survey data of the German subsample of SHARE (SHARE-RV). Similar linkage projects have been set up in other SHARE countries as well: Austria, Belgium, Estonia, Denmark, Finland, the province of Girona in Spain, Luxembourg and the Netherlands. For more information on this project see question 6.3.
3.11 How is SHARE ethically approved?
The SHARE study is subject to continuous ethics review. During Waves 1 to 4, SHARE was reviewed and approved by the Ethics Committee of the University of Mannheim. Wave 4 and the continuation of the project were reviewed and approved by the Ethics Council of the Max Planck Society. In addition, the country implementations of SHARE were reviewed and approved by the respective ethics committees or institutional review boards whenever this was required. The numerous reviews covered all aspects of the SHARE study, including sub-projects, and confirmed that the project complies with the relevant legal norms and that its procedures agree with international ethical standards. Please see the overview and summary of the ethics approvals for more information.
4. Structure and Content
4.1 What is the difference between the regular SHARE interview and the SHARELIFE interview?
SHARE is a panel study with a core of topics asked in every wave. The regular panel interviews capture respondents' lives from age 50 onwards. To complement this information, a special interview covering the respondents' retrospective life histories, the so-called SHARELIFE interview, was conducted in Wave 3 and Wave 7. Wave 3 contained only retrospective SHARELIFE items, whereas Wave 7 combined both interview types. Every respondent who had not participated in SHARELIFE in Wave 3 completed a life history interview (81.8% of respondents). Respondents who had already participated in SHARELIFE in Wave 3 completed a regular panel interview in Wave 7 (18.2% of respondents). Variable mn103_ in the technical variables module indicates the type of interview (life history interview yes/no). As a result, most items from the regular panel interview have a very high number of missing values in Wave 7, because only 18.2% of the respondents answered those questions.
Further documents on the questionnaire structure of Wave 7 are listed on our Wave Overview.
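Analyses of regular panel items in Wave 7 should therefore be restricted to respondents who received the regular interview. The minimal sketch below splits records on mn103_. It assumes that the value 1 marks a life history interview and that negative values are missing-value codes. Both assumptions should be checked against the codebook, and the records are hypothetical.

```python
def split_wave7(rows):
    """Split Wave 7 records into life-history and regular panel interviews via mn103_."""
    life_history, regular = [], []
    for r in rows:
        code = r.get("mn103_")
        if code is None or code < 0:  # assumption: negative codes are missing values
            continue
        # Assumption: 1 = life history interview, any other valid code = regular interview
        (life_history if code == 1 else regular).append(r)
    return life_history, regular


# Hypothetical records and codes
rows = [
    {"mergeid": "A-1", "mn103_": 1},
    {"mergeid": "A-2", "mn103_": 5},
    {"mergeid": "A-3", "mn103_": 1},
]
life_history, regular = split_wave7(rows)
print(len(life_history), len(regular))  # 2 1
```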
4.2 What information does the regular SHARE questionnaire contain?
The SHARE interview consists of various thematic blocks or modules. Prior to the main interview, one household member completes the coverscreen (cv_r module) on behalf of the household. The main questionnaire consists of the CAPI modules listed in Table 2. Not every module was part of every wave: some were added to pick up contemporary issues, and others were dropped because of questionnaire revisions and time constraints.
Table 2: Regular questionnaire modules of Waves 1, 2, 4, 5, 6 and 7
Code | Questionnaire Module | Wave 1 | Wave 2 | Wave 4 | Wave 5 | Wave 6 | Wave 7 |
CV_R | Coverscreen on Individual Level | X | X | X | X | X | X |
DN | Demographics and Networks | X | X | X | X | X | X |
SN | Social Networks | X | X | ||||
CH | Children | X | X | X | X | X | X |
PH | Physical Health | X | X | X | X | X | X |
BR | Behavioral Risks | X | X | X | X | X | X |
CF | Cognitive Function | X | X | X | X | X | X |
MH | Mental Health | X | X | X | X | X | X |
HC | Health Care | X | X | X | X | X | X |
EP | Employment and Pensions | X | X | X | X | X | X |
IT | Computer Use | X | X | ||||
MC | Mini Childhood | X | |||||
GS | Grip Strength | X | X | X | X | X | X |
WS | Walking Speed | X | X | ||||
CS | Chair Stand | X | X | ||||
BS | Blood Spot | X | |||||
PF | Peak Flow | X | X | X | |||
SP | Social Support | X | X | X | X | X | X |
FT | Financial Transfers | X | X | X | X | X | X |
HO | Housing | X | X | X | X | X | X |
HH | Household Income | X | X | X | X | X | X |
CO | Consumption | X | X | X | X | X | X |
AS | Assets | X | X | X | X | X | X |
AC | Activities | X | X | X | X | X | X |
EX | Expectations | X | X | X | X | X | X |
IV | Interview Observations | X | X | X | X | X | X |
Special Questionnaire Modules | |||||||
XT | End-of-Life Interview | X | X | X | X | ||
DO | Drop-off | X | X | X | X | X | X |
VI | Vignettes | X | X | ||||
TC | Technical Variables | X | X | X | X | X | X |
4.3 What is the content of SHARELIFE?
The SHARELIFE questionnaire has a different focus than the regular waves. It covers all important areas of the respondents' life histories: childhood conditions, partners and children, housing, financial and employment history, and detailed questions on health and health care. Table 3 lists the questionnaire modules of SHARELIFE. Additionally, some single questions on household income (HH) and current physical health (PH) are included.
Table 3: Retrospective questionnaire modules of waves 3 and 7
W3 module | W7 module | Content of Questionnaire Modules |
CV_R | CV_R | Coverscreen on individual level |
ST | DN | Demographics |
AC | RA | Retrospective Accommodation |
CS | CC | Childhood Section / Childhood Circumstances |
DQ | DQ | Disability |
FS | FS | Financial Section |
GL | GL | General Life and Persecution |
HC | RH | Retrospective Children History |
RE | RE | Retrospective Employment |
RP | RP | Retrospective Partner History |
WQ | WQ | Work Quality |
4.4 What is easySHARE?
easySHARE is a simplified dataset, adapted to the conventions of the U.S. Health and Retirement Study (HRS), for student training and for researchers who have little experience in quantitative analyses of complex survey data. easySHARE stores information on all respondents and all currently released data collection waves in one single dataset. For the subset of variables covered in easySHARE, the complexity was considerably reduced. easySHARE is stored as a panel dataset in long format, i.e. with one row per respondent and wave. In addition to the data and the release guide, the download zip files include the Stata programme that was used to extract easySHARE from the regular SHARE data. This allows users to retrace how each variable was extracted and modified, and makes it easier to add or change information. It can also serve as an example of how to create an analysis dataset yourself. For more information, please click here.
4.5 Does SHARE contain information on race/ethnicity?
No, SHARE does not contain information about race/ethnicity. However, it contains the respondents' country of birth (dn004_ + dn005c) and citizenship (dn007_ + dn008c), both available in the demographics module. From Wave 5 onwards, SHARE also includes the country of birth of the respondents' parents (dn504c + dn505c), which makes it possible to identify second-generation migrants. Citizenship and country of birth are coded according to ISO 3166-1 (numeric-3).
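A common working definition of a second-generation migrant is a person born in the country of interview with at least one parent born abroad. The minimal sketch below applies that definition. It assumes that dn004_ == 1 means "born in the country of interview", that dn504c and dn505c hold the parents' ISO 3166-1 numeric codes, and that negative values are missing-value codes. The country of interview is passed in separately because its variable name is not given here. All of these assumptions should be verified in the codebook.

```python
def is_second_generation(row, interview_country):
    """Flag respondents born in the interview country with at least one parent born abroad."""
    born_here = row.get("dn004_") == 1  # assumption: 1 = born in the country of interview
    # Parents' countries of birth (ISO 3166-1 numeric); None or negative treated as missing
    parents = [row.get("dn504c"), row.get("dn505c")]
    known = [code for code in parents if code is not None and code > 0]
    return born_here and any(code != interview_country for code in known)


GERMANY, TURKEY = 276, 792  # ISO 3166-1 numeric codes
examples = [
    {"dn004_": 1, "dn504c": TURKEY, "dn505c": GERMANY},   # born here, one parent born abroad
    {"dn004_": 1, "dn504c": GERMANY, "dn505c": GERMANY},  # born here, both parents born here
    {"dn004_": 5, "dn504c": TURKEY, "dn505c": TURKEY},    # born abroad (first generation)
]
print([is_second_generation(r, GERMANY) for r in examples])  # [True, False, False]
```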
4.6 What kind of information is provided by the interviewer observation module (IV)?
The interviewer completes this module right after finishing the interview. It contains information on the interviewing experience, which is important for understanding the circumstances under which the interview was conducted.
4.7 What is a "drop-off" questionnaire?
The interview ends with the self-completion of a paper & pencil questionnaire. This questionnaire includes additional questions on, for example, mental and physical health, health care and social networks. In Waves 1, 2 and 4, drop-off questionnaires were conducted in all countries. Part of the drop-off content is country-specific; the Wave 4 drop-off questionnaire in particular contains many country-specific questions in addition to a generic part on health and health care. From Wave 5 onwards, drop-off questionnaires were conducted only in some countries, and their questions are entirely country-specific.
In the drop-off data, the names of generic variables start with “q”, while country-specific variables carry the country code as a prefix, e.g. “at_” for Austria. The drop-offs differ across waves because new questions were added and others were discontinued. In addition, some questions from the Wave 1 drop-off were asked in the CAPI instrument in Wave 2 (see the appendix of SHARE Release Guide 9.0.0).
4.8 What are “vignettes”?
For the vignettes, additional samples were drawn in eight countries in Wave 1 (BE, DE, ES, FR, GR, IT, NL, SE) and in eleven countries in Wave 2 (BE, CZ, DK, DE, ES, FR, GR, IT, NL, PL, SE). These samples received a special self-completion questionnaire with anchoring vignette questions, which are intended to improve cross-national comparability. Two types of questionnaire were randomly assigned to the respondents; they differ in question order and in the gender of the people described in the statements. The variable “type” indicates the vignette type. The variable labels show which questions correspond to each other across the two types.
4.9 What kind of physical measurements are included in SHARE?
Physical measurements and biomarkers are part of SHARE because of their promising scientific value. Answers to standard health questions often depend on respondents' own evaluation or perception. Objective measurements can help (1) to validate respondents' self-reports, (2) to understand the complex relationships between social status and health and their physiological pathways, and (3) to identify pre-disease pathways. SHARE combines self-reports on health with four physical performance measurements: grip strength (GS), walking speed (WS), peak flow (PF) and chair stand (CS). Additionally, dried blood spot (DBS) samples were collected in SHARE Wave 6 in 12 countries: Belgium, Denmark, Estonia, France (subsample only), Germany, Greece, Israel, Italy, Slovenia, Spain, Sweden, and Switzerland. The DBS data is not yet available. Please subscribe to the SHARE users' newsletter and/or check our homepage to be informed as soon as the data becomes available.
4.10 How is mortality assessed?
SHARE requires interviewers to have the death of a respondent confirmed by a proxy respondent. If a respondent has died, interviewers try to conduct an end-of-life interview with a proxy respondent, who can be a family member, a household member, a neighbour or any other person from the deceased respondent's close social network. The end-of-life interview mainly contains information on life circumstances in the year before death as well as on the circumstances of death, such as time and cause of death. The variables are stored in the xt module from Wave 2 onwards. Apart from the end-of-life interview, the gv_allwaves_cv_r module contains the variables deadoralive, deceased_year, deceased_month and deceased_age.
4.11 How did SHARE react to the outbreak of COVID-19?
The outbreak of COVID-19 hit SHARE in the middle of its 8th Wave of data collection. The fieldwork had to be suspended in all participating countries in March 2020. At this point in time, about 70 percent of all expected interviews in the panel sample across countries had been conducted.
To resume fieldwork, SHARE switched to computer-assisted telephone interviewing (CATI). A specific questionnaire was developed for this purpose. It covers the same topics as the regular SHARE questionnaire, but is shortened and targeted to the living situation of people aged 50 and older during the COVID-19 pandemic. CATI was chosen over face-to-face interviewing for methodological reasons and to protect the health of respondents and interviewers. For a more detailed description of the mode switch, see: Scherpenzeel, A., Axt, K., Bergmann, M., Douhou, S., Oepen, A., Sand, G., Schuller, K., Stuck, S., Wagner, M., & Börsch-Supan, A. (2020). Collecting survey data among the 50+ population during the COVID-19 outbreak: The Survey of Health, Ageing and Retirement in Europe (SHARE). Survey Research Methods, 14(2), 217-221.
The first SHARE Corona Survey was fielded in summer 2020. It allows researchers to investigate the life situation of older Europeans in the initial phase of the pandemic in a cross-national perspective. One year later, in summer 2021, interviews for the second round of the SHARE Corona Survey were conducted as part of the SHARE-COVID19 project funded by the European Commission (grant number: 101015924).
5. Handling of the Data
5.1 How can I merge the data?
To merge different modules and/or waves of the SHARE data at the individual level, use mergeid, the person identifier, which does not change across waves. To merge the data at the household level, use one of the hhid`w' variables (where `w' stands for the respective wave) as the key identifier.
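In Stata this is a merge on mergeid. The minimal Python sketch below shows the same logic on records stored as dictionaries: a left join on one key. It keeps every person from the first module and raises an error if the key is not unique in the second module. The identifiers and variable names are hypothetical.

```python
def merge_on_key(left, right, key="mergeid"):
    """Left join two modules (lists of dicts) on a key that is unique in `right` (1:1 or many-to-one)."""
    index = {}
    for record in right:
        if record[key] in index:
            raise ValueError(f"duplicate {key} in right-hand module: {record[key]}")
        index[record[key]] = record
    # Persons missing from the right-hand module keep only their left-hand variables;
    # if both modules contain a variable with the same name, the right-hand value wins.
    return [{**rec, **index.get(rec[key], {})} for rec in left]


# Hypothetical identifiers and variables
cv_r = [{"mergeid": "XX-000001-01", "gender": 1}, {"mergeid": "XX-000001-02", "gender": 2}]
ph = [{"mergeid": "XX-000001-01", "bmi": 24.5}]
merged = merge_on_key(cv_r, ph)
print(merged)
# [{'mergeid': 'XX-000001-01', 'gender': 1, 'bmi': 24.5}, {'mergeid': 'XX-000001-02', 'gender': 2}]
```

Calling the same function with key set to a wave-specific household identifier (for example a hypothetical "hhid7") attaches household-level data to every person in the household. This is a many-to-one merge.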
5.2 Why does the number of cases in the coverscreen not match that of the other modules?
The coverscreen (cv_r) includes all members of the household, including ineligible and non-responding household members, as well as end-of-life interviews conducted with proxy respondents for respondents who died between waves. The number of cases in the other regular CAPI modules is lower because they include only persons who were interviewed. Household members without an interview can be identified by the variable interview in the cv_r module, which takes the value 0 for those who were not interviewed.
5.3 How can I identify partners?
In SHARE, partners can be identified by the variable mergeidp`w' (where `w' stands for the respective wave), which contains the mergeid of a respondent's partner. Each couple has a couple identifier stored in the variable coupleid`w'. The coupleid is generated from the mergeids of both partners and is therefore unique to each couple and fixed across waves as long as the couple stays the same.
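One common use of mergeidp`w' is to attach a partner's characteristics to each respondent's record. The minimal sketch below does this within one wave. It uses a hypothetical Wave 7 column name mergeidp7 and represents "no partner" as None. The released files may use a different code for this, so check the codebook.

```python
def attach_partner_value(rows, partner_key, var):
    """Add the partner's value of `var` to each record, looking the partner up by mergeid."""
    by_id = {r["mergeid"]: r for r in rows}
    result = []
    for r in rows:
        partner = by_id.get(r.get(partner_key))  # None if no partner or partner not in the data
        result.append({**r, var + "_partner": partner.get(var) if partner else None})
    return result


# Hypothetical Wave 7 records; None stands for "no partner"
rows = [
    {"mergeid": "A-1", "mergeidp7": "A-2", "age": 70},
    {"mergeid": "A-2", "mergeidp7": "A-1", "age": 66},
    {"mergeid": "B-1", "mergeidp7": None, "age": 58},
]
for r in attach_partner_value(rows, "mergeidp7", "age"):
    print(r["mergeid"], r["age_partner"])
# A-1 66
# A-2 70
# B-1 None
```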
5.4 Why do some variables like education or height contain so many missing values?
The reason is that time-constant variables are only asked in the baseline interview, which is the first SHARE interview of each respondent. Because SHARE's sample is refreshed from time to time in several countries, the baseline interview is not necessarily the Wave 1 interview.
Height is one example of such a time-constant variable. To use these variables in waves later than the baseline interview, users have to merge the waves first and then carry the baseline information forward to the later waves. Furthermore, some questions for the longitudinal sample are only asked if there has been a change since the last interview, e.g. marital status. This also leads to a large number of missing values in the respective variables.
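After merging the waves into long format (one row per person and wave), carrying information forward means filling each missing value with the most recent earlier observation of the same person. The minimal sketch below does this. It represents missing values as None for simplicity, whereas the released files use negative missing-value codes. The records are hypothetical. The same logic applies to variables such as marital status that are only asked again after a change.

```python
def carry_forward(rows, var):
    """Fill missing values of `var` with the last observed value of the same person.

    rows: long-format records with "mergeid" and "wave"; missing values are None here
    (a simplification: the released data use negative missing-value codes).
    """
    last_seen = {}
    filled = []
    for r in sorted(rows, key=lambda rec: (rec["mergeid"], rec["wave"])):
        value = r.get(var)
        if value is None:
            value = last_seen.get(r["mergeid"])  # stays None if there is no earlier observation
        else:
            last_seen[r["mergeid"]] = value
        filled.append({**r, var: value})
    return filled


rows = [
    {"mergeid": "A-1", "wave": 4, "height": None},
    {"mergeid": "A-1", "wave": 1, "height": 172},
    {"mergeid": "A-1", "wave": 2, "height": None},
    {"mergeid": "B-1", "wave": 4, "height": None},
]
print([(r["mergeid"], r["wave"], r["height"]) for r in carry_forward(rows, "height")])
# [('A-1', 1, 172), ('A-1', 2, 172), ('A-1', 4, 172), ('B-1', 4, None)]
```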
5.5 What are “unfolding brackets”?
When a respondent does not know (DK) or refuses to give (RF) the answer to a question about amounts of money, an unfolding sequence of bracket questions usually starts. The aim of unfolding brackets is to obtain at least a range within which, for example, the respondent's income lies.
There are three entry points, and the starting point is chosen randomly. The public release includes the country-specific bracket values (in euros) and the respondent's final category. When a DK or RF is given during the unfolding bracket sequence, the value of the final category is set to DK or RF accordingly. The names of unfolding bracket variables contain “ub” after the module identifier and question number (see question 5.6 for the general naming format). For more information on unfolding brackets see SHARE Release Guide 9.0.0.
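The sketch below shows one generic way an unfolding sequence can work. It assumes three bracket values, each of which can serve as the random entry point. Each "more than X?" answer moves the question to the next higher or next lower bracket value until the category is pinned down. The bracket values are hypothetical, and SHARE's exact routing is documented in the release guide.

```python
import random


def unfold(thresholds, answer, start):
    """Run a generic unfolding bracket sequence; return the final category or "DK"/"RF".

    thresholds: ascending bracket values; category k means the amount exceeds exactly
    k of them (0 = below the lowest value, len(thresholds) = above the highest).
    answer(t): True if the amount is more than t, False if not, or "DK"/"RF".
    start: index of the bracket value used as entry point.
    """
    lo, hi = 0, len(thresholds)  # the final category lies in [lo, hi]
    i = start
    while lo < hi:
        reply = answer(thresholds[i])
        if reply in ("DK", "RF"):
            return reply  # the final category is set to DK or RF
        if reply:
            lo = i + 1  # more than thresholds[i]: ask about the next higher value
            i = lo
        else:
            hi = i      # not more than thresholds[i]: ask about the next lower value
            i = hi - 1
    return lo


# Hypothetical bracket values in euros; the real values are country-specific
brackets = [1000, 5000, 20000]
entry = random.randrange(len(brackets))  # random entry point
amount = 3000
print(unfold(brackets, lambda t: amount > t, entry))  # 1 (between 1000 and 5000), for any entry point
```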
5.6 What is the general naming format of variables?
The naming of variables is harmonised across waves. Variable names in the CAPI instrument data use the format mmXXXyyy_LL. “mm” is the module identifier, e.g. DN for the demographics module; “XXX” is the question number, e.g. 001; and “yyy” are optional characters marking dummy variables (“d”), euro conversion (“e”) or unfolding brackets (“ub”). The separator “_” is followed by optional digits “LL” that indicate a category or loop (“outer loop”).
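Because the format is regular, variable names can be split into their parts programmatically, for example to select all variables of one module. The regular expression below is a sketch of the mmXXXyyy_LL pattern as described above. It accepts any lowercase suffix, optionally followed by digits, and may need adjusting for special cases in the released data. Names that do not follow the pattern, such as identifiers, return None.

```python
import re

# mm = module, XXX = question number, yyy = optional suffix (e.g. d, e, ub), _LL = optional digits
VAR_NAME = re.compile(
    r"^(?P<module>[a-z]{2})(?P<question>\d{3})(?P<suffix>(?:[a-z]+\d*)?)(?:_(?P<loop>\d*))?$"
)


def parse_varname(name):
    """Split a CAPI variable name into its parts; return None if it does not follow mmXXXyyy_LL."""
    m = VAR_NAME.match(name.lower())
    if m is None:
        return None
    parts = m.groupdict()
    parts["loop"] = parts["loop"] or ""  # the "_LL" part is optional
    return parts


for name in ["dn026_1", "dn004_", "mn103_", "dn005c", "mergeid"]:
    print(name, parse_varname(name))
```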
5.7 What is the ado-file sharetom good for?
The ado-file sharetom is a Stata programme that recodes missing values and labels them appropriately. If you want to use sharetom.ado, we recommend running it immediately after opening the data file or after merging the modules you need. Note that sharetom is updated from time to time; the current version is sharetom5.
5.8 Is longitudinal analysis about respondents' children possible?
For longitudinal analyses of children, users cannot rely on the order of the children in the CH module. Instead, children have to be matched on gender and year of birth, which leads to correct matches in most cases. There are several reasons for this. First, respondents are supposed to report on their children in a defined order, but they do not necessarily do so. Second, partners may change, and respondents are always supposed to report on both partners' children. Third, reporting errors can never be ruled out.
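The matching rule can be sketched as follows. Each child's (gender, year of birth) key is used to pair children across two waves regardless of list position. Keys that are not unique within a wave, such as same-sex twins, are left unmatched for manual checking rather than guessed. The field names gender and yob and all data are hypothetical stand-ins for the corresponding CH module variables.

```python
from collections import Counter


def match_children(earlier, later):
    """Pair children reported in two waves by (gender, year of birth) instead of list position.

    Returns (pairs, unmatched): pairs are (index_in_earlier, index_in_later); children whose
    key is missing or not unique in either wave (e.g. same-sex twins) stay unmatched.
    """
    def key(child):
        return (child["gender"], child["yob"])

    count_earlier = Counter(key(c) for c in earlier)
    count_later = Counter(key(c) for c in later)
    position = {key(c): i for i, c in enumerate(earlier)}
    pairs, unmatched = [], []
    for j, child in enumerate(later):
        k = key(child)
        if count_earlier[k] == 1 and count_later[k] == 1:
            pairs.append((position[k], j))
        else:
            unmatched.append(j)
    return pairs, unmatched


# Hypothetical data: three children reported in a different order, plus a pair of twins
wave_a = [{"gender": 1, "yob": 1970}, {"gender": 2, "yob": 1973}, {"gender": 2, "yob": 1980},
          {"gender": 1, "yob": 1985}, {"gender": 1, "yob": 1985}]
wave_b = [{"gender": 2, "yob": 1980}, {"gender": 1, "yob": 1970}, {"gender": 2, "yob": 1973},
          {"gender": 1, "yob": 1985}, {"gender": 1, "yob": 1985}]
print(match_children(wave_a, wave_b))  # ([(2, 0), (0, 1), (1, 2)], [3, 4])
```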
5.9 Can the children of the CH module be linked to information on social support (SP) and financial transfers (FT)?
In Wave 4, children named by the respondents in the CH module cannot be linked directly to the SP and FT modules. The reason is a change in the so-called list of relations, which comprises all persons in the respondent's social environment. Information on persons who receive social support or financial transfers from the respondents, or who provide them to the respondents, is based on this list. In Waves 1 and 2, the list included up to 9 children. In Wave 4, by contrast, it includes up to 7 social network members and just one 'other child' option. Only those children whom the respondents named as members of their social network are explicitly listed on the interviewer's screen. It is therefore not possible to specify children who were not named as social network members (for whatever reason) in questions on social support or financial transfers.
5.10 Why do the variables on whether natural parent is still alive (dn026_1 and dn026_2) contain so many missing values in Wave 4?
The questions DN026_1 and DN026_2 record whether the respondent's natural parents are still alive (dn026_1 for the mother and dn026_2 for the father). The routing for these variables uses information from previous waves for respondents who already participated, as well as information from the social network module. As with other variables in SHARE, the number of missing values can be reduced by merging the Wave 4 data with previous waves. Assuming that persons belonging to the respondent's social network are still alive, the proportion of missing values can be reduced further by using the sn005* variables. Unfortunately, the routing for DN026_1 and DN026_2 did not work adequately for all respondents in the Wave 4 questionnaire: not every respondent who should have been asked was actually asked, so a high number of missing values remains.
To compensate for this shortcoming, all participants with missing information on DN026_1 and DN026_2 were asked these questions in Wave 5.
6. Generated Variables
6.1 What is the purpose of generated variables and which generated variables are provided?
To ensure easy and fast access to cross-national data and convenient handling of the data, certain variables are provided ready-made for SHARE users, especially those that allow valid comparisons between countries, such as the International Standard Classification of Education (ISCED). Besides internationally standardized variables, there are further generated variables that ease or enhance working with the SHARE data. Table 4 gives an overview of all generated variable modules.
Table 4: Generated variable modules
Generated variable module | Content | W1 | W2 | W3 | W4 | W5 | W6 | W7 |
gv_allwaves_cv_r | Coverscreen information across waves | Cross-wave module | ||||||
gv_longitudinal_weights | Longitudinal weights | Cross-wave module | ||||||
gv_weights | Cross-sectional sampling design and calibrated weights | X | X | X | X | X | X | |
gv_imputations | Multiple Imputations | X | X | X | X | X | X | |
gv_isced | International Standard Classification of Education (ISCED-97/since wave 5 additionally ISCED-11) | X | X | X | X | X | X | |
gv_health | Physical and mental health variables and indices like BMI, EURO-D depression scale, etc. | X | X | X | X | X | X | |
gv_housing | Housing and NUTS codes | X | X | X | X | X | X | |
gv_networks | Information on social networks | X | X | |||||
gv_exrates | Exchange rates for all waves, incl. nominal and ppp-adjusted exchange rates | Cross-wave module | ||||||
gv_job_episodes_panel | Labour market status of each SHARELIFE respondent throughout her/his life | Cross-wave module | ||||||
gv_grossnet | Net income measures derived from reported gross incomes | X | ||||||
gv_isco | Classification of occupations via ISCO and of industries via NACE codes | X | ||||||
gv_ssw | Social security wealth | X | ||||||
gv_deprivation | Indices for material and social deprivation | X | ||||||
gv_children | Combined children information | X | X | |||||
gv_linkage | Linkage to Statutory German Pension Insurance data | Cross-wave module | ||||||
gv_dbs | Dried Blood Spots | X | ||||||
gv_big5 | Big Five personality traits | X |
6.2 What is the allwaves-coverscreen good for?
This module is a dataset with merged and enriched information from all waves. gv_allwaves_cv_r provides a straightforward way to monitor household composition, changes of status (Is a respondent part of a couple or not? Is he or she dead or alive? etc.) and the type of interviews conducted.
6.3 Can SHARE data be linked to administrative data?
Projects to link SHARE data to administrative data have been set up in several SHARE countries.
Germany: With respondents’ written consent, administrative data of the German Pension Fund can be linked to the survey data of the German subsample of SHARE. Beginning in Wave 3, all respondents of the German subsample are asked for consent to link their survey data with administrative data of the German Pension Fund. This longitudinal dataset includes very detailed information on respondents’ employment histories. The module gv_linkage provides initial information about which respondents consented to linking their data with the pension fund records. To obtain access to the administrative data, researchers have to submit an additional form directly to the data center of the German Pension Fund. Further information on access conditions, as well as the user guide and codebook for SHARE-RV, is available here.
Denmark: REGLINK-SHAREDK is the second successful linkage project in SHARE and relates to the Danish subsample of SHARE. Further information is available here.
6.4 What is the content of gv_exrates?
This module contains the currencies (including pre-Euro currencies) and the exchange rates for non-Euro countries. It stores both nominal exchange rates and exchange rates adjusted for purchasing power parity (ppp-adjusted).
6.5 What is the Job Episodes Panel (JEP)?
The JEP is a generated dataset that rearranges information taken from Waves 1 to 3 of SHARE in order to create a ready-to-use “long panel”. It contains the labour market status of each SHARELIFE respondent throughout her/his life. A detailed description of the methodology and assumptions underlying the construction of the dataset is available in SHARE Working Paper 11-2013: “Working life histories from SHARELIFE: a retrospective panel”, by Agar Brugiavini, Danilo Cavapozzi, Giacomo Pasini, and Elisabetta Trevisan. When publishing results based on the SHARE Job Episodes Panel, please include the additional disclaimer described in the corresponding documentation file (PDF), which is available when downloading the data.
6.6 Which generated health variables are provided?
The gv_health module contains a broad range of generated health variables and health-related indices regarding the respondents’ physical and mental health status. Most of the variables are comparable to those of the US Health and Retirement Study (HRS). Physical health variables include, for example, the US version of self-perceived health (sphus), the body mass index (bmi), the number of chronic diseases (chronic), an index on mobility (mobility) and limitations with instrumental activities of daily living (iadl). Mental health variables include, for example, the EURO-D depression scale (eurod), a measure of orientation to date (orienti) and a numeracy score for mathematical performance (numeracy).
6.7 How is education measured in SHARE?
Education is one of the variables that differ most across countries, so a standard coding is required for international comparisons. The gv_isced module contains the 1997 International Standard Classification of Education (ISCED-97). It is provided not only for the respondents’ own educational level but also for the education of the respondents’ children and former spouses, as well as of the interviewers (the latter only in Wave 1). In Waves 1 and 2, the education of up to four selected children was asked. Wave 4 contains the ISCED-97 values for all children. In 2011, a revision of ISCED was adopted by UNESCO Member States. It takes into account significant changes in education systems worldwide since the previous ISCED revision in 1997. From Wave 5 onwards, both ISCED versions are provided in the SHARE data. Furthermore, the educational level of the respondents’ parents is included in Waves 5 and 6.
6.8 What information does the gv_isco module contain?
Respondents are asked about their own, their former partner’s and their parents’ occupations. For Wave 1, this information is coded according to the International Standard Classification of Occupations (ISCO-88) of the International Labour Organization (ILO). To classify the corresponding industries, the gv_isco module additionally contains a slightly modified version of the Statistical Classification of Economic Activities in the European Community (NACE, version 4 rev. 1 1993).
6.9 How are social networks captured in SHARE?
The CAPI module on social networks (SN) was implemented in the fourth wave of SHARE as an innovative means of measuring the personal social environment. The module was again part of Wave 6. It is based on an approach that goes beyond the more common role-relational method of measuring social networks, which mostly relies on socio-demographic proxies. The SN module contains a detailed description of the respondents’ personal social networks. Each respondent can name a maximum of seven persons whom she/he considers to be her/his confidants. The module records the role relationship of each social network member and obtains information on each named person's gender, residential proximity to the respondent, frequency of contact and level of emotional closeness. Information from the SN module can be linked to the social support (SP) and financial transfers (FT) modules.
The generated variables module “gv_networks” stores variables that summarize information on the different attributes of the network. In Wave 6, the variables additionally summarize panel information and provide full information on each social network member.
6.10 How is deprivation measured?
This module is available in Wave 5 and contains three variables on material and social deprivation: depmat, depsoc and depsev. depmat is an aggregate measure of the material conditions of older individuals in Europe, based on a set of 11 items that refer to two broad domains: the inability to afford basic needs and financial difficulties. depsoc is an index of social deprivation based on 15 items. depsev is a single two-dimensional indicator that identifies those with high levels of deprivation in both dimensions. The threshold is the 75th percentile of the total distribution of each deprivation index. Individuals whose deprivation measures place them above the threshold in both dimensions are classified as “severely deprived”.
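The classification rule for depsev can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build gv_deprivation: it assumes unweighted 75th percentiles computed with Python's statistics.quantiles and a strict "above the threshold" comparison, neither of which the description above specifies.

```python
from statistics import quantiles


def severe_deprivation(depmat, depsoc):
    """Return 1 for respondents above the 75th percentile of both indices, else 0.

    Sketch only: percentiles are unweighted and the comparison is strict,
    both assumptions rather than the documented gv_deprivation method.
    """
    # quantiles(..., n=4) returns the three quartile cut points;
    # the last one is the 75th percentile.
    mat_threshold = quantiles(depmat, n=4)[-1]
    soc_threshold = quantiles(depsoc, n=4)[-1]
    return [
        int(mat > mat_threshold and soc > soc_threshold)
        for mat, soc in zip(depmat, depsoc)
    ]


# Toy scores for eight hypothetical respondents.
print(severe_deprivation(list(range(8)), list(range(8))))  # [0, 0, 0, 0, 0, 0, 1, 1]
```

Only respondents who exceed the cut-off on both indices are flagged; scoring high on just one dimension is not enough.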
6.11 Does SHARE contain a measure for social security wealth?
Since Release 5.0.0, SHARE Wave 4 includes a new generated module containing two measures of individual accrued social security wealth (SSW): SSW_nw and SSW_gw. The former is based on the net wages earned by individuals during their working career. The latter is based on their approximately grossed-up wages and additionally takes into account minimum pension benefits whenever the individual is entitled to them. Note that since no information from the JEP was required to compute the SSW for retirees, SSW_nw and SSW_gw are equal for this group.
6.12 When do I need gv_grossnet?
In SHARE Wave 1, income variables were collected before taxes and social insurance contributions. In the following waves, income variables were collected after taxes and social contributions, to capture the notion of take-home pay. To make the income measures comparable across waves and to facilitate longitudinal analyses, the module gv_grossnet contains net income measures for SHARE Wave 1 derived from the reported gross incomes. The instrument chosen for this task is EUROMOD, the EU tax-benefit micro-simulation model.
A detailed description of the dataset and the method used is available in the SHARE Working Paper 25-2016.
6.13 What information does gv_children contain?
Information on the respondents’ children is collected in various parts of the SHARE questionnaire. The variables in the gv_children module were generated in an attempt to make this information more easily accessible to SHARE users. The module combines information from the Wave 6 CAPI modules CH, SN, SP and FT. Please be aware that the gv_children variables are an aggregate of information from within wave 6 but not of information from previous waves.
6.14 Which generated variables are stored in gv_dbs?
In addition to the CAPI variables included in the BS module, some generated variables are already provided in gv_dbs. The most important one is dbs_values_exp (“Expected availability of laboratory results”). Results will only be available if (a) there is proof of the respondent’s written consent, (b) the DBS sample can be linked to the CAPI interview via its barcode number, and (c) the DBS filter card contains enough blood material for at least one analysis. If all these conditions are met, dbs_values_exp = 1. Further variables in gv_dbs are spots_nr (“Number of blood spots collected”), which ranges from 0 to 5, and spots_co (“Number of blood spots filling pre-printed circle”). The latter indicates how many of the blood spots contain enough blood to cover the pre-printed circle (1 cm in diameter) on the blood collection card.
6.15 What does gv_big5 contain?
The 10-item Big Five Inventory (BFI-10), an established personality inventory that measures each of the “Big Five” personality dimensions with two items, was introduced for the first time in Wave 7. Proposed by Rammstedt and John (2007), the BFI-10 is an ultra-short measure of personality that is especially suitable for multi-topic surveys in which assessment time and questionnaire space are limited. For further information on the “Big Five” measurement, please see the corresponding chapter in the Wave 7 First Results Book.
7. Weights
7.1 There are many SHARE papers in which researchers do not use weights in their analyses. Is there a general strategy on this topic?
It is not easy to give a general strategy for this question. We refer SHARE users to the paper by Solon, Haider and Wooldridge (2013). The authors distinguish between two types of empirical research: (i) research directed at estimating population descriptive statistics, and (ii) research directed at estimating causal effects, where weighting may serve, for example, to achieve more precise estimates by correcting for heteroskedasticity, to achieve consistent estimates by correcting for endogenous sampling, or to identify average partial effects in the presence of unmodeled heterogeneity of effects. For the former, weighting is called for to make the analysis sample representative of the target population. Using weighted sample statistics is intuitive and not controversial: population statistics can be consistently estimated by weighted sample statistics. For the latter, the question of whether and how to weight is more nuanced. Researchers have to be clear about their reason for using weighted estimation, think carefully about whether that reason really applies, and double-check with appropriate diagnostics. In situations where researchers might be inclined to weight, it is often useful to report both weighted and unweighted estimates and to discuss what the contrast implies for the interpretation of the results. It is also advisable to use robust standard error estimates.
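Following the advice to report both weighted and unweighted estimates, a minimal Python sketch for a simple descriptive statistic (a mean) might look like this. The function names and toy data are hypothetical; in practice the weights would be one of the calibrated SHARE weights (for example cciw_w4), observations with missing weights would be handled as discussed in 7.5, and regression-type analyses would be run in a statistical package with robust standard errors.

```python
def weighted_mean(values, weights):
    """Weighted sample mean: sum(w_i * y_i) / sum(w_i)."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def compare_weighted_unweighted(values, weights):
    """Report both estimates and their contrast side by side."""
    unweighted = sum(values) / len(values)
    weighted = weighted_mean(values, weights)
    return {"unweighted": unweighted, "weighted": weighted,
            "difference": weighted - unweighted}


# Toy data: three respondents, the third with twice the weight of the others.
print(compare_weighted_unweighted([1.0, 2.0, 3.0], [1.0, 1.0, 2.0]))
# {'unweighted': 2.0, 'weighted': 2.25, 'difference': 0.25}
```

A sizeable difference between the two estimates is itself informative: it signals that the sample composition differs from the target population on characteristics related to the outcome.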
7.2 Which weights should be used for cross-sectional analyses and which for longitudinal analyses?
SHARE provides calibrated cross-sectional and longitudinal weights. For cross-sectional analyses, the calibrated weight to be used depends on the basic sample unit of analysis. For example, in Wave 4, this is the variable cciw_w4 if the basic sample unit is the individual and cchw_w4 if the basic sample unit is the household.
For longitudinal analyses, the calibrated weight to be used depends on both the wave combination of interest (i.e. the waves used to form the panel) and the basic sample unit of analysis. For example, for the fully balanced panel (wave combination 1-2-3-4-5-6-7-8-9), this is variable cliw_a if the basic sample unit is the individual, and clhw_a if the basic sample unit is the household.
For longitudinal analyses based on different wave combinations, users are required to compute their own calibrated weights. To support users in this nontrivial methodological task, we provide a Stata ado-file called sreweight.ado, which implements the calibration procedure of Deville and Särndal (1992), and Stata do-files which illustrate step by step how to compute calibrated longitudinal weights at the individual and the household level. Further information is available here.
7.3 Section 8.7 of the Wave 4 Innovations & Methodology volume mentions that a weighting do-file and a Stata command sreweight.ado are available for SHARE users. Where can I find them?
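Before calibrating weights for a custom wave combination, the balanced sample itself has to be identified: the respondents observed in every wave of the combination. The Python sketch below shows only this selection step; the calibration is what sreweight.ado does and is not reproduced here. Representing each wave as a collection of respondent identifiers (SHARE's mergeid would be a natural choice) is an assumption made for illustration.

```python
def balanced_panel(ids_by_wave, waves):
    """Return the IDs of respondents present in every wave of `waves`.

    ids_by_wave: dict mapping wave number -> iterable of respondent IDs
    (a hypothetical layout chosen for this sketch).
    """
    if not waves:
        return set()
    selected = [set(ids_by_wave[w]) for w in waves]
    return set.intersection(*selected)


ids_by_wave = {1: ["A", "B", "C"], 2: ["A", "C"], 4: ["A", "C", "D"]}
print(sorted(balanced_panel(ids_by_wave, [1, 2, 4])))  # ['A', 'C']
```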
The ado file sreweight.ado as well as the other files provided for generating longitudinal calibrated weights can be downloaded from the regular SHARE download website (“Generate Calibrated Weights Using Stata”).
7.4 What is the difference between the weights across waves?
Sampling design weights may differ across waves because of changes in the national sampling designs. Calibrated cross-sectional and longitudinal weights, by contrast, are computed with the procedure of Deville and Särndal (1992) in all waves. The other main differences with respect to previous waves are that: (i) we no longer distinguish between alternative variants of the SHARE sample (i.e. main sample alone, vignette sample alone and the two samples combined); (ii) we no longer provide calibrated cross-sectional weights for non-responding partners because of a substantive change in the imputation procedure used in Wave 4; (iii) we no longer provide calibrated longitudinal weights for all possible wave combinations of the panel.
7.5 Can we drop sample observations with missing weights?
Missing values in sampling design and calibrated weights may be due to (i) age-ineligibility (i.e. respondents younger than 50 years), (ii) missing sampling frame information, (iii) missing information on the set of calibration variables (age, gender, NUTS1 regional code), or (iv) respondents not belonging to the selected balanced sample (only for calibrated longitudinal weights). Observations with missing weights due to (i) are not problematic if the goal is inference on the 50+ population. Since there are very few observations with missing weights due to (ii) and (iii), these observations can in general be dropped for substantive analyses of the SHARE data. Observations with missing longitudinal weights due to (iv) can be more problematic if the process generating the missing observations is not missing at random (given the chosen set of conditioning variables). Note that, to compensate for attrition, users may use a larger set of conditioning variables by exploiting the information available from the starting wave. Alternative methods, such as weights based on the propensity score and sample selection models, could also be used to impose weaker assumptions on the missing data mechanism associated with attrition.
8. Imputations
8.1 What is the imputation method?
Items can be imputed either sequentially by a simple hot-deck method or jointly by the fully conditional specification (FCS) method. Hot-deck imputations are carried out separately by country, while FCS imputations are carried out by country and sample type (singles and 3rd respondents, couples with both partners interviewed, and all couples, with and without non-responding partners). For each wave and country, the FCS method is used only for monetary variables that have at least 100 donor observations in sample 1 (singles and 3rd respondents) and 150 donor observations in sample 2 (couples with both partners interviewed) and sample 3 (all couples, with and without non-responding partners). Regardless of the imputation method, SHARE provides five multiple imputations of the missing values of each variable.
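The donor-count rule for FCS can be written as a small decision function. This Python sketch encodes only the thresholds stated above; treating hot-deck as the fallback for every variable that does not qualify for FCS is an inference from the two available methods, and the function and argument names are hypothetical.

```python
def imputation_method(is_monetary, donors_sample1, donors_sample2, donors_sample3):
    """Pick the imputation method for one variable in one wave and country.

    Thresholds follow the FAQ: FCS needs at least 100 donors in sample 1
    and 150 donors in samples 2 and 3. Hot-deck as fallback is assumed.
    """
    enough_donors = (
        donors_sample1 >= 100
        and donors_sample2 >= 150
        and donors_sample3 >= 150
    )
    return "FCS" if is_monetary and enough_donors else "hot-deck"


print(imputation_method(True, 120, 180, 200))   # FCS
print(imputation_method(True, 99, 180, 200))    # hot-deck
print(imputation_method(False, 500, 500, 500))  # hot-deck
```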
8.2 Can the imputed variables be used for longitudinal analysis?
Yes, but several problems can arise. First, users have to check whether the variables of interest have been imputed in all waves of interest and whether the underlying information is fully comparable across waves. Second, users should be aware that the imputation model does not include lagged variables from the previous wave as predictors of the missing values in the current wave. This implies that the imputation model could be less general than the model used to analyze the imputed longitudinal data (see Meng 1994 for a discussion of this uncongeniality issue). To address this issue, we plan to use a more general imputation model in future releases of the SHARE data.
8.3 Why does the imputations module contain so many cases?
SHARE provides multiple imputations of the missing values so that users can account for the additional variability induced by the imputation process when assessing the precision of their estimators (see Rubin 1987). Multiple imputation means that there are m>1 imputed values for each missing value. In SHARE, the number of multiple imputations is m=5, so the data contain 5 independent imputations indexed by the variable implicat. Note that the copies differ only in the imputed values; complete cases are identical across the five implicats. Users who want to rely on single imputation methods (despite our recommendation to take into account the variability induced by the imputation process) can select just one of the five available implicats. Since they are five independent draws from the estimated distribution of the missing values, there is no specific reason to prefer one particular implicat over the others.
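To account for imputation variability, estimates from the five implicats are usually pooled with Rubin's (1987) combining rules: the point estimate is the average across implicats, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The Python sketch below assumes the analysis has already been run once per implicat; statistical packages (for example Stata's mi estimate) implement the same rules together with the appropriate degrees of freedom, which this sketch omits.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-implicat results with Rubin's rules.

    estimates: point estimates, one per implicat (m >= 2)
    variances: squared standard errors, one per implicat
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances for m >= 2 implicats")
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical results from the five SHARE implicats.
estimate, total_var = pool_rubin([1.0, 2.0, 3.0, 4.0, 5.0], [0.5] * 5)
print(estimate, total_var)
```

Selecting a single implicat amounts to dropping the between-imputation term, which is why single-imputation standard errors tend to be too small.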
8.4 Why are the imputed variables in Release 5.0.0 onwards different from the ones of previous Releases?
As discussed in the documentation, there are differences in the basic raw data as well as important innovations in the imputation procedure. Regarding the latter, there are major differences with respect to the imputation procedure adopted for the Waves 1 and 2 data in previous releases: (i) the way of dealing with the problem of non-responding partners, (ii) the use of a smaller set of aggregated variables, (iii) a lower number of predictors, and (iv) the use of two alternative measures of total household income. These changes aim at a more reliable imputation model, but it is difficult to assess the implications of all these differences for substantive analyses of the SHARE data.
8.5 Why are there two variables for household income?
Since Wave 2, SHARE collects data on two different definitions of total household income: thinc is the sum of individual imputed income for all household components, while thinc2 is the measure of total household income collected through the question HH017.
In our view, the choice between these two alternative measures is not obvious, and we therefore let users decide which of the two is more suitable for their research questions. Moreover, our imputation model exploits both measures. We therefore strongly encourage users to carry out sensitivity analyses using both measures. This may help us understand which of the two measures can be considered more reliable on scientific grounds.
8.6 Are monetary amounts in the imputation data set Euro converted?
Yes, all monetary variables are expressed as annual amounts in Euro. Consequently, to adjust for purchasing power parity (PPP) in non-Euro countries, monetary amounts first have to be converted back into local currency using the variable exrate and then into ppp-adjusted amounts by dividing the local currency amount by the PPP exchange rate.
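The two-step conversion can be sketched as follows. The sketch assumes that exrate is expressed as local currency units per Euro (so that multiplying a Euro amount by exrate yields local currency) and that the PPP exchange rate is in local currency units per PPP unit; please check both conventions against the gv_exrates documentation. The parameter name ppp_rate is hypothetical.

```python
def euro_to_ppp(amount_eur, exrate, ppp_rate):
    """Convert an annual Euro amount into a ppp-adjusted amount.

    Assumes exrate = local currency units per 1 Euro and
    ppp_rate = local currency units per PPP unit (both assumptions).
    """
    # Step 1: convert the Euro amount back into local currency.
    amount_local = amount_eur * exrate
    # Step 2: divide the local currency amount by the PPP exchange rate.
    return amount_local / ppp_rate


# Hypothetical rates for a non-Euro country.
print(euro_to_ppp(1000.0, 10.0, 8.0))  # 1250.0
```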
8.7 What do the abbreviations in the variable htype mean?
1 S | = single household |
1 CNRP | = household with a couple and one non-responding partner |
1 C2R | = household with a couple, both are respondents |
1 CNRP + 1S | = household with a couple including one non-responding partner + one single person |
Multi S | = household that consists of several singles |
1 C2R + 1S | = household with a couple, both are respondents + one single person |
Multi C2R | = household that consists of several responding couples |
Multi CNRP | = household with several couples with one non-responding partner |