IFLS Data Updates, Data Notes, Tips, and FAQs
Important: Read First
IFLS users must read the User’s Guides (both volumes) and review the questionnaires before working with the data due to the complexity of the IFLS. Most questions users will have will be answered by reviewing those documents. Because not all information is repeated from wave to wave in the user’s guides, users should read the guides and documentation from earlier waves, especially if one is looking to exploit the longitudinal nature of the IFLS. Generally, even if one is only focusing on one wave (e.g., IFLS5), one still needs to pull data from earlier IFLS waves since full event histories are not repeated wave to wave.
Users should also look through the IFLS Data Updates, Data Notes, Tips and FAQs section before working with the data as questions that arise may already be answered in that section.
It is important for IFLS users to remember that IFLS files can have different units of observation (e.g., some are per household, some per individual, some per event) and some will have multiple records per unit of observation (e.g., multiple records for the mother in the Book 4 birth history). The User’s Guide appendices contain tables that show the unit of observation and the requisite record identification variables to help users understand what may be needed to merge files with differing units of observation together. In some cases, users may need to collapse multiple records per respondent to one record per respondent, for example, before merging on other data that is one record per respondent.
Users should look at the IFLS2 User’s Guide, which discusses family relationships and how to identify spouses, children, and parents across modules. That information holds for all IFLS waves. Chapters 4, 5 and 6 of the IFLS1 User’s Guide may also be of interest since they discuss identifying and linking data among related individuals.
Also, users must remember that the IFLS utilizes many skip patterns and it is therefore crucial that users examine the IFLS questionnaires for the variables with which they are working to understand how a question is asked and who answers it. If one sees a variable with lots of blank values, it means that the variable is involved in a skip pattern and a large number of respondents did not meet the criteria to be asked that given question. It does not mean there is missing data. If a respondent said “I don’t know” or refused to answer or should have answered but did not, there is a missing value code to indicate those situations.
IFLS Data Updates and Data Notes
For a listing of post-public release updates and data notes to the IFLS databases, see:
General Tips and answers to FAQs
- Tracking HHs and Individuals
- Using PIDLINK: Linking individuals across the waves
- Special codes
- Meaning of variables that end with X
- Identifying geographic location of HH
- Linking HH and community data
- Can I visit IFLS respondents, facilities or communities?
- Are there price indices for IFLS1 and IFLS2?
- How do I sort out medical from educational facilities in the IFLS4 SAR data file?
- What is the answer key for the IFLS3-IFLS5 Book EK cognitive assessment files?
- What are the occupation code definitions used in the IFLS?
- What are the province, kabupaten, and kecamatan codes used in IFLS2?
- Deflators for IFLS waves
- Community Geocodes and BPS Geographic Codes
- What is the difference between COMMID and MKID?
- Why do the numbers of COMFAS Book 1 communities vary across waves?
- What is COMMID=1227 in IFLS4 HTRACK? What is COMMID=9999 in IFLS4 COMFAS? These do not appear in other IFLS waves.
- Which COMMID represent the twin EAs?
- Why does a variable have so many missing values?
- How do I use the unfolding brackets data in IFLS4/IFLS5?
- Are there facility weights for IFLS3 and IFLS4 like there are for IFLS5?
- Why does job status in IFLS4 (2007) differ from recall of 2007 job status collected in IFLS5? (Applicable to other IFLS wave pairings)
- Change in question about recent crops in IFLS4 vs IFLS5 Book 2 Section UT?
- How do I get a PIDLINK for a child in the Book 4 section CH birth history?
- What are Adult Education A, B and C which appear in the educational level codes?
- Are there composite scores for the Book EK cognitive test items?
- Are there composite cognitive scores for Book 3B sections CO and COB?
- How is the “best guess” age computed in PTRACK?
- How were waist and hip measured?
- What are the units for hemoglobin?
- What device and units were used for measuring lung capacity?
- Discrepancies in birth date reports across modules
- What do I do about cases where the person’s gender changes over time? What if age changes are inconsistent over time?
- Rupiah amounts collected in the household survey and COMFAS
- How do I use IFLS weights?
- How do I determine birth order?
- How is rural/urban defined in SC05? Does the migration history have a rural/urban indicator?
- Is there a scale for depression based on Book 3B Section KP?
- Were Book US respondents advised to see a doctor if they were hypertensive?
- Does IFLS have a scoring algorithm for the provider vignettes?
- What were the data collection dates for each IFLS wave?
TRACKING HHs and INDIVIDUALS
HTRACK AND PTRACK
The TRACKING FILES, HTRACK and PTRACK, are provided to facilitate using the longitudinal dimensions of the survey. All variables included in these files are drawn from interviews conducted in IFLS1 (1993) and IFLS2 (1997/8).
HTRACK: HOUSEHOLD TRACKING FILE
HTRACK contains a list of all HOUSEHOLDS that were interviewed in IFLS1; they are identified by the 1993 HOUSEHOLD identifier, HHID93. It is a 7 digit string variable. The first 3 digits are the enumeration area in which the household resided in 1993 and the next two digits are a household sequence number within that enumeration area which uniquely identifies the household. The last two digits are always '00'. (The first 5 digits of HHID93 are the same as the last 5 digits of CASE, the HH identifier variable in the original IFLS1 release.)
In IFLS2, households are identified by HHID97. If an IFLS household was found intact in 1997, it was assigned the same identifier in 1993 and 1997; in this case HHID93=HHID97. If the household had split up between 1993 and 1997, then when the first respondent from that household was re-contacted in IFLS2, that respondent's household was designated the 'original' household and the 1993 household identifier assigned to it. Each additional household that was spawned by that HHID93 was given a new HHID97. The first 5 digits of the new HHID97 are identical to the first 5 digits of HHID93 (and, therefore, all new households in 1997 that are spawned by one 1993 household share the same first 5 digits in their HHIDs). The last two digits of HHID97 are 1 (in column 6) and then a sequence number starting at 1 (in the 7th column); these digits tell us this is a split-off household. Thus the last two digits of HHID97 are '00' for the first household found in 1997 and then '11' for the first split off, '12' for the second split off and so on.
For example, say HHID93 is 2071900. This is household 19 from enumeration area 207. The household split into 3 households between 1993 and 1997. The three households are assigned HHID97 2071900 (for the first HH relocated), 2071911 (for the first split-off HH that was relocated) and 2071912 (for the second split-off HH that was relocated).
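The identifier layout described above can be unpacked mechanically. The following is a minimal Python sketch, assuming the identifier is held as a 7-character string of digits; the function name `parse_hhid` and the returned field names are illustrative and are not part of the IFLS files.

```python
def parse_hhid(hhid):
    """Split a 7-digit HHID93/HHID97 value into the parts described in the text."""
    hhid = str(hhid)
    if len(hhid) != 7 or not hhid.isdigit():
        raise ValueError("expected a 7-digit household identifier")
    suffix = hhid[5:]
    return {
        "ea": hhid[:3],            # 1993 enumeration area
        "hh_seq": hhid[3:5],       # household sequence number within the EA
        "origin_stem": hhid[:5],   # shared by all households spawned by one 1993 HH
        "is_split_off": suffix != "00",
        # split-off sequence number (7th digit); None for the origin household
        "split_seq": int(suffix[1]) if suffix != "00" else None,
    }

# The three households from the example above
for h in ("2071900", "2071911", "2071912"):
    print(h, parse_hhid(h))
```

Keeping the identifier as a string avoids losing leading zeros in the enumeration area digits.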
There were 7,730 households in the target sample for IFLS1. Of those, interviews were completed with 7,224 households. These households are included in HTRACK. For information on the 506 households that were listed but never interviewed, see Book K in IFLS1.
IFLS2 sought to re-interview all 7,224 IFLS1 households. Around 6% of the target households were not interviewed. The results of our attempts to re-interview all households are summarised in RESULT97.
In addition to the approximately 6,750 'ORIGIN' households that were interviewed in IFLS1 and IFLS2, over 850 'SPLIT OFF' households were interviewed in IFLS2. These are households in which a TARGET respondent who had moved out of an IFLS1 household was interviewed. There are slightly over 7,600 households in IFLS2 that completed a household roster. These households, in combination with the households that were not found in IFLS2 make up the 8,116 households in HTRACK.
In 1997, we discovered 9 of the IFLS1 households had combined with another IFLS1 household. The original household members were interviewed in the new household.
In a small number of households, it was determined that all the members of the household had died by 1997. These were typically one or two member households in 1993 and the members in the 1993 household were typically relatively old. There were, however, a small number of households in which 1993 household members were still alive in 1997 but the household was treated as if all members had died. These cases arose because the TARGET individuals in the household had died by 1997 and the interviewers mistakenly thought they did not need to track the remaining members who had moved away.
PTRACK: PERSON TRACKING FILE
PTRACK is a person-level file that tracks all IFLS respondents across waves of the survey. PID93 is a two digit sequence number identifying each individual within a household. The combination of HHID93 and PID93 uniquely identifies every respondent in IFLS1. It may be used to link records within IFLS1. If an IFLS1 respondent was found in the original household, HHID97=HHID93 and PID97=PID93. All new respondents in IFLS2 are assigned PID97 that begins after the highest PID93 for that household. In split-off households, PID97 was assigned starting at 01 for the household head. The combination of HHID97 and PID97 uniquely identifies every respondent in IFLS2. It may be used to link records within IFLS2.
HHID93 and PID93 or HHID97 and PID97 should NOT be used to link respondents across waves of IFLS.
PIDLINK: LINKING RESPONDENTS ACROSS WAVES
Several individuals moved across households between IFLS1 and IFLS2. To link records for a particular individual across waves of the IFLS, use PIDLINK. It is a unique person-level identifier which is the same in IFLS1 and IFLS2 for a particular individual. PIDLINK is a string variable comprising 9 digits. For a respondent in IFLS1 and IFLS2, PIDLINK is made up of the 7-digit HHID93, whose last two digits are 00 (denoting an original household), followed by the 2-digit PID93, the person identifier in IFLS1. If the respondent has moved from his or her original household, PIDLINK will retain the information necessary to identify the original household. For a new respondent in IFLS2, PIDLINK is made up of HHID97 and PID97.
PTRACK contains one record for every respondent. Some respondents were interviewed in both IFLS1 and IFLS2, some were interviewed only in IFLS1 and some were interviewed only in IFLS2. Note that in BK_AR1, a respondent may appear more than once in a roster since all 1993 household members are listed in the 'ORIGIN' roster. A respondent who has moved out will be designated thus (AR01A_97=3). If that respondent has been found in a new household, then AR01A_97 will equal 4. Since this respondent is found in 2 different households in 1993 and 1997, HHID93 and HHID97 are different. PID93 and PID97 will also be different in general. PIDLINK, however, remains constant.
Continuing the example of HHID93=2071900, there were 5 members in the household in 1993. In 1997, persons 1, 2 and 4 were still there but persons 3 and 5 had moved out. Person 3 was found in HHID97=2071911 and person 5 was found in HHID97=2071912.
The PTRACK records for this household are as follows:
PIDLINK    HHID93   PID93  HHID97   PID97
207190001  2071900  1      2071900  1      ) Original HH
207190002  2071900  2      2071900  2      ) Persons 3 and
207190003  2071900  3      2071911  3      ) 5 have split
207190004  2071900  4      2071900  4      ) off and are found
207190005  2071900  5      2071912  2      ) elsewhere.
207191101                  2071911  1      ) First split off
207191102                  2071911  2      ) (207190003 is in
207191104                  2071911  4      ) this HH.)
207191201                  2071912  1      ) Second split off (207190005 is in this HH.)
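To make the linking rule concrete, here is a small Python sketch that uses a few rows from the example household. It assumes identifiers are stored as strings; the function name `pidlink_for_ifls1` and the dictionary `where_in_1997` are illustrative only and are not IFLS variables.

```python
def pidlink_for_ifls1(hhid93, pid93):
    """PIDLINK for an IFLS1 respondent: 7-digit HHID93 followed by 2-digit PID93."""
    return f"{hhid93}{int(pid93):02d}"

# (PIDLINK, HHID93, PID93, HHID97, PID97) rows taken from the example household
ptrack = [
    ("207190001", "2071900", 1, "2071900", 1),
    ("207190003", "2071900", 3, "2071911", 3),
    ("207190005", "2071900", 5, "2071912", 2),
]

# Index the 1997 location by PIDLINK, the identifier that is constant across waves.
where_in_1997 = {pidlink: (hhid97, pid97) for pidlink, _, _, hhid97, pid97 in ptrack}

key = pidlink_for_ifls1("2071900", 3)
print(key, where_in_1997[key])
```

Person 3 changes household and could in general change person number between waves, so a lookup keyed on HHID and PID would fail where the PIDLINK lookup succeeds.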
SPECIAL CODES
The following values are reserved and have a special meaning:
Numeric  Alphanumeric  Meaning
5        V             Top coded/out of range
6        W             Not applicable
7        X             Refused to answer
8        Y             Don't know
9        Z             Missing
Numeric special values are preceded by as many 9s as necessary to fill the field and yield an unambiguous value. For example if a field is 4 digits wide, 9998 indicates the respondent did not know the answer.
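The padding rule can be written out as a short Python sketch. The helper names `special_code` and `decode` are hypothetical, and the field width is an input you would take from the codebook for the variable in question; whether a given code is actually used for a variable should always be checked there.

```python
SPECIAL_MEANINGS = {
    5: "Top coded/out of range",
    6: "Not applicable",
    7: "Refused to answer",
    8: "Don't know",
    9: "Missing",
}

def special_code(last_digit, width):
    """Numeric special value for a field `width` digits wide, e.g. (8, 4) gives 9998."""
    return int("9" * (width - 1) + str(last_digit))

def decode(value, width):
    """Return the special meaning of `value`, or None if it is not a special code."""
    for digit, meaning in SPECIAL_MEANINGS.items():
        if value == special_code(digit, width):
            return meaning
    return None

print(decode(9998, 4))   # the 4-digit example from the text
print(decode(1200, 4))   # an ordinary value
```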
"X" VARIABLE CONVENTION
Since special values that are embedded in continuous variables are tedious to deal with, in many cases a continuous variable, VAR, say, is accompanied by another variable, VARX, which contains the special codes. If a valid value of VAR is recorded, VARX is set to 1. The other values of VARX provide information about why there is not a valid value.
In some cases, VARX contains information about the unit VAR is recorded in. This is common, for example, when dealing with distances, times, frequencies and so on.
In many cases, VARX does not appear in the questionnaire because the question was not asked of the respondent in this way. The variables have been created ex post to assist users with the data. In general, if you are interested in VAR you should always check to see if there is an associated VARX and use the variables in combination.
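As an illustration of using VAR and VARX in combination, the Python sketch below keeps a value only when its companion X variable equals 1. It applies only to the case where VARX holds special codes (not the case where VARX records units). The variable names `amount` and `amountx` are made up for the example and are not IFLS variables.

```python
def valid_values(records, var):
    """Keep `var` only where its companion X variable flags a valid value (== 1)."""
    varx = var + "x"
    return [r[var] for r in records if r.get(varx) == 1]

# Hypothetical records: "amount"/"amountx" are illustrative names only.
records = [
    {"amount": 25000, "amountx": 1},   # valid value recorded
    {"amount": None,  "amountx": 8},   # special code; check the codebook for its meaning
    {"amount": 40000, "amountx": 1},
]
print(valid_values(records, "amount"))
```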
VARIABLE "VERSION" indicates DATASET VERSION
The variable VERSION identifies the release version of these data; it will be updated with each revision of the data and can be used to confirm that you are using the most recent version of the data. If you send questions to ifls-supp, please tell us the data set version that you are using.
In SAS:
    data _null_;
      set lib.bk_cov;
      file 'version.not';
      if _n_=1 then put version;
    run;
In STATA:
    use bk_cov
    list version in 1/1
LOCATION OF HOUSEHOLD
The enumeration area in 1993 (digits 1-3) in the HHID is not the location of residence of the household in 1997 (unless the household has not moved) and should not be used to determine geographic location of the household. 1993 location was built into the 1993 HH identifier, CASE. It is not built into HHID93 or HHID97.
Location information is recorded in module BK_SC in each wave of the survey. A summary is included in HTRACK. Location in 1993 is recorded in SC01_93 through SC05_93; the 1993 location codes are based on the 1993 BPS codes. Some of these codes have been changed by BPS (because the community boundaries have been redefined, for example). The 1998 BPS codes for the location of each of our respondents are recorded in the revised kabupaten code, SC02_93R, and the revised kecamatan code, SC03_93R. (There are no revised province codes.) The 1997 location of the respondents is recorded in SC01_97 through SC05_97. These locations use the 1998 BPS codes and so may be directly compared with the revised 1993 codes, SC02_93R and SC03_93R.
MOVER97 is intended to summarise the location of the respondent in 1997, relative to the location in 1993. It is defined only for those respondents interviewed in both waves of the survey.
LINKING HH AND COMMUNITY LEVEL DATA
COMMID is the variable that should be used to link household survey data with the community and facility data. COMMID93 identifies the community of residence of the respondent in 1993. COMMID97 is the 1997 community of residence. COMMID93 will be the same as COMMID97 if the respondent has not moved between the waves of the survey.
CAN I VISIT RESPONDENTS, FACILITIES OR COMMUNITIES?
No. The names, addresses, locations and neighborhoods of all IFLS respondents and facilities are strictly confidential. When respondents participate in the survey, they are given an assurance that their answers are confidential and that their identity will not be revealed to anyone other than through an anonymous code.
The IFLS data are placed in the public domain to support research analyses. As a user of the IFLS public use files, you are expected to respect the anonymity of all our respondents. This means that you will make no attempt to identify any individual, household, family, service provider or community other than in terms of the anonymous codes used in the IFLS.
We take protection of the confidentiality of our respondents very seriously. However, we recognise that for some research questions, it may be necessary to know more about a respondent, facility or neighborhood than is available in the public use files. In such an instance, please send email to firstname.lastname@example.org briefly explaining what research question you plan to address, why you need the identifying information and what you will do with that information. If your request does not violate our Human Subjects Protection rules, we will describe the process that you have to go through in order to obtain permission for the information to be released to you.
ARE THERE PRICE INDICES FOR IFLS1 AND IFLS2?
General CPI by province 1993 to 1997, 1986 is base year.
The source is the Central Bureau of Statistics (BPS) in Indonesia. Contact BPS (www.bps.go.id) for other indices that are available.
Prices are collected in the province capitals only. There are 22 cities included in the series from 1993 to 1997.
Thomas, Frankenberg, Beegle and Teruel discuss some of the problems associated with the BPS prices and, in particular, the fact that they are only available for urban areas.
CPI by province code and year (1986=100)
provcode  1993   1994   1995   1996   1997
11        147.1  161.5  176    192.1  204.9
12        156.1  171.3  185.4  198.9  216.2
13        141.6  154.8  168.3  182.4  195.7
14        148.2  162.7  179.7  192.3  200.6
15        158.3  172.7  187.5  202.1  212.2
16        143.6  156.1  170.2  184.5  195.8
17        139.2  153.6  169.7  180.2  189.6
18        152.8  165.7  179.8  196.2  208.2
31        155.7  171.7  189.8  207.9  223
32        149.4  164    179.3  190.7  203.2
33        150.8  165    175.7  190.6  198.9
34        152.5  167.7  182.1  199.6  205.7
35        157.7  173.7  188.1  204.4  218.1
51        169.1  185.9  199.7  211.2  217.8
52        160.3  175    191    208    221.1
53        147.5  161    171.5  183.3  196.7
54        156.4  165.6  177.8  191.4  202.1
61        158.5  171.8  187.6  202.4  214
62        152.3  164.6  174.3  190.3  201.2
63        167.2  180.6  196.5  213.8  220.7
64        165.1  178.1  193.7  212.1  220.6
71        144.6  159.2  175.1  197.4  205.2
72        146    159.3  171.9  186.8  198.6
73        142.3  155.9  169    184.4  192.8
74        157.3  175.3  191.9  205.8  216.4
81        204.2  220.9  236.9  257.2  272.9
82        150.3  163    180.6  193.2  206.3
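One common way to use an index like this is to rescale nominal amounts to a reference year's prices. The Python sketch below shows that standard index arithmetic for two province codes copied from the table; it is an illustration, not a method prescribed by the IFLS documentation, and the function name `to_base_year` is hypothetical. Note the caveat above that these prices come from urban areas only.

```python
# CPI (1986=100) for two province codes, copied from the table above.
CPI = {
    (31, 1993): 155.7, (31, 1997): 223.0,
    (32, 1993): 149.4, (32, 1997): 203.2,
}

def to_base_year(amount, provcode, year, base_year=1993):
    """Express a nominal rupiah amount from `year` in `base_year` prices."""
    return amount * CPI[(provcode, base_year)] / CPI[(provcode, year)]

# Rp 100,000 spent in province 31 in 1997, expressed in 1993 prices
print(round(to_base_year(100_000, 31, 1997)))
```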
HOW DO I SORT OUT MEDICAL FROM EDUCATIONAL FACILITIES IN THE IFLS4 SAR DATA FILE?
Because the questions were the same, one SAR file was made that combined the medical facilities on page 2 with the education facilities on page 3. In the SAR file, the way to tell the education facility records from the medical facility records is to look at the 5th digit of the FCODE07 variable (page 12 of the IFLS4 User's Guide vol. 2 describes the FCODE07 variable). Codes of 6, 7 and 8 in the 5th digit of FCODE07 represent the education facilities (6=elem, 7=jrh, 8=srh). Codes of 0-5 and 9 in that 5th digit represent medical facilities (0=traditional healers, 1=puskesmas, 2=priv doctor, 3=bidan/perawat, 4=posyandu, 5=health post for the elderly, 9=hospitals).
The X14b1 variable gives you the more detailed type of facility once you've controlled for that 5th digit in FCODE07. Note that for medical facilities X14b1 values are on the bottom of page 2 and for the educational facilities, x14b1 values are on bottom of page 3 of the SAR questionnaire.
Facilities in the SAR that were preprinted are those with a blank value in INFORSRC. Facilities that were added to the preprinted SAR are those where INFORSRC has a positive value. INFORSRC is item X03 in the SAR questionnaire. Note that the first page of the SAR questionnaire was for new facilities added and the second page (the one where the items start with J) was the preprinted SAR page.
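The 5th-digit rule for FCODE07 can be sketched in Python as follows. This assumes FCODE07 is held as a string so that character positions are preserved; the function name `classify_fcode07` and the sample codes are invented for illustration and are not real facility codes.

```python
EDUCATION = {"6": "elementary school", "7": "junior high school", "8": "senior high school"}
MEDICAL = {
    "0": "traditional healer", "1": "puskesmas", "2": "private doctor",
    "3": "bidan/perawat", "4": "posyandu", "5": "health post for the elderly",
    "9": "hospital",
}

def classify_fcode07(fcode07):
    """Return (sector, facility type) based on the 5th character of FCODE07."""
    digit = str(fcode07)[4]
    if digit in EDUCATION:
        return "education", EDUCATION[digit]
    return "medical", MEDICAL[digit]

# Made-up codes used only to exercise the function
print(classify_fcode07("1234678"))
print(classify_fcode07("1234178"))
```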
WHAT IS THE ANSWER KEY FOR THE IFLS3-IFLS5 BOOK EK COGNITIVE ASSESSMENTS FILES?
The Answer Key for the IFLS4 and IFLS5 Book EK is the same as that for IFLS3, except that IFLS4 and IFLS5 did not ask a few questions that were asked in IFLS3.
Note that there are variables in IFLS3, IFLS4 and IFLS5 that tell you if the answer is correct, so an answer key is not really needed. In IFLS3 and IFLS4, the EK1 and EK2 file variables that are EKnn show the answer selected by the respondent and the variables that are EKnnX (i.e., end in X) tell you if the person answered the question correctly. Thus, you can easily determine what the correct answers are by looking at the value of EKnn when EKnnX=1. An EKnnX=3 value means the respondent answered EKnn incorrectly, an EKnnX=6 means not applicable (i.e., the question was not asked), and an EKnnX=9 means the question was not answered. In IFLS5, the EKnnX is just “able to answer” and it is the EKnn_ANS variables that tell if the question was answered correctly or not.
For those who want to see an answer key for IFLS3 to IFLS5, it is presented below.
BEK_EK1: AGE 7-14
Question  Answer
EK0       D, not output in 2014
EK1       E
EK2       F
EK3       A
EK4       D
EK5       C
EK6       B
EK7       E
EK8       B
EK9       C
EK10      B
EK11      C
EK12      E
EK13      B
EK14      C
EK15      C
EK16      B
EK17      C

BEK_EK2: AGE 15-24
Question  Answer
EK0       A, not output in 2014
EK1       E
EK2       F
EK3       A
EK4       D
EK5       C
EK6       B
EK7       not asked in 2007, 2014
EK8       not asked in 2007, 2014
EK9       not asked in 2007, 2014
EK10      not asked in 2007, 2014
EK11      C
EK12      E
EK13      not asked in 2007, 2014
EK14      not asked in 2007, 2014
EK15      not asked in 2007, 2014
EK16      not asked in 2007, 2014
EK17      not asked in 2007, 2014
EK18      B
EK19      D
EK20      C
EK21      D
EK22      B
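For IFLS3 and IFLS4, where EKnnX flags correctness, a simple tally can be taken directly from the X variables. The Python sketch below is only a raw count under the EKnnX coding described above (1=correct, 3=incorrect, 6=not asked, 9=not answered); it is not an official IFLS score, the function name `count_correct` is hypothetical, and the record shown is invented. It does not apply to IFLS5, where EKnn_ANS carries correctness.

```python
def count_correct(record):
    """Return (number correct, number asked) from the EKnnX flags in one record."""
    flags = [v for k, v in record.items()
             if k.upper().startswith("EK") and k.upper().endswith("X")]
    asked = [v for v in flags if v != 6]          # 6 = question not asked
    return sum(1 for v in asked if v == 1), len(asked)

# Invented IFLS3/IFLS4-style record for illustration
record = {"ek1x": 1, "ek2x": 3, "ek3x": 1, "ek7x": 6, "ek11x": 9}
print(count_correct(record))
```

Whether an unanswered item (code 9) should count in the denominator is an analytic choice; here it is counted as asked but not correct.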
WHAT ARE THE OCCUPATION CODE DEFINITIONS USED IN THE IFLS?
The occupation codes used in all waves of the IFLS are described in the IFLS1 Household Codebook Appendix A. Four new codes were added after IFLS1. Those codes are:
M1  Military (split out from MM in IFLS2 onward)
M2  Police (split out from MM in IFLS2 onward)
X2  Miscellaneous production labor (added IFLS2 onward)
XX  Insufficient information to assign to category (added IFLS2 onward)
WHAT ARE THE PROVINCE, KABUPATEN, AND KECAMATAN CODES USED IN IFLS2?
The IFLS2 used 1998 BPS codes for province, kabupaten and kecamatan codes. Unlike the other IFLS waves, the IFLS2 documentation did not include a listing of those codes in an appendix. To help users, we have added to the IFLS2 set of download files a file with one record per kecamatan which has the province and kabupaten codes as well. For each code value there is the name of the given geographic unit. Note that to identify a given kecamatan, you need the province and kabupaten codes; to identify a given kabupaten, you need the province code as well. The file is available in Stata and SAS export formats.
DEFLATORS FOR IFLS WAVES
While there are regional deflators for IFLS2 and IFLS3 that were used in the aggregate consumption data available on the IFLS data download page, we do not have anything comparable for IFLS1, IFLS4 or IFLS5. Under IFLS Updates/FAQs on the IFLS data download page, we do have a set of 22 provincial capital CPI for 1993 to 1997, but nothing for rural areas.
You should check with Statistics Indonesia (BPS) (www.bps.go.id) for regional deflators. You would want the Consumer Price Indices (CPI)/Indeks Harga Konsumen (IHK) series. There is an urban CPI series and a rural series. The urban series collects prices from 40+ major cities (i.e., provincial capitals and other major cities), and its respondents were urban households. The rural series is a province-level series whose respondents were farmers and farm workers.
COMMUNITY GEOCODES AND BPS GEOGRAPHIC CODES
Community coordinates and village-level BPS codes are restricted data and must be separately requested. Information on how to apply and what materials are required to request IFLS restricted data is available here [PDF].
Note that in IFLS5, we have longitude/latitude coordinates for all communities (i.e., EAs defined by MKID) in which IFLS5 interviewed households reside. In all other waves, we only have the coordinates for those original 312 communities (EAs). For households not living in an original IFLS community (EA) in IFLS2, IFLS3 or IFLS4, if the household did not move between that earlier wave and IFLS5, the IFLS5 coordinates should apply to that residence in that earlier wave. For example, if an IFLS3 household did not live in an original IFLS community at IFLS3 but that household never moved between IFLS3 and IFLS5, then the IFLS5 community coordinates should apply to the household’s IFLS3 and IFLS4 communities of residence.
The reference geographic coordinate system used for IFLS5 is WGS84. The devices used were Garmin Etrex 10 (handheld) and GPS Glonas Ublox 7 (dongle). This is likely the same reference geographic coordinate system used in earlier IFLS waves.
In all waves of IFLS public release data, BPS province, kabupaten, and kecamatan geographic codes are provided for locations collected (e.g., current residences, migration histories). Because BPS geographic codes change over time, a cross-walk has been provided under IFLS5 documentation as Volume 14 for these three levels of geographic codes that cover codes across 1998, 1999, 2000, 2007 and 2014 (note this excludes 1993 codes).
The HTRACK file for IFLS2 has the IFLS1 1993 location in 1998 BPS codes to match those used in IFLS2. The province codes did not change, so the SC01_93 codes are the same as those in the 1998 BPS codes. The SC02_93R and SC03_93R variables have the 1998 BPS codes for the kabupaten and kecamatan of the 1993 IFLS1 location. So for residences in 1993 you have a crosswalk through time using the IFLS2 HTRACK data.
We do not have a formal 1993 to 1998 BPS crosswalk. The crosswalk data recently made available was prepared during IFLS3 for cross-walking to IFLS2’s location codes and carried on from that point which is why it starts with the 1998 BPS codes.
Unlike the higher-level BPS geographic codes, there is no cross-walk for BPS code changes in village-level codes over time. BPS village-level codes are only available as IFLS restricted data for residences at each IFLS wave and for migration histories in IFLS1, IFLS3, IFLS4 and IFLS5. The lack of a cross-walk through time can make these codes difficult to use in matching to other sources of village-level data, such as the PODES. IFLS village names are never released, not even as restricted data.
WHAT IS THE DIFFERENCE BETWEEN COMMID AND MKID?
Please read the IFLS user's guides for a discussion of what COMMID and MKID represent and how they were created. As covered in greater detail in that discussion, MKID is an EA (enumeration area) while COMMID is only an EA for those original IFLS sample communities.
WHY DO THE NUMBERS OF COMFAS BOOK 1 COMMUNITIES VARY ACROSS WAVES?
The original IFLS sample of communities comprised 321 EAs. Nine of those EAs were in the same community as another EA, so only one COMFAS was needed for each of those pairs, which led to 312 communities being administered a COMFAS in IFLS1. By IFLS2, there had been a mass move of households in one EA due to a military base reassignment, so an additional EA was added to reflect that relocation (this is discussed in the IFLS2 User’s Guide), resulting in 313 communities being administered a COMFAS in IFLS2. In IFLS3, the COMFAS was administered in 311 communities because the village head could not be interviewed in two of the 313 communities (this is noted in the IFLS3 User’s Guide).
WHAT IS COMMID=1227 IN IFLS4 HTRACK? WHAT IS COMMID=9999 IN IFLS4 COMFAS? THESE DO NOT APPEAR IN OTHER IFLS WAVES.
In IFLS3 the original EA 1218 split into two villages. Some households (a few more than half) remained in the original village, and the rest were in the new village, still in the same kecamatan. The first group of households kept COMMID 1218; the second group, in the new village, was assigned COMMID 12GA. Only one CF interview was done, in the village corresponding to COMMID 1218, not COMMID 12GA.
In 2007 when we visited the area, we conducted two CF interviews, one in COMMID 1218 and another in COMMID 12GA. The 10 households in 12GA were assigned a new COMMID, 1227 (not used before). So there was an extra CF interview, but in the CF file its COMMID was left as 9999 (missing) instead of being changed to 1227.
In 2014, we decided that that was not how to treat an original village that splits. Instead, when an original village splits into two, we go back to conducting only one CF interview, in the village where most of the households live. So in 2014 we also did not conduct a CF interview in COMMID 1227/12GA/9999.
WHICH COMMID REPRESENT THE TWIN EAS?
As noted in the User’s Guide, COMMID is defined at the enumeration area level for the original IFLS sample of communities, except for the nine twin EAs, for which the two EAs in each pair are combined into one COMMID because the paired EAs were right next to each other and shared the same village head at the time of IFLS1. Below are the COMMID and the two EAs represented by each:
WHY DOES A VARIABLE HAVE SO MANY MISSING VALUES
The IFLS is a complex survey that includes many skip patterns. When a variable appears to have a large number of blank values it is because this variable is not something that is asked of every respondent. You must review the relevant IFLS Book questionnaire to understand who is asked the question and who is not. In the IFLS, blank values are used for questions that are not asked of a respondent. If a respondent said “I don’t know” or refused to answer or should have answered but did not, there is a missing value code to indicate those situations.
HOW DO I USE THE UNFOLDING BRACKETS DATA IN IFLS4/IFLS5
In IFLS4 and IFLS5, unfolding brackets were used to get an estimate of items like income where the respondent couldn’t give an exact amount but did have some knowledge of the range in which the amount might have fallen. Book 2 is where brackets were most commonly used in those two waves.
The IFLS5 user's guide vol. 2 explains how to work with these types of questions. See pages 13-15.
ARE THERE FACILITY WEIGHTS FOR IFLS3 AND IFLS4 LIKE THERE ARE FOR IFLS5?
While only IFLS5 has a facility weight in the COMFAS data, a similar weight can be constructed for IFLS3 and IFLS4.
The facwght14s in IFLS5 is just the inverse of the simple probability of the given facility being selected among the eligible facilities of its type. That list of eligible facilities of a given type is the list of facilities of that type in the IFLS5 COMFAS SAR data file that are currently open in 2014. For example, if the SAR shows 10 schools of a given type (elementary, junior high, or secondary) currently open in the community and 4 schools of that given type were interviewed in that community, then the probability of selection is 4/10 or 0.40 and the inverse probability (i.e. the weight for each interviewed school of that type in that community) is 1/0.4 or 2.5.
So, one could create the same simple weight for IFLS3 and IFLS4 using the list of facilities of a given type currently open in the given community at the time of the given IFLS wave based on the given wave’s SAR and the number of that type of facility interviewed in that wave in that community. This would be done for each of the 3 types of school (elementary, junior high, and secondary).
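As a minimal sketch of the calculation described above (not an official IFLS program), the following Python snippet computes the simple inverse-probability weight for each community and facility type. The function name and the input layout (one (community, facility type) pair per facility) are assumptions made for illustration; the actual counts would come from the given wave's SAR list of currently open facilities and from the facilities interviewed in that wave.

```python
from collections import Counter


def facility_weights(open_facilities, interviewed_facilities):
    """Return {(community, facility_type): weight}.

    Each argument is a list with one (community, facility_type) tuple per
    facility. The weight is (number eligible) / (number interviewed), i.e.
    the inverse of the simple selection probability.
    """
    n_open = Counter(open_facilities)
    n_interviewed = Counter(interviewed_facilities)
    weights = {}
    for cell, n_int in n_interviewed.items():
        n_elig = n_open.get(cell, 0)
        if n_elig < n_int:
            # More interviewed than listed as open: the inputs need checking.
            raise ValueError(f"inconsistent counts for {cell}")
        weights[cell] = n_elig / n_int
    return weights


# Hypothetical community "A": 10 open elementary schools, 4 interviewed.
open_list = [("A", "elementary")] * 10
interviewed_list = [("A", "elementary")] * 4
weights = facility_weights(open_list, interviewed_list)
```

Every interviewed facility in a given community and type cell receives the same weight, which matches the 4-out-of-10 example in the text.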
WHY DOES JOB STATUS IN IFLS4 (2007) DIFFER FROM RECALL OF 2007 JOB STATUS COLLECTED IN IFLS5 (APPLICABLE TO OTHER IFLS WAVE PAIRINGS)
Remember that recall is not perfect and becomes more problematic as the years between the event and the recall point increase. Many things can produce a difference between retrospective reports and contemporaneous reports from those earlier time periods: those who worked sporadically for very short periods in 2007 might not remember working when asked in 2014; if the 2007 interview was early in the year, a person not working at the 2007 interview might have gotten a job later in 2007; people with more than one job in 2007 may not be talking about the same 2007 job in 2014 as they talked about in 2007; those who realize that saying they worked in a past year means answering more questions might start saying “no, didn’t work” to speed up the interview; and those who realize they may not have to repeat as much if they say “same as year before” might just say “same, same”, so the report may not reflect the actual work done in that year.
It is up to each analyst to decide how to handle retrospective reports that appear to differ from contemporaneous reports in past rounds. We cannot provide advice on recall issues.
CHANGE IN QUESTION ABOUT RECENT CROPS IN IFLS4 VS IFLS5 BOOK 2 SECTION UT
There was a change to the questionnaire in IFLS5 from IFLS4 and only the most recent crop was asked about in IFLS5 as opposed to the three most recent in IFLS4.
The IFLS5 Bahasa questionnaire, which is the one actually administered, specifically asks only about the most recent crop.
The IFLS5 English questionnaire was edited from the IFLS4 English version to save time and effort. Apparently, that question change was accidentally overlooked and the IFLS5 English questionnaire text on that line was not updated to reflect the change.
Generally, questions are shortened in order to reduce respondent burden and to save time.
HOW DO I GET A PIDLINK FOR A CHILD IN THE BOOK 4 SECTION CH BIRTH HISTORY?
Starting with the Book 4 CH birth history in IFLS1, for those children living with the Book 4 respondent at IFLS1, you can get the child’s household roster PID from question CH27b. With that PID and the mother’s HHID, you can find the child’s PIDLINK on the IFLS1 household roster. For children not living in the household at IFLS1, there will be no PIDLINK value. For subsequent IFLSes, the Book 4 will only have births since the previous wave in which the Book 4 respondent was interviewed, and the same process can be used to get PIDLINKs for those children who are living with the respondent at the time of interview.
For children who have never been living with a Book 4 respondent at the time of an IFLS interview, there will be no obvious PIDLINK for them. There may be a few cases where a child who lived with the Book 4 respondent moved away and was followed and in their new household they have a sibling that did not live with that child in a previous household. In those cases, there would be a PIDLINK for that sibling as they are now in an interviewed IFLS household. You would have to use birthdate/age/gender information for that sibling to look back at the mother’s Book 4 birth history to see which child that sibling is among those listed in the birth history.
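The roster lookup for coresident children can be sketched as follows in Python. This is only an illustration: the function name, the dictionary layout, and the identifier values are made up, and it assumes the household roster has already been loaded into a mapping from (HHID, PID) to PIDLINK.

```python
def child_pidlink(mother_hhid, ch27b_pid, roster_pidlinks):
    """Look up a child's PIDLINK from the household roster.

    mother_hhid     -- HHID of the Book 4 respondent (the mother)
    ch27b_pid       -- the child's roster PID from CH27b, or None if the
                       child was not living in the household
    roster_pidlinks -- dict mapping (HHID, PID) to PIDLINK
    Returns None when there is no roster entry for the child.
    """
    if ch27b_pid is None:
        return None
    return roster_pidlinks.get((mother_hhid, ch27b_pid))


# Hypothetical roster extract; these identifiers are invented.
roster_pidlinks = {
    ("0010600", 1): "001060001",
    ("0010600", 3): "001060003",
}

coresident = child_pidlink("0010600", 3, roster_pidlinks)
not_in_household = child_pidlink("0010600", None, roster_pidlinks)
```

A child with no CH27b value, or with a PID that is not on the roster, comes back as None, which corresponds to the "no PIDLINK" cases described above.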
WHAT ARE ADULT EDUCATION A, B AND C WHICH APPEAR IN THE EDUCATIONAL LEVEL CODES?
A completion of Kejar Paket A (Adult Education A in IFLS) is the equivalent of a completion of primary school, completing Kejar Paket B (Adult Education B) is equivalent to completing junior high, and completing Kejar Paket C is equivalent to completing senior high.
Individuals with a Kejar Paket A certificate are supposed to be eligible to enroll in junior high school or in the Kejar Paket B program, those with a Kejar Paket B certificate are supposed to be eligible to enroll in senior high school (or in the Kejar Paket C program), and those with a Kejar Paket C certificate are supposed to be eligible to take the national examination to enter universities.
ARE THERE COMPOSITE SCORES FOR THE BOOK EK COGNITIVE TEST ITEMS?
There are no final or composite scores for the cognitive test items. The cognitive tests in IFLS3, IFLS4 and IFLS5 are based on the Raven's Colored Progressive Matrices tests and some numeracy questions. Please read the User’s Guides for details. Due to time constraints, the IFLS team wanted to reduce the number of questions but still allow for sufficient scope in difficulty, so questions were selected to run a range from simple to harder. They did do some pretesting to determine the questions selected, but there are no published results for that pretesting. Basically, it's up to users to decide how to make use of these scores.
The Ravens-based test was not used till the 2000 IFLS3 survey. As noted in the IFLS3 User’s Guide, the IFLS3 EK module was completely revamped from the IFLS2 version so comparison between IFLS2 (1997) results and IFLS3/4/5 (2000/2007/2014) results is not really possible.
ARE THERE COMPOSITE COGNITIVE SCORES FOR BOOK 3B SECTIONS CO AND COB?
The CO section, as noted in the IFLS5 User’s Guide vol. 1, is based on the Telephone Interview for Cognitive Status (TICS) but it is not the TICS, which has more items. Users will have to decide how they want to score the responses in the CO section.
For the COB section, there is a composite correct score based on psychometric analysis done by Dr. John McArdle (as explained in the IFLS5 User’s Guide vol. 1). That score is the variable W_ABIL in the B3B_COB data file. The standard error of measurement for that score is in the variable SEM. This is the same score used in the Health and Retirement Study (HRS). Please review the paper in the Numeracy_Results.zip file that is vol. 12 of the IFLS5 documentation.
HOW IS THE "BEST GUESS" AGE COMPUTED IN PTRACK?
Here’s the basic algorithm used for getting an estimate of birth date.
Step 1: If a full birthdate (month, day, year) was available, it was used. The order of hierarchy in checking for a full birth date was to first check Book 3a, then 3b, then 4, then 5, and then US last. The first full birth date found was used.
Step 2: If no full birthdate was found at Step 1, then they checked for month and year being available checking Book 3a then 3b, 4, 5, K, and then US. The first month/year that was found was used. The day was then assigned as 15.
Step 3: If no month and year of birth was found at Step 2, they checked the reported ages in Book 3a, 3b, 4, 5, and K (ar09) for consistency. If they were the same (mean age=ar09), then age was used to estimate a birth date. The basic SAS code structure was bdate_estimate_in_days = interview_date_in_days - (mean_age + .04)*365.25. The .04 adds about 2 weeks to the mean age before multiplying by days in a year. That estimate of the birthdate in terms of days since Jan 1, 1960 can then be converted to a month, day and year date.
Step 4: If the ages across book covers were not consistent at Step 3, then they checked for year of birth looking first at book US, then K, then 3a, 3b, 4 and then 5 last. The first birth year found was used. With that birth year, they assigned a month and day of July 1.
Step 5: If for inconsistent ages across book covers there were no birth years reported on any book in Step 4, then mean age was used to estimate a birthdate using the formula listed in Step 3.
Step 6: If there were no ages or birth years reported in any book, then they used the best guess age from the previous wave to estimate a birth date using the interview date from that wave in the formula used in Step 3.
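The date arithmetic in Steps 2 to 4 can be sketched in Python as below. This is an illustration under stated assumptions, not the original SAS program: the function names are made up, and rounding the result to whole days is an assumption, since the text does not say how fractional days were handled.

```python
from datetime import date, timedelta

DAYS_PER_YEAR = 365.25
AGE_OFFSET_YEARS = 0.04  # about two weeks, as described in Step 3


def birth_date_from_month_year(year, month):
    """Step 2: month and year known, day assigned as 15."""
    return date(year, month, 15)


def birth_date_from_age(interview_date, mean_age):
    """Steps 3, 5 and 6: back out a birth date from an age in years."""
    days_back = (mean_age + AGE_OFFSET_YEARS) * DAYS_PER_YEAR
    # Rounding to whole days is an assumption of this sketch.
    return interview_date - timedelta(days=round(days_back))


def birth_date_from_year(year):
    """Step 4: only birth year known, month and day assigned as July 1."""
    return date(year, 7, 1)


# Hypothetical respondent: interviewed 1 Nov 2014, consistent age of 30.
estimate = birth_date_from_age(date(2014, 11, 1), 30)
```

Python's date type does the conversion that SAS does with days since Jan 1, 1960, so no explicit day count is needed here.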
HOW WERE WAIST AND HIP MEASURED?
A measuring tape was used for waist and hip circumference. The measurement was taken over clothes, but not heavy clothes, and not done over bare skin.
WHAT ARE THE UNITS FOR HEMOGLOBIN?
Hemoglobin (Hb) measurements in the US section of the IFLS household survey are in g/dL (grams/deciliter).
WHAT DEVICE AND UNITS WERE USED FOR MEASURING LUNG CAPACITY?
Lung capacity was measured as Peak Expiratory Flow (PEF) in L/min units (liters per minute) by a Personal Best Peak Flow Meter. This device was used in each wave from IFLS2 thru IFLS5.
DISCREPANCIES IN BIRTH DATE REPORTS ACROSS MODULES
Please review the information about birth date variables as discussed in the IFLS Users Guide, volume 2.
In the PTRACK file, as discussed in the user's guide, we have included our best guess as to the person's birthdate in an attempt to deal with the fact that people don't always know their birth date and thus they may say something different each time they are interviewed. Also in the roster (the AR section of Book K), the roster respondent reports the birthdate in AR08 and they may or may not know the actual date and may give a guess as well. The DOB in the Book 3 cover is supplied by the respondent and in Book 5 it may be the mother or the child depending on who answered the module.
You might want to use the PTRACK birth date when there are discrepancies between the roster and the Book 3/5 covers.
WHAT DO I DO ABOUT CASES WHERE THE PERSON’S GENDER CHANGES OVER TIME? WHAT IF AGE CHANGES ARE INCONSISTENT OVER TIME?
Some inconsistencies will arise over the years due to typos, some caught and corrected, some not. In the cases of people with gender inconsistencies who became part of a split-off household, preprinted rosters were not used for split-offs, so there was no easy check against the last gender report. Thus a gender typo might infrequently occur that was not subsequently caught. Also, if the gender had been recorded incorrectly previously, it might be corrected in a later roster.
Note that the user can check the Book 4 results to see if any of those gender inconsistency cases were interviewed as an EMW, as those cases would be female. In the case of children who were reported in a Book 4 birth history, one could check the gender listed in that birth history. The user could also see if any of the inconsistent cases are listed in the AR10 (father id) or AR11 (mother id) fields which might help one determine the correct gender code. As for the others where there may not be any cross-check information from other sources, the user will have to decide how to deal with the inconsistency.
There will be some cases with age inconsistencies through time as well for a person. Again, the user will need to decide what to do there as well, although the birth date info in the IFLS5 ptrack is the "best guess" estimate, so one could use that to adjust inconsistent ages for those who have an inconsistency over time.
RUPIAH AMOUNTS COLLECTED IN THE HOUSEHOLD SURVEY AND COMFAS
Rupiah amounts were collected in nominal terms. Due to high inflation rates that occurred at various times during the 1993 to 2014 period, seemingly large changes may appear if one compares these nominal values across IFLS waves. For example, it may appear that income has grown tremendously between waves when in real terms (i.e., adjusted for inflation), income may not have risen that much. Thus, one must adjust rupiah amounts for inflation to compare across waves.
Users should also be aware that the units in which an amount is collected might change between waves. For example, an amount in IFLS1 might have been recorded in thousands of rupiah and in IFLS2 might have been recorded in just rupiah. Thus, one should always compare questionnaires across waves to ensure that a given question is the same across waves, and if not, how to revise it to make it the same across waves.
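The two adjustments above (units, then inflation) can be sketched in Python as follows. The function name is made up, and the price index values are placeholders for illustration only, not actual Indonesian price data; users need to choose and obtain their own deflator.

```python
def to_real_rupiah(amount, unit_multiplier, price_index_wave, price_index_base):
    """Convert a reported amount to rupiah at base-period prices.

    amount           -- value as recorded in the data file
    unit_multiplier  -- 1000 if recorded in thousands of rupiah, 1 if in rupiah
    price_index_wave -- price index for the wave in which amount was collected
    price_index_base -- price index for the chosen base period
    """
    nominal_rupiah = amount * unit_multiplier
    return nominal_rupiah * price_index_base / price_index_wave


# Placeholder inputs: 250 recorded in thousands of rupiah, with an
# illustrative index of 40 in the survey wave and 100 in the base period.
real_value = to_real_rupiah(
    amount=250, unit_multiplier=1000,
    price_index_wave=40.0, price_index_base=100.0,
)
```

Applying the unit conversion before deflating keeps the two steps separate, so a change in recording units between waves is not mistaken for a change in the real amount.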
HOW DO I USE IFLS WEIGHTS?
The weights are described in the user’s guide, which also explains which weight to use when you want to correct for attrition. The weights are inverse probability weights. The documentation for the weighting procedures in the software you are using will generally tell you how to use such weights.
The IFLS weights do not need to be normalized.
Note that in Stata, if you use the weights in HTRACK and PTRACK, you do not need to use the strata or psuid options in the svyset command as the weights already correct for that.
The panel weights are really only for longitudinal analyses of the panel sample, which is just those who were there at IFLS1.
If, for example, you were looking at what happened by IFLS4 for those present at IFLS3, you would likely use the IFLS3 cross-sectional weight even though one is using the longitudinal nature of the IFLS since the focus is not on just those present at IFLS1.
When we say the cross-sectional weights make the IFLS4 data representative of the 2007 Indonesian population in the 13 IFLS provinces, for example, it does not mean the weights will sum to any population size, it just means that the weighted proportions of different types of people (i.e., age/sex/ethnicity or whatever) reflect the proportions of those types in the 2007 SUSENAS, having adjusted for attrition, etc.
Note that one should not use the weights to get population count estimates for a district (province-level may be problematic as well) due to sample size concerns and because the matching done to the Indonesian population for that survey year was not done by individual province but for the population of those 13 IFLS provinces combined.
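A small Python sketch can illustrate why the weights do not need to be normalized when estimating proportions: multiplying every weight by the same constant leaves a weighted proportion unchanged. The function name and the toy values are made up; this shows only the arithmetic and is not a substitute for the survey procedures in statistical software.

```python
def weighted_proportion(flags, weights):
    """Weighted share of observations for which the flag is true."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w for flag, w in zip(flags, weights) if flag) / total


# Toy data: three respondents, two of whom have the characteristic.
flags = [True, False, True]
weights = [2.0, 1.0, 1.0]

share = weighted_proportion(flags, weights)
# Rescaling all weights by a constant gives the same proportion.
share_rescaled = weighted_proportion(flags, [10 * w for w in weights])
```

The sum of the weights itself has no population interpretation here, which is consistent with the caution above against using the weights for population counts.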
HOW DO I DETERMINE BIRTH ORDER?
You can most easily determine birth order for any IFLS respondent (child or adult) whose mother was given a Book 4 as it collected her birth history (which may have to be constructed by looking at more than one wave for women who are in their childbearing years throughout most or all of the IFLS waves). If an IFLS respondent's mother was never a Book 4 respondent, things become trickier.
If the IFLS respondent's mother was in an IFLS household roster and was only ever given a Book 3 because she was over age 49 at her first IFLS interview, one might be able to build an estimated birth history from the list of her coresident children in the roster and the non-coresident children covered in Book 3. If the mother has never been in the IFLS, but the father has, one might be able to use similar information from the roster and his Book 3, but if he has been married more than once, any "birth order" might not be related to the respondent's birth mother. If the IFLS respondent's mother was never in an IFLS household, then one may be able to build an estimated birth history from siblings of the respondent who are in the household with the respondent and from the non-coresident sibling roster, which does try to get at deceased siblings as well. However, there is no non-coresident sibling roster after IFLS3.
For children (i.e., those under age 18) in an IFLS wave, the mother most likely has been a Book 4 respondent and will have a reported birth history from which the child's birth order can be determined. It is for adult children that determining their birth order becomes more problematic, as described above, especially if they are no longer living with their mother at the time they first entered the IFLS.
It will take some work and one will need to look in several places for pertinent information. Remember that there will be cases with conflicting information between parents as to kids born and when and who's alive and who's since deceased and when. Likewise, one may find discrepancies between siblings who have both been interviewed in the IFLS as to who their other siblings are and how old and if they are alive or not and, if deceased, when. Users undertaking such a task will have to decide how best to handle such situations.
HOW IS RURAL/URBAN DEFINED IN SC05? DOES THE MIGRATION HISTORY HAVE A RURAL/URBAN INDICATOR?
The rural/urban distinction is the one assigned by the Indonesian Bureau of Statistics (BPS). We do not have a copy of the document that describes the process. However, the basic gist is this: they use a “functional” definition based on a number (at one time roughly 10 to 20) of indicators collected as part of each regular PODES to score each desa. The indicators include population density, share of population in agriculture, and a number of infrastructure indicators (schools, roads, markets, etc.). Those desas scoring above a set level are categorized as urban. Please contact BPS for the actual specifics on how that rural/urban distinction is set. Note that BPS does not assign a rural/urban designation to a kecamatan, which is comprised of a collection of desas that may differ in their rural/urban designation.
Note that in the IFLS migration histories there is no rural/urban indicator. We only asked respondents if the place they moved to was a “village, small town, or big city” with no prompting as to how those were defined. Users will need to decide for themselves how to assess rurality/urbanicity from that response and any other information collected in the migration history. While BPS desa codes for the migration histories are available as restricted data (see the FAQ on Community Geocodes and BPS Geographic Codes), it may be difficult to link these to past PODES data to try and get a rural/urban status for the desa at the time of the move (arriving or leaving) as PODES only go back to the 1980s and are not done annually, in addition to the problem of BPS desa code changes over time.
IS THERE A SCALE FOR DEPRESSION BASED ON BOOK 3B SECTION KP?
Section KP in IFLS4 and IFLS5 contains the questions from the short CES-D scale, which is 10 items. No scale has been constructed but users can easily create their own using the standard procedure to construct a scale from the short CES-D, which requires recoding of the response codes before summing those items into a total score. The recoding for items E and H needs to be reversed so that 1 is changed to 3, 2 is kept the same, 3 is changed to 1 and 4 is changed to 0. For the remaining items, the codes are to be changed by subtracting 1 from the value (e.g., 1 becomes 0, 2 becomes 1, etc.). Because there are just 10 questions, if there are more than 2 items missing, one cannot create a meaningful score. Note that a respondent with a computed total score of 10 or more is considered to be depressed.
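The recoding and summing described above can be sketched in Python as follows. Several details are assumptions of this sketch rather than IFLS conventions: the 10 items are labeled A through J, any value other than 1 to 4 is treated as missing, and when 1 or 2 items are missing the answered items are simply summed (the text does not say how to handle that case, so users may prefer another approach such as prorating).

```python
ITEMS = "ABCDEFGHIJ"                       # assumed labels for the 10 items
REVERSED_ITEMS = {"E", "H"}                # items with reversed recoding
REVERSE_MAP = {1: 3, 2: 2, 3: 1, 4: 0}     # recoding given in the text
MAX_MISSING = 2
DEPRESSED_CUTOFF = 10


def cesd10_score(responses):
    """Return the short CES-D total score, or None if too many items are missing.

    responses -- dict mapping item letter to the raw response code (1 to 4)
    """
    recoded = []
    for item in ITEMS:
        code = responses.get(item)
        if code not in (1, 2, 3, 4):
            continue  # treated as missing in this sketch
        if item in REVERSED_ITEMS:
            recoded.append(REVERSE_MAP[code])
        else:
            recoded.append(code - 1)
    if len(ITEMS) - len(recoded) > MAX_MISSING:
        return None
    return sum(recoded)


def is_depressed(score):
    """Apply the cutoff of 10 or more; None stays None."""
    return None if score is None else score >= DEPRESSED_CUTOFF


all_ones = cesd10_score({item: 1 for item in ITEMS})
all_fours = cesd10_score({item: 4 for item in ITEMS})
```

Returning None rather than a number when more than 2 items are missing keeps unusable cases from being mistaken for low scores.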
WERE BOOK US RESPONDENTS ADVISED TO SEE A DOCTOR IF THEY WERE HYPERTENSIVE?
Book US respondents with a systolic reading above 140 or a diastolic reading above 90 were given their blood pressure information with a recommendation to see a doctor. We do not know whether such individuals actually followed up and saw a medical professional.
DOES IFLS HAVE A SCORING ALGORITHM FOR THE PROVIDER VIGNETTES?
There is no formal scoring algorithm for the IFLS provider vignettes. Such vignettes may be used in different ways to assess quality of care so reviewing the literature is recommended. For example, one method might be to count up the number of “right” answers that were spontaneously provided and divide by the total number of “right” answers for the given scenario, and standardize the scores with a mean of zero and a standard deviation of one, with variations in quality expressed as standard deviations from the mean in order to compare across providers (See Sarah L Barber, Paul Gertler and Pandu Harimurti, “Differences in Access to High-Quality Outpatient Care in Indonesia”, Health Affairs 26, no. 3 (2007) doi: 10.1377/hlthaff.26.3.w352).
With respect to “right” answers in the IFLS4 and IFLS5 vignettes, the following were provided by Sarah Barber:
- Adult with cough and fever: 11a, 12c, 11c, 11d, 11e, 13b, 13c, 13d, 13f, 13a, 14c
- Child with diarrhea and vomiting: 40a, 40b, 40c, 40e, 40f, 42b, 42c, 42i, 42d, 42a, 44b, 44f
- Pregnant woman: 57a, 58c, 59d, 57i, 57f, 57b, 57g, 58a, 58b, 58d, 57k, 57e, 59j, 59b, 60a, 60c, 60i, 60e, 60d
- Adult with blood sugar question: 25a-25p, 26a-26n, 27a-27j, 28b-28e, 28g-28i (note that 26n, 27i, 27j were not asked in IFLS4)
The above is just one example as there may be other approaches and viewpoints on what constitutes quality of care within the presented scenarios.
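The example method described above (share of "right" answers given, then standardized across providers) can be sketched in Python as below. This is an illustration only, not a formal IFLS scoring algorithm: the function names are made up, the use of the population standard deviation is an assumption, and the list of right answers shown is the one given above for the adult with cough and fever scenario.

```python
from statistics import mean, pstdev

# "Right" answers for the adult with cough and fever scenario, from the list above.
RIGHT_COUGH_FEVER = {
    "11a", "12c", "11c", "11d", "11e",
    "13b", "13c", "13d", "13f", "13a", "14c",
}


def share_correct(answers_given, right_answers):
    """Number of right answers spontaneously given, divided by the total right."""
    right = set(right_answers)
    return len(right & set(answers_given)) / len(right)


def standardize(scores):
    """Rescale scores to mean zero and standard deviation one."""
    mu = mean(scores)
    sd = pstdev(scores)  # population SD; an assumption of this sketch
    if sd == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sd for s in scores]


# Hypothetical provider who gave two right answers and one other answer.
raw = share_correct({"11a", "12c", "99z"}, RIGHT_COUGH_FEVER)
z_scores = standardize([1.0, 2.0, 3.0])
```

Answers that are not on the "right" list do not lower the share in this version; whether to penalize them is a design choice left to the analyst.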
WHAT WERE THE DATA COLLECTION DATES FOR EACH IFLS WAVE?
IFLS1 : Sept 1993 to Feb 1994
IFLS2 : Aug to Dec 1997 with long distance tracking till the end of March 1998
IFLS3 : late June to the end of Oct 2000 with long distance tracking till end of Dec 2000
IFLS4 : late Nov 2007 to end of Apr 2008 with long distance tracking till the end of May 2008
IFLS5 : late Oct 2014 to end of Apr 2015 with long distance tracking till end of Aug 2015
All updates to IFLS1 have been applied to IFLS1-RR. You are encouraged to use those data. For a list of updates applied to IFLS1, see the fixes files provided with the original release and the updates listed in the FLS Newsletters.
IFLS1-RR Updates and Data Notes
Please read the IFLS1 document (DRU-1195/7) called “Documentation for IFLS1-RR: Revised and Restructured 1993 Indonesian Family Life Survey Data, Wave 1”. It explains how the IFLS1-RR data, which is the version that is on the IFLS data download site, varies from the structure of the original IFLS1 release data. We combined files with the same unit of observation so there are fewer files in IFLS1-RR than in the original release and DRU-1195/7 shows which files were combined into a single new file. However, due to limited funding, we did not redo the codebooks. As explained in the above document (DRU-1195/7), the codebooks are still useful for explaining the code values of variables even though the file structure in IFLS1-RR is different. One must use both the codebook and the IFLS1-RR document together to understand the data structure and contents.
No updates have been made since the data were made public.
The data output by the data entry program used in the field did not have an “IHTYPE” variable in the bkppkh01 file for the Posyandu Section H data like the “DHTYPE” variable in the bukph01 file for the PKK (Association of Family Activities). Users can create an “IHTYPE” variable running 1 to 18 for each record in the EA FACCODE combination in the bkppkh01. In Stata, one way to do that is to do “bysort ea faccode: egen ihtype=seq()”. You would do this on the data obtained from the original zip file so that you know the data is in the original sort order. It has been verified that this process does assign the correct line item number to a given record (i.e., ihtype=1 is indeed “good quality rice”, ihtype=2 is “average quality rice”, etc.).
For those interested in how the expenditure variables in the IFLS1-RR subfile "expend2" were generated, the programs that created those variables are available. They are SAS programs and are not supported by RAND. As noted in the IFLS1-RR documentation, the "expend2" file was created by another project and was given to us to share with other users.
IFLS2 Updates and Data Notes
The remaining previously unreleased modules of the IFLS2 data were made available in August 2009. These included the TK, MG, KL, EK and US modules. The version of the IFLS2 currently on the web has all the IFLS2 modules.
In the codebook for the COMFAS ADAT2 file, there is a typo in the code values for AP24=3 and AP24=4. AP24=3 is actually the “bride’s place” and AP24=4 is the “groom’s place” and not the other way around as shown in the codebook file. If one checks the ADAT questionnaire, one will see the correct value definitions for the AP24 codes.
- Importing IFLS2 EXPORT files in SAS
- Using FORMAT libraries provided with IFLS2
- Reading ASCII (RAW) data files into SAS
IFLS2 EXPORT FILES
SAS export files in IFLS2 are grouped in modules. The files were created using PROC COPY. The following program created the HH level data files. An example showing how to read the export files is provided at the end of the code.
*-------------------------------------------------------*;
* EXPORT sas datasets for IFLS2 HH
*-------------------------------------------------------*;
libname lib v612 "LOCATION OF LIBRARY OF SAS DATASETS";
libname library v612 "LOCATION OF FORMATS FOR SAS DATASETS";
libname fmt xport "hh97fmt.xpt";
libname bk xport "hh97bk.xpt";
libname b1 xport "hh97b1.xpt";
libname b2 xport "hh97b2.xpt";
libname b3 xport "hh97b3.xpt";
libname b4 xport "hh97b4.xpt";
libname b5 xport "hh97b5.xpt";
* convert formats into common structure so can be exported;
proc format library=library cntlout=lib.hhfmts;
proc copy in=lib out=fmt;
  select hhfmts;
* copy each book of modules into a single export file;
proc copy in=lib out=bk;
  select htrack ptrack bk_cov bk_sc bk_ar0 bk_ar1 bk_krk ;
proc copy in=lib out=b1;
  select b1_cov b1_ks0 b1_ks1 b1_ks2 b1_ks3 b1_ks4 b1_pp1 ;
proc copy in=lib out=b2;
  select b2_cov b2_kr b2_ut1 b2_ut2 b2_nt1 b2_nt2 b2_hr1 b2_hr2 b2_hi b2_ge ;
proc copy in=lib out=b3;
  select b3a_cov b3a_dl1 b3a_dl2 b3a_dl3 b3a_dl4 b3a_dlr1 b3a_dlr2
         b3a_hr0 b3a_hr1 b3a_hr2 b3a_hi b3a_kw1 b3a_kw2 b3a_kw3
         b3a_pk1 b3a_pk2 b3a_pk3 b3a_br
         b3b_cov b3b_km b3b_kk b3b_ak b3b_ma1 b3b_ma2 b3b_ps
         b3b_rj1 b3b_rj2 b3b_rn1 b3b_rn2 b3b_pm1 b3b_pm2 b3b_pm3
         b3b_ba0 b3b_ba1 b3b_ba2 b3b_ba3 b3b_ba4 b3b_ba5 b3b_ba6
         b3p_cov b3p_kw1 b3p_dl1 b3p_dl3 b3p_dl4 b3p_pm1 b3p_pm2
         b3p_km b3p_kk b3p_ma b3p_rj1 b3p_rj2 b3p_rn1 b3p_rn2 b3p_br
         b3p_ch0 b3p_ch1 b3p_cx
         b3p_ba0 b3p_ba1 b3p_ba2 b3p_ba3 b3p_ba4 b3p_ba5 b3p_ba6 ;
proc copy in=lib out=b4;
  select b4_cov b4_kw1 b4_kw2 b4_br b4_ba6 b4_bx6 b4_bf
         b4_ch0 b4_ch1 b4_cx1 b4_cx2 ;
proc copy in=lib out=b5;
  select b5_cov b5_dla1 b5_dla2 b5_dla3 b5_maa0 b5_maa1 b5_psa
         b5_rja0 b5_rja1 b5_rja2 b5_rja3 b5_rna1 b5_rna2 ;
* to import use, for example:;
proc copy in=bk out=lib;
* this will select all files from module bk and place the;
* sas datasets in sas library given by ddname=lib;
The export file containing all SAS datasets was created in the same way without the SELECT statement. You may use the SELECT statement when you import the data sets. See PROC COPY in the SAS manual.
SAS FORMAT LIBRARIES
The example above includes code to convert the FORMAT LIBRARY into a structure that allows it to be exported. The FORMAT library stores all "value labels" (or format assignments). If you want to use those value labels, you should make them accessible to SAS using the LIBRARY statement to point to the directory in which they are stored on your computer system.
libname LIBRARY "your directory";
If you do not want to use the formats, you may override them in several ways.
- Use the options nofmterr statement and do not reference the FORMAT library.
- Use a FORMAT statement that lists _all_ with no format name. The _all_ keyword refers to all variables in the dataset and tells SAS to revert to the default (null) format for all the variables. See the FORMAT statement in the SAS manual.
READING ASCII (RAW) DATA FILES
A SAS program to read the IFLS2 HH data files is stored with the zip file containing the data. The program is also available here.
If you wish to make permanent SAS datasets, you will need to set up a LIBNAME statement and give the datasets that are created a two-level name.
For example:
libname PERMDATA "your_directory";
data PERMDATA.htrack;
  infile intrk pad lrecl=141;
  etc.
In this case, you will need to make a permanent FORMAT library which you access when you load the data. To do this, set up a LIBNAME for the format LIBRARY:
libname LIBRARY "your directory";
and amend the PROC FORMAT statement at the top of the read file:
PROC FORMAT LIBRARY=LIBRARY;
When you load the data, ensure the libname LIBRARY is at the top of your program. This will ensure the format library is accessible to the data.
If you do not want to have variables formatted (or assigned value labels), delete (or comment out) the format assignment statements in the read program at the end of the input statement for each dataset.
For example:
/* COMMENT THESE VALUE LABEL ASSIGNMENTS OUT...
format RESULT93 RES_DONE.;
format RES93BK RES_DONE.;
format RES93B1 RES_DONE.;
...
format MOVER97 MOVER.;
*/
See, also, the item above on using FORMAT libraries with IFLS2.
IFLS3 Updates and Data Notes
The CFS-mini files were updated on Oct 7, 2005. The id variable MKID00 was added to them.
In March 2008, the file B3A_TK3 was updated to correct a problem with TK31AA, TH41A, TK32B, and TK42B. For those who changed jobs, the codes for those items had not been properly merged on—all jobs across the years for a respondent had the same industry and occupation codes. This has now been corrected and the data shows industry/occupation changes over time as reported in B3A_TK3.
On Sept 1, 2009, the HH Book 1 files were updated to correct for a problem with missing HHID00 values. The problem occurred in an update of the data done after 2005 so earlier versions of the Book 1 files would not have this particular problem.
When the HH Book 1 files were updated, a few other identifier issues were updated in the IFLS3 household data. This 2009 updated version of the IFLS3 household data also includes the HHID and PIDLINK changes based on the 2007 field work. These were applied to both the 2007 survey and the 2000 survey. Therefore some users might notice changes to a few records if they re-download the 2000 data and compare it to earlier versions. In particular, for ptrack, we dropped a number of duplicate pidlinks (we dropped one record from each pair) and consolidated the responses to the subsequent questions.
In the proxy respondent’s B3P_MG file, there is no variable MG05D. The variable that should have the information collected in the question MG05D in the Book 3P questionnaire was not output. However, one can infer the response to the question number MG05D (the one that follows question MG08). Since only those who responded “no” or “DK” to question number MG05D would have a value in the variable MG05E, one knows that those with a blank value in MG05E are those who would have had a response of “yes” for MG05D.
In the B4_KL2 contraceptive calendar, it seems that those who were first married after July 1996 (the start of the calendar) have their calendar history start at July 1996 instead of the months/year they were first married. Also, for those respondents, the event_a on the 7/1996 record (i.e., column=1) is their current marital status at IFLS3. As this problem will not be corrected, users will need to review the calendar data for those first married after 7/1996 to see what needs to be revised. The marriage and pregnancy dates in the Book 4 KW and CH sections will help one to figure out what to do for those cases.
There are 84 cases in the B4_CH1 file that have decimals in the value for the length of pregnancy. These cases were ones where survey staff investigated because the originally entered values seemed odd. In the course of reviewing and correcting CH17 computations, it was overlooked that the CH17 value was to be a whole number and the computations should have been rounded to a whole number. Users can decide how to do that rounding.
The COMFAS BK1_D3 file does not have the D2 text field that gives the crop names. A zip file called bk1_d3_d2_bahasa.zip contains a Stata file and a SAS file that contain the D2 variable with the crop names in Bahasa Indonesia. Users can merge the D2 variable onto the BK1_D3 file using COMMID00 D3TYPE D19TYPE as the variables to merge by. Users will need to do their own translations of crop names.
In the IFLS3 B3A_TK2 file, TK20bb (and TK20ab) have the 2-digit occupation code assigned based on the description of primary duties given in TK20b (and TK20a). That occupation code, assigned after fieldwork ended, apparently was given the same name as the occupation category variable seen in the questionnaire, so the occupation code variable values overwrote the occupation category codes. So TK20bb and TK20ab actually provide more detail than the original 10-category variable. As noted in the IFLS Data update/FAQ link on the IFLS data download page, the occupation codes used in all waves of the IFLS are defined in the IFLS1 Appendix A.
In the IFLS3 COMFAS School data, there is no file that has the list of B20 facility types found at the school. The B20/B21 information collected in IFLS3 was inadvertently not output to a data file. In IFLS4 and IFLS5, those data are in the SCHL_B3 file where the variable B3TYPE has the B20 value.
There is an error in the IFLS3 B2_HR1 file where there are insufficient records for HRTYPE=D, which should have 10,269 records instead of just 333. Unfortunately, it seems there was some kind of error back in 2000 and the b2_hr1 data received from Indonesia did not have HRTYPE=D records in it. The 333 in the data are ones that were recoded from “other, specify” responses that were added in later. That error has never been resolved and after all this time it is not likely to be.
IFLS4 Updates and Data Notes
The B3A_TK2 data file was revised on June 30, 2009 to correct for a problem with the occupation codes. The old occ2007 has been replaced with occ07tk2 (code for job in TK20a) and occ07tk3 (code for job in TK20b). The codebook for book B3a data was revised as well.
The B5_RJA2 data file was revised on July 2, 2009 to correct for a problem with identifiers. The HHID07 and PID07 variables had been accidentally omitted and were added when the file was updated.
The IFLS4 data was updated on Sept 25, 2009 to correct some identifier variable problems uncovered by users. Some of the files affected were BK_AR1, B3A_MG1, B3A_KW1, and B3A_KW3. In addition, a problem of a few missing records in B2_COV was corrected as well. Users may wish to re-download the full set of IFLS4 data because of the corrections to PIDLINK.
The B3A_PK2 file was updated on Oct 18, 2009 to correct for a problem with the variable PK18. PK18, which was previously blank, now has values for everyone who answered PK18.
The following variables in the files listed below were updated on Oct 24, 2009 to correct a problem where values were inadvertently missing:
The BK_SC file was updated on Oct 24, 2009. The variables HHID07_9, HHID07, and "X" in BK_SC have been renamed so that merging should be less confusing for the user.
The household codebook files were updated on Oct 24, 2009.
The English and Indonesian CF questionnaires for School are now included with the CF documentation as of Oct 24, 2009.
The IFLS4 Comfas data was updated on Nov 18, 2009. The revisions include the following:
- The bk1_c1, bk1_c3, and bk1_c4 files have been removed and their contents have been restructured and attached to the main bk1 data file. They no longer exist as separate modules.
- CP variables have been added to all files (usually to the "main" book, e.g. - bk1, bk2, etc).
- The minikamades module is now included with the rest of the CF data in the cf07_all data zip files. These are the MKD files. In the near future there will be a separate link to the minikamades data itself.
- Some variable labels have been corrected for inaccuracies. For example, the labels for SC04, SC06, SC09-SC12 in the school file have been updated.
- Comfas file codebooks are now available in the cf07_all_doc zip file. In the near future, there will be a separate link to just the comfas codebooks themselves.
The B3A TK files were updated on May 12, 2010 to fix a problem with the occupation code variables where the values had accidentally been truncated.
The B2_NT2 file was updated on May 12, 2010 to include questions NT04 and NT05 which were inadvertently left off the public release file. The codebook for that file was also updated to reflect that change.
The B5_DLA2 file was updated on May 10, 2011 to add DLA76C1 which notes whether the test scores in DLA76C are from the EBTANAS or the UAN.
The B3A_MG1 file was updated on May 10, 2011 to fill in missing kecamatan, kabupaten, province and country codes where a name had actually been provided in MG03, MG05, MG07 and elsewhere in that file.
The BEK_EK1 and BEK_EK2 files were updated on May 10, 2011 to correct for an error in the EK#X variables. These variables now contain values that show whether the given cognitive test question was answered correctly. In the earlier data version, those variables only indicated whether the question was asked and did not show whether the response was correct or not.
The BUS2_2 file was updated on June 17, 2011. The variables US210A (total cholesterol) and US210B (hdl) were added to the file. Due to the meter that was used, the US210A values are limited to the range 100 to 400, so values of 100 could actually be below 100 and values of 400 could actually be above 400. Likewise the US210B values are limited to the range of 15 to 100 with the same caveats about values of 15 and 100.
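Because values at the ends of those ranges may be censored, users may find it convenient to flag them before analysis. The Stata sketch below is one way to do so; the flag variable names and the value label name are hypothetical, and the lowercase variable names us210a and us210b are assumed.

```stata
* Sketch only: flags measurements at the meter's reporting limits.
use bus2_2, clear

* Total cholesterol: reported range is 100 to 400
gen byte us210a_limit = 0 if !missing(us210a)
replace us210a_limit = -1 if us210a == 100   // true value may be below 100
replace us210a_limit =  1 if us210a == 400   // true value may be above 400

* HDL: reported range is 15 to 100
gen byte us210b_limit = 0 if !missing(us210b)
replace us210b_limit = -1 if us210b == 15    // true value may be below 15
replace us210b_limit =  1 if us210b == 100   // true value may be above 100

label define limitlbl -1 "at lower limit" 0 "within range" 1 "at upper limit"
label values us210a_limit us210b_limit limitlbl
```

The flags can then be used to exclude boundary values or to set up a censored regression.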
The B3A_TK1 file was updated on June 17, 2011. The labels on the TK16F* variables were revised to correctly reflect what the variables represent.
The PUSK file was updated on June 17, 2011. The variable LK11 was added to the file after inadvertently being omitted originally.
Volume 2 of the IFLS4 User's Guide was updated on June 17, 2011 to include a discussion of the total cholesterol and hdl measures added to the BUS2_2 file.
The PRA file (private practitioners) in the community/facility data was updated in June 2014. The variable LK13, type of practitioner, was added as it had been inadvertently dropped when the PRA file was initially created.
The IFLS4 CRP file crp_public_use was updated in April 2017 to include plasma equivalents for comparison to the IFLS5 CRP data. Please see the IFLS5 DBS User's Guide for details. The IFLS4 CRP file also now includes the relevant sampling weight. The variable PCT_CV was dropped, as it was not needed.
In the Book 3a English questionnaire and in the b3a_dl1 file, the code definitions in DL01E for values 25, 26, 27 and 28 are incorrect. The codes for those values should be: 25 = Other South Sumatra, 26 = Banten, 27 = Cirebon, 28 = Gorontalo, 29 = Kutai. There is no 24 code for DL01E. Note that the English Book 3a questionnaire and variable codebook have not been updated to correct this typo. The Bahasa version of the Book 3a questionnaire, which is what was fielded, has always been correct.
The questions TR23-TR28 in the Book 3a English Questionnaire were incorrectly transcribed from the Indonesian version, which is the one used in the field. Thus the variable labels and value labels for TR23-TR28 in the data are also incorrect. What the questions and responses in the B3A_TR data file actually represent are the following:
TR23: Taking into account the diversity of religions in the village, I trust people with the same religion as mine more.
|1 Strongly agree||2 Agree||3 Disagree||4 Strongly disagree|
TR24: How do you feel if someone with a different faith from you lives in your village?
|1 Strongly object||2 Object||3 No objection||4 No objection at all|
TR25: How do you feel if someone with a different faith from you lives in your neighborhood?
|1 Strongly object||2 Object||3 No objection||4 No objection at all|
TR26: How do you feel if someone with a different faith from you rents a room from you?
|1 Strongly object||2 Object||3 No objection||4 No objection at all|
TR27: How do you feel if someone with a different faith from you marries one of your close relatives or children?
|1 Strongly object||2 Object||3 No objection||4 No objection at all|
TR28: What do you think if people who have a different faith from you build a house of worship in your community?
|1 Strongly object||2 Object||3 No objection||4 No objection at all|
Note that the above is what was used in IFLS5 so one can look at Section TR of the IFLS5 Book 3a questionnaire to see what the IFLS4 English version should have looked like for questions TR23-TR28.
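Users who want the IFLS4 data to carry the corrected response labels for TR23-TR28 could relabel the variables themselves. The Stata sketch below is one possible approach; it assumes lowercase variable names tr23-tr28 in b3a_tr, and the label names tr23fix and trobjfix are hypothetical.

```stata
* Sketch only: replaces the value labels on TR23-TR28 with the
* response categories listed above.
use b3a_tr, clear

label define tr23fix  1 "Strongly agree" 2 "Agree" ///
                      3 "Disagree" 4 "Strongly disagree"
label define trobjfix 1 "Strongly object" 2 "Object" ///
                      3 "No objection" 4 "No objection at all"

label values tr23 tr23fix
label values tr24 tr25 tr26 tr27 tr28 trobjfix

* Short variable labels paraphrasing the corrected question wording
label variable tr23 "Trust people of same religion more"
label variable tr24 "Different faith lives in village"
label variable tr25 "Different faith lives in neighborhood"
label variable tr26 "Different faith rents a room from you"
label variable tr27 "Different faith marries close relative"
label variable tr28 "Different faith builds house of worship"
```

Any missing-value codes present in these variables would be left unlabeled by this sketch, so users should tabulate the variables and add labels for those codes as needed.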
There is a typo in Appendix A of the IFLS4 User’s Guide, Volume 2. The b4_kw2 file is the marriage history and b4_kw3 is a set of questions related to desired fertility. If one looks at the variables in the b4_kw3 file and in the b4_kw2 file and compares them to the IFLS4 Book 4 questionnaire Section KW, one can see what each file actually represents.
In the English Book 3a questionnaire Section SI, the wording for SI12=1 should be “still picks option 2” and SI12=2 should be “Switches to option 1”. The Bahasa questionnaire, which is the one that was administered, has the correct wording for SI12=1 and SI12=2.
The value labels in the IFLS4 codebook for b3a_si are incorrect for SI01-SI15 as they do not have the correct rupiah amounts listed. Please use the IFLS4 questionnaire to see the correct rupiah amounts read to the respondent.
In files B3A_KW1 and B3A_KW3, the variable PID07 should be HHID07_9 and the variable PID3a should be PID07. Since most users use PIDLINK to link individual respondent records, the incorrect nature of PID07 in these files does not prevent users from having an individual respondent identifier for linking data. Users should do this renaming (i.e., rename PID07 to HHID07_9 first then rename PID3a to PID07) themselves if they want to do any merging by HHID07/PID07.
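The renaming can be done in a short loop over the two files. The sketch below assumes lowercase variable names and saves the result under a hypothetical _fixed suffix; it only renames when pid3a is present, since a later release of the file may not need the change.

```stata
* Sketch only: order matters, so pid07 is renamed before pid3a.
foreach f in b3a_kw1 b3a_kw3 {
    use `f', clear
    capture confirm variable pid3a
    if _rc == 0 {
        rename pid07 hhid07_9    // misnamed household identifier
        rename pid3a pid07       // true person identifier
    }
    save `f'_fixed, replace      // hypothetical output file name
}
```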
In file BK_AR1, there are two households that appear to have two people listed as head of household (AR02B=1). Below is a description of those households and the correct situations:
In household HHID07=2720700, there is only one head of household in the list of current household members (AR01A=1,2,5). The other person listed with AR02B=1 has AR01A=3. It appears there was a typo when AR02B was entered for that person as they are not the head in the HHID07=2720731 household in which that person is a current household member at IFLS4. That person should have been AR02B=13, not AR02B=1; the AR02=10 value, the person’s age, and the IFLS3 roster data all indicate this is what happened.
In household HHID07=2880742, there’s a typo in AR02B for PID07=5 who is age 11 and has AR01A=2 (returned). Looking at AR10, AR11 and AR12, one sees that the parent of PID07=5 (PIDLINK=288071103) is not a current member of that household. The only other person in the 2880742 household is an adult who is the head. In looking at PTRACK for the child and at the IFLS3 roster, it seems that the child was living with both parents in 2000 but by 2007, the father had died, the mother is in the same household as in 2000 but the child who was living with the mother in the IFLS3 roster now lives with some other family member it seems (possibly a sibling of their parent). Unfortunately, one can’t tell for sure what the relationship is between them to accurately correct the AR02B type for PIDLINK=288071103.
In the IFLS4 Book K Questionnaire, the codes 73, 74, 90, 95, 98 and 99 should be the following:
73. Madrasah Tsanawiyah
74. Madrasah Aliyah
95. Other, specify
98. Don’t know
The IFLS4 Book 4 codebook has the correct value labels.
File BK_AR1 has 3 PIDLINKs that appear as current household members in more than one household: 145104102 is a case where this was the fieldwork pidlink assigned to a person that was supposed to be revised to a different number post-fieldwork but was not, and 271290008 and 276010006 are cases where the same person appears in two different households (note that the pairs have different hhid/pid combinations) which can occur if a family member moves between related households and the interview dates for each household are far apart. By checking the IFLS4 PTRACK file, one sees that these last two people were only interviewed once: 271290008 as hhid07=2712942/pid07=7 and 276010006 as hhid07=2760143/pid07=1.
The variable referred to as PWT93_97_00_07L in the documentation is actually called PW9307L in the IFLS4 PTRACK file.
In the IFLS4 User’s Guide Volume 2 from the June 2011 update, the files listed in Appendix A and Appendix B are from IFLS3, not IFLS4. The earlier version from 2009 had the correct file listing for IFLS4.
In the Book US files, there are 7 PIDLINKS that have duplicates. For 4 of them, they are the same person in two different households (049010003, 073080001, 271290008, and 074194108). For the other three cases (141304105, 145104102, and 314200004), duplicates arise due to a “mis-assigned” PIDLINK. PIDLINK values are finalized after fieldwork, so during fieldwork a “fieldwork pidlink” is assigned. In a few cases, that fieldwork pidlink seems to have not been revised to a new number and thus a PIDLINK already in use is inadvertently assigned to another person. To address this issue in the Book US files, users must use the HHID07/PID07 to link Book US data with data from other IFLS4 files to avoid the “duplicate PIDLINK” notifications. The use of HHID07/PID07 will link the Book US data to the correct person in the IFLS4 household.
In the IFLS4 English Book 3a Section DL questionnaire, the code definitions for DL01E for values above 23 are incorrect and thus those in the codebook are incorrect as well. The Bahasa questionnaire version has the correct code definitions. The correct code definitions are: 25 (other South Sumatra), 26 (Banten), 27 (Cirebon), 28 (Gorontalo) and 29 (Kutai).
In file B3B_MA2, the records with MATYPE=blank can be dropped. They are the ones for MATYPE=CA, CB, and CC when MATYPE=C has MA01=3; for MATYPE=DA and DB when MATYPE=D has MA01=3; for MATYPE=IA, IB and IC when MATYPE=I and MA01=3. Because those records would be blank given MA01=3 on the trigger record, no values were output.
In file B3B_CD2, there appears to be a problem with blank values for CD01TYPE. There are records with CD01=1 (yes, a doctor diagnosed this condition) but CD01TYPE is blank, which is more of a problem than the more numerous cases where CD01TYPE is blank but CD01=3 (didn’t have the condition diagnosed). It is not clear what happened. However, the issue of the blank CD01TYPE values can be resolved by doing the following:
The order of records in the public release data appears to be in the A-I order for each respondent. HOWEVER, to ensure that order remains, the first thing the user must do is create a line number (e.g., in Stata “gen lineno=_n”) and sort the data by PIDLINK and this newly created line number variable. With the data thus sorted, the user can create a record number for each of the 9 records that a PIDLINK has and then fill in CD01TYPE where the 1st record for the PIDLINK is A, the 2nd is B, and so on and the 9th is I.
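The fill-in procedure just described can be sketched in Stata as follows. It assumes cd01type is a string variable, that variable names are lowercase, and that every PIDLINK has exactly nine records; the assert line stops the run if that last assumption fails.

```stata
* Sketch only: fills blank CD01TYPE values from the record order.
use b3b_cd2, clear

* Preserve the released record order before any sorting
gen long lineno = _n

* Number each respondent's records 1-9 in the released order
bysort pidlink (lineno): gen byte recno = _n
by pidlink: assert _N == 9

* char(65) is "A", so record 1 maps to A and record 9 maps to I
replace cd01type = char(64 + recno) if cd01type == ""
```

Before relying on the result, it is worth tabulating recno against the non-blank CD01TYPE values in the original file to confirm that the released order really is A through I.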
Occupation codes were not initially assigned to the job description text fields in the following places: BA13a, BA82a1/BA82a2 in Book 3B, BA82 and BX82 in Book 4, and TK20A, TK20B, BA13a and BA82 in Book Proxy. This may not be corrected in the future due to lack of funding.
The IFLS4 User’s Guide on page 22 has a typo regarding the sets of BPS location codes available for 2000 and 2007 locations. The sentences should be: “For 2000 locations we also provide two sets of codes: 1999 and 2000. For 2007 locations we give two sets of BPS codes: 2000 and 2007.”
In the IFLS4 HTRACK codebook file (ifls2007_hhd_HTRACK.txt) there is a typo regarding the BPS codes used throughout IFLS4. The sentence “We use the 1999 BPS codes as the main set, and these are used consistently throughout IFLS4” should be “We use the 2007 BPS codes as the main set...”.
In the IFLS4 English Book 3A questionnaire Section SI, there is a typo in the choices for SI12. The choices should be: “1. Still picks option 2 SI21”, “2. Switches to option 1”. This will then match the Bahasa Indonesian questionnaire, which is the one that was administered.
In the IFLS4 b4_ch1 file there are a few cases where CH27 (child’s roster number) is incorrect. Below are the corrections:
replace ch27=4 if hhid07=="0090241" & pid07==2 & ch05==2
replace ch27=9 if hhid07=="1291200" & pid07==8 & ch05==1
replace ch27=10 if hhid07=="1291200" & pid07==8 & ch05==2
replace ch27=5 if hhid07=="2700700" & pid07==2 & ch05==3
replace ch27=13 if hhid07=="3140500" & pid07==10 & ch05==3
replace ch27=9 if hhid07=="0461000" & pid07==8 & ch05==3
replace ch27=3 if hhid07=="0641943" & pid07==2 & ch05==1
replace ch27=3 if hhid07=="1841431" & pid07==2 & ch05==1
replace ch27=4 if hhid07=="2120641" & pid07==2 & ch05==1
replace ch27=6 if hhid07=="2571541" & pid07==3 & ch05==1
replace ch27=6 if hhid07=="2581900" & pid07==3 & ch05==1
In the IFLS4 b3b_co1 file, there is no co02 variable which shows whether the respondent reported the date correctly and, if not, how much of it they got correct. Users will need to create their own by cross-checking the co01 day/month/year responses to the correct date in the co02 day/month/year fields.
In the IFLS4 mini-CFS data, there are multiple records with the same MKID07. It appears there was some problem with the MKID07 assignment in the data such that different mover community EAs are represented by the same MKID07 even though some are not even in the same kecamatan. It is not clear what happened in these cases. For some of the “duplicates”, if one uses MKID07, province, kabupaten and kecamatan codes to define the mover community EAs and uses that combination to link to the IFLS mover household, then a unique link can be made for almost all cases. There still remain 17 pairs of MKID07 that have the same province/kabupaten/kecamatan codes for which it is not currently possible to link the mini-CFS and household data.
In the IFLS4 HTRACK file, the MOVER07=2 group appears to have inadvertently included a large set of those households who did not change desas between the 2000 and 2007 interviews.
Those MOVER07=2 households that have HHID07=HHID00 and COMMID00=COMMID07, meaning they are not splits, and the Book 3a migration history shows no moves since 2000 (i.e., MG20B1=1 and MG20C=0) for the household head are households that did not move between IFLS3 and IFLS4 (i.e., MOVER07 should have been 0).
Such households that had just one move (MG20B1=1 and MG20C=1) are households that moved within the desa. If there is more than one move in the migration history, it is not clear if the household left and returned to the same dwelling or just to the same community, so relative to the IFLS3 residence, these could be MOVER07=0 or MOVER07=1. Split-offs (i.e., HHID07 differs from HHID00) with COMMID00=COMMID07 likely moved within the desa (i.e., MOVER07 should have been 1).
The trickier cases are where the COMMID07 and COMMID00 differ and MOVER07=2 because COMMID for non-original IFLS sample communities represent more of a sub-district level than an EA-level like the original IFLS sample COMMID. Users will have to use their own judgement in these cases.
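The clearer cases above could be recoded along the following lines in Stata. This is only a sketch under several assumptions: hh_move is a hypothetical household-level file that the user has already built by attaching the household head's MG20B1 and MG20C from Book 3a to HTRACK, the new variable mover07_rev is hypothetical, and assigning code 1 to the one-move case is an inference from the description of within-desa moves. The ambiguous cases (multiple moves, or differing COMMID values) are deliberately left at MOVER07=2.

```stata
* Sketch only: hh_move.dta is a hypothetical file, one record per
* IFLS4 household, with HTRACK variables plus the head's mg20b1/mg20c.
use hh_move, clear
gen byte mover07_rev = mover07

* Not a split-off, same community, no moves since 2000: did not move
replace mover07_rev = 0 if mover07 == 2 & hhid07 == hhid00 ///
    & commid00 == commid07 & mg20b1 == 1 & mg20c == 0

* Not a split-off, same community, one move: moved within the desa
* (code 1 is assumed here)
replace mover07_rev = 1 if mover07 == 2 & hhid07 == hhid00 ///
    & commid00 == commid07 & mg20b1 == 1 & mg20c == 1

* Split-off in the same community: likely moved within the desa
replace mover07_rev = 1 if mover07 == 2 & hhid07 != hhid00 ///
    & commid00 == commid07

* Compare the original and revised codes
tab mover07 mover07_rev, missing
```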
In the IFLS4 B5_RJ3 file, the variable labels for the RJA28 variables are in error. The rja28a*f is polio 3, rja28a*n is polio 4, rja28a*g is DPT1, rja28a*h is DPT2, rja28a*i is DPT3, rja28a*j is measles, rja28a*k is HepB1, rja28a*l is HepB2 and rja28a*m is HepB3. This is only a problem in IFLS4 as the variable labels for RJA28 are correct in the other waves in which they appear.
The IFLS4 B4_BX6 file is incomplete. The B4_BX6 file only has a handful of the variables it should have. For some reason this file did not get the BX63c-BX70 or BX97-BX90 variables. This issue has never been resolved. Also, the variable BX94 is misnamed and is actually question BX84 as there is no BX94 question.
Blood pressure and pulse measurements in IFLS4 were obtained with an Omron digital measuring device.
There is a problem with the occupation codes in B3A_TK3 as they do not reflect changes in occupation codes for those who did change occupations during the given job history period. We are in the process of correcting the problem and will update the data when ready. At that time a notice of the revised data will be added to the IFLS4 Data Updates section above.
IFLS5 Updates and Data Notes
The IFLS5 data was updated on April 14, 2016. The HTRACK and PTRACK files were updated to include weights. The MiniKamadas data and codebook were added. The updated HH and CF codebooks have improved layout and formatting. The User's Guide and Questionnaires are now in PDF format instead of Word format. Those who downloaded IFLS5 before April 14, 2016 may want to re-download the data and documentation zip files to replace the earlier version.
The IFLS5 data was updated on June 15, 2016 with the following changes/additions:
- Bracketed values in B2 files and B3A files have been extensively rebuilt to correct errors in the original data version. This included better labeling and corrected formats (value labels).
- Interviewer data and questionnaires for IFLS5 and IFLS4 are available as a standalone download.
- The numeracy results document using the adaptive number series test in Book 3b, section COB, is available as a standalone download.
- In file BUS_US, the labels for diastolic and systolic blood pressure (US07a*, US07b* and US07c*) have been corrected as the labels were initially swapped.
- BPS geographic codes for 2014 (province, kabupaten and kecamatan) have been crosswalked to the 2013 codes that were used for the SC variables on the field data. Note that it was not always possible to determine which household should receive the 2014 codes in cases of kecamatan splits. In any event, the number of codes that needed revision was tiny, about 0.24%.
- A problem with the primary occupation code having been overwritten by the secondary code in B3A_TK2 has been fixed.
- An updated version of Volume 2 of the Users Guide is now on the download page.
- In file B3B_AK1, all records with AK01=3 now have non-missing pidlinks.
- Incorrect pidlink/pid combinations in PTRACK have been patched.
- Variable RJ00 has been added to book B3B_RJ0.
- All Stata files now retain value labels when appropriate.
- All codebooks have been updated to reflect the changes above.
The IFLS5 data was updated on April 12, 2017 with the following changes/additions, all of which have also been incorporated into the codebooks:
- Variables B1 to B15K are now included in the SCHL file.
- In the file PUSK_D3, variable D9, which is the name of the puskesmas employee, has been dropped from the public use files.
- The name of the interviewer (PWWCRNM_B) has been dropped from file SCHB_COV.
- Variable labels have been added for S45-S54E, S56, S60a-S60n, and S62a-S62c in the file BK2.
- In the file BK_SC1, the values for SC21X have been recoded to 1, 11-99 for consistency with the questionnaire. The values were originally released as 0, 2-6.
- The file B3B_KK3 now has proper value labels for the variable KK3TYPE for the activities KA, KB and KC.
- The file BK1 now includes the I13 school counts to parallel the J26 count variables for health facilities.
- TK19 industrial sector codes have now been properly assigned in the B3A_TK2 file.
- DBS weights for 2014 and 2007 have been added to PTRACK file.
- Both FASCODE and FCODE now appear in all COMFAS facility files.
- The file PRA now includes the variable LK13.
- Codebooks have been edited to correctly refer to "IFLS5" instead of "IFLS4."
- Bahasa Indonesian versions of the IFLS5 questionnaires are now available.
- A crosswalk file between the province/kabupaten/kecamatan codes for 1998, 1999, 2000, 2007, and 2014 is now available. That file is named kec_9899000714 in the IFLS5_BPS_2014_codes.zip file.
- The IFLS5 Dried Blood Spot (DBS) data is now available. Please read the IFLS5_Dried_Blood_Spot_User_Guide.pdf for details about the data.
The IFLS5 DBS data and documentation were updated on April 18, 2018. The updated DBS data file contains a revised HbA1c whole blood equivalent measure called a1c_rev which replaces the previous measure a1c_wb_equi. The new HbA1c whole blood measure corrects for an upward bias in the assay test detected by a subsequent validation study. Please read the revised IFLS5 DBS User’s Guide for details.
The IFLS5 Book 4 b4_ch1 file was updated on April 18, 2018. An error was discovered in the IFLS5 b4_ch1 birth history file which had resulted in an incorrect merge in combining data collected in the first portion of Section CH and the latter portion. Users can correct the problem themselves or can download the revised data file. Please see the IFLS5 Data Updates and Data Notes on the IFLS Data Update, Tips and FAQs web page.
The IFLS5 data file B3A_DL1 shows codes 24, 25 and 26 in variable DL01E. These codes correspond to the codes 25, 26 and 27, respectively, that are shown in the IFLS5 Book 3A English questionnaire. The value labels in the B3A_DL1 file are correct. Users need to use the codes shown in the Book 3A variable codebook file for DL01E and not the codes shown in the questionnaire itself.
Unlike earlier IFLS waves, as noted in the IFLS5 User’s Guide, in IFLS5 the proxy respondents’ survey responses are included with the regular respondents’ survey responses and are not in a separate set of files. The Book cover files contain a variable that identifies those who are proxy respondents. Users should be sure to include that variable on all their analytic files constructed from IFLS5 data.
In IFLS5 PTRACK, there are 3 pidlink values that appear twice: 017132102, 168284105, and 274090007.
The 2 records for 017132102 are indeed duplicates, so you can just keep one of them.
For 274090007, drop the record with hhid14=2740941. If you look at the roster of 2740941 in the bk_ar1 file, you’ll see that 274090007 is not in that household but is in the hhid14=2740900 household.
For 168284105, these records are for the same person. If you look at the roster data in bk_ar1 for both hhid14=1682811 and 1682841, this person is in both households. However, this kid does not appear in the IFLS5 Book 5 data, even though PTRACK says they were a partial complete, nor is this kid in the Book US data even though PTRACK says they completed Book US. Given the kid was not interviewed under either HHID14 household, you could just pick one of these 2 to keep. I would likely keep record with HHID14=1682811 since the BK_AR1 roster has the kid with an AR01a=2 (was in another household in previous wave) since the kid was in IFLS4. The HHID14=1682841 record in the BK_AR1 roster file has AR01A=5 for the kid, which implies the kid was not around in an earlier wave, which is not true.
In cases with pidlink duplicates in PTRACK with different HHID values, it really helps to look at the rosters as well as at any of the books that the person was said to have done. This can often help sort out which record to use.
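The three PTRACK duplicates discussed above could be resolved along these lines in Stata. The sketch assumes that pidlink and hhid14 are string variables with lowercase names, and it follows the suggestion above of keeping the HHID14=1682811 record for 168284105; the helper variables dup and copyno are hypothetical.

```stata
* Sketch only: resolves the three duplicated pidlinks in IFLS5 PTRACK.
use ptrack, clear

* List the duplicated pidlinks for review
duplicates tag pidlink, gen(dup)
list pidlink hhid14 pid14 if dup > 0

* 017132102: identical records, keep the first
bysort pidlink: gen byte copyno = _n
drop if pidlink == "017132102" & copyno == 2

* 274090007: not on the roster of household 2740941
drop if pidlink == "274090007" & hhid14 == "2740941"

* 168284105: keep the HHID14=1682811 record
drop if pidlink == "168284105" & hhid14 == "1682841"

drop dup copyno
isid pidlink    // errors out if any duplicate pidlinks remain
```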
In the English Book 3a questionnaire Section SI, the wording for SI12=1 should be “still picks option 2” and SI12=2 should be “Switches to option 1”. The Bahasa questionnaire, which is the one that was administered, has the correct wording for SI12=1 and SI12=2.
In the IFLS5 User’s Guide Appendix B, it lists a file called BK1_I. That file was not created, unlike in IFLS4. Users can go to the IFLS5 SAR file to obtain the correct facility counts.
You can get the count of schools that are listed in the SAR by type (primary, jr high, sr high) from the sar_cov file but this could include schools that are now closed. The individual school records in the SAR tell you if the school is currently open or closed. Note that in the sar_cov file, you need to link the IDW variable to that in, say, bk1b_cov, to get COMMID14, which is not on the sar_cov file it seems. The COMMID14 variable is on the sar file of individual facilities.
For information collected on individual schools in the SAR, to identify schools in the sar file, you need to look at the 4th digit in the FASCODE variable where values of 6, 7 and 8 mean primary, jrh and smh respectively. For those records, the variable X14B1 tells you whether the school is public or private. To see what the X14B1 codes are for schools in the SAR, you need to look at the SAR questionnaire section for schools which shows what the codes 1-11 represent for schools. The SAR variable codebook file only lists the X14B1 code definitions relative to health facilities.
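As an illustration, school records could be picked out of the sar file in Stata as shown below. The sketch assumes fascode is stored as a string variable (so that leading zeros are preserved) and that variable names are lowercase; the new variables fastype and school_level and the label name schlvl are hypothetical.

```stata
* Sketch only: identifies schools from the 4th digit of FASCODE.
use sar, clear

* Extract the 4th character of the facility code
gen str1 fastype = substr(fascode, 4, 1)

* 6 = primary, 7 = junior high, 8 = senior high
gen byte school_level = .
replace school_level = 1 if fastype == "6"
replace school_level = 2 if fastype == "7"
replace school_level = 3 if fastype == "8"
label define schlvl 1 "Primary" 2 "Junior high" 3 "Senior high"
label values school_level schlvl

* Public/private status for schools is in X14B1
tab school_level x14b1 if !missing(school_level), missing
```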
There was a problem in the original IFLS5 B4_CH1 file due to a faulty merge which resulted in the data for variables CH12 to CH41 being attached to the wrong pregnancy records, thus one sees CH25=1 (child is still alive) for records with CH06=1,3, or 4 (non-live pregnancy outcomes). If the user has a file with a creation date in April 2018, then the user has a corrected version of the file. If the user has an older file, the user can either download the revised file from the IFLS data download page (users can just download the IFLS5 Book 4 data zip file only and do not have to re-download all the IFLS5 household data files), or the user can correct the problem themselves by doing the following:
Split the B4_CH1 file into two files: 1) one with just the following variables: PIDLINK CH05 CH06 CH06A CH07_ID CH08 CH09X CH09DAYT CH09MTH CH09YR CH10A CH10B CH17 CH17X AGE; and 2) one with all the remaining variables. In the first smaller file, create CH05A=CH05, then merge the two files together by PIDLINK CH05A. This will match the variables for the CH12-CH41 questions to the correct child in CH05.
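The split-and-merge steps above can be sketched in Stata as follows. This is only an illustration: it assumes the older b4_ch1 file is in the working directory, that variable names are lowercase, that CH05A is already among the remaining variables in the file, and that PIDLINK and CH05A uniquely identify records in both pieces. The output file name b4_ch1_fixed is hypothetical.

```stata
* Sketch only: re-attaches CH12-CH41 to the correct pregnancy record.
use b4_ch1, clear

* File 1: pregnancy-level variables, keyed by pidlink and ch05
preserve
keep pidlink ch05 ch06 ch06a ch07_id ch08 ch09x ch09dayt ch09mth ///
     ch09yr ch10a ch10b ch17 ch17x age
gen ch05a = ch05
tempfile first
save `first'
restore

* File 2: all remaining variables (pidlink is kept as the merge key)
drop ch05 ch06 ch06a ch07_id ch08 ch09x ch09dayt ch09mth ///
     ch09yr ch10a ch10b ch17 ch17x age

* Recombine on pidlink and ch05a
merge 1:1 pidlink ch05a using `first'
tab _merge
save b4_ch1_fixed, replace
```

After the merge, one quick check is to tabulate CH25 against CH06 to confirm that CH25=1 no longer appears on records with non-live pregnancy outcomes.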
In the IFLS5 English Book 3B questionnaire, item KK03kb is missing. However, it is present in the Bahasa questionnaire and there is a KK3TYPE=KB record in the b3b_kk3 data file. The KK03kb / KK3TYPE=KB item is “to use the toilet, including getting up and down” as listed in the Book 3b variable codebook.
In the IFLS5 English Book 3A questionnaire Section SI, there is a typo in the choices for SI12. The choices should be: “1. Still picks option 2 SI21”, “2. Switches to option 1”. This will then match the Bahasa Indonesian questionnaire, which is the one that was administered.
There are 13 sets of duplicate HHID14/PID14 combinations in IFLS5 PTRACK, only one pair of which has the same pidlink and thus is a duplicate (HHID14=0171321/PID14=2). The other 12 sets have two different pidlink values with the same HHID14/PID14 combination. To correct this problem, users need to revise PID14 for the following PIDLINK values in PTRACK. Below is Stata code to make those changes:
replace pid14=4 if pidlink=="085110004"
replace pid14=6 if pidlink=="110030007"
replace pid14=4 if pidlink=="140180004"
replace pid14=2 if pidlink=="182080002"
replace pid14=4 if pidlink=="228210004"
replace pid14=3 if pidlink=="233300003"
replace pid14=4 if pidlink=="234070004"
replace pid14=1 if pidlink=="241220001"
replace pid14=2 if pidlink=="244100002"
replace pid14=5 if pidlink=="095114106"
replace pid14=7 if pidlink=="274090007"
replace pid14=14 if pidlink=="272070014"
An exit form was introduced in IFLS5 to gather information on those who had died since the last interview. When linked to PTRACK using PIDLINK, one sees cases where AR01A_14=3 (not in household) and not 0 (deceased). When the roster was done, the roster respondent did not know that the individual who was no longer a household member had died since leaving the roster respondent’s household. In the process of tracking the individual, it was determined that the individual was now deceased and needed an exit form to be completed. AR01A=3 was not updated to AR01A=0 in the IFLS5 roster as “deceased” was not the response given by the roster respondent. Thus, users interested in identifying IFLS respondents who died after the household they were in was last interviewed cannot rely on AR01A alone in the roster or in PTRACK, and must also consult the exit form data.
There is no HHSIZE variable in IFLS5. The HHSIZE variable seen in earlier IFLSes is just the number of records in the BK_AR1 file. It is not the actual household size with respect to the number of people living in the current household. This is why HHSIZE was dropped in IFLS5. As described in the IFLS user’s guides, those currently living in the household are those with values of AR01a=1 and AR01a=5. Where they appear, the additional codes of 2, 4 and 11 are also to be kept as they also indicate current household members. Thus, to get a true “household size” variable, one needs to add up the records with these AR01a values.
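A household size variable along these lines could be built in Stata as sketched below. The variable names current and hhsize14 are hypothetical, and the sketch assumes the roster variables are named ar01a and hhid14 in lowercase in the IFLS5 bk_ar1 file.

```stata
* Sketch only: counts current household members per household.
use bk_ar1, clear

* Current members: AR01a = 1 or 5, plus 2, 4 and 11 where they appear
gen byte current = inlist(ar01a, 1, 2, 4, 5, 11)

* Sum the current members within each household
bysort hhid14: egen hhsize14 = total(current)
```

The result is repeated on every roster record of the household, so users who need a household-level file can keep one record per HHID14 afterwards.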
In all the IFLS5 COMFAS files where they appear, the variable labels for LK02_14_14 and LK03_14_14 are incorrect. The variable label for LK02_14_14 should read “2014 Kabupaten, 2014 BPS code” and for LK03_14_14 the label should read “2014 Kecamatan, 2014 BPS code”.
In the IFLS5 HTRACK file, there are MOVER14=0 cases that have a different MKID in 2014 than in 2007 (i.e., the EA changed) while having the same COMMID in 2007 and 2014. These may actually be cases where the household moved within the desa, but if the COMMID is not one of the original 312, then it might also be a move within the kecamatan, since for movers COMMID is more like a subdistrict. There are also cases with MOVER14=0 where the COMMID differs between 2007 and 2014, which suggests those are likely moves outside the kecamatan. One can check the 2007 and 2014 province/kabupaten/kecamatan codes to see what kind of move it might have been.
If the odd MOVER14=0 cases are all split-offs, then it seems that MOVER14 was not reset for them to reflect their move out of the 2007 household.
No cholesterol measurements were taken in IFLS5. The hand-held meter used in Wave 4 was not trusted, and DBS for cholesterol has had problems in other studies.
In IFLS5 file B4_KW2, the variable label on KW27b is incorrect as it is the number of daughters still desired, not sons. The number of sons still desired is in KW27a as its variable label correctly shows.
In IFLS5 file B3A_MG2, the codes in MG21e do not represent countries. The value 62 means “in Indonesia” and all the other codes (0, 90, 96, 98) simply mean “outside Indonesia”. The actual country names were not released due to privacy considerations.
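Because only the distinction between 62 and every other code is meaningful, MG21e can be collapsed to a simple indicator. The Stata sketch below assumes that b3a_mg2.dta is in the working directory and that mg21e is stored as a lowercase numeric variable. The name mg21e_in_indonesia is hypothetical.

```stata
use b3a_mg2, clear

* 1 = in Indonesia (code 62), 0 = outside Indonesia (any other code).
* Left missing where mg21e is missing, e.g. skipped by the questionnaire.
gen byte mg21e_in_indonesia = (mg21e == 62) if !missing(mg21e)
```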
The IFLS5 English CF questionnaire for PUSKESMAS C has an error. The list of items for C1TYPE for which questions C06-C10b are asked lists only item 32 (Anti-TBC (short-term, e.g. Rifampicin, Ethambutol, Isoniazid/INH)), when it should have shown that each drug was asked about separately, as seen in the PUSK_C1 data file, where the variable C1TYPE has values of 32c, 32d, 32e, and 32h. These item codes appear in the Bahasa Indonesia CF questionnaire for PUSKESMAS C, which is the version that was administered to respondents. The item codes mean: 32c=Rifampicin, 32d=Ethambutol, 32e=Isoniazid/INH, and 32h=Riafter (Rif + Iso + Pyran).
In the IFLS5 DBS zip files (hh14_dbs_dta.zip and hh14_dbs_tpt.zip), the file called dbs_ifls4_public_use.dta is incorrect and should be ignored. The IFLS4 DBS zip files (crp_dta.zip/crp_xpt.zip) have the correct DBS data for IFLS4.
There is a problem with the occupation codes in B3A_TK3 as they do not reflect changes in occupation codes for those who did change occupations during the given job history period. We are in the process of correcting the problem and will update the data when ready. At that time a notice of the revised data will be added to the IFLS5 Data Updates section above.