Data source and study population
The Korean National Health Insurance Service (NHIS), a single compulsory social medical insurer operated by the government, has established a public database called the National Health Information Database (NHID). The NHID contains all records of healthcare utilization (including diagnosis and prescription records), the eligibility database (including sociodemographic variables), and the national health screening database [17]. The national health screening program consists of a questionnaire about previous history, family history, and lifestyle, together with anthropometric measurements and laboratory tests, and is provided biennially to all adults older than 40 years [18]. The NHIS-Senior database is a 10% random sample of the entire elderly population aged ≥ 60 years in the NHID in 2002 [19]. All individuals in the NHIS-Senior cohort were followed up retrospectively from 2002 to 2015, except for those who were not eligible for national health insurance.
We collected data from the NHIS-Senior database for all individuals who participated in the national health screening program from 2002 to 2005. Among the 215,875 participants, we excluded 5798 individuals who died before the index date and 6729 individuals who had received a diagnosis of any type of dementia before the index date. In addition, we excluded 22,061 individuals who had been diagnosed with any type of cancer and 1966 individuals who had a history of stroke before the index date. We then excluded 20,042 participants aged < 65 years in 2006 and 10,745 individuals with missing variables. The final study population consisted of 148,534 individuals (Fig. 1). All participants were followed from the index date, January 1, 2006, to the date of AD diagnosis or December 31, 2015, whichever was earlier.
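The exclusion steps above can be checked arithmetically. The following is a minimal sketch that assumes the reported counts are applied sequentially and do not overlap; the step labels and the function name `apply_exclusions` are illustrative and are not taken from the study code.

```python
# Counts are copied from the text; labels and names are illustrative only.
INITIAL_PARTICIPANTS = 215_875
EXCLUSIONS = [
    ("died before the index date", 5_798),
    ("dementia diagnosis before the index date", 6_729),
    ("any cancer diagnosis before the index date", 22_061),
    ("history of stroke before the index date", 1_966),
    ("aged < 65 years in 2006", 20_042),
    ("missing variables", 10_745),
]


def apply_exclusions(initial, exclusions):
    """Return (label, remaining) pairs after each sequential exclusion step."""
    remaining = initial
    trail = []
    for label, n_excluded in exclusions:
        remaining -= n_excluded
        trail.append((label, remaining))
    return trail


trail = apply_exclusions(INITIAL_PARTICIPANTS, EXCLUSIONS)
final_n = trail[-1][1]  # size of the final study population
```

Under these assumptions, `final_n` reproduces the reported final study population of 148,534 individuals.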

Figure 1. Flow chart of the study population.
The Institutional Review Board of the Veterans Health Service Medical Center (Seoul, Republic of Korea) approved this study (IRB no. BOHUN 2021-01-059-001), and waived the requirement for obtaining written informed consent because the NHID provides anonymized and de-identified data. All research was performed in accordance with the 1964 Declaration of Helsinki and its later amendments.
Outcome and covariable definitions
AD was diagnosed on the basis of International Classification of Diseases, 10th revision (ICD-10) codes F00 or G30. We defined AD as cases in which the diagnosis and a prescription of anticholinesterases (donepezil, rivastigmine, and galantamine) or an N-methyl-d-aspartate (NMDA) receptor antagonist (memantine) were claimed together on the same day [20,21,22]. To properly claim the prescription of anti-dementia drugs, physicians should document evidence of cognitive decline according to the following criteria: (1) Mini Mental State Examination (MMSE) score ≤ 26 and (2) either Clinical Dementia Rating (CDR) ≥ 1 or Global Deterioration Scale (GDS) score ≥ 3 [23].
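The case definition above can be sketched as a simple rule over claims records. This is an illustration only: the record layout (dictionaries with `date`, `icd10_codes`, and `drugs` keys) and the function names are assumptions, not the NHID schema or the authors' code.

```python
from datetime import date

AD_ICD10_PREFIXES = ("F00", "G30")
ANTI_DEMENTIA_DRUGS = {"donepezil", "rivastigmine", "galantamine", "memantine"}


def first_ad_date(claims):
    """Earliest day with both an AD diagnosis code and an anti-dementia drug.

    `claims` is a hypothetical list of dicts with keys
    'date', 'icd10_codes', and 'drugs'.
    """
    diagnosis_days, prescription_days = set(), set()
    for claim in claims:
        if any(code.upper().startswith(AD_ICD10_PREFIXES)
               for code in claim["icd10_codes"]):
            diagnosis_days.add(claim["date"])
        if any(drug.lower() in ANTI_DEMENTIA_DRUGS for drug in claim["drugs"]):
            prescription_days.add(claim["date"])
    same_day = diagnosis_days & prescription_days
    return min(same_day) if same_day else None


def meets_documentation_criteria(mmse, cdr=None, gds=None):
    """MMSE <= 26 and (CDR >= 1 or GDS >= 3), as stated in the text."""
    staged = (cdr is not None and cdr >= 1) or (gds is not None and gds >= 3)
    return mmse <= 26 and staged


example_claims = [
    {"date": date(2008, 3, 1), "icd10_codes": ["G30.9"], "drugs": []},
    {"date": date(2009, 5, 2), "icd10_codes": ["F00.1"], "drugs": ["Donepezil"]},
    {"date": date(2010, 1, 1), "icd10_codes": ["G30"], "drugs": ["memantine"]},
]
ad_date = first_ad_date(example_claims)
```

In this made-up example, the 2008 claim carries a diagnosis code without a drug and is therefore not counted; the first qualifying day is the 2009 claim.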
Participants were categorized by BMI (kg/m²) as underweight (< 18.5), normal weight (18.5–22.9), overweight (23.0–24.9), or obese (≥ 25.0) using the WHO Western Pacific Region guideline strata [24]. The underweight population was further categorized as showing mild (17.0–18.4), moderate (16.0–16.9), or severe (< 16.0) thinness [25]. In the national health screening program, participants responded to questionnaires regarding their past medical history and health behaviors, such as current smoking, current alcohol drinking, and regular exercise (at least once per week). Because health insurance premiums in the NHIS are determined by income level or property holdings, we defined the low-income population as individuals whose health insurance premiums were below the lowest decile for the insured or who were medical aid beneficiaries.
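The BMI strata above translate directly into threshold checks. The sketch below assumes half-open intervals (for example, normal weight is 18.5 ≤ BMI < 23.0) so that values between the published one-decimal bounds are still classified; the function names are illustrative.

```python
def compute_bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_category(bmi):
    """WHO Western Pacific Region strata as listed in the text."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 23.0:
        return "normal weight"
    if bmi < 25.0:
        return "overweight"
    return "obese"


def thinness_grade(bmi):
    """Grade of thinness within the underweight group, else None."""
    if bmi >= 18.5:
        return None
    if bmi < 16.0:
        return "severe"
    if bmi < 17.0:
        return "moderate"
    return "mild"


# Hypothetical participant: 70 kg, 1.75 m
example_category = bmi_category(compute_bmi(70.0, 1.75))
```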
Comorbidities such as hypertension, diabetes, and dyslipidemia were defined by prescription of medication for the disease under the respective ICD-10 codes (I10–13 and I15 for hypertension, E11–14 for diabetes, and E78 for dyslipidemia) at least twice per year before the index year, or by meeting the respective diagnostic criteria in the national health screening results (systolic blood pressure ≥ 140 mmHg or diastolic blood pressure ≥ 90 mmHg for hypertension; fasting blood glucose ≥ 126 mg/dL for diabetes; and total cholesterol ≥ 240 mg/dL for dyslipidemia). Cardiovascular disease (CVD) was identified from the self-reported questionnaire item on a physician's diagnosis of heart disease in the national health screening program.
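As a rough illustration of the comorbidity definitions, each condition is flagged when either the claims criterion or the screening criterion is met. The argument `rx_claims` (the number of qualifying prescription claims under the respective ICD-10 codes within one year before the index year) is an assumed, pre-computed input, and the function names are hypothetical.

```python
MIN_RX_CLAIMS_PER_YEAR = 2  # "at least twice per year" in the text


def has_hypertension(rx_claims, systolic_mmhg, diastolic_mmhg):
    """Claims criterion, or SBP >= 140 mmHg, or DBP >= 90 mmHg."""
    return (rx_claims >= MIN_RX_CLAIMS_PER_YEAR
            or systolic_mmhg >= 140
            or diastolic_mmhg >= 90)


def has_diabetes(rx_claims, fasting_glucose_mg_dl):
    """Claims criterion, or fasting blood glucose >= 126 mg/dL."""
    return rx_claims >= MIN_RX_CLAIMS_PER_YEAR or fasting_glucose_mg_dl >= 126


def has_dyslipidemia(rx_claims, total_cholesterol_mg_dl):
    """Claims criterion, or total cholesterol >= 240 mg/dL."""
    return rx_claims >= MIN_RX_CLAIMS_PER_YEAR or total_cholesterol_mg_dl >= 240
```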
For each participant, the primary outcome was the occurrence of AD between January 1, 2006, and December 31, 2015, and the number of person-years of follow-up was recorded.
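Follow-up time per participant can be sketched as below. The conversion of days to years using 365.25 days per year is an assumption, as is the function name; the text specifies only the index date, the end of follow-up, and that follow-up stops at AD diagnosis if earlier.

```python
from datetime import date

INDEX_DATE = date(2006, 1, 1)
END_OF_FOLLOW_UP = date(2015, 12, 31)
DAYS_PER_YEAR = 365.25  # assumed conversion factor


def follow_up(ad_date):
    """Return (event, person_years) for one participant.

    `ad_date` is the date of AD diagnosis, or None if AD was never diagnosed.
    """
    if ad_date is not None and ad_date < INDEX_DATE:
        # Prevalent dementia cases were excluded before the index date.
        raise ValueError("AD diagnosis precedes the index date")
    if ad_date is not None and ad_date <= END_OF_FOLLOW_UP:
        event, end = True, ad_date
    else:
        event, end = False, END_OF_FOLLOW_UP
    return event, (end - INDEX_DATE).days / DAYS_PER_YEAR


event, person_years = follow_up(date(2010, 1, 1))  # hypothetical diagnosis date
```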
Statistical analysis
The baseline characteristics of the participants were compared according to BMI categories using ANOVA for continuous variables and the chi-square test for categorical variables. Data are presented as the mean (standard deviation) or number (%). The AD incidence rates were calculated as the number of events divided by the total person-years (PY) of follow-up and are expressed per 1,000 PY. Cox proportional hazards regression analyses were performed to obtain hazard ratios (HRs) and 95% confidence intervals (CIs) of AD based on baseline BMI categories. The risk of AD was analyzed after adjusting for possible confounding factors. Model 1 was adjusted for age and sex, and Model 2 was additionally adjusted for lifestyle factors (smoking status, alcohol consumption, and regular exercise) and low-income status. Model 3 was further adjusted for a history of hypertension, diabetes, dyslipidemia, and CVD. Stratified analyses were performed by dividing the participants into subgroups according to baseline age group (65–74 or ≥ 75 years), sex, low-income status, current smoking, alcohol consumption, regular exercise, underlying hypertension, diabetes, dyslipidemia, and history of CVD, and interactions between subgroups were tested. In addition, a sensitivity analysis was performed using multiple imputation by fully conditional specification to handle missing values. Statistical analyses were performed using SAS Enterprise Guide (version 7.1; SAS Institute, Cary, NC, USA) and Stata software (MP, version 17.0; StataCorp, College Station, TX, USA), and statistical significance was defined as two-sided p < 0.05.
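For reference, the incidence rate per 1,000 PY can be computed as in the following minimal sketch. The function name and the example numbers are hypothetical and are not results from this study.

```python
def incidence_rate_per_1000_py(events, person_years):
    """Number of events per 1,000 person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000 * events / person_years


# Hypothetical numbers: 50 events over 10,000 person-years
example_rate = incidence_rate_per_1000_py(50, 10_000)
```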