Meta-GWAS identiﬁes the heritability of acute radiation-induced toxicities in head and neck cancer

Background and purpose: We aimed to the genetic components and susceptibility variants associated with acute radiation-induced toxicities (RITs) in patients with head and neck cancer (HNC). Materials and methods: We performed the largest meta-GWAS of seven European cohorts (n = 4,042). Patients were scored weekly during radiotherapy for acute RITs including dysphagia, mucositis


Introduction
Radiotherapy (RT) is often used to treat head and neck cancer (HNC), but it causes acute and late radiation-induced toxicities (RITs). Acute RITs usually appear during treatment and resolve 90 days after commencing RT; the most common acute RITs are dysphagia, mucositis, and xerostomia [1]. Large patient-topatient variability exists regarding RITs. Although some is due to patient-related characteristics (e.g., age) and other treatments (e.g., chemotherapy), underlying genetic susceptibility probably explains much of the variability [2]. Studies suggested a 58 % to 78 % heritability for cell response to irradiation [3][4]. However, genetic susceptibility, heritability, and predictability of for RITs have not been investigated in HNC patients.
An earlier experimental study showed radiation hypersensitivity in individuals with genetic disorders, such as ataxiatelangiectasia, Nijmegen breakage syndrome, Fanconi anaemia and DNA ligase IV deficiency [5]. Two candidate gene studies showed associations between single nucleotide polymorphisms (SNPs) in genes involved in DNA damage response (XRCC1, RAD51, NBN) with mucositis and dysphagia in HNC patients [6][7]. Several genome-wide association studies (GWASs) recently identified SNPs associated with RITs in breast and prostate cancer patients [8]. For HNC, two GWASs found significant associations between two loci on chromosome five with acute mucositis [9] and xerostomia [10]. A Chinese GWAS found 65 genes suggestive mapped to 50 loci associated with mucositis [11]. These studies were small (N < 1,500) bearing insufficient statistical power to detect small effects of SNPs. Here, we aimed to unfold the genetic components of and to identify SNPs associated with acute RITs by performing the first multi-national meta-GWAS of 4,042 HNC patients of European ancestry. We investigated (SNP-based) heritability, the genetic correlations between acute RITs, and built polygenic risk scores (PRSs) for prediction of acute RITs in HNC.

Study design
We included seven Caucasians prospectively collected cohorts in meta-GWAS ( Supplementary Fig. 1), and one East Asian (Singapore) cohort evaluated transferability to an East Asian population (Supplementary Methods & Table1). The cohorts were split into discovery (UMCG-HANS, n = 1,279; DAHANCA, n = 1,183; RADIO-GEN, n = 178) and replication (Ghent, n = 273; RAPPER, n = 457; Head and Neck 5000, n = 672) sets based on the availability of data at the time of conduct. These sets were combined to calculate SNPbased heritability, estimate genetic correlations between RITs, and build PRSs (Supplementary Figure S1). Each contributing study received ethics approval from local institutional review boards, and all participants gave written informed consent.

Study participants
We included 4,042 Caucasians and 180 East Asian HNC patients who received curative-intent RT and were scored prospectively for RITs (see Supplementary Methods).

Assessment of acute RITs
Dysphagia, mucositis, and xerostomia were scored weekly during RT by the treating radio-oncologists. Data were collected before, during and up to six weeks post RT. Physician-rated HNC symptoms were assessed according to the Common Toxicity Criteria of Adverse Events (CTCAE, version 2.0 or 4.0). Patient-rated HNC symptoms were assessed using the EORTC-QLQ-C3020 and LENT SOMA questionnaires in the Head and Neck 5000 cohort (Supplementary Table S1). Supplementary Tables S2 and S3describe the definition, scoring, and availability of acute RITs per cohort.

Multiple imputations of missing value for toxicity
We applied multiple imputation (MI) as implemented in MICE package [12] (details in Supplementary Methods) to impute missing values (Supplementary Table S4).

Outcome modelling
We applied two different scoring systems. First, since risk of acute RIT generally increases during RT, we used area under curve (AUC) to estimate the average burden of an acute RIT during RT (from week one to week seven) per patient (s for formulas). Second, we used the standardized total average toxicity (STAT) score [13] to achieve a composite score describing the general acute RIT (see Supplementary Method and Supplementary Table S2).
Genotyping, quality control, and imputation of non-genotyped variants Germline DNA from the whole blood of 4,042 patients was genotyped using commercially available genome-wide SNP arrays (Supplementary Table S5). Quality control (QC) procedures were performed using standard procedures in each cohort. First, SNPs and samples with call rates below 95 to 98 % and SNPs with MAF < 1 % were removed. We assessed gender mismatch using the actual X chromosome homozygosity index (F) of > 0.8 representing for the male gender, and an F of < 0.2 represents a female gender. Relatedness was evaluated by pairwise identity by descent (IBD) values when duplicate samples were considered by a pihat > 0.8 who were removed, and the remaining pairs were manually checked. We checked the ethnicity of subjects using multidimensional scaling (MDS) clustering of our samples with Hapmap Phase 3 individuals using EIGENSTRAT. Samples that deviated more than 3 SD from the mean of their closest clusters were removed. Furthermore, individuals were assigned to populations based on principal component analysis (PCA). PCA was performed using EIGENSTRAT. The top 10 PCA eigenvectors were included in the final models for all endpoints. Next, strand ambiguous SNPs and duplicate SNPs were removed. Finally, SNPs were imputed either on the Michigan server using the HRC r1.1 2016 reference panel or 1000 Genome Phase 3 with European samples. We performed post-imputation QC involved removing SNPs with an imputation quality (info) score of R [2] < 0.3, or with a MAF of < 0.01 or < 0.05, and SNPs that had a discordant MAF (maximum allowed difference < 0.15) compared to the reference panel, or strand ambiguity AT/CG SNPs, or multi-allelic SNPs (Supplementary Table S5).

Statistical genetic association analysis
We analysed residual of AUCs (individual RITs) and STAT acute using multivariate linear regression for each cohort including an allele's additive effect while adjusting for covariables and the top 10 PCAs to correct for population stratification (see Supplementary Methods). A regression coefficient presents the effect size for one copy/dose of effect allele of the corresponding SNP. A pvalue < 5.0x10 -8 was considered statistically significant and a 5.0x10 -8 < p-value < 1.0x10 -5 as suggestive association. Significance for heterogeneity was denoted when the heterogeneity pvalues (Hetp) was < 0.05. Data preparation and statistical analyses were carried out using PLINK/1.90b3.44 [14] and SNPTEST [15]. Meta-analyses were done using the inverse variance weighted, a fixed-effects method implemented in METAL [16] (version 2011-03-25).

Discovery GWAS meta-analysis
GWAS summary results for each cohort underwent a series of QC checks using the GWASinspector [17] package in R. For each acute RIT, univariate and multivariate GWAS results were meta-analysed. SNPs with p discovery < 1.0x10 -5 and no significant test of heterogeneity, which were available in at least two (out of three) discovery cohorts, were selected for replication.

Replication and combined meta-analysis
A similar QC was conducted for Replication set (Supplementary  Table S5). We meta-analysed summary results from the four replication cohorts. SNPs were considered replicated when p replication < 0.05/number of SNPs tested. The summary meta-GWAS results from the discovery and replication sets were metaanalysed. SNPs were considered genome-wide significant if p combined < 5 Â 10 -8 .

The entire samples analysis
The entire discovery and replication cohorts were jointly metaanalysed to maximize the study power; hereon called whole-meta.
LD score regression (LDSC), SNP-based heritability, and genetic correlation LDSC analysis regressed SNP Chi2 estimates from GWAS with LD scores across the genome. An LDSC intercept = 1 suggests no LD related confounding bias, whereas an intercept > 1 suggests a contribution of LD confounding in Chi2 estimates due to cryptic relatedness (Supplementary Methods). We applied cross-trait LDSC to estimate genetic correlations between RITs [18]. In brief, the cross-product of two GWAS test statistics is calculated at each SNP, and this cross-product is regressed on the LD score. The slope of the regression estimates the genetic relationships between two tested RITs. The dispersion of AUCs and STAT acute of RIT endpoints in HNC patients by discovery and replication cohorts. The Y-axis show the cohorts, and X-axes show the measured value of endpoints. Each color represents a cohort. The lowest line represents the minimum (Q0 or 0th percentile): the top line represents the maximum (Q4 or 100th percentile) data point excluding any outliers; The middle line represents the median (Q2 or 50th percentile); the box represents the interquartile range (IQR) which is the distance between the first quartile (Q1 or 25th percentile; that is the median of the lower half of the dataset) and the third quartile (Q3 or 75th percentile that is the median of the upper half of the dataset).  Fig. 1 shows more details for the distribution of outcomes. Genetic susceptibility to radiation toxicity in HNC Polygenic risk score (PRS) analysis To evaluate the contribution of common variants on prediction of acute RITs, we repeated the whole-meta while excluding the UMCG-HANS cohort for each RITs. The SNP effect sizes were then used to build PRSs for SNPs below different thresholds of association p-values (Supplementary Methods). These PRSs were tested in the UMCG-HANS cohort (n = 1,279) at two different SNP MAFs (<0.01, <0.1) for their ability to predict their corresponding RIT.

In-silico functional analysis
To understand the plausible functional effects of the suggestive SNPs, we performed an in-silico functional analysis using wholemeta summary results (Supplementary Methods). Briefly, we first annotated independent functional genome-wide suggestive SNPs to a set of relevant genes, then we tested the statistical enrichment of these gene sets, and finally we predicted the biological and genetic pathways of significantly enriched gene sets.
Missing data varied per RIT from 2.6 % in week one to 28 % in week six post-RT (Supplementary Table S4). Supplementary  Table S6 presents characteristics per cohort. Fig. 1 and Table 1 describe the distribution of AUC for acute RITs in the discovery and replication cohorts. The adjusted mean AUC of RITs was zero with SD range 0.43-0.82 across RITs.
We found no genome-wide significant SNP associated with any RIT in the discovery stage, nonetheless, identified 393 genomewide suggestive SNPs spanning 118 loci (Supplementary Tables S7 to S14). In replication analysis, 37 out of 393 suggestive SNPs were nominally associated (p replication < 0.05) with the corresponding acute RITs (Supplementary Tables S7 to S14). No SNP passed genome-wide significance threshold. Supplementary Tables S7 to S14 list the most statistically significant SNPs. Twenty-seven of the 37 SNPs achieved a more significant association in combinedmeta (p combined ) with concordant directions in effect sizes (i.e., beta's) across the individual cohorts and in meta-analyses (Table 2).
We found 26 suggestive SNPs across 12 loci associated with xerostomia; the top SNP was rs6458543 with a meta-effect size 0.22 (p combined improved to 9.65x10 -7 , Fig. 2; Supplementary Table S13) and was mapped to nearest TNFRSF21 (Supplementary Figure S2).
Ten of the 176 available suggestive SNPs in Asian cohort (one for mucositis and nine for xerostomia) showed a nominal association (p < 0.05) with the same direction of effect size as the European meta-analysis (Supplementary Tables S7 to S14).
The Q-Q plots for 4,042 HNC patients showed no genomic inflation due to population substructure in whole GWAS. No SNP reached genome-wide significance. There were 525 suggestive SNPs spanning 159 genomic regions (Supplementary Tables S15 to S22; Fig. 2).
For LD score regression, around 1,1 million SNPs were analysed. The LDSC intercept showed minor inflation attributable to confounding bias (Table 3), indicating the observed statistics was due to the polygenicity of RITs. SNP-based heritability was 0.29 (se = 0.09) for dysphagia, 0.09 (se = 0.12) for (adjusted) mucositis and 0.26 (se = 0.08) for STAT acute . The heritability and standard error were too high (up to 0.51, se = 0.19) for xerostomia likely due to few event rates. A strong positive genetic correlation was found between dysphagia and STAT acute (rg 0.65, p = 0.048), but not for the others (Supplementary Table S23). Fig. 3 presents PRSs for acute RITs. The best fit PRS, which explained the highest RIT variance, differed across RITs. In the UMCG-HANS cohort, PRS dysphgia (at MAF > 0.01, p-value < 0.1) explained 3 % of the variance of dysphagia. PRS mucositis (MAF > 0.01) explained 2.5 % of the variance for mucositis. PRS dysphgia (MAF > 0.1, p-value < 0.001) explained a negligible proportion (0.4 %) of the variance in (adjusted) dysphagia, which was similar for PRS statacute for (adjusted) STAT acute . There were insufficient patients to build a meaningful PRS xerostomia .
Knowing our limited power and risk of false-negative results, we performed an in-silico functional analysis to understand the potential pathways underlying RITs using whole-meta summary results (see Supplementary Results). Briefly, the top pathways Abbreviations; N: number of analysed subjects, SNP: Single nucleotide polymorphism, GC: Genomic control, h2: SNP-based heritability, se: standard error, intercept protects from bias from population stratification and cryptic relatedness. The X axis shows location in the genome. Each SNP is plotted as a dot. The Y axis shows À log10 P-values for the association of each of the SNPs to the desired outcome. The red line shows the threshold for genome-wide suggestive (P-value < 1 Â 10-5). QQ plot: The Y axis shows observed À log10 Pvalues, and the X axis shows the expected À log10 P-values. Each SNP is plotted as a dot, and the red line shows null hypothesis of no true association. Deviation from the expected P-value distribution is clear only in the tail area, and along with the estimated lambda coefficients, suggesting that population stratification was adequately controlled. identified were ''3 0 -5'-exoribonuclease activity" (for dysphagia, FDR = 1.64e-10), ''inositol phosphate-mediated signalling" (mucositis, FDR = 2.20e-09), and ''drug catabolic process" (STAT acute , FDR = 3.57e-12).

Discussion
Our study is the first multicentre meta-GWAS investigating the genetic component of acute RITs in HNC patients. We identified 393 suggestive SNPs associated with RITs, of which 27 maintained the effect size direction and became more significant in the replication stage. Whole Meta-analysis identified 525 suggestive SNPs spanning 159 genomic regions. We estimated heritability of acute RIT as 29 % for dysphagia and 26 % for STAT acute . We also showed that genetic susceptibility to RITs (dysphagia, mucositis, and xerostomia) is likely to be independent, as we found no significant genetic correlation between the three studied RITs. We built specific PRS for RITs, which predicted its corresponding RITs. In addition, in-silico functional analysis found several novel genes and genomic pathways not previously associated with acute RITs.
To control for potential confounders, we applied strict criteria on sample selection, quality controls for genotyping. While adjusting for ancestry components, and confounding co-factors, We running conservative meta-GWAS analyses (i.e., a double genomic control, and controlling for heterogeneity). We did additional statistical checks to find any source of statistical inflation in GWAS effect estimates, and post-GWAS analyses. Furthermore, we applied the LDSC method, which is robust to confounding due to cryptic population stratification [24]. Lastly, we used narrow-sense heritability, i.e., the proportion of a trait's phenotypic variance attributable to additive genetic variance. Therefore our analyses might missed part of heritability due to limited samples size, the presence of rare variants with large effects not tagged by SNPs, or imputation and non-additive genetic variation and/or epigenetic factors [25]. We, therefore, estimated a valid but limited proportion of the heritability for RITs. Whole-genome or -exome sequencing are the next likely approach to estimate the large effect of rare variants in heritability estimates for RITs.

PRS for RITs and its predictability
As the first study, we showed the predictability of RITs by SNPs. Though it seems the percent of explained variance was modest, they were similar to findings for other traits and diseases. For example, even using a large sample size to generate PRS for autism spectrum disorder, the best PRS model explained only 2.5 % of the variation in autism [26] and this was 3 % for schizophrenia [27]. By expanding the high-quality RITs data, the performance of PRS can be optimised for RIT, as done for other complex traits.
The first suggestive genome-wide association to RITs Dysphagia We found 37 suggestive SNPs in 15 genomic regions associated with dysphagia in HNC patients. In-silico annotations, Genetic susceptibility to radiation toxicity in HNC identified chr22q12, and chr8p22 (out of 12 regions), were enriched by the prioritised genes related to dysphagia. The top hit, rs1061660, mapped in chr22q12 containing GATSL3, TBC1D10A, SF3A1, CCDC157, RNF215, SEC14L2 genes, while MTFP1 is the nearest gene. The other enriched region, chr8p22, contains ZDHHC2, CNOT7, VPS37A and, MTMR7 genes. The nearest gene, mitochondrial fission process 1 (MTFP1), is a nuclear-encoded protein that promotes mitochondrial fission [28]. The details of functional results are presented in Supplementary Discussion. Our analyses concluded exonuclease activity mechanisms as a potential mechanism in cell response and hypersensitivity to radiation, and in incidence of acute RT induced dysphagia.
Mucositis We found 103 suggestive SNPs in 17 genomic regions associated with mucositis in HNC patients. The top SNP was rs73039018, and the nearest gene was ZNF573 (Zinc Finger Protein 573) which involves nucleic acid binding and DNA-binding transcription factor activity. Our functional analyses (Supplementary Discussion) suggested the dysregulation of the inositol phosphatemediated signalling due to the radiation is likely involved in mucositis.
Xerostomia We found 54 suggestive SNPs in 22 genomic regions associated with xerostomia. The top SNP was rs6458543 mapped to TNFRSF21 (TNF Receptor Superfamily Member 21), also known as DR6, encodes a member of the tumour necrosis factor (TNF) receptor superfamily and induces apoptosis upon overexpression [29]. Our functional analyses highlighted the complexity of mechanism of xerostomia which requires larger studies to be unravelled.
STAT acute is a clinical indicator of accumulated toxicities for which we identified 31 suggestive SNPs in 17 genomic regions. The top SNP was rs137992872, mapped to TCF20 (Transcription Factor 20) that encodes a widely expressed transcriptional coregulator, and TCF20 mutations are associated with autism and intellectual disability [30]. Our functional analysis (see Supplementary Discussion) highlighted chromosome 22q13 region and the arachidonic acid metabolic process are associated with an average of RITs (STAT acute ).

Clinical Implications and future perspectives
The risk of RIT is assessed using normal tissue complication probability (NTCP) models. NTCPs are multivariable prediction models built on radiation dose metrics, clinical factors, and patients' characteristics [31]. One immediate impact is including genetic risk scores in forming a prediction model that explains patients' sensitivity to RT, which may eventually allow a more individualized RT treatment. By this multicentre meta-GWAS, we showed heritability for RIT in HNC patients and adds a piece to the increasing evidence that an individual genetic predisposition is a contributory factor in the development of acute RITs. We found limited but significant predictability for RITs using genetic risk scores.

Limitations
We did not reach to genome-wide significant threshold for top hits, though this is common in GWASs. Generally, testing each SNP an association with the trait in GWAS needs to account for the large number of statistical tests carried out; thus, a very stringent p-value is used. This reduces false positives, but it may mask real associations too, especially if individual SNPs have a negligible effect on the trait. In-silico functional analysis added additional evidence by linking our suggestive regions to interesting relevant biological functions. Future studies should be conducted on extended sample size and define better phenotypes for RITs. We tested common SNP with MAF of more than 1 %; however, exploring the rare variants will be highly instructive.

Conclusion
We showed acute RIT is heritable and predicable by PRS. We identified 393 suggestive SNPs associated with the four RITs, of which 27 SNPs maintained the direction of their effect sizes and showed an improvement in the significance of associations in the replication stage. Furthermore, we showed that the genetic susceptibility of acute RITs is likely independent. In addition, in-silico functional analysis identified exoribonuclease activity, inositol phosphate-mediated signalling, and drug catabolic process with potential roles in RITs. Our work extends the field of radiogenomics by combining cohorts of HNC patients with reliable data for RIT would enable the identification of genetic variants with lower penetrance, possibilities for validation, and eventually, to enrich NTCP models with genetic factors.

Declaration of Competing Interest
The authors declare that there is no conflict of interest regarding the publication of this article.