PTV-based VMAT vs. robust IMPT for head-and-neck cancer: A probabilistic uncertainty analysis of clinical plan evaluation with the Dutch model-based selection

Background and purpose: In the Netherlands, head-and-neck cancer (HNC) patients are referred for proton therapy (PT) through model-based selection (MBS). However, treatment errors may compromise adequate CTV dose. Our aims are: (i) to derive probabilistic plan evaluation metrics on the CTV consistent with clinical metrics; (ii) to evaluate plan consistency between photon (VMAT) and proton (IMPT) planning in terms of CTV dose iso-effectiveness; and (iii) to assess the robustness of the OAR doses and of the toxicity risks involved in the MBS.
Materials and methods: Sixty HNC plans (30 IMPT/30 VMAT) were included. A robustness evaluation with 100,000 treatment scenarios per plan was performed using Polynomial Chaos Expansion (PCE). PCE was applied to determine scenario distributions of clinically relevant dosimetric parameters, which were compared between the 2 modalities. Finally, PCE-based probabilistic dose parameters were derived and compared to clinical PTV-based photon and voxel-wise proton evaluation metrics.
Results: The probabilistic dose to the near-minimum volume v = 99.8% of the CTV (D_99.8%,CTV) correlated best with the clinical PTV-D_98% and VWmin-D_98%,CTV doses for VMAT and IMPT, respectively. IMPT showed slightly higher nominal CTV doses, with an average increase of 0.8 GyRBE in the median of the D_99.8%,CTV distribution. Most patients qualified for IMPT through the dysphagia grade II model, for which an average NTCP gain of 10.5 percentage points (%-point) was found. For all complications, uncertainties resulted in moderate NTCP spreads lower than 3 %-point on average for both modalities.
Conclusion: Despite the differences between photon and proton planning, the comparison between PTV-based VMAT and robust IMPT is consistent. Treatment errors had a moderate impact on NTCPs, showing that the nominal plans are a good estimator to qualify patients for PT.

Radiation therapy (RT) in combination with chemotherapy is widely used in the treatment of head-and-neck cancer (HNC) patients. Highly conformal RT techniques such as intensity-modulated radiation therapy (IMRT) and volumetric-modulated arc therapy (VMAT) have been used to precisely deliver the dose [1]. More recently, intensity-modulated proton therapy (IMPT) has also been introduced clinically. Compared to conventional RT, its physical properties allow similar dose conformality around the target with improved sparing of healthy tissue [2]. However, IMPT also comes with higher costs and limited capacity.
In the Netherlands, a standardized model-based selection (MBS) is used to select HNC patients for proton therapy (PT) [3-5]. The Dutch MBS for HNC patients consists of a plan comparison between an IMPT and a VMAT plan, in which a patient is selected for IMPT if sufficient dosimetric benefit is achieved in modelled normal tissue complication probabilities (NTCPs). The plan comparison is based on the assumption of plan iso-effectiveness in the clinical target volume (CTV), which means that an equivalence in CTV dose and tumor control probability (TCP) has to be achieved for both VMAT and IMPT. Treatment errors and their impact on delivered dose are, however, different for photons and protons, and can compromise this iso-effectiveness. First, IMPT is more sensitive to beam- and patient-alignment (geometrical) errors and to anatomical variations than VMAT [6]. Second, IMPT is also subject to uncertainties in the stopping-power prediction (SPP), i.e., the range [7]. Moreover, geometrical and range errors also have an impact on the dose delivered to the organs-at-risk (OARs) and, subsequently, can impact the estimated NTCP values.
https://doi.org/10.1016/j.radonc.2023.109729
0167-8140/© 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
E-mail address: j.rojosantiago@erasmusmc.nl (J. Rojo-Santiago). Postal address: P.O. box 2040. Street address: Doctor Molewaterplein 40, 3015 GD, Rotterdam, The Netherlands.
To mitigate uncertainties, conventional RT and PT use fundamentally different approaches. For VMAT, a planning target volume (PTV) is used, assuming dose invariance against shifts (static dose cloud approximation) [8]. For IMPT, a scenario-based robust optimization and a voxel-wise evaluation are used instead [9-11]. Both strategies enlarge the treated volume around the CTV to ensure adequate target coverage, resulting in increased doses to normal tissues. Clinical plan robustness evaluations are usually based on these enlarged volumes, i.e., PTV-based and voxel-wise metrics for VMAT and IMPT, respectively, while uniform, more comprehensive and probabilistic plan evaluations on the CTV are lacking. Due to these differences, it is unknown whether plan iso-effectiveness on the CTV between the 2 modalities is actually achieved in clinical practice.
In this study, we determined whether iso-effectiveness for the CTV was achieved between the 2 treatment modalities within the Dutch MBS. We used polynomial chaos expansion (PCE) to quickly and accurately model the dependence of the delivered dose on the geometrical and range errors. With PCE, we simulated 100,000 complete fractionated treatments per plan, obtaining probabilistic distributions of relevant dosimetric parameters for the primary and elective CTVs and main OARs [12-14]. First, we derived and analyzed consistent probabilistic CTV dose parameters to give a more realistic, physical and statistical meaning to the clinical PTV-based and voxel-wise dose evaluation metrics. Second, we evaluated consistency and robustness between VMAT and IMPT in terms of CTV dose and, subsequently, in terms of TCPs. Finally, we assessed the sensitivity of OAR doses and NTCP values to geometrical and range errors to investigate their impact on the MBS for PT patient selection.

Patient data
Thirty oropharyngeal HNC patients, treated at the UMCG with VMAT (10/30) or IMPT (20/30), were included. For each patient, a VMAT and an IMPT plan were available from the planning comparison, leading to a total of 60 plans. Prescribed doses (D_pres) were 70 Gy for the primary CTV (CTV_70.00) and 54.25 Gy for the elective lymph nodes (CTV_54.25), both delivered in 35 fractions. For VMAT, dose was prescribed to the PTV: PTV-D_98% ≥ 95% D_pres, while, for IMPT, dose was prescribed to the voxel-wise near-minimum (VWmin) CTV dose: VWmin-D_98%,CTV ≥ 94% D_pres [11]. To avoid hot spots in the target, an additional constraint was imposed on the PTV near-maximum dose for VMAT (PTV-D_2% ≤ 107% D_pres) and on the voxel-wise near-maximum (VWmax) CTV dose for IMPT (VWmax-D_1cc,CTV ≤ 78 GyRBE). A constant relative biological effectiveness (RBE) of 1.1 was assumed for IMPT relative to VMAT.
To facilitate gross tumour volume (GTV) and OAR delineation, a T2-weighted magnetic resonance imaging (MRI) sequence and a positron emission tomography (PET) scan were acquired per patient. They were rigidly registered with a single-energy planning computed tomography (SECT) scan with a CT resolution of 0.98 × 0.98 × 2 mm³. The median GTV was 13.49 cc (range 1.37-117.05 cc). The main relevant OARs, including the parotid and submandibular glands, spinal cord, extended oral cavity and pharyngeal constrictor muscles (PCM), were delineated according to the international consensus guidelines for CT-based delineation of OARs in the head-and-neck region [15].

Treatment planning
Both VMAT and IMPT treatment plans were available in RayStation (version 10B, RaySearch, Sweden). For VMAT, dual 6 MV arcs were used. Dose was calculated using a collapsed cone (CC) dose algorithm. To handle geometrical errors, a PTV-based optimization and evaluation was performed using an isotropic CTV-PTV margin of 3 mm. Once the PTV was adequately covered, the dose optimization for the OARs was guided by NTCP-based cost functions [16].
For IMPT, most plans were generated using an initial arrangement of 4 beam angles, with 2 anterior (40° and 320°) and 2 posterior (160° and 200°) coplanar oblique beams. In 3 patients, additional beams were used to increase robustness against anatomical changes. Subsequently, beam angles were manually optimized per patient to better spare the PCM and the parotid glands. Posterior beams were split into range-shifter and non-range-shifter fields, resulting in 5 to 7 beams per plan. The dose of the clinical IMPT plans was calculated using a Monte Carlo (MC) dose engine with 1% MC uncertainty and subsequently re-evaluated with an MC noise of 0.1%. Robust minimax optimization [9,10] was used to handle geometrical and range errors. Based on the clinical evaluation of the first cohort of patients [17], a setup robustness (SR) setting of 3 mm was used. For the relative SPP error, a range robustness (RR) setting of 3% was assumed based on literature [18] and validated in-house [19,20]. Finally, a robustness evaluation was performed in line with [11]. CTV coverage and OAR doses were ensured using the VWmin and VWmax dose distributions, respectively, from 28 error scenarios. These consisted of the combination of 14 geometrical shifts and 2 range errors based on the clinically used SR (3 mm) and RR (3%).
To reduce inter-patient and inter-modality variations, all treatment plans were normalized to their clinical VWmin-D_98%,CTV (G_s,IMPT) and PTV-D_98% (G_s,VMAT) values for IMPT and VMAT, respectively. For IMPT, two normalization levels were considered: (1) 94%, applied in Sections 2.5 and 2.6 to assess treatment plan differences according to the clinical protocol, and (2) 94.5%, applied in Section 2.4 according to our adjusted protocol based on a re-calibration, in order to consistently derive probabilistic dose metrics on the CTV. The normalization was based on the geometrical scaling factor G_s, which is described in the Supplementary Material (SM) [SM, Section S1].

Polynomial chaos expansion (PCE): PCE-based robustness evaluation
The impact of geometrical and range errors on CTV and OAR doses was modelled with Polynomial Chaos Expansion (PCE). PCE provides a computationally efficient patient- and treatment plan-specific analytical model of the dose engine. PCE approximates the dose D_i of each voxel i affected by a geometrical shift and a relative range error ρ as a series expansion. It enables the evaluation (PCE-based robustness evaluation) of 100,000 complete fractionated treatments with proper statistical weighting. A complete VMAT fractionated treatment is simulated by drawing (i) a fixed systematic geometrical error (Σ) for the complete treatment and (ii) a different random error (σ) for each fraction, both characterized by the standard deviations (1 SD) of Gaussian error distributions. For the IMPT treatments, an additional (iii) systematic range error (ρ) (1 SD) for each treatment is included.
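In its generic textbook form, a PCE of the voxel dose can be written as follows. The symbols are assumptions of this sketch rather than the notation of the study: ξ collects the geometrical shift components and the relative range error, Ψ_k are multivariate orthogonal basis polynomials (Hermite polynomials are the usual choice for Gaussian inputs), a_{i,k} are voxel-specific expansion coefficients, and P + 1 is the number of retained terms.

```latex
D_i(\boldsymbol{\xi}) \;\approx\; \sum_{k=0}^{P} a_{i,k}\, \Psi_k(\boldsymbol{\xi}),
\qquad
\boldsymbol{\xi} = (\xi_x,\ \xi_y,\ \xi_z,\ \rho)
```

Once the coefficients a_{i,k} are known, evaluating the dose for a new error scenario only requires evaluating polynomials, which is what makes sampling a large number of treatments affordable.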
We considered two cases of Σ and σ errors. For the derivation of the probabilistic plan evaluation metrics, values of Σ = 0.92 mm and σ = 1.00 mm for the systematic and random geometrical errors (1 SD) were assumed, consistent with a margin M = 2.5Σ + 0.7σ = 3 mm based on van Herk's recipe and clinical experience (error set I). To assess treatment plan differences between VMAT and IMPT, a second combination of systematic and random geometrical errors was determined from clinical measurements of beam- and patient-alignment errors and machine specifications, as tabulated in Table 1 (error set II). From the patient residual setup analysis, systematic values were obtained as the standard deviation (1 SD) of the residual mean setup errors, while the random components were obtained as the root-mean-square value of the residual setup standard deviations. Range uncertainties were based on literature [21].
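The two computations described in this paragraph can be illustrated with a minimal sketch. The function names, the single-axis data layout and the use of sample (n-1) standard deviations are assumptions made for illustration; the study's own analysis may differ in these details.

```python
import math
import statistics


def van_herk_margin(systematic_sd, random_sd):
    """CTV-PTV margin M = 2.5 * Sigma + 0.7 * sigma (all values in mm)."""
    return 2.5 * systematic_sd + 0.7 * random_sd


def setup_error_components(errors_per_patient):
    """Split residual setup errors into systematic and random components.

    errors_per_patient: one list of per-fraction residual errors (mm, one
    axis) for each patient.
    Systematic: SD of the per-patient mean errors.
    Random: root-mean-square of the per-patient SDs.
    """
    means = [statistics.mean(e) for e in errors_per_patient]
    sds = [statistics.stdev(e) for e in errors_per_patient]  # sample SD (assumption)
    systematic_sd = statistics.stdev(means)
    random_sd = math.sqrt(sum(s * s for s in sds) / len(sds))
    return systematic_sd, random_sd


# Error set I from the text: Sigma = 0.92 mm, sigma = 1.00 mm
print(round(van_herk_margin(0.92, 1.00), 2))  # 3.0
```

With the values of error set I, the recipe reproduces the 3 mm margin used for the VMAT plans.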
Patient- and plan-specific PCE models were constructed for both treatment modalities using the clinical RayStation CC/MC dose engine. To further improve PCE accuracy for the IMPT plans, the PCE models were constructed using an MC noise of 0.1% instead of the clinical 1%. Four PCE-based robustness evaluations were done per patient, two for the VMAT plan and two for the IMPT plan, one for each error set. Patient- and treatment plan-specific probability distributions of relevant dosimetric parameters across the sampled treatments (scenario distributions) were derived for the CTV_70.00, CTV_54.25 and main relevant OARs. A flowchart of the methodology is shown in Fig. 1.
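The sampling of complete fractionated treatments can be sketched as follows. This is a toy illustration only: `toy_fraction_dose` is a hypothetical one-dimensional stand-in for a patient-specific PCE model, shifts are reduced to a single axis, and all numerical constants inside it are invented for the example.

```python
import random


def toy_fraction_dose(shift_mm, range_err, d_fx=2.0):
    """Hypothetical stand-in for a PCE dose model: fraction dose (Gy) at one
    point, which drops once the total displacement exceeds a 3 mm plateau."""
    displacement = abs(shift_mm) + 100.0 * abs(range_err)  # toy: 1% range error ~ 1 mm
    loss = max(0.0, displacement - 3.0) * 0.05             # toy: 5% loss per extra mm
    return d_fx * max(0.0, 1.0 - loss)


def simulate_treatments(n_treatments, n_fractions, systematic_sd, random_sd,
                        range_sd=0.0, seed=0):
    """Return the total dose of each sampled fractionated treatment."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_treatments):
        systematic = rng.gauss(0.0, systematic_sd)  # drawn once per treatment
        range_err = rng.gauss(0.0, range_sd)        # drawn once per treatment (protons)
        total = 0.0
        for _ in range(n_fractions):
            random_shift = rng.gauss(0.0, random_sd)  # redrawn for every fraction
            total += toy_fraction_dose(systematic + random_shift, range_err)
        totals.append(total)
    return totals


def percentile(values, p):
    """Linear-interpolation percentile, with p in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


totals = simulate_treatments(2000, 35, systematic_sd=0.92, random_sd=1.00, range_sd=0.03)
print(percentile(totals, 10), percentile(totals, 50))
```

The resulting list plays the role of a scenario distribution, from which percentiles such as the 10th or the median are read off.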

Probabilistic plan evaluation metrics to assess CTV coverage
In clinical practice, the lack of consistency between plan evaluation metrics hinders a fair comparison of the robustness between photon and proton treatment plans. Additionally, the volume v of the CTV that is clinically covered by 95% D_pres is not well defined, since the clinical goal on the PTV was relaxed from the point minimum dose (PTV-D_100%), which aimed at a CTV coverage of 100% and was used in the derivation of van Herk's margin recipe [8], to the near-minimum dose (PTV-D_98%) [22].
To this end, we determined plan evaluation metrics (D_v,CTV) to probabilistically assess target dose adequacy, and we compared them with clinical PTV-based and voxel-wise metrics. First, we compared the PTV-based and VWmin-based prescriptions for VMAT and found that the VWmin-based prescription is equivalent to the PTV-based prescription if it is set to 94.5% of the reference dose [SM, Section S2], in line with [11]. Second, we scaled the IMPT plans to the 94.5% VWmin-based prescription. Third, PCE-based robustness evaluations were done assuming geometrical errors (1 SD) consistent with an M = 3 mm margin (error set I) based on van Herk's recipe (Σ = 0.92 mm, σ = 1.00 mm). From the results, we derived the volume v of the CTV that correlates best with the clinical VWmin-D_98%,CTV and PTV-D_98% values, under the condition that, in 90% of the sampled treatments, at least 95% D_pres was achieved in the CTVs on average over all patients. This volume was then used in the evaluation of the CTV dose iso-effectiveness.
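The probabilistic metric described above (a percentile of D_v,CTV over the sampled treatments) can be sketched as follows. Equal-volume voxels, the function names and the example doses are assumptions made for illustration.

```python
import math


def dose_at_volume(voxel_doses, v_percent):
    """D_v: minimum dose received by the hottest v% of the volume,
    assuming equal-volume voxels."""
    ordered = sorted(voxel_doses, reverse=True)
    # round() guards against floating-point fuzz before taking the ceiling
    n_covered = max(1, math.ceil(round(len(ordered) * v_percent / 100.0, 9)))
    return ordered[n_covered - 1]


def percentile(values, p):
    """Linear-interpolation percentile, with p in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def probabilistic_dv(scenario_voxel_doses, v_percent, p=10):
    """p-th percentile of D_v over the sampled treatments, i.e. the D_v value
    reached or exceeded in (100 - p)% of them."""
    d_v = [dose_at_volume(doses, v_percent) for doses in scenario_voxel_doses]
    return percentile(d_v, p)


def meets_goal(scenario_voxel_doses, d_pres, v_percent=99.8, level=0.95):
    """True if the 10th percentile of D_v is at least level * D_pres."""
    return probabilistic_dv(scenario_voxel_doses, v_percent) >= level * d_pres


# Three hypothetical sampled treatments of a four-voxel CTV (doses in Gy)
scenarios = [[70, 69, 68, 67], [66, 70, 70, 70], [70, 70, 70, 60]]
print(probabilistic_dv(scenarios, 100))
```

In the study the volume v itself was the quantity being calibrated; the sketch only shows how one candidate v would be evaluated.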

Evaluation of CTV dose iso-effectiveness
CTV dose iso-effectiveness was evaluated by comparing relevant CTV dose parameters and modelled TCP values between the 2 treatment modalities. For this part of the analysis, PCE-based robustness evaluations were done using the clinical geometrical systematic and random errors tabulated in Table 1 (error set II) and IMPT plans scaled to the 94% VWmin-based prescription [SM, Section S1].
Target coverage for both CTVs was compared based on the probabilistic D_v,CTV metrics derived in Section 2.4. Dose conformality of the plans was quantified with a conformity index (CI), defined as the ratio between the volume covered by the prescribed isodose (VRI) and the target volume (TV): CI = VRI/TV. Dose inhomogeneity was evaluated with a homogeneity index (HI), defined as the ratio D_95%/D_5% for the CTV_70.00 and CTV_54.25.
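Both indices can be computed directly from voxel doses, as in the following minimal sketch. Equal-volume voxels, the flat-list data layout and the isodose level passed in by the caller are assumptions of the example.

```python
import math


def dose_at_volume(voxel_doses, v_percent):
    """D_v: minimum dose received by the hottest v% of the volume,
    assuming equal-volume voxels."""
    ordered = sorted(voxel_doses, reverse=True)
    n_covered = max(1, math.ceil(round(len(ordered) * v_percent / 100.0, 9)))
    return ordered[n_covered - 1]


def conformity_index(dose, target_mask, prescribed_isodose):
    """CI = VRI / TV: number of voxels at or above the prescribed isodose
    divided by the number of target voxels."""
    vri = sum(1 for d in dose if d >= prescribed_isodose)
    tv = sum(1 for m in target_mask if m)
    return vri / tv


def homogeneity_index(target_doses):
    """HI = D_95% / D_5% within the target."""
    return dose_at_volume(target_doses, 95) / dose_at_volume(target_doses, 5)


# Hypothetical six-voxel dose grid; the first three voxels form the target
dose = [70, 70, 70, 70, 50, 30]
mask = [1, 1, 1, 0, 0, 0]
print(conformity_index(dose, mask, prescribed_isodose=66.5))
```

With these definitions, a CI close to 1 indicates little spill of the prescribed isodose outside the target, and an HI close to 1 indicates a homogeneous target dose.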
TCP modelling was based on the model by Luhr et al. [23]. In the TCP model, the target volume is split into 3 non-overlapping subvolumes (GTV, CTV_70.00 and CTV_54.25) with different responses to dose [23-25]. TCP scenario distributions for the GTV, disjoint CTV_70.00 and disjoint CTV_54.25 were calculated from the dose-volume histograms of the sampled treatments in each of the regions. The total TCP (TCP_T) was calculated as: TCP_T = TCP_GTV × TCP_CTV,70 × TCP_CTV,54.25. For further details, we refer to [SM, Section S3].
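The combination of the sub-volume TCPs into a total TCP per sampled treatment can be sketched as below. The sub-volume TCP values are hypothetical inputs; the dose-response model that produces them is not reproduced here.

```python
def total_tcp(tcp_gtv, tcp_ctv70, tcp_ctv54):
    """Total TCP as the product of the TCPs of the three disjoint sub-volumes."""
    return tcp_gtv * tcp_ctv70 * tcp_ctv54


# Hypothetical sub-volume TCPs for three sampled treatments
tcp_gtv = [0.90, 0.89, 0.91]
tcp_ctv70 = [0.95, 0.95, 0.96]
tcp_ctv54 = [0.99, 0.98, 0.99]

scenario_tcp_total = [total_tcp(g, h, l) for g, h, l in zip(tcp_gtv, tcp_ctv70, tcp_ctv54)]
print(scenario_tcp_total)
```

Because the product is taken per sampled treatment, the result is again a scenario distribution from which a median and percentiles can be derived.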

Robustness of the MBS for PT patient selection
To assess the robustness of the MBS approach, OAR doses were evaluated with PCE using the clinical geometrical and range errors from Table 1 (error set II), with IMPT plans scaled to the 94% VWmin-based prescription [SM, Section S1]. To this end, scenario D_mean distributions for the parotids, submandibular glands, the extended oral cavity and the PCM were determined and compared between the 2 modalities.
NTCP models, clinically used for IMPT patient selection [3-5], were also used to facilitate a plan comparison between VMAT and IMPT in terms of tissue toxicities. In this study, we considered the NTCP models for two different toxicities as used for MBS in current clinical practice: (i) patient-rated moderate (grade II) and severe (grade III) xerostomia and (ii) physician-rated grade II and III dysphagia. They are based on the D_mean doses from: (i) the parotids and submandibular glands for the xerostomia model and (ii) the extended oral cavity and the PCM for the dysphagia model. If the ΔNTCP (ΔNTCP = NTCP_VMAT - NTCP_IMPT) exceeds a threshold for a certain toxicity, e.g., 10% and 5% for grade II and grade III complications, respectively, the patient would qualify for IMPT instead of VMAT. NTCP spreads were calculated as the difference between the 95th and the 5th percentile of the scenario NTCP distribution.
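The selection rule and the NTCP spread can be sketched as follows. NTCP values are expressed in percent, the thresholds are the example values quoted above, and the function names and input numbers are hypothetical; the NTCP models themselves are not reproduced.

```python
# Example thresholds from the text, in %-point
THRESHOLDS = {"grade II": 10.0, "grade III": 5.0}


def delta_ntcp(ntcp_vmat, ntcp_impt):
    """NTCP reduction obtained with IMPT relative to VMAT (%-point)."""
    return ntcp_vmat - ntcp_impt


def qualifies_for_impt(ntcp_vmat, ntcp_impt, grade):
    """True if the NTCP reduction exceeds the threshold for this grade."""
    return delta_ntcp(ntcp_vmat, ntcp_impt) > THRESHOLDS[grade]


def percentile(values, p):
    """Linear-interpolation percentile, with p in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def ntcp_spread(scenario_ntcp):
    """Difference between the 95th and 5th percentile of the scenario NTCPs."""
    return percentile(scenario_ntcp, 95) - percentile(scenario_ntcp, 5)


print(qualifies_for_impt(35.0, 24.0, "grade II"))   # True: reduction of 11 %-point
print(qualifies_for_impt(35.0, 31.0, "grade III"))  # False: reduction of 4 %-point
```

A small spread relative to the distance between the nominal ΔNTCP and the threshold suggests that the selection outcome is insensitive to the sampled errors.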

Statistical analysis
Statistical significance of the dosimetric differences between treatment modalities was assessed using the Wilcoxon signed rank test (p < 0.05).
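For illustration, a paired two-sided Wilcoxon signed-rank test can be written in plain Python as below. This sketch uses the large-sample normal approximation without continuity or tie correction, which is a simplification; established statistics packages (for example `scipy.stats.wilcoxon`) offer exact and corrected variants and would normally be preferred. The paired values in the usage example are hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W, p) with W = min(W+, W-) and p from the normal approximation
    (no continuity or tie correction). Zero differences are discarded.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # shared rank for tied |differences|
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return w, p


# Hypothetical paired values (e.g. one dose metric per patient, VMAT vs. IMPT)
vmat = [3 * i for i in range(1, 11)]
impt = [2 * i for i in range(1, 11)]
w, p = wilcoxon_signed_rank(vmat, impt)
print(w, p < 0.05)
```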

Results
Based on the calibration for the probabilistic D_v,CTV metric [SM, Section S4], the 10th percentile of D_v for v = 99.8% (D_99.8%,CTV) correlated best with the clinical VWmin-D_98%,CTV values for IMPT and PTV-D_98% values for VMAT. As depicted in Fig. 2, the 10th percentile of the scenario D_99.8%,CTV distributions for both CTVs (CTV_70.00 and CTV_54.25) resulted in an underestimation of 0.2% compared to the clinical PTV-D_98% for VMAT, while, for IMPT, a slightly larger overestimation of 0.5% of the clinical VWmin-D_98%,CTV was found. For symmetry, a volume v = 0.2% (D_0.2%,CTV) is suggested to evaluate maximum CTV dose values, which showed weaker correlations with the clinical VWmax-D_2%,CTV (R² > 0.62) and PTV-D_2% (R² > 0.94). The correlations for the D_0.2%,CTV are displayed in [SM, Section S5].
To assess CTV dose iso-effectiveness and robustness of the MBS, median values of relevant dosimetric parameters over all patients are tabulated in Table 2. They are reported with an MC uncertainty of 0.1% to avoid sampling differences between the clinical dose engine and PCE-based evaluations.
The mean OAR doses involved in the NTCP for xerostomia and dysphagia complications are displayed in Fig. 4A and Fig. 4B, respectively. An improved dosimetric benefit for the majority of

Table 1
Geometrical and range uncertainties taken into account for the PCE robustness evaluation. Systematic and random geometrical errors were determined from clinical measurements of beam- and patient-alignment errors and machine specifications, which were split into a systematic and a random component. This set of errors is referred to as error set II throughout the paper.

Table 2
Comparison of clinical MC 0.1% and PCE-based dose parameters for the CTV and OARs. The values represent the median values of the dose parameters calculated with the clinical dose engine and with PCE. To assess the impact of treatment errors on dose parameters, percentiles of the scenario distributions are also reported (CTV: D_99.8%, D_98% and V_95% with the 10th percentile, D_2% and D_0.2% with the 90th percentile; D_mean of OARs and NTCPs with the 5th and 95th percentiles).

(range: 0.1-2.3) %-point and 0.8 (range: 0.2-4.8) %-point for VMAT were obtained.

Discussion
In this study, plan equivalence between PTV-based VMAT and robustly optimized IMPT was evaluated when the Dutch MBS is used to qualify HNC patients for PT. We compared not only dosimetric CTV and OAR differences but also modelled NTCP and TCP outcomes. The impact of geometrical and range errors on the delivered dose was assessed with PCE, with which a fast and accurate robustness evaluation of 100,000 complete fractionated treatments was done per plan. PCE allowed us to obtain not only accurate statistics on clinically relevant dosimetric parameters but also probabilistic robustness metrics for the CTV (e.g., scenario D_99.8%,CTV distributions) and main OARs (e.g., scenario D_mean distributions).
Our main finding is that, in line with the 90% population probability aimed at by van Herk [8], a near-minimum volume v = 99.8% probabilistically led to results consistent with the clinical plan evaluation metrics used in conventional RT and PT. This volume resulted from the relaxation of the historical clinical goal from a point dose minimum (D_100%) to the near-minimum dose (D_98%) [22], which was used in the calibration of the DUPROTON protocol [11]. Thus, the 10th percentile of the scenario D_99.8%,CTV distribution showed the best agreement with the clinical PTV-D_98% and VWmin-D_98%,CTV doses. The slight variations found between the clinical metrics and the probabilistic D_99.8%,CTV values indicate that, even after a calibration of the protocol, differences between the robustness approaches in protons (robust optimization) and photons (PTV-based evaluation) remain. However, the introduction of probabilistic metrics allows a fair and unbiased robustness comparison and target dose adequacy assessment between both modalities, directly on the CTV and without the need for a prior calibration. Furthermore, they also make it possible to interpret the robustness of the CTV dose for non-robust patients (patients for whom clinical goals were deliberately not achieved), which remains a limitation of clinical PTV-based and voxel-wise dose metrics.
Despite the fundamental physical differences between photons and protons, the planning comparison between VMAT and IMPT leads to reasonably consistent and robust results in terms of CTV dose, TCP and NTCP. Even with the prescription iso-dose correction of 1% from the DUPROTON calibration [11], IMPT plans resulted in systematically higher nominal doses to the CTV, which also resulted in a significant plan superiority in terms of TCP_T. This slight increase in TCP_T for IMPT has two main causes. The first (i) is the impact of MC noise, which was not considered in the DUPROTON calibration. As target dose metrics, e.g., D_99.8%,CTV and D_0.2%,CTV, are assessed in the tails of the dose-volume histogram distributions, larger underestimations and overestimations of these metrics are found when the magnitude of the MC noise is increased [SM, Section S6]. This leads to an inherent increase of the robustness of the IMPT plans, which can be corrected with an additional calibration of the clinical voxel-wise values. The second (ii) is the normalization of the treatment plans. This could be considered a limitation of the study, as variations of target near-minimum dose values from the clinical goals were removed. As such, the results do not show the actual clinical differences between VMAT and IMPT plans in the studied patient cohort, but rather the differences that can be expected when plans are created such that the clinical goals for target near-minimum dose are exactly met. Due to the superior target near-minimum dose of VMAT plans compared to IMPT in the cohort, the normalization resulted in a reduced D_mean in the GTVs and both CTVs for VMAT compared to IMPT, and consequently in inferior TCP_T values (Fig. S4).
The moderate impact of geometrical and range errors on NTCPs (NTCP spreads < 3 %-point) showed that the MBS approach is robust and that the nominal plan can generally be used as a good estimator to refer patients for PT. Furthermore, the probabilistic assessment of NTCPs could also potentially improve NTCP-based protocols for PT patient selection. The inclusion of bandwidths in the ΔNTCP thresholds of the protocol could particularly benefit some VMAT borderline patients, who had a non-negligible fraction of simulated treatments qualifying for IMPT.
PCE has proven to be a powerful robustness evaluation tool compared to other robustness evaluation methods [11,26], as it enables a comprehensive robustness analysis based on a large number of sampled dose distributions. However, converting this technical superiority into clinical success may not be trivial. Clinically introduced methods to present valuable and intuitive spatial information, such as the slice-by-slice visualization of under- and over-dose areas within the CTV given by traditional PTV-based and voxel-wise evaluations, need to be established for PCE or any other probabilistic plan evaluation approach. A first step could be to compute voxel-wise doses with PCE, in which VWmin and VWmax dose distributions are generated from 90% of the PCE sampled doses, excluding the most extreme scenarios [SM, Section S7]. The high correlation shows that such a method would not lead to a systematic bias when used to replace PTV/VWmin evaluations. PCE could also generate probability dose maps (per-voxel probabilities of having a dose below or above a certain dose constraint) to spatially locate areas with a high under- or overdose probability.
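A probability dose map of the kind mentioned above can be sketched as follows. The nested-list layout (one list of voxel doses per sampled scenario), the function name and the example values are assumptions made for illustration.

```python
def underdose_probability_map(scenario_doses, dose_threshold):
    """Per-voxel fraction of sampled scenarios with a dose below dose_threshold.

    scenario_doses[s][i] is the dose of voxel i in sampled scenario s.
    """
    n_scenarios = len(scenario_doses)
    n_voxels = len(scenario_doses[0])
    return [
        sum(1 for scenario in scenario_doses if scenario[i] < dose_threshold) / n_scenarios
        for i in range(n_voxels)
    ]


# Four hypothetical scenarios for a three-voxel CTV (doses in Gy)
scenarios = [
    [70, 66, 60],
    [70, 67, 66],
    [70, 65, 68],
    [70, 68, 69],
]
print(underdose_probability_map(scenarios, dose_threshold=66.5))  # [0.0, 0.5, 0.5]
```

Mapping these per-voxel probabilities back onto the CT slices would give the spatial view that scalar scenario statistics cannot provide; an overdose map follows by reversing the comparison.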
Our study is limited to isocentric errors, which were modelled as rigid shifts and SPP errors for IMPT. As our goal was to probabilistically compare initial plans, anatomical variations were not included, although the same methodology could be applied at any time during the course of treatment when diagnostic quality verification CTs are acquired. The combination of PCE-based robustness evaluations with online or offline adaptive strategies could assess the suitability of a plan with the patient anatomy of the day, potentially improving clinical decisions for plan adaptations [27,28]. Residual patient rotations were not considered in this study.
As PCE is an analytical approximation of the dose engine, a proper validation of the model is required. The PCE model parameters were selected such that, for sufficiently large errors up to 4.5 mm, PCE dose distributions agree within 1% with the clinical doses. The large number of spots per energy layer used in the clinical IMPT plans led to highly heterogeneous dose distributions, which compromised PCE accuracy. Thus, the accuracy of the PCE models for IMPT was increased by using a low MC uncertainty of 0.1%. PCE accuracy could also be improved by increasing (i) the order of the polynomial basis vectors and (ii) the number of clinical doses required during the PCE construction, which, however, also increases the computational time to construct the model. For more complex treatment sites, the parametrization of uncertainties, e.g., breathing-related intra-fraction errors, might also be difficult or even impossible, in which case a combination of PCE with other robustness approaches could be used [29].

Conclusion
Despite the differences between photon and proton planning, the comparison between PTV-based VMAT and robustly optimized IMPT after the proton plan calibration is consistent, with a slightly higher dose to the CTV for IMPT. The MBS procedure is moderately impacted by the effect of geometrical and range errors on NTCP, showing that the nominal plan is a good estimator to refer patients for PT. Finally, to the best of our knowledge, this study is the first to present comprehensive probabilistic plan evaluation metrics on the CTV to assess robustness between proton and photon treatments, which are consistent with clinical PTV-based and voxel-wise metrics and could be used in the future for probabilistic planning.
Radiotherapy and Oncology 186 (2023) 109729

Fig. 1. PCE-based robustness evaluation workflow. First, clinical VMAT and IMPT plans were normalized to their prescription doses. This means that VMAT and IMPT treatment plans were normalized to the PTV-D_98% and VWmin-D_98%,CTV, respectively, as found in Figures S1 and S2. Second, 100,000 complete fractionated treatments were simulated. Scenario dose-volume parameter distributions were then calculated according to the simulated treatments. Correlations of the dose-volume metrics for the CTV and OARs between both modalities were done considering percentiles of the scenario distributions.

Fig. 2. Comparison of clinical VWmin-D_98%,CTV (A) and PTV-D_98% (B) against the scenario D_99.8%,CTV distribution. The points correspond to the 10th percentile of the sampled treatments with PCE, with an upper error bar representing the median value of the scenario D_99.8%,CTV distribution. The red and blue dashed lines represent the planning constraints on the CTV_54.25 and CTV_70.00, while the black and green dashed lines correspond to the identity and regression lines, respectively.

Fig. 3. Population D_99.8%,CTV (A), D_0.2%,CTV (B), homogeneity index (C), conformity index (D) and TCP_T (E) comparison between the VMAT and IMPT plans. The points correspond to the median value of the sampled treatments with PCE, with error bars representing the 10th percentile for the D_99.8%,CTV, the 90th percentile for the D_0.2%,CTV, and the 95th and 5th percentiles for the HI, the CI and the TCP_T. The black dashed line corresponds to the identity line.

Fig. 4. Population D_mean for the main relevant OARs (A and B) and NTCP for grade II (C) and grade III (D) xerostomia and dysphagia comparison between the VMAT and IMPT plans. The points correspond to the median value of the sampled treatments with PCE, with error bars representing the 95th and 5th percentiles. The dashed lines correspond to the identity line (black) and to the threshold to qualify for IMPT (red).