The TRENDY multi-center randomized trial on hepatocellular carcinoma – Trial QA including automated treatment planning and benchmark-case results

Published: October 16, 2017. DOI: https://doi.org/10.1016/j.radonc.2017.09.007

      Abstract

      Background and purpose

      The TRENDY trial is an international multi-center phase-II study, randomizing hepatocellular carcinoma (HCC) patients between transarterial chemoembolization (TACE) and stereotactic body radiation therapy (SBRT) with a target dose of 48–54 Gy in six fractions. The radiotherapy quality assurance (QA) program, including prospective plan feedback based on automated treatment planning, is described and results are reported.

      Materials and methods

      Scans of a single patient were used as a benchmark case. Contours submitted by nine participating centers were compared with reference contours. The subsequent planning round was based on a single set of contours. A total of 20 plans from participating centers, including 12 from the benchmark case, 5 from a clinical pilot and 3 from the first study patients, were compared to automatically generated VMAT plans.

      Results

      For the submitted liver contours, Dice Similarity Coefficients (DSC) with the reference delineation ranged from 0.925 to 0.954. For the GTV, the DSC varied between 0.721 and 0.876. For the 12 plans on the benchmark case, healthy liver normal-tissue complication probabilities (NTCPs) ranged from 0.2% to 22.2%, with little correlation between NTCP and PTV-D95% (R² < 0.3). Four protocol deviations were detected in the set of 20 treatment plans. Comparison with co-planar autoVMAT QA plans revealed that these were due to an excessively high target dose and suboptimal planning. Overall, autoVMAT resulted in an average liver NTCP reduction of 2.2 percentage points (range: 16.2 to −1.8 percentage points, p = 0.03), and lower doses to the healthy liver (p < 0.01) and gastrointestinal organs at risk (p < 0.001).

      Conclusions

      Delineation variation resulted in feedback to participating centers. Automated treatment planning can play an important role in clinical trials for prospective plan QA as suboptimal plans were detected.

      Quality assurance (QA) is essential to clinical trials involving radiation therapy, as protocol violations may seriously impact trial outcome [
      • Weber D.C.
      • Tomsej M.
      • Melidis C.
      • Hurkmans C.W.
      QA makes a clinical trial stronger: evidence-based medicine in radiation therapy.
      ,
      • Fairchild A.
      • Straube W.
      • Laurie F.
      • Followill D.
      Does quality of radiation therapy predict outcomes of multicenter cooperative group trials? A literature review.
      ,
      • Moore K.L.
      • Schmidt R.
      • Moiseenko V.
      • et al.
      Quantifying unnecessary normal tissue complication risks due to suboptimal planning: a secondary study of RTOG 0126.
      ,
      • Ibbott G.
      • Haworth A.
      • Followill D.
      Quality Assurance for Clinical Trials.
      ,
      • Fairchild A.
      • Aird E.
      • Fenton P.A.
      • et al.
      EORTC Radiation Oncology Group quality assurance platform: establishment of a digital central review facility.
      ,
      • Ohri N.
      • Shen X.
      • Dicker A.P.
      • Doyle L.A.
      • Harrison A.S.
      • Showalter T.N.
      Radiotherapy protocol deviations and clinical outcomes: a meta-analysis of cooperative group clinical trials.
      ,
      • Williams M.J.
      • Bailey M.J.
      • Forstner D.
      • Metcalfe P.E.
      Multicentre quality assurance of intensity-modulated radiation therapy plans: a precursor to clinical trials.
      ]. This holds for all trial aspects, including delineation [
      • Joye I.
      • Lambrecht M.
      • Jegou D.
      • Hortobágyi E.
      • Scalliet P.
      • Haustermans K.
      Does a central review platform improve the quality of radiotherapy for rectal cancer? Results of a national quality assurance project.
      ,
      • Steenbakkers R.J.
      • Duppen J.C.
      • Fitton I.
      • et al.
      Reduction of observer variation using matched CT-PET for lung cancer delineation: a three-dimensional analysis.
      ,
      • Brouwer C.L.
      • Steenbakkers R.J.
      • van den Heuvel E.
      • et al.
      3D Variation in delineation of head and neck organs at risk.
      ,
      • Berry S.L.
      • Boczkowski A.
      • Ma R.
      • Mechalakos J.
      • Hunt M.
      Interobserver variability in radiation therapy plan output: results of a single-institution study.
      ] and planning [
      • Li N.
      • Carmona R.
      • Sirak I.
      • et al.
      Highly efficient training, refinement, and validation of a knowledge-based planning quality-control system for radiation therapy clinical trials.
      ,
      • Berry S.L.
      • Ma R.
      • Boczkowski A.
      • Jackson A.
      • Zhang P.
      • Hunt M.
      Evaluating inter-campus plan consistency using a knowledge based planning model.
      ,
      • Nelms B.E.
      • Robinson G.
      • Markham J.
      • et al.
      Variation in external beam treatment plan quality: an inter-institutional study of planners and planning systems.
      ].
      The TRENDY trial (registered as NCT02470533 on clinicaltrials.gov) is an international multi-center clinical trial in which patients with hepatocellular carcinoma (HCC) are randomized between transarterial chemoembolization (TACE) with drug-eluting beads in the standard arm and stereotactic body radiation therapy (SBRT) in the experimental arm. Radiotherapy is delivered in six fractions with a total target dose of 48–54 Gy. The primary endpoint is time to progression. Secondary endpoints include time to local recurrence, response rate (partial or complete), overall survival, toxicity and quality of life. Patients are accrued from eleven centers in four different countries. Radiotherapy is delivered in ten of these centers, all experienced with liver SBRT; patients from one center are referred to Erasmus MC for radiotherapy.
      Due to underlying liver disease (cirrhosis), the patient population in the TRENDY trial is prone to developing hepatic toxicity after radiotherapy, and sparing of the healthy liver is crucial in these patients [
      • Dawson L.A.
      • Normolle D.
      • Balter J.M.
      • McGinn C.J.
      • Lawrence T.S.
      • Ten Haken R.K.
      Analysis of radiation-induced liver disease using the Lyman NTCP model.
      ]. As radiotherapy of the liver is technologically challenging, an extensive QA program, comparable to that of other recent radiotherapy trials on primary liver cancer [

      Radiation Therapy Oncology Group. Randomized Phase III Study of Sorafenib versus Stereotactic Body Radiation Therapy Followed by Sorafenib in Hepatocellular Carcinoma. RTOG 1112 version date 11/30/2012.

      ,

      NRG Oncology. Randomized Phase III Study of Focal Radiation Therapy for Unresectable, Localized Intrahepatic Cholangiocarcinoma. NRG-GI001 version date 6/30/2015.

      ], has been developed. QA guidelines and recommendations are outlined in a QA protocol, included in the supplementary material. The trial QA is coordinated by a QA team consisting of two radiation oncologists and two medical physicists from different participating centers. The QA team is supported by two radiologists for advice on target definition. Before including patients, centers are expected to have recently undergone an external beam output audit. They also need to complete a trial-specific facility questionnaire that explains in detail the technical aspects of their procedures for treating HCC patients. In addition, centers have to perform delineation and treatment planning on a benchmark case [
      • Melidis C.
      • Bosch W.R.
      • Izewska J.
      • et al.
      Global harmonization of quality assurance naming conventions in radiation therapy clinical trials.
      ]. During patient accrual, the QA protocol accommodates patient-specific prospective feedback (prior to treatment) on target definition, organ at risk (OAR) delineation and treatment planning. It defines minor and major protocol deviations. When a major deviation is detected during prospective monitoring by the QA team, the participating center needs to first improve the contouring and/or planning before start of treatment. Retrospectively detected major deviations have to be avoided for future patients from the same center. Minor deviations should be avoided when possible. For each submitted patient, a volumetric-modulated arc therapy (VMAT) plan is generated at Erasmus MC on behalf of the QA team, using fully automated multi-criterial plan generation (autoVMAT) [
      • Voet P.W.
      • Dirkx M.L.
      • Breedveld S.
      • Al-Mamgani A.
      • Incrocci L.
      • Heijmen B.J.
      Fully automated volumetric modulated arc therapy plan generation for prostate cancer patients.
      ,
      • Sharfo A.W.
      • Breedveld S.
      • Voet P.W.
      • et al.
      Validation of fully automated VMAT plan generation for library-based plan-of-the-day cervical cancer radiotherapy.
      ]. AutoVMAT planning results are provided to the treating center. The aim is to provide prospective feedback for as many patients as possible, but at the very least for the first three patients from each center. Depending on the feedback on the first patients, subsequent patients can be reviewed either prospectively or retrospectively (after start of treatment). To make prospective feedback practically feasible, participating centers are invited to submit all images, contours and the treatment plan at least three days prior to the planned start of treatment.
      This paper contains quantitative analyses of the contouring and treatment planning benchmark-case results and reports on the first experiences with treatment plan feedback based on autoVMAT.

      Materials and methods

      Target definition and contouring guidelines

      The target lesion(s) should be visible on contrast-enhanced CT imaging and are primarily delineated on the arterial contrast phase of the CT scan. Additional diagnostic imaging (MR/PET) is recommended when available. In regions with poor visibility of tumor edges, generous delineation is required to avoid tumor miss. General recommendations for OAR delineation are outlined in the study protocol. Additional published guidelines were supplied to support delineation [
      • Jabbour S.K.
      • Hashem S.A.
      • Bosch W.
      • et al.
      Upper abdominal normal organ contouring guidelines and atlas: a Radiation Therapy Oncology Group consensus.
      ]. OARs that have to be fully delineated in 3D are: liver, kidneys, stomach, heart and gallbladder. For the required calculation of the liver normal-tissue complication probability (NTCP, see below), a healthy liver structure has to be created, defined as the full liver minus the gross tumor volume (GTV), and minus parts of the liver that are not functional such as regions previously treated with radio-frequency ablation (RFA) (if present). Partial delineations are allowed for spinal cord, duodenum, esophagus, and bowel (in case areas of the small and/or large bowel are located close to the tumor or could be located in high dose gradients).

      Planning constraints and objectives

      In treatment planning, the goal is to irradiate the full PTV with at least 48 Gy, or, if possible within planning constraints, up to 54 Gy, in six fractions. General recommendations for treatment planning and planning constraints are outlined in the trial protocol and summarized in the QA protocol, where also minor and major deviations are defined. The healthy liver is an important OAR. Several planning constraints are used to guarantee safe liver dose delivery, in particular, NTCP ≤ 5%. An NTCP between 5% and 10% qualifies as a minor deviation. For NTCP calculation, the healthy liver dose-volume histogram (DVH) is first converted into a DVH for 1.5 Gy/fraction dose delivery in every voxel using α/β = 2.5 Gy. Then the NTCP is calculated using TD50 = 39.8 Gy, m = 0.12, n = 0.97 [
      • Dawson L.A.
      • Normolle D.
      • Balter J.M.
      • McGinn C.J.
      • Lawrence T.S.
      • Ten Haken R.K.
      Analysis of radiation-induced liver disease using the Lyman NTCP model.
      ,
      • Dawson L.A.
      • Eccles C.
      • Craig T.
      Individualized image guided iso-NTCP based liver cancer SBRT.
      ]. A software implementation for NTCP calculation has been developed and distributed among the participating radiotherapy centers. In addition, the average dose to the healthy parts of the liver should be less than 22 Gy [
      • Dawson L.A.
      • Eccles C.
      • Craig T.
      Individualized image guided iso-NTCP based liver cancer SBRT.
      ] and, based on [
      • Son S.H.
      • Choi B.O.
      • Ryu M.R.
      • et al.
      Stereotactic body radiotherapy for patients with unresectable primary hepatocellular carcinoma: dose-volumetric parameters predicting the hepatic complication.
      ] and α/β = 3 Gy, at least 800 cc should receive less than 23.4 Gy.
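      The NTCP calculation described above can be sketched as follows. This is a minimal illustration, not the software distributed to the centers: it assumes the standard Lyman-Kutcher-Burman formulation, a differential DVH given as (total dose, volume) bins, and a linear-quadratic conversion of each bin to 1.5 Gy/fraction. The function names and the DVH layout are hypothetical; only α/β, the reference fraction dose, TD50, m, n and the six-fraction schedule come from the text.

```python
import math

ALPHA_BETA = 2.5           # Gy, from the text
REF_FRACTION_DOSE = 1.5    # Gy per fraction, from the text
TD50, M, N = 39.8, 0.12, 0.97  # Lyman parameters quoted in the text


def to_reference_fractionation(total_dose, n_fractions=6):
    """LQ conversion of a total physical dose to the iso-effective
    total dose delivered in 1.5 Gy fractions (assumed formulation)."""
    dose_per_fraction = total_dose / n_fractions
    return total_dose * (dose_per_fraction + ALPHA_BETA) / (
        REF_FRACTION_DOSE + ALPHA_BETA
    )


def lkb_ntcp(dvh, n_fractions=6):
    """NTCP (0..1) from a differential DVH of the healthy liver.

    dvh: list of (total_dose_gy, volume) bins; volumes in any unit.
    """
    total_volume = sum(volume for _, volume in dvh)
    # Generalized equivalent uniform dose with volume parameter n.
    geud = sum(
        (volume / total_volume)
        * to_reference_fractionation(dose, n_fractions) ** (1.0 / N)
        for dose, volume in dvh
    ) ** N
    t = (geud - TD50) / (M * TD50)
    # Cumulative standard normal distribution evaluated at t.
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


# Hypothetical three-bin DVH: (dose in Gy over six fractions, volume in cc).
example_dvh = [(5.0, 600.0), (20.0, 300.0), (45.0, 100.0)]
ntcp_percent = 100.0 * lkb_ntcp(example_dvh)
```

      Because n is close to 1, the generalized equivalent uniform dose is close to the mean of the converted doses, which is consistent with the protocol pairing the NTCP constraint with a mean-dose constraint.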

      Benchmark case

      For the benchmark case, data from a 64-year-old male patient, treated at Erasmus MC, was used. Initially, he had an HCC in segment V-VI, for which he underwent surgery. Three years later, there was a 4 cm lesion visible on CT, compatible with HCC, close to the right branch of the portal vein. There were no enlarged lymph nodes, nor lung metastases. The AFP blood level was 3. MRI showed a 4 cm lesion with wash-out surrounded by a capsule, compatible with HCC. The tumor was located in segment V, transition to VIII and toward segment IV.
      In the delineation round, participating centers were asked to define the target (GTV) and delineate the liver for this patient according to the trial protocol. Arterial and venous phase contrast CT scans (with 2.5 mm slice separation), diagnostic MR images (with 2 mm and 7.7 mm slice separation), and a description of the relevant medical history of the patient were provided. The contours were evaluated based on consensus among the medical doctors in the QA team, supported by a radiologist for the GTV. Quantitative comparisons of 3D volumes generated from submitted and reference contours, as defined by the QA team, were based on the volume difference, the average distance in 3D, the center-of-mass (COM) shift, and the Dice Similarity Coefficient (DSC). The DSC of two volumes is defined as the volume of their overlap divided by the average of the two volumes.
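      Three of these comparison metrics can be illustrated with a short sketch. It assumes that each contour set has already been rasterized to a non-empty set of voxel indices on a common grid; that representation, the helper names and the toy box volumes are assumptions for illustration, and the average 3D distance is left out because it requires surface extraction.

```python
def dice(submitted, reference):
    """DSC: overlap volume divided by the average of the two volumes."""
    overlap = len(submitted & reference)
    return overlap / ((len(submitted) + len(reference)) / 2.0)


def relative_volume_difference(submitted, reference):
    """Positive when the submitted volume is larger than the reference."""
    return (len(submitted) - len(reference)) / len(reference)


def center_of_mass(voxels):
    """Mean voxel index along each of the three axes."""
    count = len(voxels)
    return tuple(sum(v[axis] for v in voxels) / count for axis in range(3))


def com_shift(submitted, reference, spacing=(1.0, 1.0, 1.0)):
    """Euclidean distance between centers of mass, in units of `spacing`."""
    com_s, com_r = center_of_mass(submitted), center_of_mass(reference)
    return sum(
        ((s - r) * h) ** 2 for s, r, h in zip(com_s, com_r, spacing)
    ) ** 0.5


def box(x0, x1, y0, y1, z0, z1):
    """Toy volume: all voxel indices inside an axis-aligned box."""
    return {
        (x, y, z)
        for x in range(x0, x1)
        for y in range(y0, y1)
        for z in range(z0, z1)
    }


reference = box(0, 10, 0, 10, 0, 10)   # 1000 voxels
submitted = box(1, 11, 0, 10, 0, 10)   # same size, shifted one voxel in x

dsc = dice(submitted, reference)                      # 900 / 1000
shift = com_shift(submitted, reference, (1.0, 1.0, 2.5))
```

      With this definition, two volumes of equal size that are offset by a tenth of their width already drop to a DSC of 0.9, which helps put the reported GTV values in perspective.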
      In the subsequent planning round, the centers generated a treatment plan for the same patient according to the study protocol, based on the contours used for the initial treatment of the patient at Erasmus MC. The treatment planning results for the benchmark case were primarily evaluated against the treatment planning protocol, using the dosimetric parameters listed in Table 1. In addition, correlations between the achieved target dose and the NTCP and OAR doses were investigated.
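      Table 1. Benchmark-case results and protocol requirements on treatment planning. Roman numerals (I, II, …) refer to the institutes, and revised plans from the same institute are indicated with an asterisk (*).

      Item | Protocol | I | II | III | III* | IV | V | VI | VII | VII* | VIII | IX | X | Auto
      Modality | | vmat | vmat | imrt | vmat | ck | vmat | ck | vmat | vmat | vmat | vmat | vmat | vmat
      Energy [MV] | | 10 | 6 | 6 | 6 | 6 | 10 | 6 | 10 | 10 | 10 | 10 | 6 | 10
      Beams/Arcs | | 2 | 2 | 9 | 4 | | 2 | | 1 | 1 | 2 | 1 | 2 | 2
      Gantry angles | | 180–40 | 180–0 | 20, 55, 145, 170, 195, 220, 265, 318, 330 | 180–180 | | 180–0 | | 180–30 | 280–40 | 180–90 | 180–50 | 180–180 | 190–175
      MUs | | 1775 | 1640 | 1552 | 1889 | | 1413 | | 1459 | 1320 | 1778 | 2489 | 3236 | 1938
      Dmax PTV [Gy] | ≤72 | 70.5 | 65.6 | 67.5 | 69.5 | 70.1 | 65.0 | 68.5 | 71.6 | 68.5 | 69.6 | 70.3 | 71.4 | 70.2
      VPTV ≥ 48 Gy [%] | >95 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.3 | 100 | 99.6 | 100 | 100
      VPTV ≥ 54 Gy [%] | | 99.6 | 99.6 | 99.6 | 98.7 | 99.4 | 98.4 | 93.1 | 100 | 80.5 | 90.1 | 93.5 | 98.1 | 94.9
      Dmean Liver – GTV [Gy] | ≤22 | 15.8 | 15.7 | 17.0 | 14.9 | 15.9 | 15.4 | 17.1 | 18.4 | 14.2 | 17.2 | 14.7 | 14.6 | 15.0
      VLiver-GTV < 23.4 Gy [cc] | >800 | 925 | 924 | 897 | 965 | 962 | 934 | 926 | 856 | 948 | 904 | 955 | 962 | 924
      NTCPLiver-GTV [%] | ≤5 | 2.4 | 3.2 | 7.0 | 0.8 | 2.1 | 1.2 | 3.8 | 22.2 | 0.2 | 4.4 | 0.5 | 0.4 | 0.9
      Dmax Stomach/Bowel [Gy] | <39 | 16.2 | 16.3 | 21.4 | 13.9 | 18.6 | 16.4 | 17.4 | 18.0 | 14.3 | 18.3 | 15.2 | 18.6 | 11.9
      V30Gy Stomach/Bowel [cc] | <5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
      Dmax Esophagus [Gy] | ≤36 | 1.5 | 1.9 | 1.3 | 1.5 | 1.5 | 1.7 | 7.3 | 4.2 | 3.3 | 3.3 | 1.4 | 1.8 | 1.6
      Dmax Spinal Cord [Gy] | <24 | 6.7 | 8.7 | 14.7 | 11.8 | 5.1 | 7.0 | 6.2 | 6.0 | 5.5 | 8.6 | 7.4 | 17.8 | 8.0
      D2/3 right Kidney [Gy] | <19.2 | 0.9, 0.9, 3.8, 15.4, 0.7, 3.6, 1.4, 1.1, 2.0, 0.7, 1.1, 1.0
      D2/3 left Kidney [Gy] | <19.2 | 2.9 | 3.7 | 0.5 | 2.7 | 2.9 | 3.0 | 1.6 | 3.2 | 3.8 | 3.8 | 2.8 | 5.7 | 3.6
      Avoidable hotspots | None | None | None | None | None | None | None | None | None | None | None | None | None | None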

      Automated treatment planning

      Based on the trial constraints and objectives, fully automated plan generation for HCC was implemented in the in-house developed Erasmus-iCycle software for fully automatic, prioritized multi-criteria optimization. Erasmus-iCycle was used to generate an individualized planning template for fully automatic generation of a clinically deliverable coplanar VMAT plan with the Monaco TPS (Elekta AB, Stockholm, Sweden) [
      • Voet P.W.
      • Dirkx M.L.
      • Breedveld S.
      • Al-Mamgani A.
      • Incrocci L.
      • Heijmen B.J.
      Fully automated volumetric modulated arc therapy plan generation for prostate cancer patients.
      ,
      • Sharfo A.W.
      • Breedveld S.
      • Voet P.W.
      • et al.
      Validation of fully automated VMAT plan generation for library-based plan-of-the-day cervical cancer radiotherapy.
      ]. AutoVMAT was primarily used to assess the quality of treatment planning, both for the benchmark case and for clinical trial patients. To this end, fully automatically generated co-planar VMAT plans were compared to 20 treatment plans for 9 patients: 12 for the benchmark case (1 patient, see above), 5 from a clinical pilot study (5 patients), and 3 from the first 3 patients included in the trial so far. AutoVMAT plans were scaled to the clinical target dose (PTV-D95%, the dose to 95% of the PTV). The target dose for the benchmark case and one of the trial patients was 54 Gy in six fractions. For the other 7 patients, it was around 48 Gy, also in six fractions.
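      The scaling step can be pictured with the sketch below. It assumes that the PTV dose is available as a list of equal-volume voxel doses and that rescaling multiplies the whole dose distribution by one factor; the helper names, the nearest-rank D95% estimate and the sample doses are illustrative assumptions, and a treatment planning system would normally interpolate the cumulative DVH instead.

```python
import math


def d_percent(voxel_doses, percent):
    """Minimum dose received by the hottest `percent` % of the voxels
    (nearest-rank estimate on equal-volume voxels)."""
    ranked = sorted(voxel_doses, reverse=True)
    count = max(1, math.ceil(len(ranked) * percent / 100.0))
    return ranked[count - 1]


def scale_to_target(voxel_doses, target_d95):
    """Return the scale factor and rescaled doses so that D95% hits the target."""
    factor = target_d95 / d_percent(voxel_doses, 95.0)
    return factor, [dose * factor for dose in voxel_doses]


# Hypothetical PTV voxel doses in Gy (20 equal-volume voxels, 41..60 Gy).
doses = [float(d) for d in range(41, 61)]
factor, scaled = scale_to_target(doses, 54.0)
```

      Since every structure is multiplied by the same factor, OAR doses and the liver NTCP change along with the target dose, so they have to be re-evaluated after scaling.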

      Dose escalation with autoVMAT

      Since the trial protocol recommends covering as much of the PTV as possible with 54 Gy in six fractions, autoVMAT was also used to assess for how many patients dose escalation to 54 Gy was possible without violating any planning constraint.

      Results

      Benchmark case – GTV and liver delineation

      All ten participating radiotherapy centers completed the contouring. Two institutes primarily contoured on the venous instead of the arterial contrast phase of the CT. All contours were considered acceptable, although variation in both GTV and liver contours was statistically significant (see below). Two axial CT slices with submitted contours are shown in Fig. 1, where also the reference contours are displayed. Several centers delineated parts of the gallbladder, vena cava and diaphragm as liver.
      Fig. 1. Axial slices of the benchmark-case CT scan with liver contours (submitted: yellow, dashed; reference: orange, solid) and GTV contours (submitted: pink, dashed; reference: purple, solid).
      The results of the quantitative comparisons with the reference contours are shown in Fig. 2. Submitted liver volumes are on average 101.4 cc (8.0%) larger than the reference (range 2.2–12.0%, p < 0.001), while submitted GTVs are on average 12.6 cc (24%) smaller than the reference (range −2.8% to 43.2%, p < 0.001). The average distance in 3D between submitted and reference volumes ranges from 1.7 to 3.5 mm for the liver and from 1.2 to 2.4 mm for the GTV. Center-of-mass shifts range from 0.8 to 2.4 mm for the liver and from 1.7 to 4.6 mm for the GTV. DSCs of submitted and reference volumes range from 0.925 to 0.954 for the liver and from 0.721 to 0.876 for the GTV.
      Fig. 2. Differences between submitted contours and reference contours for all centers, excluding Erasmus MC. Liver: orange, GTV: purple (COM = center of mass). Distances are based on 3D volumes and the DSC of two volumes is defined as the overlap divided by the average volume. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

      Benchmark case – treatment planning

      Treatment planning results are summarized in Table 1. Two centers (III and VII) did not meet the NTCP constraint initially and re-planned the benchmark case after feedback had been provided. The revised plans are labeled III* and VII*, respectively. For reference, an autoVMAT plan with a target dose of 54 Gy to 95% of the PTV was also generated (see Table 1).
      PTV dose homogeneity and conformity vary, with some institutes aiming at a high target dose allowing for large dose gradients in the GTV-PTV margin, and others optimizing for a smoother, more homogeneous dose distribution. Within the set of 12 plans for the benchmark case, all based on the same CT and structure set, there is little correlation between PTV-D95% and the achieved NTCP value (R² = 0.3 for all plans, R² < 0.1 for the 10 protocol-compliant plans), and hardly any between PTV-D95% and OAR doses (R² < 0.2 for esophagus, stomach, duodenum and bowel), see Fig. 3a and b. This suggests that the observed differences in dose distribution, especially those within the subset of protocol-compliant plans, are largely due to variations in treatment planning and treatment technique.
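      For readers less familiar with the statistic, the coefficient of determination for a simple linear relation is the squared Pearson correlation. The sketch below assumes that this is the quantity reported; the function name and the number pairs are purely illustrative and are not the trial data.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Illustrative values only: target dose (Gy) against two made-up NTCP series (%).
target_dose = [48.0, 50.0, 52.0, 54.0]
ntcp_linear = [1.0, 2.0, 3.0, 4.0]      # rises strictly with dose
ntcp_unrelated = [1.0, 3.0, 3.0, 1.0]   # no linear trend with dose

r2_linear = r_squared(target_dose, ntcp_linear)
r2_unrelated = r_squared(target_dose, ntcp_unrelated)
```

      A value near 1 would mean that a higher target dose explains the higher NTCP; values near 0, as reported for the protocol-compliant plans, point to other sources of variation such as planning choices.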
      Fig. 3. a: Liver NTCP [%] versus D95% for the PTV for 12 plans of the benchmark case. Minor protocol deviations are represented in orange, major deviations in red. b: Gastrointestinal OAR doses versus D95% for the PTV for 12 plans of the benchmark case.

      Plan QA with autoVMAT

      Within the set of 20 plans, two major and two minor deviations were detected, all with respect to the NTCP. With autoVMAT, one major NTCP deviation of 22.2% was reduced to a minor deviation of 7.5%, while the other was reduced from 28.7% to 12.5%, which is still not protocol compliant. For both these plans, the deviation was caused in part by the dose to the PTV being higher than feasible within the NTCP constraint. The autoVMAT plans could be scaled to fully compliant plans with NTCP values of 0.9% and 3.8%, respectively, by reducing the PTV-D95% from 58.8 Gy to 54.0 Gy and from 50.6 Gy to 48.0 Gy. The major NTCP deviations were detected in one plan from the benchmark case and one from the clinical pilot. No patients were treated with these plans. In addition, with autoVMAT, two minor NTCP deviations of respectively 8.3% and 7.0%, one in another plan from the benchmark case and one in another plan from the clinical pilot, could be reduced to protocol compliant values of 2.5% and 3.8%, respectively, with the same PTV-D95%.
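      The NTCP-based deviation check can be written down compactly. In the sketch below, the 5% limit and the 5–10% minor band come from the planning constraints, whereas treating every value above 10% as a major deviation is an assumption inferred from the reported cases; the function name is hypothetical, and the input values are the benchmark-case NTCPs from Table 1.

```python
def classify_ntcp(ntcp_percent, compliant_limit=5.0, minor_limit=10.0):
    """Classify a healthy-liver NTCP value (in %) against the protocol.

    Values above `minor_limit` are treated as major deviations (assumption).
    """
    if ntcp_percent <= compliant_limit:
        return "compliant"
    if ntcp_percent <= minor_limit:
        return "minor"
    return "major"


# Benchmark-case liver NTCP values [%] as listed in Table 1.
benchmark_ntcp = {
    "I": 2.4, "II": 3.2, "III": 7.0, "III*": 0.8, "IV": 2.1, "V": 1.2,
    "VI": 3.8, "VII": 22.2, "VII*": 0.2, "VIII": 4.4, "IX": 0.5,
    "X": 0.4, "Auto": 0.9,
}

# Keep only the plans that are not protocol compliant.
deviations = {
    plan: classify_ntcp(value)
    for plan, value in benchmark_ntcp.items()
    if classify_ntcp(value) != "compliant"
}
```

      Applied to the benchmark case, this flags plan III as a minor and plan VII as a major deviation, matching the two centers that re-planned after feedback.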
      Although some plans using substantially more modulation/monitor units (MUs) and some non-coplanar (Cyberknife) plans outperform the corresponding co-planar autoVMAT plan, in most cases autoVMAT resulted in a superior plan, with an average NTCP reduction of 2.2 percentage points (range: 16.2 to −1.8 percentage points, p = 0.03), a lower mean dose to the healthy liver (p < 0.01) and lower doses to gastrointestinal OARs (p < 0.001), see Fig. 4a and b. For 7 of the 20 plans, including 4 with a protocol deviation, autoVMAT achieved a reduction of the liver NTCP by more than 2 percentage points without compromising any other OAR. A reduction of the gastrointestinal (GI) maximum dose(s) of more than 5.0 Gy was achieved for 4 additional plans, without sacrificing the NTCP or increasing the dose delivered to other OARs.
      Fig. 4. a: Liver NTCP for submitted treatment plans versus autoVMAT. Minor protocol deviations are represented in orange, major deviations in red. b: Gastrointestinal OAR doses for submitted treatment plans versus autoVMAT.

      Potential for dose escalation analyzed with autoVMAT

      For 7 out of the 9 patients considered here, the clinical target dose was around 48 Gy in six fractions. Using autoVMAT, we investigated if dose escalation to 54 Gy in six fractions was possible within planning constraints for the OARs. For one patient, a target dose larger than 48 Gy was not possible due to an overlap between the PTV and part of the bowel. For another patient, it was not possible to achieve more than 48 Gy without violating the NTCP constraint. For the other 5 patients that were clinically treated to 48 Gy, dose escalation to 54 Gy could be achieved with autoVMAT. In practice, however, clinical considerations other than what can be achieved in treatment planning play a role in deciding on target dose.

      Discussion

      As part of the TRENDY multi-center randomized trial, an extensive QA program has been implemented, including a benchmark case and prospective (prior to treatment delivery) and retrospective feedback on delineation and treatment planning. Prospective and retrospective feedback during patient accrual promotes the quality and uniformity of SBRT treatment in the trial, but has the possible disadvantage that the trial becomes less representative of normal clinical practice, especially during the start-up phase, when revision of delineation and treatment planning is more likely to be required. Since both the trial protocols and all QA results are available, other centers should, however, be able to reproduce all (future) trial results.
      Submitted benchmark-case delineations were evaluated by comparison with a reference, based on consensus within an expert panel. The submitted delineations of liver and GTV are small in volume compared to the total volume represented by the CT scan, in which case the DSC becomes comparable to kappa values for inter-observer variation in delineation (see supplementary material), as reported for HCC in [
      • Jabbour S.K.
      • Hashem S.A.
      • Bosch W.
      • et al.
      Upper abdominal normal organ contouring guidelines and atlas: a Radiation Therapy Oncology Group consensus.
      ,
      • Hong T.S.
      • Bosch W.R.
      • Krishnan S.
      • et al.
      Interobserver variability in target definition for hepatocellular carcinoma with and without portal vein thrombus: radiation therapy oncology group consensus guidelines.
      ,
      • Gkika E.
      • Tanadini-Lang S.
      • Kirste S.
      • et al.
      Interobserver variability in target volume delineation of hepatocellular carcinoma: an analysis of the working group “Stereotactic Radiotherapy” of the German Society for Radiation Oncology (DEGRO).
      ]. Based on the widely accepted and used interpretation of kappa values in inter-observer studies [
      • Landis J.R.
      • Koch G.G.
      The measurement of observer agreement for categorical data.
      ,
      • Allozi R.
      • Li X.A.
      • White J.
      • et al.
      Tools for Consensus Analysis of Experts’ contours for radiotherapy structure definitions.
      ], we conclude that there is almost perfect agreement (DSC between 0.8 and 1.0) between all submitted liver contour sets and six of nine GTV contour sets, and that there is substantial agreement (DSC between 0.6 and 0.8) between the other three GTV contour sets and the reference. The DSCs for the liver contours are comparable in magnitude to the kappa values reported in inter-observer studies [
      • Jabbour S.K.
      • Hashem S.A.
      • Bosch W.
      • et al.
      Upper abdominal normal organ contouring guidelines and atlas: a Radiation Therapy Oncology Group consensus.
      ,
      • Hong T.S.
      • Bosch W.R.
      • Krishnan S.
      • et al.
      Interobserver variability in target definition for hepatocellular carcinoma with and without portal vein thrombus: radiation therapy oncology group consensus guidelines.
      ,
      • Gkika E.
      • Tanadini-Lang S.
      • Kirste S.
      • et al.
      Interobserver variability in target volume delineation of hepatocellular carcinoma: an analysis of the working group “Stereotactic Radiotherapy” of the German Society for Radiation Oncology (DEGRO).
      ]. Both individual feedback and general recommendations, based on the observed delineation variations, have been provided to participating centers.
      Feedback on treatment planning was based both on the trial protocol and results obtained with fully automated treatment planning. The added value of autoVMAT for plan QA is twofold. It contributes to understanding the nature of protocol deviations in treatment planning, which are often due to suboptimal planning combined with (too) much priority being given to target dose, or to some OAR(s) at the cost of others. In such cases, dosimetric autoVMAT results provide a solution approach, thereby reducing both the number of deviations and patient loss in the trial. In addition, autoVMAT results can be used to identify, quantify and monitor variations in protocol compliant plans, even for an individual patient. Although replanning is not mandatory in such cases, this contributes to the objective and systematic evaluation of submitted treatment plans. AutoVMAT results show what can be achieved in treatment planning, and individual feedback based (in part) on these results, may promote the quality and uniformity of treatment planning for future patients from the same center.
      When more patients from multiple centers are included in the trial, automated treatment planning will be applied to identify systematic variations in treatment planning between participating centers and to monitor improvements in treatment planning. Although it is expected that variation in submitted dose distributions is mostly due to treatment planning, at this point it is, strictly speaking, not possible to separately study the impact of treatment planning and treatment technique. In the longer run, dedicated fully automated treatment planning adjusted to treatment delivery techniques in participating centers could be implemented as a part of clinical trials.
      As has been demonstrated previously, knowledge-based [
      • Moore K.L.
      • Schmidt R.
      • Moiseenko V.
      • et al.
      Quantifying unnecessary normal tissue complication risks due to suboptimal planning: a secondary study of RTOG 0126.
      ] and automated treatment planning [
      • Sharfo A.W.
      • Dirkx M.L.
      • Bijman R.G.
      • et al.
      Late toxicity in HYPRO randomized trial analyzed by automated planning and intrinsic NTCP modelling.
      ] may, in retrospect, reveal suboptimal treatment planning in clinical trials. To the best of our knowledge, the present manuscript constitutes the first report on the use of automated treatment planning for prospective and retrospective plan QA within a clinical trial (while patients are accrued). We have demonstrated that, because of their quality and consistency, automatically generated plans can assist in evaluating constraint violations in submitted plans and in judging whether these can be avoided by modifying the plan. Automated planning can also identify suboptimal treatment plans that fulfill all constraints but whose PTV or OAR dose delivery could be improved with plan adjustments. Due to the automation, fast feedback on plans submitted by participating centers is feasible.

      Conflict of interest

      None declared.

      Acknowledgements

      The TRENDY trial is supported by the Dutch Cancer Society (KWF, project number EMCR 2014-6973). We thank Coen Hurkmans, Sebastiaan Breedveld, Chrysi Papalazarou, Wilco Schillemans, Yvette Seppenwoolde, András Zolnay, Maarten Dirkx, and Joan Penninkhof for support, feedback and discussions.

      Appendix A. Supplementary data

      References

        • Weber D.C.
        • Tomsej M.
        • Melidis C.
        • Hurkmans C.W.
        QA makes a clinical trial stronger: evidence-based medicine in radiation therapy.
        Radiother Oncol. 2012; 105: 4-8
        • Fairchild A.
        • Straube W.
        • Laurie F.
        • Followill D.
        Does quality of radiation therapy predict outcomes of multicenter cooperative group trials? A literature review.
        Int J Radiat Oncol Biol Phys. 2013; 87: 246-260
        • Moore K.L.
        • Schmidt R.
        • Moiseenko V.
        • et al.
        Quantifying unnecessary normal tissue complication risks due to suboptimal planning: a secondary study of RTOG 0126.
        Int J Radiat Oncol Biol Phys. 2015; 92: 228-235
        • Ibbott G.
        • Haworth A.
        • Followill D.
        Quality Assurance for Clinical Trials.
        Front Oncol. 2013; 3: 311
        • Fairchild A.
        • Aird E.
        • Fenton P.A.
        • et al.
        EORTC Radiation Oncology Group quality assurance platform: establishment of a digital central review facility.
        Radiother Oncol. 2012; 103: 279-286
        • Ohri N.
        • Shen X.
        • Dicker A.P.
        • Doyle L.A.
        • Harrison A.S.
        • Showalter T.N.
        Radiotherapy protocol deviations and clinical outcomes: a meta-analysis of cooperative group clinical trials.
        JNCI. 2013; 105: 387-393 https://doi.org/10.1093/jnci/djt001
        • Williams M.J.
        • Bailey M.J.
        • Forstner D.
        • Metcalfe P.E.
        Multicentre quality assurance of intensity-modulated radiation therapy plans: a precursor to clinical trials.
        Australas Radiol. 2007; 51: 472-479 https://doi.org/10.1111/j.1440-1673.2007.01873.x
        • Joye I.
        • Lambrecht M.
        • Jegou D.
        • Hortobágyi E.
        • Scalliet P.
        • Haustermans K.
        Does a central review platform improve the quality of radiotherapy for rectal cancer? Results of a national quality assurance project.
        Radiother Oncol. 2014; 111: 400-405
        • Steenbakkers R.J.
        • Duppen J.C.
        • Fitton I.
        • et al.
        Reduction of observer variation using matched CT-PET for lung cancer delineation: a three-dimensional analysis.
        Int J Radiat Oncol Biol Phys. 2006; 64: 435-448
        • Brouwer C.L.
        • Steenbakkers R.J.
        • van den Heuvel E.
        • et al.
        3D Variation in delineation of head and neck organs at risk.
        Radiat Oncol. 2012; 7: 32 https://doi.org/10.1186/1748-717X-7-32
        • Berry S.L.
        • Boczkowski A.
        • Ma R.
        • Mechalakos J.
        • Hunt M.
        Interobserver variability in radiation therapy plan output: results of a single-institution study.
        Pract Radiat Oncol. 2016; 6: 442-449 https://doi.org/10.1016/j.prro.2016.04.005
        • Li N.
        • Carmona R.
        • Sirak I.
        • et al.
        Highly efficient training, refinement, and validation of a knowledge-based planning quality-control system for radiation therapy clinical trials.
        Int J Radiat Oncol Biol Phys. 2017; 97: 164-172
        • Berry S.L.
        • Ma R.
        • Boczkowski A.
        • Jackson A.
        • Zhang P.
        • Hunt M.
        Evaluating inter-campus plan consistency using a knowledge based planning model.
        Radiother Oncol. 2016; 120: 349-355
        • Nelms B.E.
        • Robinson G.
        • Markham J.
        • et al.
        Variation in external beam treatment plan quality: an inter-institutional study of planners and planning systems.
        Pract Radiat Oncol. 2012; 2: 296-305 https://doi.org/10.1016/j.prro.2011.11.012
        • Dawson L.A.
        • Normolle D.
        • Balter J.M.
        • McGinn C.J.
        • Lawrence T.S.
        • Ten Haken R.K.
        Analysis of radiation-induced liver disease using the Lyman NTCP model.
        Int J Radiat Oncol Biol Phys. 2002; 53: 810-821
      1. Radiation Therapy Oncology Group. Randomized Phase III Study of Sorafenib versus Stereotactic Body Radiation Therapy Followed by Sorafenib in Hepatocellular Carcinoma. RTOG 1112 version date 11/30/2012.

      2. NRG Oncology. Randomized Phase III Study of Focal Radiation Therapy for Unresectable, Localized Intrahepatic Cholangiocarcinoma. NRG-GI001 version date 6/30/2015.

        • Melidis C.
        • Bosch W.R.
        • Izewska J.
        • et al.
        Global harmonization of quality assurance naming conventions in radiation therapy clinical trials.
        Int J Radiat Oncol Biol Phys. 2014; 90: 1242-1249 https://doi.org/10.1016/j.ijrobp.2014.08.348
        • Voet P.W.
        • Dirkx M.L.
        • Breedveld S.
        • Al-Mamgani A.
        • Incrocci L.
        • Heijmen B.J.
        Fully automated volumetric modulated arc therapy plan generation for prostate cancer patients.
        Int J Radiat Oncol Biol Phys. 2014; 88: 1175-1179
        • Sharfo A.W.
        • Breedveld S.
        • Voet P.W.
        • et al.
        Validation of fully automated VMAT plan generation for library-based plan-of-the-day cervical cancer radiotherapy.
        PLoS ONE. 2016; 11: e0169202 https://doi.org/10.1371/journal.pone.0169202
        • Jabbour S.K.
        • Hashem S.A.
        • Bosch W.
        • et al.
        Upper abdominal normal organ contouring guidelines and atlas: a Radiation Therapy Oncology Group consensus.
        Pract Radiat Oncol. 2014; 4: 82-89 https://doi.org/10.1016/j.prro.2013.06.004
        • Dawson L.A.
        • Eccles C.
        • Craig T.
        Individualized image guided iso-NTCP based liver cancer SBRT.
        Acta Oncol. 2006; 45: 856-864
        • Son S.H.
        • Choi B.O.
        • Ryu M.R.
        • et al.
        Stereotactic body radiotherapy for patients with unresectable primary hepatocellular carcinoma: dose-volumetric parameters predicting the hepatic complication.
        Int J Radiat Oncol Biol Phys. 2010; 78: 1073-1080
        • Hong T.S.
        • Bosch W.R.
        • Krishnan S.
        • et al.
        Interobserver variability in target definition for hepatocellular carcinoma with and without portal vein thrombus: radiation therapy oncology group consensus guidelines.
        Int J Radiat Oncol Biol Phys. 2014; 89: 804-813 https://doi.org/10.1016/j.ijrobp.2014.03.041
        • Gkika E.
        • Tanadini-Lang S.
        • Kirste S.
        • et al.
        Interobserver variability in target volume delineation of hepatocellular carcinoma: an analysis of the working group “Stereotactic Radiotherapy” of the German Society for Radiation Oncology (DEGRO).
        Strahlenther Onkol. 2017; https://doi.org/10.1007/s00066-017-1177-y
        • Landis J.R.
        • Koch G.G.
        The measurement of observer agreement for categorical data.
        Biometrics. 1977; 33: 159-174
        • Allozi R.
        • Li X.A.
        • White J.
        • et al.
        Tools for Consensus Analysis of Experts’ contours for radiotherapy structure definitions.
        Radiother Oncol. 2010; 97: 572-578 https://doi.org/10.1016/j.radonc.2010.06.009
        • Sharfo A.W.
        • Dirkx M.L.
        • Bijman R.G.
        • et al.
        Late toxicity in HYPRO randomized trial analyzed by automated planning and intrinsic NTCP modelling.
        Radiother Oncol. 2017; 123: S126