Original Article | Volume 177, P222-230, December 01, 2022

Zhenjiang Li and Wei Zhang contributed equally to this work.
Open Access | Published: November 11, 2022

## Abstract

### Background and purpose

Deep learning (DL) techniques have shown great potential but have had limited success in online contouring for MR-guided adaptive radiotherapy (MRgART). This study proposes a patient-specific DL auto-segmentation (DLAS) strategy that uses the patient's previous images and contours to update the model, improving segmentation accuracy and efficiency for MRgART.

### Methods and materials

A prototype model was trained for each patient using the first set of MRI and corresponding contours as inputs. The patient-specific model was updated after each fraction with all available fractional MRIs/contours and then used to predict the segmentation for the next fraction. During model training, a variant of the model was fitted under consistency constraints that kept the differences in volume, length, and centroid between the variant's and the prototype's predictions for the latest MRI within a reasonable range. Auto-segmentation performance for both organs-at-risk (OARs) and tumors was evaluated on 6 abdominal/pelvic cases (each with at least 8 sets of MRIs/contours) that underwent MRgART, using the Dice similarity coefficient (DSC) and the 95th-percentile Hausdorff distance (HD95), and was compared with deformable image registration (DIR) and a frozen DL model (no updating after pre-training). The contouring time was also recorded and analyzed.

### Results

The proposed model achieved superior performance, with a higher mean DSC (0.90, 95 % CI: 0.88–0.95) than DIR (0.63, 95 % CI: 0.59–0.68) and the frozen DL model (0.74, 95 % CI: 0.71–0.79). For tumors, the proposed method yielded a median DSC of 0.95 (95 % CI: 0.94–0.97) and a median HD95 of 1.63 mm (95 % CI: 1.22–2.06 mm). The contouring time was significantly reduced (p < 0.05) with the proposed method (73.4 ± 6.5 s) compared with the manual process (12–22 min). The online ART time was reduced to 1650 ± 274 s with the proposed method, compared with 3251.8 ± 447 s using the original workflow.

### Conclusion

The proposed patient-specific DLAS method can significantly improve the segmentation accuracy and efficiency for longitudinal MRIs, thereby facilitating the routine practice of MRgART.

## Introduction

Magnetic resonance imaging (MRI)-guided adaptive radiotherapy (MRgART) is an emerging cancer treatment technology. A linear accelerator integrated with an MRI scanner (e.g., the MR-linac [Tijssen et al., "MRI commissioning of 1.5 T MR-linac systems–a multi-institutional study"]), together with specialized treatment planning software (TPS), enables online plan adaptation that accommodates daily patient setup uncertainties and anatomical changes caused by physiological and treatment-related variations [Otazo et al.].
The Elekta MR-Linac (Unity, Elekta AB, Stockholm, Sweden) provides two workflows for online treatment plan adaptation [Winkel et al.]: adapt to position (ATP) and adapt to shape (ATS). The ATP workflow assumes that the patient's anatomy is unchanged and corrects only for an isocenter shift at each new treatment fraction. ATS allows the online TPS (Monaco 5.40.02, Elekta AB, Stockholm, Sweden) to replan based on patient structures segmented online on the daily images. ATS, however, involves a trade-off between the time a radiation oncologist spends re-contouring the target volume and adjacent organs-at-risk (OARs) on the daily MRI and the resulting dosimetric gain.
Efficient online plan adaptation is critical because the patient waits on the couch, holding a steady treatment position, throughout the process. Manual segmentation can take more than 30 minutes [Lamb et al.], leading to a poor patient experience and affecting treatment compliance. Contour propagation using rigid or deformable image registration (DIR) can reduce the workload compared with segmentation from scratch [Zhang et al., "A patient-specific autosegmentation strategy using multi-input deformable image registration for magnetic resonance imaging-guided online adaptive radiation therapy: a feasibility study"]. However, rigid registration has limited accuracy because of complex geometric changes in the patient's internal anatomy and posture. DIR can correct anatomical deformations, but clinical evaluations of its results have not been satisfactory [Christiansen et al., "Accuracy of automatic deformable structure propagation for high-field MRI guided prostate radiotherapy"; Anaya and Fairfoul].
Deep learning auto-segmentation (DLAS) models are widely recognized to outperform existing DIR algorithms [Liesbeth et al., "Overview of artificial intelligence-based applications in radiotherapy: recommendations for implementation and quality assurance"; Sheng, "Artificial intelligence in radiotherapy: a technological review"]. Tang et al. ["Whole liver segmentation based on deep learning and manual adjustment for clinical use in SIRT"] proposed a multi-scale convolutional neural network (CNN) for liver segmentation on CT images, and Fu et al. ["A novel MRI segmentation method using CNN-based correction network for MRI-guided adaptive radiotherapy"] implemented a CNN model to segment several abdominal organs. These studies used more than 100 sets of CT or MR images for model training. In practice, however, it is difficult to collect such a large number of high-quality annotated datasets, which limits the performance of general DLAS models [Renard et al., "Variability and reproducibility in deep learning for medical image segmentation"]. Moreover, because MRI intensities vary considerably, a segmentation model trained on datasets from one MRI manufacturer or protocol may not work well on datasets from other manufacturers or protocols [Yan et al., "MRI manufacturer shift and adaptation: increasing the generalizability of deep learning segmentation for MR images acquired with different scanners"]. Performance can degrade even across different patients scanned on the same scanner because of intensity inhomogeneity [Vovk et al., "A review of methods for correction of intensity inhomogeneity in MRI"]. In addition, treatment planning for different diseases and stages requires segmenting many OARs and gross tumor volumes (GTVs), and a frozen DL model that cannot be updated after pre-training may not meet all clinical needs. Together, these issues present a barrier to applying DLAS models in MRgART.
One possible way to enhance a model's generalizability is to adjust the DL model online rather than applying the pre-trained model directly. Karani et al. ["Test-time adaptable neural networks for robust medical image segmentation"] proposed per-test-image adaptive normalization to counter the performance degradation that occurs when training and test images are mismatched. Hang et al. ["Local and global structure-aware entropy regularized mean teacher model for 3D left atrium segmentation", International Conference on Medical Image Computing and Computer-Assisted Intervention] proposed a structure-aware entropy-regularized mean teacher model to enforce structure-level consistency. These studies increase the feasibility of DLAS in adaptive radiotherapy. In MRgART, additional images are acquired during daily treatments, so adjusting the model with these added datasets between fractions may further improve auto-segmentation accuracy.
Inspired by the above, we propose a patient-specific daily updated DL model for auto-segmentation. To the best of our knowledge, this is the first attempt to use an updated DL model for auto-segmentation in the MRgART process to accelerate adaptive planning. The feasibility and effectiveness of our method were evaluated through an assessment of segmentation accuracy and time reduction in clinical plan optimization.

## Method

### Dataset acquisition

This retrospective study included 6 abdominal/pelvic cancer patients (2 liver, 2 kidney, 2 cervical) treated with MRgART. All patients received at least eight fractions. One cervical patient was prescribed 23.4 Gy in 13 daily fractions of 1.8 Gy, and the other cervical patient received 60 Gy in 12 fractions. The two liver patients received 63 Gy in 9 fractions, and the two kidney patients were prescribed 45 Gy in 15 daily fractions of 3 Gy. The images were T2-weighted (T2W) MRI (sequence parameters for the abdomen: TR = 2000 ms, TE = 206 ms, SNR = 1, acquisition matrix M × P = 232 × 167; for the pelvis: TR = 1535 ms, TE = 278 ms, SNR = 1, acquisition matrix M × P = 268 × 267). For each patient, eight longitudinal T2W MRI scans were collected. Sixteen regions of interest (ROIs), including GTVs and OARs, were selected. Contours agreed on by consensus of three senior radiation oncologists were reviewed by the physician director and used as the ground truth.

### Proposed solution

Fig. 1 illustrates an overview of the proposed solution. After the reference plan is approved in the TPS, the reference CT and contours are automatically retrieved and cached by the inference service and used to propagate contours to the first fraction through DIR. A patient-specific prototype DLAS model is then trained using the first fractional MRI/contours as inputs. After each treatment, the new fractional MRI/contours are automatically collected and used to update the model for that patient, and the updated DLAS model is used online to segment the next fractional MRI. If an additional pre-treatment MRI scan and contours are acquired before the first fraction, the prototype model can be trained in advance and used for the first fraction, eliminating the need for DIR.
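The per-patient workflow above can be sketched as a loop over fractions. This is a minimal illustration under stated assumptions, not the study's implementation: `propagate_dir`, `train_prototype`, `update_model`, `predict`, and `approve` are hypothetical placeholder callables standing in for DIR in the TPS, the training engine, the inference service, and the oncologist's online review.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]  # (MRI identifier, approved contours); placeholder types


def run_patient_workflow(
    fraction_mris: List[str],
    approve: Callable[[str, str], str],
    propagate_dir: Callable[[str], str],
    train_prototype: Callable[[str, str], object],
    update_model: Callable[[object, List[Pair]], object],
    predict: Callable[[object, str], str],
    pretreatment: Optional[Pair] = None,
) -> List[str]:
    """Hypothetical sketch of the patient-specific DLAS loop; returns approved contours per fraction."""
    history: List[Pair] = []  # every (MRI, approved contours) pair seen so far
    model: Optional[object] = None
    if pretreatment is not None:
        # Optional pre-treatment MRI/contours: train the prototype before fraction 1,
        # so DIR is never needed.
        history.append(pretreatment)
        model = train_prototype(*pretreatment)

    approved_per_fraction: List[str] = []
    for mri in fraction_mris:
        # Online: draft contours from DIR (no model yet) or from the patient-specific model.
        draft = propagate_dir(mri) if model is None else predict(model, mri)
        approved = approve(mri, draft)  # oncologist reviews/edits the draft
        history.append((mri, approved))
        approved_per_fraction.append(approved)

        # Offline, between fractions: build the prototype once, then update it with all data.
        if model is None:
            model = train_prototype(mri, approved)
        else:
            model = update_model(model, history)
    return approved_per_fraction
```

With toy callables that tag their outputs, the first fraction is drafted by DIR and every later fraction by the model most recently updated on all earlier fractions; passing `pretreatment` removes the DIR step entirely.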

### Training strategy

Fig. 2 shows the two model training strategies used in this study. In the initial training of the prototype model, the training engine used only the first MRI/contour set of the current patient for regular supervised learning. In subsequent daily updates, structure-aware regularization was added to the supervised learning. Specifically, the training engine maintained the weights of two models, the prototype $\theta_{\mathrm{pro}}$ and its variant $\theta_{\mathrm{var}}$, and defined a pointwise segmentation loss $L_{\mathrm{seg}}$ and a consistency constraint $L_{\mathrm{cons}}$ to jointly optimize the variant. We denote the set of all images of the patient and the set of latest images by $G_{\mathrm{all}}$ and $G_{\mathrm{lat}}$, respectively. Note that $L_{\mathrm{seg}}$ is applied to samples of $G_{\mathrm{all}}$, whereas $L_{\mathrm{cons}}$ is applied only to $G_{\mathrm{lat}}$. We denote a model's prediction as $p^{\bullet} = \theta_{\bullet}(x)$, with $\bullet \in \{\mathrm{pro}, \mathrm{var}\}$, where $x$ is a training image containing $N$ pixels and $p_i$ is the $i$-th pixel of $p$. Given a structure $y$ corresponding to $x$, the pointwise segmentation loss $L_{\mathrm{seg}}$ is defined as:
$$L_{\mathrm{seg}} = L_{\mathrm{ce}} + L_{\mathrm{dice}}$$

where $L_{\mathrm{ce}}$ is the cross-entropy loss:
$$L_{\mathrm{ce}} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i^{\mathrm{var}} + (1-y_i)\log\left(1-p_i^{\mathrm{var}}\right)\right]$$

and $L_{\mathrm{dice}}$ is the Dice loss:
$$L_{\mathrm{dice}} = 1 - \frac{2\sum_{i=1}^{N} p_i^{\mathrm{var}} \cap y_i}{\sum_{i=1}^{N} p_i^{\mathrm{var}} + \sum_{i=1}^{N} y_i}$$
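A minimal NumPy sketch of $L_{\mathrm{seg}}$ for a single binary structure, assuming sigmoid-style foreground probabilities. The clipping constant `eps` and the treatment of the intersection $p_i^{\mathrm{var}} \cap y_i$ as an element-wise product (the usual soft-Dice convention) are our assumptions, not details reported in the study; a training implementation would use a differentiable framework instead.

```python
import numpy as np


def seg_loss(p_var: np.ndarray, y: np.ndarray, eps: float = 1e-7) -> float:
    """L_seg = L_ce + L_dice for one binary structure.

    p_var: foreground probabilities predicted by the variant (any shape).
    y:     binary ground-truth mask with the same shape.
    """
    p = np.clip(p_var.astype(np.float64), eps, 1.0 - eps)  # keep log() finite
    y = y.astype(np.float64)

    # Pixel-averaged binary cross-entropy (L_ce).
    l_ce = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

    # Soft Dice loss (L_dice): the intersection is the element-wise product p * y.
    intersection = np.sum(p * y)
    l_dice = 1.0 - 2.0 * intersection / (np.sum(p) + np.sum(y) + eps)
    return float(l_ce + l_dice)
```

For a 2 × 2 mask with two foreground pixels, a uniform prediction of 0.5 gives $\ln 2$ from the cross-entropy term plus 0.5 from the Dice term, while a prediction equal to the mask gives a loss close to zero.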

For a sample of $G_{\mathrm{lat}}$, the consistency constraint $L_{\mathrm{cons}}$ is defined as:
$$L_{\mathrm{cons}} = L_{\mathrm{vol}} + L_{\mathrm{len}} + L_{\mathrm{cen}}$$

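where $L_{\mathrm{vol}}$ is the volume consistency:
$$L_{\mathrm{vol}} = \frac{\left|\sum_{i=1}^{N} p_i^{\mathrm{var}} - \sum_{i=1}^{N} p_i^{\mathrm{pro}}\right|}{\sum_{i=1}^{N} p_i^{\mathrm{pro}}}$$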

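and $|\cdot|$ denotes the absolute value. We used the Sobel operator to extract edge features $q^{\bullet} = \mathrm{Sobel}(p^{\bullet})$ and then calculated the length consistency $L_{\mathrm{len}}$:
$$L_{\mathrm{len}} = \frac{\left|\sum_{i=1}^{N} q_i^{\mathrm{var}} - \sum_{i=1}^{N} q_i^{\mathrm{pro}}\right|}{\sum_{i=1}^{N} q_i^{\mathrm{pro}}}$$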

Let $r_i^{\bullet}$ be the coordinate of the $i$-th foreground pixel (foreground is identified with a simple threshold of 0.5) and $\|a-b\|$ the Euclidean distance between two points. The centroid consistency is then:
$$L_{\mathrm{cen}} = \left\| \frac{1}{N}\sum_{i=1}^{N} r_i^{\mathrm{var}} - \frac{1}{N}\sum_{i=1}^{N} r_i^{\mathrm{pro}} \right\|$$
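A NumPy sketch of the three consistency terms for a single 2D slice, assuming `p_var` and `p_pro` are foreground probability maps. Several choices here are our interpretation rather than details given in the text: "length" uses the Sobel gradient magnitude, the centroid averages coordinates over the thresholded foreground pixels (we read the $1/N$ normalization that way), and an empty mask falls back to the origin. NumPy is used only to make the arithmetic explicit.

```python
import numpy as np


def sobel_magnitude(img: np.ndarray) -> np.ndarray:
    """Sobel gradient magnitude of a 2D array, using edge padding at the borders."""
    a = np.pad(img.astype(np.float64), 1, mode="edge")
    # 3x3 Sobel kernels written as shifted-slice sums (right minus left, bottom minus top).
    gx = (a[:-2, 2:] + 2 * a[1:-1, 2:] + a[2:, 2:]) - (a[:-2, :-2] + 2 * a[1:-1, :-2] + a[2:, :-2])
    gy = (a[2:, :-2] + 2 * a[2:, 1:-1] + a[2:, 2:]) - (a[:-2, :-2] + 2 * a[:-2, 1:-1] + a[:-2, 2:])
    return np.hypot(gx, gy)


def centroid(p: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Mean coordinate of the foreground pixels (p > threshold)."""
    coords = np.argwhere(p > threshold)
    if coords.size == 0:
        return np.zeros(p.ndim)  # fallback for an empty mask (our assumption)
    return coords.mean(axis=0)


def consistency_loss(p_var: np.ndarray, p_pro: np.ndarray, eps: float = 1e-7) -> float:
    """L_cons = L_vol + L_len + L_cen for one 2D slice and one structure."""
    # Volume term: relative difference of summed probabilities.
    l_vol = abs(p_var.sum() - p_pro.sum()) / (p_pro.sum() + eps)
    # Length term: relative difference of summed Sobel edge responses.
    q_var, q_pro = sobel_magnitude(p_var), sobel_magnitude(p_pro)
    l_len = abs(q_var.sum() - q_pro.sum()) / (q_pro.sum() + eps)
    # Centroid term: Euclidean distance between foreground centroids (in pixels).
    l_cen = np.linalg.norm(centroid(p_var) - centroid(p_pro))
    return float(l_vol + l_len + l_cen)
```

For two identical 4 × 4 squares offset by three columns, the volume and length terms vanish and only the centroid term contributes, giving a loss of 3 pixels; this illustrates how the constraint tolerates a pure shift in position far less than a change of size or shape would suggest, so in practice the relative weighting of the terms matters.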

In summary, the combined loss function is:
$$L_{\mathrm{joint}} = L_{\mathrm{seg}} + L_{\mathrm{cons}}$$

The joint loss $L_{\mathrm{joint}}$ was applied only to the variant. After each update step of the variant, the prototype weights were gradually inherited from successive variants through exponential weight averaging with ratio $\beta = 0.99$, forming the final model $\theta_{\mathrm{pro}}'$:
$$\theta_{\mathrm{pro}}' = \beta\,\theta_{\mathrm{pro}} + (1-\beta)\,\theta_{\mathrm{var}}$$
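The weight inheritance above is an exponential moving average applied parameter by parameter. A minimal sketch, assuming the weights are stored as a dictionary of NumPy arrays keyed by (hypothetical) parameter names; in a PyTorch implementation the same update would typically be applied in place to the prototype's tensors without tracking gradients.

```python
from typing import Dict

import numpy as np


def ema_update(
    theta_pro: Dict[str, np.ndarray],
    theta_var: Dict[str, np.ndarray],
    beta: float = 0.99,
) -> Dict[str, np.ndarray]:
    """Return theta_pro' = beta * theta_pro + (1 - beta) * theta_var for every parameter tensor."""
    return {name: beta * w + (1.0 - beta) * theta_var[name] for name, w in theta_pro.items()}


# Toy run: the variant is held fixed at 1.0, the prototype starts at 0.0.
theta_pro = {"conv1.weight": np.zeros(3)}
theta_var = {"conv1.weight": np.ones(3)}
for _ in range(100):  # 100 variant update steps
    theta_pro = ema_update(theta_pro, theta_var)
# After n steps the prototype has moved 1 - beta**n of the way toward the variant.
print(theta_pro["conv1.weight"])
```

Because each step moves the prototype only $1-\beta$ of the way toward the current variant, the prototype changes smoothly and is less sensitive to a single noisy variant update.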

### Technical details

The dataset was preprocessed before each training session, using the same preprocessing for both strategies. We adopted the standard processing pipeline recommended in nnU-Net [Isensee et al., "nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation"]. In the training phase, the engine randomly selected a pair of preprocessed samples, one from $G_{\mathrm{all}}$ and one from $G_{\mathrm{lat}}$, at each iteration to optimize $L_{\mathrm{seg}}$ and $L_{\mathrm{cons}}$, respectively. The samples were augmented before being fed into the model. In addition to the default nnU-Net data augmentations, Gibbs noise [Morelli et al., "An image-based approach to understanding the physics of MR artifacts"] and k-space spikes [Graves and Mitchell, "Body MRI artifacts in clinical practice: a physicist's and radiologist's perspective"] were randomly superimposed on the images. Data augmentation was also identical for both strategies. The network was a slightly modified nnU-Net 2D: instance normalization was replaced with cross normalization (CrossNorm) [Tang et al., "CrossNorm and SelfNorm for generalization under distribution shifts", Proceedings of the IEEE/CVF], which improves model robustness. All hyperparameters are listed in Appendix 1.
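Both artifacts are simple to simulate in k-space. The sketch below is one possible implementation, not the study's code: Gibbs ringing is mimicked by truncating high spatial frequencies, and a k-space spike by adding a single bright k-space sample. The function names, `keep_fraction`, `scale`, and the application probabilities are all hypothetical, since the study does not report them.

```python
import numpy as np


def gibbs_ringing(img: np.ndarray, keep_fraction: float = 0.6) -> np.ndarray:
    """Simulate Gibbs ringing by keeping only a central block of k-space (truncation)."""
    k = np.fft.fftshift(np.fft.fft2(img))  # centre the low frequencies
    h, w = img.shape
    kh, kw = int(h * keep_fraction / 2), int(w * keep_fraction / 2)
    mask = np.zeros(k.shape, dtype=bool)
    mask[h // 2 - kh : h // 2 + kh, w // 2 - kw : w // 2 + kw] = True
    k[~mask] = 0  # discard high spatial frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(k)))


def kspace_spike(img: np.ndarray, rng: np.random.Generator, scale: float = 0.5) -> np.ndarray:
    """Add one spurious bright point to k-space, which appears as a stripe pattern."""
    k = np.fft.fft2(img)
    y, x = rng.integers(0, img.shape[0]), rng.integers(0, img.shape[1])
    k[y, x] += scale * np.abs(k).max()  # spike magnitude relative to the k-space peak
    return np.real(np.fft.ifft2(k))


def augment(img: np.ndarray, rng: np.random.Generator,
            p_gibbs: float = 0.2, p_spike: float = 0.2) -> np.ndarray:
    """Randomly superimpose the two artifacts (probabilities are hypothetical)."""
    out = img
    if rng.random() < p_gibbs:
        out = gibbs_ringing(out, keep_fraction=rng.uniform(0.3, 0.8))
    if rng.random() < p_spike:
        out = kspace_spike(out, rng)
    return out
```

Operating in k-space keeps the simulated artifacts physically plausible: truncation produces ringing near sharp edges, and a single corrupted sample produces a periodic stripe pattern over the whole image.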

### Experiment design

Two experiments were conducted to evaluate the effectiveness and feasibility of our method, comparing the proposed method with DIR and with a frozen DL model that was not updated after pre-training. The frozen DL model used the same data preprocessing, data augmentation, and network structure as the proposed method.
The first experiment compared segmentation performance. All three methods were tested on the datasets from the second treatment fraction to the last, and the first fraction was used as training data or as a propagation template. For the training-based methods, the datasets were split only into training and test sets, without a validation set, and the final model was taken after 12,800 iterations. The frozen DL model was evaluated by cross-validation: among patients with the same segmentation targets, each patient was used in turn as the test set and the remaining patients as the training set. The proposed method used all previous fractions of a patient to train the daily model, which was then tested on the image of the next fraction. For each test image, DIR propagated the contours of the latest fraction to the current image.
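The three evaluation protocols differ only in which data each method may use for a given test fraction. A small sketch enumerating them, assuming fractions are numbered from 1 and that "the same segmentation targets" means patients grouped by cancer site; the function names and patient identifiers are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple


def proposed_splits(n_fractions: int) -> Iterator[Tuple[List[int], int]]:
    """Proposed method: train on every earlier fraction, test on the next (fractions are 1-indexed)."""
    for test in range(2, n_fractions + 1):
        yield list(range(1, test)), test


def dir_template(test_fraction: int) -> int:
    """DIR baseline: propagate contours from the latest (previous) fraction."""
    return test_fraction - 1


def frozen_splits(patients_by_target: Dict[str, List[str]]) -> Iterator[Tuple[List[str], str]]:
    """Frozen DL model: leave-one-patient-out among patients sharing the same segmentation targets."""
    for patients in patients_by_target.values():
        for test in patients:
            yield [p for p in patients if p != test], test
```

For example, `list(proposed_splits(4))` gives `[([1], 2), ([1, 2], 3), ([1, 2, 3], 4)]`: the training set grows by one fraction per test, whereas the frozen model's training set never contains the test patient.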
The second experiment verified the improvement in plan adaptation efficiency achieved by the proposed method in an actual clinical setting. In the routine workflow, the plan adaptation (contouring) time was recorded from when the TPS received the daily MRI scan until the radiation oncologist had manually contoured the daily anatomy. In the proposed workflow, it was recorded from when the inference service received the daily MRI until the radiation oncologist had reviewed and approved the contours. The total time of the complete daily ART session was also recorded and compared.

### Evaluation metrics

The Dice similarity coefficient (DSC) and the Hausdorff distance (HD) were used as quantitative metrics to assess the performance of the auto-segmentation methods. The DSC measures the volumetric overlap of two sets:
$$\mathrm{DSC}(A,B) = \frac{2\,|A \cap B|}{|A| + |B|}$$

where A and B are the ground-truth and predicted regions, respectively, and $|\cdot|$ denotes the number of voxels in a set. The HD measures how far apart two sets are in a metric space and is defined as:
$$\mathrm{HD}(A,B) = \max\{h(A,B),\, h(B,A)\}, \qquad h(A,B) = \max_{a \in A}\,\min_{b \in B}\,\|a-b\|$$

where $h(A,B)$ is the directed distance, i.e., the largest distance from a point in one set to the nearest point in the other set, and $\|a-b\|$ denotes the Euclidean distance between two points. In this study, we used HD95, which replaces the maximum with the 95th percentile of these distances. Because of the small amount of data and a failed normality test, we used a two-sided Wilcoxon test for statistical analysis.
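For reference, the two metrics can be computed as follows. This is a minimal sketch, not the evaluation code used in the study: it assumes binary masks on a voxel grid, takes HD95 as the larger of the two directed 95th-percentile distances (some tools instead take the 95th percentile of the pooled distances, and the text does not say which convention was used), and uses brute-force distances that are adequate for contour-sized point sets.

```python
import numpy as np


def dsc(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient of two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def mask_to_points(mask: np.ndarray, spacing_mm) -> np.ndarray:
    """Coordinates (in mm) of all nonzero voxels of a mask."""
    return np.argwhere(mask) * np.asarray(spacing_mm, dtype=float)


def hd95(points_a: np.ndarray, points_b: np.ndarray) -> float:
    """95th-percentile symmetric Hausdorff distance between two point sets."""
    # Pairwise Euclidean distances, shape (|A|, |B|); brute force is fine for contours.
    d = np.linalg.norm(points_a[:, None, :] - points_b[None, :, :], axis=-1)
    h_ab = np.percentile(d.min(axis=1), 95)  # each point of A to its nearest point of B
    h_ba = np.percentile(d.min(axis=0), 95)  # each point of B to its nearest point of A
    return float(max(h_ab, h_ba))
```

Assuming the per-ROI scores of two methods are paired, they can then be compared with a two-sided Wilcoxon signed-rank test, for example `scipy.stats.wilcoxon`, whose default alternative is two-sided.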

## Results

Table 1 shows the quantitative segmentation results of DIR, the frozen DL model, and the proposed model, computed over all cases in the test set. Across all three cancer types, the proposed method (mean DSC = 0.90, 95 % CI: 0.88–0.95) outperformed DIR (mean DSC = 0.63, 95 % CI: 0.59–0.68) and the frozen DL model (mean DSC = 0.74, 95 % CI: 0.71–0.79). For GTV auto-segmentation, the proposed method yielded a median DSC of 0.95 (95 % CI: 0.94–0.97) and a median HD95 of 1.63 mm (95 % CI: 1.22–2.06 mm), comparable to its performance on OARs, whereas for the other two methods GTV auto-segmentation was inferior to OAR auto-segmentation. None of the methods showed a significant bias in performance among the three cancer types. The Wilcoxon test showed that the proposed method was significantly more accurate than the other two methods in terms of DSC for all ROIs except the small intestine, and in terms of HD95 for all ROIs except the heart, femoral heads, and spinal cord. Detailed results for each ROI and method are listed in Appendix 2.
Table 1. DSC and HD95 for GTV and OARs using DIR, the frozen DL model, and the proposed method (mean ± std).

| Site | ROI | DSC, DIR | DSC, Frozen | DSC, Proposed | HD95 (mm), DIR | HD95 (mm), Frozen | HD95 (mm), Proposed |
|---|---|---|---|---|---|---|---|
| Liver cancer | GTV | 0.63 ± 0.12 | 0.67 ± 0.26 | 0.94 ± 0.01 | 4.26 ± 1.21 | 3.26 ± 1.11 | 1.95 ± 0.85 |
| Liver cancer | OARs avg | 0.69 ± 0.18 | 0.78 ± 0.17 | 0.96 ± 0.03 | 10.94 ± 2.74 | 4.55 ± 1.34 | 3.09 ± 0.99 |
| Kidney cancer | GTV | 0.64 ± 0.12 | 0.66 ± 0.15 | 0.95 ± 0.02 | 6.25 ± 1.16 | 4.33 ± 1.12 | 1.70 ± 0.65 |
| Kidney cancer | OARs avg | 0.59 ± 0.18 | 0.72 ± 0.16 | 0.91 ± 0.08 | 26.07 ± 9.50 | 9.65 ± 2.40 | 5.53 ± 1.99 |
| Cervical cancer | GTV | 0.61 ± 0.24 | 0.69 ± 0.15 | 0.97 ± 0.02 | 25.33 ± 6.25 | 10.22 ± 1.23 | 1.55 ± 0.85 |
| Cervical cancer | OARs avg | 0.64 ± 0.23 | 0.80 ± 0.11 | 0.93 ± 0.02 | 20.97 ± 5.41 | 7.26 ± 2.25 | 3.65 ± 1.16 |
Abbreviations: Frozen, frozen DL model; std, standard deviation.
OARs for liver cancer: liver, kidney L/R, spinal cord, heart, body.
OARs for kidney cancer: small intestine, spinal cord, stomach, liver, kidney L/R, body.
OARs for cervical cancer: bladder, femoral head L/R, rectum, small intestine, body.
Fig. 3 shows the segmentation accuracy (DSC and HD95) of the proposed method for GTV and OAR contours as the number of fractions increases. Because the first-day image set was used to build the prototype model, evaluation started at the second fraction. Performance improved only minimally after five fractions, so later fractions are not shown. Despite some numerical fluctuations, the metrics improved and their standard deviations gradually decreased as more prior images/contours became available. Representative multi-view images of the model-generated contours for one test case are presented in Fig. 4. The proposed model produced more accurate and smoother contours than DIR and was able to track slight anatomical variations between fractions.
To evaluate the effectiveness of the proposed workflow, we compared the time taken by the two ATS workflows (Table 2). In the regular ATS workflow with manual contouring, the contouring time gradually decreased as more fractions were delivered, but the average contouring time was still over 12 minutes, with a maximum of 22 minutes. The proposed method significantly reduced the contouring time, to an average of 73.4 ± 6.5 s. The total treatment time was 3251.8 ± 447 s with the regular workflow versus 1650 ± 274 s with the proposed method. The minimum treatment time, 1189 s, was recorded for cervical cancer on day 8, the last fraction listed in Table 2. In our center, the average time of the ATP process was 22 minutes, so an ATS process with the proposed auto-segmentation tool added only a few minutes.
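Table 2. Contouring time and treatment time (s) for patients with different cancer sites.

Contouring time (s)

| Day | Liver, Regular | Liver, Proposed | Kidney, Regular | Kidney, Proposed | Cervical, Regular | Cervical, Proposed |
|---|---|---|---|---|---|---|
| Day 2 | 1092 | 82 | 1367 | 78 | 822 | 70 |
| Day 3 | 1254 | 74 | 1258 | 80 | 687 | 65 |
| Day 4 | 985 | 81 | 1001 | 72 | 562 | 68 |
| Day 8 | 736 | 71 | 857 | 78 | 642 | 62 |

Treatment time (s)

| Day | Liver, Regular | Liver, Proposed | Kidney, Regular | Kidney, Proposed | Cervical, Regular | Cervical, Proposed |
|---|---|---|---|---|---|---|
| Day 2 | 3652 | 1967 | 3687 | 1756 | 3035 | 1367 |
| Day 3 | 3546 | 1901 | 3765 | 1705 | 2876 | 1355 |
| Day 4 | 3468 | 1925 | 3518 | 1731 | 2578 | 1298 |
| Day 8 | 3002 | 1896 | 3443 | 1710 | 2452 | 1189 |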
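Table 2 pools four fractions for each of three sites, i.e., 12 values per workflow. Recomputing the pooled mean and sample standard deviation from the table reproduces the summary figures quoted in the text (73.4 ± 6.5 s, 3251.8 ± 447 s, and 1650 ± 274 s), which is consistent with the ± values being sample standard deviations over these 12 entries. The snippet only transcribes Table 2; the dictionary layout and the `pooled` helper are ours.

```python
import statistics

# Table 2 values in seconds: (regular, proposed) for days 2, 3, 4 and 8.
contouring = {
    "liver": ([1092, 1254, 985, 736], [82, 74, 81, 71]),
    "kidney": ([1367, 1258, 1001, 857], [78, 80, 72, 78]),
    "cervical": ([822, 687, 562, 642], [70, 65, 68, 62]),
}
treatment = {
    "liver": ([3652, 3546, 3468, 3002], [1967, 1901, 1925, 1896]),
    "kidney": ([3687, 3765, 3518, 3443], [1756, 1705, 1731, 1710]),
    "cervical": ([3035, 2876, 2578, 2452], [1367, 1355, 1298, 1189]),
}

REGULAR, PROPOSED = 0, 1


def pooled(table, column):
    """Mean and sample standard deviation over all sites and days for one workflow column."""
    values = [v for cols in table.values() for v in cols[column]]
    return statistics.mean(values), statistics.stdev(values)


for name, table in [("contouring", contouring), ("treatment", treatment)]:
    for label, col in [("regular", REGULAR), ("proposed", PROPOSED)]:
        mean, sd = pooled(table, col)
        print(f"{name:10s} {label:8s} {mean:7.1f} ± {sd:5.1f} s")

reg, _ = pooled(treatment, REGULAR)
prop, _ = pooled(treatment, PROPOSED)
print(f"treatment-time reduction: {1 - prop / reg:.1%}")
```

The last line shows that the proposed workflow roughly halves the total treatment time (about a 49 % reduction).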

## Discussion

In this paper, we proposed a practical solution to improve the efficiency of the contouring process in MRgART through auto-segmentation with minimal human editing. Our auto-segmentation method is patient-specific, based on a daily updated DL model that accounts for each patient's daily anatomical variation. To our knowledge, this is the first reported patient-specific DLAS method that applies DL models updated in a timely manner with daily added datasets. The quantitative analysis demonstrated that the proposed method significantly outperformed the widely used DIR and frozen DL models. DL-based auto-segmentation methods such as those in [Outeiral et al., "Oropharyngeal primary tumor segmentation for radiotherapy planning on magnetic resonance imaging using deep learning"; Liang et al., "Auto-segmentation of pancreatic tumor in multi-parametric MRI using deep convolutional neural networks"; Chen et al., "MMFNet: a multi-modality MRI fusion network for segmentation of nasopharyngeal carcinoma"], which we call frozen DL models, achieved average DSCs no higher than 0.8 despite using more than 100 training cases. Although the cancer types and anatomical sites differ, the average DSC of our proposed method (above 0.9) is substantially higher than that of these frozen DL auto-segmentation models. We attribute this difference mainly to the timely update mechanism of the proposed method, which makes it adaptable to the daily variability of a specific patient, something a frozen DL model cannot offer.
As the name suggests, the proposed patient-specific auto-segmentation model was trained on a small number of image sets from the same patient. As shown in Fig. 3, high accuracy was achieved for both GTV and OARs even at the second fraction, using the prototype model trained only with the first fractional MRI/contours. By contrast, the optimal number of datasets for training a generalizable frozen DL model has yet to be determined. Although current studies have not addressed the amount of data as a factor affecting performance, it is a legitimate concern whether models built from small datasets can meet clinical needs. Moreover, the range of data to which a DL model should generalize is not well defined, and the difficulty and amount of data required to generalize a model across modalities, cases, scanners, and fractions vary widely. In our prior study, we noticed that a patient's imaging differences and anatomical changes were much smaller between treatment fractions than between modalities, patients, or scanners, implying that a few datasets might suffice for generalization across fractions. We therefore designed the patient-specific strategy to address this problem, which the frozen DL model cannot handle. The results in Table 1 show the superior performance of the proposed approach. The ability of our method to generate a practical model from a relatively small training cohort may be attributed to the high standard of contouring and the consistent MR sequences used to train the model.
In clinical use, the proposed solution reduced the average time of the entire ATS workflow on the Elekta Unity to less than 28 minutes, and the contouring time to less than 2 minutes. The overall treatment time was roughly 40–57 % shorter than the median times reported in previous studies (64, 46, and 62 minutes in [Hal et al., "Initial clinical experience of stereotactic body radiation therapy (SBRT) for liver metastases, primary liver malignancy, and pancreatic cancer with 4D-MRI based online adaptation and real-time MRI monitoring using a 1.5 Tesla MR-Linac"; McDonald et al., "Initial feasibility and clinical implementation of daily MR-guided adaptive head and neck cancer radiation therapy on a 1.5 T MR-linac system: prospective R-IDEAL 2a/2b systematic clinical evaluation of technical innovation"; Paulson et al., "4D-MRI driven MR-guided online adaptive radiotherapy for abdominal stereotactic body radiation therapy on a high field MR-Linac: implementation and initial clinical experience"], respectively). Excessive treatment time can cause physical discomfort (such as pain, stiffness, and anxiety), especially in older patients, and has been reported to be associated with treatment outcome and survival [Martens et al., "Adherence to pretreatment and intratreatment imaging of head and neck squamous cell carcinoma patients undergoing (chemo) radiotherapy in a research setting"; Moelle et al., "Cervical cancer in Ethiopia: the effect of adherence to radiotherapy on survival"]. By reducing the daily replanning and treatment time, the proposed solution may improve patient compliance and survival.
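The size of the reduction relative to the cited medians follows directly from the numbers above. A quick check, taking the mean proposed ATS time of 1650 s (27.5 min) from Table 2; because it compares our mean with published medians from different treatment sites, the result is only approximate. It gives reductions of about 57 %, 40 %, and 56 %, respectively.

```python
def relative_reduction(ours_min: float, reference_min: float) -> float:
    """Fractional reduction of our time relative to a reference time."""
    return 1.0 - ours_min / reference_min


ours = 1650 / 60  # mean total time with the proposed workflow, in minutes (27.5)
for ref, median in [("Hal et al.", 64), ("McDonald et al.", 46), ("Paulson et al.", 62)]:
    print(f"{ref}: {relative_reduction(ours, median):.0%} shorter")
```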
Due to the lack of electron density information in MR imaging and the effects of the magnetic field during treatment, sophisticated dose calculation methods and dedicated quality assurance (QA) procedures are required [Kurz et al., "Medical physics challenges in clinical MR-guided radiotherapy"]. Furthermore, spatial distortions and artifacts in MR imaging, especially in the presence of motion, place demands on measurement techniques and modeling. In this study, patient-specific QA was performed by measuring all IMRT plans with an ArcCHECK-MR cylindrical phantom with diode array detectors (Sun Nuclear Corporation). The average percentage of points passing the 3 %/3 mm gamma criterion was 98.8 % (standard deviation 1.3 %, minimum pass rate 96.9 %).
Several aspects of the experimental setup and model hyperparameters deserve discussion. For example, Gibbs and k-space spike artifacts may occur in some fractions of the treatment course because of magnetic noise. Since our model is trained only on patient-specific data, its predictions degrade when the artifacts in the current fraction are much stronger than in previous fractions; adding these artifacts as data augmentation makes the model more robust in this situation. Model hyperparameters were determined through several preliminary experiments. We tried BatchNorm, InstanceNorm, CrossNorm, and SelfNorm as the normalization layer, and CrossNorm performed best overall in terms of loss convergence speed and final performance. The exponential weight averaging ratio $\beta$ also affects model performance. In this study, we tried only 0.9, 0.99, and 0.999; 0.99 was slightly better than the others, so we chose it. Setting a schedule for $\beta$ may further improve performance.
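One way to read the three tested values of $\beta$ is through the effective averaging window of an exponential moving average, roughly $1/(1-\beta)$ update steps (10, 100, and 1000 steps for 0.9, 0.99, and 0.999), and its half-life $\ln 0.5 / \ln\beta$. The sketch below computes both and adds one possible warm-up schedule for $\beta$, of the kind used in some mean-teacher implementations; the schedule and its `beta_max` parameter are purely illustrative and were not evaluated in this study.

```python
import math


def ema_horizon(beta: float) -> float:
    """Approximate number of recent variant steps that dominate the average: 1 / (1 - beta)."""
    return 1.0 / (1.0 - beta)


def ema_half_life(beta: float) -> float:
    """Steps after which a variant's contribution has decayed to half: ln(0.5) / ln(beta)."""
    return math.log(0.5) / math.log(beta)


def beta_schedule(step: int, beta_max: float = 0.99) -> float:
    """Hypothetical warm-up: track the variant closely early on, then cap beta at beta_max."""
    return min(1.0 - 1.0 / (step + 1), beta_max)


for beta in (0.9, 0.99, 0.999):
    print(f"beta={beta}: horizon ~{ema_horizon(beta):.0f} steps, "
          f"half-life ~{ema_half_life(beta):.1f} steps")
```

With 12,800 iterations per training run, even $\beta = 0.999$ averages over only a small fraction of training, so the choice mainly trades responsiveness to new fractions against smoothing of noisy variant updates.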
We acknowledge a few limitations of our study:

1. The study included only abdominal and pelvic cases from a single center, and the number of cases was small. More data from other anatomical sites are needed to evaluate the robustness of the proposed method.
2. The stability of the proposed method on data acquired with different protocols or modalities was not validated.
3. The effect of training hyperparameters on the model was not discussed in detail.
4. Some ROIs with considerable volume variation between fractions, such as the small intestine, were not well fitted in the first few training sessions.

Our future research will optimize the training strategy for specific ROIs.

## Conclusion

We reported the first clinical experience using a timely updated, patient-specific auto-segmentation strategy on MR images to accelerate the MRgART workflow. The proposed method helps overcome the limitations of image variation and data scarcity for deep learning auto-segmentation on MRIs. We demonstrated that it achieved superior segmentation accuracy and efficiency in the ATS workflow for abdominal and pelvic cases compared with existing methods. Minimizing contouring time while maintaining reliable accuracy provides a substantial advantage for the routine practice of MRgART on the MR-linac platform. Our future studies will investigate the impact of data diversity and site-specific optimization strategies.

## Summary

This study proposed a patient-specific deep learning auto-segmentation (DLAS) strategy for MR-guided adaptive radiotherapy based on rapid training and online updating of the DLAS model.

## Key results

1. The generalizability of the proposed method was demonstrated on fractional MRI data from patients with different cancer types.
2. Model updates could be completed between treatment fractions, and the updated model could quickly predict contours on the daily images of the next fraction.
3. Daily updating of the patient-specific model with all previous MRI/contour sets effectively improved segmentation performance.

## Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

## Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grants 82102173 and 82172072; the Natural Science Foundation of Shandong under Grants ZR2020LZL001 and 2021SFGC0501; the Key Research and Development Program of Shandong Province, China (2021LCZX04); the Academic Promotion Program of Shandong First Medical University, China (2019LJ004); and the Taishan Scholars Program of Shandong Province, China (Grant No. ts20120505).

## Appendix

Appendix 1. Training hyperparameters.

| Parameter | Value | Note |
|---|---|---|
| Learning rate | 3e-4 | |
| Weight decay | 1e-4 | |
| Target spacings | (1.0, 1.0) | Resampling only for the X- and Y-axes |
| EWA ratio | 0.99 | Retention rate of the prototype weights when updated from variants at each step |
| Normalization | CrossNorm | |
| Batch size | 16 | |
| Window crop size | (320, 320) | The window center is selected randomly at each step |
| Scaling range | 0.7–1.4 | The scale is sampled from a uniform distribution |
| Rotation range | −30° to 30° | The rotation angle is sampled from a uniform distribution |
| Spatial transform prob | 0.9 | Probability of applying the augmentation at each step |
| Gaussian noise prob | 0.3 | Probability of applying the augmentation at each step |
| Gaussian kernel sigma | (0.25, 1.5) | The kernel sigma is sampled from a uniform distribution |
| Gaussian blur prob | 0.3 | Probability of applying the augmentation at each step |
| N segments of nonlinear shift | 5 | |
| Nonlinear shift prob | 0.5 | Probability of applying the augmentation at each step |
DSC (mean ± std)

| Structure | DIR | Frozen DL model | Proposed (average of first four fractions) | Proposed (last fraction) |
|---|---|---|---|---|
| **Liver Cancer** | | | | |
| Liver GTV | 0.63 ± 0.12 | 0.67 ± 0.26 | 0.94 ± 0.01 | 0.96 ± 0.02 |
| Liver | 0.71 ± 0.12 | 0.75 ± 0.18 | 0.97 ± 0.02 | 0.98 ± 0.03 |
| Kidney_L | 0.75 ± 0.14 | 0.71 ± 0.19 | 0.94 ± 0.02 | 0.94 ± 0.01 |
| Kidney_R | 0.63 ± 0.16 | 0.77 ± 0.16 | 0.97 ± 0.02 | 0.97 ± 0.02 |
| SpinalCord | 0.81 ± 0.15 | 0.74 ± 0.17 | 0.93 ± 0.08 | 0.93 ± 0.07 |
| Heart | 0.63 ± 0.22 | 0.80 ± 0.13 | 0.95 ± 0.02 | 0.95 ± 0.03 |
| Body | 0.59 ± 0.27 | 0.89 ± 0.17 | 0.97 ± 0.01 | 0.96 ± 0.02 |
| **Kidney Cancer** | | | | |
| Kidney GTV | 0.64 ± 0.12 | 0.66 ± 0.15 | 0.95 ± 0.02 | 0.97 ± 0.05 |
| Small Intestine | 0.45 ± 0.22 | 0.59 ± 0.22 | 0.65 ± 0.35 | 0.93 ± 0.06 |
| SpinalCord | 0.78 ± 0.14 | 0.81 ± 0.13 | 0.95 ± 0.03 | 0.96 ± 0.02 |
| Stomach | 0.55 ± 0.21 | 0.67 ± 0.23 | 0.93 ± 0.03 | 0.97 ± 0.03 |
| Liver | 0.59 ± 0.13 | 0.65 ± 0.16 | 0.98 ± 0.01 | 0.98 ± 0.01 |
| Kidney_L | 0.63 ± 0.16 | 0.73 ± 0.12 | 0.94 ± 0.03 | 0.98 ± 0.01 |
| Kidney_R | 0.59 ± 0.21 | 0.77 ± 0.14 | 0.95 ± 0.05 | 0.98 ± 0.02 |
| Body | 0.52 ± 0.22 | 0.83 ± 0.11 | 0.96 ± 0.06 | 0.98 ± 0.01 |
| **Cervical Cancer** | | | | |
| Cervical GTV | 0.61 ± 0.24 | 0.69 ± 0.15 | 0.97 ± 0.02 | 0.97 ± 0.02 |
| Bladder | 0.71 ± 0.25 | 0.73 ± 0.22 | 0.98 ± 0.02 | 0.96 ± 0.02 |
| Left Femoral head | 0.71 ± 0.22 | 0.88 ± 0.02 | 0.96 ± 0.02 | 0.97 ± 0.02 |
| Right Femoral head | 0.75 ± 0.22 | 0.90 ± 0.03 | 0.97 ± 0.01 | 0.97 ± 0.01 |
| Rectum | 0.69 ± 0.31 | 0.71 ± 0.02 | 0.96 ± 0.01 | 0.96 ± 0.03 |
| Small Intestine | 0.47 ± 0.24 | 0.69 ± 0.22 | 0.77 ± 0.02 | 0.79 ± 0.04 |
| Body | 0.51 ± 0.13 | 0.89 ± 0.15 | 0.94 ± 0.03 | 0.96 ± 0.01 |
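The DSC values above follow the standard definition DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. Below is a minimal NumPy sketch. Returning 1.0 when both masks are empty is a convention chosen here, not something the table specifies.

```python
import numpy as np


def dice(pred, ref):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    if pred.shape != ref.shape:
        raise ValueError("masks must have the same shape")
    total = pred.sum() + ref.sum()
    if total == 0:
        return 1.0  # convention: two empty masks are treated as perfect agreement
    # Twice the overlap divided by the summed mask sizes.
    return float(2.0 * np.logical_and(pred, ref).sum() / total)
```

For example, a two-voxel prediction that contains a one-voxel reference gives 2 × 1 / (2 + 1) ≈ 0.67.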
HD95 (mm) (mean ± std)

| Structure | DIR | Frozen DL model | Proposed (average of first four fractions) | Proposed (last fraction) |
|---|---|---|---|---|
| **Liver Cancer** | | | | |
| Liver GTV | 4.26 ± 1.21 | 3.26 ± 1.11 | 1.95 ± 0.85 | 1.58 ± 0.62 |
| Liver | 12.35 ± 3.65 | 6.53 ± 1.18 | 2.73 ± 1.12 | 2.40 ± 0.71 |
| Kidney_L | 10.25 ± 2.55 | 5.55 ± 1.32 | 4.04 ± 0.96 | 2.40 ± 1.05 |
| Kidney_R | 6.24 ± 1.85 | 3.25 ± 0.89 | 1.77 ± 0.86 | 1.20 ± 0.93 |
| SpinalCord | 2.25 ± 0.86 | 1.89 ± 1.28 | 0.84 ± 0.82 | 0.56 ± 0.35 |
| Heart | 8.93 ± 2.21 | 6.55 ± 2.11 | 7.69 ± 1.25 | 4.56 ± 1.22 |
| Body | 25.63 ± 5.32 | 3.53 ± 1.23 | 1.48 ± 0.94 | 0.99 ± 0.52 |
| **Kidney Cancer** | | | | |
| Kidney GTV | 6.25 ± 1.16 | 4.33 ± 1.12 | 1.70 ± 0.65 | 1.01 ± 0.65 |
| Small Intestine | 46.86 ± 15.66 | 22.24 ± 6.25 | 15.44 ± 3.66 | 1.57 ± 1.22 |
| SpinalCord | 3.26 ± 1.11 | 2.15 ± 0.92 | 1.19 ± 0.88 | 0.71 ± 0.45 |
| Stomach | 52.12 ± 20.55 | 15.21 ± 3.26 | 8.30 ± 3.65 | 3.64 ± 1.35 |
| Liver | 36.23 ± 12.52 | 10.21 ± 1.59 | 2.73 ± 1.33 | 1.43 ± 0.85 |
| Kidney_L | 10.26 ± 3.21 | 6.52 ± 1.31 | 3.32 ± 1.22 | 1.20 ± 0.62 |
| Kidney_R | 11.37 ± 4.22 | 5.99 ± 1.68 | 3.32 ± 1.32 | 0.71 ± 0.35 |
| Body | 22.37 ± 9.25 | 5.21 ± 1.82 | 4.38 ± 1.84 | 1.87 ± 0.68 |
| **Cervical Cancer** | | | | |
| Cervical GTV | 25.33 ± 6.25 | 10.22 ± 1.23 | 1.55 ± 0.85 | 1.18 ± 0.82 |
| Bladder | 12.54 ± 3.26 | 10.26 ± 2.25 | 4.41 ± 1.86 | 9.04 ± 2.15 |
| Left Femoral head | 2.93 ± 0.87 | 1.29 ± 0.85 | 0.94 ± 0.25 | 0.83 ± 0.32 |
| Right Femoral head | 2.57 ± 0.92 | 1.69 ± 0.97 | 0.91 ± 0.36 | 0.83 ± 0.33 |
| Rectum | 16.85 ± 4.88 | 4.66 ± 1.28 | 1.21 ± 0.57 | 1.00 ± 0.48 |
| Small Intestine | 60.53 ± 15.65 | 20.43 ± 6.52 | 13.45 ± 3.66 | 8.57 ± 2.12 |
| Body | 30.39 ± 6.85 | 5.22 ± 1.65 | 0.98 ± 0.28 | 0.95 ± 0.55 |
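HD95 is usually computed as the 95th percentile of the surface-to-surface distances between the predicted and reference contours. It is expressed in millimetres once voxel spacing is applied. Implementations differ in detail. The sketch below follows one common convention, taken from MedPy's `hd95`: it pools the nearest-surface distances in both directions before taking the percentile. It uses a brute-force pairwise distance for clarity, so it suits only small masks. It illustrates the metric and is not the evaluation code used in this study.

```python
import numpy as np


def surface_points(mask, spacing):
    """Physical coordinates of boundary voxels (foreground with a background face-neighbour)."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    interior = padded.copy()
    for axis in range(mask.ndim):
        # A voxel is interior only if both face-neighbours along every axis are foreground.
        interior &= np.roll(padded, 1, axis=axis) & np.roll(padded, -1, axis=axis)
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    boundary = mask & ~interior[core]
    return np.argwhere(boundary) * np.asarray(spacing, dtype=float)


def hd95(pred, ref, spacing):
    """95th-percentile symmetric Hausdorff distance, in the units of `spacing` (e.g. mm)."""
    p = surface_points(pred, spacing)
    r = surface_points(ref, spacing)
    if len(p) == 0 or len(r) == 0:
        raise ValueError("both masks must contain at least one foreground voxel")
    # Pairwise Euclidean distances between the two surfaces (O(N*M) memory).
    d = np.linalg.norm(p[:, None, :] - r[None, :, :], axis=-1)
    # Pool pred->ref and ref->pred nearest-surface distances, then take the 95th percentile.
    return float(np.percentile(np.concatenate([d.min(axis=1), d.min(axis=0)]), 95))
```

For full-size MRI volumes, a Euclidean distance transform would replace the pairwise matrix. The per-point definition stays the same.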

## Appendix A. Supplementary material

• Supplementary data 1

## References

• Tijssen R.H.N., Philippens M.E.P., Paulson E.S., et al. MRI commissioning of 1.5 T MR-linac systems–a multi-institutional study.
• Otazo R., Lambin P., Pignol J.P., et al.
• Winkel D., Bol G.H., Kroon P.S., et al. Clin Transl Radiat Oncol. 2019; 18: 54-59.
• Lamb J., Cao M., Kishan A., et al. Cureus. 2017; 9.
• Zhang Y., Paulson E., Lim S., et al. A patient-specific autosegmentation strategy using multi-input deformable image registration for magnetic resonance imaging-guided online adaptive radiation therapy: a feasibility study.
• Christiansen R.L., Dysager L., Bertelsen A.S., et al. Accuracy of automatic deformable structure propagation for high-field MRI guided prostate radiotherapy.
• Anaya V.M., Fairfoul J. Med Eng Phys. 2019; 64: 65-73.
• Liesbeth V., Michaël C., Anna M.D., et al. Overview of artificial intelligence-based applications in radiotherapy: recommendations for implementation and quality assurance.
• Sheng K. Artificial intelligence in radiotherapy: a technological review. Front Med. 2020: 1-19.
• Tang X., Rangraz E.J., Coudyzer W., et al. Whole liver segmentation based on deep learning and manual adjustment for clinical use in SIRT. Eur J Nucl Med Mol Imaging. 2020; 47: 2742-2752.
• Fu Y., Mazur T.R., Wu X., et al. A novel MRI segmentation method using CNN-based correction network for MRI-guided adaptive radiotherapy. Med Phys. 2018; 45: 5129-5137.
• Renard F., Guedria S., De Palma N., et al. Variability and reproducibility in deep learning for medical image segmentation. Sci Rep. 2020; 10: 1-16.
• Yan W., Huang L., Xia L., et al. MRI manufacturer shift and adaptation: increasing the generalizability of deep learning segmentation for MR images acquired with different scanners. Radiol Artif Intell. 2020; 2: e190195.
• Vovk U., Pernus F., Likar B. A review of methods for correction of intensity inhomogeneity in MRI. IEEE Trans Med Imaging. 2007; 26: 405-421.
• Karani N., Erdil E., Chaitanya K., et al. Test-time adaptable neural networks for robust medical image segmentation. Med Image Anal. 2021; 68.
• Hang W., Feng W., Liang S., et al. Local and global structure-aware entropy regularized mean teacher model for 3D left atrium segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham; 2020: 562-571.
• Isensee F., Jaeger P.F., Kohl S.A.A., et al. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods. 2021; 18: 203-211.
• Morelli J.N., Runge V.M., Ai F., et al. An image-based approach to understanding the physics of MR artifacts.
• Graves M.J., Mitchell D.G. Body MRI artifacts in clinical practice: a physicist's and radiologist's perspective. J Magn Reson Imaging. 2013; 38: 269-287.
• Tang Z., Gao Y., Zhu Y., et al. CrossNorm and SelfNorm for generalization under distribution shifts. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 52-61.
• Outeiral R.R., Bos P., Al-Mamgani A., et al. Oropharyngeal primary tumor segmentation for radiotherapy planning on magnetic resonance imaging using deep learning. Phys Imaging Radiat Oncol. 2021; 19: 39-44.
• Liang Y., Schott D., Zhang Y., et al. Auto-segmentation of pancreatic tumor in multi-parametric MRI using deep convolutional neural networks.
• Chen H., Qi Y., Yin Y., et al. MMFNet: a multi-modality MRI fusion network for segmentation of nasopharyngeal carcinoma. Neurocomputing. 2020; 394: 27-40.
• Hal W.A., Straza M.W., Chen X., et al. Initial clinical experience of stereotactic body radiation therapy (SBRT) for liver metastases, primary liver malignancy, and pancreatic cancer with 4D-MRI based online adaptation and real-time MRI monitoring using a 1.5 Tesla MR-Linac. PLoS One. 2020; 15: e0236570.
• McDonald B.A., Vedam S., Yang J., et al. Initial feasibility and clinical implementation of daily MR-guided adaptive head and neck cancer radiation therapy on a 1.5 T MR-Linac system: prospective R-IDEAL 2a/2b systematic clinical evaluation of technical innovation. Int J Radiat Oncol Biol Phys. 2021; 109: 1606-1618.
• Paulson E.S., Ahunbay E., Chen X., et al. 4D-MRI driven MR-guided online adaptive radiotherapy for abdominal stereotactic body radiation therapy on a high field MR-Linac: implementation and initial clinical experience. Clin Transl Radiat Oncol. 2020; 23: 72-79.
• Martens R.M., Koopman T., Noij D.P., et al. Adherence to pretreatment and intratreatment imaging of head and neck squamous cell carcinoma patients undergoing (chemo)radiotherapy in a research setting. Clin Imaging. 2021; 69: 82-90.
• Moelle U., Mathewos A., Aynalem A., et al. Cervical cancer in Ethiopia: the effect of adherence to radiotherapy on survival. Oncologist. 2018; 23: 1024-1032.
• Kurz C., Buizza G., Landry G., et al. Medical physics challenges in clinical MR-guided radiotherapy.