Can patient-reported outcomes be used instead of clinician-reported outcomes and photographs as primary endpoints of late normal tissue effects in breast radiotherapy trials? Results from the IMPORT LOW trial

Highlights • Few patients reported moderate/marked normal tissue effects (NTE) irrespective of assessment method used.• Patients reported more NTE than clinician-reported outcomes or photographs.• Concordance was poor on an individual patient level.• Treatment comparisons were consistent between assessment methods.

In the current era of low local relapse rates after adjuvant breast radiotherapy [1,2], the risks of radiotherapy-related late normaltissue effects (NTE) need to be carefully balanced against the benefits of treatment, requiring detailed collection of NTE data in breast radiotherapy trials. Furthermore, with improvements in breast radiotherapy techniques, including the introduction of intensity-modulated [3] and partial-breast radiotherapy [1], the NTE event rate has also fallen substantially. Consequently, measuring NTE is becoming increasingly challenging.
NTE have been variously assessed in breast radiotherapy trials using clinician-reported outcomes (CRO), photographs and patient-reported outcome measures (PROMs) [2,3]. The optimal NTE data collection method is unclear and there is no gold standard. The methodology of each assessment type differs. For example, patients may be asked to assess changes in their treated breast since their breast cancer treatment, whereas clinicians compare the patient's treated and contralateral breasts. Also, the scales used for scoring the different assessments vary.
Irrespective of differences between the methods, the priorities for breast radiotherapy trials are that the method used to detect NTE should be able to differentiate between randomised treatment groups (if a difference exists), and that the information obtained is clinically relevant to patients. Data  demonstrate that PROMs are able to differentiate between dose/ volume regimens [1] and between small dose differences in hypofractionated regimens [2]. PROMs also provide the patients' perceptions of the impact of their cancer and the consequences of treatment [4] within the framework of the question asked. This analysis investigates within the context of the IMPORT LOW partial-breast radiotherapy trial, (i) the degree of concordance on an individual patient level between PROMs and CRO or photographs, (ii) whether results for the randomised comparisons obtained from PROMs are consistent with those using CRO or photographs and (iii) the influence of baseline characteristics on concordance, with the overall aim of assessing whether PROMs could be used as primary NTE endpoints in future breast radiotherapy trials.

Patient population
IMPORT LOW (ISRCTN12852634) is a multicentre randomised phase III non-inferiority trial comparing safety and efficacy of standard whole-breast radiotherapy with two experimental schedules (reduced-dose and partial-breast radiotherapy) in women with low-risk breast cancer after breast conserving surgery [1].
IMPORT LOW included a comprehensive and systematic investigation of NTE including CRO in all participants, and PROMs and photographs in a subset of patients for which full details of patients and procedures have been published [1]. All centres were invited to participate in the PROMs and photographic sub-studies (until sufficient accrual was achieved). All patients at these centres were invited to participate in the sub-studies until the designated sample size for each sub-study was obtained.

Procedures
Patients who consented to the PROMs sub-study completed the EORTC QLQ-C30 core questionnaire and QLQ-BR23 breast-specific module [5,6] and 10-item Body Image Scale (BIS) [7], all of which asked patients to consider their symptoms during the past week. The Hospital Anxiety and Depression Scale [HADS] [8] and protocol-specific questionnaire items relating to 'change in breast appearance', 'breast hardness/firmness', 'reduction in size of breast', 'change in skin appearance', 'is the position of the nipple of your affected breast different from the other side', 'problem getting a bra to fit' and 'shoulder stiffness' which may have resulted from any prior breast cancer treatments [32] were also completed. All items (with the exception of HADS) were scored on a four-point scale: none, a little, quite a bit, very much (interpreted as none, mild, moderate, marked). Questionnaires were completed at baseline (pre-radiotherapy) and 6 months, 1, 2 and 5 years after radiotherapy. Patients completed PROMs questionnaires alone with no help from clinicians.
For patients participating in the photographic sub-study, photographs were taken at baseline (post surgery but preradiotherapy), year-2 and year-5. Change in photographic breast appearance of the ipsilateral breast was assessed at 2 and 5 years compared with the baseline photograph. Breast size and surgical deficit were scored from the baseline photographs on a 3-point scale (small, medium, large). At 2 and 5 years after radiotherapy, breast appearance change (none/mild/marked) was scored on a pair of photographs (one with the patients' hands on the hips and one with hands raised) in comparison with the baseline photograph. A panel of observers blinded to patient identity, treatment allocation, and radiotherapy centre scored the photographs, the methodology having been validated in the START pilot trial [9]. CRO including breast shrinkage, breast induration, telangiectasia and breast oedema were scored using the contralateral breast as a comparator with a four-point graded scale (none, a little, quite a bit, very much; interpreted as none, mild, moderate, marked) at 1, 2 and 5 years following radiotherapy in all patients. The CRO items were established and validated in the START trials [10]. Clinicians were not blinded to treatment group.

Statistical analysis
PROMs were paired with the relevant CRO or photograph at 2 and 5 years for the analyses ( Table 1).
The ''quite a bit" and ''very much" categories were combined for PROMs and CRO as few NTE were scored as ''very much". This resulted in a 3-point scale corresponding to none, a little (mild), quite a bit/very much (moderate/marked). This also enabled direct comparison with photographs, also scored on a 3-point scale.
Agreement between the data ascertainment methods on an individual patient level was assessed using percentage agreement (with 95% confidence interval), weighted kappa statistic (with 95% confidence interval) and Bowker's test of symmetry [11]. Guidelines for interpreting the value of weighted kappa in terms of the strength of agreement are <0.20: poor; 0.21-0.40: fair; 0.41-0.6: moderate; 0.61-0.8: good; 0.81-1.00: very good [12]. A significance level of 0.005 was used to account for multiple testing in all analyses.
Risk ratios comparing each test radiotherapy schedule with the control group were calculated for each NTE endpoint at year-5 and presented in forest plots for the different assessment methods. Results for breast oedema were not included in this comparison as so few events were reported using PROMs and CRO at year-5.
The influence of baseline patient characteristics on concordance was investigated using stratified analyses, and formally assessed in logistic regression models defining a binary outcome as 1 = concordant (same scores for PROMs and CRO/photographs) versus 0 = discordant (different scores). Baseline factors found to be statistically significantly associated with concordance on univariate analysis were tested together on multivariate analysis. Baseline characteristics tested included age, treatment group, breast size and surgical deficit (assessed from baseline photographs), HADS anxiety and depression subscale scores and body image scores. All analyses were carried out using STATA version 14 based on a database snapshot taken on June 15th 2016 (as per the primary endpoint analysis).

Results
2018 patients were recruited to IMPORT LOW from 71 centres. 2 patients requested exclusion from analyses. In the 41 centres participating in the PROMs sub-study, 1265/1333 (95%) patients consented to PROMs, and 1318/1466 (90%) patients consented to the photographic sub-study from 37 participating centres. 1095 patients consented to both sub-studies (Fig. 1a).
In 1095 patients who consented to both, PROMs were available at 2 and/or 5 years for 976 patients of whom 909 had CRO and 844 had photographs. PROMs, CRO and photographs were available for 651 and 518 patients at year-2 ( Fig. 1b) and year-5 respectively (Fig. 1c). Separate analyses were conducted in patients with PROMs and CRO, and PROMs and photographs, at year-2 and year-5. Data regarding baseline characteristics [1] and PROMs questionnaire return rates [13] have been published.

Overall prevalence of NTE
The overall prevalence of patients with NTE was low, with most scored as none or mild by all three data ascertainment methods ( Table 2). Few patients had NTE scored as moderate or marked. NTE which were commonly reported included breast shrinkage, induration and breast appearance change. At year-5, 19% patients and 9% clinicians reported moderate/marked breast shrinkage. With respect to breast induration, 7% patients and 5% clinicians reported moderate/marked changes. For breast appearance change, 18% patients reported moderate/marked changes and photographic assessment reported marked changes in 4%.

Reporting of NTE by patients compared with either CRO or photographs
Patients reported a higher prevalence of breast changes than CRO and photographs for all NTE assessed, except for more clinically reported mild breast shrinkage compared with patient-reported bra fitting at both time-points (Figs. 2 and 3). Patients and clinicians reported similar prevalences of breast oedema, with very few events at 2 and 5 years. Concordance between PROMs and CRO or photographs of corresponding NTE on an individual patient basis was generally poor (Table 2).
For breast shrinkage at year-5, patients reported more effects than clinicians (Fig. 3); percentage agreement was 48% and concordance was poor as evidenced by the low weighted kappa (0.17, Table 2). Bowker's test of symmetry was also highly significant (p < 0.001) indicating discordance, with patients reporting more effects than clinicians ( Table 2). With regard to 5-year breast appearance change, patients reported more NTE than scored on photographs (Bowker's test of symmetry <0.001) [ Table 2]. Agreement was poor (35%), as was concordance (weighted kappa 0.09) [ Table 2].
In contrast, for breast induration at year-5, PROMs and CRO appeared better aligned with similar levels of effects reported by both (Fig. 3) and a higher % agreement (61%, Table 2), but concordance remained poor (weighted kappa 0.12, Table 2). In addition, Bowker's test for symmetry was no longer significant (p = 0.025), implying similar effects reported by PROMs and CRO ( Table 2).

Comparison of radiotherapy schedules using PROMs, CRO and photographs
On comparison of the risk ratios for the radiotherapy schedules, similar effect sizes were seen for breast shrinkage and breast appearance change when the analogous question was asked of the patient, or ascertained from either CRO or photographs (Fig. 4). There was some evidence of differing effect sizes between the assessment methods for breast induration, but the confidence intervals overlapped (Fig. 4).

Associations between baseline characteristics and concordance
On stratified analyses, there was little evidence that concordance varied according to baseline characteristics at 2 or 5 years (Appendix Table A1 & Table A2). Some baseline factors were significantly associated with concordance of PROMs and either CRO or photographs for certain NTE in logistic regression models, but pre- Fig. 1. Summary of whole trial population consenting to PROMs and photographs, and data available at 2 and 5 years ( * Two patients withdrew consent for any of their data to be used in the analysis).
dominantly on univariate analysis only and not across both timepoints (Appendix Table A3). For example, larger surgical deficit was associated with discordance of breast shrinkage at year-5 only Table A3).

Discussion
This analysis in the context of a randomised trial of partialbreast radiotherapy found few patients had moderate/marked There were no clinically significant associations found between baseline characteristics and concordance of NTE. The low overall prevalence of moderate/marked NTE, irrespective of the data ascertainment method used, has been reported in a number of adjuvant breast radiotherapy trials [1,2,13]. It is therefore increasingly important, in an era of improving radiotherapy techniques to monitor NTE using sufficiently sensitive methods. Within IMPORT LOW, patients reported more NTE compared with clinicians or photographs; this has been previously documented in the literature [14][15][16][17][18][19][20][21]. This suggests NTE may be underestimated if only clinician-reported or photographic outcomes are used. In contrast, the Cambridge IMRT trial [22] found clinicians reported a higher prevalence of breast changes than patients which may be related to the Cambridge study being a single-centre study with assessments conducted by one individual.
Concordance was poor on an individual patient level in IMPORT LOW. This could be explained by, firstly, the methods not being designed to be interchangeable given the different comparators used. Secondly, each method is also asking a slightly different question; when patient-reported bra fitting was compared with clinician-reported breast shrinkage, patients were deciding what a reasonable fit is in general, whereas clinicians reported degree of breast shrinkage. Thirdly, each method has its own scoring sub-scale which may be worded and categorised differently. Poor concordance has been consistently reported in the literature to date [14][15][16][22][23][24][25]. Furthermore, it has been argued that some variation is 'quite acceptable and comprehensible' due to the methodological differences between toxicity scoring by patients and clinicians [26].
Although concordance was poor on an individual patient level, the three methods generated similar estimates of effect sizes in terms of comparisons between the randomised treatments, suggesting it is reasonable to use any method. These findings are consistent with those from the START trials [14]. Within IMPORT LOW there also appeared to be a higher sensitivity of PROMs to treatment volume, although the effect sizes obtained from PROMs remained consistent with CRO and photographs. It should be noted that the PROMs investigated in this analysis and the START trials were the protocol-specific items, which were specifically developed to capture late radiotherapy effects [32], rather than generic PROMs related to general quality of life [5].
With respect to the influence of baseline characteristics on concordance, findings were not consistent across NTE or years of assessment and most associations found were significant on univariate analysis only. It is therefore not possible to draw any firm conclusions from these data. The START [14] and Cambridge IMRT [22] trials found no evidence of associations between baseline factors and concordance of NTE assessment methods.
In relation to which NTE assessment methods to use in future breast radiotherapy trials, each has advantages and disadvantages.   Clinicians are able to assess the breast with a 3-D view whereas this is not possible with standard photographs (unless taken from various angles providing an overall composite of the breast, although limited resources may prevent this). However, there is a risk of 'bias reporting', as clinicians cannot be blinded to the allocated radiotherapy treatment. Also, varying thresholds of experience in grading toxicity between clinicians can lead to interobserver variability; there was no formal training protocol for clinicians assessing NTE in IMPORT LOW. Furthermore, changes in UK working practices including earlier discharge of patients back to primary care make hospital-based follow-up challenging [27].
Obtaining photographs is also becoming increasingly challenging. Firstly, despite consenting to participate in a photographic sub-study, patients may not attend for photographs. There is a risk of 'informative censoring' where patients may choose not to attend for photographs either (1) because they do not think there is a problem with their treated breast or (2) they may have experienced NTE and feel uncomfortable about having photographs, resulting in a self-selected population. Of note there was no evidence of change in attendance for year-5 photographs based on year-2 photograph scores in IMPORT LOW. Additionally, workforce changes including closure of medical photography departments make it harder to schedule photographs. It should be noted that photographs provide the only unbiased comparison of NTE between randomised treatment groups [1][2][3]22] as the panel of clinicians scoring photographs are blinded to treatment allocation. Photographs also provide a permanent record at a fixed time point and can be filed and stored for future use. Scoring can also be validated by repeat scoring from different observers [9]. However, in IMPORT LOW, there was a large discrepancy in rating overall change in breast appearance between photographs and PROMs (% agreement = 26% and 35% at year-2 and 5 respectively). Patients reported significantly more NTE at both time-points, suggesting photographs may not capture the changes which are important for patients.
PROMs provide an opportunity to understand the patients' own perception of NTE within the framework of questions asked. We know that patients report more NTE than clinicians [14][15][16][17][18][19][20][21] or photographs and therefore, without the use of PROMs, the prevalence of NTE may be underestimated. Furthermore, PROMs are able to distinguish between treatment groups [1,2]. Within the START trials, all three data ascertainment methods were able to differentiate between randomised treatment groups [2,28,29] whereas in IMPORT LOW it was found that only PROMs were able to distinguish between randomised comparisons [1]. This difference in findings is likely related to the NTE event rate being lower in IMPORT LOW than in the START trials. In future breast radiotherapy trials (with expected low NTE rates), PROMs may have better capability in differentiating between treatment groups.
However, there are a number of issues related to PROMs. Firstly, certain patient groups may not wish to participate in a PROMs study resulting in a trial population unrepresentative of the general population. Secondly, obtaining complete datasets can be challenging [4] as questionnaires may not be returned and individual questions may not be completed. Thirdly, there is a risk of bias related to questionnaire return as patients who return questionnaires may have different characteristics to those who don't and may report either more or fewer side-effects. In IMPORT LOW, women who declined participation in the PROMs sub-study were slightly older than those who did consent [13]. There were no significant differences in the majority of baseline characteristics in those who did or did not return questionnaires at 5 years, with the exception of higher baseline HADS anxiety and depression subscale scores in patients who did not return their year-5 questionnaire [13]. Also, patients who reported more adverse effects at year-2 were more likely to return questionnaires at year-5 [13]. The prevalence of NTE at individual time-points may therefore be overestimated. Finally, irrespective of missing data, there is also risk of 'bias reporting', as patients cannot be blinded to treatment group in radiotherapy trials. Although the risk of bias reporting cannot be avoided, strategies can be implemented to reduce missing data, including collecting data electronically, such as via smart phone/email. Reducing numbers of questions in PROM questionnaires to include only the most salient and discriminating questions may also improve return rates. As well as obtaining complete and unbiased data-sets for PROMs, improvements in the standardisation of analysis, interpretation and reporting of PROMs data in clinical trials are also required to enable crosscomparison of data between trials [30].
We have discussed whether PROMs could potentially replace either CRO or photographs to assess NTE. Broadly, patients rate their subjective satisfaction with an experience of a range of breast changes, whilst clinicians seek objective adverse treatment effects. Therefore, the differences and agreements found by the methods contribute to the overall trial evaluation from multiple perspectives, affecting both the individual patient and randomised trial population. We acknowledge CRO are still widely supported and an alternative viewpoint is that both PROMs and CROs may be necessary as they measure differing aspects of disease experience and are complementary [31].
The main limitation of this analysis is that the IMPORT LOW trial was not designed to address the specific question of concordance between the data ascertainment methods therefore methodological issues regarding data ascertainment exist. These include each of the methods asking a slightly different question and using different comparators, with various subscales. The lack of standardisation between the methods may limit comparability between PROMs and either CRO or photographs.
Few patients had moderate/marked NTE irrespective of method used. Patients reported more NTE than CRO and photographs, therefore NTE may be underestimated if PROMs are not used. Despite poor concordance between assessment methods, effect sizes from PROMs were consistent with CRO and photographs, suggesting PROMs can be used as primary NTE endpoints in breast radiotherapy trials.   The IMPORT Trialists' Group consists of the Trial Management Group, Trial Steering Committee, Independent Data Monitoring Committee and the principal and main co-investigators at the participating centres.

Conflicts of interest
IMPORT LOW is sponsored by The Institute of Cancer Research, London.