Comparison of myocardial fibrosis quantification methods by cardiovascular magnetic resonance imaging for risk stratification of patients with suspected myocarditis

Background Although the presence of late gadolinium enhancement (LGE) using cardiovascular magnetic resonance imaging (CMR) is a significant discriminator of events in patients with suspected myocarditis, no data are available on the optimal LGE quantification method.
Methods Six hundred seventy consecutive patients (48 ± 16 years, 59% male) with suspected myocarditis were enrolled between 2002 and 2015. We performed LGE quantitation using seven different signal intensity thresholding methods based either on 2, 3, 4, 5, 6 or 7 standard deviations (SD) above remote myocardium or on full width at half maximum (FWHM). In addition, an LGE visual presence score (LGE-VPS) (LGE present/absent in each segment) was assessed. For each of these methods, the strength of association of LGE results with major adverse cardiac events (MACE) was determined. Inter- and intra-rater variability was assessed for all methods using the intraclass correlation coefficient (ICC).
Results Ninety-eight (15%) patients experienced a MACE at a median follow-up of 4.7 years. LGE quantification by FWHM, 2-SD and 3-SD demonstrated univariable association with MACE (hazard ratio [HR] 1.05, 95% confidence interval [CI]: 1.02–1.08, p = 0.001; HR 1.02, 95% CI: 1.00–1.04, p = 0.001; HR 1.02, 95% CI: 1.00–1.05, p = 0.035, respectively), whereas the 4-SD through 7-SD methods did not reach significant association. LGE-VPS also demonstrated association with MACE (HR 1.09, 95% CI: 1.04–1.15, p < 0.001). In the multivariable model, FWHM, the 2-SD method, and LGE-VPS each demonstrated significant association with MACE adjusted for age, sex, BMI and LVEF (adjusted HR of 1.04, 1.02, and 1.07; p = 0.009, p = 0.035, and p = 0.005, respectively). Among these, FWHM and LGE-VPS had the highest degrees of inter- and intra-rater reproducibility based on their high ICC values.
Conclusions FWHM is the optimal semi-automated quantification method in risk-stratifying patients with suspected myocarditis, demonstrating the strongest association with MACE and the highest technical consistency. Visual LGE scoring is a reliable alternative method, with comparable association with MACE and comparable reproducibility in these patients.
Trial registration number NCT03470571. Registered 13th March 2018. Retrospectively registered.


Background
Myocarditis is a frequent cause of dilated cardiomyopathy [1]. Myocarditis has a diverse pattern of clinical signs and symptoms at presentation [2,3], and its diagnosis may be challenging, as the sensitivity of the main diagnostic tools may vary greatly [1,4,5]. Cardiovascular magnetic resonance imaging (CMR) has become the primary imaging tool for establishing the diagnosis by the use of late gadolinium enhancement (LGE) imaging [2,5-8]. We recently showed an incremental prognostic value of LGE in risk stratifying patients with suspected myocarditis [9]. However, the optimal method of LGE quantification in patients with suspected myocarditis is currently unknown. Better characterization of the LGE pattern is important given the heterogeneity in presence, localization and intensity of LGE in myocarditis [6], and may improve clinical decisions. We therefore sought to compare LGE quantification techniques, including thresholding by 2, 3, 4, 5, 6 or 7 standard deviations (SDs) above remote myocardium, the full width at half maximum (FWHM) technique, and visual quantification, as well as their respective association with clinical outcome, in a post-hoc analysis of patients with suspected myocarditis [9].

Study population
The study included consecutive patients referred for CMR at our center for "suspected myocarditis" as the primary clinical question between December 2002 and December 2015 [9]. In real-world clinical practice, diagnosis of myocarditis is challenged by a lack of reference standards, including the use of noninvasive measures, serum biomarkers, and even tissue pathology from invasive biopsies [10]. We included consecutive patients with suspected myocarditis raised as a referral indication with presenting signs/symptoms from either one of 2 groups: 1) acute chest pain syndromes with symptom onset < 2 weeks before CMR; 2) subacute (onset ≥ 2 weeks) dyspnea, signs of left ventricular (LV) dysfunction, ventricular arrhythmias, syncopal spells, or abnormal electrocardiogram (ECG). Exclusion criteria included: 1) any evidence of coronary artery disease (CAD) by previously documented medical history, any previous or newly detected relevant CAD in non-invasive or invasive imaging, or subendocardial LGE consistent with CAD in a territory subtended by a coronary vessel; 2) any evidence of hypertrophic cardiomyopathy, arrhythmogenic right ventricular (RV) cardiomyopathy, cardiac sarcoidosis, or cardiac amyloidosis; and 3) any evidence of Takotsubo cardiomyopathy, constrictive pericarditis, Loeffler endocarditis, ventricular noncompaction, cardiac tumor, pulmonary embolism, or severe valvular disease. Seven hundred forty-four patients were included. Fifty-nine (7.9%) patients were excluded based on CMR findings consistent with: myocardial infarction (N = 35), biopsy-proven cardiac amyloidosis (N = 6), ventricular non-compaction (N = 3), Takotsubo cardiomyopathy (N = 4), constrictive pericarditis (N = 2), cardiac sarcoidosis (N = 2), Loeffler endocarditis (N = 2), and 1 each for arrhythmogenic cardiomyopathy, hypertrophic cardiomyopathy, pulmonary embolism, cardiac tumor, and severe valvular disease.
Fifteen patients were excluded due to technical reasons (claustrophobia with incomplete CMR scans or non-diagnostic LGE images) [9]. Takotsubo cardiomyopathy was defined as previously published, with apical ballooning [11], elevated troponin, absence of CMR features suggestive of myocarditis (absence of LGE) and absence of coronary artery disease. Clinical data including medication, laboratory tests including cardiac biomarkers, and ECG before CMR scanning were analyzed. Abnormal troponin I was defined as > 0.10 ng/mL and abnormal troponin T as > 14 ng/L. Normal values of creatine phosphokinase (CPK) were < 145 U/L in women and < 170 U/L in men. Clinical data, cardiac biomarkers and ECG were analyzed at baseline by a cardiologist.

CMR imaging protocol and image post-processing
The CMR systems included a 3 T (Tim Trio, Siemens Healthineers, Erlangen, Germany) and a 1.5 T (Aera, Siemens Healthineers) scanner. All patients underwent cine balanced steady-state free precession (bSSFP) imaging and an LGE imaging protocol (TR, 4.8 ms; TE, 1.3 ms; inversion time, 200 to 300 ms), using a segmented inversion-recovery pulse sequence starting 10 to 15 min after a weight-based injection (cumulative dose 0.15 mmol/kg) of gadolinium diethylenetriamine pentaacetic acid (Magnevist, Bayer HealthCare Pharmaceuticals Inc., Berlin, Germany). In 122 (21%) patients, Multihance (Bracco Diagnostic, Milan, Italy) was used instead of Magnevist. In patients with an estimated glomerular filtration rate (eGFR) < 60 mL/min/1.73 m², contrast dose was restricted to 0.1 mmol/kg or 20 mL, whichever was lower in volume, in compliance with our institutional policy [12]. Commercially available software (MASS v15, Medis, Leiden, The Netherlands) was used to post-process and quantify all CMR images.
LGE quantification
Semi-automated quantification was performed as follows: epicardial and endocardial LV contours were carefully placed manually on all LGE images. The remote non-LGE reference region of each LGE slice was placed adjacent to the region of LGE so that the reference region was at approximately equal distance from the anterior receiver coils. We believe this placement minimizes any effect of LGE location on the robustness of the LGE quantitation.
LGE mass was then quantified by semi-automatic methods using a signal intensity threshold of > 2, 3, 4, 5, 6 or 7 SD above a reference region of remote myocardium (adjacent to the region of LGE and approximately equal in distance to the anterior receiver coils) in the same slice, and, for the FWHM approach, using regions with signal intensity above 50% of the maximal signal intensity of the enhanced area (see Fig. 1) [13,14]. Artifacts were manually erased. In all methods, LGE mass (in grams) was then expressed as a percentage of total LV mass determined from bSSFP cine images [15]. LGE extent was also determined visually using the 17-segment model and two different scores: (1) the visual presence score (LGE-VPS), defined by LGE being present or absent in each segment (maximum score 17), and (2) the visual transmurality score (LGE-VTS), which summed the transmural extent of LGE per segment, assessed on a five-point scale (0 = no LGE, 1 = < 25% transmurality, 2 = 26-50% transmurality, 3 = 51-75% transmurality, 4 = 76-100% transmurality) (maximum score 68). VPS sizing aimed to include any abnormal enhancement on LGE images, including regions of intermediate signal intensity.
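The two thresholding rules described above can be illustrated with a minimal Python sketch. This is not the implementation used by the MASS software; the function names, the use of the sample standard deviation, and the use of a pixel fraction as a stand-in for LGE mass as a percentage of LV mass are all assumptions made for illustration.

```python
from statistics import mean, stdev


def n_sd_threshold(remote_pixels, n_sd):
    """Cutoff = mean remote signal + n_sd standard deviations.

    Assumption: the sample SD is used; the software's exact SD
    definition is not stated in the text.
    """
    return mean(remote_pixels) + n_sd * stdev(remote_pixels)


def fwhm_threshold(enhanced_pixels):
    """Cutoff = 50% of the maximal signal intensity in the enhanced area."""
    return 0.5 * max(enhanced_pixels)


def lge_percent(myocardial_pixels, threshold):
    """Percentage of myocardial pixels with signal above the threshold.

    Assumption: pixel fraction is used here as a simple proxy for
    LGE mass expressed as a percentage of total LV mass.
    """
    above = sum(1 for p in myocardial_pixels if p > threshold)
    return 100.0 * above / len(myocardial_pixels)


# Hypothetical signal intensities for a single slice.
remote = [10, 12, 11, 9, 13]    # reference region in remote myocardium
enhanced = [15, 20, 30, 40]     # pixels inside the enhanced area
myocardium = remote + enhanced  # all myocardial pixels in the slice

for n in (2, 3, 4, 5, 6, 7):
    cutoff = n_sd_threshold(remote, n)
    print(f"{n}-SD: {lge_percent(myocardium, cutoff):.1f}%")
print(f"FWHM: {lge_percent(myocardium, fwhm_threshold(enhanced)):.1f}%")
```

In this toy example the measured extent falls as the SD multiplier rises, which mirrors the pattern shown in Fig. 1.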
LGE image quality was graded as: 1 = poor image quality, 2 = fair image quality, 3 = good image quality, and 4 = excellent image quality. LGE images were evaluated by the consensus of two American College of Cardiology Core Cardiovascular Training Statement (COCATS) level III experienced readers (CG and RYK) and inter-rater reproducibility testing was performed by an independent experienced CMR investigator (KK).
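The two visual scores defined above can be sketched as follows. This is a minimal illustration that assumes a segment counts toward LGE-VPS whenever its transmurality grade is above 0; in the study, the two scores were assigned by readers, and the function names are hypothetical.

```python
N_SEGMENTS = 17  # 17-segment model
MAX_GRADE = 4    # five-point transmurality scale, 0 to 4


def _check(grades):
    """Validate one transmurality grade (0-4) per segment."""
    if len(grades) != N_SEGMENTS:
        raise ValueError("expected one grade per segment (17)")
    if any(g not in range(MAX_GRADE + 1) for g in grades):
        raise ValueError("grades must be integers from 0 to 4")


def lge_vps(grades):
    """Visual presence score: number of segments with any LGE (max 17)."""
    _check(grades)
    return sum(1 for g in grades if g > 0)


def lge_vts(grades):
    """Visual transmurality score: sum of per-segment grades (max 68)."""
    _check(grades)
    return sum(grades)


# Hypothetical patient with LGE in three segments.
example = [0] * 14 + [1, 2, 4]
print(lge_vps(example), lge_vts(example))
```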

Follow-up of clinical endpoints
We reviewed all available electronic medical records, including mortality status from the Social Security Death Index, for all subjects. Subjects were then sent a standardized checklist-based patient questionnaire by mail and/or followed up by a scripted telephone interview based on the same standardized checklist 2 weeks later. Subjects were given the option to refuse to be contacted by telephone whether or not they decided to return the questionnaire by mail. Subjects with whom we were not able to establish contact for at least 6 months after the CMR were considered lost to follow-up. We defined the primary endpoint as a composite of major adverse cardiac events (MACE): all-cause death, heart failure decompensation requiring hospital admission as defined in prior trials [16,17], heart transplantation, documented sustained ventricular arrhythmia (> 30 s), or recurrent acute myocarditis based on clinical biomarkers and the CMR Lake Louise definition [6]. Time-to-MACE was determined from the CMR study date to the occurrence of the first MACE or censoring at the end of the follow-up period.
Fig. 1 Example of the different late gadolinium enhancement (LGE) quantification methods in a patient with suspected myocarditis. a LGE image with endocardium and epicardium demarcated; b 2-SD (LGE: 28.9 g, 24.9% of total left ventricular (LV) mass); c 3-SD (19.4 g, 16.8%); d 4-SD (12.2 g, 10.5%); e 5-SD (8.1 g, 7.0%); f 6-SD (5.2 g, 4.5%); g 7-SD (3.3 g, 2.9%); h full width half maximum (FWHM) (14.7 g, 12.6%). Total LV mass was 116 g. The fibrosis is outlined in yellow. For 2-SD to 7-SD, a region of interest (ROI) 1 is identified in the reference remote myocardium (yellow arrow/yellow contour). For FWHM, an automated ROI 2 is identified in the affected myocardium (pink arrow/pink contour). Of note, only the midventricular slice is shown; total LGE quantification includes mass and percentage of the entire left ventricle. SD = standard deviation
All study procedures were reviewed and approved by our Institutional Review Board in accordance with institutional guidelines.
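The time-to-MACE definition used for follow-up (time from the CMR study date to the first MACE, or censoring at the end of follow-up) can be sketched as below. The function name, the argument layout, and the choice to return days are assumptions for illustration, not the study's actual data-handling code.

```python
from datetime import date


def time_to_mace(cmr_date, event_dates, follow_up_end):
    """Return (days, event_observed) for one patient.

    days is counted from the CMR study date to the first MACE, or to
    the end of follow-up if no MACE occurred in that window (censored).
    """
    in_window = [d for d in event_dates if cmr_date <= d <= follow_up_end]
    if in_window:
        return (min(in_window) - cmr_date).days, True
    return (follow_up_end - cmr_date).days, False


# Hypothetical patient with two events; only the first one counts.
print(time_to_mace(date(2010, 1, 1),
                   [date(2012, 1, 1), date(2011, 1, 1)],
                   date(2015, 1, 1)))
```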

Statistical analysis
Categorical variables were presented as percentages of the entire cohort, or as percentages of the corresponding group if relevant data were missing. Continuous variables were expressed as mean ± standard deviation or as median with interquartile range [IQR], depending on normality of distributions. Categorical variables were compared using the chi-square test, or the Fisher exact test where cell counts were low, whereas comparisons for continuous data were performed using the 2-sample Student t-test or the Wilcoxon rank-sum test, as appropriate. A two-sided p-value of < 0.05 was deemed significant. Univariable associations with MACE and chi-square values were determined by Cox proportional hazards regression. Multivariable (simultaneous entry, i.e. enter method) associations of risk covariates with MACE were determined by Cox proportional hazards regression. Multivariable models were built by including LGE extent from the various quantification methods and clinical variables in order to test for the prognostic association of LGE extent incremental to common clinical risk markers. For each multivariable model, the assumption of proportional hazards was checked and confirmed valid. For inter-rater and intra-rater reliability analyses, we randomly selected 20 patients with LGE presence and compared the measurement of LGE extent using all 7 quantitative methods (2-SD through 7-SD, FWHM) by two independent readers. Each CMR scan was independently contoured by the 2 readers, and an automated region of interest (ROI) was selected to generate the LGE extents for comparison. Furthermore, the LGE-VPS and LGE-VTS visual scores by two independent readers were assessed for intra- and inter-reader agreement using the intraclass correlation coefficient (ICC). SAS (version 9.4, SAS Institute Inc., Cary, USA) and SPSS 22.0 (International Business Machines Corporation, Armonk, New York, USA) software packages were used for analysis.
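The ICC form used for the reproducibility analysis is not specified above. As one possible illustration, the sketch below computes the two-way random-effects, absolute-agreement, single-measurement ICC, often written ICC(2,1), from a subjects-by-raters table. The choice of this form and the function name are assumptions; the study's SAS/SPSS settings may have differed.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a list of rows, one row per subject, one column per rater."""
    n = len(ratings)     # number of subjects
    k = len(ratings[0])  # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-subject mean square
    ms_cols = ss_cols / (k - 1)               # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical LGE extents (%) measured by two readers in five patients.
readings = [[12.1, 11.8], [4.0, 4.6], [20.3, 19.5], [7.7, 8.1], [15.0, 15.4]]
print(round(icc_2_1(readings), 3))
```

With this definition, the "variability" plotted per method corresponds to 1 - ICC, so values close to 0 indicate high reproducibility.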

Results
A total of 670 patients represented our study group, with 2 (0.3%) patients lost to follow-up and a median follow-up of 4.7 [IQR 2.3-7.3] years. Mean age was 48 ± 16 years and 392 (59%) patients were male. CMR studies were performed using a 3 T scanner in 535 (80%) patients and a 1.5 T scanner in the remainder. Three hundred fifty (52%) patients presented with acute chest pain syndromes (< 2 weeks) and the remaining 320 (48%) were subacute presentations (≥ 2 weeks). LGE was present in 294 (44%) patients. Among these, 169 (57%) were in the acute presentation group and 125 (43%) were in the subacute presentation group. Baseline and CMR characteristics are depicted in Tables 1 and 2. In 24 (3.5%) patients, LGE image quality was poor, in 92 (13.7%) image quality was fair, in 356 (53.1%) image quality was good, and in the remaining patients image quality was excellent. LGE extent is presented in Fig. 2.
LGE image quality did not have an impact on the predictive value of LGE quantification methods.

Discussion
Our current study represents a systematic assessment of different semi-automated quantification methods of LGE extent in a large cohort of patients with suspected myocarditis. Our results, based on adjusted Cox regression analyses and measurements of the degrees of intra- and inter-rater measurement reproducibility, showed that FWHM demonstrated the highest technical consistency and the strongest prognostic association with MACE amongst all LGE quantification methods in patients with suspected myocarditis. On the other hand, visual qualitative assessment of LGE extent using the LGE-VPS score represents a reliable alternative prognosticating method with excellent intra- and inter-rater reproducibility.
Table fragment: LGE presence 292 (44%); LGE-VPS 1.7 ± 3.4; LGE-VTS 4.2 ± 8.9; LGE mass (g) -
LGE extent and outcome association
Consistent with other studies in patients with myocardial infarction and hypertrophic cardiomyopathy, different LGE quantification methods yield different extents of myocardial fibrosis, suggesting that the methods are not interchangeable [13,18]. This may have practical implications: thresholds that use a lower SD cutoff may overestimate the extent of LGE but possess higher sensitivity in detecting abnormal LGE.
LGE-assessed myocardial fibrosis has been shown to be a predictor of outcome in patients with myocarditis [2,8]. In fact, we recently demonstrated that, besides LVEF, LGE presence was associated with MACE [9], which is in line with a study of high-risk myocarditis patients in which the presence of LGE was an independent predictor of all-cause mortality [2]. To date, LGE extent has proved to be an outcome predictor in patients with myocardial infarction, hypertrophic cardiomyopathy and non-ischemic cardiomyopathy [14,19-22]; however, no data from larger studies were available on the prognostic value of different LGE extent quantification methods in patients with suspected myocarditis. Interestingly, in the multivariable model in our study, only FWHM and 2-SD showed an independent association with outcome, whereas the 3-SD to 7-SD methods did not. Compared to myocardial infarction patients, in whom higher SDs are used, a lower SD might capture more substrate at risk (such as edema, fibrosis and diffuse scar) in myocarditis patients. Further, in the setting of myocarditis, the areas of LGE usually demonstrate lower signal intensity compared with LGE in patients with myocardial infarction. Therefore, because the threshold is based on the mean signal of the remote myocardium and the difference between remote and pathological myocardium is smaller, higher SDs (especially 4-SD to 7-SD) might shift the threshold for LGE far from the remote myocardium and consequently underestimate the severity and extent of LGE. In studies of myocardial infarction patients, in whom LGE is commonly bright, and apart from FWHM [13,15], only thresholds higher than 2-SD (i.e. 3-SD to 6-SD) were proposed as accurate methods [23,24]. However, in the setting of infarction, the different methods were not evaluated for their association with outcome [25]. Interestingly, our visual scoring showed a strong association with outcome.
Similar results were obtained from a small study including only 41 patients, in which LGE extent (quantified by a visual transmurality score, similar to our visual scores) remained an independent predictor of MACE with an HR of 1.42 [26]. Likewise, in another very small study from Barone-Rochette et al. including 28 patients with suspected myocarditis, a simplified quantitative score (SQS), similar to our LGE-VTS score, showed a trend towards worse outcome in those with a higher initial SQS (p = 0.08) [27]. It seems that visual assessment of LGE offers a rapid alternative for risk stratification in a diffuse or patchy disease such as myocarditis, where, in contrast to myocardial infarction, more myocardium at risk might be involved than only the territory subtended by a certain vessel.

Inter-rater and intra-rater variability
Some authors used manual quantification as a reference standard to compare different quantification methods of LGE from myocardial infarction. In our opinion, manual contouring of LGE extent might be less consistent than semi-automatic quantitative methods in sizing LGE of patients with myocarditis. Compared to myocardial infarction, LGE from myocarditis tends to show more blurry contours and less intense enhancement. For example, in HCM patients presenting with rather patchy or diffuse LGE, manual delineation had poorer inter- and intra-rater reproducibility compared to infarction patients [13], suggesting that the manual technique is not applicable in diseases with LGE patterns other than infarction. The heterogeneous nature of LGE presentations leads to a broad intra- and inter-rater variability, and FWHM was more robust than the lower SDs in our study. Among the quantification techniques associated with outcome, inter-rater and intra-rater reproducibility was highest for the semi-automated FWHM technique, which was also shown in the mentioned study and by others in different cardiac diseases [13,18,28,29]. The FWHM method is less prone to over- or underestimation in myocarditis patients, where LGE extent might also reflect inflammation. This highlights the fact that different LGE quantification techniques are not necessarily equally applicable across different diseases, since signal intensities are not comparable with those in infarction patients [13]. Another issue in myocarditis patients is that a high proportion present with subepicardial fibrosis [30]; in these cases, epicardial contouring and delineation of epicardial LGE from epicardial fat, which also presents with a bright signal in LGE images, might pose issues for reproducibility. Therefore, even a small change in the epicardial contour might have a large impact on the intra- and inter-rater variability.
Consequently, our visual scores, LGE-VPS and LGE-VTS (presence or absence of LGE, or LGE transmurality), may be less prone to variation and showed the highest inter-rater and intra-rater reproducibility, which is consistent with prior literature [27].

Limitations
There are several limitations to our study. This study has a retrospective observational design at a single center. Including patients evaluated on both 1.5 and 3.0 Tesla systems may potentially affect the LGE quantification; however, to the best of our knowledge, this effect is minimal on LGE sizing, and in our multivariable model this had no significant impact on the results. There is no clinical reference standard for the assessment of LGE, so accuracy across the methods cannot be determined. In that regard, we chose to determine association with outcome as an alternative yet clinically relevant analysis. Our initial hypothesis was to evaluate which semi-quantitative method was best associated with the outcome. As an alternative method, we also evaluated visual scoring. Our observations showed that both the visual LGE score and LGE quantitation using FWHM demonstrated robust prognostic value and measurement reproducibility, suggesting that the visual LGE score is a reliable clinical reporting parameter, whereas FWHM should be the signal intensity criterion of choice should LGE quantitation be needed. Our study was not designed to determine whether image quality should be an additional factor in choosing between these parameters, but our general impression is that visual LGE scoring is more consistent. As in many clinical studies, details surrounding the immediate cause of death could not be obtained in a minority of patients, so we needed to rely on all-cause mortality. Further, the use of immunosuppressant medication such as steroids was not assessed, and therefore the influence of such medication on the outcome cannot be evaluated in this study. Finally, the natural course of acute myocarditis is usually followed by a decline in inflammation, and LGE signal intensity changes during the stages of tissue healing.
Although the inclusion of the variable acute/subacute presentation to the multivariable model did not change the results, future prospective studies will need to determine the need for optimization of the quantitative thresholds at different clinical settings (acute, subacute, delayed, chronic).

Conclusions
LGE presence is a strong risk marker in patients with suspected myocarditis, but quantitative methods of LGE sizing can offer a complementary and objective risk assessment. Amongst quantitative methods, LGE extent using FWHM criteria offers the highest prognostic value and high measurement reproducibility. The visual LGE scoring method is a reliable alternative.
Fig. 4 Reproducibility of different LGE quantification methods. Intra-rater and inter-rater variability for each LGE quantification method (calculated as 1 - intraclass correlation coefficient [ICC]) is lowest in visual scores and FWHM. Intra-rater variability is less marked than inter-rater variability, as would be expected. Of the quantification methods significantly associated with MACE, FWHM, LGE-VPS and LGE-VTS showed the lowest inter- and intra-rater variability. FWHM = full width half maximum; SD = standard deviation; LGE-VPS = visual LGE presence score; LGE-VTS = visual LGE transmurality score; MACE = major adverse cardiovascular event