T1 mapping performance and measurement repeatability: results from the multi-national T1 mapping standardization phantom program (T1MES)

Captur, Gabriella; Bhandari, Abhiyan; Brühl, Rüdiger; Ittermann, Bernd; Keenan, Kathryn E.; Yang, Ye; Eames, Richard J.; Benedetti, Giulia; Torlasco, Camilla; Ricketts, Lewis; Boubertakh, Redha; Fatih, Nasri; Greenwood, John P.; Paulis, Leonie E. M.; Lawton, Chris B.; Bucciarelli-Ducci, Chiara; Lamb, Hildo J.; Steeds, Richard; Leung, Steve W.; Berry, Colin; Valentin, Sinitsyn; Flett, Andrew; de Lange, Charlotte; DeCobelli, Francesco; Viallon, Magalie; Croisille, Pierre; Higgins, David M.; Greiser, Andreas; Pang, Wenjie; Hamilton-Craig, Christian; Strugnell, Wendy E.; Dresselaers, Tom; Barison, Andrea; Dawson, Dana; Taylor, Andrew J.; Mongeon, François-Pierre; Plein, Sven; Messroghli, Daniel; Al-Mallah, Mouaz; Grieve, Stuart M.; Lombardi, Massimo; Jang, Jihye; Salerno, Michael; Chaturvedi, Nish; Kellman, Peter; Bluemke, David A.; Nezafat, Reza; Gatehouse, Peter; Moon, James C.

doi:10.1186/s12968-020-00613-3

Research
Open access
Published: 07 May 2020

on behalf of the T1MES Consortium

Journal of Cardiovascular Magnetic Resonance volume 22, Article number: 31 (2020)

Abstract

Background

The T₁ Mapping and Extracellular volume (ECV) Standardization (T1MES) program explored T₁ mapping quality assurance using a purpose-developed phantom with Food and Drug Administration (FDA) and Conformité Européenne (CE) regulatory clearance. We report T₁ measurement repeatability across centers describing sequence, magnet, and vendor performance.

Methods

Phantoms batch-manufactured in August 2015 underwent 2 years of structural imaging, B₀ and B₁ mapping, and “reference” slow T₁ testing. Temperature dependency was evaluated by the United States National Institute of Standards and Technology and by the German Physikalisch-Technische Bundesanstalt. Center-specific T₁ mapping repeatability (scan frequency ranging from at most one scan per week to at least one per quarter) was assessed over a mean of 358 (maximum 1161) days on 34 magnets at 1.5 T and 22 magnets at 3 T using multiple T₁ mapping sequences. Image and temperature data were analyzed semi-automatically. Repeatability of serial T₁ was evaluated in terms of the coefficient of variation (CoV), and linear mixed models were constructed to study the interplay of some of the known sources of T₁ variation.

Results

Over 2 years, phantom gel integrity remained intact (no rips/tears), B₀ and B₁ remained homogeneous, and “reference” T₁ was stable compared with baseline (% change at 1.5 T, 1.95 ± 1.39%; 3 T, 2.22 ± 1.44%). At 1.5 T, T₁ (modified Look-Locker inversion recovery [MOLLI] 5s(3s)3s) increased by 11.4 ms per degree Celsius in long native blood tubes and decreased by 1.2 ms per degree Celsius in short post-contrast myocardium tubes. Agreement of estimated T₁ times with “reference” T₁ was similar across Siemens and Philips CMR systems at both field strengths (adjusted R² range for both field strengths, 0.99–1.00). Over 1 year, many 1.5 T and 3 T sequences/magnets were repeatable, with mean CoVs < 1% and < 2%, respectively. Repeatability was better (narrower CoVs) at 1.5 T than at 3 T. Within T1MES, repeatability for native T₁ was high for several sequences, for example, at 1.5 T, Siemens MOLLI 5s(3s)3s prototype number 448B (mean CoV = 0.27%) and Philips MOLLI 3s(3s)5s (CoV 0.54%), and at 3 T, Philips MOLLI 3b(3s)5b (CoV 0.33%) and Siemens shortened MOLLI (ShMOLLI) prototype 780C (CoV 0.69%). After adjustment for temperature and field strength, the T₁ mapping sequence and scanner software version (both P < 0.001 at 1.5 T and 3 T), and to a lesser extent the scanner model (P = 0.011, 1.5 T only), had the greatest influence on T₁ across multiple centers.

Conclusion

The T1MES CE/FDA-approved phantom is a robust quality assurance device. In a multi-center setting, T₁ mapping performance differed between field strengths, sequences, scanner software versions, and manufacturers. However, several specific combinations of field strength, sequence, and scanner are highly repeatable and thus have the potential to provide standardized assessment of T₁ times for clinical use, although temperature correction is required, at least for native T₁ tubes.

Introduction

T₁ mapping aids clinicians in the assessment and diagnosis of myocardial disease. However, measurements need to be stable over time, with transferable values. Establishing normal reference ranges would benefit from not requiring local healthy-subject scanning, and pooling multi-scanner datasets would have advantages such as larger available sample sizes for detecting subtle effects or for subgroup analysis, and greater robustness and generalizability of results, with a lower chance of unforeseen bias than single-center data [1]. Combining results, however, introduces sequence, magnet, and field strength bias [2]. The field of T₁ mapping would therefore benefit from a “T₁ standard” to enable cross-center pooling and delivery of T₁ mapping data [3], analogous to the international normalized ratio (INR), which makes it possible to adjust the dosing of vitamin K antagonists regardless of which laboratory has performed the test [4].

The T₁ Mapping and Extracellular volume (ECV) Standardization (T1MES) phantom program was established to explore T₁ mapping quality assurance at 1.5 T and 3 T and to understand the feasibility of delivering a “T₁ standard” [5]. The first step toward this goal was the development and mass production of a phantom [5] and its European Union Conformité Européenne (CE) and United States Food and Drug Administration (FDA) regulatory clearance. In September 2015, cardiovascular magnetic resonance (CMR) centers worldwide joined the T1MES consortium and committed to submitting a minimum of 12 months of center-specific T₁ mapping data. The submitted data have now been analyzed to explore phantom performance.

We report phantom data at 1 and 2 years acquired with various T₁ mapping sequences, together with temperature sensitivity and platform performance. We emphasize, however, that comparing different T₁ methods and systems was not the main aim; rather, the aim was to investigate long-term stability towards a “T₁ standard”. This involved modeling some of the potential sources of T₁ variation longitudinally and between T1MES centers to identify the most influential factors.

Methods

The development and description of the T1MES phantom (Fig. 1) have been reported previously [5]. Briefly, the T1MES phantom was designed to be field-strength specific (i.e., separate 1.5 T and 3 T models). Each phantom contains four tubes representing human native blood/myocardial T₁ and T₂ values (i.e., pre-gadolinium-based contrast agent [GBCA] values) and five tubes representing human post-GBCA blood/myocardial values. While the main aim of the present study was the collection and analysis of the multi-center data (see the “Methods part 2—Multi-center phantom testing” section), some other tests were applied to a small number of the phantoms during the 2 years to explore the utility of T1MES as a quality assurance device; these tests are described first (“Methods part 1—Evaluation of the phantom”). Imaging biomarker terms follow the recommendations of the Quantitative Imaging Biomarkers Alliance (QIBA) of the Radiological Society of North America (RSNA) [6].

Methods part 1—Evaluation of the phantom

Structural integrity

Gel integrity and aging were checked at each submission time point for participating sites by manual inspection of the localizers that formed part of the minimum dataset required for participation. In addition, a high-resolution, isotropic, three-dimensional (3D) gradient echo sequence (0.42 mm³) was run on four phantoms (three 1.5 T phantoms and one 3 T phantom), at baseline (October 2015) and at 2 years post manufacturing, in each case on a 3 T MAGNETOM Skyra (Siemens Healthineers, Erlangen, Germany; software syngo MR D13C). Owing to scanner software constraints, the sequence acquired two overlapping slabs, each with two phase-encoding directions, a long repetition time (TR = 17 ms), and a narrow sampling bandwidth (250 Hz/pixel) for better signal-to-noise ratio (SNR). This sequence had weak T₁ and T₂ image contrast and was used only for structural examination.

“Reference” rT₁ and rT₂ data

Baseline (October 2015) “reference” T₁ and T₂ values (rT₁, rT₂) were acquired at the Royal Brompton Hospital CMR Unit using, respectively, basic single-slice inversion recovery spin echo (IRSE; TR = 10 s; 8 inversion times [TI] from 25 to 3200 ms) and single-slice spin echo (SE; TR = 10 s; 8 echo times [TE] from 10 to 640 ms) [5]. These sequences were repeated identically at 2 years on the same three 1.5 T phantoms and the same three 3 T phantoms sampled from the production batch. The three 1.5 T phantoms (serial numbers 15E031, 15E033, and 15E034) were scanned on a 1.5 T MAGNETOM Avanto (Siemens Healthineers; software syngo MR B17A). The three 3 T phantoms (30E001, 30E017, and 30E018) were scanned on a 3 T MAGNETOM Skyra (Siemens Healthineers; software syngo MR D13C).
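The signal models used to derive rT₁ and rT₂ are not stated here. As a minimal sketch, assuming polarity-restored (signed) IRSE signals following the common three-parameter model S(TI) = A − B·exp(−TI/T₁), and mono-exponential SE decay S(TE) = S₀·exp(−TE/T₂), the fits could look as follows. The function names and the intermediate TI/TE values (only the first and last values are given in the text) are hypothetical.

```python
import math


def _linear_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss


def fit_ir_t1(ti_ms, signal, t1_grid=range(10, 5001)):
    """Grid search for T1 in S(TI) = A - B*exp(-TI/T1), assuming signed (polarity-restored) data.

    For a fixed T1 the model is linear in A and B, so A and B are solved by least squares
    and the candidate T1 (1 ms steps here) with the smallest residual is kept.
    """
    best_t1, best_rss = None, math.inf
    for t1 in t1_grid:
        e = [math.exp(-ti / t1) for ti in ti_ms]
        _, _, rss = _linear_fit(e, signal)
        if rss < best_rss:
            best_t1, best_rss = t1, rss
    return best_t1


def fit_se_t2(te_ms, signal):
    """Log-linear fit of S(TE) = S0*exp(-TE/T2): ln S = ln S0 - TE/T2, so T2 = -1/slope."""
    _, slope, _ = _linear_fit(te_ms, [math.log(s) for s in signal])
    return -1.0 / slope


# Hypothetical sampling points spanning the ranges quoted in the text.
TI_MS = [25, 50, 100, 200, 400, 800, 1600, 3200]
TE_MS = [10, 20, 40, 80, 160, 320, 480, 640]
```

With noise-free synthetic data (for example A = 1000, B = 1900, T₁ = 1200 ms), `fit_ir_t1(TI_MS, signal)` recovers 1200. The long TR (10 s) is what justifies ignoring incomplete recovery between repetitions in this simple model; the log-linear T₂ fit weights late, low-SNR echoes heavily and is used here only for brevity.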

Separate rT₁ and rT₂ data were acquired repeatedly on a single 3 T phantom (30E021) at the German National Metrology Institute, Physikalisch-Technische Bundesanstalt (PTB), over a period of 1041 days (64 scans) commencing September 2015, on a 3 T MAGNETOM Verio (Siemens Healthineers; software syngo MR B17A). The sequences used for rT₁ and rT₂ were, respectively, basic single-slice IRSE (TR = 8000 ms; 7 TI from 25 to 4800 ms) and single-slice SE (TR = 3000 ms; 5 TE from 24 to 400 ms).

Temperature sensitivity

The following three methods were used:

First, controlled-temperature experiments over the range 10–30 °C were conducted at the United States National Institute of Standards and Technology (NIST) on six loose T1MES tubes at 1 year (Fig. 2i). T₁ and T₂ were measured at 10, 17, 20, 23, and 30 °C on a VnmrJ4 small-bore scanner operating at 1.5 T (Varian Medical Systems, Palo Alto, California, USA) in a temperature-controlled environment, with temperature monitored by a fiber optic probe. T₁ was measured by IRSE (TR = 10 s, TI = 50–3000 ms) and T₂ by SE (TR = 10 s, TE = 15–960 ms).

Second, controlled-temperature experiments at 19, 21, and 25 °C were conducted at the PTB laboratory on T1MES phantom 30E012 at 1 year (also Fig. 2i). T₁ and T₂ were measured on a 3 T MAGNETOM Verio scanner (Siemens Healthineers; software syngo MR B17A) using a Pt100 resistance thermometer. T₁ was measured by IRSE (TR = 8000 ms, TI = 25–4800 ms) and T₂ by SE (TR = 3000 ms, TE = 24–400 ms).

Third, for each T1MES phantom scan at all centers, temperature was measured using liquid crystal thermometers attached to every phantom. These measurements were pooled and analyzed to derive temperature-correction algorithms (see Statistical Analysis).

B₀ and B₁ uniformity

These uniformities, and the fundamental distortion of B₁ by the dielectric permittivity of water, especially at 3 T, had been tested at baseline (October 2015; previously reported [5]). They were mapped again later to check for “cracking” of the gel, with resulting air gaps affecting B₀ in particular, and for potential “clumping” of the plastic beads over time, which might in theory affect B₁ [5]. We therefore considered it prudent to check whether anything unexpected occurred over the long term.

B₀ uniformity was therefore mapped at 2 years in six phantoms, in a transverse slice midway along the length of the tubes, using a multi-echo gradient echo sequence and the phase difference between known TEs [7]. A frequency range of ± 50 Hz across the phantom was considered acceptable, based on published T₁ mapping off-resonance sensitivity [8]. B₁ homogeneity was evaluated similarly using flip angle (FA) maps (double angle method with FA 60° and 120° [θ1, 2 × θ1], long TR [8 s], and 4 ms sinc [− 3π to + 3π] slice excitation profiles to minimize error due to FA variation through the slice).
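The two mapping principles above reduce to short formulas: off-resonance Δf = Δφ / (2π·ΔTE) from the phase difference between two echoes, and, for the double angle method with long TR (signal ∝ sin θ), S(2θ)/S(θ) = 2 cos θ. The following is a minimal per-pixel sketch of both; the function names, the wrapping choice, and the example values are illustrative assumptions rather than the pipeline used in the study.

```python
import math


def b0_offset_hz(phase_te1_rad, phase_te2_rad, te1_ms, te2_ms):
    """Off-resonance frequency from the phase accrued between two echo times.

    delta_f = delta_phi / (2 * pi * delta_TE). The phase difference is wrapped to
    [-pi, pi], so offsets are unambiguous only within +/- 1 / (2 * delta_TE).
    """
    dphi = phase_te2_rad - phase_te1_rad
    dphi = math.atan2(math.sin(dphi), math.cos(dphi))  # wrap to [-pi, pi]
    dte_s = (te2_ms - te1_ms) / 1000.0
    return dphi / (2.0 * math.pi * dte_s)


def b0_acceptable(offsets_hz, limit_hz=50.0):
    """True if every pixel lies within +/- limit_hz (the +/- 50 Hz criterion in the text)."""
    return all(abs(f) <= limit_hz for f in offsets_hz)


def double_angle_fa_deg(signal_theta, signal_2theta):
    """Double angle method (long TR, so the signal is proportional to sin(flip angle)).

    S(2*theta) / S(theta) = sin(2*theta) / sin(theta) = 2*cos(theta),
    hence theta = arccos(S(2*theta) / (2 * S(theta))).
    """
    ratio = signal_2theta / (2.0 * signal_theta)
    ratio = max(-1.0, min(1.0, ratio))  # guard against noise pushing |ratio| above 1
    return math.degrees(math.acos(ratio))


def relative_b1(actual_fa_deg, nominal_fa_deg=60.0):
    """Actual-to-nominal flip-angle ratio; 1.0 means a perfectly calibrated B1."""
    return actual_fa_deg / nominal_fa_deg
```

For example, a phase difference of 0.2π rad between echoes 2.5 ms apart corresponds to 40 Hz, inside the ± 50 Hz tolerance. The arccos form assumes the 2θ image still has positive signal (true for a nominal 120° pulse), since magnitude images lose the sign beyond 180°.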

Methods part 2—Multi-center phantom testing

Serial, multi-center T₁ mapping data

The T1MES user manual (https://doi.org/10.6084/m9.figshare.c.3610175_D1.v1) defined strict scanning instructions (scanning and shim volume strictly at isocenter, use of the same supporting materials, etc.). Each contributed T1MES dataset (localizers, sets of inversion recovery images, and inline scanner-generated T₁ maps, Fig. 3) underwent initial quality assurance to check orientation and isocenter positioning (by visual inspection of localizers and maps, and semi-automatically by inspecting the Digital Imaging and Communications in Medicine [DICOM] header fields “ImagePositionPatient” and “ImageOrientationPatient”) and to exclude image artifacts. All Siemens sequences except the MyoMaps product variant, and all Philips (Philips Healthcare, Best, the Netherlands) sequences except the CardiacQuant product variant, were prototypes. Any tubes with artifacts detected by operator inspection of the submitted T₁ maps were excluded from the analysis. Software version changes were captured automatically from DICOM headers (“StationName” and “SoftwareVersion”).
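A semi-automatic isocenter and orientation check can be built from the DICOM geometry fields named above. The sketch below computes the image center in patient coordinates from ImagePositionPatient, ImageOrientationPatient, Rows, Columns, and PixelSpacing, then applies a hypothetical pass rule. The tolerances, the expected slice normal (transverse, along z), and the assumption that the patient-coordinate origin corresponds to magnet isocenter are all illustrative and are not taken from the T1MES manual.

```python
import math


def slice_center_mm(ipp, iop, rows, cols, pixel_spacing):
    """Patient-coordinate position (mm) of the image center from DICOM geometry.

    ipp: ImagePositionPatient, center of the first transmitted pixel (x, y, z)
    iop: ImageOrientationPatient, row direction cosines followed by column direction cosines
    pixel_spacing: PixelSpacing, (spacing between rows, spacing between columns) in mm
    """
    row_dir, col_dir = iop[:3], iop[3:]
    row_spacing, col_spacing = pixel_spacing
    mid_col = (cols - 1) / 2.0  # number of steps along the row direction
    mid_row = (rows - 1) / 2.0  # number of steps along the column direction
    return [ipp[k] + row_dir[k] * col_spacing * mid_col + col_dir[k] * row_spacing * mid_row
            for k in range(3)]


def slice_normal(iop):
    """Cross product of the row and column direction cosines."""
    r, c = iop[:3], iop[3:]
    return [r[1] * c[2] - r[2] * c[1],
            r[2] * c[0] - r[0] * c[2],
            r[0] * c[1] - r[1] * c[0]]


def passes_geometry_qa(ipp, iop, rows, cols, pixel_spacing,
                       max_offset_mm=10.0, expected_normal=(0.0, 0.0, 1.0), min_abs_cos=0.99):
    """Hypothetical QA rule: image center within max_offset_mm of the patient-coordinate
    origin (assumed to be isocenter) and slice normal within about 8 degrees of the
    expected axis (acos(0.99) is roughly 8.1 degrees)."""
    center = slice_center_mm(ipp, iop, rows, cols, pixel_spacing)
    offset = math.sqrt(sum(v * v for v in center))
    n = slice_normal(iop)
    cos_angle = abs(sum(a * b for a, b in zip(n, expected_normal)))
    return offset <= max_offset_mm and cos_angle >= min_abs_cos
```

For a 256 × 256 transverse image with 1.5 mm pixels, ImagePositionPatient = (−191.25, −191.25, 0) places the image center exactly at the origin, so the check passes; shifting z by 25 mm makes it fail.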

T₁ was measured from the T1MES datasets using only the submitted parametric maps (no central T₁ fitting was applied to the submitted sets of T₁ recovery images), with a bespoke MATLAB pipeline (The MathWorks Inc., Natick, Massachusetts, USA, R2012b) assembled in collaboration with the US National Institutes of Health. T₁ for each of the nine tubes was measured in identically sized regions of interest (ROI) occupying the central 50% by area of each tube (~ 40 independent pixels), and the results were collated in a dedicated research electronic data capture instrument (REDCap [9, 10]).
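Because area scales with the square of the radius, an ROI covering the central 50% of a circular tube by area has a radius of about 0.71 (1/√2) times the tube radius, which keeps it clear of partial-volume pixels at the tube wall. The following is a minimal sketch of that ROI in Python, assuming the tube center and radius (in pixels) are already known, for example from a template; it is not the MATLAB pipeline used in the study, and the function name is hypothetical.

```python
import math


def central_roi_mean(t1_map, center_x, center_y, tube_radius_px, area_fraction=0.5):
    """Mean T1 in a circular ROI concentric with a tube and covering area_fraction of its area.

    Area scales with radius squared, so the ROI radius is tube_radius * sqrt(area_fraction).
    t1_map is a list of rows (row index = y, column index = x).
    Returns (mean, number_of_pixels).
    """
    roi_radius = tube_radius_px * math.sqrt(area_fraction)
    roi_radius_sq = roi_radius ** 2
    values = [value
              for y, row in enumerate(t1_map)
              for x, value in enumerate(row)
              if (x - center_x) ** 2 + (y - center_y) ** 2 <= roi_radius_sq]
    if not values:
        raise ValueError("ROI contains no pixels")
    return sum(values) / len(values), len(values)
```

On a synthetic map where a tube of radius 10 pixels holds T₁ = 1000 ms and the background is 0, the ROI (radius ≈ 7.07 pixels) samples only tube pixels, so the edge does not bias the mean.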

Methods part 3—Statistical analysis

Analysis was performed using R (version 3.0.1, R Foundation for Statistical Computing, Vienna, Austria). Descriptive data are expressed as mean ± standard deviation (SD) or standard error of the mean (SEM), as appropriate. Data distributions were assessed on histograms and with the Shapiro-Wilk test.

Temperature sensitivity

Linear regression was used to relate temperature (predictor variable, in degrees Celsius) to the response variable, phantom T₁, by the formula T₁ = Intercept + β × (Temperature − 21 °C), where β represents the temperature correction and 21 °C is our arbitrarily chosen reference temperature for cross-center comparison.

Correlation with rT₁ times

Correlations between estimated and rT₁ times were derived using linear regression. Tests for significant inter-sequence and cross-vendor correlation differences (setting null value to 0.001) were conducted with alpha 0.01 and confidence level 0.95 [11].
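The agreement summarized in the abstract as adjusted R² comes from a one-predictor linear regression. The following is a minimal sketch that assumes estimated T₁ is regressed on rT₁ (the direction of the regression is an assumption). It uses adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1) with p = 1. The statistical test comparing correlations across sequences and vendors [11] is not reproduced here.

```python
def regression_with_adjusted_r2(reference_ms, estimated_ms):
    """Regress estimated T1 on reference T1 (one predictor) and report R^2 and adjusted R^2.

    adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), here with p = 1 predictor.
    Returns (intercept, slope, r2, adjusted_r2).
    """
    n = len(reference_ms)
    if n < 3:
        raise ValueError("need at least 3 points for an adjusted R^2 with one predictor")
    mx = sum(reference_ms) / n
    my = sum(estimated_ms) / n
    sxx = sum((x - mx) ** 2 for x in reference_ms)
    sxy = sum((x - mx) * (y - my) for x, y in zip(reference_ms, estimated_ms))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(reference_ms, estimated_ms))
    ss_tot = sum((y - my) ** 2 for y in estimated_ms)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return intercept, slope, r2, adj_r2
```

With nine tubes (n = 9), the adjustment is small, so values of 0.99 to 1.00 mainly reflect a very tight linear relationship. Note that a slope different from 1 (systematic under- or overestimation, as is typical of MOLLI-type methods) still yields a high R².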

T₁ repeatability

Published normal values for native myocardial T₁ (e.g., in [12,13,14,15,16]) are usually reported as mean ± 1SD (a 95% reference range is approximately ± 2SD), with 1SD of the mean native myocardial T₁ generally ~ 20–30 ms at 1.5 T and ~ 50 ms at 3 T. On this basis, we arbitrarily pre-defined as repeatable (and suitable for clinical/research use) those T₁ mapping approaches whose variability (SD) of serial T₁ data did not exceed half of this in vivo 1SD: ≤ 10 ms (i.e., CoV ≤ 1%) at 1.5 T and ≤ 25 ms (i.e., CoV ≤ 2%) at 3 T.

The CoV of serial repeat T1MES scans was calculated as the ratio of the SD to the mean. We appraised CoV, before and after temperature correction, as a compound measure of all causes of change in the estimated T₁ of all nine tubes. We also appraised CoV after temperature correction separately for the four native and the five post-GBCA tubes. Sequence-specific differences between the nine temperature-adjusted CoVs were tested using paired t tests with Bonferroni adjustment of P values for multiple comparisons (two-tailed P < 0.01 considered significant).
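The repeatability criterion combines two simple calculations: the CoV of serial scans for one tube (SD divided by mean), compared against the pre-defined limits of 1% at 1.5 T and 2% at 3 T, and a Bonferroni adjustment (each P value multiplied by the number of comparisons, capped at 1). The following is a minimal sketch, assuming the sample SD (n − 1 denominator). The field-strength keys and function names are hypothetical.

```python
import statistics

# Pre-defined CoV limits from the text (percent).
REPEATABILITY_LIMIT_PCT = {"1.5T": 1.0, "3T": 2.0}


def cov_percent(serial_t1_ms):
    """Coefficient of variation (%) of serial scans: sample SD / mean * 100."""
    return 100.0 * statistics.stdev(serial_t1_ms) / statistics.mean(serial_t1_ms)


def is_repeatable(serial_t1_ms, field):
    """Apply the pre-defined CoV limit for the given field strength."""
    return cov_percent(serial_t1_ms) <= REPEATABILITY_LIMIT_PCT[field]


def bonferroni(p_values):
    """Bonferroni-adjusted P values: p * m, capped at 1, for m comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For example, serial reads of 995, 1000, and 1005 ms give a CoV of 0.5%, which is within the 1.5 T limit. Adjusted P values are then compared with the 0.01 threshold stated above.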

Sources of T₁ variation

Using temperature-adjusted T₁ values of the “Medium” native myocardium tubes (tube “F” at 1.5 T and tube “M” at 3 T), we constructed linear mixed models to study the interplay of some known sources of T₁ variation in multi-center phantom data, separately for 1.5 T and 3 T. With temperature-adjusted T₁ time as the response variable of interest, we examined the influence of phantom ID, with and without the added effect of phantom age, as the combined fixed effect. We then tested the following random effects:

i) Main effects and interactions of scanner vendor/scanner model (Siemens, Philips or General Electric [GE; General Electric Healthcare, Waukesha, Wisconsin, USA]; e.g., for Siemens: MAGNETOM Aera vs. Avanto vs. Espree, etc.);
ii) Main effects and interaction of sequence/scanner software version (considering all submitted variants of native modified Look-Locker inversion recovery [MOLLI] [17] sequences, shortened MOLLI [18] [ShMOLLI], native saturation-recovery single-shot acquisition [19] [SASHA], and saturation method using adaptive recovery times for cardiac T₁ mapping [20] [SMART]; e.g., for Philips: R4.1.3SP2 vs. R5.1.7SP2 vs. R5.2.0SP2, etc.).

The response variable T₁ followed a normal distribution, so model parameters were estimated by maximum likelihood. The significance of fixed effects in the model was evaluated with the ANOVA function using type II Wald chi-square tests. To compare models, the Akaike and Bayesian information criteria (AIC, BIC; smaller is better) and chi-square values from inter-model ANOVA tests were used. The formulas used for model fitting and further definitions of the applied statistical tests are provided in Table 3.
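The model-comparison quantities named above follow directly from each model's maximized log-likelihood ln L, its number of parameters k, and the number of observations n: AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L. For nested models, the inter-model chi-square statistic is 2(ln L_full − ln L_reduced), with degrees of freedom equal to the difference in parameter counts. The following is a minimal standard-library sketch; the closed-form chi-square tail below holds for integer degrees of freedom, and the function names are hypothetical.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (smaller is better)."""
    return n_params * math.log(n_obs) - 2.0 * loglik


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed forms)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # P = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # odd df: P = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{j=1}^{(df-1)/2} x^(j-1) / (1*3*...*(2j-1))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return math.erfc(math.sqrt(half)) + total


def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Chi-square test for nested models fitted by maximum likelihood; df = difference in parameters."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)
```

For example, adding one parameter that raises ln L from −100 to −96 gives a statistic of 8 and P ≈ 0.005. When the added term is a random-effect variance, the null value lies on the boundary of the parameter space. In that case the naive chi-square P value tends to be conservative.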

Software upgrades

To explore whether software upgrades resulted in an abrupt “step” change in temperature-adjusted T₁, we performed piece-wise linear regression to check for a segmented relationship between the covariates “scan day” and “tube T₁” (tube “F” at 1.5 T and tube “M” at 3 T) [21]. For any broken-line relationship found, we estimated the slope parameters and the break points at which the linear relationship changed, and compared their timing with the software-version changes recorded in the DICOM metadata.
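The following is a simplified illustration of break-point detection. It is a grid search over a single break that fits an independent straight line on each side, so a discontinuity ("step") at the break is allowed. This is a swapped-in technique, not the segmented-regression method cited in [21], which estimates continuous broken lines iteratively. The function names and the minimum segment size are hypothetical, and scan days are assumed to be distinct.

```python
def _ols(x, y):
    """Least-squares line y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, rss


def best_single_break(scan_day, t1_ms, min_points=3):
    """Grid search for one break point, fitting an independent line on each side.

    Returns None if there are fewer than 2 * min_points scans.
    """
    pts = sorted(zip(scan_day, t1_ms))
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    best = None
    for k in range(min_points, len(xs) - min_points + 1):
        a1, b1, rss1 = _ols(xs[:k], ys[:k])
        a2, b2, rss2 = _ols(xs[k:], ys[k:])
        if best is None or rss1 + rss2 < best["rss"]:
            x_after = xs[k]
            best = {
                "last_day_before_break": xs[k - 1],
                "first_day_after_break": x_after,
                "slope_before": b1,
                "slope_after": b2,
                # jump between the two fitted lines at the first post-break scan
                "step_ms": (a2 + b2 * x_after) - (a1 + b1 * x_after),
                "rss": rss1 + rss2,
            }
    return best


def single_line_rss(scan_day, t1_ms):
    """RSS of one straight line through all scans, for comparison with the broken fit."""
    return _ols(list(scan_day), list(t1_ms))[2]
```

The dates bracketing the detected break can then be compared with the dates of software-version changes recorded in the DICOM metadata. A lower RSS for the broken fit is expected even without a real step, because the broken fit has more parameters. A formal comparison (for example an F test or an information criterion) would still be needed before a break is accepted.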