- Open Access
T1 mapping performance and measurement repeatability: results from the multi-national T1 mapping standardization phantom program (T1MES)
Journal of Cardiovascular Magnetic Resonance volume 22, Article number: 31 (2020)
The T1 Mapping and Extracellular volume (ECV) Standardization (T1MES) program explored T1 mapping quality assurance using a purpose-developed phantom with Food and Drug Administration (FDA) and Conformité Européenne (CE) regulatory clearance. We report T1 measurement repeatability across centers describing sequence, magnet, and vendor performance.
Phantoms batch-manufactured in August 2015 underwent 2 years of structural imaging, B0 and B1, and “reference” slow T1 testing. Temperature dependency was evaluated by the United States National Institute of Standards and Technology and by the German Physikalisch-Technische Bundesanstalt. Center-specific T1 mapping repeatability (maximum one scan per week to minimum one per quarter year) was assessed over mean 358 (maximum 1161) days on 34 1.5 T and 22 3 T magnets using multiple T1 mapping sequences. Image and temperature data were analyzed semi-automatically. Repeatability of serial T1 was evaluated in terms of coefficient of variation (CoV), and linear mixed models were constructed to study the interplay of some of the known sources of T1 variation.
Over 2 years, phantom gel integrity remained intact (no rips/tears), B0 and B1 homogenous, and “reference” T1 stable compared to baseline (% change at 1.5 T, 1.95 ± 1.39%; 3 T, 2.22 ± 1.44%). Per degrees Celsius, 1.5 T, T1 (MOLLI 5s(3s)3s) increased by 11.4 ms in long native blood tubes and decreased by 1.2 ms in short post-contrast myocardium tubes. Agreement of estimated T1 times with “reference” T1 was similar across Siemens and Philips CMR systems at both field strengths (adjusted R2 ranges for both field strengths, 0.99–1.00). Over 1 year, many 1.5 T and 3 T sequences/magnets were repeatable with mean CoVs < 1 and 2% respectively. Repeatability was narrower for 1.5 T over 3 T. Within T1MES repeatability for native T1 was narrow for several sequences, for example, at 1.5 T, Siemens MOLLI 5s(3s)3s prototype number 448B (mean CoV = 0.27%) and Philips modified Look-Locker inversion recovery (MOLLI) 3s(3s)5s (CoV 0.54%), and at 3 T, Philips MOLLI 3b(3s)5b (CoV 0.33%) and Siemens shortened MOLLI (ShMOLLI) prototype 780C (CoV 0.69%). After adjusting for temperature and field strength, it was found that the T1 mapping sequence and scanner software version (both P < 0.001 at 1.5 T and 3 T), and to a lesser extent the scanner model (P = 0.011, 1.5 T only), had the greatest influence on T1 across multiple centers.
The T1MES CE/FDA approved phantom is a robust quality assurance device. In a multi-center setting, T1 mapping had performance differences between field strengths, sequences, scanner software versions, and manufacturers. However, several specific combinations of field strength, sequence, and scanner are highly repeatable, and thus, have potential to provide standardized assessment of T1 times for clinical use, although temperature correction is required for native T1 tubes at least.
T1 mapping aids clinicians in the assessment and diagnosis of myocardial disease. However, measurement needs to be stable over time with transferable values. Knowledge of normal reference ranges would benefit from not requiring local healthy subject scanning, and the pooling of multi-scanner datasets would have advantages such as increasing available sample sizes for the detection of subtle effects or subgroup analysis and increasing result robustness and generalizability, lowering the chance of unforeseen bias when compared to single-center data . Combining results however introduces sequence, magnet, and field strength bias . The field of T1 mapping would therefore benefit from a “T1 standard” to enable cross-center T1 mapping data pooling and delivery —like the international normalized ratio (INR) which makes it possible to adjust the dosing of vitamin K antagonists regardless of which laboratory has performed the test .
The T1 Mapping and Extracellular volume (ECV) Standardization (T1MES) phantom program was established to explore T1 mapping quality assurance at 1.5 T and 3 T and understand the feasibility of delivering a “T1 standard” . The first step toward the goal was development and mass-production of a phantom  and its European Union Conformité Européenne (CE) and United States Food and Drug Administration (FDA) regulatory clearance. In September 2015, cardiovascular magnetic resonance (CMR) centers worldwide joined the T1MES consortium and committed to submit a minimum of 12 months of center-specific T1 mapping data. Data submitted have now been analyzed to explore phantom performance.
We report phantom data at 1 and 2 years using various T1 mapping sequences, temperature sensitivity, and include platform performance, although we emphasize that comparison of different T1 methods and systems was not the main aim, rather investigating long-term stability towards the “T1 standard”. This involved modeling some of the potential sources of the T1 variation longitudinally and between T1MES centers to identify the most influential factors.
The development and description of the T1MES phantom (Fig. 1) has been previously reported . Briefly, the T1MES phantom was designed to be field-strength specific (i.e., separate 1.5 T and 3 T models). Each phantom contains four tubes representing human native blood/myocardial T1 and T2 values (i.e., pre-gadolinium-based contrast agent [GBCA] values) and five tubes representing human post-GBCA blood/myocardial values. While the main aim of the present study was the collection and analysis of the multi-center data (see the “Methods part 2—Multi-center phantom testing” section), some other tests were applied to a small number of the phantoms during the 2 years to explore the utility of T1MES as a quality assurance device, and these tests are described here first (“Methods part 1—Evaluation of the phantom”). Imaging biomarker terms used follow the recommendations of the Quantitative Imaging Biomarkers Alliance (QIBA) of the Radiological Society of North America (RSNA) .
Methods part 1—Evaluation of the phantom
Gel integrity and aging were checked at each submission time point for participating sites through the manual inspection of localizers that formed part of the minimum dataset requirement for participation. In addition, a high-resolution, isotropic, three-dimensional (3D) gradient echo sequence (0.42 mm3) was run on four phantoms (three 1.5 T phantoms; one 3 T phantom) at baseline (October 2015) and at 2 years post manufacturing in each case using a 3 T MAGNETOM Skyra (Siemens Healthineers, Erlangen, Germany; software syngo MR D13C). The sequence acquired two overlapping slabs (due to scanner software constraints), each with two directions of phase encoding, a slow repetition time (repetition time, TR = 17 ms), and narrow sampling bandwidth (250 Hz/pixel) for better signal-to-noise ratio (SNR). This sequence had weak T1 and T2 image contrast and was only for structural examination.
“Reference” rT1 and rT2 data
Baseline (October 2015) “reference” T1 and T2 values (rT1, rT2) were acquired at the Royal Brompton Hospital CMR Unit using basic single-slice TR = 10 s inversion recovery spin echo (IRSE, 8 inversion times [TI] from 25 to 3200 ms) and single-slice repetition time (TR) = 10 s SE (8 echo times [TE] from 10 to 640 ms)  respectively. These sequences were identically repeated at 2 years on the same three 1.5 T phantoms and on the same three 3 T phantoms sampled from the production batch. The identifying serial numbers of the three 1.5 T phantoms were 15E031, 15E033, and 15E034, and these phantoms were scanned on a 1.5 T MAGNETOM Avanto [Siemens Healthineers; software syngo MR B17A]. The three 3 T phantoms were 30E001, 30E017, and 30E018, and they were scanned on a 3 T MAGNETOM Skyra [Siemens Healthineers; software syngo MR D13C].
Separate rT1 and rT2 data were acquired on the same 3 T phantom (30E021) at the German National Metrology Institute, Physikalisch-Technische Bundesanstalt (PTB) over a period of 1041 days (64 scans) commencing September 2015 (3 T MAGNETOM Verio (Siemens Healthineers; software syngo MR B17A). Sequences used for rT1 and rT2 were respectively basic single-slice TR = 8000 ms IRSE (IRSE, 7 TI from 25 to 4800 ms) and single-slice TR = 3000 ms SE (5 TE from 24 to 400 ms).
The following three methods were used:
First, controlled-temperature experiments over the range 10–30 °C were conducted at the United States National Institute of Standards and Technology (NIST) on six loose T1MES tubes at 1 year (Fig. 2i). T1 and T2 were measured at 10, 17, 20, 23, and 30 °C on an VnmrJ4 small-bore scanner operating at 1.5 T (Varian Medical Systems, Palo Alto, California, USA) in a temperature-controlled environment using a fiber optic temperature probe. T1 was measured by IRSE (TR = 10 s, TI = 50–3000 ms) and T2 by SE (TR = 10 s, TE = 15–960 ms).
Second, controlled-temperature experiments at 19, 21, and 25 °C were conducted at the PTB laboratory on T1MES phantom 30E012 at 1 year (also Fig. 2i). T1 and T2 were measured on a 3 T MAGNETOM Verio scanner (Siemens Healthineers; software syngo MR B17A) using a Pt100 resistance thermometer. T1 was measured by IRSE (TR = 8000 ms, TI = 25–4800 ms) and T2 by SE (TR = 3000 ms, TE = 24–400 ms).
Third, for each T1MES phantom scan at all centers, temperature was measured using liquid crystal thermometers adhered to every phantom. These measurements were pooled and analyzed to derive temperature-correction algorithms (see Statistical Analysis).
B0 and B1 uniformity
These uniformities and the fundamental distortion of B1 by water dielectric permittivity especially at 3 T had been tested at baseline (October 2015, previously reported ). These uniformities were mapped later to check against “cracking” of the gel and subsequent impact of air gaps on B0 in particular, while potential “clumping” of the plastic beads over time might in theory affect the B1 . We therefore considered it prudent to check whether anything unexpected occurred over the long term.
B0 uniformity was therefore mapped at 2 years in six phantoms, in the transverse slice, midway along the length of the tubes, using a multi-echo gradient echo sequence, based on the phase difference between known TEs . A frequency range of ± 50 Hz across the phantom was considered acceptable, based on published T1 mapping off-resonance sensitivity . B1 homogeneity was similarly evaluated using flip angle (FA) maps (double angle method using FA 60° and 120° [θ1, 2 × θ1] with long TR [8 s], and 4 ms sinc [− 3π to + 3π] slice excitation profiles to minimize error due to FA variation through the slice).
Methods part 2—Multi-center phantom testing
Serial, multi-center T1 mapping data
The T1MES user manual (https://doi.org/10.6084/m9.figshare.c.3610175_D1.v1) defined strict scanning instructions (scanning and shim volume strictly at isocenter, use of same supporting materials, etc.). Each contributed T1MES dataset (localizers, sets of inversion recovery images, and inline scanner-generated T1 maps, Fig. 3) underwent initial quality assurance, checking orientation, and isocenter (through visual inspection of localizers and maps and semi-automatically by inspecting metadata contained in Digital Imaging and COmmunications in Medicine [DICOM] headers “ImagePositionPatient” and “ImageOrientationPatient”) and to exclude image artifacts. All Siemens sequences except MyoMaps product variant and all Philips (Philips Healthcare, Best, the Netherlands) sequences except CardiacQuant product variant were prototypes. Any tubes with artifacts detected by operator inspection of the submitted T1 maps were excluded from the analysis. Software version changes were captured automatically from DICOM headers (“StationName” and “SoftwareVersion”).
The T1 measurements from T1MES datasets (directly using only the parametric maps submitted, not by any T1 fitting applied centrally to the submitted sets of T1 recovery images) were carried out using a bespoke MATLAB pipeline (The MathWorks Inc., Natick, Massachusetts, USA, R2012b) assembled in collaboration with the US National Institutes of Health. From the data, T1 for each of the nine tubes was measured in identically sized regions of interest (ROI) occupying the central 50% by area of each tube (accommodating ~ 40 independent pixels) and collated in a dedicated research electronic data capture instrument (REDCap [9, 10]).
Methods part 3—Statistical analysis
Analysis was performed using R (version 3.0.1, R Foundation for Statistical Computing, Vienna, Austria). Descriptive data are expressed as mean ± standard deviation (SD) and standard error of the mean (SEM) as appropriate. Distribution of data were assessed on histograms and using the Shapiro-Wilk test.
Linear regression equations were used to relate temperature (predictor variable in degrees Celsius) and the response variable, phantom T1, by the formula: T1 = Intercept + (β ∗ [Temperature − 21 ° C]), with β representing the temperature correction, and 21 °C our arbitrarily chosen temperature for cross-center comparison.
Correlation with rT1 times
Correlations between estimated and rT1 times were derived using linear regression. Tests for significant inter-sequence and cross-vendor correlation differences (setting null value to 0.001) were conducted with alpha 0.01 and confidence level 0.95 .
After considering the normal values for native myocardial T1 reported in the published literature (e.g., in [12,13,14,15,16] as mean ± 1SD, though a 95% reference range is approximately ± 2SD), where 1 SD of the mean native myocardial T1 is generally ~ 20–30 ms at 1.5 T and ~ 50 ms at 3 T, we arbitrarily pre-defined as repeatable (and suitable for clinical/research use), T1 mapping approaches where the estimated variance of serial T1 data did not exceed ½ of the above in vivo 1SD. For T1 mapping at 1.5 T, this was ≤ 10 ms, i.e., CoV ≤ 1%; for T1 mapping at 3 T ≤ 25 ms, i.e., CoV ≤ 2%.
The CoV between serial repeat T1MES scans was calculated as the ratio of the SD to the mean. We appraised CoV as a compound measure of all causes of change in the estimated T1 of all nine tubes before and after temperature correction. We also appraised CoV after temperature correction separately for the four native and five post-GBCA tubes. Sequence-specific differences between the nine temperature-adjusted CoVs were calculated using paired t test with P value adjustment for multiple comparisons by the Bonferroni method (taking two-tailed P < 0.01 as significant).
Sources of T1 variation
Using temperature-adjusted T1 values of the “Medium” native myocardium tubes (tubes “F” and “M” respectively), we constructed linear mixed models to study the interplay of some known sources of T1 variation in multi-center phantom data. We did this separately for 1.5 T and 3 T phantom data. Considering temperature-adjusted T1 time as the response variable of interest, we examined the influence of phantom ID with and without the added effect of phantom age, as the combined fixed effect. With this, we then tested the following random effects:
Main effects and interactions of scanner vendor/scanner model (Siemens, Philips or General Electric [GE; General Electric Healthcare, Waukesha, Wisconsin, USA]; e.g., for Siemens: MAGNETOM Aera vs. Avanto vs. Espree, etc.);
Main effects and interaction of sequence/scanner software version (considering all submitted variants of native modified Look-Locker inversion recovery [MOLLI]  sequences, shortened MOLLI  [ShMOLLI], native saturation-recovery single-shot acquisition  [SASHA], and saturation method using adaptive recovery times for cardiac T1 mapping  [SMART]; e.g., for Philips: R4.1.3SP2 vs. R5.1.7SP2 vs. R5.2.0SP2, etc.).
The response variable T1 fitted a normal probability distribution, so we estimated model parameters using maximum likelihood. ANOVA function using a type II Wald chi-square test evaluated the significance of fixed effects in the model. To compare models, Akaike and Bayesian information criteria (AIC, BIC) with the “smaller-is-better” criterion as well as chi-square values from inter-model ANOVA tests were used. The formulas used for model fitting and more definitions of the applied statistical tests are provided in Table 3.
To explore whether software upgrades resulted in an abrupt “step” change in the temperature-adjusted T1 reads, we performed piece-wise linear regression to check for any segmented relationship between the covariates “scan day” and “tube T1” (considering tube “F” at 1.5 T and “M” at 3 T) . For any broken-line relationship discovered, we defined slope parameters and break points where the linear relation/s changed and temporally correlated these with DICOM software metadata.
Results part 1—Evaluation of the phantom
Individual inspection of all the localizers submitted by sites with each phantom dataset revealed no visible gel rips or tears down any of the tubes. At baseline and at 2 years following batch manufacturing (Supplementary Movie 1), phantoms were free of air bubbles and susceptibility artifacts at both field strengths. High-resolution imaging showed no evidence of gel rips or tears down any of the tubes, and the gels were intact in the mid slice—the piloted location for serial mapping (Fig. 1, left panel). T1 maps collected through the midline of the phantom, using the specified T1MES scan setup, were free from off-resonance artifacts.
“Reference” rT1 and rT2 data
Phantom measurements (averaged across all tubes) collected at the Royal Brompton Hospital showed that temperature-corrected rT1 and rT2 values at 2 years were stable compared to baseline. At 1.5 T, the 2-year temperature-adjusted %T1 change was 1.95 ± 1.39% SD, 0.37% SEM. At 3 T, the 2-year temperature-adjusted %T1 change was 2.22 ± 1.44% SD, 0.25% SEM. Three tesla measurements at PTB showed a 2-year temperature-adjusted %T1 change of 0.80 ± 0.49%, SEM 0.16% (Supplementary Figure S1A). Although not the aim of this work, all reference T2 parameters remained stable with relative < 1% change at PTB (temperature-adjusted %T2 change over 2 years at 3 T = 1.65 ± 0.26% | 0.09%, Supplementary Figure S1B) and < 2% change over 2 years at Royal Brompton Hospital.
Sequences from Siemens and Philips at 1.5 T collected at Royal Brompton Hospital showed strong correlation with rT1 times with small offsets, as shown in Fig. 4 (left panel). There were no statistically significant differences between the rT1 correlations for a given sequence when comparing Siemens to Philips platforms (Supplementary Table 1, panel C) or when individual sequences were compared on Philips (Supplementary Table S1, panel B). However, on Siemens, rT1 correlations for SASHA were significantly stronger than for both MOLLI and ShMOLLI (Supplementary Table S1, panel A).
Sequences from Siemens and Philips at 3 T showed strong correlation with rT1 times but with visibly imperfect absolute measurement of T1 across tubes of longer T1 times as shown in Fig. 4 (right panel). The correlations between absolute measurements of T1 and rT1 were similar for MOLLI and SASHA on Philips (Supplementary Table S2, panel B). On Siemens, the correlation between absolute measurements of T1 and rT1 was significantly stronger for SASHA when compared to MOLLI and ShMOLLI (Supplementary Table S2, panel A). The correlation between absolute measurements of T1 and rT1 for SASHA was significantly stronger on Siemens compared to Philips (Supplementary Table S2, panel C). Comparison of errors at 3 T relative to 1.5 T showed no significant difference (average SEM 5.49 vs. 4.04 respectively, P = 0.428).
Temperature experiments at both NIST and PTB using spin echo, long TR, and sequences (Fig. 2i) consistently showed that short-T1 tubes (modeling post-GBCA myocardium and blood) were more stable with temperature than very long-T1 tubes (native blood), where T1 increased more significantly with temperature. T1 increased by ~ 11 ms/°C for a typical MOLLI 5s(3s)3s 1.5 T dataset in tube “B” (normal native blood). Conversely, T1 decreased by ~ 1.2 ms/°C in tube “G” (short post-GBCA myocardium). Temperature corrections for each of the nine tubes at both field strengths from the pooled multi-center analysis are presented separately (Supplementary Table S3) from temperature sensitivity experiments of NIST and PTB due to the latter’s use of bespoke standard operating procedures.
B0 and B1 uniformity
B0 uniformity at 2 years was delivered to within ± 9 Hz at 1.5 T and ± 10 Hz at 3 T (Fig. 5a–b). At 2 years after manufacture, the B1 field distortion caused by the phantom, continued to be adequately flattened to within 10% of the B1 at the center of the phantom, at both 1.5 T and 3 T (Fig. 5c–d).
Results part 2—Multi-center phantom testing
Thirty-four of the magnet systems appraised were 1.5 T and 22 were 3 T (see Supplementary Data File and Supplementary Table S4) with manufacturers being 35 Siemens, 18 Philips, and 3 GE. Scan frequencies are reported in the Supplementary Data File. The planned program was 1 year, but 24 centers voluntarily extended, resulting in mean longitudinal data of 358 days, and the longest 1161 days.
Quality assurance of the contributed DICOM image files identified significant protocol deviations or sequence reconstruction failures in nine submissions, the majority of which took place at the start of the study (examples in Supplementary Fig. S2). These were excluded with subsequent advice sent to centers permitting later inclusion in most cases, except where a tube artifact affecting > 25% of a single center’s series led to exclusion of that tube from pooled analysis (see Supplementary Data File).
Fifty-two magnets non-exclusively contributed MOLLI datasets, 16 ShMOLLI, and 12 SASHA, and 2 magnets contributed SMART T1 maps. Center-, session-, and sequence-specific T1 mapping contributions are detailed in Supplementary Data File. During the project, there were three software upgrades at centers: two Siemens and one GE (details in Supplementary Fig. S4).
For serial multi-center data, temperature-unadjusted CoV per tube, per phantom, per sequence, and per magnet are detailed in the Supplementary Data File. Temperature-adjusted CoV for the various native sequences together with inter-sequence and cross-platform differences, at 1.5 T and 3 T, are summarized in Tables 1 and 2 respectively and data for post-GBCA in Supplementary Tables S5 and S6. Results of native and post-GBCA sequence/scanner software version repeatability are provided in Tables 1 and 2 and in Supplementary Tables S5 and S6 respectively.
Based on these temperature-adjusted data, mean (x̅) CoV were generally higher (implying poorer repeatability) at 3 T than at 1.5 T and for the much sparser GE data over Siemens/Philips. Over 1 year, many 1.5 T and 3 T sequences/magnets were repeatable with x̅ CoV < 1% and < 2% respectively. For sequences optimized for native T1 mapping applied to T1MES native tubes, we observed excellent repeatability for several sequences, for example, Siemens MOLLI 5s(3s)3s 448B (x̅ CoV = 0.27%) and Philips MOLLI 3s(3s)5s (x̅ CoV 0.54%) at 1.5 T (Table 1) and Philips MOLLI 3b(3s)5b (x̅ CoV 0.33%) and Siemens ShMOLLI (x̅ CoV 0.69%) at 3 T (Table 2).
Among sequences optimized for post-GBCA T1 mapping applied to T1MES post-GBCA tubes, excellent repeatability was observed for several sequences, for example, Siemens ShMOLLI 1041B (x̅ CoV 0.21%) and Siemens MOLLI 4s(1s)3s(1s)2s 448B (x̅ CoV 0.26%) at 1.5 T (Supplementary Table S5) and Siemens MOLLI 4b(1s)3b(1s)2b 448B (x̅ CoV 0.10%) and Siemens ShMOLLI (x̅ CoV 0.28%) at 3 T (Supplementary Table S5).
Sources of T1 variation
Linear mixed models which excluded centers that had experienced a software change during the time of data collection (Table 3 and Supplementary Tables S7 and S8) indicated that temperature-adjusted T1 differs significantly between sequences and software versions at both field strengths (P < 0.001 all) and between magnet models at 1.5 T (P = 0.011). Notably, phantom age had no significant effect on T1 (model A2).
The two software upgrades on Siemens both occurred towards the end of longitudinal data submissions (Supplementary Fig. S4A, 15E031; 4B, 30E017), so the paucity of data points post-upgrade events precluded statistical testing for significant T1 shifts. For the software upgrade on GE, however, differences between pre- and post-linear regression slopes (Supplementary Fig. S4C, 30E012) indicate a marginally significant T1 shift event (P = 0.024).
To our knowledge, this is the first multi-center study using a wide range of T1 mapping methods to study the interplay of some of the known sources of measured T1 variation. The basis of this work is the T1MES phantom, and a major aim of this work was to evaluate how this performed in use at multiple centers. The phantom appears sufficiently robust for T1 mapping quality assurance purposes. Based on our data, estimated phantom T1 values may show substantial variation between different T1 mapping sequences, scanner software versions, and potentially also scanner models, and temperature correction, at least for native T1 tubes, is necessary to achieve the desired repeatability. Inspection of the Supplementary Table also suggests that there are occasional performance variations producing outliers even within a given T1 mapping sequence prototype/software combination across different magnets. In spite of this variation, however, several specific combinations of field strength, sequence, and scanner generally exhibit excellent repeatability. Given the choices, the CMR community may prefer to standardize and use combinations with high repeatability for future clinical and research use, e.g., CoV < 1% at 1.5 T and < 2% at 3 T, while seeking to optimize less repeatable combinations. Less data were available for GE in T1MES, so the observed variability requires verification in a larger more representative sample.
Agreement with “reference” slow scanning, rT1 times was slightly greater for SASHA compared to MOLLI or ShMOLLI, which both slightly underestimated T1, mainly from the known T2-related underestimation which is larger when measuring the myocardium [13,14,15], (Fig. 4, Supplementary Tables S1 and S2) and not due to magnetization transfer, which is negligible in these agar phantoms .
The phantom data presented here suggest that the “T1 standard” framework remains possible, but the wide variety of different sequence options, vendors, and field strengths distributed the data over too many categories for reliable modeling, unlike the “locked-down” approach mentioned below which does have that strong advantage. Further work will explore the transferability of clinical measurement based on in vitro phantom calibration.
T1MES is but one of a number of phantom objects that have been used or proposed to support T1 mapping quality assurance (elaborated in Table 4). Some alternatives that target native and post-GBCA myocardial and blood relaxation times in support of T1 mapping work are (1) Brompton phantom by Vassiliou et al. , (2) Hypertrophic Cardiomyopathy Registry  phantom (HCMR) by Piechnik et al., and (3) International Society for Magnetic Resonance in Medicine (ISMRM)/NIST MR imaging phantom .
At 1 year, CoV for T1 tubes of the Vassiliou  phantom on a single Siemens scanner ranged from 1.0 to 3.6% using only the native MOLLI 5s(3s)3s sequence, compared to 0.27 to 3.0% in T1MES (for 1.5 T MOLLI 5s(3s)3s [448B] on Siemens to SMART on GE respectively). This within sequence precision heterogeneity, as already alluded to by Kellman et al. , linked to protocol modifications within MOLLI, may partly explain some of the higher CoV for specific MOLLI sequences compared to ShMOLLI where centers obligatorily scanned using a fixed 5b(1b)1b(1b)1b sequence with conditional fitting. As mentioned above for the dispersion of T1MES results, this “real-world” heterogeneity of T1 mapping sequences highlights that while T1 mapping research and innovation calls for multiple flexible prototypes, the resultant diversity poses a nontrivial challenge to multi-center standardization. Conversely, the more “locked-down” approach like that adopted for ShMOLLI, albeit less editable by external researchers, potentially facilitates standardization.
The design of phantoms for this purpose (although often seen as trivial) is challenging: to have a large sufficient ROI in each sample tube and a good number of sample tubes in the main jar of the phantom, the overall size of the phantom increases to the point that B1 non-uniformity at 3 T due to the dielectric permittivity of any water-based phantom becomes problematic. While of course B1 is non-uniform in vivo, one purpose of phantom tests is to eliminate uncertainty or at least enable controlled testing of factors such as B1 rather than introduce their own errors. Conversely, if the phantom is made very small, then the truly acquired pixel size of in vivo T1 mapping methods (which obviously must not be modified or adapted  for the phantom scans) becomes important compared to the sample tube inner diameters leading to questions over SNR  and the impact of Gibbs artifact, for example.
Another obstacle for relaxation time phantoms (on top of basic water T1 and T2 temperature and pressure variations, gel instabilities [30, 31], small leakages, dehydration, or ion migration effects over time) is the differing temperature dependence of each species of paramagnetic ion’s relaxivity (r1, r2) and furthermore frequency dispersion in all of these physical properties between 60 MHz and 120 MHz [32,33,34,35,36]. Efforts to use alternative chemistry have so far not provided the range of T1 and T2 needed for this application.
The results confirmed, as is widely known in the field, that the T1 measured by even nominally identical mapping sequences can differ significantly between software revisions. Scanner software upgrades are not uncommon in centers, and we encountered three such events among contributing partners. There was a measurable shift on at least one system, with potentially important implications for the field as alluded to in the 2017 consensus statement on multi-parametric mapping by the Society for Cardiovascular Magnetic Resonance (SCMR) . Future work using phantoms at scale, across vendors and over time, would ideally continue monitoring center-specific data before and after potential shift events to more fully understand their impact on T1 allowing the community to set up mitigation approaches.
Clearly, sources of T1 variance and drift over time and between centers are numerous. Some T1 variance is related to differences between sequences and magnet platforms: to the scanner itself (equipment drift or equipment shifts from updates of software, replacement of parts, routine service recalibrations, etc.), to the environment (temperature, pressure, humidity), and to the operator (phantom positioning inside scanner bore). In a phantom study, some T1 variance may additionally have been due to intra-phantom differences (liquid/agar status over time collectively contributing to phantom aging) and inter-phantom manufacturing variations. In T1MES, we mitigated potential inter-phantom manufacturing variations by exercising extreme caution in design and prototyping and by the strict batch manufacturing processes to medical device grade standards. Drifts in the system setup (long-term stability) would tend to be “adjusted” out by the shim and center frequency (CF) and transmitter B1 reference (TxREF) adjustments that could otherwise bias estimated T1. Any receiver coil or gain changes would most likely cancel out of T1 maps as constant across the series of input images to T1 derivation (appearing in A and B maps instead). It is considered unlikely that gradient performance would drift significantly, and any significant sudden changes in that are also unlikely even when the eddy-current compensation is adjusted on routine service visits. However, we have shown how unforeseen software changes could potentially cause “step-wise” changes in the estimated T1 records. To directly model the interplay between all these potential sources of T1 variation in longitudinal phantom studies would be a daunting task, and we considered only a subset of these sources in our analyses.
Though B0 and B1 homogeneity tests in T1MES (Fig. 5) were stable, we did measure slightly greater field inhomogeneity at the phantom edge and corners compared to the core (phantom corners and edges are the zones which our T1MES data flagged up as areas of least B0/B1 homogeneity). The tubes with the highest T1 times are the tubes most susceptible to such field inhomogeneities, and this suggests that a future iteration of T1MES could be further optimized by rearrangement to position such tubes in the most uniform regions.
The results of this study are relevant to clinicians or researchers choosing sequences for T1 mapping, and we highlight 3 take home messages as follows:
Sequence/software combinations in Tables 1 and 2, with a CoV ≤ 1% and 2% respectively, are sufficiently repeatable for clinical/research use (see rationale in the “Methods part 3—Statistical analysis” and “T1 repeatability” sections). For the native myocardium T1MES tube at 1.5 T (reference T1 ~ 1037 ms by MOLLI), a ≤ 1% CoV is well within ± 1SD of normal values for T1, meaning that small-magnitude biological changes (e.g., diffuse myocardial fibrosis), will be resolvable with high precision . By contrast, CoVs of 3 or 6% at 1.5 T (variance of 31 and 62 ms respectively) will undermine a center’s ability to resolve small-magnitude changes, and they may also impact the resolution of large-magnitude biological changes  (like amyloid, iron overload, Fabry disease, acute myocardial injury). Our data indicate that many of the investigated sequences for Siemens and Philips at 1.5 T exhibited CoVs ≤ 1%. At 3 T, MOLLI (Siemens and Philips), ShMOLLI (Siemens), and SASHA (Siemens) exhibited CoVs ≤ 2%. The absence of GE from this list is due to limited participation.
For developers, the highlighting of sequence performance may encourage work to refine sequences in some cases.
The prospect of using phantom calibration to remove the need for local reference ranges  is potentially becoming more tangible than before, yet more work is needed to deliver such calibration.
These data may facilitate the use of T1 mapping as a useful clinical biomarker. T1 standardization will also be important to facilitate clinical research. At the present time, there are 25 registered clinical research studies using T1 mapping (clinicaltrials.gov accessed January 2019).
Presented results are not intended to compare the SNR obtained in the phantom setup in different machines but are aimed at studying long-term stability. In spite of our outreach to advertise enrollment at study start, GE centers are under-represented. The scarcity of multi-center data for many of the specific T1 mapping sequence, software, and field-strength combinations has limited our ability to deliver a “T1 standard”. Eight centers enrolled, received a phantom, but then could not deliver the serial phantom scans, and four centers did not meet the 1 year minimum data submission requirement. Each center provided legitimate justifications for the missing data, and furthermore, we highlight that contributions to T1MES were entirely voluntary, with no center receiving remuneration for any staff or scanner time invested in the program.
In total, 3 magnets deviated from the prescribed nominal intervals in some of their initially submitted datasets. Phantom repeatability studies represent an in vitro experiment. Some sequences and scanners that performed well in vitro may exhibit greater variability in the face of biological tissue (magnetization transfer, flow, motion), patient characteristics (fast/slow heart rates, heart rate variability, breath-hold length, implanted metal devices), and scanning process (non-isocentering, scanning in multiple planes). The T1MES phantom does not capture magnetization transfer. The biomarker performance that was measured here accounts for T1 mapping in vitro. While this is obviously related to diagnostic performance in vivo, cardiac motion, etc., introduce yet another degree of freedom, and different hardware/software combinations might deal with this differently. In addition, CoV as a single metric may not be the best way to pick a pulse sequence for repeatability when the data is only derived from a static phantom. A very precise (reproducible) T1 sequence that requires long breatholds or is heart rate-variability-sensitive might perform very well with the T1MES phantom setup but perform poorly in patients with varying biology, e.g., that cannot hold their breath well or who are in atrial fibrillation. The phantom has no intracellular/extracellular component, so any researcher seeking to model “ECV” using pre- and post-GBCA T1 mapping values derived from T1MES (see example data in Supplementary Table S9) must bear this in mind.
The phantom design, ROI size, and coil setup instructions all aimed to make the fundamental image noise SNR an irrelevant factor in the CoV results. The artificially high and “clean” SNR of a phantom setup (if without phantom-related distortions, as we have shown in the “B0 and B1 uniformity” section) is clearly not directly an indicator of satisfactory in vivo performance. It contains many sequence-related factors that may have different consequences in vivo , and such SNR comparisons are not the aim of this work. Ideally, participating sites would have proved this by sending multiple T1 maps per session. However, contributions were already voluntary, and further demands on sites could not be supported. For the same reason, it was not feasible to demand long reference T1/T2 data for each participating site, and this was never part of the original enrollment commitment. As some indication of the typical short-term thermal noise random jitter, the raw (temperature-unadjusted) T1 values from some sites that did anyway submit several repeated T1 maps each session are plotted in Supplementary Fig. S3.
The T1MES phantom developed to CE/FDA manufacturing standards for T1 mapping is a robust quality assurance device. T1 mapping can now be quality assured on a multi-center scale, fulfilling a key requirement for the use of T1 mapping in clinical decision-making or as a surrogate endpoint in drug trials. A good number of T1 measurement sequence/scanner model/vendor/field strength combinations are remarkably repeatable, but some combinations less so, with coefficients of variation exceeding 1–2% over 1 to 2 years. Given the alternatives, we recommend that combinations with poor repeatability are deprioritized for clinical and research use. In spite of the use of a large number of different sequence prototypes and products in this study, a number of agreements and encouraging results are reported. Phantom calibration of T1 mapping which obviates the need for local reference ranges, enabling the establishment of a “T1 standard” to facilitate multi-center T1 studies, comes closer, with further work required to address this.
Availability of data and materials
Datasets collected for the current study are available from the corresponding author upon reasonable request.
Akaike information criteria
Bayesian information criteria
Cardiovascular magnetic resonance
Coefficient of variation
Digital Imaging and COmmunications in Medicine
Food and Drug Administration
Gadolinium-based contrast agent
General Electric Healthcare
Hypertrophic Cardiomyopathy Registry
International normalized ratio
Inversion recovery spin echo
International Society for Magnetic Resonance in Medicine
Modified Look-Locker inversion recovery
National Institute of Standards and Technology
Quantitative Imaging Biomarkers Alliance
Research electronic data capture
Region of interest
Radiological Society of North America
- rT 1 :
- rT 2 :
Saturation recovery single-shot acquisition
Society for Cardiovascular Magnetic Resonance
Saturation method using adaptive recovery times for cardiac T1 mapping
T1 mapping and extracellular volume standardization
Transmitter B1 reference
Stonnington CM, Tan G, Klöppel S, Chu C, Draganski B, Jack CR, Chen K, Ashburner J, Frackowiak RSJ. Interpreting scan data acquired from multiple scanners: a study with Alzheimer’s disease. Neuroimage. 2008;39:1180–5.
Nordin S, Kozor R, Baig S, Abdel-Gadir A, Medina-Menacho K, Rosmini S, Captur G, Tchan M, Geberhiwot T, Murphy E, Lachmann R, Ramaswami U, Edwards NC, Hughes D, Steeds RP, Moon JC. Cardiac phenotype of prehypertrophic Fabry disease. Circ Cardiovasc Imaging. 2018;11:e007168.
Keenan KE, Biller JR, Delfino JG, Boss MA, Does MD, Evelhoch JL, Griswold MA, Gunter JL, Hinks RS, Hoffman SW, Kim G, Lattanzi R, Li X, Marinelli L, Metzger GJ, Mukherjee P, Nordstrom RJ, Peskin AP, Perez E, Russek SE, Sahiner B, Serkova N, Shukla-Dave A, Steckner M, Stupic KF, Wilmes LJ, Wu HH, Zhang H, Jackson EF, Sullivan DC. Recommendations towards standards for quantitative MRI (qMRI) and outstanding needs. J Magn Reson Imaging. 2019;0. https://doi.org/10.1002/jmri.26598 Epub ahead of print.
Lecompte T, Samama MM. Monitoring of treatment with vitamin K antagonists: a plea for “INR” (international normalized ratio). Rev Prat. 1990;40:563–5.
Captur G, Gatehouse P, Keenan KE, Heslinga FG, Bruehl R, Prothmann M, Graves MJ, Eames RJ, Torlasco C, Benedetti G, Donovan J, Ittermann B, Boubertakh R, Bathgate A, Royet C, Pang W, Nezafat R, Salerno M, Kellman P, Moon JC. A medical device-grade T1 and ECV phantom for global T1 mapping quality assurance-the T1 Mapping and ECV Standardization in cardiovascular magnetic resonance (T1MES) program. J Cardiovasc Magn Reson. 2016;18:58–70.
Huang EP, Raunig DL, Kessler LG, Kondratovich M, Obuchowski NA, Sullivan DC, Reeves AP, McShane LM, Wahl RL, Barboriak DP, Gatsonis C, Guimaraes AR. Metrology standards for quantitative imaging biomarkers. Radiology. 2015;277:813–25.
Hernando D, Kellman P, Haldar JP, Liang Z. Robust water/fat separation in the presence of large field inhomogeneities using a graph cut algorithm. Magn Reson Med. 2010;63:79–90.
Kellman P, Herzka DA, Arai AE, Hansen MS. Influence of off-resonance in myocardial T1-mapping using SSFP based MOLLI method. J Cardiovasc Magn Reson. 2013;15:63–73.
Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap)-a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42:377–81.
Captur G, Stables RH, Kehoe D, Deanfield J, Moon JC. Why democratize bioinformatics? BMJ Innov. 2016;2:166–71.
Diedenhofen B, Musch J. Cocor: a comprehensive solution for the statistical comparison of correlations. PLoS One. 2015;10:e0121945.
Liu JM, Liu A, Leal J, McMillan F, Francis J, Greiser A, Rider OJ, Myerson S, Neubauer S, Ferreira VM, Piechnik SK. Measurement of myocardial native T1 in cardiovascular diseases and norm in 1291 subjects. J Cardiovasc Magn Reson. 2017;19:74–84.
Rosmini S, Bulluck H, Captur G, Treibel TA, Abdel-Gadir A, Bhuva AN, Culotta V, Merghani A, Fontana M, Maestrini V, Herrey AS, Chow K, Thompson RB, Piechnik SK, Kellman P, Manisty C, Moon JC. Myocardial native T1 and extracellular volume with healthy ageing and gender. Eur Heart J Cardiovasc Imaging. 2018;19:615–21.
Roy C, Slimani A, de Meester C, Amzulescu M, Pasquet A, Vancraeynest D, Vanoverschelde J-L, Pouleur A-C, Gerber BL. Age and sex corrected normal reference values of T1, T2 T2* and ECV in healthy subjects at 3T CMR. J Cardiovasc Magn Reson. 2017;19:72–82.
Granitz M, Motloch LJ, Granitz C, Meissnitzer M, Hitzl W, Hergan K, Schlattau A. Comparison of native myocardial T1 and T2 mapping at 1.5T and 3T in healthy volunteers : reference values and clinical implications. Wien Klin Wochenschr. 2018. https://doi.org/10.1007/s00508-018-1411-3 [Epub ahead of print].
Higgins D, Keeble C, Juli C, Dawson D, Waterton J. Reference range determination for imaging biomarkers: myocardial T1. J Magn Reson Imaging. 2019;50:771–8.
Messroghli DR, Radjenovic A, Kozerke S, Higgins DM, Sivananthan MU, Ridgway JP. Modified Look-Locker inversion recovery (MOLLI) for high-resolution T1 mapping of the heart. Magn Reson Med. 2004;52:141–6.
Piechnik SK, Ferreira VM, Dall’Armellina E, Cochlin LE, Greiser A, Neubauer S, Robson MD. Shortened Modified Look-Locker inversion recovery (ShMOLLI) for clinical myocardial T1-mapping at 1.5 and 3T within a 9 heartbeat breathhold. J Cardiovasc Magn Reson. 2010;12:69–79.
Chow K, Kellman P, Spottiswoode BS, Nielles-Vallespin S, Arai AE, Salerno M, Thompson RB. Saturation pulse design for quantitative myocardial T1 mapping. J Cardiovasc Magn Reson. 2015;17:84–94.
Slavin GS, Stainsby JA. True T1 mapping with SMART1 Map (saturation method using adaptive recovery times for cardiac T1 mapping): a comparison with MOLLI. J Cardiovasc Magn Reson. 2013;15(Suppl 1):P3.
Muggeo VM. Segmented: an R package to fit regression models with broken-line relationships. R News. 2008;20–25:1–73.
Robson MD, Piechnik SK, Tunnicliffe EM, Neubauer S. T1 measurements in the human myocardium: the effects of magnetization transfer on the SASHA and MOLLI sequences. Magn Reson Med. 2013;70:664–70.
Vassiliou VS, Heng EL, Gatehouse PD, Donovan J, Raphael CE, Giri S, Babu-narayan SV, Gatzoulis MA, Pennell DJ, Prasad SK, Firmin DN. Magnetic resonance imaging phantoms for quality-control of myocardial T1 and ECV mapping: specific formulation, long-term stability and variation with heart rate and temperature. J Cardiovasc Magn Reson. 2016;18:1–12.
Kramer CM, Appelbaum E, Desai MY, Desvigne-Nickens P, DiMarco JP, Friedrich MG, Geller N, Heckler S, Ho CY, Jerosch-Herold M, Ivey EA, Keleti J, Kim D-Y, Kolm P, Kwong RY, Maron MS, Schulz-Menger J, Piechnik S, Watkins H, Weintraub WS, Wu P, Neubauer S. Hypertrophic Cardiomyopathy Registry: the rationale and design of an international, observational study of hypertrophic cardiomyopathy. Am Heart J. 2015;170:223–30.
Keenan KE, Stupic K, Boss M, Russek S, Chenevert T, Prasad P, Reddick W, Cecil K, Zheng J, Hu P, Jackson E. Multi-site, multi-vendor comparison of T1 measurement using ISMRM/NIST system phantom. In: Proceedings of the 24th annual meeting of ISMRM, Singapore; 2016. Abstract 3290.
Vassiliou V, Heng E, Donovan J, Greiser A, Babu-Narayan SV, Gatzoulis MA, Firmin D, Pennell DJ, Gatehouse P, Prasad SK. Longitudinal stability of gel T1 MRI phantoms for quality assurance of T1 mapping. J Cardiovasc Magn Reson. 2015;17(Suppl 1):W28.
Kellman P, Hansen MS. T1-mapping in the heart: accuracy and precision. J Cardiovasc Magn Reson. 2014;16:2–12.
Kellner E, Dhital B, Kiselev VG, Reisert M. Gibbs-ringing artifact removal based on local subvoxel-shifts. Magn Reson Med. 2016;76:1574–81.
Kouwenhoven M. On the impact of slice thickness definition and slice profile in 2D bSSFP on SNR and T1 mapping in cardiac MRI for multi-center trials. J Cardiovasc Magn Reson. 2019;2019:22–3 SCMR Scientific Sessions. Abstract P068.
Mitchell MD, Kundel HL, Axel L, Joseph PM. Agarose as a tissue equivalent phantom material for NMR imaging. Magn Reson Imaging. 1986;4:263–6.
Yoshida A, Kato H, Kuroda M, Hanamoto K. Development of a phantom compatible for MRI and hyperthermia using carrageenan gel–relationship between T 1 and T 2 values and NaCl concentration. Int J Hyperth. 2004;8:803–14.
Koenig SH, Brown RD. Relaxation of solvent protons by paramagnetic ions and its dependence on magnetic field and chemical environment: implications for NMR imaging. Magn Reson Med. 1984;1:478–95.
Jiang Y, Ma D, Keenan KE, Stupic KF, Gulani V, Griswold MA. Repeatability of magnetic resonance fingerprinting T1 and T2 estimates assessed using the ISMRM/NIST MRI system phantom. Magn Reson Med. 2017;78:1452–7.
Laukien G, Schlüter JZ. Impulstechnishe messungen der spin-gitter und spin-spin-relaxationszeiten von protonen in waessrigen loesungen paramagnetischer Ionen. Physik. 1956;146:113–4.
Hausser R, Laukien G. Messung und deutung der temperaturabhängigkeit der kernmagnetischen protonenrelaxation in wäßrigen lösungen von Ionen der I. Gruppe der Übergangselemente. Z Phys. 1959;153:394–411.
Morgan LO, Nolle AW. Proton spin relaxation in aqueous solutions of paramagnetic ions. II. Cr+++, Mn++, Ni++, cu++, and Gd+++. J Chem Phys. 1959;31:365–8.
Messroghli DR, Moon JC, Ferreira VM, Grosse-Wortmann L, He T, Kellman P, Mascherbauer J, Nezafat R, Salerno M, Schelbert EB, Taylor AJ, Thompson R, Ugander M, van Heeswijk RB, Friedrich MG. Clinical recommendations for cardiovascular magnetic resonance mapping of T1, T2, T2* and extracellular volume: a consensus statement by the Society for Cardiovascular Magnetic Resonance (SCMR) endorsed by the European Association for Cardiovascular Imaging. J Cardiovasc Magn Reson. 2017;19:75–85.
We thank all other members and contributors of the T1MES consortium listed in the Supplementary Material, and we also thank staff at Siemens Healthineers, Philips Healthcare, and General Electric Healthcare for their expert review of the final manuscript. Dr. Matthias Friedrich served as a JCMR Guest Editor for this manuscript.
This program was funded by the following grants to GC: European Association of Cardiovascular Imaging (EACVI) of the European Society of Cardiology (ESC), the UK National Institute of Health Research (NIHR) Biomedical Research Center (BRC) at University College London (UCL, #BRC/199/JM/101320), and the Barts Charity (#1107/2356/MRC0140). GC is supported by the National Institute for Health Research Rare Diseases Translational Research Collaboration (NIHR RD-TRC) and by the NIHR UCL Hospitals Biomedical Research Center. JCM is directly and indirectly supported by the UCL Hospitals NIHR BRC and Biomedical Research Unit at Barts Hospital respectively. This work was supported by the NIHR infrastructure at Leeds.
CBD is supported directly and indirectly by the NIHR Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol. The views expressed in this publication are those of the author(s) and not necessarily those of the NHS, the National Institute for Health Research, or the Department of Health and Social Care.
NIST disclaimer: Contribution of the National Institute of Standards and Technology not subject to copyright in the United States. Certain commercial instruments and software are identified to specify the experimental study adequately. This does not imply endorsement by NIST or that the instruments and software are the best available for the purpose.
PTB disclaimer: Certain commercial instruments and software are identified to specify the experimental study adequately. This does not imply endorsement by PTB or that the instruments and software are the best available for the purpose.
Ethics approval and consent to participate
Consent for publication
An industrial group (Resonance Health) had been funded to mass-produce the described T1MES phantoms to CE mark/FDA standards to permit international distribution in the T1MES academic program. WP is an employee of Resonance Health. AG is an employee of Siemens Healthineers. PH is an employee of Philips Healthcare. No financial or non-financial competing interests exist for any of the other authors.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
High-resolution imaging of phantom 30E017 at baseline (October 2015, Left panel) and at two years post manufacturing (Right panel). (MP4 69,750 kb)
Supplementary file containinglist of Supplementary Tables and Figures referenced in the main text.
Supplementary data file reporting all the center-, session- and sequence-specific T1 mapping contributions to the T1MES program.
About this article
Cite this article
Captur, G., Bhandari, A., Brühl, R. et al. T1 mapping performance and measurement repeatability: results from the multi-national T1 mapping standardization phantom program (T1MES). J Cardiovasc Magn Reson 22, 31 (2020). https://doi.org/10.1186/s12968-020-00613-3
- T1 mapping
- Extracellular volume