MRXCAT2.0: Synthesis of realistic numerical phantoms by combining left-ventricular shape learning, biophysical simulations and tissue texture generation

Buoso, Stefano; Joyce, Thomas; Schulthess, Nico; Kozerke, Sebastian

doi:10.1186/s12968-023-00934-z

Research
Open access
Published: 20 April 2023


Journal of Cardiovascular Magnetic Resonance volume 25, Article number: 25 (2023)


Abstract

Background

Standardised performance assessment of image acquisition, reconstruction and processing methods is limited by the absence of images paired with ground truth reference values. To this end, we propose MRXCAT2.0 to generate synthetic data, covering healthy and pathological function, using a biophysical model. We exemplify the approach by generating cardiovascular magnetic resonance (CMR) images of healthy, infarcted, dilated and hypertrophic left-ventricular (LV) function.

Method

In MRXCAT2.0, the XCAT torso phantom is coupled with a statistical shape model, describing population (patho)physiological variability, and a biophysical model, providing known and detailed functional ground truth of LV morphology and function. CMR balanced steady-state free precession images are generated using MRXCAT2.0 while realistic image appearance is ensured by assigning texturized tissue properties to the phantom labels.

Finding

Paired CMR image and ground truth data of LV function were generated with a range of LV masses (85–140 g), ejection fractions (34–51%) and peak radial and circumferential strains (0.45 to 0.95 and −0.18 to −0.13, respectively). These ranges cover healthy and pathological cases, including infarction, dilated and hypertrophic cardiomyopathy. The generation of the anatomy takes a few seconds and it improves on current state-of-the-art models where the pathological representation is not explicitly addressed. For the full simulation framework, the biophysical models require approximately two hours, while image generation requires a few minutes per slice.

Conclusion

MRXCAT2.0 offers synthesis of realistic images embedding population-based anatomical and functional variability and associated ground truth parameters to facilitate a standardized assessment of CMR acquisition, reconstruction and processing methods.

Background

In-silico phantoms of human cardiovascular anatomy and function provide a versatile tool for the testing and validation of image acquisition, reconstruction and post-processing strategies in cardiovascular magnetic resonance (CMR) [1]. Producing synthetic images from a phantom has the benefit that the resulting images have corresponding anatomical labels and functional ground-truth data, which are useful for the evaluation of the performance of a CMR pipeline. For example, the availability of a paired image-ground truth dataset is essential for a standardized evaluation of image processing tools, such as those for automatic left-ventricular (LV) segmentation, shape and strain analysis.

Available phantoms can be classified into three categories: voxel-based, analytical and hybrid. Voxel-based phantoms consist of labeled voxelised anatomical representations obtained from patients [2, 3]. These are realistic, but do not generalize to population statistics and pathological cases [4]. Analytical phantoms are based on a mathematical description of tissue structures [5]. While they are less realistic, they are more flexible in terms of definition of anatomical variations. Hybrid phantoms have been proposed to overcome the limitations of the previous two categories [6]. Although hybrid and analytical phantoms allow for morphological variation, anatomical and functional variability is mostly limited to healthy cases and function. Veress et al. [7] proposed to couple a hybrid phantom to a biophysical model of the LV to simulate both healthy and infarct conditions. However, as stated by the authors, the fitting process is time-consuming and it cannot account for other pathological scenarios, such as cardiomyopathy. More recently, Segars et al. [8] proposed a methodology to couple a full heart functional model to the XCAT phantom. While this allows realistic cardiac function to be simulated, it is specific to XCAT and it cannot be rapidly deployed to general pathological cases.

In recent years, solutions based on shape models (SM) (with voxelised or mesh representations) have been proposed to address the need for expressive anatomical descriptions [9,10,11,12,13]. While these works have shown the capability of representing dominant LV anatomical features, they did not focus on the definition of a sampling strategy to generate synthetic anatomies to capture population variability, including both healthy and pathological conditions.

Given an in-silico phantom, two main methodologies for generating CMR images can be identified. In the first approach, the signal is generated using numerical solutions of the physical equations (Bloch equations). This has been applied for cardiac and brain image synthesis [14,15,16,17,18]. In [1, 19] the use of signal models for specific sequences of interest has been proposed to compute the resulting image data. In [20] a dataset for a virtual population with varying acquisition parameters was generated using MRXCAT [1] and used to pre-train a segmentation network, which was subsequently fine-tuned on real images. This approach greatly reduces the number of in-vivo images required. However, the segmentation performance degraded when there was no fine-tuning on real data as the simulated images were not completely realistic. In [16, 21, 22] it has been shown that synthetic images can be used to augment, and eventually replace, in-vivo datasets for training of neural networks, making realistic image synthesis an important tool for CMR development.

Alternative generative approaches consist of using neural networks for conditional synthesis or style transfer [23,24,25,26,27,28,29,30]. They have been used for several imaging modalities such as ultrasound [31], computed tomography [26] and magnetic resonance imaging [23,24,25, 32]. The reader is referred to [33] for a recent overview of medical image synthesis.

In [26, 34] a factorised representation of images has been proposed, composed of a spatial representation of the anatomy combined with a modality description. The latter describes how tissue structures are rendered in the image. However, the network cannot be used to generate new anatomies and it requires labelled images for training, which are costly to obtain. In [23] unlabelled CMR images were used to learn a multi-tissue anatomical model which was fit to variable anatomies by a learned deformation model. The anatomical model was then used to condition a SPADE-GAN [35] to synthesise an image volume. While this approach solved both issues of the two previous factorised representation learning approaches [26, 34], the anatomical model learnt using the network does not represent conventional tissue classes and is thus not suited as anatomical ground truth. In [24], the XCAT phantom was used as anatomical ground truth semantic labels and MR images were synthesized using a SPADE-GAN. In [36], DatasetGAN, leveraging the generator features of StyleGAN [37], was proposed to produce a large synthetic dataset of images and to also predict pixel-wise class labels. The evaluation of this method has demonstrated that a segmentation network trained with datasets from DatasetGAN outperforms previous semi-supervised methods and is on par with the same network trained fully-supervised on a real dataset. Similarly, SemanticGAN [38] was developed to simultaneously generate both synthetic images and corresponding segmentation labels using StyleGAN2.

While physics-based approaches allow for better control over the parameters related to image generation than style transfer approaches, they produce a less realistic appearance. In [39] intra-organ texture for bones and organs was proposed to improve the realism of images generated with signal models. This approach, however, has not yet been applied to CMR image synthesis.

The present work proposes MRXCAT2.0 to address the two main limitations of in-silico phantoms: reduced variability and lack of realism. Realistic LV anatomy and function are generated by coupling a statistical shape description with a biophysical model. Surrounding tissue structures are generated with the XCAT model. Tissue maps of proton density (PD), longitudinal and transverse relaxation times (T1, T2) are assigned to image labels using a neural network trained to maximize the similarity of the background with the target appearance of real CMR images. Synthetic images are then generated using MRXCAT2.0 and used to assess the performance of published CMR processing methods [40, 41] against known ground truth of healthy and pathological cardiac function as a use case.

Methods

The full method of MRXCAT2.0 is schematically shown in Fig. 1. In the figure, red boxes correspond to the parts of the methods that are connected to each other via inputs and outputs of the black boxes. The final outputs are the synthetic CMR images paired with ground truth data (green box). The starting points are the two inputs: the selection of the (patho)physiological characteristics of anatomy and function and the parameters for the XCAT phantom (blue boxes). The (patho)physiological status is used to define the corresponding anatomy and tissue micro-structure from the statistical shape model and the appropriate physiological parameters (tissue stiffness, pressure loading, myocyte contraction) for the biophysical simulation that generates the image foreground, i.e. the LV shape and its change over the cardiac cycle. The XCAT parameters are used to define the torso anatomy and the displacement field describing the contraction of the other three cardiac chambers. This is referred to as the background of the image. The background tissue masks are warped to match the foreground and the resulting tissue maps are the input to the texturizer for the calculation of tissue properties (PD, T1, T2) and the definition of the final phantom. These properties are used as input to the signal model to generate synthetic CMR images associated with the input parameters and compliant with fundamental LV biomechanics.

Left-ventricular population shape model

The LV SM was defined using the anatomies from the Multi-Modal Whole Heart Segmentation (MMWHS) dataset [42,43,44] as built in our recent work [13].

A convolutional variational autoencoder (VAE) [45] was used to identify a low-rank representation of epicardium and endocardium coordinates (see Additional file 1 for details). The network structure is shown in Fig. 2. Each variable of the low-rank representation is associated with a normal Gaussian probability distribution, which is sampled to generate synthetic realistic endocardial and epicardial shapes from which it is possible to generate a volumetric mesh [13].
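The sampling step can be illustrated with a minimal sketch. It assumes that the "normal Gaussian" distribution of each latent variable is a standard normal, which is the usual VAE prior but is not stated explicitly here. The latent dimension, the function names and the toy decoder are hypothetical placeholders; the actual decoder is the trained convolutional network of Fig. 2.

```python
import random


def sample_latent(dim, rng):
    """Draw one latent vector with independent N(0, 1) entries (assumed prior)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def generate_shapes(decoder, latent_dim, n_shapes, seed=0):
    """Sample latent vectors and decode each one into a synthetic shape.

    `decoder` is any callable mapping a latent vector to shape coordinates;
    in the paper this role is played by the trained VAE decoder.
    """
    rng = random.Random(seed)
    return [decoder(sample_latent(latent_dim, rng)) for _ in range(n_shapes)]


def toy_decoder(z):
    """Hypothetical stand-in for the trained decoder (simple scaling only)."""
    return [2.0 * value for value in z]


# Latent dimension 16 is an arbitrary illustrative choice.
shapes = generate_shapes(toy_decoder, latent_dim=16, n_shapes=5, seed=42)
```

Fixing the seed makes the set of synthetic anatomies reproducible, which is convenient when the same virtual cohort has to be regenerated.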

The expressiveness of the SM was assessed using an additional dataset, the Automated Cardiac Diagnosis Challenge (ACDC) [46]. End-systolic images were meshed using our recently published method [12] and the accuracy of the reconstruction with the SM was evaluated as the average distance between corresponding endocardial and epicardial points in the original and reconstructed meshes. Additionally, a k-means clustering algorithm [47] was used on the latent space representation of the ACDC meshes with three target clusters to identify sampling regions of the latent space for the (patho)physiological conditions labelled in the ACDC dataset (healthy (NOR), dilated (DCM), hypertrophic (HCM)). Classification accuracy of healthy, DCM and HCM was evaluated against the clinical labels. The centres of the clusters were then used as reference anatomies for showcasing the method proposed in this work.
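The two evaluation steps described above, reconstruction error and latent-space clustering, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the k-means variant is plain Lloyd's algorithm with a deterministic farthest-point initialisation (the initialisation used in the paper is not specified), and the two-dimensional latent codes are toy values.

```python
import math


def mean_point_distance(points_a, points_b):
    """Average Euclidean distance between corresponding points of two meshes."""
    if len(points_a) != len(points_b):
        raise ValueError("meshes must have the same number of points")
    total = sum(math.dist(p, q) for p, q in zip(points_a, points_b))
    return total / len(points_a)


def init_farthest(codes, k):
    """Deterministic initialisation: start at codes[0], then repeatedly add
    the code that is farthest from all centres chosen so far (assumption)."""
    centres = [list(codes[0])]
    while len(centres) < k:
        far = max(codes, key=lambda c: min(math.dist(c, m) for m in centres))
        centres.append(list(far))
    return centres


def kmeans(codes, k, n_iter=50):
    """Plain Lloyd's k-means on a list of latent vectors."""
    centres = init_farthest(codes, k)
    labels = [0] * len(codes)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centre for every code.
        labels = [min(range(k), key=lambda j: math.dist(c, centres[j]))
                  for c in codes]
        # Update step: move each centre to the mean of its members.
        new_centres = []
        for j in range(k):
            members = [c for c, lab in zip(codes, labels) if lab == j]
            if members:
                new_centres.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centres.append(centres[j])  # keep an empty cluster's centre
        if new_centres == centres:
            break
        centres = new_centres
    return centres, labels


# Toy latent codes forming three well-separated groups.
codes = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
         [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
         [10.0, 0.0], [10.1, 0.0], [10.0, 0.1]]
centres, labels = kmeans(codes, k=3)
```

Cluster indices returned by k-means are arbitrary, so comparing them with the NOR, DCM and HCM labels requires first mapping each cluster to a clinical class, for example by majority vote within the cluster.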

Cardiac functional model

The biophysical model for the LV is based on our previous work on cardiac mechanics [13] and material modelling [48,49,50,51]. A technical description is presented in the Supplementary material and in [13].

The response of the LV to the systemic pressure loading depends on the contribution of a passive and an active component. The passive component was described by the Holzapfel-Ogden model [52] defined as a function of tissue shear moduli and fiber orientations. In our approach, the evolution of the active contribution was simulated as in [53,54,55]. In the model, the pericardial sac was simulated by allowing for longitudinal motion of the points but constraining epicardial radial displacement. Endocardial pressure was simulated by coupling the ventricular model to a simplified lumped-parameter model of systemic circulation [13].
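For reference, the strain-energy function of the Holzapfel-Ogden model is commonly written in the form below, following [52]. This is the standard published form; whether the present work uses the full expression or a reduced variant, and how incompressibility is enforced, is described in [13] and the supplementary material rather than here.

```latex
\Psi = \frac{a}{2b}\,\exp\!\left[b\,(I_1 - 3)\right]
     + \sum_{i \in \{f,\,s\}} \frac{a_i}{2 b_i}
       \left\{\exp\!\left[b_i\,(I_{4i} - 1)^2\right] - 1\right\}
     + \frac{a_{fs}}{2 b_{fs}}
       \left\{\exp\!\left[b_{fs}\, I_{8fs}^2\right] - 1\right\},
\qquad
I_1 = \operatorname{tr}\mathbf{C},\quad
I_{4f} = \mathbf{f}_0 \cdot \mathbf{C}\,\mathbf{f}_0,\quad
I_{4s} = \mathbf{s}_0 \cdot \mathbf{C}\,\mathbf{s}_0,\quad
I_{8fs} = \mathbf{f}_0 \cdot \mathbf{C}\,\mathbf{s}_0
```

Here $\mathbf{C}$ is the right Cauchy-Green deformation tensor, $\mathbf{f}_0$ and $\mathbf{s}_0$ are the fiber and sheet directions in the reference configuration, the coefficients $a$, $a_f$, $a_s$, $a_{fs}$ have units of stress and play the role of the shear moduli mentioned above, and $b$, $b_f$, $b_s$, $b_{fs}$ are dimensionless.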

Image foreground generation

The functional model was personalized to physiological and pathological conditions of interest for the generation of the image foreground, i.e. to simulate ground-truth cardiac function. In a first step, a synthetic anatomy was sampled from the corresponding cluster (e.g. NOR, DCM, HCM). Then, material properties, fiber orientations and maximum active stress were selected to describe the target cardiac function. The LV micro-structure was defined using linear transmural laws as in [13].

Reference healthy values for the passive tissue response and potential propagation velocities were taken from [56], while values for DCM and HCM cases were obtained by defining the material coefficients between 5 and 10 times larger than in the normal case [57].

Anatomical details can be further modified by adding localized anatomical defects to any of the geometries thanks to the physiological parametrization associated with the shapes. The corresponding variations of wall thickness, mechanical and electrophysiological parameters were automatically adjusted, gradually transitioning from healthy to diseased tissue (see Additional file 1). In this work an elliptical scar at the free wall was considered, but any approach could be adopted here.

Image background generation

The shape and functional models described in the previous sections were used to generate time-resolved 3D LV meshes that were voxelised and sliced to produce the corresponding LV tissue masks. These were then augmented by including tissue labels for the right ventricle (RV), atria and other organs using the XCAT software [6].

Each two-dimensional (2D) slice generated with the XCAT phantom was warped such that the LV epicardial contours from XCAT matched those of the epicardium from the masks generated by LV deformations. The surrounding tissue was deformed accordingly by smooth interpolation. The approach can also account for breathing motion from XCAT. Details are presented in Additional file 1. This coupling approach does not require modifications to the XCAT code (essentially, a self-contained additional step is added between anatomy and image generation) and, hence, keeps all functionalities of the software.

Tissue properties definition

A neural network was used to assign textured tissue properties to the many-tissue maps combining foreground and background. A dataset of paired many-class segmentation masks and corresponding tissue-property images (i.e. images where PD, T1, and T2 values are known for every pixel) is required to train such a network. To our knowledge, there is no large dataset of tissue-property images available (even ignoring the requirement of corresponding many-class segmentation masks). Such a paired dataset was therefore synthetically generated and then used for training.

First, a CMR generative model (CMRGenNet) based on StyleGAN2 with Adaptive Discriminator Augmentation [37, 58] was trained on the ACDC dataset. Then, using the method proposed in DatasetGAN [36], the CMRGenNet was augmented with an additional branch to produce many-class labels for all generated images (see Fig. 3, technical details in Additional file 1).

The CMRGenNet was used to generate a dataset of 8640 synthetic CMR images and corresponding 12-class segmentation masks, which were then utilized to train the MultiClassNet, a UNet [59], to predict multi-class segmentation masks from real CMR images. The MultiClassNet was then used to perform multi-class segmentation on end-systolic (ES) and end-diastolic (ED) images from the ACDC dataset.

For these segmentations, the PD, T1 and T2 values that maximise the similarity with the corresponding image were computed using an analytic closed-form expression of the balanced steady-state free precession (bSSFP) signal, bSSFP being the sequence used to acquire the ACDC dataset (details in Additional file 1). The same equation is used in the MRXCAT software.

This process yielded a paired dataset of 1800 parameter maps produced directly from the segmentation masks, and the corresponding detailed texture maps produced by the optimisation. As a final step, the texturizer, TextNet, was trained to map initialized uniform parameter maps to textured parameter maps. The TextNet architecture was based on a UNet.
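The "initialized uniform parameter maps" that serve as TextNet input can be pictured as label masks in which every pixel of a tissue class receives the same PD, T1 and T2. The sketch below builds such maps from a small label mask. The label identifiers and the tissue values in the table are hypothetical placeholders chosen for illustration, not the values used in this work.

```python
def uniform_parameter_maps(labels, tissue_table):
    """Fill PD, T1 and T2 maps with one constant value per tissue label.

    labels       : 2D list of integer tissue labels
    tissue_table : dict mapping label -> {"PD": ..., "T1": ..., "T2": ...}
    """
    maps = {"PD": [], "T1": [], "T2": []}
    for line in labels:
        for name in maps:
            maps[name].append([tissue_table[lab][name] for lab in line])
    return maps


# Hypothetical labels and values (T1, T2 in ms), for illustration only.
tissue_table = {
    0: {"PD": 0.0, "T1": 1.0, "T2": 1.0},        # background (dummy values)
    1: {"PD": 0.8, "T1": 1000.0, "T2": 50.0},    # myocardium
    2: {"PD": 0.95, "T1": 1500.0, "T2": 250.0},  # blood pool
}
labels = [[0, 1, 1],
          [0, 2, 1]]
maps = uniform_parameter_maps(labels, tissue_table)
```

A texturizer such as TextNet would then take these piecewise-constant maps as input and return maps of the same size with spatially varying values inside each tissue region.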

The final images were post-processed such that any texture was removed from the LV myocardium. This is justified by the relative uniformity of image signal in the myocardium of real CMR images and the need for removing tissue property variations at the border of the myocardium resulting from partial-voluming effects. All tissue properties were then warped according to label deformations over the cardiac cycle to preserve the consistency of the anatomical details of the images.

Synthetic CMR image generation

The resulting anatomical phantoms with corresponding texturized tissue properties were used to generate cine CMR images in MRXCAT [1]. For the use cases presented here, 2D bSSFP acquisition parameters were: repetition time TR = 3.0 ms, echo time TE = 1.5 ms, flip angle of 60°, and a signal-to-noise ratio (SNR) of 30. Eight surface coils and a Cartesian trajectory were simulated. The signal of the image was generated using the closed-form expression of the bSSFP signal equation implemented in MRXCAT, which assumes steady-state properties. As a final note, we highlight that the tissue phantom was generated at higher resolution than the target image resolution to accommodate partial voluming effects due to the limited bandwidth of the CMR encoding process.
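As an illustration of the closed-form signal model, the sketch below evaluates the textbook on-resonance steady-state bSSFP expression with the acquisition parameters listed above (TR = 3.0 ms, TE = 1.5 ms, flip angle 60°). It is not copied from the MRXCAT code: the inclusion of the T2 decay factor at TE is an assumption, and the tissue values for myocardium and blood are rough illustrative numbers rather than values from this work.

```python
import math


def bssfp_signal(pd, t1_ms, t2_ms, tr_ms=3.0, te_ms=1.5, flip_deg=60.0):
    """Steady-state bSSFP signal magnitude, on resonance (textbook form).

    S = PD * sin(a) * (1 - E1) / (1 - (E1 - E2) * cos(a) - E1 * E2)
           * exp(-TE / T2),   with E1 = exp(-TR/T1), E2 = exp(-TR/T2).
    The exp(-TE/T2) factor is an assumption about the signal at echo time.
    """
    e1 = math.exp(-tr_ms / t1_ms)
    e2 = math.exp(-tr_ms / t2_ms)
    a = math.radians(flip_deg)
    steady = math.sin(a) * (1.0 - e1) / (1.0 - (e1 - e2) * math.cos(a) - e1 * e2)
    return pd * steady * math.exp(-te_ms / t2_ms)


# Illustrative tissue values (assumptions, in ms), equal PD for comparison.
myocardium = bssfp_signal(pd=1.0, t1_ms=1000.0, t2_ms=50.0)
blood = bssfp_signal(pd=1.0, t1_ms=1500.0, t2_ms=250.0)
```

With these numbers the blood signal exceeds the myocardial signal, which reflects the T2/T1-weighted contrast that gives bSSFP cine images their bright blood pool.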

DeepStrain analysis

The paired ground-truth and image data generated in this work were used as input to the DeepStrain framework [40, 41]. DeepStrain leverages a network for segmentation (CarSON) and one for cardiac motion estimation (CarMEN). The networks were trained on the ACDC dataset, which was also used in our work to define TextNet. CarSON and CarMEN predictions were used as input to an additional network that computes the corresponding LV strain. To be used in these networks, the images of this work were intensity-normalized and resampled to an isotropic in-plane resolution of 1.25 mm and a total number of 16 slices. They were then cropped around the LV mask to obtain 128 × 128 × 16 pixel images.
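The in-plane part of this preprocessing can be sketched for a single slice as follows. The sketch assumes min-max intensity normalisation and a crop window centred on the centroid of the LV mask; neither choice is specified above, and the function names are hypothetical. Resampling to 1.25 mm and to 16 slices is omitted.

```python
def lv_crop_box(mask, size=128):
    """Return (row0, row1, col0, col1) of a size x size window centred on
    the centroid of the non-zero mask pixels, clamped to the image."""
    rows = [r for r, line in enumerate(mask) for v in line if v]
    cols = [c for line in mask for c, v in enumerate(line) if v]
    if not rows:
        raise ValueError("mask is empty")
    n_rows, n_cols = len(mask), len(mask[0])
    if size > n_rows or size > n_cols:
        raise ValueError("crop size exceeds image size")
    centre_r = round(sum(rows) / len(rows))
    centre_c = round(sum(cols) / len(cols))
    # Clamp the top-left corner so the window stays inside the image.
    row0 = min(max(centre_r - size // 2, 0), n_rows - size)
    col0 = min(max(centre_c - size // 2, 0), n_cols - size)
    return row0, row0 + size, col0, col0 + size


def crop(image, box):
    """Extract the window described by `box` from a 2D list image."""
    row0, row1, col0, col1 = box
    return [line[col0:col1] for line in image[row0:row1]]


def normalise(image):
    """Min-max rescale intensities to [0, 1] (assumed normalisation)."""
    flat = [v for line in image for v in line]
    lo, hi = min(flat), max(flat)
    scale = (hi - lo) or 1.0
    return [[(v - lo) / scale for v in line] for line in image]


# Small demonstration on an 8 x 8 slice with a 4 x 4 crop window.
mask = [[0] * 8 for _ in range(8)]
mask[6][6] = 1
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
box = lv_crop_box(mask, size=4)
patch = normalise(crop(image, box))
```

Clamping the window keeps the crop at the requested size even when the LV lies close to the image border, at the cost of the LV no longer being exactly centred in that case.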