Fully‑automated deep‑learning segmentation of pediatric cardiovascular magnetic resonance of patients with complex congenital heart diseases
Journal of Cardiovascular Magnetic Resonance volume 22, Article number: 80 (2020)
Abstract
Background
For the growing population of patients with congenital heart disease (CHD), improving clinical workflow, diagnostic accuracy, and efficiency of analysis are unmet clinical needs. Cardiovascular magnetic resonance (CMR) imaging offers noninvasive and nonionizing assessment of CHD patients. However, although CMR data facilitate reliable analysis of cardiac function and anatomy, the clinical workflow still relies mostly on manual analysis of CMR images, which is time-consuming. An automated and accurate segmentation platform dedicated exclusively to pediatric CMR images could therefore significantly improve the clinical workflow; establishing such a platform is the aim of the present work.
Methods
Training artificial intelligence (AI) algorithms for CMR analysis requires large annotated datasets, which are not readily available for pediatric subjects, particularly CHD patients. To mitigate this issue, we devised a novel method that uses a generative adversarial network (GAN) to augment the training dataset with synthetic CMR images and their corresponding chamber segmentations. In addition, we trained and validated a deep fully convolutional network (FCN) on a dataset of \(64\) pediatric subjects with complex CHD, which we have made publicly available. The Dice metric, Jaccard index, and Hausdorff distance, as well as clinically relevant volumetric indices, are reported to assess our platform and compare it with other algorithms, including U-Net and cvi42, a commercial software package used in clinical practice.
Results
On the congenital CMR dataset, our FCN model yields an average Dice metric of \(91.0\mathrm{\%}\) and \(86.8\mathrm{\%}\) for the LV at end-diastole and end-systole, respectively, and \(84.7\mathrm{\%}\) and \(80.6\mathrm{\%}\) for the RV at end-diastole and end-systole, respectively. On the same dataset, cvi42 achieved \(73.2\mathrm{\%}\), \(71.0\mathrm{\%}\), \(54.3\mathrm{\%}\) and \(53.7\mathrm{\%}\), and the U-Net architecture achieved \(87.4\mathrm{\%}\), \(83.9\mathrm{\%}\), \(81.8\mathrm{\%}\) and \(74.8\mathrm{\%}\), for LV end-diastole, LV end-systole, RV end-diastole and RV end-systole, respectively.
Conclusions
The chamber segmentation results from our fully automated method showed strong agreement with manual segmentation, and no statistically significant difference was found by two independent statistical analyses. In contrast, the cvi42 and U-Net segmentation results failed the t-test. These outcomes suggest that, by taking advantage of GANs, our method is clinically relevant and can be used for pediatric and congenital CMR segmentation and analysis.
Background
Congenital heart diseases (CHDs) are the most common birth defects [1]. It is currently estimated that \(83\mathrm{\%}\) of newborns with CHD in the U.S. survive infancy [2]. These patients require routine imaging follow-up. Cardiovascular magnetic resonance (CMR) imaging is the modality of choice for assessing cardiac function and anatomy in children with CHD. Not only does CMR deliver images with high spatial and acceptable temporal resolution, but it is also noninvasive and nonionizing [3, 4]. However, CMR analysis in pediatric CHD patients is among the most challenging, time-consuming, and operator-intensive clinical tasks.
Presently, artificial intelligence (AI), and particularly deep learning, shows strong promise for automatic segmentation of CMR images [5,6,7,8]. While current AI-based methods have been used successfully to delineate adult hearts with disease, they are not yet reliable for segmenting CMR images of CHD patients, particularly children [8, 9]. The main reasons for this shortcoming are anatomical heterogeneity and the lack of large CMR databases that include data from a diverse group of CHD subjects acquired with diverse scanners and pulse sequences. As indicated by Bai et al. [7], a major limitation of existing learning methods is their use of homogeneous datasets in which the majority of CMR data come from adult subjects with healthy or near-healthy hearts, e.g., the Second Annual Data Science Bowl [10] and the UK CMR Biobank [11], among others [12, 13].
Training neural networks requires a large dataset, which does not currently exist for complex CHD subjects. Another limitation is overfitting, particularly from overtraining, to the image patterns of a specific dataset, for example one whose images all come from the same scanner model or vendor, as also reported by Bai et al. [7]. Dealing with limited data is therefore a major challenge in designing effective neural networks for pediatric CMR, particularly for CHD subjects, and calls for innovative approaches [9].
Among learning-based algorithms, supervised deep learning is currently considered the state of the art for CMR segmentation [14]. Nevertheless, a major limitation of deep-learning methods is their dependence on the amount of manually annotated training data [15]. Small datasets can incur a large bias, which makes these methods ineffective and unreliable when the heart shape lies outside the learning set, as frequently observed in CHD subjects.
To mitigate the need for large datasets of manually annotated CHD data, in this study we employ a Deep Convolutional Generative Adversarial Network (DCGAN) [16] that generates synthetically segmented CMR images and thereby enriches the training data beyond classical affine transformations. The DCGAN enabled our deep-learning algorithms to segment CMR images of complex CHD subjects more accurately than existing AI methods.
Methods
Dataset
Our dataset includes \(64\) CMR studies from pediatric patients aged \(2\) to \(18\) years, scanned at Children’s Hospital Los Angeles (CHLA). The dataset includes scans from patients with Tetralogy of Fallot (TOF; \(n = 20\)), Double Outlet Right Ventricle (DORV; \(n = 9\)), Transposition of the Great Arteries (TGA; \(n = 9\)), Cardiomyopathy (\(n = 8\)), Coronary Artery Anomaly (CAA; \(n = 9\)), Pulmonary Stenosis or Atresia (\(n = 4\)), Truncus Arteriosus (\(n = 3\)), and Aortic Arch Anomaly (\(n = 2\)). All TGA cases were D-type and had been repaired with an arterial switch operation. The study was reviewed by the Children’s Hospital Los Angeles Institutional Review Board and was granted an exemption per 45 CFR 46.104(d)(4)(iii) and a waiver of HIPAA authorization per the Privacy Rule (45 CFR Part 160 and Subparts A and E of Part 164).
CMR studies
Imaging studies were performed on either a 1.5 T (Achieva, Philips Healthcare, Best, the Netherlands) or a 3 T (Ingenia, Philips Healthcare) scanner. CMR images for ventricular volume and function analysis were obtained using a standard balanced steady-state free precession (bSSFP) sequence without contrast. Each dataset includes \(12\) to \(15\) short-axis slices encompassing both the right ventricle (RV) and left ventricle (LV) from base to apex, with \(20\) to \(30\) frames per cardiac cycle. Typical scan parameters were a slice thickness of \(6\) to \(10\,\mathrm{mm}\), in-plane spatial resolution of \(1.5\) to \(2\,{\mathrm{mm}}^{2}\), repetition time of \(3\) to \(4\,\mathrm{ms}\), echo time of \(1.5\) to \(2\,\mathrm{ms}\), and flip angle of \(60\) degrees. Images were acquired during free breathing, with \(3\) signal averages to compensate for respiratory motion. Manual image segmentation was performed by a board-certified pediatric cardiologist subspecialized in CMR, with experience consistent with Society for Cardiovascular Magnetic Resonance (SCMR) Level 3 certification. Endocardial contours were drawn on end-diastolic and end-systolic images, and ventricular volumes and ejection fraction were then computed from these contours. Manual annotations were performed according to SCMR guidelines with cvi42 software (Circle Cardiovascular Imaging, Calgary, Alberta, Canada) without the use of automated segmentation tools. The ventricular cavity in the basal slice was identified by evaluating wall thickening and cavity shrinking in systole.
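Ventricular volumes, stroke volume and ejection fraction follow directly from per-slice contours. The sketch below uses the common slice-summation (disk summation) approach, assuming binary short-axis masks, a known in-plane pixel spacing, and an inter-slice spacing equal to slice thickness plus any gap; it is not necessarily the exact algorithm implemented in cvi42 or OsiriX, and the function names are hypothetical.

```python
import numpy as np


def ventricular_volume_ml(masks, pixel_spacing_mm, slice_spacing_mm):
    """Slice-summation volume from a stack of binary short-axis masks.

    masks: array of shape (n_slices, H, W), 1 inside the ventricular cavity.
    pixel_spacing_mm: (row_mm, col_mm) in-plane spacing (assumed known).
    slice_spacing_mm: slice thickness plus inter-slice gap (assumed).
    """
    masks = np.asarray(masks, dtype=bool)
    pixel_area_mm2 = pixel_spacing_mm[0] * pixel_spacing_mm[1]
    # Cavity area of each slice (mm^2) times slice spacing gives mm^3 per slice.
    volume_mm3 = masks.sum(axis=(1, 2)) * pixel_area_mm2 * slice_spacing_mm
    return volume_mm3.sum() / 1000.0  # 1 mL = 1000 mm^3


def stroke_volume_and_ef(edv_ml, esv_ml):
    """Stroke volume (mL) and ejection fraction (%) from EDV and ESV."""
    sv = edv_ml - esv_ml
    return sv, 100.0 * sv / edv_ml
```

For example, ten slices that each contain 100 cavity pixels at 1.5 x 1.5 mm with 8 mm spacing give 18 mL; an EDV of 18 mL and an ESV of 9 mL give an SV of 9 mL and an EF of 50%.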
Postprocessing of CMR data
The original image size was \(512\times 512\) pixels. Each image was first center-cropped to \(445\times 445\) pixels to remove patient identifiers. All images were then examined to ensure that both the heart and the segmentation mask were present. To reduce dimensionality, each cropped image was resized to \(128\times 128\) using the imresize function of the open-source Python library SciPy. The entire process was performed with two different downsampling methods: (1) nearest-neighbor downsampling and (2) bicubic downsampling. For training, twenty-six patients (\(10\) TOF, \(4\) DORV, \(4\) TGA, \(4\) CAA and \(4\) cardiomyopathy patients) were randomly selected, and the remaining \(38\) patients were used as test data.
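The `scipy.misc.imresize` function was deprecated and later removed from SciPy (in version 1.3), so the sketch below reproduces the center crop and the nearest-neighbor variant of the downsampling with NumPy alone. It assumes single-channel 2D arrays, and its index mapping may differ slightly from imresize's internal sampling; the helper names are hypothetical. A bicubic variant would need an interpolation library (e.g., Pillow's `Image.resize` with bicubic resampling) rather than this index mapping.

```python
import numpy as np


def center_crop(img, size):
    """Crop a size x size window from the center of a 2D array."""
    h, w = img.shape
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize: map each output pixel center to a source pixel."""
    h, w = img.shape
    rows = np.floor((np.arange(out_h) + 0.5) * h / out_h).astype(int)
    cols = np.floor((np.arange(out_w) + 0.5) * w / out_w).astype(int)
    rows = np.clip(rows, 0, h - 1)
    cols = np.clip(cols, 0, w - 1)
    return img[np.ix_(rows, cols)]


def preprocess(img, mask):
    """512x512 -> center crop 445x445 -> 128x128, for an image and its mask."""
    img_c, mask_c = center_crop(img, 445), center_crop(mask, 445)
    # Nearest-neighbor sampling keeps the mask binary (no interpolated labels).
    return resize_nearest(img_c, 128, 128), resize_nearest(mask_c, 128, 128)
```

Applying the same crop and index mapping to the image and its mask keeps the two spatially aligned.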
Image segmentation using fully convolutional networks
A fully convolutional network (FCN) was used for automated pixelwise image segmentation and compared with a U-Net [17] and cvi42. Convolutional networks are a family of artificial neural networks composed of a series of convolutional and pooling layers in which data features are learned at various levels of abstraction. These networks are most useful when the data are images or maps in which the proximity of pixels reflects how closely they are related. Examples of FCNs used to segment healthy adult CMR images include [7, 18]. While these FCNs yield good segmentation accuracy for healthy adult CMR images, they perform poorly on CHD subjects [7]. Inspired by the “skip” architecture used by Long et al. [19] and the FCN model introduced by Tran [18], we designed a novel \(19\)-layer FCN for automated pixelwise image segmentation in CHD subjects.
FCN architecture
The architecture of our \(19\)-layer FCN model and the number of filters in each convolution layer are specified in Fig. 1; four max-pooling layers with a pooling size of \(3\) reduce the dimension of the previous layer’s output. Fine, elementary visual features of an image, e.g., edges and corners, are learned in the network’s shallow layers, whereas coarse semantic information is generated in the deeper layers. These coarse and fine features are combined to learn the filters of the upsampling layers, which are transposed convolution layers with a kernel size of \(4\). The FCN’s input is a \(128\times 128\) image and its output is a \(128\times 128\) dense heatmap predicting the class membership of each pixel of the input image. The technical details of the FCN architecture are described in the Appendix.
Despite the \({l}_{2}\) regularization and dropout incorporated in the FCN architecture, as explained in the Appendix, overfitting was still present owing to the lack of a large set of annotated training data. A standard solution to this problem is to augment the training dataset artificially using various known image transformations [20]. Classic data augmentation techniques include affine transformations such as rotation, flipping, and shearing [21]. To preserve the characteristics of the heart chambers, only rotation and flipping were used; transformations such as shearing that deform shape were avoided. Each image was first rotated to \(10\) angles, \(\theta =\left[{0}^{\circ }, {20}^{\circ },{40}^{\circ }, \dots, {180}^{\circ }\right]\). Each rotated image was then either kept as is or flipped horizontally, vertically, or both. As a result of this augmentation, the number of training images was multiplied by a factor of \(10\times 4=40\).
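The rotation-and-flip scheme can be sketched as follows. This is a minimal NumPy illustration, assuming rotation about the image center with nearest-neighbor sampling and zero fill outside the original frame; the actual interpolation and fill used for the FCN training data are not specified, and the function names are hypothetical. The same transform is applied to an image and its mask so that they stay aligned.

```python
import numpy as np


def rotate_nearest(img, angle_deg):
    """Rotate a 2D array about its center using nearest-neighbor sampling.

    Output pixels whose source falls outside the original frame are set to 0.
    """
    h, w = img.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse mapping: locate the source pixel for every output pixel.
    src_y = np.cos(theta) * (yy - cy) - np.sin(theta) * (xx - cx) + cy
    src_x = np.sin(theta) * (yy - cy) + np.cos(theta) * (xx - cx) + cx
    src_y = np.rint(src_y).astype(int)
    src_x = np.rint(src_x).astype(int)
    inside = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out = np.zeros_like(img)
    out[inside] = img[src_y[inside], src_x[inside]]
    return out


FLIPS = [
    lambda a: a,                        # unchanged
    np.fliplr,                          # horizontal flip
    np.flipud,                          # vertical flip
    lambda a: np.flipud(np.fliplr(a)),  # both
]


def augment_pair(img, mask, angles=range(0, 181, 20)):
    """Apply the same rotation and flip to an image and its mask."""
    pairs = []
    for angle in angles:
        r_img, r_mask = rotate_nearest(img, angle), rotate_nearest(mask, angle)
        for flip in FLIPS:
            pairs.append((flip(r_img), flip(r_mask)))
    return pairs
```

With the default angles (0 to 180 degrees in 20-degree steps) each pair yields 10 x 4 = 40 augmented pairs; passing `angles=range(0, 181, 10)` gives the 19 x 4 = 76-fold augmentation described for DCGAN training.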
FCN training procedure
The dataset was randomly split into training and validation sets with a ratio of \(0.8/0.2\). The validation set was used to provide an unbiased estimate of the final tuned model’s performance on unseen data. Each image was then normalized to zero mean and unit variance. Network parameters were initialized according to Glorot’s uniform scheme [22].
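A minimal sketch of these three steps is given below. It assumes the split is drawn over sample indices (whether images or patients were split is not stated), that normalization is per image, and that fan-in and fan-out follow the usual convolutional convention (kernel height x kernel width x input or output channels). The helper names and the fixed seed are illustrative only.

```python
import numpy as np


def split_train_val(n_samples, val_fraction=0.2, seed=0):
    """Random 0.8/0.2 split of sample indices into training and validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_val = int(round(val_fraction * n_samples))
    return idx[n_val:], idx[:n_val]


def standardize(img, eps=1e-8):
    """Normalize a single image to zero mean and unit variance."""
    img = img.astype(np.float64)
    return (img - img.mean()) / (img.std() + eps)


def glorot_uniform(fan_in, fan_out, shape, seed=0):
    """Glorot (Xavier) uniform: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    rng = np.random.default_rng(seed)
    return rng.uniform(-limit, limit, size=shape)
```

For a 3 x 3 convolution with one input and one output channel, fan_in = fan_out = 9, so weights are drawn from U(-sqrt(1/3), sqrt(1/3)).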
To learn the model parameters, stochastic gradient descent (SGD) was used with a learning rate of \(0.002\) and a momentum of \(0.9\), which accelerates SGD in the relevant direction and dampens oscillations. To improve the optimization process, Nesterov momentum updates [23] were used, which evaluate the gradient at the “lookahead” position instead of the current position. The network was trained with a batch size of \(5\) for \(450\) epochs, i.e., passes over the training dataset, to minimize the negative Dice coefficient between the predicted and manual ground-truth segmentations.
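The loss and the optimizer update can be illustrated in NumPy. The soft Dice form below, with a small smoothing constant `eps`, is an assumption, since the exact formulation of the negative Dice loss is not given. The Nesterov update follows the common formulation \(v \leftarrow \mu v - \eta \nabla f(w + \mu v)\), \(w \leftarrow w + v\), with \(\eta = 0.002\) and \(\mu = 0.9\), and is demonstrated on a toy quadratic rather than on the FCN.

```python
import numpy as np


def negative_soft_dice(pred, target, eps=1e-7):
    """Negative soft Dice between predicted probabilities and a binary mask."""
    pred = pred.astype(np.float64).ravel()
    target = target.astype(np.float64).ravel()
    intersection = np.sum(pred * target)
    dice = (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
    return -dice


def nesterov_sgd(grad_fn, w, lr=0.002, momentum=0.9, steps=1000):
    """SGD with Nesterov momentum: the gradient is taken at the look-ahead point."""
    v = np.zeros_like(w)
    for _ in range(steps):
        lookahead = w + momentum * v
        v = momentum * v - lr * grad_fn(lookahead)
        w = w + v
    return w
```

A perfect prediction gives a loss of -1, so minimizing the negative Dice maximizes overlap; on \(f(w) = \tfrac{1}{2}\lVert w\rVert^2\) (gradient \(w\)), the update drives `w` toward zero.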
Deep convolutional generative adversarial networks to synthesize CMR images
While classic data augmentation increased the number of training images by a factor of \(40\), it did not solve the overfitting issue. To mitigate it, generative adversarial networks (GANs) were used to synthesize CMR images and their corresponding chamber segmentations. GANs are a family of generative models that learn a mapping from a known distribution, e.g., random noise, to the data distribution.
A DCGAN was designed to synthesize CMR images for augmenting the training data. The architectures of the generator and discriminator networks, along with their training procedures, are described next.
DCGAN architecture
The generator’s architecture is shown in Fig. 2. The input to the generator network is a random noise vector \(z\in {\mathbb{R}}^{100}\) drawn from a standard normal distribution \(\mathcal{N}\left(0,\mathbf{I}\right)\). The input is passed through six 2D transposed convolution (also known as fractionally strided convolution) layers with a kernel size of \(4\times 4\) to upsample it into a \(128\times 128\) image. The first transposed convolution layer uses a stride of \(1\) pixel, and the remaining layers use a stride of \(2\) pixels. The number of channels in each layer is shown in Fig. 2. All 2D transposed convolution layers except the last are followed by a rectified linear unit (ReLU) layer; the last layer is followed by a Tanh activation function. The generator network’s output has two channels: the first contains the synthetic CMR image and the second the corresponding chamber segmentation mask.
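The following arithmetic checks that six \(4\times 4\) transposed convolutions, the first with stride 1 and the rest with stride 2, can upsample the 100-dimensional noise vector (viewed as a \(100\times 1\times 1\) tensor) to \(128\times 128\). Padding values are not stated in the text; padding 0 for the first layer and 1 for the others is assumed here, following the common DCGAN configuration. The size formula is the standard one for transposed convolution with dilation 1 and no output padding.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Spatial output size of a 2D transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


# (kernel, stride, padding) for the six generator layers; padding is assumed.
layers = [(4, 1, 0)] + [(4, 2, 1)] * 5
size = 1  # noise vector z viewed as a 100 x 1 x 1 tensor
sizes = []
for kernel, stride, padding in layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    sizes.append(size)
print(sizes)  # [4, 8, 16, 32, 64, 128]
```

Under these assumptions the first layer maps the \(1\times 1\) input to \(4\times 4\), and each stride-2 layer then doubles the spatial size.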
The discriminator network is a deep convolutional neural network (CNN), as shown in Fig. 2. Its input is a \(2\times 128\times 128\) image and its output is a scalar representing the probability that the input is a real image paired with its corresponding segmentation mask. The model includes six 2D convolution layers with a kernel size of \(4\times 4\) and a stride of \(2\) pixels, except for the last layer, which uses a stride of \(1\) pixel. The number of channels in each convolution layer is shown in Fig. 2. All layers except the last are followed by a Leaky ReLU layer with a negative slope of \(0.2\); the last layer is followed by a sigmoid function.
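A similar check shows how the six discriminator convolutions can reduce the \(128\times 128\) input pair to a single output value. Padding is again not given in the text; padding 1 for the five stride-2 layers and 0 for the final stride-1 layer is an assumption.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a 2D convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) for the six discriminator layers; padding is assumed.
layers = [(4, 2, 1)] * 5 + [(4, 1, 0)]
size = 128  # input: a 2 x 128 x 128 (image, mask) pair
sizes = []
for kernel, stride, padding in layers:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)
print(sizes)  # [64, 32, 16, 8, 4, 1]
```

The final \(1\times 1\) output, passed through the sigmoid, is the scalar probability that the pair is real.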
DCGAN training procedure
The training data were normalized to zero mean and unit variance to stabilize the DCGAN learning process. Each training sample was then rotated to \(19\) angles, \(\theta =\left[{0}^{\circ },{10}^{\circ },{20}^{\circ },\dots,{180}^{\circ }\right]\), and each rotated image was either kept as is or flipped horizontally, vertically, or both. As a result, the number of training samples was multiplied by a factor of \(19\times 4=76\).
The DCGAN has two well-known issues: mode collapse and vanishing gradients [24]. Mode collapse refers to the case in which too many values of the input noise are mapped to the same point in data space. This happens when the generator is overtrained relative to the discriminator. Vanishing gradients, by contrast, refer to the situation in which the discriminator becomes so successful at distinguishing real from synthetic images that no useful gradient is backpropagated to the generator. In this case, the generator cannot learn to produce synthetic images similar to the real ones. To address these concerns, the network parameters were first initialized from a Gaussian distribution with zero mean and variance of \(0.02\). The Adam optimizer [25] was used to learn the parameters of both the generator and discriminator networks; additional information is provided in the Appendix. Each iteration of the learning procedure included the following two steps:
First, a single optimization step was performed to update the discriminator. A batch of \(5\) real images and their corresponding segmentation masks was randomly selected from the training data and assigned label \(1\), since they are real samples. These pairs of real images and masks were passed through the discriminator network, and the gradient of the loss, i.e., the binary cross-entropy between predicted and true labels, was backpropagated to adjust the discriminator weights. Then, a batch of five noise samples was drawn from the standard normal distribution and passed through the generator network to create five synthetic pairs of images and masks. These pairs were labeled \(0\), since they are synthetic samples. This synthetic batch was then passed through the discriminator, and the gradient of the loss was backpropagated to fine-tune the discriminator weights.
Second, an additional optimization step was performed to update the generator. Each synthetic image and its corresponding segmentation mask from the previous step were labeled \(1\) to mislead the discriminator into treating the pair as real. These samples were then passed through the discriminator, and the gradient of the loss was backpropagated to adjust the generator weights.
In summary, in the first step the discriminator was fine-tuned while the generator was held fixed, and in the second step the generator was trained while the discriminator was held fixed. Training continued for \(40{,}000\) iterations, or until the model converged and an equilibrium between the generator and discriminator networks was established.
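The labeling scheme of the two steps can be summarized by the binary cross-entropy terms each network minimizes. The sketch below works only on discriminator output probabilities (batches of five, as above); computing gradients and applying the Adam updates would be handled by a deep-learning framework and is omitted. Summing the real and synthetic terms corresponds to accumulating the two backpropagated gradients before a single discriminator update, which is an assumption about how the two passes were combined.

```python
import numpy as np


def bce(pred, label, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and a label."""
    pred = np.clip(np.asarray(pred, dtype=np.float64), eps, 1.0 - eps)
    return float(np.mean(-(label * np.log(pred) + (1 - label) * np.log(1 - pred))))


def discriminator_loss(d_real, d_fake):
    """Step 1: real (image, mask) pairs are labeled 1, synthetic pairs 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake):
    """Step 2: synthetic pairs are relabeled 1 so the generator learns to fool D."""
    return bce(d_fake, 1.0)
```

When the discriminator outputs 0.5 for every pair, its loss is \(2\ln 2\) and the generator loss is \(\ln 2\); the generator loss falls as the discriminator is fooled into assigning higher probabilities to synthetic pairs.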
DCGAN postprocessing
The value of each pixel in a real mask is either \(1\) or \(0\), indicating whether or not the pixel belongs to the ventricle. Therefore, each pixel of a synthesized chamber mask was set to \(0\) when its value was less than \(0.5\) and to \(1\) otherwise. To avoid very small or very large masks, only synthetic samples whose ratio of mask area to total image area lay within a certain range were retained. For nearest-neighbor downsampling, the range was \(0.005\) to \(0.025\), while for bicubic downsampling it was \(0.02\) to \(0.05\). Finally, the connected components of each binary mask were located using the MATLAB (MathWorks, Natick, Massachusetts, USA) function bwconncomp. If a mask had more than one connected component and the ratio of the area of the largest component to that of the second largest was less than \(20\), the pair of image and mask was removed from the set of synthetically generated data.
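The three filtering rules can be sketched in Python. Instead of MATLAB's bwconncomp, the sketch labels components with a breadth-first search using 8-connectivity (bwconncomp's default for 2D images; SciPy's `scipy.ndimage.label` with a \(3\times 3\) structuring element is another option). Treating the range limits as inclusive is an assumption, and the function names are hypothetical.

```python
from collections import deque

import numpy as np


def connected_component_areas(mask):
    """Areas (in pixels) of 8-connected components, largest first."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    areas = []
    for r in range(h):
        for c in range(w):
            if mask[r, c] and not seen[r, c]:
                seen[r, c] = True
                queue, area = deque([(r, c)]), 0
                while queue:
                    y, x = queue.popleft()
                    area += 1
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and mask[ny, nx] and not seen[ny, nx]):
                                seen[ny, nx] = True
                                queue.append((ny, nx))
                areas.append(area)
    return sorted(areas, reverse=True)


def keep_synthetic_mask(raw_mask, area_range=(0.005, 0.025), min_ratio=20.0):
    """Binarize at 0.5, then apply the area-fraction and component-ratio filters.

    Returns the binary mask if the sample is kept, otherwise None.
    """
    mask = (raw_mask >= 0.5).astype(np.uint8)
    frac = mask.sum() / mask.size
    if not (area_range[0] <= frac <= area_range[1]):
        return None
    areas = connected_component_areas(mask)
    if len(areas) > 1 and areas[0] / areas[1] < min_ratio:
        return None
    return mask
```

The default `area_range` corresponds to nearest-neighbor downsampling; `area_range=(0.02, 0.05)` would be used for bicubic downsampling.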
Network training and testing
Fully convolutional networks using real dataset
For each chamber, one FCN was trained on the CMR images of the \(26\) training patients and their augmentations via geometric transformations. Each model was jointly trained on both end-diastolic (ED) and end-systolic (ES) images of its chamber. These networks are called LV-FCN and RV-FCN in the Results section.
Fully convolutional networks using synthetically augmented dataset
Two separate DCGAN models were designed for the LV and RV to further augment the training data. Each DCGAN was used to generate \(6{,}000\) pairs of synthetic images and corresponding segmentation masks. After the DCGAN postprocessing step, a set of \(2{,}500\) synthetic images out of the \(6{,}000\) generated pairs was used for each chamber. Each of the \(2{,}500\) selected images was then kept unchanged, flipped horizontally, flipped vertically, or rotated by each of \(4\) angles, \(\theta =\left[{45}^{\circ },{90}^{\circ },{135}^{\circ },{180}^{\circ }\right]\). Thus, \(2{,}500\times 7=17{,}500\) synthetic CMR images and their corresponding segmentation masks were generated for each ventricle. Finally, the synthetically augmented repertoire included the CMR images of the \(26\) training patients and their augmentations via geometric transformations, plus the \(17{,}500\) synthetic CMR images. Using this synthetically augmented dataset, another FCN was trained for each chamber, again jointly on ED and ES images. The networks trained on the synthetically augmented dataset (SAD) are called LV-FCN-SAD and RV-FCN-SAD in the Results section.
U-Net architecture
In addition to the network architecture described above, a traditional U-Net model was designed for comparison with our FCN. For this purpose, a customized U-Net architecture with an input size of \(128\times 128\) was used. The architecture of the U-Net model is shown in Fig. 3, and its code is available at https://github.com/karolzak/keras-unet. As with our FCN, for each chamber a network was trained on the training set of \(26\) patients and its augmentations via geometric transformations; these networks are referred to as LV-U-Net and RV-U-Net in the Results section. For each chamber, another network was trained on the same synthetically augmented dataset used for FCN-SAD; these networks are referred to as LV-U-Net-SAD and RV-U-Net-SAD. Each network was jointly trained on both ED and ES images of its chamber.
Commercially available segmentation software
The results of our models were compared with those of cvi42 (Circle Cardiovascular Imaging Inc.) on our test set of CMR images from \(38\) patients. All volumetric measures were calculated using OsiriX Lite software (Pixmeo, Bernex, Switzerland). To calculate volumes, the open-source plugin available at https://github.com/chrischute/numpy2roi was slightly modified to make its format consistent with our dataset. The segmented CMR images were converted into OsiriX .roi files using the modified plugin, and the resulting .roi files were imported into OsiriX Lite for volume calculation with its built-in 3D reconstruction algorithm.
Our method was developed in Python 2.7.12 and run on a workstation with an Intel Core i7-5930K CPU (3.50 GHz) and four NVIDIA GeForce GTX 980 Ti GPUs, running 64-bit Ubuntu.
Metrics for performance verification
Our results were compared head-to-head with those of U-Net and cvi42. Two classes of metrics were used to compare the performance of the cardiac chamber segmentation methods.
One class uses clinical indices, such as volumetric data, that are crucial for clinical decision making. These indices may not capture point-by-point geometric differences between automated and manually delineated segmentations.
The other class uses geometric metrics that quantify how close the automatic segmentation is to the ground truth. These include the average Dice metric, Jaccard index, Hausdorff distance (HD) and mean contour distance (MCD).
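The geometric metrics can be computed from a predicted mask and a ground-truth mask as sketched below. Contours are taken as foreground pixels with at least one 4-connected background neighbor, pixel spacing is assumed isotropic, and MCD is taken as the symmetric mean of the two directed mean distances. These are common conventions, but the exact definitions used for the reported values are not stated here.

```python
import numpy as np


def dice(a, b):
    """Dice metric: 2|A and B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())


def jaccard(a, b):
    """Jaccard index: |A and B| / |A or B|."""
    a, b = a.astype(bool), b.astype(bool)
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()


def boundary_points(mask):
    """Coordinates of foreground pixels with a 4-connected background neighbor."""
    m = np.pad(mask.astype(bool), 1)
    core = m[1:-1, 1:-1]
    interior = core & m[:-2, 1:-1] & m[2:, 1:-1] & m[1:-1, :-2] & m[1:-1, 2:]
    return np.argwhere(core & ~interior).astype(float)


def contour_distances(a, b, spacing_mm=1.0):
    """Hausdorff distance and mean contour distance between two mask contours."""
    pa = boundary_points(a) * spacing_mm
    pb = boundary_points(b) * spacing_mm
    # Pairwise Euclidean distances between the two contour point sets.
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    a_to_b, b_to_a = d.min(axis=1), d.min(axis=0)
    hd = max(a_to_b.max(), b_to_a.max())
    mcd = 0.5 * (a_to_b.mean() + b_to_a.mean())
    return hd, mcd
```

For two \(10\times 10\) squares offset by two columns, Dice is 0.8, Jaccard is 80/120, and the Hausdorff distance is 2 pixels.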
Generalizability to additional training and test subjects
To evaluate the generalizability of our framework to subjects not included in our dataset, our method was tested on the 2017 MICCAI Automated Cardiac Diagnosis Challenge (ACDC) dataset. The ACDC dataset includes \(100\) subjects: (i) healthy (\(n = 20\)); (ii) previous myocardial infarction (\(n = 20\)); (iii) dilated cardiomyopathy (\(n = 20\)); (iv) hypertrophic cardiomyopathy (\(n = 20\)); and (v) abnormal RV (\(n = 20\)). For a consistent image size, five subjects were removed, and the remaining \(95\) subjects were zero-padded to \(256\times 256\) and then downsampled to \(128\times 128\) using nearest-neighbor downsampling. Three subjects from each group (\(15\) in total) were randomly selected as training data, and the remaining \(80\) subjects were used as test data.
For each chamber, one FCN was trained on the combined CMR images of both training sets, i.e., \(26\) patients from our dataset and \(15\) subjects from the ACDC dataset, together with their augmentations via geometric transformations. For each chamber, another FCN was trained on this dataset further augmented with the previously generated synthetically segmented CMR images. Each model was jointly trained on both ED and ES images of its chamber. The first and second segmentation networks are referred to as FCN-2.0 and FCN-SAD-2.0, respectively. FCN-2.0 and FCN-SAD-2.0 were evaluated on the combined set of test subjects, i.e., \(38\) patients from our dataset and \(80\) subjects from the ACDC dataset.
Statistical methods
The paired Student’s t-test and the intraclass correlation coefficient (ICC) were used for statistical analysis of the predicted volumes. The p-value of the paired Student’s t-test quantifies the evidence against the null hypothesis that predicted and ground-truth volumes have the same mean. A p-value greater than \(0.05\) is considered as passing the test. The ICC describes how strongly measurements within the same group resemble each other. The intraclass correlation first proposed by Fisher et al. [26] was used; it focuses on the paired predicted and ground-truth measurements. The guidelines proposed by Koo and Li [27] were used to interpret the ICC values: (a) less than \(0.5\): poor; (b) between \(0.50\) and \(0.75\): moderate; (c) between \(0.75\) and \(0.90\): good; and (d) more than \(0.90\): excellent.
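Both statistics can be sketched in NumPy for paired predicted and ground-truth volumes. The sketch returns the paired t statistic and its degrees of freedom; the p-value would come from the t distribution, e.g., via `scipy.stats.ttest_rel`. The ICC follows the classic Fisher form for pairs, which uses a pooled mean and a pooled variance. The helper names are hypothetical, and the handling of values exactly at the band limits is an assumption.

```python
import numpy as np


def paired_t_statistic(pred, truth):
    """Paired-sample t statistic and degrees of freedom."""
    d = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t, n - 1


def fisher_icc(x1, x2):
    """Fisher's intraclass correlation for N pairs (pooled mean and variance)."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n = x1.size
    mean = (x1.sum() + x2.sum()) / (2 * n)
    var = (((x1 - mean) ** 2).sum() + ((x2 - mean) ** 2).sum()) / (2 * n)
    return ((x1 - mean) * (x2 - mean)).sum() / (n * var)


def koo_li_label(icc):
    """Koo and Li interpretation bands for ICC values."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"
```

Identical predicted and ground-truth volumes give an ICC of 1; differences of 1, 2 and 3 mL give \(t = 2\sqrt{3}\) with 2 degrees of freedom.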
Results
Characteristics of the cohort are reported first. Then, our synthetically generated CMR images and the corresponding automatically generated segmentation masks are presented. Next, performance metrics and clinical indices of our fully automatic method are reported relative to manual segmentation (ground truth). In addition, the same indices calculated for cvi42 and U-Net are presented for head-to-head comparison.
Characteristics of the cohort
Characteristics of the cohort are reported in Tables 1 and 2. All chamber volumes in these tables were calculated from the manual delineations.
Real and synthetically generated CMR images
A sample batch of real CMR images with their manually segmented LV masks is compared in Fig. 4 with a sample batch of synthetically generated CMR images and their corresponding automatically generated LV masks. A similar comparison is made for the RV in Fig. 5.
Segmentation performance
As mentioned in the Methods section, two downsampling methods, nearest-neighbor and bicubic, were used, and training and testing were performed independently for each. The results for both methods are reported below.
Segmentation performance for nearest-neighbor downsampling
The average Dice metric, Jaccard index, Hausdorff distance (HD), mean contour distance (MCD) and coefficient of determination \({R}_{vol}^{2}\) for FCN and FCN-SAD, computed against the ground truth, are reported in Table 3.
The Dice metrics for the FCN method were \(86.5\mathrm{\%}\), \(83.2\mathrm{\%}\), \(80.3\mathrm{\%}\) and \(74.7\mathrm{\%}\) for LVED, LVES, RVED and RVES, respectively. The corresponding Dice metrics for the FCN-SAD method were \(90.6\mathrm{\%}\), \(85.0\mathrm{\%}\), \(84.4\mathrm{\%}\) and \(79.2\mathrm{\%}\), respectively.
Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) are summarized in Table 4.
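These four quantities follow from the pixelwise confusion matrix of a predicted mask against its ground truth. The sketch below assumes they are counted over all pixels of the \(128\times 128\) image; the exact region of interest is not stated, and with whole-image counting, specificity and NPV are dominated by background pixels.

```python
import numpy as np


def pixel_confusion_metrics(pred, truth):
    """Pixelwise sensitivity, specificity, PPV and NPV of a binary prediction."""
    p, t = pred.astype(bool), truth.astype(bool)
    tp = np.sum(p & t)    # chamber pixels correctly predicted
    tn = np.sum(~p & ~t)  # background pixels correctly predicted
    fp = np.sum(p & ~t)   # background predicted as chamber
    fn = np.sum(~p & t)   # chamber predicted as background
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

For instance, 40 true positives, 10 false negatives, 5 false positives and 45 true negatives give a sensitivity of 0.8 and a specificity of 0.9.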
For both methods, the average absolute and average relative deviations of the automatically segmented volumes, stroke volumes and ejection fractions from the manually derived values are reported in Table 5. A smaller deviation indicates better agreement between automatically and manually derived contours.
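A minimal sketch of the two deviation measures follows. It assumes the relative deviation is the absolute difference divided by the manual value and averaged over subjects; whether the reported relative deviation is signed or unsigned is not stated, and the function name is hypothetical.

```python
import numpy as np


def volume_deviations(pred, truth):
    """Average absolute deviation (same units) and average relative deviation (%)."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    abs_dev = np.abs(pred - truth)
    return abs_dev.mean(), 100.0 * np.mean(abs_dev / truth)
```

For predicted volumes of 110 and 45 mL against manual volumes of 100 and 50 mL, the average absolute deviation is 7.5 mL and the average relative deviation is 10%.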
The ranges of LV end-diastolic volume (LVEDV), LV end-systolic volume (LVESV), LV stroke volume (LVSV) and LV ejection fraction (LVEF) for the \(38\) test subjects were \(10\) to \(202\,\mathrm{mL}\), \(4\) to \(91\,\mathrm{mL}\), \(6\) to \(128\,\mathrm{mL}\) and \(30\mathrm{\%}\) to \(75\mathrm{\%}\), respectively. The ranges of RV end-diastolic volume (RVEDV), RV end-systolic volume (RVESV), RV stroke volume (RVSV) and RV ejection fraction (RVEF) for the \(38\) test subjects were \(20\) to \(265\,\mathrm{mL}\), \(6\) to \(130\,\mathrm{mL}\), \(12\) to \(138\,\mathrm{mL}\) and \(32\mathrm{\%}\) to \(84\mathrm{\%}\), respectively.
The p-values of the paired-sample t-test of LVEDV, LVESV, RVEDV and RVESV, testing the null hypothesis that predicted and ground-truth volumes have identical expected values, are tabulated in Table 6. A p-value greater than \(0.05\) is considered as passing the t-test and is boldfaced in Table 6. The ICC values for the paired predicted and ground-truth values of LVEDV, LVESV, RVEDV and RVESV are also listed in Table 6; an ICC value greater than \(0.90\) indicates excellent agreement and is boldfaced.
Exemplary LV and RV segmentations at ES and ED are shown in Fig. 6. Red contours correspond to the ground truth (i.e., manual annotation), whereas green and yellow contours correspond to the delineations predicted by the FCN and FCN-SAD methods, respectively.
The correlation and Bland–Altman plots are shown in Figs. 7, 8, 9 and 10. The FCN-SAD results are depicted by blue dots. In Figs. 7 and 8, deviations of points from the line \(y=x\) reflect mismatch between prediction and ground truth. Bland–Altman diagrams are commonly used to evaluate agreement between clinical measures and to identify any systematic difference (e.g., fixed bias or outliers). The bias values of the FCN for LVEDV, LVESV, RVEDV and RVESV were \(3.9\,\mathrm{mL}\), \(3.0\,\mathrm{mL}\), \(8.9\,\mathrm{mL}\) and \(3.3\,\mathrm{mL}\), respectively, whereas those of the FCN-SAD were \(1.9\,\mathrm{mL}\), \(0.5\,\mathrm{mL}\), \(3.1\,\mathrm{mL}\) and \(0.8\,\mathrm{mL}\), respectively. The \(95\mathrm{\%}\) limits of agreement between automatic segmentation and ground truth, i.e., the mean difference \(\pm 1.96\) standard deviations of the difference, are shown as dashed lines.
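The bias and the dashed lines of a Bland–Altman plot can be computed as sketched below. The sign convention (automated minus manual) and the use of the sample standard deviation are assumptions; the function name is hypothetical.

```python
import numpy as np


def bland_altman(auto, manual):
    """Bias and 95% limits of agreement (bias +/- 1.96 SD of the differences)."""
    diff = np.asarray(auto, dtype=float) - np.asarray(manual, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A positive bias under this convention means the automated method overestimates volumes on average; the plot itself places the mean of each pair on the horizontal axis and the difference on the vertical axis.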
Segmentation performance for bicubic downsampling
The results for the bicubic downsampling method are reported in Table 7. The FCN-SAD method’s Dice metrics for LVED, LVES, RVED and RVES were \(91.0\mathrm{\%}\), \(86.8\mathrm{\%}\), \(84.7\mathrm{\%}\) and \(80.6\mathrm{\%}\), respectively. The FCN-SAD t-test p-values for LVED, LVES, RVED and RVES were \(0.27\), \(0.09\), \(0.08\), and \(0.66\), respectively. Thus, the FCN-SAD method passes the paired-sample t-test for both the LV and RV at both the ED and ES phases.
The correlation and Bland–Altman plots of ES and ED ventricular volumes, ejection fractions and stroke volumes for the bicubic downsampling method are depicted in Figs. 11, 12, 13 and 14.
Segmentation performance for cvi42
The Dice metrics for cvi42 were \(73.2\mathrm{\%}\), \(71.0\mathrm{\%}\), \(54.3\mathrm{\%}\) and \(53.7\mathrm{\%}\) for LVED, LVES, RVED and RVES, respectively. The corresponding sensitivity, specificity, PPV and NPV are summarized in Table 4. The absolute and relative deviations of the automatically segmented from the manually segmented LV and RV volumes at ED and ES, as well as of SV and EF, are summarized in the third column of Table 5.
The correlation and Bland–Altman plots for cvi42 are shown by green dots in Figs. 7, 8, 9 and 10. The bias values of the cvi42 for LVEDV, LVESV, RVEDV and RVESV were \(3.9\,\mathrm{mL}\), \(3.7\,\mathrm{mL}\), \(30.2\,\mathrm{mL}\) and \(8.1\,\mathrm{mL}\), respectively.
Segmentation performance for UNet with nearest-neighbor downsampling
Simulations were carried out on images downsampled with the nearest-neighbor method. The average Dice metric, Jaccard index, Hausdorff distance, mean contour distance and \({R}_{vol}^{2}\) for UNet and UNet-SAD, computed with respect to the ground truth, are reported in Table 3.
The Dice metrics for the UNet method were \(84.5\mathrm{\%}\), \(79.4\mathrm{\%}\), \(77.7\mathrm{\%}\) and \(71.3\mathrm{\%}\) for LVED, LVES, RVED and RVES, respectively. The corresponding Dice metrics for the UNet-SAD method were \(87.1\mathrm{\%}\), \(82.3\mathrm{\%}\), \(81.8\mathrm{\%}\) and \(74.8\mathrm{\%}\), respectively.
Sensitivity, specificity, PPV and NPV for UNet and UNet-SAD are summarized in Table 4.
The absolute and relative differences between predicted and ground-truth volumes for the LV and RV chambers at ED and ES, as well as SV and EF, are summarized in the last two columns of Table 5.
The correlation and Bland–Altman plots for UNet-SAD are shown by red dots in Figs. 7, 8, 9 and 10. The bias values of the UNet for LVEDV, LVESV, RVEDV and RVESV were \(7.2\,\mathrm{mL}\), \(4.2\,\mathrm{mL}\), \(8.4\,\mathrm{mL}\) and \(5.4\,\mathrm{mL}\), respectively. The corresponding bias values of UNet-SAD for LVEDV, LVESV, RVEDV and RVESV were \(3.6\,\mathrm{mL}\), \(2.8\,\mathrm{mL}\), \(7.0\,\mathrm{mL}\) and \(4.5\,\mathrm{mL}\), respectively.
Segmentation performance for UNet with bicubic downsampling
Using images downsampled with the bicubic method, the average Dice metric, Jaccard index, relative volume difference and \({R}_{vol}^{2}\) for UNet and UNet-SAD, calculated with respect to the ground truth, are reported in Table 7.
The Dice metrics for the UNet method were \(85.5\mathrm{\%}\), \(81.6\mathrm{\%}\), \(76.5\mathrm{\%}\) and \(70.2\mathrm{\%}\) for LVED, LVES, RVED and RVES, respectively. The corresponding Dice metrics for the UNet-SAD method were \(87.4\mathrm{\%}\), \(83.9\mathrm{\%}\), \(81.8\mathrm{\%}\) and \(74.8\mathrm{\%}\), respectively.
Segmentation performance for FCN2.0 and FCN-SAD2.0
Because HD, MCD and related metrics are undefined for an image without a ground-truth contour, CMR images with no ground-truth segmentation contours were removed from the test set. The average Dice metric, Jaccard index, Hausdorff distance and mean contour distance for FCN2.0 and FCN-SAD2.0 are reported in Table 8. The Dice metrics for FCN2.0 were \(86.7\mathrm{\%}\), \(82.8\mathrm{\%}\), \(80.8\mathrm{\%}\) and \(72.4\mathrm{\%}\) for LVED, LVES, RVED and RVES, respectively. The corresponding Dice metrics for FCN-SAD2.0 were \(91.3\mathrm{\%}\), \(86.7\mathrm{\%}\), \(84.5\mathrm{\%}\) and \(77.0\mathrm{\%}\) for LVED, LVES, RVED and RVES, respectively.
Discussion
Many challenges currently exist for segmenting cardiac chambers from CMR images, notably in pediatric and CHD patients [12, 28,29,30]. In the past few years, a great deal of research has addressed CMR segmentation with learning-based approaches [5,6,7,8]. Despite their relative success, these approaches still have certain limitations. Small training datasets introduce a large bias into the segmentation, which makes these methods unreliable when the heart shape lies outside the learning set (e.g., CHDs and post-surgically remodeled hearts). In brief, in pediatric cardiac imaging, learning-based methods remain computationally difficult and their predictive performance is less than optimal, owing to the complexity of parameter estimation and the fact that their convergence is not guaranteed [31].
While conventional deep-learning methods achieve good results for subjects with relatively normal cardiac structure, they are not as reliable for segmenting the CMR images of CHD patients [7, 8]. It is believed that the absence of large databases that include CMR studies from heterogeneous CHD subjects significantly limits the performance of these conventional models [32]. To address this shortcoming, our new method simultaneously generates synthetic CMR images and their corresponding segmentations. Our DCGAN-based FCN model was tested on a heterogeneous dataset of pediatric patients with complex CHDs.
Current software platforms designed for adult patients, such as cvi42 by Circle Cardiovascular Imaging Inc., were previously reported to have many shortcomings when used for pediatric or CHD applications. Children are not simply scaled-down adults: pediatric patient characteristics, such as cardiac anatomy, function, higher heart rates, degree of cooperation and smaller body size, all affect post-processing approaches to CMR, and there is currently no CMR segmentation tool dedicated to pediatric patients. Our major motivation for this study was that currently available clinical segmentation tools cannot be reliably used for children.
The LV and RV volumes computed by our automatic segmentation methods, the UNet model and cvi42 (version 5.10.1) were compared with the ground-truth volumes. As reported in Table 5, cvi42's volumes differed significantly from the true values of the volumetric measures, even though cvi42 makes its predictions from the original high-quality, high-resolution CMR images coming from the scanner. Synthetic data augmentation also improved volume prediction for the UNet. In addition, as shown in Table 5, the FCN-SAD method outperforms UNet-SAD for both chambers at end-systole and end-diastole. As reported in Table 7, our FCN-SAD passed the t-test for LVED, LVES, RVED and RVES, i.e., the null hypothesis that the predicted and ground-truth volumes have identical expected values was not rejected, whereas cvi42 passed the t-test only for LVED. Since p-values depend strongly on sample size, among other factors, the ICC values are also reported for all models in Table 6. Our FCN and FCN-SAD models achieved excellent ICC values for both LV and RV at ED and ES. UNet-SAD also yielded ICC values greater than \(0.90\); however, UNet failed to reach the excellent threshold for LVES, and all of cvi42's ICC values are below the excellent threshold as well. Although the exact deep-learning architecture of cvi42 is not known to us, in our opinion the main reason for its relatively poor performance on pediatric CHD patients is that its neural network was trained on the UK Biobank (as declared on the vendor's website), which is limited to adult CMR images. More precisely, the UK Biobank dataset does not represent features that are inherent to the hearts of children with CHD.
As indicated in Tables 3 and 4, our method outperforms cvi42 in Dice metric, Jaccard index, HD, MCD, volume correlation, sensitivity, specificity, PPV and NPV. For LV segmentation, FCN-SAD improved the Dice metric over cvi42 from \(73.2\mathrm{\%}\) to \(90.6\mathrm{\%}\) at end-diastole and from \(71.0\mathrm{\%}\) to \(85.0\mathrm{\%}\) at end-systole. A similar improvement was observed for RV segmentation, where the Dice metric improved from \(54.3\mathrm{\%}\) to \(84.4\mathrm{\%}\) at end-diastole and from \(53.7\mathrm{\%}\) to \(79.2\mathrm{\%}\) at end-systole. FCN-SAD also reduced the average Hausdorff and mean contour distances compared with cvi42, indicating better contour alignment for both LV and RV at ED and ES. A similar improvement was observed for FCN-SAD over UNet-SAD. For LV segmentation, FCN-SAD improved the Dice metric over UNet-SAD from \(87.1\mathrm{\%}\) to \(90.6\mathrm{\%}\) for ED and from \(82.3\mathrm{\%}\) to \(85.0\mathrm{\%}\) for ES. Similarly, FCN-SAD improved on UNet-SAD for RV segmentation, from \(81.8\mathrm{\%}\) to \(84.4\mathrm{\%}\) for ED and from \(74.8\mathrm{\%}\) to \(79.2\mathrm{\%}\) for ES. FCN-SAD also led to lower HD and MCD values than the UNet-SAD method.
Data augmentation using the DCGAN improved the Dice metric values of FCN-SAD by about \(3\mathrm{\%}\) compared with our FCN method. Improvements were observed for the Jaccard index, HD, MCD, volume correlation, sensitivity, specificity, PPV and NPV as well.
As shown in Table 3, synthetic data augmentation improved both the Dice and Jaccard indices by about \(3\mathrm{\%}\) for UNet, which suggests that synthetic data augmentation can improve the performance of FCN methods regardless of their specific architecture. Compared with the UNet method, UNet-SAD also showed similar improvements in both HD and MCD. Table 3 further reveals that our FCN method outperforms UNet and, likewise, that our FCN-SAD method outperforms UNet-SAD in all metrics for LVED, LVES, RVED and RVES.
Synthetic data augmentation also improved both the Dice and Jaccard indices by about \(4\mathrm{\%}\) for FCN2.0. Similar improvements were observed in FCN-SAD2.0 for both HD and MCD, indicating better alignment between predicted and manual segmentation contours.
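The Hausdorff and mean contour distances defined in the Appendix can be sketched for contours represented as lists of sampled boundary points. This brute-force version is an illustration under that assumption (contour extraction from masks and pixel spacing are not shown); it approximates \(d(a,\partial B)\) by the minimum distance to the sampled points of \(\partial B\), and it assumes a symmetric MCD in which each direction is weighted by one half.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def _point_to_contour(p: Point, contour: Sequence[Point]) -> float:
    # d(p, contour): minimum Euclidean distance from p to the sampled contour points.
    return min(math.dist(p, q) for q in contour)

def hausdorff_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    if not a or not b:
        raise ValueError("HD is undefined when a contour is empty")
    return max(max(_point_to_contour(p, b) for p in a),
               max(_point_to_contour(q, a) for q in b))

def mean_contour_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    if not a or not b:
        raise ValueError("MCD is undefined when a contour is empty")
    a_to_b = sum(_point_to_contour(p, b) for p in a) / len(a)
    b_to_a = sum(_point_to_contour(q, a) for q in b) / len(b)
    return 0.5 * (a_to_b + b_to_a)

# Hypothetical sampled contours (units: pixels).
true_contour = [(0.0, 0.0), (1.0, 0.0)]
pred_contour = [(0.0, 0.0), (2.0, 0.0)]
print(hausdorff_distance(true_contour, pred_contour))
print(mean_contour_distance(true_contour, pred_contour))
```

The empty-contour checks mirror why images without ground-truth contours have to be excluded before HD and MCD are reported; multiplying coordinates by the pixel spacing converts the results to millimeters.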
As expected, for all methods, RV segmentation proved to be more challenging than LV segmentation owing to the complex RV shape and anatomy. The intricate crescent shape of the RV, as well as the considerable variation among CHD subjects, makes it harder for the segmentation models to learn the mapping from a CMR image to its corresponding mask. Another major factor limiting RV segmentation performance is the similarity of the signal intensities of RV trabeculations and myocardium.
Our methodology addresses some of these limitations by learning the generative process underlying the RV chamber images and their segmentations. This information is then passed to the segmentation model via synthetic samples drawn from that generative process.
Consistent with the observation of Yu et al. [33], larger contours can be delineated more precisely than smaller ones. Segmentation of CMR slices near the apex, particularly at end-systole, is more challenging because of their small and irregular shape. Table 3 shows that both Dice and Jaccard indices are higher at ED than at ES for both ventricles. Another possible reason for the lower performance at ES is the small mask area, which makes the denominators in Eq. (3) small; as a result, even a few misclassified pixels can have a major effect on the final values of these metrics.
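The effect of a small denominator in Eq. (3) can be illustrated numerically. The sketch below represents masks as sets of foreground pixel coordinates (an assumption made for brevity; real masks would be image arrays) and removes the same four pixels from a large and a small square mask. The helper names are hypothetical.

```python
def dice_jaccard(true_mask: set, pred_mask: set) -> tuple:
    """Dice and Jaccard indices for masks given as sets of (row, col) pixels."""
    if not true_mask and not pred_mask:
        raise ValueError("both masks are empty")
    inter = len(true_mask & pred_mask)
    dice = 2 * inter / (len(true_mask) + len(pred_mask))
    jaccard = inter / len(true_mask | pred_mask)
    return dice, jaccard

def square_mask(side: int) -> set:
    return {(r, c) for r in range(side) for c in range(side)}

for side in (20, 4):  # e.g. a mid-ventricular slice vs. a small apical slice
    truth = square_mask(side)
    missed = set(sorted(truth)[:4])  # four false-negative pixels
    dice, jac = dice_jaccard(truth, truth - missed)
    print(f"{side}x{side} mask: Dice = {dice:.3f}, Jaccard = {jac:.3f}")
```

The same four missed pixels cost about 0.5 percentage points of Dice on the 400-pixel mask but about 14 points on the 16-pixel mask.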
Figures 7a and b show that the results generated by our FCN-SAD model lead to high correlation for LVEDV and LVESV. This in turn leads to high correlation in EF and SV, as shown in Figs. 8a and c and by the \({R}_{vol}^{2}\) values in Table 3. Similarly, high correlation was observed for RVEDV and RVESV in Figs. 7c and d, which subsequently leads to high correlation in EF and SV, as shown in Figs. 8b and d and by the \({R}_{vol}^{2}\) scores in Table 3. The Bland–Altman analyses in Figs. 9 and 10 show negligible bias for the FCN-SAD model trained on the synthetically augmented data. The Bland–Altman plots also show that the FCN-SAD method reduced the mean and standard deviation of the error in predicted volumes and narrowed the \(95\mathrm{\%}\) limits of agreement compared with the other methods.
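Stroke volume and ejection fraction are derived from the end-diastolic and end-systolic volumes, which is why well-correlated EDV and ESV predictions carry over to SV and EF. A minimal sketch using the standard definitions \(SV = EDV - ESV\) and \(EF = 100 \times SV / EDV\) (in percent); the function names and example volumes are hypothetical.

```python
def stroke_volume(edv_ml: float, esv_ml: float) -> float:
    """Stroke volume SV = EDV - ESV, in mL."""
    return edv_ml - esv_ml

def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Ejection fraction EF = 100 * (EDV - ESV) / EDV, in percent."""
    if edv_ml <= 0:
        raise ValueError("EDV must be positive")
    return 100.0 * stroke_volume(edv_ml, esv_ml) / edv_ml

# Hypothetical ground truth vs. a prediction that overestimates ESV by 5 mL.
print(ejection_fraction(120.0, 50.0))  # ground truth
print(ejection_fraction(120.0, 55.0))  # prediction
```

In this example a 5 mL error in ESV alone shifts EF by about 4 percentage points, so volume errors at ES propagate directly into EF.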
The average time to segment a typical image on our GPU-accelerated computing platform is \(10\,\mathrm{ms}\). Overall, our model takes \(0.1\,\mathrm{s}\) to process each patient's CMR data. Simulations show that even on a common CPU-based computing platform, our method requires about \(1.3\,\mathrm{s}\) to segment each patient's CMR images, which supports the clinical applicability of our automated segmentation model.
Similar quantitative and volumetric results were observed when the whole training and validation procedure was repeated with a different random split of subjects into training and test sets. This indicates that no noticeable bias was introduced by the way subjects were assigned to the training and test sets.
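Repeating the experiment with a different random split is simplest when whole subjects, rather than individual images, are assigned to the training or test set, so that slices from one patient never appear on both sides. The sketch below assumes hypothetical subject identifiers, a hypothetical 25% test fraction and hypothetical seeds; the actual split proportions are not stated in this section.

```python
import random

def split_subjects(subject_ids, test_fraction, seed):
    """Randomly partition unique subject IDs into (train_ids, test_ids)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    ids = sorted(set(subject_ids))  # sort first so the shuffle depends only on the seed
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]

subjects = [f"subj{i:02d}" for i in range(64)]  # 64 subjects, as in the dataset
train_a, test_a = split_subjects(subjects, 0.25, seed=1)
train_b, test_b = split_subjects(subjects, 0.25, seed=2)  # a second, different split
print(len(train_a), len(test_a))
```

Fixing the seed makes each split reproducible, while changing it yields the alternative split used to check for assignment bias.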
Finally, we emphasize the significance of the choice of downsampling method for segmentation performance. The entire process of training and testing was repeated using both nearest-neighbor and bicubic downsampling. Compared with nearest-neighbor downsampling, bicubic downsampling provides better performance for almost all studied models, except for the segmentation of the RVED using UNet and UNet-SAD. For example, with bicubic downsampling the FCN-SAD results passed the t-test (no significant difference between predicted and ground-truth volumes) for all chambers, whereas with nearest-neighbor downsampling FCN-SAD did not pass it for LVED. In our opinion, the main reason behind the superior performance of bicubic downsampling is the larger mask area it yields compared with the nearest-neighbor method.
Limitations
As a limitation, our method was applied only to CMR datasets of patients with two ventricles and has not yet been trained to analyze patients with a systemic RV. From the computer's perspective, CMR images of hearts with hypoplastic left heart syndrome are entirely different objects. Therefore, a new training algorithm is needed to analyze single-ventricle hearts. We are currently designing a new model for that purpose, which is beyond the scope of the present work. A second limitation of our method is that it must be calibrated before it can be applied to CMR images acquired on another scanner or from a cohort with different characteristics.
It should also be mentioned that we used the Fréchet Inception Distance (FID) to assess how closely the synthetic CMR images resemble the real ones. While the FID is commonly used, human judgment is still the best measure, although it is subjective and depends on the observer's experience. A statistically robust validation would require a large cohort of imaging physicians, which we aim to assemble in the near future.
We used OsiriX Lite software to calculate the volumes; however, OsiriX Lite may underestimate the volume if an image slice has no predicted segmentation because of its small chamber size. This was the case for the outliers at the bottom of Figs. 7c and d. Since our dataset did not include epicardial ground-truth contours, the cardiac mass was not calculated. Another limitation of this work is the lack of intra- and inter-observer variability assessments, since only one set of manual segmentations was available. Finally, the loss of resolution caused by downsampling was an unavoidable limitation, which required a trade-off among speed, model accuracy and data dimensionality.
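Why a slice without a predicted contour lowers the computed volume can be seen from a slice-summation (summation-of-disks) volume estimate. The sketch assumes this summation rule, hypothetical slice areas and a hypothetical 8 mm slice spacing; it is not a description of OsiriX Lite's internal algorithm, which this article does not detail.

```python
def disk_summation_volume_ml(slice_areas_mm2, slice_spacing_mm):
    """Volume = sum of segmented slice areas x slice spacing, converted to mL."""
    if slice_spacing_mm <= 0:
        raise ValueError("slice spacing must be positive")
    return sum(slice_areas_mm2) * slice_spacing_mm / 1000.0  # 1 mL = 1000 mm^3

# Hypothetical short-axis areas from base to apex (mm^2).
areas = [1500.0, 1400.0, 1200.0, 900.0, 400.0, 120.0]
full = disk_summation_volume_ml(areas, 8.0)
# Same stack, but the small apical slice received no predicted segmentation.
missing_apex = disk_summation_volume_ml(areas[:-1] + [0.0], 8.0)
print(full, missing_apex)
```

Under this rule, every slice without a prediction removes its whole contribution (area times spacing), so the error is always an underestimate.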
Conclusions
Manual segmentation is subjective, less reproducible, time consuming and requires dedicated experts. Therefore, fully automated and accurate segmentation methods are desirable to provide precise and reproducible clinical indices, such as ventricular ejection fraction and chamber volume, in a clinically actionable timeframe. Our learning-based framework provides an automated, fast and accurate model for LV and RV segmentation, and its strong performance in children with complex CHDs suggests its potential for clinical use across the pediatric age group.
Unlike many existing automated approaches, our framework makes no assumptions about the image or the structure of the heart, and performs the segmentation by learning features of the image at different levels of abstraction in the hierarchy of network layers. To improve the robustness and accuracy of our segmentation method, a novel generative adversarial network is introduced to enlarge the training data with synthetically generated, realistic-looking samples. The new technique is also applicable to other FCN methods (e.g., UNet) and can improve FCN performance independently of the specific architecture. The FCN trained on both real and synthetic data exhibits improvements in various statistical and clinical measures, such as Dice metric, HD and volume, over the existing machine learning methods.
Availability of data and materials
The CMR datasets analyzed during the current study are available from the public repository at https://github.com/saeedkarimi/Cardiac_MRI_Segmentation
Abbreviations
 2D:

Two-dimensional
 ACDC:

Automated Cardiac Diagnosis Challenge
 AI:

Artificial intelligence
 bSSFP:

Balanced steady-state free precession
 CAA:

Coronary artery anomaly
 CHD:

Congenital heart disease
 CMR:

Cardiovascular magnetic resonance
 CNN:

Convolutional neural network
 CPU:

Central processing unit
 DCGAN:

Deep convolutional generative adversarial network
 DORV:

Double outlet right ventricle
 ED:

End-diastole
 EF:

Ejection fraction
 ES:

End-systole
 FCN:

Fully convolutional network
 FN:

False negative
 FP:

False positive
 GAN:

Generative adversarial network
 GPU:

Graphics processing unit
 HD:

Hausdorff distance
 ICC:

Intraclass correlation coefficient
 LV:

Left ventricle/left ventricular
 LVEDV:

Left ventricular end-diastolic volume
 LVEF:

Left ventricular ejection fraction
 LVESV:

Left ventricular end-systolic volume
 LVSV:

Left ventricular stroke volume
 MCD:

Mean contour distance
 NPV:

Negative predictive value
 PPV:

Positive predictive value
 ReLU:

Rectified linear unit
 RV:

Right ventricle/right ventricular
 RVEDV:

Right ventricular end-diastolic volume
 RVEF:

Right ventricular ejection fraction
 RVESV:

Right ventricular end-systolic volume
 RVSV:

Right ventricular stroke volume
 SAD:

Synthetically augmented dataset
 SGD:

Stochastic gradient descent
 SV:

Stroke volume
 TGA:

Transposition of the great arteries
 TN:

True negative
 TOF:

Tetralogy of Fallot
 TP:

True positive
References
Best KE, Rankin J. Long-term survival of individuals born with congenital heart disease: a systematic review and meta-analysis. J Am Heart Assoc. 2016;5(6):002846.
Oster ME, Lee KA, Honein MA, Riehle-Colarusso T, Shin M, Correa A. Temporal trends in survival among infants with critical congenital heart defects. Pediatrics. 2013;131(5):1502–8.
Yuan C, Kerwin WS, Ferguson MS, Polissar N, Zhang S, Cai J, Hatsukami TS. Contrast-enhanced high resolution MRI for atherosclerotic carotid artery tissue characterization. J Magn Reson Imaging. 2002;15(1):62–7.
Lima JA, Desai MY. Cardiovascular magnetic resonance imaging: current and emerging applications. J Am Coll Cardiol. 2004;44(6):1164–71.
Avendi M, Kheradvar A, Jafarkhani H. A combined deep-learning and deformable-model approach to fully automatic segmentation of the left ventricle in cardiac MRI. Med Image Anal. 2016;30:108–19.
Avendi MR, Kheradvar A, Jafarkhani H. Automatic segmentation of the right ventricle from cardiac MRI using a learning-based approach. Magn Reson Med. 2017;78(6):2439–48.
Bai W, Sinclair M, Tarroni G, Oktay O, Rajchl M, Vaillant G, Lee AM, Aung N, Lukaschuk E, Sanghvi MM, et al. Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. J Cardiovasc Magn Reson. 2018;20(1):65.
Backhaus SJ, Staab W, Steinmetz M, Ritter CO, Lotz J, Hasenfuß G, Schuster A, Kowallick JT. Fully automated quantification of biventricular volumes and function in cardiovascular magnetic resonance: applicability to clinical routine settings. J Cardiovasc Magn Reson. 2019;21(1):24.
Arafati A, Hu P, Finn JP, Rickers C, Cheng AL, Jafarkhani H, Kheradvar A. Artificial intelligence in pediatric and adult congenital cardiac MRI: an unmet clinical need. Cardiovasc Diagn Ther. 2019;9(Suppl 2):310.
The 2015 Kaggle Second Annual Data Science Bowl. http://www.kaggle.com/c/second-annual-data-science-bowl. 2015.
Petersen SE, Matthews PM, Francis JM, Robson MD, Zemrak F, Boubertakh R, Young AA, Hudson S, Weale P, Garratt S, et al. UK Biobank's cardiovascular magnetic resonance protocol. J Cardiovasc Magn Reson. 2015;18(1):8.
Petitjean C, Dacher JN. A review of segmentation methods in short axis cardiac MR images. Med Image Anal. 2011;15(2):169–84.
Petitjean C, Zuluaga MA, Bai W, Dacher JN, Grosgeorge D, Caudron J, Ruan S, Ayed IB, Cardoso MJ, Chen HC, et al. Right ventricle segmentation from cardiac MRI: a collation study. Med Image Anal. 2015;19(1):187–202.
Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, van der Laak JA, van Ginneken B, Sanchez CI. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60–88.
Kazeminia S, Baur C, Kuijper A, van Ginneken B, Navab N, Albarqouni S, Mukhopadhyay A. GANs for medical image analysis. arXiv:1809.06222. https://arxiv.org/abs/1809.06222
Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434. https://arxiv.org/abs/1511.06434
Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Cham: Springer; 2015. p. 234–41.
Tran PV. A fully convolutional neural network for cardiac segmentation in short-axis MRI. arXiv:1604.00494. https://arxiv.org/abs/1604.00494
Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2015. p. 3431–40.
Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems; 2012. p. 1097–105.
Roth HR, Lu L, Liu J, Yao J, Seff A, Cherry K, Kim L, Summers RM. Improving computer-aided detection using convolutional neural networks and random view aggregation. IEEE Trans Med Imaging. 2015;35(5):1170–81.
Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics; 2010. p. 249–56.
Nesterov Y. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). Doklady AN USSR. 1983;269:543–7.
Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X. Improved techniques for training GANs. In: Advances in Neural Information Processing Systems; 2016. p. 2234–42.
Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv:1412.6980. https://arxiv.org/abs/1412.6980
Fisher RA. Statistical methods for research workers. 11th ed. (revised); 1950.
Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15(2):155–63.
Tavakoli V, Amini AA. A survey of shaped-based registration and segmentation techniques for cardiac images. Comput Vis Image Underst. 2013;117(9):966–89.
Queirós S, Barbosa D, Heyde B, Morais P, Vilaça JL, Friboulet D, Bernard O, D'hooge J. Fast automatic myocardial segmentation in 4D cine CMR datasets. Med Image Anal. 2014;18(7):1115–31.
Hajiaghayi M, Groves EM, Jafarkhani H, Kheradvar A. A 3D active contour method for automated segmentation of the left ventricle from magnetic resonance images. IEEE Trans Biomed Eng. 2017;64(1):134–44.
Dreijer JF, Herbst BM, Du Preez JA. Left ventricular segmentation from MRI datasets with edge modelling conditional random fields. BMC Med Imaging. 2013;13(1):24.
Snaauw G, Gong D, Maicas G, van den Hengel A, Niessen WJ, Verjans J, Carneiro G. End-to-end diagnosis and segmentation learning from cardiac magnetic resonance imaging. In: Proceedings of the IEEE 16th International Symposium on Biomedical Imaging; 2019. p. 802–5.
Yu L, Yang X, Qin J, Heng PA. 3D FractalNet: dense volumetric segmentation for cardiovascular MRI volumes. In: Reconstruction, Segmentation, and Analysis of Medical Images. Cham: Springer; 2016. p. 103–10.
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Advances in Neural Information Processing Systems; 2014. p. 2672–80.
Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in Neural Information Processing Systems; 2017. p. 6626–37.
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2015. p. 1–9.
Acknowledgements
The authors wish to thank the staff at the Children’s Hospital Los Angeles.
Funding
This work was supported in part by American Heart Association Grant # 19AIML35180067.
Author information
Contributions
SK and HJ conceived and designed the artificial intelligence methodology. SK implemented the method and performed the assessment of the automated segmentation. SK, AK and HJ wrote the manuscript. AA contributed to data processing and performance assessment, and prepared Figs. 1, 2 and 3. AC performed data acquisition and manual image annotation. YW calculated the volumes using OsiriX Lite software and generated the correlation and Bland–Altman plots. AK and HJ contributed equally to defining the project and guiding the studies and are both corresponding authors. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
The study was reviewed by the Children's Hospital Los Angeles Institutional Review Board and was granted an exemption per 45 CFR 46.104(d)(4)(iii) (secondary research for which consent is not required).
Consent for publication
Not applicable.
Competing interests
SK, AK and HJ are co-inventors on a pending patent related to the scope of the present work. The other authors declare no competing interests.
Appendix: Technical details
Fully convolutional network architecture
All convolution layers used a kernel size of \(3\) and a stride of \(1\) pixel, with the hyperbolic tangent function (Tanh) as their activation function. The input to each convolution layer was padded such that the output retains the same spatial size as the input. To avoid overfitting, \({l}_{2}\) regularization was applied to constrain the layer parameters during optimization; to avoid underfitting, a small regularization coefficient of \(0.0005\) was selected. These penalties were applied on a per-layer basis and incorporated into the loss function that the network optimizes during training. Each convolution layer's output was normalized to zero mean and unit variance, which allows the model to focus on structural similarities and dissimilarities rather than on amplitude-driven ones.
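The per-layer \({l}_{2}\) penalty and the zero-mean, unit-variance normalization described above can be sketched in plain Python. This is an illustration only: it assumes the penalty takes the form coefficient times the sum of squared weights (as Keras' `tf.keras.regularizers.l2` does), treats one layer output as a flat list of activations, and adds a small hypothetical `eps` for numerical stability; the article does not specify these details.

```python
import math

L2_COEFF = 0.0005  # regularization coefficient stated in the text

def l2_penalty(layer_weights, coeff=L2_COEFF):
    """Sum over layers of coeff * sum(w**2); added to the data loss."""
    return sum(coeff * sum(w * w for w in weights) for weights in layer_weights)

def total_loss(data_loss, layer_weights):
    return data_loss + l2_penalty(layer_weights)

def standardize(activations, eps=1e-5):
    """Rescale one layer's output to zero mean and (approximately) unit variance."""
    n = len(activations)
    mean = sum(activations) / n
    var = sum((a - mean) ** 2 for a in activations) / n
    return [(a - mean) / math.sqrt(var + eps) for a in activations]

weights = [[0.2, -0.1, 0.4], [0.3, 0.05]]  # hypothetical weights of two layers
print(total_loss(0.35, weights))
print(standardize([1.0, 2.0, 3.0, 4.0]))
```

Because the coefficient is small, the penalty only nudges the weights toward zero, which is the underfitting trade-off the text describes.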
The FCN model in Fig. 1 includes approximately \(11\) million parameters. Given our relatively small CMR image dataset of \(527\) (\(570\)) left (right) ventricle images, the network is prone to overfitting. Therefore, in addition to \({l}_{2}\) regularization, three dropout layers, each randomly setting \(50\mathrm{\%}\) of its input units to \(0\) at each update during training, were applied after the last convolution layers with \(128\), \(256\) and \(512\) filters.
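The dropout step can be illustrated with a minimal "inverted dropout" sketch: during training each unit is zeroed with probability 0.5 and the surviving units are scaled by \(1/(1-\mathrm{rate})\), so the expected activation is unchanged, while at inference the input passes through untouched. The rescaling convention is an assumption (it is what Keras' `Dropout` layer does); the text only states that 50% of the units are set to 0.

```python
import random

def dropout(inputs, rate=0.5, training=True, rng=None):
    """Inverted dropout on a list of activations."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(inputs)  # identity at inference time
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    return [x / keep_prob if rng.random() >= rate else 0.0 for x in inputs]

out = dropout([1.0] * 10, rate=0.5, rng=random.Random(0))
print(out)  # each entry is either 0.0 or 2.0
```

Scaling at training time means no change is needed at inference, which keeps the deployed network identical to the trained one apart from the dropout layers becoming pass-throughs.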
Deep convolutional generative adversarial networks
The adversarial modeling framework comprises two components, commonly referred to as the generator and the discriminator. The generator is represented by a differentiable function \(G\), which maps an input noise variable \(z\sim {p}_{Z}\left(z\right)\) to a point \(x=G\left(z\right)\) in the data space. The generator competes against an adversary, the discriminator, which strives to distinguish between real samples drawn from the genuine CMR data and synthetic samples created by the generator. If the discriminator is represented by a differentiable mapping \(D\), then \(D\left(x\right)\) is a single scalar representing the probability that \(x\) comes from the data rather than from the generator. The discriminator is trained to maximize the probability of assigning the correct label to both real and synthetic samples, while the generator is simultaneously trained to synthesize samples that the discriminator classifies as real with high probability. More precisely, the discriminator is trained to maximize \(D\left(x\right)\) when \(x\) is drawn from the data distribution \({p}_{\mathrm{data}}\), while the generator is trained to maximize \(D\left(G\left(z\right)\right)\), or equivalently to minimize \(1-D\left(G\left(z\right)\right)\). Hence, adversarial networks are based on a zero-sum non-cooperative game, i.e., a two-player minimax game in which the generator and discriminator are trained by optimizing the following objective function [34]:

\[\min_{G}\max_{D}V\left(D,G\right)={\mathbb{E}}_{x\sim {p}_{\mathrm{data}}\left(x\right)}\left[\log D\left(x\right)\right]+{\mathbb{E}}_{z\sim {p}_{Z}\left(z\right)}\left[\log \left(1-D\left(G\left(z\right)\right)\right)\right] \tag{1}\]

where \({\mathbb{E}}\left[\cdot\right]\) denotes expectation. The adversarial model converges when the generator and discriminator reach a Nash equilibrium, which is the optimal point of the objective function in Eq. (1). Since \(G\) and \(D\) strive to undermine each other, a Nash equilibrium is achieved when the generator recovers the underlying data distribution and the output of \(D\) equals \(\frac{1}{2}\) everywhere, i.e., the discriminator can no longer distinguish between real and synthetic data. The optimal generator and discriminator at the Nash equilibrium are denoted by \({G}^{*}\) and \({D}^{*}\), respectively. New data samples are generated by feeding random noise samples to the optimal generator \({G}^{*}\).
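The value function in Eq. (1) can be estimated from discriminator outputs on a batch of real and synthetic samples by replacing each expectation with a sample mean. The plain-Python sketch below (hypothetical function name) checks the equilibrium statement above: when \(D\) outputs \(\frac{1}{2}\) everywhere, the estimate equals \(-\log 4\). Maximizing this estimate over \(D\) is equivalent to minimizing the binary cross-entropy with label 1 for real and 0 for synthetic samples, the loss mentioned under DCGAN optimization.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from Eq. (1).

    d_real: discriminator outputs D(x) for real samples x ~ p_data
    d_fake: discriminator outputs D(G(z)) for synthetic samples, z ~ p_Z
    All outputs must lie strictly between 0 and 1.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# At the Nash equilibrium the discriminator outputs 1/2 for every input.
print(gan_value([0.5] * 8, [0.5] * 8), -math.log(4))
# A discriminator that separates real from synthetic attains a larger value.
print(gan_value([0.9, 0.8], [0.1, 0.2]))
```

The generator's goal is the opposite: it tries to push the second term down by making \(D(G(z))\) large.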
DCGAN optimization
The learning rate and the parameters \({\beta }_{1}\) and \({\beta }_{2}\) of the Adam optimizer were set to \(0.0002\), \(0.5\) and \(0.999\), respectively. The binary cross-entropy between the target and the output was minimized. Since Adam, like any other gradient-based optimizer, is a local optimization method, only a local Nash equilibrium can be established between the generator and discriminator. A common method to quantify the quality of the generated synthetic samples is the FID, originally proposed by Heusel et al. [35]. In FID, features of both real and synthetic data are extracted via a specific layer of the Inception v3 model [36]. These features are then modeled as multivariate Gaussians, and the estimated mean and covariance parameters are used to calculate the distance as [35]:

\[\mathrm{FID}={\left\Vert {\mu }_{r}-{\mu }_{s}\right\Vert }_{2}^{2}+\mathrm{Tr}\left({\Sigma }_{r}+{\Sigma }_{s}-2{\left({\Sigma }_{r}{\Sigma }_{s}\right)}^{1/2}\right) \tag{2}\]

where \(\left({\mu }_{s},{\Sigma }_{s}\right)\) and \(\left({\mu }_{r},{\Sigma }_{r}\right)\) are the mean and covariance of the features extracted from the synthetic and real data, respectively. Lower FID values indicate better image quality and diversity of the set of synthetic samples.
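The full FID of Heusel et al. [35] needs the matrix square root of \({\Sigma }_{r}{\Sigma }_{s}\) over Inception v3 features (for example, via `scipy.linalg.sqrtm`). To keep the sketch dependency-free, the code below assumes diagonal covariance matrices, for which the square root can be taken elementwise; this simplification is ours and is not how the standard FID is computed. The feature vectors are hypothetical stand-ins for Inception activations.

```python
import math
import statistics

def gaussian_stats(features):
    """Per-dimension mean and sample variance of a list of feature vectors."""
    columns = list(zip(*features))
    return ([statistics.fmean(c) for c in columns],
            [statistics.variance(c) for c in columns])

def fid_diagonal(mu_r, var_r, mu_s, var_s):
    """FID assuming Sigma_r = diag(var_r) and Sigma_s = diag(var_s).

    For diagonal matrices (Sigma_r Sigma_s)^(1/2) = diag(sqrt(var_r * var_s)), so
    Tr(Sigma_r + Sigma_s - 2 (Sigma_r Sigma_s)^(1/2)) = sum((sqrt(vr) - sqrt(vs))**2).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_s))
    cov_term = sum((math.sqrt(vr) - math.sqrt(vs)) ** 2 for vr, vs in zip(var_r, var_s))
    return mean_term + cov_term

# Hypothetical 2-D features standing in for Inception v3 activations.
real = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
synthetic = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(fid_diagonal(*gaussian_stats(real), *gaussian_stats(synthetic)))
```

The mean term penalizes a shifted synthetic distribution, while the covariance term penalizes a synthetic set that is too uniform or too spread out, which is why FID reflects both quality and diversity.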
Once the locally optimal generator was obtained, various randomly selected subsets of the generated synthetic images were considered, and the subset with the lowest FID to the set of real samples was chosen.
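The subset-selection step can be sketched as a random search: draw several random subsets of the synthetic pool, score each against the real set, and keep the lowest-scoring one. To stay self-contained, the sketch scores subsets with the one-dimensional special case of FID, \({\left({\mu }_{r}-{\mu }_{s}\right)}^{2}+{\left({\sigma }_{r}-{\sigma }_{s}\right)}^{2}\), on a single hypothetical scalar feature per image; the subset size, number of trials and seeds are likewise hypothetical.

```python
import math
import random
import statistics

def fid_1d(real, synth):
    """Univariate FID: (mu_r - mu_s)^2 + (sigma_r - sigma_s)^2."""
    mu_r, mu_s = statistics.fmean(real), statistics.fmean(synth)
    sd_r, sd_s = statistics.pstdev(real), statistics.pstdev(synth)
    return (mu_r - mu_s) ** 2 + (sd_r - sd_s) ** 2

def select_synthetic_subset(real_feats, synth_feats, subset_size, n_trials, seed=0):
    """Return (indices, score) of the random subset with the lowest FID to the real set."""
    rng = random.Random(seed)
    best_idx, best_score = None, math.inf
    for _ in range(n_trials):
        idx = rng.sample(range(len(synth_feats)), subset_size)
        score = fid_1d(real_feats, [synth_feats[i] for i in idx])
        if score < best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score

gen = random.Random(42)
real_feats = [gen.gauss(0.0, 1.0) for _ in range(200)]
synth_pool = [gen.gauss(0.3, 1.2) for _ in range(1000)]
idx, score = select_synthetic_subset(real_feats, synth_pool, subset_size=200, n_trials=50)
print(len(idx), round(score, 4))
```

More trials can only lower (never raise) the best score found, at the cost of more FID evaluations.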
Metrics definition
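The Dice metric and Jaccard index, defined in Eq. (3), are two measures of contour overlap ranging from zero to one, where a higher value indicates a better match between the predicted and true contours:

\[\mathrm{Dice}=\frac{2\left|A\cap B\right|}{\left|A\right|+\left|B\right|},\qquad \mathrm{Jaccard}=\frac{\left|A\cap B\right|}{\left|A\cup B\right|} \tag{3}\]

where \(A\) and \(B\) are the true and predicted segmentations, respectively. The Hausdorff and mean contour distances are two other standard measures that quantify how far apart the predicted and ground-truth contours are. These metrics are defined as:

\[\mathrm{HD}=\max\left\{\max_{a\in \partial A}d\left(a,\partial B\right),\ \max_{b\in \partial B}d\left(b,\partial A\right)\right\},\qquad \mathrm{MCD}=\frac{1}{2\left|\partial A\right|}\sum_{a\in \partial A}d\left(a,\partial B\right)+\frac{1}{2\left|\partial B\right|}\sum_{b\in \partial B}d\left(b,\partial A\right) \tag{4}\]

where \(\partial A\) and \(\partial B\) denote the contours of the segmentations \(A\) and \(B\), respectively, and \(d\left(a,\partial B\right)\) is the minimum Euclidean distance from point \(a\) to contour \(\partial B\). Lower values of these metrics indicate better agreement between automated and manual segmentation. The ICC for paired data values \(\left({x}_{i},{x}_{i}^{\prime}\right)\), for \(i=1,\cdots ,N\), originally proposed in [26], is defined as:

\[\mathrm{ICC}=\frac{1}{N{s}^{2}}\sum_{i=1}^{N}\left({x}_{i}-\bar{x}\right)\left({x}_{i}^{\prime}-\bar{x}\right) \tag{5}\]

where

\[\bar{x}=\frac{1}{2N}\sum_{i=1}^{N}\left({x}_{i}+{x}_{i}^{\prime}\right),\qquad {s}^{2}=\frac{1}{2N}\left\{\sum_{i=1}^{N}{\left({x}_{i}-\bar{x}\right)}^{2}+\sum_{i=1}^{N}{\left({x}_{i}^{\prime}-\bar{x}\right)}^{2}\right\} \tag{6}\]

ICC is a descriptive statistic that quantifies how strongly samples in the same group resemble each other.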
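Fisher's paired-data ICC [26] can be computed directly from the pooled mean and pooled variance of both members of each pair. The sketch below follows Fisher's original form; whether the study used exactly this variant or one of the later ICC forms discussed by Koo and Li [27] is not stated here, so treat it as illustrative. The function name and values are hypothetical.

```python
def fisher_icc(x, x_prime):
    """Fisher's intraclass correlation for paired values (x_i, x'_i), i = 1..N."""
    n = len(x)
    if n == 0 or n != len(x_prime):
        raise ValueError("need two non-empty sequences of equal length")
    mean = (sum(x) + sum(x_prime)) / (2 * n)  # pooled mean over both members
    s2 = (sum((a - mean) ** 2 for a in x) +
          sum((b - mean) ** 2 for b in x_prime)) / (2 * n)  # pooled variance
    if s2 == 0:
        raise ValueError("all values identical; ICC undefined")
    return sum((a - mean) * (b - mean) for a, b in zip(x, x_prime)) / (n * s2)

print(fisher_icc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect agreement
print(fisher_icc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # constant offset of 1
```

Unlike the Pearson correlation, which is 1 for both pairs above, this ICC drops to about 0.45 under a constant offset, so it penalizes systematic bias between automated and manual volumes.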
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Karimi-Bidhendi, S., Arafati, A., Cheng, A.L. et al. Fully‑automated deep‑learning segmentation of pediatric cardiovascular magnetic resonance of patients with complex congenital heart diseases. J Cardiovasc Magn Reson 22, 80 (2020). https://doi.org/10.1186/s12968-020-00678-0