Evaluation of current algorithms for segmentation of scar tissue from late Gadolinium enhancement cardiovascular magnetic resonance of the left atrium: an open-access grand challenge
- Rashed Karim1,
- R James Housden1,
- Mayuragoban Balasubramaniam1,
- Zhong Chen1,
- Daniel Perry2,
- Ayesha Uddin1,
- Yosra Al-Beyatti1,
- Ebrahim Palkhi1,
- Prince Acheampong1,
- Samantha Obom1,
- Anja Hennemuth8,
- YingLi Lu7,
- Wenjia Bai4,
- Wenzhe Shi4,
- Yi Gao6,
- Heinz-Otto Peitgen8,
- Perry Radau7,
- Reza Razavi1,
- Allen Tannenbaum5,
- Daniel Rueckert4,
- Josh Cates2,
- Tobias Schaeffter1,
- Dana Peters3, 9,
- Rob MacLeod2 and
- Kawal Rhode1
© Karim et al.; licensee BioMed Central Ltd. 2013
Received: 12 August 2013
Accepted: 10 December 2013
Published: 20 December 2013
Late Gadolinium enhancement (LGE) cardiovascular magnetic resonance (CMR) imaging can be used to visualise regions of fibrosis and scarring in the left atrium (LA) myocardium. This can be important for treatment stratification of patients with atrial fibrillation (AF) and for assessment of treatment after radio frequency catheter ablation (RFCA). In this paper we present a standardised evaluation benchmarking framework for algorithms segmenting fibrosis and scar from LGE CMR images. The algorithms reported are the response to an open challenge that was put to the medical imaging community through an ISBI (IEEE International Symposium on Biomedical Imaging) workshop.
The image database consisted of 60 multicenter, multivendor LGE CMR image datasets from patients with AF, with 30 images taken before and 30 after RFCA for the treatment of AF. A reference standard for scar and fibrosis was established by merging manual segmentations from three observers. Furthermore, scar was also quantified using 2, 3 and 4 standard deviations (SD) and full-width-at-half-maximum (FWHM) methods. Seven institutions responded to the challenge: Imperial College (IC), Mevis Fraunhofer (MV), Sunnybrook Health Sciences (SY), Harvard/Boston University (HB), Yale School of Medicine (YL), King’s College London (KCL) and Utah CARMA (UTA, UTB). There were 8 different algorithms evaluated in this study.
Some algorithms were able to perform significantly better than SD and FWHM methods in both pre- and post-ablation imaging. Segmentation in pre-ablation images was challenging and good correlation with the reference standard was found in post-ablation images. Overlap scores (out of 100) with the reference standard were as follows: Pre: IC = 37, MV = 22, SY = 17, YL = 48, KCL = 30, UTA = 42, UTB = 45; Post: IC = 76, MV = 85, SY = 73, HB = 76, YL = 84, KCL = 78, UTA = 78, UTB = 72.
The study concludes that currently no algorithm is deemed clearly better than others. There is scope for further algorithmic developments in LA fibrosis and scar quantification from LGE CMR images. Benchmarking of future scar segmentation algorithms is thus important. The proposed benchmarking framework is made available as open-source and new participants can evaluate their algorithms via a web-based interface.
Keywords: Late gadolinium enhancement, Cardiovascular magnetic resonance, Atrial fibrillation, Segmentation, Algorithm benchmarking
In the past decade, there has been a rapid development of analysis tools in medical imaging. In contrast, their translation to the clinical environment has remained limited. A major contributing factor for this failure is the lack of proper validation strategies. Even though algorithms are tested extensively in-house following development, it is often not clear how they perform relative to other state-of-the-art algorithms. The main reason for this is that they are not compared using the same set of data. Differences in the evaluated datasets (i.e. patient type, image quality and resolution) make a fair comparison difficult.
Benchmarking of algorithms is thus a very important activity as we move from bench to bedside in the medical image processing community. In the last few years, several conferences in the medical image analysis field have provided a platform to benchmark algorithms from multiple research groups. These challenges have been organised to invite participants to test their algorithms on common data. The participants are given a number of training datasets and then asked to complete analysis of a number of unseen data within an allotted time. Following submission, the algorithms’ results are evaluated in a unified manner.
In the past few years, a number of collaborating research groups have set up publicly available evaluation frameworks for the medical image processing and analysis community. Most of them have been initiated through an organised challenge, and an index of past challenges can be found at http://www.grand-challenge.org/. In the cardiac imaging domain, some recent challenges include cardiac motion tracking and coronary artery stenosis detection.
Motivation for left atrial fibrosis/scar segmentation challenge
There is a great interest in understanding the mechanisms of the causes of atrial fibrillation (AF) and of pulmonary vein (PV) reconnection following ablation procedures. Late Gadolinium enhancement (LGE) cardiovascular magnetic resonance (CMR) imaging plays an important role in the management of AF. Recent work has demonstrated its use in assessment of atrial fibrosis before ablation and of atrial injury after ablation[4–8].
Segmentation of fibrosis or scar in LGE CMR is challenging due to multiple causes including the thin LA wall, contrast variation due to inversion time, signal-to-noise ratio, motion blurring and artefacts. The inversion time choice can generate the appearance of more or less scar, and change the appropriate scar threshold. Motion blurring also reduces the appearance of scar. There are also artefacts which appear in the image due to respiratory compensation, selectively reducing the ability to visualise scar in the right PVs. There is also the complex geometry of the LA, resulting in some transverse slices where a very small section of the anatomy is visible, particularly for left and right superior PVs. There are also many regularly enhancing structures, such as the aortic wall, the valves and the oesophagus, which must be distinguished from LA enhancement.
As CMR plays an increasingly important role in the quantification of pre-ablation fibrosis and post-ablation scar, development of reliable algorithms that remove observer bias is key for clinically useful quantification. To our knowledge, there is no standardised evaluation framework or methodology to evaluate the performance of existing or newly developed LGE CMR segmentation algorithms.
State-of-the-art for cardiac fibrosis/scar segmentation
Overview of previously published scar detection, quantification and segmentation methods
Kim et al.
Infarct size, ex-vivo
Amado et al.
Bland altman, Infarct volume
Kolipaka et al.
Percentage scar, Bland-Altman
Positano et al.
Yan et al.
Schmidt et al.
Hennemuth et al.
Percentage scar, Bland-Altman
Oakes et al.
Detsky et al.
Tao et al.
Knowles et al.
Lu et al.
Infarct size and Bland-Altman
Other approaches exist to compute the threshold automatically or apply clustering[14, 16], or with Graph-cuts. Visualization of infarcted regions with maximum intensity projections (MIP) is also possible which is useful for visualising the amount of scarring on the LA surface. For detection of pre-ablation fibrosis, a global threshold for the image can be computed and adjusting it on a slice-by-slice basis provides good detection.
All of the existing methods reviewed except for and detect scar in the ventricle myocardium. Segmenting scar in the atrium poses different challenges especially from nearby enhancing structures such as aortic wall and valves. The atrial myocardium is of smaller thickness compared to ventricular myocardium and this adds to the difficulty of the problem. It is also important to understand that using a fixed model (SD and FWHM) is not suitable for the atrium and in our opinion also for the ventricle despite several studies utilising this. The reasons are clear: a fixed model cannot handle all the different variabilities encountered and these are both from the varied internal (size, distribution and heterogeneity of scar) and varied external (resolution, image noise, inversion time, surface coil intensity variation) situations. And there is at least one study supporting this fact - in where it was shown that the threshold had to be re-adjusted on various slices to obtain a suitable segmentation.
Proposed evaluation framework
In this paper we present an evaluation framework, accessible via a web-based interface, for algorithms that segment LA fibrosis or scar from both pre- and post-ablation LGE CMR images. The presented results were submitted as a response to the open challenge that was put to the medical imaging community through the cDEMRIS (Cardiac Delayed Enhancement Segmentation Challenge) workshop organised as part of the ISBI 2012 (IEEE International Symposium on Biomedical Imaging) annual meeting. Each participant quantified the amount of fibrosis or scar in high-resolution 3D LGE CMR of 30 pre- and 30 post-ablation patients. In total, 7 institutions responded to the challenge, and segmentation results from 8 different algorithms were submitted. The datasets used in this evaluation are publicly available via the challenge website: http://www.isd.kcl.ac.uk/cdemris/.
The proposed evaluation framework aims to provide a platform for testing and comparing newly devised algorithms through a web-based interface. With 3 out of the 8 algorithms evaluated in this work already published in literature[5, 15, 18], the framework provides a valuable test-bed.
Data acquisition database
Image acquisition: image acquisition parameters for the challenge LGE data
Siemens Avanto 1.5T or Vario 3T
Philips Achieva 1.5T
Philips Achieva 1.5 T
Free-breathing (FB) with navigator-gating
FB and navigator-gating with fat suppression
FB with navigator-gating with fat suppression
TI † , TR, TE
300 ms, 5.4 ms, 2.3 ms
280 ms, 5.3 ms, 2.1 ms
280 ms, 5.3 ms, 2.1 ms
1.25 × 1.25 × 2.5mm
1.4 × 1.4 × 1.4mm
1.3 × 1.3 × 4.0mm
< 7 days
< 7 days
< 48 hours
3 - 6 months
= 30 days
3 - 6 months
A brief summary of algorithms that were evaluated on the proposed framework
IC: Bai et al.
30 pre and post
Euclidean distance - 3 mm
Fixed sigmoid models derived from empirical data
MV: Hennemuth et al.
Region-growing with EM-fitting
30 pre and post
Euclidean distance - 3 mm
Post ablation imaging
SY: Lu et al.
MRF model with graph-cuts
20 pre and post
Dilation - 4 mm
Fuzzy membership - improved delineation
Post-processing for small cluster removal
HB: Gao et al.
Active contour and EM-fitting
Active contour (snake)
Accurate myocardial segmentation
Fixed number of gaussian mixtures in model (i.e. two)
YL: Peters et al.
15 pre and post
Accurate segmentation on both pre- and post.
KCL: Karim et al.
MRF model with graph-cuts
30 pre and post
Post-processing steps necessary
UTA: Cates et al.
Histogram analysis and simple thresholding
30 pre and post
Accurate segmentation on pre and post.
UTB: Perry et al.
30 pre and post
Equivalent variance across all clusters - LA scar variance more variable
Algorithm 1: Imperial College - hysteresis thresholding (IC)
Hysteresis thresholding was used in this work to segment scar. It is a well-known approach in image processing and computer vision. It is an improvement over regular thresholding where a major drawback is the absence of coherence in the final segmentation. Hysteresis thresholding overcomes this because faint sections of atrial scar can also be segmented as long as they are adjacent to some salient sections.
where $c_d$ and $h_d$ are parameters of the sigmoid function and $d(x)$ is the Euclidean distance from the LA endocardium. The joint probability of the intensity and distance likelihoods, i.e. $p(x) = p_d(x) \cdot p_i(x)$, was used to generate a probabilistic map. Using hysteresis thresholding, pixels above the higher threshold limit were classified as foreground. Those above the lower threshold limit and connected to foreground were also classified as foreground. This was accomplished by exploring each foreground pixel’s neighbourhood, which ensured coherence in the segmented result.
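The two-threshold rule described above can be sketched in a few lines. This is an illustration only, not the IC implementation: the function name `hysteresis_threshold`, the 4-connected neighbourhood, the 2D grid and the example threshold values are all assumptions made for the sketch.

```python
from collections import deque

def hysteresis_threshold(prob, low, high):
    """Return a boolean mask: pixels above `high`, plus pixels above `low`
    that are connected (4-neighbourhood, an assumption) to such pixels."""
    rows, cols = len(prob), len(prob[0])
    mask = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the foreground with every pixel above the higher threshold.
    for r in range(rows):
        for c in range(cols):
            if prob[r][c] > high:
                mask[r][c] = True
                queue.append((r, c))
    # Grow the foreground into neighbours that exceed the lower threshold.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not mask[nr][nc] and prob[nr][nc] > low):
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask

# Hypothetical probability map: one salient pixel (0.9), one faint neighbour
# (0.5) and one faint but isolated pixel (0.5).
prob_map = [
    [0.1, 0.5, 0.9, 0.1],
    [0.1, 0.1, 0.1, 0.1],
    [0.5, 0.1, 0.1, 0.1],
]
scar = hysteresis_threshold(prob_map, low=0.4, high=0.8)
```

In this toy map the faint pixel next to the salient one is kept, whereas the isolated faint pixel is rejected, which is the coherence property that plain thresholding lacks.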
Algorithm 2: Mevis - Region growing with mixture model fitting (MV)
Region growing is an important segmentation technique for finding groups of connected pixels with intensity homogeneity. It was implemented in this work with thresholds selected both for region-growing and seed selection using Gaussian mixture models.
where $I_t$ is the intensity at the intersection of the blood (B) and LGE mixture components. It is expected that, at this intersection, LGE intensities start contributing more than blood intensities. Region growing was constrained within a 6 mm band around the endocardial segmentation, allowing 1 mm inside and 5 mm outside the endocardial surface. This allowed for any errors in the endocardial contour.
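To make the notion of an intersection intensity concrete, the following sketch solves for the point between two weighted Gaussian components at which their densities are equal. It is not the MV code: the function names and the example mixture parameters are hypothetical, and the sketch assumes the two components have distinct means.

```python
import math

def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def mixture_intersection(w_b, mean_b, sd_b, w_lge, mean_lge, sd_lge):
    """Intensity between the two means where w_b*N(mean_b, sd_b) equals
    w_lge*N(mean_lge, sd_lge). Equating the log-densities gives a quadratic
    a*x^2 + b*x + c = 0 in the intensity x."""
    a = 1.0 / (2 * sd_lge ** 2) - 1.0 / (2 * sd_b ** 2)
    b = mean_b / sd_b ** 2 - mean_lge / sd_lge ** 2
    c = (mean_lge ** 2 / (2 * sd_lge ** 2) - mean_b ** 2 / (2 * sd_b ** 2)
         + math.log((w_b * sd_lge) / (w_lge * sd_b)))
    if abs(a) < 1e-12:
        roots = [-c / b]  # equal variances: the equation is linear
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            raise ValueError("components do not intersect")
        roots = [(-b + sign * math.sqrt(disc)) / (2 * a) for sign in (1, -1)]
    lo, hi = sorted((mean_b, mean_lge))
    between = [r for r in roots if lo <= r <= hi]
    if not between:
        raise ValueError("no intersection between the two means")
    return between[0]

# Hypothetical blood and LGE components (arbitrary intensity units).
i_t = mixture_intersection(0.7, 100.0, 15.0, 0.3, 180.0, 25.0)
```

Above `i_t` the LGE component contributes more than the blood component, so a value of this kind is a natural candidate for a region-growing threshold.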
Algorithm 3: Sunnybrook - Graph-cuts with fuzzy c-means clustering (SY)
The proposed technique uses graph-cuts and a modified version of this algorithm is published in. In mathematics, a graph is a network of nodes connected by links. Each link can be assigned a weight. An image contains pixels, each of which can be represented with a node. Adjacent pixels or nodes can then be interconnected with links. This allows an image to be modelled as a graph. Numerous problems have been proposed and solved on graphs, for example shortest path through two nodes or partitioning the graph into two node sets.
For the task of binary image segmentation, pixels are grouped or partitioned into two disjoint sets. Similarly, graph-cuts is an approach for partitioning a graph into two or more sub-graphs with some imposed constraints. Two special nodes called source and sink nodes are assigned, with each node in the graph linked to them. These nodes represent labels of the segmentation (i.e. foreground and background). Each link to the source and sink is weighted based on the probability of the node for the label. A minimum cut through the graph can then be computed, partitioning it into two sets of nodes. Each set is connected to either the source or the sink. This essentially assigns a label to each pixel. The minimum cut and maximum flow are dual problems, both investigated thoroughly in mathematics[20, 21] and computer vision[22, 23].
where d(x,y) is Euclidean distance between pixels x and y and β is a penalty co-efficient fixed at 5 in this work. This value was chosen to increase the relative importance of high gradient between pixels of different classes, refer to for further details.
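The source/sink construction can be illustrated on a toy one-dimensional "image" of four pixels. This sketch is not the SY algorithm: it uses a plain Edmonds-Karp maximum flow, integer link weights chosen by hand (scar probabilities in percent and a constant neighbour weight of 30), and hypothetical node names.

```python
from collections import deque

def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max-flow; returns the nodes still reachable from the
    source in the residual graph, i.e. the source side of a minimum cut."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edges

    def bfs_parents():
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    while True:
        parents = bfs_parents()
        if sink not in parents:
            return set(parents)
        # Find the bottleneck along the augmenting path, then push flow.
        flow, v = float("inf"), sink
        while parents[v] is not None:
            flow = min(flow, residual[parents[v]][v])
            v = parents[v]
        v = sink
        while parents[v] is not None:
            u = parents[v]
            residual[u][v] -= flow
            residual[v][u] += flow
            v = u

# Terminal links: source -> pixel weighted by scar probability (in percent),
# pixel -> sink weighted by healthy probability. Values are illustrative.
scar_prob = [90, 60, 40, 10]
capacity = {"source": {}}
for i, p in enumerate(scar_prob):
    capacity["source"][f"p{i}"] = p
    capacity.setdefault(f"p{i}", {})["sink"] = 100 - p
# Neighbour links penalise label changes between adjacent pixels.
for i in range(len(scar_prob) - 1):
    capacity[f"p{i}"][f"p{i + 1}"] = 30
    capacity[f"p{i + 1}"][f"p{i}"] = 30

scar_pixels = min_cut_source_side(capacity, "source", "sink") - {"source"}
```

With these weights the cheapest cut separates the first two pixels (labelled scar) from the last two (labelled healthy), at a total cost of 130.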
Algorithm 4: Harvard/Boston University - Active contours and mixture model fitting (HB)
Two techniques are implemented in this work, namely active contour and the Expectation-Maximization (EM) algorithm. A brief background is given here on each technique. Further details can be found in.
Active contours were used in this technique to obtain the epicardial boundary. They counteract the issue of region leaking in region growing by imposing constraints on the growing region. An initial contour was modelled with a spline (i.e. a free-form curve), allowing it to grow flexibly with additional constraints placed by the image. An energy function captured these constraints and the final shape of the contour was obtained through energy minimisation.
The expectation-maximization (EM) algorithm is a technique for estimating model parameters given the observed data. The observed data in this submission are the distributions of atrial wall image intensities and the model is a statistical Gaussian mixture model. The EM algorithm computes the best estimate of model parameters for which the observed data are most likely. It alternates between the E-step which computes the expectation of the likelihood of observed data using a present estimate of model parameters and the M-step that re-computes model parameters by maximising the likelihood found in the E-step.
where the intensity gradients $\nabla I(x)$ are smoothed using a Gaussian filter $G_\sigma$. This evolves the deformable surface governed by $E(S)$ and restricts it with a combination of distance from endocardium (i.e. maximum 3 mm) and intensity gradient. The evolution must stop at the epicardial border where an intensity change is expected.
Following segmentation of the atrial wall, scar is classified from healthy tissue by modelling the distribution of intensities within the atrial wall as a mixture of two Gaussians. The two Gaussians in the mixture represent scar and healthy tissue. The mean and standard deviation of each Gaussian in the mixture model are determined using the EM algorithm.
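The alternation between the E-step and the M-step for a two-component mixture can be sketched as follows. This is a generic one-dimensional EM fit, not the HB implementation: the initialisation, the fixed iteration count and the synthetic intensities (drawn from two Gaussians with a fixed random seed) are assumptions made for the example.

```python
import math
import random

def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def fit_two_gaussians(values, iterations=50):
    """EM fit of a two-component 1D Gaussian mixture.
    Returns (weights, means, sds), lower-mean component first."""
    lo, hi = min(values), max(values)
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]  # assumed initial guess
    sds = [max((hi - lo) / 4.0, 1e-3)] * 2
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in values:
            p = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and SD from the posteriors.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)
    return weights, means, sds

# Synthetic wall intensities: 200 "healthy" and 60 "scar" samples.
rng = random.Random(0)
wall = [rng.gauss(100, 10) for _ in range(200)] + [rng.gauss(180, 15) for _ in range(60)]
weights, means, sds = fit_two_gaussians(wall)
```

A voxel would then be labelled as scar when the posterior of the higher-mean component exceeds that of the lower-mean component.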
Algorithm 5: Yale - Threshold selection with manual wall delineation (YL)
Simple thresholding is a fundamental technique in image segmentation. Thresholding is used in this work to segment scar from both pre-ablation and post-ablation images. The main disadvantage of thresholding is that only intensity information is considered and the relationships between pixels are not taken into account. Thus, there is no guarantee that the pixels identified by thresholding are contiguous.
Algorithm 6: KCL - Graph-cuts with EM-algorithm (KCL)
A background of the techniques used in this work is described above in Sections 'Algorithm 3: Sunnybrook - Graph-cuts with fuzzy c-means clustering (SY)’ (Graph-cuts) and 'Algorithm 4: Harvard/Boston University - Active contours and mixture model fitting (HB)’ (EM-algorithm). More details can be found in.
Scar was segmented both in pre- and post-ablation images using the graph-cut algorithm. A statistical distribution model of scar tissue in both pre- and post-ablation images was developed prior to segmentation. This distribution model was derived from a training set of images. As a training set was not provided as part of the challenge, the leave-one-out approach was used for training with 29/30 images for training and 1/30 for testing. The training distribution model is a Gaussian distribution of the scar intensities in the training image represented as a ratio of scar to average blood-pool. Scar was segmented manually by an experienced observer.
The intensity distribution model for non-scar or healthy tissue was obtained from the target or unseen image. A Gaussian mixture was used for this distribution model. The number of mixtures in the model was kept variable (1 to 5) depending on the configuration which best fits the image. The standard EM-algorithm computed mean and variance for each mixture. Only a region 3 mm inside and outside the LA endocardium was used for the EM fit, discarding the rest of the image. This also became the search space for scar.
Pixels within the search space were modelled as a graph network with paths to source and sink nodes (i.e. scar and healthy tissue labels). The path to the scar tissue label was assigned a probability value from the scar training distribution model and the path to the healthy tissue label was assigned a probability value from the non-scar distribution model. Paths between adjacent pixels were assigned a probability value based on intensity homogeneity, with a low probability value for dissimilar intensities. All of the above is captured with an energy function which is the standard graph-cut functional and is equivalent to Eq. 7.
Algorithm 7: Utah A - Threshold selection with manual wall delineation (UTA)
The method was primarily implemented for pre-ablation fibrosis. However, in this challenge, its results on post-ablation data were also submitted. Thresholding is used in this work and is described above in Section 'Background’. The method is also described in detail in.
The atrial wall myocardium is delineated prior to scar segmentation. An experienced observer delineated the wall in every slice. Using the intensity histogram of pixels within the delineated wall, a threshold for scar was calculated. It is expected that the histogram is bi-modal with modes for enhancement and non-enhancement intensities. The threshold was then computed as 2 to 4 standard deviations above the mean of the lower mode of the histogram. This threshold was adjusted for every slice based on whether the algorithm was over- or under-estimating scar.
Algorithm 8: Utah B - Unsupervised learning using k-means clustering (UTB)
The method uses k-means clustering, which is a machine learning approach used to partition pixels into groups or clusters. It is an unsupervised learning technique requiring no prior knowledge or training data. In k-means clustering, the number of clusters is specified in advance. It is an iterative process where, at each iteration, the centre of each cluster is updated and the membership of each point to a cluster is updated based on a pre-defined distance/error metric in the feature space.
The technique was primarily implemented for post-ablation scar. However, in this challenge, its results on pre-ablation data were also submitted. There were two important considerations for the implementation of k-means: 1) the number of clusters in the k-means algorithm and 2) the feature vector for comparing pixels. Prior to segmentation, the optimal number of clusters and feature vectors were determined through empirical evaluation. The number of clusters was varied between 3 and 10, and image features such as normalised voxel intensity, the Sobel filter and the 14 texture metrics proposed by Haralick et al. were tested. The optimal number of clusters was found to be 4 with normalised voxel intensity as the feature vector. Following k-means clustering, the cluster with the highest mean intensity was assigned as the scar cluster.
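The clustering step with four clusters on normalised intensity, followed by selection of the brightest cluster, can be sketched as below. This is a plain one-dimensional k-means written for illustration, not the UTB code: the deterministic initialisation and the example intensities are assumptions.

```python
def kmeans_1d(values, k, iterations=100):
    """Basic k-means on scalar features. Returns (labels, centres)."""
    lo, hi = min(values), max(values)
    # Assumed initialisation: centres spread evenly over the intensity range.
    centres = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centre.
        labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centres.append(sum(members) / len(members) if members else centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres

# Hypothetical normalised voxel intensities within the atrial wall.
intensities = [0.10, 0.12, 0.11, 0.30, 0.32, 0.31, 0.55, 0.57, 0.90, 0.92, 0.95]
labels, centres = kmeans_1d(intensities, k=4)
scar_cluster = max(range(4), key=lambda j: centres[j])  # highest mean intensity
scar_values = [v for v, lab in zip(intensities, labels) if lab == scar_cluster]
```

Here the three brightest voxels end up in the scar cluster; the remaining clusters absorb the darker intensity groups.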
Reference standard 1: pseudo-ground truth
In order to obtain a reference standard for scar, volumetric segmentations of scars were obtained from three separate observers. These observers have substantial experience looking at scars in LGE CMR images for both pre- and post-ablation images. The observers were from different centres. They were blinded to the image scanner manufacturers and also to the results of the challenge. Scars in the images were segmented as follows: 1) each axial slice in the LGE CMR image was analyzed separately. Segmentation of the LA endocardial body was loaded as an overlay; 2) pixels enhanced along the endocardial border were labelled as scar; and 3) segmentations were also corrected in coronal and sagittal slices, wherever necessary.
Although the observers were provided with the same guidelines, their segmentations differed in some instances, especially in images with a low contrast enhancement ratio. It was thus important to merge the segmentations and obtain a consensus. This was achieved by merging segmentations using the STAPLE algorithm described in. For each voxel, a probability estimate for the true segmentation was computed. The consensus segmentation was then obtained by thresholding this probability at 0.7 (70%). This is referred to in the rest of the text as the pseudo-ground truth.
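For readers unfamiliar with STAPLE, the sketch below shows the commonly described binary form of the algorithm: an EM procedure that alternates between estimating a per-voxel probability of true scar and re-estimating each rater's sensitivity and specificity. It is a simplified illustration and not the code used to build the pseudo-ground truth: the initial sensitivity and specificity of 0.9, the fixed global prior, the fixed iteration count and the toy rater data are all assumptions.

```python
def staple_binary(decisions, iterations=20, init=0.9, eps=1e-6):
    """decisions[j][i] is 1 if rater j marked voxel i as scar, else 0.
    Returns the estimated probability of true scar for each voxel."""
    n_raters, n_voxels = len(decisions), len(decisions[0])
    prior = sum(map(sum, decisions)) / (n_raters * n_voxels)  # assumed fixed prior
    sens = [init] * n_raters  # per-rater sensitivity
    spec = [init] * n_raters  # per-rater specificity
    weights = [prior] * n_voxels
    for _ in range(iterations):
        # E-step: probability that each voxel is truly scar.
        for i in range(n_voxels):
            a, b = prior, 1.0 - prior
            for j in range(n_raters):
                if decisions[j][i]:
                    a *= sens[j]
                    b *= 1.0 - spec[j]
                else:
                    a *= 1.0 - sens[j]
                    b *= spec[j]
            weights[i] = a / (a + b)
        # M-step: update each rater's performance parameters.
        total_w = sum(weights)
        total_not_w = n_voxels - total_w
        for j in range(n_raters):
            tp = sum(w for w, d in zip(weights, decisions[j]) if d)
            tn = sum(1.0 - w for w, d in zip(weights, decisions[j]) if not d)
            sens[j] = min(max(tp / total_w, eps), 1.0 - eps)
            spec[j] = min(max(tn / total_not_w, eps), 1.0 - eps)
    return weights

# Three hypothetical raters labelling eight voxels; the third disagrees twice.
raters = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
]
probabilities = staple_binary(raters)
consensus = [p > 0.7 for p in probabilities]  # 0.7 cut-off as in the text
```

In this toy case the rater who disagrees with the other two receives lower estimated sensitivity and specificity, so the consensus follows the two agreeing raters.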
Reference standard 2: n-SD and FWHM
The optimal method for quantifying scar from LGE CMR images remains unclear. However, certain methods have been adopted for obtaining scar using a fixed model. In these fixed models, the signal intensity of normal myocardium is measured and a certain number of SD from this measured intensity is used as the threshold. Although in this threshold was set to 2-SD, recently it was shown that FWHM was far more reproducible and reliable than 2-SD. Other cut-offs are also used: 3, 4, 5 or 6-SD. The FWHM technique, which uses half the maximal signal within a hyper-enhanced region in scar, is currently being advocated as the most reproducible technique for ventricular myocardial scar.
In order to gauge each challenger’s methodology against fixed-model quantification methods, the LGE CMR images were segmented using 2, 4, 6-SD and FWHM methods. For each method, a segmentation of atrial myocardium was necessary and this was approximated by dilating the endocardial wall by 3 mm. For the n-SD methods, an expert observer located a region of voxels in the atrial myocardium that was healthy. The mean and SD of this region were calculated. Voxels in the atrial myocardium with intensity more than 2, 4 or 6 SD above this mean were labelled as scar. For the FWHM method, an experienced observer identified an enhanced region within the atrial myocardium. The threshold was then set to 50% of the maximum intensity in this selected region. In some rare instances, the 50% cut-off was adjusted to 60% or 70% when a 50% cut-off was too low for the image.
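The two fixed-model thresholds reduce to simple arithmetic on observer-selected regions, as the following sketch shows. The function names and intensity values are hypothetical, and the use of the population SD (rather than the sample SD) is an assumption, since the text does not specify which was used.

```python
from statistics import mean, pstdev

def n_sd_threshold(healthy_values, n):
    """Mean of the healthy region plus n standard deviations."""
    return mean(healthy_values) + n * pstdev(healthy_values)

def fwhm_threshold(enhanced_values, fraction=0.5):
    """Fraction (50% by default) of the maximum intensity in the enhanced region."""
    return fraction * max(enhanced_values)

def count_scar(wall_values, threshold):
    return sum(v > threshold for v in wall_values)

# Hypothetical intensities from observer-selected regions and the dilated wall.
healthy = [98, 102, 100, 96, 104]
enhanced = [150, 210, 240]
wall = [99, 107, 113, 118, 125, 200]

scar_counts = {f"{n}-SD": count_scar(wall, n_sd_threshold(healthy, n)) for n in (2, 4, 6)}
scar_counts["FWHM"] = count_scar(wall, fwhm_threshold(enhanced))
```

Even in this small example the amount of tissue labelled as scar depends strongly on the chosen cut-off, which is the sensitivity to model choice that the fixed-model comparison is meant to expose.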
- 1. Regional metric: The Dice similarity coefficient was used as a regional metric. It measures the proportion of true positives in the segmentation: (11)
- 2. Surface-based metric: It is common to visualise segmentations of scar on the LA surface. This is usually possible with a MIP. The LA surface can be constructed as an iso-surface from a volumetric binary segmentation using the marching cubes algorithm. The scar segmentation is projected onto this surface using MIP and each surface mesh vertex attains a label (1 = scar, 0 = not scar). The surface-based metric measures the root-mean-squared error (RMSE) of the distances between vertex points labelled as scar in the algorithm’s output and those labelled as scar in the ground truth. The RMSE is given by: (12)
- 3. Volumetric-based metric: The total volume error between the challenger’s segmentation and the pseudo-ground truth was found: (13)
where $V_T$ is the volume of scar in the segmentation and $V_G$ is the volume of scar in the consensus segmentation.
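The regional and volumetric metrics can be sketched as follows. The Dice coefficient is written in its standard form, 2|A ∩ B| / (|A| + |B|), and scaled to a score out of 100. The volume error is computed here as an absolute difference, which is an assumption because the exact form of the equation is not reproduced above. The voxel coordinates and voxel size are illustrative.

```python
def dice_score(segmentation, reference):
    """Dice overlap (0 to 100) between two sets of voxel coordinates."""
    seg, ref = set(segmentation), set(reference)
    if not seg and not ref:
        return 100.0
    return 100.0 * 2 * len(seg & ref) / (len(seg) + len(ref))

def volume_error_ml(segmentation, reference, voxel_volume_mm3):
    """Absolute scar volume difference |V_T - V_G| in millilitres (assumed form)."""
    return abs(len(set(segmentation)) - len(set(reference))) * voxel_volume_mm3 / 1000.0

# Toy example: voxels are (x, y, z) indices along one row.
reference = {(x, 0, 0) for x in range(0, 4)}  # consensus scar voxels
algorithm = {(x, 0, 0) for x in range(1, 6)}  # algorithm output
voxel_mm3 = 1.25 * 1.25 * 2.5                 # example voxel size

dice = dice_score(algorithm, reference)
delta_v = volume_error_ml(algorithm, reference, voxel_mm3)
```

Three of the voxels overlap, giving a Dice score of about 66.7, while the volume error reflects only the one-voxel difference in size. This shows why the two measures are best read together.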
Acquisition artifacts and non-scar related enhancement are common in atrial LGE CMR scans. Unless these enhancements are explicitly modelled into the technique, it is challenging to distinguish them. Two sources of non-scar related enhancements commonly seen in atrial LGE CMR images are: 1) the navigator beam artifact often seen near the right PVs, and 2) Gadolinium uptake by the aortic wall and valves. To test whether the methods are able to handle un-related enhancements, each challenger’s segmentations were evaluated separately in these regions. An experienced observer selected regions containing navigator artefacts and aortic wall enhancements. The percentage of voxels detected by each method in these spurious regions was determined. This gave an indication of the proportion of false positives.
A good contrast between normal myocardium, blood pool and scar is desirable and is the most technically challenging part of LGE CMR image acquisition. The quality of contrast depends on achieving the optimal inversion time. Each post-ablation image was scored by three raters experienced in LGE CMR images and the average score was taken. Images in the database (only post-ablation scans) were ranked into three categories: good, average and poor. The Dice metric was computed separately in each category. This indicated how robust the algorithms were against contrast enhancement quality.
In this section results from our evaluation are presented with figures and plots.
Segmentation accuracy with pseudo ground truth
For each LGE CMR scan available for the challenge, a pseudo ground truth was available by combining manual segmentations of scar from three experienced observers as described in Section 'Reference standard 1: pseudo-ground truth’.
Segmentation accuracy with root-mean-squared-error (RMSE) and volume difference ($\delta V$) on pre and post data for both submitted algorithms (IC to UTB) and fixed-models
Non-scar enhancing structures
Image quality on segmentation
The LGE CMR images included in this challenge were acquired at three imaging centres with differing protocols and scanners (see Table 2). The quality of enhancement is known to vary and this variation across the imaging centres was quantified. Further, the LGE CMR images were qualitatively classified based on their quality and the algorithms were evaluated accordingly.
Analysis of segmentation accuracy based on image quality (good, average and poor) on post-scans
We presented a standardised evaluation framework, accessible via a web-based interface, that allows the effective comparison of scar segmentation algorithms in the LA for pre- and post-ablation fibrosis and scar. The framework has been used to compare eight algorithms as part of the cDEMRIS challenge, a workshop organised at ISBI in 2012. The data is publicly available via the website: http://www.isd.kcl.ac.uk/cdemris/.
The usefulness and effectiveness of an evaluation framework is important. The evaluation framework presented in this work comprised a database of thirty pre-ablation and thirty post-ablation images from three separate imaging centres (KCL-IM, Utah and BIDMC) acquired using scanners of two different vendors (Siemens Healthcare and Philips Healthcare). Further, images differed in slice-thickness (1.25 - 2.0 mm reconstructed) and acquisition time-point (1-7 days for pre- and 30 - 180 days for post-ablation). This ensured that algorithms would not be biased towards a specific acquisition protocol. The selection of images for the framework was not random. They were carefully chosen to include images that exhibited artefacts (navigator, aortic wall, valve fibrosis), poor contrast-noise ratio and poor enhancement. Thus the presented framework provides a wide spectrum of data suitable for testing algorithms.
Two reference standards are established within the framework: the algorithms were tested against consensus segmentations of multiple observers and against the established n-SD and FWHM techniques. The task of creating a reference standard from multiple observers is complex and tedious. The observers were provided with set guidelines. Although their delineations were approximately consistent, some differences remained. It was thus important to merge the segmentations with STAPLE. For instance, in images with a poor contrast enhancement ratio, observers may differ in their opinion of the level of enhancement that is likely to be scar. When generating consensus segmentations, such disagreements are resolved by establishing some common ground.
The second reference standard, obtaining locations of enhanced regions with the fixed-model n-SD and FWHM methods, was implemented by fixed thresholding on the atrial wall. The wall was approximated by dilating the endocardial LA segmentation by three pixels. Both the SD and FWHM methods require a region of normal myocardium, and results can vary with a different selection. The region within normal myocardium was thus carefully selected to exclude any enhanced pixels. The FWHM was implemented as described in with manual selection of an enhanced region and 50% of the maximum intensity in this region used as a threshold. In some rare instances the 50% cut-off was re-adjusted. Note that region growing was used to obtain the final segmentation result, and this ensured pixel connectivity and coherence in the result.
A range of different metrics for measuring algorithm performance was explored. The Dice metric was selected for measuring volumetric overlap. It was computed regionally on carefully selected enhanced areas where the consensus segmentation was in agreement for scar or not scar (i.e. artefact). A surface metric was also selected for measuring the amount of overlap in segmentations. All segmentations were projected onto their LA surfaces and the cumulative Euclidean distance between the corresponding scar labels on the surface was represented as the RMSE. Furthermore, a third measure computed the difference of fibrosis/scar volumes in segmentations. This assessed the quantifiable infarct reported by each method.
Segmentation of scar from LGE CMR images poses various challenges and thus an overlap assessment alone is not sufficient. To detect which false positives and negatives are more prevalent, regional assessments of aortic wall and navigator beam artefacts were provided. Regions containing these artefacts were carefully chosen and an overlap assessment was made for each method. This highlights how algorithms fare with regularly enhancing features of LGE CMR images. Further, the framework provided a grading for each post-ablation image in its database. Users can select images of a specific quality when using the framework through the web-based interface.
A limitation of the framework is the size of the image database. It is sufficient for most purposes, for instance an initial assessment of an algorithm against different protocols and acquisition parameters. The website hosting the image database is scalable and can easily be extended to include additional images when they become available. A second limitation is the performance metrics. Dice is known to be highly sensitive to mismatch of small structures and thus can disproportionately penalise algorithms in some instances. The surface-based metric (i.e. RMSE) also has an important limitation: images with a large amount of false-positive scar detected yield a very low RMSE. This is because there are false-positive points in the vicinity of most surface points labelled by raters as scar, making the distance error small. However, this limitation can be overcome if the surface measure is combined and read with the volume difference measure. This gives a truer picture of the segmentation.
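The RMSE limitation can be reproduced with a small example. The one-directional formulation below (distance from each rater-labelled scar point to the nearest algorithm-labelled scar point) is an assumption inferred from the limitation described above, not the framework's documented definition; the point coordinates are hypothetical.

```python
import math


def surface_rmse(reference_points, segmented_points):
    """RMS distance from each reference scar point to its nearest segmented scar point."""
    if not reference_points or not segmented_points:
        raise ValueError("both point sets must be non-empty")
    squared = []
    for p in reference_points:
        nearest = min(math.dist(p, q) for q in segmented_points)
        squared.append(nearest ** 2)
    return math.sqrt(sum(squared) / len(squared))


# Two rater-labelled scar points on a hypothetical surface.
reference = [(0, 0, 0), (10, 0, 0)]

# A compact segmentation that is offset by one unit from the reference.
offset_segmentation = [(1, 0, 0), (11, 0, 0)]

# A gross over-segmentation that labels every point along the surface as scar.
over_segmentation = [(x, 0, 0) for x in range(21)]

rmse_offset = surface_rmse(reference, offset_segmentation)
rmse_over = surface_rmse(reference, over_segmentation)
extra_points = len(over_segmentation) - len(reference)
```

The over-segmentation scores a lower RMSE than the compact, slightly offset segmentation, even though it labels 19 more points as scar. Reading the surface measure together with the volume difference exposes the over-segmentation.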
Enhancement normalisation models adopted (if any) in each method
All the methods outperformed the FWHM and n-SD methods in our evaluation. Some offered a statistically significant improvement: pre-ablation (YL vs. 4-SD, paired t-test: p < 0.05) and post-ablation (KCL vs. 2, 4, 6-SD, paired t-test: p < 0.05). This suggests that a fixed model for scar is not a viable solution and that improvements can be made. There is further evidence for this: the evaluated methods YL and UTA, which use simple thresholding, find it necessary to adjust thresholds for each slice and achieve significant improvements over fixed models (paired t-test: p < 0.05).
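For readers who wish to run the same kind of comparison on their own results, the paired t-test statistic can be sketched as follows. The Dice scores are invented for illustration and are not taken from this study. The sketch returns only the statistic and the degrees of freedom; a p-value would normally come from a statistics library (for example `scipy.stats.ttest_rel`).

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired per-image scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical per-image Dice scores for an algorithm and a fixed model.
algorithm_dice = [0.80, 0.75, 0.90, 0.85, 0.70]
fixed_model_dice = [0.60, 0.65, 0.70, 0.75, 0.50]

t_value, dof = paired_t_statistic(algorithm_dice, fixed_model_dice)

# Two-sided critical value of Student's t for 4 degrees of freedom at the 0.05 level.
T_CRITICAL_DF4 = 2.776
significant = abs(t_value) > T_CRITICAL_DF4
```

The test is paired because both methods are scored on the same images, so the per-image differences are analysed rather than the two groups of scores.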
Segmentation of the LA myocardial wall is an important step before segmenting scar. The LA wall is much thinner and more flexible than that of the ventricle. It is known to be 2.5 mm in thickness. Also, in areas of no contrast the LA wall is impossible to visualise and thus can only be approximated. Among the evaluated algorithms, several used a fixed distance from the endocardial LA border (IC, MV, SY, HB, KCL), of which two (IC, HB) computed this distance directly using a Euclidean distance measure and the rest (MV, SY, KCL) used morphological dilation. However, three methods (YL, UTA, UTB) used a manual delineation of the wall. The artefact analysis of Figure 8 shows that YL, UTA and UTB also have the least amount of aortic wall and navigator artefacts. The aortic wall problem is minimal in YL, UTA and UTB, whilst some navigator beam artefact remains. This suggests that a good LA wall segmentation can largely counteract the aortic wall problem and also improve LGE CMR segmentation overall.
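The two fixed-distance approximations of the wall do not select exactly the same pixels, as the sketch below shows on a 2D grid. It is an illustration only: the 4-connected structuring element, the single-pixel cavity and the function names are assumptions, and the evaluated methods may use different settings.

```python
import math


def dilate(mask, iterations):
    """Binary dilation of a pixel set with a 4-connected (cross) structuring element."""
    current = set(mask)
    for _ in range(iterations):
        grown = set(current)
        for (r, c) in current:
            grown.update({(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)})
        current = grown
    return current


def wall_by_dilation(cavity, pixels=3):
    """Wall = pixels added by dilating the endocardial segmentation."""
    return dilate(cavity, pixels) - set(cavity)


def wall_by_distance(cavity, max_distance, candidates):
    """Wall = candidate pixels within a Euclidean distance of the cavity."""
    cavity = set(cavity)
    return {
        p for p in candidates
        if p not in cavity and min(math.dist(p, q) for q in cavity) <= max_distance
    }


# A single-pixel "cavity" makes the geometric difference easy to see.
cavity = {(0, 0)}
grid = {(r, c) for r in range(-4, 5) for c in range(-4, 5)}

dilated_wall = wall_by_dilation(cavity, pixels=3)
distance_wall = wall_by_distance(cavity, max_distance=3.0, candidates=grid)
```

With these assumptions the dilation produces a diamond of 24 wall pixels, whereas the Euclidean criterion produces a disc of 28. The pixel (2, 2), for example, lies within a distance of 3 but is not reached by three cross-shaped dilations.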
Pre-ablation enhancement, which is likely to be due to fibrosis, is more challenging to detect than post-ablation enhancement due to scar. One reason is that fibrosis appears more diffuse, with greater overlap with normal myocardium. Only algorithms IC, YL, UTA and UTB show reasonable overlap (Dice, RMSE and |δV|), with YL’s results available on a smaller cohort (10 out of 30) and both YL and UTA requiring significantly longer processing times than the rest. Fixed models (4-SD and FWHM) fare poorly in comparison. This comes as no surprise: with greater overlap of intensities for normal myocardium and fibrosis in pre-ablation images, a fixed model is bound to fail. Even with an optimal separation between the computed distributions, further processing is needed, and this is included in the IC (pixel connectivity) and SY (contextual information) algorithms. Others have similar processing steps but were developed primarily for post-ablation enhancement and thus have a bias (MV and KCL). In post-ablation enhancement, most evaluated algorithms demonstrated that good segmentation is possible. This is true of both automated (IC, SY, MV, HB, KCL, UTB) and semi-automated (YL, UTA) algorithms. Fixed models were less accurate, with a difference of at least 10 points on the Dice metric compared with some methods (MV, KCL, YL), but they performed better than in pre-ablation images.
The aim of this work was to provide a standardised methodology and framework for evaluating state-of-the-art algorithms, made available to the wider community through a web-based interface. Upcoming state-of-the-art algorithms can use the framework to evaluate their performance, which would enable them to be benchmarked against other algorithms. Eight different algorithms were evaluated with the proposed framework, three of which are published or slightly modified versions of published techniques ([5, 15, 18]). This gives the framework some standing and acceptability and gives future algorithms a sensible ground for testing. Also, to our knowledge, this is the first proposed framework of its kind for testing LGE CMR algorithms.
CMR continues to play an increasingly important role in quantifying LA fibrosis and scar before and after ablation procedures for AF. LGE CMR is a challenging imaging technique, with variation often seen in image and enhancement quality. Currently, algorithms have only been tested on centre- and vendor-specific images. Their suitability and performance on images from other centres or vendors is not clear. Also, algorithms cannot be tested on the same datasets and thus cannot be cross-compared. The proposed framework evaluated 8 different algorithms and measured their performance on a common scale. Reference standards for evaluation were established. Following evaluation, no algorithm was deemed clearly better than the others. This leaves scope for further algorithmic developments in LA fibrosis and scar imaging. Benchmarking of future scar segmentation algorithms is important. The proposed framework remains publicly available for accessing the image database, uploading algorithm segmentations for evaluation and contributing manual segmentations for improving the reference standard.
The author acknowledges support from the King’s College London Centre of Excellence in Medical Engineering funded by the Wellcome Trust and EPSRC (WT 088641/Z/09/Z). This research was also supported by the National Institute for Health Research (NIHR) Biomedical Research Centre at Guy’s and St Thomas’ NHS Foundation Trust and King’s College London. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.
- Tobon-Gomez C, Craene MD, McLeod K, Tautz L, Shi W, Hennemuth A, Prakosa A, Wang H, Carr-White G, Kapetanakis S, Lutz A, Rasche V, Schaeffter T, Butakoff C, Friman O, Mansi T, Sermesant M, Zhuang X, Ourselin S, Peitgen HO, Pennec X, Razavi R, Rueckert D, Frangi A, Rhode K: Benchmarking framework for myocardial tracking and deformation algorithms: an open access database. Med Image Anal. 2013, 17 (6): 632-648. 10.1016/j.media.2013.03.008.
- Kirisli H, Schaap M, Metz C, Dharampall A, Meijboom W, Papadopoulou S, Dedic A, Nieman K, deGraaf M, Meijs M, Cramer M, Broersen A, Cetin S, Eslami A, Flórez-Valencia L, Lor K, Matuszewski B, Melki I, Mohr B, Öksüz I, Shahzad R, Wang C, Kitslaar P, Unal G, Katouzian A, Orkisz M, Chen C, Precioso F, Najman L, Masood S, et al: Standardized evaluation framework for evaluating coronary artery stenosis detection, stenosis quantification and lumen segmentation algorithms in Computed Tomography Angiography. Med Image Anal. 2013, 8 (17): 856-876.
- Arujuna A, Karim R, Caulfield D, Knowles B, Rhode K, Schaeffter T, Kato B, Rinaldi CA, Cooklin M, Razavi R, et al: Acute pulmonary vein isolation is achieved by a combination of reversible and irreversible atrial injury after catheter ablation: clinical perspective evidence from magnetic resonance imaging. Circ Arrhythm Electrophysiol. 2012, 5 (4): 691-700. 10.1161/CIRCEP.111.966523.
- Knowles B, Caulfield D, Cooklin M, Rinaldi C, Gill J, Bostock J, Razavi R, Schaeffter T, Rhode K: 3-D visualization of acute RF ablation lesions using MRI for the simultaneous determination of the patterns of necrosis and edema. Biomed Engineer, IEEE Trans. 2010, 57 (6): 1467-1475.
- Oakes R, Badger T, Kholmovski E, Akoum N, Burgon N, Fish E, Blauer J, Rao S, DiBella E, Segerson N, et al: Detection and quantification of left atrial structural remodeling with delayed-enhancement magnetic resonance imaging in patients with atrial fibrillation. Circulation. 2009, 119 (13): 1758-1767. 10.1161/CIRCULATIONAHA.108.811877.
- Mahnkopf C, Badger TJ, Burgon NS, Daccarett M, Haslam TS, Badger CT, McGann CJ, Akoum N, Kholmovski E, Macleod RS, et al: Evaluation of the left atrial substrate in patients with lone atrial fibrillation using delayed-enhanced MRI: implications for disease progression and response to catheter ablation. Heart Rhythm. 2010, 7 (10): 1475-1481. 10.1016/j.hrthm.2010.06.030.
- Badger TJ, Daccarett M, Akoum NW, Adjei-Poku YA, Burgon NS, Haslam TS, Kalvaitis S, Kuppahally S, Vergara G, McMullen L, et al: Evaluation of left atrial lesions after initial and repeat atrial fibrillation ablation lessons learned from delayed-enhancement MRI in repeat ablation procedures. Circulation: Arrhythmia Electrophysiol. 2010, 3 (3): 249-259. 10.1161/CIRCEP.109.868356.
- Peters D, Wylie J, Hauser T, Kissinger K, Botnar R, Essebag V, Josephson M, Manning W: Detection of pulmonary vein and left atrial scar after catheter ablation with three-dimensional navigator-gated delayed enhancement MR imaging: initial experience. Radiol. 2007, 243 (3): 690. 10.1148/radiol.2433060417.
- Kim R, Fieno D, Parrish T, Harris K, Chen E, Simonetti O, Bundy J, Finn J, Klocke F, Judd R: Relationship of MRI delayed contrast enhancement to irreversible injury, infarct age, and contractile function. Circulation. 1999, 100 (19): 1992-2002. 10.1161/01.CIR.100.19.1992.
- Kolipaka A, Chatzimavroudis GP, White RD, O’Donnell TP, Setser RM: Segmentation of non-viable myocardium in delayed enhancement magnetic resonance images. Int J Cardiovasc Imaging. 2005, 21 (2–3): 303-311.
- Schmidt A, Azevedo C, Cheng A, Gupta S, Bluemke D, Foo T, Gerstenblith G, Weiss R, Marban E, Tomaselli G, et al: Infarct tissue heterogeneity by magnetic resonance imaging identifies enhanced cardiac arrhythmia susceptibility in patients with left ventricular dysfunction. Circulation. 2007, 115 (15): 2006-2014. 10.1161/CIRCULATIONAHA.106.653568.
- Amado L, Gerber B, Gupta S, Rettmann D, Szarf G, Schock R, Nasir K, Kraitchman D, Lima J: Accurate and objective infarct sizing by contrast-enhanced magnetic resonance imaging in a canine myocardial infarction model. J Am College Cardiol. 2004, 44 (12): 2383-2389. 10.1016/j.jacc.2004.09.020.
- Yan A, Shayne A, Brown K, Gupta S, Chan C, Luu T, DiCarli M, Reynolds H, Stevenson W, Kwong R: Characterization of the peri-infarct zone by contrast-enhanced cardiac magnetic resonance imaging is a powerful predictor of post-myocardial infarction mortality. Circulation. 2006, 114: 32. 10.1161/CIRCULATIONAHA.106.613414.
- Positano V, Pingitore A, Giorgetti A, Favilli B, Santarelli MF, Landini L, Marzullo P, Lombardi M: A fast and effective method to assess myocardial necrosis by means of contrast magnetic resonance imaging. J Cardiovasc Magnet Res. 2005, 7 (2): 487-494. 10.1081/JCMR-200053630.
- Hennemuth A, Seeger A, Friman O, Miller S, Klumpp B, Oeltze S, Peitgen HO: A comprehensive approach to the analysis of contrast enhanced cardiac MR images. Med Imaging, IEEE Trans. 2008, 27 (11): 1592-1610.
- Detsky J, Paul G, Dick A, Wright G: Reproducible classification of infarct heterogeneity using fuzzy clustering on multicontrast delayed enhancement magnetic resonance images. Med Imaging, IEEE Trans. 2009, 28 (10): 1606-1614.
- Tao Q, Milles J, Zeppenfeld K, Lamb HJ, Bax JJ, Reiber JH, van der Geest RJ: Automated segmentation of myocardial scar in late enhancement MRI using combined intensity and spatial information. Magn Reson Med. 2010, 64 (2): 586-594.
- Lu Y, Yang Y, Connelly KA, Wright GA, Radau PE: Automated quantification of myocardial infarction using graph cuts on contrast delayed enhanced magnetic resonance images. Quant Imaging Med Surg. 2012, 2 (2): 81.
- Canny J: A computational approach to edge detection. Pattern Anal Mach Intell, IEEE Trans. 1986, 8 (6): 679-698.
- Goldberg AV, Tarjan RE: A new approach to the maximum-flow problem. J ACM (JACM). 1988, 35 (4): 921-940. 10.1145/48014.61051.
- Ford LR, Fulkerson DR: Maximal flow through a network. Can J Math. 1956, 8 (3): 399-404.
- Boykov Y, Veksler O, Zabih R: Fast approximate energy minimization via graph cuts. IEEE Trans Pattern Anal Mach Intell. 2001, 23 (11): 1222-1239. 10.1109/34.969114.
- Boykov Y, Funka-Lea G: Graph cuts and efficient N-D image segmentation. Int J Comput Vis. 2006, 70 (2): 109-131. 10.1007/s11263-006-7934-5.
- Dunn JC: A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J Cybernet. 1973, 3 (3): 32-57. 10.1080/01969727308546046.
- Gao Y, Gholami B, MacLeod RS, Blauer J, Haddad WM, Tannenbaum AR: Segmentation of the endocardial wall of the left atrium using local region-based active contours and statistical shape learning. Proc SPIE. 2010, 7623: 76234Z1-623418. 10.1117/12.844281.
- Kass M, Witkin A, Terzopoulos D: Snakes: active contour models. Int J Comput Vis. 1988, 1 (4): 321-331. 10.1007/BF00133570.
- Dempster A, Laird N, Rubin D: Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc, Ser B (Methodological). 1977, 39: 1-38.
- Karim R, Arujuna A, Brazier A, Gill J, Rinaldi CA, O’Neill M, Razavi R, Schaeffter T, Rueckert D, Rhode KS: Automatic segmentation of left atrial scar from delayed-enhancement magnetic resonance imaging. International Workshop on Functional Imaging and Modeling of the Heart. 2011, Springer, 63-70.
- Mitchell TM: Machine Learning, Vol. 45. 1997, McGraw Hill: Burr Ridge.
- Haralick RM, Shanmugam K, Dinstein IH: Textural features for image classification. Syst., Man Cybern, IEEE Trans. 1973, 3 (6): 610-621.
- Warfield S, Zou K, Wells W: Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. Med Imaging, IEEE Trans. 2004, 23 (7): 903-921. 10.1109/TMI.2004.828354.
- Kramer C, Barkhausen J, Flamm S, et al: Standardized CMR protocols, society for cardiovascular magnetic resonance: board of trustees task force on standardized protocols. J Cardiovasc Magn Reson. 2008, 10: 35. 10.1186/1532-429X-10-35.
- Flett AS, Hasleton J, Cook C, Hausenloy D, Quarta G, Ariti C, Muthurangu V, Moon JC: Evaluation of techniques for the quantification of myocardial scar of differing etiology using cardiac magnetic resonance. JACC: Cardiovasc Imaging. 2011, 4 (2): 150-156. 10.1016/j.jcmg.2010.11.015.
- Lorensen WE, Cline HE: Marching cubes: a high resolution 3D surface construction algorithm. ACM Siggraph Computer Graphics, Volume 21. 1987, ACM, 163-169.
- Hall B, Jeevanantham V, Simon R, Filippone J, Vorobiof G, Daubert J: Variation in left atrial transmural wall thickness at sites commonly targeted for ablation of atrial fibrillation. J Intervent Cardiac Electrophysiol. 2006, 17 (2): 127-132.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.