This article has Open Peer Review reports available.
Detection of leukocoria using a soft fusion of expert classifiers under non-clinical settings
- Pablo Rivas-Perea^{1},
- Erich Baker^{1},
- Greg Hamerly^{1} and
- Bryan F Shaw^{2}
https://doi.org/10.1186/1471-2415-14-110
© Rivas-Perea et al.; licensee BioMed Central Ltd. 2014
Received: 31 March 2014
Accepted: 21 August 2014
Published: 9 September 2014
Abstract
Background
Leukocoria is defined as a white reflection and its manifestation is symptomatic of several ocular pathologies, including retinoblastoma (Rb). Early detection of recurrent leukocoria is critical for improved patient outcomes and can be accomplished via the examination of recreational photography. To date, there exists a paucity of methods to automate leukocoria detection within such a dataset.
Methods
This research explores a novel classification scheme that uses fuzzy logic theory to combine a number of classifiers that are experts in performing multichannel detection of leukocoria from recreational photography. The proposed scheme extracts features aided by the discrete cosine transform and the Karhunen-Loeve transformation.
Results
The soft fusion of classifiers is significantly better than other methods of combining classifiers with p = 1.12 × 10^{-5}. The proposed methodology performs at a 92% accuracy rate, with an 89% true positive rate, and an 11% false positive rate. Furthermore, the results produced by our methodology exhibit the lowest average variance.
Conclusions
The proposed methodology overcomes non-ideal conditions of image acquisition, presenting a competent approach for the detection of leukocoria. Results suggest that recreational photography can be used in combination with the fusion of individual experts in multichannel classification and preprocessing tools such as the discrete cosine transform and the Karhunen-Loeve transformation.
Keywords
Leukocoria, Retinoblastoma, Fuzzy logic, Soft computing, Discrete cosine transform, Karhunen-Loeve transform
Background
Leukocoria is an abnormal pupillary light reflex that is characterized by a persistent ‘white-eye’ phenomenon during visible light photography. It is often the primary observable diagnostic symptom for a range of catastrophic ocular disorders. In addition, leukocoria is a prevailing symptom of congenital cataracts, vitreoretinal disorders and malformations, retinopathy of prematurity, trauma-associated diseases, Coats’ disease, ocular toxocariasis, Norrie disease, ciliary melanoma, retrolental fibroplasia, and retinal hamartomas [1, 2], see [3] for a review. In children under the age of 5, however, the predominant cause of leukocoria is Rb [4, 5].
In the case of Rb, tumors in the eye can act as diffuse reflectors of visible light [6–9]. Consequently, leukocoria associated with Rb is a progressive symptom that occurs more frequently, during recreational photography, as the size and number of tumors increase [10]. The fact that it occurs in recreational photography opens the door to investigate a way to perform an automatic assessment of visual dysfunction [11]. Leukocoria is optically distinct from specular reflections of the cornea and can be detected with a low resolution digital camera, a camera phone equipped with or without a flash, or with a digital video recorder. In clinical settings, the "red reflex" test is adequate for the identification of tumor reflections when administered by trained clinicians, but may suffer from a high degree of false negatives when conducted under a wide range of conditions [12, 13]. This ineffectiveness of the "red-reflex" test is especially problematic in developing nations where there is a limited supply of properly trained specialists in ophthalmology or pediatrics. Even in developed nations, recent studies suggest that clinicians are either improperly trained for leukocoric screening, or do not perform the test [14]. Indeed, parents or relatives are generally the first individuals to detect leukocoria in a child, and their observation often initiates diagnosis [1, 4, 15–17]. For example, in a study of 1632 patients with Rb, the eventual diagnosis in ∼80% of cases was initiated by a relative who observed leukocoria in a photograph [4].
The consequences of a false negative finding can be profound, as the case of Rb illustrates. While it only comprises 3-4% of pediatric cancer, the incidence of Rb is high enough (i.e., ∼ 1-2:30,000 live births) to mandate universal screening [4, 13]. The median age of diagnosis is 24 months for unilateral disease and 9–12 months for bilateral disease [18, 19]. When detected early, Rb is curable, either by enucleation of the eye, or the use of ocular salvage treatments with chemotherapy and focal treatments or radiation therapy [20, 21]. Delays in diagnosis lead to increased rates of vision loss, need for therapy intensification (with its associated life-time toxicity) and death, particularly for children who live in resource-poor settings [7]. Compressing diagnostic time frames relies, in part, on improved methods for detecting intraocular tumors or their leukocoric presentation.
The autonomous and semi-autonomous analysis of diagnostic medical images, such as that mediated by computational biology and machine learning, is routinely used for the unsupervised and supervised prediction and prognosis of numerous pathologies and pathology outcomes, but has had limited application in areas of detection and diagnosis [22, 23]. In applications where machine learning has been applied to the discernment of disease based on image data (analogous to the observable detection of leukocoria in digital photographs), there has been significant success. These previous studies have employed a variety of soft computing techniques: support vector machines (SVMs), Bayesian statistical approaches and neural networks have been used to assist in the detection of breast cancer in mammograms [24], prostate cancer [25], lung cancer [26] and cervical cancers [27]. Of particular importance has been the successful use of neural networks for the detection of skin cancers, such as melanoma, where non-histological photographic digital images serve as the medium [28–31]. In each of these scenarios, however, studies have been applied to controlled environments where skilled technicians intentionally seek to classify disease states.
Methods
Ethics statement
This study was determined to be exempt from review by an Institutional Review Board at Baylor University. The parents of the study participants have given written informed consent to use and publish unaltered images of faces.
Database and feature extraction
First, the input image is cropped to contain only the M × N image of the circumference delimited by the iris. This process can be done either manually or automatically.
Secondly, the cropped M × N three-channel (RGB) image, denoted as I(n _{1},n _{2},n _{3}), where n _{1} ∈ {0,…,M - 1}, n _{2} ∈ {0,…,N - 1}, and n _{3} ∈ {0,1,2}, is separated into three different gray-scale images, I _{ R }(n _{1},n _{2}), I _{ G }(n _{1},n _{2}), and I _{ B }(n _{1},n _{2}).
where ${\mathcal{F}}^{-1}:{\mathbb{R}}^{M\times N}\mapsto {\mathbb{R}}^{M\times N}$ and α(·) is also computed with (2).
Fourth, each image $\widehat{\mathbf{I}}$ is then down-sampled or up-sampled to a fixed size of 32 × 32. The selection of this particular size was determined experimentally, training several classifiers using different image sizes and choosing the size that produced the smallest classification error in the average case, which was 32 × 32. Note that this is a very small resolution compared to the natural resolution of recreational photographs.
Fifth, we z-score each channel (subtract the mean and divide by the standard deviation). The purpose is to have a dataset approximating a $\mathcal{N}(0,1)$ distribution at each channel, that is, a dataset that follows a normal distribution with zero mean and unit variance at each channel. To determine the mean and standard deviation for z-scoring we use only the images available for training, i.e., the training dataset. Images in the testing dataset are normalized with the mean and standard deviation estimated from the training dataset. We define $\stackrel{~}{\mathbf{I}}$ as the image $\widehat{\mathbf{I}}$ that has been processed by up-sampling or down-sampling, subtraction of a mean image, and division by a standard deviation.
Finally, the Karhunen-Loeve Transform (KLT) is applied to the data using only the two eigenvectors whose corresponding eigenvalues are the largest [42, 43]. This procedure is analogous to dimensionality reduction using Principal Component Analysis (PCA). Experiments determined that the minimum number of eigenvectors that can be used without loss of generalization is two. We define x _{ i } as the two-element vector representing the i-th eye image transformed using the KLT; that is, $\mathbf{x}=\mathcal{T}\{\stackrel{~}{\mathbf{I}}\}$, where $\mathcal{T}\{\cdot\}$ denotes the KLT. Therefore, the transformed training set for each individual channel is defined as $\mathcal{D}={\{{\mathbf{x}}_{i},{d}_{i}\}}_{i=1}^{N}$, where ${\mathbf{x}}_{i}\in {\mathbb{R}}^{2}$, d _{ i } ∈ {-1,1} is the desired target class corresponding to the i-th vector (indicating normal or leukocoric), and N indicates the total number of training samples. The training set is then used in the design of classifiers, which is explained in the next section.
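As an illustration only, the channel separation, z-scoring, and KLT steps described above can be sketched in Python with NumPy. The sketch assumes the eye images are already cropped and resized to 32 × 32, omits the frequency-domain step, assumes per-pixel statistics for the z-score, and runs on random stand-in data. All function names are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def split_channels(rgb):
    """Split an (..., M, N, 3) RGB array into its R, G and B gray-scale planes."""
    return rgb[..., 0], rgb[..., 1], rgb[..., 2]

def fit_zscore(train):
    """Estimate the mean image and per-pixel standard deviation from training images only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # assumption: leave constant pixels unscaled
    return mean, std

def fit_klt(train_z, n_components=2):
    """Return the eigenvectors of the training covariance with the largest eigenvalues."""
    flat = train_z.reshape(len(train_z), -1)
    cov = np.cov(flat, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]

def project(images_z, basis):
    """Project z-scored images onto the KLT basis (one row per image)."""
    return images_z.reshape(len(images_z), -1) @ basis

# Random stand-in data: 20 training and 5 testing crops of size 32 x 32 x 3.
rng = np.random.default_rng(0)
train_rgb = rng.random((20, 32, 32, 3))
test_rgb = rng.random((5, 32, 32, 3))

train_red, _, _ = split_channels(train_rgb)
test_red, _, _ = split_channels(test_rgb)

# Statistics and basis come from the training set and are reused for testing.
mean, std = fit_zscore(train_red)
basis = fit_klt((train_red - mean) / std)
x_train = project((train_red - mean) / std, basis)
x_test = project((test_red - mean) / std, basis)
```

The same three functions would be applied independently to the green and blue planes, giving one two-dimensional training set per channel.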
Classification architecture
The proposed classification scheme involves the fusion of different classifiers that are known to perform well individually. The purpose of the fusion is to achieve better performance than with individual classifiers [44]. The fusion of classifiers is also known as "combination of multiple classifiers" [45], "mixture of experts" [46], or "consensus aggregation" [47]. This paper uses fuzzy logic to combine different classifiers using the method proposed in [33, 34]. A fuzzy integral conceptualizes the idea of the method along with Sugeno’s g _{ λ }-fuzzy measure [48]. The different classifier performances define the importance that the fusion method will give to each classifier. We propose having nine different classifiers per channel, as shown in Figure 3. The total number of classifiers is 27. We perform the analysis of each channel aiming to observe which channel performs better and to determine its contribution to correct classification in further studies. A final class is given considering each classifier’s output at each channel. The following paragraphs explain the fusion methodology.
Soft fusion of classifiers
which is known as the g _{ λ }-fuzzy measure, for some λ > -1, all $\mathcal{A},\mathcal{B}\subset \mathbf{x}$, and $\mathcal{A}\cap \mathcal{B}=\varnothing $.
where ${\mathcal{C}}_{t}=\left\{y|h(y)\ge t\right\}$. The equality in Equation 5 defines the agreement between the expectation and the evidence.
where λ ∈ (-1,+∞) and λ ≠ 0. However, in order to solve the polynomial, we need to estimate the densities g ^{ i } (i.e., "the expectation"). The i-th density g ^{ i } defines the degree of importance the i-th classifier y _{ i } has in the final classification. These densities can be estimated by an expert, or defined using a training dataset. In this research we defined the densities using the performance obtained from the data; the process of experimentation is explained later. In the following subsection we briefly discuss the classifiers used in this research.
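The equations for the fuzzy measure, the fuzzy integral, and the polynomial in λ are not reproduced in this text. The following Python sketch therefore uses the standard textbook form of Sugeno's g _{ λ }-fuzzy measure, in which λ is the non-zero root of ∏(1 + λ g ^{ i }) = 1 + λ, and of the Sugeno fuzzy integral. It is a minimal illustration under that assumption, with made-up densities and classifier scores, and the function names are hypothetical.

```python
def solve_lambda(densities, tol=1e-12):
    """Find the root lambda > -1, lambda != 0, of prod(1 + lambda*g_i) = 1 + lambda."""
    if len(densities) < 2 or not all(0.0 < g < 1.0 for g in densities):
        raise ValueError("need at least two densities strictly between 0 and 1")
    total = sum(densities)
    if abs(total - 1.0) < 1e-9:
        return 0.0  # densities already add up to one: the measure is additive

    def f(lam):
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)

    if total > 1.0:
        lo, hi = -1.0 + 1e-12, -1e-9  # root lies in (-1, 0)
    else:
        lo, hi = 1e-9, 1.0  # root lies in (0, +inf); grow hi until bracketed
        while f(hi) < 0.0:
            hi *= 2.0
    while hi - lo > tol:  # plain bisection
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def sugeno_integral(scores, densities, lam):
    """Sugeno fuzzy integral of classifier scores in [0, 1] w.r.t. the g_lambda measure."""
    pairs = sorted(zip(scores, densities), reverse=True)  # highest score first
    g_cum = 0.0
    best = 0.0
    for h, g in pairs:
        g_cum = g + g_cum + lam * g * g_cum  # measure of the first i classifiers
        best = max(best, min(h, g_cum))
    return best

# Illustrative values only (not taken from the experiments).
densities = [0.3, 0.4, 0.1]
lam = solve_lambda(densities)
fused = sugeno_integral([0.9, 0.6, 0.2], densities, lam)
```

With densities that sum to less than one, as here, the root is positive; densities that sum to more than one give a root in (-1, 0).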
Selection of classifiers
Table 1 Number of hidden neurons for each channel
Channel | ANN_{1} | ANN_{2} | ANN_{3} |
---|---|---|---|
Red | 2 | 20 | 50 |
Green | 3 | 10 | 15 |
Blue | 2 | 3 | 5 |
E.g., consider the third row of Table 1; for the blue channel, the best three architectures were those with two, three, and five neurons in the hidden layer; in contrast, the red channel exhibited the lowest errors using two, 20, and 50 neurons in the hidden layer. Intuitively, one can conclude that the training data for both green and blue channels is much simpler to classify than the data for the red channel.
Next, the SVM-based classifiers in this research are, by necessity, of the soft margin kind since the dataset has two non-linearly separable classes [50]. This research uses four SVMs; each has a different type of kernel function. The four SVM kernel functions are: 1) linear, 2) quadratic, 3) polynomial, and 4) radial basis function (RBF).
An SVM with linear kernel is the simplest form of a soft margin SVM; in practice it only performs a dot product, leaving the data in the input space. SVMs with a quadratic kernel are a particular case of a polynomial kernel of second degree. An RBF kernel is a preferred choice in research that offers little or no information about the dataset properties. SVMs can be very powerful; their effectiveness, however, depends on an appropriate selection of the model parameters, a.k.a. hyper-parameters [51]. The traditional soft-margin SVM requires a hyper-parameter, usually known as the "regularization" parameter C, that penalizes incorrectly classified data points. Then, depending on the kernel choice, SVMs may have additional hyper-parameters; e.g., the polynomial kernel requires a parameter p that defines the degree of the polynomial, while the RBF kernel requires the parameter τ, which controls the width of a Gaussian-like exponential function.
Table 2 Kernel choice and parameters used with SVMs
Channel | Linear | Quad. | Poly. | RBF |
---|---|---|---|---|
Kernel K(x_{ i },x_{ j }) | ${\mathbf{x}}_{i}^{T}{\mathbf{x}}_{j}$ | ${\left({\mathbf{x}}_{i}^{T}{\mathbf{x}}_{j}+1\right)}^{p}$, p = 2 | ${\left({\mathbf{x}}_{i}^{T}{\mathbf{x}}_{j}+1\right)}^{p}$, p = 3 | ${e}^{-\frac{1}{2{\tau}^{2}}{\lVert {\mathbf{x}}_{i}-{\mathbf{x}}_{j}\rVert}_{2}^{2}}$ |
Parameters | C | C | C | (C, τ) |
Red | C = 7 | C = 4 | C = 0.5 | (9, 0.5) |
Green | C = 3 | C = 2 | C = 2 | (33, 2) |
Blue | C = 2 | C = 1 | C = 2 | (0.13, 0.5) |
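The kernel functions in the table above can be written directly in Python. This is a small sketch of the formulas only; the function names are hypothetical, the input vectors are arbitrary, and no SVM training is shown.

```python
import math

def dot(xi, xj):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(xi, xj))

def linear_kernel(xi, xj):
    """Linear kernel: the data stays in the input space."""
    return dot(xi, xj)

def polynomial_kernel(xi, xj, p):
    """Polynomial kernel of degree p (p = 2 gives the quadratic kernel)."""
    return (dot(xi, xj) + 1.0) ** p

def rbf_kernel(xi, xj, tau):
    """RBF kernel; tau controls the width of the Gaussian-like function."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * tau ** 2))

# Arbitrary two-dimensional vectors, matching the dimension of the KLT features.
xi, xj = [1.0, 2.0], [3.0, 4.0]
k_lin = linear_kernel(xi, xj)
k_quad = polynomial_kernel(xi, xj, p=2)
k_poly = polynomial_kernel(xi, xj, p=3)
k_rbf_same = rbf_kernel(xi, xi, tau=0.5)
k_rbf_unit = rbf_kernel([0.0, 0.0], [1.0, 0.0], tau=0.5)
expected_rbf_unit = math.exp(-2.0)
```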
The last group of classifiers is based on discriminant analysis. Both Linear Discriminant Analysis (LDA) [53] and Quadratic Discriminant Analysis (QDA) [54] are closely related and are well known in the community for their simplicity and the robustness provided by statistical properties of the data. QDA and LDA achieve optimal results, in terms of probability theory, when the data in each class are independent and identically distributed (IID) and follow a Gaussian distribution. Since this research uses the KLT, the data is close to being IID; however, as in most real-life applications, the data is not actually IID. LDA and QDA require no parameters except for the mean and covariance matrix estimates for each channel; these are computed from the training set. The experiments performed while training the classifiers and the soft fusion are discussed next.
Experimental design
The soft fusion of i classifiers for detecting leukocoria requires an estimation of each classifier’s importance, i.e., the i-th density g ^{ i }. This research defined each classifier’s importance based on their individual performances using several different performance metrics and averaging the ranking in each individual metric. This section describes the experimental process of evaluating each classifier and the final value for g ^{ i } density corresponding to the i-th classifier.
Cross-validation
The whole database of eye images contains 144 examples. We divided the database into 10 groups of approximately equal size in order to use the well-known K-fold cross validation (CV) technique. Cross validation helps the researcher get an estimate of true classification performances [55]. This research uses 10-fold CV (K = 10) in order to determine the true importance of each classifier.
The database is divided in 10 groups of 14.4 data points in the average case. The methodology selects which points belong to each group randomly. Nine out of the 10 groups follow the pre-processing and feature extraction procedure explained earlier. Then the set of nine groups with its corresponding target classes d _{ i } is defined as the training dataset $\mathcal{D}={\{{\mathbf{x}}_{i},{d}_{i}\}}_{i=1}^{N}$, where ${\mathbf{x}}_{i}\in {\mathbb{R}}^{2}$, d _{ i } ∈ {-1,1}. Then, the 10th group (the one not used for training) is used as the testing set $\mathcal{K}={\{{\mathbf{x}}_{j},{d}_{j}\}}_{j=1}^{M}$, where N + M = 144. The process is repeated 10 times selecting a different combination of nine groups each time leaving the 10th out for testing. Finally, the performances obtained with each testing set are averaged. We ran 10-fold CV 100 times in order to have more meaningful results, averaging each instance of 100 CVs. This process reduces the uncertainty that the CV method will choose nearly the same sets of data for the 10 groups. The following paragraph explains the performance metrics used to rank the classifiers.
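The repeated 10-fold splitting described above can be sketched as follows. This is a minimal illustration of the index bookkeeping only: the function names and the fixed seed are assumptions, and the preprocessing, training, and scoring that would happen for each split are left out.

```python
import random

def kfold_indices(n_samples, k, rng):
    """Randomly assign sample indices to k groups of nearly equal size."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def repeated_kfold(n_samples=144, k=10, repeats=100, seed=0):
    """Yield (train, test) index lists for every fold of every repetition."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = kfold_indices(n_samples, k, rng)
        for held_out in range(k):
            test = folds[held_out]
            train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
            yield train, test

# Two repetitions are enough to inspect the structure of the splits.
splits = list(repeated_kfold(repeats=2))
fold_sizes = sorted(len(test) for _, test in splits[:10])
```

With 144 examples and K = 10, four groups hold 15 examples and six hold 14, which gives the average of 14.4 quoted above.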
Performance metrics
where σ is the standard deviation of y _{ i }.
from where it is desired that both |μ _{ ε }|,σ _{ ε } → 0 as M → ∞.
On the other hand, some standard performance metrics for binary classification employ the well known confusion matrix. For binary classification, four possible prediction outcomes exist. A correct prediction is either a True Positive (TP) or a True Negative (TN), while an incorrect prediction is either a False Positive (FP) or a False Negative (FN). Here ‘Positive’ and ‘Negative’ correspond to the predicted label of the example.
Note that in the literature, one might also find the above measures with different names; e.g., TPR is also known as Sensitivity, SPC is also known as TN rate, PPV is also known as Precision, and the F _{1}-Score is also known as the F-Measure.
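For reference, the confusion-matrix measures named above can be computed as in the following sketch. The defining equations are not reproduced in this text, so the code uses the usual definitions, with BER taken as the mean of the false negative and false positive rates; the counts are hypothetical and the function name is an assumption.

```python
import math

def binary_metrics(tp, fn, fp, tn):
    """Standard confusion-matrix metrics; assumes no denominator is zero."""
    tpr = tp / (tp + fn)  # sensitivity
    fpr = fp / (fp + tn)
    ppv = tp / (tp + fp)  # precision
    npv = tn / (tn + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / (tp + fn + fp + tn),
        "TPR": tpr,
        "FPR": fpr,
        "SPC": 1.0 - fpr,  # specificity, i.e. TN rate
        "PPV": ppv,
        "NPV": npv,
        "FDR": 1.0 - ppv,
        "MCC": (tp * tn - fp * fn) / mcc_den,
        "F1": 2.0 * tp / (2.0 * tp + fp + fn),
        "BER": 0.5 * ((1.0 - tpr) + fpr),
    }

# Hypothetical counts for 100 test examples.
m = binary_metrics(tp=40, fn=10, fp=5, tn=45)
```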
In the literature, one can find another typical performance metric based on the area under the Receiver Operating Characteristic (ROC) curve [56]. The area under the ROC curve, abbreviated AUC, provides a basis for judging whether a classifier performs realistically better than others in terms of the relationship between its TPR and FPR.
The last performance metric we use is Cohen’s kappa measure κ. The κ measure scores the number of correct classifications independently for each class and aggregates them [57]. This way of scoring is less sensitive to randomness caused by a different number of examples in each class; therefore, it is less sensitive to class bias in training data.
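A small sketch of κ for the binary case follows. It assumes the common formulation of Cohen's kappa, which compares the observed agreement with the agreement expected by chance from the row and column totals of the confusion matrix; the counts are hypothetical and the function name is an assumption.

```python
def cohen_kappa(tp, fn, fp, tn):
    """Cohen's kappa from a 2 x 2 confusion matrix (common formulation)."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement: product of predicted and actual totals, per class.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical counts for 100 test examples.
kappa = cohen_kappa(tp=40, fn=10, fp=5, tn=45)
```

Here the observed agreement is 0.85 and the chance agreement is 0.50, so κ = 0.35 / 0.50 = 0.7.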
Table 3 Performance metrics and their desired outcome
Metric | Interval or domain | Desired |
---|---|---|
RMSE | ${\mathbb{R}}^{+}$ | The smallest value. |
NRMSE | ${\mathbb{R}}^{+}$ | The smallest value. |
|μ _{ ε }| | ${\mathbb{R}}^{+}$ | The smallest value. |
σ _{ ε } | ${\mathbb{R}}^{+}$ | The smallest value. |
ACC | [0,1] | One. |
TPR | [0,1] | One. |
FPR | [0,1] | Zero. |
SPC | [0,1] | One. |
PPV | [0,1] | One. |
NPV | [0,1] | One. |
FDR | [0,1] | Zero. |
MCC | [-1,1] | One. |
F _{1}-Score | [0,1] | One. |
BER | [0,1] | Zero. |
AUC | [0,1] | One. |
κ | [0,1] | One. |
Results
Table 4 Rank of red channel classifiers by performance analysis
ANN_{1} | ANN_{2} | ANN_{3} | DA_{1} | DA_{2} | SVM_{1} | SVM_{2} | SVM_{3} | SVM_{4} | |
---|---|---|---|---|---|---|---|---|---|
RMSE | 1.180 (8) | 1.172 (7) | 1.221 (9) | 1.097 (1) | 1.146 (6) | 1.103 (3) | 1.144 (5) | 1.124 (4) | 1.100 (2) |
NRMSE | 1.214 (8) | 1.206 (7) | 1.257 (9) | 1.129 (1) | 1.179 (6) | 1.136 (3) | 1.177 (5) | 1.157 (4) | 1.133 (2) |
|μ _{ ε }| | 0.136 (5) | 0.041 (3) | 0.010 (1) | 0.121 (4) | 0.163 (7) | 0.068 (2) | 0.298 (9) | 0.221 (8) | 0.158 (6) |
σ _{ ε } | 1.171 (7) | 1.173 (8) | 1.223 (9) | 1.094 (2) | 1.138 (6) | 1.105 (3) | 1.108 (5) | 1.106 (4) | 1.092 (1) |
ACC | 0.651 (8) | 0.656 (7) | 0.626 (9) | 0.699 (1) | 0.672 (6) | 0.696 (3) | 0.673 (5) | 0.684 (4) | 0.697 (2) |
TPR | 0.775 (1) | 0.741 (2) | 0.697 (5) | 0.711 (4) | 0.672 (7) | 0.729 (3) | 0.619 (9) | 0.659 (8) | 0.694 (6) |
FPR | 0.556 (9) | 0.486 (7) | 0.492 (8) | 0.320 (4) | 0.329 (5) | 0.360 (6) | 0.238 (1) | 0.274 (2) | 0.298 (3) |
SPC | 0.444 (9) | 0.514 (7) | 0.508 (8) | 0.680 (4) | 0.671 (5) | 0.640 (6) | 0.762 (1) | 0.726 (2) | 0.702 (3) |
PPV | 0.700 (9) | 0.718 (7) | 0.703 (8) | 0.787 (4) | 0.773 (5) | 0.771 (6) | 0.813 (1) | 0.800 (2) | 0.795 (3) |
NPV | 0.545 (7) | 0.544 (8) | 0.502 (9) | 0.585 (2) | 0.551 (5) | 0.586 (1) | 0.545 (6) | 0.560 (4) | 0.579 (3) |
FDR | 0.300 (9) | 0.282 (7) | 0.297 (8) | 0.213 (4) | 0.227 (5) | 0.229 (6) | 0.187 (1) | 0.200 (2) | 0.205 (3) |
MCC | 0.232 (8) | 0.259 (7) | 0.206 (9) | 0.381 (2) | 0.333 (6) | 0.363 (5) | 0.370 (4) | 0.372 (3) | 0.385 (1) |
F _{1} | 0.735 (4) | 0.729 (5) | 0.699 (9) | 0.747 (2) | 0.719 (7) | 0.750 (1) | 0.703 (8) | 0.722 (6) | 0.741 (3) |
BER | 0.390 (8) | 0.372 (7) | 0.397 (9) | 0.305 (2) | 0.329 (6) | 0.316 (5) | 0.309 (4) | 0.308 (3) | 0.302 (1) |
AUC | 0.610 (8) | 0.628 (7) | 0.603 (9) | 0.695 (2) | 0.671 (6) | 0.684 (5) | 0.691 (4) | 0.692 (3) | 0.698 (1) |
κ | 0.228 (8) | 0.258 (7) | 0.205 (9) | 0.378 (2) | 0.329 (6) | 0.362 (4) | 0.353 (5) | 0.363 (3) | 0.380 (1) |
Avg. | 7.29 | 6.47 | 8.06 | 2.47 | 5.88 | 3.82 | 4.59 | 3.88 | 2.53 |
Table 5 Rank of green channel classifiers by performance analysis
ANN_{1} | ANN_{2} | ANN_{3} | DA_{1} | DA_{2} | SVM_{1} | SVM_{2} | SVM_{3} | SVM_{4} | |
---|---|---|---|---|---|---|---|---|---|
RMSE | 0.787 (4) | 0.791 (5) | 0.800 (7) | 0.780 (3) | 0.828 (8) | 0.796 (6) | 0.838 (9) | 0.706 (2) | 0.673 (1) |
NRMSE | 0.810 (4) | 0.814 (5) | 0.823 (7) | 0.802 (3) | 0.853 (8) | 0.819 (6) | 0.863 (9) | 0.727 (2) | 0.693 (1) |
|μ _{ ε }| | 0.030 (3) | 0.025 (2) | 0.028 (1) | 0.075 (5) | 0.059 (4) | 0.107 (8) | 0.137 (9) | 0.081 (7) | 0.078 (6) |
σ _{ ε } | 0.788 (4) | 0.792 (6) | 0.801 (7) | 0.779 (3) | 0.829 (8) | 0.791 (5) | 0.830 (9) | 0.704 (2) | 0.671 (1) |
ACC | 0.845 (4) | 0.843 (5) | 0.839 (7) | 0.848 (3) | 0.828 (8) | 0.842 (6) | 0.824 (9) | 0.875 (2) | 0.887 (1) |
TPR | 0.888 (1) | 0.884 (2) | 0.883 (3) | 0.848 (6) | 0.839 (7) | 0.831 (8) | 0.805 (9) | 0.868 (5) | 0.878 (4) |
FPR | 0.227 (8) | 0.226 (7) | 0.233 (9) | 0.153 (5) | 0.189 (6) | 0.140 (3) | 0.143 (4) | 0.113 (2) | 0.099 (1) |
SPC | 0.773 (8) | 0.774 (7) | 0.767 (9) | 0.847 (5) | 0.811 (6) | 0.860 (3) | 0.857 (4) | 0.887 (2) | 0.901 (1) |
PPV | 0.867 (8) | 0.868 (7) | 0.864 (9) | 0.903 (5) | 0.881 (6) | 0.908 (3) | 0.904 (4) | 0.928 (2) | 0.937 (1) |
NPV | 0.806 (2) | 0.802 (3) | 0.797 (5) | 0.770 (6) | 0.751 (8) | 0.753 (7) | 0.725 (9) | 0.801 (4) | 0.816 (1) |
FDR | 0.133 (8) | 0.132 (7) | 0.136 (9) | 0.097 (5) | 0.119 (6) | 0.092 (3) | 0.096 (4) | 0.072 (2) | 0.063 (1) |
MCC | 0.667 (5) | 0.664 (6) | 0.656 (7) | 0.684 (3) | 0.641 (9) | 0.676 (4) | 0.645 (8) | 0.742 (2) | 0.766 (1) |
F _{1} | 0.877 (3) | 0.876 (4) | 0.873 (6) | 0.875 (5) | 0.859 (8) | 0.868 (7) | 0.851 (9) | 0.897 (2) | 0.906 (1) |
BER | 0.170 (6) | 0.171 (7) | 0.175 (8) | 0.152 (3) | 0.175 (9) | 0.155 (4) | 0.169 (5) | 0.123 (2) | 0.110 (1) |
AUC | 0.830 (6) | 0.829 (7) | 0.825 (8) | 0.848 (3) | 0.825 (9) | 0.845 (4) | 0.831 (5) | 0.877 (2) | 0.890 (1) |
κ | 0.666 (5) | 0.663 (6) | 0.655 (7) | 0.682 (3) | 0.639 (8) | 0.672 (4) | 0.638 (9) | 0.739 (2) | 0.763 (1) |
Avg. | 4.88 | 5.35 | 6.82 | 4.06 | 7.41 | 5.12 | 7.29 | 2.59 | 1.47 |
Table 6 Rank of blue channel classifiers by performance analysis
ANN_{1} | ANN_{2} | ANN_{3} | DA_{1} | DA_{2} | SVM_{1} | SVM_{2} | SVM_{3} | SVM_{4} | |
---|---|---|---|---|---|---|---|---|---|
RMSE | 0.863 (8) | 0.858 (7) | 0.851 (6) | 0.827 (4) | 0.866 (9) | 0.803 (3) | 0.848 (5) | 0.791 (1) | 0.792 (2) |
NRMSE | 0.888 (8) | 0.883 (7) | 0.876 (6) | 0.851 (4) | 0.891 (9) | 0.826 (3) | 0.873 (5) | 0.814 (1) | 0.815 (2) |
|μ _{ ε }| | 0.063 (9) | 0.058 (7) | 0.063 (8) | 0.024 (2) | 0.029 (4) | 0.043 (6) | 0.029 (3) | 0.018 (1) | 0.036 (5) |
σ _{ ε } | 0.862 (8) | 0.858 (7) | 0.851 (6) | 0.830 (4) | 0.868 (9) | 0.805 (3) | 0.851 (5) | 0.793 (1) | 0.794 (2) |
ACC | 0.813 (8) | 0.816 (7) | 0.818 (6) | 0.829 (4) | 0.813 (9) | 0.839 (3) | 0.820 (5) | 0.844 (1) | 0.843 (2) |
TPR | 0.876 (2) | 0.876 (3) | 0.880 (1) | 0.853 (7) | 0.838 (9) | 0.854 (6) | 0.844 (8) | 0.868 (4) | 0.860 (5) |
FPR | 0.291 (9) | 0.284 (8) | 0.284 (7) | 0.212 (4) | 0.230 (6) | 0.186 (2) | 0.221 (5) | 0.197 (3) | 0.186 (1) |
SPC | 0.709 (9) | 0.716 (8) | 0.716 (7) | 0.788 (4) | 0.770 (6) | 0.814 (2) | 0.779 (5) | 0.803 (3) | 0.814 (1) |
PPV | 0.834 (9) | 0.838 (8) | 0.838 (7) | 0.870 (4) | 0.858 (6) | 0.884 (2) | 0.864 (5) | 0.880 (3) | 0.885 (1) |
NPV | 0.775 (5) | 0.776 (4) | 0.782 (2) | 0.763 (7) | 0.741 (9) | 0.770 (6) | 0.750 (8) | 0.784 (1) | 0.777 (3) |
FDR | 0.166 (9) | 0.162 (8) | 0.162 (7) | 0.130 (4) | 0.142 (6) | 0.116 (2) | 0.136 (5) | 0.120 (3) | 0.115 (1) |
MCC | 0.597 (9) | 0.602 (8) | 0.608 (6) | 0.638 (4) | 0.604 (7) | 0.661 (3) | 0.619 (5) | 0.668 (2) | 0.669 (1) |
F _{1} | 0.854 (8) | 0.856 (6) | 0.858 (5) | 0.862 (4) | 0.848 (9) | 0.869 (3) | 0.854 (7) | 0.874 (1) | 0.873 (2) |
BER | 0.208 (9) | 0.204 (8) | 0.202 (7) | 0.179 (4) | 0.196 (6) | 0.166 (3) | 0.188 (5) | 0.165 (2) | 0.163 (1) |
AUC | 0.792 (9) | 0.796 (8) | 0.798 (7) | 0.821 (4) | 0.804 (6) | 0.834 (3) | 0.812 (5) | 0.835 (2) | 0.837 (1) |
κ | 0.595 (9) | 0.600 (8) | 0.606 (6) | 0.637 (4) | 0.603 (7) | 0.660 (3) | 0.619 (5) | 0.668 (2) | 0.668 (1) |
Avg. | 8.00 | 7.00 | 5.88 | 4.24 | 7.41 | 3.29 | 5.35 | 1.88 | 1.94 |
From Table 4 we observe that for the red channel, the three best ranked classifiers are LDA (DA _{1}), SVM with RBF kernel (SVM _{4}), and SVM with linear kernel (SVM _{1}), in that order. Table 5 shows that for the green channel, the best ranked classifiers are SVM with RBF kernel, SVM with polynomial kernel of third degree (SVM _{3}), and LDA, respectively. Similarly, Table 6 shows that for the blue channel, the SVM with polynomial kernel of third degree, the SVM with RBF kernel, and the SVM with linear kernel are the top three classifiers, respectively.
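The per-metric ranking behind these tables can be sketched as follows, using the RMSE and ACC rows reported for the red channel. Tied values share the average of their positions, which is an assumption inferred from the tied ranks shown in the comparison of combination methods. How the average ranks are converted into the densities g ^{ i } is not specified in this text and is not attempted here.

```python
def average_ranks(values, higher_is_better):
    """Rank values from best (1) to worst; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0  # mean of positions pos+1 .. end+1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks

names = ["ANN1", "ANN2", "ANN3", "DA1", "DA2", "SVM1", "SVM2", "SVM3", "SVM4"]
# Red channel values as reported in the rank table above.
rmse = [1.180, 1.172, 1.221, 1.097, 1.146, 1.103, 1.144, 1.124, 1.100]
acc = [0.651, 0.656, 0.626, 0.699, 0.672, 0.696, 0.673, 0.684, 0.697]

rmse_ranks = average_ranks(rmse, higher_is_better=False)
acc_ranks = average_ranks(acc, higher_is_better=True)
# Average rank per classifier over these two metrics only.
mean_ranks = {n: (a + b) / 2.0 for n, a, b in zip(names, rmse_ranks, acc_ranks)}
```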
Soft fusion classification and comparison
Finally, we can perform the soft fusion of classifiers using the densities found after performance analysis of the classifiers. Since the densities, g ^{ i }, are now known, we can use Equation 8 to determine the appropriate value for λ and then compute the g _{ λ }-fuzzy measure using Equation 7 that allows us to compute the fuzzy integral (Equation 6).
For comparison purposes we also use three of the most common combination methods: 1) Average, 2) Weighted Average, and 3) Majority. The Average method consists of averaging the classification of all classifiers and choosing the class closest to the average. The Weighted Average method, in turn, takes into account the importance of each classifier as determined by the densities g ^{ i } and multiplies each classifier’s output by its corresponding importance; the products are added together and the method decides for the class closest to the sum. In contrast, the Majority method considers all classifiers equally relevant and takes a vote, deciding for the class that agrees with the majority. Note that the Average and Majority methods produce the same value for metrics based on classification error (such as Accuracy and TPR), but differ in metrics producing real values (such as RMSE). This is because the Average method uses real values output from the individual models, while the Majority method uses voting.
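The three baseline combination rules can be sketched as follows for classifier outputs on the {-1, 1} scale used for the target classes. The outputs and densities are made-up values, the function names are hypothetical, and breaking an exact tie at zero in favor of class 1 is an arbitrary assumption.

```python
def to_class(value):
    """Map a real value to the closest class in {-1, 1} (zero goes to 1)."""
    return 1 if value >= 0 else -1

def average_rule(outputs):
    """Average the real-valued outputs and pick the closest class."""
    return to_class(sum(outputs) / len(outputs))

def weighted_average_rule(outputs, densities):
    """Weight each output by its classifier's importance, add, pick the closest class."""
    return to_class(sum(g * y for g, y in zip(densities, outputs)))

def majority_rule(outputs):
    """Each classifier casts one vote for its own closest class."""
    return to_class(sum(to_class(y) for y in outputs))

# Illustrative outputs of three classifiers and their importances.
outputs = [0.8, 0.6, -0.4]
densities = [0.5, 0.3, 0.2]
decisions = (
    average_rule(outputs),
    weighted_average_rule(outputs, densities),
    majority_rule(outputs),
)
```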
Table 7 Performance analysis of different methods of classifier combination
Average | Weighted avg. | Majority | Soft fusion | |
---|---|---|---|---|
RMSE | 0.682 ± 0.021(3) | 0.674 ± 0.021(2) | 0.705 ± 0.030(4) | 0.652 ± 0.014(1) |
NRMSE | 0.702 ± 0.021(3) | 0.694 ± 0.021(2) | 0.725 ± 0.031(4) | 0.671 ± 0.014(1) |
|μ _{ ε }| | 0.058 ± 0.018(1) | 0.065 ± 0.016(2) | 0.071 ± 0.023(3) | 0.114 ± 0.008(4) |
σ _{ ε } | 0.682 ± 0.021(3) | 0.673 ± 0.021(2) | 0.703 ± 0.032(4) | 0.644 ± 0.015(1) |
ACC | 0.876 ± 0.011(3) | 0.876 ± 0.011(3) | 0.876 ± 0.011(3) | 0.881 ± 0.011(1) |
TPR | 0.872 ± 0.009(3) | 0.872 ± 0.009(3) | 0.872 ± 0.009(3) | 0.878 ± 0.008(1) |
FPR | 0.119 ± 0.026(3) | 0.119 ± 0.026(3) | 0.119 ± 0.026(3) | 0.114 ± 0.024(1) |
SPC | 0.881 ± 0.026(3) | 0.881 ± 0.026(3) | 0.881 ± 0.026(3) | 0.886 ± 0.024(1) |
PPV | 0.925 ± 0.015(3) | 0.925 ± 0.015(3) | 0.925 ± 0.015(3) | 0.928 ± 0.014(1) |
NPV | 0.805 ± 0.011(3) | 0.805 ± 0.011(3) | 0.805 ± 0.011(3) | 0.813 ± 0.011(1) |
FDR | 0.075 ± 0.015(3) | 0.075 ± 0.015(3) | 0.075 ± 0.015(3) | 0.072 ± 0.014(1) |
MCC | 0.742 ± 0.024(3) | 0.742 ± 0.024(3) | 0.742 ± 0.024(3) | 0.752 ± 0.023(1) |
F _{1} | 0.898 ± 0.008(3) | 0.898 ± 0.008(3) | 0.898 ± 0.008(3) | 0.902 ± 0.008(1) |
BER | 0.123 ± 0.013(3) | 0.123 ± 0.013(3) | 0.123 ± 0.013(3) | 0.118 ± 0.013(1) |
AUC | 0.891 ± 0.009(3) | 0.891 ± 0.009(2) | 0.877 ± 0.013(4) | 0.918 ± 0.007(1) |
κ | 0.739 ± 0.024(3) | 0.739 ± 0.024(3) | 0.739 ± 0.024(3) | 0.750 ± 0.023(1) |
Avg. SD | 0.0169 | 0.0168 | 0.0196 | 0.0141 |
Avg. Rank | 2.8824 | 2.6471 | 3.1176 | 1.3529 |
Discussion
Table 7 shows that the proposed classification scheme performs better than the other three methodologies in most cases. The soft fusion of classifiers produces results that have less variability in the average case, as shown in the second-to-last row.
The results in Tables 4, 5 and 6 clearly indicate that classifiers that use the green channel information perform better than those using blue or red channel information. Also, we can observe that the classifiers using red channel information perform the worst of all. Therefore, we can argue that the most discriminant information is carried over the green channel and the information in the red channel may be introducing noise to the soft fusion of classifiers. Considering this possibility we compare the results of the best classifiers that use the information of the green channel against the proposed scheme, i.e., SVM with RBF kernel from Table 5 against the soft fusion method in Table 7. In comparison we can notice that the proposed soft fusion of classifiers performs better only in terms of the RMSE, NRMSE, σ _{ ε }, and AUC. This means that the proposed scheme has better statistical stability, and that its relationship in terms of TPR and FPR demonstrates better performance. In all the remaining instances the SVM classifier with RBF kernel performs better than the soft fusion; arguably, because of the introduction of noise via red channel information.
We continued by performing the well-known Friedman’s test, followed by the post-hoc Nemenyi’s test if the null hypothesis was rejected [58]. The null hypothesis being tested here is that the different approaches presented in the comparison of Table 7 perform the same, and that their performance differences are random. Friedman’s test rejected the null hypothesis with p = 1.12 × 10^{-5}, so we proceeded with the post-hoc Nemenyi’s test. We determined the critical difference (CD) for comparing four methods of combining classifiers using 17 different performance metrics with a level of significance α = 0.05. The result is the following: $\text{CD}=2.569\sqrt{\frac{4\times 5}{6\times 17}}=1.1376$. The difference in average rank between the two best methods, i.e., Weighted Average and Soft Fusion, is 2.6471 - 1.3529 = 1.2942 > 1.1376. Since this difference is greater than the CD, we conclude that the Soft Fusion of classifiers performs significantly better than the other three methods in a statistical sense. Note that even though both the Soft Fusion and Weighted Average methods take the importance of each classifier into account, the proposed classification scheme is still significantly better.
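The critical difference computation can be reproduced with a few lines of Python. The critical value 2.569, the number of methods, the number of metrics, and the average ranks are taken from the text; the function and variable names are hypothetical.

```python
import math

def nemenyi_cd(q_alpha, n_methods, n_measurements):
    """Critical difference: q_alpha * sqrt(k (k + 1) / (6 N))."""
    return q_alpha * math.sqrt(n_methods * (n_methods + 1) / (6.0 * n_measurements))

# Four combination methods compared over 17 performance metrics, alpha = 0.05.
cd = nemenyi_cd(q_alpha=2.569, n_methods=4, n_measurements=17)

# Average ranks of Weighted Average and Soft Fusion, as reported.
rank_gap = 2.6471 - 1.3529
significant = rank_gap > cd
```

The rank gap of 1.2942 exceeds the critical difference of about 1.1376, in agreement with the conclusion stated above.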
Conclusions
The proposed classification scheme presented in this research uses a soft fusion of multichannel classifiers that are experts in detecting leukocoria in human eyes. These experts are trained with features extracted from RGB images preprocessed to overcome poor illumination and skin color variation using the DCT, statistical normalization of the images, and the KLT.
This research uses nine different classifiers per channel for a total of 27 experts. These include neural networks, linear discriminant classifiers, and support vector machines. The fuzzy densities, i.e., the importance of each classifier, were estimated experimentally using cross-validation. The null hypothesis was rejected, and we demonstrated that the proposed classification scheme performs significantly better than the other approaches. Furthermore, it was shown that the green channel provides more discriminant information than the other two.
While a soft fusion of classifiers is a good alternative for the detection of leukocoria in the eyes of infants, it is just one part of a larger program to identify leukocoria in natural images. Other areas of research include eye localization (to improve detection), age discrimination (to reduce false positives on adult subjects), and alternative learning-based methods for leukocoria detection [59, 60].
Consent
Written informed consent was obtained from the patient’s parents for the publication of this report and any accompanying images.
Declarations
Acknowledgements
This work was supported in part by the National Council for Science and Technology (CONACyT), Mexico, under grant 193324/303732 provided to PRP, and a start-up fund provided to BFS by Baylor University.
References
- Balmer A, Munier F: Leukocoria in the child: urgency and challenge. Klinische Monatsblatter Fur Augenheilkunde. 1999, 214 (5): 332-335. 10.1055/s-2008-1034807.
- Meire FM, Lafaut BA, Speleman F, Hanssens M: Isolated Norrie disease in a female caused by a balanced translocation t(X,6). Ophthalmic Genet. 1998, 19 (4): 203-207. 10.1076/opge.19.4.203.2306.
- Meier P, Sterker I, Tegetmeyer H: Leucocoria in childhood. Klinische Monatsblatter Fur Augenheilkunde. 2006, 223 (6): 521-527. 10.1055/s-2005-859005.
- Abramson DH, Beaverson K, Sangani P, Vora RA, Lee TC, Hochberg HM, Kirszrot J, Ranjithan M: Screening for retinoblastoma: presenting signs as prognosticators of patient and ocular survival. Pediatrics. 2003, 112 (6 Pt 1): 1248-1255.
- Phan IT, Stout T: Retinoblastoma presenting as strabismus and leukocoria. J Patient Saf. 2010, 157 (5): 858-
- Poulaki V, Mukai S: Retinoblastoma: genetics and pathology. Int Ophthalmol Clin. 2009, 49 (1): 155-164. 10.1097/IIO.0b013e3181924bc2.
- Rodriguez-Galindo C, Wilson MW, Chantada G, Fu L, Qaddoumi I, Antoneli C, Leal-Leal C, Sharma T, Barnoya M, Epelman S, Pizzarello L, Kane JR, Barfield R, Merchant TE, Robison LL, Murphree AL, Chevez-Barrios P, Dyer MA, O’Brien J, Ribeiro RC, Hungerford J, Helveston EM, Haik BG, Wilimas J: Retinoblastoma: one world, one vision. Pediatrics. 2008, 122 (3): 763-770. 10.1542/peds.2008-0518.
- Melamud A, Palekar R, Singh A: Retinoblastoma. Am Fam Physician. 2006, 73 (6): 1039-1044.
- Houston SK, Murray TG, Wolfe SQ, Fernandes CE: Current update on retinoblastoma. Int Ophthalmol Clin. 2011, 51 (1): 77-91. 10.1097/IIO.0b013e3182010f29.
- Abdolvahabi A, Taylor BW, Holden RL, Shaw EV, Kentsis A, Rodriguez-Galindo C, Mukai S, Shaw BF: Colorimetric and longitudinal analysis of leukocoria in recreational photographs of children with retinoblastoma. PLoS ONE. 2013, 8 (10): 76677. 10.1371/journal.pone.0076677.
- Singman EL: Automating the assessment of visual dysfunction after traumatic brain injury. Med Instrum. 2013, 1 (1): 3. 10.7243/2052-6962-1-3.
- Khan AO, Al-Mesfer S: Lack of efficacy of dilated screening for retinoblastoma. J Pediatr Ophthalmol Strabismus. 2005, 42 (4): 205-102334.
- Li J, Coats DK, Fung D, Smith EO, Paysse E: The detection of simulated retinoblastoma by using red-reflex testing. Pediatrics. 2010, 126 (1): 202-207. 10.1542/peds.2009-0882.
- Marcou V, Vacherot B, El-Ayoubi M, Lescure S, Moriette G: [Abnormal ocular findings in the nursery and in the first few weeks of life: a mandatory, yet difficult and neglected screening]. Arch Pediatr. 2009, 16 (Suppl 1): 38-41.
- Balmer A, Munier F: Differential diagnosis of leukocoria and strabismus, first presenting signs of retinoblastoma. Clin Ophthalmol. 2007, 1 (4): 431-439.
- Wallach M, Balmer A, Munier F, Houghton S, Pampallona S, von der Weid N, Beck-Popovic M: Shorter time to diagnosis and improved stage at presentation in Swiss patients with retinoblastoma treated from 1963 to 2004. Pediatrics. 2006, 118 (5): 1493-1498. 10.1542/peds.2006-0784.
- Imhof SM, Moll AC, Schouten-van Meeteren AY: Stage of presentation and visual outcome of patients screened for familial retinoblastoma: nationwide registration in the Netherlands. Br J Ophthalmol. 2006, 90 (7): 875-878. 10.1136/bjo.2005.089375.
- Goddard AG, Kingston JE, Hungerford JL: Delay in diagnosis of retinoblastoma: risk factors and treatment outcome. Br J Ophthalmol. 1999, 83 (12): 1320-1323. 10.1136/bjo.83.12.1320.
- Butros LJ, Abramson DH, Dunkel IJ: Delayed diagnosis of retinoblastoma: analysis of degree, cause, and potential consequences. Pediatrics. 2002, 109 (3): 45. 10.1542/peds.109.3.e45.
- Shields CL, Shields JA: Retinoblastoma management: advances in enucleation, intravenous chemoreduction, and intra-arterial chemotherapy. Curr Opin Ophthalmol. 2010, 21 (3): 203-212. 10.1097/ICU.0b013e328338676a.
- Friedrich MJ: Retinoblastoma therapy delivers power of chemotherapy with surgical precision. JAMA: J Am Med Assoc. 2011, 305 (22): 2276-2278. 10.1001/jama.2011.778.
- Cruz JA, Wishart DS: Applications of machine learning in cancer prediction and prognosis. Cancer Inform. 2006, 2: 59-77.
- Drier Y, Domany E: Do two machine-learning based prognostic signatures for breast cancer capture the same biological processes?. PLoS ONE. 2011, 6 (3): 1-7.
- Kim S, Yoon S: AdaBoost-based multiple SVM-RFE for classification of mammograms in DDSM. BMC Med Inform Decis Making. 2009, 9: 1-10. 10.1186/1472-6947-9-1.
- Doyle S, Feldman M, Tomaszewski J, Madabhushi A: A boosted Bayesian multi-resolution classifier for prostate cancer detection from digitized needle biopsies. IEEE Trans Biomed Eng. 2010, 59 (5): 1205-1218. 10.1109/TBME.2010.2053540.
- Zhou ZH, Jiang Y, Yang YB, Chen SF: Lung cancer cell identification based on artificial neural network ensembles. Artif Intell Med. 2002, 24 (1): 25-36. 10.1016/S0933-3657(01)00094-X.
- Mango LJ: Computer-assisted cervical cancer screening using neural networks. Cancer Lett. 1994, 77 (2–3): 155-162.
- Ercal F, Chawla A, Stoecker WV, Lee HC, Moss RH: Neural network diagnosis of malignant melanoma from color images. IEEE Trans Biomed Eng. 1994, 41 (9): 837-845. 10.1109/10.312091.
- Blum A, Luedtke H, Ellwanger U, Schwabe R, Rassner G, Garbe C: Digital image analysis for diagnosis of cutaneous melanoma. Development of a highly effective computer algorithm based on analysis of 837 melanocytic lesions. Br J Dermatol. 2004, 151 (5): 1029-1038. 10.1111/j.1365-2133.2004.06210.x.
- Ganster H, Pinz A, Röhrer R, Wildling E, Binder M, Kittler H: Automated melanoma recognition. IEEE Trans Med Imaging. 2001, 20 (3): 233-239. 10.1109/42.918473.
- Garcia-Uribe A, Kehtarnavaz N, Marquez G, Prieto V, Duvic M, Wang LV: Skin cancer detection by spectroscopic oblique-incidence reflectometry: classification and physiological origins. Appl Opt. 2004, 43 (13): 2643-2650. 10.1364/AO.43.002643.
- Viola P, Jones M: Rapid object detection using a boosted cascade of simple features. Computer Vision and Pattern Recognition, 2001. CVPR 2001 Proceedings of the 2001 IEEE Computer Society Conference On Volume 1. 2001, Piscataway: IEEE, 511-5181.
- Cho S-B, Kim JH: Multiple network fusion using fuzzy logic. Neural Netw IEEE Trans. 1995, 6 (2): 497-501. 10.1109/72.363487.
- Cho S-B, Kim JH: Combining multiple neural networks by fuzzy integral for robust classification. Syst Man Cybernet IEEE Trans. 1995, 25 (2): 380-384. 10.1109/21.364825.
- Abdallah ACB, Frigui H, Gader P: Adaptive local fusion with fuzzy integrals. Fuzzy Syst IEEE Trans. 2012, 20 (5): 849-864.
- Linda O, Manic M: Interval type-2 fuzzy voter design for fault tolerant systems. Inf Sci. 2011, 181 (14): 2933-2950. 10.1016/j.ins.2011.03.008.
- Wang D, Keller JM, Carson CA, McAdo-Edwards KK, Bailey CW: Use of fuzzy-logic-inspired features to improve bacterial recognition through classifier fusion. Syst Man Cybernet Part B: Cybernet IEEE Trans. 1998, 28 (4): 583-591. 10.1109/3477.704297.
- Gader PD, Mohamed MA, Keller JM: Fusion of handwritten word classifiers. Pattern Recognit Lett. 1996, 17 (6): 577-584. 10.1016/0167-8655(96)00021-9.
- Wang Y, Wu J: Fuzzy integrating multiple SVM classifiers and its application in credit scoring. Machine Learning and Cybernetics, 2006 International Conference On. 2006, Piscataway: IEEE, 3621-3626.
- Benediktsson JA, Sveinsson JR, Ingimundarson JI, Sigurdsson HS, Ersoy OK: Multistage classifiers optimized by neural networks and genetic algorithms. Nonlinear Anal: Theory Methods Appl. 1997, 30 (3): 1323-1334. 10.1016/S0362-546X(97)00222-8.
- Du S, Shehata M, Badawy W: A novel algorithm for illumination invariant DCT-based face recognition. Electrical Computer Engineering (CCECE), 2012 25th IEEE Canadian Conference On. 2012, Piscataway: IEEE, 1-4.
- Najim M: Modeling, Estimation and Optimal Filtering in Signal Processing. Chap. Karhunen Loeve Transform. 2010, London: Wiley – ISTE, 335–340.
- Hua Y, Liu W: Generalized Karhunen-Loeve transform. Signal Process Lett IEEE. 1998, 5 (6): 141-142.
- Kuncheva LI, Bezdek JC, Duin RPW: Decision templates for multiple classifier fusion: an experimental comparison. Pattern Recognit. 2001, 34 (2): 299-314. 10.1016/S0031-3203(99)00223-X.
- Kittler J, Hatef M, Duin RPW, Matas J: On combining classifiers. Pattern Anal Mach Intell IEEE Trans. 1998, 20 (3): 226-239. 10.1109/34.667881.
- Jordan MI, Xu L: Convergence results for the EM approach to mixtures of experts architectures. Neural Netw. 1995, 8 (9): 1409-1431. 10.1016/0893-6080(95)00014-3.
- Swain PH, Benediktsson JA: Consensus theoretic classification methods. Syst Man Cybernet IEEE Trans. 1992, 22 (4): 688-704. 10.1109/21.156582.
- Sugeno M: Fuzzy measures and fuzzy integrals: a survey. Fuzzy Automata Decis Process. 1977, 78 (33): 89-102.
- Chacon MI, Rivas-Perea P: Performance analysis of the feedforward and SOM neural networks in the face recognition problem. IEEE Symposium on Computational Intelligence in Image and Signal Processing, 2007. CIISP 2007 Hawaii, USA. 2007, Piscataway: IEEE, 313-318.
- Cristianini N, Scholkopf B: Support vector machines and kernel methods: the new generation of learning machines. AI Magazine. 2002, 23 (3): 31-
- Haykin SS: Neural Networks and Learning Machines. 2009, Upper Saddle River: Pearson Education.
- Rivas-Perea P, Cota-Ruiz J, Rosiles J-G: A nonlinear least squares quasi-Newton strategy for LP-SVR hyper-parameters selection. Int J Mach Learn Cybernet. 2013, 5 (4): 579-597.
- Yang J, Frangi AF, Yang J-Y, Zhang D, Jin Z: KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition. Pattern Anal Mach Intell IEEE Trans. 2005, 27 (2): 230-244.
- Frigyik BA, Gupta MR: Bounds on the Bayes error given moments. Inf Theory IEEE Trans. 2012, 58 (6): 3606-3612.
- Cawley GC: Leave-one-out cross-validation based model selection criteria for weighted LS-SVMs. Neural Networks, 2006. IJCNN’06. International Joint Conference On. 2006, Piscataway: IEEE, 1661-1668.
- Fawcett T: ROC graphs: notes and practical considerations for researchers. Mach Learn. 2004, 31: 1-38.
- Carletta J: Assessing agreement on classification tasks: the kappa statistic. Comput Linguist. 1996, 22 (2): 249-254.
- Demšar J: Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res. 2006, 7: 1-30.
- Henning R, Rivas-Perea P, Shaw B, Hamerly G: A convolutional neural network approach for classifying leukocoria. Image Analysis and Interpretation (SSIAI) 2014 IEEE Southwest Symposium On. 2014, Piscataway: IEEE, 9-12. 10.1109/SSIAI.2014.6806016.
- Rivas-Perea P, Henning R, Shaw B, Hamerly G: Finding the smallest circle containing the iris in the denoised wavelet domain. Image Analysis and Interpretation (SSIAI) 2014 IEEE Southwest Symposium On. 2014, Piscataway: IEEE. 10.1109/SSIAI.2014.6806017.
Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2415/14/110/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.