These results characterize a potentially new approach to the diagnosis of glaucoma. Whereas the clinical diagnosis of glaucoma requires that defects in optic nerve structure and function should be found together, there is currently no test available to quantitatively combine measures of both structure and function. We propose the SFI as such a test. We have explicitly defined it to overcome some of the limitations of current testing. First of all, it utilizes continuous probabilities of abnormality rather than relying on the highly specific "abnormal" determinations of each device. The use of continuous probabilities has also been proposed for analysis of visual field data. Relying on this feature alone might make the SFI prone to false positive results, however. To overcome this, the SFI explicitly modulates defects in structure and function by requiring an anatomic relationship between the two, thereby augmenting defects that would not reach statistical significance on either test alone.
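The idea of pairing continuous probabilities of abnormality along an anatomic map can be illustrated with a minimal sketch. It is not the definition of the SFI: the field-to-sector lookup `FIELD_TO_SECTOR`, the multiplication rule in `combined_scores`, and all numeric values are assumptions made only for illustration.

```python
from typing import Dict

# Hypothetical lookup from visual field location to the optic disc sector
# assumed to carry its nerve fibers; the entries are placeholders only.
FIELD_TO_SECTOR: Dict[int, str] = {
    0: "inferotemporal",
    1: "inferotemporal",
    2: "superotemporal",
    3: "temporal",
}


def combined_scores(p_function: Dict[int, float],
                    p_structure: Dict[str, float]) -> Dict[int, float]:
    """Pair each field location with its mapped disc sector and multiply
    the two probabilities of abnormality (an assumed combination rule)."""
    scores = {}
    for location, p_field in p_function.items():
        sector = FIELD_TO_SECTOR[location]
        scores[location] = p_field * p_structure[sector]
    return scores


# Invented example values: probabilities of abnormality in [0, 1].
p_function = {0: 0.9, 1: 0.2, 2: 0.9, 3: 0.5}
p_structure = {"inferotemporal": 0.8, "superotemporal": 0.1, "temporal": 0.5}

scores = combined_scores(p_function, p_structure)
for location in sorted(scores):
    print(location, round(scores[location], 2))
```

In this toy example, locations 0 and 2 have the same functional probability of abnormality (0.9), but only location 0 maps to a sector that is also likely abnormal, so it scores 0.72 against 0.09 for location 2. A functional defect without an anatomically corresponding structural defect is thereby down-weighted.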
The results presented above show some desirable characteristics of the SFI. The values of the SFI in the reference population are heavily skewed toward the expected (lower) values (Figure 3A), whereas the values for the glaucoma population are spread more widely between low and high values (Figure 3B). Since Figure 3 depicts data for a single visual field location, one would expect such a distribution in the glaucoma group, as a particular location may not be affected in some individuals. A similar comment can be made about Figure 8. The fact that the SFI values are widely distributed for subjects with "mild" disease (based on visual field) is expected, as this group contains both normal subjects and those with varying degrees of early damage. This finding would not be as significant if the subjects with more severe field loss were also widely distributed in terms of the SFI. The fact that they are not suggests that the SFI reflects disease severity rather than random variation. We are now investigating, through the use of longitudinal data, whether those with higher levels of SFI will show more rapid progression, which would help to validate the index.
The ROC data also suggest that the SFI is a useful synthesis of structure and function. While the total area under the curve is not significantly different from those of MD and PSD, the SFI does show higher sensitivity at the highest specificity values. The Venn diagram in Figure 10 showing the classification of subjects by each test also helps with the interpretation of the ROC curves. While all tests are highly correlated in severe disease (data not shown), there is significant disagreement for those with the mildest disease shown in the figure. In other words, each test is classifying (or misclassifying) different subjects on the right side of the ROC plot, the zone containing those with mild disease. As always, the lack of an accurate test for glaucoma makes it difficult to compare new methods to existing ones. A determination of which diagnostic test (including the clinician) is correct is not possible using data from a single point in time. The ultimate evaluation of the SFI (or any glaucoma test) will come from longitudinal studies, now ongoing, in which development of initial injury will be compared among all diagnostic tests.
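The distinction between total area under the curve and sensitivity at a fixed high specificity can be made concrete with a short sketch. The function names and the scores below are hypothetical, and the cutoff rule (the smallest control score that keeps at least the requested fraction of controls at or below it) is one reasonable convention among several.

```python
import math
from typing import Sequence


def auc(controls: Sequence[float], cases: Sequence[float]) -> float:
    """Area under the ROC curve as the probability that a case scores
    higher than a control, counting ties as one half."""
    wins = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sensitivity_at_specificity(controls: Sequence[float],
                               cases: Sequence[float],
                               specificity: float) -> float:
    """Fraction of cases scoring above the cutoff that leaves at least
    `specificity` of the controls at or below it."""
    ranked = sorted(controls)
    # Small epsilon guards against floating-point overshoot in the product.
    k = math.ceil(specificity * len(ranked) - 1e-9)
    cutoff = ranked[k - 1] if k > 0 else -math.inf
    detected = sum(1 for score in cases if score > cutoff)
    return detected / len(cases)


# Invented index values for a reference group and a disease group.
reference = [0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.35, 0.40, 0.80]
disease = [0.10, 0.30, 0.45, 0.60, 0.70, 0.90, 0.95, 0.25]

print("AUC:", auc(reference, disease))
print("Sensitivity at 90% specificity:",
      sensitivity_at_specificity(reference, disease, 0.90))
```

Two tests can share a similar AUC while differing in the second quantity, which is the one that matters when false positives must be kept rare.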
The areas under our ROC curves are lower than those in some prior studies using other patient populations and other diagnostic tests. This is likely due to the fact that we purposefully studied a challenging classification problem by attempting to distinguish a group of glaucoma suspects from those classified as glaucoma. Studies that evaluate the ability to discriminate between eyes with glaucoma and normal eyes would be expected to find more striking differences, though this comparison does not duplicate the problem encountered by clinicians. For this reason, we chose to use glaucoma suspects to define our normal values since they are, in fact, the group that clinicians are most often forced to differentiate from patients with early glaucoma. It is seldom a dilemma in clinical practice to distinguish a patient with low glaucoma risk (no family history of glaucoma, normal optic nerve appearance, normal visual field, normal IOP) from someone with manifest disc or field change. On the other hand, it is a frequently encountered problem to determine which patients with clear risk factors (strong family history, "suspicious" discs, non-specific field changes, elevated IOP) have disease and which do not. By choosing to use suspects as our reference group, we are therefore making the classification problem more difficult, but also more applicable to clinical practice.
Previous research on combining measures of structure and function used regression models that included variables from various optic nerve analyses and from automated perimetry. Subsequently, investigators applied machine learning techniques to combined structural and functional data [34–37]. While some of these studies reported an improvement of one kind or another in the detection of glaucoma, none included the steps of explicitly combining the structural and functional data using knowledge of nerve fiber layer topography or the superior-inferior difference in glaucoma damage. It may also be possible for a machine learning classifier to "learn" nerve fiber layer anatomy given enough training data. However, training a system in this way will always be hampered by the fact that empiric data contain correlations between structural and functional defects that are not due to a cause and effect relationship. For example, when significant damage has occurred, correlations between disc rim loss and decreased field test sensitivity will occur simply because all points are functionally depressed and not necessarily because the two are linked by ganglion cell anatomy. In other words, when the disc and field are both severely, rather than focally, damaged, one will be able to find correlations between anatomically disconnected areas such as superior field points and superior optic nerve parameters. Furthermore, most machine learning classifiers represent "black boxes" that model knowledge in ways that are not easily understandable. By explicitly including our knowledge of the anatomic basis for structure-function correlations in glaucoma, we avoid the need to "teach" classifiers how structure and function are related by anatomy in glaucoma. On the other hand, machine-learning approaches may provide the benefit of discovering alternative relationships between structure and function in glaucoma, though this benefit remains to be seen.
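The point about correlations that arise without an anatomic link can be illustrated with a toy simulation. Everything in it is assumed for illustration: damage is expressed in arbitrary units, the noise levels are invented, and `simulate_correlation` is a hypothetical helper rather than part of any analysis reported here.

```python
import random
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_correlation(n_eyes: int, diffuse: bool, seed: int = 0) -> float:
    """Correlation between superior field loss and superior rim loss,
    two regions assumed NOT to be linked by nerve fiber anatomy."""
    rng = random.Random(seed)
    superior_field, superior_rim = [], []
    for _ in range(n_eyes):
        if diffuse:
            # One overall severity drives damage everywhere in the eye.
            severity = rng.random()
            field_loss = severity + rng.gauss(0, 0.1)
            rim_loss = severity + rng.gauss(0, 0.1)
        else:
            # Focal damage: superior field loss follows inferior rim loss,
            # while superior rim loss varies independently.
            inferior_rim_loss = rng.random()
            field_loss = inferior_rim_loss + rng.gauss(0, 0.1)
            rim_loss = rng.random()
        superior_field.append(field_loss)
        superior_rim.append(rim_loss)
    return pearson(superior_field, superior_rim)


print("diffuse damage:", simulate_correlation(500, diffuse=True))
print("focal damage:  ", simulate_correlation(500, diffuse=False))
```

Under these assumptions the diffuse case yields a strong correlation between anatomically unrelated regions, while the focal case yields a correlation near zero, which is the kind of confounding a classifier trained on empiric data alone would have to untangle.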
Another area of investigation related to what we present here is the modeling of structural and functional changes in glaucoma. Starting with the assumption that changes in sensitivity at a particular point in the visual field should correlate closely with changes in nerve fiber layer thickness or the number of ganglion cells, both Harwerth et al. and Hood et al. have proposed linear models of this relationship. Both have shown significant correlation between structural and functional measures in both animals and humans and support the concept that glaucoma produces changes in both. While these models are useful for understanding local relationships between loss of ganglion cells and loss of visual sensitivity, they have not yet been shown to have application to diagnosis of disease. Furthermore, the variability in the data used to create the models and the subsequent uncertainty in the models themselves suggest that it will be difficult to apply them to individual patients. By emphasizing anatomically meaningful relationships, the SFI may therefore be a useful tool to bridge the gap between work on local correlations between structure and function and the significant variability that exists within each test alone.
Analysis of our study is clearly limited by the fact that it was carried out retrospectively. Specifically, the diagnostic criteria used by the clinicians as expressed in the billing code data were not standardized, so there is potential variability in diagnostic classification. We confirmed the validity of our diagnostic coding by reviewing a subset of charts, revealing a small error rate compared to the documented clinical impression and no evidence for bias in misdiagnosis favoring either group. Any study of glaucoma faces the difficulty that diagnostic criteria are either objective or subjective. When specific imaging and field criteria are chosen, there is the possibility that expert clinicians would differ on those criteria. When subjective expert judgment is the defining rule, one must be concerned that the result is not reproducible by some other group of experts. Clinician diagnostic biases may therefore be embedded in the characteristics of the two groups in this study and only through application of the method to other databases, collected prospectively, and with a variety of diagnostic criteria, will the ultimate value of the method be demonstrated.
A related issue with the glaucoma subjects used to test the SFI is that a substantial proportion of them have a visual field mean deviation greater than 0 (Figure 5). This would imply that the clinicians making the diagnosis of glaucoma were likely using optic disc or retinal nerve fiber layer examination to define the presence of glaucoma. This group of glaucoma patients with above-average field sensitivity could be explained either by the fact that optic disc change can precede visual field loss or by misdiagnosis by the examining clinician. The latter option again points out the ambiguity caused by relying on "expert" clinicians to define the presence or absence of a disease and is something the SFI might help overcome.