Magnetic resonance imaging parameters on lacrimal gland in thyroid eye disease: a systematic review and meta-analysis
BMC Ophthalmology volume 23, Article number: 347 (2023)
Thyroid eye disease is an extrathyroidal manifestation of Graves’ disease and is associated with dry eye disease. This is the first systematic review and meta-analysis to evaluate the role of magnetic resonance imaging lacrimal gland parameters in thyroid eye disease diagnosis, activity grading, and therapeutic responses prediction.
Up to 23 August, 2022, 504 studies from PubMed and Cochrane Library were analyzed. After removing duplicates and imposing selection criteria, nine eligible studies were included. Risk of bias assessment was done. Meta-analyses were performed using random-effect model if heterogeneity was significant. Otherwise, fixed-effect model was used. Main outcome measures include seven structural magnetic resonance imaging parameters (lacrimal gland herniation, maximum axial area, maximum coronal area, maximum axial length, maximum coronal length, maximum axial width, maximum coronal width), and three functional magnetic resonance imaging parameters (diffusion tensor imaging-fractional anisotropy, diffusion tensor imaging-apparent diffusion coefficient or mean diffusivity, diffusion-weighted imaging-apparent diffusion coefficient).
Thyroid eye disease showed larger maximum axial area, maximum coronal area, maximum axial length, maximum axial width, and maximum coronal width, higher diffusion tensor imaging-apparent diffusion coefficient/mean diffusivity, and lower diffusion tensor imaging-fractional anisotropy than controls. Active thyroid eye disease showed larger lacrimal gland herniation and maximum coronal area, and higher diffusion-weighted imaging-apparent diffusion coefficient, than inactive disease. Lacrimal gland dimensional parameters (maximum axial area, maximum coronal area, maximum axial length, maximum axial width, maximum coronal width) and functional parameters (diffusion tensor imaging-fractional anisotropy, diffusion tensor imaging-apparent diffusion coefficient/mean diffusivity) could be used for diagnosing thyroid eye disease; lacrimal gland herniation, maximum coronal area, and diffusion-weighted imaging-apparent diffusion coefficient for differentiating active from inactive thyroid eye disease; and diffusion tensor imaging parameters (diffusion tensor imaging-fractional anisotropy, diffusion tensor imaging-mean diffusivity) and lacrimal gland herniation for aiding severity grading and predicting therapeutic response, respectively.
Magnetic resonance imaging lacrimal gland parameters can detect active thyroid eye disease and differentiate thyroid eye disease from controls. Maximum coronal area is the most effective indicator for thyroid eye disease diagnosis and activity grading. There are inconclusive results showing whether structural or functional lacrimal gland parameters have diagnostic superiority. Future studies are warranted to determine the use of magnetic resonance imaging lacrimal gland parameters in thyroid eye disease.
Thyroid eye disease (TED), also known as thyroid-associated ophthalmopathy (TAO), Graves’ ophthalmopathy (GO), or Graves’ orbitopathy, is an autoimmune disorder involving the orbital soft tissues, namely the extraocular muscles (EOMs) and orbital fat (OF) . It is an extrathyroidal manifestation of Graves’ disease (GD) . In GD, thyrotropin receptor (TSHr) autoantibodies (TRAbs) and insulin-like growth factor-1 (IGF-1) receptor (IGF-1r) autoantibodies attack their respective receptors on orbital fibroblasts and EOMs, stimulating adipogenesis and inflammation. The resultant increased volume of retrobulbar soft tissues within the limited space contributes to various thyroid eye signs .
TED is a biphasic disease that begins with an active phase of progressive inflammation, followed by an inactive phase of stable fibrosis of the orbital soft tissues. In current clinical assessment, TED activity and severity are commonly graded with the European Group on Graves’ Orbitopathy (EUGOGO) classification system, in which disease activity is assessed by the modified Clinical Activity Score (CAS), comprising spontaneous and gaze-evoked orbital pain, eyelid swelling and erythema, conjunctival erythema and chemosis, and inflammation of the caruncle. A score of three or above out of seven items is defined as active ophthalmopathy, while a score of less than three is defined as inactive. TED is then classified as mild, moderate-to-severe, or sight-threatening. This classification often guides TED management. However, the different clinical presentations in Asian populations raise the questions of whether applying this Caucasian-based classification may delay or underestimate TED diagnosis in this group, and whether there are better parameters to help in early TED diagnosis.
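As a small illustration of the activity cutoff described above, the following sketch scores the seven modified CAS items and applies the threshold of three. The item identifiers and the function name `clinical_activity` are hypothetical labels chosen for this example; they are not taken from EUGOGO material.

```python
# Seven modified CAS items as listed in the text (identifiers are hypothetical).
CAS_ITEMS = (
    "spontaneous_orbital_pain",
    "gaze_evoked_orbital_pain",
    "eyelid_swelling",
    "eyelid_erythema",
    "conjunctival_erythema",
    "chemosis",
    "caruncle_inflammation",
)


def clinical_activity(findings, cutoff=3):
    """Return (score, label) for a collection of positive CAS items.

    One point is given per item present; a score of `cutoff` or more
    is labelled "active", otherwise "inactive".
    """
    present = set(findings)
    unknown = present - set(CAS_ITEMS)
    if unknown:
        raise ValueError(f"unknown CAS items: {sorted(unknown)}")
    score = len(present)
    return score, ("active" if score >= cutoff else "inactive")


if __name__ == "__main__":
    print(clinical_activity({"chemosis", "eyelid_swelling", "eyelid_erythema"}))
```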
An increasing number of studies support the correlation between lacrimal gland (LG) dysfunction and TED progression, ranging from LG enlargement clinically, TSHr on the LG and the involvement of TRAbs immunologically, and LG inflammation pathologically, to increased proinflammatory cytokines and proteomic changes in tear films molecularly [10,11,12,13]. Recently, imaging has served as an adjunct to the clinical-endocrinological assessment for the diagnosis, grading, treatment, and monitoring of TED. With the established role of the LG in TED, imaging studies have been shifting their focus from the traditional retrobulbar soft tissues to the structural and functional changes of the LG. Previously, computed tomography (CT) studies reported an increase in LG dimensions and volume in TED patients. Compared with CT, magnetic resonance imaging (MRI) has higher soft tissue resolution and involves no radiation.
Apart from the quantitative measurements of structural parameters like dimensions [14,15,16] and LG herniation [17,18,19], functional parameters like signal intensity ratio (SIR) in T2-weighted imaging (T2WI) [16, 19], apparent diffusion coefficient (ADC) in diffusion-weighted imaging (DWI) [15, 20], and fractional anisotropy (FA) [21, 22] and ADC or mean diffusivity (MD) in diffusion tensor imaging (DTI) have been investigated in multiple studies. To the best of our knowledge, there is no review on MRI parameters of the LG in TED patients. We would like to explore whether this newer imaging modality (i.e. MRI), combined with LG parameters, could better aid the clinical management of TED. Herein, this systematic review and meta-analysis reviews and reports the outcomes and clinical implications of different MRI parameters of the LG in TED patients.
Our systematic review and meta-analysis followed the PRISMA 2020 guidelines . (PROSPERO registration number: CRD42022335591).
On 23 August, 2022, we performed our literature search on the following electronic bibliographic databases: PubMed and the Cochrane Central Register of Controlled Trials (issue 7 of 12, July 2022). We formulated sensitive search strategies using keywords and Medical Subject Heading (MeSH) terms stated in Table 1. No language restrictions nor limitations on publication years were applied. A total of 504 results were yielded (468 from PubMed, 36 from the Cochrane Library). 15 duplicates were identified, and 489 results were left for screening (Fig. 1).
The search mainly focused on mapping existing literature on MRI parameters on LG in TED. From the 489 results, we included studies based on the following inclusion criteria: 1) comparative studies including case–control and cohort studies, 2) cases were TED patients based on clinical diagnosis, 3) controls were healthy subjects or GD patients without TED or patients with inactive TED based on clinical diagnosis, 4) study focuses were on LG findings on MRI, 5) study subjects were unrelated individuals from clearly defined populations, 6) clear MRI LG results (or existing data adequate for calculation) in both case and control groups were provided. Animal studies, case reports, case series, reviews, abstracts, studies without or with incomplete original data were excluded.
In our first review, the titles and abstracts were screened by two independent reviewers (K.Y. and N.W.) after applying the search strategy and eligibility criteria. Disagreements were resolved after discussions with four senior reviewers (K.C., F.A., K.L., and Z.H.). 475 irrelevant results were removed. Full-text screening was then performed on the remaining 14 eligible articles by two independent reviewers (K.Y. and N.W.). Disagreements were resolved after discussions with four senior reviewers (K.C., F.A., K.L., and Z.H.). After ensuring eligibility, a total of nine qualified studies were included in our review and meta-analysis (Fig. 1).
We adopted a pre-designed form to collect all the extracted data, including the name of the first author, year of publication, country of study, ethnicity, definition of case and control groups (thyroid status), CAS, age, sex, sample size, MRI parameters and their respective findings (expressed as mean ± standard deviation (SD) or median (interquartile range (IQR))), and the purposes of the parameters used (including disease activity, severity, or therapeutic responses). We extracted and analysed data on an eye basis instead of a patient basis. If results were reported on a patient basis, we converted them to an eye basis on the assumption that all subjects had two eyes. If no extractable MRI results were obtained from an eligible study, or if the reported data were unclear, we emailed the authors for the missing data and for verification. Two independent reviewers (K.Y. and N.W.) extracted data, and discrepancies were resolved after discussions with four senior reviewers (K.C., F.A., K.L., and Z.H.).
Risk of bias (quality) assessment
We adopted a modified Newcastle–Ottawa Scale (NOS) for cross-sectional studies to assess the quality of each of the selected studies [24, 25]. We scored each study on selection, comparability, and outcome. The scores were recorded in the same form used for collecting extracted data. Any study scoring six or less out of ten was considered to have a high risk of bias. Two independent reviewers (K.Y. and N.W.) were involved in the quality assessment. Disagreements were resolved after discussions with four senior reviewers (K.C., F.A., K.L., and Z.H.).
Review Manager (RevMan, Version 5.4. The Cochrane Collaboration, 2020.) was used to perform the meta-analysis for outcome measures which were included in two or more studies. We analyzed LG herniation (LGH), LG dimensions (maximum axial area (MAA), maximum coronal area (MCA), maximum axial length (MAL), maximum coronal length (MCL), maximum axial width (MAW), maximum coronal width (MCW)), DTI-FA, DTI-ADC/MD and DWI-ADC as continuous variables. As all outcomes in the included studies were measured on the same scale, we used mean difference as the summary effect measure for all variables. Mean difference is the difference between the means of two groups. It is interpreted with its P-value and 95% confidence interval (CI). If median and interquartile range were used in the studies, mean and standard deviation (SD) were estimated respectively as suggested by Wan et al. (see Formulas (1) and (2), Additional file 2). If the measurements of the right eye and left eye were grouped and reported separately in the studies, we combined the two subgroups (see Formula (3), Additional file 2).
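The data preparation steps above can be sketched in code. The formulas in Additional file 2 are not reproduced in this text, so the block below assumes the commonly cited forms: the quartile-based estimator of Wan et al. for converting median and IQR to mean and SD, and the standard pooled formula for merging two subgroups (here, right and left eyes). The function names are hypothetical, and the mean difference uses an unpooled standard error with a normal-approximation confidence interval.

```python
import math
from statistics import NormalDist


def mean_sd_from_quartiles(q1, median, q3, n):
    """Estimate mean and SD from the median and quartiles of n observations
    (assumed form of the Wan et al. quartile-based estimator)."""
    mean = (q1 + median + q3) / 3
    z = NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    sd = (q3 - q1) / (2 * z)
    return mean, sd


def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Merge two subgroups (e.g. right and left eyes) into one group.
    Returns (n, mean, sd) of the combined sample."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    sum_sq = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
              + n1 * n2 / n * (m1 - m2) ** 2)
    return n, mean, math.sqrt(sum_sq / (n - 1))


def mean_difference(m1, sd1, n1, m2, sd2, n2, level=0.95):
    """Mean difference (group 1 minus group 2) with its standard error
    and a normal-approximation confidence interval."""
    md = m1 - m2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return md, se, (md - z * se, md + z * se)


if __name__ == "__main__":
    # Illustrative numbers only, not data from the included studies.
    print(mean_sd_from_quartiles(4.0, 5.0, 6.0, 50))
    print(combine_groups(3, 2.0, 1.0, 4, 5.5, math.sqrt(5 / 3)))
    print(mean_difference(10.0, 2.0, 25, 8.0, 2.0, 25))
```

The output of `mean_difference` (the difference and its standard error) is the per-study input that an inverse-variance meta-analysis needs.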
Heterogeneity was tested using Cochran's Q-statistics chi-square test and I2-statistic. If significant heterogeneity was found between the studies (P < 0.1 or I2 ≥ 50%), a random-effect model was used for meta-analysis. Otherwise, a fixed-effect model was used.
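The model selection rule above can be sketched as follows. This is a minimal illustration rather than a reproduction of RevMan: it assumes inverse-variance weighting and, for the random-effects case, the DerSimonian-Laird estimate of between-study variance, which the text does not specify. The function names `chi2_sf` and `pool_mean_differences` are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df,
    built from the recurrence for the regularized incomplete gamma function."""
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2:
        q += h ** a * math.exp(-h) / math.gamma(a + 1)
        a += 1
    return q


def pool_mean_differences(effects, p_cut=0.1, i2_cut=50.0):
    """Pool (mean_difference, standard_error) pairs from two or more studies.

    A random-effects model is used when P < p_cut or I2 >= i2_cut;
    otherwise a fixed-effect model is used.
    """
    w = [1 / se ** 2 for _, se in effects]
    fixed = sum(wi * md for wi, (md, _) in zip(w, effects)) / sum(w)
    q = sum(wi * (md - fixed) ** 2 for wi, (md, _) in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    p = chi2_sf(q, df)
    tau2, model = 0.0, "fixed"
    if p < p_cut or i2 >= i2_cut:
        model = "random"
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)  # DerSimonian-Laird (assumed)
        w = [1 / (se ** 2 + tau2) for _, se in effects]
    pooled = sum(wi * md for wi, (md, _) in zip(w, effects)) / sum(w)
    se = math.sqrt(1 / sum(w))
    z = NormalDist().inv_cdf(0.975)
    return {"model": model, "pooled": pooled, "se": se,
            "ci": (pooled - z * se, pooled + z * se),
            "q": q, "p": p, "i2": i2, "tau2": tau2}


if __name__ == "__main__":
    # Illustrative inputs only, not data from the included studies.
    print(pool_mean_differences([(0.0, 1.0), (10.0, 1.0)]))
```

With only two or three studies per outcome, as in this review, Q has one or two degrees of freedom, so the heterogeneity test has limited power and I2 is imprecise.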
From our literature search, we identified a total of 504 titles and abstracts, and retrieved 14 full texts for review. We finally included nine studies in our systematic review and meta-analyses [14,15,16,17,18,19,20,21,22].
Characteristics of included studies
Table 2 summarizes the characteristics of the nine included studies. A total of 1012 eyes were included in the nine studies, of which 693 were cases and 319 were controls. Seven studies were conducted in China recruiting Chinese subjects [14,15,16, 18, 19, 21], while the remaining two studies were conducted in Italy and Egypt recruiting Italian and North African subjects respectively. The age ranged from 33.5 to 54.1, while the sample sizes ranged from 64 to 222 eyes. Given that the CAS of a study [15, 17] could not be retrieved, the CAS ranged from 1 [14, 16, 17] to 4.6. Around 78% (seven out of nine) of the studies addressed diagnostic purposes [14,15,16,17,18, 20, 21], of which four studies compared both active and inactive TED with healthy controls (HCs) [14, 16, 20, 21], two studies compared active TED with inactive TED [17, 18], and one study used GD patients as controls for comparison with both active and inactive TED. The other two studies focused on grading and therapeutic purposes respectively; the former compared mild and moderate-to-severe TED with HCs, while the latter compared groups responsive and unresponsive to glucocorticoid (GC) therapy among patients with active and moderate-to-severe TED. In terms of the MRI parameters used, three studies [14,15,16] looked into LG dimensions, among which all three studies reported MCA, MCL, and MCW, while only two studies [14, 16] reported MAA, MAL and MAW. Two studies investigated LG herniation in T2WI-fat suppression (T2WI-FS) [17, 18], two studies explored SIR in T2WI [16, 19], and two studies examined DWI-ADC [15, 20]. Of the two studies that examined DTI-FA [21, 22], one reported DTI-ADC, while the other reported DTI-MD.
The definitions of the MRI parameters are consistent among the included studies. For structural parameters, MCA is defined as the LG area in the coronal image in which the LG is the largest, as shown in Fig. 2. MCL is defined as the distance between the superior tip and the inferior tip of the LG in the coronal cut where MCA is obtained. MCW is defined as the widest distance perpendicular to the length (MCL) within the LG. The same principle applies to the axial parameters. MAA is defined as the LG area in the axial image in which the LG is the largest, as shown in Fig. 3. MAL is defined as the distance between the anterior tip and the posterior tip of the LG in the axial cut where MAA is obtained. MAW is defined as the widest distance perpendicular to the length (MAL) within the LG. LGH is defined as the distance between the anterior tip of the LG and the interzygomatic line, as shown in Fig. 4. For functional parameters, DWI-ADC, DTI-ADC (or MD) and DTI-FA were obtained by first placing a region of interest in the LG where it has the largest cross-sectional area, and then measuring the value of ADC, MD or FA of that region of interest in the DTI or DWI scan. In our paper, we combine the findings of DTI-ADC and MD since they both reflect the magnitude of water diffusion.
Risks of bias in included studies
Table 3 summarizes the risk of bias assessment using a modified scale adapted from the Newcastle–Ottawa Scale (NOS) for cohort studies for all nine selected cross-sectional studies. For more in-depth details, see Supplementary Table 1, Additional File 1.
Among the nine studies, all but one, which scored six, scored seven or above, indicating a low risk of bias [15,16,17,18,19,20,21,22]. Six of them (67%) adopted convenience sampling by choosing consecutive patients with TED [15, 17, 18, 20,21,22]. Two of them (22%) lacked detailed descriptions of the recruitment method of subjects [14, 22]. In terms of sample size, none of the studies justified its sample size or showed a relevant sample size calculation. They all lacked an explanation of the sample size expected to provide statistically significant information [14,15,16,17,18,19,20,21,22]. For outcome assessment, five of them (56%) involved more than one assessor, blinded to the clinical condition of subjects, to independently evaluate the MRI results. The intra- or inter-observer variability was appropriately adjusted using relevant statistical methods [15, 18, 19, 21, 22]. Two of them did not mention whether the assessors were blinded [16, 17], while one of them involved only one blinded assessor and did not mention the correction of intra-observer variability. Otherwise, all studies had satisfactory response rates and established characteristics of the subjects, included CAS to ascertain the exposure of subjects (i.e. the status of active or inactive TED), adjusted for age and sex as confounders, and clearly stated the appropriate statistical test for data analysis [14,15,16,17,18,19,20,21,22].
Active vs inactive
Table 4 summarizes the MRI parameters used in the included studies to compare the active TED group with the inactive TED group.
We conducted a meta-analysis on eight MRI measurements of active TED patients with inactive TED patients as the control group, including LG herniation, LG dimensional parameters (MAA, MCA, MAL, MCL, MAW, MCW) and DWI-ADC. The results are shown in Fig. 5, and the summary is shown in Table 5. Two to three studies were included in each outcome measure. In MAA, MAW and DWI-ADC, there was statistically significant heterogeneity.
The active TED group showed a significantly larger LG herniation than the inactive TED group by 3.37 mm (Fig. 5a). For LG dimensions, there was a significant difference between the two groups only in MCA, by 8.1 mm² (Fig. 5c). In contrast, there were no significant differences in MAA (pooled mean difference: 8.3 mm²; Fig. 5b), MAL (pooled mean difference: 0.55 mm; Fig. 5d), MCL (pooled mean difference: 0.19 mm; Fig. 5e), MAW (pooled mean difference: -0.05 mm; Fig. 5f) and MCW (pooled mean difference: 0.22 mm; Fig. 5g). The active TED group was also associated with higher DWI-ADC than the inactive TED group by 0.1 × 10⁻³ mm²/s (Fig. 5h).
TED vs control
Table 6 summarizes the MRI parameters used in the included studies to compare the TED group with the healthy control group.
We also conducted a meta-analysis on nine MRI measurements of TED patients compared to the control group, including LG dimensional parameters (MAA, MCA, MAL, MCL, MAW, MCW), DTI-FA, DTI-ADC/MD, and DWI-ADC. All included studies used healthy subjects as the control group except Wu, who used Graves’ disease patients without TED as controls. The results are shown in Fig. 6, and the summary is shown in Table 7. Two to three studies were included in each outcome measure. In MCA, MCL, MCW, DTI-FA and DWI-ADC, there was statistically significant heterogeneity.
For LG dimensions, there were significant differences between the TED group and the control group in MAA (pooled mean difference: 23.28 mm²; Fig. 6a), MCA (pooled mean difference: 14.44 mm²; Fig. 6b), MAL (pooled mean difference: 1.88 mm; Fig. 6c), MAW (pooled mean difference: 1.45 mm; Fig. 6e), and MCW (pooled mean difference: 1.00 mm; Fig. 6f). There was no significant difference in MCL (pooled mean difference: 0.37 mm; Fig. 6d). The TED group was associated with lower DTI-FA than the control group by 0.04 (Fig. 6g), and higher DTI-ADC/MD by 0.05 × 10⁻³ mm²/s (Fig. 6h). No significant difference in DWI-ADC was found (pooled mean difference: 0.12 × 10⁻³ mm²/s; Fig. 6i).
Other MRI parameters
Table 8 summarizes other MRI parameters used in the included studies for grading or therapeutic purposes.
This systematic review and meta-analysis focused on MRI measurements of the LG in TED patients. Two to three studies were included in each meta-analysis. The active TED group had significantly larger LGH, larger MCA and higher DWI-ADC than the inactive TED group. The TED group had significantly larger values in five dimensional parameters (MAA, MCA, MAL, MAW, MCW) and DTI-ADC/MD, and significantly lower DTI-FA, than healthy controls.
MRI LG parameters comparisons
Active TED vs. inactive TED
In the comparison between active and inactive TED patients, we found that the active TED group had significantly larger LGH, larger MCA and higher DWI-ADC. This implies that these three parameters could potentially differentiate active TED patients from inactive ones. Of the seven structural parameters, only LGH and MCA showed a significant difference. In contrast to functional MRI parameters, structural parameters measure physical LG characteristics and so reflect only indirectly the degree of inflammation, which is generally more severe in active TED. As a result, structural MRI parameters may not be superior to functional MRI parameters in differentiating active TED patients from inactive ones. Of the three parameters showing significant differences, LGH (I2 = 0%; Fig. 5a) and MCA (I2 = 16%; Fig. 5c) have insignificant heterogeneity, while DWI-ADC (I2 = 93%; Fig. 5h) has substantial heterogeneity. The high heterogeneity may affect the validity of the result.
TED vs. control
In the comparison between TED patients and control, TED patient group was significantly larger in five dimensional parameters (MAA, MCA, MAL, MAW, MCW) and DTI-ADC/MD, and was significantly lower in DTI-FA. This implies that these seven parameters are potential parameters to differentiate TED patients from healthy subjects. Five out of six structural parameters (MAA, MCA, MAL, MAW, MCW) show significant differences. As the difference in the severity of inflammation between TED patients and control is larger than that between active TED patients and the inactive ones, structural parameters can also differentiate TED patients from healthy subjects.
For dimensional parameters, all coronal parameters, i.e., MCA (I2 = 85%; Fig. 6b), MCL (I2 = 81%; Fig. 6d) and MCW (I2 = 59%; Fig. 6f), show significant heterogeneity. In contrast, all axial parameters, i.e., MAA (I2 = 0%; Fig. 6a), MAL (I2 = 0%; Fig. 6c) and MAW (I2 = 48%; Fig. 6e), show insignificant heterogeneity. The study by Wu only measured coronal parameters. Wu’s study consistently showed lower mean differences, which accounts for the high heterogeneity in coronal parameters.
Among the dimensional parameters, area parameters perform better than length and width parameters at differentiating active TED patients from inactive ones and TED patients from healthy subjects. Only MCA can differentiate active TED patients from inactive ones (Fig. 5c), while both MCA and MAA can differentiate TED patients from healthy subjects (Fig. 6a and 6b). Length and width parameters cannot differentiate active TED patients from inactive ones. A possible reason is that areas are two-dimensional quantities, so the differences between the groups are more prominent. Area parameters are also more accurate as they reflect changes in two dimensions. LG volume may be an even better dimensional parameter because it is three-dimensional. Among the nine included studies, only the study by Hu (2016) measured LG volume. Hu’s method for measuring LG volume requires delineating the LG in all slices to obtain the areas, and then multiplying the sum of the areas by the slice interval to compute the volume. Measuring volume is much more labour-intensive than measuring area, as the LG must be delineated in all slices. In clinical practice, it is more difficult to manually measure LG volume for all patients. Measuring maximum area is easier and more practical.
Among the six dimensional parameters, the best parameter is MCA and the worst parameter is MCL. MCA can differentiate active TED patients from inactive ones (Fig. 5c), and TED patients from healthy subjects (Fig. 6b). In contrast, MCL can differentiate neither (Figs. 5e and 6d).
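The slice-summation volume described above, and its contrast with the single-slice maximum area, can be sketched as below. The function names and the example numbers are hypothetical; the per-slice areas are assumed to come from manual delineation of the LG on each slice.

```python
def lg_volume_mm3(slice_areas_mm2, slice_interval_mm):
    """Approximate LG volume as (sum of delineated slice areas) x slice interval."""
    if slice_interval_mm <= 0:
        raise ValueError("slice interval must be positive")
    return sum(slice_areas_mm2) * slice_interval_mm


def max_area_mm2(slice_areas_mm2):
    """Maximum cross-sectional area: only the largest slice is needed."""
    return max(slice_areas_mm2)


if __name__ == "__main__":
    # Illustrative areas for five contiguous slices, not measured data.
    areas = [22.0, 48.5, 61.0, 40.5, 18.0]
    print(lg_volume_mm3(areas, 3.0))  # needs every slice delineated
    print(max_area_mm2(areas))        # needs one slice delineated
```

The sketch makes the workload difference concrete: the volume needs one delineation per slice, whereas the maximum area needs a delineation only on the slice where the gland appears largest.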
Substantial heterogeneity was observed in DWI-ADC of both comparisons (I2 = 93% in active TED vs. inactive TED, Fig. 5h; I2 = 98% in TED vs control, Fig. 6i). It is observed that DWI-ADC value in Wu’s study is generally lower than that in Razek’s study [15, 20]. Wu’s method of measuring DWI-ADC value involved delineating the largest coronal area in T2 weighted sections, and then measuring the ADC value of that area in DWI sequence. The most hyperintense spot, which represents the area of most severe inflammation, may not be hit. In contrast, Razek’s method involved placing region of interest directly in DWI sequence and measuring the ADC value. The difference in the method of measuring DWI-ADC value is a possible reason for the generally low DWI-ADC values in Wu’s study, and thus the high heterogeneity in the meta-analyses. Another possible reason is ethnicity difference. Wu’s study recruited Chinese subjects while Razek’s study recruited Egyptians [15, 20].
Structural vs. functional parameters
In evaluating inflammatory activity, functional MRI parameters may be better than structural MRI parameters, as functional parameters directly reflect the level of metabolic activity. However, in both comparisons (i.e., active TED vs inactive TED and TED vs healthy control), structural and functional MRI parameters show comparable results in differentiating between the two groups. Two (i.e., LGH and MCA) out of seven structural parameters, and one (i.e., DWI-ADC) out of one functional parameter, can differentiate active TED patients from inactive ones. Five (i.e., MAA, MCA, MAL, MAW and MCW) out of six structural parameters, and two (i.e., DTI-FA and DTI-ADC/MD) out of three functional parameters, can differentiate TED patients from healthy subjects. The results of these meta-analyses showed that functional MRI parameters were not superior to structural MRI parameters, nor vice versa. Computed tomography (CT) is another imaging modality that can measure structural parameters. With comparable results between functional and structural MRI parameters, CT may be comparable to functional MRI in diagnosing TED. However, a study by Lee showed that the sensitivity of CT and MRI for detecting active inflammation in TED was 50% and 100% respectively, although validation of the ability of MRI in that study was limited. As a result, the comparison between structural and functional MRI parameters remains inconclusive.
Clinical diagnosis of disease activity
Out of the nine included studies, seven studies (78%) were conducted on Chinese patients (Table 2). This could possibly be accounted for by the more difficult diagnosis and management based on the clinical manifestations of Chinese patients with TED: Lim et al. concluded that East Asians generally had less exophthalmos, upper eyelid retraction and edema than Caucasian patients, leading to more research interest in finding alternatives (e.g. imaging modalities) for earlier detection or diagnosis of TED. While five [14,15,16, 18, 21] of these seven studies, plus the two studies done in Italy and Egypt, looked into different LG parameters to aid the diagnosis of TED, ours is the first study to perform a meta-analysis on the data provided in all these studies. We have found that LG parameters, both structural and functional, generally provide greater diagnostic value in differentiating TED from disease-free patients than in differentiating active from inactive TED patients. To be more precise, LG dimensional parameters including MAA, MCA, MAL, MAW and MCW (Fig. 6a-c, e, f), as well as LG functional parameters including DTI-FA and DTI-ADC/MD (Fig. 6g, h), could possibly be used in clinical practice for differentiating TED from disease-free patients. By comparison, fewer parameters, i.e., LG herniation, MCA, and DWI-ADC (Fig. 5a, c, h), could possibly be used to differentiate active from inactive TED patients. This implies that these LG MRI parameters might be taken into account, along with the traditional modified CAS, when diagnosing TED in the future. The possibility of creating a new scoring system for TED activity diagnosis incorporating LG MRI parameters may also be considered, especially for Asian populations.
Grading of disease severity
Among all the nine included studies, only a Chinese study by Rui et al.  took a step further to compare mild to moderate-severe TED patients and investigate the use of DTI parameters for grading TED severity. Thus, we could not perform a meta-analysis regarding this perspective. Based on the findings by Rui et al. , moderate-severe TED group had significantly lower DTI-FA, especially of medial rectus (MR) (P = 0.017), and higher DTI-MD (P = 0.021) than mild TED group. It also concluded that DTI parameters, especially FA, of MR were sensitive indicators that could help in the differentiation between mild and moderate-severe TED. From this result, we could see the potential role of LG DTI parameters in guiding the grading of TED severity and hence the management plan of TED patients more accurately. However, it is obvious that more studies need to be carried out to draw a more statistically significant conclusion.
Prediction of therapeutic responses and prognosis
Similar to the above, among all the nine included studies, only a Chinese study by Hu et al.  compared the LG parameters, i.e., LG herniation and SIR (SIR-max, SIR-mean, SIR-min), of active and moderate-severe TED patients responsive to intravenous (IV) steroidal therapy after six months to those unresponsive patients. Thus, a meta-analysis regarding the therapeutic responses was not performed. Based on the sole results by Hu et al. , it is found that those responsive to IV steroids had a significantly larger LG herniation than those unresponsive (P = 0.019), while there were no statistically significant differences in SIR (SIR-max, SIR-mean, SIR-min) between the two groups (P = 0.514, 0.776 and 0.642 respectively). It summarized that the larger LG herniation could possibly be used to distinguish treatment responsive and unresponsive group. This could then possibly allow a wiser allocation of treatment plans, i.e., glucocorticoid therapy for responsive patients, and immunotherapies for unresponsive patients. With more studies investigating this aspect, a more accurate conclusion could then be drawn, and hence more targeted treatment plans could be made for patients to improve their disease prognosis.
Use of (LG) imaging in managing other orbital/ inflammatory diseases
Beyond the LG parameters examined in this systematic review and meta-analysis for the diagnosis, grading, and prediction of therapeutic responses in TED, LG parameters have gained an emerging role in the assessment of other orbital or inflammatory diseases. For instance, one differential diagnosis of dry eye is primary Sjogren’s syndrome (pSS), an autoimmune disease affecting the salivary glands and LGs that causes dryness of the mouth and eyes. However, its clinical diagnosis is difficult because its signs and symptoms are non-specific. The current diagnostic criteria related to the orbit involve Schirmer’s test and ocular dry scores [32, 33], but the former causes discomfort to patients and the latter is difficult to interpret. Hence, several studies have looked for LG parameters on non-invasive imaging to aid pSS diagnosis. For example, changes in LG size and enhanced signal intensity with accelerated fat deposition on MRI could predict pSS stages; a significantly lower DWI-ADC of the LG may suggest LG abnormalities in pSS patients; and lower 11C-MET uptake by the LG on PET-CT has been found to correlate with reduced tear flow. Another example of an LG-related orbital disease is IgG4-related disease (IgG4-RD), a fibroinflammatory disease in which lymphoplasmacytic IgG4-positive plasma cells infiltrate multiple organ tissues; when the orbit is involved, it is termed IgG4-related ophthalmic disease (IgG4-ROD). Its current diagnosis is based on typical organ dysfunction or structural changes (i.e., swelling), a high serum IgG4 titer, and histopathological results from biopsy, which is invasive; imaging could serve as a non-invasive tool to aid the diagnosis. For instance, a hypointense and enlarged LG on T2W MRI and higher uptake of 68Ga-FAPI by the LG on PET-CT have been found to aid the diagnosis and assessment of IgG4-RD.
While both pSS and IgG4-RD can result in an enlarged LG, infraorbital nerve enlargement (IONE) on MRI could act as a specific sign of IgG4-ROD. IONE could also help differentiate IgG4-ROD from other lymphoproliferative orbital diseases, including lymphoma, reactive lymphoid hyperplasia, and idiopathic or other orbital inflammation. These examples illustrate the increasingly important role of non-invasive imaging techniques, and of LG parameters on different imaging modalities, in aiding the diagnosis of various orbital or inflammatory diseases of particular interest to ophthalmologists.
There were a few limitations in our systematic review. First, only two to three studies were included under each outcome measure. The pooled sample sizes may not be large enough to draw clinically significant conclusions because of random sampling error. Second, when heterogeneity was found to be significant by Cochran's Q chi-square test and the I² statistic, subgroup analysis could not be performed, as each subgroup would consist of only one to two studies. As a result, unexplained heterogeneity may remain and may affect the validity of the meta-analysis results. Further investigations may be needed to explore the reasons behind the high heterogeneity, such as ethnic and scanner-dependent differences. Third, the small number of included studies also reflects that the field of MRI of the lacrimal gland in thyroid eye disease requires further study.
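As an illustration of the heterogeneity statistics mentioned above, the following minimal Python sketch computes Cochran's Q, the I² statistic, and a DerSimonian-Laird estimate of the between-study variance from per-study effect sizes and their variances. It is not the software used in this review: the function name `heterogeneity` and the three-study input are hypothetical, and the DerSimonian-Laird estimator is assumed here only as one common choice for a random-effects model.

```python
def heterogeneity(effects, variances):
    """Return (Q, df, I2_percent, tau2) for a set of study effect sizes.

    effects   -- per-study effect estimates (e.g. mean differences)
    variances -- per-study sampling variances (all > 0)
    tau2 uses the DerSimonian-Laird estimator (an assumption of this sketch).
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    # Inverse-variance (fixed-effect) weights and pooled estimate.
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance, floored at 0.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, df, i2, tau2


if __name__ == "__main__":
    # Hypothetical three-study example; not data from the included studies.
    q, df, i2, tau2 = heterogeneity([0.2, 0.8, 0.5], [0.04, 0.04, 0.04])
    print(f"Q={q:.2f}, df={df}, I2={i2:.1f}%, tau2={tau2:.3f}")
```

With only two or three studies per outcome, Q has just one or two degrees of freedom, so both Q and I² are estimated imprecisely, which is consistent with the caution expressed above.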
Of the nine included studies, seven were written by Chinese authors and recruited Chinese subjects [14, 15, 16, 18, 19, 21, 22]. Care should therefore be taken when applying the results to other ethnicities. Previous studies have demonstrated that the clinical manifestations of TED differ between East Asian and Caucasian patients [29, 43]. Radiological differences may also exist between ethnicities, which limits the representativeness of this meta-analysis, as most included patients are Chinese.
This is the first systematic review and meta-analysis on the use of MRI LG parameters in TED patients. MRI is a non-invasive imaging modality that can effectively guide the management of TED patients. The current number of studies on MRI LG parameters is limited, with only two to three studies focusing on each parameter, so more studies with larger sample sizes and a wider range of ethnicities are warranted. Potential LG imaging markers for TED, especially for disease grading and prediction of therapeutic responses, are still under investigation. MRI, which is non-invasive, safe, and highly sensitive, could see increasing use in the diagnosis of TED and other orbital diseases.
This systematic review and meta-analysis suggests that lacrimal gland herniation, maximum coronal area, and DWI-ADC can identify TED patients with active disease. Maximum axial area, maximum coronal area, maximum axial length, maximum axial width, maximum coronal width, DTI-FA, and DTI-ADC/MD can differentiate TED patients from healthy controls. Further studies on MRI of the lacrimal gland in thyroid eye disease are warranted to confirm our results.
Availability of data and materials
All data generated or analyzed during this study are included in this published article and its supplementary information files.
Thyroid eye disease
Dry eye disease
Magnetic resonance imaging
Lacrimal gland herniation
Maximum axial area
Maximum coronal area
Maximum axial length
Maximum coronal length
Maximum axial width
Maximum coronal width
Signal intensity ratio
Apparent diffusion coefficient
Diffusion tensor imaging
Thyrotropin receptor autoantibodies
Insulin-like growth factor-1
Insulin-like growth factor-1 receptor
European Group of Graves’ Orbitopathy
Clinical Activity Score
Dysthyroid optic neuropathy
Medical Subject Heading
Primary Sjogren’s syndrome
IgG4-related ophthalmic disease
Infraorbital nerve enlargement
Weiler DL. Thyroid eye disease: a review. Clin Exp Optom. 2017;100(1):20–5.
Eckstein A, Esser J. Graves’ orbitopathy. Klin Monbl Augenheilkd. 2011;228(5):432–8.
Khoo TK, Bahn RS. Pathogenesis of Graves’ ophthalmopathy: the role of autoantibodies. Thyroid. 2007;17(10):1013–8.
Dolman PJ. Grading Severity and Activity in Thyroid Eye Disease. Ophthalmic Plast Reconstr Surg. 2018;34(4S):S34–40.
Bartalena L, Baldeschi L, Dickinson A, Eckstein A, Kendall-Taylor P, Marcocci C, et al. Consensus statement of the European Group on Graves’ orbitopathy (EUGOGO) on management of GO. Eur J Endocrinol. 2008;158(3):273–85.
Barrio-Barrio J, Sabater AL, Bonet-Farriol E, Velázquez-Villoria Á, Galofré JC. Graves’ Ophthalmopathy: VISA versus EUGOGO Classification, Assessment, and Management. J Ophthalmol. 2015;2015:249125.
Khu J, Freedman KA. Lacrimal gland enlargement as an early clinical or radiological sign in thyroid orbitopathy. Am J Ophthalmol Case Rep. 2017;5:1–3.
Eckstein AK, Finkenrath A, Heiligenhaus A, Renzing-Köhler K, Esser J, Krüger C, et al. Dry eye syndrome in thyroid-associated ophthalmopathy: lacrimal expression of TSH receptor suggests involvement of TSHR-specific autoantibodies. Acta Ophthalmol Scand. 2004;82(3 Pt 1):291–7.
Jacobson DH, Gorman CA. Endocrine ophthalmopathy: current ideas concerning etiology, pathogenesis, and treatment. Endocr Rev. 1984;5(2):200–20.
Zoukhri D. Effect of inflammation on lacrimal gland function. Exp Eye Res. 2006;82(5):885–98.
Huang D, Xu N, Song Y, Wang P, Yang H. Inflammatory cytokine profiles in the tears of thyroid-associated ophthalmopathy. Graefes Arch Clin Exp Ophthalmol. 2012;250(4):619–25.
Matheis N, Okrojek R, Grus FH, Kahaly GJ. Proteomics of tear fluid in thyroid-associated orbitopathy. Thyroid. 2012;22(10):1039–45.
Khalil HA, De Keizer RJ, Bodelier VM, Kijlstra A. Secretory IgA and lysozyme in tears of patients with Graves’ ophthalmopathy. Doc Ophthalmol. 1989;72(3–4):329–34.
Huang D, Luo Q, Yang H, Mao Y. Changes of lacrimal gland and tear inflammatory cytokines in thyroid-associated ophthalmopathy. Invest Ophthalmol Vis Sci. 2014;55(8):4935–43.
Wu D, Zhu H, Hong S, Li B, Zou M, Ma X, et al. Utility of multi-parametric quantitative magnetic resonance imaging of the lacrimal gland for diagnosing and staging Graves’ ophthalmopathy. Eur J Radiol. 2021;141:109815.
Hu H, Xu XQ, Wu FY, Chen HH, Su GY, Shen J, et al. Diagnosis and stage of Graves’ ophthalmopathy: Efficacy of quantitative measurements of the lacrimal gland based on 3-T magnetic resonance imaging. Exp Ther Med. 2016;12(2):725–9.
Gagliardo C, Radellini S, Morreale Bubella R, Falanga G, Richiusa P, Vadalà M, et al. Lacrimal gland herniation in Graves ophthalmopathy: a simple and useful MRI biomarker of disease activity. Eur Radiol. 2020;30(4):2138–41.
Gao Y, Chang Q, Li Y, Zhang H, Hou Z, Zhang Z, et al. Correlation between extent of lacrimal gland prolapse and clinical features of thyroid-associated ophthalmopathy: a retrospective observational study. BMC Ophthalmol. 2022;22(1):66.
Hu H, Xu XQ, Chen L, Chen W, Wu Q, Chen HH, et al. Predicting the response to glucocorticoid therapy in thyroid-associated ophthalmopathy: mobilizing structural MRI-based quantitative measurements of orbital tissues. Endocrine. 2020;70(2):372–9.
Razek AA, El-Hadidy EM, Moawad ME, El-Metwaly N, El-Said AAE. Assessment of lacrimal glands in thyroid eye disease with diffusion-weighted magnetic resonance imaging. Pol J Radiol. 2019;84:e142–6.
Chen L, Hu H, Chen W, Wu Q, Zhou J, Chen HH, et al. Usefulness of readout-segmented EPI-based diffusion tensor imaging of lacrimal gland for detection and disease staging in thyroid-associated ophthalmopathy. BMC Ophthalmol. 2021;21(1):281.
Rui L, Jing L, Zhenchang W. Diffusion Tensor Imaging Technology to Quantitatively Assess Abnormal Changes in Patients With Thyroid-Associated Ophthalmopathy. Front Hum Neurosci. 2021;15:805945.
Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.
Wells GA, Shea B, O'Connell D, Peterson J, et al., editors. The Newcastle-Ottawa Scale (NOS) for Assessing the Quality of Nonrandomised Studies in Meta-Analyses. 2014.
Herzog R, Álvarez-Pasquin MJ, Díaz C, Del Barrio JL, Estrada JM, Gil Á. Are healthcare workers’ intentions to vaccinate related to their knowledge, beliefs and attitudes? a systematic review. BMC Public Health. 2013;13(1):154.
Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane Handbook for Systematic Reviews of Interventions version 6.3. Cochrane; 2022 [updated February 2022]. Available from: www.training.cochrane.org/handbook.
Wan X, Wang W, Liu J, Tong T. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Med Res Methodol. 2014;14:135.
Lee MJ, Hamilton BE, Pettersson D, Ogle K, Murdock J, Dailey RA, et al. Radiologic imaging shows variable accuracy in diagnosing orbital inflammatory disease and assessing its activity. Sci Rep. 2020;10(1):21875.
Lim NC, Sundar G, Amrith S, Lee KO. Thyroid eye disease: a Southeast Asian experience. Br J Ophthalmol. 2015;99(4):512–8.
Jonsson R, Brokstad KA, Jonsson MV, Delaleu N, Skarstein K. Current concepts on Sjögren's syndrome - classification criteria and biomarkers. Eur J Oral Sci. 2018;126(Suppl 1):37–48.
Bjordal O, Norheim KB, Rødahl E, Jonsson R, Omdal R. Primary Sjögren’s syndrome and the eye. Surv Ophthalmol. 2020;65(2):119–32.
Shiboski CH, Shiboski SC, Seror R, Criswell LA, Labetoulle M, Lietman TM, et al. 2016 American College of Rheumatology/European League Against Rheumatism classification criteria for primary Sjögren’s syndrome: A consensus and data-driven methodology involving three international patient cohorts. Ann Rheum Dis. 2017;76(1):9–16.
Vitali C, Bombardieri S, Jonsson R, Moutsopoulos HM, Alexander EL, Carsons SE, et al. Classification criteria for Sjögren’s syndrome: a revised version of the European criteria proposed by the American-European Consensus Group. Ann Rheum Dis. 2002;61(6):554–8.
Izumi M, Eguchi K, Uetani M, Nakamura H, Takagi Y, Hayashi K, et al. MR features of the lacrimal gland in Sjögren’s syndrome. AJR Am J Roentgenol. 1998;170(6):1661–6.
Kawai Y, Sumi M, Kitamori H, Takagi Y, Nakamura T. Diffusion-weighted MR microimaging of the lacrimal glands in patients with Sjogren’s syndrome. AJR Am J Roentgenol. 2005;184(4):1320–5.
Jimenez-Royo P, Bombardieri M, Ciurtin C, Kostapanos M, Tappuni AR, Jordan N, et al. Advanced imaging for quantification of abnormalities in the salivary glands of patients with primary Sjögren’s syndrome. Rheumatology (Oxford). 2021;60(5):2396–408.
McNab AA, McKelvie P. IgG4-related ophthalmic disease. Part I: background and pathology. Ophthalmic Plast Reconstr Surg. 2015;31(2):83–8.
Umehara H, Okazaki K, Masaki Y, Kawano M, Yamamoto M, Saeki T, et al. Comprehensive diagnostic criteria for IgG4-related disease (IgG4-RD), 2011. Mod Rheumatol. 2012;22(1):21–30.
Toyoda K, Oba H, Kutomi K, Furui S, Oohara A, Mori H, et al. MR imaging of IgG4-related disease in the head and neck and brain. AJNR Am J Neuroradiol. 2012;33(11):2136–9.
Luo Y, Pan Q, Yang H, Peng L, Zhang W, Li F. Fibroblast Activation Protein-Targeted PET/CT with (68)Ga-FAPI for Imaging IgG4-Related Disease: Comparison to (18)F-FDG PET/CT. J Nucl Med. 2021;62(2):266–71.
Soussan JB, Deschamps R, Sadik JC, Savatovsky J, Deschamps L, Puttermann M, et al. Infraorbital nerve involvement on magnetic resonance imaging in European patients with IgG4-related ophthalmic disease: a specific sign. Eur Radiol. 2017;27(4):1335–43.
Ohshima K, Sogabe Y, Sato Y. The usefulness of infraorbital nerve enlargement on MRI imaging in clinical diagnosis of IgG4-related orbital disease. Jpn J Ophthalmol. 2012;56(4):380–2.
Chng CL, Seah LL, Khoo DH. Ethnic differences in the clinical presentation of Graves’ ophthalmopathy. Best Pract Res Clin Endocrinol Metab. 2012;26(3):249–58.
No conflicting relationship exists for any author.
Ethics approval and consent to participate
Consent for publication
The authors declared that they do not have any competing interests.
Cite this article
Wong, N.T.Y., Yuen, K.F.K., Aljufairi, F.M.A.A. et al. Magnetic resonance imaging parameters on lacrimal gland in thyroid eye disease: a systematic review and meta-analysis. BMC Ophthalmol 23, 347 (2023). https://doi.org/10.1186/s12886-023-03008-x