Magnetic resonance imaging parameters on lacrimal gland in thyroid eye disease: a systematic review and meta-analysis

Background Thyroid eye disease is an extrathyroidal manifestation of Graves' disease and is associated with dry eye disease. This is the first systematic review and meta-analysis to evaluate the role of magnetic resonance imaging lacrimal gland parameters in thyroid eye disease diagnosis, activity grading, and therapeutic response prediction. Methods On 23 August 2022, 504 records were identified from PubMed and the Cochrane Library. After removing duplicates and applying the selection criteria, nine eligible studies were included, and their risk of bias was assessed. Meta-analyses were performed using a random-effects model if heterogeneity was significant and a fixed-effect model otherwise. Main outcome measures included seven structural magnetic resonance imaging parameters (lacrimal gland herniation, maximum axial area, maximum coronal area, maximum axial length, maximum coronal length, maximum axial width, maximum coronal width) and three functional magnetic resonance imaging parameters (diffusion tensor imaging-fractional anisotropy, diffusion tensor imaging-apparent diffusion coefficient or mean diffusivity, diffusion-weighted imaging-apparent diffusion coefficient). Results Compared with controls, thyroid eye disease showed larger maximum axial area, maximum coronal area, maximum axial length, maximum axial width, maximum coronal width, and diffusion tensor imaging-apparent diffusion coefficient/mean diffusivity, and lower diffusion tensor imaging-fractional anisotropy. Active thyroid eye disease showed larger lacrimal gland herniation, maximum coronal area, and diffusion-weighted imaging-apparent diffusion coefficient than inactive thyroid eye disease.
Lacrimal gland dimensional parameters (maximum axial area, maximum coronal area, maximum axial length, maximum axial width, maximum coronal width) and functional parameters (diffusion tensor imaging-fractional anisotropy, diffusion tensor imaging-apparent diffusion coefficient/mean diffusivity) could be used for diagnosing thyroid eye disease; lacrimal gland herniation, maximum coronal area, and diffusion-weighted imaging-apparent diffusion coefficient for differentiating active from inactive thyroid eye disease; and diffusion tensor imaging parameters (fractional anisotropy, mean diffusivity) and lacrimal gland herniation for helping with severity grading and therapeutic response prediction, respectively. Conclusions Magnetic resonance imaging lacrimal gland parameters can detect active thyroid eye disease and differentiate thyroid eye disease from controls. Maximum coronal area is the most effective indicator for thyroid eye disease diagnosis and activity grading. Whether structural or functional lacrimal gland parameters have diagnostic superiority remains inconclusive. Future studies are warranted to determine the use of magnetic resonance imaging lacrimal gland parameters in thyroid eye disease. Supplementary Information The online version contains supplementary material available at 10.1186/s12886-023-03008-x.

TED is a biphasic disease: an active phase of progressive inflammation is followed by an inactive phase with stable fibrosis of the orbital soft tissues [4]. In current clinical assessment, TED activity and severity are commonly graded using the European Group on Graves' Orbitopathy (EUGOGO) [5] classification system, in which disease activity is assessed by the modified Clinical Activity Score (CAS). Its seven items are spontaneous and gaze-evoked orbital pain, eyelid swelling and erythema, conjunctival erythema and chemosis, and inflammation of the caruncle. A score of three or more out of seven defines active ophthalmopathy, and a score below three defines inactive disease. TED severity is then classified as mild, moderate-to-severe, or sight-threatening [6]. This classification often guides TED management. However, because TED presents differently in Asian populations, it is unclear whether applying this Caucasian-based classification delays or underestimates TED diagnosis in this group, and whether better parameters exist to aid early TED diagnosis.
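The activity rule described above can be sketched in a few lines. This is a minimal illustration, assuming one point per item present and the cutoff of three stated in the text; the item identifiers are hypothetical labels, not names taken from the EUGOGO documents or any clinical software.

```python
from __future__ import annotations

# Hypothetical identifiers for the seven modified CAS items listed above.
CAS_ITEMS = (
    "spontaneous_orbital_pain",
    "gaze_evoked_orbital_pain",
    "eyelid_swelling",
    "eyelid_erythema",
    "conjunctival_erythema",
    "chemosis",
    "caruncle_inflammation",
)
ACTIVE_CUTOFF = 3  # a score of 3 or more out of 7 is defined as active TED


def clinical_activity_score(findings: dict[str, bool]) -> int:
    """Give one point for each CAS item recorded as present."""
    unknown = set(findings) - set(CAS_ITEMS)
    if unknown:
        raise ValueError(f"unknown CAS items: {sorted(unknown)}")
    return sum(1 for item in CAS_ITEMS if findings.get(item, False))


def is_active(findings: dict[str, bool]) -> bool:
    """Classify TED as active when the CAS reaches the cutoff."""
    return clinical_activity_score(findings) >= ACTIVE_CUTOFF


# Example: three positive items give CAS = 3, i.e. active disease.
example = {"eyelid_swelling": True, "chemosis": True, "caruncle_inflammation": True}
print(clinical_activity_score(example), is_active(example))
```

Running the example prints `3 True`; removing any one of the three findings would drop the score below the cutoff.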
A growing number of studies support a link between LG dysfunction and TED progression, spanning clinical findings of LG enlargement [7], immunological findings of TSHr expression on the LG and the involvement of TRAbs [8], pathological findings of LG inflammation [9], and molecular findings of increased proinflammatory cytokines and proteomic changes in the tear film [10][11][12][13]. Recently, imaging has served as an adjunct to clinical-endocrinological assessment for the diagnosis, grading, treatment, and monitoring of TED. With the role of the LG in TED established, imaging studies have shifted their focus from the traditional retrobulbar soft tissues to structural and functional changes of the LG. Previous computed tomography (CT) studies reported increased LG dimensions and volume in TED patients. Compared with CT, magnetic resonance imaging (MRI) offers higher soft tissue resolution without ionizing radiation.
Apart from quantitative measurements of structural parameters such as dimensions [14][15][16] and LG herniation [17][18][19], multiple studies have investigated functional parameters: signal intensity ratio (SIR) on T2-weighted imaging (T2WI) [16,19], apparent diffusion coefficient (ADC) on diffusion-weighted imaging (DWI) [15,20], and fractional anisotropy (FA) [21,22] and ADC [21] or mean diffusivity (MD) [22] on diffusion tensor imaging (DTI). To the best of our knowledge, no review of MRI LG parameters in TED patients exists. We aimed to explore whether this newer imaging modality (i.e., MRI), combined with LG parameters, could better aid the clinical management of TED. Herein, this systematic review and meta-analysis reviews and reports the outcomes and clinical implications of different MRI LG parameters in TED patients.

Search strategy
On 23 August 2022, we searched the following electronic bibliographic databases: PubMed and the Cochrane Central Register of Controlled Trials (issue 7 of 12, July 2022). We formulated sensitive search strategies using the keywords and Medical Subject Headings (MeSH) terms listed in Table 1. No restrictions on language or publication year were applied. The search yielded 504 results (468 from PubMed and 36 from the Cochrane Library). After 15 duplicates were identified, 489 results remained for screening (Fig. 1).

Selection criteria
The search mainly focused on mapping the existing literature on MRI parameters of the LG in TED. From the 489 results, we included studies meeting the following criteria: 1) comparative studies, including case-control and cohort studies; 2) cases were TED patients based on clinical diagnosis; 3) controls were healthy subjects, GD patients without TED, or patients with inactive TED based on clinical diagnosis; 4) the study focused on LG findings on MRI; 5) study subjects were unrelated individuals from clearly defined populations; and 6) clear MRI LG results (or existing data adequate for calculation) were provided for both case and control groups. Animal studies, case reports, case series, reviews, abstracts, and studies with no or incomplete original data were excluded.
In the first round of review, two independent reviewers (K.Y. and N.W.) screened titles and abstracts after applying the search strategy and eligibility criteria. Disagreements were resolved through discussion with four senior reviewers (K.C., F.A., K.L., and Z.H.). A total of 475 irrelevant results were removed. The same two reviewers (K.Y. and N.W.) then screened the full texts of the remaining 14 articles, with disagreements again resolved through discussion with the four senior reviewers (K.C., F.A., K.L., and Z.H.). After eligibility was confirmed, a total of nine qualified studies were included in our review and meta-analysis (Fig. 1).
Fig. 1 PRISMA flow diagram of literature search and selection process
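The record counts reported above can be cross-checked with simple arithmetic. This sketch uses only numbers stated in the text; the five articles excluded at full-text review are derived from them rather than reported directly, and the constant names are illustrative.

```python
from __future__ import annotations

# Counts stated in the Search strategy and Selection criteria subsections.
PUBMED_RECORDS = 468
COCHRANE_RECORDS = 36
DUPLICATES = 15
EXCLUDED_AT_TITLE_ABSTRACT = 475
INCLUDED_STUDIES = 9


def prisma_flow() -> dict[str, int]:
    """Return the number of records remaining at each screening stage."""
    identified = PUBMED_RECORDS + COCHRANE_RECORDS
    screened = identified - DUPLICATES
    full_text = screened - EXCLUDED_AT_TITLE_ABSTRACT
    return {
        "identified": identified,                               # 504
        "screened": screened,                                   # 489
        "full_text_assessed": full_text,                        # 14
        "excluded_at_full_text": full_text - INCLUDED_STUDIES,  # derived: 5
        "included": INCLUDED_STUDIES,                           # 9
    }


for stage, count in prisma_flow().items():
    print(f"{stage}: {count}")
```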

Data extraction
We used a pre-designed form to collect all extracted data, including the first author's name, year of publication, country of study, ethnicity, definitions of the case and control groups (thyroid status), CAS, age, sex, sample size, MRI parameters and their respective findings (expressed as mean ± standard deviation (SD) or median (interquartile range, IQR)), and the purposes of the parameters (disease activity, severity, or therapeutic response). We extracted and analysed data on a per-eye rather than a per-patient basis. If results were reported per patient, we converted them to a per-eye basis on the assumption that all subjects had two eyes. If no extractable MRI results were available from an eligible study, or if the reported data were unclear, we emailed the authors for the missing data and for verification. Two independent reviewers (K.Y. and N.W.) extracted the data, and discrepancies were resolved through discussion with four senior reviewers (K.C., F.A., K.L., and Z.H.).

Risk of bias (quality) assessment
We adopted a modified Newcastle-Ottawa Scale (NOS) for cross-sectional studies to assess the quality of each selected study [24,25]. Each study was scored on selection, comparability, and outcome, and the scores were recorded in the same form used for data extraction. Any study scoring six or less out of ten was considered to be at high risk of bias. Two independent reviewers (K.Y. and N.W.) performed the quality assessment, and disagreements were resolved through discussion with four senior reviewers (K.C., F.A., K.L., and Z.H.).
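The scoring rule can be expressed as a small function. This sketch assumes the total is the plain sum of the selection, comparability, and outcome scores out of ten; the per-domain maxima of the modified scale are not encoded, and the function names are hypothetical.

```python
MAX_TOTAL = 10         # modified NOS total, out of ten
HIGH_RISK_CUTOFF = 6   # totals of six or less were treated as high risk of bias


def nos_total(selection: int, comparability: int, outcome: int) -> int:
    """Sum the three domain scores (per-domain maxima are not checked here)."""
    total = selection + comparability + outcome
    if not 0 <= total <= MAX_TOTAL:
        raise ValueError(f"total score {total} outside 0..{MAX_TOTAL}")
    return total


def risk_of_bias(total: int) -> str:
    """Map a total score to the high/low risk categories used in this review."""
    return "high" if total <= HIGH_RISK_CUTOFF else "low"


print(risk_of_bias(nos_total(3, 2, 1)), risk_of_bias(nos_total(4, 2, 1)))
```

The example prints `high low`: a total of six falls on the high-risk side of the cutoff, while seven does not.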

Statistical analysis
Review Manager (RevMan, Version 5.4; The Cochrane Collaboration, 2020) was used to perform meta-analyses for outcome measures reported in two or more studies. We analyzed LG herniation (LGH), LG dimensions (maximum axial area (MAA), maximum coronal area (MCA), maximum axial length (MAL), maximum coronal length (MCL), maximum axial width (MAW), maximum coronal width (MCW)), DTI-FA, DTI-ADC/MD, and DWI-ADC as continuous variables. As all outcomes in the included studies were measured on the same scale, we used the mean difference as the summary effect measure for all variables. The mean difference is the difference between the means of two groups [26] and is interpreted with its P-value and 95% confidence interval (CI). If a study reported the median and interquartile range, the mean and standard deviation (SD) were estimated as suggested by Wan et al. [27] (see Formulas (1) and (2), Additional file 2). If measurements of the right and left eyes were reported separately, we combined the two subgroups [26] (see Formula (3), Additional file 2). Heterogeneity was tested using Cochran's Q (chi-square) test and the I² statistic. If significant heterogeneity was found between studies (P < 0.1 or I² ≥ 50%), a random-effects model was used for the meta-analysis; otherwise, a fixed-effect model was used.
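The two data-preparation steps can be sketched in Python. This assumes that Formulas (1) and (2) are Wan et al.'s estimators for a study reporting the median, first quartile q1, and third quartile q3, and that Formula (3) is the standard Cochrane formula for combining two groups; Additional file 2 is not reproduced here, so the exact formulas used may differ. Function names are illustrative.

```python
from __future__ import annotations

import math
from statistics import NormalDist


def mean_sd_from_median_iqr(q1: float, median: float, q3: float, n: int) -> tuple[float, float]:
    """Estimate mean and SD from the median and IQR of n observations.

    Wan et al.'s estimators: mean ~ (q1 + median + q3) / 3 and
    SD ~ (q3 - q1) / (2 * z), where z is the standard normal quantile
    at (0.75n - 0.125) / (n + 0.25).
    """
    mean = (q1 + median + q3) / 3
    z = NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    return mean, (q3 - q1) / (2 * z)


def combine_groups(n1: int, m1: float, sd1: float,
                   n2: int, m2: float, sd2: float) -> tuple[int, float, float]:
    """Merge two subgroups (e.g. right eyes and left eyes) into one group.

    Returns the combined size, mean, and sample SD; the between-group term
    n1*n2/(n1+n2) * (m1 - m2)**2 accounts for the difference in subgroup means.
    """
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
           + n1 * n2 / n * (m1 - m2) ** 2) / (n - 1)
    return n, mean, math.sqrt(var)
```

As a check on the combining formula: with raw values 1, 2, 3 in one subgroup (mean 2, SD 1) and 4, 5, 6, 7 in the other (mean 5.5, SD about 1.29), `combine_groups` reproduces the mean (4) and sample SD (about 2.16) of all seven values pooled together.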

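The pooling and model-selection rule described in the statistical analysis can be sketched as follows. This is an illustrative re-implementation, not RevMan: it assumes inverse-variance weighting, a DerSimonian-Laird estimate of the between-study variance for the random-effects model, and normal-approximation confidence intervals, which are common defaults but are assumptions here. The chi-square tail probability is computed in closed form for integer degrees of freedom to avoid external dependencies.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class Study:
    """Summary data for one study (eye-based counts)."""
    n_case: int
    mean_case: float
    sd_case: float
    n_ctrl: int
    mean_ctrl: float
    sd_ctrl: float

    def mean_difference(self) -> tuple[float, float]:
        """Return the mean difference (case minus control) and its variance."""
        md = self.mean_case - self.mean_ctrl
        var = self.sd_case ** 2 / self.n_case + self.sd_ctrl ** 2 / self.n_ctrl
        return md, var


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def meta_analyse(studies: list[Study], alpha_q: float = 0.1, i2_cutoff: float = 50.0) -> dict:
    """Pool mean differences, switching to random effects when heterogeneity is significant."""
    if len(studies) < 2:
        raise ValueError("pooling requires at least two studies")
    y, v = zip(*(s.mean_difference() for s in studies))
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

    # Cochran's Q, its chi-square P-value, and the I^2 statistic (in percent).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(studies) - 1
    p_q = chi2_sf(q, df)
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0

    if p_q < alpha_q or i2 >= i2_cutoff:
        # DerSimonian-Laird tau^2, added to each study's within-study variance.
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
        w = [1 / (vi + tau2) for vi in v]
        model = "random"
    else:
        model = "fixed"

    pooled = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se = math.sqrt(1 / sum(w))
    z_crit = NormalDist().inv_cdf(0.975)
    return {
        "model": model,
        "mean_difference": pooled,
        "ci95": (pooled - z_crit * se, pooled + z_crit * se),
        "p_value": math.erfc(abs(pooled / se) / math.sqrt(2)),
        "q": q,
        "p_q": p_q,
        "i2": i2,
    }
```

For two identical studies, Q is zero and the fixed-effect model is kept; for two studies with opposite mean differences and small variances, I² approaches 100% and the rule switches to the random-effects model.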
Characteristics of included studies
Table 2 summarizes the characteristics of the nine included studies. A total of 1012 eyes were included, of which 693 were cases and 319 were controls. Seven studies were conducted in China and recruited Chinese subjects [14-16, 18, 19, 21, 22], while the remaining two were conducted in Italy and Egypt and recruited Italian [17] and North African [20] subjects, respectively. Age ranged from 33.5 [20] to 54.1 years [18], and sample sizes ranged from 64 [17] to 222 eyes [15]. As the CAS could not be retrieved in all cases [15,17], the available CAS values ranged from 1 [14,16,17] to 4.6 [21]. Around 78% (seven out of nine) of the studies addressed diagnostic purposes [14-18, 20, 21]: four compared both active and inactive TED with healthy controls (HCs) [14,16,20,21], two compared active with inactive TED [17,18], and one used GD patients as controls for comparison with both active and inactive TED [15]. The other two studies focused on grading [22] and therapeutic purposes [19], respectively; the former compared mild and moderate-to-severe TED with HCs, while the latter compared patients responsive and unresponsive to glucocorticoid (GC) therapy among those with active, moderate-to-severe TED. In terms of MRI parameters, three studies [14][15][16] examined LG dimensions: all three reported MCA, MCL, and MCW, while only two [14,16] reported MAA, MAL, and MAW. Two studies investigated LG herniation on T2-weighted imaging with fat suppression (T2WI-FS) [17,18], two studies explored SIR on T2WI [16,19], and two studies examined DWI-ADC [15,20]. Of the two studies that examined DTI-FA [21,22], one reported DTI-ADC [21] and the other reported DTI-MD [22]. The definitions of the MRI parameters are consistent across the included studies. For structural parameters, MCA is defined as the LG area on the coronal image on which the LG is largest, as shown in Fig. 2.
MCL is defined as the distance between the superior and inferior tips of the LG on the coronal slice where MCA is obtained, and MCW as the widest distance within the LG perpendicular to the length (MCL). The same principles apply to the axial parameters: MAA is defined as the LG area on the axial image on which the LG is largest, as shown in Fig. 3; MAL is the distance between the anterior and posterior tips of the LG on the axial slice where MAA is obtained; and MAW is the widest distance within the LG perpendicular to the length (MAL). LGH is defined as the distance between the anterior tip of the LG and the interzygomatic line, as shown in Fig. 4. For functional parameters, DWI-ADC, DTI-ADC (or MD), and DTI-FA were obtained by first placing a region of interest in the LG on the slice with the largest cross-sectional area, and then measuring the ADC, MD, or FA of that region of interest on the DTI or DWI scan. In this paper, we pooled the DTI-ADC and MD findings because both reflect the magnitude of water diffusion.
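To make the definitions above concrete, the sketch below computes three of them from hypothetical inputs: the maximum LG area across slices from binary masks (as for MCA or MAA), LGH as the perpendicular distance from the LG anterior tip to the interzygomatic line, and DTI MD and FA from the three eigenvalues of the diffusion tensor using the standard formulas. The coordinate convention (millimetres, +y anterior, line drawn from the image-left to the image-right zygomatic point) and all function names are assumptions for illustration, not necessarily how the included studies performed their measurements.

```python
from __future__ import annotations

import math

Mask = list[list[int]]  # 1 = LG pixel, 0 = background (hypothetical segmentation)


def maximum_area(masks: list[Mask], pixel_area_mm2: float) -> tuple[int, float]:
    """Return (slice index, area in mm^2) of the slice where the LG is largest."""
    areas = [sum(map(sum, mask)) * pixel_area_mm2 for mask in masks]
    best = max(range(len(areas)), key=areas.__getitem__)
    return best, areas[best]


def herniation_mm(tip: tuple[float, float],
                  zygoma_a: tuple[float, float],
                  zygoma_b: tuple[float, float]) -> float:
    """Signed perpendicular distance from the LG anterior tip to the interzygomatic line.

    With zygoma_a on the image left, zygoma_b on the image right, and +y anterior,
    a positive value means the tip lies anterior to the line (assumed convention).
    """
    (px, py), (ax, ay), (bx, by) = tip, zygoma_a, zygoma_b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    return cross / math.hypot(bx - ax, by - ay)


def mean_diffusivity(l1: float, l2: float, l3: float) -> float:
    """MD: average of the three diffusion tensor eigenvalues."""
    return (l1 + l2 + l3) / 3


def fractional_anisotropy(l1: float, l2: float, l3: float) -> float:
    """FA in [0, 1]: 0 for isotropic diffusion, 1 for diffusion along one axis only."""
    md = mean_diffusivity(l1, l2, l3)
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    return math.sqrt(1.5 * num / den)
```

Because MD averages the eigenvalues, it tracks the overall magnitude of water diffusion, which is why DTI-ADC and MD can reasonably be pooled, whereas FA captures only its directionality.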

Risks of bias in included studies
Table 3 summarizes the risk of bias assessment of all nine selected cross-sectional studies using a modified scale adapted from the Newcastle-Ottawa Scale (NOS) for cohort studies. Further details are provided in Supplementary Table 1, Additional File 1.
Among the nine studies, all except one, which scored six [14], scored seven or above, indicating a low risk of bias [15][16][17][18][19][20][21][22]. Six of them (67%) used convenience sampling by enrolling consecutive patients with TED [15,17,18,20-22]. Two of them (22%) lacked a detailed description of the subject recruitment method [14,22]. Regarding sample size, none of the studies justified its sample size or reported a sample size calculation, and none explained the sample size expected to provide statistically significant information [14][15][16][17][18][19][20][21][22]. For outcome assessment, five studies (56%) involved more than one assessor, blinded to the subjects' clinical condition, who independently evaluated the MRI results, and intra- or inter-observer variability was appropriately addressed using relevant statistical methods [15,18,19,21,22]. Two studies did not state whether the assessors were blinded [16,17], and one involved only one blinded assessor and did not mention correction for intra-observer variability [20]. Otherwise, all studies had satisfactory response rates, described the characteristics of the subjects, included CAS to ascertain the exposure status of subjects (i.e., active or inactive TED), adjusted for age and sex as confounders, and clearly stated the appropriate statistical tests for data analysis [14][15][16][17][18][19][20][21][22].

Outcome measures
i. Active vs inactive
Table 4 summarizes the MRI parameters used in the included studies to compare the active TED group with the inactive TED group.
We meta-analysed eight MRI measurements in active TED patients, with inactive TED patients as the control group: LG herniation, LG dimensional parameters (MAA, MCA, MAL, MCL, MAW, MCW), and DWI-ADC. The results are shown in Fig. 5 and summarized in Table 5. Each outcome measure included two to three studies. Statistically significant heterogeneity was found for MAA, MAW, and DWI-ADC.

ii. TED vs control
Table 6 summarizes the MRI parameters used in the included studies to compare the TED group with the healthy control group.
We also meta-analysed nine MRI measurements in TED patients compared with controls: LG dimensional parameters (MAA, MCA, MAL, MCL, MAW, MCW), DTI-FA, DTI-ADC/MD, and DWI-ADC. All included studies used healthy subjects as controls except Wu, who used Graves' disease patients without TED [15]. The results are shown in Fig. 6 and summarized in Table 7. Each outcome measure included two to three studies. Statistically significant heterogeneity was found for MCA, MCL, MCW, DTI-FA, and DWI-ADC.

Discussion
This systematic review and meta-analysis focused on MRI measurements of the LG in TED patients. Each meta-analysis included two to three studies. The active TED group had significantly larger LGH, MCA, and DWI-ADC values than the inactive TED group. Compared with healthy controls, the TED group had significantly larger values in five dimensional parameters (MAA, MCA, MAL, MAW, MCW) and DTI-ADC/MD, and significantly lower DTI-FA.

MRI LG parameter comparisons
i. Active TED vs. inactive TED
In the comparison between active and inactive TED patients, the active TED group had significantly larger LGH, MCA, and DWI-ADC values. This suggests that these three parameters could help differentiate active from inactive TED patients. Of the seven structural parameters, only LGH and MCA differed significantly. Unlike functional MRI parameters, structural parameters reflect the degree of inflammation, which is generally more severe in active TED, only indirectly through the physical characteristics of the LG. Structural MRI parameters may therefore not be superior to functional MRI parameters in differentiating active from inactive TED patients. Of the three parameters showing significant differences, LGH (I² = 0%; Fig. 5a) and MCA (I² = 16%; Fig. 5c) had insignificant heterogeneity, whereas DWI-ADC (I² = 93%; Fig. 5h) had substantial heterogeneity, which may affect the validity of this result.
ii. TED vs. control
In the comparison between TED patients and controls, the TED group had significantly larger values in five dimensional parameters (MAA, MCA, MAL, MAW, MCW) and DTI-ADC/MD, and significantly lower DTI-FA. This suggests that these seven parameters could help differentiate TED patients from healthy subjects. Five of the six structural parameters (MAA, MCA, MAL, MAW, MCW) showed significant differences. Because the difference in inflammation severity between TED patients and controls is larger than that between active and inactive TED patients, structural parameters are also able to differentiate TED patients from healthy subjects.

iii. Structural parameters
Among the dimensional parameters, area parameters perform better than length and width parameters at differentiating active from inactive TED patients and TED patients from healthy subjects. Only MCA can differentiate active TED patients from inactive ones (Fig. 5c), whereas both MCA and MAA can differentiate TED patients from healthy subjects (Fig. 6a and 6b).
Length and width parameters cannot differentiate active TED patients from inactive ones. A possible reason is that areas are two-dimensional, so differences between groups are more prominent; for example, a 10% increase in both length and width corresponds to an approximately 21% increase in area. Area parameters are also more accurate because they reflect changes in two dimensions.
LG volume may be an even better dimensional parameter because it is three-dimensional. Among the nine included studies, only Hu (2016) measured LG volume [16]. Hu's method requires delineating the LG on every slice to obtain its areas, and then multiplying the sum of these areas by the slice interval to compute the volume [16]. Because the LG must be delineated on every slice, measuring volume is much more labour-intensive than measuring area, and manually measuring LG volume for every patient is difficult in clinical practice. Measuring the maximum area is easier and more practical. Among the six dimensional parameters, MCA performed best and MCL performed worst: MCA differentiated active from inactive TED patients (Fig. 5c) and TED patients from healthy subjects (Fig. 6b), whereas MCL differentiated neither (Figs. 5e and 6d).
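Hu's volume calculation, as described above, amounts to a sum of slice areas multiplied by the slice interval. The sketch below uses hypothetical per-slice areas (in mm²) and a hypothetical 3 mm interval; it shows why the method is more laborious than a maximum-area measurement, since every slice must be delineated whereas the maximum area needs only the largest one. The function name is illustrative.

```python
from __future__ import annotations


def lg_volume_mm3(slice_areas_mm2: list[float], slice_interval_mm: float) -> float:
    """Approximate LG volume as (sum of delineated slice areas) x (slice interval)."""
    if slice_interval_mm <= 0:
        raise ValueError("slice interval must be positive")
    if any(a < 0 for a in slice_areas_mm2):
        raise ValueError("areas must be non-negative")
    return sum(slice_areas_mm2) * slice_interval_mm


# Hypothetical areas delineated on four consecutive slices.
areas = [20.0, 45.5, 60.0, 38.5]
volume = lg_volume_mm3(areas, 3.0)
print(f"volume = {volume:.0f} mm^3 ({volume / 1000:.3f} mL), maximum area = {max(areas)} mm^2")
```

The example prints `volume = 492 mm^3 (0.492 mL), maximum area = 60.0 mm^2`: four delineations feed the volume, but only one is needed for the maximum area.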

iv. Functional parameters
Substantial heterogeneity was observed in DWI-ADC in both comparisons (I² = 93% for active vs. inactive TED, Fig. 5h; I² = 98% for TED vs. control, Fig. 6i). DWI-ADC values in Wu's study were generally lower than those in Razek's study [15,20]. Wu's method involved delineating the largest coronal LG area on T2-weighted sections and then measuring the ADC of that area on the DWI sequence, so the most hyperintense spot, which represents the area of most severe inflammation, may be missed. In contrast, Razek's method placed the region of interest directly on the DWI sequence to measure the ADC. This difference in measurement method may explain the generally lower DWI-ADC values in Wu's study, and thus the high heterogeneity in the meta-analyses. Ethnicity is another possible reason: Wu's study recruited Chinese subjects, whereas Razek's study recruited Egyptians [15,20].
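The sensitivity of DWI-ADC to region-of-interest placement can be illustrated with the usual mono-exponential model, ADC = ln(S0/Sb)/b, where S0 and Sb are the signal intensities at b = 0 and at diffusion weighting b (s/mm²). The two-pixel images and the b-value below are hypothetical, and the included studies' actual b-values and ROI protocols are not reproduced; the point is only that averaging over a large delineated area dilutes a focal high-ADC region, whereas an ROI placed directly on it does not.

```python
from __future__ import annotations

import math
from statistics import fmean

Image = list[list[float]]


def adc(s0: float, sb: float, b: float) -> float:
    """Mono-exponential apparent diffusion coefficient in mm^2/s."""
    if s0 <= 0 or sb <= 0 or b <= 0:
        raise ValueError("signals and b-value must be positive")
    return math.log(s0 / sb) / b


def roi_mean_adc(s0_img: Image, sb_img: Image, roi: list[tuple[int, int]], b: float) -> float:
    """Average the pixel-wise ADC over the (row, col) pixels of a region of interest."""
    return fmean(adc(s0_img[r][c], sb_img[r][c], b) for r, c in roi)


b = 800.0  # hypothetical b-value, s/mm^2
s0 = [[1000.0, 1000.0]]
# Hypothetical signals giving ADC = 2.0e-3 (focal inflamed pixel) and 1.0e-3 mm^2/s.
sb = [[1000.0 * math.exp(-2.0e-3 * b), 1000.0 * math.exp(-1.0e-3 * b)]]

focal = roi_mean_adc(s0, sb, [(0, 0)], b)          # ROI placed on the focal spot
whole = roi_mean_adc(s0, sb, [(0, 0), (0, 1)], b)  # ROI covering the whole area
print(f"{focal:.2e} {whole:.2e}")
```

Here the whole-area ROI yields a lower mean ADC than the focal ROI on the same images, mirroring the explanation offered for the lower values in Wu's study.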

v. Structural vs. functional parameters
In evaluating inflammatory activity, functional MRI parameters may be better than structural MRI parameters because they directly reflect the level of metabolic activity. However, in both comparisons (i.e., active vs. inactive TED and TED vs. healthy controls), structural and functional MRI parameters showed comparable results in differentiating between the two groups. Two (i.e., LGH and MCA) of seven structural parameters, and one (i.e., DWI-ADC) of one functional parameter, can differentiate active from inactive TED patients. Five (i.e., MAA, MCA, MAL, MAW, and MCW) of six structural parameters, and two (i.e., DTI-FA and DTI-ADC/MD) of three functional parameters, can differentiate TED patients from healthy subjects … [28]. As a result, whether structural or functional MRI parameters are superior remains inconclusive.

Clinical implications
i. Clinical diagnosis of disease activity
Of the nine included studies, seven (78%) were conducted on Chinese patients (Table 2). This may reflect the greater difficulty of diagnosing and managing TED in Chinese patients on the basis of clinical manifestations: Lim et al. concluded that East Asian patients generally had less exophthalmos, upper eyelid retraction, and edema than Caucasian patients, which has driven research interest in alternatives (e.g., imaging modalities) for earlier detection or diagnosis of TED [29]. While five [14-16, 18, 21] of these seven studies, plus two studies from Italy [17] and Egypt [20], examined different LG parameters to aid TED diagnosis, ours is the first meta-analysis of the data from all these studies. We found that LG parameters, both structural and functional, generally have greater diagnostic value in differentiating TED patients from disease-free subjects than in differentiating active from inactive TED patients. Specifically, LG dimensional parameters including MAA, MCA, MAL, MAW, and MCW (Fig. 6a-c, e, f), as well as LG functional parameters including DTI-FA and DTI-ADC/MD (Fig. 6g, h), could potentially be used in clinical practice to differentiate TED patients from disease-free subjects. In comparison, fewer parameters, i.e., LG herniation, MCA, and DWI-ADC (Fig. 5a, c, h), could potentially be used to differentiate active from inactive TED patients. These LG MRI parameters might therefore be taken into account alongside the traditional modified CAS when diagnosing TED in the future. A new scoring system for TED activity that incorporates LG MRI parameters may also be worth considering, especially for Asian populations.
ii. Grading of disease severity
Among the nine included studies, only a Chinese study by Rui et al. [22] went further, comparing mild with moderate-to-severe TED patients and investigating the use of DTI parameters for grading TED severity; we therefore could not perform a meta-analysis on this aspect. Rui et al. [22] found that the moderate-to-severe TED group had significantly lower DTI-FA, especially of the medial rectus (MR) (P = 0.017), and higher DTI-MD (P = 0.021) than the mild TED group. The study also concluded that DTI parameters of the MR, especially FA, were sensitive indicators that could help differentiate mild from moderate-to-severe TED. This result points to a potential role for LG DTI parameters in guiding the grading of TED severity, and hence in planning the management of TED patients more accurately. However, more studies are needed before a statistically robust conclusion can be drawn.

iii. Prediction of therapeutic responses and prognosis
Similarly, among the nine included studies, only a Chinese study by Hu et al. [19] compared LG parameters, i.e., LG herniation and SIR (SIR-max, SIR-mean, SIR-min), between patients with active, moderate-to-severe TED who responded to intravenous (IV) steroid therapy after six months and those who did not. Thus, a meta-analysis of therapeutic responses was not performed. In that study [19], patients responsive to IV steroids had significantly larger LG herniation than unresponsive patients (P = 0.019), whereas SIR-max, SIR-mean, and SIR-min did not differ significantly between the two groups (P = 0.514, 0.776, and 0.642, respectively). The authors concluded that larger LG herniation could potentially be used to distinguish treatment-responsive from unresponsive patients. This could allow better allocation of treatment, i.e., glucocorticoid therapy for responsive patients and immunotherapies for unresponsive patients. More studies of this aspect would allow more accurate conclusions and more targeted treatment plans to improve patients' prognosis.

Use of LG imaging in managing other orbital/inflammatory diseases
Beyond the diagnosis, grading, and prediction of therapeutic responses in TED studied in our systematic review and meta-analysis, LG parameters have an emerging role in the assessment of other orbital and inflammatory diseases. For instance, one differential diagnosis of dry eye is primary Sjögren's syndrome (pSS), an autoimmune disease affecting the salivary glands and LGs and causing dryness of the mouth and eyes [30]. However, its clinical diagnosis is difficult because of its non-specific signs and symptoms [31]. The current orbit-related diagnostic criteria involve Schirmer's test and ocular dry scores [32,33], which respectively cause patient discomfort and are difficult to interpret [31]. Hence, several studies have sought LG parameters on non-invasive imaging to aid pSS diagnosis. For example, changes in LG size and enhanced signal intensity with accelerated fat deposition on MRI could predict pSS stages [34]; significantly lower DWI-ADC of the LG may suggest LG abnormalities in pSS patients [35]; and lower 11C-MET uptake by the LG on PET-CT has been found to correlate positively with reduced tear flow [36]. Another LG-related orbital disease is IgG4-related disease (IgG4-RD), a fibroinflammatory disease with lymphoplasmacytic infiltration of IgG4-positive plasma cells into multiple organs, which can involve the orbit as IgG4-related ophthalmic disease (IgG4-ROD) [37]. Its current diagnosis is based on typical organ dysfunction or structural changes (i.e., swelling), a high serum IgG4 titer, and histopathological results from biopsy, which is invasive; imaging can serve as a non-invasive tool to aid the diagnosis [38]. For instance, a hypointense and enlarged LG on T2W MRI [39] and higher 68Ga-FAPI uptake by the LG on PET-CT [40] could aid the diagnosis and assessment of IgG4-RD. While both pSS and IgG4-RD can result in LG enlargement, infraorbital nerve enlargement (IONE) on MRI could act as a specific MRI sign of IgG4-ROD [41]. In addition, IONE could help differentiate IgG4-ROD from other lymphoproliferative orbital diseases, including lymphoma, reactive lymphoid hyperplasia, and idiopathic or other orbital inflammation [42].
These examples illustrate the increasingly important role of non-invasive imaging techniques, and of LG parameters across imaging modalities, in aiding the diagnosis of various orbital and inflammatory diseases of particular interest to ophthalmologists.

Limitations
Our systematic review has several limitations. First, only two to three studies were included for each outcome measure, so the pooled sample sizes may be too small to draw clinically meaningful conclusions because of random sampling error. Second, when Cochran's Q test and the I² statistic indicated significant heterogeneity, subgroup analysis could not be performed because each subgroup would contain only one or two studies. The heterogeneity therefore remained unexplained and may affect the validity of the meta-analysis results. Further investigation is needed to explore the reasons for the high heterogeneity, such as ethnicity and scanner-dependent differences. Third, the small number of included studies reflects the need for further research on MRI of the lacrimal gland in thyroid eye disease.
Of the nine included studies, seven were written by Chinese authors and recruited Chinese subjects [14-16, 18, 19, 21, 22], so care should be taken when applying the results to other ethnicities. Previous studies have demonstrated differences in the clinical manifestations of TED between East Asian and Caucasian patients [29,43]. Radiological differences may also exist between ethnicities, limiting the representativeness of this meta-analysis, in which most included patients are Chinese.

Future insights
This is the first systematic review and meta-analysis on the use of MRI LG parameters in TED patients. MRI is a non-invasive imaging modality that can effectively guide the management of TED patients. Because the current number of studies on MRI LG parameters is limited, with only two to three studies per parameter, more studies with larger sample sizes and from a wider range of ethnicities are warranted. Potential LG imaging markers for TED, especially for disease grading and therapeutic response prediction, are still under investigation. MRI, being non-invasive, safe, and highly sensitive, may increasingly be used for the diagnosis of TED and other orbital diseases.

Conclusions
The systematic review and meta-analyses suggest that lacrimal gland herniation, maximum coronal area, and DWI-ADC can detect TED patients with active disease. Maximum axial area, maximum coronal area, maximum axial length, maximum axial width, maximum coronal width, DTI-FA, and DTI-ADC/MD can differentiate TED patients from healthy controls. Further studies on MRI of the lacrimal gland in thyroid eye disease are warranted to confirm our results.
Fig. 5
Lacrimal gland parameters in active TED and inactive TED groups. SD = standard deviation; IV = inverse variance; CI = confidence interval

Fig. 6
Lacrimal gland parameters in TED and control groups. SD = standard deviation; IV = inverse variance; CI = confidence interval

Table 2
Characteristics of included studies in our systematic review and meta-analysis
ADC Apparent diffusion coefficient, CAS Clinical activity score, DTI Diffusion tensor imaging, DWI Diffusion-weighted imaging, F Female, FA Fractional anisotropy, GC Glucocorticoid, GD Graves' disease, HCs Healthy controls, LE Left eye, LGH Lacrimal gland herniation, M Male, MAA Maximum axial area, MCA Maximum coronal area, MAL Maximum axial length, MCL Maximum coronal length, MAW Maximum axial width, MCW Maximum coronal width, MD Mean diffusivity, RE Right eye, SD Standard deviation, T2WI-FS T2-weighted imaging-fat suppression, TED Thyroid eye disease
a Sample size is expressed on an eye basis
^ Gender is expressed on an eye basis

Table 8 summarizes other MRI parameters used in the included studies for grading or therapeutic purposes.

Table 3
Risk of bias summary
Details of the criteria for each item can be found in the supplementary materials. Max Maximum

Table 4
Summary of MRI parameters of Active TED group vs. Inactive TED group in included studies

Table 5
Meta-analyses of MRI parameters of Active TED group vs. Inactive TED group

Table 6
Summary of MRI parameters of TED group vs. Healthy control group in included studies

Table 7
Meta-analyses of MRI parameters of TED group vs. Healthy control group

Table 8
Summary of other MRI parameters for grading or therapeutic purposes