Integrated bioinformatics analysis of aberrantly-methylated differentially-expressed genes and pathways in age-related macular degeneration

Background Age-related macular degeneration (AMD) represents the leading cause of visual impairment in the aging population. The goal of this study was to identify aberrantly-methylated, differentially-expressed genes (MDEGs) in AMD and explore the involved pathways via integrated bioinformatics analysis. Methods Data from expression profile GSE29801 and methylation profile GSE102952 were obtained from the Gene Expression Omnibus database. We analyzed differentially-methylated genes and differentially-expressed genes using R software. Functional enrichment and protein–protein interaction (PPI) network analysis were performed using the R package and Search Tool for the Retrieval of Interacting Genes online database. Hub genes were identified using Cytoscape. Results In total, 827 and 592 genes showed high and low expression, respectively, in GSE29801; 4117 hyper-methylated genes and 511 hypo-methylated genes were detected in GSE102952. Based on overlap, we categorized 153 genes as hyper-methylated, low-expression genes (Hyper-LGs) and 24 genes as hypo-methylated, high-expression genes (Hypo-HGs). Four Hyper-LGs (CKB, PPP3CA, TGFB2, SOCS2) overlapped with AMD risk genes in the Public Health Genomics and Precision Health Knowledge Base. KEGG pathway enrichment analysis indicated that Hypo-HGs were enriched in the calcium signaling pathway, whereas Hyper-LGs were enriched in sphingolipid metabolism. In GO analysis, Hypo-HGs were enriched in fibroblast migration, membrane raft, and coenzyme binding, among others. Hyper-LGs were enriched in mRNA transport, nuclear speck, and DNA binding, among others. In PPI network analysis, 23 nodes and two edges were established from Hypo-HGs, and 151 nodes and 73 edges were established from Hyper-LGs. Hub genes (DHX9, MAPT, PAX6) showed the greatest overlap. Conclusion This study revealed potentially aberrantly MDEGs and pathways in AMD, which might improve the understanding of this disease.


Background
Age-related macular degeneration (AMD) is the leading cause of adult blindness in developed countries, particularly in those over 55 years of age. Worldwide, this condition accounts for 7% of cases of blindness [1,2] and is expected to affect 288 million people by the year 2040 [3]. AMD first appears as drusen (dry AMD) and advances to late AMD (wet AMD) characterized by choroidal neovascularization (CNV) [4]. Drusen is composed of lipoproteinaceous deposits and acellular debris [5]. CNV involves the growth of new abnormal blood vessels originating from the choroid through a break in the Bruch's membrane, which then invade the subretinal pigment epithelium or sub-retinal space, resulting in severe vision loss [6,7]. AMD affects central fine vision, significantly impairs a patient's ability to drive, read, and recognize faces, and greatly affects quality of life [8]. As for wet-AMD (i.e. advanced stages), anti-vascular endothelial growth factor (anti-VEGF) therapy was shown to be effective and has become the first choice for the treatment of CNV [9]. However, anti-VEGF therapy requires repeated intra-vitreal injections, which are associated with a risk of infection and treatment burden for both the patients and the ophthalmologists [10]. Moreover, some patients have poor response to the drugs after a long-term treatment [11]. Apart from anti-VEGF drugs, some other therapies have also impacted macular disease treatment and showed their effectiveness, such as dexamethasone implant [12][13][14].
Previous studies indicated that many environmental factors are associated with an increased risk of AMD, such as age, race, smoking, obesity, and hypertension [15,16]. Additionally, genetic factors are regarded as important for the initiation and progression of AMD [17,18]. Comparing to tumor tissue samples, ocular fundus tissue samples (i.e. retina and choroid) of patients with AMD are quite difficult to obtain in real world. The difficulties to obtain human fundus tissues restricted our understanding of this blinding disease. Over the past few years, most genetic studies on AMD were case-control genome-wide association studies (GWASs) of single-nucleotide polymorphisms in the patients' peripheral blood [19][20][21], which were valuable but provided limited information.
With the rapid development of gene assay technology, studies of disease pathogenesis are no longer limited to gene deletions, gene mutations, and gene insertions, among other changes. Microarrays based on highthroughput platforms are useful and efficient tools to search for meaningful genes and epigenetic alterations for the identification of diagnostic or prognostic biomarkers [22]. To better explore the molecular mechanism underlying AMD, it is necessary to conduct full transcriptome analysis at the tissue level; however, as mentioned previously herein, obtaining ocular tissue samples is difficult. Compared to retina or choroid tissue, blood samples are relatively easy to obtain from patients with AMD. Therefore, it would be helpful to evaluate gene expression in the ocular tissue of patients with AMD as biomarkers in the blood. In the present study, data from gene expression profiling microarrays of human retinal and choroidal samples from the Iowa and Oregon cohort AMD and control donors (GSE29801: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2 9801) [23], as well as gene methylation profiling microarrays of the peripheral blood of subjects with AMD (GSE102952: https://www.ncbi.nlm.nih.gov/geo/query/acc. cgi?acc=GSE102952), were integrated and analyzed using a series of bioinformatics tools. More precise screening results were obtained by overlapping these two AMD data sets. Few studies have attempted to combine gene expression profile microarrays and gene methylation profile microarrays to understand the development of AMD. The introduction of DNA methylation characteristics in the blood is useful to understand the characteristics of AMD disease at the tissue level.
In the present study, data from gene expression profiling microarrays and gene methylation profiling microarrays were integrated and analyzed. Functional enrichment and protein-protein interaction (PPI) network analyses of screened genes were performed using the R package "clus-terProfiler" and Search Tool for the Retrieval of Interacting Genes (STRING) online database. We identified methylated genes in the peripheral blood, which might be useful as biomarkers for the precise diagnosis and treatment of AMD.

Methods
The need for ethics approval was waived by the ethics committee of Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China (the document is attached as an additional file).

Microarray data information
We identified methylated, differentially-expressed genes (MDEGs) between AMD and control samples by analyzing mRNA microarray and methylation profiling datasets. In this study, a gene expression profiling dataset (GSE29801: https://www.ncbi.nlm.nih.gov/geo/query/ acc.cgi?acc=GSE29801) and gene methylation dataset (GSE102952: https://www.ncbi.nlm.nih.gov/geo/query/ acc.cgi?acc=GSE102952) were downloaded from the Gene Expression Omnibus (https://www.ncbi. nlm.nih. gov/geo/) of the National Center for Biotechnology Information. The gene expression profiling data were based on the mRNA from the macular regions of human donor eyes from the retina and retinal pigment epithelium (RPE)-choroids. The gene methylation microarray data were assessed using genome-wide DNA methylation profiling of peripheral blood.
Microarray data from GSE29801 included 177 samples from the macular or extramacular regions of human donor eye RPE-choroids and 118 samples from the macular or extramacular region of human donor retinas with no reported ocular disease, with possible preclinical AMD or AMD [23]. RPE-choroid and retinal samples were isolated from human donor eyes obtained from the University of Iowa (GSH) and Lions Eye Bank of Oregon. The Iowa eyes were selected from a well-characterized repository derived from more than 3900 donors. The Oregon eyes were generally classified as AMD based on medical histories confirmed by ophthalmological records [23,24]. Global transcriptome profiling was carried out using the Agilent Whole Human Genome 4 × 44 K in situ oligonucleotide array platform (G4112F, Agilent Technologies, Santa Clara, CA, USA). After removing redundant data, the microarray data of 41 patients with AMD and 42 normal controls were included in our analysis [23]. First, the authors removed redundant data and selected the subjects who had both macular retina and macular RPE-choroid records. Such selection criteria were implemented due to the fact that AMD mainly affects central vision acuity (i.e. the central macular area). Second, we compared the gene expression levels of macular retina and macular RPE-choroid, then, we chose the higher one of them to conduct the further bioinformatics analysis.
For the gene methylation microarray data, Oliver et al. previously performed genome-wide DNA methylation profiling of blood from nine patients with AMD and nine controls based on the GPL13534 Illumina Human-Methylation 450 BeadChip platform (HumanMethyla-tion450_15017482, San Diego, CA, USA), which covers approximately 450,000 CpG sites in different gene regions including the transcription start sites 1500 and 200, 5′ untranslated region, 1st exon, body, and 3′ untranslated region. The authors generously shared their original data online for public use (GSE102952: https://www.ncbi.nlm.nih.gov/geo/query/ acc.cgi?acc=GSE102952). The methylation profile data of all nine patients with AMD and nine normal controls were included in this analysis.
Data processing to identify differentially-expressed genes (DEGs) and differentially-methylated genes (DMGs) We used the Limma package in R software (version 3.4.2; Bell Laboratories, formerly AT&T, now Lucent Technologies, Murray Hill, NJ, USA) to analyze GSE29801 and GSE102952 to identify DEGs and DMGs. P < 0.05 was regarded as statistically significant. For DEGs, we set the cut-off standard as P < 0.05 and the absolute value of the log (fold-change) > the median (fold-change). For DMGs, we set the cut-off criteria as P < 0.05 and absolute value of log (fold-change) > 3/4 sort [summary log (fold-change)]. For further analysis, hypomethylation-high expression genes (Hypo-HGs) were obtained by overlapping hypomethylated and upregulated genes; hypermethylation-low expression genes (Hyper-LGs) were obtained by overlapping hypermethylated and downregulated genes.

Functional enrichment analysis
Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed for the selected genes (Hypo-HGs and Hyper-LGs), and the enrichment results were illustrated using the R package "clusterProfiler". This package can be used to extract biological meaning from large gene lists. We performed GO term enrichment analysis under the following three sub-ontologies: biological process (BP), molecular function (MF), and cellular component (CC). The cut-off criterion of significantly-enriched KEGG pathways was P < 0.05.

Comprehensive PPI network
Determination of the comprehensive PPI network is important to detect the molecular mechanisms of AMD. In this study, we used the online STRING (version 11.0) tool to construct the network of Hypo-HGs and Hyper-LGs. STRING is an online database used to predict PPIs, which are essential to interpret the molecular mechanisms of key cellular activities in AMD. The cut-off standard was defined as an interaction score of 0.4. The results were visualized in Cytoscape software (version 3.5.1). Hub genes were defined as the top three genes that appeared most frequently according to all cyto-Hubba ranking methods using Cytoscape software. Subsequently, the Molecular Complex Detection (MCODE) algorithm in Cytoscape software was used to screen the modules. An MCODE score > 3 and node number > 3 were used as the criteria to define a module.

Identification of DEGs and DMGs in AMD
To identify DEGs or DMGs, we used the expression profile from GSE29801 (containing RPE-choroid and retina tissue samples from 41 patients with AMD and 42 normal samples) and the methylation profile from GSE102952 (containing peripheral blood samples of nine AMD patients and nine normal controls) after data preprocessing and quality assessment using R software. We identified 827 high-expression genes and 592 lowexpression genes and 4117 hypermethylated genes and 511 hypomethylated genes. The top 100 most significant DEGs of GSE29801 are shown in Table S1, and the top 100 most significant DMGs of GSE102952 are shown in Table S2.

Identification of aberrantly-methylated DEGs in AMD
To further explore the aberrantly-methylated DEGs, Hypo-HGs were obtained by overlapping hypomethylated and upregulated genes, whereas Hyper-LGs were obtained by overlapping hypermethylated and downregulated genes. We identified 24 Hypo-HGs and 153 Hyper-LGs. The flowchart of this study is presented in Fig. 1. All genes are shown in Table S3. To confirm the reliability of the results, we compared aberrantly MDEGs from the GSE29801 and GSE102952 datasets with genes in the Public Health Genomics and Precision Health Knowledge Base (PHGKB; version. 5.8) (https://phgkb. cdc.gov/PHGKB/startPagePhenoPedia.action) studied in the category of "macular degeneration"; the results are shown in Fig. 2. We screened four Hyper-LGs (CKB, Fig. 1 The flowchart of this study. The data of expression profiling GSE29801 and methylation profiling GSE102952 were obtained from Gene Expression Omnibus (GEO) database. As a result, we identified 827 high-expression genes and 592 low expression genes; 4117 hypermethylated genes and 511 hypomethylated genes. To further explore the aberrantly methylated differentially expressed genes, Hypo-HGs were obtained by overlapping hypomethylation and up-regulated genes; Hyper-LGs were obtained by overlapping hypermethylation and down-regulated genes. In total, we identified 24 Hypo-HGs and 153 Hyper-LGs. Then, functional enrichment analysis and protein-protein interaction (PPI) network analysis of screened genes were performed Fig. 2 The results of overlapping GSE29801 and GSE102952 datasets with the genes reported to be related to AMD in Public Health Genomics and Precision Health Knowledge Base (PHGKB). a Overlapping of hypomethylation-high expression genes (Hypo-HGs) with the genes reported to be related to AMD in PHGKB. b Overlapping of hypermethylation-low expression genes (Hyper-LGs) with the genes reported to be related to AMD in PHGKB PPP3CA, TGFB2, and SOCS2) that overlapped with potential AMD risk genes in the PHGKB. However, no Hypo-HG overlapped in the PHGKB.

Functional enrichment analysis
Functional enrichment analysis was conducted using the R package "clusterProfiler", and all significantly enriched KEGG pathways associated with the 24 Hypo-HGs and 153 Hyper-LGs are shown in Table 1. The results of KEGG pathway enrichment analysis indicated that Hypo-HGs were significantly enriched in the phosphatidylinositol signaling system and calcium signaling pathway, whereas Hyper-LGs were significantly enriched in amphetamine addiction, morphine addiction, and sphingolipid metabolism. The top five GO terms in each category in which the 24 Hypo-HGs and 153 Hyper-LGs were significantly involved are shown in Tables 2 and 3, respectively. Functional enrichment analysis suggested that the 24 Hypo-HGs were enriched in the BP of fibroblast migration and positive regulation of neurological system process. The GO CC category revealed enrichment in membrane raft and membrane microdomain. The MF category showed enrichment in factors involved in neuropeptide receptor activity and coenzyme binding   Table 2. Gene ontology (GO) pathway enrichment analyses were performed for the selected genes. The cut-off criterion was P < 0.05. The top 5 GO terms in each category in which the 24 hypo-methylated, high-expression genes were significantly involved are shown. They were enriched in the biological process of fibroblast migration and positive regulation of neurological system process. The cellular component category revealed enrichment in membrane raft and membrane microdomain. The molecular function category showed enrichment for factors involved in neuropeptide receptor activity and coenzyme binding   Table 3. Gene ontology (GO) pathway enrichment analyses were performed for the selected genes. The cut-off criterion was P < 0.05. The top 5 GO terms in each category in which the 153 hyper-methylated, lowexpression genes were significantly involved are shown. They were enriched in the biological process of mRNA transport, etc. The cellular component category revealed enrichment in nuclear speck, neuronal cell body, etc. The molecular function category indicated enrichment in DNA binding and phosphoric ester hydrolase activity ( Table 2). The 153 Hyper-LGs were enriched in the BP category of mRNA transport, among others. The CC category revealed enrichment in nuclear speck and neuronal cell body, among others. The MF category indicated enrichment in DNA binding and phosphoric ester hydrolase activity (Table 3).

Comprehensive gene regulation network
The STRING database was used to construct PPI networks. Ultimately, 23 nodes and two edges were established from the Hypo-HGs and 151 nodes and 73 edges were established from the Hyper-LGs. The results are shown in Figs. 3 and 4. We use Cytoscape software to determine the largest subgroup interaction network of Hyper-LG genes (Fig. 5). We identified hub genes using cytoHubba and confirmed that DHX9, MAPT, and PAX6 showed the greatest overlap. Two-core module analysis of this subgroup network of Hyper-LG genes was performed, including module1 comprising HNRNPA3, DHX9, SRSF11, and SLU7 and module2 comprising SOX1, PAX6, and DLX2.

Discussion
AMD is a disease with complex inheritance and epigenetic changes [5]. Identification of the underlying genes has been difficult. Both genomic screening (locational) and candidate gene (functional) approaches have been used. Based on numerous genetic studies of AMD, approximately 50% of the heritability of AMD can be explained by two major loci harboring coding and non-coding variations at chromosomes 1q (CFH) and 10q (ARMS2/HTRA1) [25][26][27][28]. Recently, a large GWAS highlighted new genes and pathways involved in the development of AMD, including complement activation, collagen synthesis, lipid metabolism/cholesterol transport, receptor-mediated endocytosis, endodermal cell differentiation, and extracellular matrix organization, indicating that many unknown genetic changes remain to be identified with respect to the initiation and development of AMD [20]. The application of novel drugs in the treatment of macular disease also indicated the complicated change of the micro-environment of the macular in the case of disease [29][30][31]. In this study, we screened novel biomarkers by combining microarray information from RPE-choroid and retinal tissue samples from patients with AMD, as well as peripheral blood samples, by overlapping relevant datasets (GSE29801 and GSE10295) using integrated bioinformatics analysis for available microarray data. This is the first study to employ this approach. The Hyper-LGs identified are potential biomarkers of AMD based on methylation microarrays for pre-clinical detection in peripheral blood. Among them, four Hyper-LGs (CKB, PPP3CA, TGFB2, and SOCS2) overlapped with risk genes in the category of "macular degeneration" in the PHGKB. One study revealed that CKB is unlikely to explain a significant portion of the risk of developing AMD in a family-based association dataset including 162 families and an independent case-control dataset of 399 cases and 159 fully evaluated controls [32]. PPP3CA is a druggable molecule that inactivates Fig. 3 The protein-protein interaction (PPI) networks of the 24 hypomethylation-high expression genes (Hypo-HGs). 23 nodes and 2 edges were established from the Hypo-HGs MAP3K5 but has not been widely investigated for its role in AMD. One previous study revealed AMD-related sequence variants in genes encoding PPP3CA, underlying its relationship with AMD [33]. TGFB2 induces RPE cell and collagen gel contraction. Subretinal fibrosis contributes to the loss of vision associated with AMD, and RPE cells play a key role in the fibrotic reaction [34]. Under hypoxic conditions, RPE cells can increase the secretion of TGFB2 and induce epithelial-mesenchymal transition, resulting in the formation of scar-like fibrous tissue in AMD [35]. Targeted inhibition of TGFB signaling might be an effective approach to retard AMD progression [36]. SOCS proteins are modulators of cytokine and growth factor signaling, and their aberrant regulation has been linked to a variety of inflammatory and neoplastic diseases [37]. In a GWAS of 919 patients with exudative AMD treated with intravitreal ranibizumab, SOCS2 was a candidate gene for which levels were associated with visual loss at month three [38]. These results provide insight into AMD pathogenesis but must be confirmed by in vivo and in vitro experiments. The methylation patterns of PPP3CA, TGFB2, and SOCS2 in AMD have not been previously described. We found that these genes were hypermethylated and expressed at low levels, suggesting that the aberrant methylation of these genes affects the pathogenesis of AMD. No Hypo-HGs overlapped in the PHGKB, likely because of the limited number of genes identified.
Among the top five pathways identified by KEGG and GO analyses, calcium signaling [39,40], sphingolipid  [41,42], fibroblast migration [43,44], membrane [45], coenzyme [46][47][48], and DNA binding [49] have been investigated in AMD. Calcium signaling, sphingolipid metabolism, and coenzyme categories showed strong relationships with AMD, whereas the others require further evaluation. Calcium signaling plays a vital role in RPE cell function. Intracellular calcium mobilization activates gene expression and the secretion of inflammatory cytokines such as interleukin-8 in human RPE cells [39]. Complement attack on RPE cells, leading to cell death, is also modulated by extracellular calcium and intracellular signaling mechanisms [40]. Sphingosine 1-phosphate is a potent lipid mediator that modulates inflammatory responses and proangiogenic factors, and it has been suggested that this protein upregulates CNV and is deeply involved in the pathogenesis of exudative AMD [42]. Free radicals play a pathogenic role in AMD, whereas coenzyme Q10 has a protective effect [48]. A combination of acetyl-L-carnitine, n-3 fatty acids, and coenzyme Q10 was shown to be beneficial for visual functions in early AMD [47]. However, drug metabolism pathways such as amphetamine addiction and morphine addiction could have been identified by chance and might not be related to AMD. The specific manner in which the other pathways affect AMD development and progression must be further investigated.
In the PPI network, 23 nodes and two edges were established from the Hypo-HGs and 151 nodes and 73 edges were established from the Hyper-LGs. DHX9, MAPT, and PAX6 were identified as hub genes. Two core modules for Hyper-LGs were structured, including module1 comprising HNRNPA3, DHX9, SRSF11, and SLU7 and module2 comprising SOX1, PAX6, and DLX2. Among the hub genes and core modules previously mentioned, PAX6 is expressed in retinal progenitor cells throughout retinogenesis [50]. PAX6 is a novel regulatory gene among RPE transcription factors that controls the timing of RPE differentiation and adjacent choroid maturation, suggesting that PAX6 is involved in choroid development during the pathogenesis of AMD [51]. Other genes have not been previously investigated with respect to AMD. This study aimed to find potential biomarkers of AMD based on public datasets and bioinformatics methods. However, the results of this study were not strong enough to switch the diagnosis and treatment of AMD so far. These years, ophthalmology has experienced significant developments with respect to imaging modalities. Optical coherence tomography (OCT) is a noninvasive imaging modality that produces high-resolution, cross-sectional images of ocular tissues. Compared to time-domain OCT, spectral-domain OCT yields a higher degree of axial resolution and provides more detailed views of intraretinal structure [52]. Swept-source OCT can offer improved images of the choroid and pigmented lesions [53]. The development of OCT benefits to the diagnosis and follow-up of AMD, and we guess the early detection based on MDEGs might help to identify AMD patients before the clinical symptoms appear. It might be possible to develop detection reagents in the blood for early detection and screening of AMD in the future.
There were some limitations of this study. First, we focused on Hyper-LGs and Hypo-HGs without analyzing contra-regulated genes; thus, further analysis is required to evaluate these genes. Second, our study was limited to only two datasets, and we did not conduct validation based on animals or patient samples. Thus, the results are preliminary and larger sample sizes as well as further fundamental experiments are needed to confirm these results. Third, the clinical characteristics of AMD patients included were not analyzed because these data were not available, and thus, the results should be conservatively interpreted.

Conclusions
In summary, data from gene expression profiling microarrays and gene methylation profiling microarrays of patients with AMD were integrated and analyzed using a series of bioinformatics tools. Our results indicated aberrantly MDEGs (PPP3CA, TGFB2, and SOCS2) and pathways (calcium signaling, sphingolipid metabolism, fibroblast migration, membrane, coenzyme, and DNA binding) associated with AMD. These genes might serve as biomarkers for the precise diagnosis and treatment of AMD. Further studies are needed to confirm the functional significance of the identified genes and pathways in AMD.
waived by the ethics committee of Shanghai General Hospital, Shanghai Jiao