Biopolym. Cell. 2017; 33(6):442-452.
Bioinformatics
Creation of gene expression database on preeclampsia-affected human placenta
1Lykhenko O., 1Frolova A. O., 1Obolenskaya M. Yu.
  1. Institute of Molecular Biology and Genetics, NAS of Ukraine
    150, Akademika Zabolotnoho Str., Kyiv, Ukraine, 03680

Abstract

Publication of raw gene expression data in public repositories has made it possible to reuse these data for cross-experiment integrative analysis and to gain new insights into biological phenomena. However, data uploaded by independent contributors are not standardized, are sometimes incomplete, and need preprocessing before any further analysis. Aim. To create a specialized database of gene expression profiles in human placenta, particularly in placenta affected by preeclampsia, a disorder of unknown etiology and pathogenesis that causes high morbidity and mortality worldwide. Methods. All experiment and sample metadata were automatically extracted from the ArrayExpress database via Bioservices. The NCBI database, the corresponding scientific articles, and the authors' personal data were used to supplement the missing data. The experimental sample attributes were standardized according to the MeSH term dictionary and the Experimental Factor Ontology. Results. A database of more than 1000 samples of normal and preeclampsia-affected human placenta was created and supplied with metadata on the biological specimen, diagnosis, gestational age, mode of delivery, and other sample characteristics. Conclusion. The samples in the newly created database now carry the metadata needed to make them comparable. The biological samples can be arranged into different case-control groups that are larger than those in the individual datasets, which supports statistically significant analysis.
Keywords: genetic databases, placenta, preeclampsia, oligonucleotide array, sequence analysis, microarray data
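
The standardization and regrouping steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the synonym table, the field names (`id`, `diagnosis`, `dataset`), the dataset labels, and the helper functions are all hypothetical. A real mapping would be curated against MeSH and the Experimental Factor Ontology rather than hard-coded.

```python
from collections import defaultdict

# Hypothetical synonym table: free-text contributor values -> one controlled label.
# "Pre-Eclampsia" follows the MeSH descriptor spelling; "Normal" is an
# illustrative control label, not a term taken from the article.
DIAGNOSIS_SYNONYMS = {
    "pe": "Pre-Eclampsia",
    "preeclampsia": "Pre-Eclampsia",
    "pre-eclampsia": "Pre-Eclampsia",
    "control": "Normal",
    "normal": "Normal",
    "healthy": "Normal",
}


def standardize_diagnosis(raw):
    """Map a free-text diagnosis to a controlled label, or None if unknown."""
    if raw is None:
        return None
    return DIAGNOSIS_SYNONYMS.get(raw.strip().lower())


def group_samples(samples):
    """Pool sample IDs from all datasets by standardized diagnosis.

    Samples whose diagnosis is missing or cannot be mapped are kept under
    'Unresolved' so that they can be curated by hand later.
    """
    groups = defaultdict(list)
    for sample in samples:
        label = standardize_diagnosis(sample.get("diagnosis"))
        groups[label or "Unresolved"].append(sample["id"])
    return dict(groups)


# Made-up records standing in for metadata pulled from two independent datasets.
samples = [
    {"id": "S1", "dataset": "A", "diagnosis": "PE"},
    {"id": "S2", "dataset": "A", "diagnosis": "control"},
    {"id": "S3", "dataset": "B", "diagnosis": " Preeclampsia "},
    {"id": "S4", "dataset": "B", "diagnosis": "Healthy"},
    {"id": "S5", "dataset": "B", "diagnosis": None},
]

groups = group_samples(samples)
print(groups)
```

Once the labels agree, samples from different datasets fall into the same case or control group, which is what allows the pooled groups to exceed the size of any single dataset.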
