Biopolym. Cell. 2016; 32(1):70-79.
Bioinformatics
Computational analysis of microarray gene expression profiles of lung cancer
Babichev S. A. (1), Kornelyuk A. I. (2), Lytvynenko V. I. (3), Osypenko V. V. (4)
  1. University of J. E. Purkyně in Ústí nad Labem
    1, Pasteur Str, Ústí nad Labem, Czech Republic, 400 96
  2. Institute of Molecular Biology and Genetics, NAS of Ukraine
    150, Akademika Zabolotnoho Str., Kyiv, Ukraine, 03680
  3. Kherson National Technical University
    24, Beryslavske sh, Kherson, Ukraine, 73008
  4. National University of Life and Environmental Sciences of Ukraine
    15, Heroyiv Oborony Str., Kyiv, Ukraine, 03041

Abstract

Aim. To optimize DNA microarray data processing in order to improve the quality of object clustering. Methods. Data were preprocessed in R using Bioconductor packages. The clustering process was modelled in the KNIME software environment using WEKA functions. Results. Preprocessing was found to be optimal when combining rma background correction, quantile normalization, mas PM correction, and mas summarization. The simulation results demonstrated the high effectiveness of the SOTA clustering algorithm for this category of data. Conclusion. The quality of biological object clustering can be improved by hybridizing and optimizing the methods and algorithms used at different stages of data processing.
Keywords: clustering, gene expression, data preprocessing, DNA microchip
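
Illustrative sketch. Quantile normalization, one of the preprocessing steps named in the abstract, can be outlined in a few lines of Python. This is not the authors' implementation (the study used R with Bioconductor); the function name `quantile_normalize` and the sample intensities are hypothetical, and ties are simply broken by position rather than averaged.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one column per array).

    Hypothetical helper for illustration only; ties are broken by position.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")

    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_cols = [sorted(col) for col in columns]
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)
    ]

    normalized = []
    for col in columns:
        # Probe indices ordered by increasing intensity within this array.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            # Replace each value with the reference value of the same rank.
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Made-up intensities for three arrays and four probes.
arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
result = quantile_normalize(arrays)
```

After this step every array shares the same empirical distribution of intensities, while the rank order of probes within each array is preserved.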
