Biopolym. Cell. 2017; 33(5):379-392.
Objective clustering inductive technology of gene expression profiles based on SOTA clustering algorithm
Babichev S. A.¹, Gozhyj A.², Kornelyuk A. I.³, Lytvynenko V. I.⁴
  1. University of J. E. Purkyně in Ústí nad Labem
    1, Pasteur Str, Ústí nad Labem, Czech Republic, 400 96
  2. Petro Mohyla Black Sea State University
    10, 68-Desantnykiv Str., Mykolayiv, Ukraine, 54003
  3. Institute of Molecular Biology and Genetics, NAS of Ukraine
    150, Akademika Zabolotnoho Str., Kyiv, Ukraine, 03680
  4. Kherson National Technical University
    24, Beryslavske sh, Kherson, Ukraine, 73008


Aim. To develop an inductive technology for the objective clustering of gene expression profiles based on the self-organizing SOTA clustering algorithm. Methods. Inductive methods of complex system analysis were used to implement the inductive technology of objective clustering of gene expression profiles. The optimal parameters of the clustering algorithm were estimated using internal clustering quality criteria, external criteria, and complex balance criteria. Results. We present the architecture of the inductive technology of objective clustering based on the SOTA clustering algorithm and a step-by-step procedure for its implementation. Charts of the internal, external, and complex balance criteria versus the algorithm parameters were obtained by simulation, which allowed us to determine the optimal parameters of the algorithm. Conclusion. We have shown that the proposed technology is highly efficient. In the analysis of gene expression profiles, this approach makes it possible to implement a step-by-step cluster-bicluster technology of data grouping at an early stage of gene regulatory network reconstruction.
Keywords: objective clustering, inductive modeling, SOTA algorithm, clustering quality criteria, gene expression profiles
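
The role of the internal and external criteria named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a plain k-means routine stands in for the SOTA algorithm, the data are split into two subsets by index parity, the internal criterion is the mean squared distance of objects to their cluster centroid, and the external criterion is the normalized difference of the internal criteria on the two subsets. All names (`POINTS`, `kmeans`, `internal_criterion`, `evaluate`) and the toy data are hypothetical.

```python
import math

# Toy data (hypothetical): three well-separated groups of six 2-D points each.
CENTERS = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
OFFSETS = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (0.3, -0.2), (-0.2, -0.1), (0.1, 0.2)]
POINTS = [(cx + dx, cy + dy) for cx, cy in CENTERS for dx, dy in OFFSETS]


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic seeding; a stand-in for SOTA (assumption)."""
    ordered = sorted(points)
    # Seed the centroids with k points spread evenly over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


def internal_criterion(points, labels, centroids):
    """Mean squared distance of each point to its own cluster centroid."""
    total = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return total / len(points)


def evaluate(points, k_values):
    """Cluster two subsets independently and compare their internal criteria."""
    subset_a, subset_b = points[0::2], points[1::2]  # simplified split (assumption)
    results = {}
    for k in k_values:
        qa = internal_criterion(subset_a, *kmeans(subset_a, k))
        qb = internal_criterion(subset_b, *kmeans(subset_b, k))
        # External criterion: normalized disagreement between the two subsets.
        external = abs(qa - qb) / (qa + qb) if qa + qb > 0 else 0.0
        results[k] = (qa, qb, external)
    return results


if __name__ == "__main__":
    for k, (qa, qb, external) in evaluate(POINTS, [1, 2, 3, 4]).items():
        print(f"k={k}: internal A={qa:.4f}, internal B={qb:.4f}, external={external:.4f}")
```

In this sketch a parameter value is attractive when both internal criteria are small and the external criterion is close to zero, meaning the two subsets are clustered equally well. How these quantities are combined into a complex balance criterion is left out here, since the abstract does not specify it.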

