Gene Expression classification with topic models and IT kernels

Reference paper: M. Bicego, P. Lovato, A. Perina, M. Fasoli, M. Delledonne, M. Pezzotti, A. Polverari, V. Murino: "Investigating topic models' capabilities in expression microarray data classification", IEEE/ACM Trans. on Computational Biology and Bioinformatics, vol. 9(6), pp. 1831-1836, 2012.

Abstract: In recent years, a particular class of probabilistic graphical models, called topic models, has proven to be a useful and interpretable tool for understanding and mining microarray data. In this context, such models have been applied almost exclusively to clustering, whereas the classification task has been largely disregarded. In this paper, we thoroughly investigate the use of topic models for the classification of microarray data, starting from ideas proposed in other fields (e.g., computer vision). We propose a classification scheme based on highly interpretable features extracted from topic models, resulting in a hybrid generative-discriminative approach. An extensive experimental evaluation, involving 10 different benchmarks from the literature, confirms the suitability of topic models for classifying expression microarray data.
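
The hybrid generative-discriminative idea in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' MATLAB demo and does not use the PLSA, LPD or PRTools libraries listed below. It assumes that each sample is treated as a "document" whose "word counts" are non-negative, integer-like gene expression levels, fits a plain PLSA model by EM on the training samples, and uses the per-sample topic mixtures P(z|d) as features. The discriminative stage is replaced here by a simple 1-nearest-neighbour rule under the Jensen-Shannon divergence, chosen only for brevity. All function names and the toy data are hypothetical.

```python
import math
import random


def normalize(values):
    """Scale non-negative values so they sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa(counts, n_topics, n_iter=100, seed=0, p_w_z=None):
    """Fit PLSA by EM on a samples-by-genes count matrix.

    Returns (p_z_d, p_w_z). If p_w_z is supplied it is kept fixed and only
    the per-sample topic mixtures are estimated (fold-in of unseen samples).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    fixed = p_w_z is not None
    if not fixed:
        p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
                 for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    for _ in range(n_iter):
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior over topics for this (sample, gene) pair.
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_z_d[d][z] += counts[d][w] * post[z]
                    new_w_z[z][w] += counts[d][w] * post[z]
        # M-step: renormalize the expected counts into distributions.
        p_z_d = [normalize(row) for row in new_z_d]
        if not fixed:
            p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (nats)."""
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def classify_1nn(train_feats, train_labels, test_feats):
    """Label each test sample with the label of its JS-nearest training sample."""
    predictions = []
    for feat in test_feats:
        nearest = min(range(len(train_feats)),
                      key=lambda i: js_divergence(feat, train_feats[i]))
        predictions.append(train_labels[nearest])
    return predictions


# Toy data (made up): 6 genes; class "A" expresses genes 0-2, class "B" genes 3-5.
train_counts = [
    [9, 8, 7, 1, 0, 1],
    [8, 9, 9, 0, 1, 1],
    [1, 0, 1, 9, 8, 9],
    [0, 1, 1, 8, 9, 7],
]
train_labels = ["A", "A", "B", "B"]
test_counts = [
    [7, 9, 8, 1, 1, 0],
    [1, 1, 0, 9, 7, 8],
]

# Generative stage: learn topics on the training samples only.
train_feats, topics = plsa(train_counts, n_topics=2)
# Fold in the test samples with the learned topics held fixed.
test_feats, _ = plsa(test_counts, n_topics=2, p_w_z=topics)
# Discriminative stage (stand-in classifier).
predictions = classify_1nn(train_feats, train_labels, test_feats)
print(predictions)
```

Each feature vector is a distribution over topics, which is what makes the representation easy to inspect: a topic with a large weight for a sample can be traced back to the genes with high probability under that topic. In practice the nearest-neighbour rule would be swapped for a stronger classifier, and real expression values would need to be mapped to non-negative counts first.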

Supplementary Material

MATLAB Code

A MATLAB demo can be found here (unzip then launch Demo.m):

Required libraries:

  • PLSA:
  • LPD:
  • PRTools (academic license):

Datasets:

  • 11 Tumors:
  • 9 Tumors:
  • Colon Cancer:
  • Brain 1:
  • Brain 2:
  • Leukemia 1:
  • Leukemia 2:
  • Lung:
  • NCI 60:
  • Prostate Tumor: