International Journal of Pharma and Bio Sciences
    ISSN 0975-6299

Int J Pharm Bio Sci Volume 15 Issue 1, January - March, Pages:71-79

Identification of Biomarker in Lung Adenocarcinoma Using Machine Learning and Neural Network

Chaitali Dhande and Preenon Bagchi

Lung cancer is one of the leading causes of cancer-related death globally. The survival rate is relatively low because symptoms of lung adenocarcinoma frequently do not appear until the disease has spread throughout the lung. Early detection can prevent the spread of cancer and reduce cancer-related mortality, and biomarkers help in early detection. Because identifying biomarkers is a time-consuming procedure, machine learning and neural network approaches, which use mathematical techniques to train a model to learn from data for a particular task, have been widely used in biomarker discovery. Based on the expression of biomarkers in the various groups, the "Pathway analysis" service evaluates the enrichment of biological processes, gene sets, and signaling pathways. This study aims to identify the expressed gene sets, enriched signaling pathways, and biomarkers for lung adenocarcinoma. Data from the TCGA "TCGA-LUAD" project are used to identify biomarkers, and PCA reveals that most lung adenocarcinoma patients have no history of other malignancies. GO Biological Process over-representation analysis shows a BgRatio of 499/6818 for cytokine response, and 129 GO terms have P values less than 0.0005, indicating that they are strongly affected biological processes. GO Molecular Function over-representation analysis reveals 447 differentially expressed biomarkers and 20 GO terms with P values less than 0.001 that are substantially affected molecular functions, with a BgRatio of 472/6790 for transporter activity. Additionally, GO Cellular Component over-representation analysis reveals 339 differentially expressed biomarkers and 21 GO terms with P values less than 1e-07 that are substantially affected cellular components, with a BgRatio of 495/7043 for cell surface.
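As a rough illustration of how a BgRatio feeds into an over-representation P value, the sketch below computes a one-sided hypergeometric tail probability in Python. It is not the authors' pipeline: the function names are hypothetical, the overlap count of 60 is an invented placeholder, and it assumes that all 447 differentially expressed biomarkers fall within the annotated background of 6790 genes. Only the BgRatio 472/6790 and the count 447 come from the abstract.

```python
from math import comb


def hypergeom_tail(k, population, annotated, drawn):
    """Return P(X >= k) for a hypergeometric variable X.

    population: number of background genes
    annotated:  background genes annotated to the GO term
    drawn:      number of differentially expressed genes tested
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # Sum the probability mass from k up to the largest possible overlap.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def over_representation_p(overlap, bg_term, bg_total, n_de):
    """One-sided over-representation P value for a single GO term."""
    return hypergeom_tail(overlap, bg_total, bg_term, n_de)


# BgRatio 472/6790 (transporter activity) and 447 DE biomarkers are taken
# from the abstract; the overlap of 60 is a HYPOTHETICAL placeholder.
bg_term, bg_total, n_de = 472, 6790, 447
hypothetical_overlap = 60

expected = n_de * bg_term / bg_total  # overlap expected by chance
p_value = over_representation_p(hypothetical_overlap, bg_term, bg_total, n_de)
print(f"expected overlap by chance: {expected:.1f}")
print(f"P(overlap >= {hypothetical_overlap}): {p_value:.3e}")
```

In practice each GO term gets its own P value, and the values are then corrected for multiple testing before being compared against thresholds such as those quoted above.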

Keywords: Lung Cancer, Biomarker, Machine Learning, Neural Network, GO