ARTIFICIAL INTELLIGENCE IN THE DIAGNOSIS OF LARYNGEAL CANCER BASED ON ENDOSCOPIC IMAGES: A COMPREHENSIVE NARRATIVE REVIEW
Abstract
Background: Laryngeal cancer is a significant global health burden, and early detection is crucial for improving patient outcomes. Integrating artificial intelligence (AI) into medical diagnostics has shown considerable potential to improve the accuracy and efficiency of laryngeal cancer detection through endoscopic image analysis.
Objective: This narrative review comprehensively examines the current state of AI applications in laryngeal cancer diagnosis using endoscopic imaging, focusing on deep learning methodologies, diagnostic performance, clinical validation and future perspectives.
Methods: A systematic analysis of peer-reviewed literature was conducted, examining studies published between 2016 and 2025 that investigated AI-based approaches for laryngeal cancer detection and classification using endoscopic images. The review encompassed various AI techniques including convolutional neural networks (CNNs), transfer learning, multimodal approaches and novel architectures applied to white light imaging, narrow-band imaging and other advanced endoscopic modalities.
Results: Current AI systems achieve high diagnostic accuracy, with sensitivity ranging from 71% to 98% and specificity from 86% to 98% across studies and methodologies. Deep learning approaches, particularly CNN-based architectures such as ResNet, DenseNet and YOLO, have shown performance comparable to or exceeding that of experienced clinicians. Multimodal AI systems that combine white light and narrow-band imaging data outperform single-modality approaches. Inference times as low as 0.01 to 0.03 seconds per image allow real-time processing and make clinical implementation practical.
Conclusions: AI-assisted endoscopic diagnosis represents a transformative technology for laryngeal cancer detection, offering the potential to standardize diagnostic protocols, reduce inter-observer variability and improve access to expert-level diagnostic capabilities globally.
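For readers less familiar with the metrics quoted in the Results, the following minimal Python sketch shows how sensitivity and specificity are derived from image-level counts, and how a per-image inference time translates into throughput. The class `ConfusionCounts`, the helper functions and all example counts are hypothetical illustrations and are not data from any of the reviewed studies; only the 0.01 and 0.03 second latencies come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Image-level counts against a reference standard (hypothetical values)."""
    true_positive: int   # cancer images flagged as cancer
    false_negative: int  # cancer images missed
    true_negative: int   # non-cancer images correctly cleared
    false_positive: int  # non-cancer images flagged as cancer


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of cancer images that the model detects."""
    return c.true_positive / (c.true_positive + c.false_negative)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-cancer images that the model correctly clears."""
    return c.true_negative / (c.true_negative + c.false_positive)


def frames_per_second(seconds_per_image: float) -> float:
    """Throughput implied by a fixed per-image inference time."""
    return 1.0 / seconds_per_image


if __name__ == "__main__":
    # Made-up counts, chosen only to illustrate the arithmetic.
    counts = ConfusionCounts(
        true_positive=90, false_negative=10,
        true_negative=172, false_positive=28,
    )
    print(f"sensitivity = {sensitivity(counts):.2f}")
    print(f"specificity = {specificity(counts):.2f}")

    # Latency range quoted in the abstract.
    for latency in (0.01, 0.03):
        fps = frames_per_second(latency)
        print(f"{latency:.2f} s/image -> {fps:.0f} frames/s")
```

Under these assumptions, a latency of 0.03 seconds per image corresponds to roughly 33 frames per second and 0.01 seconds to 100 frames per second. If the endoscopic video feed runs at about 25 to 30 frames per second (a common video rate, not a figure taken from the review), both latencies would keep pace with the feed.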
Copyright (c) 2025 Tobiasz Sławiński, Oliwia Sójkowska-Sławińska, Anna Leśniewska, Patryk Macuk, Natalia Rutecka

This work is licensed under a Creative Commons Attribution 4.0 International License.