Comparative Study of Lightweight YOLOv12 Models for Real-Time Underwater Object Detection
DOI: https://doi.org/10.61769/telematika.v20i2.799
Keywords: underwater, object detection, convolutional neural network, efficient model, lightweight YOLOv12
Abstract
Deep learning methods in computer vision play a crucial role in object localization using camera-based sensors, with Convolutional Neural Networks serving as the dominant approach for object detection. However, many existing models incur high computational costs due to deep architectures and complex operations, limiting their use for real-time deployment on low-cost, resource-constrained devices. The YOLOv12 architecture offers lightweight variants designed to improve computational efficiency. This study evaluates the trade-off between efficiency and detection performance by comparing model variants in terms of parameter count, floating-point operations, and inference speed, with detection accuracy measured by mean average precision (mAP). The results indicate how suitable each lightweight variant is for real-time deployment in resource-constrained environments such as underwater monitoring and conservation. Experiments on the Real-World Underwater Object Detection dataset show that YOLOv12-nano achieves 5.7% lower accuracy than YOLOv12-medium but requires only 2.57 million parameters and 6.5 GFLOPs, far less than YOLOv12-medium's 20.1 million parameters and 67.8 GFLOPs. YOLOv12-small, with 9.26 million parameters and 21.5 GFLOPs, sits between the nano and medium variants in complexity while still maintaining competitive accuracy. In inference, YOLOv12-nano reaches 16.48 FPS on a 12th Gen Intel(R) Core(TM) i5-12450HX CPU, compared with 6.28 FPS for YOLOv12-small and 2.36 FPS for YOLOv12-medium. These results indicate that YOLOv12-nano is the most suitable variant for real-time deployment on CPU-based platforms.
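To illustrate how such a comparison can be reproduced, the sketch below benchmarks CPU inference speed and reports per-variant parameter counts and GFLOPs. It assumes the Ultralytics-style Python API used by the official YOLOv12 release; the weight filenames (yolo12n.pt, yolo12s.pt, yolo12m.pt) and the synthetic input frame are illustrative assumptions, not details taken from the paper.

import time

import numpy as np
from ultralytics import YOLO

# Assumed weight filenames for the nano, small, and medium variants.
VARIANTS = ["yolo12n.pt", "yolo12s.pt", "yolo12m.pt"]

# Dummy 640x640 RGB frame standing in for an underwater image.
image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)

for weights in VARIANTS:
    model = YOLO(weights)
    model.info()  # prints layer count, parameter count, and GFLOPs

    # Warm-up runs so one-time initialization does not skew the timing.
    for _ in range(5):
        model.predict(image, device="cpu", verbose=False)

    # Timed runs: average single-image latency over repeated inference.
    n_runs = 50
    start = time.perf_counter()
    for _ in range(n_runs):
        model.predict(image, device="cpu", verbose=False)
    elapsed = time.perf_counter() - start
    print(f"{weights}: {n_runs / elapsed:.2f} FPS on CPU")

Absolute FPS measured this way depends on CPU thread count, input resolution, and batch size, so the numbers will differ from those reported above; it is the relative ordering of the variants that the comparison relies on.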
License
Copyright (c) 2025 Hebron Prasetya, Revin R. Balo, Tasya Tumbal, Alwin M. Sambul, Muhamad Dwisnanto Putro

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
You are free to:
- Share — copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt — remix, transform, and build upon the material for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Notices:
You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.