Analysis of the Influence of Text Input Characteristics on the Performance of MiniLMv2-L6-H384 and BERT-Base-Uncased on Quora Question Pairs
DOI: https://doi.org/10.61769/telematika.v20i2.775

Keywords: knowledge distillation, MiniLM, BERT, semantic equivalence, Quora Question Pairs, sequence length, token rarity

Abstract
Knowledge distillation is a technique for compressing large language models into smaller ones while largely preserving accuracy. Bidirectional encoder representations from transformers (BERT) offers strong performance but requires significant computational resources, whereas MiniLM, a distilled mini language model, is five times smaller. This study compares the performance of the two models on the Quora Question Pairs dataset, focusing on how sequence length and token rarity affect classification accuracy. Both models were trained with identical training parameters. Test results show that BERT achieves 91.22% accuracy and an 88.17% F1-score, slightly outperforming MiniLM, which achieves 90.12% accuracy and an 86.73% F1-score; MiniLM, however, delivers 5.3 times faster inference. These findings offer empirical guidance for model optimisation in environments with limited computational resources or real-time response requirements, where MiniLM's efficiency justifies the slight decrease in accuracy. Future research is recommended to explore hybrid systems that delegate complex tasks to large models and general tasks to smaller models.
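
The comparison described above can be reproduced in outline by fine-tuning both models on the GLUE Quora Question Pairs task with identical settings. The sketch below assumes the Hugging Face transformers, datasets, and scikit-learn libraries; the MiniLMv2 checkpoint identifier, hyperparameters, and maximum sequence length are illustrative assumptions, not the authors' exact configuration.

# A minimal fine-tuning sketch, assuming Hugging Face "transformers", "datasets",
# and "scikit-learn". Checkpoint names and hyperparameters are assumptions.
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODELS = {
    "bert": "bert-base-uncased",
    # Assumed MiniLMv2 L6/H384 checkpoint; the article does not list the exact identifier.
    "minilm": "nreimers/MiniLMv2-L6-H384-distilled-from-BERT-Large",
}

def compute_metrics(eval_pred):
    # Report the same metrics as the study: accuracy and F1 on the duplicate class.
    logits, labels = eval_pred
    preds = logits.argmax(axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1_score(labels, preds)}

raw = load_dataset("glue", "qqp")  # columns: question1, question2, label (1 = duplicate)

for name, checkpoint in MODELS.items():
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    def encode(batch):
        # Identical preprocessing for both models: pair encoding, fixed maximum length.
        return tokenizer(batch["question1"], batch["question2"],
                         truncation=True, padding="max_length", max_length=128)

    data = raw.map(encode, batched=True)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    args = TrainingArguments(
        output_dir=f"qqp-{name}",
        learning_rate=2e-5,             # identical training parameters for both models
        per_device_train_batch_size=32,
        num_train_epochs=3,
    )
    trainer = Trainer(model=model, args=args,
                      train_dataset=data["train"],
                      eval_dataset=data["validation"],
                      compute_metrics=compute_metrics)
    trainer.train()
    print(name, trainer.evaluate())

Timing the evaluation loop for each model (for example with time.perf_counter around trainer.evaluate) gives the relative inference speed the abstract reports; the absolute ratio will depend on hardware and batch size.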
License
Copyright (c) 2025 Ken Ratri Retno Wardani, Inge Martina, Jimmy Fong Xin Wern

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
You are free to:
- Share — copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt — remix, transform, and build upon the material for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Notices:
You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.