Preface: For years, OpenAI’s GPT series has been a dominant force in large language models, while NVIDIA’s Megatron-LM has provided a powerful framework for training these massive models.
NVIDIA Megatron-LM faces competition from several other frameworks, notably Microsoft DeepSpeed, Hugging Face Accelerate, JAX/Flax, and PyTorch Lightning.
Both PyTorch Lightning and NVIDIA Megatron-LM are built on top of the PyTorch library. PyTorch provides the fundamental tensor operations and deep learning primitives, while these frameworks add abstractions and tools for more efficient and scalable model development and training.
Background: The full GPT pre-training process:
A script such as pretrain_gpt.py orchestrates the major steps required to train a model with billions of parameters from scratch on terabytes of data. The process consists of four steps (a minimal sketch follows the list):
- Data preparation
- Distributed setup
- Core training loop
- Model saving and evaluation
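To make that structure concrete, the following is a minimal, self-contained sketch of those four steps in plain PyTorch. It is not Megatron-LM’s actual pretrain_gpt.py; every name in it (prepare_data, setup_distributed, the toy embedding model, checkpoint.pt) is hypothetical and stands in for the framework’s far more elaborate data pipelines, parallelism setup, and checkpointing.

```python
import os
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset


def prepare_data(num_samples=256, seq_len=32, vocab_size=1000):
    # Step 1: data preparation -- random token ids stand in for the
    # tokenized, indexed corpora Megatron-LM would build offline.
    tokens = torch.randint(0, vocab_size, (num_samples, seq_len))
    return TensorDataset(tokens)


def setup_distributed():
    # Step 2: distributed setup -- only initialize a process group when
    # launched by a distributed launcher; otherwise run single-process.
    if "RANK" in os.environ:
        dist.init_process_group(backend="gloo")
        return dist.get_rank()
    return 0


def train(model, loader, optimizer, epochs=1):
    # Step 3: core training loop -- simplified next-token prediction.
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for (batch,) in loader:
            inputs, targets = batch[:, :-1], batch[:, 1:]
            logits = model(inputs)
            loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()


def main():
    rank = setup_distributed()
    loader = DataLoader(prepare_data(), batch_size=8, shuffle=True)
    # A tiny embedding + linear head stands in for the GPT model.
    model = torch.nn.Sequential(torch.nn.Embedding(1000, 64), torch.nn.Linear(64, 1000))
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    train(model, loader, optimizer)
    # Step 4: model saving (evaluation omitted for brevity).
    if rank == 0:
        torch.save(model.state_dict(), "checkpoint.pt")


if __name__ == "__main__":
    main()
```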
The design objective of a script like tasks/orqa/unsupervised/nq.py is to prepare the GPT model for open-domain question answering (QA), a task that is not typically part of standard, large-scale unsupervised pre-training. The script uses the Natural Questions (NQ) dataset to improve the model’s ability to retrieve information from a large corpus of documents and generate answers, without directly using a labeled QA dataset for this step (a rough illustration follows below).
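As an illustration of that data flow, the sketch below loads question/answer records, such as a TSV export of the NQ dev set, so they can later drive retrieval over a document corpus. It is an assumption-laden stand-in, not the real tasks/orqa/unsupervised/nq.py: the file name, column layout, and NQDataset class are all hypothetical.

```python
import csv
from torch.utils.data import Dataset


class NQDataset(Dataset):
    """Hypothetical loader for a Natural Questions style TSV file:
    one question per row, followed by its reference answers."""

    def __init__(self, path):
        self.samples = []
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f, delimiter="\t"):
                if not row:
                    continue
                # Column 0 is the question; remaining columns are answers.
                self.samples.append({"question": row[0], "answers": row[1:]})

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]


# Usage, assuming a local nq-dev.tsv export exists:
# dataset = NQDataset("nq-dev.tsv")
# print(dataset[0]["question"], dataset[0]["answers"])
```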
Vulnerability details:
CVE-2025-23348: NVIDIA Megatron-LM for all platforms contains a vulnerability in the pretrain_gpt script, where malicious data created by an attacker may cause a code injection issue. A successful exploit of this vulnerability may lead to code execution, escalation of privileges, information disclosure, and data tampering.
CVE-2025-23349: NVIDIA Megatron-LM for all platforms contains a vulnerability in the tasks/orqa/unsupervised/nq.py component, where an attacker may cause a code injection. A successful exploit of this vulnerability may lead to code execution, escalation of privileges, information disclosure, and data tampering.
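Neither advisory names the exact sink, but the described class of bug (attacker-crafted data leading to code injection) typically arises when a field read from an untrusted dataset reaches a dynamic evaluation or deserialization primitive. The snippet below is a generic illustration of that pattern and a safer alternative; it is not taken from Megatron-LM and makes no claim about the actual root cause of either CVE.

```python
import ast

# Hypothetical attacker-controlled string read from a dataset row.
untrusted_field = '__import__("os").system("id")'

# Unsafe pattern: eval() would execute arbitrary Python embedded in the data file.
# eval(untrusted_field)  # do not do this with untrusted input

# Safer pattern: ast.literal_eval() accepts only Python literals and rejects code.
try:
    value = ast.literal_eval(untrusted_field)
except (ValueError, SyntaxError):
    value = None  # anything that is not a plain literal is rejected
print(value)  # -> None
```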
Official announcement: Please refer to the link for more details –