Official Update 11/21/2025 04:36 PM
Preface: NeMo Curator is a Python library that includes a suite of modules for data-mining and synthetic data generation. They are scalable and optimized for GPUs, making them ideal for curating natural language data to train or fine-tune LLMs. With NeMo Curator, researchers in Natural Language Processing (NLP) can efficiently extract high-quality text from extensive raw web data sources.
NVIDIA NeMo Curator, particularly its image curation modules, requires a CUDA-enabled NVIDIA GPU and the corresponding CUDA Toolkit. The CUDA Toolkit is not installed as part of the NeMo Curator installation process itself, but rather is a prerequisite for utilizing GPU-accelerated features.
Background: NeMo Framework includes NeMo Curator because high-quality data is essential for training accurate generative AI models, and Curator provides a scalable, GPU-accelerated toolset for processing and preparing large datasets efficiently. It handles everything from cleaning and deduplicating text to generating synthetic data for model customization and evaluation, preventing data processing from becoming a bottleneck.
Potential risks under observation: The vulnerability arises when malicious files—such as JSONL files—are loaded by NeMo Curator. If these files are crafted to exploit weaknesses in how NeMo Curator parses or processes them, they can inject executable code.
Ref: Parser is related to predefined variables, as it can either parse data into variables or use predefined variables to perform its task.
Vulnerability details:
CVE-2025-33204 NVIDIA NeMo Framework for all platforms contains a vulnerability in the NLP and LLM components, where malicious data created by an attacker could cause code injection. A successful exploit of this vulnerability may lead to code execution, escalation of privileges, information disclosure, and data tampering.
CVE-2025-33205 NVIDIA NeMo framework contains a vulnerability in a predefined variable, where an attacker could cause inclusion of functionality from an untrusted control sphere by use of a predefined variable. A successful exploit of this vulnerability may lead to code execution.
Official announcement: Please refer to the link for details –