Preface: MongoDB, through its underlying C library libbson, uses bson_validate to check that BSON data is well-formed and safe to process before it is committed to the database or parsed by applications.
Background: An invalid UTF-8 sequence is a series of bytes that does not follow the specific structural rules of the UTF-8 encoding standard.
Why Sequences Become Invalid
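UTF-8 is a variable-width encoding in which each character is encoded in 1 to 4 bytes. To be valid, these bytes must follow a strict bit pattern: the first byte announces the length of the sequence, and every following byte must have the form 10xxxxxx. Common reasons for invalidity include: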
• Illegal Bytes: Certain bytes, like 0xC0, 0xC1, or anything from 0xF5 to 0xFF, can never appear in valid UTF-8 text.
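• Encoding Mismatch: This is a very common cause in practice. It occurs when text saved in a different encoding (like ISO-8859-1/Latin-1) is read as if it were UTF-8. For example, Latin-1 stores "é" as the single byte 0xE9, which UTF-8 interprets as the start of a 3-byte sequence that never arrives.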
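The rules above can be sketched as a small standalone checker. This is an illustrative example only, not libbson's implementation: the function name is_valid_utf8 is hypothetical, and the checks follow the well-formedness rules of RFC 3629 (illegal lead bytes, missing continuation bytes, overlong forms, surrogates, and code points above U+10FFFF).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of libbson): returns true if
 * buf[0..len) is well-formed UTF-8 according to RFC 3629. */
bool is_valid_utf8(const uint8_t *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        uint8_t b = buf[i];
        size_t need;   /* continuation bytes expected after the lead byte */
        uint32_t cp;   /* code point being decoded */
        uint32_t min;  /* smallest code point allowed for this length */

        if (b < 0x80) {                       /* 1-byte sequence (ASCII) */
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {  /* 2-byte lead */
            need = 1; cp = b & 0x1F; min = 0x80;
        } else if (b >= 0xE0 && b <= 0xEF) {  /* 3-byte lead */
            need = 2; cp = b & 0x0F; min = 0x800;
        } else if (b >= 0xF0 && b <= 0xF4) {  /* 4-byte lead */
            need = 3; cp = b & 0x07; min = 0x10000;
        } else {
            /* Stray continuation byte (0x80-0xBF) or an illegal byte:
             * 0xC0, 0xC1, or 0xF5-0xFF. */
            return false;
        }

        /* The sequence is truncated if fewer than need bytes follow. */
        if (len - i <= need) {
            return false;
        }

        for (size_t k = 1; k <= need; k++) {
            uint8_t c = buf[i + k];
            if ((c & 0xC0) != 0x80) {         /* must look like 10xxxxxx */
                return false;
            }
            cp = (cp << 6) | (uint32_t)(c & 0x3F);
        }

        if (cp < min) {                       /* overlong encoding */
            return false;
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {   /* UTF-16 surrogate range */
            return false;
        }
        if (cp > 0x10FFFF) {                  /* beyond the Unicode range */
            return false;
        }

        i += need + 1;
    }

    return true;
}
```

With this sketch, the 5 bytes of "café" encoded as UTF-8 (0x63 0x61 0x66 0xC3 0xA9) pass, while the 4 bytes of the same word in Latin-1 (0x63 0x61 0x66 0xE9) are rejected because 0xE9 promises two continuation bytes that are not there.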
Vulnerability details: The bson_validate function may return early on specific inputs and incorrectly report success. As a result, part of the BSON data may never be validated, allowing malformed data or invalid UTF-8 sequences to pass the check and be processed incorrectly. The issue may affect applications that rely on this function to validate untrusted BSON data before further processing.
Impact: This issue affects MongoDB C Driver versions prior to 1.30.5, as well as versions 2.0.0 and 2.0.1.
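The affected range can be expressed as a small version check. This is a minimal sketch based only on the versions listed above: the function name is_affected_version is hypothetical, and it assumes every release not named in the list (for example 1.30.5 and later 1.x releases, or 2.0.2 and later) is unaffected. The three numbers could come from a build's BSON_MAJOR_VERSION, BSON_MINOR_VERSION, and BSON_MICRO_VERSION macros, or from a package manifest.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true if the given MongoDB C Driver
 * version is in the affected set described in this advisory:
 * anything before 1.30.5, plus 2.0.0 and 2.0.1. */
bool is_affected_version(int major, int minor, int micro)
{
    if (major < 1) {
        return true;                     /* 0.x is prior to 1.30.5 */
    }
    if (major == 1) {
        if (minor < 30) {
            return true;                 /* 1.0 through 1.29.x */
        }
        return minor == 30 && micro < 5; /* 1.30.0 through 1.30.4 */
    }
    /* In the 2.x line, only 2.0.0 and 2.0.1 are listed as affected. */
    return major == 2 && minor == 0 && (micro == 0 || micro == 1);
}
```

For instance, is_affected_version(1, 30, 4) and is_affected_version(2, 0, 1) return true, while is_affected_version(1, 30, 5) and is_affected_version(2, 0, 2) return false.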
Official announcement: Please refer to the following link for details: https://www.tenable.com/cve/CVE-2026-6231