Preface: As companies and researchers leave Tensorflow and move to PyTorch, Google seems interested in moving its products to JAX to solve some of Tensorflow’s pain points, such as the complexity of the API and the complexity of training in custom chips such as TPUs.
PyTorch optimizes performance by taking advantage of Python’s native support for asynchronous execution. In TensorFlow, you have to manually code and fine-tune every operation to be performed on a specific device to allow for decentralized training.
Background: TensorFlow is open-source Python library designed by Google to develop Machine Learning models and deep learning neural networks.
Reverb is an efficient and easy-to-use data storage and transport system designed for machine learning research. Reverb is primarily used as an experience replay system for distributed reinforcement learning algorithms but the system also supports multiple data structure representations such as FIFO, LIFO, and priority queues.
Vulnerability details: This is a retroactive issue for the already-fixed security vulnerability. The Reverb Server stores Tensors represented by protos. These protos contain type information as well as a string field called “tensor_content”. When the Reverb client communicates with the server, it unpacks these protos by turning them back into tensors.
Reverb supports the VARIANT datatype, which is supposed to represent an arbitrary object in C++. When a tensor proto of type VARIANT is unpacked, memory is first allocated to store the entire tensor, and a ctor is called on each instance. Afterwards, Reverb copies the content in tensor_content
to the previously mentioned pre-allocated memory, which results in the bytes in tensor_content
overwriting the vtable pointers of all the objects which were previously allocated.
Reverb exposes 2 relevant gRPC endpoints: InsertStream and SampleStream. By default, neither is authenticated and there is no authorization. The attacker can insert this stream into the server’s database, then when the client next calls SampleStream they will unpack the tensor into RAM, and when any method on that object is called (including its destructor) the attacker gains control of the Program Counter.
Official announcement: Please refer to the vendor announcement for details – https://www.tenable.com/cve/CVE-2024-8375