CVE-2019-10099 Apache Spark Unencrypted Data Vulnerability Aug 2019

Background: Apache Spark is the tailor made for big data industry.Spark’s advanced acyclic processing engine can operating as a stand-alone mode or a cloud service.

Synopsis: Spark supports encrypting temporary data written to local disks. This covers shuffle files, shuffle spills and data blocks stored on disk (for both caching and broadcast variables). It does not cover encrypting output data generated by applications with APIs such as saveAsHadoopFile or saveAsTable. It also may not cover temporary files created explicitly by the user.

Vulnerability details: The vulnerability is due to a cryptographic issue in the affected software that allows user data to be written to the local disk unencrypted in certain situations, even if the spark.io.encryption.enabled property is set to true.

Security focus: This vulnerability did not category as critical. But the level of risk will be depends on the system architecture and classification level of data. For instance, it is a machine learning function and install on top of public cloud computer farm. If this is the case, a serious access restriction control to Spark infrastructure area must be apply.

Remedy: Apache has released software updates at the following link – https://spark.apache.org/downloads.html