CVE-2021-23654 – This affects all versions of the html-to-csv package. The flaw lets a threat actor embed or generate a malicious link or execute commands via CSV files (26-11-2021)

Preface: CSV files are widely used today in machine learning, data handling, and data visualization.

Background: Big data analytics often relies on raw storage buckets. The data might be stored in a text format such as JavaScript Object Notation (JSON) or comma-separated values (CSV), or perhaps even Apache Avro. Most people prefer to store it in either JSON or CSV files. A CSV file is about half the size of the same data in JSON or some other formats, which helps reduce both bandwidth and storage size. CSV is therefore one of the important data formats in the field of data analysis.

Vulnerability details: When a formula is embedded in an HTML page, html-to-csv accepts it without any validation and writes it unchanged into the CSV file it generates (CSV injection). A malicious actor can use this to embed or generate a malicious link or execute commands via CSV files.
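To illustrate the kind of validation that is missing, here is a minimal sketch of neutralizing formula-like cells before they are written to CSV. It is not the package's code or an official fix. The names neutralize_cell and rows_to_csv, the prefix list, and the example link are assumptions made for this illustration, and the single-quote prefix is one commonly suggested mitigation for CSV injection.

```python
import csv
import io

# Characters that spreadsheet applications may treat as the start of a formula.
# (Assumed list for this sketch; adjust to your own threat model.)
FORMULA_PREFIXES = ("=", "+", "-", "@", "\t", "\r")


def neutralize_cell(value):
    """Prefix formula-like cells with a single quote so they are read as text."""
    text = str(value)
    if text.startswith(FORMULA_PREFIXES):
        return "'" + text
    return text


def rows_to_csv(rows):
    """Write rows to a CSV string, neutralizing every cell first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow([neutralize_cell(cell) for cell in row])
    return buffer.getvalue()


# Hypothetical data, as it might be scraped from an HTML table.
rows = [
    ["name", "link"],
    ["alice", '=HYPERLINK("http://example.invalid","Click")'],
]
print(rows_to_csv(rows))
```

Note that this simple check also prefixes legitimate values that start with a minus or plus sign, such as negative numbers, so a real implementation may need a finer rule.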

Impact: This affects all versions of the html-to-csv package.

Official details: Please refer to the link for details – https://security.snyk.io/vuln/SNYK-PYTHON-HTMLTOCSV-1582784

Reference: BeautifulSoup parsing flaw – None of these parsing errors are caused by BeautifulSoup itself. BeautifulSoup does not contain any parser code, so the errors come from the external parser in use (for example html5lib or lxml). One way to resolve such a parsing error is to use another parser.

Python's built-in HTML parser causes the two most common parse errors, HTMLParser.HTMLParseError: malformed start tag and HTMLParser.HTMLParseError: bad end tag. To resolve them, use another parser, mainly lxml or html5lib.
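As a rough sketch of switching parsers, the snippet below picks the first installed alternative and falls back to the built-in parser. The helper name pick_parser and the sample markup are assumptions for this illustration. The strings "lxml", "html5lib" and "html.parser" are the parser names BeautifulSoup accepts as its second argument.

```python
import importlib.util


def pick_parser(preferred=("lxml", "html5lib")):
    """Return the first installed parser name, else the built-in one."""
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    return "html.parser"


parser_name = pick_parser()
print("Using parser:", parser_name)

# Only run the BeautifulSoup part when bs4 is actually installed.
if importlib.util.find_spec("bs4") is not None:
    from bs4 import BeautifulSoup

    # Hypothetical markup standing in for a real HTML table.
    soup = BeautifulSoup("<table><tr><td>cell</td></tr></table>", parser_name)
    print(soup.td.get_text())
```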
