CSVSniffer
CSVSniffer is a class module that attempts to sniff (guess) CSV dialects without user intervention. In some preliminary tests the sniffer was 100% accurate, but there is always a risk of ambiguous cases that can only be resolved by a human. The class is inspired by the work of the scientist Till Roman Döhmen, with some improvements to disambiguate the most complicated cases.
Members
Item | Type | Description |
---|---|---|
DetectDataType | Method | Attempts to detect the data type of a CSV field. It can detect numeric, alphanumeric, currency, date and time, email, file system path, IPv4, percentage, and URL values, as well as structured data from programming languages (bytearray, frozenset, JS arrays). Returns 1 when it recognizes the data type of the specified field and 0 when the field contains an unknown data type. |
TableScore | Method | Calculates a score for the CSV data based on the congruence of the detected data types and the uniformity of the fields contained in each record. The score is in the range 0 < x <= 100. The higher the score, the more likely it is that the dialect used is the correct one for the analyzed CSV file. The data is passed as an ArrayList parameter, holding either the imported data or the Items stored through the Add2 method. |
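
The idea behind the two members can be illustrated with a minimal Python sketch. It is not the class's actual implementation or scoring formula: the names `detect_data_type`, `table_score`, and `parse`, the handful of regular expressions, and the product of "recognized type ratio" and "row width uniformity" are all assumptions made for illustration. Unlike the range stated above, this toy version can also return 0.

```python
import re
from collections import Counter

# Hypothetical, deliberately small set of patterns; the real class
# recognizes many more data types than these.
KNOWN_TYPE_PATTERNS = [
    re.compile(r"[+-]?\d+(\.\d+)?"),          # numeric
    re.compile(r"\d+(\.\d+)?%"),              # percentage
    re.compile(r"\d{4}-\d{2}-\d{2}"),         # ISO-style date
    re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),  # email
    re.compile(r"(\d{1,3}\.){3}\d{1,3}"),     # IPv4 (no octet range check)
    re.compile(r"[A-Za-z0-9 ]+"),             # alphanumeric
]


def detect_data_type(field):
    """Return 1 if the field matches a known pattern, otherwise 0."""
    value = field.strip()
    return 1 if any(p.fullmatch(value) for p in KNOWN_TYPE_PATTERNS) else 0


def table_score(records):
    """Score a parsed table from 0 to 100 (assumed formula, see lead-in)."""
    fields = [field for record in records for field in record]
    if not fields:
        return 0.0
    # Share of fields whose data type was recognized.
    type_ratio = sum(detect_data_type(f) for f in fields) / len(fields)
    # Share of records that have the most common number of fields.
    width_counts = Counter(len(record) for record in records)
    uniformity = width_counts.most_common(1)[0][1] / len(records)
    return 100.0 * type_ratio * uniformity


def parse(text, delimiter):
    """Naive split-based parser, enough to compare candidate delimiters."""
    return [line.split(delimiter) for line in text.splitlines() if line]


sample = "id;name;joined\n1;Ana;2021-03-04\n2;Luis;2022-11-30\n"
comma_score = table_score(parse(sample, ","))
semicolon_score = table_score(parse(sample, ";"))
# The semicolon dialect scores higher, so it would be the one selected.
```

A sniffer built this way parses the same sample once per candidate dialect and keeps the dialect with the highest score. Ties or near-ties are the ambiguous cases that may still require human intervention.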