During a forensic investigation, malicious activity may be discovered in logs. An incident responder may want to identify related malicious activity across a range of related log files.
This article explains how JsonHash, a novel fuzzy hashing algorithm, can be used to detect similarities between log records. JsonHash flattens the nested structure of a record into field paths, tokenizes the value of each field, prepends the field name to every token (so the same value appearing in two different fields produces two different tokens), hashes each token, maps each hash to one of a fixed number of buckets, counts the hits per bucket, scales the counts into a fixed range, and emits the scaled counts as a digest string. Two JsonHash digests are compared with a dissimilarity measure that approximates the number of tokens shared by both records as a ratio of the total number of tokens from both records combined; a low score means the two log records are built from largely the same tokens.
To illustrate how JsonHash can be used, the article walks through a scenario in which a threat actor exploits a vulnerability in an internet-facing IIS server operated by an organisation named Contoso. The vulnerability gives the attacker remote code execution and remote file upload. Through forensic analysis, incident responders recover several Indicators of Compromise (IOCs), including the hash of the webshell found on the server and the user agents and IP addresses observed interacting with it. Using JsonHash, Contoso's IR team analyses the IIS logs retrieved from the server where the webshell was found: the logs are converted to CSV, a JsonHash digest is generated for each log line, and the digests are compared to identify lines similar to the known webshell activity.
The article also shows, in Kusto Query Language (KQL), how to calculate the dissimilarity between two digest strings from their Unicode codepoints. The codepoints of the two digests are assigned to two variables, x and y. The mv-apply operator then walks the two arrays position by position, casts each pair of codepoints to integers and takes their maximum and minimum; the per-position minima approximate the tokens shared by both records and the maxima the tokens of both records combined. The dissimilarity score is derived from the summed minima and maxima, and the project-away operator drops the intermediate columns xlen, x and y from the result.
The article concludes with a validation of how accurately JsonHash finds webshell activity in logs. For the sample set tested, JsonHash was effective at finding the webshell activity. JsonHash is still under development; planned refinements include improving accuracy by excluding fields such as timestamps and UUIDs from the hash computation and conducting further experiments to identify the best
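The pipeline described in the preceding paragraphs, as an added worked example that is not the article author's code: a small Python re-implementation of the JsonHash idea (flatten, tokenize, prefix each token with its field name, hash into buckets, count, scale, emit a digest) together with the min/max dissimilarity measure, run over a few constructed IIS W3C log lines in place of the article's KQL. The bucket count (`NUM_BUCKETS`), the BLAKE2b token hash, the clamp-to-printable-range scaling, the digest alphabet starting at `'0'`, and the `ignore` parameter for timestamp fields are this example's own choices, and the sample log lines are constructed, not taken from a real server.

```python
import hashlib
import re
from typing import Any, Dict, Iterable, Iterator, List, Tuple

NUM_BUCKETS = 64              # assumption: buckets per digest (one character each)
BASE = ord("0")               # assumption: digest characters start at '0'
MAX_LEVEL = ord("~") - BASE   # ... and bucket counts are clamped so characters stop at '~'
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def flatten(record: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Flatten nested dicts/lists into (dotted.field.path, scalar) pairs."""
    if isinstance(record, dict):
        for key, value in record.items():
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(record, list):
        for idx, value in enumerate(record):
            yield from flatten(value, f"{prefix}{idx}.")
    else:
        yield prefix.rstrip("."), record


def tokens(record: Any, ignore: Iterable[str] = ()) -> Iterator[str]:
    """Split each value into alphanumeric tokens and prepend the field name, so
    'post' in cs-method and 'post' inside a query string hash to different buckets.
    Fields named in `ignore` (timestamps, IDs) are left out of the digest."""
    skip = set(ignore)
    for field, value in flatten(record):
        if field in skip or value is None:
            continue
        for tok in TOKEN_RE.findall(str(value)):
            yield f"{field}={tok.lower()}"


def jsonhash(record: Any, ignore: Iterable[str] = ("date", "time")) -> str:
    """Hash every token into a bucket, count hits per bucket, clamp the count
    into the printable range and emit one character per bucket."""
    counts = [0] * NUM_BUCKETS
    for tok in tokens(record, ignore):
        h = hashlib.blake2b(tok.encode("utf-8"), digest_size=4).digest()
        counts[int.from_bytes(h, "big") % NUM_BUCKETS] += 1
    return "".join(chr(BASE + min(c, MAX_LEVEL)) for c in counts)


def dissimilarity(digest_a: str, digest_b: str) -> float:
    """Decode both digests back to per-bucket counts via their codepoints, then
    1 - sum(min)/sum(max): the minima approximate tokens shared by both records,
    the maxima the tokens of both combined. 0.0 = same token multiset, 1.0 = disjoint."""
    if len(digest_a) != len(digest_b):
        raise ValueError("digests must have the same number of buckets")
    x = [ord(ch) - BASE for ch in digest_a]
    y = [ord(ch) - BASE for ch in digest_b]
    shared = sum(min(a, b) for a, b in zip(x, y))
    combined = sum(max(a, b) for a, b in zip(x, y))
    return 0.0 if combined == 0 else 1.0 - shared / combined


def iis_w3c_to_records(text: str) -> List[Dict[str, str]]:
    """Parse IIS W3C extended log text into one dict per line, using the
    '#Fields:' directive as the column header (the article's CSV conversion step)."""
    fields: List[str] = []
    records: List[Dict[str, str]] = []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            fields = line.split()[1:]
        elif line and not line.startswith("#"):
            records.append(dict(zip(fields, line.split())))
    return records


def rank_by_similarity(seed: Dict[str, str],
                       records: List[Dict[str, str]]) -> List[Tuple[float, Dict[str, str]]]:
    """Score every record against a known-bad seed line and sort most-similar first."""
    seed_digest = jsonhash(seed)
    return sorted(((dissimilarity(seed_digest, jsonhash(r)), r) for r in records),
                  key=lambda pair: pair[0])


# Constructed sample (not real traffic): three POSTs to a webshell-like page
# from one client, plus two benign GETs from another.
SAMPLE_IIS_LOG = """\
#Software: Microsoft Internet Information Services 10.0
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port c-ip cs(User-Agent) sc-status
2024-03-01 09:12:44 10.0.0.5 GET /images/logo.png - 443 198.51.100.7 Mozilla/5.0+(Windows+NT+10.0) 200
2024-03-01 09:13:02 10.0.0.5 POST /uploads/helper.aspx cmd=whoami 443 203.0.113.9 python-requests/2.31 200
2024-03-01 09:13:40 10.0.0.5 POST /uploads/helper.aspx cmd=dir+c: 443 203.0.113.9 python-requests/2.31 200
2024-03-01 09:14:05 10.0.0.5 GET /index.html - 443 198.51.100.7 Mozilla/5.0+(Windows+NT+10.0) 200
2024-03-01 11:02:17 10.0.0.5 POST /uploads/helper.aspx cmd=net+user 443 203.0.113.9 python-requests/2.31 200
"""

if __name__ == "__main__":
    recs = iis_w3c_to_records(SAMPLE_IIS_LOG)
    for score, rec in rank_by_similarity(recs[1], recs):   # recs[1]: the confirmed webshell hit
        print(f"{score:.3f}  {rec['time']} {rec['cs-method']} {rec['cs-uri-stem']} {rec['cs-uri-query']}")
```

Run stand-alone, the script ranks every log line against the confirmed webshell hit (`recs[1]`): the two other POSTs to the same page score close to 0 because only their `cmd=` tokens differ, while the benign GETs share little beyond server IP, port and status and score well above them. With the default `ignore=("date", "time")`, two hits that differ only in timestamp produce the same digest, which is the refinement the article lists as future work; pass `ignore=()` to see the timestamp tokens pull otherwise identical lines apart.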
📓 JsonHash: Fuzzy hashing logs to find malicious activity
👉🏽 JsonHash is a fuzzy hashing algorithm used to detect similarities in log files.
👉🏽 During a forensic investigation, malicious activity may be discovered in logs. An incident responder may want to identify related malicious activity across a range of related log files with different formats.
👉🏽 JsonHash flattens nested structures, tokenizes values, hashes the tokens and places them in buckets.
👉🏽 The algorithm outputs a hash digest string.
👉🏽 A dissimilarity measure compares two JsonHash digests for similarity.
👉🏽 The algorithm is useful for the forensic investigation of cyberattacks.
👉🏽 It can surface log lines related to known Indicators of Compromise (IOCs), such as webshell activity.
👉🏽 An example converts IIS logs to CSV files and generates a JsonHash for each log line.
👉🏽 The article shows how to calculate the dissimilarity between two digests from their Unicode codepoints.
👉🏽 The example use case shows that JsonHash is effective at finding webshell activity in logs.
👉🏽 Future refinements will improve accuracy and identify other uses for the algorithm.
#JsonHash #FuzzyHashing #LogSimilarities #IndicatorsOfCompromise #DissimilarityMeasure