Hybrid Analysis Interesting File Grab

In an effort to get more malware samples for reverse engineering practice, I recently created a simple bash script to fetch file samples labeled as ‘interesting’ from Hybrid Analysis.

https://github.com/triw0lf/hybridanalysis_scripts/blob/master/hybridanalysis_interesting.sh

This was based on an article published to the Public Knowledge Base on the Hybrid Analysis website. The article specifically calls out ways to find emerging samples for security research, highlighting the ‘isinteresting’ flag in the Hybrid Analysis API. It also gives the helpful note, ‘Often, these samples will have a relatively low AV detection ratio, but a high threat score (which is basically AV independent, see more below).’

I decided to automate collection of these interesting samples for future analysis by writing a quick little bash script that lives on my research server and runs as a nightly cron job. A lot of these samples are quickly identified by AV vendors, but they make for great reverse engineering practice and are useful for spotting malware trends.

This script uses the current date to check the Hybrid Analysis latest feed, which is a JSON feed of 250 reports over the last 24 hours. The script then uses jq to filter for samples that are tagged as interesting (“interesting: true”), are file analyses only (“url_analysis: false”), and are publicly shared (“shared_analysis: true”). This ensures that only files that can actually be downloaded are kept for further research. As matches are found, the SHA256 hash is extracted and added to a temporary holding file. Upon completion, the temporary file is shuffled, then the top 40 samples are shuffled again and exported to the staging file. During testing, 40 gave the best results for avoiding API rate limiting while still getting a good set of samples. Finally, the script uses the staging file to download each of the interesting samples from Hybrid Analysis, with a built-in 24 second sleep between downloads, which was the interval that best avoided the API rate limit.
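The filter-and-sample steps described above can be sketched roughly as follows. This is a minimal illustration, not the script from the repository: the function names are made up for this example, the field names are taken from the description above, and the top-level "data" array and the "sha256" key are assumptions about the feed layout that should be checked against a real response.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, and the feed layout
# (a top-level "data" array with a "sha256" key per report) is an assumption.

# Print the SHA256 of every report that is interesting, a file analysis
# (not a URL analysis), and publicly shared.
filter_interesting() {
  jq -r '.data[]
         | select(.interesting == true
                  and .url_analysis == false
                  and .shared_analysis == true)
         | .sha256' "$1"
}

# Read hashes on stdin, keep a random selection of N (default 40),
# then shuffle that selection once more.
pick_samples() {
  shuf -n "${1:-40}" | shuf
}
```

With a saved copy of the feed, the two steps chain together as `filter_interesting feed.json | pick_samples 40 > staging.txt`, where `feed.json` and `staging.txt` are placeholder file names.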

For documentation purposes, each file is named as a combination of the download source, type of file, and SHA256.
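As a rough sketch of the download loop and naming scheme, something like the following would work. The underscore separator, the `HA_API_KEY` variable, and the function names are assumptions made for illustration, and the download endpoint and headers should be verified against the Hybrid Analysis API v2 documentation before use.

```shell
#!/usr/bin/env bash
# Sketch only: names, separator, and endpoint are assumptions, not the
# author's exact script.

# Build the output name: source (ha), type (interesting), then the SHA256.
sample_name() {
  printf 'ha_interesting_%s.gz\n' "$1"
}

# Download one sample. The endpoint and headers are assumed from the
# API v2 docs; HA_API_KEY is a hypothetical environment variable.
fetch_sample() {
  curl -sS -f -o "$2" \
    -H "api-key: ${HA_API_KEY:?set HA_API_KEY first}" \
    -H 'User-Agent: Falcon Sandbox' \
    "https://www.hybrid-analysis.com/api/v2/overview/$1/sample"
}

# Walk the staging file one hash per line, pausing between downloads
# (24 seconds by default) to stay under the API rate limit.
download_all() {
  local staging=$1 delay=${2:-24} sha
  while IFS= read -r sha; do
    [ -n "$sha" ] || continue
    fetch_sample "$sha" "$(sample_name "$sha")" || echo "failed: $sha" >&2
    sleep "$delay"
  done < "$staging"
}
```

Calling `download_all staging.txt` from the sample storage folder would then write one gzip file per hash, continuing past any individual download that fails.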

Before using this script, make sure you have the following set up:

  1. A research server for collection

  2. Hybrid Analysis API Key - https://www.hybrid-analysis.com/docs/api/v2

  3. jq - https://stedolan.github.io/jq/download/ (or apt-get install jq)

  4. (NOT REQUIRED) Aliases set for scripts

Here is a quick look at what execution looks like:

First, set an alias to make running this command in the background a lot easier. Helps keep your cron jobs clean as well.
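An alias along these lines would do the job. The path is hypothetical, so point it at wherever the script was saved; the trailing `&` is what sends the run to the background.

```shell
# Hypothetical install path; adjust to where the script actually lives.
# Single quotes defer $HOME expansion until the alias is used.
alias interesting='bash "$HOME/scripts/hybridanalysis_interesting.sh" &'
```

Adding the line to `~/.bashrc` (or a similar shell startup file) keeps the alias available in new sessions.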

After setting the alias, run interesting (or other selected alias name) in your malicious file store folder.

You can always confirm the script is running by executing the jobs command.

Finally, upon completion, all interesting files should be downloaded and waiting for further analysis. All files are gzip compressed and not password protected. Each filename is prefixed with the download source (ha == Hybrid Analysis) and the type of file it is (interesting), followed by the SHA256.

I am always open to feedback, questions, or general chat about security! Feel free to reach me via Contact or on Twitter - @jotunvillur.

Lauren Proehl