URLScan File Grab

Continuing my hoarding of suspicious and malicious files, I’ve written another simple bash script to grab files from public submission sites. This time I am leveraging the public results from urlscan.io.

https://github.com/triw0lf/urlscan_scripts/blob/master/urlscan_badfiles.sh

URLScan is an awesome free, public sandbox tool to analyze websites and see what a particular website is doing. From their ‘About’ page, URLScan is self-described as “[…] a service to scan and analyse websites. When a URL is submitted to urlscan.io, an automated process will browse to the URL like a regular user and record the activity that this page navigation creates.” I’ve used URLScan copiously throughout my years in IR and recommend it to everyone and anyone I meet!

So I sought to leverage this public resource for harvesting malicious samples, similar to my efforts with my Hybrid Analysis Interesting File script. One of the best things about URLScan is its robust API, which requires no authentication for public searches and results. This offers a low barrier to entry for individuals looking to get into research using URLScan’s public results or looking to explore API usage further.
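To illustrate how little an unauthenticated search needs, here is a minimal Python sketch. It is not taken from the script, which is bash. It assumes the public Search API endpoint at `https://urlscan.io/api/v1/search/` with `q` and `size` parameters and a top-level `results` list in the response. The function names and the example query are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: public Search API endpoint, no API key sent.
SEARCH_ENDPOINT = "https://urlscan.io/api/v1/search/"


def build_search_url(query, size=100):
    """Return a Search API URL for the given query string."""
    params = urllib.parse.urlencode({"q": query, "size": size})
    return f"{SEARCH_ENDPOINT}?{params}"


def search(query, size=100, timeout=30):
    """Run an unauthenticated search and return the list under "results"."""
    url = build_search_url(query, size)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Assumption: the response is a JSON object with a "results" list.
        return json.load(resp).get("results", [])
```

Calling `build_search_url("domain:example.com", size=10)` only builds the URL. `search(...)` performs the actual request, so it needs network access.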

URLScan provides excellent documentation on API usage, as well as references to both commercial and open source implementations of the API. Several other services also hook into URLScan, like OpenPhish, PhishTank, and CertStream, and provide additional context and indicators.

The script takes a file extension as its argument and searches all public URLScan submissions for it. Once the results come back as JSON, the script uses jq to keep only publicly available URL submissions containing that file extension, and only hits from the current day, so that the submissions are still viable to probe. It then parses out the submitted URL and the URLScan results link for each hit.

Once the initial search is complete, each result is run through the URLScan results API and filtered with jq again. This time the submission is again filtered on public availability, but also on any malicious verdicts from sources like Google Safe Browsing. If the submission is public and has a malicious verdict, the originally submitted URL and the SHA256 of the file that was being served are written to a text file for staging. Finally, the script reads the staging file line by line, attempts to download the suspicious file straight from the source (the originally submitted URL), and renames the file as ‘SHA256.extension’, which allows for flexibility in what types of files the user wishes to download.
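The two filtering passes described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the script's actual jq filters. The field names (`task.visibility`, `task.url`, `task.time`, `result`, `verdicts.overall.malicious`, `lists.hashes`) are assumptions about the shape of the URLScan JSON, and the function names are hypothetical. Matching on the end of the URL path is also a stricter reading of "containing the file extension" than the script may use.

```python
from datetime import date
from urllib.parse import urlparse


def select_candidates(search_results, extension, today):
    """First pass: return (submitted_url, result_link) pairs for today's
    public hits whose URL path ends in the requested extension."""
    suffix = "." + extension.lower().lstrip(".")
    picked = []
    for entry in search_results:
        task = entry.get("task", {})
        url = task.get("url", "")
        if (
            task.get("visibility") == "public"
            # Assumption: task.time is an ISO 8601 string, so `today`
            # should be a date in the same time zone as the API (UTC).
            and task.get("time", "").startswith(today.isoformat())
            and urlparse(url).path.lower().endswith(suffix)
            and "result" in entry
        ):
            picked.append((url, entry["result"]))
    return picked


def staging_line(result):
    """Second pass: return "url sha256" for a public result with a
    malicious verdict, or None if the result should be skipped."""
    task = result.get("task", {})
    url = task.get("url")
    malicious = result.get("verdicts", {}).get("overall", {}).get("malicious", False)
    hashes = result.get("lists", {}).get("hashes", [])
    if task.get("visibility") != "public" or not malicious or not url or not hashes:
        return None
    # Assumption: the first listed hash is the SHA256 of the served file.
    return f"{url} {hashes[0]}"
```

Each non-`None` value from `staging_line` corresponds to one line of the staging file: the submitted URL followed by the SHA256.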

Before using this script, make sure you have the following set up:

  1. A research server for collection

  2. jq - https://stedolan.github.io/jq/download/ (or apt-get install jq)

Here is what use of the script looks like:

Run the script from the directory where you want to download the malicious files. Running it under nohup keeps it going if your session drops and writes its output to nohup.out, so you can keep an eye on what is happening while the script runs. Make sure to pass a file extension type with the script!

Upon script completion, all interesting files should be downloaded and waiting for further analysis. Each file is named with its SHA256, followed by the submitted file extension type.
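The naming scheme can be sketched in Python like this. It is a minimal illustration, not the script's own code, and `sample_name` and `save_sample` are hypothetical helpers. The hash comparison is an extra check that the post does not describe. It simply reports whether the downloaded bytes still match the SHA256 that URLScan recorded.

```python
import hashlib
from pathlib import Path


def sample_name(sha256, extension):
    """Build the 'SHA256.extension' file name."""
    return f"{sha256.lower()}.{extension.lstrip('.')}"


def save_sample(data, sha256, extension, directory="."):
    """Write data as SHA256.extension and report whether the content
    hash matches the expected SHA256."""
    path = Path(directory) / sample_name(sha256, extension)
    path.write_bytes(data)
    matches = hashlib.sha256(data).hexdigest() == sha256.lower()
    return path, matches
```

A mismatch does not necessarily mean the download failed. The server may simply be serving something different from what URLScan saw.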

Occasionally, original URLs submitted to URLScan point to files that are empty or have already been taken down. These will come back as empty or will be unavailable to download. They are often a casualty of submitting to a public sandbox tool.
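One way to handle those dead or empty samples is to treat both cases as "nothing to keep". The Python sketch below assumes a hypothetical helper, `fetch_sample`, and is not the script's own download logic.

```python
import urllib.request


def fetch_sample(url, timeout=30):
    """Return the response body, or None if the URL is unavailable
    or the body is empty."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError and timeouts;
        # ValueError covers malformed URLs.
        return None
    # An empty body is treated the same as a failed download.
    return body or None
```

A caller can then skip any staging entry for which `fetch_sample` returns `None` instead of writing a zero-byte file.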

As always, I love to hear feedback or talk about security! You can find me at my Contact page or on Twitter - @jotunvillur.

Lauren Proehl