Bulk Scanning Stuff with VirusTotal

Jul 17, 2021

I have accumulated a number of files of unknown or ambiguous origin, and I’d like to scan them for any potentially malicious content.

VirusTotal is a website created by the Spanish security company Hispasec Sistemas and later acquired by Google. It aggregates many antivirus products and online scan engines to check for viruses. Users can upload files up to 650MB and submit suspected malicious URLs for scanning.

Looking at all the files I am interested in scanning, there are 287 in total. It would be beyond tedious to submit each file manually, wait for VirusTotal to scan it and give back results, and then upload the next file.

Using some bash magic to find all the files!

Luckily, VirusTotal also has an API and a command line tool that we can use to automate this.

Inspecting the README file on the repository

First we need to download the repository

then make

For some reason, I’m not able to start the tool using the vt command. Not entirely sure what’s going on there.

Digging around where the tool was downloaded, though, I find the main.go file, so we’ll just execute it that way.

The documentation mentions that we need to sign up for a free API key in order to use the command line tool. So let’s sign up and grab the API key. After signing up, I initialized the command line tool with my API key.

VirusTotal processes files in two steps. First, a file is uploaded to VirusTotal and a unique ID is given back to the user. The user can then use this ID to query the results of the scan. The command line tool has a command for each of these steps.
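As a rough sketch of how those two steps could be driven from a script, the snippet below builds the two command lines and runs them with subprocess. The subcommand names vt scan file and vt analysis are the ones used in this post; the helper names and the vt_binary parameter are my own, and the parameter exists only because the vt command may not be on your PATH.

```python
import subprocess


def scan_command(paths, vt_binary="vt"):
    """Step 1: command line that uploads files and prints an ID for each."""
    return [vt_binary, "scan", "file", *paths]


def analysis_command(analysis_ids, vt_binary="vt"):
    """Step 2: command line that fetches the scan results for the given IDs."""
    return [vt_binary, "analysis", *analysis_ids]


def run_vt(command):
    """Run one of the commands above and return whatever it printed."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


# Hypothetical usage, assuming the tool is installed and initialized:
# print(run_vt(scan_command(["/path/to/some/file"])))
```

Keeping the command construction separate from the call that runs it makes it easy to print the command and check it before anything is uploaded.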

We will need a list of all the file names and their locations on my computer, written to a file, so that we have a single list of everything to be scanned.

Creating a list of all our files to be scanned
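The same list could also be built in Python rather than bash. This is only a sketch: the function name write_file_list and the choice to record absolute paths are my assumptions, not what was run for this post.

```python
from pathlib import Path


def write_file_list(root, list_path):
    """Write the absolute path of every regular file under root, one per line."""
    list_path = Path(list_path).resolve()
    files = sorted(
        p.resolve()
        for p in Path(root).rglob("*")
        # skip directories, and skip the list file itself on a re-run
        if p.is_file() and p.resolve() != list_path
    )
    list_path.write_text("".join(f"{p}\n" for p in files), encoding="utf-8")
    return files


# Hypothetical usage:
# found = write_file_list("/path/to/files", "files.txt")
# print(len(found), "files to scan")
```

Returning the list as well as writing it gives an immediate count to compare against the number of files expected.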

Inspecting this new file, we get a list of all the files we want to scan. We can then read this file line by line and send a POST request to the VirusTotal API endpoint to get an ID for each file.

The VirusTotal command line tool has a scan file command that returns an ID, which we can then use with the vt analysis command. We can even send our whole list of files to it!

Running this and tee-ing the results into a new file. (tee outputs results to the screen and also into a separate file.)

run the command and output the results into a new file
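For anyone who has not met tee before, its behaviour can be sketched in a few lines of Python: every line the command prints is echoed to the screen and also written to a file. The function name run_and_tee and the file name in the usage comment are placeholders of mine.

```python
import subprocess
import sys


def run_and_tee(command, log_path):
    """Run command, echoing each output line to the screen and to log_path."""
    lines = []
    with open(log_path, "w", encoding="utf-8") as log:
        with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
            for line in proc.stdout:
                sys.stdout.write(line)  # the copy that goes to the screen
                log.write(line)         # the copy that goes to the file
                lines.append(line.rstrip("\n"))
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)
    return lines


# Hypothetical usage with the scan command from this post:
# run_and_tee(["vt", "scan", "file", "/path/to/some/file"], "scan_output.txt")
```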

Inspecting the new file, we see that our results have given us our list of IDs!

We can do some bash processing to take only the IDs and compile them into a separate list to run the analysis on. Each line of the file is a space-separated pair: the file name and its VirusTotal ID.

Let’s place these into a new file and run a quick check to make sure we have all of our lines.
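Here is a sketch of the same extraction and sanity check in Python, assuming each line really is a file name followed by an ID. Taking the last whitespace-separated token means a file name containing spaces does not break it. The helper names and the sample lines are made up for illustration.

```python
def extract_ids(scan_lines):
    """Return the last whitespace-separated token of each non-empty line."""
    return [line.rsplit(None, 1)[-1] for line in scan_lines if line.strip()]


def check_counts(file_paths, ids):
    """Raise if the number of IDs does not match the number of files."""
    if len(file_paths) != len(ids):
        raise ValueError(f"expected {len(file_paths)} IDs, got {len(ids)}")


# Made-up sample lines in the "file name, then ID" shape described above.
sample_lines = ["/tmp/my file.bin EXAMPLEID1\n", "/tmp/other.exe EXAMPLEID2\n"]
ids = extract_ids(sample_lines)
check_counts(["/tmp/my file.bin", "/tmp/other.exe"], ids)
print("\n".join(ids))
```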

let’s pull up the analysis command on the tool

We see that we can pass this command a list, just as we did for the file scan command. We’ll throw the results into a file aptly named results.

Once this completes, we’ll need a way to sift through the results in a meaningful way to look for any files that may have been flagged as potentially malicious.

Upon closer inspection, the results are not returned in JSON format. Probing around the tool, I wasn’t able to find a way to make it return the results as a JSON object, so we’ll have to do some quick hacking to process them. Note also that this screenshot shows the results for one file entry, and we have 287 such entries. The first entry is 532 lines long. I’ve also collapsed the results field for brevity.

The results field just shows the details of the antivirus scanner used

The results file is about 150,000 lines long! Imagine manually scrolling through it line by line looking for any malicious or suspicious flags!

Looking at these results, we are interested in the id field, and the entries within the ‘stats’ field. Particularly, we are interested in any malicious or suspicious entries found.

We’ll use Python to perform our analysis on the results. The script reads every line from our results file, and when it finds an id field, it adds that id, along with the 8 ‘stats’ we are interested in, as an entry in a Python dictionary.

#!/usr/bin/python3

#the 8 'stats' fields we want to record for each file
STAT_FIELDS = {'confirmed-timeout', 'failure', 'harmless', 'malicious',
               'suspicious', 'timeout', 'type-unsupported', 'undetected'}

results_dict = dict() #where we will store our final data
curr_file_id = None #what current file we are processing

with open('results', 'r') as f: #open the file named 'results' as read-only
    for line in f: #go through each line in our results file
        splitted_line = line.split() #split the line by any whitespace
        if len(splitted_line) < 2: #skip blank lines and lines without a value
            continue
        key = splitted_line[0]
        #if first element is a -, then this line will give us an id
        if key == '-' and len(splitted_line) > 2:
            curr_file_id = splitted_line[2] #the 3rd element is the id
            results_dict[curr_file_id] = dict() #make an entry in our results_dict
            continue
        #if the first element is any of the 8 interesting fields, record the stat
        #(values are kept as strings, exactly as they appear in the file)
        if curr_file_id is not None and key.endswith(':') and key[:-1] in STAT_FIELDS:
            results_dict[curr_file_id][key[:-1]] = splitted_line[1]

This loads only the data we are interested in into a Python dictionary called results_dict, which we can then manipulate and ‘query’ for any files that may have been flagged as malicious or suspicious.
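Since the tool did not hand us JSON, one optional extra is to write the dictionary out as JSON once it is built, so other tools can query it later. A minimal sketch, assuming results_dict has the shape described above; the sample entry and its numbers are invented.

```python
import json


def to_json(results_dict):
    """Serialize the parsed stats as JSON text, with counts as integers."""
    numeric = {
        file_id: {name: int(value) for name, value in stats.items()}
        for file_id, stats in results_dict.items()
    }
    return json.dumps(numeric, indent=2, sort_keys=True)


# Invented entry in the same shape as results_dict (values are made up).
sample = {"example-id": {"malicious": "0", "suspicious": "1", "undetected": "60"}}
print(to_json(sample))
```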

We will go through each entry of our results_dict one by one and check for any malicious or suspicious flags. If we find any, we add a message to an alert_list. Finally, we print out the alert list.

alert_list = list() #a list that will contain any alert messages

for entry, stats in results_dict.items(): #go through each entry in our processed data
    alerts = ''
    #a missing stat counts as 0 so an incomplete entry does not crash the loop
    suspicious = int(stats.get('suspicious', 0))
    malicious = int(stats.get('malicious', 0))
    #if we have any malicious or suspicious flags, add a message to alerts
    if suspicious > 0:
        alerts += f'{entry} has {suspicious} suspicious flags! '
    if malicious > 0:
        alerts += f'{entry} has {malicious} malicious flags! '
    #if alerts has any messages in it, add them to our alert_list
    if alerts:
        alert_list.append(alerts)


#print every entry present in alert_list
for alert in alert_list:
    print(alert)

Our final simple script thus looks like

Running the program and boom, we find 2 potentially malicious files. The internet is a scary place.

This is not a foolproof solution: some malicious files may never have been submitted to, or may not be detected by, the scanners that VirusTotal uses. Nevertheless, we can be somewhat more confident that, besides these 2 files, the others are less likely to be malicious.
