Parsers and data extractors

Many Velociraptor artifacts rely on specialized parsing of file formats. This page lists the plugins and functions that allow the client to parse information from various file formats.

Simple file formats may be parsed using regular expressions and other generic rules. However, some specialized file formats have dedicated parsers. These dedicated parsers are exposed as VQL plugins so that their results can be used in further queries.

binary_parse

Function

Parse a binary string with a profile-based parser.

This function extracts binary data from strings. It works by applying a profile to the binary string and generating an object from that. Profiles use the same syntax as Rekall or Volatility. For example, a profile might be:

{
  "StructName": [10, {
     "field1": [2, ["unsigned int"]],
     "field2": [6, ["unsigned long long"]]
  }]
}

The profile is compiled and overlaid on the data at the specified offset, and the resulting object is then emitted with its required fields.

Arg Description Type
offset Start parsing from this offset. int64
string The string to parse. string (required)
profile The profile to use. string
iterator An iterator to begin with. string
target The target type to fetch. string
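
To illustrate how a profile is overlaid on binary data, here is a minimal Python sketch. It is not Velociraptor's implementation: the parse_struct helper, the TYPE_FORMATS table and the little-endian byte order are all assumptions made for the example. It only shows the idea of reading each field at its declared offset.

```python
import struct

# Hypothetical mapping from profile type names to struct format codes.
# Little-endian byte order is an assumption for this sketch.
TYPE_FORMATS = {
    "unsigned int": "<I",
    "unsigned long long": "<Q",
}

profile = {
    "StructName": [10, {
        "field1": [2, ["unsigned int"]],
        "field2": [6, ["unsigned long long"]],
    }]
}


def parse_struct(data, profile, target, offset=0):
    """Overlay the target struct on data at offset and return its fields."""
    _size, fields = profile[target]
    result = {}
    for name, (field_offset, (type_name,)) in fields.items():
        fmt = TYPE_FORMATS[type_name]
        # Each field is read relative to the start of the struct.
        (value,) = struct.unpack_from(fmt, data, offset + field_offset)
        result[name] = value
    return result


# Two padding bytes, then a 4-byte and an 8-byte integer.
data = b"\x00\x00" + struct.pack("<I", 1234) + struct.pack("<Q", 5678)
parsed = parse_struct(data, profile, "StructName")
print(parsed)
```

Passing a non-zero offset shifts every field read by the same amount, which mirrors the role of the offset argument described above.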

binary_parse

Plugin

Parse binary files using a profile.

This is the plugin version of the binary_parse() function.

Arg Description Type
offset Start parsing from this offset int64
file Filename to parse string (required)
accessor Accessor to use (e.g. ntfs, data) string
profile Profile to use. string
target The target to fetch. string (required)
args Args for the target class. Any
start The initial field in the target to fetch. string

grok

Function

Parse a string using a Grok expression.

Arg Description Type
grok Grok pattern. string (required)
data String to parse. string (required)
patterns Additional patterns. Any
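
As a rough illustration of what a Grok expression does, the Python sketch below expands %{PATTERN:field} references into named regular expression groups. The PATTERNS table is a tiny hypothetical pattern library written for this example, not the pattern set shipped with any Grok implementation, and the helper names are assumptions.

```python
import re

# Hypothetical, deliberately small pattern library for this sketch.
PATTERNS = {
    "WORD": r"\w+",
    "NUMBER": r"\d+(?:\.\d+)?",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
}


def grok_to_regex(expression, patterns=PATTERNS):
    """Expand %{NAME:field} references into named capture groups."""
    def expand(match):
        pattern_name, field = match.group(1), match.group(2)
        return "(?P<%s>%s)" % (field, patterns[pattern_name])
    return re.sub(r"%\{(\w+):(\w+)\}", expand, expression)


def grok(expression, data, patterns=PATTERNS):
    """Return a dict of captured fields, or an empty dict if nothing matches."""
    match = re.search(grok_to_regex(expression, patterns), data)
    return match.groupdict() if match else {}


result = grok("%{IP:client} %{WORD:method} %{NUMBER:bytes}", "10.0.0.1 GET 512")
print(result)
```

Extending the patterns dict plays a role similar to supplying additional patterns through the patterns argument.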

olevba

Plugin

Extracts VBA Macros from Office documents.

This plugin parses the provided files as OLE documents in order to recover VB macro code. A single document can have multiple code objects, and each such code object is emitted as a row.

Arg Description Type
file A list of filenames to open as OLE files. list of string (required)
accessor The accessor to use. string
max_size Maximum size of file we load into memory. int64

parse_auditd

Plugin

Parse log files generated by auditd.

Arg Description Type
filename A list of log files to parse. list of string (required)
accessor The accessor to use. string

parse_csv

Plugin

Parses events from a CSV file.

Parses records from a CSV file. We expect the first row of the CSV file to contain column names. This parser specifically supports Velociraptor’s own CSV dialect, so it is well suited to post-processing existing CSV files.

The type of each value in each column is deduced based on Velociraptor’s standard encoding scheme. Therefore types are properly preserved when read from the CSV file.

For example, downloading the results of a hunt in the GUI will produce a CSV file containing artifact rows collected from all clients. We can then use the parse_csv() plugin to further filter the CSV file, or to stack using group by.

Example

The following stacks the result from a Windows.Applications.Chrome.Extensions artifact:

SELECT count(items=User) As TotalUsers, Name
FROM parse_csv(filename="All Windows.Applications.Chrome.Extensions.csv")
Group By Name
Order By TotalUsers

Arg Description Type
filename CSV files to open list of string (required)
accessor The accessor to use string
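
The stacking idea above can also be sketched outside VQL. The following Python example counts rows per Name and orders the result by that count. The inline CSV data is made up for illustration, and the sketch does not reproduce Velociraptor's CSV dialect or its type deduction.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for an exported hunt result.
sample = (
    "User,Name\n"
    "alice,AdBlock\n"
    "bob,AdBlock\n"
    "carol,Grammar Helper\n"
)

rows = csv.DictReader(io.StringIO(sample))

# Group by Name and count the rows in each group.
totals = Counter(row["Name"] for row in rows)

# Order by the count, smallest first, so rare items surface at the top.
stacked = sorted(totals.items(), key=lambda item: item[1])
print(stacked)
```

Sorting in ascending order puts the least common values first, which is usually where stacking analysis starts.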

parse_ese

Plugin

Opens an ESE file and dumps a table.

Arg Description Type
file string (required)
accessor The accessor to use. string
table A table name to dump string (required)

parse_evtx

Plugin

Parses events from an EVTX file.

This plugin parses Windows events from Windows Event Log files (EVTX).

A Windows event typically contains two columns. The EventData column contains event-specific structured data, while the System column contains data common to all events, including the Event ID.

You should almost always filter by one or more event IDs (using the System.EventID.Value field).

Example

SELECT System.TimeCreated.SystemTime as Timestamp,
       System.EventID.Value as EventID,
       EventData.ImagePath as ImagePath,
       EventData.ServiceName as ServiceName,
       EventData.ServiceType as Type,
       System.Security.UserID as UserSID,
       EventData as _EventData,
       System as _System
FROM parse_evtx(filename=systemLogFile) WHERE EventID = 7045

Arg Description Type
filename A list of event log files to parse. list of string (required)
accessor The accessor to use. string
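
To make the two-column shape concrete, here is a small Python sketch that filters already-parsed events by event ID. The event dicts are hypothetical samples that only mimic the System and EventData layout described above. No EVTX parsing takes place here.

```python
# Hypothetical events, shaped like the System / EventData columns.
events = [
    {
        "System": {"EventID": {"Value": 7045}},
        "EventData": {"ServiceName": "ExampleSvc", "ImagePath": "C:\\example.exe"},
    },
    {
        "System": {"EventID": {"Value": 4624}},
        "EventData": {"TargetUserName": "alice"},
    },
]

# Keep only service installation events (ID 7045) and flatten a few fields.
service_installs = [
    {
        "EventID": event["System"]["EventID"]["Value"],
        "ServiceName": event["EventData"].get("ServiceName"),
        "ImagePath": event["EventData"].get("ImagePath"),
    }
    for event in events
    if event["System"]["EventID"]["Value"] == 7045
]
print(service_installs)
```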

parse_float

Function

Convert a string to a float.

Arg Description Type
string A string to convert to float string (required)

parse_json

Function

Parse a JSON string into an object.

Note that when VQL dereferences fields in a dict, it returns NULL for fields that do not exist. Accessing a missing field is therefore not an error; the column simply contains NULL.

Arg Description Type
data JSON-encoded string. string (required)
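
The behaviour for missing fields is comparable to using dict.get() in Python, as in this small sketch. The JSON document is made up for illustration, and Python's None stands in for VQL's NULL.

```python
import json

record = json.loads('{"name": "svchost.exe", "pid": 812}')

# An existing field returns its value.
pid = record.get("pid")

# A missing field yields None instead of raising an error.
missing = record.get("parent")
print(pid, missing)
```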

parse_json_array

Function

Parse a JSON string into an array.

This function is similar to parse_json() but works for a JSON list instead of an object.

Arg Description Type
data JSON-encoded string. string (required)

parse_json_array

Plugin

Parses events from a line-oriented JSON file.
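
In a line-oriented JSON file, each line is a complete JSON object. The Python sketch below shows the general idea of turning such input into one row per line. The sample data is made up, and the sketch says nothing about how the plugin itself is implemented.

```python
import json

# Made-up sample: one JSON object per line.
jsonl = (
    '{"EventID": 1, "Image": "a.exe"}\n'
    '{"EventID": 3, "Image": "b.exe"}\n'
)

# Parse each non-empty line independently into a row.
rows = [json.loads(line) for line in jsonl.splitlines() if line.strip()]
print(rows)
```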

parse_lines

Plugin

Parse a file separated into lines.

Arg Description Type
filename A list of log files to parse. list of string (required)
accessor The accessor to use. string

parse_mft

Plugin

Scan the $MFT from an NTFS volume.

Arg Description Type
filename The MFT file to parse. string (required)
accessor The accessor to use. string

parse_ntfs

Function

Parse an NTFS image file.

Arg Description Type
device The device file to open. This may be a full path - we will figure out the device automatically. string (required)
inode The MFT entry to parse in inode notation (5-144-1). string
mft The MFT entry to parse. int64
mft_offset The offset to the MFT entry to parse. int64

parse_ntfs_i30

Plugin

Scan the $I30 stream from an NTFS MFT entry.

Arg Description Type
device The device file to open. This may be a full path - we will figure out the device automatically. string (required)
inode The MFT entry to parse in inode notation (5-144-1). string
mft The MFT entry to parse. int64
mft_offset The offset to the MFT entry to parse. int64

parse_pe

Function

Parse a PE file.

Arg Description Type
file The PE file to open. string (required)
accessor The accessor to use. string

parse_records_with_regex

Plugin

Parses a file with a set of regular expressions and yields matches as records. The file is read into a large buffer. Then each regular expression is applied to the buffer, and all matches are emitted as rows.

The regular expressions are specified in the Go syntax. They are expected to contain capture variables to name the matches extracted.

For example, consider an HTML file with simple links. The regular expression might be:

regex='<a.+?href="(?P<Link>[^"]+?)"'

This produces rows with a single column, Link.
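
The same expression can be tried out in Python, whose named-group syntax matches Go's for this pattern. This is only a sketch of the matching behaviour on a made-up HTML snippet, not of the plugin itself.

```python
import re

# Made-up HTML buffer with two links.
html = '<p><a class="x" href="https://example.com/a">A</a> <a href="/b">B</a></p>'

regex = r'<a.+?href="(?P<Link>[^"]+?)"'

# Every match becomes one row, with named groups as columns.
rows = [match.groupdict() for match in re.finditer(regex, html)]
print(rows)
```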

The aim of this plugin is to split the file into records which can be further parsed. For example, if the file consists of multiple records, this plugin can be used to extract each record, while parse_string_with_regex() can be used to further split each record into elements. This works better than trying to write a more complex regex which tries to capture a lot of details in one pass.

Example

Here is an example of parsing the /var/lib/dpkg/status files. These files consist of records separated by empty lines:

Package: ubuntu-advantage-tools
Status: install ok installed
Priority: important
Section: misc
Installed-Size: 74
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: all
Version: 17
Conffiles:
 /etc/cron.daily/ubuntu-advantage-tools 36de53e7c2d968f951b11c64be101b91
 /etc/update-motd.d/80-esm 6ffbbf00021b4ea4255cff378c99c898
 /etc/update-motd.d/80-livepatch 1a3172ffaa815d12b58648f117ffb67e
Description: management tools for Ubuntu Advantage
 Ubuntu Advantage is the professional package of tooling, technology
 and expertise from Canonical, helping organisations around the world
 manage their Ubuntu deployments.
 .
 Subscribers to Ubuntu Advantage will find helpful tools for accessing
 services in this package.
Homepage: https://buy.ubuntu.com

The following query extracts the fields in two passes. The first pass uses parse_records_with_regex() to extract each record as a block, and the second pass uses parse_string_with_regex() to break each block into fields.

SELECT parse_string_with_regex(
   string=Record,
   regex=['Package:\\s(?P<Package>.+)',
     'Installed-Size:\\s(?P<InstalledSize>.+)',
     'Version:\\s(?P<Version>.+)',
     'Source:\\s(?P<Source>.+)',
     'Architecture:\\s(?P<Architecture>.+)']) as Record
   FROM parse_records_with_regex(
     file=linuxDpkgStatus,
     regex='(?sm)^(?P<Record>Package:.+?)\\n\\n')

Arg Description Type
file A list of files to parse. list of string (required)
regex A list of regex to apply to the file data. list of string (required)
accessor The accessor to use. string
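
The two-pass approach can be sketched in Python with the same regular expressions. The status text below is a shortened, partly made-up sample, and parse_fields is a hypothetical helper standing in for parse_string_with_regex().

```python
import re

# Shortened sample with two records separated by empty lines.
status = (
    "Package: ubuntu-advantage-tools\n"
    "Status: install ok installed\n"
    "Installed-Size: 74\n"
    "Architecture: all\n"
    "Version: 17\n"
    "\n"
    "Package: example-package\n"
    "Status: install ok installed\n"
    "Installed-Size: 12\n"
    "Architecture: amd64\n"
    "Version: 1.0\n"
    "\n"
)

record_regex = r"(?sm)^(?P<Record>Package:.+?)\n\n"
field_regexes = [
    r"Package:\s(?P<Package>.+)",
    r"Installed-Size:\s(?P<InstalledSize>.+)",
    r"Version:\s(?P<Version>.+)",
    r"Source:\s(?P<Source>.+)",
    r"Architecture:\s(?P<Architecture>.+)",
]


def parse_fields(record):
    """Second pass: apply every field regex to one record block."""
    fields = {}
    for regex in field_regexes:
        match = re.search(regex, record)
        if match:
            fields.update(match.groupdict())
    return fields


# First pass: split the buffer into record blocks.
packages = [
    parse_fields(match.group("Record"))
    for match in re.finditer(record_regex, status)
]
print(packages)
```

In this sketch a field whose regex does not match (Source here) is simply absent from the resulting dict.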

parse_string_with_regex

Function

Parse a string with a set of regular expressions and extract fields. Returns a dict with fields populated from all regex capture variables.

Arg Description Type
string A string to parse. string (required)
regex The regex to apply. list of string (required)

parse_xml

Function

Parse an XML document into a dict-like object.

Arg Description Type
file XML file to open. string (required)
accessor The accessor to use string

prefetch

Plugin

Parses a prefetch file.

regex_replace

Function

Search and replace within a string using a regular expression. You can use $1 in the replacement string to refer to the first capture group.

Arg Description Type
source The source string to replace. string (required)
replace The substitute string. string (required)
re A regex to apply string (required)
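
For comparison, here is the same idea in Python. Note the difference in notation: Python's re module writes the back-reference as \1, where the function described above uses $1. The input string is made up for illustration.

```python
import re

source = "user=alice id=42"

# \1 refers to the first capture group (written $1 in the function above).
replaced = re.sub(r"user=(\w+)", r"account=\1", source)
print(replaced)
```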

rot13

Function

Apply rot13 deobfuscation to the string.

Arg Description Type
string string
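
ROT13 shifts each letter by 13 places, so applying it twice restores the original text. A short Python illustration using the standard library codec:

```python
import codecs

obfuscated = "Uryyb, Jbeyq!"

# Letters are rotated by 13 places; other characters are left unchanged.
decoded = codecs.encode(obfuscated, "rot_13")

# Applying the transformation again returns the original string.
round_trip = codecs.encode(decoded, "rot_13")
print(decoded)
```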

split_records

Plugin

Parses files by splitting lines into records.

Arg Description Type
filenames Files to parse. list of string (required)
accessor The accessor to use string
regex The split regular expression (e.g. a comma) string (required)
columns If the first row is not the headers, this arg must provide a list of column names for each value. list of string
first_row_is_headers A bool indicating if we should get column names from the first row. bool
count Only split into this many columns if possible. int
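
The interaction between the regex, columns, first_row_is_headers and count arguments can be sketched in Python as follows. This is a hypothetical re-implementation for illustration only. In particular, the fallback column names (Column0, Column1, ...) and the mapping of count to a maximum number of splits are assumptions, not documented behaviour.

```python
import re


def split_records(lines, regex, columns=None, first_row_is_headers=False, count=0):
    """Split each line on regex and return one dict per line."""
    rows = []
    for line in lines:
        # A count of N columns means at most N - 1 splits; 0 means no limit.
        values = re.split(regex, line, maxsplit=max(count - 1, 0))
        if first_row_is_headers and columns is None:
            # The first line supplies the column names and is not a row.
            columns = values
            continue
        # Assumed fallback naming when no column names are available.
        names = columns or ["Column%d" % i for i in range(len(values))]
        rows.append(dict(zip(names, values)))
    return rows


with_headers = split_records(
    ["name,age", "alice,30", "bob,41"], ",", first_row_is_headers=True
)
limited = split_records(
    ["a b c d"], r"\s+", columns=["first", "rest"], count=2
)
print(with_headers, limited)
```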

sqlite

Plugin

Opens an SQLite file and runs a query against it.

Arg Description Type
file string (required)
accessor The accessor to use. string
query string (required)
args Any
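
As a rough analogy, the Python sketch below runs a parameterized query against an SQLite database and returns each result as a dict. It uses an in-memory database with a made-up table instead of a file, and it assumes that args corresponds to bound query parameters, which the table above does not state.

```python
import sqlite3

# In-memory database standing in for an SQLite file on disk.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row

# Made-up table and data for illustration.
conn.execute("CREATE TABLE urls (url TEXT, visit_count INTEGER)")
conn.executemany(
    "INSERT INTO urls VALUES (?, ?)",
    [("https://example.com", 3), ("https://example.org", 1)],
)

# The query uses a placeholder; the value is supplied separately.
query = "SELECT url, visit_count FROM urls WHERE visit_count > ?"
rows = [dict(row) for row in conn.execute(query, (1,))]
conn.close()
print(rows)
```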