The Lua environment

ShimmerCat uses Lua for editing headers, computing cache keys, and assisting dynamic resizing of images.

Apart from use-case-specific functionality such as a GeoIP module, all instances of the Lua interpreter share the same base environment.

Available standard Lua libraries

In the base environment the following standard Lua libraries are included:

  • math: Documented here
  • table: Documented here
  • string: Documented here
  • utf8: Documented here. In addition, we bundle the function urlparse.check_utf8, documented below.

The following standard Lua 5.3 functions are available:

  • assert
  • collectgarbage
  • error
  • getmetatable
  • ipairs
  • next
  • pairs
  • pcall
  • rawequal
  • rawlen
  • rawget
  • rawset
  • select
  • setmetatable
  • tonumber
  • tostring
  • type
  • xpcall

but load, loadfile, loadstring and print are not available.
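
Because pcall and error are part of the base environment, a script can contain a failure instead of aborting. The following is a minimal sketch that uses only functions from the list above plus the math library; parse_positive_int and its error message are hypothetical names chosen for illustration, not part of ShimmerCat.

```lua
-- Hypothetical helper: convert a string to a positive integer,
-- raising an error when the value is unusable.
local function parse_positive_int(s)
    local n = tonumber(s)
    if n == nil or n <= 0 or math.type(n) ~= "integer" then
        error("not a positive integer: " .. tostring(s))
    end
    return n
end

-- pcall returns false plus the error value instead of propagating the error.
local ok, width = pcall(parse_positive_int, "640")
local ok2, err = pcall(parse_positive_int, "wide")
```

Here ok is true and width holds the parsed number, while ok2 is false and err carries the error message, so the calling code can fall back to a default.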

Regular expressions in Lua: regex module

In addition to Lua's string matching functions, instances of the interpreter have access to regular expressions via a regex module of ours. This module is simply a caching wrapper around the Oniguruma regular expression library.

The regular expressions module regex includes the following functions to compile a string to a regular expression:

  • regex.onig_re(regex_pattern): compiles a regular expression using Oniguruma syntax
  • regex.perl_re(regex_pattern), regex.python_re(regex_pattern): compiles a regular expression using Perl/Python/PCRE syntax.
  • regex.eposix_re(regex_pattern), regex.tdfa_re(regex_pattern): compiles a regular expression using extended Posix syntax.

and two functions for using the regular expressions and obtaining a match object:

  • regex.match(regex_pattern, search_string, [start_pos]): tries to match the regular expression against search_string, optionally starting at start_pos; otherwise matching starts at the beginning of search_string. These are the same semantics as re.match in Python.

  • regex.search(regex_pattern, search_string): searches anywhere in search_string for the first match of regex_pattern.

In both cases, regex_pattern must be a successfully compiled regular expression using any of the regex.xxxx_re functions in the module. If a match is not found, the functions regex.match and regex.search return nil. Otherwise, they return a Lua table with the following subtables:

  • starts: a table with the starting positions of all sub-matches, with index 0 for the starting position of the full match.
  • ends: a table with the past-the-end positions of all sub-matches, with index 0 for the past-the-end position of the full match.
  • groups: a table with all the matched groups, with 0 for the full match.
  • num_regs: the total number of matched groups, excluding the full match.

For example, the snippet

-- print is only available in the standalone lua interpreter; use trace inside ShimmerCat.
local my_re = regex.python_re("(?<dayname>Saturday)")
local mo = regex.search(my_re, "They came a Saturday morning")
print(tprint(mo, 2))

outputs:

  {
    ends=       {
        [1] = 21,
        [0] = 21,
        dayname= 21,
      },
    groups=       {
        [1] = "Saturday",
        [0] = "Saturday",
        dayname= "Saturday",
      },
    starts=       {
        [1] = 13,
        [0] = 13,
        dayname= 13,
      },
    num_regs= 1,
  }
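
The positions in starts and ends can be passed to string.sub to recover the matched text and its surroundings. Judging from the output above, starts appears to be 1-based and ends points one past the last matched character, so the end index needs a -1. The sketch below rebuilds the match table by hand so that it runs without the regex module; inside ShimmerCat the table would come from regex.search.

```lua
local subject = "They came a Saturday morning"

-- Hand-copied from the output above; stands in for regex.search(my_re, subject).
local mo = {
    starts = { [0] = 13, [1] = 13, dayname = 13 },
    ends   = { [0] = 21, [1] = 21, dayname = 21 },
    groups = { [0] = "Saturday", [1] = "Saturday", dayname = "Saturday" },
    num_regs = 1,
}

-- ends holds past-the-end positions, so subtract one for string.sub.
local full = string.sub(subject, mo.starts[0], mo.ends[0] - 1)
-- Text before and after the full match.
local before = string.sub(subject, 1, mo.starts[0] - 1)
local after = string.sub(subject, mo.ends[0])
```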

Here is a more involved example, taken from the tests for the directive change-headers-in:

<!--
shimmercat:
    content-disposition: replace
    change-headers-in: |
        local will_pass_path = headers[":path"]
        local path_match_re = regex.onig_re("/named/(?<name>[a-z]+(?!-san))/")
        local mo = regex.match(path_match_re, will_pass_path)
        local new_pass_path;
        if mo ~= nil then
            new_pass_path = string.format("/%s-san/", mo.groups['name'])
        else
            local path2_match_re = regex.onig_re("/named/(?<name>[a-z]+?)-san/")
            mo = regex.match(path2_match_re, will_pass_path)
            if mo ~= nil then
                new_pass_path = string.format("/%s-san/", mo.groups['name'])
            else
                new_pass_path = will_pass_path
            end
        end
        headers[":path"] = new_pass_path
        trace(string.format("Will pass path: %s", new_pass_path))
        return headers
-->

The urlparse module

The Lua environment also includes a urlparse module that helps with absolute URL paths. For example, it can parse a URL path such as /japanese-quotes/%E6%B5%B7%E5%8D%83%E5%B1%B1%E5%8D%83 and express it as a UTF-8-encoded path with the Unicode contents "/japanese-quotes/海千山千".

It includes the following functions:

  • urlparse.parse_url_path(url_path): parses the string in url_path as if it were a URL path, that is, the part of a URL after the host, like /my/roses.jpg?background=clear in the URL https://www.example.com/my/roses.jpg?background=clear. It returns a "URL path parsed" table with the layout described below.
  • urlparse.percent_encode_url_path(utf8_encoded_string): Percent-encodes non-URL-safe characters in a URL path, leaving the / characters alone.
  • urlparse.check_utf8(s): Checks if s represents a valid UTF-8 string. Returns urlparse.STATUS_SUCCESS if the passed string represents valid UTF-8, otherwise it returns urlparse.STATUS_INVALID_UTF8.

To describe the result of urlparse.parse_url_path(url_path), let's use the result of urlparse.parse_url_path("/literature/croatian/Nik%C5%A1a%20Ranjina%27s%20Miscellany?format=epub%20file") as an example. It returns a table with the following members:

  • fs_path: contains the percent-decoded, UTF-8 encoded part of the URL path before the first question mark, e.g. "/literature/croatian/Nikša Ranjina's Miscellany". This is usable as a file path in most operating systems.
  • decoded_query_string: contains the percent-decoded, UTF-8 encoded part of the URL path after the question mark, if any, including the question mark itself, e.g. "?format=epub file". It is nil when there is no query string.
  • issues: number indicating various errors as a bitwise OR-d combination of flags. Contains urlparse.STATUS_SUCCESS — the value 0 — if the argument could be percent-decoded and then UTF-8 encoded without issues.
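
As a sketch of how the returned table might be consumed, the snippet below writes the table out by hand from the example values above, so it runs without the urlparse module. The function query_without_mark is a hypothetical helper, not part of ShimmerCat.

```lua
-- Stand-in for the result of urlparse.parse_url_path(...), copied from the
-- example values above.
local parsed = {
    fs_path = "/literature/croatian/Nikša Ranjina's Miscellany",
    decoded_query_string = "?format=epub file",
    issues = 0,
}

-- Hypothetical helper: drop the leading question mark, tolerating the
-- nil case that occurs when the URL path has no query string.
local function query_without_mark(p)
    local q = p.decoded_query_string
    if q == nil then
        return ""
    end
    return string.sub(q, 2)
end

local query = query_without_mark(parsed)
local no_query = query_without_mark({ fs_path = "/my/roses.jpg", issues = 0 })
```

Checking decoded_query_string for nil before using it avoids an error on URL paths that carry no query string.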

For the issues member, the error flags are:

  • urlparse.STATUS_LOOSE_PERCENTS (1): The input contains invalid percent-encodings.
  • urlparse.STATUS_PATH_DISRUPTORS (2): The input contains sequences of characters that can be used to mask a path in a URL attack, for example /../.
  • urlparse.STATUS_INVALID_UTF8 (4): After percent-decoding, the obtained bytestring contains invalid UTF-8.
  • urlparse.STATUS_INVALID_ASCII (8): The input string contains invalid ASCII, e.g., it contains control characters. This condition refers to the original string, before percent-decoding.

Correctly built URLs should always produce a value of zero for issues, but some of these conditions are considered non-fatal by some browsers. For example, Chrome will percent-encode and pass invalid UTF-8 in query strings.
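
Since issues is a bitwise OR of flags, individual conditions can be tested with the Lua 5.3 bitwise AND operator (&). In the minimal sketch below, the local constants mirror the documented numeric values so that the snippet runs outside ShimmerCat; inside it, the urlparse.STATUS_* names would be used instead. The helper names and the policy of tolerating only invalid UTF-8 are assumptions chosen for illustration.

```lua
-- Local copies of the documented flag values (stand-ins for urlparse.STATUS_*).
local STATUS_SUCCESS = 0
local STATUS_LOOSE_PERCENTS = 1
local STATUS_PATH_DISRUPTORS = 2
local STATUS_INVALID_UTF8 = 4
local STATUS_INVALID_ASCII = 8

-- True when the given flag is set in the issues bitmask.
local function has_flag(issues, flag)
    return (issues & flag) ~= 0
end

-- Illustrative policy: accept clean input, and input whose only
-- problem is invalid UTF-8 after percent-decoding.
local function is_acceptable(issues)
    local tolerated = STATUS_INVALID_UTF8
    return (issues & ~tolerated) == STATUS_SUCCESS
end

-- Example bitmask with two conditions set at once.
local combined = STATUS_PATH_DISRUPTORS | STATUS_INVALID_UTF8
```

With these definitions, has_flag(combined, STATUS_PATH_DISRUPTORS) is true, and is_acceptable(combined) is false because the path-disruptor bit is not tolerated.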

Debugging aids: trace and tprint

In addition, all instantiations of the Lua interpreter from ShimmerCat have access to a trace function that outputs its string argument as a log message. This function is not available from the standalone lua interpreter in the redistributable, but the standard print is available there.

To convert a table to its string representation, use the function tprint(atable). This function returns a string with a readable representation of a Lua table. It is available both in the distributed standalone lua interpreter and in the environments created by ShimmerCat.
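
Because trace exists only inside ShimmerCat and print only in the standalone interpreter, a snippet meant to run in both places can pick whichever is defined. This is a minimal sketch, assuming that reading an undefined global yields nil as in stock Lua; the names log and key are hypothetical.

```lua
-- trace is defined inside ShimmerCat, print in the standalone interpreter.
local log = trace or print

-- Example message; the cache key value is made up for illustration.
local key = "images/roses.jpg"
log(string.format("computed cache key: %s", key))
```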

Runtime of the Lua interpreter

For experimentation purposes, there is a standalone lua executable with the base environment in the /bin folder of the ShimmerCat redistributable.

ShimmerCat itself instantiates the interpreter directly via the included .so files, and in most cases Lua code is executed in a fresh Lua runtime. In particular, this means that you can't pass data in global variables or the registry between separate calls of Lua code from ShimmerCat.