35. The Lua environment
ShimmerCat uses Lua for editing headers, computing cache keys, and assisting dynamic resizing of images.
In addition to special use-case functionality like a GeoIP module, all instances of the Lua interpreter use the same base environment.
35.1. Available standard Lua libraries
In the base environment the following standard Lua libraries are included:
math
: Documented here
table
: Documented here
string
: Documented here
utf8
: Documented here. In addition to it, we bundle a function urlparse.check_utf8 documented below.
The following standard Lua 5.3 functions are available:
assert
collectgarbage
error
getmetatable
ipairs
next
pairs
pcall
rawequal
rawlen
rawget
rawset
select
setmetatable
tonumber
tostring
type
xpcall
but load, loadfile, loadstring and print are not available.
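As a minimal sketch of working inside this restricted environment, the snippet below uses only functions from the list above (pcall, error, tonumber, tostring) plus the math library. The helper name parse_max_age and its fallback behaviour are hypothetical and serve only as illustration. It returns a value instead of printing, since print is not available in the base environment.

```lua
-- Hypothetical helper: turn a header value into an integer number of
-- seconds, returning a default when the value is missing or malformed.
local function parse_max_age(value, default)
    -- pcall traps the error raised below, so the caller never sees it.
    local ok, result = pcall(function()
        local n = tonumber(value)
        if n == nil then
            error("not a number: " .. tostring(value))
        end
        return math.floor(n)
    end)
    if ok then
        return result
    end
    return default
end

local seconds = parse_max_age("3600", 0)
local fallback = parse_max_age("soon", 60)
```

Here seconds holds the parsed number and fallback holds the default, because tonumber cannot convert "soon".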
35.2. Regular expressions in Lua: regex module
In addition to Lua’s string matching functions, instances of the interpreter have access to regular expressions via our regex module.
This module is simply a caching wrapper around the Oniguruma regular expression library.
The regex module includes the following functions to compile a string to a regular expression:
regex.onig_re(regex_pattern)
: compiles a regular expression using Oniguruma syntax.
regex.perl_re(regex_pattern), regex.python_re(regex_pattern)
: compiles a regular expression using Perl/Python/PCRE syntax.
regex.eposix_re(regex_pattern), regex.tdfa_re(regex_pattern)
: compiles a regular expression using extended POSIX syntax.
and two functions for using the regular expressions and obtaining a match object:
regex.match(regex_pattern, search_string, [start_pos])
: tries to match the regular expression against search_string, optionally starting at start_pos; otherwise matching starts at the beginning of search_string. These are the same semantics as re.match in Python.
regex.search(regex_pattern, search_string)
: searches anywhere in search_string for the first match of regex_pattern.
In both cases, regex_pattern must be a regular expression successfully compiled with any of the regex.xxxx_re functions in the module.
If a match is not found, the functions regex.match and regex.search return nil.
Otherwise, they return a Lua table with the following members:
starts
: a table with the starting positions of all sub-matches, with index 0 for the starting position of the full match.
ends
: a table with the past-the-end positions of all sub-matches, with index 0 for the past-the-end position of the full match.
groups
: a table with all the matched groups, with index 0 for the full match.
num_regs
: the total number of matched groups, excluding the full match.
For example, the snippet
    local my_re = regex.python_re("(?<dayname>Saturday)")
    local mo = regex.search(my_re, "They came a Saturday morning")
    print(tprint(mo, 2))
outputs:
    {
      ends= {
        [1] = 21,
        [0] = 21,
        dayname= 21,
      },
      groups= {
        [1] = "Saturday",
        [0] = "Saturday",
        dayname= "Saturday",
      },
      starts= {
        [1] = 13,
        [0] = 13,
        dayname= 13,
      },
      num_regs= 1,
    }
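The positions in starts and ends can be fed to string.sub to cut the subject string around a match. The sketch below rebuilds the match table printed above by hand, so it runs without the regex module; the helper slice_group is hypothetical. It assumes that starts are 1-based and that ends point one past the last matched character, which is what the numbers in the example suggest.

```lua
-- Hand-built copy of the match object from the example output.
local subject = "They came a Saturday morning"
local mo = {
    starts = { [0] = 13, [1] = 13, dayname = 13 },
    ends = { [0] = 21, [1] = 21, dayname = 21 },
    groups = { [0] = "Saturday", [1] = "Saturday", dayname = "Saturday" },
    num_regs = 1,
}

-- Hypothetical helper: recover a group's text from its positions.
-- Assumes 1-based starts and past-the-end ends.
local function slice_group(s, match, key)
    return string.sub(s, match.starts[key], match.ends[key] - 1)
end

-- Text before and after the full match (index 0).
local before = string.sub(subject, 1, mo.starts[0] - 1)
local after = string.sub(subject, mo.ends[0])
local day = slice_group(subject, mo, "dayname")
```

Under those assumptions, day equals mo.groups.dayname, and before .. day .. after reassembles the original subject.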
Here is a more convoluted example, which is part of the tests for the directive change-headers-in:
<!--
shimmercat:
    content-disposition: replace
    change-headers-in: |
        local will_pass_path = headers[":path"]
        local path_match_re = regex.onig_re("/named/(?<name>[a-z]+(?!-san))/")
        local mo = regex.match(path_match_re, will_pass_path)
        local new_pass_path;
        if mo ~= nil then
            new_pass_path = string.format("/%s-san/", mo.groups['name'])
        else
            local path2_match_re = regex.onig_re("/named/(?<name>[a-z]+?)-san/")
            mo = regex.match(path2_match_re, will_pass_path)
            if mo ~= nil then
                new_pass_path = string.format("/%s-san/", mo.groups['name'])
            else
                new_pass_path = will_pass_path
            end
        end
        headers[":path"] = new_pass_path
        trace(string.format("Will pass path: %s", new_pass_path))
        return headers
-->
35.3. The urlparse module
The Lua environment also includes a urlparse module that helps with absolute URL paths. For example, it can parse a URL path such as /japanese-quotes/%E6%B5%B7%E5%8D%83%E5%B1%B1%E5%8D%83 and express it as a UTF-8 encoded path with the Unicode contents “/japanese-quotes/海千山千”.
It includes the following functions:
urlparse.parse_url_path(url_path)
: parses the string in url_path as if it were a URL path, that is, the part of a URL after the host, like /my/roses.jpg?background=clear in the URL https://www.example.com/my/roses.jpg?background=clear. It returns a “URL path parsed” table with the layout described below.
urlparse.percent_encode_url_path(utf8_encoded_string)
: percent-encodes non URL-safe characters in a URL path, leaving the / characters alone.
urlparse.check_utf8(s)
: checks if s represents a valid UTF-8 string. Returns urlparse.STATUS_SUCCESS if the passed string represents valid UTF-8, otherwise it returns urlparse.STATUS_INVALID_UTF8.
To describe the result of urlparse.parse_url_path(url_path), let’s use the result of urlparse.parse_url_path("/literature/croatian/Nik%C5%A1a%20Ranjina%27s%20Miscellany?format=epub%20file") as an example.
It returns a table with the following members:
fs_path
: contains the percent-decoded, UTF-8 encoded part of the URL path before the first question mark, e.g. "/literature/croatian/Nikša Ranjina's Miscellany". This is usable as a file path in most operating systems.
decoded_query_string
: contains the percent-decoded, UTF-8 encoded part of the URL path after the question mark, if any, including the question mark itself, e.g. "?format=epub file". Can be nil for those cases where there is no query string.
issues
: number indicating various errors as a bitwise OR-ed combination of flags. Contains urlparse.STATUS_SUCCESS (the value 0) if the argument could be percent-decoded and then UTF-8 encoded without issues.
For the issues member, the error flags are:
urlparse.STATUS_LOOSE_PERCENTS (1)
: The input contains invalid percent-encodings.
urlparse.STATUS_PATH_DISRUPTORS (2)
: The input contains sequences of characters that can be used to mask a path with a URL attack, for example /../.
urlparse.STATUS_INVALID_UTF8 (4)
: After percent-decoding, the obtained bytestring contains invalid UTF-8.
urlparse.STATUS_INVALID_ASCII (8)
: The input string contains invalid ASCII, e.g., it contains control characters. This condition refers to the original string, before percent-decoding.
Correctly built URLs should always produce a value of zero for the issues member, but some of these conditions are considered non-fatal by some browsers. For example, Chrome will percent-encode and pass invalid UTF-8 in query strings.
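Because issues is a bitwise OR of flags, individual conditions can be tested with the & operator from Lua 5.3. The sketch below is illustrative only: it copies the documented numeric values into a local table so that it runs without the urlparse module, and the helper describe_issues is hypothetical. Inside ShimmerCat one would presumably compare against the urlparse.STATUS_* constants instead.

```lua
-- Flag values as listed above, copied here so the snippet is
-- self-contained (an assumption made for illustration).
local FLAGS = {
    { name = "STATUS_LOOSE_PERCENTS", value = 1 },
    { name = "STATUS_PATH_DISRUPTORS", value = 2 },
    { name = "STATUS_INVALID_UTF8", value = 4 },
    { name = "STATUS_INVALID_ASCII", value = 8 },
}

-- Hypothetical helper: list the names of the flags set in `issues`.
local function describe_issues(issues)
    local names = {}
    for _, flag in ipairs(FLAGS) do
        if (issues & flag.value) ~= 0 then
            names[#names + 1] = flag.name
        end
    end
    return names
end

-- 6 = 2 | 4: a path disruptor plus invalid UTF-8.
local found = table.concat(describe_issues(6), ",")
```

A value of 0 yields an empty list, which corresponds to urlparse.STATUS_SUCCESS.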
35.4. Debugging aids: trace and tprint
In addition, all instantiations of the Lua interpreter from ShimmerCat have access to a trace function that outputs its string argument as a log message.
This function is not available from the standalone lua interpreter in the redistributable, but the standard print is available there.
To convert a table to its string representation, use the function tprint(atable).
This function returns a string with a readable representation of a Lua table.
It is available both in the distributed standalone lua interpreter and in the environments created by ShimmerCat.
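Since trace exists only inside ShimmerCat and print only in the standalone interpreter, a script meant to run in both places can pick whichever is defined. The following is a minimal sketch of that idea; the names log, dump and describe are hypothetical, and the tostring fallback is only there so the snippet also runs under a stock Lua without tprint.

```lua
-- Use trace inside ShimmerCat, print in the standalone interpreter.
local log = trace or print
-- tprint is documented for both environments; tostring is a fallback
-- for a stock Lua interpreter (an assumption for portability).
local dump = tprint or tostring

-- Hypothetical helper: build a one-line log message for any value.
local function describe(label, value)
    local text
    if type(value) == "table" then
        text = dump(value)
    else
        text = tostring(value)
    end
    return string.format("%s: %s", label, text)
end

log(describe("status", 200))
```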
35.5. Runtime of the Lua interpreter
For experimentation purposes, there is a standalone lua executable with the base environment in the /bin folder of the ShimmerCat redistributable.
ShimmerCat itself instantiates the interpreter directly via the included .so files, and in most cases Lua code is executed in a fresh Lua runtime.
In particular, this means that you can’t pass data in global variables or the registry between separate calls of Lua code from ShimmerCat.