16. The re-write engine
16.1. Core ShimmerCat URL convention
As a web server, ShimmerCat QS serves static assets and forwards requests for dynamic contents.
To distinguish the two, ShimmerCat QS uses a URL convention at its core: URL paths ending in a / are for dynamic contents, URL paths ending in a file extension are for static contents, and URL paths ending in a component without a dot (.) are redirected to the equivalent path with a / at the end.
Examples:
Example URL path | Action |
---|---|
/part/piece/ | To dynamic view |
/static/styles.css | Static fetch of /static/styles.css |
/part/piece | "Core" redirect to /part/piece/ |
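The three cases in the table can be sketched in a few lines of Python. This is only an illustration of the convention as described here, not ShimmerCat's implementation; the function name classify and the returned labels are ours.

```python
def classify(url_path: str) -> tuple[str, str]:
    """Return (action, path) for the three cases of the URL convention."""
    if url_path.endswith("/"):
        # Ends in a slash: handled as dynamic content.
        return ("dynamic", url_path)
    last_component = url_path.rsplit("/", 1)[-1]
    if "." in last_component:
        # Last component has a dot, i.e. a file extension: static fetch.
        return ("static", url_path)
    # No slash at the end and no dot: "core" redirect to the slash version.
    return ("redirect", url_path + "/")


for example in ("/part/piece/", "/static/styles.css", "/part/piece"):
    print(example, "->", classify(example))
```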
Of these, the most interesting case is the one for dynamic contents.
When ShimmerCat’s core receives one of these requests, it searches for a special file in the views-dir.
The views-dir should always be in the local filesystem, and it should be a relative or absolute folder configured in the devlove.yaml file.
The views-dir contains what we call “views”: files named index.html or __index.html.
The contents of these files and the particular order in which ShimmerCat searches these files for a given request are described in more detail in a separate page.
Here we explain how to make ShimmerCat work for applications that do not follow the URL path convention given above.
16.2. Changing URLs
ShimmerCat comes with a URL path re-write engine that understands and processes standard URL structure. The engine is applied at two points:
- Inside the change-url section in the devlove.yaml files for domains. This happens just before the URL convention of the previous section is applied.
- Inside the change-url section in the special view fragments that ShimmerCat interprets. This happens after the URL convention of the previous section is applied.
Therefore, the recipe for handling the application URLs in any way desired is the following:
- In the devlove.yaml file, use the rewrite rules at change-url to change the original URL you want to proxy into a form that follows ShimmerCat’s convention, and write a corresponding view.
- In the view, write a change-url section that changes the URL path back to the form used by the application.
On the one hand it looks like a “zig-zag”, but on the other hand it’s good enough for ShimmerCat to work as either an “accelerator” or an “accelerator+web-server”.
16.3. Basic rule structure
The basic rule structure is as follows:
<rule> ::=
<pattern>
[<qs-guard>]
'->'
[<action>]
<re-write-program>
where:
- <pattern> is something that should match a URL path, e.g. /foo/bar in the URL https://www.example.com/foo/bar?q=foobies
- <qs-guard> is an optional query-string guard, so that the rule only acts if the query string fulfills certain conditions.
- <action> (optional) can be used to e.g. indicate that an actual redirect should be produced instead of a re-write, or to mark a resource as generated.
- <re-write-program> is a template string that creates a new URL from the input.
16.3.1. Rule processing sequence
Rules are tried one by one, in the order in which they are written in the change-url block.
If one of the rules matches, the URL path is changed according to the instructions of the rule, and ShimmerCat does not try the rules that follow in the same change-url block.
You can use the fact that ShimmerCat stops processing to create “stopping rules”; see the specific section below.
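The first-match-wins behaviour can be pictured with the small Python sketch below. It is a model for illustration only: rules here are hypothetical functions that match one exact path, and we assume that a path matched by no rule is left unchanged.

```python
from typing import Callable, List, Optional

# A hypothetical rule: returns the new path on a match, or None otherwise.
Rule = Callable[[str], Optional[str]]


def literal(pattern: str, replacement: str) -> Rule:
    """Build a rule that matches exactly one URL path (illustrative only)."""
    return lambda path: replacement if path == pattern else None


def apply_block(rules: List[Rule], path: str) -> str:
    """Try the rules in order; the first match wins and ends processing."""
    for rule in rules:
        result = rule(path)
        if result is not None:
            return result
    # Assumption: when nothing matches, the path is left as it was.
    return path


block = [
    literal("/a/b", "/first/"),
    literal("/a/b", "/second/"),  # never reached for /a/b
]
print(apply_block(block, "/a/b"))
print(apply_block(block, "/other"))
```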
16.4. Using the URL path handling debugger
ShimmerCat, from version qs 2315, comes with a URL path handling debugger for those cases where it’s not clear what the program is doing with the received HTTP requests. The debugger shows the internal steps and particular transformations that ShimmerCat uses to answer an HTTP request.
To trigger the debugger, ensure that the request you want to debug comes with a sc-url-ask cookie; its value doesn’t matter.
The cookie can be set in the browser or, if using curl, with a syntax like the following:
curl ... -v -b sc-url-ask=true ...
Right now the debugger supports most URL handling pathways, but not all.
If your use case is supported, the response will come with an sc-note header that says whether ShimmerCat handled the request as static or dynamic, and a blurb of base64-encoded data that describes the internal pathway that the request used inside ShimmerCat.
Here is an example response:
... headers ...
sc-note: dynamic, urle=H4sIAAAAAAAAA2NgAAM2Bijg1C9OzEkt1jfUZ8AOMBQgCfwnABihWphhhrGCtcJ4jIYwCWYWGIstvSi/tMCQBaZGFG6bgq6dgj5EFt2tHDjEcckj+IR8wArVIQTTkZmXklqhl1GSm8MElYLR6E5ngvuJNy2xuCQ5PTPeQK8gowCmXBZmpL42xG9YVUEBH15Z/KrQRQn5mZmRw4EnIu2J1fI5AI/NFxMuAgAA
... more headers ...
In this case, the sc-note header says that the request was handled as dynamic.
To decode the base64 fragment (everything after the urle= prefix), use the sc-urlpath program and feed the string (without newlines) to its standard input:
echo H4sIAAAAAAAAA2NgAAM2Bijg1C9OzEkt1jfUZ8AOMBQgCfwnABihWphhhrGCtcJ4jIYwCWYWGIstvSi/tMCQBaZGFG6bgq6dgj5EFt2tHDjEcckj+IR8wArVIQTTkZmXklqhl1GSm8MElYLR6E5ngvuJNy2xuCQ5PTPeQK8gowCmXBZmpL42xG9YVUEBH15Z/KrQRQn5mZmRw4EnIu2J1fI5AI/NFxMuAgAA | ./sc-urlpath
The invocation above will produce an output like this one:
------------------------------
RAW decoded:
... blurb with internal representation, used to debug the debugger
... by our developers... will be removed.
------------------------------
# Start with URL path: /sales/1/
# Steps taken:
- Step 1: *devlove* changed URL path to /group1/
rule (at devlove) used: 0 : /sales/1/ -> /group1/
- Step 2: view selected for *dynamic* request
view file at: /group1/index.html
- Step 3: *view* changed URL path, will do *dynamic* request to /fastcgi_0.php
rule (at view) used: 0 : /group1//+/ -> /fastcgi_0.php
In the output above, whenever a re-write rule is used we say which rule we are using, and indicate its position in the rule block. In the example output above, both rules are the first in their block, and thus have index 0.
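If you debug many requests, splitting the sc-note header and calling the tool can be scripted. The Python sketch below assumes the header layout shown in the example response (handling kind, a comma, then urle= and the payload) and that sc-urlpath reads the payload from standard input as in the shell pipeline above. The helper names and the shortened payload are ours.

```python
import subprocess


def split_sc_note(header_value: str) -> tuple[str, str]:
    """Split an sc-note value into (handling, payload)."""
    handling, _, rest = header_value.partition(",")
    marker = "urle="
    index = rest.find(marker)
    payload = rest[index + len(marker):].strip() if index >= 0 else ""
    return handling.strip(), payload


def decode_with_sc_urlpath(payload: str, tool: str = "./sc-urlpath") -> str:
    """Feed the payload to sc-urlpath on stdin and return what it prints."""
    finished = subprocess.run(
        [tool], input=payload + "\n", capture_output=True, text=True, check=True
    )
    return finished.stdout


# Shortened placeholder payload, not a real debugger blurb.
handling, payload = split_sc_note("dynamic, urle=H4sIAAAA")
print(handling, payload)
```

Remember to point the tool argument at the copy of sc-urlpath that matches your ShimmerCat version.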
16.4.1. Limitations of the debugger
Besides the limitation described above, namely that not all ShimmerCat pathways have debugger output yet, the sc-urlpath tool doesn’t output the complete devlove.yaml or view files.
Therefore, if you have mismatched configurations deployed in multiple edges, or if you forget to reload ShimmerCat after updating the devlove.yaml file, you may obtain output that is not consistent with what you expect.
Also, the binary format understood by the debugger changes from version to version of ShimmerCat, so it’s important that you use the copy of sc-urlpath that comes with the version of ShimmerCat whose re-writes you want to inspect.
16.5. Re-write engine reference
16.5.1. A brief note about the syntax of the syntax, and the lexical structure of the rules
For this reference, we often write snippets describing the syntax of the rules using a variation of BNF:
- Something written between angle brackets, as in <rule>, means a non-terminal part of the grammar which will be expanded elsewhere in the reference.
- Terminal literal sequences are given using single quotes, as in '->'.
- Something written between square brackets, as in [<pattern-ending>], means an element which is optional.
- A vertical bar in the expression at the right side of a BNF ::= means alternative. Sometimes the expansion of a non-terminal includes many alternatives; if that’s the case, we often describe one at a time and use an ellipsis (...) to denote the alternatives we are not covering in the particular snippet. We either write all elements of the alternative in the same line, or in lines below the bar with greater indentation.
- Something written in upper-case, as in REGULAR_EXPRESSION, means a terminal element of the grammar which is explained somewhere in this reference.
- We use a plus sign, ‘+’, to suffix a non-terminal or a terminal that can appear multiple times. The ‘+’ acts only on the immediately preceding element, unless parentheses imply something else.
And one last thing: the rules’ parser was written in a way that allows for optional whitespace in many places but not all. The whitespace can be any combination of spaces and newlines. In the BNF, we use newlines to mark boundaries between elements where spaces are acceptable. Conversely, if two or more elements are written in the same line, no whitespace can appear between them.
16.5.2. Regular expression syntax
Regular expressions are allowed at certain positions denoted by the terminal REGULAR_EXPRESSION in the BNF.
The regular expression syntax we admit by default is POSIX, though we may introduce flags in the future to allow for other regular expression syntaxes.
Also, it is not possible, and would be pointless anyway, to match / using regular expressions.
More details about POSIX regular expressions can be found at:
https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions
16.5.3. Top rule structure
<rule> ::=
<pattern>
[<qs-guard>]
'->'
[<action>]
<re-write-program-or-stop>
<pattern> ::=
<hook> +
[<pattern-ending>]
<re-write-program-or-stop> ::=
'<*>'
| <re-write-program>
<re-write-program> ::=
[<host>]
<rw-instr> +
[<pp-ending>]
[<query-string-program>]
Whenever we talk of elements to the left of the ->, we refer to them as “hooks” or “pattern parts”.
Whenever we talk about elements to the right of the ->, we refer to them as “instructions” or “program parts”.
The <*> is a shortcut for a stop rule; we will explain later what a stop rule is.
16.5.4. Pattern terminators
<pattern-ending> ::=
'/'
| '//+/'
| '//+'
| <file-ending-guard>
Pattern endings can only appear at the end of a pattern.
The simplest one is /, which can be used to require the URL path to end in /.
For example,
/alpha / -> /beta
will change the URL path /alpha/ to /beta, but it won’t match the URL path /alpha without the ending slash.
The next three pattern endings can match one or more path segments (never zero path segments!).
The second one, //+/, can only match if the path ends in a slash, and the third one, //+, will only match if the path does not end in a slash.
So:
- //+/
# matches /alpha/beta/gamma/, but not /alpha/beta/gamma
- /alpha //+
# matches /alpha/beta/gamma , but not /alpha/beta/gamma/ nor /a/b
Note the use of spaces around elements of the rules, they are not required but help make the rule easier to read.
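To make the two multi-segment endings concrete, here is a small Python model of them. It is a sketch based only on the examples above: the function name match_rest is ours, the prefix stands for the literal pattern parts written before the ending, and we assume that the part acquired by //+/ includes the ending slash.

```python
from typing import Optional


def match_rest(prefix: str, path: str, ends_in_slash: bool) -> Optional[str]:
    """Model '//+/' (ends_in_slash=True) and '//+' (ends_in_slash=False).

    Returns the acquired rest of the path, or None when there is no match.
    """
    if not path.startswith(prefix + "/"):
        return None
    rest = path[len(prefix) + 1:]
    if rest.endswith("/") != ends_in_slash:
        return None
    body = rest[:-1] if ends_in_slash else rest
    # One or more path segments are required, never zero.
    if not body or any(segment == "" for segment in body.split("/")):
        return None
    return rest


print(match_rest("", "/alpha/beta/gamma/", True))
print(match_rest("/alpha", "/alpha/beta/gamma", False))
print(match_rest("/alpha", "/a/b", False))
```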
The last terminator, the <file-ending-guard>, is used to capture the rest of the path components if the path ends with a filename that contains a match of the provided regular expression.
<file-ending-guard> ::=
'//+</' REGULAR_EXPRESSION '/>'
Note that we have written the expansion above in a single line: no spaces are allowed between the pieces of this expansion.
Here are some examples of how file ending guards work:
- /alpha //+</\\.php/> -> /beta/<+>
# will match /alpha/beta/a.php.b and convert it to /beta/beta/a.php.b, but it won't
# match /alpha/beta/a.Xhp.b
- /alpha //+</\\.php$/> -> /beta/<+>
# will match /alpha/beta/file.php but not /alpha/beta/a.php.b
In ShimmerCat, whenever there is a regular expression, a search is done, not a full match, unless of course the regular expression is anchored with ^ at the start or $ at the end.
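The difference between a search and an anchored match can be tried out with Python's re module. Note that Python's regular expressions are not POSIX ones; we only assume that the two simple patterns from the examples above behave the same way in both dialects.

```python
import re


def guard_matches(regex: str, filename: str) -> bool:
    """Search semantics: the regex may match anywhere in the filename."""
    return re.search(regex, filename) is not None


# Unanchored: finds ".php" in the middle of the filename.
print(guard_matches(r"\.php", "a.php.b"))
# Anchored at the end: ".php" must be the last thing in the filename.
print(guard_matches(r"\.php$", "a.php.b"))
print(guard_matches(r"\.php$", "file.php"))
```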
16.5.5. Ending path program parts
These are the counterparts of the pattern terminators, on the other side of the ->.
They are not necessarily the last element of the re-write program, as there can also be query string dispositions.
<pp-ending> ::=
'/<+>'
| '/<+>/'
| '/<+>_'
| '/'
| '//'
The first three elements write whatever the //+/ or //+ pattern terminators acquired, the difference being in how they handle any ending slash:
- /<+> will preserve any ending slash captured by either //+/ or //+, but it won’t add any.
- /<+>/ will add an ending slash if required to ensure that the constructed URL path ends in a slash.
- /<+>_ will remove an ending slash if required to ensure that the constructed URL path does not end in a slash.
Here are some examples:
- /a/b//+ -> /a/b/<+>
# matches "/a/b/c/d" and converts it to "/a/b/c/d"
- /a/b //+ -> /ab/<+>/
# matches "/a/b/c/d" and converts it to "/ab/c/d/"
- //+/ -> /<+>_
# matches "/a/b/c/d/" and converts it to "/a/b/c/d"
The last two, / and //, are simpler and both do the same thing: they add a / at the end of the constructed URL path if there is none, or preserve one already there.
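The slash handling of the three <+> endings can be summarised with the following Python sketch. It is a model under our own assumptions: emit is a hypothetical helper, prefix is whatever the program built before the ending, and captured is the text acquired by //+/ or //+ (including the ending slash in the first case).

```python
def emit(prefix: str, captured: str, ending: str) -> str:
    """Append the acquired path segments according to the chosen ending."""
    if ending == "/<+>":
        # Keep an ending slash if one was captured, never add one.
        return prefix + "/" + captured
    if ending == "/<+>/":
        # Ensure the result ends in a slash.
        return prefix + "/" + captured.rstrip("/") + "/"
    if ending == "/<+>_":
        # Ensure the result does not end in a slash.
        return prefix + "/" + captured.rstrip("/")
    raise ValueError("unknown ending: " + ending)


print(emit("/a/b", "c/d", "/<+>"))
print(emit("/ab", "c/d", "/<+>/"))
print(emit("", "a/b/c/d/", "/<+>_"))
```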
16.5.6. “Literal” pattern and program parts
<hook> ::=
...
|
'/'
URL_PATH_FRAGMENT
...
and
<rw-instr> ::=
...
| URL_PATH_FRAGMENT
...
(The vertical bar | denotes “alternative” in this variation of BNF.)
Here URL_PATH_FRAGMENT denotes any valid URL path fragment.
Here is an example of a rule using only literal pattern parts to the left of the -> and literal program parts to the right of the sign:
/part1/part2/part3/ -> /new-part-1/new-part-2/new-part-3/new-part-4
16.5.7. “Capture” pattern parts and substitution program parts
<hook> ::=
...
|
'/'
'<' IDENTIFIER '>'
...
and
<rw-instr> ::=
...
| <use-capture>
...
<use-capture> ::=
'<' IDENTIFIER ['.' SUBCAPTURE_NO] '>'
A capture pattern part is simply an identifier inside angle brackets (note that there should be no spaces inside the angle brackets). It “captures” whatever path component exists in the matched URL path at the equivalent position, and it always matches said path component successfully. The identifier can be used later with a substitution program part.
Here is a rule that uses a literal part and a “capture” pattern part in the pattern, and then a literal program part with a substitution program part to the right:
/admin/<mystery> -> /vuva/<mystery>
# matches /admin/death-in-the-clouds and converts it to /vuva/death-in-the-clouds
Note that the instructions side of this construct supports an optional dot-number syntax that can be used to refer to specific sub-captures. This is useful when regular expressions are used with “guarded capture patterns”; more about them further down.
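One way to picture captures and substitutions is to translate a rule into an ordinary regular expression with named groups. The Python sketch below does that for literal parts and plain captures only. It is not how ShimmerCat is implemented, and the identifier character set in TOKEN is our assumption.

```python
import re
from typing import Optional

# Assumed shape of an identifier inside angle brackets.
TOKEN = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)>")


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn literal parts and <name> captures into one regular expression."""
    pieces = []
    position = 0
    for token in TOKEN.finditer(pattern):
        pieces.append(re.escape(pattern[position:token.start()]))
        # A capture takes exactly one path component, so no '/' inside.
        pieces.append("(?P<" + token.group(1) + ">[^/]+)")
        position = token.end()
    pieces.append(re.escape(pattern[position:]))
    return re.compile("".join(pieces))


def rewrite(pattern: str, program: str, path: str) -> Optional[str]:
    """Apply one rule; return the new path, or None if it does not match."""
    match = pattern_to_regex(pattern).fullmatch(path)
    if match is None:
        return None
    return TOKEN.sub(lambda token: match.group(token.group(1)), program)


print(rewrite("/admin/<mystery>", "/vuva/<mystery>", "/admin/death-in-the-clouds"))
```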
16.5.8. Combining path program parts
In patterns (to the left of the ->), the slash / is a syntactic element that starts each pattern part.
To the right of the ->, the slash / is a syntactic element that starts a group of path program parts.
So, the following rule is a valid one:
- /shoes/blue/<type>/small -> /shoes/blue-<type>-small
# it matches "/shoes/blue/chan/small" and converts it to "/shoes/blue-chan-small"
Note that you cannot use spaces between members of the same group of path program parts:
# Valid:
- |
/ shoes
/ blue
/ <type>
/small
->
/ shoes
# Note that the group below has a substitution program part
# in the middle of two literal parts, but there is no intervening
# space.
/ blue-<type>-small
# Not valid (observe the spaces in the middle of the group after the '/')
- /shoes/blue/<type>/small->/shoes/blue - <type> - small
16.5.9. “Guarded capture” pattern
<hook> ::=
...
| '/<' IDENTIFIER ':/' REGULAR_EXPRESSION '/>'
...
This hook is similar to capture patterns, but it matches only if the regular expression is found inside the corresponding path component, and it associates the identifier with said path component.
You can use the identifier to refer later to the captured path component, or the syntax IDENTIFIER.SUBCAPTURE_NO, with SUBCAPTURE_NO being a number between 0 and 9, to refer to a subgroup of the match:
- /dec/<version:/([0-9]+)\\.([0-9]+)/>/ -> /ver/v<version.1>/
# matches "/dec/1.2/" and converts it to "/ver/v1/"
As usual, sub-capture zero is everything matched by the regular expression, sub-capture one corresponds to the left-most opening parenthesis, and so on.
Note that sub-capture zero may be different from the value without the dot because, unless the regular expression is anchored to the beginning and end using ^ and $, it may match only a part of the path component.
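The relationship between the whole path component and the numbered sub-captures can be illustrated with Python's re module. Python's dialect is not POSIX, and guarded_capture is a hypothetical helper of ours, but for a simple pattern like the one above the grouping works the same way.

```python
import re
from typing import List, Optional, Tuple


def guarded_capture(regex: str, component: str) -> Optional[Tuple[str, List[str]]]:
    """Return (whole component, [sub0, sub1, ...]) or None if not found."""
    found = re.search(regex, component)
    if found is None:
        return None
    # Sub-capture 0 is the matched text; 1.. are the parenthesised groups.
    return component, [found.group(0), *found.groups()]


version_regex = r"([0-9]+)\.([0-9]+)"
print(guarded_capture(version_regex, "1.2"))
# Unanchored: the identifier holds the whole component, sub-capture 0 does not.
print(guarded_capture(version_regex, "v1.2-rc"))
print(guarded_capture("^" + version_regex + "$", "v1.2-rc"))
```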
16.5.10. Query string guards
<qs-guard> ::=
'?[[' <boolean-expression> ']]'
From qs 3207, ShimmerCat comes with experimental and limited support for triggering a rule conditionally on the value (or absence thereof) of a query string.
Because the syntax and semantics of <boolean-expression> will likely change, at this point we are only documenting it informally via the examples below:
# In the examples below, the `...` is a placeholder for actual contents,
# not a valid syntactic element!
- /gen/imgs //+ ?[[ not isempty() ]] -> ...
# matches /gen/imgs/shoes/pink.jpeg?width=100&height=200
# but not /gen/imgs/shoes/pink.jpeg
- /gen/imgs //+ ?[[ has(`width`) ]] -> ...
# matches /gen/imgs/shoes/pink.jpeg?width=100&height=200
# but not /gen/imgs/shoes/pink.jpeg?method=thumbnail
- /gen/imgs //+ ?[[ kv(`method`, `thumbnail`) and not has(`width`) ]] -> ...
# matches /gen/imgs/shoes/pink.jpeg?method=thumbnail
# but not /gen/imgs/shoes/pink.jpeg?method=thumbnail&width=100
Basically, it’s a simple boolean DSL supporting a prefix not operator with the highest precedence, and the binary operators or and and.
The binary operators have the same precedence and associate to the right, thus has(`a`) or not has(`b`) works exactly as it sounds.
The available predicates are very basic at the moment, and are as follows:
- has(..): Checks that its argument is present to the left of an equal sign (=) in the query string.
- kv(.., ..): Checks that its arguments are present in a pair with an equal sign in the middle, e.g. kv(`method`, `thumbnail`) holds for method=thumbnail.
- isempty(): Checks that the query string is empty.
Note the use of back-quotes to delimit string literals.
This is to cause minimum interference with the JSON or YAML files where the rules are embedded.
Inside back-quotes, it’s possible to use URL-safe characters, URL-escapes, and the special sequences \\ and \` to insert a literal backslash or back-quote.
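The three predicates can be approximated in Python as follows. This is only our reading of the informal description above, written with the standard urllib.parse module; the real guard syntax is experimental and may behave differently in corner cases such as keys without an equal sign.

```python
from urllib.parse import parse_qsl


def isempty(query: str) -> bool:
    """True when the request carries no query string at all."""
    return query == ""


def has(query: str, key: str) -> bool:
    """True when the key appears to the left of an equal sign."""
    return any(name == key for name, _ in parse_qsl(query, keep_blank_values=True))


def kv(query: str, key: str, value: str) -> bool:
    """True when the pair key=value appears in the query string."""
    return (key, value) in parse_qsl(query, keep_blank_values=True)


query = "method=thumbnail"
# Mirrors: kv(`method`, `thumbnail`) and not has(`width`)
print(kv(query, "method", "thumbnail") and not has(query, "width"))
```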
16.6. Creating redirects
The change-url block in the devlove.yaml file can also be used to create redirects (as opposed to rewrites, which are not noticed by the visitor’s browser):
<action> ::=
REDIRECT_ACTION
|GENERATED_MARK
where REDIRECT_ACTION can be created by joining the word redirect, using a - or _, with an HTTP redirect code.
Example: redirect-301 is a valid REDIRECT_ACTION.
Valid redirect codes are 301, 302, 303 and 307.
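If you generate rules from a script, a quick check of the action word can catch typos early. The Python sketch below encodes only what is stated above (the word redirect, a - or _, and one of the four codes); the helper name redirect_code is ours.

```python
import re
from typing import Optional

REDIRECT_ACTION = re.compile(r"redirect[-_](301|302|303|307)")


def redirect_code(action: str) -> Optional[int]:
    """Return the HTTP code of a valid REDIRECT_ACTION, or None."""
    match = REDIRECT_ACTION.fullmatch(action)
    return int(match.group(1)) if match else None


print(redirect_code("redirect-301"))
print(redirect_code("redirect_307"))
print(redirect_code("redirect-308"))
```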
It’s also possible to produce redirects to external domains, even with different schemes:
<host> ::=
<scheme>
HOSTNAME
<scheme> ::=
'http://'
| 'https://'
For example, the following is valid:
- /my-secret-admin-entry -> /wp-admin/
# Bots love to scan this URL for weak passwords, let's send
# them on their way...
- /wp-admin -> redirect-301 http://www.police.us/i-want-to-hand-myself-in
16.7. Handling dynamically generated static assets
Caching dynamic contents in the general case is a complex topic, and we recommend accelerator users to deploy a specialized caching solution for dynamic contents that can be configured to suit their specific needs, and to put ShimmerCat in front of it. However, there are a few simple scenarios related to what we call “generated static assets” that ShimmerCat can handle on its own.
For example, if somebody decides to use an endpoint in their dynamic application to bundle their CSS and JS, or to re-scale images based on a query-string, ShimmerCat QS can be instructed to cache and re-use the response to those requests.
To mark a URL as something which is fetched from the backend on first retrieval and cached thereafter, use the following syntax in the change-url section of a domain in the devlove.yaml file:
<action> ::=
REDIRECT_ACTION
|GENERATED_MARK
where GENERATED_MARK is simply the word generated.
This flags the request as being for a dynamically generated static asset. The first time the URL, including its query string, is requested, ShimmerCat fetches it from the backend; from then on, it serves it from the local cache for static assets. The generated URL even gets to participate in automatically generated push rules.
Here is an example rule for generated assets:
# ...
change-url:
- /skins/skin_9/css/<bundled:/[A-Za-z]+bundled/> -> generated /generated-css/skins/skin_9/css/<bundled>/
Note that you would need an accompanying view, e.g. a file at <views-dir>/generated-css/__index.html, that does the usual thing.
For the example above, the following could be used at <views-dir>/generated-css/__index.html:
<!--
shimmercat:
content-disposition: replace
change-url:
- /generated-css//+/ -> /<+>_
-->
Generated assets use two values for the sc-note header: g-first and g-cached.
The value g-first indicates that the asset was fetched directly from the backend.
The value g-cached indicates that the asset was fetched from the local cache.
Generated assets are retrieved from the backend using the URL passed by the browser, including the original query string.
Other headers are also forwarded, with the exception of Accept-Encoding, which is removed or replaced by Accept-Encoding: identity, as ShimmerCat handles compression and any further processing of the asset.
Note that this simple caching mechanism is not suitable for more complex scenarios, e.g. keying the response on a cookie or on a general URL expression is not supported.
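Conceptually, the mechanism behaves like a cache keyed on the full URL, as in the Python sketch below. This is a toy model for intuition only: the class, the in-memory dictionary and the fetch callback are all our own inventions, and nothing here reflects how ShimmerCat stores assets internally.

```python
from typing import Callable, Dict, Tuple


class GeneratedAssetCache:
    """Toy model: the first request goes to the backend, later ones do not."""

    def __init__(self, fetch_from_backend: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_backend
        self._store: Dict[str, bytes] = {}

    def get(self, full_url: str) -> Tuple[str, bytes]:
        """Return (sc-note value, body); the key includes the query string."""
        if full_url in self._store:
            return ("g-cached", self._store[full_url])
        body = self._fetch(full_url)
        self._store[full_url] = body
        return ("g-first", body)


backend_calls = []


def fake_backend(url: str) -> bytes:
    # Stand-in for the real application backend.
    backend_calls.append(url)
    return b"body { color: red }"


cache = GeneratedAssetCache(fake_backend)
print(cache.get("/skins/skin_9/css/mainbundled?v=1")[0])
print(cache.get("/skins/skin_9/css/mainbundled?v=1")[0])
```

Because the key is the complete URL, the same path with a different query string is fetched from the backend again.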
16.8. Forbidding pages
Equally, it’s possible to forbid access to a page by using the word forbidden, a - or _, and the code 403: forbidden-403.
This will create a forbidden page with the correct code whenever the pattern matches.
16.9. The stop condition
Take a look at the following rule:
- /alpha/beta.js -> /alpha/beta.js
It seems to do nothing, as it converts a specific URL path into itself. However, it does something: it prevents further rule evaluation when the URL path happens to match the pattern.
Let’s see how that can come in handy, in a slightly more complicated example:
# First rule
- /static //+</^[^.]+\.(js|css)/> -> /static /<+>
# Second rule in the same block
- //+ -> /dynamic-views/<+>/
# Third rule
- //+/ -> redirect-301 /<+>_
The first rule above will match, for example, /static/a/b/c/d/geranio.css and stop rule processing.
The second rule, on the other hand, will catch everything else that does not end in / and create a request to a view.
We can use <*> to write stop rules more easily: this symbol can appear alone instead of a rewrite program to mean “just create the original URL”.
In the previous example:
# First rule
- /static //+</^[^.]+\.(js|css)/> -> <*>
# ...
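The interplay of the three rules in this example can be modelled in Python as below. The sketch hard-codes the three rules rather than parsing them, assumes the file ending guard is searched in the last path component, and uses Python's re module in place of POSIX regular expressions; the function name route and the returned labels are ours.

```python
import re
from typing import Tuple

# Guard taken from the first rule of the example.
STATIC_FILE = re.compile(r"^[^.]+\.(js|css)")


def route(path: str) -> Tuple[str, str]:
    """Return (action, new_path) for the three example rules, in order."""
    segments = path.split("/")[1:]  # drop the empty string before the first '/'
    # First rule: a stop rule, the path is re-created unchanged.
    if len(segments) >= 2 and segments[0] == "static" and STATIC_FILE.search(segments[-1]):
        return ("rewrite", path)
    # Second rule: //+ -> /dynamic-views/<+>/
    if not path.endswith("/") and all(segments):
        return ("rewrite", "/dynamic-views/" + "/".join(segments) + "/")
    # Third rule: //+/ -> redirect-301 /<+>_
    if path.endswith("/") and len(segments) >= 2 and all(segments[:-1]):
        return ("redirect-301", "/" + "/".join(segments[:-1]))
    return ("unchanged", path)


for example in ("/static/a/b/c/d/geranio.css", "/about", "/about/"):
    print(example, "->", route(example))
```

Without the first rule, the stylesheet path would fall through to the second rule and be sent to a view.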
16.10. Query strings
In addition to query string guards as discussed above, ShimmerCat supports rudimentary query string edits. Among other things, these make it possible to handle the common case where URL path parts must be moved to a query string (as PHP application authors often need to do).
Usually, query strings are carried verbatim in path transformations:
- /a/b -> /alpha/beta/
# Will match "/a/b?e=5" and convert it to "/alpha/beta/?e=5"
Some applications use query strings in a non-trivial way. For example, OpenCart wants the web server to convert the URL path from /my-category/my-product to index.php?_=/my-category/my-product.
Here is a simple way to write this transformation with the re-write engine:
- //+</^[^\.]+$/> -> /index.php ?? _=<+>
In general, here is the syntax ShimmerCat admits for moving query strings:
<query-string-program> ::=
<query-string-disposition>
[
<query-string-action-fragment>
[
(
'&'
<query-string-action-fragment>
)+
]
]
<query-string-disposition> ::=
'??'
| '?'
The <query-string-disposition> determines what to do with the original query string that comes in the request: a single ? preserves it and combines it with the build instructions, and a double ?? just discards the original query string.
When combining two query strings, ShimmerCat treats the query strings as a dictionary of lists, and joins the dictionaries, concatenating the lists of matching keys. For example:
- /alpha -> /a/?article=alphanic
# matches "/alpha?article=deviant" and converts it to "/a/?article=deviant,alphanic"
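The “dictionary of lists” merge can be sketched in Python as follows. This is our interpretation of the example above: values that share a key are rendered joined by a comma, as in the example output, and no URL-escaping is applied when the result is written out. The function name combine and the keep_original flag (standing in for ? versus ??) are ours.

```python
from typing import Dict, List
from urllib.parse import parse_qsl


def combine(original: str, built: str, keep_original: bool = True) -> str:
    """Merge two query strings as dictionaries of lists.

    keep_original=True models '?', keep_original=False models '??'.
    """
    merged: Dict[str, List[str]] = {}
    sources = [original, built] if keep_original else [built]
    for source in sources:
        for key, value in parse_qsl(source, keep_blank_values=True):
            merged.setdefault(key, []).append(value)
    # Simplification: values are written back without URL-escaping.
    return "&".join(key + "=" + ",".join(values) for key, values in merged.items())


print(combine("article=deviant", "article=alphanic"))
print(combine("article=deviant", "article=alphanic", keep_original=False))
```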
The construction grammar for query strings is as follows:
<query-string-action-fragment> ::=
<q-literal-assign>
| <q-substitution-assign>
| <q-unassigned-substitution>
<q-literal-assign> ::=
QNAME
'='
QFRAGMENT
<q-substitution-assign> ::=
QNAME
| '=' <q-substitution>
| '<.>=' <q-substitution>
<q-unassigned-substitution> ::=
<use-capture>
| '<+>'
<q-substitution> ::= <q-constructor> +
<q-constructor> ::=
<use-capture>
| '<+>'
| <q-lit-fragment>
<q-lit-fragment> ::=
QFRAGMENT
| ','
| '+'
The <.>= operator does the same as the equal sign, but it’s needed in some cases due to ambiguities in the grammar; this is a known infelicity and it will be fixed at some point.