URL handling and re-writes
Core ShimmerCat URL convention
As a web server, ShimmerCat QS serves static assets and forwards requests to
To distinguish between the two, ShimmerCat QS uses a URL convention at its core:
URL paths ending in a / are for dynamic contents, URL paths ending in a file extension are for static contents, and URL paths ending in a component without a dot (.) are to be redirected to the equivalent version with a / at the end.
|Example URL path||Action|
|/part/piece/||To dynamic view|
|/static/styles.css||Static fetch of /static/styles.css|
|/part/piece||"Core" redirect to /part/piece/|
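The convention in the table above can be sketched in a few lines of Python. This is only an illustrative model of the three cases, not ShimmerCat's implementation; the function name classify_url_path is hypothetical.

```python
from typing import Tuple


def classify_url_path(path: str) -> Tuple[str, str]:
    """Return (action, target) for a URL path, following the core convention.

    Hypothetical helper: it only models the three cases listed in the table.
    """
    if path.endswith("/"):
        # Paths ending in a slash go to a dynamic view.
        return ("dynamic", path)
    last_component = path.rsplit("/", 1)[-1]
    if "." in last_component:
        # A last component with a dot is treated as a static asset.
        return ("static", path)
    # No ending slash and no dot: redirect to the version with a slash.
    return ("redirect", path + "/")


if __name__ == "__main__":
    for example in ("/part/piece/", "/static/styles.css", "/part/piece"):
        print(example, "->", classify_url_path(example))
```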
Of these, the most interesting case is the one for dynamic contents.
When ShimmerCat's core receives one of these, it goes in search of a special
file in the
views-dir should always be in the local filesystem, and
it should be a relative or absolute folder configured in the
views-dir contains what we call "views": files named
The contents of these files and the particular order in which ShimmerCat searches
these files for a given request are described in more detail elsewhere.
For now, we are more interested in explaining how to make ShimmerCat work for
applications that do not follow the URL path convention given above.
The re-write engine
How to use
ShimmerCat comes with a URL path re-write engine that understands and processes standard URL structure. The engine is applied at two points:
- Inside the change-url section of the devlove.yaml files for domains. This happens just before the URL convention of the previous section is applied.
- Inside the change-url section in the special view fragments that ShimmerCat interprets. This happens after the URL convention of the previous section is applied.
Therefore, the recipe for handling the application URLs in any way desirable is the following:
- In the devlove.yaml file, use the rewrite rules at change-url to change the original URL you desire to proxy into a form that follows ShimmerCat's convention, and write a corresponding view.
- In the view, write a change-url section that changes the URL path back to the form used by the application.
On the one hand, this looks like a "zig-zag"; on the other, it is good enough for ShimmerCat to work as either an "accelerator" or an "accelerator+web-server".
Basic rule structure
The basic rule structure is as follows:
<rule> ::= <pattern> '->' [<action>] <re-write-program>
- <pattern> is something that should match a URL path.
- <action> (optional) can be used, for example, to indicate that an actual redirect should be produced instead of a re-write.
- <re-write-program> is a template string that creates a new URL from the input.
Rule processing sequence
Rules are tried one by one, in the order they have been written in the
If one of the rules matches, the URL path is changed according to the instructions of the rule,
and ShimmerCat does not try rules following in the same
Because ShimmerCat stops processing at the first matching rule, you can create "stop rules"; see the specific section below.
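The first-match behaviour can be modelled as follows. This is a minimal sketch that uses Python regular expressions as stand-ins for ShimmerCat patterns; the rule list and the function name apply_first_match are hypothetical and only illustrate the ordering.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Hypothetical stand-ins for a rule block: (pattern, new URL path) pairs.
RULES: List[Tuple[Pattern[str], str]] = [
    (re.compile(r"^/sales/1/$"), "/group1/"),
    (re.compile(r"^/sales/"), "/all-sales/"),
]


def apply_first_match(
    path: str, rules: List[Tuple[Pattern[str], str]]
) -> Tuple[Optional[int], str]:
    """Try rules in order; return (index of the rule used, resulting path)."""
    for index, (pattern, new_path) in enumerate(rules):
        if pattern.search(path):
            # The first matching rule wins; later rules are never tried.
            return index, new_path
    # No rule matched: the path is left untouched.
    return None, path


if __name__ == "__main__":
    print(apply_first_match("/sales/1/", RULES))
    print(apply_first_match("/sales/2/", RULES))
    print(apply_first_match("/other", RULES))
```

Note that "/sales/1/" would also satisfy the second stand-in rule, but that rule is never reached.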
Using the URL path handling debugger
From version qs 2315, ShimmerCat comes with a URL path handling debugger for those cases where it's not clear what the program is doing with the received HTTP requests. The debugger shows the internal steps and particular transformations that ShimmerCat uses to answer an HTTP request.
To trigger the debugger, ensure that the request you want to debug comes with a cookie named sc-url-ask; its value doesn't matter.
The cookie can be set in the browser, or if using curl, a syntax like the following will do:
curl ... -v -b sc-url-ask=true ...
Right now the debugger supports most URL handling pathways, but not all.
If your use case is supported, the response will come with an sc-note header that says whether ShimmerCat handled the request as static or dynamic, together with a blurb of base-64 encoded data that describes the internal pathway that the request used.
Here is an example response:
... headers ...
sc-note: dynamic, urle=H4sIAAAAAAAAA2NgAAM2Bijg1C9OzEkt1jfUZ8AOMBQgCfwnABihWphhhrGCtcJ4jIYwCWYWGIstvSi/tMCQBaZGFG6bgq6dgj5EFt2tHDjEcckj+IR8wArVIQTTkZmXklqhl1GSm8MElYLR6E5ngvuJNy2xuCQ5PTPeQK8gowCmXBZmpL42xG9YVUEBH15Z/KrQRQn5mZmRw4EnIu2J1fI5AI/NFxMuAgAA
... more headers ...
In this case, the sc-note header says that the request was handled as dynamic.
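When scripting against the debugger, it can help to split the header value into its two parts before handing the blob to the decoding tool. The sketch below assumes the header value always has the shape seen in the single example above ("kind, urle=blob"); that layout and the function name parse_sc_note are assumptions, not a documented format.

```python
from typing import Optional, Tuple


def parse_sc_note(value: str) -> Tuple[str, Optional[str]]:
    """Split an sc-note header value into (handling kind, urle blob or None).

    Assumes the 'kind, urle=...' shape seen in the example response.
    """
    kind, _, rest = value.partition(",")
    rest = rest.strip()
    marker = "urle="
    blob = rest[len(marker):] if rest.startswith(marker) else None
    return kind.strip(), blob


if __name__ == "__main__":
    kind, blob = parse_sc_note("dynamic, urle=H4sIAAAA")
    print(kind, blob)
```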
To decode the base64 fragment (everything after urle=), use the sc-urlpath program and feed the string (without newlines) to its standard input:
echo H4sIAAAAAAAAA2NgAAM2Bijg1C9OzEkt1jfUZ8AOMBQgCfwnABihWphhhrGCtcJ4jIYwCWYWGIstvSi/tMCQBaZGFG6bgq6dgj5EFt2tHDjEcckj+IR8wArVIQTTkZmXklqhl1GSm8MElYLR6E5ngvuJNy2xuCQ5PTPeQK8gowCmXBZmpL42xG9YVUEBH15Z/KrQRQn5mZmRw4EnIu2J1fI5AI/NFxMuAgAA | ./sc-urlpath
The invocation above will produce an output like this one:
------------------------------
RAW decoded:
... blurb with internal representation, used to debug the debugger ... by our developers... will be removed.
------------------------------
# Start with URL path: /sales/1/
# Steps taken:
- Step 1: *devlove* changed URL path to /group1/
    rule (at devlove) used: 0 : /sales/1/ -> /group1/
- Step 2: view selected for *dynamic* request
    view file at: /group1/index.html
- Step 3: *view* changed URL path, will do *dynamic* request to /fastcgi_0.php
    rule (at view) used: 0 : /group1//+/ -> /fastcgi_0.php
In the output above, whenever a re-write rule is used we say which rule we are using, and indicate its position in the rule block. In the example output above, both rules are the first in their block, and thus have index 0.
Limitations of the debugger
Besides the limitation described above of not all ShimmerCat pathways having a debugger output
sc-urlpath tool doesn't output the complete
devlove.yaml or view files.
Therefore, if you have mismatched configurations deployed in multiple edges or if you forget to reload
ShimmerCat after updating the
devlove.yaml file, you may obtain output which is not
consistent with what you expect.
Re-write engine reference
A brief note about the syntax of the syntax, and the lexical structure of the rules
For this reference, we often write snippets describing the syntax of the rules using a variation of BNF:
- Something written between angle brackets, as in <rule>, means a non-terminal part of the grammar which will be expanded elsewhere in the reference.
- Terminal literal sequences are given using single quotes, as in
- Something written between square brackets, as in [<pattern-ending>], means an element which is optional.
- A vertical bar in the expression at the right side of a BNF ::= means alternative. Sometimes the expansion of a non-terminal includes many alternatives; in that case we often describe one at a time and use an ellipsis ... to denote the alternatives we are not covering in the particular snippet. We either write all elements of the alternative in the same line, or in lines below the bar with greater indentation.
- An upper-case name, as in REGULAR_EXPRESSION, means a terminal element of the grammar which is explained somewhere in this reference.
- We use a plus sign, '+', to suffix a non-terminal or a terminal that can appear multiple times. The '+' acts only on the immediately preceding element, unless parentheses imply something else.
- And one last thing: the rules' parser was written in a way that allows for optional whitespace in many places but not all. The whitespace can be any combination of spaces and newlines. In the BNF, we use newlines to mark boundaries between elements where spaces are acceptable. Conversely, if two or more elements are written in the same line, no whitespace can appear between them.
Regular expression syntax
Regular expressions are allowed at certain positions denoted by the terminal REGULAR_EXPRESSION in the BNF.
The regular expression syntax we admit by default is POSIX, though we may introduce
flags in the future to allow for other regular expression syntax.
Also, it is not possible, and otherwise pointless, to match / using regular expressions.
More details about POSIX regular expressions can be found at:
Top rule structure
<rule> ::= <pattern> '->' [<action>] <re-write-program-or-stop>
<pattern> ::= <hook> + [<pattern-ending>]
<re-write-program-or-stop> ::= '<*>' | <re-write-program>
<re-write-program> ::= [<host>] <rw-instr> + [<pp-ending>] [<query-string-program>]
Whenever we talk of elements to the left of the ->, we refer to them as "hooks" or "pattern parts". Whenever we talk about elements to the right of the ->, we refer to them as "instructions" or "program parts".
<*> is a shortcut for a stop rule; a later section explains what a stop rule is.
<pattern-ending> ::= '/' | '//+/' | '//+' | <file-ending-guard>
Pattern endings can only appear at the end of a pattern.
The simplest one is /, which can be used to ask for the URL path to end in a slash. For example, the rule
/alpha / -> /beta
will change the URL path /alpha/ to /beta, but it won't match /alpha without the ending slash.
The next three pattern endings can match one or more path segments (never zero path segments!). The second one, //+/, can only match if the path ends in a slash, and the third one, //+, will only match if the path does not end in a slash.
- //+/
  # matches /alpha/beta/gamma/, but not /alpha/beta/gamma
- /alpha //+
  # matches /alpha/beta/gamma , but not /alpha/beta/gamma/ nor /a/b
Note the use of spaces around elements of the rules; they are not required, but they help make the rule easier to read.
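The behaviour of the first three pattern endings can be modelled in Python. This is a toy model under the assumption that the part before the ending is a plain literal prefix; the function name matches_ending is hypothetical and the code is not how ShimmerCat parses rules.

```python
def matches_ending(path: str, prefix: str, ending: str) -> bool:
    """Toy model of the '/', '//+/' and '//+' pattern endings.

    `prefix` stands for the literal pattern parts written before the ending.
    """
    if not path.startswith(prefix):
        return False
    rest = path[len(prefix):]
    if ending == "/":
        # Only a single ending slash may follow the prefix.
        return rest == "/"
    if ending not in ("//+/", "//+"):
        raise ValueError(f"unsupported ending: {ending!r}")
    if not rest.startswith("/"):
        return False
    segments = [segment for segment in rest.split("/") if segment]
    if not segments:
        # These endings never match zero path segments.
        return False
    # '//+/' requires an ending slash, '//+' forbids it.
    return rest.endswith("/") == (ending == "//+/")


if __name__ == "__main__":
    print(matches_ending("/alpha/beta/gamma/", "", "//+/"))
    print(matches_ending("/alpha/beta/gamma", "/alpha", "//+"))
    print(matches_ending("/a/b", "/alpha", "//+"))
```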
The last pattern ending, the <file-ending-guard>, is used to capture the rest of the path components if the path ends with a filename that contains a match of the provided regular expression:
<file-ending-guard> ::= '//+</' REGULAR_EXPRESSION '/>'
Note that we have written the expansion above in a single line: no spaces are allowed between the pieces of this expansion.
Here are some examples of how file ending guards work:
- /alpha //+</\\.php/> -> /beta/<+>
  # will match /alpha/beta/a.php.b and convert it to /beta/a.php.b, but it won't
  # match /alpha/beta/a.Xhp.b
- /alpha //+</\\.php$/> -> /beta/<+>
  # will match /alpha/beta/file.php but not /alpha/beta/a.php.b
In ShimmerCat, whenever there is a regular expression, a search is done, not a full match, unless of course the regular expression starts with ^.
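The difference between a search and a full match can be seen with the file names from the examples above. The sketch uses Python's re module, which is not a POSIX engine, but for these simple patterns the behaviour is the same; the function name guard_accepts is hypothetical.

```python
import re


def guard_accepts(regex: str, filename: str) -> bool:
    """Return True if `regex` is found anywhere inside `filename` (a search)."""
    return re.search(regex, filename) is not None


if __name__ == "__main__":
    # An unanchored pattern is accepted when it occurs anywhere in the name.
    print(guard_accepts(r"\.php", "a.php.b"))
    # Anchoring the end with $ rejects the same file name.
    print(guard_accepts(r"\.php$", "a.php.b"))
    # A full match, by contrast, would require the pattern to cover the name.
    print(re.fullmatch(r"\.php", "a.php.b") is not None)
```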
Ending path program parts
These are the counterparts of the pattern terminators, at the other side of the ->. They are not necessarily the last element of the re-write program, as there can also be query string dispositions.
<pp-ending> ::= '/<+>' | '/<+>/' | '/<+>_' | '/' | '//'
The first three elements write whatever any of the //+ pattern terminators acquired, the difference being in how they handle any ending slash:
- /<+> will preserve any ending slash captured by either
//+, but it won't add any.
- /<+>/ will add an ending slash if required to ensure that the constructed URL path ends in a slash.
- /<+>_ will remove an ending slash if required to ensure that the constructed URL path does not end in a slash.
Here are some examples:
- /a/b//+ -> /a/b/<+>
  # matches "/a/b/c/d" and converts it to "/a/b/c/d"
- /a/b //+ -> /ab/<+>/
  # matches "/a/b/c/d" and converts it to "/ab/c/d/"
- //+/ -> /<+>_
  # matches "/a/b/c/d/" and converts it to "/a/b/c/d"
The last two, / and //, are simpler and both do the same: they add a / at the end of the constructed URL path if there is none, or preserve one already there.
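The slash handling of the three capture-writing endings can be sketched as below. The sketch assumes that the captured text is given without its leading slash and with any ending slash retained; the function name render_ending is hypothetical, and this is a model of the described behaviour rather than ShimmerCat code.

```python
def render_ending(captured: str, part: str) -> str:
    """Render what a //+ pattern ending captured, using an ending program part.

    `captured` has no leading slash and keeps any ending slash (an assumption
    of this sketch).
    """
    if part == "/<+>":
        # Preserve an ending slash if there was one, never add one.
        return "/" + captured
    if part == "/<+>/":
        # Ensure the result ends in exactly one slash.
        return "/" + captured.rstrip("/") + "/"
    if part == "/<+>_":
        # Ensure the result does not end in a slash.
        return "/" + captured.rstrip("/")
    raise ValueError(f"unsupported ending program part: {part!r}")


if __name__ == "__main__":
    print("/a/b" + render_ending("c/d", "/<+>"))
    print("/ab" + render_ending("c/d", "/<+>/"))
    print(render_ending("a/b/c/d/", "/<+>_"))
```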
"Literal" pattern and program parts
<hook> ::= ... | '/' URL_PATH_FRAGMENT ...
<rw-instr> ::= ... | URL_PATH_FRAGMENT ...
(The vertical bar | denotes "alternative" in this variation of the BNF.)
URL_PATH_FRAGMENT denotes any valid URL path
Here is an example of a rule using only literal pattern parts to the left of the -> and literal program parts to the right of the sign:
/part1/part2/part3/ -> /new-part-1/new-part-2/new-part-3/new-part-4
"Capture" pattern parts and substitution program parts
<hook> ::= ... | '/' '<' IDENTIFIER '>' ...
<rw-instr> ::= ... | <use-capture> ...
<use-capture> ::= '<' IDENTIFIER ['.' SUBCAPTURE_NO] '>'
A capture pattern part is simply an identifier inside angle brackets (note that there should be no spaces inside the angle brackets). It "captures" whatever path component exists in the matched URL path at the equivalent position, and it always matches said path component successfully. The identifier can be used later with a substitution program part.
Here is a rule that uses a literal pattern part and a "capture" pattern part in the pattern, and then a literal program part with a substitution program part to the right:
/admin/<mystery> -> /vuva/<mystery>
# matches /admin/death-in-the-clouds and converts it to /vuva/death-in-the-clouds
Note that the instructions side of this construct supports an optional dot-number syntax that can be used to refer to specific sub-captures. This is useful when regular expressions are used with "guarded capture patterns", which are described further down.
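Capture and substitution can be modelled with a small Python function. This is a simplified sketch that only understands literal parts and plain <identifier> captures (no pattern endings, guards, or dot-number syntax); the function name rewrite is hypothetical.

```python
import re
from typing import Dict, Optional


def rewrite(pattern: str, program: str, path: str) -> Optional[str]:
    """Match `path` against `pattern` component by component and build a new path.

    Returns None when the pattern does not match.
    """
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(pattern_parts) != len(path_parts):
        return None
    captures: Dict[str, str] = {}
    for part, component in zip(pattern_parts, path_parts):
        if part.startswith("<") and part.endswith(">"):
            # A capture pattern part always matches and remembers the component.
            captures[part[1:-1]] = component
        elif part != component:
            # A literal pattern part must be equal to the path component.
            return None
    # Replace every <identifier> in the program with the captured component.
    return re.sub(r"<(\w+)>", lambda m: captures[m.group(1)], program)


if __name__ == "__main__":
    print(rewrite("/admin/<mystery>", "/vuva/<mystery>", "/admin/death-in-the-clouds"))
```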
Combining path program parts
In patterns (to the left of the ->), the slash / is a syntactic element that starts each pattern part. To the right of the ->, the slash / is a syntactic element that starts a group of path program parts.
So, the following rule is a valid one:
- /shoes/blue/<type>/small -> /shoes/blue-<type>-small
  # it matches "/shoes/blue/chan/small" and converts it to "/shoes/blue-chan-small"
Note that you cannot use spaces between members of the same group of path program parts:
# Valid:
- |
  / shoes / blue / <type> /small
  ->
  / shoes
  # Note that the group below has a substitution program part
  # in the middle of two literal parts, but there is no intervening
  # space.
  / blue-<type>-small
# Not valid (observe the spaces in the middle of the group after the '/')
- /shoes/blue/<type>/small->/shoes/blue - <type> - small
"Guarded capture" pattern
<hook> ::= ... | '/<' IDENTIFIER ':/' REGULAR_EXPRESSION '/>' ...
Similar to capture patterns, but this hook matches if the regular expression is found inside the corresponding path component, and associates the identifier with said path component.
You can use the identifier to refer later to the captured
expression, or the syntax
a number between 0 and 9, to refer to a subgroup of
- /dec/<version:/([0-9]+)\\.([0-9]+)/>/ -> /ver/v<version.1>/
  # matches "/dec/1.2/" and converts it to "/ver/v1/"
As usual, sub-capture zero is everything captured by the regular expression, and sub-capture one is for the left-most starting parenthesis and so on. Note that sub-capture zero may be different from the value without the dot because, unless the regular expression is anchored to the beginning and end with ^ and $, it may match only a part of the path component.
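The relation between the plain identifier and its sub-captures can be illustrated with Python's re module. This is only a sketch (Python's engine is not POSIX, and the function name guarded_capture is hypothetical): the first returned value models the identifier without a dot, and the list models sub-captures 0, 1, 2 and so on.

```python
import re
from typing import List, Optional, Tuple


def guarded_capture(regex: str, component: str) -> Optional[Tuple[str, List[str]]]:
    """Model a guarded capture on a single path component.

    Returns (whole component, [sub-capture 0, sub-capture 1, ...]) when the
    regular expression is found inside the component, otherwise None.
    """
    match = re.search(regex, component)
    if match is None:
        return None
    return component, [match.group(0), *match.groups()]


if __name__ == "__main__":
    # Unanchored: sub-capture zero is only a part of the path component.
    print(guarded_capture(r"([0-9]+)\.([0-9]+)", "v1.2-beta"))
    # Here the match happens to cover the whole path component.
    print(guarded_capture(r"([0-9]+)\.([0-9]+)", "1.2"))
```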
The change-url block in the devlove.yaml file can also be used to create redirects (as opposed to rewrites, which are not noted by the visitor's browser):
<action> ::= REDIRECT_ACTION | GENERATED_MARK
A REDIRECT_ACTION can be created by joining the word redirect, using a hyphen, with an HTTP redirect code. For example, redirect-301 is a valid REDIRECT_ACTION.
Valid redirect codes are 301, 302, 303 and 307.
It's also possible to produce redirects to external domains, even with different schemes:
<host> ::= <scheme> HOSTNAME
<scheme> ::= 'http://' | 'https://'
For example, the following is valid:
- /my-secret-admin-entry -> /wp-admin/
# Bots love to scan this URL for weak passwords, let's send
# them on their way...
- /wp-admin -> redirect-301 http://www.police.us/i-want-to-hand-myself-in
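When generating rules with a script, it can be handy to check redirect actions before writing them to a file. The following sketch validates an action word against the four redirect codes listed above; the function name parse_redirect_action is hypothetical and the check is not part of ShimmerCat.

```python
from typing import Optional

# Redirect codes accepted in a REDIRECT_ACTION, as listed in this section.
VALID_REDIRECT_CODES = ("301", "302", "303", "307")


def parse_redirect_action(word: str) -> Optional[int]:
    """Return the HTTP code of a redirect action word, or None if invalid."""
    prefix = "redirect-"
    if not word.startswith(prefix):
        return None
    code = word[len(prefix):]
    if code not in VALID_REDIRECT_CODES:
        return None
    return int(code)


if __name__ == "__main__":
    for candidate in ("redirect-301", "redirect-308", "generated"):
        print(candidate, "->", parse_redirect_action(candidate))
```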
Handling dynamically generated static assets
Caching dynamic contents in the general case is a complex topic, and we recommend that accelerator users deploy a specialized caching solution for dynamic contents that can be configured to suit their specific needs, and put ShimmerCat in front of it. However, there are a few simple scenarios related to what we call "generated static assets" that ShimmerCat can handle on its own.
For example, if somebody decides to use an endpoint in their dynamic application to bundle their CSS and JS, or to re-scale images based on a query-string, ShimmerCat QS can be instructed to cache and re-use the response to those requests.
To mark a URL as something which is fetched from the backend on first retrieval and cached thereafter, use the following syntax in the change-url section of a domain in the devlove file:
<action> ::= REDIRECT_ACTION | GENERATED_MARK
GENERATED_MARK is simply the word generated.
This flags the request as being for a dynamically generated static asset. The first time the URL is requested, query string and everything, ShimmerCat fetches it from the backend; from then on, it fetches it from the local cache for static assets. The generated URL even gets to participate in automatically generated push rules.
Here is an example rule for generated assets:
# ...
change-url:
  - /skins/skin_9/css/<bundled:/[A-Za-z]+bundled/> -> generated /generated-css/skins/skin_9/css/<bundled>/
Note that you would need an accompanying view, e.g. a file at
that does the usual thing.
For the example above, the following could be used at
<!-- shimmercat:
content-disposition: replace
change-url:
  - /generated-css//+/ -> /<+>_
-->
Generated assets use two values for the header
g-first is used to indicate that the asset was fetched directly from the
g-cached is used to indicate that the asset was fetched from the local
Generated assets are retrieved from the backend using the URL passed by the browser,
including the original query string.
Other headers are also forwarded, with the exception of
Accept-Encoding, which is
removed or replaced by
Accept-Encoding: identity, as ShimmerCat handles compression
and any further processing of the asset.
Note that this simple caching mechanism is not suitable for more complex scenarios, e.g. keying the response on a cookie or on a general URL expression is not supported.
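The fetch-once-then-cache behaviour described in this section can be modelled as follows. This is a minimal in-memory sketch, not ShimmerCat's cache: the class name GeneratedAssetCache and the backend callable are hypothetical, and the returned labels simply reuse the g-first and g-cached values mentioned above.

```python
from typing import Callable, Dict, Mapping, Tuple

Backend = Callable[[str, Dict[str, str]], bytes]


class GeneratedAssetCache:
    """Toy cache keyed on the full URL, query string included."""

    def __init__(self, fetch_from_backend: Backend) -> None:
        self._fetch = fetch_from_backend
        self._store: Dict[str, bytes] = {}

    def get(self, url_with_query: str, request_headers: Mapping[str, str]) -> Tuple[str, bytes]:
        if url_with_query in self._store:
            return "g-cached", self._store[url_with_query]
        # Forward the other headers, but ask the backend for an unencoded body.
        headers = {
            name: value
            for name, value in request_headers.items()
            if name.lower() != "accept-encoding"
        }
        headers["Accept-Encoding"] = "identity"
        body = self._fetch(url_with_query, headers)
        self._store[url_with_query] = body
        return "g-first", body


if __name__ == "__main__":
    cache = GeneratedAssetCache(lambda url, headers: b"/* bundle */")
    print(cache.get("/css/bundle?v=1", {"accept-encoding": "gzip"}))
    print(cache.get("/css/bundle?v=1", {}))
```

Keying on the whole URL means that two requests differing only in their query string are stored as separate assets, which matches the scenarios of bundling and image re-scaling mentioned earlier.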
Equally, it's possible to forbid access to a page by using the word
_ , and
This will create a forbidden page with the correct code whenever the pattern matches.
The stop condition
Take a look at the following rule:
- /alpha/beta.js -> /alpha/beta.js
It seems to do nothing, as it converts a specific URL path into itself. However, it does something: it prevents further rule evaluation when the URL path happens to match the pattern.
Let's see how that can come in handy, in a slightly more complicated example:
# First rule
- /static //+</^[^.]+\.(js|css)/> -> /static /<+>
# Second rule in the same block
- //+ -> /dynamic-views/<+>/
# Third rule
- //+/ -> redirect-301 /<+>_
The first rule above will match for example
/static/a/b/c/d/geranio.css and stop
The second rule, on the other hand, will catch everything else that does not end in / and create a request to a view.
We can use <*> to write stop rules more easily; this symbol can appear alone instead of a rewrite program to mean "just create the original URL".
In the previous example:
# First rule
- /static //+</^[^.]+\.(js|css)/> -> <*>
# ...
ShimmerCat cannot yet match on query strings, but it supports rudimentary edits. Among other things, these make it possible to handle the common case where URL path parts must be moved to a query string (the way PHP application authors usually need to handle things).
Usually, query strings are carried verbatim in path transformations:
- /a/b -> /alpha/beta/
  # Will match "/a/b?e=5" and convert it to "/alpha/beta/?e=5"
Some applications use query strings in a non-trivial way, for example OpenCart wants
the web server to convert the URL path from
Here is a simple way to write this transformation with the re-write engine:
- //+</^[^\.]+$/> -> /index.php ?? _=<+>
In general, here is the syntax ShimmerCat admits for moving query strings:
<query-string-program> ::= <query-string-disposition> [ <query-string-action-fragment> [ ( '&' <query-string-action-fragment> )+ ] ]
<query-string-disposition> ::= '??' | '?'
<query-string-disposition> determines what to do with the original query string that comes in the request: a single ? preserves it and combines it with the build instructions, and a double ?? just discards the original query string.
When combining two query strings, ShimmerCat treats the query strings as a dictionary of lists, and joins the dictionaries, concatenating the lists of matching keys. For example:
- /alpha -> /a/?article=alphanic
  # matches "/alpha?article=deviant" and converts it to "/a/?article=deviant,alphanic"
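The "dictionary of lists" combination can be sketched with Python's standard library. This is a simplified model: it decodes percent-escapes while parsing and does not re-encode the output, and the function name combine_query is hypothetical.

```python
from typing import Dict, List
from urllib.parse import parse_qs


def combine_query(original: str, built: str, disposition: str) -> str:
    """Combine the request's query string with the one built by a rule.

    disposition '?' keeps the original and joins it with the built one;
    disposition '??' discards the original.
    """
    if disposition not in ("?", "??"):
        raise ValueError(f"unknown disposition: {disposition!r}")
    sources = [built] if disposition == "??" else [original, built]
    merged: Dict[str, List[str]] = {}
    for query in sources:
        for key, values in parse_qs(query, keep_blank_values=True).items():
            # Lists of matching keys are concatenated, original values first.
            merged.setdefault(key, []).extend(values)
    return "&".join(f"{key}={','.join(values)}" for key, values in merged.items())


if __name__ == "__main__":
    print(combine_query("article=deviant", "article=alphanic", "?"))
    print(combine_query("article=deviant", "article=alphanic", "??"))
```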
The construction grammar for query strings is as follows:
<query-string-action-fragment> ::= <q-literal-assign> | <q-substitution-assign> | <q-unassigned-substitution>
<q-literal-assign> ::= QNAME '=' QFRAGMENT
<q-substitution-assign> ::= QNAME '=' <q-substitution>
<q-unassigned-substitution> ::= <use-capture> | '<+>'
<q-substitution> ::= <q-constructor> +
<q-constructor> ::= <use-capture> | '<+>' | <q-lit-fragment>
<q-lit-fragment> ::= QFRAGMENT | ',' | '+'