33. Processing and caching images: HAPs
The same image in a page is often needed in different sizes, formats and transformations, depending on the viewing device. ShimmerCat QS offers a mechanism whereby images can be heavily transformed by an external service or program, and then cached for later serving.
ShimmerCat doesn’t do the optimization itself, but it does the following:
Identifies an asset as being interesting for a particular type of optimization, according to its use on the site. For example, above-the-fold JPEGs are marked to run in a pipeline that leaves them as progressive JPEGs.
Creates and posts a processing record for the optimization.
Periodically checks to see if the optimized version is ready, or if the specific pipeline needs to be disabled due to unmet requests.
An external program is in charge of consuming and fulfilling the image processing records; we provide one such program as part of sc_pack.
We call this subsystem HAPs, for “Heavy Asset Pipelines”.
33.1. Enabling and disabling HAPs
HAPs need to be enabled in the tweaks.yaml file in the scratch folder, like this:
hapsSettings:
  enabled: true
If they are not enabled there, then ShimmerCat won’t queue images for optimization, nor will it use optimized images. Think of it as a kill switch that disables HAPs everywhere.
Once enabled there, HAPs will be active for each domain in the devlove file, unless explicitly disabled for a domain.
HAPs can be disabled for a domain using a haps-settings key inside the domain configuration:
shimmercat-devlove:
  domains:
    elec www.proba.com:
      root-dir: files
      cache-key: A33939
    elec www2.proba.com:
      root-dir: files
      cache-key: A33939
      haps-settings:
        enabled: false
As you can see in the example above, this works even if two domains share the same cache, so that you can have one domain with HAPs enabled and the other without. This is handy for comparing the effect of HAPs on two different domains with very similar contents, e.g. localized shops.
33.2. How ShimmerCat posts that an image needs to be optimized
ShimmerCat just adds a JSON string to the end of a Redis list. The key of that list is similar to the other keys ShimmerCat uses in Redis and includes a fragment with the cache name; see the subsection below for more details about the Redis key.
A component, called “the usher”, is in charge of popping elements from the front of this list, and executing or starting the image optimization according to the description in said element. In other words, the usher reads from a “job queue” which is just a glorified Redis list.
Here is what a string element of the Redis list looks like:
{
  "bid": {},
  "precursor-name": "webp0",
  "output-folder": "/home/alcides/projects_fast/shimmercat/td_C584CD/.shimmercat.loves.devs/r-cache/597bfe4e/597bfe4e:H;ri:33f316697edc2004e312c6e17f5130e6638a140e;pn:webp0;ds:one;",
  "ident": "597bfe4e:H;ri:33f316697edc2004e312c6e17f5130e6638a140e;pn:webp0;ds:one;",
  "input-file": "/home/alcides/projects_fast/shimmercat/td_C584CD/.shimmercat.loves.devs/r-cache/597bfe4e/597bfe4e:e;ri:33f316697edc2004e312c6e17f5130e6638a140e;e:identity;pn:fetch;/data",
  "blocked-mark": "/home/alcides/projects_fast/shimmercat/td_C584CD/.shimmercat.loves.devs/r-cache/597bfe4e/597bfe4e:H;ri:33f316697edc2004e312c6e17f5130e6638a140e;pn:webp0;ds:one;/BLOCKED",
  "adjustments": null,
  "timestamp": 1571675380
}
With the following meaning for the fields:
precursor-name
: The name of the module inside ShimmerCat that asked for this optimization. It can match a particular image format or a named subset of optimizations, but currently it is just the former.
bid
: A free-form JSON object coming from the module inside ShimmerCat that identified the optimization.
output-folder
: The absolute path of a folder inside ShimmerCat’s cache where the optimized asset should be put when that asset is ready.
input-file
: The original, unchanged asset, which will serve as input for the optimization pipeline. Do not delete it or change it!
blocked-mark
: A special file that the usher can create to disable this optimization.
adjustments
: A free-form JSON object produced in a configurable way with scaling or fitting adjustment requests. The goal of this field is to make it possible to leverage the usher in sites which already use some convention to convey pixel size demands in the requested URL. More about its format in a bit.
timestamp
: The Unix (POSIX) timestamp, that is, seconds since the Unix Epoch, when this job record was created. This is useful to track the latency of image optimization pipelines.
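As an illustration of the fields above, here is a minimal Python sketch of how an usher might decode one record and measure how long it waited in the queue. The names JobRecord, parse_job and queue_latency are hypothetical and not part of ShimmerCat or sc_pack; the sketch assumes the record arrives as the JSON object shown above.

```python
import json
import time
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class JobRecord:
    """Fields of one usher job, named after the JSON keys described above."""
    precursor_name: str
    output_folder: str
    input_file: str
    blocked_mark: str
    timestamp: int
    bid: Any = None
    adjustments: Optional[dict] = None


def parse_job(raw: str) -> JobRecord:
    """Decode one queue element (assumed to be a JSON object) into a JobRecord."""
    doc = json.loads(raw)
    return JobRecord(
        precursor_name=doc["precursor-name"],
        output_folder=doc["output-folder"],
        input_file=doc["input-file"],
        blocked_mark=doc["blocked-mark"],
        timestamp=doc["timestamp"],
        bid=doc.get("bid"),
        adjustments=doc.get("adjustments"),
    )


def queue_latency(job: JobRecord, now: Optional[float] = None) -> float:
    """Seconds elapsed between job creation and `now` (default: current time)."""
    current = time.time() if now is None else now
    return current - job.timestamp
```

Logging queue_latency(job) right after an optimization finishes gives a rough measure of end-to-end pipeline latency.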
The usher is expected to read the file from input-file, do the optimization if it recognizes the precursor-name value, and when it’s done, it needs to write back the optimized result to a file whose name is obtained by concatenating output-folder with the file-name data.
If something comes in precursor-name that the usher doesn’t recognize, it should immediately create the BLOCKED file referred to in the JSON, to avoid ShimmerCat wasting CPU and I/O in that image format[^3].
Note that the only signal ShimmerCat has that the optimized asset is ready is the existence of this file, so there is some risk that ShimmerCat will read it before it is completely written to disk.
To avoid ShimmerCat sending faulty data to a browser, it’s best to create the file with another name initially, in the same folder[^2], say .data, and when the file is completely written and closed, rename it to the final name.
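The protocol described above (optimize, write under a temporary name, rename, or create the BLOCKED mark) can be sketched in Python as follows. This is only an illustration under stated assumptions: fulfil_job and the optimizers registry are hypothetical names, the job is the already-decoded record, and creating missing folders with os.makedirs is a defensive choice of this sketch rather than documented behavior.

```python
import os
from typing import Callable, Dict

# Hypothetical registry type: maps a precursor-name to a function that turns
# the input bytes into optimized bytes. A real optimizer would call an image
# library or an external tool here.
Optimizer = Callable[[bytes], bytes]


def fulfil_job(job: dict, optimizers: Dict[str, Optimizer]) -> bool:
    """Process one decoded job record; return True if an asset was published."""
    optimize = optimizers.get(job["precursor-name"])
    if optimize is None:
        # Unknown precursor: create the BLOCKED mark so that ShimmerCat
        # stops requesting this optimization.
        os.makedirs(os.path.dirname(job["blocked-mark"]), exist_ok=True)
        with open(job["blocked-mark"], "w"):
            pass
        return False

    # The input file is only read, never modified or deleted.
    with open(job["input-file"], "rb") as src:
        result = optimize(src.read())

    out_dir = job["output-folder"]
    os.makedirs(out_dir, exist_ok=True)
    tmp_path = os.path.join(out_dir, ".data")    # temporary name, same folder
    final_path = os.path.join(out_dir, "data")   # name ShimmerCat looks for
    with open(tmp_path, "wb") as dst:
        dst.write(result)
        dst.flush()
        os.fsync(dst.fileno())
    # A rename within one folder is atomic on POSIX filesystems, so a reader
    # sees either no file or the complete file.
    os.replace(tmp_path, final_path)
    return True
```

Keeping the temporary file in the same folder matters because a rename across filesystems is not atomic.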
33.3. The Redis key that holds the records
Here is an example with some comments for the key:
597bfe4e:ush;singleton
^        ^-- everything else is common
|
|-- cache identifier at redis
Here is the Redis command ShimmerCat uses for adding an element (the JSON string) to the list:
"RPUSH" "597bfe4e:ush;singleton" "bid: {}\nprecursor-name: webp0\noutput-folder: /home/alcides/projects_fast/shimmercat/td_453355/.shimmercat.loves.devs/r-cache/597bfe4e/597bfe4e:H;ri:33f316697edc2004e312c6e17f5130e6638a140e;pn:webp0;ds:one;\nident: 597bfe4e:H;ri:33f316697edc2004e312c6e17f5130e6638a140e;pn:webp0;ds:one;\ninput-file: /home/alcides/projects_fast/shimmercat/td_453355/.shimmercat.loves.devs/r-cache/597bfe4e/597bfe4e:e;ri:33f316697edc2004e312c6e17f5130e6638a140e;e:identity;pn:fetch;/data\nblocked-mark: /home/alcides/projects_fast/shimmercat/td_453355/.shimmercat.loves.devs/r-cache/597bfe4e/597bfe4e:H;ri:33f316697edc2004e312c6e17f5130e6638a140e;pn:webp0;ds:one;/BLOCKED\n"
To treat the list as a queue, the usher needs to use LPOP to take the first element.
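A minimal Python sketch of the consuming side could look like the following. The function names are hypothetical, and client stands for any object with an lpop(key) method that returns None on an empty list (for example a redis-py redis.Redis instance; the source does not name a client library). The decoding step assumes the JSON form of the record shown in section 33.2.

```python
import json
from typing import Callable, Optional


def usher_queue_key(cache_id: str) -> str:
    """Build the list key: cache identifier followed by the common suffix."""
    return f"{cache_id}:ush;singleton"


def drain_queue(client, cache_id: str, handle: Callable[[dict], None],
                max_jobs: Optional[int] = None) -> int:
    """Pop records from the front of the list until it is empty.

    Returns the number of records handed to `handle`.
    """
    key = usher_queue_key(cache_id)
    done = 0
    while max_jobs is None or done < max_jobs:
        raw = client.lpop(key)          # front of the list: oldest record first
        if raw is None:
            break                       # queue is empty
        if isinstance(raw, bytes):      # clients may return bytes or str
            raw = raw.decode("utf-8")
        handle(json.loads(raw))
        done += 1
    return done
```

Because ShimmerCat appends with RPUSH and the usher takes from the front with LPOP, records are processed in the order they were created. A long-running usher would probably prefer a blocking pop such as BLPOP over polling.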
33.4. Adjustments
Adjustments are an optional feature that allows instructing the usher to create more efficient derivatives of an original image for specific cases.
For example, suppose there is an original image of pixel size 4096x3892 at URL path /images/plastic-baubles/cerezzo.jpg, and one wants to serve downscaled copies of that image using specially built URLs, e.g. /images/plastic-baubles/cerezzo.jpg?width=350px or /images/plastic-baubles/cerezzo_w350px.jpg.
Adjustments make it possible to parse URLs like the above using all the capabilities of the Lua interpreter and to fill in extra arguments for the record that the usher receives.
Here is an example of the syntax, which we explain in more detail immediately after:
shimmercat-devlove:
  domains:
    elec www.exampleshop.com:
      change-url:
        - //+</w([0-9]+)px\.(jpg|jpeg|png)$/> -> (*)get_width_from_filename <*>
        - //+</\.(jpg|jpeg|png)$/> -> (*)get_width_from_querystring <*>
      adjusters:
        get_width_from_querystring: |
          -- Matches e.g. "?width=350px" and captures the number
          local obtain_width_regexp = regex.onig_re("\\?width=([0-9]+)px$")
          local parsed_url_path = urlparse.parse_url_path(url_path_before_rewrite)
          local query_part = parsed_url_path.decoded_query_string
          if query_part then
            local match_object = regex.match(obtain_width_regexp, query_part)
            if match_object then
              local width = tonumber(match_object.groups[1])
              return {
                origin_fs_path = parsed_url_path.fs_path,
                origin_url_path = urlparse.percent_encode_url_path(parsed_url_path.fs_path),
                adjustments = {
                  pixel_width = width
                }
              }
            end
          end
        get_width_from_filename: |
          -- Matches e.g. "cerezzo_w350px.jpg": stem, width and extension
          local obtain_width_regexp = regex.onig_re("(.*?)_w([0-9]+)px\\.(jpg|png|jpeg)$")
          local parsed_url_path = urlparse.parse_url_path(url_path_before_rewrite)
          local fspath_part = parsed_url_path.fs_path
          local match_object = regex.match(obtain_width_regexp, fspath_part)
          if match_object then
            local width = tonumber(match_object.groups[2])
            -- Rebuild the path of the original image: stem plus extension
            local origin_fs_path = match_object.groups[1] .. "." .. match_object.groups[3]
            return {
              origin_fs_path = origin_fs_path,
              origin_url_path = urlparse.percent_encode_url_path(origin_fs_path),
              adjustments = {
                pixel_width = width
              }
            }
          end
The Lua snippet is evaluated with the following variables in scope:
url_path_before_rewrite
: the URL path before the re-write
url_path_after_rewrite
: the URL path after the re-write; in the example above it is the same
is_generated
: says whether the resource will eventually be fetched by URL or by path
ShimmerCat QS can fetch and cache a resource using either a filesystem-like path with normal syntax or a URL path with generated syntax. Among other differences, the former syntax admits straight UTF-8, while the latter must use URL-encoding and can include query strings.
To preserve both possibilities, one can invoke an adjuster as either (*)adjuster_key or (*g)adjuster_key, with the latter being used for generated assets.
In the former case, the location of the original resource can be adjusted using a field origin_fs_path in the returned table; in the latter, it can be adjusted using origin_url_path.
Both origin_fs_path and origin_url_path are optional fields; if they are not provided, the result of the re-write rule is employed as usual.
This effectively provides an extremely flexible way to transform paths before fetching them from the
origin, even more flexible than ShimmerCat’s rewrite engine.
However, since ShimmerCat’s rewrite engine is an order of magnitude faster, the result of the re-write
rule is always used for building the cache key.
In any case, additional instructions to the usher can be returned in an adjustments subtable.
Here is a summary about adjustments:
Adjustments can be used both for resources which are resolved to a file-path under root-dir and for generated assets.
The adjuster’s code is only run when the file is brought to the cache. Thereafter, the assets are cached based on the URL path that the rewrite rule created, i.e. the value of url_path_after_rewrite. This value cannot be changed from Lua.
The adjuster code must return a table, with either a field origin_fs_path or, if the scope variable is_generated contains true, a field origin_url_path. There can also be a field adjustments whose value will be translated to JSON and passed to the usher.
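On the usher side, the adjustments object arrives as part of the job record. The Python sketch below shows one way an usher could turn it into a target size while preserving the aspect ratio. It assumes the object carries a pixel_width key, as produced by the Lua adjusters in the example above; target_size is a hypothetical helper, and refusing to upscale is a choice made for this sketch only.

```python
from typing import Optional, Tuple


def target_size(original: Tuple[int, int],
                adjustments: Optional[dict]) -> Tuple[int, int]:
    """Return the (width, height) to produce for a job.

    `original` is the pixel size of the input image; `adjustments` is the
    decoded "adjustments" field of the job record, or None when absent.
    """
    width, height = original
    if not adjustments:
        return original
    wanted = adjustments.get("pixel_width")
    if not wanted or wanted >= width:
        return original              # never upscale in this sketch
    # Scale the height by the same factor as the width.
    scaled_height = max(1, round(height * wanted / width))
    return (wanted, scaled_height)
```

For the 4096x3892 image from the example and a requested width of 350 pixels, target_size((4096, 3892), {"pixel_width": 350}) returns (350, 333).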