
Processing and caching images: HAPs

The same image on a page is often needed in different sizes, formats and transformations, depending on the viewing device. ShimmerCat QS offers a mechanism whereby images can be heavily transformed by an external service or program, and thereafter cached for later serving.

ShimmerCat doesn't do the optimization itself, but it does the following:

  • Identifies an asset as a candidate for a particular type of optimization, according to its use on the site. For example, above-the-fold JPEGs are marked to run in a pipeline that turns them into progressive JPEGs.
  • Creates and posts a processing record for the optimization.
  • Periodically checks whether the optimized version is ready, or whether the specific pipeline needs to be disabled due to unmet requests.

An external program is in charge of consuming and fulfilling the image processing records; we provide one such program as part of sc_pack.

We call this subsystem HAPs, for "Heavy Asset Pipelines".

Enabling and disabling HAPs

HAPs need to be enabled in the tweaks.yaml file in the scratch folder, like this:

hapsSettings:
  enabled: true

If they are not enabled there, ShimmerCat won't queue images for optimization, nor will it use optimized images. Think of this setting as a kill switch that disables HAPs everywhere.

Once enabled there, HAPs are active for every domain in the devlove file, unless explicitly disabled for a domain. To disable HAPs for a domain, use a haps-settings key inside the domain configuration:

shimmercat-devlove:
  domains:
    elec www.proba.com:
      root-dir: files
      cache-key: A33939
    elec www2.proba.com:
      root-dir: files
      cache-key: A33939
      haps-settings:
        enabled: false

As the example above shows, this works even if two domains share the same cache (here, cache-key A33939), so you can have one domain with HAPs enabled and the other without. This is handy for comparing the effect of HAPs on two domains with very similar contents, e.g. localized shops.
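The enabling rule boils down to a global kill switch plus a per-domain opt-out. Here is a minimal Python sketch of that rule, assuming both YAML files have already been parsed into dictionaries; the function name haps_enabled_for is hypothetical and not part of ShimmerCat or sc_pack.

```python
from typing import Any, Mapping


def haps_enabled_for(tweaks: Mapping[str, Any],
                     devlove: Mapping[str, Any],
                     domain_key: str) -> bool:
    """Return True if HAPs should be active for `domain_key` (hypothetical helper).

    `tweaks` is the parsed tweaks.yaml, `devlove` the parsed devlove file.
    """
    # Global kill switch: hapsSettings.enabled in tweaks.yaml (absent means off).
    if not (tweaks.get("hapsSettings") or {}).get("enabled", False):
        return False
    domains = (devlove.get("shimmercat-devlove") or {}).get("domains") or {}
    domain_cfg = domains.get(domain_key) or {}
    # Per-domain opt-out: a domain inherits "enabled" unless it says otherwise.
    return bool((domain_cfg.get("haps-settings") or {}).get("enabled", True))
```

With the devlove example above, www.proba.com inherits the global setting while www2.proba.com opts out, even though both share cache-key A33939.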

How ShimmerCat posts that an image needs to be optimized

ShimmerCat simply appends a JSON string to the end of a Redis list. The list's key follows the same pattern as other keys ShimmerCat uses in Redis, including a fragment with the cache name; see the subsection below for details about the key.

A component, called "the usher", is in charge of popping elements from the front of this list, and executing or starting the image optimization according to the description in said element. In other words, the usher reads from a "job queue" which is just a glorified Redis list.

Here is what a string element of the list in Redis looks like:

{
    "bid": {},
    "precursor-name": "webp0",
    "output-folder": "/home/alcides/projects_fast/shimmercat/td_C584CD/.shimmercat.loves.devs/r-cache/597bfe4e/597bfe4e:H;ri:33f316697edc2004e312c6e17f5130e6638a140e;pn:webp0;ds:one;",
    "ident": "597bfe4e:H;ri:33f316697edc2004e312c6e17f5130e6638a140e;pn:webp0;ds:one;",
    "input-file": "/home/alcides/projects_fast/shimmercat/td_C584CD/.shimmercat.loves.devs/r-cache/597bfe4e/597bfe4e:e;ri:33f316697edc2004e312c6e17f5130e6638a140e;e:identity;pn:fetch;/data",
    "blocked-mark": "/home/alcides/projects_fast/shimmercat/td_C584CD/.shimmercat.loves.devs/r-cache/597bfe4e/597bfe4e:H;ri:33f316697edc2004e312c6e17f5130e6638a140e;pn:webp0;ds:one;/BLOCKED",
    "adjustments": null,
    "timestamp": 1571675380
}

With the following meaning for the fields:

  • precursor-name: The name of the module inside ShimmerCat that asked for this optimization. It can name a particular image format or a named subset of optimizations; currently it is always the former.
  • bid: A free-form JSON object coming from the module inside ShimmerCat that identified the optimization.
  • output-folder: The absolute path of a folder inside ShimmerCat's cache where the optimized asset should be put, when that asset is ready.
  • input-file: The original, unchanged asset, which will serve as input for the optimization pipeline. Do not delete it or change it!
  • blocked-mark: A special file that the usher can create to disable this optimization.
  • adjustments: A free-form JSON object produced in a configurable way with scaling or fitting adjustment requests. The goal of this field is to make it possible to leverage the usher in sites which already use some convention to convey pixel size demands in the requested URL. Its format is described in the Adjustments section below.
  • timestamp: The Unix POSIX timestamp (that is, seconds since the Unix Epoch) when this job record was created. This is useful to track latency of image optimization pipelines.
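A usher written in Python might decode each record into a small typed object. The sketch below is hypothetical (the class HapsJob and its method names are not part of sc_pack); it treats adjustments and timestamp as optional, since not every record in this document carries them, and shows how timestamp can be used to measure queue latency.

```python
import json
import time
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass(frozen=True)
class HapsJob:
    """One HAPs processing record as popped from the Redis list (hypothetical)."""
    precursor_name: str
    bid: Any
    output_folder: str
    ident: str
    input_file: str
    blocked_mark: str
    adjustments: Optional[dict]
    timestamp: Optional[int]

    @classmethod
    def from_json(cls, raw: Union[str, bytes]) -> "HapsJob":
        d = json.loads(raw)  # accepts str or bytes
        return cls(
            precursor_name=d["precursor-name"],
            bid=d.get("bid"),
            output_folder=d["output-folder"],
            ident=d["ident"],
            input_file=d["input-file"],
            blocked_mark=d["blocked-mark"],
            adjustments=d.get("adjustments"),  # null in JSON becomes None
            timestamp=d.get("timestamp"),
        )

    def age_seconds(self, now: Optional[float] = None) -> Optional[float]:
        """Seconds since ShimmerCat created the record, or None if unknown."""
        if self.timestamp is None:
            return None
        return (time.time() if now is None else now) - self.timestamp
```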

The usher is expected to read the file at input-file, perform the optimization if it recognizes the precursor-name value and, when done, write the optimized result to a file named data inside output-folder (that is, the path obtained by concatenating output-folder with /data).
If precursor-name holds a value that the usher doesn't recognize, the usher should immediately create the BLOCKED file referred to by blocked-mark, so that ShimmerCat stops wasting CPU and I/O on that optimization[^3]. Note that ShimmerCat learns that the optimized asset is ready only from the presence of this file name, so there is some risk that it reads the file before it has been completely written to disk. To avoid ShimmerCat sending truncated data to a browser, it's best to create the file under another name in the same folder[^2], say .data, and rename it to the final name once it has been completely written and closed.
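The contract above (optimize known pipelines, block unknown ones, publish results with a rename) can be sketched in Python as follows. This is a minimal illustration, not the usher shipped with sc_pack: the `optimizers` registry is hypothetical, and creating output-folder if it is missing is an assumption. The rename uses os.replace, which is atomic on POSIX when source and target are on the same filesystem, which is why the temporary file lives in the same folder.

```python
import os
from typing import Any, Callable, Mapping

# Hypothetical registry type: precursor-name -> function(input bytes) -> output bytes.
Optimizer = Callable[[bytes], bytes]


def handle_job(job: Mapping[str, Any], optimizers: Mapping[str, Optimizer]) -> str:
    """Process one decoded HAPs record; return "done" or "blocked"."""
    optimizer = optimizers.get(job["precursor-name"])
    if optimizer is None:
        # Unknown pipeline: create the BLOCKED mark so ShimmerCat gives up on it.
        os.makedirs(os.path.dirname(job["blocked-mark"]), exist_ok=True)
        with open(job["blocked-mark"], "wb"):
            pass
        return "blocked"

    with open(job["input-file"], "rb") as f:  # read only; never modify this file
        optimized = optimizer(f.read())

    output_folder = job["output-folder"]
    os.makedirs(output_folder, exist_ok=True)  # assumption: may not exist yet
    tmp_path = os.path.join(output_folder, ".data")
    with open(tmp_path, "wb") as f:
        f.write(optimized)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before publishing
    # ShimmerCat only ever sees the complete file under its final name.
    os.replace(tmp_path, os.path.join(output_folder, "data"))
    return "done"
```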

The Redis key that holds the records

Here is an annotated example of the key:

597bfe4e:ush;singleton
^       ^-- everything else is common
|
|-- cache identifier in Redis

Here is an example of the Redis command ShimmerCat uses to append an element to the list:

 "RPUSH" "597bfe4e:ush;singleton" "bid: {}\nprecursor-name: webp0\noutput-folder: /home/alcides/projects_fast/shimmercat/td_453355/.shimmercat.loves.devs/r-cache/597bfe4e/597bfe4e:H;ri:33f316697edc2004e312c6e17f5130e6638a140e;pn:webp0;ds:one;\nident: 597bfe4e:H;ri:33f316697edc2004e312c6e17f5130e6638a140e;pn:webp0;ds:one;\ninput-file: /home/alcides/projects_fast/shimmercat/td_453355/.shimmercat.loves.devs/r-cache/597bfe4e/597bfe4e:e;ri:33f316697edc2004e312c6e17f5130e6638a140e;e:identity;pn:fetch;/data\nblocked-mark: /home/alcides/projects_fast/shimmercat/td_453355/.shimmercat.loves.devs/r-cache/597bfe4e/597bfe4e:H;ri:33f316697edc2004e312c6e17f5130e6638a140e;pn:webp0;ds:one;/BLOCKED\n"

To treat the list as a FIFO queue, the usher should use LPOP to take elements from the front, since ShimmerCat appends them at the end with RPUSH.
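The following Python sketch shows the consumer side, assuming the JSON encoding described earlier and assuming the usher already knows the cache identifier (597bfe4e in the examples). It only relies on `rpush` and `lpop`, which redis-py's `redis.Redis` client provides; the helper names usher_key and drain_queue are hypothetical.

```python
import json
from typing import Any, Callable, Protocol


class ListClient(Protocol):
    """The two Redis list operations used here (redis.Redis provides both)."""
    def rpush(self, name: str, *values: Any) -> int: ...
    def lpop(self, name: str) -> Any: ...


def usher_key(cache_id: str) -> str:
    # "597bfe4e" -> "597bfe4e:ush;singleton"
    return f"{cache_id}:ush;singleton"


def drain_queue(client: ListClient, cache_id: str,
                handle: Callable[[dict], None]) -> int:
    """Pop records from the front of the list until it is empty.

    ShimmerCat appends with RPUSH, so LPOP yields the oldest record first.
    Returns the number of records handled.
    """
    key = usher_key(cache_id)
    handled = 0
    while True:
        raw = client.lpop(key)
        if raw is None:  # list is empty (or does not exist)
            return handled
        handle(json.loads(raw))
        handled += 1
```

A long-running usher could call `drain_queue(redis.Redis(), "597bfe4e", handler)` periodically, or block on BLPOP instead of polling with LPOP.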

Adjustments

Adjustments are an optional feature that allows instructing the usher to create more efficient derivatives of an original image for specific cases. For example, suppose there is an original image of pixel size 4096x3892 at URL path /images/plastic-baubles/cerezzo.jpg, and one wants to serve downscaled copies of that image through specially built URLs, e.g. /images/plastic-baubles/cerezzo.jpg?width=350px or /images/plastic-baubles/cerezzo_w350px.jpg. Adjustments make it possible to parse URLs like these using all the capabilities of the Lua interpreter, and to fill in extra arguments in the record that the usher receives.

Here is an example of the syntax, which we explain in more detail immediately after:

shimmercat-devlove:
  domains:
    elec www.exampleshop.com:
      change-url:
        - //+</w([0-9]+)px\.(jpg|jpeg|png)$/> -> (*)get_width_from_filename <*>
        - //+</\.(jpg|jpeg|png)$/>  -> (*)get_width_from_querystring <*>
      adjusters:

        get_width_from_querystring: |
          local obtain_width_regexp = regex.onig_re("\\?width=([0-9]+)px$")
          local parsed_url_path = urlparse.parse_url_path(url_path_before_rewrite)
          local query_part = parsed_url_path.decoded_query_string
          if query_part then
            local match_object = regex.match(obtain_width_regexp, query_part)
            if match_object then
              -- Capture 1 is the requested width in pixels
              local width = tonumber(match_object.groups[1])
              return {
                origin_fs_path = parsed_url_path.fs_path,
                origin_url_path = urlparse.percent_encode_url_path(parsed_url_path.fs_path),
                adjustments = {
                  pixel_width = width
                }
              }
            end
          end

        get_width_from_filename: |
          local obtain_width_regexp = regex.onig_re("(.*?)_w([0-9]+)px\\.(jpg|png|jpeg)$")
          local parsed_url_path = urlparse.parse_url_path(url_path_before_rewrite)
          local fspath_part = parsed_url_path.fs_path
          local match_object = regex.match(obtain_width_regexp, fspath_part)
          if match_object then
            -- Capture 2 is the width; captures 1 and 3 rebuild the original file name
            local width = tonumber(match_object.groups[2])
            local origin_fs_path = match_object.groups[1] .. "." .. match_object.groups[3]
            return {
              origin_fs_path = origin_fs_path,
              origin_url_path = urlparse.percent_encode_url_path(origin_fs_path),
              adjustments = {
                pixel_width = width
              }
            }
          end
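To make the behaviour of the two adjusters easy to test outside ShimmerCat, here is an approximate Python transliteration. It is only an illustration: Python's `re` and `urllib.parse` stand in for ShimmerCat's Lua `regex` and `urlparse` modules, and the querystring variant accepts `width=` anywhere in the query string rather than only at its end.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, quote, unquote, urlsplit

# Same filename pattern as get_width_from_filename.
WIDTH_IN_FILENAME = re.compile(r"(.*?)_w([0-9]+)px\.(jpg|png|jpeg)$")
WIDTH_VALUE = re.compile(r"^([0-9]+)px$")


def width_from_filename(url_path: str) -> Optional[dict]:
    """cerezzo_w350px.jpg -> origin cerezzo.jpg with pixel_width 350."""
    fs_path = unquote(urlsplit(url_path).path)
    m = WIDTH_IN_FILENAME.match(fs_path)
    if not m:
        return None  # like the Lua code: nothing is returned without a match
    origin_fs_path = f"{m.group(1)}.{m.group(3)}"
    return {
        "origin_fs_path": origin_fs_path,
        "origin_url_path": quote(origin_fs_path),
        "adjustments": {"pixel_width": int(m.group(2))},
    }


def width_from_querystring(url_path: str) -> Optional[dict]:
    """cerezzo.jpg?width=350px -> origin cerezzo.jpg with pixel_width 350."""
    parts = urlsplit(url_path)
    values = parse_qs(parts.query).get("width", [])
    m = WIDTH_VALUE.match(values[0]) if values else None
    if not m:
        return None
    fs_path = unquote(parts.path)
    return {
        "origin_fs_path": fs_path,
        "origin_url_path": quote(fs_path),
        "adjustments": {"pixel_width": int(m.group(1))},
    }
```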

The Lua snippet is evaluated with the following variables in scope:

  • url_path_before_rewrite: the URL path before re-write
  • url_path_after_rewrite: the URL path after the rewrite; in the example above it is the same as url_path_before_rewrite
  • is_generated: whether the resource will eventually be fetched by URL (true) or by filesystem path (false)

ShimmerCat QS can fetch and cache a resource using either a filesystem-like path with normal syntax or a URL path with generated syntax. Among other differences, the former admits straight UTF-8, while the latter must use URL-encoding and can include query strings.

To support both possibilities, an adjuster can be invoked as either (*)adjuster_key or (*g)adjuster_key, the latter being used for generated assets. In the former case, the location of the original resource can be adjusted with a field origin_fs_path in the returned table; in the latter, with origin_url_path. Both origin_fs_path and origin_url_path are optional: if they are not provided, the result of the rewrite rule is used as usual. This provides an extremely flexible way to transform paths before fetching them from the origin, even more flexible than ShimmerCat's rewrite engine. However, since ShimmerCat's rewrite engine is an order of magnitude faster, the result of the rewrite rule is always used for building the cache key.
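The selection rule can be summarized as a small decision function. This Python sketch is a model of the behaviour described above, not ShimmerCat's code; the names resolve and Resolution are hypothetical. The point it illustrates is that the adjuster may change where the original is fetched from, but never the path used for the cache key.

```python
from typing import Mapping, NamedTuple, Optional


class Resolution(NamedTuple):
    fetch_from: str        # where the original is fetched (fs path or URL path)
    cache_key_path: str    # always the rewrite-rule result
    adjustments: Optional[dict]


def resolve(result: Optional[Mapping], is_generated: bool,
            url_path_after_rewrite: str) -> Resolution:
    """Apply an adjuster's returned table (hypothetical model).

    (*)key  -> is_generated False -> origin_fs_path is honoured
    (*g)key -> is_generated True  -> origin_url_path is honoured
    A missing field falls back to the rewrite-rule result.
    """
    result = result or {}
    field = "origin_url_path" if is_generated else "origin_fs_path"
    return Resolution(
        fetch_from=result.get(field) or url_path_after_rewrite,
        cache_key_path=url_path_after_rewrite,
        adjustments=result.get("adjustments"),
    )
```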

In any case, additional instructions to the usher can be returned in an adjustments subtable.

Here is a summary of adjustments:

  • Adjustments can be used both for resources which are resolved to a file-path under root-dir, or for generated assets.
  • The adjuster's code is only run when the file is brought into the cache. Thereafter, the asset is cached under the URL path that the rewrite rule created, i.e. the value of url_path_after_rewrite. This value cannot be changed from Lua.
  • The adjuster code returns a table that can contain a field origin_fs_path or, if the scope variable is_generated is true, a field origin_url_path; both fields are optional. The table can also contain a field adjustments, whose value is translated to JSON and passed to the usher.
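On the usher side, the adjustments object arrives as plain JSON. How it is interpreted is up to the usher; the Python sketch below assumes that pixel_width means "target output width", that the aspect ratio should be preserved, and that images should never be upscaled. These are illustrative choices, and target_size is a hypothetical helper.

```python
from typing import Optional, Tuple


def target_size(original: Tuple[int, int],
                adjustments: Optional[dict]) -> Tuple[int, int]:
    """Return the (width, height) to produce for a record's adjustments.

    Assumptions: pixel_width is the requested width, the aspect ratio is kept,
    and requests at or above the original width leave the size unchanged.
    """
    width, height = original
    requested = (adjustments or {}).get("pixel_width")
    if not requested or requested >= width:
        return original
    new_height = max(1, round(height * requested / width))
    return requested, new_height
```

For the 4096x3892 image in the example, a request for 350px yields a 350x333 derivative, which could then be passed to an image library such as Pillow (`Image.resize`).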