General

ShimmerCat caches static assets. Not all origins are HTTP backends, and most developers set long caching times on their assets to score well in website performance tools like Google's Lighthouse and Pingdom. For both reasons, we need one or more independent mechanisms to remove cache entries from ShimmerCat deployments.

This document explains a simple protocol for doing so that relies on origin-maintained lists of changes.

The Algorithm

Assumption: the remote asset store keeps a list of changed assets (the "changelist") covering a sliding time range from t_old to t_new. Both are variables used in this explanation and are set in the domain configuration file. The changelist can be maintained manually or recorded by a change-tracking FUSE filesystem placed in front of the store.

This algorithm guarantees two things: for a file that has not changed since t_old, no more than one individual freshness check is done per deployment; and if the file changes, the change is picked up as soon as its changelist entry slides behind t_new.
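As a rough illustration only, and not ShimmerCat's actual implementation, here is one way a cache could consult changelist entries that have already been parsed. The names `Entry`, `is_fresh` and `validated_at` are hypothetical, and so is the rule that a change only counts once its timestamp is no later than t_new:

```python
from typing import Iterable, NamedTuple


class Entry(NamedTuple):
    op: str         # '~' (created/changed) or '-' (deleted)
    timestamp: int  # POSIX time of the change
    path: str       # path relative to the web root


def is_fresh(path: str, validated_at: int, t_new: int,
             entries: Iterable[Entry]) -> bool:
    """Hypothetical rule: a cached copy of `path`, last validated at
    `validated_at`, is stale if the changelist records a change or deletion
    of `path` after `validated_at` and no later than `t_new`."""
    for entry in entries:
        if entry.path == path and validated_at < entry.timestamp <= t_new:
            return False
    return True
```

Under this rule a file with no entries in the window never needs a new check, while a changed file is reported stale once t_new has moved past its entry.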

Algorithm for: Is file X fresh?

The Changelist

The changelist itself should be stored in a format amenable to incremental reading, in files under a fixed location: a directory named __sc_changelist__ directly below the root of the web directory exposed by the origin.

We use the 64-bit representation of the POSIX time and create hierarchical directories named after the hexadecimal representation of that 64-bit word in big-endian order. The four highest octets are abbreviated as a single directory named 0, the next two octets each name one directory level, and the second-lowest octet names a file that contains all the changes that happen in the 256 seconds it covers.
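A minimal sketch of this mapping in Python, assuming timestamps below 2**32 so that the four highest octets really are zero; `changelist_location` is a hypothetical helper, not part of ShimmerCat:

```python
from typing import Tuple


def changelist_location(t: int) -> Tuple[str, str]:
    """Map a POSIX timestamp to (changelist file path, in-file seconds octet).

    Assumes t < 2**32, so the four highest octets are zero and collapse into
    the single directory named "0".
    """
    as_hex = f"{t:016x}"  # 64-bit word, big-endian, zero-padded
    octets = [as_hex[i:i + 2] for i in range(0, 16, 2)]
    if any(o != "00" for o in octets[:4]):
        raise ValueError("timestamps >= 2**32 are not covered by this sketch")
    # octets[4] and octets[5] name directories, octets[6] names the file,
    # octets[7] (the second within the 256-second window) stays inside the file
    path = "/".join(["__sc_changelist__", "0", octets[4], octets[5], octets[6]])
    return path, octets[7]
```

For 1516628700, the first timestamp in the example that follows, it returns `("__sc_changelist__/0/5a/65/ea", "dc")`.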

For example, the changes to the filesystem between Monday, January 22, 2018 1:45:00 PM UTC and Monday, January 22, 2018 1:50:00 PM UTC would be registered by taking the POSIX timestamps of those two instants:

1516628700
1516629000

then converting them to hexadecimal, zero-padded to 64 bits (to avoid the Year 2038 problem):

0x 00 00 00 00 5a 65 ea dc
0x 00 00 00 00 5a 65 ec 08

We abbreviate the first four octets (all zero) as a directory named 0, which results in the following files, one for each 256-second window the interval touches:

__sc_changelist__ / 0 / 5a / 65 / ea 
__sc_changelist__ / 0 / 5a / 65 / eb
__sc_changelist__ / 0 / 5a / 65 / ec
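Because each file covers one 256-second window, the files for an interval can be enumerated by shifting both timestamps right by 8 bits. A minimal sketch, with a hypothetical `bucket_files` helper and the same below-2**32 assumption:

```python
from typing import List


def bucket_files(t_start: int, t_end: int) -> List[str]:
    """List the changelist files covering the closed interval [t_start, t_end].

    Each file covers one 256-second window, i.e. one value of t >> 8.
    Assumes both timestamps are below 2**32.
    """
    files = []
    for window in range(t_start >> 8, (t_end >> 8) + 1):
        # window holds the three octets below the abbreviated "0" directory
        w = f"{window:06x}"
        files.append("/".join(["__sc_changelist__", "0", w[0:2], w[2:4], w[4:6]]))
    return files
```

Calling `bucket_files(1516628700, 1516629000)` yields exactly the three paths listed above.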

Notice that the lowest octet in the hex timestamp is not part of the filename.

Files that would otherwise be empty may be omitted.

Each of the files has the following format:

file ::= line *

line ::=  change_entry 
        | delete_entry
        | reset_cmd

change_entry ::= '~' TOKEN_TIMESTAMP TOKEN_PATH_LITERAL NEWLINE

delete_entry ::= '-' TOKEN_TIMESTAMP TOKEN_PATH_LITERAL NEWLINE

reset_cmd    ::= 'RESET' TOKEN_TIMESTAMP NEWLINE

There are two types of entries: change entries ('~'), used when a file is created or changed, and delete entries ('-'), used when a file is deleted. There is also one command, RESET, which the changelist generator can emit when it starts running to tell ShimmerCat to invalidate its whole internal store. Note that as of QS version 2945, ShimmerCat does not act upon RESET.
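A minimal sketch of how a changelist generator could render these lines. It assumes single spaces between tokens and an 8-digit hex timestamp (the lower four octets, so t < 2**32), which matches the example contents that follow and the Python snippet further down; `format_entry` is a hypothetical name:

```python
def format_entry(op: str, t: int, path: str = "") -> str:
    """Render one newline-terminated changelist line.

    op is '~' (file created or changed), '-' (file deleted) or 'RESET'.
    The timestamp is written as 8 hex digits (assumes t < 2**32).
    """
    if op == "RESET":
        return f"RESET {t:08x}\n"
    if op not in ("~", "-"):
        raise ValueError(f"unknown changelist operation: {op!r}")
    if not path or "\n" in path:
        raise ValueError("path must be non-empty and contain no newline")
    return f"{op} {t:08x} {path}\n"
```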

Here are some example contents for the file __sc_changelist__/0/5a/65/ec from the example above, assuming a German-Japanese store owner wants to upload items branded after the anime Banner Tail, with file names in a whimsical mix of alphabets[^1]:

RESET 5a65ec00
~ 5a65ec02 wp-content/upload/Plüsch_rosa_Höschen_für_das_Eichhörnchen.jpeg
~ 5a65ec02 wp-content/upload/睡眠マット_mat.jpeg
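Reading a file back is equally simple. This sketch, with hypothetical names `Line` and `parse_changelist`, assumes whitespace-separated tokens and treats everything after the timestamp as the path, so paths may contain spaces but not newlines:

```python
from typing import List, NamedTuple, Optional


class Line(NamedTuple):
    op: str              # '~', '-' or 'RESET'
    timestamp: int       # hex token decoded to an integer
    path: Optional[str]  # None for RESET


def parse_changelist(text: str) -> List[Line]:
    """Parse the contents of one changelist file."""
    lines = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # tolerate blank lines
        if raw.startswith("RESET"):
            _, ts = raw.split()
            lines.append(Line("RESET", int(ts, 16), None))
        elif raw[:1] in ("~", "-"):
            op, ts, path = raw.split(maxsplit=2)
            lines.append(Line(op, int(ts, 16), path))
        else:
            raise ValueError(f"unrecognised changelist line: {raw!r}")
    return lines
```

Applied to the example contents above, it returns one RESET and two change entries, all of whose timestamps share the upper bits 0x5a65ec and therefore fall in the same 256-second window.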

About FUSE for recording the Changelist

FUSE is acceptable in this scenario because it only needs to front the sources of writes in the system. Those are most likely users uploading images or developers updating script and CSS files, which in most cases do not represent a heavy I/O load.

Library snippets

This section contains small code fragments that you can use to register changes manually.

Python

import datetime as dt
import os.path


class StampsChangelist(object):

    def __init__(self, document_root):

        """

        :param document_root: The place where all the served files are

        """

        self._document_root = document_root

    @staticmethod
    def _format_timestamp(t):
        tint = int(t)
        as_str = "{0:016x}".format(tint)
        pieces = ["__sc_changelist__", "0"]
        reading_zeros = True
        for i in range(8):
            pstr = as_str[2 * i: 2 * i + 2]
            if pstr == '00' and reading_zeros:
                continue
            else:
                reading_zeros = False
                pieces.append(pstr)
        return pieces

    def invalidate_entry(self, relative_file_path):
        """
        Invalidates an entry in all ShimmerCat caches by appending a delete
        entry ('-') to the changelist file for the current 256-second window.
        :param relative_file_path: a file path relative to the document_root of the site.
        """
        now = dt.datetime.now()
        posix_now = now.timestamp()

        # Get directory where things must happen
        happens_at_dir = self._document_root
        # Get a directory name, by formatting the timestamp
        pieces_timestamp = self._format_timestamp(posix_now)
        # Must exclude seconds (inside the file) and filename
        final_directory = os.path.join(happens_at_dir, *pieces_timestamp[:-2])
        os.makedirs(final_directory, exist_ok=True)

        # Now get the final filename
        final_filename = os.path.join(happens_at_dir, *pieces_timestamp[:-1])

        # In-file timestamp: the lower four octets, e.g. 5a65eadc
        infile_timestamp = ''.join(pieces_timestamp[2:])
        # Append, so earlier entries in the same window are kept; UTF-8 for non-ASCII paths
        with open(final_filename, 'a', encoding='utf-8') as out:
            print('-', infile_timestamp, relative_file_path, file=out)

PHP

<?php

class StampsChangelist {
  public function __construct() {

  }

  // Split a POSIX timestamp (integer seconds) into changelist path pieces
  public function format_timestamp($tint) {
    $as_str = sprintf('%016x', $tint);
    $pieces = ["__sc_changelist__", "0"];
    $reading_zeros = true;
    foreach (range(0, 7) as $i) {
      $pstr = substr($as_str, 2 * $i, 2);
      if ($pstr == '00' && $reading_zeros) {
        continue;
      } else {
        $reading_zeros = false;
        $pieces[] = $pstr;
      }
    }
    return $pieces;
  }

  public function invalidate_entry($relative_file_path) {
    $happens_at_dir = $_SERVER['DOCUMENT_ROOT'];
    $pieces_timestamp = $this->format_timestamp(time());
    // The changelist lives below the document root, not below the invalidated file
    $final_directory = $happens_at_dir.'/'.implode("/", array_slice($pieces_timestamp, 0, -2));
    $old = umask(0000);
    if (!file_exists($final_directory)) {
      mkdir($final_directory, 0777, true);
    }
    umask($old);
    $final_filename = $happens_at_dir.'/'.implode("/", array_slice($pieces_timestamp, 0, -1));
    $infile_timestamp = implode("", array_slice($pieces_timestamp, 2));
    // Append (never overwrite) one newline-terminated delete entry
    $txt = "- $infile_timestamp $relative_file_path\n";
    file_put_contents($final_filename, $txt, FILE_APPEND | LOCK_EX);
  }
}