Enable bot blocking

ShimmerCat can automatically flag an IP address as a prospector bot, and it can also white-list good bots such as Google's crawlers. Once bots are identified, the IP addresses they use are automatically propagated to all deployments, so you don't need to distribute them yourself.

The bot blocking mechanism is disabled by default when a deployment is created, but it is easy enough to enable.

How to enable it?

Update your sc_pack.conf.yaml to have the values properly set for:

  • enable_bots_blocking: enables the bot blocking mechanism.
  • shimmercat/bots-views-dir: the directory that contains the ShimmerCat views for the bot blocking feature. It is only required when enable_bots_blocking is True and no views-dir is defined in devlove.yaml. If devlove.yaml defines a views-dir, that one is used by default.
  • humanity_validator_host: the host where the humanity validator service runs. The service is needed because, when an IP address has been automatically flagged as a prospector bot, we confirm that it really is a bot by presenting a Google reCAPTCHA challenge. That way, if an IP address has been wrongly classified as a bot, a human visitor can still reach the website after defeating the challenge. Note that this host should also be set in the consultants section of devlove.yaml.
  • humanity_validator_port: the port that the humanity validator service listens on. Note that this port should also be set in the consultants section of devlove.yaml.
  • google_recaptcha_site_key: the site key for the Google reCAPTCHA challenge described above. You can create the key through the Google reCAPTCHA admin.
  • google_recaptcha_site_secret: the secret that pairs with the site key. You can create the secret through the Google reCAPTCHA admin.

All of these values are already set in the example used in the Getting Started article, and the google_recaptcha_site_key and google_recaptcha_site_secret there are real credentials that we use for testing. We registered the domain test-accelerator.shimmercat.com for those credentials, so you should be able to see the Google reCAPTCHA in place for this example.

Update your devlove.yaml to:

---
shimmercat-devlove:
      domains:
          elec vlad-accelerator.shimmercat.com:
              root-dir: www
              views-dir: views-dir
              change-url:
                  - / -> /target/+common/
              consultants:
                  bots:
                      connect-to: 127.0.0.1:8080
                      application-protocol: http

We do this so that ShimmerCat knows which service to call when a prospector bot makes a request to your website.
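Since the validator host and port must appear in both files, a small consistency check can catch mismatches early. The following is a minimal sketch in Python, not part of sc_pack: it works on plain dictionaries that mirror the two YAML files once loaded, and the function name, the domain key "elec example.test", and the 127.0.0.1 and 8080 values in the sc_pack configuration are assumptions made for illustration.

```python
def consultant_matches_validator(sc_pack_conf: dict, devlove: dict, domain_key: str) -> bool:
    """Return True when the 'bots' consultant points at the humanity validator.

    Hypothetical helper: it assumes the configuration keys are top-level
    entries of sc_pack.conf.yaml and that devlove.yaml has the nesting
    shown in the article.
    """
    expected = "{}:{}".format(
        sc_pack_conf["humanity_validator_host"],
        sc_pack_conf["humanity_validator_port"],
    )
    domain = devlove["shimmercat-devlove"]["domains"][domain_key]
    actual = domain["consultants"]["bots"]["connect-to"]
    return actual == expected


# Illustrative values only; replace them with the contents of your own files.
sc_pack_conf = {
    "humanity_validator_host": "127.0.0.1",
    "humanity_validator_port": 8080,
}
devlove = {
    "shimmercat-devlove": {
        "domains": {
            "elec example.test": {
                "consultants": {
                    "bots": {
                        "connect-to": "127.0.0.1:8080",
                        "application-protocol": "http",
                    }
                }
            }
        }
    }
}

print(consultant_matches_validator(sc_pack_conf, devlove, "elec example.test"))
```

If the two files are parsed with a YAML library of your choice, the resulting dictionaries can be passed to the same function unchanged.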

With that, everything is ready to test!

Run the sc_pack supervisord command, open a browser and visit: https://example-accelerator.shimmercat.com:4043/.well-known/shimmercat/bot-blocking/?wants=/. Defeat the challenge!

We want to clarify that you need to visit this specific URL for two reasons. First, the Accelerator has not added any prospector bot IP addresses to the deployment you have used for these tutorials. Second, we hope you are not a bot if you are reading this. In practice, when a prospector bot tries to access any URL on your website, it is redirected to the URL above and has to defeat the CAPTCHA challenge there.
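To test the challenge page for other hosts or ports, the URL can be assembled programmatically. This is a minimal sketch: the function name is hypothetical, and it assumes that the wants query parameter carries the path the visitor originally asked for, which the article implies but does not state.

```python
from urllib.parse import quote

# Path of the challenge page, taken from the URL shown in the article.
CHALLENGE_PATH = "/.well-known/shimmercat/bot-blocking/"


def challenge_url(host: str, port: int, wants: str = "/") -> str:
    """Build the bot-blocking challenge URL for a given host and port.

    Assumption: 'wants' holds the originally requested path. Slashes are
    kept readable; other reserved characters are percent-encoded.
    """
    return "https://{}:{}{}?wants={}".format(
        host, port, CHALLENGE_PATH, quote(wants, safe="/")
    )


print(challenge_url("example-accelerator.shimmercat.com", 4043))
# https://example-accelerator.shimmercat.com:4043/.well-known/shimmercat/bot-blocking/?wants=/
```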

The default humanity validator page can be replaced with a different one. If you want to use your own, contact us at ops@shimmercat.com or through our ticket system, and we will help you with it.

Thanks a lot for your time, and keep reading!