2. Enable bot blocking
ShimmerCat can automatically identify an IP address as a suspected bot, and it can also white-list the good bots, like Google’s crawlers. Once bots are identified, the IP addresses that they use are automatically propagated to all the deployments, so you don’t need to distribute them yourself.
The bot-blocking mechanism is disabled by default when a deployment is created, because it requires extra configuration to decide what happens when a suspected bot visits the site.
When bot-blocking is enabled, ShimmerCat will redirect the suspected bot to the URL path /.well-known/shimmercat/bot-blocking?wants=<original-path>.
That page should present the visitor with a challenge, and forbid the visitor
from accessing any other resources on the site until they have proven
themselves to be human.
The challenge page itself may need some static media to render correctly,
so any request whose URL path starts with /.well-known won’t be blocked
by ShimmerCat, even if it comes from a suspected bot.
If this exception for static assets is not enough, it is also possible to
host the static assets for the challenge page in a separate (sub-)domain.
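The redirect rule described above can be sketched in a few lines of Python. This only illustrates the documented behaviour and is not ShimmerCat’s implementation: the function names challenge_url, is_exempt and route are hypothetical, and percent-encoding the original path inside the wants parameter is an assumption.

```python
from urllib.parse import quote

CHALLENGE_PATH = "/.well-known/shimmercat/bot-blocking"
EXEMPT_PREFIX = "/.well-known"


def is_exempt(path: str) -> bool:
    """Return True for paths that are served even to suspected bots."""
    return path.startswith(EXEMPT_PREFIX)


def challenge_url(original_path: str) -> str:
    """Build the challenge path a suspected bot would be redirected to.

    Assumption: the original path is percent-encoded inside `wants`.
    """
    return f"{CHALLENGE_PATH}?wants={quote(original_path, safe='/')}"


def route(path: str, suspected_bot: bool) -> str:
    """Return the path that ends up being served for this request."""
    if suspected_bot and not is_exempt(path):
        return challenge_url(path)
    return path
```

In this sketch, a suspected bot asking for /index.html ends up at the challenge path with wants=/index.html, while a request for anything under /.well-known passes through unchanged.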
2.1. How to enable it?
We provide a ready-made challenge application and all the underlying logic in sc_pack. Just follow the instructions below:
Update your sc_pack.conf.yaml so that the following values are set properly:
enable_bots_blocking: <True/False>. Enables the bot-blocking mechanism.
shimmercat/bots-views-dir: the directory that contains the ShimmerCat views for the bot-blocking feature. It is only required when enable_bots_blocking is True and no views-dir is defined in devlove.yaml. If devlove.yaml defines a views-dir, that one is used by default.
humanity_validator_host: the host where the humanity validator service runs. We need it because, once an IP address has been automatically flagged as a suspected bot, we confirm that it really is a bot with a Google reCAPTCHA challenge. That way, if an IP address has been wrongly classified as a bot, a user can still access the website after defeating the CAPTCHA challenge. Note that this host should also be set in the consultants of devlove.yaml.
humanity_validator_port: the port that the humanity validator service will use. Note that this port should also be set in the consultants of devlove.yaml.
google_recaptcha_site_key: the site key for Google reCAPTCHA, which we use to tell human visitors from bots. You can create the key through the Google reCAPTCHA admin.
google_recaptcha_site_secret: the site secret for Google reCAPTCHA. You can create the secret through the Google reCAPTCHA admin.
All of these values are already set in the example from the Getting Started article, and the google_recaptcha_site_key and google_recaptcha_site_secret used there are real credentials we use for testing. We added the domain test-accelerator.shimmercat.com for these credentials, so you should be able to see the Google reCAPTCHA in place for this example.
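Before restarting anything, it can help to check that none of the settings listed above was forgotten. The following is a minimal sketch, assuming the contents of sc_pack.conf.yaml have already been loaded into a flat Python dict. The helper name missing_bot_blocking_settings is hypothetical, and the real file may nest some keys differently. The views directory is left out of the check because it is only conditionally required.

```python
REQUIRED_WHEN_ENABLED = (
    "humanity_validator_host",
    "humanity_validator_port",
    "google_recaptcha_site_key",
    "google_recaptcha_site_secret",
)


def missing_bot_blocking_settings(conf: dict) -> list:
    """List the bot-blocking settings that are absent or empty in `conf`.

    Assumption: `conf` is a flat dict holding the keys named in this article.
    """
    if not conf.get("enable_bots_blocking"):
        return []  # nothing is required while the feature is off
    return [key for key in REQUIRED_WHEN_ENABLED if conf.get(key) in (None, "")]
```

An empty list means every setting the article names is present. Any returned names point to values that still need to be filled in.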
Update your devlove.yaml so that it contains the humanity_validator_host and humanity_validator_port as the bots consultant.
In the example below we are using the combination 127.0.0.1:8080 for host and port:
---
shimmercat-devlove:
  domains:
    elec vlad-accelerator.shimmercat.com:
      root-dir: www
      views-dir: views-dir
      change-url:
        - / -> /target/+common/
      consultants:
        bots:
          connect-to: 127.0.0.1:8080
          application-protocol: http
We do this so that ShimmerCat knows which service to call when a suspected bot makes a request to your website. Now run the sc_pack supervisord command to ensure the changes propagate to the supervisor.conf. After the bot-blocking challenge page is enabled, you should find a line like the one below in your supervisor/supervisor.conf file:
command=/srv/test-accelerator.shimmercat.com/venv/bin/python3 \
/srv/test-accelerator.shimmercat.com/venv/lib/python3.5/site-packages/humanity_validator/server.py \
--host localhost \
--port 8096 \
--site-key xxxx...xxx \
--site-secret xxxx...xxx
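To compare the address in that line with the connect-to value of the bots consultant, the host and port can be pulled out of the command. This is a minimal sketch, assuming the command uses the --host and --port flags exactly as in the sample above. The helper name validator_address is hypothetical, and the idea that both addresses should point at the same service is inferred from the instructions in this article.

```python
import shlex


def validator_address(command: str) -> tuple:
    """Extract (host, port) from a humanity_validator command line."""
    # Join the backslash-newline continuations into a single line first.
    tokens = shlex.split(command.replace("\\\n", " "))
    # Map every token to the token that follows it, so that each flag
    # points at its value.
    options = dict(zip(tokens, tokens[1:]))
    return options["--host"], int(options["--port"])
```

The returned pair can then be checked by hand against the host:port written in devlove.yaml.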
And with that, everything is ready to test! Open a browser, visit https://example-accelerator.shimmercat.com:4043/.well-known/shimmercat/bot-blocking/?wants=/ and defeat the challenge!
We ask you to visit this specific URL for two reasons. First, the Accelerator has not added any suspected-bot IP addresses to the deployment you have used for these tutorials. Second, we hope you are not a bot if you are reading this. In practice, when a suspected bot tries to access any URL on your website, it is redirected to the challenge URL shown above and has to defeat the CAPTCHA challenge there.
The humanity validator page we use by default can be replaced with a different one. If you decide to do that, just contact us at ops@shimmercat.com or through our ticket system, and we will help you with it.