diff --git a/README.md b/README.md
index 278863f..5245a9a 100644
--- a/README.md
+++ b/README.md
@@ -9,13 +9,13 @@ Despite the deployment of botlists, limiters, and manually blocking the most obv
 By combining the captcha system with a cookie, only once in 90 days the user has to solve the captcha. A small price to pay for access to a wonderfull instance!
 
 # Features
-- No modification of the SearxNG code is necessary.
-- The captcha must be solved the first time and stays valid for 90 days.
+- No modification of the SearxNG code is necessary; the captcha system runs entirely within Nginx.
+- The captcha, once solved, stays valid for 90 days.
 - No puzzles to solve, just a confirmation click.
 - It is possible and encouraged to self host the captcha system, so no information leaks to the outside world.
 - The privacy and security of SearxNG are maintained if a self hosted captcha system is used.
 - Optional automatic reporting to AbuseIPDB.
-- Optionally, Cloudflare Turnstile can be used as captcha provider.
+- Optionally, Cloudflare Turnstile can be used as captcha provider instead.
 - Everything is script based, no compilation is necessary.
 - In an emergency, all existing cookies can be invalidated at once.
 
@@ -28,7 +28,7 @@ I have not made an attempt to add subdirectories to this git, so you have to dow
 - Lua and some dependencies needs to be installed (apt install lua)
 - It is recommended to self host the captcha engine, see: [github.com/tiagozip/cap](https://github.com/tiagozip/cap).
 - A site/secret key set and URL/API-key for the captcha engine.
-- Optionally, an API key at AbuseIPDB.
+- Optionally, an API key for AbuseIPDB.
 
 ## 00-captcha-init.conf
 *This file resides on my system in "/etc/nginx/conf.d".*
@@ -40,14 +40,16 @@ It configures lua and also creates an extended log format. This log format is op
 
 The COOKIE_SECRET must be generated with "openssl rand -hex 32"
 
-COOKIE_SECRET=
-ABUSEIPDB_API_KEY=<*Optional! obtain a key at abuseipdb for automated bot reporting, or leave empty for no bot reporting*>
-CAP_API_URL=https://captcha.thefloatinglab.world # *self hosted site*
-CAP_SITE_KEY=1a9933aa22
-CAP_SECRET_KEY=sk-TF8Gn4KKMSC0h46j83AqZWNnga6nlc5v4hoHwn7nE
+~~~
+COOKIE_SECRET =
+ABUSEIPDB_API_KEY = <*Optional! obtain a key at abuseipdb for automated bot reporting, or leave empty for no bot reporting*>
+CAP_API_URL = https://captcha.thefloatinglab.world # *self hosted site*
+CAP_SITE_KEY = 1a9933aa22 # Example, change this!
+CAP_SECRET_KEY = sk-TF8Gn4KKMSC0h46j83AqZWNnga6nlc5v4hoHwn7nE # Example, change this!
 \# *Leave the CAP entries empty to use the Turnstile captcha.*
-TURNSTILE_SITE_KEY=0x4AAAAAADisco1ig4Qu4hPJ
-TURNSTILE_SECRET_KEY=0x4AAAAAADisca-OEq9hnPskVM6G57pTXsM
+TURNSTILE_SITE_KEY = 0x4AAAAAADisco1ig4Qu4hPJ # Example, change this!
+TURNSTILE_SECRET_KEY = 0x4AAAAAADisca-OEq9hnPskVM6G57pTXsM # Example, change this!
+~~~
 
 ## captcha.conf
 *This file is in my /etc/nginx/snippets directory.*
@@ -62,50 +64,40 @@ This file contains the core of the code.
 
 ## SearxNG vhost
 You have to modify your nginx searxng vhost file to run the captcha.
-The lines in bold are the additions.
+- Add the optional line "access_log /var/log/nginx/searx.access.log ts;" for enhanced logging features.
+- Add the required line "include snippets/captcha.conf;"
+- Add the line "access_by_lua_block { require("captcha").guard() }" in every "location" block that needs protection.
+- Add "location" blocks for locations that should not be protected, such as the ones used for health checks/monitoring.
+- Duplicate your root location "/" to "/searxng/" to catch the bots that enter from there.
 
 You likely have no "location" for "/searxng/stats" yet, but if you use a health checker or monitor bot on the /stats directory, you can add it without the reference to the captcha system, so it remains accessible without the need for solving the captcha first.
 
-Most bots search by using "/?q=" but some also from "/searxng/?q=". So both locations must be listed here.
+Most bots search by using "/?q=" but some also from "/searxng/?q=". So both locations should be listed here.
 
 ~~~
-    **access_log /var/log/nginx/searx.access.log ts**; # <-- Optional! The "ts" suffix indicates the extended log format so captcha status is shown.
+    access_log /var/log/nginx/searx.access.log ts; # <-- Optional! The "ts" suffix indicates the extended log format so captcha status is shown.
 
-    **include snippets/captcha.conf;** # <-- REQUIRED!
+    include snippets/captcha.conf; # <-- REQUIRED!
 
     # Add this location if you want to keep /searxng/stats captcha free. A reason to do this is that you might have it checked by a monitor bot (uptime).
     location = /searxng/stats {
         proxy_pass http://127.0.0.1:8886;
-        proxy_set_header Host $host;
-        proxy_set_header Connection $http_connection;
-        proxy_set_header X-Scheme $scheme;
-        proxy_set_header X-Script-Name /searxng;
-        proxy_set_header X-Real-IP $remote_addr;
-        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
     }
 
     # You need to mention this location specifically to catch the bots that do not search via the root but via /searxng.
     location /searxng/ {
-        **access_by_lua_block { require("captcha").guard() }** # <-- Add this!
+        access_by_lua_block { require("captcha").guard() } # <-- Add this!
         proxy_pass http://127.0.0.1:8886;
-        proxy_set_header Host $host;
-        proxy_set_header Connection $http_connection;
-        proxy_set_header X-Scheme $scheme;
-        proxy_set_header X-Script-Name /searxng;
-        proxy_set_header X-Real-IP $remote_addr;
-        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
     }
 
     location / {
-        **access_by_lua_block { require("captcha").guard() }** # <-- Add this!
+        access_by_lua_block { require("captcha").guard() } # <-- Add this!
         proxy_pass http://127.0.0.1:8886;
-        proxy_set_header Host $host;
-        proxy_set_header Connection $http_connection;
 ~~~
 
 # Things worth knowing
 - Single-use tokens + your 90-day cookie. Both providers issue tokens that are good for one verify call, after which your cookie carries the user. The cookie is provider-agnostic, so an existing __ts_verified cookie continues to work after you switch providers — if the same COOKIE_SECRET is still in the env file. Rotating that secret invalidates all passes regardless of who issued them.
-- I'm not affiliated in any way with the CAP self hosted captcha provider, but it looks like a sound project to me. You can fall back on Cloudflare Turnstile if you have more confident in them, but beware that they do some logging and analysis which partly defeats the purpose of SearxNG.
+- I'm not affiliated in any way with the CAP self hosted captcha provider, but it looks like a sound project to me. You can fall back on Cloudflare Turnstile if you have more confidence in them, but beware that they do some logging and analysis which partly defeats the purpose of SearxNG.
 
 ## Logging
 - You will not see everything in your logs! Bots are immediately redirected to the captcha system, before an entry in the nginx log is made. Many bots are not even capable of properly interfacing with this redirection and simply nevere make it to the captcha, and vanish without leaving a trail.
@@ -120,7 +112,8 @@ Most bots search by using "/?q=" but some also from "/searxng/?q=". So both loca
 - Performance & UX. Cap's PoW is invisible-style: the user clicks one checkbox, then watches a brief spinner. Solve time depends on the client device (Cap reports a default-difficulty solve at roughly 2–3s on modern hardware) — much snappier than image puzzles, but slightly more "interactive" than Turnstile's typical zero-click case.
 
 ## AbuseIPDB
-- Threshold of 10 per two hours** is a reasonable default but tweak WALKAWAY_THRESHOLD and WALKAWAY_TTL to taste. With WALKAWAY_TTL = 3600, the counter auto-expires after an hour of silence, so a slow trickle never builds up.
+- Reporting to AbuseIPDB is not just for others, it benefits you too! Abusers keep their own lists, and you might end up on their "sites to avoid because they report" list, which might carry over to other services on your site(s) as well.
+- Threshold of 10 per two hours is a reasonable default but tweak WALKAWAY_THRESHOLD and WALKAWAY_TTL to taste. With WALKAWAY_TTL = 3600, the counter auto-expires after an hour of silence, so a slow trickle never builds up. I have my threshold set to two hours.
 - One report per IP per 15 minutes. The ts_reported:add() with REPORT_COOLDOWN makes sure you don't spam AbuseIPDB if a botnet member keeps hitting you. Free tier caps at 1000 reports/day; with this design you'd need ~700 distinct repeat-offender IPs/day to come close.
 - Behind a CDN / reverse proxy? ngx.var.remote_addr would be the proxy's IP, not the client's. Either configure ngx_http_realip_module (set_real_ip_from/real_ip_header X-Forwarded-For) so $remote_addr reflects the real client, or change the calls to ngx.var.http_x_forwarded_for (and parse out the first hop yourself). Don't ship to AbuseIPDB without verifying which IP you're sending — reporting your CDN's IP would be embarrassing.
 - Reset on solve is per-IP. If a real human eventually solves from the same NAT/IP, the counter clears for the whole address. Good for shared exits.
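
For the "Behind a CDN / reverse proxy" note above, a minimal sketch of the ngx_http_realip_module setup might look like the fragment below. The range 203.0.113.0/24 is a placeholder assumption and not taken from this setup; replace it with the addresses your proxy or CDN actually connects from. The directives can go in the http or server context.

~~~nginx
# Assumption: the reverse proxy / CDN connects from this range (placeholder, change this!).
set_real_ip_from 203.0.113.0/24;

# Take the client address from the X-Forwarded-For header sent by the trusted proxy.
real_ip_header X-Forwarded-For;

# Skip trusted addresses in the header and use the last untrusted one as $remote_addr.
real_ip_recursive on;
~~~

With this in place, $remote_addr (and therefore ngx.var.remote_addr in Lua) should carry the client address instead of the proxy's. It is worth checking a few entries in the access log before enabling AbuseIPDB reporting.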