Enhanced the README file, added picture
parent f95517d3a1
commit 40a1fa1e42
2 changed files with 15 additions and 11 deletions
26
README.md
@@ -8,13 +8,15 @@ My SearxNG instance on [searx.thefloatinglab.world](https://searx.thefloatinglab
 Despite the deployment of botlists, limiters, and manually blocking the most obvious bots, it still remained an endless battle. Most of the time my instance was useless. So either I had to give up on this project, or find a way to block the bots and let the genuine users through. A captcha system might not be popular, but on the other hand, a useless site is, well, pretty useless.
 By combining the captcha system with a cookie, the user has to solve the captcha only once every 90 days. A small price to pay for access to a wonderful instance!

 # Features
 - No modification of the SearxNG code is necessary; the captcha system runs entirely within Nginx.
 - The captcha, once solved, stays valid for 90 days.
 - No puzzles to solve, just a confirmation click.
 - It is possible and encouraged to self host the captcha system, so no information leaks to the outside world.
 - The privacy and security of SearxNG are maintained if a self hosted captcha system is used.
-- Optional automatic reporting to AbuseIPDB.
+- Optional automatic reporting to [AbuseIPDB](https://www.abuseipdb.com).
 - Optionally, Cloudflare Turnstile can be used as captcha provider instead.
 - Everything is script based, no compilation is necessary.
 - In an emergency, all existing cookies can be invalidated at once.
@@ -28,7 +30,7 @@ I have not made an attempt to add subdirectories to this git, so you have to dow
 - Lua and some dependencies need to be installed (apt install lua)
 - It is recommended to self host the captcha engine, see: [github.com/tiagozip/cap](https://github.com/tiagozip/cap).
 - A site/secret key set and URL/API-key for the captcha engine.
-- Optionally, an API key for AbuseIPDB.
+- Optionally, an API key for [AbuseIPDB](https://www.abuseipdb.com).

 ## 00-captcha-init.conf
 *This file resides on my system in "/etc/nginx/conf.d".*
@@ -43,12 +45,14 @@ The COOKIE_SECRET must be generated with "openssl rand -hex 32"
 ~~~
 COOKIE_SECRET = <generate your own key>
 ABUSEIPDB_API_KEY = <*Optional! obtain a key at abuseipdb for automated bot reporting, or leave empty for no bot reporting*>
-CAP_API_URL = https://captcha.thefloatinglab.world # *self hosted site*
-CAP_SITE_KEY = 1a9933aa22 # Example, change this!
-CAP_SECRET_KEY = sk-TF8Gn4KKMSC0h46j83AqZWNnga6nlc5v4hoHwn7nE # Example, change this!
-\# *Leave the CAP entries empty to use the Turnstile captcha.*
-TURNSTILE_SITE_KEY = 0x4AAAAAADisco1ig4Qu4hPJ # Example, change this!
-TURNSTILE_SECRET_KEY = 0x4AAAAAADisca-OEq9hnPskVM6G57pTXsM # Example, change this!
+# Enter here the url of your self hosted CAP captcha provider.
+CAP_API_URL = https://captcha.thefloatinglab.world
+# Enter here your own keys:
+CAP_SITE_KEY = 1a9933aa22
+CAP_SECRET_KEY = sk-TF8Gn4KKMSC0h46j83AqZWNnga6nlc5v4hoHwn7nE
+# Leave the CAP entries empty to use the Turnstile captcha instead.
+TURNSTILE_SITE_KEY = 0x4AAAAAADisco1ig4Qu4hPJ
+TURNSTILE_SECRET_KEY = 0x4AAAAAADisca-OEq9hnPskVM6G57pTXsM
 ~~~

 ## captcha.conf
|
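The hunk above states that COOKIE_SECRET must be generated with `openssl rand -hex 32`, which yields 32 random bytes as 64 hex characters. On a machine without the openssl CLI, a comparable value could be produced with Python's standard library. This is only a sketch: the helper names `new_cookie_secret` and `looks_like_cookie_secret` are hypothetical and not part of this project, which itself runs on Nginx and Lua.

```python
import re
import secrets

# 64 lowercase hex characters, the shape `openssl rand -hex 32` prints.
_HEX64 = re.compile(r"[0-9a-f]{64}")


def new_cookie_secret() -> str:
    """Return 32 cryptographically random bytes encoded as 64 hex characters."""
    return secrets.token_hex(32)


def looks_like_cookie_secret(value: str) -> bool:
    """Check the shape only; this cannot tell whether the value is actually random."""
    return _HEX64.fullmatch(value) is not None


if __name__ == "__main__":
    # Paste the printed line into 00-captcha-init.conf.
    print(f"COOKIE_SECRET = {new_cookie_secret()}")
```

The shape check is handy for catching a config file that still contains the `<generate your own key>` placeholder.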
@@ -100,7 +104,7 @@ Most bots search by using "/?q=" but some also from "/searxng/?q=". So both loca
 - I'm not affiliated in any way with the CAP self hosted captcha provider, but it looks like a sound project to me. You can fall back on Cloudflare Turnstile if you have more confidence in them, but beware that they do some logging and analysis which partly defeats the purpose of SearxNG.

 ## Logging
-- You will not see everything in your logs! Bots are immediately redirected to the captcha system, before an entry in the nginx log is made. Many bots are not even capable of properly interfacing with this redirection and simply nevere make it to the captcha, and vanish without leaving a trail.
+- You will not see everything in your logs! Bots are immediately redirected to the captcha system, before an entry in the nginx log is made. Many bots are not even capable of properly interfacing with this redirection and simply never make it to the captcha, and vanish without leaving a trail.
 - You will see a sharp decline in bots. This is not a malfunction but the intention. Some bots learn quickly, and getting listed in AbuseIPDB doesn't encourage them. It looks like they are coded to detect reporting, or some bot owners might receive automated notifications if they get listed, but one way or the other, they avoid sites that put and keep them on public blacklists.

 ## Self hosted CAP
@@ -112,7 +116,7 @@ Most bots search by using "/?q=" but some also from "/searxng/?q=". So both loca
 - Performance & UX. Cap's PoW is invisible-style: the user clicks one checkbox, then watches a brief spinner. Solve time depends on the client device (Cap reports a default-difficulty solve at roughly 2–3s on modern hardware), which is much snappier than image puzzles, but slightly more "interactive" than Turnstile's typical zero-click case.

 ## AbuseIPDB
-- Reporting to AbuseIPDB is not just for others but it benefits you too! Abusers have their own lists, and you might end up on their lists for "sites to avoid because they report" and it might carry over to other services on your site(s) as well.
+- Reporting to [AbuseIPDB](https://www.abuseipdb.com) is not just for others but it benefits you too! Abusers have their own lists, and you might end up on their lists for "sites to avoid because they report" and it might carry over to other services on your site(s) as well.
 - A threshold of 10 per two hours is a reasonable default, but tweak WALKAWAY_THRESHOLD and WALKAWAY_TTL to taste. With WALKAWAY_TTL = 3600, the counter auto-expires after an hour of silence, so a slow trickle never builds up. I have my threshold set to two hours.
 - One report per IP per 15 minutes. The ts_reported:add() with REPORT_COOLDOWN makes sure you don't spam AbuseIPDB if a botnet member keeps hitting you. Free tier caps at 1000 reports/day; with this design you'd need ~700 distinct repeat-offender IPs/day to come close.
 - Behind a CDN / reverse proxy? ngx.var.remote_addr would be the proxy's IP, not the client's. Either configure ngx_http_realip_module (set_real_ip_from/real_ip_header X-Forwarded-For) so $remote_addr reflects the real client, or change the calls to ngx.var.http_x_forwarded_for (and parse out the first hop yourself). Don't ship to AbuseIPDB without verifying which IP you're sending; reporting your CDN's IP would be embarrassing.
|
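The last bullet of the hunk above mentions parsing the first hop out of X-Forwarded-For yourself. The logic is sketched here in Python purely for illustration; the project itself does this in Lua inside Nginx, and the function name `first_forwarded_hop` is hypothetical. The sketch assumes the usual comma-separated header form, with the original client listed first.

```python
import ipaddress
from typing import Optional


def first_forwarded_hop(header_value: Optional[str]) -> Optional[str]:
    """Return the first address in an X-Forwarded-For value, or None.

    None is returned when the header is missing, empty, or when its first
    entry is not a valid IPv4/IPv6 address, so nothing bogus gets reported.
    """
    if not header_value:
        return None
    # "client, proxy1, proxy2" -> "client"
    first = header_value.split(",", 1)[0].strip()
    try:
        return str(ipaddress.ip_address(first))
    except ValueError:
        return None


if __name__ == "__main__":
    # Documentation-range address followed by two private proxy hops.
    print(first_forwarded_hop("203.0.113.7, 10.0.0.2, 10.0.0.1"))
```

Keep in mind that the first entry is supplied by the client and can be forged unless your own proxy overwrites the header, which is one reason the ngx_http_realip_module route with an explicit set_real_ip_from is the safer of the two options.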
@@ -124,6 +128,6 @@ Most bots search by using "/?q=" but some also from "/searxng/?q=". So both loca
 - Audit trail. Every report logs to error.log at notice level with the IP and the count, so you can grep 'reported.*AbuseIPDB' /var/log/nginx/error.log | wc -l for a daily tally. If you want richer accounting (which paths the bot hit, user-agent, ASN), you can pass them through to the timer and stitch them into the comment field; AbuseIPDB shows the comment verbatim on the IP's public page.

 # License
-See the license file.
+See the [license file](LICENSE).
 The original of this project can be found at [git.thefloatinglab.world/TheFloatingLab/SearxNG-Captcha](https://git.thefloatinglab.world/TheFloatingLab/SearxNG-Captcha) which is part of [www.thefloatinglab.world](https://www.thefloatinglab.world).