Reverse Engineering Vercel's BotID

81 hazebooth 13 6/30/2025, 12:19:45 PM nullpt.rs

Comments (13)

ATechGuy · 3h ago

> At the moment, it seems Basic mode is so basic that it allows everything to pass as human. That’ll likely change as they gather more telemetry to better identify what a bot signal looks like.

So they are basically collecting telemetry in the name of a "free basic anti-bot" solution.

cchance · 2h ago

free basic anti-bot solution that literally NEVER BLOCKS A BOT, like what the actual fuck

codedokode · 6h ago

Note that the bot detection script uses WebGL to obtain the GPU name. I assume this (fingerprinting) is the most popular use of WebGL. Sad that independent browsers like Firefox do not supply fake values.

nullpt_rs · 6h ago

Sadly, spoofing GPU vendor & renderer can be an even larger flag since they can hash the resulting image of the canvas to compare it with a database of collected fingerprints[0]

[0]: https://research.google/pubs/picasso-lightweight-device-clas...
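
A minimal sketch of that cross-check, assuming the server already holds the raw pixel bytes of a rendered canvas challenge and a table of previously collected hashes. The names `canvas_hash`, `renderer_looks_spoofed`, and the `known` table are hypothetical, and the plain SHA-256 here is a stand-in, not necessarily what the linked paper uses:

```python
import hashlib
from typing import Mapping, Set


def canvas_hash(pixels: bytes) -> str:
    """Hash the raw canvas pixel bytes into a stable fingerprint."""
    return hashlib.sha256(pixels).hexdigest()


def renderer_looks_spoofed(
    pixels: bytes,
    claimed_renderer: str,
    known: Mapping[str, Set[str]],
) -> bool:
    """Return True when the canvas hash is known but was never seen
    together with the renderer string the client reports.

    `known` maps a canvas hash to the set of renderer strings that
    previously produced it (hypothetical, illustrative database).
    """
    seen_renderers = known.get(canvas_hash(pixels))
    if seen_renderers is None:
        # Unknown hash: no evidence either way.
        return False
    return claimed_renderer not in seen_renderers


# Illustrative data only: one fake pixel buffer and one renderer name.
sample_pixels = bytes([0, 17, 34, 255] * 4)
known_db = {canvas_hash(sample_pixels): {"Example GPU A"}}
```

With this table, a client that renders `sample_pixels` but reports "Example GPU B" is flagged, which is why a spoofed renderer string can stand out more than an honest one.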

reaperducer · 5h ago

Until a major player gets on board. Then it works.

Apple does this by sending an imposter user agent from Safari on iPads.

If only that was expanded to iPhones, too. And then send rotating, or randomized user agents.

nerdsniper · 5h ago

Apple does it because they don’t have a vested financial interest in internet-wide tracking.

Google does.

And while Mozilla does too because the vast majority of their funding comes from Google, it’s more pertinent that they don’t have the market share to pull this off. Firefox would just stop working on major websites if they did this.

andrewmcwatters · 5h ago

It’s funny that trying to click on the Google Scholar link there falsely identifies me as a bot.

b0a04gl · 3h ago

why is bot detection even happening at render time instead of request time? why can't they tell you're a bot from your headers, UA, IP, TLS fingerprint? imo this makes it surveillance: 'you're a bot, ok, but don't just go away, let's fingerprint your GPU and assign you a behavioral risk score anyway'

n2d4 · 3h ago

It's really hard to detect it at request time. It's practically trivial for an attacker to fake headers to resemble a real browser.

baby_souffle · 1h ago

You absolutely have options at request time. Arguably, some of the things you can only do at request time are part of a full and complete mitigation strategy.

You can fingerprint the originating TCP stack with some degree of confidence. If the request looks like it came from a Linux server but the user agent says Windows, that's a signal.
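
A rough sketch of that kind of mismatch check, assuming the only TCP/IP signal available is the observed IP TTL of the incoming packet. Real passive fingerprinting looks at many more fields; the common initial TTL values (64 for Linux and macOS, 128 for Windows) and the function names here are simplifying assumptions:

```python
# Common initial TTL values used by mainstream operating systems.
INITIAL_TTLS = (64, 128, 255)


def guess_initial_ttl(observed_ttl: int) -> int:
    """Round the observed TTL up to the nearest common initial TTL,
    since each router hop decrements it by one."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    for ttl in INITIAL_TTLS:
        if observed_ttl <= ttl:
            return ttl
    raise AssertionError("unreachable")


def ttl_ua_mismatch(observed_ttl: int, user_agent: str) -> bool:
    """Return True when the TTL-implied OS family contradicts the
    OS family claimed in the User-Agent header."""
    initial = guess_initial_ttl(observed_ttl)
    claims_windows = "Windows NT" in user_agent
    if claims_windows and initial == 64:
        return True  # looks like Linux/macOS, claims Windows
    if not claims_windows and initial == 128:
        return True  # looks like Windows, claims something else
    return False  # consistent, or not enough information


WINDOWS_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
LINUX_UA = "Mozilla/5.0 (X11; Linux x86_64)"
```

A TTL of 52 with `WINDOWS_UA` would be flagged, while a TTL of 116 with the same header would not. On its own this is only a weak signal, since proxies and VPNs rewrite the TTL.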

Likewise, the IP address making the request has geographic information associated with it. If my IP address says I'm in Romania but my browser is asking for the English language version of the page... That's a signal.

Similar to basic IP/Geo, you can do DNS- and STUN-based profiling, too. This helps you catch people that are behind proxies or VPNs.

To blur the line, you can use JavaScript to measure request timing. Proxies that are going to tamper with the request to hide its origins or change its fingerprint will add a measurable latency.

cAtte_ · 2m ago

> If my IP address says I'm in Romania but my browser is asking for the English language version of the page... That's a signal.

jesus christ don't give them ideas. it's annoying enough to have my country's language forced on me (i prefer english) when there's a perfectly good http header for that. now blocking me based on this?!

n2d4 · 43m ago

None of these are conclusive by any means. The IP address check you mentioned would mark anyone using a VPN, or English speakers living abroad. Modern bot detection combines lots of heuristics like these together, and being able to run JavaScript in the browser (at render-time) adds a lot more data that can be used to make a better prediction.
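
As a toy illustration of combining weak signals, here is a sketch that adds up per-signal weights and only calls something a bot past a threshold. The signal names, weights, and threshold are all made up for the example; real systems presumably use trained models rather than hand-picked numbers:

```python
from typing import Mapping

# Hypothetical weights out of 100; no single weak signal is decisive.
WEIGHTS = {
    "os_ua_mismatch": 40,
    "geo_language_mismatch": 10,
    "datacenter_ip": 30,
    "headless_js_hints": 50,
}


def bot_score(signals: Mapping[str, bool]) -> int:
    """Sum the weights of every signal that fired, capped at 100.
    Unknown signal names are ignored."""
    total = sum(
        WEIGHTS[name] for name, fired in signals.items()
        if fired and name in WEIGHTS
    )
    return min(total, 100)


def classify(signals: Mapping[str, bool], threshold: int = 60) -> str:
    """Label a request based on its combined score."""
    if bot_score(signals) >= threshold:
        return "likely_bot"
    return "likely_human"
```

Under these made-up weights, an English speaker abroad who trips only `geo_language_mismatch` scores 10 and passes, while a request that trips both `os_ua_mismatch` and `datacenter_ip` scores 70 and gets flagged.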

indrora · 1h ago

Anubis does it pretty decently.