Adding lookbehinds to rust-lang/regex

56 emschwartz 15 7/15/2025, 3:37:15 PM systemf.epfl.ch

Comments (15)

RadiozRadioz · 2h ago
From a user perspective, this is extremely valuable. What an amazing improvement, especially the unbounded part. I do hope this makes it into actual RE2 and Go.

When I use regex, I expect to be able to lookbehind, so I am routinely hit by RE2's limitations in places where it's used. Sometimes the software uses the entire matched string and you can't use non-capturing groups to work around it.
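A minimal sketch of that last point, using Python's `re` module purely for illustration (the sample string and variable names are made up here). With a lookbehind the whole match is exactly the text you want; with the capture-group workaround the whole match still includes the prefix, which is a problem when the calling software only looks at the whole match:

```python
import re

text = "total: $42"

# With a lookbehind, the "$" is checked but not consumed,
# so the whole match is just the digits.
with_lookbehind = re.search(r"(?<=\$)\d+", text)
print(with_lookbehind.group(0))  # 42

# Without a lookbehind, the "$" has to be consumed by the pattern.
# The digits are only reachable through capture group 1.
with_group = re.search(r"\$(\d+)", text)
print(with_group.group(0))  # $42
print(with_group.group(1))  # 42
```

The first pattern is the kind that an engine without lookbehind support, such as Go's `regexp`, rejects at compile time.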

I understand Go's reasons, ReDoS etc., but the "purism" of RE2 does fly in the face of practicality to an irksome degree. This is not uncommon for Go.

masklinn · 6m ago
The authors’ previous article (linked in this one) was about doing this in re2 (https://systemf.epfl.ch/blog/re2-lookbehinds/), and they have a fork with those changes though I don’t know that they have a PR.

> the "purism" of RE2 does fly in the face of practicality to an irksome degree

It’s not purism tho. There are very practical reasons to want an FA-based engine, and if you compromise that to get additional features then the engine is pointless, you could have just used a backtracking engine in the first place.
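A toy illustration of the difference, not a description of how RE2 or the `regex` crate are actually implemented. It assumes a hand-built NFA for the anchored pattern `(a|aa)*`; the names `TRANSITIONS`, `backtrack`, and `simulate` are hypothetical. The naive backtracking walk tries every path through the NFA one at a time, while the state-set simulation advances all live states together and does one step per input character:

```python
# Toy NFA for the anchored pattern (a|aa)*.
# State 0 is both the start state and the accepting state.
# State 1 means "the first a of an aa pair has been read".
TRANSITIONS = {(0, "a"): [0, 1], (1, "a"): [0]}
ACCEPTING = {0}

def backtrack(text, counter, pos=0, state=0):
    """Naive backtracking: explore one path at a time, counting calls."""
    counter[0] += 1
    if pos == len(text):
        return state in ACCEPTING
    for nxt in TRANSITIONS.get((state, text[pos]), []):
        if backtrack(text, counter, pos + 1, nxt):
            return True
    return False

def simulate(text):
    """State-set simulation: track every reachable state at once."""
    states = {0}
    steps = 0
    for ch in text:
        steps += 1
        states = {n for s in states for n in TRANSITIONS.get((s, ch), [])}
    return bool(states & ACCEPTING), steps

text = "a" * 20 + "b"  # cannot match: the trailing "b" has no transition

calls = [0]
matched_bt = backtrack(text, calls)
matched_sim, steps = simulate(text)

print(matched_bt, calls[0])  # no match, after tens of thousands of calls
print(matched_sim, steps)    # no match, after one step per character
```

Real backtracking engines are far smarter than this naive walk, but the set-based approach is what gives a bound that does not depend on the pattern being well behaved.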

hnlmorg · 1h ago
The point of standard libraries is to provide sane default behaviours. Go’s regexp package is a sensible default.

For instances where you need something more sophisticated than what’s in the standard library, you reach for 3rd party modules. And there are regex libraries for Go which support backtracking et al.

There are definitely some irksome defaults in Go, but the choice of regex engine in the regexp library isn’t one of them.

hu3 · 1h ago
Interesting. I have used look behind before without knowing their specifics. AI generated a regex and unit tests passed so I carried on with life.

Searching for a simple explanation of how it works, I found this which also explains negative look behind and look ahead. TIL:

https://www.phptutorial.net/php-tutorial/regex-lookbehind/
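For a quick feel of the four lookaround forms, here is a small sketch in Python's `re` module (chosen only for illustration; the sample sentence is made up). Each assertion checks the text next to the match without making it part of the match:

```python
import re

text = "paid $30 for 2 items"

# Positive lookbehind: digits directly preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", text)
print(after_dollar)  # ['30']

# Negative lookbehind: whole numbers NOT directly preceded by "$".
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)
print(not_after_dollar)  # ['2']

# Positive lookahead: digits directly followed by " items".
before_items = re.findall(r"\d+(?= items)", text)
print(before_items)  # ['2']

# Negative lookahead: whole numbers NOT directly followed by " items".
not_before_items = re.findall(r"\b\d+\b(?! items)", text)
print(not_before_items)  # ['30']
```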

singron · 2h ago
I don't think there is any discussion of the snort-2 and snort-3 benchmarks, where the linear engine for once handily beats Python's re (70-80x faster). I'm guessing they are cases where backtracking is painfully quadratic in re, but it would have been nice to hear about those successes. [In the rest of the benchmarks, Python's re is 2-5x faster.]

LegionMammal978 · 1h ago
> However, as a downside our lookbehinds do not support containing capture groups which are a feature allowing to extract a substring that matched a part of the regex pattern.

I wonder in what situation someone would even be tempted to put a capture group into a lookbehind expression, except unintentionally by using () instead of (?:) for grouping. Maybe in an attempt to obtain capture groups from overlapping matches? But even in that case, lookaheads would be clearer, when available.
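Both cases can be sketched in Python's `re` module, which does accept capture groups inside fixed-width lookbehinds (the inputs below are made up for illustration):

```python
import re

# A capture group inside a fixed-width lookbehind: the group is filled in
# even though its text is not part of the overall match.
m = re.search(r"(?<=(\w)=)\d+", "x=42")
print(m.group(0), m.group(1))  # 42 x

# Overlapping two-letter windows via a capturing lookahead: the match itself
# is empty, so the scan advances one character at a time.
pairs = re.findall(r"(?=(\w\w))", "abcd")
print(pairs)  # ['ab', 'bc', 'cd']
```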

CJefferson · 2h ago
Great! I enjoyed reading through, and I'm going to come back later and read a little more carefully.

If anyone knows (to let me be lazy), is this the same regex engine used by ripgrep? Or is that an independent implementation?

flaghacker · 2h ago
Yes, the `regex` crate is also the regex engine used by ripgrep; both were developed by https://github.com/burntsushi.

cbarrick · 2h ago
Same engine as ripgrep

d3m0t3p · 2h ago
Nice to see a master's thesis highlighted on the research group's page

librasteve · 1h ago
It’s odd to see such a widely adopted language as Rust only just getting some regex basics. Raku (https://raku.org), by contrast, has taken a strong step forward in regex syntax over PCRE. It was made by the same language designer and implements modern Unicode-savvy features such as Grapheme and Diacritic handling, which are essential to building consistent code for multilingual needs.

  say "Cool" ~~ /<:Letter>* <:Block("Emoticons")>/; # 「Cool」
  say "Cześć" ~~ m:ignoremark/ Czesc /;               # 「Cześć」
  say "WEIẞE" ~~ m:ignorecase/ weisse /;              # 「WEIẞE」
  say "หนูแฮมสเตอร์" ~~ /<:Letter>+/;                    # 「หนูแฮมสเตอร์」

burntsushi · 58m ago
It's not only just getting some "regex basics." The `fancy-regex` crate has provided look-behind for years. The OP is about adapting look-behind to the linear time guarantee required by the `regex` crate.

My main focus for the `regex` crate has been on performance: https://github.com/BurntSushi/rebar

How does Raku's regex performance compare to Perl?

quotemstr · 4m ago
This right here is one of the foundational splits in the programming community. This article is all about how cool an _implementation_ is. This comment is about some other engine's cool _syntax_. Deep versus superficial. The two camps can't stand each other.

shawn_w · 34m ago
I don't think Philip Hazel, who wrote PCRE, has anything to do with Perl or Raku development.

librasteve · 1h ago
huh … guess HN blocks emojis