> However, as a downside our lookbehinds do not support containing capture groups which are a feature allowing to extract a substring that matched a part of the regex pattern.
I wonder in what situation someone would even be tempted to put a capture group into a lookbehind expression, except unintentionally by using () instead of (?:) for grouping. Maybe in an attempt to obtain capture groups from overlapping matches? But even in that case, lookaheads would be clearer, when available.
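For concreteness, here is a minimal sketch of both cases, using Python's backtracking `re` module purely as an illustration (it is not the engine under discussion, and the sample strings are made up). It shows a capture group inside a fixed-width lookbehind, and a capturing lookahead used to collect overlapping matches.

```python
import re

# A capture group inside a lookbehind. Python's `re` accepts this for
# fixed-width lookbehinds; the group records text that is not part of
# the overall match.
inside = re.search(r"(?<=(a))b", "ab")
lookbehind_groups = (inside.group(0), inside.group(1))

# Overlapping matches via a capturing lookahead: the match itself is
# empty, so the scan advances one character at a time while the group
# still records two digits at each position.
overlapping = re.findall(r"(?=(\d\d))", "1234")

print(lookbehind_groups)  # ('b', 'a')
print(overlapping)        # ['12', '23', '34']
```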
RadiozRadioz · 7m ago
From a user perspective, this is extremely valuable. What an amazing improvement, the unbounded part especially. I do hope this makes it into actual RE2 and Go.
When I use regex, I expect to be able to lookbehind, so I am routinely hit by RE2's limitations in places where it's used. Sometimes the software uses the entire matched string and you can't use non-capturing groups to work around it.
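A small sketch of the difference, written in Python only because its `re` module supports both forms (the input line is hypothetical): with a lookbehind the whole match is already the wanted text, while the capture-group workaround only helps if the calling software lets you read group 1 instead of the whole match.

```python
import re

line = "total=$42"  # hypothetical input

# With a lookbehind, the entire match excludes the "$" prefix.
with_lookbehind = re.search(r"(?<=\$)\d+", line).group(0)

# Without lookbehind, the prefix has to be part of the match, and the
# wanted text is only available through a capture group.
m = re.search(r"\$(\d+)", line)
whole_match = m.group(0)  # what a tool reading the whole match would see
captured = m.group(1)     # what a tool reading group 1 would see

print(with_lookbehind, whole_match, captured)  # 42 $42 42
```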
I understand Go's reasons, ReDoS etc., but the "purism" of RE2 does fly in the face of practicality to an irksome degree. This is not uncommon for Go.
singron · 11m ago
I don't see any discussion of the snort-2 and snort-3 benchmarks, where the linear engine for once handily beats Python's re (70-80x faster). I'm guessing those are cases where backtracking in re is painfully quadratic, but it would have been nice to hear about those successes. [In the rest of the benchmarks, Python's re is 2-5x faster.]
CJefferson · 15m ago
Great! I enjoyed reading through, and I'm going to come back later and read a little more carefully.
If anyone knows (to let me be lazy), is this the same regex engine used by ripgrep? Or is that an independent implementation?
flaghacker · 10m ago
Yes, the `regex` crate is also the regex engine used by ripgrep; both were developed by https://github.com/burntsushi.
cbarrick · 13m ago
Same engine as ripgrep
d3m0t3p · 9m ago
Nice to see a master's thesis highlighted on the research group's page.