Lookahead assertion: (?=...), (?!...)

A lookahead assertion "looks ahead": it attempts to match the subsequent input against the given pattern without consuming any of it. If the match succeeds, the current position in the input stays the same.

Syntax

regex
(?=pattern)
(?!pattern)

Parameters

pattern

A pattern consisting of anything you may use in a regex literal, including a disjunction.

Description

A regular expression generally matches from left to right. This is why lookahead and lookbehind assertions are so named: a lookahead asserts what is to the right of the current position, and a lookbehind asserts what is to the left.

For a (?=pattern) assertion to succeed, pattern must match the text immediately after the current position, and the current position is not changed. The (?!pattern) form negates the assertion: it succeeds if pattern does not match at the current position.
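A minimal sketch of the two forms. The input strings ("foobar", "foobaz") and variable names are made up for illustration. The sticky flag is used only because it makes lastIndex visible, which shows that the lookahead did not advance the position:

```javascript
// Positive lookahead: "foo" must be followed by "bar", but "bar" is not consumed.
const positive = /foo(?=bar)/y;
const positiveMatch = positive.exec("foobar");
console.log(positiveMatch[0]); // "foo"
console.log(positive.lastIndex); // 3, the position right after "foo"

// Negative lookahead: "foo" must NOT be followed by "bar".
const negative = /foo(?!bar)/;
console.log(negative.test("foobar")); // false
console.log(negative.test("foobaz")); // true
```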

The pattern can contain capturing groups. See the capturing groups page for more information on the behavior in this case.
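As a small illustration (the pattern and input here are assumptions, not taken from the capturing groups page), a group inside a lookahead records text that the rest of the pattern is still free to match:

```javascript
// The lookahead captures the leading digits without consuming them,
// so \w+ still starts matching from the same position.
const captureInLookahead = /(?=(\d+))\w+/.exec("123abc");
console.log(captureInLookahead[0]); // "123abc"
console.log(captureInLookahead[1]); // "123"
```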

Unlike other regular expression operators, there is no backtracking into a lookahead; this behavior is inherited from Perl. This only matters when pattern contains capturing groups and the pattern following the lookahead contains backreferences to those captures. For example:

js
/(?=(a+))a*b\1/.exec("baabac"); // ['aba', 'a']
// Not ['aaba', 'a']

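The matching of the pattern above happens as follows:

  1. The lookahead (a+) succeeds before the first "a" in "baabac", and "aa" is captured because the quantifier is greedy.
  2. a*b matches the "aab" in "baabac", because lookaheads do not consume the strings they match.
  3. \1 does not match the following string, because that requires two "a"s and only one is available. The matcher therefore backtracks, but it does not go into the lookahead, so the capturing group cannot be reduced to one "a", and the entire match fails at this position.
  4. exec() re-attempts matching at the next position, before the second "a". This time, the lookahead matches "a", and a*b matches "ab". The backreference \1 matches the captured "a", and the match succeeds.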

If the regex were able to backtrack into the lookahead and revise the choice made there, the match would succeed at step 3, with (a+) matching the first "a" (instead of the first two "a"s) and a*b matching "aab", without even re-attempting at the next input position.
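To see what that hypothetical outcome would look like, the sketch below replaces (a+) with (a), which forces the one-"a" capture that backtracking into the lookahead would have reached. The variant pattern is an illustration only and is not part of the original example:

```javascript
const input = "baabac";

// Actual behavior: the capture stays "aa" at index 1, so the match is found at index 2.
const actual = /(?=(a+))a*b\1/.exec(input);
console.log(actual.index, actual[0], actual[1]); // 2 "aba" "a"

// Forced one-"a" capture: the match is found at index 1 instead.
const forced = /(?=(a))a*b\1/.exec(input);
console.log(forced.index, forced[0], forced[1]); // 1 "aaba" "a"
```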

Negative lookaheads can contain capturing groups as well, but backreferences to them only make sense within pattern, because if matching continues past the assertion, pattern necessarily did not match (otherwise the assertion would have failed). This means that outside of pattern, backreferences to capturing groups in a negative lookahead always succeed. For example:

js
/(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac"); // ['baaabaac', 'ba', undefined, 'abaac']

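The matching of the pattern above happens as follows:

  1. The (.*?) pattern is non-greedy, so it starts by matching nothing. However, the next character in the pattern is a, which fails to match the "b" in the input.
  2. The (.*?) pattern matches "b", so that the a in the pattern matches the first "a" in "baaabaac".
  3. At this position, the pattern inside the lookahead matches, because if (a+) matches "aa", then (a+)b\2c matches "aabaac". This causes the negative assertion to fail, so the matcher backtracks.
  4. The (.*?) pattern matches "ba", so that the a in the pattern matches the second "a" in "baaabaac".
  5. At this position, the pattern inside the lookahead fails to match, because the remaining input does not follow the shape "any number of "a"s, then a "b", then the same number of "a"s, then a "c"". This causes the assertion to succeed.
  6. However, because nothing was matched inside the assertion, the \2 backreference has no value, so it matches the empty string. This causes the rest of the input to be consumed by the (.*) at the end.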
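The last step can be isolated in a much smaller pattern. This is a minimal sketch with a made-up pattern and input, showing that a group captured only inside a negative lookahead stays unset and that a backreference to it matches the empty string:

```javascript
// The lookahead body (a) does not match at "b", so the negative assertion
// succeeds and group 1 is left unset.
const unsetCapture = /(?!(a))\1b/.exec("b");
console.log(unsetCapture[0]); // "b"
console.log(unsetCapture[1]); // undefined
```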

Normally, assertions cannot be quantified. However, in Unicode-unaware mode, lookahead assertions can be quantified. This is a deprecated syntax kept for web compatibility, and you should not rely on it.

js
/(?=a)?b/.test("b"); // true; the lookahead is matched 0 times
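The difference between the two modes can be checked at run time. The sketch below assumes an engine that implements the web-compatibility grammar (browsers and Node.js do); compiles is a hypothetical helper written for this illustration:

```javascript
// Returns true if the source compiles with the given flags, false on SyntaxError.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(compiles("(?=a)?b", "")); // true: the legacy syntax is tolerated
console.log(compiles("(?=a)?b", "u")); // false: a quantified assertion is a SyntaxError
```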

Examples

Matching strings without consuming them

Sometimes it is useful to validate that the matched string is followed by something without returning that something as part of the result. The following example matches a string that is followed by a comma or period, but the punctuation is not included in the result:

js
function getFirstSubsentence(str) {
  return /^.*?(?=[,.])/.exec(str)?.[0];
}

getFirstSubsentence("Hello, world!"); // "Hello"
getFirstSubsentence("Thank you."); // "Thank you"

A similar effect can be achieved by capturing the submatch you are interested in.
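A sketch of that alternative, using a hypothetical function name. The punctuation is consumed here, so the full match includes it, and the result is read from the first capturing group instead:

```javascript
// Hypothetical capturing-group variant of getFirstSubsentence.
function getFirstSubsentenceByCapture(str) {
  // The punctuation is part of the match but excluded from group 1.
  return /^(.*?)[,.]/.exec(str)?.[1];
}

console.log(getFirstSubsentenceByCapture("Hello, world!")); // "Hello"
console.log(getFirstSubsentenceByCapture("Thank you.")); // "Thank you"
console.log(getFirstSubsentenceByCapture("No punctuation")); // undefined
```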

Pattern subtraction and intersection

Using lookahead, you can match the same string against several patterns in turn, which lets you express complex relationships such as subtraction (is X but not Y) and intersection (is both X and Y).
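Before the identifier examples, here is a deliberately small sketch of both ideas. The function names and the rules they check are invented for illustration:

```javascript
// Subtraction: a run of digits, but not one made only of zeros.
function isNonZeroDigits(str) {
  return /^(?!0+$)\d+$/.test(str);
}

// Intersection: only lowercase ASCII letters and digits, AND at least one of each.
function hasLetterAndDigit(str) {
  return /^(?=.*[a-z])(?=.*\d)[a-z\d]+$/.test(str);
}

console.log(isNonZeroDigits("007")); // true
console.log(isNonZeroDigits("000")); // false
console.log(hasLetterAndDigit("abc1")); // true
console.log(hasLetterAndDigit("abc")); // false
```

Each lookahead is anchored at the same starting position, so every condition is checked against the whole string before the final pattern consumes it.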

The following example matches any identifier that is not a reserved word (only three reserved words are shown here for brevity; more can be added to the disjunction). The [$_\p{ID_Start}][$\u200c\u200d\p{ID_Continue}]* syntax describes exactly the set of identifier strings in the language spec; you can read more about identifiers in lexical grammar and about the \p escape in Unicode character class escape.

js
function isValidIdentifierName(str) {
  const re =
    /^(?!(?:break|case|catch)$)[$_\p{ID_Start}][$\u200c\u200d\p{ID_Continue}]*$/u;
  return re.test(str);
}

isValidIdentifierName("break"); // false
isValidIdentifierName("foo"); // true
isValidIdentifierName("cases"); // true

The following example matches a string that is both ASCII and usable as an identifier part:

js
function isASCIIIDPart(char) {
  return /^(?=\p{ASCII}$)\p{ID_Continue}$/u.test(char);
}

isASCIIIDPart("a"); // true
isASCIIIDPart("α"); // false
isASCIIIDPart(":"); // false

If you are doing intersection and subtraction with finitely many characters, you may want to use the character set intersection syntax enabled with the v flag.
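A sketch of what that could look like for the ASCII identifier part check, assuming an engine with v flag support. makeAsciiIdPartRegex is a hypothetical helper; it uses the RegExp constructor so that an engine without the v flag fails at run time, where the error can be caught, instead of at parse time:

```javascript
// Hypothetical v-flag counterpart of isASCIIIDPart.
function makeAsciiIdPartRegex() {
  try {
    // && intersects the two character sets inside one character class.
    return new RegExp("^[\\p{ASCII}&&\\p{ID_Continue}]$", "v");
  } catch {
    return null; // assumption: null signals "v flag not supported here"
  }
}

const asciiIdPart = makeAsciiIdPartRegex();
if (asciiIdPart) {
  console.log(asciiIdPart.test("a")); // true
  console.log(asciiIdPart.test("α")); // false
  console.log(asciiIdPart.test(":")); // false
}
```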

Specifications

Specification
ECMAScript Language Specification
# prod-Assertion

Browser compatibility

See also