Regular expressions

Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects. These patterns are used with the exec() and test() methods of RegExp, and with the match(), matchAll(), replace(), replaceAll(), search(), and split() methods of String. This chapter describes JavaScript regular expressions.

Creating a regular expression

You construct a regular expression in one of two ways:

  • Using a regular expression literal, which consists of a pattern enclosed between slashes, as follows:
    js
    const re = /ab+c/;
    
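    Regular expression literals compile the regular expression when the script is loaded. If the regular expression remains constant, using a literal can improve performance.
  • Or calling the constructor function of the RegExp object, as follows: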
    js
    const re = new RegExp("ab+c");
    
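    Using the constructor function compiles the regular expression at runtime. Use the constructor function when you know the regular expression pattern will be changing, or when you don't know the pattern and are getting it from another source, such as user input.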
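As a minimal sketch (the variable names and the sample word "cat" are illustrative only), both forms below build equivalent patterns, and the constructor also accepts a pattern assembled at runtime. If the runtime text might contain special characters, it should be escaped first, as described under Escaping.

```js
// Literal: compiled when the script is loaded
const literal = /ab+c/;
// Constructor: compiled at runtime from a string
const fromString = new RegExp("ab+c");

console.log(literal.source === fromString.source); // true
console.log(literal.test("xabbbcx"), fromString.test("xabbbcx")); // true true

// Hypothetical runtime value, e.g. read from user input
const word = "cat";
// In a string, "\\b" yields the two characters \b (a word boundary in the pattern)
const dynamic = new RegExp(`\\b${word}s?\\b`, "i");
console.log(dynamic.source); // \bcats?\b
console.log(dynamic.test("Two Cats")); // true
```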

Writing a regular expression pattern

A regular expression pattern is composed of simple characters, such as /abc/, or a combination of simple and special characters, such as /ab*c/ or /Chapter (\d+)\.\d*/. The last example includes parentheses, which are used as a memory device. The match made with this part of the pattern is remembered for later use, as described in Using groups.
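For instance, running the last pattern against a sample sentence (chosen here purely for illustration) shows the remembered part as a separate element of the result array:

```js
const re = /Chapter (\d+)\.\d*/;
const match = re.exec("See Chapter 3.4.5 for details");

console.log(match[0]); // Chapter 3.4 (the whole match)
console.log(match[1]); // 3 (remembered by the parentheses)
```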

Note: If you are already familiar with the forms of a regular expression, you may also read the cheat sheet for a quick lookup of a specific pattern/construct.

Using simple patterns

Simple patterns are constructed of characters for which you want to find a direct match. For example, the pattern /abc/ matches character combinations in strings only when the exact sequence "abc" occurs (all characters together and in that order). Such a match would succeed in the strings "Hi, do you know your abc's?" and "The latest airplane designs evolved from slabcraft.". In both cases the match is with the substring "abc". There is no match in the string "Grab crab" because, while it contains the substring "ab c", it does not contain the exact substring "abc".
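Checking the three example strings with test() is a quick way to confirm this:

```js
const re = /abc/;

console.log(re.test("Hi, do you know your abc's?")); // true
console.log(re.test("The latest airplane designs evolved from slabcraft.")); // true
console.log(re.test("Grab crab")); // false: "ab c" is not "abc"
```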

Using special characters

When the search for a match requires something more than a direct match, such as finding one or more b's, or finding white space, you can include special characters in the pattern. For example, to match a single "a" followed by zero or more "b"s followed by "c", you'd use the pattern /ab*c/: the * after "b" means "0 or more occurrences of the preceding item." In the string "cbbabbbbcdebc", this pattern will match the substring "abbbbc".
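A short sketch of the same search, also showing that * accepts zero occurrences:

```js
const match = "cbbabbbbcdebc".match(/ab*c/);
console.log(match[0], match.index); // abbbbc 3

// * also allows zero "b"s
console.log(/ab*c/.test("ac")); // true
```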

The following pages provide lists of the different special characters that fit into each category, along with descriptions and examples.

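Assertions guide

Assertions include boundaries, which indicate the beginnings and endings of lines and words, and other patterns indicating in some way that a match is possible (including look-ahead and look-behind assertions).

Character classes guide

Distinguish different types of characters. For example, distinguishing between letters and digits.

Groups and backreferences guide

Groups combine multiple patterns into a single unit, and capturing groups provide extra submatch information when using a regular expression pattern to match against a string. Backreferences refer to a previously captured group in the same regular expression.

Quantifiers guide

Indicate numbers of characters or expressions to match.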
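A small sketch touching each category; the sample strings are arbitrary:

```js
// Assertion: \b marks a word boundary, so "cat" inside "concatenate" does not count
console.log(/\bcat\b/.test("concatenate")); // false

// Character class \d plus quantifier {4}: exactly four digits
console.log(/\d{4}/.exec("Year 2024!")[0]); // 2024

// Group (\w) captures one character; backreference \1 requires it to repeat
console.log(/(\w)\1/.exec("hello")[0]); // ll
```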

If you want to look at all the special characters that can be used in regular expressions in a single table, see the following:

Special characters in regular expressions.
Corresponding article: characters/constructs
  • Character classes: [xyz], [^xyz], ., \d, \D, \w, \W, \s, \S, \t, \r, \n, \v, \f, [\b], \0, \cX, \xhh, \uhhhh, \u{hhhh}, x|y
  • Assertions: ^, $, \b, \B, x(?=y), x(?!y), (?<=y)x, (?<!y)x
  • Groups and backreferences: (x), (?<Name>x), (?:x), \n, \k<Name>
  • Quantifiers: x*, x+, x?, x{n}, x{n,}, x{n,m}

Note: A larger cheat sheet is also available (it only aggregates parts of those individual articles).

Escaping

If you need to use any of the special characters literally (actually searching for a "*", for instance), you must escape it by putting a backslash in front of it. For instance, to search for "a" followed by "*" followed by "b", you'd use /a\*b/: the backslash "escapes" the "*", making it literal instead of special.
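Comparing the escaped and unescaped patterns on two sample strings shows the difference:

```js
console.log(/a\*b/.test("a*b")); // true: \* is a literal asterisk
console.log(/a\*b/.test("aab")); // false
console.log(/a*b/.test("aab")); // true: unescaped, * is a quantifier
```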

Similarly, if you're writing a regular expression literal and need to match a slash ("/"), you need to escape that (otherwise, it terminates the pattern). For instance, to search for the string "/example/" followed by one or more alphabetic characters, you'd use /\/example\/[a-z]+/i: the backslashes before each slash make them literal.
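A quick sketch with a made-up input string. When the same pattern is passed to the RegExp constructor as a string, the slashes do not terminate anything, so they need no escaping there:

```js
const re = /\/example\/[a-z]+/i;
console.log(re.exec("see /example/Foo here")[0]); // /example/Foo

// In a string passed to the constructor, "/" is an ordinary character
const fromString = new RegExp("/example/[a-z]+", "i");
console.log(fromString.test("see /example/Foo here")); // true
```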

To match a literal backslash, you need to escape the backslash. For instance, to match the string "C:\" where "C" can be any uppercase letter, you'd use /[A-Z]:\\/: the first backslash escapes the one after it, so the expression searches for a single literal backslash.
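One detail that often trips people up is that the test string itself must also escape the backslash when written as a JavaScript string literal. A sketch with an illustrative path:

```js
const re = /[A-Z]:\\/;
// In a string literal, "\\" is one backslash, so this string is C:\Windows
const path = "C:\\Windows";

console.log(re.test(path)); // true
console.log(re.exec(path)[0].length); // 3 (C, colon, one backslash)
console.log(re.test("C:/Windows")); // false
```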

If using the RegExp constructor with a string literal, remember that the backslash is an escape in string literals, so to use it in the regular expression, you need to escape it at the string literal level. /a\*b/ and new RegExp("a\\*b") create the same expression, which searches for "a" followed by a literal "*" followed by "b".
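The sketch below compares the two forms and shows what happens when the string-level backslash is forgotten: "\*" in an ordinary string literal is simply "*", so the asterisk silently becomes a quantifier.

```js
const fromLiteral = /a\*b/;
const fromString = new RegExp("a\\*b"); // the string holds a\*b

console.log(fromLiteral.source === fromString.source); // true
console.log(fromString.test("a*b"), fromString.test("aab")); // true false

// Forgetting the extra backslash: the pattern becomes a*b
const oops = new RegExp("a\*b");
console.log(oops.source); // a*b
console.log(oops.test("aab")); // true
```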

If escape strings are not already part of your pattern, you can add them using String.prototype.replace():

js
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string
}

The "g" after the regular expression is an option or flag that performs a global search, looking in the whole string and returning all matches. It is explained in detail below in Advanced searching with flags.

Why isn't this built into JavaScript? There is a proposal to add such a function to RegExp.
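A usage sketch for the escapeRegExp() helper shown above; the input string is a hypothetical piece of untrusted text. Without escaping, "+" and "?" act as quantifiers and the pattern no longer matches the text it came from:

```js
// Same helper as above
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string
}

const userInput = "1+1=2?"; // hypothetical untrusted input
const raw = new RegExp(userInput); // "+" and "?" are treated as quantifiers
const safe = new RegExp(escapeRegExp(userInput)); // matches the text literally

console.log(safe.source); // 1\+1=2\?
console.log(raw.test("Is 1+1=2?")); // false
console.log(safe.test("Is 1+1=2?")); // true
```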

Using parentheses

Parentheses around any part of the regular expression pattern cause that part of the matched substring to be remembered. Once remembered, the substring can be recalled for other use. See Groups and backreferences for more details.
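One common way to recall remembered substrings is through $1, $2, and so on in a replacement string, or by reading them from the exec() result. A sketch with a sample name:

```js
const re = /(\w+)\s(\w+)/;

// $1 and $2 refer to the first and second remembered substrings
console.log("John Smith".replace(re, "$2, $1")); // Smith, John

// The same substrings are elements 1 and 2 of the exec() result
const [, first, last] = re.exec("John Smith");
console.log(first, last); // John Smith
```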

Using regular expressions in JavaScript

Regular expressions are used with the RegExp methods test() and exec() and with the String methods match(), matchAll(), replace(), replaceAll(), search(), and split().

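Method: description
  • exec(): executes a search for a match in a string. It returns an array of information, or null on a mismatch.
  • test(): tests for a match in a string. It returns true or false.
  • match(): without the g flag, returns an array with the first match and its capturing groups; with the g flag, returns an array of all matches (without capturing groups). Returns null if no match is found.
  • matchAll(): returns an iterator of all matches, each including its capturing groups. The regular expression must have the g flag.
  • search(): tests for a match in a string. It returns the index of the match, or -1 if the search fails.
  • replace(): executes a search for a match in a string, and replaces the matched substring with a replacement substring (every match, if the regular expression has the g flag).
  • replaceAll(): executes a search for all matches in a string, and replaces the matched substrings with a replacement substring. A regular expression argument must have the g flag.
  • split(): uses a regular expression or a fixed string to break a string into an array of substrings.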
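A compact sketch calling each method on one sample string, so the different return shapes can be compared side by side:

```js
const str = "fee fi fo fum";

// test(): does the pattern occur at all?
console.log(/f\w+/.test(str)); // true
// search(): index of the first match, or -1
console.log(str.search(/fo/)); // 7
console.log(str.search(/xyz/)); // -1
// match() without g: first match plus its capturing groups
const first = str.match(/f(\w)/);
console.log(first[0], first[1], first.index); // fe e 0
// match() with g: every match, but no groups or positions
console.log(str.match(/f\w+/g)); // ["fee", "fi", "fo", "fum"]
// matchAll() (requires g): every match, each with its groups
console.log([...str.matchAll(/f(\w)/g)].map((m) => m[1])); // ["e", "i", "o", "u"]
// replace() without g changes only the first match; replaceAll() changes all
console.log(str.replace(/f/, "F")); // Fee fi fo fum
console.log(str.replaceAll(/f/g, "F")); // Fee Fi Fo Fum
// split(): break the string on runs of whitespace
console.log(str.split(/\s+/)); // ["fee", "fi", "fo", "fum"]
```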

When you want to know whether a pattern is found in a string, use the test() or search() methods; for more information (but slower execution) use the exec() or match() methods. If you use exec() or match() and the match succeeds, these methods return an array and update properties of the associated regular expression object and also of the predefined regular expression object, RegExp. If the match fails, the exec() method returns null (which coerces to false).

In the following example, the script uses the exec() method to find a match in a string.

js
const myRe = /d(b+)d/g;
const myArray = myRe.exec("cdbbdbsbz");

If you do not need to access the properties of the regular expression, an alternative way of creating myArray is with this script:

js
const myArray = /d(b+)d/g.exec("cdbbdbsbz");
// similar to 'cdbbdbsbz'.match(/d(b+)d/g); however,
// 'cdbbdbsbz'.match(/d(b+)d/g) outputs [ "dbbd" ]
// while /d(b+)d/g.exec('cdbbdbsbz') outputs [ 'dbbd', 'bb', index: 1, input: 'cdbbdbsbz' ]

(See Using the global search flag with exec() for further info about the different behaviors.)

If you want to construct the regular expression from a string, yet another alternative is this script:

js
const myRe = new RegExp("d(b+)d", "g");
const myArray = myRe.exec("cdbbdbsbz");

With these scripts, the match succeeds, returns the array, and updates the properties shown in the following table.

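Results of regular expression execution.
  • myArray (the returned array): the matched string and all remembered substrings. In this example: ['dbbd', 'bb', index: 1, input: 'cdbbdbsbz']
  • myArray.index: the 0-based index of the match in the input string. In this example: 1
  • myArray.input: the original string. In this example: 'cdbbdbsbz'
  • myArray[0]: the characters matched by the whole pattern. In this example: 'dbbd'
  • myRe.lastIndex: the index at which to start the next match. (exec() updates this property only if the regular expression uses the g or y flag, described in Advanced searching with flags.) In this example: 5
  • myRe.source: the text of the pattern. Updated at the time that the regular expression is created, not executed. In this example: 'd(b+)d'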
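A sketch that logs these values directly, re-creating the myRe and myArray names from the scripts above:

```js
const myRe = /d(b+)d/g;
const myArray = myRe.exec("cdbbdbsbz");

console.log(myArray[0], myArray[1]); // dbbd bb
console.log(myArray.index, myArray.input); // 1 cdbbdbsbz
console.log(myRe.lastIndex, myRe.source); // 5 d(b+)d
```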

As shown in the second form of this example, you can use a regular expression created with a literal (an object initializer) without assigning it to a variable. If you do, however, every occurrence is a new regular expression. For this reason, if you use this form without assigning it to a variable, you cannot subsequently access the properties of that regular expression. For example, assume you have this script:

js
const myRe = /d(b+)d/g;
const myArray = myRe.exec("cdbbdbsbz");
console.log(`The value of lastIndex is ${myRe.lastIndex}`);

// "The value of lastIndex is 5"

However, if you have this script:

js
const myArray = /d(b+)d/g.exec("cdbbdbsbz");
console.log(`The value of lastIndex is ${/d(b+)d/g.lastIndex}`);

// "The value of lastIndex is 0"

The occurrences of /d(b+)d/g in the two statements are different regular expression objects and hence have different values for their lastIndex property. If you need to access the properties of a regular expression created with a literal, you should first assign it to a variable.
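The same point, reduced to two identical literals stored in variables (the names a and b are arbitrary):

```js
const a = /d(b+)d/g;
const b = /d(b+)d/g;

console.log(a === b); // false: each literal evaluation creates a new object
a.exec("cdbbdbsbz");
console.log(a.lastIndex, b.lastIndex); // 5 0
```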

Advanced searching with flags

Regular expressions have optional flags that allow for functionality like global searching and case-insensitive searching. These flags can be used separately or together in any order, and are included as part of the regular expression.

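Flag: description (corresponding property)
  • d: generate indices for substring matches. (hasIndices)
  • g: global search. (global)
  • i: case-insensitive search. (ignoreCase)
  • m: allows ^ and $ to match next to newline characters. (multiline)
  • s: allows . to match newline characters. (dotAll)
  • u: "Unicode"; treat a pattern as a sequence of Unicode code points. (unicode)
  • v: an upgrade to the u mode with more Unicode features. (unicodeSets)
  • y: perform a "sticky" search that matches starting at the current position in the target string. (sticky)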
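A sketch that reads the corresponding properties back and exercises the d and y flags; the patterns and strings are arbitrary examples:

```js
// Flags can be written in any order; the flags property lists them in a fixed order
const re = /fo/igd;
console.log(re.flags); // dgi
console.log(re.hasIndices, re.global, re.ignoreCase, re.sticky); // true true true false

// d: the match array gains an indices property with [start, end] pairs
const m = re.exec("fee fi FO fum");
console.log(m[0], m.indices[0]); // FO [7, 9]

// y (sticky): the match must start exactly at lastIndex
const sticky = /foo/y;
sticky.lastIndex = 3;
console.log(sticky.test("barfoo")); // true
sticky.lastIndex = 1;
console.log(sticky.test("barfoo")); // false
```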

To include a flag with the regular expression, use this syntax:

js
const re = /pattern/flags;

or

js
const re = new RegExp("pattern", "flags");

Note that the flags are an integral part of a regular expression. They cannot be added or removed later.
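When different flags are needed, the usual approach is to build a new regular expression from the existing one, as in this sketch:

```js
const re = /abc/i;

// Build a new regex from the old one's source, with an extra flag
const globalRe = new RegExp(re.source, re.flags + "g");
console.log(globalRe.flags); // gi
console.log(re.flags); // i (the original is unchanged)

// Passing a RegExp plus a flags string replaces the flags entirely
console.log(new RegExp(re, "m").flags); // m
```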

For example, re = /\w+\s/g creates a regular expression that looks for one or more word characters followed by a whitespace character, and it looks for this combination throughout the string.

js
const re = /\w+\s/g;
const str = "fee fi fo fum";
const myArray = str.match(re);
console.log(myArray);

// ["fee ", "fi ", "fo "]

You could replace the line:

js
const re = /\w+\s/g;

with:

js
const re = new RegExp("\\w+\\s", "g");

and get the same result.

The m flag is used to specify that a multiline input string should be treated as multiple lines. If the m flag is used, ^ and $ match at the start or end of any line within the input string instead of the start or end of the entire string.
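A sketch with a two-line sample string, comparing the same anchors with and without m:

```js
const text = "first line\nsecond line";

console.log(text.match(/^\w+/g)); // ["first"]
console.log(text.match(/^\w+/gm)); // ["first", "second"]
console.log(text.match(/\w+$/gm)); // ["line", "line"]
```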

Using the global search flag with exec()

The RegExp.prototype.exec() method with the g flag returns each match and its position iteratively.

js
const str = "fee fi fo fum";
const re = /\w+\s/g;

console.log(re.exec(str)); // ["fee ", index: 0, input: "fee fi fo fum"]
console.log(re.exec(str)); // ["fi ", index: 4, input: "fee fi fo fum"]
console.log(re.exec(str)); // ["fo ", index: 7, input: "fee fi fo fum"]
console.log(re.exec(str)); // null

In contrast, the String.prototype.match() method returns all matches at once, but without their positions.

js
console.log(str.match(re)); // ["fee ", "fi ", "fo "]
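If both all the matches and their positions are needed, String.prototype.matchAll() combines the two behaviors; a sketch on the same string:

```js
const str = "fee fi fo fum";

// matchAll() (the regex must have the g flag) yields every match with its index
const found = [...str.matchAll(/\w+\s/g)].map((m) => [m[0], m.index]);
console.log(found); // [["fee ", 0], ["fi ", 4], ["fo ", 7]]
```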

Using Unicode regular expressions

The u flag is used to create "Unicode" regular expressions; that is, regular expressions that support matching against Unicode text. An important feature that's enabled in Unicode mode is Unicode property escapes. For example, the following regular expression might be used to match against an arbitrary Unicode "word":

js
/\p{L}*/u;

Unicode regular expressions have different execution behavior as well. RegExp.prototype.unicode contains more explanation about this.
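A sketch of both points, using arbitrary sample text. It uses + rather than * so that empty matches are not reported, and it shows how u changes what . matches for a character outside the Basic Multilingual Plane:

```js
// \p{L} matches a letter in any script
console.log("Привет, мир!".match(/\p{L}+/gu)); // ["Привет", "мир"]

// This emoji is stored as two UTF-16 code units
console.log("😀".length); // 2
console.log(/^.$/.test("😀")); // false: without u, . matches a single code unit
console.log(/^.$/u.test("😀")); // true: with u, . matches a whole code point
```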

Examples

Note: Several examples are also available in:

Using special characters to verify input

In the following example, the user is expected to enter a phone number. When the user presses the "Check" button, the script checks the validity of the number. If the number is valid (matches the character sequence specified by the regular expression), the script shows a message thanking the user and confirming the number. If the number is invalid, the script informs the user that the phone number is not valid.

The regular expression looks for:

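  1. the beginning of the line of data: ^
  2. followed by three numeric characters \d{3} OR | a left parenthesis \(, followed by three digits \d{3}, followed by a close parenthesis \), in a non-capturing group (?:)
  3. followed by one dash, forward slash, or decimal point in a capturing group ()
  4. followed by three digits \d{3}
  5. followed by the match remembered in the (first) captured group \1
  6. followed by four digits \d{4}
  7. followed by the end of the line of data: $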
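The same pattern can be exercised without the form, as a quick sketch; the sample numbers are made up for illustration. The third case fails because the backreference \1 requires the second separator to repeat the first:

```js
const re = /^(?:\d{3}|\(\d{3}\))([-/.])\d{3}\1\d{4}$/;

console.log(re.test("123-456-7890")); // true
console.log(re.test("(123)/456/7890")); // true
console.log(re.test("123-456.7890")); // false: separators differ
console.log(re.test("123-45-67890")); // false: wrong digit grouping
```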

HTML

html
<p>
  Enter your phone number (with area code) and then click "Check".
  <br />
  The expected format is like ###-###-####.
</p>
<form id="form">
  <input id="phone" />
  <button type="submit">Check</button>
</form>
<p id="output"></p>

JavaScript

js
const form = document.querySelector("#form");
const input = document.querySelector("#phone");
const output = document.querySelector("#output");

const re = /^(?:\d{3}|\(\d{3}\))([-/.])\d{3}\1\d{4}$/;

function testInfo(phoneInput) {
  const ok = re.exec(phoneInput.value);

  output.textContent = ok
    ? `Thanks, your phone number is ${ok[0]}`
    : `${phoneInput.value} isn't a phone number with area code!`;
}

form.addEventListener("submit", (event) => {
  event.preventDefault();
  testInfo(input);
});


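Tools

RegExr

An online tool to learn, build, and test regular expressions.

Regex tester

An online regex builder/debugger.

Regex interactive tutorial

An online interactive tutorial, cheat sheet, and playground.

Regex visualizer

An online visual regex tester.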