Literal character: a, b
A literal character specifies exactly itself to be matched in the input text.
Syntax
Parameters
Description
In regular expressions, most characters can appear literally. They are usually the most basic building blocks of patterns. For example, here is a pattern from the Removing HTML tags example:
const pattern = /<.+?>/g;
In this example, `.`, `+`, and `?` are called syntax characters. They have special meanings in regular expressions. The rest of the characters in the pattern (`<` and `>`) are literal characters. They match themselves in the input text: the left and right angle brackets.
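As a rough illustration of the distinction above, the sketch below applies the same pattern to a sample string and then escapes the syntax characters to turn them into literals. The sample input and the variable names are assumptions made up for this illustration.

```javascript
// Sample input invented for this illustration.
const html = "<p>Hello, <b>world</b>!</p>";

// `<` and `>` are literal characters; `.`, `+`, and `?` are syntax characters.
const tagPattern = /<.+?>/g;

const text = html.replace(tagPattern, "");
console.log(text); // "Hello, world!"

// Escaping the syntax characters turns them into literal characters, so this
// pattern only matches the exact five-character text "<.+?>".
const literalPattern = /<\.\+\?>/;
console.log(literalPattern.test(html)); // false
console.log(literalPattern.test("<.+?>")); // true
```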
The following characters are syntax characters in regular expressions, and they cannot appear as literal characters: `^`, `$`, `\`, `.`, `*`, `+`, `?`, `(`, `)`, `[`, `]`, `{`, `}`, and `|`.
Within character classes, more characters can appear literally. For more information, see the Character class page. For example, `\.` and `[.]` both match a literal `.`. In `v`-mode character classes, however, a different set of characters is reserved as syntax characters. For completeness, below is a table of ASCII characters and whether they may appear escaped or unescaped in different contexts, where "✅" means the character represents itself, "❌" means it throws a syntax error, and "⚠️" means the character is valid but means something other than itself.
| Characters | Outside character classes in u or v mode: unescaped | Outside character classes in u or v mode: escaped | In u-mode character classes: unescaped | In u-mode character classes: escaped | In v-mode character classes: unescaped | In v-mode character classes: escaped |
|---|---|---|---|---|---|---|
| `123456789 "'` | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ |
| `` !#%&,:;<=>@`~ `` | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
| `]` | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ |
| `()[{}` | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ |
| `*+?` | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| `/` | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| `0DSWbdfnrstvw` | ✅ | ⚠️ | ✅ | ⚠️ | ✅ | ⚠️ |
| `B` | ✅ | ⚠️ | ✅ | ❌ | ✅ | ❌ |
| `$.` | ⚠️ | ✅ | ✅ | ✅ | ✅ | ✅ |
| `\|` | ⚠️ | ✅ | ✅ | ✅ | ❌ | ✅ |
| `-` | ✅ | ❌ | ✅⚠️ | ✅ | ❌⚠️ | ✅ |
| `^` | ⚠️ | ✅ | ✅⚠️ | ✅ | ✅⚠️ | ✅ |
| `\` | ❌⚠️ | ✅ | ❌⚠️ | ✅ | ❌⚠️ | ✅ |
Note: The characters that can both be escaped and unescaped in `v`-mode character classes are exactly those forbidden as "double punctuators". See `v`-mode character classes for more information.
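A few rows of the table above can be spot-checked with a short sketch. It only uses `u` mode so that it also runs on engines without `v`-flag support. The helper `isValidPattern` is hypothetical and written only for this illustration.

```javascript
// Outside a character class, `.` is a syntax character (it matches any
// character except line terminators), so it must be escaped to match a dot.
const anyChar = /^a.c$/u;
const escapedDot = /^a\.c$/u;
const classDot = /^a[.]c$/u;

console.log(anyChar.test("abc")); // true
console.log(escapedDot.test("abc")); // false
console.log(classDot.test("abc")); // false
console.log(escapedDot.test("a.c")); // true
console.log(classDot.test("a.c")); // true

// Inside a u-mode character class, `*`, `+`, and `?` stand for themselves.
const quantifierChars = /^[*+?]+$/u;
console.log(quantifierChars.test("*+?")); // true

// Hypothetical helper: reports whether a pattern source compiles.
function isValidPattern(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// In u mode, an escaped `-` is a syntax error outside a character class
// but represents itself inside one.
console.log(isValidPattern("\\-", "u")); // false
console.log(isValidPattern("[\\-]", "u")); // true
```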
Whenever you want to match a syntax character literally, you need to escape it with a backslash (`\`). For example, to match a literal `*` in a pattern, you need to write `\*` in the pattern. Using syntax characters as literal characters either leads to unexpected results or causes syntax errors. For example, `/*/` is not a valid regular expression because the quantifier is not preceded by a pattern. In Unicode-unaware mode, `]`, `{`, and `}` may appear literally if it's not possible to parse them as the end of a character class or as quantifier delimiters. This is a deprecated syntax for web compatibility, and you should not rely on it.
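The escaping rule above can be sketched as follows. The helper `escapeForRegExp` is hypothetical and written only for this illustration: it prefixes each syntax character with a backslash so that arbitrary text can be embedded in a pattern outside a character class.

```javascript
// Hypothetical helper: escapes every syntax character with a backslash.
function escapeForRegExp(text) {
  return text.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

// An escaped `*` matches a literal asterisk.
const star = /\*/u;
console.log(star.test("2 * 3")); // true

const escaped = escapeForRegExp("1+1=2?");
console.log(escaped); // 1\+1=2\?

// With escaping, only the exact text matches.
const exact = new RegExp(`^${escaped}$`, "u");
console.log(exact.test("1+1=2?")); // true
console.log(exact.test("11=2")); // false

// Without escaping, `+` and `?` act as quantifiers: an unexpected result.
const unescaped = new RegExp("^1+1=2?$", "u");
console.log(unescaped.test("11=2")); // true

// A quantifier with nothing before it is a syntax error.
let caught = null;
try {
  new RegExp("*");
} catch (error) {
  caught = error;
}
console.log(caught instanceof SyntaxError); // true
```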
Regular expression literals cannot be specified with certain non-syntax literal characters. `/` cannot appear as a literal character in a regular expression literal, because `/` is used as the delimiter of the literal itself. You need to escape it as `\/` if you want to match a literal `/`. Line terminators cannot appear as literal characters in a regular expression literal either, because a literal cannot span multiple lines. You need to use a character escape like `\n` instead. There are no such restrictions when using the `RegExp()` constructor, although string literals have their own escaping rules (for example, `"\\"` actually denotes a single backslash character, so `new RegExp("\\*")` and `/\*/` are equivalent).
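The difference between literals and the `RegExp()` constructor can be checked with a minimal sketch. The variable names and sample strings are assumptions made up for this illustration.

```javascript
// In a string literal, "\\*" is the two-character text \* so both forms
// produce the same pattern.
const fromConstructor = new RegExp("\\*");
const fromLiteral = /\*/;
console.log(fromConstructor.source === fromLiteral.source); // true
console.log(fromConstructor.test("a*b")); // true

// A string has no `/` delimiters, so the constructor accepts an unescaped `/`.
const slashFromConstructor = new RegExp("/");
// In a literal, the `/` has to be escaped.
const slashFromLiteral = /\//;
console.log(slashFromConstructor.test("a/b")); // true
console.log(slashFromLiteral.test("a/b")); // true

// The string "a\nb" contains a real line feed, which the constructor accepts
// as a literal character.
const newlineFromConstructor = new RegExp("a\nb");
// A literal needs the character escape \n instead.
const newlineFromLiteral = /a\nb/;
console.log(newlineFromConstructor.test("a\nb")); // true
console.log(newlineFromLiteral.test("a\nb")); // true
```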
In Unicode-unaware mode, the pattern is interpreted as a sequence of UTF-16 code units. This means surrogate pairs actually represent two literal characters. This causes unexpected behaviors when paired with other features:
/^[😄]$/.test("😄"); // false, because the pattern is interpreted as /^[\ud83d\ude04]$/
/^😄+$/.test("😄😄"); // false, because the pattern is interpreted as /^\ud83d\ude04+$/
In Unicode-aware mode, the pattern is interpreted as a sequence of Unicode code points, and surrogate pairs do not get split. Therefore, you should always prefer to use the `u` flag.
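For comparison, the sketch below runs the same two patterns with and without the `u` flag. The variable name `smiley` is an assumption made up for this illustration.

```javascript
// U+1F604 is stored as two UTF-16 code units (a surrogate pair).
const smiley = "😄";
console.log(smiley.length); // 2
console.log(smiley === "\ud83d\ude04"); // true

// Without the u flag, the character class contains two separate code units,
// so it can only match one half of the pair.
console.log(/^[😄]$/.test(smiley)); // false
console.log(/^[😄]$/u.test(smiley)); // true

// Without the u flag, the quantifier applies only to the low surrogate.
console.log(/^😄+$/.test("😄😄")); // false
console.log(/^😄+$/u.test("😄😄")); // true
```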
Examples
Using literal characters
The following example is copied from Character escape. The `a` and `b` characters are literal characters in the pattern, and `\n` is an escaped character because it cannot appear literally in a regular expression literal.
const pattern = /a\nb/;
const string = `a
b`;
console.log(pattern.test(string)); // true
Specifications
Specification |
---|
ECMAScript Language Specification # prod-PatternCharacter |
Browser compatibility
See also