Literal character: a, b

A literal character specifies exactly itself to be matched in the input text.

Syntax

regex
c

Parameters

c

A single character that is not one of the syntax characters described below.

Description

In regular expressions, most characters can appear literally. They are usually the most basic building blocks of patterns. For example, here is a pattern from the Removing HTML tags example:

js
const pattern = /<.+?>/g;

In this example, ".", "+", and "?" are called syntax characters: they have special meanings in regular expressions. The rest of the characters in the pattern ("<" and ">") are literal characters. They match themselves in the input text: the left and right angle brackets.
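To see the literal characters at work, the sketch below applies the same pattern to a small HTML string. The helper name stripTags and the sample input are illustrative assumptions, and a pattern like this is not a robust HTML parser.

```javascript
// "<" and ">" are literal characters; ".", "+", and "?" are syntax characters
// meaning "any character, one or more times, as few as possible".
const tagPattern = /<.+?>/g;

// Hypothetical helper used only for this illustration.
function stripTags(html) {
  return html.replace(tagPattern, "");
}

console.log(stripTags("<p>Hello <b>world</b></p>")); // Hello world
console.log(stripTags("1 < 2")); // 1 < 2 (no closing ">", so nothing is removed)
```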

The following characters are syntax characters in regular expressions, and they cannot appear as literal characters:

^ $ \ . * + ? ( ) [ ] { } |
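As a quick check, the sketch below escapes each of these syntax characters with a backslash and confirms that the escaped form matches the character itself. It builds each pattern with the RegExp() constructor and the u flag; the variable names are illustrative.

```javascript
// The syntax characters listed above ("\\" is a single backslash in a string).
const syntaxCharacters = ["^", "$", "\\", ".", "*", "+", "?", "(", ")", "[", "]", "{", "}", "|"];

// For each character, build the pattern ^\<char>$ and test it against the bare character.
const results = syntaxCharacters.map((ch) => {
  const re = new RegExp("^\\" + ch + "$", "u");
  return [ch, re.test(ch)];
});

console.log(results.every(([, matched]) => matched)); // true
```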

Within character classes, more characters can appear literally. For more information, see the Character class page. For example, \. and [.] both match a literal "." character. In v-mode character classes, however, a different set of characters is reserved as syntax characters. For completeness, below is a table of ASCII characters and whether they may appear escaped or unescaped in different contexts, where "✅" means the character represents itself, "❌" means it throws a syntax error, and "⚠️" means the character is valid but means something other than itself.
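A small sketch comparing the escaped form \. with the character class [.]: both match only a literal period, unlike a bare ".", which matches almost any character. The variable names are illustrative.

```javascript
const escapedDot = /\./; // "." escaped outside a character class
const classDot = /[.]/; // "." unescaped inside a character class
const anyChar = /./; // bare "." is a syntax character: any character except line terminators
const classPunct = /[*+?()]/; // these syntax characters are also literal inside a class

console.log(escapedDot.test("a.b"), classDot.test("a.b")); // true true
console.log(escapedDot.test("ab"), classDot.test("ab")); // false false
console.log(anyChar.test("ab")); // true
console.log(classPunct.test("+")); // true
```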

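Characters   Outside character classes in u or v mode   In u-mode character classes   In v-mode character classes
Unescaped Escaped   Unescaped Escaped   Unescaped Escaped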
XSPACE123456789 "'
ACEFGHIJKLMN
OPQRTUVXYZ_
aceghijklmop
quxyz
!#%&,:;<=>@`~
]
()[{}
*+?
/
XSPACE0DSWbdfnrstvw ⚠️ ⚠️ ⚠️
B ⚠️
$. ⚠️
| ⚠️
- ✅⚠️ ❌⚠️
^ ⚠️ ✅⚠️ ✅⚠️
\ ❌⚠️ ❌⚠️ ❌⚠️
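A few entries of the table can be checked directly. The sketch below uses a hypothetical helper, isValidPattern, that reports whether the RegExp() constructor accepts a pattern with the given flags. It covers only u-mode cases, so it does not depend on v-flag support.

```javascript
// Hypothetical helper: true if the pattern compiles with the given flags.
function isValidPattern(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// "-" outside a class: literal when unescaped, a syntax error when escaped in u mode.
console.log(isValidPattern("-", "u"), isValidPattern("\\-", "u")); // true false
// Inside a u-mode character class, the escaped form \- is allowed.
console.log(/[\-]/u.test("-")); // true
// "!" cannot be escaped in u mode, inside or outside a class.
console.log(isValidPattern("\\!", "u"), isValidPattern("[\\!]", "u")); // false false
// An escaped "d" is valid but means "any digit", not "d" (a ⚠️ case).
console.log(/\d/u.test("d"), /\d/u.test("7")); // false true
```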

Note: The characters that can appear both escaped and unescaped in v-mode character classes are exactly those forbidden as "double punctuators". See v-mode character classes for more information.
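As a sketch of this note, assuming an engine that supports the v flag (introduced in ES2024), the code below compares an unescaped double punctuator with its escaped form. The feature check and variable names are illustrative; on engines without v-flag support the comparison is skipped.

```javascript
// Feature-detect the v flag: engines that do not know it throw a SyntaxError.
let supportsV = true;
try {
  new RegExp("", "v");
} catch {
  supportsV = false;
}

let doubleUnescapedValid = null;
let escapedMatches = null;
let singleMatches = null;

if (supportsV) {
  // "&&" is reserved as a double punctuator (class intersection) in v mode.
  try {
    new RegExp("[&&]", "v");
    doubleUnescapedValid = true;
  } catch {
    doubleUnescapedValid = false;
  }
  // Escaping each "&" makes them literal characters again.
  escapedMatches = new RegExp("[\\&\\&]", "v").test("&");
  // A single unescaped "&" is allowed.
  singleMatches = new RegExp("[&]", "v").test("&");
  console.log(doubleUnescapedValid, escapedMatches, singleMatches); // false true true
}
```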

Whenever you want to match a syntax character literally, you need to escape it with a backslash (\). For example, to match a literal * in a pattern, you need to write \* in the pattern. Using syntax characters as literal characters either leads to unexpected results or causes syntax errors. For example, new RegExp("*") throws a SyntaxError because the quantifier is not preceded by anything to repeat (the literal form /*/ cannot even be written, because /* starts a comment). In Unicode-unaware mode, ], {, and } may appear literally if it's not possible to parse them as the end of a character class or as quantifier delimiters. This is a deprecated syntax kept for web compatibility, and you should not rely on it.
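The sketch below illustrates these points: an unescaped quantifier throws, the escaped form matches literally, and a hypothetical helper, escapeSyntaxCharacters, backslash-escapes every syntax character in a string so it can be matched literally through the RegExp() constructor. The helper name and sample strings are assumptions for illustration.

```javascript
// "*" with nothing before it has nothing to repeat, so the constructor throws.
let starError = null;
try {
  new RegExp("*");
} catch (e) {
  starError = e;
}
console.log(starError instanceof SyntaxError); // true

// Escaped, "*" is an ordinary literal character.
console.log(/\*/.test("2 * 3")); // true

// Hypothetical helper: prefix each syntax character ^ $ \ . * + ? ( ) [ ] { } | with a backslash.
function escapeSyntaxCharacters(text) {
  return text.replace(/[\^$\\.*+?()[\]{}|]/g, "\\$&");
}

const escaped = escapeSyntaxCharacters("(1+1)*2?");
console.log(escaped); // \(1\+1\)\*2\?
console.log(new RegExp(escaped, "u").test("(1+1)*2?")); // true

// Unicode-unaware mode tolerates a lone "]" (deprecated); u mode rejects it.
console.log(/]/.test("]")); // true
let bracketError = null;
try {
  new RegExp("]", "u");
} catch (e) {
  bracketError = e;
}
console.log(bracketError instanceof SyntaxError); // true
```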

Some non-syntax characters also cannot appear literally in a regular expression literal. / cannot appear as a literal character in a regular expression literal, because / is used as the delimiter of the literal itself; to match a literal /, escape it as \/. Line terminators cannot appear as literal characters in a regular expression literal either, because a literal cannot span multiple lines; use a character escape like \n instead. There are no such restrictions when using the RegExp() constructor, although string literals have their own escaping rules (for example, "\\" actually denotes a single backslash character, so new RegExp("\\*") and /\*/ are equivalent).
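A short sketch contrasting the literal and constructor forms described above. The sample strings are illustrative.

```javascript
// In a literal, "/" must be escaped because it would otherwise end the literal.
console.log(/a\/b/.test("a/b")); // true

// The RegExp() constructor has no such restriction.
console.log(new RegExp("a/b").test("a/b")); // true

// String escaping is applied first: "\\*" is the two-character string \*.
const fromString = new RegExp("\\*");
console.log(fromString.source === /\*/.source); // true
console.log(fromString.test("*")); // true

// A real newline can be passed in through a string; a literal needs the \n escape.
console.log(new RegExp("a\nb").test("a\nb")); // true
console.log(/a\nb/.test("a\nb")); // true
```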

Unicode 不识别模式 中,该模式被解释为 UTF-16 代码单元 的序列。这意味着代理对实际上代表两个文字字符。与其他功能配合使用时,这会导致意外行为:

¥In Unicode-unaware mode, the pattern is interpreted as a sequence of UTF-16 code units. This means surrogate pairs actually represent two literal characters. This causes unexpected behaviors when paired with other features:

js
/^[😄]$/.test("😄"); // false, because the pattern is interpreted as /^[\ud83d\ude04]$/
/^😄+$/.test("😄😄"); // false, because the pattern is interpreted as /^\ud83d\ude04+$/

In Unicode-aware mode, the pattern is interpreted as a sequence of Unicode code points, and surrogate pairs do not get split. Therefore, you should always prefer to use the u flag.
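For comparison, a sketch of the same two patterns with the u flag, where the emoji is treated as a single code point:

```javascript
const emoji = "😄"; // U+1F604, stored in UTF-16 as the surrogate pair \uD83D\uDE04

console.log(emoji === "\uD83D\uDE04"); // true
console.log(emoji.length); // 2 (UTF-16 code units)
console.log([...emoji].length); // 1 (code points)

// With the u flag, the emoji is one literal character in the pattern.
console.log(/^[😄]$/u.test(emoji)); // true
console.log(/^😄+$/u.test("😄😄")); // true

// Without it, the first pattern fails, as described above.
console.log(/^[😄]$/.test(emoji)); // false
```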

Examples

Using literal characters

The following example is copied from Character escape. The a and b characters are literal characters in the pattern, and \n is an escaped character because a line terminator cannot appear literally in a regular expression literal.

js
const pattern = /a\nb/;
const string = `a
b`;
console.log(pattern.test(string)); // true

Specifications

ECMAScript Language Specification
# prod-PatternCharacter

Browser compatibility

See also