Lexical grammar
This page describes JavaScript's lexical grammar. JavaScript source text is just a sequence of characters; for the interpreter to understand it, the text has to be parsed into a more structured representation. The initial step of parsing is called lexical analysis, in which the text is scanned from left to right and converted into a sequence of individual, atomic input elements. Some input elements are insignificant to the interpreter and are stripped after this step: they include white space and comments. The others, including identifiers, keywords, literals, and punctuators (mostly operators), are used for further syntax analysis. Line terminators and multiline comments are also syntactically insignificant, but they guide the process of automatic semicolon insertion, which makes certain otherwise invalid token sequences valid.
Format-control characters
Format-control characters have no visual representation but are used to control the interpretation of the text.
Code point | Name | Abbreviation | Description |
---|---|---|---|
U+200C | Zero width non-joiner | <ZWNJ> | Placed between characters to prevent them from being connected into ligatures in certain languages (Wikipedia). |
U+200D | Zero width joiner | <ZWJ> | Placed between characters that would not normally be connected, so that they are rendered using their connected form in certain languages (Wikipedia). |
U+FEFF | Byte order mark | <BOM> | Used at the start of the script to mark it as Unicode and to indicate the text's byte order (Wikipedia). |
In JavaScript source text, <ZWNJ> and <ZWJ> are treated as identifier parts, while <BOM> (also called a zero-width no-break space, <ZWNBSP>, when not at the start of text) is treated as white space.
White space
White space characters improve the readability of source text and separate tokens from each other. These characters are usually unnecessary for the functionality of the code. Minification tools are often used to remove white space in order to reduce the amount of data that needs to be transferred.
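Code point | Name | Abbreviation | Description | Escape sequence |
---|---|---|---|---|
U+0009 | Character tabulation | <TAB> | Horizontal tabulation | \t |
U+000B | Line tabulation | <VT> | Vertical tabulation | \v |
U+000C | Form feed | <FF> | Page-breaking control character (Wikipedia). | \f |
U+0020 | Space | <SP> | Normal space | |
U+00A0 | No-break space | <NBSP> | Normal space, but with no point at which a line may break | |
U+FEFF | Zero-width no-break space | <ZWNBSP> | When not at the start of a script, the BOM marker is a normal white space character. | |
Others | Other Unicode space characters | <USP> | Characters in the "Space_Separator" general category | |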
Note: Of the characters that have the "White_Space" property but are not in the "Space_Separator" general category, U+0009, U+000B, and U+000C are still treated as white space in JavaScript; U+0085 NEXT LINE has no special role; the others make up the set of line terminators.
Note: Changes to the Unicode standard used by the JavaScript engine may affect programs' behavior. For example, ES2016 upgraded the reference Unicode standard from 5.1 to 8.0.0, which moved U+180E MONGOLIAN VOWEL SEPARATOR from the "Space_Separator" category to the "Format (Cf)" category and made it a non-white-space character. As a result, "\u180E".trim().length changed from 0 to 1.
Line terminators
In addition to white space characters, line terminator characters are used to improve the readability of the source text. However, in some cases, line terminators can influence the execution of JavaScript code, because there are a few places where they are forbidden. Line terminators also affect the process of automatic semicolon insertion.
Outside the context of the lexical grammar, white space and line terminators are often conflated. For example, String.prototype.trim() removes all white space and line terminators from the beginning and end of a string. The \s character class escape in regular expressions matches all white space and line terminators.
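As a rough illustration of that conflation, the sketch below builds one string from the white space characters in the table above and one from the four line terminators (LF, CR, LS, PS), then checks how trim() and \s treat them. The variable names are chosen only for this example.

```javascript
// White space code points: TAB, VT, FF, SP, NBSP, ZWNBSP.
const whiteSpace = "\t\v\f \u00A0\uFEFF";
// Line terminators: LF, CR, LS (U+2028), PS (U+2029).
const lineTerminators = "\n\r\u2028\u2029";

const padded = whiteSpace + lineTerminators + "x" + lineTerminators + whiteSpace;

// trim() strips both groups from either end of the string.
console.log(padded.trim() === "x"); // true
// \s matches every character in both groups.
console.log(/^\s+$/.test(whiteSpace + lineTerminators)); // true
```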
In ECMAScript, only the following Unicode code points are treated as line terminators: U+000A <LF>, U+000D <CR>, U+2028 <LS>, and U+2029 <PS>. Other line-breaking characters are not line terminators; for example, U+0085 NEXT LINE (NEL) has no special role in the lexical grammar.
Comments
Comments are used to add hints, notes, suggestions, or warnings to JavaScript code. This can make the code easier to read and understand. They can also be used to disable code to prevent it from being executed, which can be a valuable debugging tool.
JavaScript has two long-standing ways to add comments to code: line comments and block comments. In addition, there is a special hashbang comment syntax.
Line comments
The first way is the // comment, which turns all text following it on the same line into a comment.
Block comments
The second way is the /* */ style, which is much more flexible.
For example, you can use it on a single line:
function comment() {
  /* This is a one line JavaScript comment */
  console.log("Hello world!");
}
comment();
You can also make multiple-line comments, like this:
function comment() {
  /* This comment spans multiple lines. Notice
     that we don't need to end the comment until we're done. */
  console.log("Hello world!");
}
comment();
You can also use it in the middle of a line, although this can make your code harder to read, so it should be used with caution:
function comment(x) {
  console.log("Hello " + x /* insert the value of x */ + " !");
}
comment("world");
In addition, you can use it to disable code to prevent it from running, by wrapping the code in a comment, like this:
function comment() {
  /* console.log("Hello world!"); */
}
comment();
In this case, the console.log() call is never issued, since it is inside a comment. Any number of lines of code can be disabled this way.
Block comments that contain at least one line terminator behave like line terminators in automatic semicolon insertion.
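The effect of a line terminator inside a block comment can be seen in a small sketch like the one below. The function names are made up for this example; the only difference between the two functions is whether the comment after return spans a line break.

```javascript
function singleLineComment() {
  // The comment stays on one line, so `return 1` is parsed as one statement.
  return /* stays on one line */ 1;
}

function multilineComment() {
  // The comment below contains a line break, so it acts like a line
  // terminator: a semicolon is inserted right after `return`, and the
  // `1` becomes an unreachable expression statement.
  return /* spans
  two lines */ 1;
}

console.log(singleLineComment()); // 1
console.log(multilineComment()); // undefined
```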
Hashbang comments
There is a special third comment syntax, the hashbang comment. A hashbang comment behaves exactly like a single-line (//) comment, except that it begins with #! and is only valid at the absolute start of a script or module. No white space of any kind is permitted before the #!. The comment consists of all the characters after #! up to the end of the first line; only one such comment is permitted.
Hashbang comments in JavaScript resemble shebangs in Unix, which provide the path to the specific JavaScript interpreter that you want to use to execute the script. Before the hashbang comment was standardized, it had already been de facto implemented in non-browser hosts like Node.js, where it was stripped from the source text before being passed to the engine. An example is as follows:
#!/usr/bin/env node
console.log("Hello world");
The JavaScript interpreter treats it as a normal comment. It only has semantic meaning to the shell, and only if the script is run directly in a shell.
Warning: If you want scripts to be runnable directly in a shell environment, encode them in UTF-8 without a BOM. A BOM does not cause any problems for code running in a browser, because it is stripped during UTF-8 decoding, before the source text is analyzed. However, a Unix/Linux shell will not recognize the hashbang if it is preceded by a BOM character.
Use the #! comment style only to specify a JavaScript interpreter. In all other cases, just use a // comment (or a multiline comment).
Identifiers
An identifier is used to link a value with a name. Identifiers can be used in various places:
const decl = 1; // Variable declaration (may also be `let` or `var`)
function fn() {} // Function declaration
const obj = { key: "value" }; // Object keys
// Class declaration
class C {
#priv = "value"; // Private property
}
lbl: console.log(1); // Label
In JavaScript, identifiers are commonly made of alphanumeric characters, underscores (_), and dollar signs ($). Identifiers are not allowed to start with a digit. However, JavaScript identifiers are not limited to ASCII: many Unicode code points are allowed as well. Namely, any character in the ID_Start category can start an identifier, while any character in the ID_Continue category can appear after the first character.
Note: If, for some reason, you need to parse some JavaScript source yourself, do not assume all identifiers follow the pattern /[A-Za-z_$][\w$]*/ (that is, ASCII-only)! The range of identifiers can be described by the regex /[$_\p{ID_Start}][$\u200c\u200d\p{ID_Continue}]*/u (excluding Unicode escape sequences).
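To make the note concrete, here is a minimal sketch that anchors the Unicode-aware pattern with ^ and $ and tries it on a few candidate names. The candidate list and variable names are illustrative assumptions; the check deliberately ignores Unicode escape sequences and reserved words, so it is not a full identifier validator.

```javascript
// Anchored version of the pattern from the note. Escape sequences and
// reserved words are deliberately not handled here.
const identifierPattern = /^[$_\p{ID_Start}][$\u200c\u200d\p{ID_Continue}]*$/u;

const candidates = ["decl", "$el", "_private", "你好", "café", "1abc", "a-b", ""];

// Collect the verdict for each candidate name.
const verdicts = candidates.map((name) => [name, identifierPattern.test(name)]);

for (const [name, ok] of verdicts) {
  console.log(JSON.stringify(name), ok);
}
// "decl", "$el", "_private", "你好", and "café" match;
// "1abc", "a-b", and "" do not.
```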
In addition, JavaScript allows Unicode escape sequences of the form \u0000 or \u{000000} in identifiers, which encode the same string value as the actual Unicode characters. For example, 你好 and \u4f60\u597d are the same identifier:
const 你好 = "Hello";
console.log(\u4f60\u597d); // Hello
Not all places accept the full range of identifiers. Certain syntaxes, such as function declarations, function expressions, and variable declarations, require identifier names that are not reserved words.
function import() {} // Illegal: import is a reserved word.
Most notably, private properties and object properties allow reserved words.
const obj = { import: "value" }; // Legal despite `import` being reserved
class C {
#import = "value";
}
Keywords
Keywords are tokens that look like identifiers but have special meanings in JavaScript. For example, the keyword async before a function declaration indicates that the function is asynchronous.
Some keywords are reserved, meaning that they cannot be used as identifiers for variable declarations, function declarations, and so on. They are often called reserved words. A list of these reserved words is provided below. Not all keywords are reserved: for example, async can be used as an identifier anywhere. Some keywords are only contextually reserved: for example, await is only reserved within the body of an async function, and let is only reserved in strict mode code or in const and let declarations.
Identifiers are always compared by string value, so escape sequences are interpreted. For example, this is still a syntax error:
const els\u{65} = 1;
// `els\u{65}` encodes the same identifier as `else`
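The difference between unreserved, contextually reserved, and always-reserved keywords can be probed with a small sketch like the following. It assumes an environment where the Function constructor is available (for example, no restrictive Content Security Policy), and parses is a hypothetical helper written only for this example.

```javascript
// Hypothetical helper: reports whether `source` parses as the body of a
// function created with new Function(), which is non-strict and non-async
// unless the body itself opts into strict mode.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

console.log(parses("var async = 1;")); // true: async is never reserved
console.log(parses("var await = 1;")); // true: not in an async function or module
console.log(parses("var let = 1;")); // true: non-strict code
console.log(parses("'use strict'; var let = 1;")); // false: reserved in strict mode
console.log(parses("var else = 1;")); // false: always reserved
console.log(parses("var els\\u{65} = 1;")); // false: same identifier as `else`
```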
Reserved words
These keywords cannot be used as identifiers for variables, functions, classes, etc. anywhere in JavaScript source.
break
case
catch
class
const
continue
debugger
default
delete
do
else
export
extends
false
finally
for
function
if
import
in
instanceof
new
null
return
super
switch
this
throw
true
try
typeof
var
void
while
with
The following are only reserved when they are found in strict mode code:
let
static
yield
The following are only reserved when they are found in module code or async function bodies:
await
Future reserved words
The following are reserved as future keywords by the ECMAScript specification. They have no special functionality at present, but they might at some future time, so they cannot be used as identifiers.
These are always reserved:
enum
The following are only reserved when they are found in strict mode code:
implements
interface
package
private
protected
public
Future reserved words in older standards
The following are reserved as future keywords by older ECMAScript specifications (ECMAScript 1 through 3).
abstract
boolean
byte
char
double
final
float
goto
int
long
native
short
synchronized
throws
transient
volatile
Identifiers with special meanings
A few identifiers have a special meaning in some contexts without being reserved words of any kind. They include:
- arguments (not a keyword, but cannot be declared as an identifier in strict mode)
- as (import * as ns from "mod")
- async
- eval (not a keyword, but cannot be declared as an identifier in strict mode)
- from (import x from "mod")
- get
- of
- set
Literals
Note: This section discusses literals that are atomic tokens. Object literals and array literals are expressions that consist of a series of tokens.
Null literal
Boolean literal
Numeric literals
The Number and BigInt types use numeric literals.
Decimal
1234567890
42
A decimal literal can start with a zero (0) followed by another decimal digit, but if all digits after the leading 0 are smaller than 8, the number is interpreted as an octal number. This is considered a legacy syntax, and number literals prefixed with 0, whether interpreted as octal or decimal, cause a syntax error in strict mode, so use the 0o prefix instead.
0888 // 888 parsed as decimal
0777 // parsed as octal, 511 in decimal
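Because legacy leading-zero literals are a syntax error in strict mode (and therefore in modules), the sketch below compiles them through the Function constructor, whose body is non-strict unless it opts in. This is only an illustration under that assumption; the variable names are made up for the example.

```javascript
// new Function() compiles its body as non-strict code unless the body
// itself starts with a "use strict" directive.
const legacyOctal = new Function("return 0777;");
const leadingZeroDecimal = new Function("return 0888;");

console.log(legacyOctal()); // 511 (interpreted as octal)
console.log(leadingZeroDecimal()); // 888 (contains 8, so decimal)
console.log(0o777); // 511, and valid in strict mode too

// The same legacy literal is rejected once the body is strict.
let strictError = null;
try {
  new Function("'use strict'; return 0777;");
} catch (error) {
  strictError = error;
}
console.log(strictError instanceof SyntaxError); // true
```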
Exponential
The decimal exponential literal has the format beN, where b is a base number (integer or floating-point), followed by an E or e character (which serves as the separator or exponent indicator) and N, the exponent, which is a signed integer.
0e-5 // 0
0e+5 // 0
5e1 // 50
175e-2 // 1.75
1e3 // 1000
1e-3 // 0.001
1E3 // 1000
Binary
Binary number syntax uses a leading zero followed by a lowercase or uppercase Latin letter "B" (0b or 0B). Any character after the 0b that is not 0 or 1 will terminate the literal sequence.
0b10000000000000000000000000000000 // 2147483648
0b01111111100000000000000000000000 // 2139095040
0B00000000011111111111111111111111 // 8388607
Octal
Octal number syntax uses a leading zero followed by a lowercase or uppercase Latin letter "O" (0o or 0O). Any character after the 0o that is outside the range (01234567) will terminate the literal sequence.
0O755 // 493
0o644 // 420
Hexadecimal
Hexadecimal number syntax uses a leading zero followed by a lowercase or uppercase Latin letter "X" (0x or 0X). Any character after the 0x that is outside the range (0123456789ABCDEF) will terminate the literal sequence.
0xFFFFFFFFFFFFFFFFF // 295147905179352830000
0x123456789ABCDEF // 81985529216486900
0XA // 10
BigInt literal
The BigInt type is a numeric primitive in JavaScript that can represent integers with arbitrary precision. BigInt literals are created by appending n to the end of an integer.
123456789123456789n // 123456789123456789
0o777777777777n // 68719476735
0x123456789ABCDEFn // 81985529216486895
0b11101001010101010101n // 955733
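The practical difference between the two numeric types shows up once a literal exceeds the safe integer range. The following sketch reuses the hexadecimal value from the examples above, once as a Number and once as a BigInt; the variable names are assumptions made for illustration.

```javascript
// The same digits written as a Number literal and as a BigInt literal.
const asNumber = 0x123456789ABCDEF; // larger than Number.MAX_SAFE_INTEGER
const asBigInt = 0x123456789ABCDEFn;

console.log(Number.isSafeInteger(asNumber)); // false
console.log(String(asNumber)); // "81985529216486900" (precision lost)
console.log(String(asBigInt)); // "81985529216486895" (exact)
console.log(typeof asBigInt); // "bigint"

// All radix prefixes work with the n suffix.
console.log(0b1111n === 0o17n && 0o17n === 0xFn && 0xFn === 15n); // true
```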
BigInt literals cannot start with 0, to avoid confusion with legacy octal literals.
0755n; // SyntaxError: invalid BigInt syntax
For octal BigInt numbers, always use zero followed by the letter "o" (uppercase or lowercase):
0o755n;
For more information about BigInt, see also JavaScript data structures.
Numeric separators
To improve the readability of numeric literals, underscores (_, U+005F) can be used as separators:
1_000_000_000_000
1_050.95
0b1010_0001_1000_0101
0o2_2_5_6
0xA0_B0_C0
1_000_000_000_000_000_000_000n
Note these limitations:
// More than one underscore in a row is not allowed
100__000; // SyntaxError
// Not allowed at the end of numeric literals
100_; // SyntaxError
// Cannot be used after a leading 0
0_1; // SyntaxError
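One point worth checking in practice is that separators belong to literal syntax only and do not change the value. The sketch below compares separated and plain literals, shows how string-to-number conversion reacts to underscores, and probes the limitations above through a hypothetical isValidLiteral helper built on the Function constructor (assumed to be available in the environment).

```javascript
// Separators do not affect the numeric value.
console.log(1_000_000 === 1000000); // true
console.log(0xA0_B0_C0 === 0xA0B0C0); // true
console.log(1_050.95 === 1050.95); // true

// Separators are literal syntax only; string conversion does not accept them.
console.log(Number("1_000")); // NaN
console.log(parseInt("1_000", 10)); // 1 (parsing stops at the underscore)

// Hypothetical helper: does `source` parse as an expression?
function isValidLiteral(source) {
  try {
    new Function(`return ${source};`);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

console.log(isValidLiteral("100_000")); // true
console.log(isValidLiteral("100__000")); // false
console.log(isValidLiteral("100_")); // false
console.log(isValidLiteral("0_1")); // false
```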
String literals
A string literal is zero or more Unicode code points enclosed in single or double quotes. Unicode code points may also be represented by an escape sequence. All code points may appear literally in a string literal except for these code points:
- U+005C \ (backslash)
- U+000D <CR>
- U+000A <LF>
- The same kind of quote that begins the string literal
Any code point may appear in the form of an escape sequence. String literals evaluate to ECMAScript String values. When generating these String values, Unicode code points are UTF-16 encoded.
'foo'
"bar"
The following subsections describe various escape sequences (\ followed by one or more characters) available in string literals. Any escape sequence not listed below becomes an "identity escape" that represents the code point itself. For example, \z is the same as z. There is a deprecated octal escape sequence syntax described in the Deprecated and obsolete features page. Many of these escape sequences are also valid in regular expressions; see Character escape.
Escape sequences
Special characters can be encoded using escape sequences:
Escape sequence | Unicode code point |
---|---|
\0 | null character (U+0000 NULL) |
\' | single quote (U+0027 APOSTROPHE) |
\" | double quote (U+0022 QUOTATION MARK) |
\\ | backslash (U+005C REVERSE SOLIDUS) |
\n | newline (U+000A LINE FEED; LF) |
\r | carriage return (U+000D CARRIAGE RETURN; CR) |
\v | vertical tab (U+000B LINE TABULATION) |
\t | tab (U+0009 CHARACTER TABULATION) |
\b | backspace (U+0008 BACKSPACE) |
\f | form feed (U+000C FORM FEED) |
\ followed by a line terminator | empty string |
The last escape sequence, \ followed by a line terminator, is useful for splitting a string literal across multiple lines without changing its meaning.
const longString =
"This is a very long string which needs \
to wrap across multiple lines because \
otherwise my code is unreadable.";
Make sure there is no space or any other character after the backslash (except for a line break); otherwise it will not work. If the next line is indented, the extra spaces will also be present in the string's value.
You can also use the + operator to append multiple strings together, like this:
const longString =
"This is a very long string which needs " +
"to wrap across multiple lines because " +
"otherwise my code is unreadable.";
Both of the above methods result in identical strings.
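The equivalence, and the indentation caveat mentioned earlier, can be checked with a sketch such as this one. The variable names are illustrative; note that the continuation lines of the first string start at column zero on purpose.

```javascript
// Line continuation: the backslash-newline pairs contribute nothing.
const continued =
  "This is a very long string which needs \
to wrap across multiple lines because \
otherwise my code is unreadable.";

// Concatenation with the + operator.
const concatenated =
  "This is a very long string which needs " +
  "to wrap across multiple lines because " +
  "otherwise my code is unreadable.";

// If the continuation line is indented, the indentation ends up in the value.
const indented =
  "first \
    second";

console.log(continued === concatenated); // true
console.log(indented === "first second"); // false (extra spaces are kept)
```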
Hexadecimal escape sequences
Hexadecimal escape sequences consist of \x followed by exactly two hexadecimal digits representing a code unit or code point in the range 0x0000 to 0x00FF.
"\xA9"; // "©"
Unicode escape sequences
A Unicode escape sequence consists of exactly four hexadecimal digits following \u. It represents a code unit in the UTF-16 encoding. For code points U+0000 to U+FFFF, the code unit is equal to the code point. Code points U+10000 to U+10FFFF require two escape sequences representing the two code units (a surrogate pair) used to encode the character; the surrogate pair is distinct from the code point.
See also String.fromCharCode() and String.prototype.charCodeAt().
"\u00A9"; // "©" (U+A9)
Unicode code point escapes
A Unicode code point escape consists of \u{, followed by a code point in hexadecimal, followed by }. The value of the hexadecimal digits must be in the range 0 to 0x10FFFF inclusive. Code points in the range U+10000 to U+10FFFF do not need to be represented as a surrogate pair.
See also String.fromCodePoint() and String.prototype.codePointAt().
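The relationship between the escape forms can be checked directly. The sketch below, with variable names chosen for illustration, compares a code point escape with its surrogate pair and contrasts code units with code points.

```javascript
const viaCodePoint = "\u{2F804}";
const viaSurrogates = "\uD87E\uDC04";

// Both spellings produce the same String value.
console.log(viaCodePoint === viaSurrogates); // true
// The string holds two UTF-16 code units but a single code point.
console.log(viaCodePoint.length); // 2
console.log([...viaCodePoint].length); // 1
console.log(viaCodePoint.codePointAt(0).toString(16)); // "2f804"
console.log(viaCodePoint.charCodeAt(0).toString(16)); // "d87e"

// For code points up to U+00FF, all three escape forms agree.
console.log("\xA9" === "\u00A9" && "\u00A9" === "\u{A9}"); // true
```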
"\u{2F804}"; // CJK COMPATIBILITY IDEOGRAPH-2F804 (U+2F804)
// the same character represented as a surrogate pair
"\uD87E\uDC04";
Regular expression literals
Regular expression literals are enclosed by two forward slashes (/). The lexer consumes all characters up to the next unescaped forward slash or the end of the line, unless the forward slash appears within a character class ([]). Some characters (namely, those that are identifier parts) can appear after the closing slash, denoting flags.
The lexical grammar is very lenient: not all regular expression literals that get identified as one token are valid regular expressions.
See also RegExp for more information.
/ab+c/g
/[/]/
A regular expression literal cannot start with two forward slashes (//), because that would be a line comment. To specify an empty regular expression, use /(?:)/.
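A few of these points can be verified with a short sketch. It assumes the Function constructor is available, which is used here only to compile a literal that is tokenized successfully but rejected as a pattern; the variable name patternError is made up for the example.

```javascript
// A slash inside a character class does not end the literal.
console.log(/[/]/.test("a/b")); // true

// The empty regular expression matches the empty string...
console.log(/(?:)/.test("")); // true
// ...and is also how an empty pattern is serialized.
console.log(new RegExp("").source); // "(?:)"

// /(/ is recognized as one regex token, but the pattern itself is invalid.
let patternError = null;
try {
  new Function("return /(/;");
} catch (error) {
  patternError = error;
}
console.log(patternError instanceof SyntaxError); // true
```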
Template literals
One template literal consists of several tokens: `xxx${ (template head), }xxx${ (template middle), and }xxx` (template tail) are individual tokens, while any expression may come between them.
See also template literals for more information.
`string text`
`string text line 1
string text line 2`
`string text ${expression} string text`
tag`string text ${expression} string text`
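How the text segments and the expressions are kept apart can be observed with a tag function. The following is a minimal sketch; inspect is a hypothetical tag written for this example, and expression is just a sample value.

```javascript
// Hypothetical tag function: returns the pieces it receives.
function inspect(strings, ...values) {
  return {
    cooked: [...strings], // text segments with escapes interpreted
    raw: [...strings.raw], // text segments as written in the source
    values, // results of the ${...} expressions
  };
}

const expression = 42;
const result = inspect`string text ${expression} string\ttext`;

console.log(result.cooked.length); // 2 (text before and after the expression)
console.log(result.cooked[1] === " string\ttext"); // true (contains a real tab)
console.log(result.raw[1] === " string\\ttext"); // true (backslash and "t")
console.log(result.values); // [ 42 ]
```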
Automatic semicolon insertion
Some JavaScript statements' syntax definitions require semicolons (;) at the end. They include:
- var, let, const
- Expression statements
- do...while
- continue, break, return, throw
- debugger
- Class field declarations (public or private)
- import, export
However, to make the language more approachable and convenient, JavaScript can automatically insert semicolons while consuming the token stream, so that some invalid token sequences are "fixed" into valid syntax. This step happens after the program text has been parsed into tokens according to the lexical grammar. Semicolons are automatically inserted in three cases:
1. When a token not allowed by the grammar is encountered, and it is separated from the previous token by at least one line terminator (including a block comment that contains at least one line terminator), or the token is "}", a semicolon is inserted before the token.
{ 1
2 } 3
// is transformed by ASI into:
{ 1
;2 ;} 3;
// Which is valid grammar encoding three statements,
// each consisting of a number literal
The ending ")" of do...while is also handled as a special case by this rule.
do {
// ...
} while (condition) /* ; */ // ASI here
const a = 1
However, a semicolon is not inserted if it would then become a separator in the head of a for statement.
for (
let a = 1 // No ASI here
a < 10 // No ASI here
a++
) {}
Semicolons are also never inserted as empty statements. For example, in the code below, if a semicolon were inserted after ")", the code would be valid, with an empty statement as the if body and the const declaration as a separate statement. However, because automatically inserted semicolons cannot become empty statements, the declaration becomes the body of the if statement, which is not valid.
if (Math.random() > 0.5)
const x = 1 // SyntaxError: Unexpected token 'const'
2. When the end of the input stream of tokens is reached and the parser is unable to parse the single input stream as a complete program, a semicolon is inserted at the end.
const a = 1 /* ; */ // ASI here
This rule is a complement to the previous rule, specifically for the case where there is no "offending token" but the end of the input stream.
3. When the grammar forbids line terminators in some place but a line terminator is found, a semicolon is inserted. These places include:
- expr <here> ++, expr <here> --
- continue <here> lbl
- break <here> lbl
- return <here> expr
- throw <here> expr
- yield <here> expr
- yield <here> * expr
- (param) <here> => {}
- async <here> function, async <here> prop(), async <here> function*, async <here> *prop(), async <here> (param) <here> => {}
Here, ++ is not treated as a postfix operator applying to variable b, because a line terminator occurs between b and ++.
a = b
++c
// is transformed by ASI into
a = b;
++c;
Here, the return statement returns undefined, and a + b becomes an unreachable statement.
return
a + b
// is transformed by ASI into
return;
a + b;
Note that ASI is only triggered if a line break separates tokens that would otherwise produce invalid syntax. If the next token can be parsed as part of a valid structure, no semicolon is inserted. For example:
const a = 1
(1).toString()
const b = 1
[1, 2, 3].forEach(console.log)
Because () can be seen as a function call, it usually does not trigger ASI. Similarly, [] may be a member access. The code above is equivalent to:
const a = 1(1).toString();
const b = 1[1, 2, 3].forEach(console.log);
This happens to be valid syntax. 1[1, 2, 3] is a property accessor with a comma-joined expression. Therefore, you would get errors like "1 is not a function" and "Cannot read properties of undefined (reading 'forEach')" when running the code.
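The runtime failures described here can be reproduced with a small sketch. Each pitfall is wrapped in its own function so that the resulting error can be caught; the function names and the errorNameOf helper are assumptions made for this example.

```javascript
function parenthesisOnNextLine() {
  // Parsed as: const a = 1(1).toString()
  const a = 1
  (1).toString()
}

function bracketOnNextLine() {
  // Parsed as: const b = 1[1, 2, 3].forEach(console.log)
  const b = 1
  [1, 2, 3].forEach(console.log)
}

// Hypothetical helper: runs fn and returns the name of the thrown error.
function errorNameOf(fn) {
  try {
    fn();
    return null;
  } catch (error) {
    return error.name;
  }
}

console.log(errorNameOf(parenthesisOnNextLine)); // "TypeError"
console.log(errorNameOf(bracketOnNextLine)); // "TypeError"
```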
Within classes, class fields and generator methods can be a pitfall as well.
class A {
a = 1
*gen() {}
}
It is seen as:
class A {
a = 1 * gen() {}
}
This therefore results in a syntax error around {.
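A quick way to confirm this is to compile both variants from source strings. The sketch below assumes the Function constructor is available and an engine that supports class fields; parses is a hypothetical helper for this example.

```javascript
// Hypothetical helper: does `source` parse as a function body?
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// Without a semicolon, the field initializer swallows the `*`.
const withoutSemicolon = "class A {\n  a = 1\n  *gen() {}\n}";
// With a semicolon, the field and the generator method are separate.
const withSemicolon = "class A {\n  a = 1;\n  *gen() {}\n}";

console.log(parses(withoutSemicolon)); // false
console.log(parses(withSemicolon)); // true
```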
If you want to enforce a semicolon-less style, the following rules of thumb help when dealing with ASI:
- Write postfix ++ and -- on the same line as their operands.
Avoid:
const a = b
++
console.log(a) // ReferenceError: Invalid left-hand side expression in prefix operation
Prefer:
const a = b++
console.log(a)
- The expressions after return, throw, or yield should be on the same line as the keyword.
Avoid:
function foo() {
  return
    1 + 1 // Returns undefined; 1 + 1 is ignored
}
Prefer:
function foo() {
  return 1 + 1
}
function foo() {
  return (
    1 + 1
  )
}
- Similarly, the label identifier after break or continue should be on the same line as the keyword.
Avoid:
outerBlock: {
  innerBlock: {
    break
      outerBlock // SyntaxError: Illegal break statement
  }
}
Prefer:
outerBlock: {
  innerBlock: {
    break outerBlock
  }
}
- The => of an arrow function should be on the same line as the end of its parameters.
Avoid:
const foo = (a, b)
  => a + b
Prefer:
const foo = (a, b) =>
  a + b
- The async of async functions, methods, etc. cannot be directly followed by a line terminator.
Avoid:
async
function foo() {}
Prefer:
async function
foo() {}
- If a line starts with one of (, [, `, +, -, / (as in regex literals), precede it with a semicolon, or end the previous line with a semicolon.
Avoid:
// The () may be merged with the previous line as a function call
(() => {
  // ...
})()
// The [ may be merged with the previous line as a property access
[1, 2, 3].forEach(console.log)
// The ` may be merged with the previous line as a tagged template literal
`string text ${data}`.match(pattern).forEach(console.log)
// The + may be merged with the previous line as a binary + expression
+a.toString()
// The - may be merged with the previous line as a binary - expression
-a.toString()
// The / may be merged with the previous line as a division expression
/pattern/.exec(str).forEach(console.log)
Prefer:
;(() => {
  // ...
})()
;[1, 2, 3].forEach(console.log)
;`string text ${data}`.match(pattern).forEach(console.log)
;+a.toString()
;-a.toString()
;/pattern/.exec(str).forEach(console.log)
- Class fields should preferably always end with semicolons. In addition to the previous rule (which covers a field declaration followed by a computed property, since the latter starts with [), semicolons are also required between a field declaration and a generator method.
Avoid:
class A {
  a = 1
  [b] = 2
  *gen() {} // Seen as a = 1[b] = 2 * gen() {}
}
Prefer:
class A {
  a = 1;
  [b] = 2;
  *gen() {}
}
Specifications
Specification |
---|
ECMAScript Language Specification |
See also
- Grammar and types guide
- Micro-feature from ES6, now in Firefox Aurora and Nightly: binary and octal numbers by Jeff Walden (2013)
- JavaScript character escape sequences by Mathias Bynens (2011)