Text formatting

This chapter introduces how to work with strings and text in JavaScript.

Strings

JavaScript's String type is used to represent textual data. A string is an ordered sequence of "elements", each of which is a 16-bit unsigned integer value (a UTF-16 code unit). Each element occupies a position in the string: the first element is at index 0, the next at index 1, and so on. The length of a string is the number of elements in it. You can create strings using string literals or String objects.

String literals

You can create simple strings using either single or double quotes:

js
'foo'
"bar"

More advanced strings can be created using escape sequences:

Hexadecimal escape sequences

The two digits after \x are interpreted as a hexadecimal number.

js
"\xA9" // "©"

Unicode escape sequences

Unicode escape sequences require exactly four hexadecimal digits following \u.

js
"\u00A9" // "©"

Unicode code point escapes

With Unicode code point escapes, any character can be escaped using hexadecimal numbers, so that it is possible to use Unicode code points up to 0x10FFFF. With simple Unicode escapes, it is often necessary to write the two surrogate halves separately to achieve the same result.

See also String.fromCodePoint() or String.prototype.codePointAt().

js
"\u{2F804}"

// the same with simple Unicode escapes
"\uD87E\uDC04"

String objects

The String object is a wrapper around the string primitive data type.

js
const foo = new String("foo"); // Creates a String object
console.log(foo); // [String: 'foo']
typeof foo; // 'object'
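
To make the difference between the wrapper and the primitive more concrete, here is a short sketch comparing the two. The names primitive and wrapped are hypothetical.

```javascript
const primitive = "foo";
const wrapped = new String("foo");

console.log(typeof primitive); // "string"
console.log(typeof wrapped); // "object"

// Strict equality compares a primitive with an object, so it is false.
console.log(primitive === wrapped); // false

// valueOf() unwraps the object back to its primitive value.
console.log(primitive === wrapped.valueOf()); // true

// Calling String() without new converts to a primitive instead of wrapping.
console.log(typeof String(wrapped)); // "string"
```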

You can call any of the methods of the String object on a string literal value. JavaScript automatically converts the string literal to a temporary String object, calls the method, then discards the temporary String object. You can also use the length property with a string literal.
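
For instance, the following sketch calls methods and reads length directly on primitives. The greeting variable is an illustrative assumption.

```javascript
// Methods and length work on literals without creating a String object yourself.
console.log("foo".toUpperCase()); // "FOO"
console.log("foo".length); // 3

const greeting = "Hello";
const full = greeting.concat(", World!");
console.log(full); // "Hello, World!"

// The variable still holds a primitive afterwards.
console.log(typeof greeting); // "string"
```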

You should use string literals unless you specifically need to use a String object, because String objects can have counterintuitive behavior. For example:

js
const firstString = "2 + 2"; // Creates a string literal value
const secondString = new String("2 + 2"); // Creates a String object
eval(firstString); // Returns the number 4
eval(secondString); // Returns a String object containing "2 + 2"

A String object has one property, length, that indicates the number of UTF-16 code units in the string. For example, the following code assigns helloLength the value 13, because "Hello, World!" has 13 characters, each represented by one UTF-16 code unit. You can access each code unit using array bracket notation. You can't change individual characters because strings are immutable array-like objects:

js
const hello = "Hello, World!";
const helloLength = hello.length;
hello[0] = "L"; // No effect in non-strict code (TypeError in strict mode), because strings are immutable
hello[0]; // This returns "H"
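
Because a string cannot be modified in place, the usual approach is to build a new string from pieces of the old one. The sketch below shows one way to do that; the names original and changed are hypothetical.

```javascript
const original = "Hello, World!";

// Build a new string: a replacement first character plus the rest.
const changed = "J" + original.slice(1);
console.log(changed); // "Jello, World!"

// The source string is untouched.
console.log(original); // "Hello, World!"
console.log(original[7]); // "W"
```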

Characters whose Unicode scalar values are greater than U+FFFF (such as some rare Chinese/Japanese/Korean/Vietnamese characters and some emoji) are stored in UTF-16 with two surrogate code units each. For example, a string containing the single character U+1F600 "Emoji grinning face" will have length 2. Accessing the individual code units in such a string using square brackets may have undesirable consequences, such as the formation of strings with unmatched surrogate code units, in violation of the Unicode standard. (Examples should be added to this page after MDN bug 857438 is fixed.) See also String.fromCodePoint() or String.prototype.codePointAt().
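
The behavior described above can be sketched as follows. This is an illustrative example, not part of the original guide; it relies on the string iterator (used by spread and Array.from), which steps through code points rather than code units.

```javascript
const face = "\u{1F600}"; // U+1F600, stored as a surrogate pair

console.log(face.length); // 2
console.log(face === "\uD83D\uDE00"); // true

// Bracket access returns a lone surrogate, not a complete character.
console.log(face[0] === "\uD83D"); // true

// Iterating by code point keeps the pair together.
console.log([...face].length); // 1
console.log(Array.from("a\u{1F600}b").length); // 3
console.log(face.codePointAt(0).toString(16)); // "1f600"
```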

A String object has a variety of methods: for example, those that return a variation on the string itself, such as substring and toUpperCase.

The following table summarizes the methods of String objects.

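Methods of String

| Method | Description |
| --- | --- |
| charAt(), charCodeAt(), codePointAt() | Return the character or character code at the specified position in the string. |
| indexOf(), lastIndexOf() | Return the position of the first or last occurrence, respectively, of the specified substring in the string. |
| startsWith(), endsWith(), includes() | Return whether the string starts with, ends with, or contains a specified string. |
| concat() | Combines the text of two strings and returns a new string. |
| split() | Splits a String object into an array of strings by separating the string into substrings. |
| slice() | Extracts a section of a string and returns a new string. |
| substring(), substr() | Return the specified subset of the string, either by specifying the start and end indexes or the start index and a length. |
| match(), matchAll(), replace(), replaceAll(), search() | Work with regular expressions. |
| toLowerCase(), toUpperCase() | Return the string in all lowercase or all uppercase, respectively. |
| normalize() | Returns the Unicode Normalization Form of the calling string value. |
| repeat() | Returns a string consisting of the calling string repeated the given number of times. |
| trim() | Trims whitespace from the beginning and end of the string. |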
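
As a quick tour of several of these methods, the following sketch applies them to one sample string. The sample text and variable names are assumptions chosen for illustration.

```javascript
const padded = "  JavaScript strings  ";
const trimmed = padded.trim();
console.log(trimmed); // "JavaScript strings"

// Position lookups
console.log(trimmed.charAt(0)); // "J"
console.log(trimmed.indexOf("a")); // 1
console.log(trimmed.lastIndexOf("a")); // 3

// Tests that return booleans
console.log(trimmed.startsWith("Java")); // true
console.log(trimmed.endsWith("strings")); // true
console.log(trimmed.includes("Script")); // true

// Extraction and splitting
console.log(trimmed.slice(0, 4)); // "Java"
console.log(trimmed.substring(4, 10)); // "Script"
const words = trimmed.split(" ");
console.log(words.length); // 2

// Transformations (each returns a new string)
console.log(trimmed.toUpperCase()); // "JAVASCRIPT STRINGS"
console.log(trimmed.replace(/strings/, "text")); // "JavaScript text"
console.log("ab".repeat(3)); // "ababab"

// "e" followed by a combining acute accent normalizes to a single "é".
console.log("e\u0301".normalize("NFC") === "\u00E9"); // true
```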

Multi-line template literals

Template literals are string literals allowing embedded expressions. You can use multi-line strings and string interpolation features with them.

Template literals are enclosed by backtick (grave accent) characters (`) instead of double or single quotes. Template literals can contain placeholders. These are indicated by the dollar sign and curly braces (${expression}).

Multi-lines

Any new line characters inserted in the source are part of the template literal. Using normal strings, you would have to use the following syntax in order to get multi-line strings:

js
console.log(
  "string text line 1\n\
string text line 2",
);
// "string text line 1
// string text line 2"

To get the same effect with a template literal, you can now write:

js
console.log(`string text line 1
string text line 2`);
// "string text line 1
// string text line 2"
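
One way to confirm that the two forms are equivalent is to compare them directly. The sketch below also shows that indentation inside a template literal is kept as part of the string; the variable names are illustrative.

```javascript
const escaped = "string text line 1\nstring text line 2";
const literal = `string text line 1
string text line 2`;

// The line break typed in the source becomes a "\n" in the string.
console.log(escaped === literal); // true

// Leading whitespace on a continuation line is preserved as well.
const indented = `first
  second`;
console.log(indented.split("\n")[1]); // "  second"
```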

Embedded expressions

In order to embed expressions within normal strings, you would use the following syntax:

js
const five = 5;
const ten = 10;
console.log(
  "Fifteen is " + (five + ten) + " and not " + (2 * five + ten) + ".",
);
// "Fifteen is 15 and not 20."

Now, with template literals, you can use syntactic sugar that makes substitutions like this more readable:

js
const five = 5;
const ten = 10;
console.log(`Fifteen is ${five + ten} and not ${2 * five + ten}.`);
// "Fifteen is 15 and not 20."
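
Placeholders accept any expression, not only arithmetic. The following sketch, with a hypothetical items array, embeds a property access, a conditional expression, and a method call.

```javascript
const items = ["apple", "pear"];

// Each ${...} is evaluated and converted to a string.
const summary = `You have ${items.length} item${
  items.length === 1 ? "" : "s"
}: ${items.join(", ")}.`;

console.log(summary); // "You have 2 items: apple, pear."
```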

For more information, read about Template literals in the JavaScript reference.

Internationalization

The Intl object is the namespace for the ECMAScript Internationalization API, which provides language-sensitive string comparison, number formatting, and date and time formatting. The constructors for Intl.Collator, Intl.NumberFormat, and Intl.DateTimeFormat objects are properties of the Intl object.

Date and time formatting

The Intl.DateTimeFormat object is useful for formatting date and time. The following formats a date for English as used in the United States. (The result is different in another time zone.)

js
// July 17, 2014 00:00:00 UTC:
const july172014 = new Date("2014-07-17");

const options = {
  year: "2-digit",
  month: "2-digit",
  day: "2-digit",
  hour: "2-digit",
  minute: "2-digit",
  timeZoneName: "short",
};
const americanDateTime = new Intl.DateTimeFormat("en-US", options).format;

// The local time zone varies depending on your settings
// In CEST, logs: 07/17/14, 02:00 AM GMT+2
// In PDT, logs: 07/16/14, 05:00 PM GMT-7
console.log(americanDateTime(july172014));
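
If the output should not depend on the machine's settings, one option is to pass an explicit timeZone option. The sketch below assumes UTC and uses formatToParts() to inspect individual fields; the names moment, utcFormatter, and parts are illustrative.

```javascript
// July 17, 2014 00:00:00 UTC
const moment = new Date(Date.UTC(2014, 6, 17));

const utcFormatter = new Intl.DateTimeFormat("en-US", {
  year: "2-digit",
  month: "2-digit",
  day: "2-digit",
  hour: "2-digit",
  minute: "2-digit",
  timeZone: "UTC", // fixes the zone instead of using the local one
  timeZoneName: "short",
});

// Collect the non-literal parts into an object keyed by part type.
const parts = Object.fromEntries(
  utcFormatter
    .formatToParts(moment)
    .filter((part) => part.type !== "literal")
    .map((part) => [part.type, part.value]),
);

console.log(parts.month, parts.day, parts.year); // 07 17 14
console.log(utcFormatter.format(moment)); // full string; exact spacing may vary by engine
```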

Number formatting

The Intl.NumberFormat object is useful for formatting numbers, for example currencies.

js
const gasPrice = new Intl.NumberFormat("en-US", {
  style: "currency",
  currency: "USD",
  minimumFractionDigits: 3,
});

console.log(gasPrice.format(5.259)); // $5.259

const hanDecimalRMBInChina = new Intl.NumberFormat("zh-CN-u-nu-hanidec", {
  style: "currency",
  currency: "CNY",
});

console.log(hanDecimalRMBInChina.format(1314.25)); // ¥ 一,三一四.二五
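
Beyond currencies, the same constructor handles grouping, percentages, and fixed decimal places. The following is a brief sketch for the en-US locale; the formatter names are assumptions made for illustration.

```javascript
// Default options: grouping separators and up to three fraction digits.
const plain = new Intl.NumberFormat("en-US");
console.log(plain.format(1234567.891)); // "1,234,567.891"

// Percent style multiplies by 100 and rounds to a whole number by default.
const percent = new Intl.NumberFormat("en-US", { style: "percent" });
console.log(percent.format(0.256)); // "26%"

// Without minimumFractionDigits, USD is rounded to two decimals.
const usd = new Intl.NumberFormat("en-US", {
  style: "currency",
  currency: "USD",
});
console.log(usd.format(5.259)); // "$5.26"

// Force exactly two fraction digits.
const fixed = new Intl.NumberFormat("en-US", {
  minimumFractionDigits: 2,
  maximumFractionDigits: 2,
});
console.log(fixed.format(5)); // "5.00"
```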

Collation

The Intl.Collator object is useful for comparing and sorting strings.

For example, there are actually two different sort orders in German: phonebook and dictionary. Phonebook sort emphasizes sound, and it's as if "ä", "ö", and so on were expanded to "ae", "oe", and so on prior to sorting.

js
const names = ["Hochberg", "Hönigswald", "Holzman"];

const germanPhonebook = new Intl.Collator("de-DE-u-co-phonebk");

// as if sorting ["Hochberg", "Hoenigswald", "Holzman"]:
console.log(names.sort(germanPhonebook.compare).join(", "));
// "Hochberg, Hönigswald, Holzman"

Some German words conjugate with extra umlauts, so in dictionaries it's sensible to order ignoring umlauts (except when ordering words differing only by umlauts: schon before schön).

js
const germanDictionary = new Intl.Collator("de-DE-u-co-dict");

// as if sorting ["Hochberg", "Honigswald", "Holzman"]:
console.log(names.sort(germanDictionary.compare).join(", "));
// "Hochberg, Holzman, Hönigswald"
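
Collators also accept options that are independent of German. As a hedged sketch, the example below uses the numeric and sensitivity options with an English locale; the files array and collator names are hypothetical. Because sort() reorders the array in place, the example sorts a copy.

```javascript
const files = ["item10", "item2", "item1"];

// numeric: true compares digit runs by value, so "item2" sorts before "item10".
const natural = new Intl.Collator("en", { numeric: true });
const sortedFiles = [...files].sort(natural.compare);
console.log(sortedFiles.join(", ")); // "item1, item2, item10"

// sensitivity: "base" treats letters that differ only in case as equal.
const loose = new Intl.Collator("en", { sensitivity: "base" });
console.log(loose.compare("a", "A")); // 0
console.log(loose.compare("a", "b") < 0); // true

// The original array keeps its order because a copy was sorted.
console.log(files[0]); // "item10"
```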

For more information about the Intl API, see also Introducing the JavaScript Internationalization API.