String

The String object is used to represent and manipulate a sequence of characters.

Description

Strings are useful for holding data that can be represented in text form. Some of the most-used operations on strings are checking their length, building and concatenating them with the + and += string operators, checking for the existence or location of substrings with the indexOf() method, and extracting substrings with the substring() method.
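The operations named above can be sketched in a few lines. The variable names and sample text are illustrative only.

```js
// Illustrative values; none of these names come from the reference itself.
const greeting = "Hello";
let sentence = greeting + ", world"; // concatenate with +
sentence += "!"; // append with +=

const size = sentence.length; // number of UTF-16 code units
const position = sentence.indexOf("world"); // index of the first match
const missing = sentence.indexOf("moon"); // -1 when the substring is absent
const word = sentence.substring(7, 12); // from index 7 up to, but not including, 12

console.log(sentence, size, position, missing, word);
```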

Creating strings

Strings can be created as primitives, from string literals, or as objects, using the String() constructor:

js
const string1 = "A string primitive";
const string2 = 'Also a string primitive';
const string3 = `Yet another string primitive`;

js
const string4 = new String("A String object");

String primitives and String objects share many behaviors, but have other important differences and caveats. See "String primitives and String objects" below.

String literals can be specified using single or double quotes, which are treated identically, or using the backtick character `. This last form specifies a template literal: with this form you can interpolate expressions. For more information on the syntax of string literals, see lexical grammar.
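As a small sketch of the difference between the quote styles, the following compares a single-quoted and a double-quoted literal and then interpolates expressions in a template literal. The names userName and itemCount are made up for the example.

```js
const single = 'plain text';
const double = "plain text";
const sameValue = single === double; // the quote style does not affect the value

// Hypothetical values used only to demonstrate interpolation
const userName = "Ada";
const itemCount = 3;
const message = `${userName} has ${itemCount} item${itemCount === 1 ? "" : "s"}.`;

// Template literals may also span several lines
const multiline = `line one
line two`;

console.log(sameValue, message, multiline.split("\n").length);
```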

Character access

There are two ways to access an individual character in a string. The first is the charAt() method:

js
"cat".charAt(1); // gives value "a"

The other way is to treat the string as an array-like object, where individual characters correspond to a numerical index:

js
"cat"[1]; // gives value "a"

When using bracket notation for character access, attempting to delete or assign a value to these properties will not succeed. The properties involved are neither writable nor configurable. (See Object.defineProperty() for more information.)
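The following sketch shows what "will not succeed" looks like in practice. It assumes strict mode, where the failed assignment and deletion throw a TypeError; in non-strict code they are silently ignored instead. The helper tryToModify is a hypothetical name introduced for this example.

```js
// Hypothetical helper: attempts to overwrite and delete the first character.
function tryToModify(str) {
  "use strict"; // failures throw only in strict mode
  const errors = [];
  try {
    str[0] = "b";
  } catch (err) {
    errors.push(err.name);
  }
  try {
    delete str[0];
  } catch (err) {
    errors.push(err.name);
  }
  return { str, errors };
}

const outcome = tryToModify("cat");
console.log(outcome.str, outcome.errors);

// The index property is reported as read-only and non-configurable
const descriptor = Object.getOwnPropertyDescriptor("cat", 0);
console.log(descriptor.writable, descriptor.configurable);

// To "change" a character, build a new string instead
const changed = "b" + "cat".slice(1);
console.log(changed);
```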

Comparing strings

Use the less-than and greater-than operators to compare strings:

js
const a = "a";
const b = "b";
if (a < b) {
  // true
  console.log(`${a} is less than ${b}`);
} else if (a > b) {
  console.log(`${a} is greater than ${b}`);
} else {
  console.log(`${a} and ${b} are equal.`);
}

Note that all comparison operators, including === and ==, compare strings case-sensitively. A common way to compare strings case-insensitively is to convert both to the same case (upper or lower) before comparing them.

js
function areEqualCaseInsensitive(str1, str2) {
  return str1.toUpperCase() === str2.toUpperCase();
}

The choice of whether to transform by toUpperCase() or toLowerCase() is mostly arbitrary, and neither one is fully robust when extending beyond the Latin alphabet. For example, the German lowercase letter ß and ss are both transformed to SS by toUpperCase(), while the Turkish letter ı would be falsely reported as unequal to I by toLowerCase() unless specifically using toLocaleLowerCase("tr").

js
const areEqualInUpperCase = (str1, str2) =>
  str1.toUpperCase() === str2.toUpperCase();
const areEqualInLowerCase = (str1, str2) =>
  str1.toLowerCase() === str2.toLowerCase();

areEqualInUpperCase("ß", "ss"); // true; should be false
areEqualInLowerCase("ı", "I"); // false; should be true

A locale-aware and robust solution for testing case-insensitive equality is to use the Intl.Collator API or the string's localeCompare() method (they share the same interface) with the sensitivity option set to "accent" or "base".

js
const areEqual = (str1, str2, locale = "en-US") =>
  str1.localeCompare(str2, locale, { sensitivity: "accent" }) === 0;

areEqual("ß", "ss", "de"); // false
areEqual("ı", "I", "tr"); // true

The localeCompare() method enables string comparison in a similar fashion to strcmp(): it returns a negative number, zero, or a positive number, which allows sorting strings in a locale-aware manner.
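A minimal sketch of locale-aware sorting, assuming a runtime with Intl support and the "en" locale available. The sample array is invented for illustration.

```js
const names = ["b", "a", "C"];

// The default sort compares UTF-16 code units, so uppercase "C" sorts first
const byCodeUnit = [...names].sort();

// A collator's compare function has the same contract as localeCompare()
const collator = new Intl.Collator("en");
const byLocale = [...names].sort(collator.compare);

// Like strcmp(), the result is negative, zero, or positive
const ordering = [
  Math.sign("a".localeCompare("b", "en")),
  Math.sign("a".localeCompare("a", "en")),
  Math.sign("b".localeCompare("a", "en")),
];

console.log(byCodeUnit, byLocale, ordering);
```

Reusing one Intl.Collator instance is usually preferable when sorting large arrays, since localeCompare() may otherwise set up the comparison for every call.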

String primitives and String objects

Note that JavaScript distinguishes between String objects and primitive string values. (The same is true of booleans and numbers.)

String literals (denoted by double or single quotes) and strings returned from String calls in a non-constructor context (that is, called without using the new keyword) are primitive strings. In contexts where a method is to be invoked on a primitive string or a property lookup occurs, JavaScript will automatically wrap the string primitive and call the method or perform the property lookup on the wrapper object instead.

js
const strPrim = "foo"; // A literal is a string primitive
const strPrim2 = String(1); // Coerced into the string primitive "1"
const strPrim3 = String(true); // Coerced into the string primitive "true"
const strObj = new String(strPrim); // String with new returns a string wrapper object.

console.log(typeof strPrim); // "string"
console.log(typeof strPrim2); // "string"
console.log(typeof strPrim3); // "string"
console.log(typeof strObj); // "object"
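To make the distinction more concrete, the sketch below compares a primitive with a wrapper object. The variable names are illustrative.

```js
const prim = "foo";
const obj = new String("foo");

// Methods and properties work on primitives through a temporary wrapper object
const upper = prim.toUpperCase();
const size = prim.length;

// Loose equality converts the object to a primitive; strict equality does not
const looseEqual = prim == obj;
const strictEqual = prim === obj;

// Two wrapper objects are never the same object, even with equal contents
const twoObjects = new String("foo") === new String("foo");

// Only the wrapper object is an instance of String
const instanceChecks = [prim instanceof String, obj instanceof String];

console.log(upper, size, looseEqual, strictEqual, twoObjects, instanceChecks);
```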

Warning: You should rarely find yourself using String as a constructor.

String primitives and String objects also give different results when using eval(). Primitives passed to eval are treated as source code; String objects are treated as all other objects are, by returning the object. For example:

js
const s1 = "2 + 2"; // creates a string primitive
const s2 = new String("2 + 2"); // creates a String object
console.log(eval(s1)); // returns the number 4
console.log(eval(s2)); // returns the string "2 + 2"

For these reasons, code may break when it encounters String objects where it expects a primitive string, although authors generally need not worry about the distinction.

A String object can always be converted to its primitive counterpart with the valueOf() method.

js
console.log(eval(s2.valueOf())); // returns the number 4

String coercion

Many built-in operations that expect strings first coerce their arguments to strings (which is largely why String objects behave similarly to string primitives). The operation can be summarized as follows:
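As a rough illustration of this coercion (not an exhaustive statement of the rules), the sketch below passes several kinds of values through a template literal and records the result. The helper coerce is a hypothetical name used only here.

```js
// Hypothetical helper: a template literal applies string coercion to its expression
const coerce = (value) => `${value}`;

const results = {
  fromString: coerce("text"), // strings are returned as-is
  fromUndefined: coerce(undefined), // "undefined"
  fromNull: coerce(null), // "null"
  fromBoolean: coerce(true), // "true"
  fromNumber: coerce(255), // "255"
  fromBigInt: coerce(10n), // "10"
  fromArray: coerce([1, 2, 3]), // "1,2,3", via the array's toString()
  fromObject: coerce({}), // "[object Object]"
};

// Symbols cannot be coerced implicitly; the attempt throws a TypeError
let symbolError = null;
try {
  coerce(Symbol("id"));
} catch (err) {
  symbolError = err;
}

console.log(results, symbolError instanceof TypeError);
```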

There are several ways to achieve nearly the same effect in JavaScript.

  • Template literal: `${x}` does exactly the string coercion steps explained above for the embedded expression.
  • The String() function: String(x) uses the same algorithm to convert x, except that Symbols don't throw a TypeError, but return "Symbol(description)", where description is the description of the Symbol.
  • Using the + operator: "" + x coerces its operand to a primitive instead of a string, and, for some objects, has entirely different behaviors from normal string coercion. See its reference page for more details.

Depending on your use case, you may want to use `${x}` (to mimic built-in behavior) or String(x) (to handle symbol values without throwing an error), but you should not use "" + x.
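The differences between the three approaches show up with symbols and with objects that define both toString() and valueOf(). The object custom below is a made-up example.

```js
const sym = Symbol("id");

// String() accepts symbols; the other two forms throw a TypeError
const viaStringFn = String(sym);

let templateThrows = false;
try {
  const unused = `${sym}`;
} catch (err) {
  templateThrows = err instanceof TypeError;
}

let plusThrows = false;
try {
  const unused = "" + sym;
} catch (err) {
  plusThrows = err instanceof TypeError;
}

// Hypothetical object whose two conversion methods disagree
const custom = {
  toString() {
    return "from toString";
  },
  valueOf() {
    return 42;
  },
};

const viaTemplate = `${custom}`; // string coercion prefers toString()
const viaFunction = String(custom); // same algorithm
const viaPlus = "" + custom; // primitive coercion prefers valueOf()

console.log(viaStringFn, templateThrows, plusThrows, viaTemplate, viaFunction, viaPlus);
```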

UTF-16 characters, Unicode code points, and grapheme clusters

Strings are represented fundamentally as sequences of UTF-16 code units. In UTF-16 encoding, every code unit is exactly 16 bits long. This means there are a maximum of 2^16, or 65536, possible characters representable as single UTF-16 code units. This character set is called the basic multilingual plane (BMP), and includes the most common characters like the Latin, Greek, and Cyrillic alphabets, as well as many East Asian characters. Each code unit can be written in a string with \u followed by exactly four hex digits.

However, the entire Unicode character set is much, much bigger than 65536. The extra characters are stored in UTF-16 as surrogate pairs, which are pairs of 16-bit code units that represent a single character. To avoid ambiguity, the two parts of the pair must be between 0xD800 and 0xDFFF, and these code units are not used to encode single-code-unit characters. (More precisely, leading surrogates, also called high-surrogate code units, have values between 0xD800 and 0xDBFF, inclusive, while trailing surrogates, also called low-surrogate code units, have values between 0xDC00 and 0xDFFF, inclusive.) Each Unicode character, made up of one or two UTF-16 code units, is also called a Unicode code point. Each Unicode code point can be written in a string with \u{xxxxxx} where xxxxxx represents 1 to 6 hex digits.

A "lone surrogate" is a 16-bit code unit satisfying one of the descriptions below:

  • It is in the range 0xD800 to 0xDBFF, inclusive (i.e., is a leading surrogate), but it is the last code unit in the string, or the next code unit is not a trailing surrogate.
  • It is in the range 0xDC00 to 0xDFFF, inclusive (i.e., is a trailing surrogate), but it is the first code unit in the string, or the previous code unit is not a leading surrogate.

Lone surrogates do not represent any Unicode character. Although most JavaScript built-in methods handle them correctly because they all work based on UTF-16 code units, lone surrogates are often not valid values when interacting with other systems. For example, encodeURI() will throw a URIError for lone surrogates, because URI encoding uses UTF-8 encoding, which does not have any encoding for lone surrogates. Strings not containing any lone surrogates are called well-formed strings, and are safe to be used with functions that do not deal with UTF-16 (such as encodeURI() or TextEncoder). You can check if a string is well-formed with the isWellFormed() method, or sanitize lone surrogates with the toWellFormed() method.
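The sketch below ties these ideas together: it computes the surrogate pair for a code point by hand, then detects lone surrogates without relying on isWellFormed(), which older runtimes may lack. The helpers toSurrogatePair, hasLoneSurrogate, and canEncode are hypothetical names written for this example, not built-in functions.

```js
// Hypothetical helper: split a code point above 0xFFFF into its two code units
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  return [0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff)];
}

const smile = "\u{1F604}"; // one code point, stored as two code units
const units = [smile.charCodeAt(0), smile.charCodeAt(1)];
const computed = toSurrogatePair(smile.codePointAt(0));

// Hypothetical fallback check that follows the two descriptions of a lone surrogate
function hasLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = str.charCodeAt(i + 1); // NaN past the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // a valid pair: skip the trailing surrogate
        continue;
      }
      return true; // leading surrogate without a trailing one
    }
    if (unit >= 0xdc00 && unit <= 0xdfff) {
      return true; // trailing surrogate without a leading one
    }
  }
  return false;
}

// Hypothetical helper: encodeURI() throws a URIError for lone surrogates
function canEncode(str) {
  try {
    encodeURI(str);
    return true;
  } catch (err) {
    if (err instanceof URIError) return false;
    throw err;
  }
}

const wellFormed = "ab\uD83D\uDE04";
const lone = "ab\uD83D";

console.log(smile.length, units, computed);
console.log(hasLoneSurrogate(wellFormed), canEncode(wellFormed));
console.log(hasLoneSurrogate(lone), canEncode(lone));
```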

On top of Unicode characters, there are certain sequences of Unicode characters that should be treated as one visual unit, known as a grapheme cluster. The most common case is emojis: many emojis that have a range of variations are actually formed by multiple emojis, usually joined by the <ZWJ> (U+200D) character.

You must be careful which level of characters you are iterating on. For example, split("") will split by UTF-16 code units and will separate surrogate pairs. String indexes also refer to the index of each UTF-16 code unit. On the other hand, [Symbol.iterator]() iterates by Unicode code points. Iterating through grapheme clusters will require some custom code.

js
"😄".split(""); // ['\ud83d', '\ude04']; splits into two lone surrogates

// "Backhand Index Pointing Right: Dark Skin Tone"
[..."👉🏿"]; // ['👉', '🏿']
// splits into the basic "Backhand Index Pointing Right" emoji and
// the "Dark skin tone" emoji

// "Family: Man, Boy"
[..."👨‍👦"]; // [ '👨', '‍', '👦' ]
// splits into the "Man" and "Boy" emoji, joined by a ZWJ

// The United Nations flag
[..."🇺🇳"]; // [ '🇺', '🇳' ]
// splits into two "region indicator" letters "U" and "N".
// All flag emojis are formed by joining two region indicator letters

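Constructor

String()

Creates String objects. When called as a function, it returns primitive values of type String.

Static methods

String.fromCharCode()

Returns a string created from the specified sequence of UTF-16 code unit values.

String.fromCodePoint()

Returns a string created from the specified sequence of code points.

String.raw()

Returns a string created from a raw template string.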
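A brief sketch of the three static methods; the sample values are arbitrary.

```js
// Build strings from numeric values
const fromUnits = String.fromCharCode(0x48, 0x69); // two code units: "H" and "i"
const fromPair = String.fromCharCode(0xd83d, 0xde04); // a surrogate pair, given unit by unit
const fromPoint = String.fromCodePoint(0x1f604); // the same character from one code point

// String.raw keeps backslashes instead of interpreting escape sequences
const raw = String.raw`C:\temp\new`;
const cooked = `C:\temp\new`; // here \t and \n become a tab and a line feed

console.log(fromUnits, fromPair === fromPoint, raw, raw.length, cooked.length);
```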

Instance properties

These properties are defined on String.prototype and shared by all String instances.

String.prototype.constructor

The constructor function that created the instance object. For String instances, the initial value is the String constructor.

These properties are own properties of each String instance.

length

Reflects the length of the string. Read-only.

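Instance methods

String.prototype.at()

Returns the character (exactly one UTF-16 code unit) at the specified index. Accepts negative integers, which count back from the last string character.

String.prototype.charAt()

Returns the character (exactly one UTF-16 code unit) at the specified index.

String.prototype.charCodeAt()

Returns a number that is the UTF-16 code unit value at the given index.

String.prototype.codePointAt()

Returns a nonnegative integer Number that is the code point value of the UTF-16 encoded code point starting at the specified pos.

String.prototype.concat()

Combines the text of two (or more) strings and returns a new string.

String.prototype.endsWith()

Determines whether a string ends with the characters of the string searchString.

String.prototype.includes()

Determines whether the calling string contains searchString.

String.prototype.indexOf()

Returns the index within the calling string of the first occurrence of searchValue, or -1 if not found.

String.prototype.isWellFormed()

Returns a boolean indicating whether this string is well formed, that is, whether it contains no lone surrogates.

String.prototype.lastIndexOf()

Returns the index within the calling string of the last occurrence of searchValue, or -1 if not found.

String.prototype.localeCompare()

Returns a number indicating whether the reference string compareString comes before, after, or is equivalent to the given string in sort order.

String.prototype.match()

Used to match regular expression regexp against a string.

String.prototype.matchAll()

Returns an iterator of all regexp's matches.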

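String.prototype.normalize()

Returns the Unicode Normalization Form of the calling string value.

String.prototype.padEnd()

Pads the current string from the end with a given string and returns a new string of the length targetLength.

String.prototype.padStart()

Pads the current string from the start with a given string and returns a new string of the length targetLength.

String.prototype.repeat()

Returns a string consisting of the calling string repeated count times.

String.prototype.replace()

Used to replace occurrences of searchFor using replaceWith. searchFor may be a string or a regular expression, and replaceWith may be a string or a function.

String.prototype.replaceAll()

Used to replace all occurrences of searchFor using replaceWith. searchFor may be a string or a regular expression, and replaceWith may be a string or a function.

String.prototype.search()

Searches for a match between a regular expression regexp and the calling string.

String.prototype.slice()

Extracts a section of a string and returns a new string.

String.prototype.split()

Returns an array of strings populated by splitting the calling string at occurrences of the substring sep.

String.prototype.startsWith()

Determines whether the calling string begins with the characters of the string searchString.

String.prototype.substr() Deprecated

Returns a portion of the string, starting at the specified index and extending for a given number of characters afterwards.

String.prototype.substring()

Returns a new string containing characters of the calling string from (or between) the specified index (or indices).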

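String.prototype.toLocaleLowerCase()

The characters within a string are converted to lowercase while respecting the current locale.

For most languages, this will return the same as toLowerCase().

String.prototype.toLocaleUpperCase()

The characters within a string are converted to uppercase while respecting the current locale.

For most languages, this will return the same as toUpperCase().

String.prototype.toLowerCase()

Returns the calling string value converted to lowercase.

String.prototype.toString()

Returns a string representing the specified object. Overrides the Object.prototype.toString() method.

String.prototype.toUpperCase()

Returns the calling string value converted to uppercase.

String.prototype.toWellFormed()

Returns a string where all lone surrogates of this string are replaced with the Unicode replacement character U+FFFD.

String.prototype.trim()

Trims whitespace from the beginning and end of the string.

String.prototype.trimEnd()

Trims whitespace from the end of the string.

String.prototype.trimStart()

Trims whitespace from the beginning of the string.

String.prototype.valueOf()

Returns the primitive value of the specified object. Overrides the Object.prototype.valueOf() method.

String.prototype[Symbol.iterator]()

Returns a new iterator object that iterates over the code points of a String value, returning each code point as a String value.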
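A handful of the instance methods listed above, applied to made-up sample strings. This is only a quick tour, and it assumes a runtime recent enough to provide at() and replaceAll().

```js
const title = "  JavaScript strings  ";

const trimmed = title.trim(); // whitespace removed from both ends
const lastChar = trimmed.at(-1); // negative index counts from the end
const starts = trimmed.startsWith("Java");
const contains = trimmed.includes("Script");
const firstFour = trimmed.slice(0, 4);
const parts = trimmed.split(" ");

const firstOnly = "a-b-c".replace("-", "+"); // a string pattern replaces one occurrence
const everyOne = "a-b-c".replaceAll("-", "+"); // replaces every occurrence

const padded = "7".padStart(3, "0");
const repeated = "ab".repeat(3);

console.log(trimmed, lastChar, starts, contains, firstFour, parts);
console.log(firstOnly, everyOne, padded, repeated);
```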

HTML wrapper methods

Warning: Deprecated. Avoid these methods.

They are of limited use, as they are based on a very old HTML standard and provide only a subset of the currently available HTML tags and attributes. Many of them create deprecated or non-standard markup today. In addition, they do simple string concatenation without any validation or sanitization, which makes them a potential security threat when directly inserted using innerHTML. Use DOM APIs such as document.createElement() instead.

String.prototype.anchor() Deprecated

<a name="name"> (hypertext target)

String.prototype.big() Deprecated

<big>

String.prototype.blink() Deprecated

<blink>

String.prototype.bold() Deprecated

<b>

String.prototype.fixed() Deprecated

<tt>

String.prototype.fontcolor() Deprecated

<font color="color">

String.prototype.fontsize() Deprecated

<font size="size">

String.prototype.italics() Deprecated

<i>

String.prototype.link() Deprecated

<a href="url"> (link to URL)

String.prototype.small() Deprecated

<small>

String.prototype.strike() Deprecated

<strike>

String.prototype.sub() Deprecated

<sub>

String.prototype.sup() Deprecated

<sup>

Note that these methods do not check if the string itself contains HTML tags, so it's possible to create invalid HTML:

js
"</b>".bold(); // <b></b></b>

The only escaping they do is to replace " in the attribute value (for anchor(), fontcolor(), fontsize(), and link()) with &quot;.

js
"foo".anchor('"Hello"'); // <a name="&quot;Hello&quot;">foo</a>

Examples

String conversion

The String() function is a more reliable way of converting values to strings than calling the toString() method of the value, as the former works when used on null and undefined. For example:

js
// You cannot access properties on null or undefined

const nullVar = null;
nullVar.toString(); // TypeError: Cannot read properties of null
String(nullVar); // "null"

const undefinedVar = undefined;
undefinedVar.toString(); // TypeError: Cannot read properties of undefined
String(undefinedVar); // "undefined"

Specifications

ECMAScript Language Specification, # sec-string-objects

Browser compatibility

See also