String.prototype.codePointAt()

The codePointAt() method of String values returns a non-negative integer that is the Unicode code point value of the character starting at the given index. Note that the index is still based on UTF-16 code units, not Unicode code points.


Syntax

js
codePointAt(index)

Parameters

index

Zero-based index of the character to be returned. Converted to an integer; undefined is converted to 0.
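The following short sketch shows how the index argument is converted before it is used. Fractions are truncated toward zero, and both undefined and NaN become 0. Negative indices are not counted from the end of the string; they simply fall outside the valid range.

```js
// How codePointAt() converts its index argument to an integer.
const s = "ABC";

console.log(s.codePointAt()); // 65 (undefined becomes 0)
console.log(s.codePointAt(NaN)); // 65 (NaN becomes 0)
console.log(s.codePointAt("1")); // 66 (the string "1" becomes 1)
console.log(s.codePointAt(1.9)); // 66 (the fraction is truncated toward zero)
console.log(s.codePointAt(-1)); // undefined (negative indices are out of range)
```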

Return value

A non-negative integer representing the code point value of the character at the given index.

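  • If index is out of the range 0 to str.length - 1, codePointAt() returns undefined.
  • If the element at index is a UTF-16 leading surrogate, returns the code point of the surrogate pair.
  • If the element at index is a UTF-16 trailing surrogate, returns only the trailing surrogate code unit.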
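Each case in the list above is exercised by one of the first three calls below. The last call covers a detail from the ECMAScript specification that the list does not state. If a leading surrogate is not followed by a trailing surrogate, it is returned on its own as a plain code unit.

```js
const pair = "\ud83d\ude0d"; // "😍" written as a surrogate pair
const lone = "\ud83d"; // a leading surrogate with no trailing surrogate after it

console.log(pair.codePointAt(0)); // 128525 (0x1f60d, the code point of the whole pair)
console.log(pair.codePointAt(1)); // 56845 (0xde0d, only the trailing surrogate)
console.log(pair.codePointAt(2)); // undefined (2 equals pair.length, so it is out of range)
console.log(lone.codePointAt(0)); // 55357 (0xd83d, the lone leading surrogate itself)
```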

Description

Characters in a string are indexed from left to right. The index of the first character is 0, and the index of the last character in a string called str is str.length - 1.
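The example below shows the first and last valid indices. Because indices count UTF-16 code units, the last index of a string that ends with a character above U+FFFF points at a trailing surrogate, not at a whole character.

```js
const word = "hello";
console.log(word.codePointAt(0)); // 104 ("h")
console.log(word.codePointAt(word.length - 1)); // 111 ("o")

// "😍" takes two code units, so the last index holds its trailing surrogate.
const mixed = "A😍";
console.log(mixed.length); // 3
console.log(mixed.codePointAt(mixed.length - 1)); // 56845 (0xde0d)
```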

Unicode code points range from 0 to 1114111 (0x10FFFF). In UTF-16, each string index is a code unit with a value from 0 to 65535. Higher code points are represented by a pair of 16-bit surrogate pseudo-characters. Therefore, codePointAt() returns a code point that may span two string indices. For information on Unicode, see UTF-16 characters, Unicode code points, and grapheme clusters.
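To show how a surrogate pair encodes a code point above U+FFFF, the sketch below applies the standard UTF-16 decoding formula by hand and compares the result with codePointAt(). The helper decodeSurrogatePair is hypothetical and exists only for illustration. It assumes its inputs are a valid leading surrogate and a valid trailing surrogate.

```js
// Hypothetical helper (not a built-in): combines a leading surrogate
// (0xD800 to 0xDBFF) and a trailing surrogate (0xDC00 to 0xDFFF) into one
// code point. It assumes both arguments are valid surrogates.
function decodeSurrogatePair(lead, trail) {
  return (lead - 0xd800) * 0x400 + (trail - 0xdc00) + 0x10000;
}

const emoji = "😍";
const lead = emoji.charCodeAt(0); // 0xd83d
const trail = emoji.charCodeAt(1); // 0xde0d

console.log(decodeSurrogatePair(lead, trail)); // 128525
console.log(emoji.codePointAt(0)); // 128525
```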

Examples

Using codePointAt()

js
"ABC".codePointAt(0); // 65
"ABC".codePointAt(0).toString(16); // 41

"😍".codePointAt(0); // 128525
"\ud83d\ude0d".codePointAt(0); // 128525
"\ud83d\ude0d".codePointAt(0).toString(16); // 1f60d

"😍".codePointAt(1); // 56845
"\ud83d\ude0d".codePointAt(1); // 56845
"\ud83d\ude0d".codePointAt(1).toString(16); // de0d

"ABC".codePointAt(42); // undefined

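String.fromCodePoint() performs the reverse conversion, turning a code point back into a string. The example below round-trips the code point from the examples above. It also shows that String.fromCharCode() does not work for this purpose. That method treats its argument as a single 16-bit code unit, so it does not restore a character above U+FFFF.

```js
const cp = "😍".codePointAt(0);
console.log(cp); // 128525
console.log(String.fromCodePoint(cp) === "😍"); // true

// fromCharCode() truncates its argument to 16 bits, so the astral character is lost.
console.log(String.fromCharCode(cp) === "😍"); // false
```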
Looping with codePointAt()

Looping over a string by index visits each code point above U+FFFF twice: once at its leading surrogate and once at its trailing surrogate. On the second visit, codePointAt() returns only the trailing surrogate, so it's better to avoid looping by index.

js
const str = "\ud83d\udc0e\ud83d\udc71\u2764";

for (let i = 0; i < str.length; i++) {
  console.log(str.codePointAt(i).toString(16));
}
// '1f40e', 'dc0e', '1f471', 'dc71', '2764'
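If an index-based loop is still needed, for example to track positions in code units, one common workaround is to step over the trailing surrogate whenever codePointAt() returns a value above 0xFFFF. The sketch below applies that technique to the same string. A lone leading surrogate is returned as a value no greater than 0xFFFF, so it does not trigger the extra step.

```js
const input = "\ud83d\udc0e\ud83d\udc71\u2764";
const visited = [];

for (let i = 0; i < input.length; i++) {
  const codePoint = input.codePointAt(i);
  visited.push(codePoint.toString(16));
  // Code points above 0xFFFF occupy two code units, so also skip the trailing surrogate.
  if (codePoint > 0xffff) {
    i++;
  }
}

console.log(visited); // ['1f40e', '1f471', '2764']
```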

Instead, use a for...of statement or spread the string. Both invoke the string's [Symbol.iterator](), which iterates by code points. Then, use codePointAt(0) to get the code point of each element.

js
for (const codePoint of str) {
  console.log(codePoint.codePointAt(0).toString(16));
}
// '1f40e', '1f471', '2764'

[...str].map((cp) => cp.codePointAt(0).toString(16));
// ['1f40e', '1f471', '2764']
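Array.from() also consumes the string through its iterator. Its optional second argument is a mapping function, so the spread-and-map pattern above can be written as a single call. The result has one entry per code point, which is why its length differs from the string's length in code units.

```js
const text = "\ud83d\udc0e\ud83d\udc71\u2764";

// Each element ch is one code point (one or two code units long).
const hexes = Array.from(text, (ch) => ch.codePointAt(0).toString(16));

console.log(hexes); // ['1f40e', '1f471', '2764']
console.log(hexes.length, text.length); // 3 5
```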

Specifications

Specification
ECMAScript Language Specification # sec-string.prototype.codepointat

Browser compatibility

See also