String.prototype.charCodeAt()
The charCodeAt() method of String values returns an integer between 0 and 65535 representing the UTF-16 code unit at the given index.
charCodeAt() always indexes the string as a sequence of UTF-16 code units, so it may return lone surrogates. To get the full Unicode code point at the given index, use String.prototype.codePointAt().
Syntax
Parameters
Return value
An integer between 0 and 65535 representing the UTF-16 code unit value of the character at the specified index. If index is outside the range 0 to str.length - 1, charCodeAt() returns NaN.
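As a minimal sketch of the out-of-range behavior described above, the snippet below reads indexes inside and outside a short string. The helper name codeUnitOrNull is hypothetical and is not part of any standard API; it only illustrates one way to guard against NaN.

```javascript
const sample = "ABC";

console.log(sample.charCodeAt(2)); // 67, the last valid index is sample.length - 1
console.log(sample.charCodeAt(3)); // NaN, index is past the end
console.log(sample.charCodeAt(-1)); // NaN, negative indexes are out of range

// Hypothetical helper: returns null instead of NaN for an out-of-range index.
function codeUnitOrNull(text, index) {
  const unit = text.charCodeAt(index);
  return Number.isNaN(unit) ? null : unit;
}

console.log(codeUnitOrNull(sample, 0)); // 65
console.log(codeUnitOrNull(sample, 10)); // null
```

Checking with Number.isNaN() is used here because NaN is never equal to itself, so a comparison such as unit === NaN would always be false.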
Description
Characters in a string are indexed from left to right. In a string called str, the index of the first character is 0 and the index of the last character is str.length - 1.
Unicode code points range from 0 to 1114111 (0x10FFFF). charCodeAt() always returns a value that is less than 65536, because the higher code points are represented by a pair of 16-bit surrogate pseudo-characters. Therefore, in order to get a full character with value greater than 65535, it is necessary to retrieve not only charCodeAt(i), but also charCodeAt(i + 1) (as if manipulating a string with two characters), or to use codePointAt(i) instead. For information on Unicode, see UTF-16 characters, Unicode code points, and grapheme clusters.
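The following sketch illustrates the point above: it walks a string by code unit, uses charCodeAt(i) and charCodeAt(i + 1) only to detect a surrogate pair, and then defers to codePointAt(i) for the actual value. The function name toCodePoints is hypothetical; the surrogate ranges 0xD800 to 0xDBFF (lead) and 0xDC00 to 0xDFFF (trail) come from the UTF-16 encoding.

```javascript
// Hypothetical helper: collects one number per character, pairing surrogates.
function toCodePoints(text) {
  const result = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const next = text.charCodeAt(i + 1); // NaN when i + 1 is out of range

    const isLead = unit >= 0xd800 && unit <= 0xdbff;
    const nextIsTrail = next >= 0xdc00 && next <= 0xdfff; // false for NaN

    if (isLead && nextIsTrail) {
      // A full surrogate pair: let codePointAt() combine the two units.
      result.push(text.codePointAt(i));
      i++; // skip the trail surrogate
    } else {
      // A BMP character or a lone surrogate: the code unit is the value.
      result.push(unit);
    }
  }
  return result;
}

console.log(toCodePoints("a𠮷b")); // [97, 134071, 98]
```

In everyday code, iterating with for...of or Array.from(text) reaches the same code points more directly, because string iteration already proceeds by code point.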
Examples
Using charCodeAt()
The following example returns 65, the UTF-16 code unit value (and Unicode code point) of A.
"ABC".charCodeAt(0); // returns 65
charCodeAt() may return lone surrogates, which are not valid Unicode characters.
const str = "𠮷𠮾";
console.log(str.charCodeAt(0)); // 55362, or d842, which is not a valid Unicode character
console.log(str.charCodeAt(1)); // 57271, or dfb7, which is not a valid Unicode character
To get the full Unicode code point at the given index, use String.prototype.codePointAt().
const str = "𠮷𠮾";
console.log(str.codePointAt(0)); // 134071
Note: Avoid re-implementing codePointAt() using charCodeAt(). The translation from UTF-16 surrogates to Unicode code points is complex, and codePointAt() may be more performant as it directly uses the internal representation of the string. Install a polyfill for codePointAt() if necessary.
Below is a possible algorithm to convert a pair of UTF-16 code units into a Unicode code point, adapted from the Unicode FAQ:
// constants
const LEAD_OFFSET = 0xd800 - (0x10000 >> 10);
const SURROGATE_OFFSET = 0x10000 - (0xd800 << 10) - 0xdc00;
function utf16ToUnicode(lead, trail) {
  return (lead << 10) + trail + SURROGATE_OFFSET;
}
function unicodeToUTF16(codePoint) {
  const lead = LEAD_OFFSET + (codePoint >> 10);
  const trail = 0xdc00 + (codePoint & 0x3ff);
  return [lead, trail];
}
const str = "𠮷";
console.log(utf16ToUnicode(str.charCodeAt(0), str.charCodeAt(1))); // 134071
console.log(str.codePointAt(0)); // 134071
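The reverse direction can be sketched in the same spirit. The helper below, with the hypothetical name toSurrogatePair, is equivalent in effect to unicodeToUTF16() above but is repeated here in a slightly different form so that the snippet runs on its own. It assumes the input is a code point above 0xFFFF; smaller values do not need a surrogate pair.

```javascript
// Hypothetical helper: splits a code point above 0xFFFF into its surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000; // 20-bit value shared between the pair
  const lead = 0xd800 + (offset >> 10); // upper 10 bits
  const trail = 0xdc00 + (offset & 0x3ff); // lower 10 bits
  return [lead, trail];
}

const [leadUnit, trailUnit] = toSurrogatePair(0x20bb7);
console.log(leadUnit.toString(16), trailUnit.toString(16)); // d842 dfb7

// Both calls rebuild the same one-character string.
console.log(String.fromCharCode(leadUnit, trailUnit) === "𠮷"); // true
console.log(String.fromCodePoint(0x20bb7) === "𠮷"); // true
```

As with codePointAt(), the built-in String.fromCodePoint() is usually the simpler choice; the manual split is shown only to make the arithmetic visible.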
Specifications
| Specification |
|---|
| ECMAScript Language Specification # sec-string.prototype.charcodeat |
Browser compatibility