String.prototype.toWellFormed()
The toWellFormed() method of String values returns a string where all lone surrogates of this string are replaced with the Unicode replacement character U+FFFD.
Syntax
toWellFormed()
Parameters
None.
Return value
A new string that is a copy of this string, with all lone surrogates replaced with the Unicode replacement character U+FFFD. If this string is already well-formed, a new string is still returned (essentially a copy of it).
Description
Strings in JavaScript are UTF-16 encoded. UTF-16 represents code points above U+FFFF as surrogate pairs, which are explained in detail in the UTF-16 characters, Unicode code points, and grapheme clusters section. A lone surrogate is a code unit in the surrogate range (0xD800 to 0xDFFF) that is not part of such a pair.
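As a quick illustration of surrogate pairs, the emoji 😄 (U+1F604) lies outside the Basic Multilingual Plane, so it is stored as two UTF-16 code units: a leading surrogate (0xD800 to 0xDBFF) followed by a trailing surrogate (0xDC00 to 0xDFFF). Cutting the pair in half leaves a lone surrogate. The sketch below uses only standard String methods; the variable names are illustrative.

```javascript
// "😄" is U+1F604, which needs a surrogate pair in UTF-16.
const emoji = "\uD83D\uDE04";

console.log(emoji === "😄"); // true
console.log(emoji.length); // 2 (two UTF-16 code units)
console.log(emoji.charCodeAt(0).toString(16)); // "d83d" (leading surrogate)
console.log(emoji.charCodeAt(1).toString(16)); // "de04" (trailing surrogate)
console.log(emoji.codePointAt(0).toString(16)); // "1f604" (the full code point)

// Keeping only the first code unit leaves a lone leading surrogate.
const broken = emoji.slice(0, 1);
console.log(broken.length); // 1
console.log(broken.charCodeAt(0).toString(16)); // "d83d"
```

Strings such as `broken` are exactly the ill-formed input that toWellFormed() repairs.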
toWellFormed() iterates through the code units of this string and replaces any lone surrogates with the Unicode replacement character U+FFFD �. This ensures that the returned string is well-formed and can be used in functions that expect well-formed strings, such as encodeURI. Compared to a custom implementation, toWellFormed() is more efficient, as engines can directly access the internal representation of strings.
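For comparison, a custom implementation of the same rule in plain JavaScript might look like the sketch below. `toWellFormedSketch` is a hypothetical helper name, not part of any API. It walks the string one UTF-16 code unit at a time, copies valid surrogate pairs unchanged, and substitutes U+FFFD for any surrogate without a partner. It only illustrates the behavior described above and is not the engine's implementation.

```javascript
// Hypothetical helper that mimics toWellFormed() in user code.
// Illustration only; not the engine's implementation.
function toWellFormedSketch(str) {
  let result = "";
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // Leading surrogate: keep it only if a trailing surrogate follows.
      // charCodeAt() returns NaN past the end, which fails the range check.
      const next = str.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        result += str[i] + str[i + 1];
        i++; // the trailing surrogate has already been copied
      } else {
        result += "\uFFFD";
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Trailing surrogate that was not consumed as part of a pair.
      result += "\uFFFD";
    } else {
      result += str[i];
    }
  }
  return result;
}

console.log(toWellFormedSketch("ab\uD800c")); // "ab�c"
console.log(toWellFormedSketch("c\uDFFFab")); // "c�ab"
console.log(toWellFormedSketch("ab\uD83D\uDE04c")); // "ab😄c"
```

Where toWellFormed() is available, prefer it; the sketch exists only to make the replacement rule concrete.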
When ill-formed strings are used in certain contexts, such as TextEncoder, they are automatically converted to well-formed strings using the same replacement character. When lone surrogates are rendered, they are also displayed as the replacement character (a diamond with a question mark inside).
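To see the TextEncoder behavior in action, the sketch below encodes a string that ends in a lone surrogate. The lone `\uD800` comes out as the bytes 0xEF 0xBF 0xBD, which is the UTF-8 encoding of U+FFFD, so the result matches encoding the repaired string. It assumes an environment where TextEncoder is available as a global, such as a modern browser or Node.js.

```javascript
const encoder = new TextEncoder();

// "ab" followed by a lone leading surrogate.
const illFormed = "ab\uD800";

// TextEncoder produces UTF-8; the lone surrogate is encoded as U+FFFD.
const bytes = encoder.encode(illFormed);
console.log(Array.from(bytes, (b) => b.toString(16)).join(" ")); // 61 62 ef bf bd

// Encoding "ab\uFFFD" explicitly yields the same bytes.
const expected = encoder.encode("ab\uFFFD");
const same =
  bytes.length === expected.length && bytes.every((b, i) => b === expected[i]);
console.log(same); // true
```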
Examples
Using toWellFormed()
const strings = [
  // Lone leading surrogate
  "ab\uD800",
  "ab\uD800c",
  // Lone trailing surrogate
  "\uDFFFab",
  "c\uDFFFab",
  // Well-formed
  "abc",
  "ab\uD83D\uDE04c",
];
for (const str of strings) {
  console.log(str.toWellFormed());
}
// Logs:
// "ab�"
// "ab�c"
// "�ab"
// "c�ab"
// "abc"
// "ab😄c"
Avoiding errors in encodeURI()
encodeURI throws a URIError if the string passed is not well-formed. This can be avoided by using toWellFormed() to convert the string to a well-formed string first.
const illFormed = "https://example.com/search?q=\uD800";
try {
  encodeURI(illFormed);
} catch (e) {
  console.log(e); // URIError: URI malformed
}
console.log(encodeURI(illFormed.toWellFormed())); // "https://example.com/search?q=%EF%BF%BD"
Specifications
| Specification |
|---|
| ECMAScript Language Specification # sec-string.prototype.towellformed |
Browser compatibility