Memory management

Low-level languages like C have manual memory management primitives such as malloc() and free(). In contrast, JavaScript automatically allocates memory when objects are created and frees it when they are no longer used (garbage collection). This automation is a potential source of confusion: it can give developers the false impression that they don't need to worry about memory management.

Memory life cycle

Regardless of the programming language, the memory life cycle is pretty much always the same:

  1. Allocate the memory you need
  2. Use the allocated memory (read, write)
  3. Release the allocated memory when it is no longer needed

The second part is explicit in all languages. The first and last parts are explicit in low-level languages but are mostly implicit in high-level languages like JavaScript.
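
As a minimal sketch of the three phases in JavaScript (the variable names `samples` and `total` are hypothetical and chosen only for illustration), note that only the middle step is written out by the programmer:

```javascript
// 1. Allocate: creating the array makes the engine allocate memory for it.
let samples = [3, 1, 4];

// 2. Use: read from and write to the allocated memory.
samples.push(1);
const total = samples.reduce((sum, n) => sum + n, 0);

// 3. Release: there is no explicit "free". Dropping the last reference
// only makes the array eligible for garbage collection at some later time.
samples = null;

console.log(total); // 9
```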

Allocation in JavaScript

Value initialization

To spare the programmer from handling allocations, JavaScript automatically allocates memory when values are initially declared.

js
const n = 123; // allocates memory for a number
const s = "azerty"; // allocates memory for a string

const o = {
  a: 1,
  b: null,
}; // allocates memory for an object and contained values

// (like object) allocates memory for the array and
// contained values
const a = [1, null, "abra"];

function f(a) {
  return a + 2;
} // allocates a function (which is a callable object)

// function expressions also allocate an object
someElement.addEventListener(
  "click",
  () => {
    someElement.style.backgroundColor = "blue";
  },
  false,
);

Allocation via function calls

Some function calls result in object allocation.

js
const d = new Date(); // allocates a Date object

const e = document.createElement("div"); // allocates a DOM element

Some methods allocate new values or objects:

js
const s = "azerty";
const s2 = s.substring(0, 3); // s2 is a new string
// Since strings are immutable values,
// JavaScript may decide to not allocate memory,
// but just store the [0, 3] range.

const a = ["ouais ouais", "nan nan"];
const a2 = ["generation", "nan nan"];
const a3 = a.concat(a2);
// new array with 4 elements being
// the concatenation of a and a2 elements.

Using values

Using values basically means reading and writing in allocated memory. This can be done by reading or writing the value of a variable or an object property, or even by passing an argument to a function.

Release when the memory is not needed anymore

The majority of memory management issues occur at this phase. The most difficult aspect of this stage is determining when the allocated memory is no longer needed.

Low-level languages require the developer to manually determine at which point in the program the allocated memory is no longer needed and to release it.

Some high-level languages, such as JavaScript, use a form of automatic memory management known as garbage collection (GC). The purpose of a garbage collector is to monitor memory allocation, determine when a block of allocated memory is no longer needed, and reclaim it. This automatic process is an approximation, since the general problem of determining whether a specific piece of memory is still needed is undecidable.

Garbage collection

As stated above, the general problem of automatically finding whether some memory "is not needed anymore" is undecidable. As a consequence, garbage collectors implement a restricted version of a solution to the general problem. This section explains the concepts that are necessary for understanding the main garbage collection algorithms and their respective limitations.

References

The main concept that garbage collection algorithms rely on is the concept of a reference. Within the context of memory management, an object is said to reference another object if the former has access to the latter (either implicitly or explicitly). For instance, a JavaScript object has a reference to its prototype (implicit reference) and to its property values (explicit references).

In this context, the notion of an "object" is extended to something broader than regular JavaScript objects and also contains function scopes (or the global lexical scope).

Reference-counting garbage collection

Note: no modern JavaScript engine uses reference-counting for garbage collection anymore.

This is the most naïve garbage collection algorithm. It reduces the problem from determining whether an object is still needed to determining whether any other object still references it. An object is said to be "garbage", or collectible, if there are zero references pointing to it.

For example:

js
let x = {
  a: {
    b: 2,
  },
};
// 2 objects are created. One is referenced by the other as one of its properties.
// The other is referenced by virtue of being assigned to the 'x' variable.
// Obviously, none can be garbage-collected.

let y = x;
// The 'y' variable is the second thing that has a reference to the object.

x = 1;
// Now, the object that was originally in 'x' has a unique reference
// embodied by the 'y' variable.

let z = y.a;
// Reference to 'a' property of the object.
// This object now has 2 references: one as a property,
// the other as the 'z' variable.

y = "mozilla";
// The object that was originally in 'x' has now zero
// references to it. It can be garbage-collected.
// However its 'a' property is still referenced by
// the 'z' variable, so it cannot be freed.

z = null;
// The 'a' property of the object originally in x
// has zero references to it. It can be garbage collected.

There is a limitation when it comes to circular references. In the following example, two objects are created with properties that reference one another, thus creating a cycle. They will go out of scope after the function call has completed. At that point they become unneeded and their allocated memory should be reclaimed. However, the reference-counting algorithm will not consider them reclaimable, since each of the two objects still has at least one reference pointing to it, so neither is marked for garbage collection. Circular references are a common cause of memory leaks.

js
function f() {
  const x = {};
  const y = {};
  x.a = y; // x references y
  y.a = x; // y references x

  return "azerty";
}

f();
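
To make the limitation concrete, here is a toy simulation of reference counting. It is only an illustrative sketch: `createCountingHeap` and its methods are hypothetical names, and real engines do not expose anything like this. Each simulated object starts with a count of 1, standing in for the local variable that holds it.

```javascript
// Toy reference-counting heap: tracks a count and outgoing references per object.
function createCountingHeap() {
  const objects = new Map(); // name -> { count, refs }

  // Drop one reference to `name`; free it (and release what it references)
  // once the count reaches zero.
  function release(name) {
    const obj = objects.get(name);
    obj.count -= 1;
    if (obj.count === 0) {
      objects.delete(name);
      for (const target of obj.refs) release(target);
    }
  }

  return {
    // The initial count of 1 models the local variable holding the object.
    allocate(name) {
      objects.set(name, { count: 1, refs: [] });
    },
    addReference(from, to) {
      objects.get(from).refs.push(to);
      objects.get(to).count += 1;
    },
    release,
    liveObjects() {
      return [...objects.keys()];
    },
  };
}

// Cyclic case, mirroring f(): x.a = y and y.a = x.
const cyclicHeap = createCountingHeap();
cyclicHeap.allocate("x");
cyclicHeap.allocate("y");
cyclicHeap.addReference("x", "y");
cyclicHeap.addReference("y", "x");
cyclicHeap.release("x"); // local variable x goes out of scope
cyclicHeap.release("y"); // local variable y goes out of scope
console.log(cyclicHeap.liveObjects()); // ["x", "y"]: both leaked

// Acyclic case for contrast: p references q, nothing references p.
const acyclicHeap = createCountingHeap();
acyclicHeap.allocate("p");
acyclicHeap.allocate("q");
acyclicHeap.addReference("p", "q");
acyclicHeap.release("q");
acyclicHeap.release("p");
console.log(acyclicHeap.liveObjects()); // []: everything was freed
```

In the cyclic case each count stays at 1 after both variables are released, so neither object is ever freed.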

Mark-and-sweep algorithm

This algorithm reduces the definition of "an object is no longer needed" to "an object is unreachable".

This algorithm assumes the knowledge of a set of objects called roots. In JavaScript, the root is the global object. Periodically, the garbage collector starts from these roots, finds all objects that are referenced from them, then all objects referenced from those, and so on. Starting from the roots, the garbage collector thus finds all reachable objects and collects all non-reachable objects.

This algorithm is an improvement over the previous one, since an object having zero references is effectively unreachable. The opposite does not hold true, as we have seen with circular references.

Currently, all modern engines ship a mark-and-sweep garbage collector. All improvements made in the field of JavaScript garbage collection over the last few years (generational, incremental, concurrent, and parallel garbage collection) are improvements to how this algorithm is implemented; they change neither the algorithm itself nor its reduction of "an object is no longer needed" to "an object is unreachable".

The immediate benefit of this approach is that cycles are no longer a problem. In the circular-reference example above, after the function call returns, the two objects are no longer referenced by any resource that is reachable from the global object. Consequently, the garbage collector finds them unreachable and reclaims their allocated memory.
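
The reachability idea can be sketched with a toy heap modeled as a Map from object names to the names they reference. This is a simplified illustration under that assumption, not how any engine implements collection; `markAndSweep` and the object names are hypothetical.

```javascript
// Toy mark-and-sweep over a heap described as: name -> names it references.
function markAndSweep(heap, roots) {
  // Mark phase: walk the reference graph starting from the roots.
  const reachable = new Set();
  const pending = [...roots];
  while (pending.length > 0) {
    const name = pending.pop();
    if (reachable.has(name)) continue;
    reachable.add(name);
    pending.push(...heap.get(name));
  }

  // Sweep phase: remove every object that was not marked.
  const collected = [];
  for (const name of [...heap.keys()]) {
    if (!reachable.has(name)) {
      heap.delete(name);
      collected.push(name);
    }
  }
  return collected;
}

const heap = new Map([
  ["global", ["config"]],
  ["config", []],
  ["x", ["y"]], // x and y reference each other,
  ["y", ["x"]], // but nothing reachable references them
]);

const collected = markAndSweep(heap, ["global"]);
console.log(collected); // ["x", "y"]
console.log([...heap.keys()]); // ["global", "config"]
```

Unlike the reference-counting approach, the cycle between x and y does not keep either object alive, because neither can be reached from the root.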

However, the inability to manually control garbage collection remains. There are times when it would be convenient to manually decide when and what memory is released. To release the memory of an object, the object needs to be made explicitly unreachable. It is also not possible to programmatically trigger garbage collection in JavaScript, and it will likely never be possible within the core language, although engines may expose APIs behind opt-in flags.

Configuring an engine's memory model

JavaScript engines typically offer flags that expose the memory model. For example, Node.js offers additional options and tools that expose the underlying V8 mechanisms for configuring and debugging memory issues. This configuration may not be available in browsers, and even less so for web pages (via HTTP headers, etc.).

The maximum amount of available heap memory can be increased with a flag:

bash
node --max-old-space-size=6000 index.js
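
One way to check the effect of the flag from inside a Node.js program is to read the heap limit reported by the built-in `node:v8` module. This is a small sketch assuming a CommonJS script; the reported number depends on the Node.js version and platform and will not exactly equal the flag value.

```javascript
// Read the heap size limit that V8 is currently running with.
const v8 = require("node:v8");

const { heap_size_limit } = v8.getHeapStatistics();
const limitMiB = Math.round(heap_size_limit / 1024 / 1024);

console.log(`Heap size limit: about ${limitMiB} MiB`);
```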

We can also expose the garbage collector for debugging memory issues using a flag and the Chrome Debugger:

bash
node --expose-gc --inspect index.js
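
When Node.js is started with `--expose-gc`, a `gc()` function becomes available on the global object. The sketch below is for debugging only and guards against the flag being absent; the variable names are hypothetical, and the measured numbers vary between runs, so no particular output should be expected.

```javascript
// Report current heap usage in MiB.
function heapUsedMiB() {
  return process.memoryUsage().heapUsed / 1024 / 1024;
}

// Allocate a batch of objects, then drop the only reference to them.
let batch = Array.from({ length: 100000 }, (_, i) => ({ i }));
const before = heapUsedMiB();
batch = null;

// globalThis.gc is only defined when Node.js runs with --expose-gc.
const gcExposed = typeof globalThis.gc === "function";
if (gcExposed) {
  globalThis.gc();
}
const after = heapUsedMiB();

console.log(`gc exposed: ${gcExposed}`);
console.log(`heap before: ${before.toFixed(1)} MiB, after: ${after.toFixed(1)} MiB`);
```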

Data structures aiding memory management

Although JavaScript does not directly expose the garbage collector API, the language offers several data structures that indirectly observe garbage collection and can be used to manage memory usage.

WeakMaps and WeakSets

WeakMap and WeakSet are data structures whose APIs closely mirror their non-weak counterparts: Map and Set. WeakMap allows you to maintain a collection of key-value pairs, while WeakSet allows you to maintain a collection of unique values, both with performant addition, deletion, and querying.

WeakMap and WeakSet get their names from the concept of weakly held values. If x is weakly held by y, it means that although you can access the value of x via y, the mark-and-sweep algorithm won't consider x reachable if nothing else strongly holds it. Most data structures, except the ones discussed here, strongly hold the objects passed in so that you can retrieve them at any time. The keys of WeakMap and WeakSet can be garbage-collected (for WeakMap objects, the values would then be eligible for garbage collection as well) as long as nothing else in the program is referencing the key. This is ensured by two characteristics:

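  • WeakMap and WeakSet can only store objects or symbols. This is because only objects are garbage collected: primitive values can always be forged (that is, 1 === 1 but {} !== {}), which would make them stay in the collection forever. Registered symbols (like Symbol.for("key")) can also be forged and are therefore not garbage collectable, but symbols created with Symbol("key") are garbage collectable. Well-known symbols like Symbol.iterator come in a fixed set and are unique throughout the lifetime of the program, similar to intrinsic objects such as Array.prototype, so they are also allowed as keys.
  • WeakMap and WeakSet are not iterable. This prevents you from using Array.from(map.keys()).length to observe the liveness of objects, or from getting hold of an arbitrary key that should otherwise be eligible for garbage collection. (Garbage collection should be as invisible as possible.)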
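
Both characteristics can be observed directly. The helper `canBeKey` below is a hypothetical name used only for this sketch. Whether a non-registered symbol such as Symbol("key") is accepted depends on the engine supporting symbols as WeakMap keys, so that result is deliberately not annotated.

```javascript
// Returns true if the engine accepts `candidate` as a WeakMap key.
function canBeKey(candidate) {
  try {
    new WeakMap().set(candidate, true);
    return true;
  } catch (error) {
    return false; // a TypeError is thrown for invalid keys
  }
}

console.log(canBeKey({})); // true
console.log(canBeKey(1)); // false
console.log(canBeKey("text")); // false
console.log(canBeKey(Symbol.for("key"))); // false: registered symbols can be forged
console.log(canBeKey(Symbol("key"))); // engine-dependent

// WeakMap offers no way to enumerate its keys.
const hasIterator = typeof WeakMap.prototype[Symbol.iterator] !== "undefined";
const hasKeysMethod = typeof WeakMap.prototype.keys !== "undefined";
console.log(hasIterator, hasKeysMethod); // false false
```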

In typical explanations of WeakMap and WeakSet (such as the one above), it's often implied that the key is garbage-collected first, freeing the value for garbage collection as well. However, consider the case of the value referencing the key:

js
const wm = new WeakMap();
const key = {};
wm.set(key, { key });
// Now `key` cannot be garbage collected,
// because the value holds a reference to the key,
// and the value is strongly held in the map!

If key were stored as an actual reference, it would create a cyclic reference and make both the key and the value ineligible for garbage collection, even when nothing else references key. The reason is that if key were garbage collected, then at some particular instant value.key would point to a non-existent address, which is not legal. To fix this, the entries of WeakMap and WeakSet aren't actual references, but ephemerons, an enhancement to the mark-and-sweep mechanism. Barros et al. offer a good summary of the algorithm (page 4). To quote a paragraph:

Ephemerons are a refinement of weak pairs where neither the key nor the value can be classified as weak or strong. The connectivity of the key determines the connectivity of the value, but the connectivity of the value does not affect the connectivity of the key. […] when the garbage collection offers support to ephemerons, it occurs in three phases instead of two (mark and sweep).
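
The quoted idea can be illustrated with a toy collector that adds an ephemeron pass between marking and sweeping. This is a rough sketch under simplifying assumptions (a heap modeled as a Map of names, ephemerons as plain key/value pairs); `collectWithEphemerons` is a hypothetical name and does not reflect any engine's actual implementation.

```javascript
// Toy collector: heap is name -> referenced names; ephemerons are
// { key, value } pairs whose value is only kept alive if the key is reachable.
function collectWithEphemerons(heap, roots, ephemerons) {
  const reachable = new Set();
  const mark = (start) => {
    const pending = [start];
    while (pending.length > 0) {
      const name = pending.pop();
      if (reachable.has(name)) continue;
      reachable.add(name);
      pending.push(...heap.get(name));
    }
  };

  // Phase 1: ordinary marking from the roots, ignoring ephemeron entries.
  for (const root of roots) mark(root);

  // Phase 2: mark values whose keys are reachable, repeating until nothing
  // changes, because a newly marked value may make another key reachable.
  let changed = true;
  while (changed) {
    changed = false;
    for (const { key, value } of ephemerons) {
      if (reachable.has(key) && !reachable.has(value)) {
        mark(value);
        changed = true;
      }
    }
  }

  // Phase 3: sweep, reporting everything that was never marked.
  return [...heap.keys()].filter((name) => !reachable.has(name));
}

const heap = new Map([
  ["global", ["liveKey"]],
  ["liveKey", []],
  ["liveValue", []],
  ["deadKey", []],
  ["deadValue", ["deadKey"]], // the value references its own key
]);
const ephemerons = [
  { key: "liveKey", value: "liveValue" },
  { key: "deadKey", value: "deadValue" },
];

const garbage = collectWithEphemerons(heap, ["global"], ephemerons);
console.log(garbage); // ["deadKey", "deadValue"]
```

Because the value's reference to its key is only followed once the key is already known to be reachable, the pair whose key is otherwise unreferenced is collected as a whole.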

As a rough mental model, think of a WeakMap as the following implementation:

Warning: This is not a polyfill, nor is it anywhere close to how WeakMap is implemented in the engine (which hooks into the garbage collection mechanism).

js
class MyWeakMap {
  #marker = Symbol("MyWeakMapData");
  get(key) {
    return key[this.#marker];
  }
  set(key, value) {
    key[this.#marker] = value;
  }
  has(key) {
    return this.#marker in key;
  }
  delete(key) {
    delete key[this.#marker];
  }
}

As you can see, MyWeakMap never actually holds a collection of keys. It simply adds metadata to each object being passed in. The object is then garbage-collectable via mark-and-sweep. Therefore, it's not possible to iterate over the keys in a WeakMap, nor to clear the WeakMap (as that also relies on knowledge of the entire key collection).
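
In practice, this makes the built-in WeakMap a natural fit for attaching metadata to objects without extending their lifetime. The following is a small usage sketch; `visitCounts`, `recordVisit`, and the `alice` object are hypothetical names for illustration.

```javascript
// Per-object metadata that does not keep the object alive.
const visitCounts = new WeakMap();

function recordVisit(user) {
  const count = (visitCounts.get(user) ?? 0) + 1;
  visitCounts.set(user, count);
  return count;
}

let alice = { name: "Alice" };
recordVisit(alice);
const secondVisit = recordVisit(alice);
const tracked = visitCounts.has(alice);
console.log(secondVisit, tracked); // 2 true

// Once the last strong reference is dropped, the entry for this object
// becomes eligible for garbage collection along with the object itself.
alice = null;
```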

For more information on their APIs, see the keyed collections guide.

WeakRefs and FinalizationRegistry

Note: WeakRef and FinalizationRegistry offer direct introspection into the garbage collection machinery. Avoid using them where possible, because the runtime semantics are almost completely unguaranteed.

All variables with an object as value are references to that object. However, such references are strong: their existence prevents the garbage collector from marking the object as eligible for collection. A WeakRef is a weak reference to an object that allows the object to be garbage collected, while still retaining the ability to read the object's content during its lifetime.
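
A minimal sketch of the basic WeakRef API follows; the variable names are hypothetical. While a strong reference exists, deref() returns the object itself. After the strong reference is dropped, deref() may return either the object or undefined, depending on whether a collection has happened, so that result is not annotated.

```javascript
let target = { payload: "large data" };
const ref = new WeakRef(target);

// While `target` strongly holds the object, deref() returns that same object.
const sameObject = ref.deref() === target;
console.log(sameObject); // true

// Drop the strong reference. From now on the object may be collected
// at any time the engine chooses, or not at all.
target = null;
const maybeObject = ref.deref();
console.log(maybeObject === undefined ? "collected" : "still alive");
```

Code that uses deref() should always handle the undefined case rather than assume a particular collection timing.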

One use case for WeakRef is a cache system that maps string URLs to large objects. We cannot use a WeakMap for this purpose, because WeakMap objects have their keys weakly held, but not their values: if you access a key, you always deterministically get the value (since having access to the key means it's still alive). Here, it is acceptable to get undefined for a key (if the corresponding value is no longer alive), since we can just re-compute it, but we don't want unreachable objects to stay in the cache. In this case, we can use a normal Map, but with each value being a WeakRef of the object instead of the actual object value.

js
function cached(getter) {
  // A Map from string URLs to WeakRefs of results
  const cache = new Map();
  return async (key) => {
    if (cache.has(key)) {
      const dereferencedValue = cache.get(key).deref();
      if (dereferencedValue !== undefined) {
        return dereferencedValue;
      }
    }
    const value = await getter(key);
    cache.set(key, new WeakRef(value));
    return value;
  };
}

const getImage = cached((url) => fetch(url).then((res) => res.blob()));

FinalizationRegistry provides an even stronger mechanism to observe garbage collection. It allows you to register objects and be notified when they are garbage collected. For example, in the cache system above, even when the blobs themselves are free for collection, the WeakRef objects that hold them are not, and over time the Map may accumulate a lot of useless entries. Using a FinalizationRegistry allows one to perform cleanup in this case.

js
function cached(getter) {
  // A Map from string URLs to WeakRefs of results
  const cache = new Map();
  // Every time after a value is garbage collected, the callback is
  // called with the key in the cache as argument, allowing us to remove
  // the cache entry
  const registry = new FinalizationRegistry((key) => {
    // Note: it's important to test that the WeakRef is indeed empty.
    // Otherwise, the callback may be called after a new object has been
    // added with this key, and that new, alive object gets deleted
    if (!cache.get(key)?.deref()) {
      cache.delete(key);
    }
  });
  return async (key) => {
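    if (cache.has(key)) {
      // The value may already have been collected even though the cleanup
      // callback has not run yet, so check the dereferenced value
      const dereferencedValue = cache.get(key).deref();
      if (dereferencedValue !== undefined) {
        return dereferencedValue;
      }
    }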
    const value = await getter(key);
    cache.set(key, new WeakRef(value));
    registry.register(value, key);
    return value;
  };
}

const getImage = cached((url) => fetch(url).then((res) => res.blob()));

Due to performance and security concerns, there is no guarantee of when the callback will be called, or whether it will be called at all. It should only be used for cleanup, and only for non-critical cleanup. There are other ways to achieve more deterministic resource management, such as try...finally, which always executes the finally block. WeakRef and FinalizationRegistry exist solely for optimization of memory usage in long-running programs.
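
For comparison, here is a small sketch of deterministic cleanup with try...finally. The `log` array and `processWithCleanup` function are hypothetical stand-ins for acquiring and releasing a real resource such as a file handle or a lock.

```javascript
const log = [];

function processWithCleanup(shouldFail) {
  log.push("acquire");
  try {
    if (shouldFail) {
      throw new Error("processing failed");
    }
    log.push("work done");
  } finally {
    // Runs whether the try block completed normally or threw.
    log.push("release");
  }
}

processWithCleanup(false);
try {
  processWithCleanup(true);
} catch (error) {
  log.push(`caught: ${error.message}`);
}

console.log(log);
// ["acquire", "work done", "release",
//  "acquire", "release", "caught: processing failed"]
```

Unlike a finalization callback, the release step here runs at a known point in the program, before the error propagates to the caller.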

For more information on the API of WeakRef and FinalizationRegistry, see their reference pages.