[Java] Java: How GC Works

2024-05-08 15:33:02

Source

Java — How GC works. One of the most notable features of… | by RR | Jul, 2023 | Medium


One of the most notable features of Java memory management is automatic garbage collection (GC). Its primary purpose is to manage memory allocation and deallocation for runtime objects automatically, which makes it easier for developers to write safer code with fewer memory-related issues.

To understand GC, let us first look at how Java manages memory.

  1. Java Heap: Used for dynamic memory allocation. It stores the objects and other data structures created during program execution.
  2. Stack: Used to store local variables and method call frames. Each thread in Java has its own stack, which is created when the thread starts. All local variables of that thread are stored on its stack. For an object created inside a method, the object itself lives on the heap and the local variable on the stack holds a reference to it.
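The split between stack and heap can be illustrated with a small sketch. The `StackVsHeapDemo` and `Point` classes are hypothetical names made up for this example, and the comments describe the conceptual model only, not what a particular JIT compiler ends up doing.

```java
final class StackVsHeapDemo {
    // Hypothetical class used only for illustration.
    static final class Point {
        int x;
        Point(int x) { this.x = x; }
    }

    static int demo() {
        int count = 1;            // primitive local: stored in this method's stack frame
        Point p = new Point(10);  // p is a reference on the stack; the Point object is on the heap
        Point alias = p;          // a second stack reference to the same heap object
        alias.x = 42;             // visible through p, because both refer to one object
        return p.x + count;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 43
    }
}
```

When `demo()` returns, its stack frame (and with it `p` and `alias`) disappears, so the `Point` object is no longer referenced from anywhere.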

There are other areas, such as Metaspace and registers, which are not very relevant here.

When an object is created using new or any other object instantiation method, memory for that object is allocated on the heap.

As the application runs, some of these objects become unreachable because nothing references them anymore.

If GC did not run, the heap would eventually fill up and the application would fail with an OutOfMemoryError.

How GC identifies dead (unreachable) objects

There is a misconception that GC tracks all dead objects, which is not true.

An easier and more reliable approach is to track all the live objects (those still needed for program execution) and clean up everything else.

Now, how do we identify all the live objects?

GC roots

Think of all the objects created by the program as a set of trees, where each reference from one object to another is an edge.

If we have access to the roots of all such trees, we can find every live (reachable) object by following references from those roots.

In Java, the following are considered valid GC roots.

  1. Local variables: Available via the thread stack and considered live as long as the method is active.
  2. Active Java threads.
  3. Static variables: They belong to classes and are shared across all instances. They remain GC roots as long as the class is loaded.
  4. JNI references: Created as part of a JNI call. Objects created this way are managed specially, because the JVM cannot know whether native code still refers to them.

How does cleanup happen?

GC is done in two steps.

Marking: The JVM runs the GC thread intermittently. It traverses all the object trees starting from the GC roots and marks every reachable object as live.

Sweep: All the remaining heap space is marked (swept) as free for new allocations.
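The two steps can be sketched on a toy object graph. This is only a simulation under stated assumptions, not how the JVM is implemented: `MarkSweepSketch`, `Obj`, and the list standing in for the heap are all hypothetical names invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MarkSweepSketch {
    /** Hypothetical heap object: a name plus outgoing references. */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /** Mark phase: collect every object reachable from the roots. */
    static Set<Obj> mark(List<Obj> roots) {
        Set<Obj> live = new HashSet<>();
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            if (live.add(current)) {          // first visit: mark it and follow its references
                pending.addAll(current.refs);
            }
        }
        return live;
    }

    /** Sweep phase: remove everything that was not marked; return the freed names. */
    static List<String> sweep(List<Obj> heap, Set<Obj> live) {
        List<String> freed = new ArrayList<>();
        heap.removeIf(o -> {
            if (live.contains(o)) return false;
            freed.add(o.name);
            return true;
        });
        return freed;
    }

    /** Builds a tiny heap, runs one cycle, and returns the names that were freed. */
    static List<String> runCycle() {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);                         // a -> b
        c.refs.add(d);                         // c -> d and d -> c: a cycle no root points to
        d.refs.add(c);
        List<Obj> heap = new ArrayList<>(List.of(a, b, c, d));
        List<Obj> roots = List.of(a);          // only "a" is referenced from a root
        return sweep(heap, mark(roots));
    }

    public static void main(String[] args) {
        System.out.println("freed: " + runCycle()); // prints "freed: [c, d]"
    }
}
```

Note that `c` and `d` reference each other and are still freed: reachability from a root is what counts, not whether an object has any incoming reference.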

To perform the marking phase in this basic scheme, all running application threads must be stopped. Otherwise the object trees could change during the traversal and could not be marked safely.

The pause time depends on how much memory is live (not dead!). The more live objects there are, the longer it takes for GC to traverse and analyse them.

This brings us to the fundamental issue with GC: its impact on application performance.

Another issue associated with allocation and deallocation is memory fragmentation.

It occurs when free memory is split into small, non-contiguous chunks, so a large object cannot be allocated even though enough total memory is free.

This quickly becomes a bottleneck if memory is not compacted after each cleanup.
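A minimal sketch of the fragmentation problem, assuming a toy heap modelled as an array of fixed-size slots. The `FragmentationSketch` class and its first-fit allocator are hypothetical and much simpler than a real JVM allocator; a real compactor also has to update every reference to the objects it moves.

```java
import java.util.Arrays;

final class FragmentationSketch {
    private final boolean[] used;   // true = slot occupied

    FragmentationSketch(int slots) { used = new boolean[slots]; }

    /** First-fit allocation: returns the start index, or -1 if no contiguous run fits. */
    int allocate(int size) {
        int run = 0;
        for (int i = 0; i < used.length; i++) {
            run = used[i] ? 0 : run + 1;
            if (run == size) {
                int start = i - size + 1;
                Arrays.fill(used, start, i + 1, true);
                return start;
            }
        }
        return -1;
    }

    void free(int start, int size) {
        Arrays.fill(used, start, start + size, false);
    }

    int freeSlots() {
        int n = 0;
        for (boolean u : used) if (!u) n++;
        return n;
    }

    /** Compaction: slide all occupied slots to the front, leaving one free block. */
    void compact() {
        int occupied = used.length - freeSlots();
        for (int i = 0; i < used.length; i++) used[i] = i < occupied;
    }

    /** Returns {result before compaction, result after compaction} for a 4-slot request. */
    static int[] demo() {
        FragmentationSketch heap = new FragmentationSketch(8);
        for (int i = 0; i < 4; i++) heap.allocate(2);   // objects at 0, 2, 4, 6
        heap.free(0, 2);
        heap.free(4, 2);                                // 4 slots free, but in two separate holes
        int before = heap.allocate(4);                  // fails: no contiguous run of 4
        heap.compact();
        int after = heap.allocate(4);                   // succeeds once the holes are merged
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));    // prints [-1, 4]
    }
}
```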

GC algorithms

Multiple GC algorithms and techniques have been developed to reduce these performance and fragmentation problems. Let us discuss some of them.

Serial Collector

It is the simplest collector and uses a single thread to run the mark-sweep algorithm. This is a stop-the-world approach: all application threads are paused while the single GC thread is running. It is suited to low-concurrency applications with small memory footprints.

Parallel Collector

The only difference from the serial collector is that it uses multiple GC threads. It is more suitable when multiple CPUs are available.

Concurrent Collector

The concurrent collector performs most of its work concurrently, while application threads are running. This reduces the duration of stop-the-world events and makes the application more responsive, although the GC threads then compete with application threads for CPU time.
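To check which collectors a running JVM is actually using, the standard `java.lang.management` API can be queried. The sketch below assumes a HotSpot-style JVM; the names it prints depend on the JVM and on the collector selected at startup (on HotSpot, for example, with flags such as `-XX:+UseSerialGC`, `-XX:+UseParallelGC`, or `-XX:+UseG1GC`). The `ActiveCollectors` class name is made up for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveCollectors {
    /** Names of the collectors registered with the platform MBean server. */
    static List<String> names() {
        List<String> result = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.add(gc.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and accumulated time (ms) of collections so far; -1 means "undefined".
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```

Running the same program with different collector flags is a quick way to confirm that a flag took effect.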

Generational GC

As mentioned earlier, GC performance depends on how many live objects are in the heap. So to improve GC performance, each collection should have to deal with as few live objects as possible. This is the main reason for dividing the heap into multiple generations. The division also allows different GC algorithms to be applied selectively, based on the characteristics of the objects residing in each generation.

Young and old generation

Newly created objects are more likely to be short-lived, while objects that have been around for a while are more likely to stay live for a long time. Objects are usually created in the young area. Because most of them are short-lived, GC there runs quickly and cleans up most of them. Objects that survive a few GC cycles in the young area are moved to the old area. As long as the old area does not grow large, we can avoid running GC on it.

To optimise this further, some JVMs divide the young generation into multiple regions: an Eden space and Survivor spaces (usually two). New objects are usually created in Eden. During GC, the surviving objects from Eden and one of the Survivor spaces are copied into the other Survivor space, so after each collection Eden and one Survivor space are completely empty. This helps with fragmentation, because the survivors are packed together and new allocations always start from empty space.
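The copying scheme and promotion to the old area can be sketched as a toy model. Everything here is an assumption made for illustration: the `YoungGenSketch` class, the lists standing in for memory regions, the `isLive` predicate replacing a real reachability analysis, and a promotion threshold of 2 collections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class YoungGenSketch {
    static final class Obj {
        final String name;
        int age;                               // number of minor GCs survived
        Obj(String name) { this.name = name; }
    }

    static final int TENURING_THRESHOLD = 2;   // assumed value for this sketch

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();        // survivor space currently holding objects
    List<Obj> to = new ArrayList<>();          // survivor space kept empty between GCs
    final List<Obj> old = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);                           // new objects start in Eden
        return o;
    }

    /** One minor GC: copy live objects out of Eden and "from", then empty both. */
    void minorGc(Predicate<Obj> isLive) {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!isLive.test(o)) continue;     // dead objects are simply not copied
            o.age++;
            if (o.age >= TENURING_THRESHOLD) old.add(o); else to.add(o);
        }
        eden.clear();
        from.clear();
        List<Obj> tmp = from; from = to; to = tmp;   // swap the survivor roles
    }

    String describe() {
        return "eden=" + eden.size() + " survivor=" + from.size() + " old=" + old.size();
    }

    /** Runs two minor GCs and records the region sizes after each one. */
    static List<String> demo() {
        YoungGenSketch heap = new YoungGenSketch();
        heap.allocate("keep");
        heap.allocate("temp");
        Predicate<Obj> isLive = o -> o.name.equals("keep");
        List<String> states = new ArrayList<>();
        heap.minorGc(isLive);
        states.add(heap.describe());           // eden=0 survivor=1 old=0
        heap.minorGc(isLive);
        states.add(heap.describe());           // eden=0 survivor=0 old=1
        return states;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The short-lived `temp` object costs nothing to reclaim because it is never copied, while `keep` moves to the old area after surviving two collections.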

  1. Collections of the young generation are called minor GCs. Although they have a stop-the-world stage, it is very quick, which results in shorter pause times.
  2. The old generation contains long-lived objects, so major GCs are much less frequent.
  3. Different GC algorithms can be applied to each generation based on its size and the application's requirements.

It is very important to tune GC to the application's requirements. For example, if the young generation is too small, many objects will be moved to the old generation prematurely. If the young generation is too large, minor GC cycles will take longer to complete, which negatively impacts application response time.
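Before tuning generation sizes, it helps to see how the running JVM has actually laid out its heap. The sketch below lists the heap memory pools through the standard `java.lang.management` API. Pool names vary by JVM and collector, so treat any specific name as an assumption; `HeapPools` is a hypothetical class name. On HotSpot, sizing is typically adjusted with flags such as `-Xmn`, `-XX:NewRatio`, and `-XX:MaxTenuringThreshold`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

final class HeapPools {
    /** Names of the memory pools that belong to the Java heap. */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) names.add(pool.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;
            MemoryUsage usage = pool.getUsage();
            if (usage == null) continue;       // pool is no longer valid
            // getMax() returns -1 when the maximum is undefined.
            System.out.println(pool.getName()
                    + ": usedBytes=" + usage.getUsed()
                    + ", maxBytes=" + usage.getMax());
        }
    }
}
```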

One of the more recent collectors is G1 GC, which has been available since Java 7 and became the default collector in Java 9. It provides more predictable pause times and better scalability for applications with large heaps.
