Java中的ArrayList与System.arraycopy底层原理

2022-05-12 10:33:36 浏览数 (1)

OPENJDK源码:wget https://download.java.net/openjdk/jdk8/promoted/b132/openjdk-8-src-b132-03_mar_2014.zip

1 ArrayList增长策略

  • 最小增长区间:10
  • 增长算法:new = old old / 2
  • 实际增长点:10、15、22、33、49、73、109、163、244、366、548、823、1234

也就是说增长到1000的数组如果没有事先指定大小,会发生13次Arrays.copyOf动作,拷贝代价多大?继续分析

代码语言:javascript复制
    private void grow(int minCapacity) {
        // overflow-conscious code
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity   (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }

2 数组拷贝System.arraycopy

调试代码V1

底层函数被频繁调用,无法分辨是不是这套代码的堆栈。

代码语言:javascript复制
import java.util.*;

public class ListTest1 {
    public static void main(String[] args) {
        List<String> stringArrayList = new ArrayList<>();
        for (int i = 0; i<100000; i  ) {
            stringArrayList.add("hello");
        }
        System.out.println(stringArrayList.get(0));
    }
}

调试代码V2

GDB看不到JAVA堆栈,用变量值定位所需堆栈

代码语言:javascript复制
public class SystemCopyTest {
    public static void main(String[] args) {
        char[] s = new char[6000];
        char[] d = new char[9000];
        for (int i = 79; i < 137; i  ) {
            s[i] = (char)(i-14);
        }
        System.arraycopy(s, 79, d, 0, 58);
        System.out.println(d);
    }
}

输出

代码语言:javascript复制
[root@27cb1371dc0d ~]# /root/openjdk/build/linux-x86_64-normal-server-slowdebug/jdk/bin/javac SystemCopyTest.java
[root@27cb1371dc0d ~]# /root/openjdk/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java SystemCopyTest
ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz

堆栈分析

JAVA Frames-0

0层栈帧在JAVA中,进入JAVA堆栈

源码

代码语言:javascript复制
    public static native void arraycopy(Object src,  int  srcPos,
                                        Object dest, int destPos,
                                        int length);
JVM Frames-1

/root/openjdk/hotspot/src/share/vm/prims/jvm.cpp:310

s->klass()->copy_array(s, src_pos, d, dst_pos, length, thread);

JVM Frames-2

/root/openjdk/hotspot/src/share/vm/oops/typeArrayKlass.cpp:155

void TypeArrayKlass::copy_array(arrayOop s, int src_pos, arrayOop d, int dst_pos, int length, TRAPS)

注意打断点的技巧,底层函数会被很多地方调用,要找到关注的堆栈可以用值匹配

b TypeArrayKlass::copy_array if (src_pos79&&length58)

gdb

代码语言:javascript复制
Breakpoint 1, TypeArrayKlass::copy_array (this=0x100000210, s=0xf5a00000, src_pos=79, d=0xf5a02ef0, dst_pos=0, length=58, __the_thread__=0x7f905400b000)
    at /root/openjdk/hotspot/src/share/vm/oops/typeArrayKlass.cpp:130
130   assert(s->is_typeArray(), "must be type array");
(gdb) bt
#0  TypeArrayKlass::copy_array (this=0x100000210, s=0xf5a00000, src_pos=79, d=0xf5a02ef0, dst_pos=0, length=58, __the_thread__=0x7f905400b000)
    at /root/openjdk/hotspot/src/share/vm/oops/typeArrayKlass.cpp:155
#1  0x00007f905b070a47 in JVM_ArrayCopy (env=0x7f905400b210, ignored=0x7f905c8f86d8, src=0x7f905c8f86a8, src_pos=79, dst=0x7f905c8f86b8, dst_pos=0, length=58)
    at /root/openjdk/hotspot/src/share/vm/prims/jvm.cpp:310
#2  0x00007f90451cdb51 in ?? ()
#3  0x000000000000003a in ?? ()
#4  0x00000000f5a00000 in ?? ()
#5  0x00007f905c8f6aa8 in ?? ()
#6  0x00000000f5a02ef0 in ?? ()
#7  0x0000000000000004 in ?? ()
#8  0x0000000000000000 in ?? ()

源码

代码语言:javascript复制
void TypeArrayKlass::copy_array(arrayOop s, int src_pos, arrayOop d, int dst_pos, int length, TRAPS) {
  assert(s->is_typeArray(), "must be type array");

  // Check destination
  if (!d->is_typeArray() || element_type() != TypeArrayKlass::cast(d->klass())->element_type()) {
    THROW(vmSymbols::java_lang_ArrayStoreException());
  }

  // Check is all offsets and lengths are non negative
  if (src_pos < 0 || dst_pos < 0 || length < 0) {
    THROW(vmSymbols::java_lang_ArrayIndexOutOfBoundsException());
  }
  // Check if the ranges are valid
  if  ( (((unsigned int) length   (unsigned int) src_pos) > (unsigned int) s->length())
     || (((unsigned int) length   (unsigned int) dst_pos) > (unsigned int) d->length()) ) {
    THROW(vmSymbols::java_lang_ArrayIndexOutOfBoundsException());
  }
  // Check zero copy
  if (length == 0)
    return;

  // This is an attempt to make the copy_array fast.
  int l2es = log2_element_size();
  int ihs = array_header_in_bytes() / wordSize;
  char* src = (char*) ((oop*)s   ihs)   ((size_t)src_pos << l2es);
  char* dst = (char*) ((oop*)d   ihs)   ((size_t)dst_pos << l2es);
  Copy::conjoint_memory_atomic(src, dst, (size_t)length << l2es);
}
JVM Frames-3(对齐)

/root/openjdk/hotspot/src/share/vm/utilities/copy.cpp:45

void Copy::conjoint_memory_atomic

gdb

代码语言:javascript复制
(gdb) bt
#0  Copy::conjoint_memory_atomic (from=0xf5a000ae, to=0xf5a02f00, size=116) at /root/openjdk/hotspot/src/share/vm/utilities/copy.cpp:46
#1  0x00007f905b3f84df in TypeArrayKlass::copy_array (this=0x100000210, s=0xf5a00000, src_pos=79, d=0xf5a02ef0, dst_pos=0, length=58, __the_thread__=0x7f905400b000)
    at /root/openjdk/hotspot/src/share/vm/oops/typeArrayKlass.cpp:155
#2  0x00007f905b070a47 in JVM_ArrayCopy (env=0x7f905400b210, ignored=0x7f905c8f86d8, src=0x7f905c8f86a8, src_pos=79, dst=0x7f905c8f86b8, dst_pos=0, length=58)
    at /root/openjdk/hotspot/src/share/vm/prims/jvm.cpp:310
#3  0x00007f90451cdb51 in ?? ()
#4  0x000000000000003a in ?? ()
#5  0x00000000f5a00000 in ?? ()
#6  0x00007f905c8f6aa8 in ?? ()
#7  0x00000000f5a02ef0 in ?? ()
#8  0x0000000000000004 in ?? ()
#9  0x0000000000000000 in ?? ()

对齐到8、4、2上,走不同的处理分支

代码语言:javascript复制
void Copy::conjoint_memory_atomic(void* from, void* to, size_t size = 116) {
  address src = (address) from = 4120903854
  address dst = (address) to = 4120915712
  uintptr_t bits = (uintptr_t) src | (uintptr_t) dst | (uintptr_t) size = 4120915966

bits % sizeof(jlong) = 2
      
  if (bits % sizeof(jlong) == 0) {
    Copy::conjoint_jlongs_atomic((jlong*) src, (jlong*) dst, size / sizeof(jlong));
  } else if (bits % sizeof(jint) == 0) {
    Copy::conjoint_jints_atomic((jint*) src, (jint*) dst, size / sizeof(jint));
  } else if (bits % sizeof(jshort) == 0) {
--> Copy::conjoint_jshorts_atomic((jshort*) src, (jshort*) dst, size / sizeof(jshort));
  } else {
    // Not aligned, so no need to be atomic.
    Copy::conjoint_jbytes((void*) src, (void*) dst, size);
  }
}
JVM Frames-4

/root/openjdk/hotspot/src/share/vm/utilities/copy.hpp

static void conjoint_jshorts_atomic(jshort* from, jshort* to, size_t count)

gdb

代码语言:javascript复制
#0  Copy::conjoint_jshorts_atomic (from=0xf5a000ae, to=0xf5a02f00, count=58) at /root/openjdk/hotspot/src/share/vm/utilities/copy.hpp:134
#1  0x00007f905ae18b36 in Copy::conjoint_memory_atomic (from=0xf5a000ae, to=0xf5a02f00, size=116) at /root/openjdk/hotspot/src/share/vm/utilities/copy.cpp:49
#2  0x00007f905b3f84df in TypeArrayKlass::copy_array (this=0x100000210, s=0xf5a00000, src_pos=79, d=0xf5a02ef0, dst_pos=0, length=58, __the_thread__=0x7f905400b000)
    at /root/openjdk/hotspot/src/share/vm/oops/typeArrayKlass.cpp:155
#3  0x00007f905b070a47 in JVM_ArrayCopy (env=0x7f905400b210, ignored=0x7f905c8f86d8, src=0x7f905c8f86a8, src_pos=79, dst=0x7f905c8f86b8, dst_pos=0, length=58)
    at /root/openjdk/hotspot/src/share/vm/prims/jvm.cpp:310
#4  0x00007f90451cdb51 in ?? ()
#5  0x000000000000003a in ?? ()
#6  0x00000000f5a00000 in ?? ()
#7  0x00007f905c8f6aa8 in ?? ()
#8  0x00000000f5a02ef0 in ?? ()
#9  0x0000000000000004 in ?? ()
#10 0x0000000000000000 in ?? ()

源码

代码语言:javascript复制
  // jshorts,               conjoint, atomic on each jshort
  static void conjoint_jshorts_atomic(jshort* from, jshort* to, size_t count) {
    assert_params_ok(from, to, LogBytesPerShort);
    pd_conjoint_jshorts_atomic(from, to, count);
  }
JVM Frames-5

/root/openjdk/hotspot/src/os_cpu/linux_x86/vm/copy_linux_x86.inline.hpp:227

static void pd_conjoint_jshorts_atomic(jshort* from, jshort* to, size_t count)

gdb

代码语言:javascript复制
(gdb) bt
#0  Copy::pd_conjoint_jshorts_atomic (from=0xf5a000ae, to=0xf5a02f00, count=58) at /root/openjdk/hotspot/src/os_cpu/linux_x86/vm/copy_linux_x86.inline.hpp:227
#1  0x00007f905ae18cf0 in Copy::conjoint_jshorts_atomic (from=0xf5a000ae, to=0xf5a02f00, count=58) at /root/openjdk/hotspot/src/share/vm/utilities/copy.hpp:135
#2  0x00007f905ae18b36 in Copy::conjoint_memory_atomic (from=0xf5a000ae, to=0xf5a02f00, size=116) at /root/openjdk/hotspot/src/share/vm/utilities/copy.cpp:49
#3  0x00007f905b3f84df in TypeArrayKlass::copy_array (this=0x100000210, s=0xf5a00000, src_pos=79, d=0xf5a02ef0, dst_pos=0, length=58, __the_thread__=0x7f905400b000)
    at /root/openjdk/hotspot/src/share/vm/oops/typeArrayKlass.cpp:155
#4  0x00007f905b070a47 in JVM_ArrayCopy (env=0x7f905400b210, ignored=0x7f905c8f86d8, src=0x7f905c8f86a8, src_pos=79, dst=0x7f905c8f86b8, dst_pos=0, length=58)
    at /root/openjdk/hotspot/src/share/vm/prims/jvm.cpp:310
#5  0x00007f90451cdb51 in ?? ()
#6  0x000000000000003a in ?? ()
#7  0x00000000f5a00000 in ?? ()
#8  0x00007f905c8f6aa8 in ?? ()
#9  0x00000000f5a02ef0 in ?? ()
#10 0x0000000000000004 in ?? ()
#11 0x0000000000000000 in ?? ()

源码

代码语言:javascript复制
static void pd_conjoint_jshorts_atomic(jshort* from, jshort* to, size_t count) {
  _Copy_conjoint_jshorts_atomic(from, to, count);
}
JVM Frames-6(开始拷贝)

hotspot/src/os_cpu/linux_x86/vm/linux_x86_64.s

_Copy_conjoint_jshorts_atomic(from, to, count)

源码

代码语言:javascript复制
        .p2align 4,,15
    .type    _Copy_arrayof_conjoint_jshorts,@function
    .type    _Copy_conjoint_jshorts_atomic,@function
_Copy_arrayof_conjoint_jshorts:
_Copy_conjoint_jshorts_atomic:
        movq     %rdx,%r8             # word count
        shrq     $2,%rdx              # qword count
        cmpq     %rdi,%rsi
        leaq     -2(%rdi,%r8,2),%rax  # from   wcount*2 - 2
        jbe      acs_CopyRight
        cmpq     %rax,%rsi
        jbe      acs_CopyLeft 

gdb b _Copy_conjoint_jshorts_atomic 进入/root/openjdk/build/linux-x86_64-normal-server-slowdebug/jdk/lib/amd64/server/libjvm.so

layout regs:打开寄存器监控 disassemble:查看汇编代码si:单步i all-registers:打印所有寄存器i registers:打印寄存器

代码语言:javascript复制
关键寄存器

 rsi            0xf5a02f00   4120915712           ------> to
 rbx            0x3a     58
 rdx            0x3a     58                       ------> count
 rdi            0xf5a000ae     4120903854         ------> from
 rsp            0x7fbc3c7724c8   0x7fbc3c7724c8
 r9             0x3a     58
 r11            0x7fbc251cffb8   140446053236664
 r13            0x7fbc3c772708   140446445020936
 r15            0x7fbc3400b000   140446303039488
 eflags         0x202    [ IF ]
 ss             0x2b     43
 es             0x0    0
 gs             0x0    0
    
    
(gdb) disassemble
Dump of assembler code for function _Copy_conjoint_jshorts_atomic:
=> 0x00007fbc3afd3ef0 < 0>: mov    %rdx,%r8
   0x00007fbc3afd3ef3 < 3>: shr    $0x2,%rdx
   0x00007fbc3afd3ef7 < 7>: cmp    %rdi,%rsi
   0x00007fbc3afd3efa < 10>:    lea    -0x2(%rdi,%r8,2),%rax
   0x00007fbc3afd3eff < 15>:    jbe    0x7fbc3afd3f06 <acs_CopyRight>
   0x00007fbc3afd3f01 < 17>:    cmp    %rax,%rsi
   0x00007fbc3afd3f04 < 20>:    jbe    0x7fbc3afd3f84 <acs_CopyLeft>
End of assembler dump.

步骤分析:

汇编指令

说明

rdi(from)

rsi(to)

rdx(count)

r8

rax

eflags

0xf5a000ae

0xf5a02f00

58

0xf5a000ae

mov %rdx,%r8

0xf5a000ae

0xf5a02f00

58

58

0xf5a000ae

0x202 [ IF ]

shr $0x2,%rdx

rdx右移2位

0xf5a000ae

0xf5a02f00

14

58

0xf5a000ae

0x203 [ CF IF ]

cmp %rdi,%rsi

比较两个指针位置

0xf5a000ae

0xf5a02f00

14

58

0xf5a000ae

0x203 [ CF IF ]

lea -0x2(%rdi,%r8,2),%rax

rdi-2 (r8)*2

0xf5a000ae

0xf5a02f00

14

58

0xf5a00120

0x203 [ CF IF ]

jbe 0x7fbc3afd3f06 <acs_CopyRight>(CF=1 OR ZF=1跳转)

to在from的更高内存位,从右到做复制,避免覆盖

0xf5a000ae

0xf5a02f00

14

58

0xf5a00120

0x203 [ CF IF ]

cmp %rax,%rsi

0xf5a000ae

0xf5a02f00

14

58

0xf5a00120

0x202 [ IF ]

jbe 0x7fbc3afd3f84 <acs_CopyLeft>

0xf5a000ae

0xf5a02f00

14

58

0xf5a00120

0x202 [ IF ]

这里分acs_CopyRight和acs_CopyLeft是在内存拷贝时都要考虑的问题,保证复制目标可以顺利完成,不保证src一定不被覆盖。

例如:0x0000 保存6个字节拷贝到 0x0004的位置:

代码语言:javascript复制
0000  0001  0002  0003  0004  0005  0006  0007  0008  0009  0010
   a     b     c     d     e     f     g     
                           |     |     |
copy                       |     |     | 
                           |     |     |
                           a     b     c     d     e     f     g 
                           ( 被  覆  盖 )

如果从左到右先拷贝a就会破坏原数据e、f、g,所以在拷贝需要从右到做复制。

acs_CopyRight函数实现了具体的拷贝动作,基本都是寄存器mov操作,有兴趣可以看下hotspot/src/os_cpu/linux_x86/vm/linux_x86_64.s:171。

汇编运算速查:https://en.m.wikibooks.org/wiki/X86_Assembly/GAS_Syntax movl -8(�p,�x,4),�x # Full example: load *(ebp (edx * 4) - 8) into eax movl -4(�p),�x # Typical example: load a stack variable into eax movl (�x),�x # No index: copy the target of a pointer into a register leal 8(,�x,4),�x # Arithmetic: multiply eax by 4 and add 8 leal (�x,�x,2),�x # Arithmetic: multiply eax by 2 and add edx

总结下:Linux上OPENJDK使用汇编直接操作内存、寄存器来实现内存拷贝,避免了冗余的调用关系。数组拷贝可以尽量使用System.arraycopy发挥JDK极限性能。

3 序列化

ArrayList的序列化是比较有意思的,数据存储的变量被修饰transient,理论上直接序列化是不会编码这个变量的

transientObject[] elementData; // non-private to simplify nested class access

代码里面在开始序列化时,ObjectOutputStream会回调writeObject函数写数据(注意这个函数是没有继承关系的,直接按名字回调过来)

代码语言:javascript复制
 private void writeObject(java.io.ObjectOutputStream s)
        throws java.io.IOException{
        // Write out element count, and any hidden stuff
        int expectedModCount = modCount;
        s.defaultWriteObject();

        // Write out size as capacity for behavioural compatibility with clone()
        s.writeInt(size);elementData更像是ArrayList的缓冲区,空间往往比实际使用大,所以使用这种方式序列化避免空间浪费。



        // Write out all elements in the proper order.
        for (int i=0; i<size; i  ) {
            s.writeObject(elementData[i]);
        }

        if (modCount != expectedModCount) {
            throw new ConcurrentModificationException();
        }
    }
代码语言:javascript复制
 /**
     * The array buffer into which the elements of the ArrayList are stored.
     * The capacity of the ArrayList is the length of this array buffer. Any
     * empty ArrayList with elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA
     * will be expanded to DEFAULT_CAPACITY when the first element is added.
     */
    transient Object[] elementData; // non-private to simplify nested class access

0 人点赞