Almost every Java developer knows that programs written in Java are first compiled into JVM bytecode and stored as class files in a standardized format. Once such class files are loaded into the virtual machine, and until the JIT compiler gets to them, the JVM interprets the bytecode they contain. This article gives an overview of how the interpreter works in HotSpot, the OpenJDK JVM.
The content of the article:
- Environment
- Running java application
- Interpreter initialization and control transfer to java code
- Example
Environment
For the experiments, we use a build of the latest available OpenJDK JDK 12 revision, configured via autoconf with
--enable-debug --with-native-debug-symbols=internal
on Ubuntu 18.04 / gcc 7.4.0.
--with-native-debug-symbols=internal
means that, when the JDK is built, the debug symbols are contained in the binaries themselves.
--enable-debug
means that the binaries contain additional debugging code.
Building JDK 12 in such an environment is not complicated. All I needed to do was install JDK 11 (building JDK n requires JDK n-1) and manually install the libraries that autoconf reported as missing. Next, run the command
bash configure --enable-debug --with-native-debug-symbols=internal && make CONF=fastdebug images
and after waiting a bit (about 10 minutes on my laptop), we get a fastdebug build of JDK 12.
In principle, it would be enough to install the JDK from public repositories and additionally install the openjdk-xx-dbg package with debug symbols, where xx is the JDK version. However, the fastdebug build provides debugging functions callable from gdb that can make life easier in some cases. At the moment, I actively use ps(), a function for viewing Java stack traces from gdb, and pfl(), a function for analyzing the frame stack (very convenient when debugging the interpreter in gdb).
Example of ps() and pfl()
For example, consider the following gdb script:
# Path to the java binary
file /home/dmitrii/jdk12/build/linux-x86_64-server-fastdebug/images/jdk/bin/java
# HotSpot raises and handles SIGSEGV itself, so gdb should not stop on it, see e.g.
# https://hg.openjdk.java.net/jdk/jdk12/file/06222165c35f/src/hotspot/cpu/x86/vm_version_x86.cpp#l361
handle SIGSEGV nostop noprint
set breakpoint pending on
set pagination off
# Stop once the JVM is initialized, before
# public static void main(String args[]) is invoked
b PostJVMInit thread 2
commands
  # Buffer for the method name and signature
  set $buf = (char *) malloc(1000)
  # Entry point for methods of kind 0 (ordinary, non-native methods)
  b *AbstractInterpreter::_entry_table[0] thread 2
  commands
    # At this point rbx holds the Method*
    set $mthd = ((Method *) $rbx)
    # Write the name and signature into $buf
    call $mthd->name_and_sig_as_C_string($buf, 1000)
    # Is this public static void main(String args[])?
    if strcmp()("Main.main([Ljava/lang/String;)V", $buf) == 0
      # Break at a later point from which ps/pfl can be called
      b InterpreterRuntime::build_method_counters(JavaThread*, Method*)
      commands
        # The other breakpoints are no longer needed
        delete breakpoints
        call ps()
        call pfl()
        c
      end
    end
    c
  end
  c
end
r -cp /home/dmitrii/jdk12/ Main
The result of running such a script is:
"Executing ps"
 for thread: "main" #1 prio=5 os_prio=0 cpu=468,61ms elapsed=58,65s tid=0x00007ffff001b800 nid=0x5bfa runnable [0x00007ffff7fd9000]
   java.lang.Thread.State: RUNNABLE
Thread: 0x00007ffff001b800 [0x5bfa] State: _running _has_called_back 0 _at_poll_safepoint 0
   JavaThread state: _thread_in_Java
 1 - frame( sp=0x00007ffff7fd9920, unextended_sp=0x00007ffff7fd9920, fp=0x00007ffff7fd9968, pc=0x00007fffd828748b)
Main.main(Main.java:10)

"Executing pfl"
 for thread: "main" #1 prio=5 os_prio=0 cpu=468,83ms elapsed=58,71s tid=0x00007ffff001b800 nid=0x5bfa runnable [0x00007ffff7fd9000]
   java.lang.Thread.State: RUNNABLE
Thread: 0x00007ffff001b800 [0x5bfa] State: _running _has_called_back 0 _at_poll_safepoint 0
   JavaThread state: _thread_in_Java
[Describe stack layout]
0x00007ffff7fd99e0: 0x00007ffff7fd9b00 #2 entry frame call_stub word fp - 0
0x00007ffff7fd99d8: 0x00007ffff7fd9c10 call_stub word fp - 1
0x00007ffff7fd99d0: 0x00007fffd8287160 call_stub word fp - 2
0x00007ffff7fd99c8: 0x00007fffbf1fb3e0 call_stub word fp - 3
0x00007ffff7fd99c0: 0x000000000000000a call_stub word fp - 4
0x00007ffff7fd99b8: 0x00007ffff7fd9ce8 call_stub word fp - 5
0x00007ffff7fd99b0: 0x00007ffff7fd9a80 call_stub word fp - 6
0x00007ffff7fd99a8: 0x00007ffff001b800 call_stub word fp - 7
0x00007ffff7fd99a0: 0x00007ffff7fd9b40 call_stub word fp - 8
0x00007ffff7fd9998: 0x00007ffff7fd9c00 call_stub word fp - 9
0x00007ffff7fd9990: 0x00007ffff7fd9a80 call_stub word fp - 10
0x00007ffff7fd9988: 0x00007ffff7fd9ce0 call_stub word fp - 11
0x00007ffff7fd9980: 0x00007fff00001fa0 call_stub word fp - 12
0x00007ffff7fd9978: 0x0000000716a122b8 sp for #2 locals for #1 unextended_sp for #2 local 0
0x00007ffff7fd9970: 0x00007fffd82719f3
0x00007ffff7fd9968: 0x00007ffff7fd99e0 #1 method Main.main([Ljava/lang/String;)V @ 0 - 1 locals 1 max stack
0x00007ffff7fd9960: 0x00007ffff7fd9978 interpreter_frame_sender_sp
0x00007ffff7fd9958: 0x0000000000000000 interpreter_frame_last_sp
0x00007ffff7fd9950: 0x00007fffbf1fb3e0 interpreter_frame_method
0x00007ffff7fd9948: 0x0000000716a11c40 interpreter_frame_mirror
0x00007ffff7fd9940: 0x0000000000000000 interpreter_frame_mdp
0x00007ffff7fd9938: 0x00007fffbf1fb5e8 interpreter_frame_cache
0x00007ffff7fd9930: 0x00007ffff7fd9978 interpreter_frame_locals
0x00007ffff7fd9928: 0x00007fffbf1fb3d0 interpreter_frame_bcp
0x00007ffff7fd9920: 0x00007ffff7fd9920 sp for #1 interpreter_frame_initial_sp unextended_sp for #1
As you can see, ps() just gives us the call stack, while pfl() shows the full stack layout.
Running java application
Before turning to the interpreter itself, let us briefly review what happens before control is transferred to Java code. As an example, take a Java program that "does nothing at all":
public class Main {
    public static void main(String args[]){ }
}
and try to figure out what happens when you run such an application:
javac Main.java && java Main
The first step in answering this question is to find and look at the java binary, the one we use to run all our JVM applications. In my case, it is located at
/home/dmitrii/jdk12/build/linux-x86_64-server-fastdebug/images/jdk/bin/java
.
There is nothing special to see there, though. Together with its debug symbols the binary takes up only 20KB, and it is compiled from a single source file, launcher/main.c.
All it does is take the command-line arguments (char *argv[]), read additional arguments from the JDK_JAVA_OPTIONS environment variable, do basic preprocessing and validation (for example, you cannot put a terminal option or the main class name into this environment variable), and call the JLI_Launch function with the resulting argument list.
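The merging step can be pictured with a small Java sketch. This is only an illustration, not the launcher's code (which is written in C): the class and method names are hypothetical, splitting on whitespace is a simplifying assumption (quoting is not handled), and the validation mentioned above is left out. It assumes that the options from the environment variable are placed in front of the command-line arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; the real logic lives in the C launcher.
final class EnvOptionsSketch {
    // Puts the options found in the environment variable in front of
    // the arguments given on the command line.
    static List<String> merge(String envValue, String... commandLine) {
        List<String> out = new ArrayList<>();
        if (envValue != null && !envValue.trim().isEmpty()) {
            // Simplification: split on whitespace, no quoting support
            out.addAll(Arrays.asList(envValue.trim().split("\\s+")));
        }
        out.addAll(Arrays.asList(commandLine));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(merge(System.getenv("JDK_JAVA_OPTIONS"), args));
    }
}
```

With envValue set to "-Xmx1g -Dfoo=bar" and the command line "-cp . Main", the sketch yields the single list -Xmx1g, -Dfoo=bar, -cp, ., Main.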
The definition of the JLI_Launch function is not contained in the java binary and, if you look at its direct dependencies:
$ ldd java
    linux-vdso.so.1 (0x00007ffcc97ec000)
    libjli.so => /home/dmitrii/jdk12/build/linux-x86_64-server-fastdebug/images/jdk/bin/./../lib/libjli.so (0x00007ff27518d000) // <---------
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff274d9c000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ff274b7f000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff27497b000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff27475c000)
    /lib64/ld-linux-x86-64.so.2 (0x00007ff27559f000)
you can see libjli.so, which is linked to it. This library contains the launcher interface, a set of functions that java uses to initialize and start the virtual machine, JLI_Launch among them.
Full list of interface functions
$ objdump -T -j .text libjli.so

libjli.so:     file format elf64-x86-64

DYNAMIC SYMBOL TABLE:
0000000000009280 g    DF .text  0000000000000038  Base        JLI_List_add
0000000000003330 g    DF .text  00000000000001c3  Base        JLI_PreprocessArg
0000000000008180 g    DF .text  0000000000000008  Base        JLI_GetStdArgs
0000000000008190 g    DF .text  0000000000000008  Base        JLI_GetStdArgc
0000000000007e50 g    DF .text  00000000000000b8  Base        JLI_ReportErrorMessage
000000000000a400 g    DF .text  00000000000000df  Base        JLI_ManifestIterate
0000000000002e70 g    DF .text  0000000000000049  Base        JLI_InitArgProcessing
0000000000008000 g    DF .text  0000000000000011  Base        JLI_ReportExceptionDescription
0000000000003500 g    DF .text  0000000000000074  Base        JLI_AddArgsFromEnvVar
0000000000007f10 g    DF .text  00000000000000e9  Base        JLI_ReportErrorMessageSys
0000000000005840 g    DF .text  00000000000000b8  Base        JLI_ReportMessage
0000000000009140 g    DF .text  000000000000003a  Base        JLI_SetTraceLauncher
0000000000009020 g    DF .text  000000000000000a  Base        JLI_MemFree
0000000000008f90 g    DF .text  0000000000000026  Base        JLI_MemAlloc
00000000000059c0 g    DF .text  0000000000002013  Base        JLI_Launch
00000000000091c0 g    DF .text  000000000000003b  Base        JLI_List_new
0000000000008ff0 g    DF .text  0000000000000026  Base        JLI_StringDup
0000000000002ec0 g    DF .text  000000000000000c  Base        JLI_GetAppArgIndex
After control is transferred to JLI_Launch, a series of actions required to start the JVM is performed, such as:
I. Loading the HotSpot JVM symbols into memory and obtaining a pointer to the function that creates the VM.
All HotSpot JVM code is located in the libjvm.so library. After the absolute path to libjvm.so is determined, the library is loaded into memory and a pointer to the JNI_CreateJavaVM function is extracted from it. This function pointer is saved and later used to create and initialize the virtual machine.
Obviously, libjvm.so is not linked to libjli.so.
II. Parsing the arguments passed after preprocessing.
A function with the self-explanatory name ParseArguments parses the arguments passed from the command line. This argument parser determines the application launch mode.
enum LaunchMode {
It also converts some of the arguments to the -DpropertyName=propertyValue format; for example, -cp /path is converted to -Djava.class.path=/path. Such system properties are then stored in a global array in the HotSpot JVM and forwarded to java.lang.System::props in the first phase of initialization (in JDK 12 the initialization mechanism of java.lang.System.props was modified, more in this commit).
Argument parsing also drops some options that are not processed by the JVM (for example, --list-modules; this option is handled directly in the launcher at this point).
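The class-path conversion can be pictured with a short Java sketch. This is only an illustration of the idea, not the launcher's code (ParseArguments is written in C); the class and method names are hypothetical, and only the -cp, -classpath and --class-path spellings with a separate value are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the option-to-property conversion.
final class LauncherArgsSketch {
    // Rewrites class-path options to the -Dname=value form and
    // passes every other argument through unchanged.
    static List<String> translate(String... args) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            String a = args[i];
            boolean isClassPath = a.equals("-cp")
                    || a.equals("-classpath")
                    || a.equals("--class-path");
            if (isClassPath && i + 1 < args.length) {
                // The value is the next argument; consume it as well
                out.add("-Djava.class.path=" + args[++i]);
            } else {
                out.add(a);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(translate("-cp", "/path", "Main"));
    }
}
```

For the arguments -cp /path Main the sketch produces -Djava.class.path=/path followed by Main.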
III. Forking a new thread from the primordial one and creating the VM in it
If something goes wrong, an attempt is made to start the JVM in the main thread: "just give it a try".
Having looked into the question, I found one possible reason why the JVM is not started in the main thread. The point is that (at least on Linux) pthreads and the main thread handle the stack differently. The stack size of the main thread is limited by ulimit -s, i.e. by setting an arbitrarily large value we get an arbitrarily large stack. The main thread uses something similar to MAP_GROWSDOWN, but not MAP_GROWSDOWN itself. Using MAP_GROWSDOWN in its pure form is not safe and, if memory serves me right, is blocked. On my machine, MAP_GROWSDOWN has no effect. The difference between the main-thread mapping and MAP_GROWSDOWN is that no other mmap, except one with MAP_FIXED, can collide with the region into which the stack may grow. All that software has to do is set the appropriate rsp value, and the OS takes care of the rest: it handles the page fault and sets up the guard. This difference leads to a number of pitfalls: when determining the stack size of the current thread, and when creating guard pages.
So, let us assume that at this point the options have been parsed successfully and a thread for the VM has been created. The newly created thread then starts creating the virtual machine and enters the Threads::create_vm function.
This function performs a rather large number of black-magic initializations; we are interested in only a few of them.
Initialization of the interpreter and transfer of control to the java code
For each bytecode instruction, HotSpot has a machine code template specific to the target architecture. When the interpreter starts executing an instruction, the first thing it does is look up the address of the instruction's template in a special table, the DispatchTable. It then jumps to that address, and once the instruction has been executed, the JVM fetches the address of the next instruction in order and executes it in the same way, and so on. The interpreter behaves this way only for instructions that do not "dispatch", for example arithmetic instructions (xsub, xdiv, etc., where x is i, l, f or d). All they do is perform arithmetic operations.
In the case of procedure call instructions (invokestatic, invokevirtual, etc.), the next instruction to be executed is the first instruction of the called procedure. In their templates, such instructions themselves set the address of the next bytecode instruction to be executed.
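The table-driven dispatch described above can be mimicked with a toy interpreter in Java. This is a sketch of the idea only: HotSpot's templates are generated machine code, and each template jumps straight to the next one instead of returning to a central loop. The class, the Template interface and the Frame type are hypothetical; the opcode values are the real JVM ones.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model: a table indexed by opcode plays the role of the DispatchTable.
final class DispatchSketch {
    // One "template": executes a bytecode and returns the index of the
    // next bytecode to execute, or -1 to stop.
    interface Template { int execute(Frame f, int bci); }

    static final class Frame {
        final int[] locals;
        final Deque<Integer> stack = new ArrayDeque<>();
        int result;
        Frame(int... locals) { this.locals = locals; }
    }

    // Real JVM opcode values
    static final int ILOAD_0 = 26, ILOAD_1 = 27, IADD = 96, IRETURN = 172;

    static final Template[] TABLE = new Template[256];
    static {
        TABLE[ILOAD_0] = (f, bci) -> { f.stack.push(f.locals[0]); return bci + 1; };
        TABLE[ILOAD_1] = (f, bci) -> { f.stack.push(f.locals[1]); return bci + 1; };
        TABLE[IADD]    = (f, bci) -> { f.stack.push(f.stack.pop() + f.stack.pop()); return bci + 1; };
        // ireturn decides by itself where execution continues (here: stop)
        TABLE[IRETURN] = (f, bci) -> { f.result = f.stack.pop(); return -1; };
    }

    static int run(int[] code, int... locals) {
        Frame f = new Frame(locals);
        int bci = 0;
        while (bci >= 0) {
            // Look up the template by opcode and "jump" to it
            bci = TABLE[code[bci]].execute(f, bci);
        }
        return f.result;
    }

    public static void main(String[] args) {
        int[] sum = { ILOAD_0, ILOAD_1, IADD, IRETURN };
        System.out.println(run(sum, 2, 3)); // prints 5
    }
}
```

In this model the arithmetic and load entries always continue with the next bytecode in order, while ireturn picks the continuation itself, which loosely mirrors the split between non-dispatching and dispatching instructions.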
To make this machinery work, Threads::create_vm performs a number of initializations on which the interpreter depends:
I. Initializing the table of available bytecodes
Before the interpreter can be initialized, the table of bytecodes in use has to be initialized. This is done in the Bytecodes::initialize function, where it is laid out as a very readable table. A fragment of it is as follows:
In accordance with this table, the following is set for each bytecode: its length (the opcode itself is always 1 byte, but it may be followed by operands such as an index into the ConstantPool, and there are also wide bytecodes), name, bytecode and flags:
bool            Bytecodes::_is_initialized = false;
const char*     Bytecodes::_name          [Bytecodes::number_of_codes];
BasicType       Bytecodes::_result_type   [Bytecodes::number_of_codes];
s_char          Bytecodes::_depth         [Bytecodes::number_of_codes];
u_char          Bytecodes::_lengths       [Bytecodes::number_of_codes];
Bytecodes::Code Bytecodes::_java_code     [Bytecodes::number_of_codes];
unsigned short  Bytecodes::_flags         [(1<<BitsPerByte)*2];
These parameters are needed later to generate the interpreter template code.
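To see why the per-bytecode length matters, here is a minimal Java sketch of such a table. It is not HotSpot's data structure: the class, the def helper and the sample byte sequence are hypothetical, and only five opcodes (with their real JVM values and lengths) are registered. The length is what lets a reader of the bytecode stream step from one instruction to the next.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of a bytecode table: name and length per opcode.
final class BytecodeTableSketch {
    static final String[] NAME = new String[256];
    static final int[] LENGTH = new int[256];

    static void def(int opcode, String name, int length) {
        NAME[opcode] = name;
        LENGTH[opcode] = length;
    }

    static {
        def(5, "iconst_2", 1);
        def(6, "iconst_3", 1);
        def(87, "pop", 1);
        def(177, "return", 1);
        def(184, "invokestatic", 3); // opcode + 2-byte constant pool index
    }

    // Walks the code array using the length table to find instruction starts.
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        for (int bci = 0; bci < code.length; ) {
            int opcode = code[bci] & 0xff;
            out.add(bci + ": " + NAME[opcode]);
            bci += LENGTH[opcode];
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample method body: push 2, push 3, call a static method
        // (constant pool index 2 is an assumption), pop, return
        byte[] code = { 5, 6, (byte) 184, 0, 2, 87, (byte) 177 };
        System.out.println(disassemble(code));
    }
}
```

For the sample bytes the sketch reports instruction starts at offsets 0, 1, 2, 5 and 6; the gap between 2 and 5 is the two-byte operand of invokestatic.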
II. Initializing the code cache
To generate code for the interpreter templates, memory has to be allocated for it first. Memory for the code cache is reserved in the function CodeCache::initialize(). As can be seen from the following fragment of this function
CodeCacheExpansionSize = align_up(CodeCacheExpansionSize, os::vm_page_size());
if (SegmentedCodeCache) {
the code cache is controlled by the options -XX:ReservedCodeCacheSize, -XX:SegmentedCodeCache, -XX:CodeCacheExpansionSize, -XX:NonNMethodCodeHeapSize, -XX:ProfiledCodeHeapSize and -XX:NonProfiledCodeHeapSize. Besides being set on the command line, some of these options are adjusted ergonomically. For example, if SegmentedCodeCache still has its default value (off), then with a code cache size >= 240Mb it is switched on in CompilerConfig::set_tiered_flags.
After the checks are done, an area of ReservedCodeCacheSize bytes is reserved. If SegmentedCodeCache is set, this area is divided into parts: JIT-compiled methods, stub routines, etc.
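One way to look at the result from inside a running JVM is the standard java.lang.management API. The sketch below simply lists the non-heap memory pools; the class name is hypothetical, and it is an assumption (not something the text above states) that the code cache shows up among these pools, either as one pool or as one pool per segment. Pool names are implementation-specific, so the code does not rely on them.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Lists non-heap memory pools reported by the running JVM.
final class CodeCachePoolsSketch {
    static List<String> nonHeapPools() {
        List<String> out = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP) {
                MemoryUsage usage = pool.getUsage(); // may be null for an invalid pool
                long max = (usage == null) ? -1 : usage.getMax();
                out.add(pool.getName() + " max=" + max);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : nonHeapPools()) {
            System.out.println(line);
        }
    }
}
```

Running it once with -XX:+SegmentedCodeCache and once with -XX:-SegmentedCodeCache should make the difference in layout visible, if the JVM in use reports the code cache this way.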
III. Initializing the interpreter templates
Once the bytecode table and the code cache are initialized, code generation for the interpreter templates can begin. To do this, the interpreter reserves a buffer in the previously initialized code cache. At each stage of code generation, codelets (small sections of code) are carved out of the buffer. When the current generation step completes, the part of the codelet not used by the code is freed and becomes available for subsequent code generation.
Consider each of these steps individually:
{ CodeletMark cm(_masm, "slow signature handler");
  AbstractInterpreter::_slow_signature_handler = generate_slow_signature_handler();
}
The signature handler is used to prepare arguments for calls to native methods. Here a generic handler is generated, which is used if, for example, the native method has more than 13 arguments (I did not check this in the debugger, but judging by the code it should be so).
{ CodeletMark cm(_masm, "error exits");
  _unimplemented_bytecode    = generate_error_exit("unimplemented bytecode");
  _illegal_bytecode_sequence = generate_error_exit("illegal bytecode sequence - method not verified");
}
The VM verifies class files, but these exits cover the cases where the arguments on the stack are not in the expected format or a bytecode is encountered that the VM does not know about. These stubs are used when generating the template code for each bytecode.
After a procedure returns, the stack frame data that was in place before the call has to be restored.
Used when calling runtime from an interpreter.
Throwing exceptions
Method Entry Points
#define method_entry(kind)                                                                   \
  { CodeletMark cm(_masm, "method entry point (kind = " #kind ")");                          \
    Interpreter::_entry_table[Interpreter::kind] = generate_method_entry(Interpreter::kind); \
    Interpreter::update_cds_entry_table(Interpreter::kind);                                  \
  }
Presented as a macro parameterized by the kind of method. In the general case, the entry point prepares the interpreter stack frame, checks for StackOverflow and performs stack banging. For native methods, a signature handler is determined.
Bytecode Template Generation
The VM specification requires operands to be on the operand stack to execute an instruction, but this does not prevent HotSpot from caching them in a register. An enumeration is used to describe the current state of the top of the stack.
enum TosState {
Each instruction defines the input and output TosState of the top of the stack, and templates are generated depending on this state. The templates are initialized in a readable template table. A fragment of this table is as follows:
We are especially interested in the in, out and generator columns:
in - the state of the top of the stack when the instruction starts
out - the state of the top of the stack when the instruction completes
generator - the generator of the machine code template for the instruction
In general, the template for any bytecode can be described as follows:
If the dispatch bit is not set for the instruction, the instruction prolog is executed (a no-op on x86)
Machine code is generated using generator
If the dispatch bit is not set for the instruction, a transition to the next instruction in order is performed, depending on the out state of the top of the stack, which becomes the in state for the next instruction
The entry point address of the resulting template is stored in a global table and can be used for debugging.
In HotSpot, the following relatively dumb piece of code is responsible for this:
Instruction code generator
void TemplateInterpreterGenerator::set_entry_points(Bytecodes::Code code) {
  CodeletMark cm(_masm, Bytecodes::name(code), code);
, . JVM. Java- . JavaCalls . JVM , main .
Example
, , :
public class Sum{
    public static int sum(int a, int b){
        return a + b;
    }
}
public class Main {
    public static void main(String args[]){
        Sum.sum(2, 3);
    }
}
Sum.sum(II)
.
2 javac -c *.java
, .
Sum.sum
:
descriptor: (II)I
flags: (0x0009) ACC_PUBLIC, ACC_STATIC
Code:
  stack=2, locals=2, args_size=2
     0: iload_0
     1: iload_1
     2: iadd
     3: ireturn
  LineNumberTable:
    line 3: 0
Main.main
descriptor: ([Ljava/lang/String;)V
flags: (0x0009) ACC_PUBLIC, ACC_STATIC
Code:
  stack=2, locals=1, args_size=1
     0: iconst_2
     1: iconst_3
     2: invokestatic #2 // Method Sum.sum:(II)I
     5: pop
     6: return
  LineNumberTable:
    line 13: 0
    line 14: 6
, — .
invokestatic
' x86 - HotSpot
void TemplateTable::invokestatic(int byte_no) {
  transition(vtos, vtos);
  assert(byte_no == f1_byte, "use this argument");
  prepare_invoke(byte_no, rbx);
byte_no == f1_byte
— ConstantPoolCache
, , rbx
— , Method *
. : , , ( method_entry
).
prepare_invoke
. , invokestatic
ConstantPool
Constant_Methodref_Info
. HotSpot . 2 .. ConstantPoolCache
. ConstantPoolCache
, (, ConstantPoolCacheEntry
, ). ConstantPoolCacheEntry
, ( 0) / . , ConstantPool
, ConstantPoolCache
( x86 Little Endian).
, , HotSpot prepare_invoke
— ConstantPoolCache
. , , ConstantPoolCacheEntry
__ get_cache_and_index_and_bytecode_at_bcp(Rcache, index, temp, byte_no, 1, index_size);
__ cmpl(temp, code);
, InterpreterRuntime::resolve_from_cache
.
receiver'a , . (, , , ConstantPoolCache
<clinit>
, ). define class, EagerInitialization
( , , :)). HotSpot ( CDS ) .
, , ConstantPoolCacheEntry
. Method *
rbx
, , .
Sum.sum(2, 3)
. gdb-script sum.gdb
:
# Path to the java binary
file /home/dmitrii/jdk12/build/linux-x86_64-server-fastdebug/images/jdk/bin/java
# HotSpot raises and handles SIGSEGV itself, so gdb should not stop on it, see e.g.
# https://hg.openjdk.java.net/jdk/jdk12/file/06222165c35f/src/hotspot/cpu/x86/vm_version_x86.cpp#l361
handle SIGSEGV nostop noprint
# Allow breakpoints on symbols that are not loaded yet
set breakpoint pending on
# Do not pause long output
set pagination off
# Stop before main is invoked
b PostJVMInit
commands
  # Buffer for the method name and signature
  set $buffer = malloc(1000)
  # Method entry point; the invokestatic template jumps here
  b *AbstractInterpreter::_entry_table[0] thread 2
  commands
    # invokestatic leaves the Method* in rbx
    set $mthd = (Method *) $rbx
    # Write the name and signature into $buffer
    call $mthd->name_and_sig_as_C_string($buffer, 1000)
    if strcmp()($buffer, "Sum.sum(II)I") == 0
      # iload_0, entered with nothing cached on top of the stack (vtos)
      b *TemplateInterpreter::_normal_table._table[vtos][26] thread 2
      # iload_1, entered with an int cached on top of the stack (itos),
      # left there by iload_0
      b *TemplateInterpreter::_normal_table._table[itos][27] thread 2
      # iadd
      b *TemplateInterpreter::_normal_table._table[itos][96] thread 2
    end
    c
  end
  c
end
r -cp . Main
gdb -x sum.gdb
, Sum.sum
$453 = 0x7ffff7fdcdd0 "Sum.sum(II)I"
layout asm
, , generate_normal_entry . -, StackOverflow, stack-banging dispatch iload_0
. :
0x7fffd828fa1f mov    eax,DWORD PTR [r14]       ; load local 0 (iload_0)
0x7fffd828fa22 movzx  ebx,BYTE PTR [r13+0x1]    ; fetch the next bytecode
0x7fffd828fa27 inc    r13                       ; advance bcp (byte code pointer)
0x7fffd828fa2a movabs r10,0x7ffff717e8a0        ; DispatchTable
0x7fffd828fa34 jmp    QWORD PTR [r10+rbx*8]     ; jump
rax
,
0x7fffd828fabe push   rax                       ; push the value cached in rax onto the stack
0x7fffd828fabf mov    eax,DWORD PTR [r14-0x8]   ; load local 1
0x7fffd828fac3 movzx  ebx,BYTE PTR [r13+0x1]
0x7fffd828fac8 inc    r13
0x7fffd828facb movabs r10,0x7ffff717e8a0
0x7fffd828fad5 jmp    QWORD PTR [r10+rbx*8]
iadd
:
0x7fffd8292ba7 mov    edx,DWORD PTR [rsp]       ; read the value pushed by iload_1
0x7fffd8292baa add    rsp,0x8                   ; adjust rsp
0x7fffd8292bae add    eax,edx                   ; the addition itself
0x7fffd8292bb0 movzx  ebx,BYTE PTR [r13+0x1]
0x7fffd8292bb5 inc    r13
0x7fffd8292bb8 movabs r10,0x7ffff717e8a0
0x7fffd8292bc2 jmp    QWORD PTR [r10+rbx*8]
gdb
eax
edx
,
(gdb) p $eax
$457 = 3
(gdb) p $edx
$458 = 2
, Sum.sum
.
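The three templates traced in gdb can be mimicked in Java to make the top-of-stack caching easier to follow. This is a sketch only, under the assumption that the listings read as they appear: the field rax stands in for the register holding the cached top of the stack, the Deque for the memory stack addressed through rsp, and the array for the locals addressed through r14. All class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of top-of-stack caching for iload_0, iload_1 and iadd.
final class TosCacheSketch {
    final int[] locals;                              // locals area
    final Deque<Integer> stack = new ArrayDeque<>(); // memory part of the operand stack
    int rax;                                         // cached top-of-stack value

    TosCacheSketch(int... locals) { this.locals = locals; }

    // iload_0 entered in state vtos: nothing is cached, just load into rax
    void iload0FromVtos() { rax = locals[0]; }

    // iload_1 entered in state itos: spill the cached int first, then load
    void iload1FromItos() { stack.push(rax); rax = locals[1]; }

    // iadd entered in state itos: one operand is in rax, the other on the stack
    void iaddFromItos() { int edx = stack.pop(); rax = rax + edx; }

    static int sum(int a, int b) {
        TosCacheSketch t = new TosCacheSketch(a, b);
        t.iload0FromVtos();
        t.iload1FromItos();
        t.iaddFromItos();
        return t.rax; // the result stays cached, as in state itos
    }

    public static void main(String[] args) {
        System.out.println(sum(2, 3)); // prints 5
    }
}
```

Just before the addition in sum(2, 3), rax holds 3 and the popped value is 2, which matches the eax and edx values printed by gdb above.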