I was so curious how would JavaScript engine would optimize even check. Would it use modulo as normal, which is very slow, or it can be smart enough to use bitwise operator 🤔 . Due to its dynamic behavior, variable can be of any type and unknown to the compiler until run time.
First, let’s create a simple mod function.
function my_mod(n) {
if (n % 2 == 1) {
return true;
}
return false;
}
my_mod(2)
Now we need a way to check how V8 compile our code. Luckily, we can view bytecodes generated by V8 through node. We can use
node --v8-options
to show V8’s options that node allow passing through.
The following syntax for options is accepted (both '-' and '--' are ok):
--print-bytecode (print bytecode generated by ignition interpreter)
type: bool default: --no-print-bytecode
--print-bytecode-filter (filter for selecting which functions to print bytecode)
type: string default: --print-bytecode-filter="*"
There are --print-bytecode
and --print-bytecode-filter
option that allow us
to view generated bytecode without executing our main.js
file. The
--print-bytecode-filter
seems promising but I couldn’t make it to only print
our mode function. While --print-bytecode
option dump everything, including
node
’s runtime.
node --print-bytecode main.js > main.asmjs
Luckily, our function stay at the bottom of the file. So we can quickly remove everything at the top.
[generated bytecode for function: my_mod (0x00d0bcd5b401 <SharedFunctionInfo my_mod>)]
Bytecode length: 17
Parameter count 2
Register count 1
Frame size 8
23 S> 0x7087e4cdc80 @ 0 : 0b 03 Ldar a0
29 E> 0x7087e4cdc82 @ 2 : 4b 02 00 ModSmi [2], [0]
0x7087e4cdc85 @ 5 : c9 Star0
0x7087e4cdc86 @ 6 : 0d 01 LdaSmi [1]
33 E> 0x7087e4cdc88 @ 8 : 6f f9 01 TestEqual r0, [1]
0x7087e4cdc8b @ 11 : 9e 04 JumpIfFalse [4] (0x7087e4cdc8f @ 15)
45 S> 0x7087e4cdc8d @ 13 : 11 LdaTrue
57 S> 0x7087e4cdc8e @ 14 : ae Return
64 S> 0x7087e4cdc8f @ 15 : 12 LdaFalse
77 S> 0x7087e4cdc90 @ 16 : ae Return
Constant pool (size = 0)
Handler Table (size = 0)
Source Position Table (size = 16)
0x07087e4cdc99 <Other heap object (TRUSTED_BYTE_ARRAY_TYPE)>
[generated bytecode for function: my_mod (0x00d0bcd5b401 <SharedFunctionInfo my_mod>)]
Bytecode length: 17
Parameter count 2
Register count 1
Frame size 8
23 S> 0x7087e4cdc80 @ 0 : 0b 03 Ldar a0
29 E> 0x7087e4cdc82 @ 2 : 4b 02 00 ModSmi [2], [0]
0x7087e4cdc85 @ 5 : c9 Star0
0x7087e4cdc86 @ 6 : 0d 01 LdaSmi [1]
33 E> 0x7087e4cdc88 @ 8 : 6f f9 01 TestEqual r0, [1]
0x7087e4cdc8b @ 11 : 9e 04 JumpIfFalse [4] (0x7087e4cdc8f @ 15)
45 S> 0x7087e4cdc8d @ 13 : 11 LdaTrue
57 S> 0x7087e4cdc8e @ 14 : ae Return
64 S> 0x7087e4cdc8f @ 15 : 12 LdaFalse
77 S> 0x7087e4cdc90 @ 16 : ae Return
Constant pool (size = 0)
Handler Table (size = 0)
Source Position Table (size = 16)
0x07087e4cdc99 <Other heap object (TRUSTED_BYTE_ARRAY_TYPE)>
We can see it generate ModSmi [2], [0]
and TestEqual r0, [1]
bytecode,
identical to our JavaScript code, calculate modulo then compare with 1.
As each V8 bytecode later on will be interpreted as C++ functions, I don’t
think we can map it 1:1 to assembly instructions.
But I don’t stop here and make an assumption. I will dive until I hit the bottom or I’m out of breath. For this, we need the real V8, the only one who can give us all information we need.
At this point, I’m so grateful to my pass self for writing this blog.
For this, we need a x86_64.debug build. And we also need to update
our main.js
script a little in order to make turbofan optimize
my_mod
function. We either have to make unoptimized calls until
feedback collection kicks in on its own or, do a trick to force
turbofan to optimize at our will. I’ll show you the trick.
function my_mod(n) {
if (n % 2 == 1) {
return true;
}
return false;
}
my_mod(0);
my_mod(1);
my_mod(2);
%PrepareFunctionForOptimization(my_mod);
%OptimizeFunctionOnNextCall(my_mod);
my_mod(2);
./out/x64.debug/d8 --allow-natives-syntax --print-opt-code main-opt.js
We get following output.
--- Raw source ---
(n) {
if (n % 2 == 1)
return true;
return false;
}
--- Optimized code ---
optimization_id = 0
source_position = 340
kind = TURBOFAN_JS
name = my_mod
compiler = turbofan
address = 0x176e001401a1
Instructions (size = 176)
0x55a9e0000040 0 push rbp
0x55a9e0000041 1 REX.W movq rbp,rsp
0x55a9e0000044 4 push rsi
0x55a9e0000045 5 push rdi
0x55a9e0000046 6 push rax
0x55a9e0000047 7 REX.W subq rsp,0x8
0x55a9e000004b b REX.W movq [rbp-0x20],rsi
0x55a9e000004f f REX.W cmpq rsp,[r13-0x60] (external value (StackGuard::address_of_jslimit()))
0x55a9e0000053 13 jna 0x55a9e00000b7 <+0x77>
0x55a9e0000059 19 REX.W movq rdx,[rbp+0x18] ;; rdx = n
0x55a9e000005d 1d testb rdx,0x1 ;; test bit rdx
0x55a9e0000060 20 jnz 0x55a9e00000e1 <+0xa1> ;; if 1, jump 0x55a9e00000e1
;; so this can jump to return true
0x55a9e0000066 26 REX.W movq rcx,rdx ;; else rcx = rdx
0x55a9e0000069 29 sarl rcx, 1 ;; rcx >> 1 why? (keep sign bit)
0x55a9e000006b 2b testl rdx,rdx
0x55a9e000006d 2d jl 0x55a9e000007b <+0x3b>
0x55a9e0000073 33 andl rcx,0x1
0x55a9e0000076 36 jmp 0x55a9e000008a <+0x4a>
0x55a9e000007b 3b negl rcx
0x55a9e000007d 3d andl rcx,0x1
0x55a9e0000080 40 testl rcx,rcx
0x55a9e0000082 42 jz 0x55a9e00000e5 <+0xa5>
0x55a9e0000088 48 negl rcx
0x55a9e000008a 4a cmpl rcx,0x1
0x55a9e000008d 4d jz 0x55a9e00000b1 <+0x71>
0x55a9e0000093 53 REX.W leaq rax,[r14+0x55]
0x55a9e0000097 57 REX.W movq rcx,[rbp-0x18]
0x55a9e000009b 5b REX.W movq rsp,rbp
0x55a9e000009e 5e pop rbp
0x55a9e000009f 5f REX.W cmpq rcx,0x2
0x55a9e00000a3 63 jg 0x55a9e00000a8 <+0x68>
0x55a9e00000a5 65 ret 0x10
0x55a9e00000a8 68 pop r10
0x55a9e00000aa 6a REX.W leaq rsp,[rsp+rcx*8]
0x55a9e00000ae 6e push r10
0x55a9e00000b0 70 retl
0x55a9e00000b1 71 REX.W leaq rax,[r14+0x71]
0x55a9e00000b5 75 jmp 0x55a9e0000097 <+0x57>
0x55a9e00000b7 77 movl rdx,0x40
0x55a9e00000bc 7c push rdx
0x55a9e00000bd 7d REX.W movq rbx,0x55a9c75f4000
0x55a9e00000c7 87 movl rax,0x1
0x55a9e00000cc 8c REX.W movq rsi,0x7df500181ae5 ;; object: 0x7df500181ae5 <NativeContext[302]>
0x55a9e00000d6 96 call 0x55a9c846a300 (CEntry_Return1_ArgvOnStack_NoBuiltinExit) ;; near builtin entry
0x55a9e00000db 9b jmp 0x55a9e0000059 <+0x19>
0x55a9e00000e0 a0 nop
0x55a9e00000e1 a1 call [r13-0x28] ;; stack pointer - 0x28
0x55a9e00000e5 a5 call [r13-0x28] ;; stack pointer - 0x28
0x55a9e00000e9 a9 call [r13-0x20] ;; stack pointer - 0x28
0x55a9e00000ed ad nop
Inlined functions (count = 0)
Deoptimization Input Data (deopt points = 3)
index bytecode-offset pc
0 2 NA
1 2 NA
2 -1 9b
Safepoints (stack slots = 6, entries = 1, byte size = 16)
0x55a9e00000db 9b slots (sp->fp): 100000 deopt 2 trampoline: a9
RelocInfo (size = 5)
0x55a9e00000ce full embedded object (0x7df500181ae5 <NativeContext[12e]>)
0x55a9e00000d7 near builtin entry
The code from 0x55a9e0000040
enter function call, backup stack pointer and
current registers on stack.
The code between 0x55a9e000004f
and 0x55a9e00000d6
is optimized code of
our function. Let’s see what it has become.
REX.W movq rdx,[rbp+0x18]
move data (64-bit) from [rbp+0x18]
on stack
to rdx
. This is our n
variable.
testb rdx,0x1
, check last bit of rdx
is set. Which we concerned about.