Recently I came acrosss a LinkedIn post where someone said: in order to improve isEven(n: number) performance, we can use bitwise operator instead of modulo. At a glance I totally agree with that oppinion.

I was so curious how would JavaScript engine would optimize even check. Would it use modulo as normal, which is very slow, or it can be smart enough to use bitwise operator 🤔 . Due to its dynamic behavior, variable can be of any type and unknown to the compiler until run time.

First, let’s create a simple mod function.

function my_mod(n) {
  if (n % 2 == 1) {
    return true;
  }
  return false;
}
my_mod(2)

Now we need a way to check how V8 compile our code. Luckily, we can view bytecodes generated by V8 through node. We can use

node --v8-options

to show V8’s options that node allow passing through.

The following syntax for options is accepted (both '-' and '--' are ok):
  --print-bytecode (print bytecode generated by ignition interpreter)
        type: bool  default: --no-print-bytecode
  --print-bytecode-filter (filter for selecting which functions to print bytecode)
        type: string  default: --print-bytecode-filter="*"

There are --print-bytecode and --print-bytecode-filter option that allow us to view generated bytecode without executing our main.js file. The --print-bytecode-filter seems promising but I couldn’t make it to only print our mode function. While --print-bytecode option dump everything, including node’s runtime.

node --print-bytecode main.js > main.asmjs

Luckily, our function stay at the bottom of the file. So we can quickly remove everything at the top.

[generated bytecode for function: my_mod (0x00d0bcd5b401 <SharedFunctionInfo my_mod>)]
Bytecode length: 17
Parameter count 2
Register count 1
Frame size 8
   23 S> 0x7087e4cdc80 @    0 : 0b 03             Ldar a0
   29 E> 0x7087e4cdc82 @    2 : 4b 02 00          ModSmi [2], [0]
         0x7087e4cdc85 @    5 : c9                Star0
         0x7087e4cdc86 @    6 : 0d 01             LdaSmi [1]
   33 E> 0x7087e4cdc88 @    8 : 6f f9 01          TestEqual r0, [1]
         0x7087e4cdc8b @   11 : 9e 04             JumpIfFalse [4] (0x7087e4cdc8f @ 15)
   45 S> 0x7087e4cdc8d @   13 : 11                LdaTrue
   57 S> 0x7087e4cdc8e @   14 : ae                Return
   64 S> 0x7087e4cdc8f @   15 : 12                LdaFalse
   77 S> 0x7087e4cdc90 @   16 : ae                Return
Constant pool (size = 0)
Handler Table (size = 0)
Source Position Table (size = 16)
0x07087e4cdc99 <Other heap object (TRUSTED_BYTE_ARRAY_TYPE)>
[generated bytecode for function: my_mod (0x00d0bcd5b401 <SharedFunctionInfo my_mod>)]
Bytecode length: 17
Parameter count 2
Register count 1
Frame size 8
   23 S> 0x7087e4cdc80 @    0 : 0b 03             Ldar a0
   29 E> 0x7087e4cdc82 @    2 : 4b 02 00          ModSmi [2], [0]
         0x7087e4cdc85 @    5 : c9                Star0
         0x7087e4cdc86 @    6 : 0d 01             LdaSmi [1]
   33 E> 0x7087e4cdc88 @    8 : 6f f9 01          TestEqual r0, [1]
         0x7087e4cdc8b @   11 : 9e 04             JumpIfFalse [4] (0x7087e4cdc8f @ 15)
   45 S> 0x7087e4cdc8d @   13 : 11                LdaTrue
   57 S> 0x7087e4cdc8e @   14 : ae                Return
   64 S> 0x7087e4cdc8f @   15 : 12                LdaFalse
   77 S> 0x7087e4cdc90 @   16 : ae                Return
Constant pool (size = 0)
Handler Table (size = 0)
Source Position Table (size = 16)
0x07087e4cdc99 <Other heap object (TRUSTED_BYTE_ARRAY_TYPE)>

We can see it generate ModSmi [2], [0] and TestEqual r0, [1] bytecode, identical to our JavaScript code, calculate modulo then compare with 1. As each V8 bytecode later on will be interpreted as C++ functions, I don’t think we can map it 1:1 to assembly instructions.

But I don’t stop here and make an assumption. I will dive until I hit the bottom or I’m out of breath. For this, we need the real V8, the only one who can give us all information we need.

To my pass self

At this point, I’m so grateful to my pass self for writing this blog.

Build V8 from source

For this, we need a x86_64.debug build. And we also need to update our main.js script a little in order to make turbofan optimize my_mod function. We either have to make unoptimized calls until feedback collection kicks in on its own or, do a trick to force turbofan to optimize at our will. I’ll show you the trick.

function my_mod(n) {
  if (n % 2 == 1) {
    return true;
  }
  return false;
}

my_mod(0);
my_mod(1);
my_mod(2);

%PrepareFunctionForOptimization(my_mod);
%OptimizeFunctionOnNextCall(my_mod);

my_mod(2);
./out/x64.debug/d8 --allow-natives-syntax --print-opt-code main-opt.js

We get following output.

--- Raw source ---
(n) {
 if (n  % 2 == 1)
   return true;
 return false;
}


--- Optimized code ---
optimization_id = 0
source_position = 340
kind = TURBOFAN_JS
name = my_mod
compiler = turbofan
address = 0x176e001401a1

Instructions (size = 176)
0x55a9e0000040     0   push rbp
0x55a9e0000041     1   REX.W movq rbp,rsp
0x55a9e0000044     4   push rsi
0x55a9e0000045     5   push rdi
0x55a9e0000046     6   push rax
0x55a9e0000047     7   REX.W subq rsp,0x8
0x55a9e000004b     b   REX.W movq [rbp-0x20],rsi
0x55a9e000004f     f   REX.W cmpq rsp,[r13-0x60] (external value (StackGuard::address_of_jslimit()))
0x55a9e0000053    13   jna 0x55a9e00000b7  <+0x77>
0x55a9e0000059    19   REX.W movq rdx,[rbp+0x18]   ;; rdx = n
0x55a9e000005d    1d   testb rdx,0x1               ;; test bit rdx
0x55a9e0000060    20   jnz 0x55a9e00000e1  <+0xa1> ;; if 1, jump 0x55a9e00000e1
                                                   ;; so this can jump to return true
0x55a9e0000066    26   REX.W movq rcx,rdx          ;; else rcx = rdx
0x55a9e0000069    29   sarl rcx, 1                 ;; rcx >> 1 why? (keep sign bit)
0x55a9e000006b    2b   testl rdx,rdx
0x55a9e000006d    2d   jl 0x55a9e000007b  <+0x3b>
0x55a9e0000073    33   andl rcx,0x1
0x55a9e0000076    36   jmp 0x55a9e000008a  <+0x4a>
0x55a9e000007b    3b   negl rcx
0x55a9e000007d    3d   andl rcx,0x1
0x55a9e0000080    40   testl rcx,rcx
0x55a9e0000082    42   jz 0x55a9e00000e5  <+0xa5>
0x55a9e0000088    48   negl rcx
0x55a9e000008a    4a   cmpl rcx,0x1
0x55a9e000008d    4d   jz 0x55a9e00000b1  <+0x71>
0x55a9e0000093    53   REX.W leaq rax,[r14+0x55]
0x55a9e0000097    57   REX.W movq rcx,[rbp-0x18]
0x55a9e000009b    5b   REX.W movq rsp,rbp
0x55a9e000009e    5e   pop rbp
0x55a9e000009f    5f   REX.W cmpq rcx,0x2
0x55a9e00000a3    63   jg 0x55a9e00000a8  <+0x68>
0x55a9e00000a5    65   ret 0x10
0x55a9e00000a8    68   pop r10
0x55a9e00000aa    6a   REX.W leaq rsp,[rsp+rcx*8]
0x55a9e00000ae    6e   push r10
0x55a9e00000b0    70   retl
0x55a9e00000b1    71   REX.W leaq rax,[r14+0x71]
0x55a9e00000b5    75   jmp 0x55a9e0000097  <+0x57>
0x55a9e00000b7    77   movl rdx,0x40
0x55a9e00000bc    7c   push rdx
0x55a9e00000bd    7d   REX.W movq rbx,0x55a9c75f4000
0x55a9e00000c7    87   movl rax,0x1
0x55a9e00000cc    8c   REX.W movq rsi,0x7df500181ae5    ;; object: 0x7df500181ae5 <NativeContext[302]>
0x55a9e00000d6    96   call 0x55a9c846a300  (CEntry_Return1_ArgvOnStack_NoBuiltinExit)    ;; near builtin entry
0x55a9e00000db    9b   jmp 0x55a9e0000059  <+0x19>
0x55a9e00000e0    a0   nop
0x55a9e00000e1    a1   call [r13-0x28]  ;; stack pointer - 0x28
0x55a9e00000e5    a5   call [r13-0x28]  ;; stack pointer - 0x28
0x55a9e00000e9    a9   call [r13-0x20]  ;; stack pointer - 0x28
0x55a9e00000ed    ad   nop
Inlined functions (count = 0)

Deoptimization Input Data (deopt points = 3)
index  bytecode-offset    pc
    0                2    NA
    1                2    NA
    2               -1    9b

Safepoints (stack slots = 6, entries = 1, byte size = 16)
0x55a9e00000db     9b  slots (sp->fp): 100000  deopt      2 trampoline:     a9

RelocInfo (size = 5)
0x55a9e00000ce  full embedded object  (0x7df500181ae5 <NativeContext[12e]>)
0x55a9e00000d7  near builtin entry

The code from 0x55a9e0000040 enter function call, backup stack pointer and current registers on stack.

The code between 0x55a9e000004f and 0x55a9e00000d6 is optimized code of our function. Let’s see what it has become.

REX.W movq rdx,[rbp+0x18] move data (64-bit) from [rbp+0x18] on stack to rdx. This is our n variable.

testb rdx,0x1 , check last bit of rdx is set. Which we concerned about.