Debug V8 part 2

In the previous part, we learned how to debug d8 with lldb. In this part we still use our debug.js script, but we’ll pass another flag to d8 before running the process.

lldb -- d8 --parse-only debug.js

With the --parse-only flag, we will focus on how d8 parses the source content and turns it into lexical input elements.
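To make "lexical input elements" a bit more concrete, here is a toy tokenizer in C++ that splits our debug.js into tokens. It is only a sketch for intuition: Kind, Token, Tokenize and IsIdentifierPart are hypothetical names, it handles ASCII only, and it knows just three keywords. V8’s real scanner is far more complete.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy token kinds (hypothetical); V8's real scanner has a much larger token set.
enum class Kind { Keyword, Identifier, Number, Punctuator };

struct Token {
  Kind kind;
  std::string text;
};

bool IsIdentifierPart(unsigned char c) {
  return std::isalnum(c) || c == '_' || c == '$';
}

// Splits ASCII source into tokens, skipping whitespace and line terminators.
std::vector<Token> Tokenize(const std::string& src) {
  static const std::vector<std::string> kKeywords = {"let", "var", "const"};
  std::vector<Token> tokens;
  std::size_t i = 0;
  while (i < src.size()) {
    const std::size_t start = i;
    const unsigned char c = static_cast<unsigned char>(src[i]);
    if (std::isspace(c)) {
      ++i;  // Whitespace separates tokens but produces none.
    } else if (std::isalpha(c) || c == '_' || c == '$') {
      while (i < src.size() &&
             IsIdentifierPart(static_cast<unsigned char>(src[i]))) {
        ++i;
      }
      std::string word = src.substr(start, i - start);
      const bool keyword =
          std::find(kKeywords.begin(), kKeywords.end(), word) != kKeywords.end();
      tokens.push_back({keyword ? Kind::Keyword : Kind::Identifier, word});
    } else if (std::isdigit(c)) {
      while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) {
        ++i;
      }
      tokens.push_back({Kind::Number, src.substr(start, i - start)});
    } else {
      ++i;  // Treat every other character as a one-character punctuator.
      tokens.push_back({Kind::Punctuator, src.substr(start, 1)});
    }
    // Every step must consume input, otherwise the loop would never end.
    assert(i > start);
  }
  return tokens;
}
```

Run on the contents of debug.js, it yields 22 tokens, starting with the keyword let, the identifier a, the punctuator = and the number 1.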

Let’s add a breakpoint at the v8::Shell::RunMain function and run to it.

(lldb) b v8::Shell::RunMain
(lldb) run
frame #0: 0x000000010004ba68 d8`v8::Shell::RunMain(isolate=0x0000000118110000, last_run=true) at d8.cc:4578:12
   4575	}
   4576
   4577	int Shell::RunMain(Isolate* isolate, bool last_run) {
-> 4578	  for (int i = 1; i < options.num_isolates; ++i) {
   4579	    options.isolate_sources[i].StartExecuteInThread();
   4580	  }
   4581	  bool success = true;

At line 4578, d8 loops over the additional isolate sources (index 1 and up) and starts executing each one in its own thread. Since options.num_isolates is 1 here, the loop body never runs, so we don’t need to worry about multithreading at this point.

    {
      Context::Scope cscope(context);
      InspectorClient inspector_client(context, options.enable_inspector);
      PerIsolateData::RealmScope realm_scope(PerIsolateData::Get(isolate));
      if (!options.isolate_sources[0].Execute(isolate)) success = false;
      if (!CompleteMessageLoop(isolate)) success = false;
    }
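Put together, the loop and the block above split the work between threads: sources 1 and up run on worker threads, while source 0 runs on the main thread inside the Context::Scope. Here is a minimal sketch of that shape using std::thread. SourceGroup and RunAll are hypothetical stand-ins for d8’s classes, and the final join assumes d8 waits for its worker threads somewhere later in RunMain, which the excerpts above do not show.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for one of d8's isolate sources.
struct SourceGroup {
  std::function<void()> body;
  void Execute() { body(); }
};

// Same shape as Shell::RunMain: groups 1..n-1 on worker threads,
// group 0 on the calling (main) thread, then wait for the workers.
void RunAll(std::vector<SourceGroup>& groups) {
  assert(!groups.empty());
  std::vector<std::thread> workers;
  for (std::size_t i = 1; i < groups.size(); ++i) {
    workers.emplace_back([&groups, i] { groups[i].Execute(); });
  }
  groups[0].Execute();
  for (std::thread& worker : workers) worker.join();
}
```

With a single group, as in our run where num_isolates is 1, no worker thread is created and everything happens on the main thread.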

Let’s step into the Execute(isolate) function.

    // Use all other arguments as names of files to load and run.
    HandleScope handle_scope(isolate);
    Local<String> file_name =
        String::NewFromUtf8(isolate, arg).ToLocalChecked();
    Local<String> source = Shell::ReadFile(isolate, arg);
    if (source.IsEmpty()) {
      printf("Error reading '%s'\n", arg);
      base::OS::ExitProcess(1);
    }
    Shell::set_script_executed();
    Shell::update_script_size(source->Length());
    if (!Shell::ExecuteString(isolate, source, file_name, Shell::kNoPrintResult,
                              Shell::kReportExceptions,
                              Shell::kProcessMessageQueue)) {
      success = false;
      break;
    }

Our debug.js is read into the Local<String> source variable, so let’s take a look at v8::Shell::ReadFile.

(lldb) b v8::Shell::ReadFile
(lldb) c

Local<String> Shell::ReadFile(Isolate* isolate, const char* name,
                              bool should_throw) {
  std::unique_ptr<base::OS::MemoryMappedFile> file(
      base::OS::MemoryMappedFile::open(
          name, base::OS::MemoryMappedFile::FileMode::kReadOnly));
  if (!file) {
    if (should_throw) {
      std::ostringstream oss;
      oss << "Error loading file: \"" << name << '"';
      isolate->ThrowError(
          v8::String::NewFromUtf8(isolate, oss.str().c_str()).ToLocalChecked());
    }
    return Local<String>();
  }

  int size = static_cast<int>(file->size());
  char* chars = static_cast<char*>(file->memory());
  Local<String> result;
  if (i::FLAG_use_external_strings && i::String::IsAscii(chars, size)) {
    String::ExternalOneByteStringResource* resource =
        new ExternalOwningOneByteStringResource(std::move(file));
    result = String::NewExternalOneByte(isolate, resource).ToLocalChecked();
  } else {
    result = String::NewFromUtf8(isolate, chars, NewStringType::kNormal, size)
                 .ToLocalChecked();
  }
  return result;
}
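ReadFile only wraps the mapped bytes directly (as an external one-byte string) when i::FLAG_use_external_strings is set and the content is pure ASCII; otherwise it decodes the bytes as UTF-8 into a new string. The ASCII test boils down to checking that no byte has its high bit set. Here is a sketch of that check; IsAsciiBytes is a hypothetical name, not V8’s i::String::IsAscii itself.

```cpp
#include <cassert>

// Hypothetical helper: a buffer is pure ASCII when every byte is below 0x80,
// i.e. no byte has its high bit set.
bool IsAsciiBytes(const char* chars, int size) {
  assert(size >= 0);
  for (int i = 0; i < size; ++i) {
    if (static_cast<unsigned char>(chars[i]) > 0x7F) return false;
  }
  return true;
}
```

Our debug.js passes this check, while any file containing UTF-8 encoded non-ASCII characters (for example "é" as the bytes 0xC3 0xA9) would not.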

OS::MemoryMappedFile* OS::MemoryMappedFile::open(const char* name,
                                                 FileMode mode) {
  const char* fopen_mode = (mode == FileMode::kReadOnly) ? "r" : "r+";
  struct stat statbuf;
  // Make sure path exists and is not a directory.
  if (stat(name, &statbuf) == 0 && !S_ISDIR(statbuf.st_mode)) {
    if (FILE* file = fopen(name, fopen_mode)) {
      if (fseek(file, 0, SEEK_END) == 0) {
        long size = ftell(file);  // NOLINT(runtime/int)
        if (size == 0) return new PosixMemoryMappedFile(file, nullptr, 0);
        if (size > 0) {
          int prot = PROT_READ;
          int flags = MAP_PRIVATE;
          if (mode == FileMode::kReadWrite) {
            prot |= PROT_WRITE;
            flags = MAP_SHARED;
          }
          void* const memory =
              mmap(OS::GetRandomMmapAddr(), size, prot, flags, fileno(file), 0);
          if (memory != MAP_FAILED) {
            return new PosixMemoryMappedFile(file, memory, size);
          }
        }
      }
      fclose(file);
    }
  }
  return nullptr;
}

If you feel overwhelmed by the functions above, don’t worry, I did too. I included them because they are so interesting, but you only need to focus on how MemoryMappedFile::open learns the file size and maps the file with mmap. A source file can be very small, like our debug.js, or very large; I actually don’t know whether ECMAScript specifies a maximum source size. Either way, the V8 scanner consumes source code token by token, so eagerly loading the whole file into memory is not efficient, and reading it piece by piece would mean tons of small read and seek operations on the file. Even though the kernel performs such operations very quickly, we can still do better by using mmap. You can take a break from this blog, go watch a video explaining mmap, and then come back.
(*) Disclaimer: I only discovered mmap while writing this part, so my explanation may be incorrect.
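Here is a minimal C++ sketch of the same idea, modeled on the POSIX MemoryMappedFile::open above: make sure the path is not a directory, seek to the end once to learn the size, then mmap the file read-only. MappedFile is a hypothetical class, and it passes nullptr as the address hint instead of OS::GetRandomMmapAddr(), letting the kernel choose where the mapping goes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>

// Hypothetical read-only file mapping, following the steps of
// MemoryMappedFile::open (kReadOnly mode) shown above.
class MappedFile {
 public:
  explicit MappedFile(const char* name) {
    struct stat statbuf;
    // Same guard as V8: the path must exist and must not be a directory.
    if (stat(name, &statbuf) != 0 || S_ISDIR(statbuf.st_mode)) return;
    file_ = std::fopen(name, "r");
    if (file_ == nullptr) return;
    // Seek to the end once, only to learn the size; no data is read here.
    if (std::fseek(file_, 0, SEEK_END) != 0) return;
    long size = std::ftell(file_);
    if (size < 0) return;
    size_ = static_cast<std::size_t>(size);
    if (size_ == 0) {  // mmap rejects zero-length mappings.
      ok_ = true;
      return;
    }
    // Assumption: nullptr hint instead of V8's randomized mmap address.
    void* memory = mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fileno(file_), 0);
    if (memory == MAP_FAILED) return;
    memory_ = static_cast<const char*>(memory);
    ok_ = true;
  }
  ~MappedFile() {
    if (memory_ != nullptr) munmap(const_cast<char*>(memory_), size_);
    if (file_ != nullptr) std::fclose(file_);
  }
  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  bool ok() const { return ok_; }
  std::size_t size() const { return size_; }
  // Touching the bytes here is what makes the kernel page them in.
  std::string contents() const {
    assert(ok_);
    return memory_ == nullptr ? std::string() : std::string(memory_, size_);
  }

 private:
  std::FILE* file_ = nullptr;
  const char* memory_ = nullptr;
  std::size_t size_ = 0;
  bool ok_ = false;
};
```

Nothing is copied when the mapping is created; the kernel loads pages lazily the first time they are accessed, for example when contents() copies the bytes into a std::string.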

Hey, you’re back. So now you know what’s nice about mmap. V8 only seeks to the end of the file to learn its size (fseek, then ftell), and then maps the source file to an address in the d8 process’s virtual address space without reading its contents yet. Run thread step-out (or its alias finish) to return to the Execute function. Now we can check whether the source information was handled correctly.

(lldb) p arg
(const char *) $48 = 0x000000016fdff555 "debug.js"
(lldb) p source->Length()
(int) $49 = 55

File name is correct, and content length looks good.

Next, the call to v8::Shell::ExecuteString executes our script. We set a breakpoint at that function and continue the process.

(lldb) b v8::Shell::ExecuteString
Breakpoint 1: where = d8`v8::Shell::ExecuteString(v8::Isolate*, v8::Local<v8::String>, v8::Local<v8::String>, v8::Shell::PrintResult, v8::Shell::ReportExceptions, v8::Shell::ProcessMessageQueue) + 140 at d8.cc:685:15, address = 0x000000010002b2bc
(lldb) c
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x000000010002b2bc d8`v8::Shell::ExecuteString(isolate=0x0000000118110000, source=(val_ = 0x0000000102020060), name=(val_ = 0x0000000102020058), print_result=kNoPrintResult, report_exceptions=kReportExceptions, process_message_queue=kProcessMessageQueue) at d8.cc:685:15
   682 	                          Local<String> name, PrintResult print_result,
   683 	                          ReportExceptions report_exceptions,
   684 	                          ProcessMessageQueue process_message_queue) {
-> 685 	  i::Isolate* i_isolate = reinterpret_cast<i::Isolate*>(isolate);
   686 	  if (i::FLAG_parse_only) {
   687 	    i::VMState<PARSER> state(i_isolate);
   688 	    i::Handle<i::String> str = Utils::OpenHandle(*(source));
Target 0: (d8) stopped.

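Because we passed --parse-only, i::FLAG_parse_only is set, so ExecuteString takes the parse-only branch: it opens a handle to the source string and only parses the script, returning whether parsing succeeded instead of running it.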

    i::ParseInfo parse_info(i_isolate, flags, &compile_state, &reusable_state);

    i::Handle<i::Script> script = parse_info.CreateScript(
        i_isolate, str, i::kNullMaybeHandle, ScriptOriginOptions());
    if (!i::parsing::ParseProgram(&parse_info, script, i_isolate,
                                  i::parsing::ReportStatisticsMode::kYes)) {
      parse_info.pending_error_handler()->PrepareErrors(
          i_isolate, parse_info.ast_value_factory());
      parse_info.pending_error_handler()->ReportErrors(i_isolate, script);

      fprintf(stderr, "Failed parsing\n");
      return false;
    }
    return true;

Hopefully you are now more familiar with lldb’s basic movements: stepping over, stepping into, setting breakpoints and continuing to the next breakpoint. From here on, I will skip lldb commands like run, breakpoint set, continue, thread step-in and thread step-out.

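Now let’s take a look at the v8::internal::parsing::ParseProgram function. The four-argument call in ExecuteString ends up in the overload below, which additionally takes maybe_outer_scope_info. Because it is short, I include the whole function here.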

bool ParseProgram(ParseInfo* info, Handle<Script> script,
                  MaybeHandle<ScopeInfo> maybe_outer_scope_info,
                  Isolate* isolate, ReportStatisticsMode mode) {
  DCHECK(info->flags().is_toplevel());
  DCHECK_NULL(info->literal());

  VMState<PARSER> state(isolate);

  // Create a character stream for the parser.
  Handle<String> source(String::cast(script->source()), isolate);
  isolate->counters()->total_parse_size()->Increment(source->length());
  std::unique_ptr<Utf16CharacterStream> stream(
      ScannerStream::For(isolate, source));
  info->set_character_stream(std::move(stream));

  Parser parser(isolate->main_thread_local_isolate(), info, script);

  // Ok to use Isolate here; this function is only called in the main thread.
  DCHECK(parser.parsing_on_main_thread_);
  parser.ParseProgram(isolate, script, info, maybe_outer_scope_info);
  MaybeReportStatistics(info, script, isolate, &parser, mode);
  return info->literal() != nullptr;
}

As you can see, the parser reads the source code through a character stream, but at this point the stream is still empty. Let’s step into the parser.ParseProgram call.

  // Initialize parser state.
  DeserializeScopeChain(isolate, info, maybe_outer_scope_info,
                        Scope::DeserializationMode::kIncludingVariables);

  DCHECK_EQ(script->is_wrapped(), info->is_wrapped_as_function());
  if (script->is_wrapped()) {
    maybe_wrapped_arguments_ = handle(script->wrapped_arguments(), isolate);
  }

  scanner_.Initialize();
  FunctionLiteral* result = DoParseProgram(isolate, info);
  MaybeProcessSourceRanges(info, result, stack_limit_);
  PostProcessParseResult(isolate, info, result);

After a few internal checks, the scanner is initialized with scanner_.Initialize(). This is where our code starts being translated into JavaScript lexical elements. The structure of the scanner’s source stream is documented in a comment in the header file (src/parsing/scanner.h):

Fields describing the location of the current buffer physically in memory,
and semantically within the source string.

                 0              buffer_pos_   pos()
                 |                        |   |
                 v________________________v___v_____________
                 |                        |        |        |
  Source string: |                        | Buffer |        |
                 |________________________|________|________|
                                          ^   ^    ^
                                          |   |    |
                  Pointers:   buffer_start_   |    buffer_end_
                                        buffer_cursor_
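To see how those four fields relate, here is a toy stream in C++ that keeps a window (the buffer) over a UTF-16 source and slides it forward whenever the cursor reaches buffer_end_. ToyCharacterStream, Advance, ReadBlock and chunk_size are hypothetical; only the field names and the relation pos() = buffer_pos_ + (buffer_cursor_ - buffer_start_), read off the diagram, follow V8, and refilling from an in-memory std::u16string is a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical model of the buffer layout described in the diagram.
class ToyCharacterStream {
 public:
  static constexpr int32_t kEndOfInput = -1;

  ToyCharacterStream(std::u16string source, std::size_t chunk_size)
      : source_(std::move(source)), chunk_size_(chunk_size) {
    assert(chunk_size_ > 0);
  }

  // Returns the next UTF-16 code unit, or kEndOfInput when the source is exhausted.
  int32_t Advance() {
    if (buffer_cursor_ == buffer_end_ && !ReadBlock()) return kEndOfInput;
    return *buffer_cursor_++;
  }

  // Position of the cursor within the whole source string.
  std::size_t pos() const {
    return buffer_pos_ + static_cast<std::size_t>(buffer_cursor_ - buffer_start_);
  }

 private:
  // Moves the window so it starts at pos() and holds up to chunk_size_ units.
  bool ReadBlock() {
    const std::size_t start = pos();
    if (start >= source_.size()) return false;
    const std::size_t length = std::min(chunk_size_, source_.size() - start);
    buffer_.assign(source_, start, length);
    buffer_pos_ = start;
    buffer_start_ = buffer_.data();
    buffer_cursor_ = buffer_start_;
    buffer_end_ = buffer_start_ + buffer_.size();
    return true;
  }

  std::u16string source_;
  std::size_t chunk_size_;
  std::u16string buffer_;    // Current window over source_.
  std::size_t buffer_pos_ = 0;  // Source position of buffer_start_.
  const char16_t* buffer_start_ = nullptr;
  const char16_t* buffer_cursor_ = nullptr;
  const char16_t* buffer_end_ = nullptr;
};
```

With a chunk size of 4, reading "let a = 1\n" refills the buffer three times, yet pos() always reports the cursor’s position in the full source.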

Let’s check what is in the current buffer.

(lldb) p scanner()->source_->buffer_start_
(const uint16_t *) $40 = 0x0000000102016032
(lldb) x/s 0x0000000102016032
0x102016032: "l"

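We got the first character of the source file (the l in the let keyword). The buffer holds 16-bit code units (uint16_t), so x/s reads each unit as a C string and stops at the zero byte that follows each ASCII character; that is why it prints one character at a time and the addresses advance by 2. Since we know the source length is 55, we can print out the whole file.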

(lldb) x/55s 0x0000000102016032
0x102016032: "l"
0x102016034: "e"
0x102016036: "t"
0x102016038: " "
0x10201603a: "a"
0x10201603c: " "
0x10201603e: "="
0x102016040: " "
0x102016042: "1"
0x102016044: "\n"
0x102016046: "v"
0x102016048: "a"
0x10201604a: "r"
0x10201604c: " "
0x10201604e: "b"
0x102016050: " "
0x102016052: "="
0x102016054: " "
0x102016056: "2"
0x102016058: "\n"
0x10201605a: "c"
0x10201605c: "o"
0x10201605e: "n"
0x102016060: "s"
0x102016062: "t"
0x102016064: " "
0x102016066: "c"
0x102016068: " "
0x10201606a: "="
0x10201606c: " "
0x10201606e: "3"
0x102016070: "\n"
0x102016072: "c"
0x102016074: "o"
0x102016076: "n"
0x102016078: "s"
0x10201607a: "o"
0x10201607c: "l"
0x10201607e: "e"
0x102016080: "."
0x102016082: "l"
0x102016084: "o"
0x102016086: "g"
0x102016088: "("
0x10201608a: "a"
0x10201608c: " "
0x10201608e: "+"
0x102016090: " "
0x102016092: "b"
0x102016094: " "
0x102016096: "+"
0x102016098: " "
0x10201609a: "c"
0x10201609c: ")"
0x10201609e: "\n"

That’s exactly our script from the very first part. Wow! I can’t believe we’ve come this far. But it’s just the beginning. See you next time.
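If you want to double-check that memory layout outside lldb, here is a small C++ sketch that rebuilds the dumped buffer from our script. WidenAscii and XsStyleLength are hypothetical helpers; the sketch assumes the script is pure ASCII and that the host is little-endian (x86-64 and Apple silicon Macs both are), so each code unit is stored as the character byte followed by a zero byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Widens ASCII text to UTF-16 code units, one char16_t per byte.
std::u16string WidenAscii(const std::string& ascii) {
  std::u16string out;
  out.reserve(ascii.size());
  for (char c : ascii) {
    assert(static_cast<unsigned char>(c) < 0x80);  // ASCII only.
    out.push_back(static_cast<char16_t>(c));
  }
  return out;
}

// Mimics lldb's x/s on a UTF-16 buffer: count bytes up to the first zero byte.
std::size_t XsStyleLength(const std::u16string& units) {
  return std::strlen(reinterpret_cast<const char*>(units.c_str()));
}
```

The 55 characters of debug.js become 55 code units, or 110 bytes, which matches the 2-byte step between the addresses in the dump.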