First of all, we are trying to improve Node.js servers that run CPU-intensive JavaScript operations. For I/O work, Node.js’s built-in asynchronous I/O operations are already more efficient than Workers or Child Processes can be.
Now let’s determine what kind of parallelization we need for our server. Does it need to share memory? Does it need to share a port? Or does it do different things in isolation? Those questions will help us pick the right parallelization module for the task.
child_process
The node:child_process module provides the ability to spawn subprocesses in a manner that is similar, but not identical, to popen(3). This capability is primarily provided by the child_process.spawn() function:
import { spawn } from 'node:child_process';

const ls = spawn('ls', ['-lh', '/usr']);

ls.stdout.on('data', (data) => {
  console.log(`stdout: ${data}`);
});

ls.stderr.on('data', (data) => {
  console.error(`stderr: ${data}`);
});

ls.on('close', (code) => {
  console.log(`child process exited with code ${code}`);
});
It allows you to execute external commands or run a Node.js script (with fork()).
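As a minimal sketch of the fork() variant (compute.js and its message shape are my assumptions, not from the original post), the parent and child talk over the built-in IPC channel:

// parent.mjs
import { fork } from 'node:child_process';

// Fork a hypothetical CPU-intensive Node.js script.
const child = fork('./compute.js');

// Results come back over the built-in IPC channel.
child.on('message', (result) => {
  console.log(`result from child: ${result}`);
  child.disconnect();
});

// Kick off the work with an illustrative payload.
child.send({ cmd: 'start', n: 42 });

// compute.js (hypothetical child script)
process.on('message', ({ cmd, n }) => {
  if (cmd === 'start') {
    process.send(n * 2); // stand-in for real CPU-intensive work
  }
});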
Imagine a movie theater that sells drinks, popcorn, and gifts from other vendors.
child_process will be useful when we need to call an external process, or a CPU-intensive script, from the current process.
cluster
Clusters of Node.js processes can be used to run multiple instances of Node.js that can distribute workloads among their application threads. When process isolation is not needed, use the worker_threads module instead, which allows running multiple application threads within a single Node.js instance. The cluster module allows easy creation of child processes that all share server ports.
The cluster module is built on top of child_process.fork(). Its main purpose is to create network applications that can utilize all available CPU cores on a machine.
import cluster from 'node:cluster';
import http from 'node:http';
import { availableParallelism } from 'node:os';
import process from 'node:process';

const numCPUs = availableParallelism();

if (cluster.isPrimary) {
  console.log(`Primary ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} died`);
  });
} else {
  // Workers can share any TCP connection.
  // In this case it is an HTTP server.
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end('hello world\n');
  }).listen(8000);

  console.log(`Worker ${process.pid} started`);
}
Running Node.js will now share port 8000 between the workers:
$ node server.js
Primary 3596 is running
Worker 4324 started
Worker 4520 started
Worker 6056 started
Worker 5644 started
Imagine a theater directing guests to different identical rooms to watch the same movie.
You can’t share memory between the primary process and child processes, but you can still communicate through a message channel.
import cluster from 'node:cluster';
import http from 'node:http';
import { availableParallelism } from 'node:os';
import process from 'node:process';

if (cluster.isPrimary) {
  // Keep track of http requests
  let numReqs = 0;
  setInterval(() => {
    console.log(`numReqs = ${numReqs}`);
  }, 1000);

  // Count requests
  function messageHandler(msg) {
    if (msg.cmd && msg.cmd === 'notifyRequest') {
      numReqs += 1;
    }
  }

  // Start workers and listen for messages containing notifyRequest
  const numCPUs = availableParallelism();
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  for (const id in cluster.workers) {
    cluster.workers[id].on('message', messageHandler);
  }
} else {
  // Worker processes have an http server.
  http.Server((req, res) => {
    res.writeHead(200);
    res.end('hello world\n');

    // Notify primary about the request
    process.send({ cmd: 'notifyRequest' });
  }).listen(8000);
}
worker_threads
The node:worker_threads module enables the use of threads that execute JavaScript in parallel. To access it:
import worker from 'node:worker_threads';
Unlike child_process or cluster, worker_threads can share memory. They do so by transferring ArrayBuffer instances or sharing SharedArrayBuffer instances.
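Here is a minimal sketch of that sharing (the file names and the one-integer counter layout are illustrative assumptions): both threads operate on the same SharedArrayBuffer through Atomics.

// main.mjs — allocate shared memory and hand it to a worker.
import { Worker } from 'node:worker_threads';

const shared = new SharedArrayBuffer(4); // room for one Int32 counter
const counter = new Int32Array(shared);

const worker = new Worker(new URL('./worker.mjs', import.meta.url), {
  workerData: shared, // shared with the thread, not copied
});

worker.on('exit', () => {
  // Both threads saw the same memory.
  console.log(`counter = ${Atomics.load(counter, 0)}`);
});

// worker.mjs — increment the shared counter atomically.
import { workerData } from 'node:worker_threads';

const counter = new Int32Array(workerData);
for (let i = 0; i < 1000; i++) {
  Atomics.add(counter, 0, 1);
}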
Imagine buying a ticket, some popcorn, and a drink: one employee prints the ticket, another grabs the popcorn, and a third pours the drink into a cup. Then they hand all those things to you.
poolifier
poolifier is a library that implements worker pools using worker_threads and cluster. It allows you to use a fixed or dynamic worker pool without any complexity.
// worker.ts
import type { Server } from 'node:http'
import type { AddressInfo } from 'node:net'
import express, { type Express, type Request, type Response } from 'express'
import { ClusterWorker } from 'poolifier'
import type { WorkerData, WorkerResponse } from './types.js'

class ExpressWorker extends ClusterWorker<WorkerData, WorkerResponse> {
  private static server: Server

  public constructor () {
    super(ExpressWorker.startExpress, {
      killHandler: () => {
        ExpressWorker.server.close()
      },
    })
  }

  private static readonly factorial = (n: bigint | number): bigint => {
    if (n === 0 || n === 1) {
      return 1n
    }
    n = BigInt(n)
    let factorial = 1n
    for (let i = 1n; i <= n; i++) {
      factorial *= i
    }
    return factorial
  }

  private static readonly startExpress = (
    workerData?: WorkerData
  ): WorkerResponse => {
    // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
    const { port } = workerData!
    const application: Express = express()

    // Parse only JSON requests body
    application.use(express.json())

    application.all('/api/echo', (req: Request, res: Response) => {
      res.send(req.body).end()
    })

    application.get('/api/factorial/:number', (req: Request, res: Response) => {
      const { number } = req.params
      res
        .send({
          number: ExpressWorker.factorial(Number.parseInt(number)).toString(),
        })
        .end()
    })

    let listenerPort: number | undefined
    ExpressWorker.server = application.listen(port, () => {
      listenerPort = (ExpressWorker.server.address() as AddressInfo).port
      console.info(
        `⚡️[express server]: Express server is started in cluster worker at http://localhost:${listenerPort.toString()}/`
      )
    })
    return {
      port: listenerPort ?? port,
      status: true,
    }
  }
}

export const expressWorker = new ExpressWorker()
// server.ts
import { dirname, extname, join } from 'node:path'
import { fileURLToPath } from 'node:url'
import { availableParallelism, FixedClusterPool } from 'poolifier'
import type { WorkerData, WorkerResponse } from './types.js'

const workerFile = join(
  dirname(fileURLToPath(import.meta.url)),
  `worker${extname(fileURLToPath(import.meta.url))}`
)

const pool = new FixedClusterPool<WorkerData, WorkerResponse>(
  availableParallelism(),
  workerFile,
  {
    enableEvents: false,
    errorHandler: (e: Error) => {
      console.error('Cluster worker error:', e)
    },
    onlineHandler: () => {
      pool
        .execute({ port: 8080 })
        .then(response => {
          if (response.status) {
            console.info(
              `Express is listening in cluster worker on port ${response.port?.toString()}`
            )
          }
          return undefined
        })
        .catch((error: unknown) => {
          console.error('Express failed to start in cluster worker:', error)
        })
    },
  }
)
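The example above uses a fixed pool. As a rough sketch of the dynamic variant (the worker file path and task payload are illustrative assumptions), a DynamicThreadPool grows from a minimum up to a maximum number of workers under load and shrinks back when idle:

// pool.mjs
import { availableParallelism, DynamicThreadPool } from 'poolifier'

// Grows from 1 worker up to availableParallelism() workers under load.
const pool = new DynamicThreadPool(1, availableParallelism(), './worker.js')

// Submit a task and await its result.
const response = await pool.execute({ n: 50 })
console.info(response)

// worker.js (hypothetical) — the pooled task function.
import { ThreadWorker } from 'poolifier'
export default new ThreadWorker(data => ({ result: `n was ${data.n}` }))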
Don’t believe me
I lied
Even if it sounds convincing, don’t take any of my words as truth. Instead, do your own research and experiment to find out for yourself.
I have made some simple performance tests but you can create your own quite easily.
For a simple Hello World! Node HTTP server
$ wrk -t 10 -c 100 -d 60 http://localhost:8000 # single-process server
Running 1m test @ http://localhost:8000
10 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.30ms 1.28ms 110.13ms 99.70%
Req/Sec 7.89k 338.74 8.55k 88.29%
4721090 requests in 1.00m, 688.86MB read
Requests/sec: 78551.96
Transfer/sec: 11.46MB
$ wrk -t 10 -c 100 -d 60 http://localhost:8001 # cluster with 8 child processes
Running 1m test @ http://localhost:8001
10 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 2.38ms 5.62ms 126.62ms 92.84%
Req/Sec 10.05k 1.11k 19.48k 75.45%
6007714 requests in 1.00m, 0.86GB read
Requests/sec: 99956.59
Transfer/sec: 14.58MB
Looking at the stats, we only gain 1.27x more requests/sec, but 1.8x the latency, with 8 extra CPUs 🤔
In this test, the server only takes the request and responds with Hello World! text, so Node.js’s built-in asynchronous I/O operations handle this kind of work much better than spawning a bunch of cluster workers.
Of course, in the real world we don’t serve such a simple Hello World!, so this is not surprising.
The cluster mode shines when we need to do CPU-intensive tasks, so we need to simulate one to see the difference.
CPU-bound task
Our task function will have to run through a 1-million-iteration loop.
async function task() {
  let x = Math.random();
  for (let i = 0; i < 1_000_000; i++) {
    x = x + Math.random() * i;
  }
  return x;
}

export default task;
[!TIP]
The node:crypto module often handles CPU-intensive tasks.
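For instance (a small illustration of mine, not part of the original benchmark), key derivation with pbkdf2 is pure CPU work; its async form runs on libuv’s threadpool instead of blocking the event loop:

import { pbkdf2 } from 'node:crypto';

// 100k rounds of PBKDF2 is CPU-heavy; the async API offloads it to
// libuv's threadpool so the event loop stays free.
pbkdf2('secret', 'salt', 100_000, 64, 'sha512', (err, derivedKey) => {
  if (err) throw err;
  console.log(`derived key: ${derivedKey.toString('hex')}`);
});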
And don’t get me wrong about using async for non-async code. In reality, we often see async functions handling complex logic. Those functions carry a hidden cost if we just look at where they are called, thinking we can get away without performance issues.
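For reference, the benchmarked single-process server might be wired like this (a sketch under my assumptions; the cluster variant on port 8001 wraps the same handler in cluster.fork() as in the earlier example):

import http from 'node:http';
import task from './task.js';

// Every request burns through the 1-million-iteration CPU-bound task.
http.createServer(async (req, res) => {
  const x = await task();
  res.writeHead(200);
  res.end(`${x}\n`);
}).listen(8000);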
wrk -t 10 -c 100 -d 60 --timeout 1s http://127.0.0.1:8000
Running 1m test @ http://127.0.0.1:8000
10 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 515.43ms 146.44ms 985.61ms 82.31%
Req/Sec 24.89 17.68 130.00 67.89%
9545 requests in 1.00m, 1.58MB read
Socket errors: connect 0, read 0, write 0, timeout 141
Requests/sec: 158.81
Transfer/sec: 26.93KB
wrk -t 10 -c 100 -d 60 --timeout 1s http://127.0.0.1:8001
Running 1m test @ http://127.0.0.1:8001
10 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 132.52ms 42.04ms 981.30ms 92.05%
Req/Sec 76.52 18.18 343.00 74.20%
45640 requests in 1.00m, 7.56MB read
Requests/sec: 759.42
Transfer/sec: 128.79KB
In this test, I add a --timeout 1s option to see how many slow requests happen. In our single-process server, 141 requests timed out. Average latency is much higher and requests/sec much lower compared to the server with multiple cluster workers.
Again, don’t take any numbers in this blog post as your conclusion. I cooked them to look nice. I hope you’ll run your own experiments and reach your own conclusions.