High Performance NodeJS

High Performance NodeJS by Harimurti Prasetio

Introduction

Name : Harimurti Prasetio

email : [email protected]

twitter : http://twitter.com/harippe

facebook : https://www.facebook.com/harippe.murti

GitHub : http://github.com/aerios

Overview

What is NodeJS

How NodeJS works

High Performance in NodeJS

Case Study

What is NodeJS?

A JavaScript platform that runs on top of the V8 engine

Non-blocking I/O

Single-threaded by nature

Originally developed by Ryan Dahl for his internal project

How NodeJS works

NodeJS uses an event loop at its core, provided by the libuv library. The event loop is single-threaded and runs indefinitely. It is responsible for :

abstracting I/O access away from the external request

invoking a handler for each I/O operation and delegating the operation to that handler

receiving events from the I/O handler about operation completion (success or error)

triggering any callbacks associated with the event
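
The four responsibilities above can be observed with a small sketch. It assumes fs.stat on the current working directory as the I/O operation; the helper name demoNonBlocking is hypothetical and only serves the illustration.

```javascript
const fs = require("fs");

// Hypothetical helper: records the order in which things happen.
function demoNonBlocking() {
  return new Promise((resolve, reject) => {
    const order = [];
    // The request is handed over and delegated; fs.stat returns immediately.
    fs.stat(process.cwd(), (err) => {
      // Completion event received: the event loop triggers this callback later.
      if (err) return reject(err);
      order.push("stat callback");
      resolve(order);
    });
    // Runs before the callback, because the main thread was never blocked.
    order.push("code after fs.stat");
  });
}

demoNonBlocking().then((order) => console.log(order.join(" -> ")));
// prints "code after fs.stat -> stat callback"
```

The callback always runs after the surrounding synchronous code has finished, however fast the I/O completes.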

[Diagram: the main thread runs the event loop; I/O handlers run on four separate threads and notify the event loop through event callbacks]

Event Loop

As we can see, although NodeJS is single-threaded, internally it still uses multithreading for I/O operations. This strategy lets NodeJS keep processing the next request while it waits for the results of I/O operations.

Another benefit of a single-threaded environment is that no memory synchronization is needed between callbacks. Each callback runs to completion before the next one starts, so if two or more callbacks manipulate the same variable, they never interleave in the middle of an update. This feature, in my personal experience, is the reason why developing applications with NodeJS is easy.
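
A minimal sketch of the point about shared variables, assuming setImmediate as a stand-in for real I/O callbacks (runCallbacks is a hypothetical name). Every callback does an unprotected read-then-write on the same counter, and no update is lost.

```javascript
// Hypothetical helper: schedules n callbacks that all update one shared variable.
function runCallbacks(n) {
  let counter = 0; // shared by every callback below, no lock around it
  const pending = [];
  for (let i = 0; i < n; i++) {
    pending.push(new Promise((resolve) => {
      setImmediate(() => {
        const seen = counter; // read
        counter = seen + 1;   // write; no other callback can run in between
        resolve();
      });
    }));
  }
  return Promise.all(pending).then(() => counter);
}

runCallbacks(1000).then((total) => console.log(total)); // prints 1000
```

The guarantee only covers a single callback. If the read and the write were split across two separate callbacks, another callback could still run between them.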

High Performance in NodeJS

In a sense, high performance means the ability to use all available resources provided by the host to deliver higher throughput

What counts as high performance depends on the application's requirements

It is harder to optimize NodeJS for CPU-intensive applications than for I/O-intensive applications

Due to its single-threaded nature, it is almost impossible to do parallel programming with multithreading

"Almost" means that there are several ways to use multithreading, but with limited functionality

Case Study

Suppose our web application, built using NodeJS, needs to be enhanced with analytics functionality. After several discussions and benchmarks, it is decided to perform data aggregation in the application, because performing JOIN and GROUP BY in the database would degrade its performance significantly.

Case Study

In the middle of the sprint, the developer assigned to this task found that the application freezes while crunching a very large dataset. He quickly realized that the application gets stuck when it enters several tight loops needed by the APIs that provide the analytics functionality. Changing the loops to native for loops doesn't help either. So he needs to find a way to keep the application from freezing despite the tight loops.
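
The freeze can be reproduced with a short sketch. It assumes a 10 ms timer as a stand-in for any pending request, and a busy-wait loop as a stand-in for the analytics loops; measureTimerDelay is a hypothetical name.

```javascript
// Hypothetical helper: how late does a 10 ms timer fire when a tight loop
// occupies the main thread for busyMs milliseconds?
function measureTimerDelay(busyMs) {
  return new Promise((resolve) => {
    const start = Date.now();
    setTimeout(() => resolve(Date.now() - start), 10);
    // Tight loop: the event loop cannot run the timer callback until this ends.
    while (Date.now() - start < busyMs) {
      // burn CPU
    }
  });
}

measureTimerDelay(200).then((delay) => {
  console.log("10 ms timer fired after " + delay + " ms");
});
```

The timer is due after 10 ms but can only fire once the loop returns, so the reported delay is at least 200 ms. Every incoming request waits in the same way.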

Case Study

Tight loops are one case of CPU-intensive operation. Roughly, there are 2 ways to solve this problem :

1. Decrease the input size

2. Increase the power

Number (1) is a no-go, because reducing the input size would produce misleading output. So number (2) is the only option. But the question is, how do we increase the (CPU) power when NodeJS is single-threaded by nature? How does a NodeJS application explicitly consume more CPU power from the host?

Case Study

After a quick Google search, there are several ways to achieve it :

1. Multiprocessing

2. WebWorker

3. Parallel.js

Multiprocessing

child_process Module

spawn

fork

exec

Multiprocess - spawn()

- enables NodeJS to create a child process that runs another command

- suppose we need to run a Python script from our NodeJS application

- we can invoke the script from NodeJS using spawn

- syntax :

var spawn = require("child_process").spawn
// first argument: the command; second argument: the list of its parameters
var inst = spawn("py", ["./path/to/python_script.py", "parameter_for_py1", "parameter_for_py2"])
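
A runnable sketch of the same idea. To stay self-contained it assumes the child command is the node binary itself (process.execPath) instead of a Python script, and it collects the child's standard output; runChild is a hypothetical name.

```javascript
const { spawn } = require("child_process");

// Hypothetical helper: runs the node binary with the given arguments and
// resolves with the exit code and everything the child wrote to stdout.
function runChild(args) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, args);
    let stdout = "";
    child.stdout.on("data", (chunk) => { stdout += chunk; }); // output arrives as a stream
    child.on("error", reject);                                // e.g. command not found
    child.on("close", (code) => resolve({ code, stdout }));
  });
}

runChild(["-e", "console.log(6 * 7)"]).then((result) => {
  console.log(result.code, result.stdout.trim()); // prints "0 42"
});
```

Unlike exec, spawn streams the output while the child is running, which suits long-running commands.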

Multiprocess - fork()

- a special case of child_process.spawn

- enables a NodeJS application to run another NodeJS script as its child process, with a bidirectional communication channel between them

- by using fork, the parent and its children can communicate via message passing using the send() method

- syntax :

Multiprocess - fork()

main.js

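var fork = require("child_process").fork
// start worker.js as a child process; the array is passed to it as arguments
var inst = fork("path/to/worker.js", ["this is argument"])
// print every message the worker sends back
inst.on("message", function (message) {
  console.log(message)
})
inst.send({ data: "Hi worker" })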

Multiprocess - fork()

worker.js

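// first argument passed by the parent through fork()
var argFromParent = process.argv[2]
// reply to each message with its data plus the argument from the parent
process.on("message", function (message) {
  process.send({ data: message.data, response: argFromParent })
})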

Multiprocess - fork()

The result

{"data":"Hi worker","response":"this is argument"}

Multiprocess - fork()

By using fork(), the main application and its children can communicate back and forth. This is the simplest form of message passing, a method for exchanging data between different processes or actors without sharing memory. Using message passing :

the main application can send data or commands to its children

workers (child processes) can send back the output of a calculation or command execution

This feature enables developers to create a simple job queue system using pure NodeJS
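
A minimal sketch of one job being handed to a forked worker, so that the tight loop runs outside the main process. To keep it in a single file, it assumes the worker source is written to a temporary file first. The names runJob, workerSource, and the job fields id and upTo are hypothetical.

```javascript
const { fork } = require("child_process");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Worker source: receives a job, runs the tight loop, sends back the result.
const workerSource = `
process.on("message", (job) => {
  let sum = 0;
  for (let i = 1; i <= job.upTo; i++) sum += i; // CPU-intensive part
  process.send({ id: job.id, sum: sum });
});
`;

// Hypothetical helper: forks one worker for one job.
function runJob(job) {
  return new Promise((resolve, reject) => {
    const dir = fs.mkdtempSync(path.join(os.tmpdir(), "fork-demo-"));
    const file = path.join(dir, "worker.js");
    fs.writeFileSync(file, workerSource);
    const worker = fork(file);
    worker.once("message", (result) => {
      worker.kill();        // the worker is no longer needed
      fs.unlinkSync(file);  // remove the temporary worker file
      fs.rmdirSync(dir);
      resolve(result);
    });
    worker.once("error", reject);
    worker.send(job);       // the parent stays free while the child computes
  });
}

runJob({ id: 1, upTo: 1000000 }).then((result) => console.log(result));
// prints { id: 1, sum: 500000500000 }
```

A real job queue would keep a pool of workers alive and reuse them, because starting a process for every job is expensive.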

Multiprocess - exec()

- NodeJS spawns a shell and executes the command within that shell

- any output or error is buffered and provided via a callback

- useful for calling bash commands

- syntax :

var exec = require("child_process").exec
// the callback receives the error (if any), the buffered stdout, and the buffered stderr
var inst = exec("ls -lah ~/", function (error, stdout, stderr) { console.log(stdout) })
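
A small sketch of how the buffered result and the error reach the callback. It assumes the commands echo and exit, which common shells provide; runInShell is a hypothetical name.

```javascript
const { exec } = require("child_process");

// Hypothetical wrapper: resolves with the exit code and the buffered output.
function runInShell(command) {
  return new Promise((resolve) => {
    exec(command, (error, stdout, stderr) => {
      // error is null on success; on a non-zero exit, error.code is the exit code
      resolve({ code: error ? error.code : 0, stdout, stderr });
    });
  });
}

runInShell("echo hello").then((result) => {
  console.log(result.code, result.stdout.trim()); // prints "0 hello"
});
```

Because the whole output is held in memory until the command ends, exec fits short commands better than commands with large output.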

Multithreading in NodeJS

By default, multithreading is not supported in NodeJS. But some npm modules, such as webworker-threads and parallel.js, enable developers to create new threads. Both of these modules build on the Web Worker API, which is a web platform (HTML) standard, not part of the ES5 specification.

Multithreading in NodeJS

From my experience, using threads provided via WebWorker has some benefits :

Lower initialization overhead than multiprocessing

Passing data back and forth incurs lower overhead

The drawback of using threads :

Because the threads created using WebWorker are not native NodeJS threads, it is not guaranteed that several features provided by NodeJS are present (non-blocking I/O, module loading, etc.)
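
For comparison, a hedged sketch of the thread approach. It does not use the npm modules named above, whose APIs are not reproduced here. Instead it assumes a newer NodeJS release that ships the built-in worker_threads module. The names sumInThread and workerCode are hypothetical.

```javascript
const { Worker } = require("worker_threads");

// Code that runs inside the worker thread (passed as a string with eval: true).
const workerCode = `
const { parentPort, workerData } = require("worker_threads");
let sum = 0;
for (let i = 1; i <= workerData.upTo; i++) sum += i; // CPU-intensive part
parentPort.postMessage(sum);
`;

// Hypothetical helper: runs the loop in a separate thread and resolves with the result.
function sumInThread(upTo) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerCode, { eval: true, workerData: { upTo } });
    worker.once("message", resolve);
    worker.once("error", reject);
  });
}

sumInThread(1000000).then((sum) => console.log(sum)); // prints 500000500000
```

The main thread keeps serving events while the worker computes, and the result comes back through message passing, the same pattern as with fork().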

Demo

available at : https://github.com/aerios/bdd-3

Conclusion

NodeJS provides several ways to achieve high performance using parallel programming

Developers can choose between tools provided natively by NodeJS and community-provided modules

In the end, high performance is not a problem that can be solved simply by using tools. Understanding how things work and performing iterative benchmarking is a must.

Reference

http://mcgill-csus.github.io/student_projects/Submission2.pdf

http://www.journaldev.com/7462/node-js-processing-model-single-threaded-model-with-event-loop-architecture

http://nikhilm.github.io/uvbook/An%20Introduction%20to%20libuv.pdf

https://nikhilm.github.io/uvbook/threads.html

http://docs.libuv.org/en/v1.x/design.html

https://www.npmjs.com/package/webworker-threads

Thank You