Reaching
by @matteocollina
Ludicrous speed is from Spaceballs
Back in 2013..
MQTT and Node.js: Messaging in the Internet of Things
- MQTT broker written in node
- Open source
- fast: up to 20,000 publishes/second
- CPU bound
Fast?
can be faster!
Down the rabbit hole of performance optimizations
Achieving
Performance boost
Tools
- flamegraphs
- dtrace
- mdb
What worked
node --trace_opt --trace_inlining --trace_deopt
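For example, pointing those flags at your own entry point (app.js is a hypothetical name here) and keeping the output around for inspection:

node --trace_opt --trace_inlining --trace_deopt app.js > trace.log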
How node works
How to get fast
fast means we can do more I/O
- get an I/O event
- process the event
- release the CPU
- as fast as possible
V8
- can optimize our functions
- only when they run enough times
- in Mosca, I was allocating a ton of functions
- most of them anonymous
My enemy
[marking 0x1a308644a581
<JS Function (SharedFunctionInfo 0xf1f928b34e9)>
for recompilation, reason: small function,
ICs with typeinfo: 8/8 (100%), generic ICs: 0/8 (0%)]
My enemy
function (err, data) {
/* whatever is done here
is not going to be optimized */
}
Code time!
All benchmark results
Code time!
http://npm.im/steed
var steed = require('steed')()
steed.map(new State(cb, 2), [1, 2, 3], multiply, done)
- steed.each
- steed.map
- steed.eachSeries
- steed.mapSeries
- steed.parallel
- steed.series
- steed.waterfall
- steed.queue
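A minimal usage sketch of steed.map in its simpler form, assuming it follows the usual (array, iterator, done) signature; the iterator and the final callback are named, top-level functions, so nothing new is allocated on the hot path:

var steed = require('steed')()

// top-level and named: V8 sees the same function object on every call
function multiply (num, cb) {
  cb(null, num * 2)
}

function done (err, results) {
  if (err) throw err
  console.log(results) // [ 2, 4, 6 ]
}

steed.map([1, 2, 3], multiply, done)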
Rules for hot code path 1/2
- Do not allocate functions (sketch below)
- Make sure that V8 can optimize them
- When you are at 100% of CPU, everything is hot
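A minimal sketch of the first rule, in the spirit of the State object from the steed slide (all names here are hypothetical): hoist callbacks to the top level and carry per-call data in a small state object instead of closing over it with a fresh anonymous function.

function State (cb, factor) {
  this.cb = cb
  this.factor = factor
}

// named and reused: V8 can optimize it once it runs hot
function multiply (state, num) {
  state.cb(null, num * state.factor)
}

// hot path: only a small State object is allocated, never a new function
function run (num, cb) {
  multiply(new State(cb, 2), num)
}

// one-off call site, outside the hot path
run(21, function (err, result) {
  console.log(result) // 42
})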
The missing bit
The final (?) step of improvement
- In Mosca I was allocating a buffer for each packet
- copying over all the relevant information
- which meant all those buffers needed to be collected
- solution: stream packet generation and avoid allocation
- memoize headers: so these do not need to be allocated anymore
- what made the difference: use cork()/uncork() to avoid crossing the JS/C++ barrier (sketch below)
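A minimal sketch of the cork()/uncork() idea (names and the fake header are hypothetical, not Mosca's actual packet code): the header buffer is allocated once, and the two writes are flushed to the socket as a single batch instead of two separate crossings into C++.

var net = require('net')

var HEADER = Buffer.from('HDR') // memoized: allocated once, reused forever

function send (socket, payload) {
  socket.cork()                    // buffer the following writes in JS land
  socket.write(HEADER)
  socket.write(payload)
  process.nextTick(uncork, socket) // flush both chunks together
}

function uncork (socket) {
  socket.uncork()
}

net.createServer(function (socket) {
  send(socket, Buffer.from('hello\n'))
}).listen(1883)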
Rules for hot code path 2/2
- GC time counts!
- Allocate as few objects as possible (sketch below)
- slow code that does not allocate
- might turn out to be faster
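A minimal sketch of the trade-off (hypothetical code, not from Mosca): the second version is less elegant, but it reuses one scratch object instead of creating garbage on every call.

// allocates a fresh result object per call: more GC work on a hot path
function statsAlloc (numbers) {
  return { count: numbers.length, sum: numbers.reduce(add, 0) }
}

// reuses a single scratch object: the caller must copy it to keep the result
var scratch = { count: 0, sum: 0 }

function statsReuse (numbers) {
  scratch.count = numbers.length
  scratch.sum = 0
  for (var i = 0; i < numbers.length; i++) {
    scratch.sum += numbers[i]
  }
  return scratch
}

function add (a, b) {
  return a + b
}

console.log(statsAlloc([1, 2, 3]).sum, statsReuse([1, 2, 3]).sum) // 6 6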
Improving node
- Using this process, I have found a perf bug in node
- which can make your streams run 5-10% faster
- in my case, it was the last 10%
- https://github.com/nodejs/node/pull/4354
- hint: your http requests are streams :)
- this is part of node v5.7.0+
- this has just been released in node v4.4.0
- and it is part of 6.0.0
Most code does not need to go this fast
One last thing.. flamegraphs!
How to generate them
0x
- Check out 0x
- The easiest way to generate flamegraphs! (example below)
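For example (app.js is a hypothetical entry point), a flamegraph can be captured with something like:

npm install -g 0x
0x app.js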
You also need a FAST logger
- Check out pino
- Up to 17x faster than alternatives (example below)
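A minimal usage sketch of pino (the field and message are made up):

var pino = require('pino')
var logger = pino()

logger.info('hello from the hot path')
logger.info({ port: 1883 }, 'broker listening') // object first, message second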
And a fast HTTP load testing tool
- Check out autocannon
- It can generate 10% more load than alternatives written in C (example below)
- demo
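For example (assuming a local HTTP server on port 3000), a ten-second run with 100 connections looks like:

npm install -g autocannon
autocannon -c 100 -d 10 http://localhost:3000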
http://npm.im/steed
This presentation
http://github.com/mcollina
Thanks!
If you need help with Node.js