Performance optimization
Bottleneck taxonomy: CPU/network/disk/GPU/memory-bound
Types of limitations
When you suspect your software is slow in some places, the first thing you need is numbers, because feelings are not scientific.
If you cannot measure it, you cannot improve it. ~ Lord Kelvin
How? Benchmarks and metrics collection or monitoring (Prometheus, Weights & Biases, htop, asitop, nvtop, etc.).
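Here is a minimal benchmarking sketch that uses only Python's standard library (`timeit`, `statistics`). The helper `benchmark` and the two `sum_squares_*` functions are hypothetical examples and are not part of any tool listed above. The sketch reports both the best and the median time per call over several trials, because background noise can only make a run slower, never faster.

```python
import statistics
import timeit


def benchmark(func, *args, repeat=5, number=100):
    """Hypothetical helper: return (best, median) seconds per call of func(*args).

    timeit.repeat runs `number` calls per trial and repeats the trial `repeat`
    times; dividing by `number` gives a per-call time for each trial.
    """
    trials = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    per_call = [t / number for t in trials]
    return min(per_call), statistics.median(per_call)


def sum_squares_loop(n):
    # Hypothetical function under test: explicit Python loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_builtin(n):
    # Same result, using the built-in sum over a generator expression.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    for f in (sum_squares_loop, sum_squares_builtin):
        best, median = benchmark(f, 10_000)
        print(f"{f.__name__}: best {best * 1e6:.1f} us, median {median * 1e6:.1f} us")
```

Compare variants on the same machine under the same load. Absolute numbers from different machines are not comparable.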
CPU-bound
Use case:
- Running a function repeatedly in a loop / recursively
Make sure you use all CPU cores and all of their hardware threads. If that is not enough, distribute the load over multiple machines. Check the performance and complexity of your code: is it …?
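This sketch spreads CPU-bound work over all cores with Python's `concurrent.futures.ProcessPoolExecutor`, and it assumes the work can be split into independent chunks. `count_primes`, `split_range` and `count_primes_parallel` are hypothetical names. The sketch uses separate processes because CPython threads do not execute Python bytecode in parallel under the GIL.

```python
import math
import os
from concurrent.futures import ProcessPoolExecutor


def count_primes(bounds):
    """Count primes in [lo, hi) by trial division (deliberately naive, CPU-heavy)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        # all() over an empty range is True, so 2 and 3 count as prime.
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def split_range(lo, hi, parts):
    """Split [lo, hi) into at most `parts` contiguous chunks of near-equal size."""
    if hi <= lo:
        return []
    step = max(1, math.ceil((hi - lo) / parts))
    return [(s, min(s + step, hi)) for s in range(lo, hi, step)]


def count_primes_parallel(lo, hi, workers=None):
    """Hypothetical helper: one chunk per worker process, results summed."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(lo, hi, workers)))


if __name__ == "__main__":
    print(count_primes_parallel(0, 200_000))
```

On platforms that start workers with `spawn` (Windows, macOS), the `if __name__ == "__main__":` guard is required, because each worker re-imports the main module. The trial division is intentionally slow, and it also shows why complexity matters: a better algorithm, such as a sieve, would likely beat this parallel version while running on a single core.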
Network-bound
Use case:
- Downloading large amounts of data (many files, torrents, etc.)
Get a better network connection, or distribute the work across different networks.
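Before paying for a better connection, measure your effective download rate and compare it with the link's nominal bandwidth. If the two are close, the network is the bottleneck. If the measured rate is far below the nominal one, the limit is probably somewhere else, such as the server, the disk, or the CPU. `measure_throughput` is a hypothetical helper that works on any file-like object with `.read(n)`, for example the response returned by `urllib.request.urlopen`.

```python
import io
import time


def measure_throughput(stream, chunk_size=64 * 1024):
    """Hypothetical helper: read `stream` to EOF, return (bytes_read, Mbit/s)."""
    total = 0
    start = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        total += len(chunk)
    elapsed = time.perf_counter() - start
    mbps = (total * 8 / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total, mbps


if __name__ == "__main__":
    # Offline demo on an in-memory stream. For a real measurement, pass e.g.
    # urllib.request.urlopen(url) for some large file (the URL is up to you).
    size, rate = measure_throughput(io.BytesIO(b"\0" * 50_000_000))
    print(f"{size} bytes at {rate:.0f} Mbit/s")
```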
Disk-bound
Use case:
- Writing many files, heavy logging
Check the speed of your disk, then distribute the I/O across multiple disks or machines.
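This is a rough standard-library sketch for checking sequential write speed. `disk_write_throughput` is a hypothetical helper. It calls `os.fsync` so that the result reflects more than just the OS page cache, although drive-level caches can still inflate it. Dedicated tools such as `fio` give more reliable numbers.

```python
import os
import tempfile
import time


def disk_write_throughput(total_mb=64, block_kb=1024, directory=None):
    """Hypothetical helper: write ~total_mb MiB to a temp file, return MiB/s.

    `directory` selects which disk to test (None = the system temp dir).
    """
    block = os.urandom(block_kb * 1024)  # random bytes, so compression can't help
    n_blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push the data from the page cache to the device
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (n_blocks * block_kb / 1024) / elapsed


if __name__ == "__main__":
    print(f"{disk_write_throughput():.0f} MiB/s")
```

If your own workload (for example, logging) writes much more slowly than this figure, the cause is likely many small writes or syncs rather than raw disk speed.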
GPU-bound
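This section assumes the bottleneck is GPU compute, not GPU memory (VRAM). If you are GPU-memory-bound, distribute the model across several GPUs (e.g. GPT-3: its 175 billion parameters take about 350 GB in 16-bit precision, far more than a single GPU holds :)).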
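The sketch below shows the arithmetic behind that estimate: the weights alone need (number of parameters) × (bytes per parameter). The helper names and the 80 GB per-GPU figure are illustrative assumptions. Real deployments also need memory for activations, the KV cache, and (for training) gradients and optimizer state, so treat the result as a lower bound.

```python
import math

# Bytes per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weights_gb(n_params, dtype="fp16"):
    """Memory for the model weights alone, in GB (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def min_gpus(n_params, gpu_mem_gb, dtype="fp16"):
    """Hypothetical helper: lower bound on GPUs needed just to hold the weights
    (ignores activations, KV cache, optimizer state and framework overhead)."""
    return math.ceil(weights_gb(n_params, dtype) / gpu_mem_gb)


if __name__ == "__main__":
    n = 175e9  # GPT-3 parameter count
    print(weights_gb(n, "fp16"), "GB in fp16")     # 350.0 GB in fp16
    print(min_gpus(n, 80), "GPUs with 80 GB each")  # 5 GPUs with 80 GB each
```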
Use case:
- Deep learning models optimized for GPU (transformers for example)
- Blockchain mining
- Radio signal processing (Fourier transforms), cryptanalysis
- Cybersecurity
Check the performance and complexity of your code: is it …?
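Fourier transforms show why complexity comes before hardware. For n = 1024 samples, a naive DFT needs about n² ≈ 1,000,000 complex multiply-adds, while a radix-2 FFT needs on the order of n log₂ n ≈ 10,000. The pure-Python sketch below computes the same transform both ways, using the hypothetical helpers `dft` and `fft`. In practice you would call an optimized library such as `numpy.fft` or a GPU FFT library rather than writing either one yourself.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform: O(n^2) operations."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT: O(n log n). len(x) must be a power of 2."""
    n = len(x)
    if n == 0:
        return []
    if n & (n - 1):
        raise ValueError("length must be a power of 2")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor * odd part
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 0, 0, 0, 0]
    print(max(abs(a - b) for a, b in zip(dft(signal), fft(signal))))  # tiny rounding error
```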
Memory-bound
Use case:
- Java
Don’t use Java :) Distribute, check the memory complexity of your code (is it …?), reduce redundancy in your data structures, and stop allocating massive amounts of unnecessary memory. Use Rust (joking).
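Here is a small standard-library illustration of "stop allocating unnecessary memory". `tracemalloc` measures the peak Python heap usage of two hypothetical functions that compute the same sum: one builds the full list first (O(n) extra memory), and the other streams values through a generator (O(1) extra memory).

```python
import tracemalloc


def peak_bytes(func):
    """Hypothetical helper: run func() and return peak traced heap allocation in bytes."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    finally:
        tracemalloc.stop()
    return peak


def total_with_list(n):
    # Materializes all n squares in a list before summing.
    return sum([i * i for i in range(n)])


def total_with_generator(n):
    # Produces one square at a time; nothing is kept around.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    n = 1_000_000
    print("list:     ", peak_bytes(lambda: total_with_list(n)), "bytes peak")
    print("generator:", peak_bytes(lambda: total_with_generator(n)), "bytes peak")
```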