
Performance optimization

Bottleneck taxonomy: CPU/network/disk/GPU/memory-bound

Types of limitations

When you suspect your software is slow in some places, the first thing you need is numbers, because feelings are not scientific.

If you cannot measure it, you cannot improve it. ~ often attributed to Lord Kelvin

How? Benchmarks and metrics collection (Prometheus, Weights & Biases, htop, asitop, nvtop, etc.)
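The snippet below is a minimal sketch of the "benchmarks" part using only Python's standard library `timeit`. `slow_sum` and `fast_sum` are hypothetical functions under test. They produce the same result at very different cost. Taking the minimum of several repeats, rather than the mean, filters out noise from other processes.

```python
import timeit

def slow_sum(n):
    # Explicit Python loop: one interpreter step per element, O(n).
    total = 0
    for i in range(n):
        total += i
    return total

def fast_sum(n):
    # Closed-form formula for 0 + 1 + ... + (n - 1): O(1).
    return n * (n - 1) // 2

def best_time(fn, *args, repeat=5, number=10):
    """Best average seconds per call over `repeat` runs of `number` calls each."""
    timings = timeit.repeat(lambda: fn(*args), repeat=repeat, number=number)
    return min(timings) / number

if __name__ == "__main__":
    n = 100_000
    assert slow_sum(n) == fast_sum(n)  # same answer, different cost
    for fn in (slow_sum, fast_sum):
        print(f"{fn.__name__}: {best_time(fn, n) * 1e6:.1f} µs per call")
```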

CPU-bound

Use case:

  • Running a compute-heavy function repeatedly, in a loop or recursively

Make sure you use all the CPU cores, and all the hardware threads of each core (SMT/hyper-threading). If that is not enough, distribute the load over multiple machines. Check the performance and complexity of your code: is it $O(n)$, $O(n^2)$, …?
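The snippet below is a minimal sketch of using every core from Python with the standard-library `concurrent.futures.ProcessPoolExecutor`. It uses processes rather than threads because, in the default CPython build, the GIL stops threads from running Python bytecode in parallel. The workload, counting primes by trial division, is a hypothetical stand-in for your own CPU-bound function. The helper names `count_primes`, `split_range` and `parallel_count_primes` are also illustrative.

```python
import math
import os
from concurrent.futures import ProcessPoolExecutor

def count_primes(bounds):
    """CPU-bound work: count primes in [lo, hi) by trial division."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count

def split_range(limit, parts):
    """Split [0, limit) into `parts` contiguous chunks of near-equal size."""
    if limit <= 0:
        return []
    step = -(-limit // parts)  # ceiling division
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]

def parallel_count_primes(limit, workers=None):
    # os.cpu_count() reports logical CPUs, i.e. hardware threads across all cores.
    workers = workers or os.cpu_count() or 1
    chunks = split_range(limit, workers)
    # Each chunk runs in its own process, so chunks can execute on separate cores.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, chunks))

if __name__ == "__main__":  # guard required on platforms that spawn workers
    print(parallel_count_primes(100_000))
```

This sketch creates one chunk per worker to keep overhead low. The work is uneven, because testing a prime gets more expensive as `n` grows, so splitting into more, smaller chunks may balance the cores better.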

Network-bound

Use case:

  • Downloading large amounts of data (torrents, etc.)

Get a better network connection, or distribute the transfers across different networks.
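A back-of-the-envelope way to check whether you really are network-bound is to compare the throughput you measure with the link's nominal capacity. If utilization is close to 1, the link itself is likely the bottleneck, and only more bandwidth or more links will help. The function names and the `efficiency` factor are hypothetical. The sketch also ignores protocol overhead and latency.

```python
def transfer_time_s(size_bytes, bandwidth_bits_per_s, efficiency=1.0):
    """Estimated seconds to move `size_bytes` over a link of the given nominal speed.

    `efficiency` (0 < e <= 1) is an assumed fraction of nominal bandwidth achieved.
    """
    if bandwidth_bits_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and 0 < efficiency <= 1")
    return size_bytes * 8 / (bandwidth_bits_per_s * efficiency)

def link_utilization(bytes_moved, elapsed_s, bandwidth_bits_per_s):
    """Fraction of the link's nominal capacity used by a measured transfer."""
    if elapsed_s <= 0 or bandwidth_bits_per_s <= 0:
        raise ValueError("elapsed time and bandwidth must be positive")
    return bytes_moved * 8 / elapsed_s / bandwidth_bits_per_s

if __name__ == "__main__":
    # 100 GB over 1 Gbit/s vs 10 Gbit/s, at full nominal speed
    for gbps in (1, 10):
        print(f"{gbps} Gbit/s: {transfer_time_s(100e9, gbps * 1e9):.0f} s")
```

At full speed, 100 GB takes 800 s (about 13 minutes) over 1 Gbit/s and 80 s over 10 Gbit/s.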

Disk-bound

Use case:

  • Writing a bunch of files, logging

Check the read/write speed of your disk, and distribute the I/O across several disks or machines.
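The snippet below is a rough sketch of checking sequential write speed from Python. The function name `write_throughput_mb_s` and its defaults are illustrative. Pass `directory` to point it at the disk you actually care about: the default temp directory can be RAM-backed (e.g. tmpfs) on some systems, and would then report memory speed instead. The `os.fsync` call asks the OS to flush the data to the device, so the timing is not just the page cache.

```python
import os
import tempfile
import time

def write_throughput_mb_s(total_mb=64, block_kb=1024, directory=None):
    """Write `total_mb` MiB to a temporary file in `directory`; return MiB/s including fsync."""
    block = os.urandom(block_kb * 1024)  # random bytes, so compression cannot help
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push data out of the OS cache to the device
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return blocks * block_kb / 1024 / elapsed

if __name__ == "__main__":
    print(f"{write_throughput_mb_s():.0f} MiB/s")
```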

GPU-bound

Here GPU-bound means limited by GPU computation, not by GPU memory (VRAM). If you are GPU-memory-bound, distribute the model across several GPUs (e.g. GPT-3: 175B parameters, about 350 GB of weights in 16-bit precision :)).
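The snippet below is a weights-only back-of-the-envelope for the GPU-memory case. The 80 GB per-GPU figure is an assumption, so substitute your card's VRAM. Real workloads also need room for activations, KV caches or optimizer state, so treat the result as a lower bound. The helper names are hypothetical.

```python
import math

def weights_gb(n_params, bytes_per_param):
    """Memory for the model weights alone, in GB (10**9 bytes)."""
    return n_params * bytes_per_param / 1e9

def min_devices(n_params, bytes_per_param, vram_gb):
    """Lower bound on GPUs needed just to hold the weights (no activations or optimizer state)."""
    return math.ceil(n_params * bytes_per_param / (vram_gb * 1e9))

if __name__ == "__main__":
    gpt3_params = 175_000_000_000
    for name, nbytes in (("fp32", 4), ("fp16", 2)):
        print(name, weights_gb(gpt3_params, nbytes), "GB ->",
              min_devices(gpt3_params, nbytes, vram_gb=80), "GPUs of 80 GB (assumed)")
```

For 175B parameters, this gives 700 GB (at least 9 such GPUs) in fp32 and 350 GB (at least 5) in fp16.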

Use case:

  • Deep learning models optimized for GPU (transformers for example)
  • Blockchain mining
  • Radio processing (Fourier transforms), cryptanalysis
  • Cybersecurity

Check the performance and complexity of your code: is it $O(n)$, $O(n^2)$, …?
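To make the complexity question concrete for the Fourier-transform use case, here is a pure-Python sketch that runs on the CPU and assumes no GPU library. It compares a naive DFT, $O(n^2)$, with a radix-2 Cooley-Tukey FFT, $O(n \log n)$. Both compute the same transform, so switching algorithms can save more than adding hardware would. `dft` and `fft` are illustrative names, and `fft` assumes the input length is a power of two.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform: n outputs, each a sum of n terms -> O(n^2)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT -> O(n log n). len(x) must be a power of two."""
    n = len(x)
    if n <= 1:
        return [complex(v) for v in x]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

if __name__ == "__main__":
    signal = [0, 1, 2, 3, 4, 5, 6, 7]
    print(max(abs(a - b) for a, b in zip(dft(signal), fft(signal))))
```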

Memory-bound

Use case:

  • Java

Don’t use Java :) More seriously: distribute, and check the memory complexity of your code: is it $O(n)$, $O(n^2)$, …? Reduce redundancy in your data structures and stop allocating large amounts of unnecessary memory. Or use Rust (joking).
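One common form of "allocating unnecessary memory" is materializing a whole collection when a stream would do. The snippet below is a minimal sketch that measures this with the standard-library `tracemalloc`. The function names are hypothetical. Only Python-heap allocations are traced, so the numbers are indicative rather than exact process memory.

```python
import tracemalloc

def sum_squares_list(n):
    # Materializes all n squares at once: O(n) extra memory.
    squares = [i * i for i in range(n)]
    return sum(squares)

def sum_squares_stream(n):
    # Generator expression: squares are produced and discarded one by one,
    # so extra memory stays roughly constant.
    return sum(i * i for i in range(n))

def peak_bytes(fn, *args):
    """Peak traced Python heap allocation (bytes) while running fn(*args)."""
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

if __name__ == "__main__":
    n = 1_000_000
    assert sum_squares_list(n) == sum_squares_stream(n)
    for fn in (sum_squares_list, sum_squares_stream):
        print(f"{fn.__name__}: peak {peak_bytes(fn, n) / 1e6:.2f} MB")
```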
