Distributed Computing

# [[Epistemic status]]
#shower-thought #to-digest

# Changelog
```dataview
TABLE WITHOUT ID file.mtime AS "Last Modified"
FROM [[#]]
SORT file.mtime DESC
LIMIT 3
```

# Related

# TODO
> [!TODO] TODO
> clean up this garbage

# Distributed Computing
#computing

In the early days of computing, we ran programs in a single process, on a single thread. As data and computing needs grew, we needed bigger machines, until Moore's law slowed down and forced us to find a new way to scale. Now we distribute computing over several machines and/or several threads.

There are different ways to distribute computing, but one thing is sure: it adds a layer of complexity, because we need to partition the data over "nodes" (threads or machines), aggregate the results, and so on.

## Distribution strategies

### Three dimensional data

In some cases, your data can be tied to a three-dimensional location (x, y, z). In that case the most obvious strategy is to divide the data over regions, using a spatial data structure such as an octree (closely related to a bounding volume hierarchy).

![[Bounded Volume Hierarchy]]

Bottlenecks: (TODO)
- Data points aren't evenly spread, e.g. imagine 90% of the data points are in a single region

### Basic (TODO fix name)

Imagine we don't care about how the data is divided: we just **partition** it and distribute the partitions over nodes.

Typical example: [Kafka](https://kafka.apache.org/), which splits each topic into partitions.
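
To make the "just partition it" idea concrete, here is a minimal sketch of key-based hash partitioning in Python. It is not Kafka's actual partitioner; the names `partition_for` and `partition_records`, the use of MD5 as a stable hash, and the sample records are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash (hypothetical helper)."""
    # Python's built-in hash() is salted per process for strings, so a digest
    # is used here to get the same result on every node.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_records(records, num_partitions):
    """Group (key, value) records by the partition their key maps to."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


# Made-up sample data: records sharing a key always land in the same partition.
records = [("user-1", "login"), ("user-2", "click"), ("user-1", "logout")]
parts = partition_records(records, num_partitions=4)
```

Each entry of `parts` could then be handed to a different node. Note that this scheme can suffer from the same kind of skew as the spatial case: if most records share one key, one partition ends up with most of the data.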