A Disjoint Set Union (DSU) data structure can be used to maintain knowledge of the connected components of a graph, and query for them quickly. In particular, we would like to support two operations:

`dsu.find(node x)`

, which outputs a unique id so that two nodes have the same id if and only if they are in the same connected component.`dsu.union(node x, node y)`

, which draws an edge`(x, y)`

in the graph, connecting the components with id`find(x)`

and`find(y)`

together.

To achieve this, we keep track of `parent`

, which remembers the id of a smaller node in the same connected component. If the node is it’s own parent, we call this the leader of that connected component.

A naive implementation of a DSU structure would look something like this:

```
class DSU(object):
def __init__(self):
self.parent = range(1001) # maximum number of entries
def find(self, x):
while self.parent[x] != x: # While x isn't the leader
x = self.parent[x]
return x
def union(x, y):
self.parent[find(x)] = find(y)
```

We use two techniques to improve the run-time complexity: path compression, and union-by-rank.

Path compression involves changing the `x = parent[x]`

in the `find`

function to `parent[x] = find(parent[x])`

. Basically, as we compute the correct leader for x, we should remember our calculation. After path compression, subsequent `find`

will return the id in two steps.

Union-by-rank involves distributing the workload of find across leaders evenly. Whenever we `dsu.union(x, y)`

, we have two leaders `xr`

, `yr`

and we have to choose whether we want `parent[x] = yr`

or `parent[y] = xr`

. We choose the leader that has a higher following to pick up a new follower. Specifically, the meaning of rank is that there are less than `2 ^ rank[x]`

followers of `x`

. This strategy can be shown to give us better bounds for how long the recursive loop in `dsu.find`

could run for.

```
class DSU(object):
def __init__(self):
self.par = range(1001)
self.rnk = [0] * 1001
def find(self, x):
if self.par[x] != x:
self.par[x] = self.find(self.par[x])
return self.par[x]
def union(self, x, y):
xr, yr = self.find(x), self.find(y)
if xr == yr:
return False
elif self.rnk[xr] < self.rnk[yr]:
self.par[xr] = yr
elif self.rnk[xr] > self.rnk[yr]:
self.par[yr] = xr
else: # if both leader have the same rank, increase the rank by 1
self.par[yr] = xr
self.rnk[xr] += 1
return True
```

# Complexity

Using both path compression, splitting, or halving and union by rank or size ensures that the amortized time per operation is only \(O(\alpha (n))\), which is optimal, where \(\alpha (n)\) is the inverse Ackermann function. This function has a value \(\alpha (n)<5\) for any value of n that can be written in this physical universe, so the disjoint-set operations take place in essentially **constant time**.

The space complexity is \(O(n)\) because of the extra space used by `parent`

and `rank`

. For more information, see wikipedia.