A Disjoint Set Union (DSU) data structure can be used to maintain knowledge of the connected components of a graph, and query for them quickly. In particular, we would like to support two operations:
dsu.find(node x), which outputs a unique id so that two nodes have the same id if and only if they are in the same connected component.
dsu.union(node x, node y), which draws an edge
(x, y)in the graph, connecting the components with id
To achieve this, we keep track of
parent, which remembers the id of a smaller node in the same connected component. If the node is it’s own parent, we call this the leader of that connected component.
A naive implementation of a DSU structure would look something like this:
class DSU(object): def __init__(self): self.parent = range(1001) # maximum number of entries def find(self, x): while self.parent[x] != x: # While x isn't the leader x = self.parent[x] return x def union(x, y): self.parent[find(x)] = find(y)
We use two techniques to improve the run-time complexity: path compression, and union-by-rank.
Path compression involves changing the
x = parent[x] in the
find function to
parent[x] = find(parent[x]). Basically, as we compute the correct leader for x, we should remember our calculation. After path compression, subsequent
find will return the id in two steps.
Union-by-rank involves distributing the workload of find across leaders evenly. Whenever we
dsu.union(x, y), we have two leaders
yr and we have to choose whether we want
parent[x] = yr or
parent[y] = xr. We choose the leader that has a higher following to pick up a new follower. Specifically, the meaning of rank is that there are less than
2 ^ rank[x] followers of
x. This strategy can be shown to give us better bounds for how long the recursive loop in
dsu.find could run for.
class DSU(object): def __init__(self): self.par = range(1001) self.rnk =  * 1001 def find(self, x): if self.par[x] != x: self.par[x] = self.find(self.par[x]) return self.par[x] def union(self, x, y): xr, yr = self.find(x), self.find(y) if xr == yr: return False elif self.rnk[xr] < self.rnk[yr]: self.par[xr] = yr elif self.rnk[xr] > self.rnk[yr]: self.par[yr] = xr else: # if both leader have the same rank, increase the rank by 1 self.par[yr] = xr self.rnk[xr] += 1 return True
Using both path compression, splitting, or halving and union by rank or size ensures that the amortized time per operation is only \(O(\alpha (n))\), which is optimal, where \(\alpha (n)\) is the inverse Ackermann function. This function has a value \(\alpha (n)<5\) for any value of n that can be written in this physical universe, so the disjoint-set operations take place in essentially constant time.
The space complexity is \(O(n)\) because of the extra space used by
rank. For more information, see wikipedia.