Parallel detection of SCC
A directed graph is strongly connected if there is a path between every pair of vertices. A strongly connected component (SCC) of a directed graph is a maximal strongly connected subgraph.
We can find all SCCs serially in O(|V|+|E|) time using Tarjan's or Kosaraju's algorithm. Here we'll use the FW-BW-Trim algorithm to find all strongly connected components in parallel.
FW-BW-Trim algorithm
Let’s say we pick a vertex u (the pivot) in the graph. We represent its descendants (the vertices discovered in the forward sweep) as FW(u) and its ancestors (the backward sweep) as BW(u). Then the graph can be partitioned into 4 disjoint sets -
- SCC containing u: FW(u) ∩ BW(u)
- Vertices only in FW: FW(u) \ BW(u)
- Vertices only in BW: BW(u) \ FW(u)
- Remaining vertices in graph: V \ (FW(u) ∪ BW(u))
Additional SCCs must be completely contained within one of the three partitions. Thus, we recursively apply the algorithm to each of the three partitions besides the pivot’s SCC. We utilize task parallelism for this.
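As a concrete illustration, one FW-BW step can be sketched with two plain reachability sweeps. This is a serial sketch under assumptions: the graph is given as adjacency lists `adj` and reverse-adjacency lists `radj`, and all names here are ours, not the actual implementation's.

```julia
# Iterative DFS collecting all vertices reachable from `pivot` in `adj`.
function reach(adj, pivot)
    seen = falses(length(adj))
    stack = [pivot]
    seen[pivot] = true
    while !isempty(stack)
        u = pop!(stack)
        for v in adj[u]
            if !seen[v]
                seen[v] = true
                push!(stack, v)
            end
        end
    end
    return Set(findall(seen))
end

# Partition the vertex set around `pivot` into the four sets described above.
function fwbw_partition(adj, radj, pivot)
    fw = reach(adj, pivot)          # descendants: forward sweep
    bw = reach(radj, pivot)         # ancestors: backward sweep
    scc     = intersect(fw, bw)     # SCC containing the pivot
    fw_only = setdiff(fw, bw)
    bw_only = setdiff(bw, fw)
    rest    = setdiff(Set(1:length(adj)), union(fw, bw))
    return scc, fw_only, bw_only, rest
end
```

For example, on a 5-vertex graph with edges 1→2, 2→1, 2→3, 4→1 and an isolated vertex 5, picking pivot 1 yields the SCC {1, 2}, forward-only set {3}, backward-only set {4}, and remainder {5}.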
An optimization to the algorithm is the Trim step which removes all trivial SCCs before starting FW-BW. The procedure is simple: all vertices that have an indegree or outdegree of zero are removed. Trimming can also be performed recursively, as removing a vertex will change the effective degrees of its neighbors.
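A single (non-recursive) trim pass can be sketched as follows. The adjacency/reverse-adjacency inputs and the `-1` sentinel for trimmed vertices are assumptions of this sketch, not the actual implementation:

```julia
# One-pass trim: a vertex with in-degree or out-degree zero cannot be part
# of a non-trivial SCC, so mark it with the sentinel -1 and skip it in FW-BW.
function trim!(color, adj, radj)
    for u in 1:length(adj)
        if color[u] == 0 && (isempty(adj[u]) || isempty(radj[u]))
            color[u] = -1   # trivial SCC
        end
    end
    return color
end
```

Running this recursively would re-examine the neighbors of each removed vertex with its edges ignored, since removal changes their effective degrees.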
Code
`color[u]` represents the set vertex `u` belongs to. Initially, all vertices belong to the same set (`color[u] = 0`), and we divide this set recursively.
While benchmarking, I found that recursive trim did not give any significant speedup, so we'll only perform a single iteration of trim. All trimmed vertices have `color[u] = -1`.
We use a color generator which generates a unique color for marking descendants (`cfw`), ancestors (`cbw`), and the vertices belonging to the pivot's SCC (`cscc`). The color generator is basically an atomic variable which is incremented every time we need a unique color.
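A minimal sketch of such a generator, using `Threads.Atomic` from Julia's standard threading support (the names `next_color` and `getcolor` are ours):

```julia
# Atomic counter backing the color generator. Color 0 is reserved for the
# initial "everything in one set" state, so fresh colors start at 1.
const next_color = Threads.Atomic{Int}(0)

# Threads.atomic_add! returns the value *before* the increment, so adding 1
# yields a color that is unique even when many tasks call this concurrently.
getcolor() = Threads.atomic_add!(next_color, 1) + 1
```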
We use DFS for traversing the graph starting from the pivot, although we can also use BFS.
`@spawn` marks any piece of the program for execution, and a task will be started to run that code automatically on an available thread. To read more about `@spawn`, refer here.
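A small sketch of the task-parallel pattern used here: spawn one task per sub-partition and wait for all of them. The `process` function is a placeholder for the recursive FW-BW call on one partition:

```julia
using Base.Threads: @spawn

# Placeholder for the real per-partition work (recursive FW-BW).
process(part) = sum(part)

parts = [1:3, 4:6, 7:9]

# Spawn one task per partition; each may run on any available thread.
tasks = map(parts) do p
    @spawn process(p)
end

# fetch blocks until a task finishes and returns its result.
results = fetch.(tasks)
```

With a single thread this still runs correctly, just without any actual parallelism.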
The new visitor functions in LightGraphs simplify the code. We just have to import `newvisitfn!`, which is called whenever a new vertex is discovered during the traversal, and modify our traversal state in it. All visitor functions return a `VisitorReturnValue`: `VSUCCESS` indicates that the traversal should continue normally, while `VSKIP` means that we skip exploring the current neighbor and move on to the next branch.
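To show the shape of this visitor pattern, here is a self-contained imitation — not the actual LightGraphs API, whose types and signatures differ; only the `VSUCCESS`/`VSKIP` return values mirror it, and all other names are ours:

```julia
@enum VisitorReturnValue VSUCCESS VSKIP

mutable struct ColoringState
    color::Vector{Int}
    mark::Int            # color to assign during this sweep, e.g. cfw
end

# Called when edge u -> v discovers vertex v; colors it if it is still
# unclaimed, otherwise tells the traversal to skip that branch.
function newvisitfn!(s::ColoringState, u, v)
    if s.color[v] != 0
        return VSKIP     # already colored (another set, or trimmed)
    end
    s.color[v] = s.mark
    return VSUCCESS
end

# Minimal DFS driver that consults the visitor's return value.
function traverse!(s, adj, pivot)
    s.color[pivot] = s.mark
    stack = [pivot]
    while !isempty(stack)
        u = pop!(stack)
        for v in adj[u]
            newvisitfn!(s, u, v) == VSUCCESS && push!(stack, v)
        end
    end
    return s
end
```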
Results
I benchmarked the code with 6 threads on many real-world graphs and observed a slowdown of about 2-3x compared to Tarjan's algorithm. The slowdown was probably due to the following reasons -
- Most real graphs have a single large SCC, which causes the algorithm to essentially run sequentially. Since we spawn a task for every color, if there's only a single color with a large vertex set, parallelism is reduced. One solution to increase parallelism is to use parallel BFS for traversal, but this causes a lot of overhead.
- Overhead in using `@spawn`: I observed that the code allocated a lot more memory when using `@spawn`; I'm not sure what caused this.