DFO can be considered to be the result of DFS (depth-first search). The latter, along with BFS (breadth-first search), are two different ways to walk a graph. They both can visit all graph's nodes, as shown on this figure (a graph, DFS and BFS):

The main difference between these two methods is that the first one tries to walk the graph by moving between different child levels first, while the second (BFS) visits all children on the same level first, and then moves to the next level. Of course, this explanation is informal – refer to your discrete math book to find out more.
DFS is also useful when we want to classify the graph's edges. A graph may be quite complex, with many edges connecting nodes in a certain way. The next four categories are possible (this classification evolves naturally from DFS algorithm):
- tree edges;
- forward edges;
- back edges;
- cross edges.
However, SBCL uses DFS to find out the independent components and their elements only, and it does not classify the connections between them. Taking this into account, readers should probably consider DFS to be just a tree walk in our case.
We choose sb-c::find-initial-dfo (called in sb-c::%compile, 'src/compiler/main.lisp') as a starting point for the code discussion. It receives the list which contains the exactly one functional – the one we are compiling. Then, it does the next steps:
- fetches the component of the functional, and a list of functionals which are in this component;
- iterates over that list, calling sb-c::dfo-scavenge-dependency-graph for each functional;
- collects the component which was returned by sb-c::dfo-scavenge-dependency-graph (this collection does not happen always – the same component is not collected twice);
- deletes the original component of the input functional, in case the component's kind is :initial;
- after the components are collected, the last two actions: blocks numbering and components classification (in sb-c::separate-toplevelish-components) complete the function body.
Let us move to sb-c::dfo-scavenge-dependency-graph. What is bad about DFO code, is the individual functions size, which is impossible to fit into the book's standard page. So, please find its source (there is also the excellent comment) and then continue with this chapter.
There are four main branches in this function: the first performs a simple check to find out whether the functional is already in the given component, and returns that component. The second merges the old non-initial component into the given component. Third branch is related to the second, it checks the block's 'flag' slot to find the initial components which are already walked during DFO computation (this slot value is set in sb-c::find-initial-dfo-aux).
The most interesting and long branch is the fourth one. Here the function builds DFO for the functional and its elements. It should do the next things to accomplish that:
- move the input clambda structure instance from its old component to the new, as well as clambda bind and return blocks;
- call sb-c::find-initial-dfo-aux (see below);
- declare several local functions: 'scavenge-closure-var', 'scavenge-lambda', 'scavenge-entry' and so on. They all use sb-c::dfo-scavenge-dependency-graph directly or indirectly – after the certain simple checks are done;
- iterate over the dependency list of the input clambda (more precisely, over 'calls-or-closes' sparse set elements) and call one of the inner functions to scavenge every element. Eventually this ends with the recursive call into sb-c::dfo-scavenge-dependency-graph. This is a key point to understand: Python is merging all pieces of the entire functional representation into the single component recursively. In case you understand this sentence, you understand what is DFO computation in SBCL.
Function sb-c::find-initial-dfo-aux is shown here:

As you see, the code is quite simple. The strategy is to join the block's component with the given input component, unless the block's component is initial, the same as the input component, or the block is already processed by this function (its 'flag' slot is set). Actual flag setting occurs in the default branch, which is full of the interesting stuff. First of all, we have a call into sb-c::scavenge-home-dependency-graph here:

This function finds the right component to merge the block's successors into (recall that a block which contains bind node is the first in the functional blocks chain). In sb-c::find-initial-dfo-aux all these successors are recursively processed, and their common component is derived finally. The tiny function sb-c::remove-from-dfo (it is located in ir1util.lisp, as well as sb-c::add-to-dfo) is used to break the old next/prev relations of the block, and sb-c::add-to-dfo is used to insert the block after the component's head block.
The actual code which joins two components, sb-c::join-components, is quite simple and long enough to omit it in this book. We believe that readers also find this function to be easy understandable.
The last function to discuss in this chapter is sb-c::separate-toplevelish-components. It is called as a last form in sb-c::find-initial-dfo. Its job is to classify the toplevel components which were discovered during DFO computation. The function iterates over the list of these components, and creates three lists of them, by category: nontop-components, top-components and hairy-top-components. Classification rules are based on the analysis of the external references and toplevel functionals presence in a component.
As for now, that is all about DFO computation in SBCL. We will meet it again later, in sb-c::dfo-as-needed, but at this time it is enough to know that DFO in Python provides the certain connections between all elements of the same Lisp form representation in the compiler. These connections allow easy and consistent walking of the entire functional, which is important for many compilation-related algorithms.
The DFO code is quite clear, there are even wrappers like
ReplyDelete(scavenge-call (called-lambda)
(scavenge-lambda called-lambda))
to make the things obvious :)
My flow graph dumper module is still unfinished, mainly because I write a lot of CL code at work, and I am too tired in the evening to give it a try. But anyway I see that the dumper is required, so I hope to start it soon. But I cannot give it a clear estimate.
Chapter 5 will be here in 2-4 weeks. I think that the hard pieces (early IR1) are done, and the rest of the book will go fine. Since we are coming closer to the compilation itself, it will be more interesting for me to write about all that stuff.