Tuesday, July 26, 2011

Callgraph

There remains, as of this blog post, one major regression from the original dehydra-based DXR, and that is the lack of support for callgraph. The main problem I have here is trying to pin down the scope of the feature. To give an idea of why this hard, I pose the following challenge: how many different ways are there in C++ of invoking a function, as would be viewed from the generated assembly code? Here is my list:

Global function
Invocation of non-virtual member
Nonvirtual invocation of virtual member
Constructor, copy constructor invocations
Implicit destructor invocation
Functors (objects that implement operator())
C++0x lambdas (which appears to be a functor)
Count all of these as "one," if you like, since they're all still more or less boil down to a call addr instruction, so it really boils down to how pedantic you get about the differences. Most of these look fairly distinct in source code level as well too. But, on the plus side, being a statically-known function call, all of these are easy to handle.
Virtual member function invoked virtually
This is clearly different at an assembly level, since it requires a vtable lookup with the added possibility of thunking. From a handling perspective, though, this is more or less statically known, since we know the possible targets of a virtual method call are limited to subclasses of the method, if we assume that people aren't sneaky.
Function pointers
Easy to dip into, impossible to think about off the top of your head (how many tries does it take you to write the type of a function pointer that returns a function pointer without using typedefs?), and it requires data-flow analysis to solve. The main question is if it's even worth trying to at least plan to support function pointers, or if they should just be left out entirely.
Pointer-to-member function pointers
Theoretically easier than function pointers (because we again have a more limited state space), but they still suffer from the same need to data flow analysis. Not to mention that they are much harder to use, and that their implementations are much different from function pointers.
Template dependent name resolution
Templates, as far as C++ is concerned, is essentially a macro system that is more powerful and more inane than the C preprocessor. So if the expression that is to be called is dependent on the type parameter, then the function call can be any (or all) of the above depending on what was passed in.
throw
The entire exception handling semantics in C++ essentially gets handled by the ABI as a combination of magic function calls and magic data sections; undoubtedly a throw itself is a function call which gets to do stack unwinding. Most of this stuff isn't exposed in the code, it just leaks out in semantics.

One goal I've had in mind is to make sure I can do some reasonable support for dynamic languages as well as static languages. To that end, I've looked up other callgraph implementations to see what they do in the realm of indirect function calls. The answer is a mixture of either "do runtime instrumentation to build the callgraph" or "ignore them altogether." The closest I've seen is one implementation that indicated when the address of a function was taken.

2 comments:

jmdesp said...

I'd like to give you a useful feedback, but I don't really know which one.
Maybe you should remember it's best to not try to solve every problem at once. Of course, it's a pity to make a choice that makes it harder to add functionnalities later, but if you've already given it a thought and don't see how to organize it better, just go with it I think.
It would be good though if you manage to handle templates correctly and if the architectures allows to have a list of possible caller/callee in case we can't really know which one is the right one.

Anonymous said...

I'm curious: What other implementations did you look up?