Monday, January 25, 2010

Detecting function inlining

By definition, compiler optimizations should only affect the performance of the application. The behavior of the optimized code should be the same as the non-optimized code. If this is the case, how is it possible to detect whether a function has been inlined or not? Dear reader, read on (that's what readers do, right?).

Before we continue, let's take a step back and ask ourselves why it would ever be useful to detect function inlining. Let's take an example. Imagine the following code is part of a large application developed at a company where we pair program. The function createObject() allocates an object on the stack (that is, the object is a local variable of createObject) and returns a pointer to it:
int function() {
    Object* o = createObject();
    return o->field;
}
Although it's bad practice to return a pointer to a local variable, function() works and does precisely what you'd expect: it creates an object and returns one of its fields. As developers with good taste, we decide to clean up the code a bit by introducing a getter, getField(), for field:
int function() {
    Object* o = createObject();
    return o->getField();
}
We compile the code and run the tests. There is a green light. Everything is good. We commit the code to the repository.

Not so fast! After a while the automated build system reports a whole lot of failing tests. What happened?

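Here's what happened: the object lives in createObject's stack frame, and that frame is gone once createObject returns, so o points into stack memory that is no longer in use. In the original version nothing touched that memory before o->field was read. The call to getField changes that: the call pushes a return address and sets up a new stack frame in the same region of the stack where the object used to be, so the object is overwritten before its field is read. Good team players as we are, we do not want to have failing builds for long, so we revert our last check-in and start to inspect the problem more closely.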

We quickly determine that the only failing builds were the non-release builds, that is, the builds that do not optimize the code. We stare into the air and ask ourselves... why?

At this point one of us remembers a blog post he read the night before about function inlining. He explains: "If the call to createObject is inlined into function, then the object is allocated as a local variable in function instead of in createObject. This means that when getField is called, the object is not overwritten. It's not overwritten because it's allocated in a different place on the stack: in a different stack frame, to be more precise."

The other developer thinks about this for a few seconds and replies "ok, then let's just allocate the object on the heap instead of the stack". Before the blog-reading developer can say "memory leak" he adds "and, of course, delete it when we're done with it".

After a few minutes of coding and half an hour of waiting for the compiler to finish, we have a solution that works on both optimized and non-optimized builds. We commit the code and leave for the day.

When we get back the next morning, there is an angry email in our inbox. Apparently our little fix made the performance of the application degrade horrendously: we had placed a call to new inside an extremely tight loop. Doh! Stupid us. A better solution is needed. In order to understand the problem properly, the pair programmers summarize:
  • On optimized builds, createObject is inlined, so stack allocation works fine. With stack allocation the performance of the application is acceptable.
  • On non-optimized builds, createObject is not inlined, so stack allocation does not work and we have to resort to heap allocation. However, heap allocation is too slow to be acceptable.
When the problem is so clearly stated, the blog-reading developer exclaims "let's allocate on the stack if the function is inlined, and allocate on the heap otherwise!"

OK, now let's leave those two developers; I think they can handle it by themselves now. Instead, I'm going to explain how to do the trick the blog-reading developer suggested.

Every function call gets a stack frame, which contains the function's local variables. On 32-bit x86, when the compiler keeps frame pointers, the address of the current stack frame is held in a register called EBP (extended base pointer; its 64-bit counterpart is RBP). With GCC, this register can be read using the following code:
void* stack_frame() {
    // GCC-specific: binds ebp to the EBP register (32-bit x86, frame pointers enabled).
    register void* ebp asm("ebp");
    return ebp;
}
This function returns the pointer to the stack frame that is created when entering the function. So, as you can see, it's really simple to get the pointer to the current stack frame. Let's make another function and provide it with the pointer to the stack frame of its caller:
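bool is_inlined(void* callers_ebp) {
    // If this function is inlined, its body runs in the caller's frame,
    // so EBP still holds the caller's frame address.
    register void* my_ebp asm("ebp");
    return my_ebp == callers_ebp;
}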
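This function is kind of magic because it will return true if it's inlined and false otherwise. When it's inlined, its body runs in the caller's stack frame, so EBP is unchanged; when it's called normally, its prologue points EBP at a new frame. Using this we can write a function that allocates on the stack if the function is inlined, and allocates on the heap otherwise: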
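Object* createObject(void* callers_ebp) {
    register void* my_ebp asm("ebp");
    if (my_ebp == callers_ebp) {
        // Inlined: this local lives in the caller's stack frame.
        // (Formally still a dangling pointer; compilers typically warn.)
        Object local;
        return &local;
    }
    // Not inlined: the local would die with our frame, so use the heap.
    return new Object();
}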
Pretty sweet.

Note that the EBP register is not used as a frame pointer when compiling with -O2 or -O3 unless you use -fno-omit-frame-pointer; without that flag the compiler is free to use EBP as a general-purpose register. It should be possible to do the same trick without EBP by using the ESP register instead, but I haven't tested that.
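For comparison, here is a minimal sketch of the same frame-comparison idea that swaps in GCC/Clang's __builtin_frame_address(0) instead of naming the EBP register. It assumes GCC or Clang. As far as I know, both compilers keep a frame pointer in any function that uses this builtin, which sidesteps the -fno-omit-frame-pointer caveat. The always_inline and noinline attributes pin down both cases regardless of optimization level. The names is_inlined_always, is_inlined_never, InlineProbe and probe are hypothetical, chosen for illustration only.

```cpp
// Assumes GCC or Clang: __builtin_frame_address and the always_inline /
// noinline attributes are compiler extensions, not standard C++.
// All names below are hypothetical, for illustration only.

// Forced to be inlined: its body runs inside the caller's frame, so
// __builtin_frame_address(0) yields the caller's frame address.
__attribute__((always_inline)) inline bool is_inlined_always(void* callers_frame) {
    return __builtin_frame_address(0) == callers_frame;
}

// Forced to stay a real call: it gets its own frame, so the addresses differ.
__attribute__((noinline)) bool is_inlined_never(void* callers_frame) {
    return __builtin_frame_address(0) == callers_frame;
}

struct InlineProbe {
    bool always_inlined;  // result reported by is_inlined_always
    bool never_inlined;   // result reported by is_inlined_never
};

// The caller passes the address of its own frame, the role EBP plays above.
// Both results are combined after the calls, so neither call is in tail position.
InlineProbe probe() {
    void* my_frame = __builtin_frame_address(0);
    InlineProbe p;
    p.always_inlined = is_inlined_always(my_frame);
    p.never_inlined = is_inlined_never(my_frame);
    return p;
}
```

With this sketch, probe().always_inlined is expected to be true and probe().never_inlined false. One caveat: if the compiler turns a call into a tail call, the callee may reuse the caller's frame address and the check can report "inlined" for a function that wasn't. That is why probe does work after both calls instead of simply returning the callee's result.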

(Disclaimer: I'm not suggesting that this is a good way of solving the problem the two developers in the story faced. What I do say is that in those rare circumstances when the behavior of a function needs to change when it's inlined, this is one way to do it.)

2 comments:

Giorgio said...

Isn't returning a pointer to a variable allocated on the stack a recipe for disaster? I mean, at the next function call it's going to be overwritten...

Torgny said...

You're right, the local variable will (possibly) be overwritten on the next function call. However, in the first code snippet in the post no function call was made between creating the object and reading the field, so the local variable was not corrupted.

Anyway, I agree with you and I'm not suggesting that returning a pointer to a local variable is a good thing. I only tried to give some background for my story. :)