Post a Comment On: cbloom rants

"06-23-11 - Map File Graphviz"

5 Comments

1 – 5 of 5
Blogger jfb said...

The problem you'll find is that compilers, when optimizing, don't respect the boundaries of functions. This information does not meaningfully exist if you use whole-program optimization for example.

What you might do is use -O0 to determine the graph and -O3 to determine the sizes after inlining.

Anyway, gcc or Clang's -ffunction-sections will place each function in its own section in an ELF file. This means from the compiler's perspective that each function could be relocated separately in memory.

The result is that the relocation table for each function shows all functions it needs to call that have not been inlined. This will give you exactly what you want -- either use readelf (careful if capturing its output -- it clips function names to fit a table width) or write an ELF parser. It's not that bad of a format to parse - I did it for a (C#) project recently.
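The relocation-table idea above can be sketched as a small post-processor over `readelf -r` text output. The sample input below is hypothetical (modeled on readelf's usual layout, where `-ffunction-sections` gives one relocation section per function's `.text.<name>` section); the function and symbol names are made up for illustration.

```python
import re

# Hypothetical excerpt of `readelf -r` output for an object built with
# -ffunction-sections: one relocation section per function's text section.
SAMPLE = """\
Relocation section '.rel.text.LZ_CopyMatch' at offset 0x2f0 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000004  0000070a R_ARM_THM_CALL    00000000   memcpy
00000010  0000080a R_ARM_THM_CALL    00000000   LZ_Flush

Relocation section '.rel.text.main' at offset 0x310 contains 1 entry:
 Offset     Info    Type            Sym.Value  Sym. Name
00000002  0000090a R_ARM_THM_CALL    00000000   LZ_CopyMatch
"""

def call_edges(readelf_text):
    """Map each function (named by its .text.<name> section) to the symbols
    its relocations reference, i.e. the calls that were not inlined."""
    edges = {}
    current = None
    for line in readelf_text.splitlines():
        m = re.match(r"Relocation section '\.rela?\.text\.(\w+)'", line)
        if m:
            current = m.group(1)
            edges[current] = []
        elif current and re.match(r"[0-9a-f]{8}\s", line):
            # Relocation entry line: the last column is the symbol name.
            edges[current].append(line.split()[-1])
    return edges

print(call_edges(SAMPLE))
# → {'LZ_CopyMatch': ['memcpy', 'LZ_Flush'], 'main': ['LZ_CopyMatch']}
```

The edges can then be emitted straight as graphviz `a -> b;` lines. A real script would pipe `readelf -r foo.o` in rather than using an embedded sample.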

Looking at some test code of mine, -ffunction-sections doesn't appear to affect the code quality much, but that may depend on platform. I'm testing on ARM Thumb 2, and it outputs R_ARM_THM_CALL relocations for the linker (relative jump).

Alternatively, see if libclang has an API for dependencies. :)

June 26, 2011 at 8:22 AM

Blogger ryg said...

"This information does not meaningfully exist if you use whole-program optimization for example."
Now that's just wrong.

The compiler will choose to inline some functions completely, yes, but as long as you have symbolic debug information you can infer the actual call graph directly from the final binary, function-level linking or not. That may not correspond 1:1 to the source code but it's still useful.

It would be much better if linkers actually stored the extents for everything in the MAP-file (or, better in terms of getting information out, PDB for VC++). As it is *some* stuff has both position and size, some is just labels without size information, and things like jump tables for switches, compiler-generated immediate constants and multiple-inheritance this-pointer-adjustment thunks don't appear at all. If the symbol table was complete just the start positions would be fine, but with the holes you sometimes have to guess size based from the start address of the next symbol and that can be off badly.
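The next-symbol size guess described above can be sketched in a few lines; the symbol list here is hypothetical, and as noted, any unlabeled hole (jump table, thunk, literal pool) gets silently charged to whichever symbol precedes it.

```python
def estimate_sizes(symbols, segment_end):
    """Next-symbol heuristic for map files that give only start addresses:
    a symbol's size is guessed as the gap to the next symbol's start.
    Holes in the symbol table inflate the guess for the preceding symbol."""
    by_addr = sorted(symbols, key=lambda s: s[0])
    sizes = {}
    for (addr, name), (next_addr, _) in zip(by_addr, by_addr[1:] + [(segment_end, None)]):
        sizes[name] = next_addr - addr
    return sizes

# Hypothetical symbols from a map file: (start address, name).
syms = [(0x1000, "main"), (0x1040, "LZ_CopyMatch"), (0x1200, "LZ_Flush")]
print(estimate_sizes(syms, segment_end=0x1280))
# → {'main': 64, 'LZ_CopyMatch': 448, 'LZ_Flush': 128}
```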

Function-level linking is definitely the way to go for SPU projects (you want the full monty: "-ffunction-sections -fdata-sections -Wl,--gc-sections"). It shouldn't affect code quality at all; it does increase GNU ld link times substantially (since ld is slow), but if you're linking projects with <120k code+data that's a non-issue :).


June 26, 2011 at 1:56 PM

Blogger jfb said...

Hardly wrong. "It may not correspond to the source code"... yes... that's true. Here's an example from what I'm doing on ARM Thumb 2 with Clang.

It inlines most of my functions. I start with about 30 (4KB of code space :) and end up with about 3.

It identifies functions with no side effects. It moves them out of loops all the way up into main().

It identifies cases where I am using int32s as booleans. It replaces these with bytes. Even if it's 0 and 123, not 0 and 1.

I suggest reading http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html ... believe me, compilers work wonders these days. GCC probably does as well. For me, whole-program cuts the code size to a third of the original.

There's not much of the original structure left, so unless C++ inhibits a lot of the optimizations I'm getting in C, I expect you'll find the same -- -ffunction-sections and ELF parsing will get you what you need; whole-program optimizes too much to keep it.

By the way, the most important take-away from the article linked above: since signed int overflow is undefined, if you want to bounds check you've got to cast to unsigned first. I was a bit surprised they went that far, seeing how many platforms are two's complement with rollover these days.

June 26, 2011 at 5:18 PM

Blogger cbloom said...

Come on, I'm sure ryg is well aware of undefined behavior in C. There's no need to get pedantic.

The issue is what you can figure out even after optimization, and how often that information is still useful.

What you really want is to look at the binary size using all the real optimizations that you will use in shipping, including all inlining etc.

In fact that's pretty crucial; one of the things that kills you with SPU ELFs is inlining. One of the things that was getting me is "memcpy" was getting inlined in 4 different places in my code, so I had 4 full copies of that routine (which is rather large on the SPU).

Ideally I'd see that "LZ_CopyMatch" is large and I'd be able to see that it's so big because "memcpy" got inlined into it.

I believe that you can figure this out pretty well. You can compile first without inlining and generate your whole call graph. Then compile again with your real shipping settings to get the actual function sizes.

That way you could see that "LZ_CopyMatch" was quite large, and you could see that "memcpy" was inlined into it. You couldn't actually tell how much of that largeness was due to memcpy, but it's a start.
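The two-pass idea can be sketched as a join between the no-inlining call graph and the shipping-build symbol sizes; callees that vanish from the optimized symbol table are flagged as suspected inlining bloat. The input data here is hypothetical, echoing the LZ_CopyMatch/memcpy example.

```python
def annotate(call_graph, shipping_sizes):
    """Join an -O0 call graph with shipping-build sizes: callees missing
    from the optimized symbol table are assumed inlined into their
    callers, so they are reported as suspected size contributors."""
    report = {}
    for fn, callees in call_graph.items():
        if fn not in shipping_sizes:
            continue  # the function itself was inlined away entirely
        suspects = [c for c in callees if c not in shipping_sizes]
        report[fn] = (shipping_sizes[fn], suspects)
    return report

# Hypothetical inputs: edges from a no-inlining build, sizes from the
# shipping build (memcpy is absent there because it was inlined).
graph = {"LZ_CopyMatch": ["memcpy"], "main": ["LZ_CopyMatch"]}
sizes = {"LZ_CopyMatch": 1800, "main": 300}
print(annotate(graph, sizes))
# → {'LZ_CopyMatch': (1800, ['memcpy']), 'main': (300, [])}
```

As the comment says, this shows *that* memcpy was folded into LZ_CopyMatch, not *how much* of the 1800 bytes it accounts for.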

June 26, 2011 at 5:34 PM

Blogger MSN said...

Assuming you are using COFF .obj files as inputs to the linker, COFF specifies fixups (relocations) per section which usually coincide with functions (at least with MSVC). The external fixups are the ones that you want to use as links.

Combine that with the .map file and you should get a nice dependency graph.

MSN

June 26, 2011 at 8:53 PM
