One of the hard things with turning resources into bundles is that it's hard to define an exact metric to optimize. Generally with computers
if you can come up with a simple fast measure of whether a certain configuration is best, then you can throw various techniques at it and
do well under that metric (see for example all the work on the Surface Area Heuristic for geometry).
Anyway, the problem with bundling is that the optimal setup depends very much on how the bundles are used.
If you just do a standard "load everything and immediately stall" kind of whole level loading, then it's pretty easy. In fact the optimal is
just to jam everything in the level into one bundle.
On the other hand, if you actually do paging and drop bundles in and out to reduce memory load, it's harder. Finest grain paging minimizes your
memory use, because it means you never hold a single object in memory that isn't needed. In that case, again it's easy to make optimal bundles -
you just merge together resources which are always either loaded or not loaded at the same time (eg. the list of "sets" which want them is
identical).
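Here's a minimal sketch of that finest-grain merge, assuming you already have a table of which "sets" want each resource. The function name, the resource names and the set names are all made up for illustration; none of this is Oodle's actual API.

```python
from collections import defaultdict

def finest_grain_bundles(resource_sets):
    """Merge resources whose list of wanting "sets" is identical.

    resource_sets maps a resource name to the collection of set names
    that want it. Returns one bundle (a sorted list of names) per
    distinct membership.
    """
    groups = defaultdict(list)
    for resource, sets in resource_sets.items():
        # frozenset makes the membership hashable and order-independent
        groups[frozenset(sets)].append(resource)
    return [sorted(members) for members in groups.values()]

# Hypothetical usage table: resource -> sets that want it
usage = {
    "rock.tex": {"level1", "level2"},
    "rock.mesh": {"level1", "level2"},
    "boss.mesh": {"level2"},
    "boss.anim": {"level2"},
    "hud.tex": {"level1", "level2", "menu"},
}
bundles = sorted(finest_grain_bundles(usage))
for bundle in bundles:
    print(bundle)
```

The rock texture and mesh end up in one bundle, the two boss resources in another, and the HUD texture sits alone because nothing else shares its exact membership.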
More generally, you might load a big chunk for your level, then page pieces. To optimize for that you want the level load to be a big fast chunk,
then the rest to be in pages. Also, you might not want the pages to be all finest possible grain, if you have a little memory to waste you
probably want to merge some of them up to reduce seeks, at least to merge up many very small resources together.
One of the main ways to make bundles for Oodle is just to let Oodle watch your data loads and let it make the bundles for you.
(if you don't like that, you can always completely manually specify what resource goes in which bundle). In order to make it
possible for Oodle to figure out a good bundling you can also push & pop markers on a stack to define "sets" and maybe also do some
tagging.
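To make the marker idea concrete, here's a rough sketch of a recorder that watches loads and remembers which sets were open at the time. The class and method names (LoadRecorder, push_set, pop_set, on_load) are hypothetical, invented for this example, and are not the real Oodle interface.

```python
class LoadRecorder:
    """Records, for each resource, every set that was open when it was loaded."""

    def __init__(self):
        self._stack = []  # currently open set markers, innermost last
        self.seen = {}    # resource name -> set of marker names

    def push_set(self, name):
        self._stack.append(name)

    def pop_set(self):
        return self._stack.pop()

    def on_load(self, resource):
        # Tag the resource with every marker currently on the stack
        self.seen.setdefault(resource, set()).update(self._stack)

rec = LoadRecorder()
rec.push_set("level1")
rec.on_load("rock.tex")
rec.push_set("boss_room")      # nested marker inside level1
rec.on_load("boss.mesh")
rec.pop_set()
rec.pop_set()
rec.push_set("level2")
rec.on_load("rock.tex")        # same resource seen again in another set
rec.pop_set()
```

After a run like this, rec.seen is the kind of resource-to-sets table a bundler could work from.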
One thing that's occurred to me is that even the basic idea of making bundles to just load the data fast is dependent on how you use it.
In particular, when you start trying to use the resources, and whether they're always used as a group, etc.
For the straight loading path, I believe the key feature to optimize is the amount of time that the main thread spends waiting for resources;
eg. make that time as small as possible. That's not the same as maximizing throughput or minimizing latency - it depends on the usage pattern.
For example :
Case 1.
Request A,B,C
... do stuff ...
Block on {A,B,C}
Process {A,B,C}
Case 2.
Request A,B,C
... do stuff ...
Block on A
Process A
Block on B
Process B
Block on C
Process C
These are similar load paths, but they're actually very different. In Case 1 the bundle optimizer should try to make {ABC} all available as soon as
possible. That means they should be blocked together as one unit to ensure there's no seek between them, and there's no need to make them available
one by one. In Case 2 you should again try to make ABC linear to avoid seeks, but really the most important thing is to make A available as quickly
as possible, because if some time is needed to get B, it will be somewhat hidden by processing A.
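A toy model shows the difference between the two cases. It assumes the resources are read back to back in layout order starting at the request, with no seeks, and that processing happens only on the main thread. The function names and all the timings are invented for illustration.

```python
def arrival_times(layout, read_time):
    """Time at which each resource finishes reading, given the on-disk order."""
    t = 0.0
    arrivals = {}
    for name in layout:
        t += read_time[name]
        arrivals[name] = t
    return arrivals

def stall_block_on_all(layout, read_time, do_stuff):
    """Case 1: main thread blocks once, until everything has arrived."""
    arrivals = arrival_times(layout, read_time)
    return max(0.0, max(arrivals.values()) - do_stuff)

def stall_block_one_by_one(layout, read_time, do_stuff, use_order, process_time):
    """Case 2: main thread blocks on each resource in turn, then processes it."""
    arrivals = arrival_times(layout, read_time)
    now = do_stuff
    stalled = 0.0
    for name in use_order:
        wait = max(0.0, arrivals[name] - now)
        stalled += wait
        now += wait + process_time[name]  # later reads overlap this processing
    return stalled

read_time = {"A": 2, "B": 2, "C": 2}     # made-up units
process_time = {"A": 2, "B": 2, "C": 2}
do_stuff = 1

case1_abc = stall_block_on_all("ABC", read_time, do_stuff)
case1_cba = stall_block_on_all("CBA", read_time, do_stuff)
case2_abc = stall_block_one_by_one("ABC", read_time, do_stuff, "ABC", process_time)
case2_cba = stall_block_one_by_one("CBA", read_time, do_stuff, "ABC", process_time)
print(case1_abc, case1_cba, case2_abc, case2_cba)
```

With these numbers Case 1 stalls for 5 units no matter how the bundle is ordered, while Case 2 stalls for 1 unit with A first and 5 units with A last. So the same bundle contents can be fine or terrible depending on the order the game blocks on them.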
Anyway, I'm kind of rambling and I'm not happy with all of this.
I've got a lot of complication currently because I'm trying to support a lot of different usages. One of those
usages that I've been trying to support is the "flat load".
"Flat Load" = a game resource compiler knows how to write the bits to disk exactly the way that the game wants them
in memory. This lets you load up the resource and just point directly at it (maybe fix some pointers internal to
the resource, but ideally not). This was a big deal back in the old days; it allows "zero CPU use" streaming - you
just fire an async disk op, then when it's done you point at it. We did this on Xbox 1 for Stranger and it was crucial
to being able to seamlessly page so much of the game.
This is obviously the super fast awesome way to load resources, but I don't think that many people actually do this
any more. And it's less and less important all the time. For one thing, almost everybody is compressing data now
to have better bandwidth, so the "flat load" is a bit of a myth - you're actually streaming the data through a
decompressor. Almost every system now is multi-core, both on PCs and consoles, and people can afford to devote maybe
25% of one core to loading work, so the necessity of having "zero CPU use" streaming which the Flat Load offers is
going away.
Anyway, the reason the Flat Load is so complicated is because it means I have to support the case that the application
is pointing into the bundle memory. That means a lot of things. For systems with "cpu" and "gpu" separate memory
regions, it means I have to load the pieces of the bundle into the right regions. It means I need to communicate
with the app to know when it's okay for me to free that memory.
It also creates a huge issue for Bundle unloading and paging because you have to deal with the "resource transfer" issue.
I won't even get into this, but it's the source of much ridicule from Casey and the cause of some of our biggest pain
at Oddworld.
Anyway, I think I might get rid of the "Flat Load" completely and just assume that my job is just to page in the
resource bits, and the game will spin on these bits, and then I can throw them away. That lets me make things a
lot simpler at the low level, which would let me make the high level easier to use and neater.
I think my selling point will really be in all the neat friendly high level tools, like the load profiler, memory
use visualizer, disk-watcher for demand loading, console file transfers, all that kind of jazz.
BTW :
"Data Streaming" = Bringing in bits of data incrementally and processing or showing the bits as they come (eg. not
just waiting until all the data is in before it is shown). eg. Videos "stream".
"Data Paging" = Bringing in and out bits of data based on what's needed. When you run around a big seamless world game
like GTA it is "paging" in objects - NOT STREAMING.
"01-07-09 - Oodle Rambling"