I’m working on developing a novel programming language, working title ‘aevum’. As part of that process, I’ll be writing a series of articles about various aspects of language design and development.
Every long running computer program is going to need to obtain a variety of resources, and those resources are almost always finite. Memory, file handles, threads, GPU resources – all of these are relatively scarce, and exhausting the available supply will have dire consequences, anywhere from killing the program, to crashing the computer on which it runs.
Given this scarcity, it is essential that we can dispose of these resources as soon as we are finished using them (or at least, before they are needed elsewhere). Although that sounds simple enough, it turns out that there are a couple of hurdles to overcome.
The first hurdle relates to ownership. As long as every resource is owned exactly once (i.e. a member variable of one object, or a local variable to one function), then disposal is trivial – a resource is disposed of as soon as it’s parent is disposed of. But requiring single ownership of every object comes with disadvantages of its own: with strict single ownership you can’t easily maintain cyclic data structures such as bi-directional lists, graphs or trees.
On the other hand, if you elect to allow multiple ownership, you are then faced with the problem of how to determine when a resource is actually no longer being used. Obviously you can’t dispose of it as long as even a single owner still exists, but how do you determine that the last owner is gone? You can explicitly keep track of the list of owners for each resource (a la reference counting), at the expense of both storage and performance, or you can at regular intervals scan the entire system to determine objects without owners (a la tracing garbage collectors), at the cost of determinism and performance.
Manual resource disposal
Manual resource disposal was once a staple of imperative languages, and while one might hope that it would be included here as a mere historical footnote, that is sadly not the case. The majority of garbage collected languages (including Java, Python and C#, to name but a few), make little explicit provision for the disposal of scarce non-memory resources. Why they do offer some support for locally-scoped resources (python’s with statement, or C#’s using statement), long-lived or shared resources have to be managed by manual calls to a dispose method, potentially augmented by a manual reference counting system.
Why is this less than ideal? Primarily because it places the burden of resource disposal squarely on the programmer. Not only does it require a significant amount of additional code, but forgetting to call the disposal methods in one of any number of places will prevent the disposal of those resources.
Tracing garbage collection
Tracing garbage collectors are the bread and butter of modern programming languages. I’m not going to describe the workings in detail – there are many resources on the subject. Suffice it to say that at regular intervals we trace through all reachable objects, mark them as live, and dispose of everything else. The typical criticism of garbage collectors is that they are not deterministic, and collection may occur at any time, interrupting the normal executing of the program. While that represents a serious problem in hard real-time applications, there are a variety of ways to work around the problem, and I am mostly interested in a more subtle manifestation of the same issue.
The tracing portion of a garbage collection cycle has to touch all reachable memory, and the collection phase has to free every orphaned object, both of which may take a significant amount of time. For this reason, garbage collection is typically delayed until the system detects that it is about to run out of memory. That’s all very well, but what if it isn’t memory we are about to run out of? The garbage collector doesn’t know anything about, say, file handles, so even if you run out of file handles, as long as there is plenty of memory (and modern machines have plentiful memory) garbage collection won’t be triggered.
The typical solution to this problem is to require manual disposal of non-memory resources, which results in the same drawbacks we have already discussed above.
Reference counting
Reference counting is the darling of the C++ world, and sees pretty wide use even in other languages. Again, I’m not going to describe the workings in detail, but the basics are to attach a reference count to each object, increment that count each time a reference to the object is created, and decrement the count each time such a reference is destroyed. If the reference count drops to zero you know that there are no references to the object, and it can be deleted immediately.
Reference counting offers one key advantage over garbage collection: all objects can be disposed as soon as they are no longer needed. This is excellent from the perspective of disposing non-memory resources, but it unfortunately goes hand-in-hand with a number of flaws.
The first flaw is that reference counting in vulnerable to cycles, where objects A and B either directly or indirectly refer to each other, thereby preventing the reference count of either from ever reaching zero. This flaw is further is compounded by the fact that many common data structures (doubly-linked lists, bi-directional trees, cyclical graphs, etc.) all involve such circular references. We can mitigate this by making the programmer define which single references actually confers ownership (strong vs weak references), but this adds significant mental overhead for the programmer, and just emulates the ideal case of single ownership. We can also allow cycles, and run cycle detection to break these cycles, but that is roughly equivalent to garbage collection, and shares the same drawbacks.
The second flaw is that not only does the need to attach a reference count to every object consume additional memory, but the near-constant increment/decrement of reference counts also puts a considerable strain on the cache. This can be reduced by careful optimization of where reference counts are updated, and by deferring updates so as to batch them together, but the former adds to programmer complexity, and the latter drastically reduces the benefit of immediate disposal.
ARC (automatic reference counting)
Apple deployed reference counting as a crucial part of their API design with the advent of Mac OS X and their Objective-C frameworks. Initially this reference counting had to be done through manual calls to retain/release methods, and with the addition of some supporting constructs (such as autorelease pools) and a strongly documented convention for programmers to follow, this was very successful (albeit a little tedious).
After a brief foray into traditional garbage collection (which failed to meet the needs of the fast-growing iPhone ecosystem), they hit on a simpler idea: what if the compiler could perform reference counting for you? Mechanically following the same conventions provided to the programmer, and augmented by a couple of language annotations to influence behaviour, the compiler can guarantee to implement correct reference counting, and the optimiser can strip back out most redundant calls to remove much of the overhead thus introduced.
In general it is a very neat system, the major drawback being that it still relies on the programmer to annotate weak references correctly, and there remains some overhead in maintaining the necessary reference counts.
Rust
There are a number of interesting resource management ideas knocking around in the Mozilla foundation’s Rust, namely unique pointers, borrowed references, and automatic region management.
Unique pointers are declared with a ~ (tilde character), and they uniquely own the object they point to. As in, no other object holds a pointer to it. If you assign one unique pointer to another, the contents are transferred (not copied) and the original no longer points to the owned object. Unique pointers make resource management a piece of cake, because if the pointer uniquely owns its contents, then we can destroy the contents as soon as the pointer goes out of scope.
Of course, as I mentioned in the introduction, there are a whole class of data structures which are very hard to create with only unique pointers, and that’s where borrowed references come in. Rust lets you declare a reference with the & (ampersand character), and this can be used to ‘borrow’ a reference from a unique pointer. The borrowed reference refers to the same thing the unique pointer does, but it does not own the referenced object, and the compiler guarantees that the reference will not outlive the unique pointer (thus never impacting resource collection at all).
Since our references must be statically guaranteed not to outlive the matching unique pointer, we’d be quite limited in what we could do with these references. For example, we wouldn’t be able to store such a reference in a container, because the lifetime of the container might outlast our unique pointer. And this is why we need automatic region management: regions define scopes within which lifetimes exist, and by limiting the reference to the region containing the unique pointer, we guarantee that the reference cannot outlive the pointer. But regions are hierarchical, and automatically created for every scope, so that as long as a container is owned by a child region of the region holding our unique pointer, we can add references to that container freely, secure in the knowledge that the container too will not outlive the original unique pointer.
And the best part is that the compiler can statically determine a bunch of this at compile time, and hand you back nice little errors when you violate regions, thus avoiding most of the runtime overhead. There are of course limitations, the programmer still has to be cognisant of the difference between unique pointers and borrowed references, and attempts to move referenced objects will occasionally induce head-scratching compiler errors. But overall, it is a very solid approach to purely deterministic resource disposal with minimal overhead.
Can we do better?
Maybe.
Apple’s ARC and Rust’s unique/borrowed/region system are both very promising approaches to improving the accuracy and performance of resource disposal, while lowering the cognitive load on the programmer and reducing the incidence resource leaks. And they both avoid the crippling restrictions on programmers imposed by classical research approaches, such as complete immutability of data or linear types. However, both continue to have some cognitive overhead, and both are relatively new approaches with the possibility of as yet unforeseen complications.
But for now, the trend of compiler-centric resource disposal approaches seems to be here to stay.
Very clear & interesting summary!
I wonder if it’s possible (in a new language) for the alternative resource management strategies you mentioned to all be possible, as different implementations of “macros” or “inline types and procedures” which spell out the storage management strategies to be used by the datatypes in the rest of the program.
That is, in these macros, data normally only known to a compiler would have an explicit name, e.g. an object representing the lexical scope of a reference and knowing its properties (e.g. lifetime, as a node in a hierarchy of lifetimes like in Rust).
A pointer would have both an address and a lifetime of validity, and a pointer-slot (any lvalue able to contain a pointer, as understood by some lexical context) would either permit storing this lifetime explicitly (itself a pointer to a special object), or not store the lifetime but rather assume a specific lifetime. Storing a pointer into the latter type of slot would effectively alter its lifetime, and the macro that did this would assert that this alteration only reduced the lifetime or left it unchanged, never increased it (relative to the symbolic DAG of lifetime objects it understands).
(I am mostly thinking out loud here — I have thought about this a bit, but never worked out the details of an approach like this.)
This is to clarify what I was proposing yesterday.
I imagine a language in which small objects defined by user-coded classes can be passed around with the expectation that, when they’re used locally enough and with types declared well enough, the compiler will inline their storage (e.g. store their fields in registers and know their class implicitly) and sometimes optimize away some of their fields, and inline all their methods.
I propose that every lvalue, whether explicit (struct field or local variable) or implicit (for passing arguments or containing return values or intermediate results of subexpressions), also has an object like that, user-supplied from the lexical context, representing that value-slot. Also, every scope for lvalues (stack frame, smaller lexical scope, struct, storage region) has an associated “value-scope object”.
So up to three objects can be involved with one value stored in one slot: the value, the slot (aka lvalue), and the scope.
Then whenever the compiler wants to copy or move or swap some values, it compiles a generic function call on the involved lvalue slots; their types (classes) determine how that copy or move is implemented; that implementation code also has access to the value-scope object containing/controlling that slot, and of course to the value itself.
(It also compiles constructors and destructors for all these objects; those also have access to the relevant containing objects.)
Finally, we want the compiler to optimize well enough that in most cases, it knows the types of all objects mentioned, so it needn’t store their classes explicitly, and it knows all the method implementations at compile time, and inlines them (and further optimizes the resulting code). (I don’t know whether the current optimization techniques can quite do this, but it doesn’t sound impossible. Certainly there’s been a lot of progress on that, e.g. in pypy.)
The intent is that by properly coding those classes and methods, we could imitate any of the resource management strategies you described. If that is possible, then by further evolving their code we could experiment with fancier methods, and/or let the user declare which strategy to use with certain application-level types.
I hate to say it so bluntly because you have an excellent blog here that reflects the work of a talented developer, but here goes.
In serious game development (which I assume you are referring to since you mention GPU management), if you find yourself trying to figure out who owns a resource, you’re doing it wrong.
It doesn’t matter if it’s the memory representing player state, or a one-shot buffer on a sound card, the owner of a resource is “the indexed, fixed-sized pool I built to manage that resource”. By indexed, I mean you use names (strings, enums, 4-char codes…) to refer to your resources, and make sure all your code can deal with a lookup on that name where the resource has been pulled out from under your feet.
There’s a lot more to say about this approach. But in short, level load is the time to allocate all your pools. After level load, you shouldn’t be allocating or freeing anything except as needed within a pool’s management of the fixed resource. This is what I’ve found in the A-level console games I’ve worked on. I’ve seen A-level PC games where they get this quite wrong, and the resulting bugs and performance problems are bad.
In short, language tools won’t help you much. How do you treat the 25th bullet when you only have room for 24 – do you keep the newest one and kill the oldest? Or ignore the newest? Or prioritize based on whether a bullet is approaching a target? That code has to be written by you.