Adding run-time type information to the GNU toolchain and run-time

Sep 9, 2017



Run-time type information is of potential use to debuggers, run-time tracing or checking tools, dynamic linking, language interoperability techniques, garbage collectors and a variety of other tools and services. Our toolchains presently lack any common approach to this. C has no such notion, while in C++ and higher-level languages each language or implementation invents its own. The result is a familiar catalogue of omissions and ad-hoc workarounds: conservative garbage collection, guesswork in debuggers, fragile manual approaches to interface description (consider strace or ltrace, or countless FFI systems), and plain old "doing without" useful features or checks (e.g. sanitizer-style tools could do more checking given a notion of run-time type). A relatively low-level notion of run-time type, roughly at the ABI level, could add value in all these areas in a uniform, process-wide fashion.

I'll describe my research work on liballocs, a runtime and toolchain extension embodying one approach to this. Currently the system consists of link-time tooling and a runtime library, with a little source-to-source tooling to support C (and some C++). The basic idea is to postprocess DWARF debugging information into optional, loadable "meta-DSOs" containing type definitions and other information (including information about heap allocation sites) in an efficient representation. The liballocs runtime, which logically sits very close to the dynamic linker, exposes a process-wide meta-level protocol for in-memory objects and their allocators. Its primary operation lets clients query what lies at the end of an arbitrary pointer. More generally, the protocol offers an abstract view of the address space in the form of "typed allocations".

A key challenge is establishing what the memory objects are, given a multitude of allocators. For this, liballocs has a hierarchical model: top-level allocations are memory mappings, "leaf" allocations contain user data (malloc chunks, stack frames, data segments, etc.), and intermediate levels are allocator-owned. Allocators are pluggable, accommodating very different metadata indexing methods (e.g. heap versus stack) and also allowing for user-defined allocators, although default implementations handle most custom heap allocators out of the box. Currently, the overheads are usually (but not always) below 10% of execution time, plus a modest cost in memory; these can undoubtedly be reduced further.

Various applications have so far been prototyped on top of liballocs: instrumentation for dynamically checking pointer casts; run-time pointer queries during debugging; a metaprogramming-style interface description available to tracing tools; and FFI-less interop with dynamic languages (prototyped in a hacked version of V8). I'll give an overview of the system and existing applications, including some demos. I'll also detail some remaining rough edges and work needed, and discuss ideas for integrating it more closely into (most likely) gcc, glibc and gdb. I will be extremely keen to obtain feedback, explore overlap with other type- or checking-flavoured tooling (e.g. libabigail, sanitizers), and engage on the potential for ABI-level working standards for type information. The research work has been published at Onward! '15 [1] and OOPSLA '16 [2].
