Finding the Dwarf: Recovering Precise Types from WebAssembly Binaries

Jun 17, 2022

About

The increasing popularity of WebAssembly as a compilation target creates a demand for understanding and reverse engineering WebAssembly binaries. An important first step in this process is to recover the types of functions in the binary. Unfortunately, there is currently no automated approach for obtaining type information beyond the four built-in, low-level types of WebAssembly. This paper presents SnowWhite, a learning-based approach for recovering precise, high-level parameter and return types for WebAssembly functions. SnowWhite distinguishes itself from prior work on other binary formats by representing the types to predict in an expressive type language. This language can describe a large number of complex types, instead of the fixed and usually small type vocabulary used in prior binary type prediction approaches. As types are sentences in the type language, we formulate the prediction as a sequence prediction task and build on the success of neural sequence-to-sequence models. We evaluate SnowWhite on a large-scale dataset of 6.3 million type samples extracted from over 300,000 WebAssembly object files. The results show the type language to be more expressive than prior work, precisely describing 1,225 types instead of the 7 to 35 types considered previously. Despite this expressiveness, the type prediction has high accuracy, exactly predicting 44.5% (75.2%) of all parameter types and 57.7% (80.5%) of all return types within the top-1 (top-5) predictions.
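To make the top-1 and top-5 figures concrete, here is a minimal sketch of how an exact-match top-k accuracy could be computed when each type is a sentence of tokens. It is an illustration only, not SnowWhite's evaluation code: the function name `top_k_exact_match` and the token names (`pointer`, `primitive`, `int`, and so on) are assumptions made for this example and are not taken from the paper's type language.

```python
from typing import List, Sequence

# A type written as a sentence of tokens. The token names used below are
# hypothetical and only stand in for a real type language.
TypeTokens = Sequence[str]


def top_k_exact_match(
    predictions: List[List[TypeTokens]],
    truths: List[TypeTokens],
    k: int,
) -> float:
    """Fraction of samples whose true type appears, token for token,
    among the first k ranked predictions for that sample."""
    if not truths or len(predictions) != len(truths):
        raise ValueError("need one ranked prediction list per ground-truth type")
    hits = sum(
        any(tuple(candidate) == tuple(truth) for candidate in ranked[:k])
        for ranked, truth in zip(predictions, truths)
    )
    return hits / len(truths)


# Two made-up samples: ground-truth types and ranked model outputs.
truths = [
    ["pointer", "primitive", "char"],
    ["primitive", "int", "32"],
]
predictions = [
    [["pointer", "primitive", "int", "8"], ["pointer", "primitive", "char"]],
    [["primitive", "int", "32"], ["primitive", "uint", "32"]],
]

top1 = top_k_exact_match(predictions, truths, k=1)  # only the second sample matches
top2 = top_k_exact_match(predictions, truths, k=2)  # both samples match
```

Under this measure a prediction counts only if the entire token sequence matches the ground truth, which is why a richer type language makes exact prediction harder than choosing from a small fixed vocabulary.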

Interested in talks like this? Follow PLDI