Improving the Effectiveness and Generality of GCC Auto-vectorization: PoC

Sep 8, 2017

Vectorization of programs that manipulate arrays relies on knowledge of the order of array accesses. In many cases, permuting the array elements could lead to better vectorization. Identifying a profitable permutation depends on the order of accesses in the program as well as on the instructions available on the target architecture. GCC's approach to instruction selection, which is based on standard pattern names, fails to identify effective permutations because it takes a local view of GIMPLE statements, considering each one separately. In addition, the strategy GCC adopts to implement SLP (Superword Level Parallelism) improves the effectiveness of vectorization only to a limited extent. Finally, most of the time these suboptimal vector instructions cannot be combined later in the combiner pass to generate the desired instructions. The vector instructions generated depend on transformations the loop has undergone, and these transformations are irreversible at the RTL level because loop information is lost by then.

We propose to improve auto-vectorization in GCC by making it possible to examine related GIMPLE statements in a loop together, and by using the information so derived to transform the computations in two orthogonal ways: (a) we propose an alternative mechanism to implement SLP by coalescing the ASTs of independent computations; (b) we try to decide where permutations should be effected, and which ones, based on knowledge of the available target instructions.

Our approach improves the generality of vectorization along three dimensions: it is processor agnostic; it facilitates consideration of all possible permutations, as against a small set of empirically observed ones; and it generalizes the SLP transformations using a simple and generic intuition. It also increases the effectiveness of vectorization because the generated program is more efficient.

