Apr 4, 2021
The rapid demand for memory and computational resources by the emerging complex applications requires multi-core parallel systems capable to scale the execution of these applications. In this paper, we propose a distributed graph-theoretic framework for automatic parallelization in multi-core systems, where the goal is to minimize the data communication while accounting for intrinsic functional interdependence and balancing the workload among cores to improve the overall performance. Specifically, we design a generic and flexible greedy-based vertex cut framework for partitioning the LLVM IR graphs into clusters while taking into consideration the data communication and workload balance among clusters. Then, we map the clusters generated by the vertex cut algorithms onto a non-uniform memory access multi-core platform. Experimental results demonstrate that our proposed WB-Libra algorithm provides performance improvements of 1.56x and 1.86x over existing state-of-the-art approaches for 8 and 1024 clusters running on a multi-core platform, respectively.
The Conference on Machine Learning and Systems targets research at the intersection of machine learning and systems. The conference aims to elicit new connections amongst these fields, including identifying best practices and design principles for learning systems, as well as developing novel learning methods and theory tailored to practical machine learning workflows.
Professional recording and live streaming, delivered globally.
Presentations on similar topic, category or speaker