Apr 4, 2021
Current hyperparameter tuning solutions lack complementary execution engines that efficiently leverage distributed computation; they ignore the possibility of intra- and inter-GPU sharing and therefore use resources poorly. In this paper, we present FluidExec, a generalized hyperparameter tuning execution engine that coordinates between hyperparameter tuning jobs and cluster resources. FluidExec schedules the evaluation trials in such jobs using a water-filling approach, making the best use of resources at both intra- and inter-GPU granularities to speed up the tuning process. By abstracting a hyperparameter tuning job as a sequence of TrialGroups, FluidExec can boost the performance of diverse hyperparameter tuning solutions. Our experiments show that FluidExec speeds up synchronous BOHB by 200%, and BOHB and ASHA by 30%, while reaching similar final accuracy.
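The abstract does not detail FluidExec's scheduler, but the water-filling idea can be illustrated with a minimal sketch: each trial is placed on the currently least-loaded GPU, so load "fills up" evenly across devices, combining intra-GPU sharing (several light trials on one device) with inter-GPU spreading. The function name `water_fill` and the relative-cost model below are illustrative assumptions, not FluidExec's actual API.

```python
# Hypothetical sketch of a water-filling trial placement policy.
# Assumption: each trial has a relative resource cost; FluidExec's
# real scheduler is more sophisticated than this greedy variant.
import heapq

def water_fill(trial_costs, num_gpus):
    """Assign each trial (indexed by position in trial_costs) to a GPU,
    always topping up the least-loaded device first.
    Returns {gpu_id: [trial indices]}."""
    heap = [(0.0, gpu) for gpu in range(num_gpus)]  # (load, gpu_id)
    heapq.heapify(heap)
    placement = {gpu: [] for gpu in range(num_gpus)}
    for i, cost in enumerate(trial_costs):
        load, gpu = heapq.heappop(heap)   # least-loaded GPU so far
        placement[gpu].append(i)
        heapq.heappush(heap, (load + cost, gpu))
    return placement

# Example: six trials of varying cost spread over two GPUs
print(water_fill([1.0, 1.0, 2.0, 0.5, 0.5, 1.0], 2))
```

The greedy least-loaded rule is the simplest way to realize water-filling; the resulting per-GPU loads end up as close to equal as the trial costs allow.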
The Conference on Machine Learning and Systems targets research at the intersection of machine learning and systems. The conference aims to elicit new connections amongst these fields, including identifying best practices and design principles for learning systems, as well as developing novel learning methods and theory tailored to practical machine learning workflows.