Dec 6, 2022
Recent years have witnessed significant success in Gradient Boosted Decision Trees (GBDT) for a wide range of machine learning applications. GBDT training algorithms generally take it for granted that gradients and statistics are computed in high-precision floating point. In this paper, we investigate an important question that has been largely ignored by the previous literature: how many bits are needed to represent gradients when training GBDT? To answer it, we propose to quantize all the high-precision gradients in a very simple yet effective way within the GBDT training algorithm. Surprisingly, both our theoretical analysis and empirical studies show that the gradient precision needed to avoid hurting performance can be quite low, e.g., 2 or 3 bits. With low-precision gradients, most arithmetic operations in GBDT training can be replaced by integer operations of 8, 16, or 32 bits. These findings may pave the way for much more efficient GBDT training in several respects: (1) speeding up the computation of gradients and histograms; (2) reducing the cost of communicating high-precision statistical information during distributed training; (3) motivating the use and development of hardware architectures that support low-precision computation well. Benchmarked on CPU, GPU, and distributed clusters, our simple quantization strategy achieves up to 2× speedup compared with SOTA GBDT systems on extensive datasets, demonstrating the effectiveness and potential of low-precision training of GBDT.
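The abstract does not spell out the quantization scheme, so the following is only a minimal sketch of the general idea under stated assumptions: gradients are scaled by their maximum absolute value, rounded stochastically to a small signed integer range, and then accumulated into histograms with integer addition. The function names `quantize_gradients` and `build_histogram`, the max-abs scaling, and the choice of stochastic rounding are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def quantize_gradients(grads, bits=3, rng=None):
    """Map float gradients to small signed integers (hypothetical helper).

    Assumptions (not taken from the abstract): max-abs scaling and
    stochastic rounding, which keeps the quantized value unbiased.
    Returns the integer gradients and the scale needed to map them back.
    """
    rng = rng or random.Random(0)
    levels = 2 ** (bits - 1) - 1  # e.g. bits=3 gives the range [-3, 3]
    max_abs = max(abs(g) for g in grads)
    if max_abs == 0:
        return [0] * len(grads), 1.0
    scale = max_abs / levels
    quantized = []
    for g in grads:
        x = g / scale
        low = math.floor(x)
        # Round up with probability equal to the fractional part.
        q = low + (1 if rng.random() < x - low else 0)
        # Clamp to guard against floating-point overshoot at the extremes.
        quantized.append(max(-levels, min(levels, q)))
    return quantized, scale


def build_histogram(bin_ids, q_grads, num_bins):
    """Accumulate integer gradients per feature bin using integer adds only."""
    hist = [0] * num_bins
    for b, q in zip(bin_ids, q_grads):
        hist[b] += q
    return hist


if __name__ == "__main__":
    grads = [0.5, -1.2, 0.03, 0.9, -0.4, 1.2]
    q, scale = quantize_gradients(grads, bits=3)
    hist = build_histogram([0, 1, 0, 2, 1, 2], q, num_bins=3)
    # Multiply by `scale` only when a floating-point sum is needed,
    # for example when evaluating split gain.
    print(q, [h * scale for h in hist])
```

Because the per-bin sums stay integral, they could in principle be held in 8-, 16-, or 32-bit accumulators, which is the property the abstract points to for faster histogram construction and cheaper communication.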