Optimizing R VM: Allocation Removal and Path Length Reduction via Interpreter-level Specialization
H. Wang, P. Wu, and D. Padua. Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, pages 295--305. ACM, 2014.
DOI: 10.1145/2544137.2544153
Abstract
The performance of R, a popular data analysis language, was never properly understood. Some claimed their R codes ran as efficiently as any native code, others quoted orders of magnitude slowdown of R codes with respect to equivalent C implementations. We found both claims to be true depending on how an R code is written. This paper introduces a first classification of R programming styles into Type I (looping over data), Type II (vector programming), and Type III (glue codes). The most serious overhead of R are mostly manifested on Type I R codes, whereas many Type III R codes can be quite fast. This paper focuses on improving the performance of Type I R codes. We propose the ORBIT VM, an extension of the GNU R VM, to perform aggressive removal of allocated objects and reduction of instruction path lengths in the GNU R VM via profile-driven specialization techniques. The ORBIT VM is fully compatible with the R language and is purely based on interpreted execution. It is a specialization JIT and runtime focusing on data representation specialization and operation specialization. For our benchmarks of Type I R codes, ORBIT is able to achieve an average of 3.5X speedups over the current release of GNU R VM and outperforms most other R optimization projects that are currently available.
%0 Conference Paper
%1 Wang:2014:ORV
%A Wang, Haichuan
%A Wu, Peng
%A Padua, David
%B Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
%D 2014
%I ACM
%K AST Bytecode Bytecode2Bytecode Interpreter Orbit R
%P 295--305
%R 10.1145/2544137.2544153
%T Optimizing R VM: Allocation Removal and Path Length Reduction via Interpreter-level Specialization
%X The performance of R, a popular data analysis language, was never properly understood. Some claimed their R codes ran as efficiently as any native code, others quoted orders of magnitude slowdown of R codes with respect to equivalent C implementations. We found both claims to be true depending on how an R code is written. This paper introduces a first classification of R programming styles into Type I (looping over data), Type II (vector programming), and Type III (glue codes). The most serious overhead of R are mostly manifested on Type I R codes, whereas many Type III R codes can be quite fast. This paper focuses on improving the performance of Type I R codes. We propose the ORBIT VM, an extension of the GNU R VM, to perform aggressive removal of allocated objects and reduction of instruction path lengths in the GNU R VM via profile-driven specialization techniques. The ORBIT VM is fully compatible with the R language and is purely based on interpreted execution. It is a specialization JIT and runtime focusing on data representation specialization and operation specialization. For our benchmarks of Type I R codes, ORBIT is able to achieve an average of 3.5X speedups over the current release of GNU R VM and outperforms most other R optimization projects that are currently available.
%@ 978-1-4503-2670-4
@inproceedings{Wang:2014:ORV,
abstract = {The performance of R, a popular data analysis language, was never properly understood. Some claimed their R codes ran as efficiently as any native code, others quoted orders of magnitude slowdown of R codes with respect to equivalent C implementations. We found both claims to be true depending on how an R code is written. This paper introduces a first classification of R programming styles into Type I (looping over data), Type II (vector programming), and Type III (glue codes). The most serious overhead of R are mostly manifested on Type I R codes, whereas many Type III R codes can be quite fast. This paper focuses on improving the performance of Type I R codes. We propose the ORBIT VM, an extension of the GNU R VM, to perform aggressive removal of allocated objects and reduction of instruction path lengths in the GNU R VM via profile-driven specialization techniques. The ORBIT VM is fully compatible with the R language and is purely based on interpreted execution. It is a specialization JIT and runtime focusing on data representation specialization and operation specialization. For our benchmarks of Type I R codes, ORBIT is able to achieve an average of 3.5X speedups over the current release of GNU R VM and outperforms most other R optimization projects that are currently available.},
acmid = {2544153},
added-at = {2014-03-19T15:03:27.000+0100},
articleno = {295},
author = {Wang, Haichuan and Wu, Peng and Padua, David},
biburl = {https://www.bibsonomy.org/bibtex/2ccc26dd39efd5e0f8b04199ab86ec725/gron},
booktitle = {Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization},
description = {Optimizing R VM},
doi = {10.1145/2544137.2544153},
interhash = {990826b7d2a8e2a41192f97101937db6},
intrahash = {ccc26dd39efd5e0f8b04199ab86ec725},
isbn = {978-1-4503-2670-4},
keywords = {AST Bytecode Bytecode2Bytecode Interpreter Orbit R},
location = {Orlando, FL, USA},
numpages = {11},
pages = {295--305},
publisher = {ACM},
series = {CGO'14},
timestamp = {2014-03-19T15:03:27.000+0100},
title = {Optimizing R VM: Allocation Removal and Path Length Reduction via Interpreter-level Specialization},
year = {2014}
}