Author of the publication

An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization.

R. Dossa, S. Huang, S. Ontañón, and T. Matsubara. IEEE Access, (2021)

Please choose a person to relate this publication to

To distinguish between persons with the same name, the academic degree and the title of an important publication are displayed. You can also use the button next to a name to display some publications already assigned to that person.

Josef Huang

Pa Huang

Feiqing Huang

Zhida Huang

Ying Huang

Other publications of authors with the same name

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization. S. Huang, M. Noukhovitch, A. Hosseini, K. Rasul, W. Wang, and L. Tunstall. CoRR, (2024)

CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms. S. Huang, R. Dossa, C. Ye, J. Braga, D. Chakraborty, K. Mehta, and J. Araújo. J. Mach. Learn. Res., (2022)

Reward Scale Robustness for Proximal Policy Optimization via DreamerV3 Tricks. R. Sullivan, A. Kumar, S. Huang, J. Dickerson, and J. Suarez. CoRR, (2023)

EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine. J. Weng, M. Lin, S. Huang, B. Liu, D. Makoviichuk, V. Makoviychuk, Z. Liu, Y. Song, T. Luo, Y. Jiang and 2 other author(s). NeurIPS, (2022)

Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform. S. Huang, J. Weng, R. Charakorn, M. Lin, Z. Xu, and S. Ontañón. CoRR, (2023)

Zephyr: Direct Distillation of LM Alignment. L. Tunstall, E. Beeching, N. Lambert, N. Rajani, K. Rasul, Y. Belkada, S. Huang, L. von Werra, C. Fourrier, N. Habib and 4 other author(s). CoRR, (2023)

Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform. S. Huang, J. Weng, R. Charakorn, M. Lin, Z. Xu, and S. Ontañón. ICLR, OpenReview.net, (2024)

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms. S. Huang and S. Ontañón. FLAIRS, (2022)

Griddly: A platform for AI research in games. C. Bamford, S. Huang, and S. Lucas. CoRR, (2020)

Gym-µRTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement Learning. S. Huang, S. Ontañón, C. Bamford, and L. Grela. CoG, page 1-8. IEEE, (2021)

Disambiguation of "Huang, Shengyi"
