Author of the publication

Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks.

CoRR, (2018)

Other publications of authors with the same name

Blackwell Approachability and Low-Regret Learning are Equivalent. CoRR, (2010)

A Learning-Based Approach to Reactive Security. IEEE Trans. Dependable Secur. Comput., 9 (4): 482-493 (2012)

Can a Transformer Represent a Kalman Filter? CoRR, (2023)

Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency. CoRR, (2024)

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? CoRR, (2023)

REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs. UAI, page 35-42. AUAI Press, (2009)

Corrigendum to "Prediction, learning, uniform convergence, and scale-sensitive dimensions" J. Comput. Syst. Sci. 56 (2) (1998) 174-190. J. Comput. Syst. Sci., (March 2024)

An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit. ALT, volume 201 of Proceedings of Machine Learning Research, page 1166-1215. PMLR, (2023)

FLAG: Fast Linearly-Coupled Adaptive Gradient Method. CoRR, (2016)

Optimal Mean Estimation without a Variance. CoRR, (2020)