A general sample complexity analysis of vanilla policy gradient

Mar 28, 2022

About

We adapt recent tools developed for the analysis of Stochastic Gradient Descent (SGD) in non-convex optimization to obtain convergence guarantees and sample complexities for vanilla policy gradient (PG) methods, namely REINFORCE and GPOMDP. Our only assumptions are that the expected return is smooth w.r.t. the policy parameters and that the second moment of its stochastic gradient estimate satisfies a certain ABC assumption. The ABC assumption allows the second moment of the gradient estimate to be bounded by A ≥ 0 times the suboptimality gap, B ≥ 0 times the squared norm of the full-batch gradient, and an additive constant C ≥ 0, or any combination of these terms. We show that the ABC assumption is more general than the assumptions on the policy space commonly used to prove convergence to a stationary point. We provide a single convergence theorem under the ABC assumption and show that, despite its generality, we recover the 𝒪(ϵ^{-4}) sample complexity of PG. Our convergence theorem also affords greater flexibility in the choice of hyperparameters such as the step size, and places no restriction on the batch size m. Even the single-trajectory case (i.e., m = 1) fits within our analysis. We believe that the generality of the ABC assumption may provide theoretical guarantees for PG on a much broader range of problems that have not been previously considered.
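
For readers who prefer a formula, the ABC assumption described above can be sketched as a single inequality. The notation here is assumed for illustration and does not appear in the abstract: J(θ) denotes the expected return, J^* its optimal value, and \widehat{\nabla}_m J(θ) the policy gradient estimate built from m sampled trajectories. The gradient term is written as a squared norm, as is standard for second-moment bounds, and the constants may be normalized differently in the paper itself.

```latex
% Hypothetical notation (not taken from the abstract):
%   J(\theta)                     expected return under parameters \theta
%   J^*                           optimal expected return
%   \widehat{\nabla}_m J(\theta)  PG estimate from m sampled trajectories
\mathbb{E}\left[ \left\| \widehat{\nabla}_m J(\theta) \right\|^2 \right]
  \;\le\; A \left( J^* - J(\theta) \right)
  \;+\; B \left\| \nabla J(\theta) \right\|^2
  \;+\; C,
\qquad A, B, C \ge 0 .
```

Written this way, the "any combination" remark is easy to read off: setting A = B = 0, for example, reduces the condition to a uniformly bounded second moment of the gradient estimate.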

About AISTATS 2022

AISTATS is an interdisciplinary gathering of researchers at the intersection of computer science, artificial intelligence, machine learning, statistics, and related areas. Since its inception in 1985, the primary goal of AISTATS has been to broaden research in these fields by promoting the exchange of ideas among them. AISTATS encourages the submission of all papers that are in keeping with this objective.
