From e82feb7521cf5988076d307b9527bf7d547a5a02 Mon Sep 17 00:00:00 2001
From: Amit Parag <aparag@laas.fr>
Date: Mon, 21 Feb 2022 14:16:35 +0100
Subject: [PATCH] amit icra 22

---
 amit_icra_22.md | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)
 create mode 100644 amit_icra_22.md

diff --git a/amit_icra_22.md b/amit_icra_22.md
new file mode 100644
index 0000000..c42af56
--- /dev/null
+++ b/amit_icra_22.md
@@ -0,0 +1,30 @@
+---
+title: "Value learning from trajectory optimization and Sobolev descent: A step toward reinforcement learning with superlinear convergence properties"
+subtitle: IEEE ICRA - International Conference on Robotics and Automation, 2022
+author:
+- Amit Parag ^1,2^
+- Sébastien Kleff ^1,3^
+- Léo Saci ^1^
+- <a href="https://gepettoweb.laas.fr/index.php/Members/NicolasMansard">Nicolas Mansard</a> ^1,2^
+- <a href="https://homepages.laas.fr/ostasse/drupal/">Olivier Stasse</a> ^1,2^
+org:
+- ^1^ Gepetto Team, LAAS-CNRS, France.
+- ^2^ Artificial and Natural Intelligence Toulouse Institute, Toulouse.
+- ^3^ New York University, USA.
+
+hal: https://hal.archives-ouvertes.fr/hal-03356261
+peertube: https://peertube.laas.fr/videos/embed/9a3c5258-e5b7-49a5-a153-02e804a06f65
+sourcecode: https://github.com/amitparag/Kuka-arm-DvP
+...
+
+## Abstract
+
+The recent successes of deep reinforcement learning largely rely on the ability to generate massive amounts of data, which in turn implies the use of a simulator.
+In particular, current progress in multi-body dynamics simulators is underpinning the implementation of reinforcement learning for end-to-end control of robotic systems.
+Yet simulators are mostly treated as black boxes, even though we have the knowledge to make them produce richer information.
+In this paper, we propose to use the derivatives of the simulator to help the learning converge.
+To that end, we use model-based trajectory optimization to produce informative trials using first- and second-order simulation derivatives.
+These locally optimal runs give fair estimates of the value function and its derivatives, which we use to accelerate the convergence of the critic using Sobolev learning.
+We empirically demonstrate that the algorithm leads to a faster and more accurate estimation of the value function.
+The resulting value estimate is used in a model-predictive controller as a proxy for shortening the preview horizon.
+We believe this is also a first step toward superlinear reinforcement learning algorithms based on simulation derivatives, which we need for end-to-end legged locomotion.
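+
+Below is a minimal, illustrative sketch of the Sobolev-learning step described above: a value network is fit to both the value targets and the value-gradient targets produced by trajectory optimization. It is not the paper's implementation; the network architecture, the `grad_weight` trade-off, and all tensor shapes are assumptions.
+
+```python
+# Minimal sketch (assumption, not the paper's code): Sobolev training of a value network.
+# Trajectory optimization is assumed to have produced states `x`, value targets
+# `v_target` (batch,), and value-gradient targets `vx_target` (batch, state_dim).
+import torch
+import torch.nn as nn
+
+class ValueNet(nn.Module):
+    def __init__(self, state_dim, hidden=64):
+        super().__init__()
+        self.net = nn.Sequential(
+            nn.Linear(state_dim, hidden), nn.Tanh(),
+            nn.Linear(hidden, hidden), nn.Tanh(),
+            nn.Linear(hidden, 1),
+        )
+
+    def forward(self, x):
+        return self.net(x).squeeze(-1)
+
+def sobolev_loss(model, x, v_target, vx_target, grad_weight=1.0):
+    # Enable differentiation w.r.t. the state so the predicted value gradient
+    # can be computed and compared with the target gradient.
+    x = x.requires_grad_(True)
+    v = model(x)
+    vx = torch.autograd.grad(v.sum(), x, create_graph=True)[0]
+    value_err = ((v - v_target) ** 2).mean()
+    grad_err = ((vx - vx_target) ** 2).sum(dim=-1).mean()
+    # Weighted sum of value mismatch and derivative mismatch (Sobolev loss).
+    return value_err + grad_weight * grad_err
+```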
-- 
GitLab