Jie Tan

I joined the Brain team at Google in 2016, working on deep learning, reinforcement learning and robotics. Before that, I was a Member of Technical Staff in the Computational Imaging group at Lytro, working on computer vision, SLAM, light field technology and image processing. I received my PhD in computer science from Georgia Tech in 2015, under the supervision of Greg Turk and Karen Liu.

My research focused on developing computational tools to understand, simulate and control human and animal motion in complex environments. I developed fast and stable computer programs to simulate complex dynamic systems, such as fluids, soft bodies and articulated rigid bodies, and I applied optimal control and machine learning techniques to enable computers to automatically learn skills in complex physical environments.

I am also interested in transferring control policies learned in simulation to real robots. Policies learned in simulation usually perform poorly on real robots due to discrepancies between the simulated and the real system. I am developing tools to understand and model such discrepancies, and I augment physical simulation with real-world data, which not only increases simulation accuracy but also improves the real-world performance of the controllers.
Authored Publications
    Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation
    Anthony G. Francis
    Dmitry Kalashnikov
    Edward Lee
    Jake Varley
    Leila Takayama
    Mikael Persson
    Peng Xu
    Stephen Tu
    Xuesu Xiao
    Conference on Robot Learning (2022) (to appear)
    Abstract: Despite decades of research, existing navigation systems still face real-world challenges when deployed in the wild, e.g., in cluttered home environments or in human-occupied public spaces. To address this, we present a new class of implicit control policies combining the benefits of imitation learning with the robust handling of system constraints of Model Predictive Control (MPC). Our approach, called Performer-MPC, uses a learned cost function parameterized by vision context embeddings provided by Performers, a low-rank implicit-attention Transformer. We jointly train the cost function and construct the controller relying on it, effectively solving the corresponding bi-level optimization problem end-to-end. We show that the resulting policy improves standard MPC performance by leveraging a few expert demonstrations of the desired navigation behavior in different challenging real-world scenarios. Compared with a standard MPC policy, Performer-MPC achieves a 40% better goal-reached rate in cluttered environments and 65% better sociability when navigating around humans.
    Abstract: Evolution Strategy (ES) algorithms have shown promising results in training complex robotic control policies due to their massive parallelism capability, simple implementation, effective parameter-space exploration, and fast training time. However, a key limitation of ES is its scalability to large capacity models, including modern neural network architectures. In this work, we develop Predictive Information Augmented Random Search (PI-ARS) to mitigate this limitation by leveraging recent advancements in representation learning to reduce the parameter search space for ES. Namely, PI-ARS combines a gradient-based representation learning technique, Predictive Information (PI), with a gradient-free ES algorithm, Augmented Random Search (ARS), to train policies that can process complex robot sensory inputs and handle highly nonlinear robot dynamics. We evaluate PI-ARS on a set of challenging visual-locomotion tasks where a quadruped robot needs to walk on uneven stepping stones, quincuncial piles, and moving platforms, as well as to complete an indoor navigation task. Across all tasks, PI-ARS demonstrates significantly better learning efficiency and performance compared to the ARS baseline. We further validate our algorithm by demonstrating that the learned policies can successfully transfer to a real quadruped robot, for example, achieving a 100% success rate on the real-world stepping stone environment, dramatically improving on prior results that achieved 40% success.
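    As a rough illustration of how a learned representation shrinks the search space for ES, below is a minimal sketch of one ARS update applied only to a small linear policy head on top of a separately trained encoder. The encode and rollout functions are hypothetical stand-ins, not the actual PI-ARS implementation.

```python
# Minimal sketch of an ARS update over a small policy head, assuming a
# pre-trained representation encoder (the gradient-based Predictive Information
# training is omitted) and a hypothetical rollout() returning an episode return.
import numpy as np

def encode(obs):
    # Placeholder for the learned encoder: compresses a high-dimensional
    # observation into a small feature vector consumed by the linear policy.
    return obs[:8]  # assumption: first 8 dims stand in for learned features

def rollout(theta, horizon=100, obs_dim=64, seed=0):
    # Hypothetical environment loop; replace with the real robot or simulator.
    rng = np.random.default_rng(seed)
    obs, total_reward = rng.normal(size=obs_dim), 0.0
    for _ in range(horizon):
        action = theta @ encode(obs)                  # linear policy head
        obs = rng.normal(size=obs_dim) + 0.01 * np.tanh(action).sum()
        total_reward += -np.square(action).sum()      # toy reward
    return total_reward

def ars_step(theta, n_dirs=8, top_k=4, step_size=0.02, noise=0.03):
    deltas = np.random.randn(n_dirs, *theta.shape)
    r_plus = np.array([rollout(theta + noise * d) for d in deltas])
    r_minus = np.array([rollout(theta - noise * d) for d in deltas])
    # Keep only the best directions, as in Augmented Random Search.
    order = np.argsort(np.maximum(r_plus, r_minus))[::-1][:top_k]
    scale = step_size / (top_k * np.std(np.concatenate([r_plus, r_minus])) + 1e-8)
    grad = sum((r_plus[i] - r_minus[i]) * deltas[i] for i in order)
    return theta + scale * grad

theta = np.zeros((4, 8))      # small policy head: ES only searches this space
for it in range(3):
    theta = ars_step(theta)
```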
    Safe Reinforcement Learning for Legged Locomotion
    Jimmy Yang
    Peter J. Ramadge
    Sehoon Ha
    International Conference on Robotics and Automation (2022) (to appear)
    Abstract: Designing control policies for legged locomotion is complex due to underactuation and discrete contact dynamics. To deal with this complexity, applying reinforcement learning to learn a control policy in the real world is a promising approach. However, safety is a bottleneck when robots need to learn in the real world. In this paper, we propose a safe reinforcement learning framework that switches between a safe recovery policy and a learner policy. The safe recovery policy takes over control when the learner policy violates safety constraints, and hands control back when there are no future safety violations. We design the safe recovery policy so that it ensures the safety of legged locomotion while minimally interfering with the learning process. Furthermore, we theoretically analyze the proposed framework and provide an upper bound on the task performance. We verify the proposed framework in three locomotion tasks on a simulated quadrupedal robot: catwalk, two-leg balance, and pacing. On average, our method achieves 48.6% fewer falls and comparable or better rewards than the baseline methods.
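    The switching mechanism described above can be illustrated with a short sketch. The safety_critic and recovery_action functions below are hypothetical stand-ins for the learned and designed components; only the control-handover logic is the point.

```python
# Minimal sketch of learner / safe-recovery switching, assuming a hypothetical
# safety_critic(state, action) that predicts whether an action could lead to a
# future safety violation (e.g., a fall) and a hand-designed recovery_action().
import numpy as np

def safety_critic(state, action, height_limit=0.15, tilt_limit=0.4):
    # Assumption: predicted next torso height and tilt via a crude linear model.
    pred_height = state["height"] + 0.02 * action[0]
    pred_tilt = state["tilt"] + 0.05 * abs(action[1])
    return pred_height > height_limit and pred_tilt < tilt_limit  # True = safe

def recovery_action(state):
    # Hand-crafted safe recovery behavior: slow down and level the torso.
    return np.array([0.1, -np.sign(state["tilt"]) * 0.2])

def step_with_safety(state, learner_policy):
    proposed = learner_policy(state)
    if safety_critic(state, proposed):
        return proposed, "learner"             # learner keeps control
    return recovery_action(state), "recovery"  # recovery policy takes over

state = {"height": 0.18, "tilt": 0.35}
action, who = step_with_safety(state, lambda s: np.array([-0.5, 0.8]))
```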
    Abstract: We propose an end-to-end framework to enable multipurpose assistive mobile robots to autonomously wipe tables and clean spills and crumbs. This problem is challenging, as it requires planning wiping actions with uncertain latent crumb and spill dynamics over high-dimensional visual observations, while simultaneously guaranteeing constraint satisfaction to enable deployment in unstructured environments. To tackle this problem, we first propose a stochastic differential equation (SDE) to model crumb and spill dynamics and absorption with the robot wiper. Then, we formulate a stochastic optimal control problem for planning wiping actions over visual observations, which we solve using reinforcement learning (RL). We then propose a whole-body trajectory optimization formulation to compute joint trajectories that execute the wiping actions while guaranteeing constraint satisfaction. We extensively validate our table wiping approach in simulation and on hardware.
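    To illustrate the SDE modeling idea, here is a minimal Euler-Maruyama simulation of a scalar "remaining spill mass" process with absorption; the drift, diffusion, and constants are illustrative assumptions rather than the actual model.

```python
# Euler-Maruyama sketch of a stochastic differential equation for the remaining
# spill/crumb mass under wiping; all terms below are illustrative assumptions.
import numpy as np

def simulate_spill(mass0=1.0, absorb_rate=0.8, sigma=0.05, dt=0.01, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    mass = mass0
    trajectory = [mass]
    for _ in range(steps):
        wiping = 1.0  # hypothetical control input: 1.0 when the wiper covers the spill
        drift = -absorb_rate * wiping * mass           # absorption by the wiper
        diffusion = sigma * np.sqrt(mass)              # stochastic spreading
        mass = max(0.0, mass + drift * dt + diffusion * np.sqrt(dt) * rng.normal())
        trajectory.append(mass)
    return np.array(trajectory)

remaining = simulate_spill()
```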
    Learning Semantic-Aware Locomotion Skills from Human Demonstration
    Byron Boots
    Xiangyun Meng
    Yuxiang Yang
    Conference on Robot Learning (CoRL) (2022) (to appear)
    Abstract: The semantics of the environment, such as the terrain type and properties, reveal important information for legged robots to adjust their behaviors. In this work, we present a framework that learns semantics-adaptive gait controllers for quadrupedal robots. To facilitate learning, we separate the problems of gait planning and motor control using a hierarchical framework, which consists of a high-level image-conditioned gait policy and a low-level MPC-based motor controller. In addition, to ensure sample efficiency, we pre-train the perception model with an off-road driving dataset and extract an embedding for downstream learning. To avoid policy evaluation in the noisy real world, we design a simple interface for human operation and learn from human demonstrations. Our framework learns to adjust the speed and gait of the robot based on terrain semantics, using 40 minutes of human demonstration data. We continue to test the controller on different trails; at the time of writing, the robot has walked 0.2 miles without failure.
    Abstract: Reinforcement learning provides an effective tool for robots to acquire diverse skills in an automated fashion. For safety and data generation purposes, control policies are often trained in a simulator and later deployed to the target environment, such as a real robot. However, transferring policies across domains is often a manual and tedious process. In order to bridge the gap between domains, it is often necessary to carefully tune and identify the simulator parameters or select the aspects of the simulation environment to randomize. In this paper, we design a novel adversarial learning algorithm to tackle the transfer problem. We combine a classic, analytical simulator with a differentiable, state-action-dependent system identification module that outputs the desired simulator parameters. We then train this hybrid simulator such that its output trajectory distributions are indistinguishable from trajectories collected in the target domain. The optimized hybrid simulator can refine a sub-optimal policy without any additional target-domain data. We show that our approach outperforms the domain-randomization and target-domain refinement baselines on two robots and six difficult dynamic tasks.
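    The adversarial training idea can be sketched as follows; the simulator, trajectory features, and the ES-style parameter update are simplifying stand-ins (the actual method uses a differentiable, state-action-dependent system identification module trained with gradients).

```python
# Sketch: tune hybrid-simulator parameters so simulated trajectories become
# indistinguishable from target-domain ones. The discriminator is a logistic
# regressor on trajectory features and the simulator parameters are updated
# with a simple evolution-strategy step; both are simplifying assumptions.
import numpy as np

rng = np.random.default_rng(0)

def simulate(sim_params, n=64):
    # Hypothetical simulator: trajectory features depend on its parameters.
    return rng.normal(loc=sim_params, scale=0.3, size=(n, sim_params.size))

def real_data(n=64):
    # Stand-in for trajectories collected in the target domain.
    return rng.normal(loc=np.array([0.5, -0.2]), scale=0.3, size=(n, 2))

def train_discriminator(sim_feats, real_feats, steps=200, lr=0.1):
    x = np.vstack([sim_feats, real_feats])
    y = np.concatenate([np.zeros(len(sim_feats)), np.ones(len(real_feats))])
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        w -= lr * x.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def fool_score(sim_params, w, b):
    # Higher when the discriminator mistakes simulated data for real data.
    p = 1.0 / (1.0 + np.exp(-(simulate(sim_params) @ w + b)))
    return p.mean()

sim_params = np.zeros(2)
for _ in range(20):
    w, b = train_discriminator(simulate(sim_params), real_data())
    # ES-style update of the simulator parameters to fool the discriminator.
    candidates = sim_params + 0.1 * rng.normal(size=(16, 2))
    sim_params = max(candidates, key=lambda c: fool_score(c, w, b))
```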
    Fast and Efficient Locomotion via Learned Gait Transitions
    Yuxiang Yang
    Erwin Coumans
    Byron Boots
    Conference on Robot Learning (2021)
    Abstract: We focus on the problem of developing energy-efficient controllers for quadrupedal robots. Animals can actively switch gaits at different speeds to lower their energy consumption. In this paper, we devise a hierarchical learning framework in which distinctive locomotion gaits and natural gait transitions emerge automatically with a simple reward of energy minimization. We use evolutionary strategies (ES) to train a high-level gait policy that specifies the gait pattern of each foot, while the low-level convex MPC controller optimizes the motor commands so that the robot can walk at a desired velocity using that gait pattern. We test our learning framework on a quadruped robot and demonstrate automatic gait transitions, from walking to trotting and to fly-trotting, as the robot increases its speed. We show that the learned hierarchical controller consumes much less energy across a wide range of locomotion speeds than baseline controllers.
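    A toy sketch of the hierarchy: an ES loop trains a small high-level policy mapping commanded speed to a gait parameter, while a stand-in for the low-level controller and robot reports the energy consumed. All dynamics and costs below are illustrative assumptions.

```python
# Sketch of ES training a high-level gait policy with an energy-minimization
# reward. The energy model is a toy assumption chosen only so that different
# gait parameters are optimal at different commanded speeds.
import numpy as np

rng = np.random.default_rng(0)

def energy(speed, gait_freq):
    # Toy cost of transport: each speed has a different energy-optimal
    # stepping frequency, loosely mimicking walk/trot/fly-trot regimes.
    optimal = 1.0 + 2.0 * speed
    return (gait_freq - optimal) ** 2 + 0.1 * speed

def gait_policy(theta, speed):
    # High-level policy: affine map from commanded speed to gait frequency.
    return theta[0] * speed + theta[1]

def negative_cost(theta, speeds=(0.5, 1.0, 1.5, 2.0)):
    return -sum(energy(s, gait_policy(theta, s)) for s in speeds)

theta = np.zeros(2)
for _ in range(200):  # simple ES loop: perturb, evaluate, average
    eps = rng.normal(size=(16, 2))
    scores = np.array([negative_cost(theta + 0.1 * e) for e in eps])
    theta += 0.02 * (scores - scores.mean()) @ eps / (16 * 0.1)
```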
    Learning to walk on complex terrains with vision
    Ale Escontrela
    Erwin Johan Coumans
    Peng Xu
    Sehoon Ha
    Conference on Robot Learning (2021)
    Abstract: Visual feedback is crucial for legged robots to safely and efficiently handle uneven terrains such as stairs. However, training robots to effectively consume high-dimensional visual input for locomotion is challenging. In this work, we propose a framework to train a vision-based locomotion controller for quadruped robots to traverse a variety of uneven environments. Our key idea is to model the locomotion controller as a hierarchical structure with a high-level vision policy and a low-level motion controller. The high-level vision policy takes as input the perceived vision inputs as well as robot states and outputs the desired foothold placement and base movement of the robot, which is realized by a low-level motion controller composed of a position controller for swing legs and an MPC-based torque controller for stance legs. We train the vision policy using deep reinforcement learning and demonstrate our approach on a variety of uneven environments such as step-stones, stairs, pillars, and moving platforms. We also deploy our policy on a real quadruped robot to walk over a series of random step-stones.
    Model-based Reinforcement Learning for Decentralized Multiagent Rendezvous
    Rose E. Wang
    Dennis Lee
    Edward Lee
    Brian Andrew Ichter
    Conference on Robot Learning (CoRL) (2020)
    Abstract: Collaboration requires agents to align their goals on the fly. Underlying the human ability to align goals with other agents is the ability to predict the intentions of others and actively update one's own plans. We propose hierarchical predictive planning (HPP), a model-based reinforcement learning method for decentralized multiagent rendezvous. Starting with pretrained, single-agent point-to-point navigation policies and using noisy, high-dimensional sensor inputs like lidar, we first learn, via self-supervision, motion prediction models for all agents on the team. Next, HPP uses these prediction models to propose and evaluate navigation subgoals for completing the rendezvous task without explicit communication among agents. We evaluate HPP in a suite of unseen environments with increasing complexity and numbers of obstacles. We show that HPP outperforms alternative reinforcement learning, path planning, and heuristic-based baselines on challenging, unseen environments. Experiments in the real world demonstrate successful transfer of the prediction models from simulation to the real world without any additional fine-tuning. Altogether, HPP removes the need for a centralized operator in multiagent systems by combining model-based RL and inference methods, enabling agents to dynamically align plans.
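    A minimal sketch of HPP's subgoal proposal and evaluation step is shown below; the prediction model is replaced by a simple "move toward the subgoal" stand-in, and scoring subgoals by predicted team spread is an illustrative choice.

```python
# Sketch of subgoal proposal and evaluation, assuming hypothetical learned
# prediction models of where each teammate is expected to end up; here they
# are replaced by a simple noisy "move toward the subgoal" model.
import numpy as np

rng = np.random.default_rng(0)
agent_poses = {"a": np.array([0.0, 0.0]), "b": np.array([4.0, 1.0])}

def predict_pose(agent_id, subgoal):
    # Stand-in for a learned, self-supervised motion prediction model.
    pose = agent_poses[agent_id]
    return pose + 0.8 * (subgoal - pose) + 0.05 * rng.normal(size=2)

def propose_subgoals(agent_id, n=8, radius=2.0):
    angles = np.linspace(0, 2 * np.pi, n, endpoint=False)
    return agent_poses[agent_id] + radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)

def best_subgoal(agent_id, teammates):
    # Score each candidate by the predicted spread of the team (smaller means
    # closer to rendezvous), without explicit communication between agents.
    def spread(subgoal):
        own = predict_pose(agent_id, subgoal)
        others = [predict_pose(t, agent_poses[t]) for t in teammates]
        return np.mean([np.linalg.norm(own - o) for o in others])
    return min(propose_subgoals(agent_id), key=spread)

goal_for_a = best_subgoal("a", ["b"])
```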
    Abstract: Imitation learning is a popular approach for training effective visual navigation policies. However, collecting expert demonstrations for a legged robot is less practical because the robot is hard to control, walks slowly, and cannot run continuously for a long time. In this work, we propose a zero-shot imitation learning framework for training a visual navigation policy on a legged robot from human (third-person perspective) demonstrations only, allowing for more cost-effective data collection with better navigation capability. However, imitation learning from third-person demonstrations raises unique challenges. Human demonstrations are captured with different camera perspectives, so we design a feature disentanglement network (FDN) that extracts perspective-agnostic state features. We reconstruct missing action labels either by building an inverse model of the robot's dynamics in the feature space and applying it to the demonstrations, or by developing an efficient GUI to label human demonstrations. We take a model-based imitation learning approach to train a visual navigation policy from the perspective-agnostic, action-labeled demonstrations. We show that our framework can learn an effective visual navigation policy for a legged robot, Laikago, from expert demonstrations in both simulated and real-world environments. Our approach is zero-shot in that the robot never attempts a given navigation path in the testing environment before the testing phase. We also justify our framework with an ablation study and comparisons against baseline algorithms.
    Learning Agile Robotic Locomotion Skills by Imitating Animals
    Edward Lee
    Erwin Johan Coumans
    Jason Peng
    Sergey Levine
    Robotics: Science and Systems 2020, RSS Foundation (2020)
    Abstract: Reproducing the diverse and agile locomotion skills of animals has been a longstanding challenge in robotics. While manually designed controllers have been able to emulate many complex behaviors, building such controllers involves a tedious engineering process and requires substantial expertise in the nuances of each skill. In this work, we present an imitation learning system that enables legged robots to learn agile locomotion skills by imitating real-world animals. We show that by leveraging reference motion data, a common framework is able to automatically synthesize controllers for a diverse repertoire of behaviors. By incorporating sample-efficient domain adaptation techniques into the training process, our system is able to train adaptive policies in simulation that can then be quickly fine-tuned and deployed in the real world. Our system enables an 18-DoF quadruped robot to perform a variety of agile behaviors ranging from different locomotion gaits to dynamic hops and turns.
    Rapidly Adaptable Legged Robots via Evolutionary Meta-Learning
    Yuxiang Yang
    Wenbo Gao
    Chelsea Finn
    International Conference on Intelligent Robots and Systems (IROS) (2020) (to appear)
    Abstract: Learning adaptable policies is crucial for robots to operate autonomously in our complex and quickly changing world. In this work, we present a new meta-learning method that allows robots to quickly adapt to changes in dynamics. In contrast to gradient-based meta-learning algorithms that rely on second-order gradient estimation, we introduce a more noise-tolerant Batch Hill-Climbing adaptation operator and combine it with meta-learning based on evolutionary strategies. Our method significantly improves adaptation to changes in dynamics in high-noise settings, which are common in robotics applications. We validate our approach on a quadruped robot that learns to walk while subject to changes in dynamics. We observe that our method significantly outperforms prior gradient-based approaches, enabling the robot to adapt its policy to changes based on less than 3 minutes of real data.
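    The noise-tolerant adaptation operator can be sketched as a batch hill-climbing step: sample a batch of perturbed policies, re-evaluate each a few times, and keep the best only if it beats the current one. The evaluate function below is a hypothetical noisy stand-in for a short real-robot rollout.

```python
# Sketch of a Batch Hill-Climbing adaptation step over policy parameters.
import numpy as np

rng = np.random.default_rng(0)

def evaluate(theta, noise=0.5):
    # Hypothetical noisy return; the true optimum here is theta == 1.
    return -np.sum((theta - 1.0) ** 2) + noise * rng.normal()

def batch_hill_climb(theta, batch_size=16, sigma=0.1, reevaluations=4):
    # Averaging several evaluations per candidate makes the operator more
    # tolerant to noise than a single-sample gradient estimate.
    def avg_return(t):
        return np.mean([evaluate(t) for _ in range(reevaluations)])
    candidates = [theta] + [theta + sigma * rng.normal(size=theta.shape)
                            for _ in range(batch_size)]
    return max(candidates, key=avg_return)

theta = np.zeros(3)
for _ in range(20):  # a handful of short adaptation steps on the real robot
    theta = batch_hill_climb(theta)
```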
    Learning Fast Adaptation with Meta Strategy Optimization
    Erwin Johan Coumans
    Sehoon Ha
    Learning Fast Adaptation with Meta Strategy Optimization (2020)
    Abstract: The ability to walk in new situations is a key milestone on the path toward real-world applications of legged robots. In this work, we introduce a novel algorithm for training locomotion policies for legged robots that can quickly adapt to new scenarios with a handful of trials in the target environment. We extend the framework of strategy optimization, which trains a control policy with additional latent parameters in simulation and transfers it to the real robot by optimizing the latent inputs. The key idea in our proposed algorithm, Meta Strategy Optimization (MSO), is to formulate the problem as a meta-learning process by exposing the same strategy optimization to both the training and testing phases. This change allows MSO to effectively learn locomotion skills as well as a latent space that is suitable for fast adaptation. We evaluate our method on a real quadruped robot and demonstrate successful adaptation in various scenarios, including sim-to-real transfer, walking with a weakened motor, and climbing up a slope. Furthermore, we analyze the generalization capability of the trained policy in simulated environments and show that our method outperforms previous methods in both simulated and real environments.
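    At deployment time, strategy optimization keeps the policy weights fixed and searches only over a low-dimensional latent input. The sketch below uses plain random search and a toy return function as stand-ins for the actual optimizer and robot trials.

```python
# Sketch of test-time latent optimization: the policy is fixed and only its
# low-dimensional latent input z is tuned from a handful of trials.
import numpy as np

rng = np.random.default_rng(0)

def episode_return(z, target=np.array([0.3, -0.7])):
    # Hypothetical rollout of the fixed policy conditioned on latent z in the
    # target environment (e.g., a weakened motor); peaks at the "right" z.
    return -np.sum((z - target) ** 2) + 0.05 * rng.normal()

def optimize_latent(dim=2, trials=12):
    # Very simple random-search stand-in for the latent-space optimizer.
    best_z, best_r = None, -np.inf
    for _ in range(trials):                 # one real-world trial per sample
        z = rng.uniform(-1.0, 1.0, size=dim)
        r = episode_return(z)
        if r > best_r:
            best_z, best_r = z, r
    return best_z

z_star = optimize_latent()
```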
    Abstract: Designing agile locomotion for quadruped robots often requires extensive expertise and tedious manual tuning. In this paper, we present a system to automate this process by leveraging deep reinforcement learning techniques. Our system can learn quadruped locomotion from scratch with simple reward signals. In addition, users can provide an open-loop reference to guide the learning process if more control over the learned gait is needed. The control policies are learned in a physical simulator and then deployed on real robots. In robotics, policies trained in simulation often do not transfer to the real world. We narrow this reality gap by improving the physical simulator and learning robust policies. We improve the simulation using system identification, developing an accurate actuator model, and simulating latency. We learn robust controllers by randomizing the physical environments, adding perturbations, and designing a compact observation space. We evaluate our system on two agile locomotion gaits: trotting and galloping. After learning in simulation, a quadruped robot can successfully perform both gaits in the real world.
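    Two of the simulator improvements mentioned above, an identified actuator model and simulated latency, can be sketched as follows; the PD gains, latency steps, and toy dynamics are placeholders rather than identified values.

```python
# Sketch of (1) an actuator model converting desired motor angles into torques
# with a PD law whose gains would come from system identification, and (2) a
# latency queue that delays observations before the policy sees them.
from collections import deque
import numpy as np

class ActuatorModel:
    def __init__(self, kp=50.0, kd=1.0, torque_limit=8.0):
        self.kp, self.kd, self.torque_limit = kp, kd, torque_limit

    def torque(self, q_desired, q, q_dot):
        tau = self.kp * (q_desired - q) - self.kd * q_dot
        return np.clip(tau, -self.torque_limit, self.torque_limit)

class LatencySimulator:
    def __init__(self, latency_steps=3):
        self.buffer = deque(maxlen=latency_steps + 1)

    def observe(self, true_obs):
        self.buffer.append(true_obs)
        return self.buffer[0]   # policy sees an observation from the past

actuators = ActuatorModel()
latency = LatencySimulator()
q, q_dot = np.zeros(12), np.zeros(12)
for step in range(5):
    delayed_obs = latency.observe(np.concatenate([q, q_dot]))
    q_desired = 0.1 * np.ones(12)        # stand-in for the policy's output
    tau = actuators.torque(q_desired, q, q_dot)
    q_dot += 0.01 * tau                  # toy integration of the dynamics
    q += 0.01 * q_dot
```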
    Abstract: We propose an architecture for learning complex controllable behaviors by having simple Policies Modulate Trajectory Generators (PMTG), a powerful combination that can provide both memory and prior knowledge to the controller. The result is a flexible architecture that is applicable to a class of problems with periodic motion for which one has an insight into the class of trajectories that might lead to a desired behavior. We illustrate the basics of our architecture using a synthetic control problem, then go on to learn speed-controlled locomotion for a quadrupedal robot by using deep reinforcement learning and evolutionary strategies. We demonstrate that a simple linear policy, when paired with a parametric trajectory generator for quadrupedal gaits, can induce walking behaviors with controllable speed from 4-dimensional IMU observations alone, and can be learned in under 1000 rollouts. We also transfer these policies to a real robot and show locomotion with controllable forward velocity.
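    A minimal sketch of the PMTG structure is shown below: a periodic trajectory generator supplies a gait prior and phase memory, while a small policy modulates its frequency and amplitude and adds a residual. The linear policy and sine-based generator are illustrative stand-ins.

```python
# Sketch of Policies Modulating Trajectory Generators (PMTG): an open-loop
# periodic trajectory generator provides a gait prior, and a learned policy
# both modulates it and adds a residual correction.
import numpy as np

class TrajectoryGenerator:
    def __init__(self):
        self.phase = 0.0

    def step(self, frequency, amplitude, dt=0.01):
        # Internal phase acts as the controller's memory.
        self.phase = (self.phase + 2 * np.pi * frequency * dt) % (2 * np.pi)
        return amplitude * np.sin(self.phase)      # prior leg trajectory

def pmtg_controller(policy_weights, observation, tg):
    out = policy_weights @ observation             # e.g., a 4-D IMU observation
    frequency = 1.0 + 0.5 * np.tanh(out[0])        # policy modulates the TG...
    amplitude = 0.3 + 0.1 * np.tanh(out[1])
    residual = 0.05 * np.tanh(out[2])              # ...and adds a correction
    return tg.step(frequency, amplitude) + residual

tg = TrajectoryGenerator()
weights = np.zeros((3, 4))                         # simple linear policy
for _ in range(10):
    imu = np.random.randn(4)
    action = pmtg_controller(weights, imu, tg)
```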
    Abstract: We propose a simple, drop-in, noise-tolerant replacement for the standard finite-difference procedure used ubiquitously in blackbox optimization. In our approach, parameter perturbation directions are defined by a family of deterministic or randomized structured matrices. We show that at the small cost of computing a Fast Fourier Transform (FFT), such structured finite differences consistently give higher-quality approximations of gradients and Jacobians in comparison to vanilla approaches that use coordinate directions or random Gaussian perturbations. We show that linearization of noisy blackbox dynamics using our methods leads to improved performance of trajectory optimizers like iterative LQR and Differential Dynamic Programming on several classic continuous control tasks. By embedding structured exploration in implicit filtering methods, we are able to learn agile walking and turning policies for quadruped locomotion that successfully transfer from simulation to actual hardware. We give a theoretical justification of our methods in terms of bounds on the quality of gradient reconstruction in the presence of noise.
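    The contrast between vanilla and structured finite differences can be sketched as follows; a QR-based orthogonal matrix stands in for the Hadamard/FFT-based constructions, and the objective is a toy noisy function rather than a rollout.

```python
# Sketch of finite-difference gradient estimation with structured (orthogonal)
# perturbation directions versus vanilla coordinate directions.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Noisy blackbox objective standing in for a rollout cost.
    return np.sum(np.sin(x)) + 0.01 * rng.normal()

def fd_gradient(x, directions, delta=1e-2):
    # Antithetic finite differences along the given unit directions.
    grad = np.zeros_like(x)
    for d in directions:
        slope = (f(x + delta * d) - f(x - delta * d)) / (2 * delta)
        grad += slope * d
    return grad

dim = 16
x = rng.normal(size=dim)
coordinate_dirs = np.eye(dim)                                    # vanilla baseline
structured_dirs, _ = np.linalg.qr(rng.normal(size=(dim, dim)))   # orthogonal rows
g_vanilla = fd_gradient(x, coordinate_dirs)
g_structured = fd_gradient(x, structured_dirs)
true_grad = np.cos(x)                                            # for comparison only
```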