Aleksandra Faust
Aleksandra Faust is a Senior Staff Research Scientist and Reinforcement Learning research team co-founder at Google Brain Research. Previously, Aleksandra founded and led Task and Motion Planning research in Robotics at Google and machine learning for self-driving car planning and controls at Waymo, and was a senior researcher at Sandia National Laboratories. She earned a Ph.D. in Computer Science at the University of New Mexico (with distinction), and a Master's in Computer Science from the University of Illinois at Urbana-Champaign. Her research interests include learning for safe and scalable reinforcement learning, learning to learn, motion planning, decision-making, and robot behavior. Aleksandra won the IEEE RAS Early Career Award for Industry and the Tom L. Popejoy Award for the best doctoral dissertation at the University of New Mexico in the period of 2011-2014, and was named Distinguished Alumna by the University of New Mexico School of Engineering. Her work has been featured in the New York Times, PC Magazine, ZDNet, and VentureBeat, and was awarded Best Paper in Service Robotics at ICRA 2018, Best Paper in Reinforcement Learning for Real Life (RL4RL) at ICML 2019, and Best Paper of IEEE Computer Architecture Letters in 2020.
Authored Publications
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Hiroki Furuta
Ofir Nachum
Yutaka Matsuo
Shane Gu
Izzeddin Gur
International Conference on Learning Representations (ICLR) (2024)
“Levels of AGI”: Operationalizing Progress on the Path to AGI
Jascha Sohl-Dickstein
Allan Dafoe
Clement Farabet
Shane Legg
arXiv (2023)
Preview abstract
We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors. This framework introduces levels of AGI performance, generality, and autonomy. It is our hope that this framework will be useful in an analogous way to the levels of autonomous driving, by providing a common language to compare models, assess risks, and measure progress along the path to AGI. To develop our framework, we analyze existing definitions of AGI, and distill six principles that a useful ontology for AGI should satisfy. These principles include focusing on capabilities rather than mechanisms; separately evaluating generality and performance; and defining stages along the path toward AGI, rather than focusing on the endpoint. With these principles in mind, we propose “Levels of AGI” based on depth (performance) and breadth (generality) of capabilities, and reflect on how current systems fit into this ontology. We discuss the challenging requirements for future benchmarks that quantify the behavior and capabilities of AGI models against these levels. Finally, we discuss how these levels of AGI interact with deployment considerations such as autonomy and risk, and emphasize the importance of carefully selecting Human-AI Interaction paradigms for responsible and safe deployment of highly capable AI systems.
Automatic Domain-Specific SoC Design for Autonomous Unmanned Aerial Vehicles
David Brooks
Gu-Yeon Wei
Kshitij Bhardwaj
Paul Whatmough
Srivatsan Krishnan
Vijay Janapa Reddi
Zishen Wan
55th IEEE/ACM International Symposium on Microarchitecture®, IEEE (2022) (to appear)
Preview abstract
Building domain-specific accelerators is becoming increasingly paramount to meet the high-performance requirements under stringent power and real-time constraints. However, emerging application domains like autonomous vehicles are complex systems, where the constraints extend beyond just the computing stack. Manually selecting and navigating the design space to design custom and efficient domain-specific SoCs (DSSoC) is tedious and expensive. As such, there is a need for automated DSSoC design methodologies. In this paper, we use agile and autonomous UAVs as a case study for understanding how to automate the design of domain-specific SoCs for autonomous vehicles. Architecting a UAV DSSoC requires considering parameters such as sensor rate, compute throughput, and other physical characteristics (e.g., payload weight, thrust-to-weight ratio) that affect overall performance. Iterating over the many component choices results in a combinatorial explosion of the number of possible combinations: from tens of thousands to billions, depending on implementation details. To navigate the DSSoC design space efficiently, we introduce AutoPilot, a systematic methodology for automatically designing DSSoC for autonomous UAVs. AutoPilot uses machine learning to navigate the large DSSoC design space and automatically select a combination of autonomy algorithm and hardware accelerator while considering the cross-product effect across different UAV components. AutoPilot consistently outperforms general-purpose hardware selections like Xavier NX and Jetson TX2, as well as dedicated hardware accelerators built for autonomous UAVs. DSSoC designs generated by AutoPilot increase the number of missions on average by up to 2.25x, 1.62x and 1.43x for nano, micro, and mini-UAVs, respectively, over baselines. We also discuss how AutoPilot can be extended to other related autonomous vehicles using the same set of principles.
Tiny Robot Learning: Challenges and Directions for Machine Learning in Resource-Constrained Robots
Sabrina Neuman
Brian Plancher
Bardienus Pieter Duisterhof
Srivatsan Krishnan
Colby R. Banbury
Mark Mazumder
Shvetank Prakash
Jason Jabbour
Guido C. H. E. de Croon
Vijay Janapa Reddi
IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) special session on Low Power Autonomous Systems (2022) (to appear)
Preview abstract
Machine learning (ML) has become a pervasive tool across computing systems. An emerging application that stress-tests the challenges of ML system design is tiny robot learning, the deployment of ML on resource-constrained low-cost autonomous robots. Tiny robot learning lies at the intersection of embedded systems, robotics, and ML, compounding the challenges of these domains. Tiny robot learning is subject to challenges from size, weight, area, and power (SWAP) constraints; sensor, actuator, and compute hardware limitations; end-to-end system tradeoffs; and a large diversity of possible deployment scenarios. Tiny robot learning requires ML models to be designed with these challenges in mind, providing a crucible that reveals the necessity of holistic ML system design and automated end-to-end design tools for agile development. This paper gives a brief survey of the tiny robot learning space, elaborates on key challenges, and proposes promising opportunities for future work in ML system design.
Multi-Task Learning with Sequence-Conditioned Transporter Networks
Michael Lim
Andy Zeng
Brian Andrew Ichter
Maryam Bandari
Erwin Johan Coumans
Claire Tomlin
Stefan Schaal
International Conference on Robotics and Automation 2022, IEEE (to appear)
Preview abstract
Enabling robots to solve multiple manipulation tasks has a wide range of industrial applications. While learning-based approaches enjoy flexibility and generalizability, scaling these approaches to solve such compositional tasks remains a challenge. In this work, we aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling. First, we propose a new benchmark suite specifically aimed at compositional tasks, MultiRavens, which allows defining custom task combinations through task modules that are inspired by industrial tasks and exemplify the difficulties in vision-based learning and planning methods. Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling and can efficiently learn to solve multi-task long horizon problems. Our analysis suggests that the new framework not only significantly improves pick-and-place performance on 10 novel multi-task benchmark problems, but that multi-task learning with weighted sampling can also vastly improve learning and agent performance on individual tasks.
The Role of Compute in Autonomous Micro Aerial Vehicles: Optimizing for Flight Time and Energy Efficiency
Behzad Boroujerdian
Hasan Genc
Srivatsan Krishnan
Bardienus Pieter Duisterhof
Brian Plancher
Kayvan Mansoorshahi
Marcelino Almeida
Wenzhi Cui
Vijay Janapa Reddi
ACM Transactions on Computer Systems (TOCS) (2022) (to appear)
Preview abstract
Autonomous and mobile cyber-physical machines are becoming an inevitable part of our future. In particular, Micro Aerial Vehicles (MAVs) have seen a resurgence in activity. With multiple use cases, such as surveillance, search and rescue, package delivery, and more, these unmanned aerial systems are on the cusp of demonstrating their full potential. Despite such promises, these systems face many challenges, one of the most prominent of which is their low endurance caused by their limited onboard energy. Since the success of a mission depends on whether the drone can finish it within such duration and before it runs out of battery, improving both the time and energy associated with the mission is of high importance. Such improvements have traditionally been arrived at through the use of better algorithms. But our premise is that more powerful and efficient onboard compute can also address the problem. In this paper, we investigate how the compute subsystem in a cyber-physical mobile machine, such as a Micro Aerial Vehicle, can impact mission time (time to complete a mission) and energy. Specifically, we pose the question as “what is the role of computing for cyber-physical mobile robots?” We show that compute and motion are tightly intertwined, and as such a close examination of cyber and physical processes and their impact on one another is necessary. We show different “impact paths” through which compute impacts mission metrics and examine them using a combination of analytical models, simulation, and micro and end-to-end benchmarking. To enable similar studies, we open sourced MAVBench, our tool-set, which consists of (1) a closed-loop real-time feedback simulator and (2) an end-to-end benchmark suite comprised of state-of-the-art kernels. By combining MAVBench, analytical modeling, and an understanding of various compute impacts, we show up to 2X and 1.8X improvements for mission time and mission energy for two optimization case studies. Our investigations, as well as our optimizations, show that cyber-physical co-design, a methodology with which both the cyber and physical processes/quantities of the robot are developed with consideration of one another, similar to hardware-software co-design, is necessary for arriving at the design of the optimal robot.
Roofline Model for UAVs: A Bottleneck Analysis Tool for Onboard Compute Characterization of Autonomous Unmanned Aerial Vehicles
Srivatsan Krishnan
Zishen Wan
Kshitij Bhardwaj
Ninad Jadhav
Vijay Janapa Reddi
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (2022)
Preview abstract
We introduce an early-phase bottleneck analysis and characterization model called the F-1 for designing computing systems that target autonomous Unmanned Aerial Vehicles (UAVs). The model provides insights by exploiting the fundamental relationships between various components in the autonomous UAV, such as sensor, compute, and body dynamics. To guarantee safe operation while maximizing the performance (e.g., velocity) of the UAV, the compute, sensor, and other mechanical properties must be carefully selected or designed. The F-1 model provides visual insights that can aid a system architect in understanding the optimal compute design or selection for autonomous UAVs. The model is experimentally validated using real UAVs, and the error is between 5.1% and 9.5% compared to real-world flight tests. An interactive web-based tool for the F-1 model called Skyline is available free of charge at: https://bit.ly/skyline-tool
Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization
Sungryull Sohn
Hyunjae Woo
Jongwook Choi
Lyubing Qiang
Izzeddin Gur
Honglak Lee
Uncertainty in Artificial Intelligence (UAI) (2022) (to appear)
Preview abstract
We tackle real-world problems with complex structures beyond the pixel-based game or simulator. We formulate this as a few-shot reinforcement learning problem where a task is characterized by a subtask graph that defines a set of subtasks and their dependencies that are unknown to the agent. Different from the previous meta-RL methods trying to directly infer the unstructured task embedding, our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks, and uses it as a prior to improve the task inference in testing. Our experimental results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks than various existing algorithms such as meta reinforcement learning, hierarchical reinforcement learning, and other heuristic agents.
Metrics-only Training Neural Network for Switching among an Array of Feedback Controllers for Bicycle Model Navigation
Marco A. Carmona
Dejan Milutinovic
American Control Conference (ACC) (2022) (to appear)
Preview abstract
The paper proposes a novel training approach for a neural network to perform switching among an array of computationally generated stochastic optimal feedback controllers. The training is based on the outputs of off-line computed lookup-table metric (LTM) values that store information about individual controller performances. Our study is based on the problem of navigating a bicycle kinematic model through a sequence of gates; a more traditional approach to the training is based on kinematic variables (KVs) describing the bicycle-gate relative position. We compare the LTM and KV based training approaches to the navigation problem and find that the LTM training converges faster and with less variation than the KV based training. Our results include numerical simulations illustrating the work of the LTM trained neural network switching policy.
Less is More: Generating Grounded Navigation Instructions from Landmarks
Jordi Orbay
Izzeddin Gur
Peter Anderson
CVPR (2022) (to appear)
Preview abstract
We study the automatic generation of navigation instructions from 360-degree images captured on indoor routes. Existing generators suffer from poor visual grounding, causing them to rely on language priors and hallucinate objects. Our MARKY-MT5 system addresses this by focusing on visual landmarks; it comprises a first stage landmark detector and a second stage generator -- a multimodal, multilingual, multitask encoder-decoder. To train it, we bootstrap grounded landmark annotations on top of the Room-across-Room (RxR) dataset. Using text parsers, weak supervision from RxR's pose traces, and a multilingual image-text encoder trained on 1.8b images, we identify 1.1m English, Hindi and Telugu landmark descriptions and ground them to specific regions in panoramas. On Room-to-Room, human wayfinders obtain success rates (SR) of 71% following MARKY-MT5's instructions, just shy of their 75% SR following human instructions -- and well above SRs with other generators. Evaluations on RxR's longer, diverse paths obtain 61-64% SRs on three languages. Generating such high-quality navigation instructions in novel environments is a step towards conversational navigation tools and could facilitate larger-scale training of instruction-following agents.
QuaRL: Quantization for Fast and Environmentally Sustainable Reinforcement Learning
Gabe Barth-Maron
Maximilian Lam
Sharad Chitlangia
Srivatsan Krishnan
Vijay Janapa Reddi
Zishen Wan
Transactions on Machine Learning Research (TMLR) (2022)
Preview abstract
Deep reinforcement learning continues to show tremendous potential in achieving task-level autonomy; however, its computational and energy demands remain prohibitively high. In this paper, we tackle this problem by applying quantization to reinforcement learning. To that end, we introduce a novel Reinforcement Learning (RL) training paradigm, ActorQ, to speed up actor-learner distributed RL training. ActorQ leverages 8-bit quantized actors to speed up data collection without affecting learning convergence. Our quantized distributed RL training system, ActorQ, demonstrates end-to-end speedups of 1.5×-2.5× and faster convergence over full precision training on a range of tasks (Deepmind Control Suite) and different RL algorithms (D4PG, DQN). Furthermore, we compare the carbon emissions (Kgs of CO2) of ActorQ versus standard reinforcement learning on various tasks. Across various settings, we show that ActorQ enables more environmentally friendly reinforcement learning by achieving 2.8× less carbon emission and energy compared to training RL-agents in full-precision. Finally, we demonstrate empirically that aggressively quantized RL-policies (up to 4/5 bits) enable significant speedups on quantization-friendly (supports native quantization) resource-constrained edge devices, without degrading accuracy. We believe that this is the first of many future works on enabling computationally energy-efficient and sustainable reinforcement learning. The source code for QuaRL is available here for the public to use: https://bit.ly/quarl-tmlr.
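To make the idea of an 8-bit quantized actor concrete, here is a minimal sketch of uniform affine quantization applied to a list of weights. The abstract does not specify the exact quantization scheme used by ActorQ, so the min/max scaling below and the names `quantize` and `dequantize` are illustrative assumptions rather than the paper's implementation.

```python
def quantize(values, num_bits=8):
    """Map floats to integer codes in [0, 2**num_bits - 1] (assumed min/max scheme)."""
    lo, hi = min(values), max(values)
    levels = 2 ** num_bits - 1
    # Guard against a constant tensor, where hi == lo would give a zero scale.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate float values from integer codes."""
    return [lo + c * scale for c in codes]


# Hypothetical actor weights, used only for illustration.
weights = [-0.8, -0.1, 0.0, 0.35, 1.2]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)
```

Each restored weight differs from the original by at most about half a quantization step (`scale / 2`), which is the error an actor would carry while collecting data with quantized parameters.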
Evolving Reinforcement Learning Algorithms
JD Co-Reyes
Yingjie Miao
Daiyi Peng
Sergey Levine
Honglak Lee
International Conference on Learning Representations (ICLR) (2021) (to appear)
Preview abstract
We propose a method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs which compute the loss function for a value-based model-free RL agent to optimize. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training. Our method can both learn from scratch and bootstrap off known existing algorithms, like DQN, enabling interpretable modifications which improve performance. Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm. Bootstrapped from DQN, we highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games. The analysis of the learned algorithm behavior shows resemblance to recently proposed RL algorithms that address overestimation in value-based methods.
Environment Generation for Zero-Shot Compositional Reinforcement Learning
Izzeddin Gur
Yingjie Miao
Jongwook Choi
Manoj Tiwari
Honglak Lee
Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS) (2021)
Preview abstract
Many real-world problems are compositional: solving them requires completing interdependent sub-tasks, either in series or in parallel, that can be represented as a dependency graph. Deep reinforcement learning (RL) agents often struggle to learn such complex tasks due to the long time horizons and sparse rewards. To address this problem, we present Compositional Design of Environments (CoDE), which trains a Generator agent to automatically build a series of compositional tasks tailored to the RL agent’s current skill level. This automatic curriculum not only enables the agent to learn more complex tasks than it could have otherwise, but also selects tasks where the agent’s performance is weak, enhancing its robustness and ability to generalize zero-shot to unseen tasks at test-time. We analyze why current environment generation techniques are insufficient for the problem of generating compositional tasks, and propose a new algorithm that addresses these issues. Our results assess learning and generalization across multiple compositional tasks, including the real-world problem of learning to navigate and interact with web pages. We learn to generate environments composed of multiple pages or rooms, and train RL agents capable of completing a wide range of complex tasks in those environments. We contribute two new benchmark frameworks for generating compositional tasks, compositional MiniGrid and gMiniWoB for web navigation. CoDE yields 4x higher success rate than the strongest baseline, and demonstrates strong performance on real websites learned on 3500 primitive tasks.
Air Learning: a deep reinforcement learning gym for autonomous aerial robot visual navigation
Srivatsan Krishnan
Behzad Boroujerdian
William Fu
Vijay Janapa Reddi
Machine Learning, vol. 110 (2021), pp. 2501-2540
Preview abstract
We introduce Air Learning, an open-source simulator, and a gym environment for deep reinforcement learning research on resource-constrained aerial robots. Equipped with domain randomization, Air Learning exposes a UAV agent to a diverse set of challenging scenarios. We seed the toolset with point-to-point obstacle avoidance tasks in three different environments and Deep Q Networks (DQN) and Proximal Policy Optimization (PPO) trainers. Air Learning assesses the policies’ performance under various quality-of-flight (QoF) metrics, such as the energy consumed, endurance, and the average trajectory length, on resource-constrained embedded platforms like a Raspberry Pi. We find that the trajectories on an embedded Ras-Pi are vastly different from those predicted on a high-end desktop system, resulting in up to 40% longer trajectories in one of the environments. To understand the source of such discrepancies, we use Air Learning to artificially degrade high-end desktop performance to mimic what happens on a low-end embedded system. We then propose a mitigation technique that uses the hardware-in-the-loop to determine the latency distribution of running the policy on the target platform (onboard compute on aerial robot). A randomly sampled latency from the latency distribution is then added as an artificial delay within the training loop. Training the policy with artificial delays allows us to minimize the hardware gap (discrepancy in the flight time metric reduced from 37.73% to 0.5%). Thus, Air Learning with hardware-in-the-loop characterizes those differences and exposes how the onboard compute’s choice affects the aerial robot’s performance. We also conduct reliability studies to assess the effect of sensor failures on the learned policies. All put together, Air Learning enables a broad class of deep RL research on UAVs. The source code is available at: https://github.com/harvard-edge/AirLearning.
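The latency-injection step described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Air Learning's actual code: `run_episode`, `policy`, `env_step`, and the integer-millisecond tick size are hypothetical, and holding the previous action while the new one is "being computed" is just one plausible way to model the delay.

```python
import random


def latency_to_ticks(latency_ms, tick_ms):
    """Convert a latency to a whole number of simulator ticks, rounding up."""
    return -(-latency_ms // tick_ms)


def run_episode(policy, env_step, initial_obs, measured_latencies_ms,
                tick_ms, num_decisions, rng):
    """Roll out a policy while injecting latencies sampled from measurements.

    measured_latencies_ms stands in for the latency distribution collected
    with hardware in the loop on the target platform.
    """
    obs = initial_obs
    prev_action = 0.0
    total_ticks = 0
    for _ in range(num_decisions):
        action = policy(obs)
        # Sample one latency from the empirical distribution.
        delay = latency_to_ticks(rng.choice(measured_latencies_ms), tick_ms)
        # The world keeps moving under the old action during the delay.
        for _ in range(delay):
            obs = env_step(obs, prev_action)
            total_ticks += 1
        # Only now does the newly computed action take effect.
        obs = env_step(obs, action)
        total_ticks += 1
        prev_action = action
    return obs, total_ticks


# Toy usage: a 1-D "environment" where the action is a velocity per tick.
final_obs, ticks = run_episode(
    policy=lambda obs: 1.0,
    env_step=lambda obs, action: obs + action,
    initial_obs=0.0,
    measured_latencies_ms=[20, 20],
    tick_ms=10,
    num_decisions=3,
    rng=random.Random(0),
)
```

In a real training loop, the sampled delay would be applied inside each environment step so that the learned policy experiences the same staleness it will see on the embedded platform.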
Visual Navigation Among Humans With Optimal Control as a Supervisor
Varun Tolani
Somil Bansal
Claire Tomlin
IEEE Robotics and Automation Letters (RA-L) (2021) (to appear)
Tiny Robot Learning (tinyRL) for Source Seeking on a Nano Quadcopter
Bardienus Pieter Duisterhof
Srivatsan Krishnan
Jonathan J. Cruz
Colby R. Banbury
William Fu
Guido C. H. E. de Croon
Vijay Janapa Reddi
IEEE International Conference on Robotics and Automation (ICRA) (2021) (to appear)
Preview abstract
We present fully autonomous source seeking onboard a highly constrained nano quadcopter, by contributing application-specific system and observation feature design to enable inference of a deep-RL policy onboard a nano quadcopter. Our deep-RL algorithm finds a high-performance solution to a challenging problem, even in the presence of high noise levels, and generalizes across real and simulation environments with different obstacle configurations. We verify our approach with simulation and in-field testing on a CrazyFlie using only the cheap and ubiquitous Cortex-M4 microcontroller unit. The results show that by end-to-end application-specific system design, our contribution consumes almost three times less additional power, as compared to a competing learning-based navigation approach onboard a nano quadcopter. Thanks to our observation space, which we carefully design within the resource constraints, our solution achieves a 94% success rate in cluttered and randomized test environments, as compared to the previously achieved 80%. We also compare our strategy to a simple finite state machine (FSM), geared towards efficient exploration, and demonstrate that our policy is more robust and resilient at obstacle avoidance as well as up to 70% more efficient in source seeking. To this end, we contribute a cheap and lightweight end-to-end tiny robot learning (tinyRL) solution, running onboard a nano quadcopter, that proves to be robust and efficient in a challenging task.
SparseDice: Imitation Learning for Temporally Sparse Data via Regularization
Alberto Camacho
Izzeddin Gur
Marcin Lukasz Moczulski
Ofir Nachum
Unsupervised Reinforcement Learning Workshop, collocated with ICML 2021 (2021)
Preview abstract
Imitation learning learns how to act by observing the behavior of an expert demonstrator. We are concerned with a setting where the demonstrations comprise only a subset of state-action pairs (as opposed to the whole trajectories). Our setup reflects the limitations of real-world problems when accessing the expert data. For example, user logs may contain incomplete traces of behavior, or in robotics non-technical human demonstrators may describe trajectories using only a subset of all state-action pairs. A recent approach to imitation learning via distribution matching, ValueDice, tends to overfit when demonstrations are temporally sparse. We counter the overfitting by contributing regularization losses. Our empirical evaluation with Mujoco benchmarks shows that we can successfully learn from very sparse and scarce expert data. Moreover, (i) the quality of the learned policies is often comparable to those learned with full expert trajectories, and (ii) the number of training steps required to learn from sparse data is similar to the number of training steps when the agent has access to full expert trajectories.
Joint Attention for Multi-Agent Coordination and Social Learning
Dennis Lee
Jiaxing Wu
ICRA Workshop on Social Intelligence in Humans and Robots (2021)
Preview abstract
Joint attention — the ability to purposefully coordinate your attention with another person, and mutually attend to the same thing — is an important milestone in human cognitive development. In this paper, we ask whether joint attention can be useful as a mechanism for improving multi-agent coordination and social learning. We first develop deep reinforcement learning (RL) agents with a recurrent visual attention architecture. We then train agents to minimize the difference between the attention weights that they apply to the environment at each timestep, and the attention of other agents. Our results show that this joint attention incentive improves agents’ ability to solve difficult coordination tasks, by helping overcome the problem of exploring the combinatorial multi-agent action space. Joint attention leads to higher performance than a competitive centralized critic baseline across multiple environments. Further, we show that joint attention enhances agents’ ability to learn from experts present in their environment, even when performing single-agent tasks. Taken together, these findings suggest that joint attention may be a useful inductive bias for improving multi-agent learning.
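As a rough illustration of the joint attention incentive described above, the sketch below scores each agent by how closely its attention weights match those of the other agents. The abstract only says the agents minimize "the difference" between attention weights; using the Jensen-Shannon divergence as that difference, and the function name `joint_attention_bonus`, are assumptions made here for the example.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by ln(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def joint_attention_bonus(attention_maps):
    """Per-agent bonus: minus the mean divergence to every other agent.

    attention_maps holds one normalized attention distribution per agent
    (at least two agents), all over the same set of locations.
    """
    n = len(attention_maps)
    bonuses = []
    for i in range(n):
        divergences = [
            js_divergence(attention_maps[i], attention_maps[j])
            for j in range(n) if j != i
        ]
        bonuses.append(-sum(divergences) / len(divergences))
    return bonuses


# Agents attending to the same place get no penalty; disjoint attention
# gets the largest possible penalty, -ln(2).
aligned = joint_attention_bonus([[0.5, 0.5, 0.0], [0.5, 0.5, 0.0]])
disjoint = joint_attention_bonus([[1.0, 0.0], [0.0, 1.0]])
```

In training, a bonus like this would be added to each agent's environment reward at every timestep, so that maximizing return also pulls the agents' attention together.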
Avoidance Critical Probabilistic Roadmaps for Motion Planning in Dynamic Environments
Felipe Felix Arias
Brian Andrew Ichter
Nancy M. Amato
IEEE International Conference on Robotics and Automation (ICRA) (2021) (to appear)
Preview abstract
Motion planning among dynamic obstacles is an essential capability towards navigation in the real-world. Sampling-based motion planning algorithms find solutions by approximating the robot’s configuration space through a graph representation, predicting or computing obstacles’ trajectories, and finding feasible paths via a pathfinding algorithm. In this work, we seek to improve the performance of these subproblems by identifying regions critical to dynamic environment navigation and leveraging them to construct sparse probabilistic roadmaps. Motion planning and pathfinding algorithms should allow robots to prevent encounters with obstacles, irrespective of their trajectories, by being conscious of spatial context cues such as the location of chokepoints (e.g., doorways). Thus, we propose a self-supervised methodology for learning to identify regions frequently used for obstacle avoidance from local environment features. As an application of this concept, we leverage a neural network to generate hierarchical probabilistic roadmaps termed Avoidance Critical Probabilistic Roadmaps (ACPRM). These roadmaps contain motion structures that enable efficient obstacle avoidance, reduce the search and planning space, and increase a roadmap’s reusability and coverage. ACPRMs are demonstrated to achieve up to five orders of magnitude improvement over grid-sampling in the multi-agent setting and up to ten orders of magnitude over a competitive baseline in the multi-query setting.
The Sky Is Not the Limit: A Visual Performance Model for Cyber-Physical Co-Design in Autonomous Machines
Srivatsan Krishnan
Zishen Wan
Kshitij Bhardwaj
Paul Whatmough
Gu-Yeon Wei
David Brooks
Vijay Janapa Reddi
IEEE Computer Architecture Letters (CAL), vol. 19 (2020), pp. 38-42
Preview abstract
We introduce the “Formula-1” (F-1) roofline model to understand the role of computing in aerial autonomous machines. The model provides insights by exploiting the fundamental relationships between various components in an aerial robot, such as sensor framerate, compute performance, and body dynamics (physics). The model serves as a tool that can aid computer and cyber-physical system architects to understand the optimal design (or selection) of various components in the development of autonomous machines.
View details
Preview abstract
Sampling-based motion planning techniques have emerged as an efficient algorithmic paradigm for solving complex motion planning problems. These approaches use a set of probing samples to construct an implicit graph representation of the robot’s state space, allowing arbitrarily accurate representations as the number of samples increases to infinity. In practice, however, solution trajectories only rely on a few critical states, often defined by structure in the state space (e.g., doorways). In this work we propose a general method to identify these critical states via graph-theoretic techniques (betweenness centrality) and learn to predict criticality from only local environment features. These states are then leveraged more heavily via global connections within a hierarchical graph, termed Critical Probabilistic Roadmaps. Critical PRMs are demonstrated to achieve up to three orders of magnitude improvement over uniform sampling, while preserving the guarantees and complexity of sampling-based motion planning. A video is available at https://youtu.be/AYoD-pGd9ms.
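As an illustration of the graph-theoretic step mentioned in the abstract above, the following is a minimal sketch of betweenness centrality (Brandes' algorithm) on a small unweighted roadmap. The toy graph `rooms`, its node names, and the function name are assumptions made for this example; they are not taken from the paper, and the learned criticality predictor is not shown.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to a list of its neighbours.
    Returns a dict: node -> number of shortest paths passing through it
    (fractional when several shortest paths tie).
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording each node's predecessors on those paths.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph every pair is visited from both endpoints.
    return {v: c / 2.0 for v, c in score.items()}


# Hypothetical roadmap: two fully connected "rooms" joined by one doorway node.
rooms = {
    "a1": ["a2", "a3", "door"],
    "a2": ["a1", "a3"],
    "a3": ["a1", "a2"],
    "door": ["a1", "b1"],
    "b1": ["b2", "b3", "door"],
    "b2": ["b1", "b3"],
    "b3": ["b1", "b2"],
}
scores = betweenness_centrality(rooms)
most_critical = max(scores, key=scores.get)
```

In this toy graph every path between the two rooms crosses the doorway node, so it receives the highest score, which matches the intuition that doorways are critical states.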
View details
Safe Policy Learning for Continuous Control
Ofir Nachum
Mohammad Ghavamzadeh
Conference on Robot Learning (CoRL) (2020)
Preview abstract
We study continuous action reinforcement learning problems in which it is crucial that the agent interacts with the environment only through near-safe policies, i.e., policies that keep the agent in desirable situations, both during training and at convergence. We formulate these problems as constrained Markov decision processes (CMDPs) and present safe policy optimization algorithms that are based on a Lyapunov approach to solve them. Our algorithms can use any standard policy gradient (PG) method, such as deep deterministic policy gradient (DDPG) or proximal policy optimization (PPO), to train a neural network policy, while enforcing near-constraint satisfaction for every policy update by projecting either the policy parameter or the selected action onto the set of feasible solutions induced by the state-dependent linearized Lyapunov constraints. Compared to the existing constrained PG algorithms, ours are more data efficient as they are able to utilize both on-policy and off-policy data. Moreover, in practice our action-projection algorithm often leads to less conservative policy updates and allows for natural integration into an end-to-end PG training pipeline. We evaluate our algorithms and compare them with the state-of-the-art baselines on several simulated (MuJoCo) tasks, as well as a real-world robot obstacle-avoidance problem, demonstrating their effectiveness in terms of balancing performance and constraint satisfaction.
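The action-projection idea in the abstract above can be sketched for the simplest case of a single linearized constraint. The abstract does not give the exact constraint, so this example assumes the form g · (a - a0) <= eps, where `grad`, `baseline_action`, and `eps` are hypothetical names for the constraint gradient, the reference action, and the allowed slack. The projection itself is the standard closed-form Euclidean projection onto a half-space, not necessarily the authors' implementation.

```python
def project_action(action, baseline_action, grad, eps):
    """Project `action` onto the half-space {a : grad . (a - baseline_action) <= eps}.

    All vectors are plain lists of floats of equal length.
    """
    # How far the proposed action exceeds the linear constraint.
    violation = sum(
        g * (a - b) for g, a, b in zip(grad, action, baseline_action)
    ) - eps
    if violation <= 0.0:
        # Already feasible: leave the action unchanged.
        return list(action)
    norm_sq = sum(g * g for g in grad)
    if norm_sq == 0.0:
        # A zero gradient with a positive violation means no action is feasible.
        raise ValueError("constraint cannot be satisfied by changing the action")
    # Move the minimum distance along -grad needed to reach the boundary.
    step = violation / norm_sq
    return [a - step * g for a, g in zip(action, grad)]


# Hypothetical 2-D example: the constraint only restricts the first coordinate.
safe_action = project_action(
    action=[1.0, 1.0], baseline_action=[0.0, 0.0], grad=[1.0, 0.0], eps=0.5
)
```

Here the first coordinate is pulled back to the constraint boundary while the unconstrained second coordinate is left as proposed.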
View details
Neural Collision Clearance Estimator for Batched Motion Planning
Brian Andrew Ichter
Maryam Bandari
Edward Lee
The 14th International Workshop on the Algorithmic Foundations of Robotics (WAFR) (2020)
Preview abstract
We present a neural network collision checking heuristic, ClearanceNet, and a planning algorithm, CN-RRT. ClearanceNet learns to predict separation distance (minimum distance between robot and workspace) with respect to a workspace. CN-RRT then efficiently computes a motion plan by leveraging three key features of ClearanceNet. First, CN-RRT explores the space by expanding multiple nodes at the same time, processing batches of thousands of collision checks. Second, CN-RRT adaptively relaxes its clearance requirements for more difficult problems. Third, to repair errors, CN-RRT shifts its nodes in the direction of ClearanceNet’s gradient and repairs any residual errors with a traditional RRT, thus maintaining theoretical probabilistic completeness guarantees. In configuration spaces with up to 30 degrees of freedom, ClearanceNet achieves 845x speedup over traditional collision detection methods, while CN-RRT accelerates motion planning by up to 42% over a baseline and finds paths up to 36% more efficient. Experiments on an 11 degree of freedom robot in a cluttered environment confirm the method’s feasibility on real robots.
View details
Preview abstract
Imitation learning is a popular approach for training effective visual navigation policies. However, collecting expert demonstrations for a legged robot is less practical because the robot is hard to control, walks slowly, and cannot run continuously for a long time. In this work, we propose a zero-shot imitation learning framework for training a visual navigation policy on a legged robot from human demonstrations (third-person perspective) only, allowing for more cost-effective data collection with better navigation capability. However, imitation learning from third-person perspective demonstrations raises unique challenges. Human demonstrations are captured from different camera perspectives; therefore, we design a feature disentanglement network (FDN) that extracts perspective-agnostic state features. We reconstruct missing action labels either by building an inverse model of the robot's dynamics in the feature space and applying it to the demonstrations or by developing an efficient GUI to label human demonstrations. We take a model-based imitation learning approach for training a visual navigation policy from the perspective-agnostic, action-labeled demonstrations. We show that our framework can learn an effective visual navigation policy for a legged robot, Laikago, from expert demonstrations in both simulated and real-world environments. Our approach is zero-shot as the robot never tries to navigate a certain navigation path in the testing environment before the testing phase. We also justify our framework by performing an ablation study and comparing it with baseline algorithms.
View details
Quantized Reinforcement Learning (QuaRL)
Srivatsan Krishnan
Sharad Chitlangia
Maximilian Lam
Zishen Wan
Vijay Janapa Reddi
1st Workshop on Resource-Constrained Machine Learning (ReCoML) @ MLSys (2020) (to appear)
Preview abstract
Recent work has shown that quantization can help reduce the memory, compute, and energy demands of deep neural networks without significantly harming their quality. However, whether these prior techniques, applied traditionally to image-based models, work with the same efficacy on the sequential decision-making process in reinforcement learning remains an unanswered question. To address this void, we conduct the first comprehensive empirical study that quantifies the effects of quantization on various deep reinforcement learning policies with the intent to reduce their computational resource demands. We apply techniques such as post-training quantization and quantization-aware training to a spectrum of reinforcement learning tasks (such as Pong, Breakout, BeamRider and more) and training algorithms (such as PPO, A2C, DDPG, and DQN). Across this spectrum of tasks and learning algorithms, we show that policies can be quantized to 6-8 bits of precision without loss of accuracy. We also show that certain tasks and reinforcement learning algorithms yield policies that are more difficult to quantize due to their effect of widening the models' distribution of weights, and that quantization-aware training consistently improves results over post-training quantization and oftentimes even over the full-precision baseline. Finally, we demonstrate real-world applications of quantization for reinforcement learning. We use half-precision training to train a Pong model 50% faster, and we deploy a quantized reinforcement learning based navigation policy to an embedded system, achieving an 18X speedup and a 4X reduction in memory usage over an unquantized policy.
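To make the post-training quantization mentioned in the abstract above concrete, here is a minimal sketch of uniform affine quantization of a weight vector to a given bit width. The function names, the example weights, and the min/max calibration rule are assumptions chosen for illustration; the abstract does not specify which quantization scheme the study uses.

```python
def quantize(weights, num_bits):
    """Map float weights to integer codes in [0, 2**num_bits - 1].

    Returns (codes, scale, offset) so that weight ~= offset + code * scale.
    """
    lo, hi = min(weights), max(weights)
    levels = (1 << num_bits) - 1
    # Guard against a constant weight vector, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, offset):
    """Recover approximate float weights from integer codes."""
    return [offset + c * scale for c in codes]


# Hypothetical policy weights, quantized to 8 bits and restored.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, offset = quantize(weights, num_bits=8)
restored = dequantize(codes, scale, offset)
max_error = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, restored))
```

With rounding to the nearest level, the reconstruction error of each weight is bounded by half the step size `scale`, which is why a wider weight distribution (a larger `hi - lo`) makes a policy harder to quantize at a fixed bit width.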
View details
Model-based Reinforcement Learning for Decentralized Multiagent Rendezvous
Rose E. Wang
Dennis Lee
Edward Lee
Brian Andrew Ichter
Conference on Robot Learning (CoRL) (2020)
Preview abstract
Collaboration requires agents to align their goals on the fly. Underlying the human ability to align goals with other agents is their ability to predict the intentions of others and actively update their own plans. We propose hierarchical predictive planning (HPP), a model-based reinforcement learning method for decentralized multiagent rendezvous. Starting with pretrained, single-agent point to point navigation policies and using noisy, high-dimensional sensor inputs like lidar, we first learn via self-supervision motion predictions of all agents on the team. Next, HPP uses the prediction models to propose and evaluate navigation subgoals for completing the rendezvous task without explicit communication among agents. We evaluate HPP in a suite of unseen environments, with increasing complexity and numbers of obstacles. We show that HPP outperforms alternative reinforcement learning, path planning, and heuristic-based baselines on challenging, unseen environments. Experiments in the real world demonstrate successful transfer of the prediction models from sim to real world without any additional fine-tuning. Altogether, HPP removes the need for a centralized operator in multiagent systems by combining model-based RL and inference methods, enabling agents to dynamically align plans.
View details
Long-Range Indoor Navigation with PRM-RL
Anthony Francis
Marek Fiser
Tsang-Wei Lee
IEEE Transactions on Robotics (T-RO) (2020), pp. 19
Fast Deep Swept Volume Estimator
John E. G. Baxter
Satomi Sugaya
Mohammad R. Yousefi
Lydia Tapia
The International Journal of Robotics Research (IJRR) (2020) (to appear)
Preview abstract
Despite decades of research on efficient swept volume computation for robotics, computing the exact swept volume is intractable and approximate swept volume algorithms have been computationally prohibitive for applications such as motion and task planning. In this work, we employ Deep Neural Networks (DNNs) for fast swept volume estimation. Since swept volume is a property of robot kinematics, a DNN can be trained off-line once in a supervised manner and deployed in any environment. The trained DNN is fast during on-line swept volume geometry or size inferences. Results show that DNNs can accurately and rapidly estimate swept volumes caused by rotational, translational and prismatic joint motions. Sampling-based planners using the learned distance are up to 5x more efficient and identify paths with smaller swept volumes on simulated and physical robots. Results also show that swept volume geometry estimation with a DNN is over 98.9% accurate and 1200x faster than an octree-based swept volume algorithm.
View details
RL-RRT: Kinodynamic Motion Planning via Learning Reachability Estimators from RL Policies
Marek Fiser
Lydia Tapia
IEEE Robotics and Automation Letters (RA-L) (2019)
Preview abstract
This paper addresses two challenges facing sampling-based kinodynamic motion planning: a way to identify good candidate states for local transitions and the subsequent computationally intractable steering between these candidate states. Through the combination of sampling-based planning, a Rapidly Exploring Randomized Tree (RRT) and an efficient kinodynamic motion planner through machine learning, we propose an efficient solution to long-range planning for kinodynamic motion planning. First, we use deep reinforcement learning to learn an obstacle-avoiding policy that maps a robot's sensor observations to actions, which is used as a local planner during planning and as a controller during execution. Second, we train a reachability estimator in a supervised manner, which predicts the RL policy's time to reach a state in the presence of obstacles. Lastly, we introduce RL-RRT that uses the RL policy as a local planner, and the reachability estimator as the distance function to bias tree-growth towards promising regions. We evaluate our method on three kinodynamic systems, including physical robot experiments. Results across all three robots tested indicate that RL-RRT outperforms state of the art kinodynamic planners in efficiency, and also provides a shorter path finish time than a steering function free method. The learned local planner policy and accompanying reachability estimator demonstrate transferability to the previously unseen experimental environments, making RL-RRT fast because the expensive computations are replaced with simple neural network inference. Video: https://youtu.be/dDMVMTOI8KY
View details
Learning to Navigate the Web
Izzeddin Gur
Dilek Hakkani-Tur
International Conference on Learning Representations (ICLR) (2019)
Preview abstract
Learning in environments with large state and action spaces, and sparse rewards, can hinder a Reinforcement Learning (RL) agent’s learning through trial-and-error. For instance, following natural language instructions on the Web (such as booking a flight ticket) leads to RL settings where input vocabulary and number of actionable elements on a page can grow very large. Even though recent approaches improve the success rate on relatively simple environments with the help of human demonstrations to guide the exploration, they still fail in environments where the set of possible instructions can reach millions. We approach the aforementioned problems from a different perspective and propose guided RL approaches that can generate an unbounded amount of experience for an agent to learn from. Instead of learning from a complicated instruction with a large vocabulary, we decompose it into multiple sub-instructions and schedule a curriculum in which an agent is tasked with a gradually increasing subset of these relatively easier sub-instructions. In addition, when the expert demonstrations are not available, we propose a novel meta-learning framework that generates new instruction following tasks and trains the agent more effectively. We train a DQN, a deep reinforcement learning agent, with the Q-value function approximated with a novel QWeb neural network architecture on these smaller, synthetic instructions. We evaluate the ability of our agent to generalize to new instructions on the World of Bits benchmark, on forms with up to 100 elements, supporting 14 million possible instructions. The QWeb agent outperforms the baseline without using any human demonstration, achieving a 100% success rate on several difficult environments.
View details
Toward Exploring End-to-End Learning Algorithms for Autonomous Aerial Machines
Srivatsan Krishnan
Behzad Boroujerdian
Vijay Janapa Reddi
Algorithms and Architectures for Learning in-the-Loop Systems in Autonomous Flight @ ICRA (2019)
Preview abstract
We develop AirLearning, a tool suite for end-to-end closed-loop UAV analysis, equipped with a customized yet randomized environment generator in order to expose the UAV to a diverse set of challenges. We take Deep Q networks (DQN) as an example deep reinforcement learning algorithm and use curriculum learning to train a point-to-point obstacle avoidance policy. While we determine the best policy based on the success rate, we evaluate it under strict resource constraints on an embedded platform such as RasPi 3. Using hardware-in-the-loop methodology, we quantify the policy’s performance with quality of flight metrics such as energy consumed, endurance and the average length of the trajectory. We find that the trajectories produced on the embedded platform are very different from those predicted on the desktop, resulting in up to 26.43% longer trajectories. Quality of flight metrics with hardware in the loop characterizes those differences in simulation, thereby exposing how the choice of onboard compute contributes to shortening or widening of the ‘Sim2Real’ gap.
View details
Learning Navigation Behaviors End-to-End with AutoRL
Marek Fiser
Anthony Francis
IEEE Robotics and Automation Letters (RA-L), vol. 4 (2019), pp. 2007-2014
Preview abstract
We learn end-to-end point-to-point and path-following navigation behaviors that avoid moving obstacles. These policies receive noisy lidar observations and output robot linear and angular velocities. The policies are trained in small, static environments with AutoRL, an evolutionary automation layer around Reinforcement Learning (RL) that searches for a deep RL reward and neural network architecture with large-scale hyper-parameter optimization. AutoRL first finds a reward that maximizes task completion, and then finds a neural network architecture that maximizes the cumulative found reward. Empirical evaluations, both in simulation and on-robot, show that AutoRL policies do not suffer from the catastrophic forgetfulness that plagues many other deep reinforcement learning algorithms, generalize to new environments and moving obstacles, are robust to sensor, actuator, and localization noise, and can serve as robust building blocks for larger navigation tasks. Our path-following and point-to-point policies are respectively 23% and 26% more successful than comparison methods across new environments. Video at: https://youtu.be/0UwkjpUEcbI
View details
Preview abstract
Many continuous control tasks have easily formulated objectives, yet using them directly as a reward in reinforcement learning (RL) leads to suboptimal policies. Therefore, many classical control tasks guide RL training using complex rewards, which require tedious hand-tuning. We automate the reward search with AutoRL, an evolutionary layer over standard RL that treats reward tuning as hyperparameter optimization and trains a population of RL agents to find a reward that maximizes the task objective. AutoRL, evaluated on four MuJoCo continuous control tasks over two RL algorithms, shows improvements over baselines, with the biggest uplift for more complex tasks. The video can be found at: https://youtu.be/svdaOFfQyC8.
View details
Preview abstract
Deep Reinforcement Learning (RL) has recently emerged as a solution for moving obstacle avoidance. Deep RL learns to simultaneously predict obstacle motions and corresponding avoidance actions directly from robot sensors, even for obstacles with different dynamics models. However, deep RL methods typically cannot guarantee policy convergence, i.e., cannot provide probabilistic collision avoidance guarantees. In contrast, stochastic reachability (SR), a computationally expensive formal method that employs a known obstacle dynamics model, identifies the optimal avoidance policy and provides strict convergence guarantees. The availability of the optimal solution for versions of the moving obstacle problem provides a baseline against which to compare trained deep RL policies. In this paper, we compare the expected cumulative reward and actions of these policies to SR, and find the following. 1) The state-value function approximates the optimal collision probability well, thus explaining the high empirical performance. 2) RL policies deviate significantly from the optimal, thus negatively impacting collision avoidance in some cases. 3) Evidence suggests that the deviation is caused, at least partially, by the actor net failing to approximate the action corresponding to the highest state-action value.
View details
MAVBench: Micro Aerial Vehicle Benchmarking
Behzad Boroujerdian
Hasan Genc
Srivatsan Krishnan
Wenzhi Cui
Vijay Janapa Reddi
2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp. 894-907
Preview abstract
Unmanned Aerial Vehicles (UAVs) are getting closer to becoming ubiquitous in everyday life. Among them, Micro Aerial Vehicles (MAVs) have seen an outburst of attention recently, specifically in areas with a demand for autonomy. A key challenge standing in the way of making MAVs autonomous is that researchers lack a comprehensive understanding of how performance, power, and computational bottlenecks affect MAV applications. MAVs must operate under a stringent power budget, which severely limits their flight endurance time. As such, there is a need for new tools, benchmarks, and methodologies to foster the systematic development of autonomous MAVs. In this paper, we introduce the “MAVBench” framework, which consists of a closed-loop simulator and an end-to-end application benchmark suite. A closed-loop simulation platform is needed to probe and understand the intra-system (application data flow) and inter-system (system and environment) interactions in MAV applications to pinpoint bottlenecks and identify opportunities for hardware and software co-design and optimization. In addition to the simulator, MAVBench provides a benchmark suite, the first of its kind, consisting of a variety of MAV applications designed to enable computer architects to perform characterization and develop future aerial computing systems. Using our open source, end-to-end experimental platform, we uncover a hidden, and thus far unexpected compute to total system energy relationship in MAVs. Furthermore, we explore the role of compute by presenting three case studies targeting performance, energy and reliability. These studies confirm that an efficient system design can improve MAV’s battery consumption by up to 1.8X.
View details
FollowNet: Robot Navigation by Following Natural Language Directions with Deep Reinforcement Learning
Pararth Shah
Marek Fiser
Dilek Hakkani-Tur
Third Machine Learning in Planning and Control of Robot Motion Workshop at ICRA (2018)
Preview abstract
Understanding and following directions provided by humans can enable robots to navigate effectively in unknown situations. We present FollowNet, an end-to-end differentiable neural architecture for learning multi-modal navigation policies. FollowNet maps natural language instructions as well as visual and depth inputs to locomotion primitives. FollowNet processes instructions using an attention mechanism conditioned on its visual and depth input to focus on the relevant parts of the command while performing the navigation task. Deep reinforcement learning (RL) with a sparse reward simultaneously learns the state representation, the attention function, and control policies. We evaluate our agent on a dataset of complex natural language directions that guide the agent through a rich and realistic dataset of simulated homes. We show that the FollowNet agent learns to execute previously unseen instructions described with a similar vocabulary, and successfully navigates along paths not encountered during training. The agent shows 30% improvement over a baseline model without the attention mechanism, with 52% success rate at novel instructions.
View details
Preview abstract
Robot motion planning often requires finding trajectories that balance different user intents, or preferences. One of these preferences is usually arrival at the goal, while another might be obstacle avoidance. Here, we formalize these, and similar, tasks as preference balancing tasks (PBTs) on acceleration controlled robots, and propose a motion planning solution, PrEference Appraisal Reinforcement Learning (PEARL). PEARL uses reinforcement learning on a restricted training domain, combined with features engineered from user-given intents. PEARL’s planner then generates trajectories in expanded domains for more complex problems. We present an adaptation for rejection of stochastic disturbances and offer in-depth analysis, including task completion conditions and behavior analysis when the conditions do not hold. PEARL is evaluated on five problems, two multi-agent obstacle avoidance tasks and three that stochastically disturb the system at run-time: 1) a multi-agent pursuit problem with 1000 pursuers, 2) robot navigation through 900 moving obstacles, which is trained in an environment with only 4 static obstacles, 3) aerial cargo delivery, 4) two robot rendezvous, and 5) flying inverted pendulum. Lastly, we evaluate the method on a physical quadrotor UAV robot with a suspended load influenced by a stochastic disturbance.
View details
Resilient Computing with Reinforcement Learning on a Dynamical System: Case study in Sorting
Brad Aimone
Conrad James
Lydia Tapia
57th IEEE Conference on Decision and Control (2018)
Preview abstract
This paper poses general computation as a feedback-control problem. This formulation allows the agent to autonomously overcome some limitations of standard procedural language programming: resilience to errors and early program termination. Our formulation considers computation to be trajectory generation in the program's variable space. The computing is then posed as a sequential decision making problem, solved with RL, and analyzed with Lyapunov stability theory to assess the agent's progression to the goal and resilience. We do this through a case study on a quintessential computer science problem, array sorting. Evaluations show that our RL sorting agent makes steady progress to an asymptotically stable goal, is resilient to faulty components, and performs fewer array manipulations than traditional Quicksort and Bubble sort.
View details
Deep Neural Networks for Swept Volume Prediction Between Configurations
Hao-Tien Chiang
Lydia Tapia
Third Workshop on Machine Learning in Planning and Control of Robot Motion at ICRA (2018)
Preview abstract
Swept Volume (SV), the volume displaced by an object when it is moving along a trajectory, is considered a useful metric for motion planning. First, SV has been used to identify collisions along a trajectory, because it directly measures the amount of space required for an object to move. Second, in sampling-based motion planning SV is an excellent distance metric, because it correlates to the likelihood of success of the expensive local planning step between two sampled configurations. However, in both of these applications, traditional SV algorithms are too computationally expensive for efficient motion planning. In this work, we train Deep Neural Networks (DNNs) to learn the size of SV for specific robot geometries. Results for two robots, a 6 degree of freedom (DOF) rigid body and a 7 DOF fixed-base manipulator, indicate that the network estimations are very close to the true size of SV and are more than 1500 times faster than a state-of-the-art SV estimation algorithm.
View details
Fast Swept Volume Estimation with Deep Learning
Satomi Sugaya
Lydia Tapia
The 13th International Workshop on the Algorithmic Foundations of Robotics (WAFR) (2018)
Preview abstract
Swept volume, the volume displaced by a moving object, is an ideal distance metric for sampling-based motion planning because it directly correlates to the amount of motion between two states. However, even approximate algorithms are computationally prohibitive. Our fundamental approach is the application of deep learning to efficiently estimate swept volume within a 5%-10% error for all robots tested, from rigid bodies to manipulators. However, even inference via the trained network can be computationally costly given the often hundreds of thousands of computations required by sampling-based motion planning. To address this, we demonstrate an efficient hierarchical approach for applying our trained estimator. This approach first pre-filters samples using a weighted Euclidean estimator trained via swept volume. Then, it selectively applies the deep neural network estimator. The first estimator, although less accurate, has metric space properties. The second estimator is a high-fidelity unbiased estimator without metric space properties. We integrate the hierarchical selection approach in both a roadmap-based and a tree-based sampling motion planner. Empirical evaluation on the robot set demonstrates that hierarchical application of the metrics yields up to 5000 times faster planning than state-of-the-art swept volume approximation and up to five times higher probability of finding a collision-free trajectory under a fixed time budget than the traditional Euclidean metric.
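The two-stage selection described in the abstract above (a cheap pre-filter followed by selective use of a costly estimator) can be sketched as follows. Everything here is an assumption for illustration: `placeholder_estimator` is a simple stand-in for the trained deep network, the weights `(1.0, 0.25)` are arbitrary rather than learned from swept volume, and the shortlist size `keep` is a hypothetical parameter.

```python
import math


def weighted_euclidean(p, q, weights):
    """Cheap distance: Euclidean distance with per-dimension weights."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q)))


def select_neighbor(query, candidates, cheap, costly, keep):
    """Shortlist `keep` candidates with the cheap metric, then pick the
    best of the shortlist with the costly metric."""
    shortlist = sorted(candidates, key=lambda c: cheap(query, c))[:keep]
    return min(shortlist, key=lambda c: costly(query, c))


# Records one entry per call so the number of costly evaluations is visible.
costly_calls = []


def placeholder_estimator(p, q):
    """Stand-in for an expensive learned estimator (here: L1 distance)."""
    costly_calls.append((p, q))
    return sum(abs(a - b) for a, b in zip(p, q))


def cheap_metric(p, q):
    return weighted_euclidean(p, q, (1.0, 0.25))


samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (0.5, 0.5)]
best = select_neighbor((0.4, 0.4), samples, cheap_metric, placeholder_estimator, keep=2)
```

The costly estimator is evaluated only on the shortlist, so its cost scales with `keep` rather than with the number of samples; the trade-off is that a candidate the cheap metric ranks poorly is never considered.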
View details
PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning
Oscar Ramirez
Marek Fiser
Ken Oslund
Anthony Francis
James Davidson
Lydia Tapia
IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia (2018), pp. 5113-5120
Preview abstract
We present PRM-RL, a hierarchical method for long-range navigation task completion that combines sampling-based path planning with reinforcement learning (RL) agents. The RL agents learn short-range, point-to-point navigation policies that capture robot dynamics and task constraints without knowledge of the large-scale topology, while the sampling-based planners provide an approximate map of the space of possible configurations of the robot from which collision-free trajectories feasible for the RL agents can be identified. The same RL agents are used to control the robot under the direction of the planning, enabling long-range navigation. We use the Probabilistic Roadmaps (PRMs) for the sampling-based planner. The RL agents are constructed using feature-based and deep neural net policies in continuous state and action spaces. We evaluate PRM-RL on two navigation tasks with non-trivial robot dynamics: end-to-end differential drive indoor navigation in office environments, and aerial cargo delivery in urban environments with load displacement constraints. These evaluations included both simulated environments and on-robot tests. Our results show improvement in navigation task completion over both RL agents on their own and traditional sampling-based planners. In the indoor navigation task, PRM-RL successfully completes up to 215 m long trajectories under noisy sensor conditions, and the aerial cargo delivery completes flights over 1000 m without violating the task constraints in an environment 63 million times larger than used in training.
View details
Why Compute Matters for UAV Energy Efficiency?
Behzad Boroujerdian
Hasan Genc
Srivatsan Krishnan
Vijay Janapa Reddi
International Symposium on Aerial Robotics (2018)
Preview abstract
Unmanned Aerial Vehicles (UAVs) are getting closer to becoming ubiquitous in everyday life. Although researchers in the robotics domain have made rapid progress in recent years, hardware and software architects in the computer architecture community lack a comprehensive understanding of how performance, power, and computational bottlenecks affect UAV applications. Such an understanding enables system architects to design microchips tailored for aerial agents. This paper is an attempt by computer architects to initiate the discussion between the two academic domains by investigating the underlying compute systems’ impact on aerial robotic applications. To do so, we identify performance and energy constraints and examine the impact of various compute knobs such as processor cores and frequency on these constraints. Our experiments show that such knobs allow for up to 5X speedup for a wide class of applications.
View details