The ability to predict how an environment changes based on forces applied to it is fundamental for a robot to achieve specific goals. Traditionally in robotics, this problem is addressed through the use of pre-specified models or physics simulators, taking advantage of prior knowledge of the problem structure. While these models are general and have broad applicability, they depend on accurate estimation of model parameters such as object shape, mass, and friction. On the other hand, learning-based methods such as Predictive State Representations or more recent deep learning approaches have looked at learning these models directly from raw perceptual information in a model-free manner. These methods operate on raw data without any intermediate parameter estimation, but lack the structure and generality of model-based techniques. In this talk, I will present some work that tries to bridge the gap between these two paradigms by proposing a specific class of deep visual dynamics models (SE3-Nets) that explicitly encode strong physical and 3D geometric priors (specifically, rigid body dynamics) in their structure. As opposed to traditional deep models that reason about dynamics/motion at the pixel level, we show that the physical priors implicit in our network architectures enable them to reason about dynamics at the object level: our network learns to identify objects in the scene and to predict rigid body rotation and translation per object. I will present results on applying our deep architectures to two specific problems: 1) modeling scene dynamics, where the task is to predict future depth observations given the current observation and an applied action, and 2) real-time visuomotor control of a Baxter manipulator based only on raw depth data.
We show that: 1) our proposed architectures significantly outperform baseline deep models on dynamics modeling, and 2) our architectures perform comparably to or better than baseline models for visuomotor control while operating at camera rates (30 Hz) and relying on far less information.
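To make the object-level idea concrete, here is a minimal sketch of how per-object rigid motions could be applied to a point cloud. The abstract does not say how the per-object predictions are combined into a scene prediction; this sketch assumes each point carries soft weights over K objects and that the transformed points are blended by those weights. All function names and the example values are hypothetical.

```python
import math

def rotation_z(angle):
    """3x3 rotation matrix about the z-axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_rigid(R, t, p):
    """Apply the rigid transform p -> R p + t to a single 3D point."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def blend_rigid_motions(points, masks, transforms):
    """Move each point by a mask-weighted blend of K rigid transforms.

    points:     list of 3D points
    masks:      per-point weights over the K objects (assumed to sum to 1)
    transforms: list of K (R, t) pairs, one per object
    """
    moved = []
    for p, weights in zip(points, masks):
        q = [0.0, 0.0, 0.0]
        for w_k, (R, t) in zip(weights, transforms):
            p_k = apply_rigid(R, t, p)
            q = [qi + w_k * pki for qi, pki in zip(q, p_k)]
        moved.append(q)
    return moved

# Hypothetical scene: object 0 is a static background, object 1 rotates
# 90 degrees about z and is lifted by one unit.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
transforms = [(identity, [0.0, 0.0, 0.0]),
              (rotation_z(math.pi / 2), [0.0, 0.0, 1.0])]
points = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
masks = [[1.0, 0.0], [0.0, 1.0]]  # first point: background, second: object 1
moved = blend_rigid_motions(points, masks, transforms)
```

In a learned model the masks and transforms would be network outputs; here they are fixed by hand to show only the geometry.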
Organizers: Franzi Meier
Machine learning has become a popular application domain for modern optimization techniques, pushing its algorithmic frontier. The need for large-scale optimization algorithms that can handle millions of dimensions or data points, typical of the big data era, has brought a resurgence of interest in first-order algorithms, making us revisit the venerable stochastic gradient method [Robbins-Monro 1951] as well as the Frank-Wolfe algorithm [Frank-Wolfe 1956]. In this talk, I will review recent improvements on these algorithms which can exploit the structure of modern machine learning approaches. I will explain why the Frank-Wolfe algorithm has become so popular lately, and present a surprising tweak on the stochastic gradient method which yields a fast linear convergence rate. Motivating applications will include weakly supervised video analysis and structured prediction problems.
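For readers unfamiliar with the Frank-Wolfe algorithm mentioned above, the following is a minimal sketch of its classic form on the probability simplex. It is not the improved variant discussed in the talk. The objective, the starting point, and the function name are illustrative assumptions.

```python
def frank_wolfe_simplex(grad, x0, num_iters=2000):
    """Minimise a smooth convex function over the probability simplex.

    grad: function returning the gradient (a list) at a point x
    x0:   feasible starting point (non-negative entries summing to 1)
    """
    x = list(x0)
    for k in range(num_iters):
        g = grad(x)
        # Linear minimisation oracle: over the simplex, <g, s> is minimised
        # at the vertex whose coordinate has the smallest gradient entry.
        i_min = min(range(len(g)), key=lambda i: g[i])
        gamma = 2.0 / (k + 2.0)  # standard open-loop step size
        # Move towards that vertex; the convex combination keeps x feasible,
        # so no projection step is needed.
        x = [(1.0 - gamma) * xi for xi in x]
        x[i_min] += gamma
    return x

# Hypothetical objective: f(x) = ||x - b||^2 with b inside the simplex.
b = [0.2, 0.5, 0.3]
grad = lambda x: [2.0 * (xi - bi) for xi, bi in zip(x, b)]
x_star = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0])
```

Each iteration needs only a linear minimisation over the feasible set rather than a projection, which is one commonly cited reason for the method's appeal on structured constraint sets.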
Organizers: Philipp Hennig
In my talk I will present my work regarding 3D mapping using lidar scanners. I will give an overview of the SLAM problem and its main challenges: robustness, accuracy and processing speed. Regarding robustness and accuracy, we investigate a better point cloud representation based on resampling and surface reconstruction. Moreover, we demonstrate how it can be incorporated in an ICP-based scan matching technique. Finally, we elaborate on globally consistent mapping using loop closures. Regarding processing speed, we propose the integration of our scan matching in a multi-resolution scheme and a GPU-accelerated implementation using our programming language Quasar.
Organizers: Simon Donne
In this talk we will address the problem of 3D reconstruction of rigid and deformable objects from a single depth video stream. Traditional 3D registration techniques, such as ICP and its variants, are widespread and effective, but sensitive to initialization and noise due to the underlying correspondence estimation procedure. Therefore, we have developed SDF-2-SDF, a dense, correspondence-free method which aligns a pair of implicit representations of scene geometry, e.g. signed distance fields, by minimizing their direct voxel-wise difference. In its rigid variant, we apply it to static object reconstruction via real-time frame-to-frame camera tracking and posterior multiview pose optimization, achieving higher accuracy and a wider convergence basin than ICP variants. Its extension to scene reconstruction, SDF-TAR, carries out the implicit-to-implicit registration over several limited-extent volumes anchored in the scene and runs simultaneous GPU tracking and CPU refinement, with a lower memory footprint than other SLAM systems. Finally, to handle non-rigidly moving objects, we incorporate the SDF-2-SDF energy in a variational framework, regularized by a damped approximately Killing vector field. The resulting system, KillingFusion, is able to reconstruct objects undergoing topological changes and fast inter-frame motion in near real time.
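The idea of aligning two signed distance fields by their voxel-wise difference can be illustrated with a toy sketch. This is a deliberately simplified stand-in, not the SDF-2-SDF implementation: it works in 2D, uses analytic SDFs of a circle instead of fields built from depth data, and replaces pose optimization with a brute-force search over integer translations. All names and values are hypothetical.

```python
import math

def circle_sdf(cx, cy, radius, size):
    """Signed distance to a circle on a size x size grid (negative inside)."""
    return [[math.hypot(x - cx, y - cy) - radius for x in range(size)]
            for y in range(size)]

def sdf_energy(phi_ref, phi_cur):
    """Half the sum of squared voxel-wise differences between two SDF grids."""
    return 0.5 * sum((a - b) ** 2
                     for row_a, row_b in zip(phi_ref, phi_cur)
                     for a, b in zip(row_a, row_b))

def align_translation(phi_ref, radius, size, candidates):
    """Return the candidate centre whose SDF best matches the reference.

    No point correspondences are estimated: candidates are scored purely
    by the dense difference between the two fields.
    """
    return min(candidates,
               key=lambda c: sdf_energy(phi_ref,
                                        circle_sdf(c[0], c[1], radius, size)))

size = 16
phi_ref = circle_sdf(9, 6, 4.0, size)  # reference field, centre at (9, 6)
candidates = [(cx, cy) for cx in range(4, 12) for cy in range(4, 12)]
best = align_translation(phi_ref, 4.0, size, candidates)
```

The point of the sketch is only that the energy is defined directly on the two fields, so no correspondence search between surface points is involved.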
Organizers: Fatma Güney
Under acute threat, biological agents need to choose adaptive actions to survive. In my talk, I will provide a decision-theoretic view on this problem and ask what the potential computational algorithms for this choice are, and how they are implemented in neural circuits. Rational design principles and non-human animal data tentatively suggest a specific architecture that heavily relies on tailored algorithms for specific threat scenarios. Virtual reality computer games provide an opportunity to translate non-human animal tasks to humans and investigate these algorithms across species. I will discuss the specific challenges for empirical inference on underlying neural circuits given such an architecture.
Organizers: Michel Besserve
Visual Question Answering is one of the applications of Deep Learning that is pushing towards real Artificial Intelligence. It turns the typical deep learning process around by only defining the task to be carried out after the training has taken place, which changes the task fundamentally. We have developed a range of strategies for incorporating other information sources into deep learning-based methods, and in the process taken a step towards developing algorithms which learn how to use other algorithms to solve a problem, rather than solving it directly. This talk thus covers some of the high-level questions about the types of challenges Deep Learning can be applied to, and how we might separate the things it’s good at from those that it’s not.
Organizers: Siyu Tang
Enabling robots to interact with humans and unknown environments has been one of the primary goals of robotics research for decades. I will outline how human-centered robot design, nonlinear soft-robotics control inspired by human neuromechanics, and physics-grounded learning algorithms will let robots become a commodity in our near-future society. In particular, compliant and energy-controlled ultra-lightweight systems capable of complex collision handling enable high-performance human assistance over a wide variety of application domains. Together with novel methods for dynamics and skill learning, flexible and easy-to-use robotic power tools and systems can be designed. Our work has led to the first next-generation robot, Franka Emika, which has recently become commercially available. The system is able to safely interact with humans, can execute and even learn sensitive manipulation skills, is affordable, and is designed as a distributed interconnected system.
Organizers: Eva Laemmerhirt
In this talk I introduce the neural statistician as an approach for meta-learning. The neural statistician learns to appropriately summarise datasets through a learnt statistic vector. This can be used for few-shot learning, by computing the statistic vectors for the presented data, and using these statistics as context variables for one-shot classification and generation. I will show how we can generalise the neural statistician to a context-aware learner that learns to characterise and combine independently learnt contexts. I will also demonstrate an approach for meta-learning data augmentation strategies. Acknowledgments: This is joint work with Harri Edwards, Antreas Antoniou, and Conor Durkan.
Organizers: Philipp Hennig
The field of transportation is undergoing a seismic change with the coming introduction of autonomous driving. The technologies required to enable computer-driven cars involve the latest cutting-edge artificial intelligence algorithms along three major thrusts: Sensing, Planning and Mapping. Prof. Amnon Shashua, Co-founder and Chairman of Mobileye, will describe the challenges and the kind of machine learning algorithms involved, from the perspective of Mobileye’s activity in this domain.
The fundamental building block in many learning models is the distance measure that is used. Usually, the linear distance is used for simplicity. Replacing this stiff distance measure with a flexible one could potentially give a better representation of the actual distance between two points. I will present how the normal distribution changes if the distance measure respects the underlying structure of the data. In particular, a Riemannian manifold will be learned based on observations. The geodesic curve, a length-minimizing curve under the Riemannian measure, can then be computed. With this flexible distance measure we get a normal distribution that locally adapts to the data. A maximum likelihood estimation scheme is provided for inference of the mean and covariance parameters, along with a systematic way to choose the parameter defining the Riemannian manifold. Results on synthetic and real-world data demonstrate the efficiency of the proposed model in fitting non-trivial probability distributions.
Organizers: Philipp Hennig
In this talk I will first outline my different research projects. I will then focus on the EACare project, a recently started multi-disciplinary collaboration with the aim of developing an embodied system capable of carrying out neuropsychological tests to detect early signs of dementia, e.g., due to Alzheimer's disease. The system will use methods from Machine Learning and Social Robotics, and be trained with examples of recorded clinician-patient interactions. The interaction will be developed using a participatory design approach. I will describe the scope and method of the project, and report on a first Wizard of Oz prototype.