Imagine a futuristic version of Google Street View that could dial up any possible place in the world, at any possible time. Effectively, such a service would be a recording of the plenoptic function—the hypothetical function described by Adelson and Bergen that captures all light rays passing through space at all times. While the plenoptic function is completely impractical to capture in its totality, every photo ever taken represents a sample of this function. I will present recent methods we've developed to reconstruct the plenoptic function from sparse space-time samples of photos—including Street View itself, as well as tourist photos of famous landmarks. The results of this work include the ability to take a single photo and synthesize a full dawn-to-dusk timelapse video, as well as compelling 4D view synthesis capabilities where a scene can simultaneously be explored in space and time.
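For reference, Adelson and Bergen write the plenoptic function as the intensity of light seen from every viewing position, in every direction, at every wavelength and time. The second line is our own simplified notation, not taken from the talk, for how one photo samples it: a camera at position $\mathbf{V}_0$ and time $t_0$ records a 2D slice over viewing directions. Wavelength is integrated against an assumed sensor spectral response $s(\lambda)$, and exposure duration is ignored.

```latex
\begin{aligned}
L &= P(\theta, \phi, \lambda, t, V_x, V_y, V_z) \\
I_{t_0, \mathbf{V}_0}(\theta, \phi) &= \int P(\theta, \phi, \lambda, t_0, V_{0,x}, V_{0,y}, V_{0,z})\, s(\lambda)\, d\lambda
\end{aligned}
```

Reconstructing the plenoptic function from photos then amounts to recovering this 7D function from many 2D slices taken at scattered positions $\mathbf{V}_0$ and times $t_0$.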
One of the most striking characteristics of human behavior, in contrast to that of all other animals, is its extraordinary variability across populations. Human cultural diversity is a biological oddity. More specifically, we propose that what makes humans unique is the nature of the individual ontogenetic process that results in this unparalleled cultural diversity. Hence, our central question is: How is human ontogeny adapted to cultural diversity, and how does it contribute to it? This question is critical because cultural diversity is not only our predominant mode of adaptation to local ecologies but also key to the construction of our cognitive architecture. The colors we see, the tones we hear, the memories we form, and the norms we adhere to are all the consequence of an interaction between our emerging cognitive system and our lived experiences. While psychologists make careers out of measuring cognitive systems, we are terrible at measuring experience, as are anthropologists, sociologists, and others. The standard methods all face insurmountable limitations. In our department, we hope to apply machine learning, deep learning, and computer vision to automatically extract developmentally important indicators of humans’ daily experience. Just as modern sequencing technologies allow us to study the human genotype at scale, applying AI methods to reliably quantify humans’ lived experience would allow us to study the human behavioral phenotype at scale and fundamentally alter the science of human behavior and its applications in education, mental health, and medicine: the phenotyping revolution.
Organizers: Timo Bolkart
The Covid-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdowns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity without triggering a second outbreak. Various DCT methods have been proposed, each making trade-offs between privacy, mobility restriction, and public health. Many approaches model infection and encounters as binary events. With such approaches, called binary contact tracing, once a case is confirmed by a positive lab test result, it is propagated to people who were contacts of the infected person, typically with a recommendation that these individuals self-quarantine. This approach ignores the inherent uncertainty in contacts and the infection process, which could be used to tailor messaging to high-risk individuals and prompt proactive testing or earlier self-quarantine. It also does not make use of observations such as symptoms or pre-existing medical conditions, which could be used to make more accurate risk predictions. Methods that can use such information have been proposed, but they typically require access to the graph of social interactions and/or centralization of sensitive personal data, which is incompatible with reasonable privacy and security constraints. We use an agent-based epidemiological simulation to develop and test ML methods that can be deployed to a smartphone to locally predict an individual's risk of infection from their contact history and other information, while respecting strong privacy and security constraints. We use this risk score to provide personalized recommendations to the user via an app, an approach we call probabilistic risk awareness (PRA). We show that PRA can significantly reduce the spread of the disease compared to other methods, for equivalent average mobility and realistic assumptions about app adoption, and thereby save lives.
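To make the contrast between binary contact tracing and a graded risk score concrete, here is a minimal Python sketch. It is not the method from the talk, which trains ML predictors inside an agent-based simulation; instead it swaps in a hand-written risk formula that assumes encounters are independent and that per-encounter transmission probability grows with duration as 1 - exp(-rate * minutes). The `Contact` record, the `rate_per_min` value, and the recommendation thresholds are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Contact:
    """Hypothetical on-device record of one encounter."""
    duration_min: float       # length of the encounter in minutes
    contact_risk: float       # estimated probability the other person was infectious
    confirmed_positive: bool  # True if the other person reported a positive lab test


def binary_recommendation(contacts):
    """Binary tracing: advise quarantine only if some contact has a confirmed positive test."""
    return "quarantine" if any(c.confirmed_positive for c in contacts) else "no action"


def transmission_prob(duration_min, rate_per_min=0.01):
    """Assumed dose-response: per-encounter transmission probability rises with duration."""
    return 1.0 - math.exp(-rate_per_min * duration_min)


def probabilistic_risk(contacts):
    """Probability of at least one transmission, treating encounters as independent."""
    p_escape = 1.0
    for c in contacts:
        # A confirmed case is treated as certainly infectious; otherwise use the soft estimate.
        p_source = 1.0 if c.confirmed_positive else c.contact_risk
        p_escape *= 1.0 - p_source * transmission_prob(c.duration_min)
    return 1.0 - p_escape


def graded_recommendation(risk, low=0.05, high=0.2):
    """Map a continuous risk score onto graded advice (thresholds are illustrative)."""
    if risk >= high:
        return "self-quarantine and get tested"
    if risk >= low:
        return "reduce contacts and monitor symptoms"
    return "no action"


# Two encounters with people who are somewhat likely to be infected but untested.
contacts = [Contact(30, 0.3, False), Contact(60, 0.2, False)]
risk = probabilistic_risk(contacts)
print(binary_recommendation(contacts))  # no confirmed positives, so "no action"
print(round(risk, 3), graded_recommendation(risk))
```

With two moderately risky contacts and no confirmed positives, the binary rule gives no advice, while the graded score (about 0.16 here) already suggests caution. That is the kind of uncertainty the abstract argues binary approaches discard.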
Fermions, particles with half-integer spin such as the electron, proton, and neutron, obey the Pauli principle: they cannot occupy the same quantum state. This “antisocial” behavior is directly observed in experiments with ultracold gases of fermionic atoms: Pauli blocking in momentum space for a free Fermi gas, and in real space for gases confined to an optical lattice. When fermions interact, new, rather “social” behavior emerges, e.g., hydrodynamic flow, superfluidity, and magnetism. The interplay of the Pauli principle and strong interactions poses great difficulties for our understanding of complex Fermi systems, from nuclei to high-temperature superconducting materials and neutron stars. I will describe experiments on atomic Fermi gases in which interactions become as strong as quantum mechanics allows: the unitary Fermi gas, fermions immersed in a Bose gas, and the Fermi-Hubbard lattice gas. Sound and heat transport distinguish collisionally hydrodynamic from superfluid flow, while spin transport reveals the mechanism underlying quantum magnetism.
In this visual feast, Scott recounts results and revelations from four years of experimentation using machine learning as a ‘creative collaborator’ in his artistic process. He makes the case that AI, rather than rendering artists obsolete, will empower us and expand our creative horizons. He shares an eclectic range of successes and failures encountered in his efforts to create powerful but artistically controllable neural networks to use as tools for representing and abstracting the human figure. Scott also gives a behind-the-scenes look at creating the work for his recent Artist+AI exhibition in London.
Organizers: Ahmed Osman
In this talk, I will introduce the notion of 'canonicalization' and how it can be used to solve 3D computer vision tasks. I will describe Normalized Object Coordinate Space (NOCS), a 3D canonical container that we have developed for 3D estimation, aggregation, and synthesis tasks. I will demonstrate how NOCS allows us to address previously difficult tasks like category-level 6DoF object pose estimation, and correspondence-free multiview 3D shape aggregation. Finally, I will discuss future directions including opportunities to extend NOCS for tasks like articulated and non-rigid shape and pose estimation.
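The normalization step, and the way a canonical space turns pose estimation into a fitting problem, can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the NOCS implementation. It assumes the object is already in its category's canonical orientation, and it follows our understanding that NOCS scales each object so its tight bounding-box diagonal has length 1 and centres it in the unit cube. The pose is then recovered from NOCS-to-camera correspondences with a closed-form Umeyama similarity fit; a real system would predict the NOCS coordinates per pixel with a network and add outlier rejection such as RANSAC. The function names are hypothetical.

```python
import numpy as np


def normalize_to_nocs(points):
    """Map an (N, 3) point set, already in its category's canonical orientation,
    into a unit cube: bounding-box diagonal of length 1, centred at (0.5, 0.5, 0.5)."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    center = (lo + hi) / 2.0
    diagonal = np.linalg.norm(hi - lo)
    return (points - center) / diagonal + 0.5


def umeyama_similarity(src, dst):
    """Closed-form least-squares similarity transform (Umeyama, 1991):
    find scale s, rotation R, translation t with dst ~= s * R @ src + t."""
    n = src.shape[0]
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / n                  # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                         # avoid returning a reflection
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


# Toy example: a random "object" in canonical orientation, placed in the camera frame.
rng = np.random.default_rng(0)
nocs = normalize_to_nocs(rng.normal(size=(200, 3)) * [2.0, 1.0, 0.5])
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
s_true, t_true = 0.3, np.array([0.1, -0.2, 1.5])
camera_points = s_true * nocs @ R_true.T + t_true   # what a depth sensor would observe

s, R, t = umeyama_similarity(nocs, camera_points)
```

Because the fit returns a scale as well as a rotation and translation, the same canonical coordinates can serve every instance of a category regardless of its physical size, which is what makes category-level rather than instance-level pose estimation possible in this setup.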
Organizers: Timo Bolkart
Motivated by the current COVID-19 outbreak, we introduce a novel epidemic model based on marked temporal point processes that is specifically designed to make fine-grained spatiotemporal predictions about the course of the disease in a population. Our model can make use of, and benefit from, data gathered by a variety of contact tracing technologies, and it can quantify the effects that different testing and tracing strategies, social distancing measures, and business restrictions may have on the course of the disease. Building on our model, we use Bayesian optimization to estimate, from historical longitudinal testing data, each individual's risk of exposure at the sites they visit. Experiments using real COVID-19 data and mobility patterns from several towns and regions in Germany and Switzerland demonstrate that our model can be used to quantify the effects of tracing, testing, and containment strategies at an unprecedented spatiotemporal resolution. To facilitate research and informed policy-making, particularly in the context of the current COVID-19 outbreak, we are releasing an open-source implementation of our framework at https://github.com/covid19-model.
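The core idea of turning site visits into an infection probability can be illustrated with a toy Poisson-process calculation. This is not the released framework. It assumes each visit is an event marked with a person, a site, and a time window; that a susceptible person accumulates infection intensity at a hypothetical per-site rate `beta[site]` for every hour they overlap with an infectious visitor at that site; and that the probability of at least one infection event is 1 - exp(-Λ) for accumulated intensity Λ. Per-site rates like `beta` are the kind of quantity the abstract describes estimating from testing data; here they are simply made up, as are all names in the code.

```python
import math
from dataclasses import dataclass


@dataclass
class Visit:
    """One visit event, marked with the person, the site, and its time window in hours."""
    person: str
    site: str
    start: float
    end: float


def overlap_hours(a, b):
    """Time two visits spend at the same site simultaneously (0 if different sites)."""
    if a.site != b.site:
        return 0.0
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def infection_probability(person, visits, infectious, beta):
    """P(at least one infection) = 1 - exp(-accumulated intensity) for a Poisson process."""
    own = [v for v in visits if v.person == person]
    sources = [v for v in visits if v.person in infectious and v.person != person]
    intensity = sum(beta[v.site] * overlap_hours(v, w) for v in own for w in sources)
    return 1.0 - math.exp(-intensity)


visits = [
    Visit("alice", "cafe", 10.0, 12.0),
    Visit("bob", "cafe", 11.0, 13.0),      # infectious, overlaps Alice for 1 h
    Visit("alice", "office", 9.0, 17.0),
    Visit("carol", "office", 14.0, 15.0),  # infectious, overlaps Alice for 1 h
]
beta = {"cafe": 0.2, "office": 0.05}       # hypothetical transmission rates per hour
p = infection_probability("alice", visits, {"bob", "carol"}, beta)
print(round(p, 3))
```

Lowering a site's rate or removing its visits, as a business restriction would, changes the computed probability directly; the abstract describes quantifying such intervention effects with a far richer model.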
Organizers: Bernhard Schölkopf
This talk is devoted to modern methods for attosecond and femtosecond laser spectro-microscopy, with a special focus on applications that require extreme spatial resolution. In the first part, I discuss how high-harmonic generation driven by high-energy, high-power light transients promises to deliver the photon flux and photon energy required for attosecond pump-probe spectroscopy at high spatiotemporal resolution, in order to capture electron dynamics in matter. I demonstrate the first prototype high-energy field synthesizer based on Yb:YAG thin-disk laser technology for generating high-energy light transients. In the second part of my talk, I show how the complex electric field of light can be resolved at PHz frequencies by electro-optic sampling in ambient air, and discuss the potential of this technique for molecular spectroscopy and high-resolution, label-free imaging.
Humans perform object manipulation in order to execute a specific task; such actions are seldom started without a goal in mind. In contrast, traditional robotic grasping (the first stage of object manipulation) seems to focus purely on getting hold of the object, neglecting the goal of the manipulation. Accordingly, most metrics used in robotic grasping do not account for the final task when judging quality and success. Since the overall goal of a manipulation task shapes humans' actions and grasps, the task itself should shape the metric of success. To this end, I will present a new task-centred metric. The task is also very important in another object-manipulation action: the handover. In object handovers, humans display a high degree of flexibility and adaptation. These characteristics are key if robots are to interact with humans with the same fluency and efficiency. I will present my work on human-human and robot-human handovers and explain why an understanding of the task is important for robotic grasping.
Organizers: Katherine J. Kuchenbecker
In this talk, I will present an overview and the latest results of the project Aerial Outdoor Motion Capture (AirCap), running in the Perceiving Systems department. AirCap's goal is to achieve markerless, unconstrained human motion capture (MoCap) in unknown and unstructured outdoor environments. To this end, we have developed a flying MoCap system using a team of autonomous aerial robots with on-board monocular RGB cameras. Our system is endowed with a range of novel functionalities developed by our group over the last three years. These include (i) cooperative detection and tracking, which enables DNN-based detectors to run on board flying robots; (ii) active cooperative perception in aerial robot teams to minimize joint tracking uncertainty; and (iii) markerless human pose and shape estimation using images acquired from multiple views and approximately calibrated cameras. We have conducted several real-world experiments, along with ground-truth comparisons, to validate our system. Overall, we have demonstrated the first fully autonomous flying MoCap system for outdoor scenarios involving multiple aerial robots.
Organizers: Katherine J. Kuchenbecker
In this talk, I will consider the problem of scene-level inverse rendering: recovering shape, reflectance, and lighting from a single, uncontrolled, outdoor image. This task is highly ill-posed, but we show that multiview self-supervision, a natural lighting prior, and implicit lighting estimation allow an image-to-image CNN to solve it, seemingly learning some general principles of shape-from-shading along the way. By adding a neural renderer and a sky generator GAN, our approach can synthesise photorealistic relit images under widely varying illumination. I will finish by briefly describing recent work in which some of these ideas have been combined with deep face model fitting, replacing parameter regression with correspondence prediction and thereby enabling fully unsupervised training.
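The shape-from-shading principles mentioned above rest on an image-formation model. A minimal sketch, assuming the simplest case of a Lambertian surface under a single distant light (the talk's method instead learns the inversion with a CNN and uses a natural lighting prior), shows two things: surface orientation changes brightness, and a single image is ambiguous, because scaling albedo up and light intensity down by the same factor leaves every pixel unchanged. The function names are illustrative.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


def lambertian(normal, albedo, light_dir, light_intensity):
    """Lambertian shading: albedo * intensity * max(0, n . l)."""
    n, l = normalize(normal), normalize(light_dir)
    cos_theta = max(0.0, sum(a * b for a, b in zip(n, l)))
    return albedo * light_intensity * cos_theta


light = (0.0, 0.0, 1.0)  # direction toward the light, along the viewing axis
normals = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (1.0, 0.0, 0.0)]  # facing, tilted 45 deg, grazing

# Same surface shape, two different albedo/light explanations.
pixels_a = [lambertian(n, 0.5, light, 2.0) for n in normals]
pixels_b = [lambertian(n, 1.0, light, 1.0) for n in normals]
print(pixels_a)
print(pixels_b)
```

An inverse-rendering method must resolve exactly this kind of ambiguity, which is one reason cues such as a lighting prior and multiview self-supervision help.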
Organizers: Timo Bolkart
Licklider and Taylor (1968) envisioned computational machinery that could enable better communication between humans than face-to-face interaction. In the last fifty years, we have used computing to develop various means of communication, such as mail, messaging, phone calls, video conversations, and virtual reality. These are, however, proxies for face-to-face communication: they aim to encode words, expressions, emotions, and body language at the source and decode them reliably at the destination. The true revolution of personal computing has not yet begun, because we have not been able to tap the real potential of computing for social communication. Computational machinery that can understand and create a four-dimensional audio-visual world could enable humans to describe their imagination and share it with others. In this talk, I will introduce the Computational Studio: an environment that allows non-specialists to construct and creatively edit the 4D audio-visual world from sparse audio and video samples. The Computational Studio aims to enable everyone to relive old memories through a form of virtual time travel, to automatically create new experiences, and to share them with others using everyday computational devices. The Computational Studio has three essential components, each posing a question: (1) How can we capture the 4D audio-visual world? (2) How can we synthesize the audio-visual world from examples? (3) How can we interactively create and edit the audio-visual world? The first part of this talk introduces work on capturing and browsing the in-the-wild 4D audio-visual world in a self-supervised manner, along with efforts to build a multi-agent capture system. This work has applications in social communication, digitizing intangible cultural heritage, capturing tribal dances and wildlife in their natural environment, and understanding the social behavior of human beings.
In the second part, I will talk about unsupervised example-based audio-visual synthesis, which allows us to express ourselves easily. Finally, I will talk about interactive visual synthesis, which allows us to manually create and edit visual experiences. Here, I will also stress the importance of considering both the human user and the computational device when designing content-creation applications. The Computational Studio is a first step toward unlocking the full degree of creative imagination, which currently remains confined to the human mind, limited by each individual's expressivity and skill. It has the potential to change the way we communicate audio-visually with others.