Optoacoustic imaging is increasingly attracting the attention of the biomedical research community due to its excellent spatial and temporal resolution, centimeter-scale penetration into living tissues, and versatile endogenous and exogenous optical absorption contrast. State-of-the-art implementations of multi-spectral optoacoustic tomography (MSOT) are based on multi-wavelength excitation of tissues to visualize specific molecules within opaque tissues. As a result, the technology can noninvasively deliver structural, functional, metabolic, and molecular information from living tissues. The talk covers the most recent advances pertaining to ultrafast imaging instrumentation, multi-modal combinations with optical and ultrasound methods, intelligent reconstruction algorithms, as well as smart optoacoustic contrast and sensing approaches. Our current efforts are also geared toward exploring the potential of the technique in studying multi-scale dynamics of the brain and heart, monitoring of therapies, fast tracking of cells, and targeted molecular imaging applications. MSOT further allows for handheld operation and thus offers a new level of precision for clinical diagnostics of patients in a number of indications, such as breast and skin lesions, inflammatory diseases, and cardiovascular diagnostics.
Organizers: Metin Sitti
In this talk I will present an overview of our recent works that learn deep geometric models for the 3D face from large datasets of scans. Priors for the 3D face are crucial for many applications: to constrain ill-posed problems such as 3D reconstruction from monocular input, for efficient generation and animation of 3D virtual avatars, or even in medical domains such as recognition of craniofacial disorders. Generative models of the face have been widely used for this task, as well as deep learning approaches that have recently emerged as a robust alternative. Barring a few exceptions, most of these data-driven approaches were built either from a relatively limited number of samples (in the case of linear models of the shape) or by synthetic data augmentation (for deep-learning-based approaches), mainly due to the difficulty in obtaining large-scale and accurate 3D scans of the face. Yet, there is a substantial amount of 3D information that can be gathered when considering publicly available datasets that have been captured over the last decade. I will discuss here our works that tackle the challenges of building rich geometric models out of these large and varied datasets, with the goal of modeling the facial shape, expression (i.e. motion) or geometric details. Concretely, I will talk about (1) an efficient and fully automatic approach for registration of large datasets of 3D faces in motion; (2) deep learning methods for modeling the facial geometry that can disentangle the shape and expression aspects of the face; and (3) a multi-modal learning approach for capturing geometric details from images in-the-wild, by simultaneously encoding both facial surface normal and natural image information.
Organizers: Jinlong Yang
Motivated by the low-voltage-driven actuation of ionic Electroactive Polymers (iEAPs), we recently began investigating ionic elastomers. In this talk I will discuss the preparation, physical characterization and electric bending actuation properties of two novel ionic elastomers: ionic polymer electrolyte membranes (iPEM) and ionic liquid crystal elastomers (iLCE). Both materials can be actuated by low-frequency AC or DC voltages of less than 1 V. The bending actuation of the iPEMs outperforms that of most well-developed iEAPs, and the first, not yet optimized iLCEs are already comparable to them. Ionic liquid crystal elastomers also exhibit superior features, such as alignment-dependent actuation, which offers the possibility of pre-programming the actuation pattern during the cross-linking process. Additionally, multiple (thermal, optical and electric) actuations are also possible. I will also discuss issues with compliant electrodes and possible soft robotic applications.
Y. Bar-Cohen, Electroactive Polymer Actuators as Artificial Muscles: Reality, Potential and Challenges, SPIE Press, Bellingham, 2004.
O. Kim, S. J. Kim, M. J. Park, Chem. Commun. 2018, 54, 4895.
C. P. H. Rajapaksha, C. Feng, C. Piedrahita, J. Cao, V. Kaphle, B. Lüssem, T. Kyu, A. Jákli, Macromol. Rapid Commun. 2020, in print.
C. Feng, C. P. H. Rajapaksha, J. M. Cedillo, C. Piedrahita, J. Cao, V. Kaphle, B. Lüssem, T. Kyu, A. I. Jákli, Macromol. Rapid Commun. 2019, 1900299.
“There’s something about the outside of a horse that is good for the inside of a man”, Churchill allegedly said. The horse’s motion has captured the interest of humans throughout history. Understanding of the mechanics of horse motion has been sought in early work by Aristotle (300 BC), in pioneering photographic studies by Muybridge (1880) as well as in modern day scientific publications.
The horse (Equus ferus caballus) is a remarkable animal athlete with outstanding running capabilities. The efficiency of its locomotion is explained by specialised anatomical features, which limit the degrees of freedom of movement and reduce energy consumption. Theoretical mechanical models are quite well suited to describe the essence of equine gaits and provide us with simple measures for analysing gait asymmetry. Such measures are much needed, since agreement between veterinarians is moderate to poor when it comes to visual assessment of lameness.
The human visual system indeed has clear limitations in perceiving and interpreting horse motion. This limits our ability to understand the horse, not only to detect lameness and to predict performance, but also to interpret its non-verbal communication and to detect signs of illness or discomfort.
This talk will provide a brief overview of existing motion analysis techniques and models in equine biomechanics. We will discuss future possibilities to achieve more accessible, sensitive and complex ways of analysing the motion of the horse.
Traditional voice conversion methods rely on parallel recordings of multiple speakers pronouncing the same sentences. For real-world applications, however, parallel data is rarely available. We propose MelGAN-VC, a voice conversion method that relies on non-parallel speech data and is able to convert audio signals of arbitrary length from a source voice to a target voice. We first compute spectrograms from waveform data and then perform a domain translation using a Generative Adversarial Network (GAN) architecture. An additional siamese network helps preserve speech information in the translation process, without sacrificing the ability to flexibly model the style of the target speaker. We test our framework with a dataset of clean speech recordings, as well as with a collection of noisy real-world speech examples. Finally, we apply the same method to perform music style transfer, translating arbitrarily long music samples from one genre to another, and showing that our framework is flexible and can be used for audio manipulation applications other than voice conversion.
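As a rough illustration of the first step only (waveform to spectrogram), the following sketch computes a magnitude spectrogram with a plain short-time Fourier transform in NumPy. The function name, frame length, hop size and test signal are assumptions made for this example; the abstract does not state the settings used by MelGAN-VC, and no mel scaling or GAN component is shown.

```python
import numpy as np

def magnitude_spectrogram(waveform, frame_len=256, hop=64):
    """Return STFT magnitudes with shape (n_frames, frame_len // 2 + 1).

    frame_len and hop are illustrative defaults, not values from the talk.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(waveform) - frame_len) // hop
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack([
        waveform[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Real-input FFT of every frame; keep magnitudes only.
    return np.abs(np.fft.rfft(frames, axis=1))

# Toy input (assumed): one second of a 440 Hz sine sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
spec = magnitude_spectrogram(np.sin(2 * np.pi * 440 * t))

# Frequency bin with the most energy, converted back to Hz.
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256
```

Because the signal is cut into fixed-size frames, the same routine applies to recordings of any length, which is what makes a frame-wise representation convenient for converting arbitrarily long audio.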
Human pose estimation from monocular images is one of the most challenging and computationally demanding problems in computer vision. Standard models such as Pictorial Structures consider interactions between kinematically-connected joints or limbs, leading to inference quadratic in the number of pixels.
As a result, researchers and practitioners have restricted themselves to simple models which only measure the quality of limb-pair possibilities by their 2D geometric plausibility. In this talk, we propose novel methods which allow for efficient inference in richer models with data-dependent interaction cliques.
First, we introduce structured prediction cascades, a structured analog of binary cascaded classifiers, which learn to focus computational effort where it is needed, filtering out many states cheaply while ensuring the correct output is unfiltered.
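The filtering idea can be sketched on a toy chain model. The example below is an assumption-laden illustration, not the speakers' implementation: it computes max-marginals (the score of the best full sequence passing through each state) and prunes states whose max-marginal falls below a threshold set between the best score and the mean max-marginal. The mixing weight `alpha`, the function names and the random scores are all hypothetical.

```python
import numpy as np

def max_marginals(unary, pairwise):
    """Max-marginals of a chain model.

    unary: (T, K) per-position state scores.
    pairwise: (K, K) transition scores, indexed [previous, next].
    Entry [t, k] is the score of the best sequence with state k at position t.
    """
    T, K = unary.shape
    fwd = np.zeros((T, K))  # best prefix score ending in each state
    bwd = np.zeros((T, K))  # best suffix score following each state
    fwd[0] = unary[0]
    for t in range(1, T):
        fwd[t] = unary[t] + (fwd[t - 1][:, None] + pairwise).max(axis=0)
    for t in range(T - 2, -1, -1):
        bwd[t] = (pairwise + (unary[t + 1] + bwd[t + 1])[None, :]).max(axis=1)
    return fwd + bwd

def cascade_filter(unary, pairwise, alpha=0.5):
    """Boolean (T, K) mask of states that survive one filtering stage."""
    mm = max_marginals(unary, pairwise)
    # Threshold between the mean max-marginal and the best overall score.
    threshold = alpha * mm.max() + (1 - alpha) * mm.mean()
    return mm >= threshold

# Toy problem (assumed): 5 positions, 4 candidate states, integer scores.
rng = np.random.default_rng(0)
unary = rng.integers(-5, 6, size=(5, 4)).astype(float)
pairwise = rng.integers(-2, 3, size=(4, 4)).astype(float)

mm = max_marginals(unary, pairwise)
keep = cascade_filter(unary, pairwise)
best_states = mm.argmax(axis=1)  # one best-scoring state per position
```

Every state on a best-scoring sequence has a max-marginal equal to the best score, which is never below the threshold for `alpha` at most 1, so such states are not pruned while lower-scoring states are discarded cheaply.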
Second, we propose a way to decompose models of human pose with cyclic dependencies into a collection of tree models, and provide novel methods to impose model agreement. These techniques allow for sparse and efficient inference on the order of minutes per image or video clip.
As a result, we can afford to model pairwise interaction potentials much more richly with data-dependent features such as contour continuity, segmentation alignment, color consistency, optical flow and more.
Finally, we apply these techniques to higher-order cliques, extending the idea of poselets to structured models. We show empirically that these richer models are worthwhile, obtaining significantly more accurate pose estimation on popular datasets.
Organizers: Michel Besserve
Pose estimation and tracking have been a focus of computer vision research for many years. Despite many successes, however, most approaches to date are still not able to recover physically realistic (natural-looking) 3D motions and are restricted to captures indoors or with simplified backgrounds. In the first part of this talk, I will briefly introduce a class of models that use physics to constrain the motion of the subject to more realistic interpretations.
In particular, we formulate the pose tracking problem as one of inference of control mechanisms which implicitly (through physical simulation) generate the kinematic motion matching the image observations. This formulation of the problem has a number of benefits with respect to more traditional kinematic models. In the second part of the talk, I will describe a new proof-of-concept framework for capturing human motion in outdoor environments where traditional motion capture systems, including marker-less motion systems, would typically be inapplicable.
The proposed system consists of a number of small body-mounted cameras, placed on all major segments of the body, and is capable of recovering the underlying skeletal motion by observing the scene as it changes, within each camera view, with the motion of the subject’s body.
Organizers: Michel Besserve
Shape analysis aims to describe either a single shape or a population of shapes in an efficient and informative way. This is a key problem in various applications such as mesh deformation and animation, object recognition, and mesh parameterization.
I will present a number of approaches to process shapes that are nearly isometric. The first approach computes the correspondence information between a population of shapes in this setting. Second and third are approaches to morph between two shapes and to segment a population of shapes into near-rigid components. Next, I will present an approach for isometry-invariant shape description and feature extraction.
Furthermore, I will present an algorithm to compute the correspondence information between human bodies in varying postures. In addition to being nearly isometric, human body shapes share the same geometric structure, and we can take advantage of this prior geometric information to find accurate correspondences. Finally, I will discuss some applications of shape analysis in computer-aided design.
We propose a geometric approach to articulated tracking, where the human pose representation is expressed on the Riemannian manifold of joint positions. This is in contrast to conventional methods where the problem is phrased in terms of intrinsic parameters of the human pose. Our model is based on a physically natural metric that also has strong links to neurological models of human motion planning. Some benefits of the model are that it allows for easy modeling of interaction with the environment, for data-driven optimization schemes and for well-posed low-pass filtering properties.
To apply the Riemannian model in practice, we derive simulation schemes for Brownian motion on manifolds as well as computationally efficient approximation schemes. The resulting algorithms seem to outperform gold standards both in terms of accuracy and running times.
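To give a feel for what simulating Brownian motion on a manifold involves, here is a minimal sketch on the unit sphere, used as a stand-in for a single joint kept at a fixed distance from its parent. This is a common projection-based approximation and an assumption of this example; the talk's actual manifold and simulation schemes are not described in the abstract, and the step size and function name are hypothetical.

```python
import numpy as np

def brownian_step_on_sphere(x, dt, rng):
    """One approximate Brownian step on the unit sphere in 3D.

    Draw Gaussian noise in the ambient space, keep only its component in
    the tangent plane at x, take the step, then project back onto the sphere.
    """
    noise = rng.normal(scale=np.sqrt(dt), size=3)
    tangent = noise - np.dot(noise, x) * x  # remove the radial component
    y = x + tangent
    return y / np.linalg.norm(y)  # project back onto the manifold

rng = np.random.default_rng(0)
path = [np.array([0.0, 0.0, 1.0])]  # start at the north pole
for _ in range(1000):
    path.append(brownian_step_on_sphere(path[-1], dt=1e-3, rng=rng))
path = np.asarray(path)
```

Every sample stays on the constraint surface by construction, so a tracker that draws its motion hypotheses this way never proposes a pose that violates the fixed bone length.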
Organizers: Michel Besserve
A pure refinement procedure for non-rigid registration can be highly effective for establishing dense correspondences between pairs of scanned data, even for significant deformations. I will explain how to design robust non-rigid algorithms and why it is important to couple the optimization of correspondence positions, warping field, and overlapping regions. I will show several applications where it has been successfully applied ranging from film/game production to radiation oncology. One particular interest of mine is facial animation. I will present a fully integrated system for real-time facial performance capture and expression transfer and give a live demo of our latest technology, faceshift. At the end of the talk I
Organizers: Gerard Pons-Moll
Many machine vision/image processing algorithms are designed to be real-time and fully automatic. These attributes are essential, e.g., for stereo robotics vision applications. Visual Effects Studios, however, possess giant server farms and command armies of artists to perform intelligent initialization or provide guidance to algorithms. At the same time, motion pictures have very high accuracy requirements, and the ability to influence an algorithm manually is often more important than other factors generally considered crucial in Academia. In this talk I will highlight some scenarios where Academia and the Visual Effects industry disagree.
In the era of perpetually increasing computational capabilities, multi-camera acquisition systems are being increasingly used to capture parameterization-free articulated 3D shapes. These systems allow marker-less shape acquisition and are useful for a wide range of applications in the entertainment, sports and surveillance industries, and also in interactive and augmented reality systems. The availability of vast amounts of 3D shape data has increased interest in 3D shape analysis methods. Segmentation and matching are two important shape analysis tasks. 3D shape segmentation is a subjective task that involves dividing a given shape into constituent parts by assigning each part a unique segment label.
In the case of 3D shape matching, a dense vertex-to-vertex correspondence between two shapes is desired. However, 3D shape analysis is particularly difficult in the case of articulated shapes due to complex kinematic poses. These poses induce self-occlusions and shadow effects which cause topological changes such as merging and splitting. In this work we propose robust segmentation and matching methods for articulated 3D shapes represented as mesh-graphs using graph spectral methods.
This talk is divided into two parts. Part one of the talk will focus on 3D shape segmentation, attempted in both an unsupervised and a semi-supervised setting by analysing the properties of discrete Laplacian eigenspaces of mesh-graphs. In the second part, 3D shape matching is analysed in a multi-scale heat-diffusion framework derived from the Laplacian eigenspace. We believe that this framework is well suited to handle large topological changes and we substantiate our belief by showing promising results on various publicly available real mesh datasets.
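The link between the Laplacian eigenspace and multi-scale heat diffusion can be sketched as follows. The example builds the combinatorial Laplacian of a small graph, forms the heat kernel from its eigen-decomposition, and reads off a per-vertex descriptor at several diffusion times. The cycle graph, the time scales and the function name are assumptions chosen for illustration; the talk's actual descriptors and mesh weighting are not specified in the abstract.

```python
import numpy as np

def heat_kernel(adjacency, t):
    """Heat kernel exp(-t L) of the combinatorial graph Laplacian L = D - A."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # U diag(exp(-t * lambda)) U^T, computed from the Laplacian eigenspace.
    return (eigvecs * np.exp(-t * eigvals)) @ eigvecs.T

# Toy mesh-graph (assumed): a cycle with 6 vertices.
n = 6
adjacency = np.zeros((n, n))
for i in range(n):
    adjacency[i, (i + 1) % n] = 1.0
    adjacency[(i + 1) % n, i] = 1.0

# Heat remaining at each vertex after diffusing for short, medium and long times.
scales = (0.1, 1.0, 10.0)
signature = np.stack(
    [np.diag(heat_kernel(adjacency, t)) for t in scales], axis=1
)
kernel = heat_kernel(adjacency, 1.0)
```

Small diffusion times capture local structure around a vertex and large times capture global structure, which is why descriptors of this kind are naturally multi-scale.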
Organizers: Sebastian Trimpe
Capturing human motion or objects by vision technology has been intensively studied. Although humans interact very often with other persons or objects, most of the previous work has focused on capturing a single object or the motion of a single person. In this talk, I will highlight four projects that deal with human-human or human-object interactions. The first project addresses the problem of capturing skeleton and non-articulated cloth motion of two interacting characters. The second project aims to model spatial hand-object relations during object manipulation. In the third project, an affordance detector is learned from human-object interactions. The fourth project investigates how human motion can be exploited for object discovery from depth video streams.
In this talk I will present recent work on two different topics from low- and high-level computer vision: intrinsic image recovery and efficient object detection. By intrinsic image decomposition we refer to the challenging task of decoupling material properties from lighting properties given a single image. We propose a probabilistic model that incorporates previous attempts exploiting edge information and combines them with a novel prior on material reflectances in the image. This results in a random field model with global, latent variables and pixel-accurate output reflectance values. I will present experiments on a recently proposed ground-truth database.
The proposed model is found to outperform previous models. I will then discuss some possible future developments in this field. In the second part of the talk I will present an efficient object detection scheme that breaks the computational complexity of commonly used detection algorithms, e.g., sliding windows. We pose the detection problem naturally as a structured prediction problem for which we decompose the inference procedure into an adaptive best-first search.
This results in test-time inference that scales sub-linearly in the size of the search space, and detection usually requires fewer than 100 classifier evaluations. This paves the way for using strong (but costly) classifiers such as non-linear SVMs. The algorithmic properties are demonstrated using the VOC'07 dataset. This work is part of the Visipedia project, in collaboration with Steve Branson, Catherine Wah, Florian Schroff, Boris Babenko, Peter Welinder and Pietro Perona.
3D shape correspondence methods seek pairs of surface points on two given shapes that are semantically equivalent. We present three automatic algorithms that address three different aspects of this problem: 1) coarse, 2) dense, and 3) partial correspondence. In 1), after sampling evenly-spaced base vertices on shapes, we formulate the problem of shape correspondence as combinatorial optimization over the domain of all possible mappings of bases, which then reduces within a probabilistic framework to a log-likelihood maximization problem that we solve via the EM (Expectation Maximization) algorithm.
Due to computational limitations, we change this algorithm to a coarse-to-fine one (2) to achieve dense correspondence between all vertices. Our scale-invariant isometric distortion measure makes partial matching (3) possible as well.
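A scale-invariant isometric distortion can be illustrated with a small sketch. The version below normalizes each shape's pairwise distances by that shape's largest distance and then averages the absolute differences under a candidate mapping. This normalization is one simple choice assumed for the example and is not necessarily the measure used in the talk; Euclidean distances between random points stand in for geodesic distances between sampled base vertices, and all names are hypothetical.

```python
import numpy as np

def isometric_distortion(dist_a, dist_b, mapping):
    """Average deviation from isometry of a vertex mapping, ignoring scale.

    dist_a, dist_b: (n, n) pairwise distance matrices of the two shapes.
    mapping[i] is the vertex of shape B assigned to vertex i of shape A.
    """
    a = dist_a / dist_a.max()  # normalize away each shape's global scale
    b = dist_b / dist_b.max()
    mapped = b[np.ix_(mapping, mapping)]
    off_diag = ~np.eye(len(mapping), dtype=bool)
    return np.abs(a - mapped)[off_diag].mean()

def pairwise_distances(points):
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Toy shapes (assumed): shape B is shape A scaled by 3 with shuffled vertices.
rng = np.random.default_rng(0)
points_a = rng.normal(size=(8, 3))
perm = rng.permutation(8)
points_b = 3.0 * points_a[perm]

dist_a = pairwise_distances(points_a)
dist_b = pairwise_distances(points_b)

true_mapping = np.argsort(perm)   # inverse permutation: A vertex -> B vertex
wrong_mapping = np.roll(true_mapping, 1)

good = isometric_distortion(dist_a, dist_b, true_mapping)
bad = isometric_distortion(dist_a, dist_b, wrong_mapping)
```

The correct mapping yields a distortion of essentially zero despite the factor-of-three size difference, while an incorrect mapping is penalized, so a measure of this kind can rank candidate correspondences between shapes of different scale.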