Applying data-driven approaches to non-rigid 3D reconstruction has been difficult, which we believe can be attributed to the lack of a large-scale training corpus. One recent approach proposes self-supervision based on non-rigid reconstruction. Unfortunately, this method fails for important cases such as highly non-rigid deformations. We first address this problem of lack of data by introducing a novel semi-supervised strategy to obtain dense interframe correspondences from a sparse set of annotations. This way, we obtain a large dataset of 400 scenes, over 390,000 RGB-D frames, and 2,537 densely aligned frame pairs; in addition, we provide a test set along with several metrics for evaluation. Based on this corpus, we introduce a data-driven non-rigid feature matching approach, which we integrate into an optimization-based reconstruction pipeline. Here, we propose a new neural network that operates on RGB-D frames, while maintaining robustness under large non-rigid deformations and producing accurate predictions. Our approach significantly outperforms both existing non-rigid reconstruction methods that do not use learned data terms, as well as learning-based approaches that only use self-supervision.
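The abstract describes integrating learned correspondences as a data term in an optimization-based reconstruction pipeline. As a minimal sketch (our illustration, not the authors' actual formulation), suppose the network predicts matched 3D point pairs with confidence weights; for a pure translation, the confidence-weighted data term has a closed-form minimizer:

```python
# Hypothetical sketch: learned correspondences as a weighted data term.
# (p_i, q_i) are matched 3D points between two frames, w_i the network's
# confidence. For a pure translation T(p) = p + t, the energy
#   E(t) = sum_i w_i * ||p_i + t - q_i||^2
# is minimized by the confidence-weighted mean residual.

def optimal_translation(matches):
    """matches: list of (p, q, w) with p, q 3-tuples and w a confidence."""
    wsum = sum(w for _, _, w in matches)
    t = [0.0, 0.0, 0.0]
    for p, q, w in matches:
        for k in range(3):
            t[k] += w * (q[k] - p[k])
    return [tk / wsum for tk in t]

matches = [((0, 0, 0), (1, 0, 0), 1.0),
           ((1, 1, 0), (2, 1, 0), 0.5)]
print(optimal_translation(matches))  # -> [1.0, 0.0, 0.0]
```

In the actual pipeline the deformation model is of course non-rigid; the point is that the learned confidences down-weight unreliable matches inside the optimization.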
Organizers: Vassilis Choutas
How can we tell that a video is playing backwards? People's motions look wrong when a video is played in reverse; can we develop an algorithm to distinguish forward from backward video? Similarly, can we tell if a video has been sped up? We have developed algorithms to distinguish forward from backward video, and fast from slow. Training algorithms for these tasks provides a self-supervised objective that facilitates human activity recognition. We will show these results, along with applications of these unsupervised video learning tasks. We also present a method to retime people in videos, manipulating and editing the time over which the motions of individuals occur. Our model not only disentangles the motions of each person in the video, but also correlates each person with the scene changes they generate, and thus re-times the corresponding shadows, reflections, and motion of loose clothing appropriately.
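The self-supervised pretext task described above needs no manual labels: every clip yields both classes for free. A minimal sketch of the data construction (function names are illustrative, not the authors' code):

```python
# Arrow-of-time pretext task: each clip produces two training examples,
# the original (label 1 = forward) and its temporal reverse (label 0).

def make_arrow_of_time_pairs(clips):
    """clips: list of frame sequences. Returns (sequence, label) pairs."""
    examples = []
    for clip in clips:
        examples.append((clip, 1))        # forward
        examples.append((clip[::-1], 0))  # reversed
    return examples

clips = [["f0", "f1", "f2"]]
pairs = make_arrow_of_time_pairs(clips)
print(pairs[1])  # -> (['f2', 'f1', 'f0'], 0)
```

A network trained to predict these labels must pick up on motion cues, which is why the task transfers to activity recognition.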
Organizers: Yinghao Huang
In recent years, commodity 3D sensors have become widely available, spawning significant interest in both offline and real-time 3D reconstruction. While state-of-the-art reconstruction results from commodity RGB-D sensors are visually appealing, they are far from usable in practical computer graphics applications since they do not match the high quality of artist-modeled 3D graphics content. One of the biggest challenges in this context is that obtained 3D scans suffer from occlusions, thus resulting in incomplete 3D models. In this talk, I will present a data-driven approach towards generating high quality 3D models from commodity scan data, and the use of these geometrically complete 3D models towards semantic and texture understanding of real-world environments.
Organizers: Yinghao Huang
In this talk, Majid Taghavi will briefly discuss the demand for high-performance electromechanical transducers, the current challenges, and approaches he has been pursuing to tackle them. He will discuss multiple electromechanical concepts and devices that he has delivered for low-power energy harvesting, self-powered sensors, and artificial muscle technologies. Majid Taghavi will look into piezoelectric, triboelectric, electrostatic, dielectrophoretic, and androphilic phenomena, and will show his observations and innovations in coupling physical phenomena and developing smart materials and intelligent devices.
Organizers: Metin Sitti
Prof. Eric Dufresne will describe experiments on simple composites of elastomers and droplets. First, he will consider their composite mechanical properties, showing how simple liquid droplets can counterintuitively stiffen the material, and how magnetorheological fluid droplets can provide elastomers with magnetically switchable shape memory. Second, he will consider the nucleation, growth, and ripening of droplets within an elastomer. Here, a variety of interesting phenomena emerge: size-tunable monodisperse droplets, shape-tunable droplets, and ripening of droplets along stiffness gradients. His group is exploiting these phenomena to make materials with mechanically switchable structural color.
Organizers: Metin Sitti
Computation has fundamentally changed the way we study nature. New data collection technologies, such as GPS, high-definition cameras, UAVs, genotyping, and crowdsourcing, are generating data about wild populations that are orders of magnitude richer than any previously collected. Unfortunately, in this domain as in many others, our ability to analyze data lags substantially behind our ability to collect it. In this talk I will show how computational approaches can be part of every stage of the scientific process of understanding animal sociality, from intelligent data collection (crowdsourcing photographs and identifying individual animals from photographs by their stripes and spots - Wildbook.org) to hypothesis formulation (by designing a novel computational framework for the analysis of dynamic social networks), and can provide scientific insight into the collective behavior of zebras, baboons, and other social animals.
Organizers: Aamir Ahmad
Prof. Pietro Valdastri's talk will focus on Medical Capsule Robots. Capsule robots are cm-size devices that leverage extreme miniaturization to access and operate in environments that are out of reach for larger robots. In medicine, capsule robots can be designed to be swallowed like a pill and to diagnose and treat deadly diseases, such as cancer. The talk will move from capsule robots for inspection of the digestive tract toward a new generation of surgical robots and devices, with reduction in size, invasiveness, and cost as the main drivers for innovation. During the talk, we will discuss the recent enabling technologies that are being developed at the University of Leeds to transform medical robotics. These technologies include magnetic manipulation of capsule robots, hydraulic and pneumatic actuation, real-time tracking of capsule position and orientation, ultra-low-cost design, frugal innovation, and autonomy in robotic endoscopy. Prof. Russell Harris has been researching new manufacturing processes for over 20 years. He has several research projects focussing on robotics, and is particularly interested in how new manufacturing processes can be an enabler for advanced robotic devices and components. In this talk he will discuss some of this research and where he believes there may be new opportunities for collaborative research across manufacturing and robotics.
Endowing robots with human-like physical reasoning abilities remains challenging. We argue that existing methods often disregard spatio-temporal relations, and that by using Graph Neural Networks (GNNs), which incorporate a relational inductive bias, we can shift the learning process towards exploiting relations. In this work, we learn action-conditional forward dynamics models of a simulated manipulation task from visual observations involving cluttered and irregularly shaped objects. We investigate two GNN approaches and empirically assess their capability to generalize to scenarios with novel objects and an increasing number of objects. The first, a Graph Networks (GN) based approach, relies on explicitly defined edge attributes; not only does it consistently underperform an auto-encoder baseline that we modified to predict future states, but our results also indicate that the choice of edge attributes can significantly influence the predictions. Consequently, we develop the Auto-Predictor, which does not rely on explicitly defined edge attributes. It outperforms the baseline and the GN-based models. Overall, our results show the sensitivity of GNN-based approaches to the task representation and the efficacy of relational inductive biases, and advocate choosing lightweight approaches that implicitly reason about relations over ones that leave these decisions to human designers.
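To make the role of explicitly defined edge attributes concrete, here is a toy message-passing step in the Graph Networks style (our simplified illustration, not the paper's architecture): an edge block computes an explicit attribute per edge, here the relative position of its endpoints, and a node block aggregates incoming messages by summation.

```python
# Toy Graph Network step: explicit relative-position edge attributes,
# sum aggregation, additive node update. Purely illustrative.

def gn_step(nodes, edges):
    """nodes: {id: (x, y)}; edges: list of (sender, receiver)."""
    # Edge block: compute an explicit relative-position attribute.
    messages = {}
    for s, r in edges:
        dx = nodes[s][0] - nodes[r][0]
        dy = nodes[s][1] - nodes[r][1]
        messages.setdefault(r, []).append((dx, dy))
    # Node block: sum incoming messages into the node state.
    updated = {}
    for nid, (x, y) in nodes.items():
        mx = sum(m[0] for m in messages.get(nid, []))
        my = sum(m[1] for m in messages.get(nid, []))
        updated[nid] = (x + mx, y + my)
    return updated

nodes = {0: (0.0, 0.0), 1: (1.0, 0.0)}
print(gn_step(nodes, [(0, 1)]))  # -> {0: (0.0, 0.0), 1: (0.0, 0.0)}
```

The choice of the edge attribute (relative position vs. distance vs. contact flags, say) is exactly the kind of hand-designed decision the Auto-Predictor avoids.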
Organizers: Siyu Tang
Future cities and infrastructure systems will evolve into complex conglomerates where autonomous aerial, aquatic and ground-based robots will coexist with people and cooperate in symbiosis. To create this human-robot ecosystem, robots will need to respond more flexibly, robustly and efficiently than they do today. They will need to be designed with the ability to move across terrain boundaries and physically interact with infrastructure elements to perform sensing and intervention tasks. Taking inspiration from nature, aerial robotic systems can integrate multi-functional morphology, new materials, energy-efficient locomotion principles and advanced perception abilities that will allow them to successfully operate and cooperate in complex and dynamic environments. This talk will describe the scientific fundamentals, design principles and technologies for the development of biologically inspired flying robots with adaptive morphology that can perform monitoring and manufacturing tasks for future infrastructure and building systems. Examples will include flying robots with perching capabilities and origami-based landing systems, drones for aerial construction and repair, and combustion-based jet thrusters for aerial-aquatic vehicles.
Organizers: Metin Sitti
In the first part of the talk, I am going to present our work on human pose estimation in the wild: unconstrained images and videos containing an a priori unknown number of people, often occluded and exhibiting a wide range of articulations and appearances. Unlike conventional top-down approaches that first detect humans with an off-the-shelf object detector and then estimate poses independently per bounding box, our formulation performs joint detection and pose estimation. In the first stage, we indiscriminately localise body parts of every person in the image with a state-of-the-art ConvNet-based keypoint detector. In the second stage, we assign keypoints to people using a graph partitioning approach that minimizes an integer linear program under a set of constraints, with the vertex and edge costs computed by our ConvNet. Our method naturally generalises to articulated tracking of multiple humans in video sequences. Next, I will discuss our work on learning accurate 3D object shape and camera pose from a collection of unlabeled category-specific images. We train a convolutional network to predict both the shape and the pose from a single image by minimizing the reprojection error: given several views of an object, the projections of the predicted shapes to the predicted camera poses should match the provided views. To deal with pose ambiguity, we introduce an ensemble of pose predictors that we then distill into a single "student" model. To allow for efficient learning of high-fidelity shapes, we represent the shapes by point clouds and devise a formulation allowing for their differentiable projection. Finally, I will talk about how to reconstruct the appearance of three-dimensional objects, namely a method for generating a 3D human avatar from an image. Our model predicts a full texture map, a clothing segmentation, and a displacement map.
The learning is done in the UV-space of the SMPL model, which turns the hard 3D inference problem into an image-to-image translation task, where we can use deep neural networks to encode appearance, geometry, and clothing layout. Our model is trained on a dataset of over 4000 3D scans of humans in diverse clothing.
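The reprojection objective mentioned above can be sketched in a few lines: project the predicted 3D points through the predicted camera and penalize the distance to the observed 2D locations. The yaw-only rotation and unit focal length here are simplifying assumptions for illustration, not the paper's camera model:

```python
import math

# Minimal reprojection-error sketch: predicted 3D points projected with a
# predicted camera (yaw rotation + depth offset) should match the 2D views.

def project(point, yaw, tz, f=1.0):
    """Pinhole projection after a yaw rotation and z-translation."""
    x, y, z = point
    xr = math.cos(yaw) * x + math.sin(yaw) * z
    zr = -math.sin(yaw) * x + math.cos(yaw) * z + tz
    return (f * xr / zr, f * y / zr)

def reprojection_error(points3d, points2d, yaw, tz):
    """Sum of squared 2D distances between projections and observations."""
    err = 0.0
    for p3, p2 in zip(points3d, points2d):
        u, v = project(p3, yaw, tz)
        err += (u - p2[0]) ** 2 + (v - p2[1]) ** 2
    return err

pts3d = [(0.0, 0.0, 1.0), (0.5, 0.0, 1.0)]
pts2d = [project(p, 0.0, 1.0) for p in pts3d]
print(reprojection_error(pts3d, pts2d, 0.0, 1.0))  # -> 0.0
```

In the actual method both the point cloud and the pose come from the network, and the projection is made differentiable so the error can be backpropagated.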
Fingertip skin friction plays a critical role during object manipulation. We will describe a simple and reliable method to estimate the fingertip static coefficient of friction (CF) continuously and quickly during object manipulation, and we will describe a global expression of the CF as a function of the normal force and fingertip moisture. Then we will show how skin hydration modifies the skin deformation dynamics during grip-like contacts. Certain motor behaviours observed during object manipulation could be explained by the effects of skin hydration. Then the biomechanics of the partial slip phenomenon will be described, and we will examine how this partial slip phenomenon is related to the subjective perception of fingertip slip.
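As a rough illustration of the quantity being estimated (our simplification, not the authors' exact procedure): the static coefficient of friction is the tangential-to-normal force ratio at the moment full slip begins, which over a loading trace is the peak of that ratio.

```python
# Illustrative static CF estimate from synchronized force traces:
# the ratio Ft/Fn stops growing once the contact fully slips.

def static_cof(normal, tangential):
    """Peak tangential/normal ratio over a loading trace (static CF)."""
    return max(ft / fn for fn, ft in zip(normal, tangential) if fn > 0)

normal = [2.0, 2.0, 2.0, 2.0]          # N
tangential = [0.2, 0.6, 1.2, 1.1]      # N, slip after the third sample
print(static_cof(normal, tangential))  # -> 0.6
```

The talk's contribution is doing this continuously and quickly during manipulation, and expressing the CF globally as a function of normal force and fingertip moisture.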
A new concept of using permanent magnet systems for guiding superparamagnetic nanoparticles (SPP) on arbitrary trajectories over a large volume is presented. The same instrument can also be used for magnetic resonance imaging (MRI) using the inherent contrast of the SPP. The basic idea is to use one magnet system, which provides a strong, homogeneous, dipolar magnetic field to magnetize and orient the particles, and a second, constantly graded, quadrupolar field, superimposed on the first, to generate a force on the oriented particles. As a result, particles are guided with constant force and in a single direction over the entire volume. Prototypes of various sizes were constructed to demonstrate the principle in two dimensions on several nanoparticles, which were moved along a rough square by manual adjustment of the force angle. Surprisingly, even SPP smaller than 100 nm could be moved at speeds exceeding 10 mm/s due to reversible agglomeration, for which a first hydrodynamic model is presented. Furthermore, a more advanced system with two quadrupoles is presented, which allows canceling the force, hence stopping the SPP and moving them around sharp edges. Additionally, this system also allows for MRI, and some first experiments are presented. Recently, this concept was combined with liquid crystalline elastomers with incorporated SPP to create “micro-robots” whose coarse maneuvers are performed by a MagGuider system while their microscopic actuation is controlled either by light or by temperature.
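A quick numeric check of the guiding principle (values are illustrative): if the strong homogeneous field locks the particle moment m along x, and the superimposed quadrupole contributes a field whose x-component is g·x with constant gradient g, then the force F = m·∇(B·x̂) = (m·g, 0) is the same everywhere in the volume, which is why particles are guided with constant force.

```python
# Finite-difference check: constant-gradient quadrupole + moment locked
# along x gives a position-independent force (2D, illustrative values).

def force_on_particle(x, y, m=1e-15, g=10.0, h=1e-4):
    """Force on a moment m (A*m^2) aligned with x, gradient g (T/m)."""
    def bx(px, py):
        return g * px  # x-component of the quadrupole field
    fx = m * (bx(x + h, y) - bx(x - h, y)) / (2 * h)
    fy = m * (bx(x, y + h) - bx(x, y - h)) / (2 * h)
    return (fx, fy)

print(force_on_particle(0.01, 0.02))
print(force_on_particle(-0.03, 0.05))  # same force at a different position
```

Rotating the quadrupole relative to the dipole rotates this constant force, which is the "force angle" adjusted manually in the square-trajectory demonstration.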
Organizers: Metin Sitti
Conversational agents in the form of virtual agents or social robots are rapidly becoming widespread. Humans use non-verbal behaviors to signal their intent, emotions, and attitudes in human-human interactions. Conversational agents therefore need this ability as well in order to make an interaction pleasant and efficient. An important part of non-verbal communication is gesticulation: gestures communicate a large share of non-verbal content. Previous systems for gesture production were typically rule-based and could not represent the range of human gestures. Recently, the gesture generation field has shifted to data-driven approaches. We follow this line of research by extending a state-of-the-art deep-learning-based model. Our model leverages representation learning to enhance speech-gesture mapping. We analyze different representations for the input (speech) and the output (motion) of the network through both objective and subjective evaluations. We also analyze the importance of smoothing the produced motion and emphasize how challenging it is to evaluate gesture quality. In the future, we plan to enrich the input signal with semantic context (text transcription), make the model probabilistic, and evaluate our system on the social robot NAO.
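The abstract stresses that smoothing the produced motion matters for perceived gesture quality. One minimal post-processing choice (our illustration; the talk's actual smoothing method is not specified here) is a centered moving average over each generated joint trajectory:

```python
# Centered moving-average smoothing of a 1D joint trajectory; edges use
# a shrunken window so the output has the same length as the input.

def smooth(trajectory, window=3):
    half = window // 2
    out = []
    for i in range(len(trajectory)):
        lo, hi = max(0, i - half), min(len(trajectory), i + half + 1)
        out.append(sum(trajectory[lo:hi]) / (hi - lo))
    return out

jittery = [0.0, 1.0, 0.0, 1.0, 0.0]
print(smooth(jittery))  # high-frequency jitter is damped
```

Applied per joint and per coordinate, this suppresses frame-to-frame jitter in network output at the cost of slightly attenuating fast, intentional gestures, which is one reason evaluating gesture quality is hard.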