

2020


ODIN: ODE-Informed Regression for Parameter and State Inference in Time-Continuous Dynamical Systems

Wenk, P., Abbati, G., Osborne, M. A., Schölkopf, B., Krause, A., Bauer, S.

Proceedings of the 34th Conference on Artificial Intelligence (AAAI), 34(4):6364-6371, AAAI Press, February 2020, AAAI Technical Track: Machine Learning (conference)

ei

link (url) DOI [BibTex]



Interpretable and Differentially Private Predictions

Harder, F., Bauer, M., Park, M.

Proceedings of the 34th Conference on Artificial Intelligence (AAAI), 34(4):4083-4090, AAAI Press, February 2020, AAAI Technical Track: Machine Learning (conference)

ei

link (url) DOI [BibTex]



A Commentary on the Unsupervised Learning of Disentangled Representations

Locatello, F., Bauer, S., Lucic, M., Rätsch, G., Gelly, S., Schölkopf, B., Bachem, O.

Proceedings of the 34th Conference on Artificial Intelligence (AAAI), 34(9):13681-13684, AAAI Press, February 2020, Sister Conference Track (conference)

ei

link (url) DOI [BibTex]



Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image

Paschalidou, D., Gool, L., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
Humans perceive the 3D world as a set of distinct objects that are characterized by various low-level (geometry, reflectance) and high-level (connectivity, adjacency, symmetry) properties. Recent methods based on convolutional neural networks (CNNs) demonstrated impressive progress in 3D reconstruction, even when using a single 2D image as input. However, the majority of these methods focus on recovering the local 3D geometry of an object without considering its part-based decomposition or relations between parts. We address this challenging problem by proposing a novel formulation that allows us to jointly recover the geometry of a 3D object as a set of primitives as well as their latent hierarchical structure without part-level supervision. Our model recovers the higher-level structural decomposition of various objects in the form of a binary tree of primitives, where simple parts are represented with fewer primitives and more complex parts are modeled with more components. Our experiments on the ShapeNet and D-FAUST datasets demonstrate that considering the organization of parts indeed facilitates reasoning about 3D geometry.

avg

pdf suppmat Video 2 Project Page Slides Poster Video 1 [BibTex]



Towards 5-DoF Control of an Untethered Magnetic Millirobot via MRI Gradient Coils

Onder Erin, D. A. M. E. T., Sitti, M.

In IEEE International Conference on Robotics and Automation (ICRA), 2020 (inproceedings)

pi

[BibTex]



ACTrain: Ein KI-basiertes Aufmerksamkeitstraining für die Wissensarbeit [ACTrain: An AI-based attention training for knowledge work]

Wirzberger, M., Oreshnikov, I., Passy, J., Lado, A., Shenhav, A., Lieder, F.

66th Spring Conference of the German Ergonomics Society, 2020 (conference)

Abstract
Our digital age thrives on information and thereby puts our limited processing capacity to the test every day. In knowledge work in particular, constant distractions cause considerable losses in performance. Our intelligent application ACTrain addresses exactly this point and turns computer work into a training gym for the mind. Feedback based on machine learning methods vividly shows the value of not letting oneself be distracted from a self-chosen task. This metacognitive insight is intended to motivate perseverance and to strengthen the underlying skill level of attention control. In ongoing field experiments, we investigate whether training with this optimal feedback can improve attention and self-control skills compared to a control group without feedback.

re sf

link (url) Project Page [BibTex]


Computationally Tractable Riemannian Manifolds for Graph Embeddings

Cruceru, C., Becigneul, G., Ganea, O.

37th International Conference on Machine Learning (ICML), 2020 (conference) Submitted

ei

[BibTex]



A Real-Robot Dataset for Assessing Transferability of Learned Dynamics Models

Agudelo-España, D., Zadaianchuk, A., Wenk, P., Garg, A., Akpo, J., Grimminger, F., Viereck, J., Naveau, M., Righetti, L., Martius, G., Krause, A., Schölkopf, B., Bauer, S., Wüthrich, M.

IEEE International Conference on Robotics and Automation (ICRA), 2020 (conference) Accepted

am al ei mg

Project Page PDF [BibTex]



Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis

Liao, Y., Schwarz, K., Mescheder, L., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
In recent years, Generative Adversarial Networks have achieved impressive results in photorealistic image synthesis. This progress nurtures hopes that one day the classical rendering pipeline can be replaced by efficient models that are learned directly from images. However, current image synthesis models operate in the 2D domain where disentangling 3D properties such as camera viewpoint or object pose is challenging. Furthermore, they lack an interpretable and controllable representation. Our key hypothesis is that the image generation process should be modeled in 3D space as the physical world surrounding us is intrinsically three-dimensional. We define the new task of 3D controllable image synthesis and propose an approach for solving it by reasoning both in 3D space and in the 2D image domain. We demonstrate that our model is able to disentangle latent 3D factors of simple multi-object scenes in an unsupervised fashion from raw images. Compared to pure 2D baselines, it allows for synthesizing scenes that are consistent with respect to changes in viewpoint or object pose. We further evaluate various 3D representations in terms of their usefulness for this challenging task.

avg

pdf suppmat Video 2 Project Page Video 1 Slides Poster [BibTex]



Worst-Case Risk Quantification under Distributional Ambiguity using Kernel Mean Embedding in Moment Problem

Zhu, J., Jitkrittum, W., Diehl, M., Schölkopf, B.

In 59th IEEE Conference on Decision and Control (CDC), 2020 (inproceedings) Accepted

ei

[BibTex]



Learning to Identify Physical Parameters from Video Using Differentiable Physics

Kandukuri, R., Achterhold, J., Moeller, M., Stueckler, J.

Accepted for publication at the 42nd German Conference on Pattern Recognition (GCPR), 2020 (conference) Accepted

ev

link (url) [BibTex]



Practical Accelerated Optimization on Riemannian Manifolds

Alimisis, F., Orvieto, A., Becigneul, G., Lucchi, A.

37th International Conference on Machine Learning (ICML), 2020 (conference) Submitted

ei

[BibTex]



Exploring Data Aggregation in Policy Learning for Vision-based Urban Autonomous Driving

Prakash, A., Behl, A., Ohn-Bar, E., Chitta, K., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
Data aggregation techniques can significantly improve vision-based policy learning within a training environment, e.g., learning to drive in a specific simulation condition. However, as on-policy data is sequentially sampled and added in an iterative manner, the policy can specialize and overfit to the training conditions. For real-world applications, it is useful for the learned policy to generalize to novel scenarios that differ from the training conditions. To improve policy learning while maintaining robustness when training end-to-end driving policies, we perform an extensive analysis of data aggregation techniques in the CARLA environment. We demonstrate how the majority of them have poor generalization performance, and develop a novel approach with empirically better generalization performance compared to existing techniques. Our two key ideas are (1) to sample critical states from the collected on-policy data based on the utility they provide to the learned policy in terms of driving behavior, and (2) to incorporate a replay buffer which progressively focuses on the high uncertainty regions of the policy's state distribution. We evaluate the proposed approach on the CARLA NoCrash benchmark, focusing on the most challenging driving scenarios with dense pedestrian and vehicle traffic. Our approach improves driving success rate by 16% over state-of-the-art, achieving 87% of the expert performance while also reducing the collision rate by an order of magnitude without the use of any additional modality, auxiliary tasks, architectural modifications or reward from the environment.

avg

pdf suppmat Video 2 Project Page Slides Video 1 [BibTex]



Planning from Images with Deep Latent Gaussian Process Dynamics

Bosch, N., Achterhold, J., Leal-Taixe, L., Stückler, J.

Proceedings of the 2nd Conference on Learning for Dynamics and Control (L4DC), 120, pages: 640-650, Proceedings of Machine Learning Research (PMLR), (Editors: Alexandre M. Bayen and Ali Jadbabaie and George Pappas and Pablo A. Parrilo and Benjamin Recht and Claire Tomlin and Melanie Zeilinger), 2020, arXiv:2005.03770 (conference)

ev

preprint project page poster [BibTex]



Constant Curvature Graph Convolutional Networks

Bachmann*, G., Becigneul*, G., Ganea, O.

37th International Conference on Machine Learning (ICML), 2020, *equal contribution (conference) Submitted

ei

[BibTex]



Learning Situational Driving

Ohn-Bar, E., Prakash, A., Behl, A., Chitta, K., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
Human drivers have a remarkable ability to drive in diverse visual conditions and situations, e.g., from maneuvering in rainy, limited visibility conditions with no lane markings to turning in a busy intersection while yielding to pedestrians. In contrast, we find that state-of-the-art sensorimotor driving models struggle when encountering diverse settings with varying relationships between observation and action. To generalize when making decisions across diverse conditions, humans leverage multiple types of situation-specific reasoning and learning strategies. Motivated by this observation, we develop a framework for learning a situational driving policy that effectively captures reasoning under varying types of scenarios. Our key idea is to learn a mixture model with a set of policies that can capture multiple driving modes. We first optimize the mixture model through behavior cloning, and show it to result in significant gains in terms of driving performance in diverse conditions. We then refine the model by directly optimizing for the driving task itself, i.e., supervised with the navigation task reward. Our method is more scalable than methods assuming access to privileged information, e.g., perception labels, as it only assumes demonstration and reward-based supervision. We achieve over 98% success rate on the CARLA driving benchmark as well as state-of-the-art performance on a newly introduced generalization benchmark.

avg

pdf suppmat Video 2 Project Page Video 1 Slides [BibTex]



On Joint Estimation of Pose, Geometry and svBRDF from a Handheld Scanner

Schmitt, C., Donne, S., Riegler, G., Koltun, V., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
We propose a novel formulation for joint recovery of camera pose, object geometry and spatially-varying BRDF. The input to our approach is a sequence of RGB-D images captured by a mobile, hand-held scanner that actively illuminates the scene with point light sources. Compared to previous works that jointly estimate geometry and materials from a hand-held scanner, we formulate this problem using a single objective function that can be minimized using off-the-shelf gradient-based solvers. By integrating material clustering as a differentiable operation into the optimization process, we avoid pre-processing heuristics and demonstrate that our model is able to determine the correct number of specular materials independently. We provide a study on the importance of each component in our formulation and on the requirements of the initial geometry. We show that optimizing over the poses is crucial for accurately recovering fine details and that our approach naturally results in a semantically meaningful material segmentation.

avg

pdf Project Page Slides Video Poster [BibTex]



Optimizing Rank-based Metrics with Blackbox Differentiation

Rolinek, M., Musil, V., Paulus, A., Vlastelica, M., Michaelis, C., Martius, G.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 7620-7630, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020, Best paper nomination (inproceedings)

Abstract
Rank-based metrics are some of the most widely used criteria for performance evaluation of computer vision models. Despite years of effort, direct optimization for these metrics remains a challenge due to their non-differentiable and non-decomposable nature. We present an efficient, theoretically sound, and general method for differentiating rank-based metrics with mini-batch gradient descent. In addition, we address optimization instability and sparsity of the supervision signal that both arise from using rank-based metrics as optimization targets. Resulting losses based on recall and Average Precision are applied to image retrieval and object detection tasks. We obtain performance that is competitive with state-of-the-art on standard image retrieval datasets and consistently improve performance of near state-of-the-art object detectors.

al

Code Long Oral Short Oral Arxiv Project Page [BibTex]



Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision

Niemeyer, M., Mescheder, L., Oechsle, M., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
Learning-based 3D reconstruction methods have shown impressive results. However, most methods require 3D supervision which is often hard to obtain for real-world datasets. Recently, several works have proposed differentiable rendering techniques to train reconstruction models from RGB images. Unfortunately, these approaches are currently restricted to voxel- and mesh-based representations, suffering from discretization or low resolution. In this work, we propose a differentiable rendering formulation for implicit shape and texture representations. Implicit representations have recently gained popularity as they represent shape and texture continuously. Our key insight is that depth gradients can be derived analytically using the concept of implicit differentiation. This allows us to learn implicit shape and texture representations directly from RGB images. We experimentally show that our single-view reconstructions rival those learned with full 3D supervision. Moreover, we find that our method can be used for multi-view 3D reconstruction, directly resulting in watertight meshes.

avg

pdf suppmat Video 2 Project Page Video 1 Video 3 Slides Poster [BibTex]



Divide-and-Conquer Monte Carlo Tree Search for goal directed planning

Parascandolo*, G., Buesing*, L., Merel, J., Hasenclever, L., Aslanides, J., Hamrick, J. B., Heess, N., Neitz, A., Weber, T.

2020, *equal contribution (conference) Submitted

ei

arXiv [BibTex]



DirectShape: Photometric Alignment of Shape Priors for Visual Vehicle Pose and Shape Estimation

Wang, R., Yang, N., Stückler, J., Cremers, D.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2020, arXiv:1904.10097 (inproceedings)

ev

[BibTex]



Learning to Adapt Multi-View Stereo by Self-Supervision

Mallick, A., Stückler, J., Lensch, H.

Proceedings of the British Machine Vision Conference (BMVC), 2020, to appear (conference) To be published

ev

link (url) [BibTex]


2006


Conformal Multi-Instance Kernels

Blaschko, M., Hofmann, T.

In NIPS 2006 Workshop on Learning to Compare Examples, pages: 1-6, NIPS Workshop on Learning to Compare Examples, December 2006 (inproceedings)

Abstract
In the multiple instance learning setting, each observation is a bag of feature vectors of which one or more vectors indicates membership in a class. The primary task is to identify if any vectors in the bag indicate class membership while ignoring vectors that do not. We describe here a kernel-based technique that defines a parametric family of kernels via conformal transformations and jointly learns a discriminant function over bags together with the optimal parameter settings of the kernel. Learning a conformal transformation effectively amounts to weighting regions in the feature space according to their contribution to classification accuracy; regions that are discriminative will be weighted higher than regions that are not. This allows the classifier to focus on regions contributing to classification accuracy while ignoring regions that correspond to vectors found both in positive and in negative bags. We show how parameters of this transformation can be learned for support vector machines by posing the problem as a multiple kernel learning problem. The resulting multiple instance classifier gives competitive accuracy for several multi-instance benchmark datasets from different domains.

ei

PDF Web [BibTex]



Adapting Spatial Filter Methods for Nonstationary BCIs

Tomioka, R., Hill, J., Blankertz, B., Aihara, K.

In IBIS 2006, pages: 65-70, 2006 Workshop on Information-Based Induction Sciences, November 2006 (inproceedings)

Abstract
A major challenge in applying machine learning methods to Brain-Computer Interfaces (BCIs) is to overcome the possible nonstationarity between the data block the method is trained on and the one it is applied to. Assuming the joint distributions of the whitened signal and the class label to be identical in two blocks, where the whitening is done in each block independently, we propose a simple adaptation formula that is applicable to a broad class of spatial filtering methods including ICA, CSP, and logistic regression classifiers. We characterize the class of linear transformations for which the above assumption holds. Experimental results on 60 BCI datasets show improved classification accuracy compared to (a) a fixed spatial filter approach (no adaptation) and (b) a fixed spatial pattern approach (proposed by Hill et al., 2006 [1]).

ei

PDF [BibTex]



A Linear Programming Approach for Molecular QSAR analysis

Saigo, H., Kadowaki, T., Tsuda, K.

In MLG 2006, pages: 85-96, (Editors: Gärtner, T. , G. C. Garriga, T. Meinl), International Workshop on Mining and Learning with Graphs, September 2006, Best Paper Award (inproceedings)

Abstract
Small molecules in chemistry can be represented as graphs. In a quantitative structure-activity relationship (QSAR) analysis, the central task is to find a regression function that predicts the activity of the molecule with high accuracy. Setting QSAR as the primary target, we propose a new linear programming approach to the graph-based regression problem. Our method extends the graph classification algorithm by Kudo et al. (NIPS 2004), which is a combination of boosting and graph mining. Instead of sequential multiplicative updates, we employ linear programming (LP) boosting for regression. The LP approach allows inequality constraints on the parameter vector to be included, which turns out to be particularly useful in QSAR tasks where activity values are sometimes unavailable. Furthermore, the efficiency is improved significantly by employing multiple pricing.

ei

PDF PDF [BibTex]



Incremental Aspect Models for Mining Document Streams

Surendran, A., Sra, S.

In PKDD 2006, pages: 633-640, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, September 2006 (inproceedings)

Abstract
In this paper we introduce a novel approach for incrementally building aspect models, and use it to dynamically discover underlying themes from document streams. Using the new approach we present an application which we call “query-line tracking”, i.e., we automatically discover and summarize different themes or stories that appear over time and that relate to a particular query. We present an evaluation on news corpora to demonstrate the strength of our method for query-line tracking, online indexing, and clustering.

ei

Web DOI [BibTex]



PALMA: Perfect Alignments using Large Margin Algorithms

Rätsch, G., Hepp, B., Schulze, U., Ong, C.

In GCB 2006, pages: 104-113, (Editors: Huson, D. , O. Kohlbacher, A. Lupas, K. Nieselt, A. Zell), Gesellschaft für Informatik, Bonn, Germany, German Conference on Bioinformatics, September 2006 (inproceedings)

Abstract
Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. We present a novel approach based on large margin learning that combines kernel-based splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm (called PALMA) tunes the parameters of the model such that the true alignment scores higher than all other alignments. In an experimental study on the alignments of mRNAs containing artificially generated micro-exons, we show that our algorithm drastically outperforms all other methods: it perfectly aligns all 4358 sequences on a hold-out set, while the best other method misaligns at least 90 of them. Moreover, our algorithm is very robust against noise in the query sequence: when deleting, inserting, or mutating up to 50% of the query sequence, it still aligns 95% of all sequences correctly, while other methods achieve less than 36% accuracy. For datasets, additional results and a stand-alone alignment tool see http://www.fml.mpg.de/raetsch/projects/palma.

ei

PDF Web [BibTex]



Graph Based Semi-Supervised Learning with Sharper Edges

Shin, H., Hill, N., Rätsch, G.

In ECML 2006, pages: 401-412, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 17th European Conference on Machine Learning (ECML), September 2006 (inproceedings)

Abstract
In many graph-based semi-supervised learning algorithms, edge weights are assumed to be fixed and determined by the data points' (often symmetric) relationships in input space, without considering directionality. However, relationships may be more informative in one direction (e.g. from labelled to unlabelled) than in the reverse direction, and some relationships (e.g. strong weights between oppositely labelled points) are unhelpful in either direction. Undesirable edges may reduce the amount of influence an informative point can propagate to its neighbours: the point and its outgoing edges have been “blunted.” We present an approach to “sharpening” in which weights are adjusted to meet an optimization criterion wherever they are directed towards labelled points. This principle can be applied to a wide variety of algorithms. In the current paper, we present one ad hoc solution satisfying the principle, in order to show that it can improve performance on a number of publicly available benchmark data sets.

ei

PDF Web DOI [BibTex]



Finite-Horizon Optimal State-Feedback Control of Nonlinear Stochastic Systems Based on a Minimum Principle

Deisenroth, MP., Ohtsuka, T., Weissel, F., Brunn, D., Hanebeck, UD.

In MFI 2006, pages: 371-376, (Editors: Hanebeck, U. D.), IEEE Service Center, Piscataway, NJ, USA, 6th IEEE International Conference on Multisensor Fusion and Integration, September 2006 (inproceedings)

Abstract
In this paper, an approach to the finite-horizon optimal state-feedback control problem of nonlinear, stochastic, discrete-time systems is presented. Starting from the dynamic programming equation, the value function will be approximated by means of Taylor series expansion up to second-order derivatives. Moreover, the problem will be reformulated, such that a minimum principle can be applied to the stochastic problem. Employing this minimum principle, the optimal control problem can be rewritten as a two-point boundary-value problem to be solved at each time step of a shrinking horizon. To avoid numerical problems, the two-point boundary-value problem will be solved by means of a continuation method. Thus, the curse of dimensionality of dynamic programming is avoided, and good candidates for the optimal state-feedback controls are obtained. The proposed approach will be evaluated by means of a scalar example system.

ei

PDF Web DOI [BibTex]



Uniform Convergence of Adaptive Graph-Based Regularization

Hein, M.

In COLT 2006, pages: 50-64, (Editors: Lugosi, G. , H.-U. Simon), Springer, Berlin, Germany, 19th Annual Conference on Learning Theory, September 2006 (inproceedings)

Abstract
The regularization functional induced by the graph Laplacian of a random neighborhood graph based on the data is adaptive in two ways. First it adapts to an underlying manifold structure and second to the density of the data-generating probability measure. We identify in this paper the limit of the regularizer and show uniform convergence over the space of Hoelder functions. As an intermediate step we derive upper bounds on the covering numbers of Hoelder functions on compact Riemannian manifolds, which are of independent interest for the theoretical analysis of manifold-based learning methods.

ei

PDF Web DOI [BibTex]



Regularised CSP for Sensor Selection in BCI

Farquhar, J., Hill, N., Lal, T., Schölkopf, B.

In Proceedings of the 3rd International Brain-Computer Interface Workshop and Training Course 2006, pages: 14-15, (Editors: GR Müller-Putz and C Brunner and R Leeb and R Scherer and A Schlögl and S Wriessnegger and G Pfurtscheller), Verlag der Technischen Universität Graz, Graz, Austria, 3rd International Brain-Computer Interface Workshop and Training Course, September 2006 (inproceedings)

Abstract
The Common Spatial Pattern (CSP) algorithm is a highly successful method for efficiently calculating spatial filters for brain signal classification. Spatial filtering can improve classification performance considerably, but demands that a large number of electrodes be mounted, which is inconvenient in day-to-day BCI usage. The CSP algorithm is also known for its tendency to overfit, i.e. to learn the noise in the training set rather than the signal. Both problems motivate an approach in which spatial filters are sparsified. We briefly sketch a reformulation of the problem which allows us to do this, using 1-norm regularisation. Focusing on the electrode selection issue, we present preliminary results on EEG data sets that suggest that effective spatial filters may be computed with as few as 10-20 electrodes, hence offering the potential to simplify the practical realisation of BCI systems significantly.

ei

PDF PDF [BibTex]



Time-Dependent Demixing of Task-Relevant EEG Signals

Hill, N., Farquhar, J., Lal, T., Schölkopf, B.

In Proceedings of the 3rd International Brain-Computer Interface Workshop and Training Course 2006, pages: 20-21, (Editors: GR Müller-Putz and C Brunner and R Leeb and R Scherer and A Schlögl and S Wriessnegger and G Pfurtscheller), Verlag der Technischen Universität Graz, Graz, Austria, 3rd International Brain-Computer Interface Workshop and Training Course, September 2006 (inproceedings)

Abstract
Given a spatial filtering algorithm that has allowed us to identify task-relevant EEG sources, we present a simple approach for monitoring the activity of these sources while remaining relatively robust to changes in other (task-irrelevant) brain activity. The idea is to keep spatial *patterns* fixed rather than spatial filters, when transferring from training to test sessions or from one time window to another. We show that a fixed spatial pattern (FSP) approach, using a moving-window estimate of signal covariances, can be more robust to non-stationarity than a fixed spatial filter (FSF) approach.

ei

PDF PDF [BibTex]



A Sober Look at Clustering Stability

Ben-David, S., von Luxburg, U., Pal, D.

In COLT 2006, pages: 5-19, (Editors: Lugosi, G. , H.-U. Simon), Springer, Berlin, Germany, 19th Annual Conference on Learning Theory, September 2006 (inproceedings)

Abstract
Stability is a common tool to verify the validity of sample based algorithms. In clustering it is widely used to tune the parameters of the algorithm, such as the number k of clusters. In spite of the popularity of stability in practical applications, there has been very little theoretical analysis of this notion. In this paper we provide a formal definition of stability and analyze some of its basic properties. Quite surprisingly, the conclusion of our analysis is that for large sample size, stability is fully determined by the behavior of the objective function which the clustering algorithm is aiming to minimize. If the objective function has a unique global minimizer, the algorithm is stable, otherwise it is unstable. In particular we conclude that stability is not a well-suited tool to determine the number of clusters - it is determined by the symmetries of the data which may be unrelated to clustering parameters. We prove our results for center-based clusterings and for spectral clustering, and support our conclusions by many examples in which the behavior of stability is counter-intuitive.

ei

PDF Web DOI [BibTex]

Information Marginalization on Subgraphs

Huang, J., Zhu, T., Greiner, R., Zhou, D., Schuurmans, D.

In ECML/PKDD 2006, pages: 199-210, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, September 2006 (inproceedings)

Abstract
Real-world data often involves objects that exhibit multiple relationships; for example, ‘papers’ and ‘authors’ exhibit both paper-author interactions and paper-paper citation relationships. A typical learning problem requires one to make inferences about a subclass of objects (e.g. ‘papers’), while using the remaining objects and relations to provide relevant information. We present a simple, unified mechanism for incorporating information from multiple object types and relations when learning on a targeted subset. In this scheme, all sources of relevant information are marginalized onto the target subclass via random walks. We show that marginalized random walks can be used as a general technique for combining multiple sources of information in relational data. With this approach, we formulate new algorithms for transduction and ranking in relational data, and quantify the performance of new schemes on real world data—achieving good results in many problems.

ei

Web DOI [BibTex]

Bayesian Active Learning for Sensitivity Analysis

Pfingsten, T.

In ECML 2006, pages: 353-364, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 17th European Conference on Machine Learning, September 2006 (inproceedings)

Abstract
Designs of micro electro-mechanical devices need to be robust against fluctuations in mass production. Computer experiments with tens of parameters are used to explore the behavior of the system, and to compute sensitivity measures as expectations over the input distribution. Monte Carlo methods are a simple approach to estimate these integrals, but they are infeasible when the models are computationally expensive. Using a Gaussian process prior, expensive simulation runs can be saved. This Bayesian quadrature allows for an active selection of inputs where the simulation promises to be most valuable, and the number of simulation runs can be reduced further. We present an active learning scheme for sensitivity analysis which is rigorously derived from the corresponding Bayesian expected loss. On three fully featured, high dimensional physical models of electro-mechanical sensors, we show that the learning rate in the active learning scheme is significantly better than for passive learning.

ei

PDF Web DOI [BibTex]

Supervised Probabilistic Principal Component Analysis

Yu, S., Yu, K., Tresp, V., Kriegel, H., Wu, M.

In KDD 2006, pages: 464-473, (Editors: Ungar, L. ), ACM Press, New York, NY, USA, 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2006 (inproceedings)

Abstract
Principal component analysis (PCA) has been extensively applied in data mining, pattern recognition and information retrieval for unsupervised dimensionality reduction. When labels of data are available, e.g., in a classification or regression task, PCA is however not able to use this information. The problem is more interesting if only part of the input data are labeled, i.e., in a semi-supervised setting. In this paper we propose a supervised PCA model called SPPCA and a semi-supervised PCA model called S$^2$PPCA, both of which are extensions of a probabilistic PCA model. The proposed models are able to incorporate the label information into the projection phase, and can naturally handle multiple outputs (i.e., in multi-task learning problems). We derive an efficient EM learning algorithm for both models, and also provide theoretical justifications of the model behaviors. SPPCA and S$^2$PPCA are compared with other supervised projection methods on various learning tasks, and show not only promising performance but also good scalability.

ei

PDF Web DOI [BibTex]

Let It Roll – Emerging Sensorimotor Coordination in a Spherical Robot

Der, R., Martius, G., Hesse, F.

In Proc. Artificial Life X, pages: 192-198, Intl. Society for Artificial Life, MIT Press, August 2006 (inproceedings)

al

[BibTex]

A Continuation Method for Semi-Supervised SVMs

Chapelle, O., Chi, M., Zien, A.

In ICML 2006, pages: 185-192, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
Semi-Supervised Support Vector Machines (S3VMs) are an appealing method for using unlabeled data in classification: their objective function favors decision boundaries which do not cut clusters. However, their main problem is that the optimization problem is non-convex and has many local minima, which often results in suboptimal performance. In this paper we propose to use a global optimization technique known as continuation to alleviate this problem. Compared to other algorithms minimizing the same objective function, our continuation method often leads to lower test errors.

ei

PDF Web DOI [BibTex]

Trading Convexity for Scalability

Collobert, R., Sinz, F., Weston, J., Bottou, L.

In ICML 2006, pages: 201-208, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
Convex learning algorithms, such as Support Vector Machines (SVMs), are often seen as highly desirable because they offer strong practical properties and are amenable to theoretical analysis. However, in this work we show how non-convexity can provide scalability advantages over convexity. We show how concave-convex programming can be applied to produce (i) faster SVMs where training errors are no longer support vectors, and (ii) much faster Transductive SVMs.

ei

PDF Web DOI [BibTex]

Personalized handwriting recognition via biased regularization

Kienzle, W., Chellapilla, K.

In ICML 2006, pages: 457-464, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
We present a new approach to personalized handwriting recognition. The problem, also known as writer adaptation, consists of converting a generic (user-independent) recognizer into a personalized (user-dependent) one, which has an improved recognition rate for a particular user. The adaptation step usually involves user-specific samples, which leads to the fundamental question of how to fuse this new information with that captured by the generic recognizer. We propose adapting the recognizer by minimizing a regularized risk functional (a modified SVM) where the prior knowledge from the generic recognizer enters through a modified regularization term. The result is a simple personalization framework with very good practical properties. Experiments on a 100 class real-world data set show that the number of errors can be reduced by over 40% with as few as five user samples per character.

ei

PDF Web DOI [BibTex]

Deterministic annealing for semi-supervised kernel machines

Sindhwani, V., Keerthi, S., Chapelle, O.

In ICML 2006, pages: 841-848, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
An intuitive approach to utilizing unlabeled data in kernel-based classification algorithms is to simply treat the unknown labels as additional optimization variables. For margin-based loss functions, one can view this approach as attempting to learn low-density separators. However, this is a hard optimization problem to solve in typical semi-supervised settings where unlabeled data is abundant. The popular Transductive SVM algorithm is a label-switching-retraining procedure that is known to be susceptible to local minima. In this paper, we present a global optimization framework for semi-supervised Kernel machines where an easier problem is parametrically deformed to the original hard problem and minimizers are smoothly tracked. Our approach is motivated from deterministic annealing techniques and involves a sequence of convex optimization problems that are exactly and efficiently solved. We present empirical results on several synthetic and real world datasets that demonstrate the effectiveness of our approach.

ei

PDF Web DOI [BibTex]

Clustering Graphs by Weighted Substructure Mining

Tsuda, K., Kudo, T.

In ICML 2006, pages: 953-960, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
Graph data is getting increasingly popular in, e.g., bioinformatics and text processing. A main difficulty of graph data processing lies in the intrinsic high dimensionality of graphs, namely, when a graph is represented as a binary feature vector of indicators of all possible subgraphs, the dimensionality gets too large for usual statistical methods. We propose an efficient method for learning a binomial mixture model in this feature space. Combining the $\ell_1$ regularizer and the data structure called DFS code tree, the MAP estimate of non-zero parameters is computed efficiently by means of the EM algorithm. Our method is applied to the clustering of RNA graphs, and compares favorably with graph kernels and the spectral graph distance.

ei

PDF Web DOI [BibTex]

A Choice Model with Infinitely Many Latent Features

Görür, D., Jäkel, F., Rasmussen, C.

In ICML 2006, pages: 361-368, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
Elimination by aspects (EBA) is a probabilistic choice model describing how humans decide between several options. The options from which the choice is made are characterized by binary features and associated weights. For instance, when choosing which mobile phone to buy, the features to consider may be: long lasting battery, color screen, etc. Existing methods for inferring the parameters of the model assume pre-specified features. However, the features that lead to the observed choices are not always known. Here, we present a non-parametric Bayesian model to infer the features of the options and the corresponding weights from choice data. We use the Indian buffet process (IBP) as a prior over the features. Inference using Markov chain Monte Carlo (MCMC) in conjugate IBP models has been previously described. The main contribution of this paper is an MCMC algorithm for the EBA model that can also be used in inference for other non-conjugate IBP models; this may broaden the use of IBP priors considerably.

ei

PostScript PDF Web DOI [BibTex]

Learning High-Order MRF Priors of Color Images

McAuley, J., Caetano, T., Smola, A., Franz, MO.

In ICML 2006, pages: 617-624, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
In this paper, we use large neighborhood Markov random fields to learn rich prior models of color images. Our approach extends the monochromatic Fields of Experts model (Roth and Black, 2005) to color images. In the Fields of Experts model, the curse of dimensionality due to very large clique sizes is circumvented by parameterizing the potential functions according to a product of experts. We introduce several simplifications of the original approach by Roth and Black which allow us to cope with the increased clique size (typically 3x3x3 or 5x5x3 pixels) of color images. Experimental results are presented for image denoising which evidence improvements over state-of-the-art monochromatic image priors.

ei

PDF Web DOI [BibTex]

Inference with the Universum

Weston, J., Collobert, R., Sinz, F., Bottou, L., Vapnik, V.

In ICML 2006, pages: 1009-1016, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
In this paper we study a new framework introduced by Vapnik (1998) and Vapnik (2006) that is an alternative capacity concept to the large margin approach. In the particular case of binary classification, we are given a set of labeled examples, and a collection of "non-examples" that do not belong to either class of interest. This collection, called the Universum, allows one to encode prior knowledge by representing meaningful concepts in the same domain as the problem at hand. We describe an algorithm to leverage the Universum by maximizing the number of observed contradictions, and show experimentally that this approach delivers accuracy improvements over using labeled data alone.

ei

PDF Web DOI [BibTex]

Statistical Convergence of Kernel CCA

Fukumizu, K., Bach, F., Gretton, A.

In Advances in neural information processing systems 18, pages: 387-394, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
While kernel canonical correlation analysis (kernel CCA) has been applied in many problems, the asymptotic convergence of the functions estimated from a finite sample to the true functions has not yet been established. This paper gives a rigorous proof of the statistical convergence of kernel CCA and a related method (NOCCO), which provides a theoretical justification for these methods. The result also gives a sufficient condition on the decay of the regularization coefficient in the methods to ensure convergence.

ei

PDF Web [BibTex]

Products of "Edge-perts"

Gehler, PV., Welling, M.

In Advances in neural information processing systems 18, pages: 419-426, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
Images represent an important and abundant source of data. Understanding their statistical structure has important applications such as image compression and restoration. In this paper we propose a particular kind of probabilistic model, dubbed the “products of edge-perts model”, to describe the structure of wavelet transformed images. We develop a practical denoising algorithm based on a single edge-pert and show state-of-the-art denoising performance on benchmark images.

ei

PDF Web [BibTex]

Assessing Approximations for Gaussian Process Classification

Kuss, M., Rasmussen, C.

In Advances in neural information processing systems 18, pages: 699-706, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
Gaussian processes are attractive models for probabilistic classification but unfortunately exact inference is analytically intractable. We compare Laplace's method and Expectation Propagation (EP) focusing on marginal likelihood estimates and predictive performance. We explain theoretically and corroborate empirically that EP is superior to Laplace. We also compare to a sophisticated MCMC scheme and show that EP is surprisingly accurate.

ei

PDF Web [BibTex]

Learning an Interest Operator from Human Eye Movements

Kienzle, W., Wichmann, F., Schölkopf, B., Franz, M.

In CVPRW 2006, pages: 24, (Editors: C Schmid and S Soatto and C Tomasi), IEEE Computer Society, Los Alamitos, CA, USA, 2006 Conference on Computer Vision and Pattern Recognition Workshop, April 2006 (inproceedings)

Abstract
We present an approach for designing interest operators that are based on human eye movement statistics. In contrast to existing methods which use hand-crafted saliency measures, we use machine learning methods to infer an interest operator directly from eye movement data. That way, the operator provides a measure of biologically plausible interestingness. We describe the data collection, training, and evaluation process, and show that our learned saliency measure significantly accounts for human eye movements. Furthermore, we illustrate connections to existing interest operators, and present a multi-scale interest point detector based on the learned function.

ei

PDF Web DOI [BibTex]

Evaluating Predictive Uncertainty Challenge

Quinonero Candela, J., Rasmussen, C., Sinz, F., Bousquet, O., Schölkopf, B.

In Machine Learning Challenges: Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment, pages: 1-27, (Editors: J Quiñonero Candela and I Dagan and B Magnini and F d’Alché-Buc), Springer, Berlin, Germany, First PASCAL Machine Learning Challenges Workshop (MLCW), April 2006 (inproceedings)

Abstract
This Chapter presents the PASCAL Evaluating Predictive Uncertainty Challenge, introduces the contributed Chapters by the participants who obtained outstanding results, and provides a discussion with some lessons to be learnt. The Challenge was set up to evaluate the ability of Machine Learning algorithms to provide good “probabilistic predictions”, rather than just the usual “point predictions” with no measure of uncertainty, in regression and classification problems. Participants had to compete on a number of regression and classification tasks, and were evaluated by both traditional losses that only take into account point predictions and losses we proposed that evaluate the quality of the probabilistic predictions.

ei

PDF Web DOI [BibTex]
