Header logo is


2006


no image
Conformal Multi-Instance Kernels

Blaschko, M., Hofmann, T.

In NIPS 2006 Workshop on Learning to Compare Examples, pages: 1-6, NIPS Workshop on Learning to Compare Examples, December 2006 (inproceedings)

Abstract
In the multiple instance learning setting, each observation is a bag of feature vectors of which one or more vectors indicates membership in a class. The primary task is to identify if any vectors in the bag indicate class membership while ignoring vectors that do not. We describe here a kernel-based technique that defines a parametric family of kernels via conformal transformations and jointly learns a discriminant function over bags together with the optimal parameter settings of the kernel. Learning a conformal transformation effectively amounts to weighting regions in the feature space according to their contribution to classification accuracy; regions that are discriminative will be weighted higher than regions that are not. This allows the classifier to focus on regions contributing to classification accuracy while ignoring regions that correspond to vectors found both in positive and in negative bags. We show how parameters of this transformation can be learned for support vector machines by posing the problem as a multiple kernel learning problem. The resulting multiple instance classifier gives competitive accuracy for several multi-instance benchmark datasets from different domains.

ei

PDF Web [BibTex]

2006


PDF Web [BibTex]


no image
Some observations on the pedestal effect or dipper function

Henning, B., Wichmann, F.

Journal of Vision, 6(13):50, 2006 Fall Vision Meeting of the Optical Society of America, December 2006 (poster)

Abstract
The pedestal effect is the large improvement in the detectabilty of a sinusoidal “signal” grating observed when the signal is added to a masking or “pedestal” grating of the same spatial frequency, orientation, and phase. We measured the pedestal effect in both broadband and notched noise - noise from which a 1.5-octave band centred on the signal frequency had been removed. Although the pedestal effect persists in broadband noise, it almost disappears in the notched noise. Furthermore, the pedestal effect is substantial when either high- or low-pass masking noise is used. We conclude that the pedestal effect in the absence of notched noise results principally from the use of information derived from channels with peak sensitivities at spatial frequencies different from that of the signal and pedestal. The spatial-frequency components of the notched noise above and below the spatial frequency of the signal and pedestal prevent the use of information about changes in contrast carried in channels tuned to spatial frequencies that are very much different from that of the signal and pedestal. Thus the pedestal or dipper effect measured without notched noise is not a characteristic of individual spatial-frequency tuned channels.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
A Kernel Method for the Two-Sample-Problem

Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., Smola, A.

20th Annual Conference on Neural Information Processing Systems (NIPS), December 2006 (talk)

Abstract
We propose two statistical tests to determine if two samples are from different distributions. Our test statistic is in both cases the distance between the means of the two samples mapped into a reproducing kernel Hilbert space (RKHS). The first test is based on a large deviation bound for the test statistic, while the second is based on the asymptotic distribution of this statistic. We show that the test statistic can be computed in $O(m^2)$ time. We apply our approach to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where our test performs strongly. We also demonstrate excellent performance when comparing distributions over graphs, for which no alternative tests currently exist.

ei

PDF [BibTex]

PDF [BibTex]


no image
Ab-initio gene finding using machine learning

Schweikert, G., Zeller, G., Zien, A., Ong, C., de Bona, F., Sonnenburg, S., Phillips, P., Rätsch, G.

NIPS Workshop on New Problems and Methods in Computational Biology, December 2006 (talk)

ei

Web [BibTex]

Web [BibTex]


no image
Global Biclustering of Microarray Data

Wolf, T., Brors, B., Hofmann, T., Georgii, E.

In ICDMW 2006, pages: 125-129, (Editors: Tsumoto, S. , C. W. Clifton, N. Zhong, X. Wu, J. Liu, B. W. Wah, Y.-M. Cheung), IEEE Computer Society, Los Alamitos, CA, USA, Sixth IEEE International Conference on Data Mining, December 2006 (inproceedings)

Abstract
We consider the problem of simultaneously clustering genes and conditions of a gene expression data matrix. A bicluster is defined as a subset of genes that show similar behavior within a subset of conditions. Finding biclusters can be useful for revealing groups of genes involved in the same molecular process as well as groups of conditions where this process takes place. Previous work either deals with local, bicluster-based criteria or assumes a very specific structure of the data matrix (e.g. checkerboard or block-diagonal) [11]. In contrast, our goal is to find a set of flexibly arranged biclusters which is optimal in regard to a global objective function. As this is a NP-hard combinatorial problem, we describe several techniques to obtain approximate solutions. We benchmarked our approach successfully on the Alizadeh B-cell lymphoma data set [1].

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Graph boosting for molecular QSAR analysis

Saigo, H., Kadowaki, T., Kudo, T., Tsuda, K.

NIPS Workshop on New Problems and Methods in Computational Biology, December 2006 (talk)

Abstract
We propose a new boosting method that systematically combines graph mining and mathematical programming-based machine learning. Informative and interpretable subgraph features are greedily found by a series of graph mining calls. Due to our mathematical programming formulation, subgraph features and pre-calculated real-valued features are seemlessly integrated. We tested our algorithm on a quantitative structure-activity relationship (QSAR) problem, which is basically a regression problem when given a set of chemical compounds. In benchmark experiments, the prediction accuracy of our method favorably compared with the best results reported on each dataset.

ei

Web [BibTex]

Web [BibTex]


no image
Reinforcement Learning by Reward-Weighted Regression

Peters, J.

NIPS Workshop: Towards a New Reinforcement Learning? , December 2006 (talk)

ei

Web [BibTex]

Web [BibTex]


no image
A New Projected Quasi-Newton Approach for the Nonnegative Least Squares Problem

Kim, D., Sra, S., Dhillon, I.

(TR-06-54), Univ. of Texas, Austin, December 2006 (techreport)

ei

PDF [BibTex]

PDF [BibTex]


no image
Inferring Causal Directions by Evaluating the Complexity of Conditional Distributions

Sun, X., Janzing, D., Schölkopf, B.

NIPS Workshop on Causality and Feature Selection, December 2006 (talk)

Abstract
We propose a new approach to infer the causal structure that has generated the observed statistical dependences among n random variables. The idea is that the factorization of the joint measure of cause and effect into P(cause)P(effect|cause) leads typically to simpler conditionals than non-causal factorizations. To evaluate the complexity of the conditionals we have tried two methods. First, we have compared them to those which maximize the conditional entropy subject to the observed first and second moments since we consider the latter as the simplest conditionals. Second, we have fitted the data with conditional probability measures being exponents of functions in an RKHS space and defined the complexity by a Hilbert-space semi-norm. Such a complexity measure has several properties that are useful for our purpose. We describe some encouraging results with both methods applied to real-world data. Moreover, we have combined constraint-based approaches to causal discovery (i.e., methods using only information on conditional statistical dependences) with our method in order to distinguish between causal hypotheses which are equivalent with respect to the imposed independences. Furthermore, we compare the performance to Bayesian approaches to causal inference.

ei

Web [BibTex]


no image
Information-theoretic Metric Learning

Davis, J., Kulis, B., Sra, S., Dhillon, I.

In NIPS 2006 Workshop on Learning to Compare Examples, pages: 1-5, NIPS Workshop on Learning to Compare Examples, December 2006 (inproceedings)

Abstract
We formulate the metric learning problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the Mahalanobis distance function. Via a surprising equivalence, we show that this problem can be solved as a low-rank kernel learning problem. Specifically, we minimize the Burg divergence of a low-rank kernel to an input kernel, subject to pairwise distance constraints. Our approach has several advantages over existing methods. First, we present a natural information-theoretic formulation for the problem. Second, the algorithm utilizes the methods developed by Kulis et al. [6], which do not involve any eigenvector computation; in particular, the running time of our method is faster than most existing techniques. Third, the formulation offers insights into connections between metric learning and kernel learning.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Pattern Mining in Frequent Dynamic Subgraphs

Borgwardt, KM., Kriegel, H-P., Wackersreuther, P.

In pages: 818-822, (Editors: Clifton, C.W.), IEEE Computer Society, Los Alamitos, CA, USA, Sixth International Conference on Data Mining (ICDM), December 2006 (inproceedings)

Abstract
Graph-structured data is becoming increasingly abundant in many application domains. Graph mining aims at finding interesting patterns within this data that represent novel knowledge. While current data mining deals with static graphs that do not change over time, coming years will see the advent of an increasing number of time series of graphs. In this article, we investigate how pattern mining on static graphs can be extended to time series of graphs. In particular, we are considering dynamic graphs with edge insertions and edge deletions over time. We define frequency in this setting and provide algorithmic solutions for finding frequent dynamic subgraph patterns. Existing subgraph mining algorithms can be easily integrated into our framework to make them handle dynamic graphs. Experimental results on real-world data confirm the practical feasibility of our approach.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Structure validation of the Josephin domain of ataxin-3: Conclusive evidence for an open conformation

Nicastro, G., Habeck, M., Masino, L., Svergun, DI., Pastore, A.

Journal of Biomolecular NMR, 36(4):267-277, December 2006 (article)

Abstract
The availability of new and fast tools in structure determination has led to a more than exponential growth of the number of structures solved per year. It is therefore increasingly essential to assess the accuracy of the new structures by reliable approaches able to assist validation. Here, we discuss a specific example in which the use of different complementary techniques, which include Bayesian methods and small angle scattering, resulted essential for validating the two currently available structures of the Josephin domain of ataxin-3, a protein involved in the ubiquitin/proteasome pathway and responsible for neurodegenerative spinocerebellar ataxia of type 3. Taken together, our results demonstrate that only one of the two structures is compatible with the experimental information. Based on the high precision of our refined structure, we show that Josephin contains an open cleft which could be directly implicated in the interaction with polyubiquitin chains and other partners.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
A Unifying View of Wiener and Volterra Theory and Polynomial Kernel Regression

Franz, M., Schölkopf, B.

Neural Computation, 18(12):3097-3118, December 2006 (article)

Abstract
Volterra and Wiener series are perhaps the best understood nonlinear system representations in signal processing. Although both approaches have enjoyed a certain popularity in the past, their application has been limited to rather low-dimensional and weakly nonlinear systems due to the exponential growth of the number of terms that have to be estimated. We show that Volterra and Wiener series can be represented implicitly as elements of a reproducing kernel Hilbert space by utilizing polynomial kernels. The estimation complexity of the implicit representation is linear in the input dimensionality and independent of the degree of nonlinearity. Experiments show performance advantages in terms of convergence, interpretability, and system sizes that can be handled.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Minimal Logical Constraint Covering Sets

Sinz, F., Schölkopf, B.

(155), Max Planck Institute for Biological Cybernetics, Tübingen, December 2006 (techreport)

Abstract
We propose a general framework for computing minimal set covers under class of certain logical constraints. The underlying idea is to transform the problem into a mathematical programm under linear constraints. In this sense it can be seen as a natural extension of the vector quantization algorithm proposed by Tipping and Schoelkopf. We show which class of logical constraints can be cast and relaxed into linear constraints and give an algorithm for the transformation.

ei

PDF [BibTex]

PDF [BibTex]


no image
Learning Optimal EEG Features Across Time, Frequency and Space

Farquhar, J., Hill, J., Schölkopf, B.

NIPS Workshop on Current Trends in Brain-Computer Interfacing, December 2006 (talk)

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Probabilistic inference for solving (PO)MDPs

Toussaint, M., Harmeling, S., Storkey, A.

(934), School of Informatics, University of Edinburgh, December 2006 (techreport)

ei

PDF [BibTex]

PDF [BibTex]


no image
Acquiring web page information without commitment to downloading the web page

Heilbron, L., Platt, J. C., Schölkopf, B., Simard, P. Y.

United States Patent, No 7155489, December 2006 (patent)

ei

[BibTex]

[BibTex]


no image
Induced Master Motion in Force-Reflecting Teleoperation

Kuchenbecker, K. J., Niemeyer, G.

ASME Journal of Dynamic Systems, Measurement, and Control, 128(4):800-810, December 2006 (article)

hi

[BibTex]

[BibTex]


no image
Semi-Supervised Learning

Zien, A.

Advanced Methods in Sequence Analysis Lectures, November 2006 (talk)

ei

Web [BibTex]

Web [BibTex]


no image
3DString: a feature string kernel for 3D object classification on voxelized data

Assfalg, J., Borgwardt, KM., Kriegel, H-P.

In pages: 198-207, (Editors: Yu, P.S. , V.J. Tsotras, E.A. Fox, B. Liu), ACM Press, New York, NY, USA, 15th ACM International Conference on Information and Knowledge Management (CIKM), November 2006 (inproceedings)

Abstract
Classification of 3D objects remains an important task in many areas of data management such as engineering, medicine or biology. As a common preprocessing step in current approaches to classification of voxelized 3D objects, voxel representations are transformed into a feature vector description.In this article, we introduce an approach of transforming 3D objects into feature strings which represent the distribution of voxels over the voxel grid. Attractively, this feature string extraction can be performed in linear runtime with respect to the number of voxels. We define a similarity measure on these feature strings that counts common k-mers in two input strings, which is referred to as the spectrum kernel in the field of kernel methods. We prove that on our feature strings, this similarity measure can be computed in time linear to the number of different characters in these strings. This linear runtime behavior makes our kernel attractive even for large datasets that occur in many application domains. Furthermore, we explain that our similarity measure induces a metric which allows to combine it with an M-tree for handling of large volumes of data. Classification experiments on two published benchmark datasets show that our novel approach is competitive with the best state-of-the-art methods for 3D object classification.

ei

DOI [BibTex]

DOI [BibTex]


no image
Prediction of Protein Function from Networks

Shin, H., Tsuda, K.

In Semi-Supervised Learning, pages: 361-376, Adaptive Computation and Machine Learning, (Editors: Chapelle, O. , B. Schölkopf, A. Zien), MIT Press, Cambridge, MA, USA, November 2006 (inbook)

Abstract
In computational biology, it is common to represent domain knowledge using graphs. Frequently there exist multiple graphs for the same set of nodes, representing information from different sources, and no single graph is sufficient to predict class labels of unlabelled nodes reliably. One way to enhance reliability is to integrate multiple graphs, since individual graphs are partly independent and partly complementary to each other for prediction. In this chapter, we describe an algorithm to assign weights to multiple graphs within graph-based semi-supervised learning. Both predicting class labels and searching for weights for combining multiple graphs are formulated into one convex optimization problem. The graph-combining method is applied to functional class prediction of yeast proteins.When compared with individual graphs, the combined graph with optimized weights performs significantly better than any single graph.When compared with the semidefinite programming-based support vector machine (SDP/SVM), it shows comparable accuracy in a remarkably short time. Compared with a combined graph with equal-valued weights, our method could select important graphs without loss of accuracy, which implies the desirable property of integration with selectivity.

ei

Web [BibTex]

Web [BibTex]


no image
Adapting Spatial Filter Methods for Nonstationary BCIs

Tomioka, R., Hill, J., Blankertz, B., Aihara, K.

In IBIS 2006, pages: 65-70, 2006 Workshop on Information-Based Induction Sciences, November 2006 (inproceedings)

Abstract
A major challenge in applying machine learning methods to Brain-Computer Interfaces (BCIs) is to overcome the possible nonstationarity in the data from the datablock the method is trained on and that the method is applied to. Assuming the joint distributions of the whitened signal and the class label to be identical in two blocks, where the whitening is done in each block independently, we propose a simple adaptation formula that is applicable to a broad class of spatial filtering methods including ICA, CSP, and logistic regression classifiers. We characterize the class of linear transformations for which the above assumption holds. Experimental results on 60 BCI datasets show improved classification accuracy compared to (a) fixed spatial filter approach (no adaptation) and (b) fixed spatial pattern approach (proposed by Hill et al., 2006 [1]).

ei

PDF [BibTex]

PDF [BibTex]


no image
Discrete Regularization

Zhou, D., Schölkopf, B.

In Semi-supervised Learning, pages: 237-250, Adaptive computation and machine learning, (Editors: O Chapelle and B Schölkopf and A Zien), MIT Press, Cambridge, MA, USA, November 2006 (inbook)

Abstract
Many real-world machine learning problems are situated on finite discrete sets, including dimensionality reduction, clustering, and transductive inference. A variety of approaches for learning from finite sets has been proposed from different motivations and for different problems. In most of those approaches, a finite set is modeled as a graph, in which the edges encode pairwise relationships among the objects in the set. Consequently many concepts and methods from graph theory are adopted. In particular, the graph Laplacian is widely used. In this chapter we present a systemic framework for learning from a finite set represented as a graph. We develop discrete analogues of a number of differential operators, and then construct a discrete analogue of classical regularization theory based on those discrete differential operators. The graph Laplacian based approaches are special cases of this general discrete regularization framework. An important thing implied in this framework is that we have a wide choices of regularization on graph in addition to the widely-used graph Laplacian based one.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
New Methods for the P300 Visual Speller

Biessmann, F.

(1), (Editors: Hill, J. ), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, November 2006 (techreport)

ei

PDF [BibTex]

PDF [BibTex]


no image
Statistical Analysis of Slow Crack Growth Experiments

Pfingsten, T., Glien, K.

Journal of the European Ceramic Society, 26(15):3061-3065, November 2006 (article)

Abstract
A common approach for the determination of Slow Crack Growth (SCG) parameters are the static and dynamic loading method. Since materials with small Weibull module show a large variability in strength, a correct statistical analysis of the data is indispensable. In this work we propose the use of the Maximum Likelihood method and a Baysian analysis, which, in contrast to the standard procedures, take into account that failure strengths are Weibull distributed. The analysis provides estimates for the SCG parameters, the Weibull module, and the corresponding confidence intervals and overcomes the necessity of manual differentiation between inert and fatigue strength data. We compare the methods to a Least Squares approach, which can be considered the standard procedure. The results for dynamic loading data from the glass sealing of MEMS devices show that the assumptions inherent to the standard approach lead to significantly different estimates.

ei

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
An Improved Adaptive Power Line Interference Canceller for Electrocardiography

Martens, SMM., Mischi, M., Oei, SG., Bergmans, JWM.

IEEE Transactions on Biomedical Engineering, 53(11):2220-2231, November 2006 (article)

Abstract
Power line interference may severely corrupt a biomedical recording. Notch filters and adaptive cancellers have been suggested to suppress this interference. We propose an improved adaptive canceller for the reduction of the fundamental power line interference component and harmonics in electrocardiogram (ECG) recordings. The method tracks the amplitude, phase, and frequency of all the interference components for power line frequency deviations up to about 4 Hz. A comparison is made between the performance of our method, former adaptive cancellers, and a narrow and a wide notch filter in suppressing the fundamental power line interference component. For this purpose a real ECG signal is corrupted by an artificial power line interference signal. The cleaned signal after applying all methods is compared with the original ECG signal. Our improved adaptive canceller shows a signal-to-power-line-interference ratio for the fundamental component up to 30 dB higher than that produced by the other methods. Moreover, our method is also effective for the suppression of the harmonics of the power line interference.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Donagi-Markman cubic for Hitchin systems

Balduzzi, D.

Mathematical Research Letters, 13(6):923-933, November 2006 (article)

Abstract
The Donagi-Markman cubic is the differential of the period map for algebraic completely integrable systems. Here we prove a formula for the cubic in the case of Hitchin’s system for arbitrary semisimple g. This was originally stated (without proof) by Pantev for sln.

ei

Web [BibTex]

Web [BibTex]


no image
Optimizing Spatial Filters for BCI: Margin- and Evidence-Maximization Approaches

Farquhar, J., Hill, N., Schölkopf, B.

Challenging Brain-Computer Interfaces: MAIA Workshop 2006, pages: 1, November 2006 (poster)

Abstract
We present easy-to-use alternatives to the often-used two-stage Common Spatial Pattern + classifier approach for spatial filtering and classification of Event-Related Desychnronization signals in BCI. We report two algorithms that aim to optimize the spatial filters according to a criterion more directly related to the ability of the algorithms to generalize to unseen data. Both are based upon the idea of treating the spatial filter coefficients as hyperparameters of a kernel or covariance function. We then optimize these hyper-parameters directly along side the normal classifier parameters with respect to our chosen learning objective function. The two objectives considered are margin maximization as used in Support-Vector Machines and the evidence maximization framework used in Gaussian Processes. Our experiments assessed generalization error as a function of the number of training points used, on 9 BCI competition data sets and 5 offline motor imagery data sets measured in Tubingen. Both our approaches sho w consistent improvements relative to the commonly used CSP+linear classifier combination. Strikingly, the improvement is most significant in the higher noise cases, when either few trails are used for training, or with the most poorly performing subjects. This a reversal of the usual "rich get richer" effect in the development of CSP extensions, which tend to perform best when the signal is strong enough to accurately find their additional parameters. This makes our approach particularly suitable for clinical application where high levels of noise are to be expected.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
A Machine Learning Approach for Determining the PET Attenuation Map from Magnetic Resonance Images

Hofmann, M., Steinke, F., Judenhofer, M., Claussen, C., Schölkopf, B., Pichler, B.

IEEE Medical Imaging Conference, November 2006 (talk)

Abstract
A promising new combination in multimodality imaging is MR-PET, where the high soft tissue contrast of Magnetic Resonance Imaging (MRI) and the functional information of Positron Emission Tomography (PET) are combined. Although many technical problems have recently been solved, it is still an open problem to determine the attenuation map from the available MR scan, as the MR intensities are not directly related to the attenuation values. One standard approach is an atlas registration where the atlas MR image is aligned with the patient MR thus also yielding an attenuation image for the patient. We also propose another approach, which to our knowledge has not been tried before: Using Support Vector Machines we predict the attenuation value directly from the local image information. We train this well-established machine learning algorithm using small image patches. Although both approaches sometimes yielded acceptable results, they also showed their specific shortcomings: The registration often fails with large deformations whereas the prediction approach is problematic when the local image structure is not characteristic enough. However, the failures often do not coincide and integration of both information sources is promising. We therefore developed a combination method extending Support Vector Machines to use not only local image structure but also atlas registered coordinates. We demonstrate the strength of this combination approach on a number of examples.

ei

[BibTex]

[BibTex]


no image
Mining frequent stem patterns from unaligned RNA sequences

Hamada, M., Tsuda, K., Kudo, T., Kin, T., Asai, K.

Bioinformatics, 22(20):2480-2487, October 2006 (article)

Abstract
Motivation: In detection of non-coding RNAs, it is often necessary to identify the secondary structure motifs from a set of putative RNA sequences. Most of the existing algorithms aim to provide the best motif or few good motifs, but biologists often need to inspect all the possible motifs thoroughly. Results: Our method RNAmine employs a graph theoretic representation of RNA sequences, and detects all the possible motifs exhaustively using a graph mining algorithm. The motif detection problem boils down to finding frequently appearing patterns in a set of directed and labeled graphs. In the tasks of common secondary structure prediction and local motif detection from long sequences, our method performed favorably both in accuracy and in efficiency with the state-of-the-art methods such as CMFinder.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Geometric Analysis of Hilbert Schmidt Independence criterion based ICA contrast function

Shen, H., Jegelka, S., Gretton, A.

(PA006080), National ICT Australia, Canberra, Australia, October 2006 (techreport)

ei

Web [BibTex]

Web [BibTex]


no image
Large-Scale Gene Expression Profiling Reveals Major Pathogenetic Pathways of Cartilage Degeneration in Osteoarthritis

Aigner, T., Fundel, K., Saas, J., Gebhard, P., Haag, J., Weiss, T., Zien, A., Obermayr, F., Zimmer, R., Bartnik, E.

Arthritis and Rheumatism, 54(11):3533-3544, October 2006 (article)

Abstract
Objective. Despite many research efforts in recent decades, the major pathogenetic mechanisms of osteo- arthritis (OA), including gene alterations occurring during OA cartilage degeneration, are poorly under- stood, and there is no disease-modifying treatment approach. The present study was therefore initiated in order to identify differentially expressed disease-related genes and potential therapeutic targets. Methods. This investigation consisted of a large gene expression profiling study performed based on 78 normal and disease samples, using a custom-made complementar y DNA array covering >4,000 genes. Results. Many differentially expressed genes were identified, including the expected up-regulation of ana- bolic and catabolic matrix genes. In particular, the down-regulation of important oxidative defense genes, i.e., the genes for superoxide dismutases 2 and 3 and glutathione peroxidase 3, was prominent. This indicates that continuous oxidative stress to the cells and the matrix is one major underlying pathogenetic mecha- nism in OA. Also, genes that are involved in the phenot ypic stabilit y of cells, a feature that is greatly reduced in OA cartilage, appeared to be suppressed. Conclusion. Our findings provide a reference data set on gene alterations in OA cartilage and, importantly, indicate major mechanisms underlying central cell bio- logic alterations that occur during the OA disease process. These results identify molecular targets that can be further investigated in the search for therapeutic interventions.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Interactive images

Schölkopf, B., Toyama, K., Uyttendaele, M.

United States Patent, No 7120293, October 2006 (patent)

ei

[BibTex]

[BibTex]


no image
Semi-Supervised Support Vector Machines and Application to Spam Filtering

Zien, A.

ECML Discovery Challenge Workshop, September 2006 (talk)

Abstract
After introducing the semi-supervised support vector machine (aka TSVM for "transductive SVM"), a few popular training strategies are briefly presented. Then the assumptions underlying semi-supervised learning are reviewed. Finally, two modern TSVM optimization techniques are applied to the spam filtering data sets of the workshop; it is shown that they can achieve excellent results, if the problem of the data being non-iid can be handled properly.

ei

PDF Web [BibTex]


no image
A Linear Programming Approach for Molecular QSAR analysis

Saigo, H., Kadowaki, T., Tsuda, K.

In MLG 2006, pages: 85-96, (Editors: Gärtner, T. , G. C. Garriga, T. Meinl), International Workshop on Mining and Learning with Graphs, September 2006, Best Paper Award (inproceedings)

Abstract
Small molecules in chemistry can be represented as graphs. In a quantitative structure-activity relationship (QSAR) analysis, the central task is to find a regression function that predicts the activity of the molecule in high accuracy. Setting a QSAR as a primal target, we propose a new linear programming approach to the graph-based regression problem. Our method extends the graph classification algorithm by Kudo et al. (NIPS 2004), which is a combination of boosting and graph mining. Instead of sequential multiplicative updates, we employ the linear programming boosting (LP) for regression. The LP approach allows to include inequality constraints for the parameter vector, which turns out to be particularly useful in QSAR tasks where activity values are sometimes unavailable. Furthermore, the efficiency is improved significantly by employing multiple pricing.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Incremental Aspect Models for Mining Document Streams

Surendran, A., Sra, S.

In PKDD 2006, pages: 633-640, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, September 2006 (inproceedings)

Abstract
In this paper we introduce a novel approach for incrementally building aspect models, and use it to dynamically discover underlying themes from document streams. Using the new approach we present an application which we call “query-line tracking” i.e., we automatically discover and summarize different themes or stories that appear over time, and that relate to a particular query. We present evaluation on news corpora to demonstrate the strength of our method for both query-line tracking, online indexing and clustering.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Learning Eye Movements

Kienzle, W., Wichmann, F., Schölkopf, B., Franz, M.

Sensory Coding And The Natural Environment, 2006, pages: 1, September 2006 (poster)

Abstract
The human visual system samples images through saccadic eye movements which rapidly change the point of fixation. Although the selection of eye movement targets depends on numerous top-down mechanisms, a number of recent studies have shown that low-level image features such as local contrast or edges play an important role. These studies typically used predefined image features which were afterwards experimentally verified. Here, we follow a complementary approach: instead of testing a set of candidate image features, we infer these hypotheses from the data, using methods from statistical learning. To this end, we train a non-linear classifier on fixated vs. randomly selected image patches without making any physiological assumptions. The resulting classifier can be essentially characterized by a nonlinear combination of two center-surround receptive fields. We find that the prediction performance of this simple model on our eye movement data is indistinguishable from the physiologically motivated model of Itti & Koch (2000) which is far more complex. In particular, we obtain a comparable performance without using any multi-scale representations, long-range interactions or oriented image features.

ei

Web [BibTex]

Web [BibTex]


no image
Extraction of visual features from natural video data using Slow Feature Analysis

Nickisch, H.

Biologische Kybernetik, Technische Universität Berlin, Berlin, Germany, September 2006 (diplomathesis)

Abstract
Das Forschungsprojekt NeuRoBot hat das un{\"u}berwachte Erlernen einer neuronal inspirierten Steuerungsarchitektur zum Ziel, und zwar unter den Randbedingungen biologischer Plausibilit{\"a}t und der Benutzung einer Kamera als einzigen Sensor. Visuelle Merkmale, die ein angemessenes Abbild der Umgebung liefern, sind unerl{\"a}sslich, um das Ziel kollisionsfreier Navigation zu erreichen. Zeitliche Koh{\"a}renz ist ein neues Lernprinzip, das in der Lage ist, Erkenntnisse aus der Biologie des Sehens zu reproduzieren. Es wird durch die Beobachtung motiviert, dass die “Sensoren” der Retina auf deutlich k{\"u}rzeren Zeitskalen variieren als eine abstrakte Beschreibung. Zeitliche Langsamkeitsanalyse l{\"o}st das Problem, indem sie zeitlich langsam ver{\"a}nderliche Signale aus schnell ver{\"a}nderlichen Eingabesignalen extrahiert. Eine Verallgemeinerung auf Signale, die nichtlinear von den Eingaben abh{\"a}ngen, ist durch die Anwendung des Kernel-Tricks m{\"o}glich. Das einzig benutzte Vorwissen ist die zeitliche Glattheit der gewonnenen Signale. In der vorliegenden Diplomarbeit wird Langsamkeitsanalyse auf Bildausschnitte von Videos einer Roboterkamera und einer Simulationsumgebung angewendet. Zuallererst werden mittels Parameterexploration und Kreuzvalidierung die langsamst m{\"o}glichen Funktionen bestimmt. Anschließend werden die Merkmalsfunktionen analysiert und einige Ansatzpunkte f{\"u}r ihre Interpretation angegeben. Aufgrund der sehr großen Datens{\"a}tze und der umfangreichen Berechnungen behandelt ein Großteil dieser Arbeit auch Aufwandsbetrachtungen und Fragen der effizienten Berechnung. Kantendetektoren in verschiedenen Phasen und mit haupts{\"a}chlich horizontaler Orientierung stellen die wichtigsten aus der Analyse hervorgehenden Funktionen dar. Eine Anwendung auf konkrete Navigationsaufgaben des Roboters konnte bisher nicht erreicht werden. Eine visuelle Interpretation der erlernten Merkmale ist jedoch durchaus gegeben.

ei

PDF [BibTex]

PDF [BibTex]


no image
Implicit Surface Modelling with a Globally Regularised Basis of Compact Support

Walder, C., Schölkopf, B., Chapelle, O.

Computer Graphics Forum, 25(3):635-644, September 2006 (article)

Abstract
We consider the problem of constructing a globally smooth analytic function that represents a surface implicitly by way of its zero set, given sample points with surface normal vectors. The contributions of the paper include a novel means of regularising multi-scale compactly supported basis functions that leads to the desirable interpolation properties previously only associated with fully supported bases. We also provide a regularisation framework for simpler and more direct treatment of surface normals, along with a corresponding generalisation of the representer theorem lying at the core of kernel-based machine learning methods. We demonstrate the techniques on 3D problems of up to 14 million data points, as well as 4D time series data and four-dimensional interpolation between three-dimensional shapes.

ei

PDF GZIP DOI [BibTex]


no image
PALMA: Perfect Alignments using Large Margin Algorithms

Rätsch, G., Hepp, B., Schulze, U., Ong, C.

In GCB 2006, pages: 104-113, (Editors: Huson, D. , O. Kohlbacher, A. Lupas, K. Nieselt, A. Zell), Gesellschaft für Informatik, Bonn, Germany, German Conference on Bioinformatics, September 2006 (inproceedings)

Abstract
Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. We present a novel approach based on large margin learning that combines kernel based splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm -- called PALMA -- tunes the parameters of the model such that the true alignment scores higher than all other alignments. In an experimental study on the alignments of mRNAs containing artificially generated micro-exons, we show that our algorithm drastically outperforms all other methods: It perfectly aligns all 4358 sequences on an hold-out set, while the best other method misaligns at least 90 of them. Moreover, our algorithm is very robust against noise in the query sequence: when deleting, inserting, or mutating up to 50% of the query sequence, it still aligns 95% of all sequences correctly, while other methods achieve less than 36% accuracy. For datasets, additional results and a stand-alone alignment tool see http://www.fml.mpg.de/raetsch/projects/palma.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Graph Based Semi-Supervised Learning with Sharper Edges

Shin, H., Hill, N., Rätsch, G.

In ECML 2006, pages: 401-412, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 17th European Conference on Machine Learning (ECML), September 2006 (inproceedings)

Abstract
In many graph-based semi-supervised learning algorithms, edge weights are assumed to be fixed and determined by the data points‘ (often symmetric)relationships in input space, without considering directionality. However, relationships may be more informative in one direction (e.g. from labelled to unlabelled) than in the reverse direction, and some relationships (e.g. strong weights between oppositely labelled points) are unhelpful in either direction. Undesirable edges may reduce the amount of influence an informative point can propagate to its neighbours -- the point and its outgoing edges have been ``blunted.‘‘ We present an approach to ``sharpening‘‘ in which weights are adjusted to meet an optimization criterion wherever they are directed towards labelled points. This principle can be applied to a wide variety of algorithms. In the current paper, we present one ad hoc solution satisfying the principle, in order to show that it can improve performance on a number of publicly available benchmark data sets.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Robust MEG Source Localization of Event Related Potentials: Identifying Relevant Sources by Non-Gaussianity

Breun, P., Grosse-Wentrup, M., Utschick, W., Buss, M.

In DAGM 2006, pages: 394-403, (Editors: Franke, K. , K.-R. Müller, B. Nickolay, R. Schäfer), Springer, Berlin, Germany, 28th Annual Symposium of the German Association for Pattern Recognition, September 2006 (inproceedings)

Abstract
Independent Component Analysis (ICA) is a frequently used preprocessing step in source localization of MEG and EEG data. By decomposing the measured data into maximally independent components (ICs), estimates of the time course and the topographies of neural sources are obtained. In this paper, we show that when using estimated source topographies for localization, correlations between neural sources introduce an error into the obtained source locations. This error can be avoided by reprojecting ICs onto the observation space, but requires the identification of relevant ICs. For Event Related Potentials (ERPs), we identify relevant ICs by estimating their non-Gaussianity. The efficacy of the approach is tested on auditory evoked potentials (AEPs) recorded by MEG. It is shown that ten trials are sufficient for reconstructing all important characteristics of the AEP, and source localization of the reconstructed ERP yields the same focus of activity as the average of 250 trials.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Semi-Supervised Learning

Chapelle, O., Schölkopf, B., Zien, A.

pages: 508, Adaptive computation and machine learning, MIT Press, Cambridge, MA, USA, September 2006 (book)

Abstract
In the field of machine learning, semi-supervised learning (SSL) occupies the middle ground, between supervised learning (in which all training examples are labeled) and unsupervised learning (in which no label data are given). Interest in SSL has increased in recent years, particularly because of application domains in which unlabeled data are plentiful, such as images, text, and bioinformatics. This first comprehensive overview of SSL presents state-of-the-art algorithms, a taxonomy of the field, selected applications, benchmark experiments, and perspectives on ongoing and future research. Semi-Supervised Learning first presents the key assumptions and ideas underlying the field: smoothness, cluster or low-density separation, manifold structure, and transduction. The core of the book is the presentation of SSL methods, organized according to algorithmic strategies. After an examination of generative models, the book describes algorithms that implement the low-density separation assumption, graph-based methods, and algorithms that perform two-step learning. The book then discusses SSL applications and offers guidelines for SSL practitioners by analyzing the results of extensive benchmark experiments. Finally, the book looks at interesting directions for SSL research. The book closes with a discussion of the relationship between semi-supervised learning and transduction.

ei

Web [BibTex]

Web [BibTex]


no image
Finite-Horizon Optimal State-Feedback Control of Nonlinear Stochastic Systems Based on a Minimum Principle

Deisenroth, MP., Ohtsuka, T., Weissel, F., Brunn, D., Hanebeck, UD.

In MFI 2006, pages: 371-376, (Editors: Hanebeck, U. D.), IEEE Service Center, Piscataway, NJ, USA, 6th IEEE International Conference on Multisensor Fusion and Integration, September 2006 (inproceedings)

Abstract
In this paper, an approach to the finite-horizon optimal state-feedback control problem of nonlinear, stochastic, discrete-time systems is presented. Starting from the dynamic programming equation, the value function will be approximated by means of Taylor series expansion up to second-order derivatives. Moreover, the problem will be reformulated, such that a minimum principle can be applied to the stochastic problem. Employing this minimum principle, the optimal control problem can be rewritten as a two-point boundary-value problem to be solved at each time step of a shrinking horizon. To avoid numerical problems, the two-point boundary-value problem will be solved by means of a continuation method. Thus, the curse of dimensionality of dynamic programming is avoided, and good candidates for the optimal state-feedback controls are obtained. The proposed approach will be evaluated by means of a scalar example system.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Uniform Convergence of Adaptive Graph-Based Regularization

Hein, M.

In COLT 2006, pages: 50-64, (Editors: Lugosi, G. , H.-U. Simon), Springer, Berlin, Germany, 19th Annual Conference on Learning Theory, September 2006 (inproceedings)

Abstract
The regularization functional induced by the graph Laplacian of a random neighborhood graph based on the data is adaptive in two ways. First it adapts to an underlying manifold structure and second to the density of the data-generating probability measure. We identify in this paper the limit of the regularizer and show uniform convergence over the space of Hoelder functions. As an intermediate step we derive upper bounds on the covering numbers of Hoelder functions on compact Riemannian manifolds, which are of independent interest for the theoretical analysis of manifold-based learning methods.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Efficient Large Scale Linear Programming Support Vector Machines

Sra, S.

In ECML 2006, pages: 767-774, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 17th European Conference on Machine Learning, September 2006 (inproceedings)

Abstract
This paper presents a decomposition method for efficiently constructing ℓ1-norm Support Vector Machines (SVMs). The decomposition algorithm introduced in this paper possesses many desirable properties. For example, it is provably convergent, scales well to large datasets, is easy to implement, and can be extended to handle support vector regression and other SVM variants. We demonstrate the efficiency of our algorithm by training on (dense) synthetic datasets of sizes up to 20 million points (in ℝ32). The results show our algorithm to be several orders of magnitude faster than a previously published method for the same task. We also present experimental results on real data sets—our method is seen to be not only very fast, but also highly competitive against the leading SVM implementations.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Regularised CSP for Sensor Selection in BCI

Farquhar, J., Hill, N., Lal, T., Schölkopf, B.

In Proceedings of the 3rd International Brain-Computer Interface Workshop and Training Course 2006, pages: 14-15, (Editors: GR Müller-Putz and C Brunner and R Leeb and R Scherer and A Schlögl and S Wriessnegger and G Pfurtscheller), Verlag der Technischen Universität Graz, Graz, Austria, 3rd International Brain-Computer Interface Workshop and Training Course, September 2006 (inproceedings)

Abstract
The Common Spatial Pattern (CSP) algorithm is a highly successful method for efficiently calculating spatial filters for brain signal classification. Spatial filtering can improve classification performance considerably, but demands that a large number of electrodes be mounted, which is inconvenient in day-to-day BCI usage. The CSP algorithm is also known for its tendency to overfit, i.e. to learn the noise in the training set rather than the signal. Both problems motivate an approach in which spatial filters are sparsified. We briefly sketch a reformulation of the problem which allows us to do this, using 1-norm regularisation. Focusing on the electrode selection issue, we present preliminary results on EEG data sets that suggest that effective spatial filters may be computed with as few as 10--20 electrodes, hence offering the potential to simplify the practical realisation of BCI systems significantly.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Time-Dependent Demixing of Task-Relevant EEG Signals

Hill, N., Farquhar, J., Lal, T., Schölkopf, B.

In Proceedings of the 3rd International Brain-Computer Interface Workshop and Training Course 2006, pages: 20-21, (Editors: GR Müller-Putz and C Brunner and R Leeb and R Scherer and A Schlögl and S Wriessnegger and G Pfurtscheller), Verlag der Technischen Universität Graz, Graz, Austria, 3rd International Brain-Computer Interface Workshop and Training Course, September 2006 (inproceedings)

Abstract
Given a spatial filtering algorithm that has allowed us to identify task-relevant EEG sources, we present a simple approach for monitoring the activity of these sources while remaining relatively robust to changes in other (task-irrelevant) brain activity. The idea is to keep spatial *patterns* fixed rather than spatial filters, when transferring from training to test sessions or from one time window to another. We show that a fixed spatial pattern (FSP) approach, using a moving-window estimate of signal covariances, can be more robust to non-stationarity than a fixed spatial filter (FSF) approach.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Inferential Structure Determination: Probabilistic determination and validation of NMR structures

Habeck, M.

Gordon Research Conference on Computational Aspects of Biomolecular NMR, September 2006 (talk)

ei

Web [BibTex]

Web [BibTex]