
An Automated Combination of Sequence Motif Kernels for Predicting Protein Subcellular Localization


Technical Report


Protein subcellular localization is a crucial ingredient for many important inferences about cellular processes, including the prediction of protein function and protein interactions. While many predictive computational tools have been proposed, they tend to have complicated architectures and require many design decisions from the developer. We propose an elegant and fully automated approach to building a prediction system for protein subcellular localization. First, we introduce a new class of protein sequence kernels that considers all motifs, including motifs with gaps. This class of kernels allows pairwise amino acid distances to be included in the kernel computation. Second, we propose a multiclass support vector machine method that solves the protein subcellular localization problem directly, without resorting to the common approach of splitting it into several binary classification problems. Finally, to search automatically over families of possible amino acid motifs, we generalize our method to optimize over multiple kernels at the same time. We compare our automated approach to four other predictors on three different datasets.

Author(s): Zien, A. and Ong, CS.
Number (issue): 146
Year: 2006
Month: April

Department(s): Empirical Inference
Bibtex Type: Technical Report (techreport)

Institution: Max Planck Institute for Biological Cybernetics, Tübingen

Digital: 0
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

Links: PDF


  title = {An Automated Combination of Sequence Motif Kernels for Predicting Protein Subcellular Localization},
  author = {Zien, A. and Ong, CS.},
  number = {146},
  organization = {Max-Planck-Gesellschaft},
  institution = {Max Planck Institute for Biological Cybernetics, Tübingen},
  school = {Biologische Kybernetik},
  month = apr,
  year = {2006},
  month_numeric = {4}