Department of Machine Learning



Transduction and Semi-Supervised Learning
Transduction is a new learning principle which combines induction and deduction in a single step, and is related to the field of semi-supervised learning where one uses unlabeled data during learning. By eliminating the need to construct an accurate model, transduction provides opportunities to achieve greater accuracy, as has been demonstrated in text analysis and bio-informatics applications.


Structured Output Learning
We study learning problems where the predictions are structured objects rather than vectors, for example parse trees or strings.


Universum-based Learning
We study a new framework introduced by Vapnik (1998) that offers an alternative capacity concept to the large margin approach of SVMs. In this setting, one is given a set of labeled examples and a collection of "non-examples" that do not belong to either class of interest. This collection, called the Universum, allows one to encode prior knowledge by representing meaningful concepts in the same domain as the problem at hand.
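One way to make the role of the Universum concrete is an objective that adds a penalty for Universum points scored far from the decision boundary to the usual hinge loss on labeled points. The sketch below assumes a linear scorer and an epsilon-insensitive penalty; the function names and the parameters `C`, `C_u`, and `eps` are illustrative choices, not taken from this page.

```python
def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def hinge(margin):
    """Standard hinge loss: zero once the margin reaches 1."""
    return max(0.0, 1.0 - margin)


def eps_insensitive(score, eps):
    """Zero inside the band |score| <= eps, linear outside it."""
    return max(0.0, abs(score) - eps)


def universum_objective(w, b, labeled, universum, C=1.0, C_u=1.0, eps=0.1):
    """Regularizer + hinge loss on labeled data + penalty on Universum data.

    labeled   : list of (features, label) pairs with label in {-1, +1}
    universum : list of feature vectors belonging to neither class
    C, C_u, eps are hypothetical trade-off parameters for this sketch.
    """
    reg = 0.5 * dot(w, w)
    labeled_loss = sum(hinge(y * (dot(w, x) + b)) for x, y in labeled)
    universum_loss = sum(eps_insensitive(dot(w, x) + b, eps) for x in universum)
    return reg + C * labeled_loss + C_u * universum_loss
```

A Universum point that the classifier scores near zero adds nothing to the objective, while one that is scored confidently as either class is penalized. This is how the non-examples constrain the choice of decision boundary.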


Online Learning
Online learning holds the promise of handling very large datasets because data can be streamed from disk or another source, so the entire dataset does not have to be held in memory. We investigate fast online algorithms that achieve good generalization ability after only one pass over the data.
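The single-pass, streaming pattern can be illustrated with a minimal sketch. It uses the classic perceptron update as a simple stand-in, not one of the algorithms studied here, and the names `stream` and `one_pass_perceptron` are hypothetical.

```python
def stream(rows):
    """Yield one (features, label) pair at a time, as a file reader would."""
    for x, y in rows:
        yield x, y


def one_pass_perceptron(examples, dim, lr=1.0):
    """Visit each example exactly once; only w and b stay in memory."""
    w = [0.0] * dim
    b = 0.0
    mistakes = 0
    for x, y in examples:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * score <= 0:  # misclassified or on the boundary: update
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
            b += lr * y
            mistakes += 1
    return w, b, mistakes


# Toy data for illustration only; labels are in {-1, +1}.
data = [([2.0, 1.0], 1), ([-1.0, -2.0], -1), ([1.5, 0.5], 1), ([-2.0, -1.0], -1)]
w, b, mistakes = one_pass_perceptron(stream(data), dim=2)
```

Because the learner consumes a generator, the same function works unchanged when the examples come from a file or a network source instead of an in-memory list.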


Large Scale Transduction
Transduction and semi-supervised learning methods can help improve generalization ability in learning problems through the use of the unlabeled test examples, or other unlabeled data, during learning. However, many existing algorithms are infeasibly slow. We investigate how to scale algorithms in this domain to large datasets.
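To show how unlabeled points can enter training at all, the sketch below evaluates a transductive SVM style objective for a linear scorer. Labeled points pay the usual hinge loss, and unlabeled points pay a symmetric hinge that is small only when they lie far from the decision boundary on either side. The function names and the trade-off parameters `C` and `C_star` are assumptions made for this illustration.

```python
def score(w, b, x):
    """Linear decision value for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def transductive_objective(w, b, labeled, unlabeled, C=1.0, C_star=1.0):
    """Regularizer + hinge on labeled points + symmetric hinge on unlabeled points."""
    reg = 0.5 * sum(wi * wi for wi in w)
    labeled_loss = sum(max(0.0, 1.0 - y * score(w, b, x)) for x, y in labeled)
    # No label is available, so the loss depends only on |score|.
    unlabeled_loss = sum(max(0.0, 1.0 - abs(score(w, b, x))) for x in unlabeled)
    return reg + C * labeled_loss + C_star * unlabeled_loss
```

The symmetric hinge makes the objective non-convex in `w`, which is one reason optimizing it over many unlabeled points is expensive.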


Face Detection
We investigate algorithms for detecting human faces, eyes, and head pose. We focus on a neural network based architecture.


Machine Translation
Machine translation is the problem of converting from one human language to another, typically at the sentence level. We focus on an end-to-end machine learning approach, which is an instance of structured output learning.


Semantic Extraction
Semantic extraction is the task of extracting semantic information from a document that is in a human-readable format. We are investigating the ability of machine learning algorithms to extract such semantic information in the form of semantic tags.


Mass Spectroscopy Analysis
Mass spectrometry, a core technology in the field of proteomics, is commonly used in a high-throughput fashion to identify proteins in a mixture. Currently, the primary bottleneck in this type of experiment is computational. Existing algorithms for interpreting mass spectra are slow and fail to identify a large proportion of given spectra.


Protein Classification and Ranking
Machine learning algorithms can be used to solve the problem of classifying proteins into superfamilies and folds from sequence data, or returning a ranked list of sequences that are likely to be evolutionarily related to a query sequence. This is of interest because two sequences that are descended from a common ancestral sequence are likely to fill similar functional roles in the cell.


Torch
Torch 5 provides a Matlab-like environment for state-of-the-art machine learning algorithms. It is easy to use and efficient, thanks to a simple, fast scripting language (Lua) and an underlying C implementation.


Spider
Matlab Toolbox for Kernel Methods: The Spider. We are designing and developing a Matlab toolbox for kernel methods. The project has two goals: to build a general-purpose kernel methods library covering different induction principles such as (but not limited to) online/batch learning and active learning, and to build a platform with datasets and a code repository where researchers can exchange results and reproduce experiments. The Spider is intended to be a complete object-oriented environment for machine learning in Matlab. (Project home page)


UniverSVM
UniverSVM: An SVM Implementation for Large Scale Transduction and Inference with a Universum. UniverSVM is an SVM implementation written in C++. Its functionality comprises large scale transduction (as described in Large Scale Transductive SVMs), sparse solutions (as described in Trading Convexity for Scalability), and inference with a Universum (as described in Inference with the Universum).


LaSVM
LaSVM is an online SVM algorithm that yields competitive misclassification rates after a single pass over the training examples, outpacing state-of-the-art SVM solvers. It can also use active example selection to yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels. (Main project page)


SENNA
SENNA is a fast neural-network architecture for semantic extraction from text.


SVM-FOLD
SVM-FOLD is a web server that makes predictions of family, superfamily and fold level classifications of proteins based on the Structural Classification of Proteins (SCOP) hierarchy using the SVM learning algorithm.


RankProp
RankProp is a ranking algorithm that exploits the entire network structure of similarity relationships among proteins in a sequence database by performing a diffusion operation on a pre-computed, weighted network. The resulting ranking algorithm, evaluated using a human-curated database of protein structures, is efficient and provides significantly better rankings than a local network search algorithm such as PSI-BLAST.
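The idea of ranking by diffusion over a similarity network, as opposed to looking only at direct neighbors of the query, can be sketched as follows. This is a generic diffusion iteration under assumed conventions (a dense weight matrix, a damping factor `alpha`, a fixed iteration count), not the published RankProp update rule; all names are illustrative.

```python
def diffusion_rank(weights, query, alpha=0.5, iterations=20):
    """Rank nodes by activation diffused from the query over a weighted graph.

    weights[i][j] is the precomputed similarity on the edge from i to j.
    alpha < 1 damps activation that travels over longer paths.
    """
    n = len(weights)
    scores = [0.0] * n
    scores[query] = 1.0
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            if j == query:
                new_scores.append(1.0)  # the query stays fully activated
                continue
            # Activation arriving from the other non-query nodes.
            incoming = sum(weights[i][j] * scores[i] for i in range(n) if i != query)
            new_scores.append(weights[query][j] + alpha * incoming)
        scores = new_scores
    order = sorted((j for j in range(n) if j != query), key=lambda j: -scores[j])
    return order, scores


# Toy chain 0 - 1 - 2 - 3: the query (node 0) is directly similar only to node 1.
W = [
    [0.0, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
order, scores = diffusion_rank(W, query=0)
```

In the toy chain, nodes 2 and 3 have no direct edge to the query, yet they still receive nonzero scores through intermediate nodes. A purely local search around the query would not see them.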



NEC Laboratories America, Inc.
Princeton Campus - 4 Independence Way, Suite 200, Princeton NJ 08540   |    Cupertino Campus - 10080 North Wolfe Road, Suite SW3-350, Cupertino, CA 95014
webmaster@nec-labs.com   ©2008 NEC Laboratories America, Inc. All rights reserved.