Digital Pathology
Accurate and fast pathological diagnosis is crucial to the prevention, early detection, and early treatment of cancer. NEC is developing a cancer diagnosis assistance system that uses digital image processing technologies and machine learning algorithms. With this system, NEC aims to help improve the administrative efficiency and diagnostic quality of pathological diagnosis.
Biological Network Learning to Promote Lipid Production
Sustainable production of renewable energy is hotly debated, and biofuels are one attempt to reduce our dependence on fossil fuels. High lipid productivity is a key criterion when choosing species for biofuel production. We study and apply statistical network learning strategies to understand the cellular signaling and metabolic mechanisms that promote lipid production.
Unsupervised Learning of Sparse Representations
Unsupervised learning models capture the underlying structure of the data without relying on label information. We focus on sparse modeling algorithms and topic models to extract feature representations from text, image, and video data. We also apply unsupervised learning models to learn hierarchical feature representations for object recognition and video action classification.
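As a rough illustration of what a sparse modeling algorithm computes, the sketch below encodes a signal as a sparse combination of dictionary atoms using ISTA (iterative soft-thresholding). ISTA is chosen here only as a well-known example; the text above does not say which algorithms are used, and the dictionary `D`, the penalty `lam`, and all function names are assumptions made for this example.

```python
from typing import List


def soft_threshold(v: float, t: float) -> float:
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(D: List[List[float]], x: List[float],
         lam: float = 0.1, n_iter: int = 1000) -> List[float]:
    """Approximately minimize 0.5 * ||x - D a||^2 + lam * ||a||_1 over a.

    D is an m-by-k dictionary given as a list of rows (columns are atoms),
    x is a signal of length m. Returns the code a of length k.
    """
    m, k = len(D), len(D[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of
    # D^T D, so 1/L is a safe (if conservative) step size.
    L = sum(d * d for row in D for d in row)
    a = [0.0] * k
    for _ in range(n_iter):
        residual = [sum(D[i][j] * a[j] for j in range(k)) - x[i]
                    for i in range(m)]
        grad = [sum(D[i][j] * residual[i] for i in range(m))
                for j in range(k)]
        a = [soft_threshold(a[j] - grad[j] / L, lam / L) for j in range(k)]
    return a
```

For example, with atoms `(1, 0)`, `(0, 1)`, and `(0.6, 0.8)` and a signal equal to twice the third atom, `ista([[1.0, 0.0, 0.6], [0.0, 1.0, 0.8]], [1.2, 1.6])` should place essentially all of its weight on the third coefficient, because one atom explains the signal at a lower L1 cost than two.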
Biomedical Text Mining
Most biomedical discoveries are communicated through
publications or reports. We propose a range of text mining and natural language
processing strategies to convert the human language in biomedical literature into
formal computer representations for sophisticated information access.
Transduction and Semi-Supervised Learning
Transduction is a new learning principle which
combines induction and deduction in a single step, and is related
to the field of semi-supervised learning where one uses unlabeled data
during learning.
By eliminating the need to construct an accurate model,
transduction provides opportunities to achieve greater accuracy,
as has been demonstrated in text analysis and bio-informatics applications.
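To make the idea of using unlabeled points during learning concrete, here is a minimal sketch of one simple transductive scheme: label propagation over a similarity graph. This is a generic textbook method chosen for illustration, not necessarily one of the algorithms studied here; the Gaussian similarity, the `sigma` parameter, and the function name are assumptions.

```python
import math
from typing import List, Optional


def propagate_labels(points: List[float],
                     labels: List[Optional[int]],
                     sigma: float = 1.0,
                     n_iter: int = 500) -> List[float]:
    """Spread labels from labeled to unlabeled 1-D points.

    labels[i] is +1 or -1 for labeled points and None for unlabeled ones.
    Each unlabeled score is repeatedly replaced by the similarity-weighted
    average of all other scores; labeled scores stay clamped.
    """
    n = len(points)
    # Gaussian similarity between every pair of distinct points.
    w = [[math.exp(-((points[i] - points[j]) ** 2) / (2 * sigma ** 2))
          if i != j else 0.0
          for j in range(n)] for i in range(n)]
    f = [float(y) if y is not None else 0.0 for y in labels]
    for _ in range(n_iter):
        new_f = list(f)
        for i in range(n):
            if labels[i] is None:
                new_f[i] = sum(w[i][j] * f[j] for j in range(n)) / sum(w[i])
        f = new_f
    return f
```

Consider points `[0, 1, 2, 3, 4, 5, 8, 9]` where only `0` (positive) and `9` (negative) are labeled. The point at `5` is closer to the negative labeled point (distance 4 versus 5), but it belongs to the dense chain that starts at `0`. Propagation through the unlabeled points should therefore give it a positive score, which is the kind of structure a purely inductive nearest-neighbor rule would miss.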
Structured Output Learning
We study learning problems where the predictions are structured objects rather than vectors,
for example parse trees or strings.
Universum-based Learning
We study a new framework introduced by Vapnik (1998)
that offers an alternative capacity concept to the large margin approach of SVMs.
In this setting, one is given a set of
labeled examples, and a collection of "non-examples" that do not belong
to either class of interest. This collection, called the Universum,
allows one to encode prior knowledge by representing meaningful
concepts in the same domain as the problem at hand.
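One way to picture how Universum points can enter training is an SVM-style objective with an extra penalty that keeps the decision function close to zero on the non-examples. The sketch below evaluates such an objective for a linear model. It is an assumed formulation for illustration (hinge loss on labeled data plus an epsilon-insensitive loss on Universum points); the weights `C` and `C_u`, the tolerance `eps`, and the function names are not taken from the text above.

```python
from typing import List, Sequence, Tuple


def hinge(margin: float) -> float:
    """Standard SVM hinge loss on the signed margin y * f(x)."""
    return max(0.0, 1.0 - margin)


def eps_insensitive(value: float, eps: float) -> float:
    """Penalize |f(x)| only where it exceeds eps."""
    return max(0.0, abs(value) - eps)


def universum_objective(w: Sequence[float], b: float,
                        labeled: List[Tuple[Sequence[float], int]],
                        universum: List[Sequence[float]],
                        C: float = 1.0, C_u: float = 1.0,
                        eps: float = 0.1) -> float:
    """Regularizer + hinge loss on labeled data + penalty on Universum."""
    def f(x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    reg = 0.5 * sum(wi * wi for wi in w)
    labeled_loss = sum(hinge(y * f(x)) for x, y in labeled)
    # Universum points belong to neither class, so the model is encouraged
    # to output values near zero on them.
    universum_loss = sum(eps_insensitive(f(u), eps) for u in universum)
    return reg + C * labeled_loss + C_u * universum_loss
```

A model that separates the labeled examples equally well but assigns large scores to Universum points receives a higher objective value, which is how the non-examples express prior knowledge in this formulation.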
SVM+
SVM+ is a new approach to using hidden information within the learning framework.
Example data for download:
- MNIST with privileged information.
Parallel Computation in Learning
We explore ways of implementing large-scale learning algorithms
as parallel computations.
We are currently developing parallelization approaches that increase the
ability of SVMs (Support Vector Machines) to solve large-scale
problems. As target systems, we consider shared memory processors,
clusters of processors, vector processors, and SIMD (Single
Instruction Multiple Data) processors. On a given system the speed of
an SVM is limited by the compute performance of the processor as well
as by the size of the memory. Efficient parallelizations have to
overcome both of these limitations while not getting bogged down in
communication overhead.
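A common building block when parallelizing kernel methods is to split the kernel matrix into row blocks and compute each block independently. The sketch below shows only that decomposition; it is not the parallelization developed here. It uses Python threads for brevity, which in CPython will not speed up this CPU-bound loop, so a real system would map the same blocks onto processes, cluster nodes, or SIMD hardware. The RBF kernel, `gamma`, and all function names are assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def rbf(x: Sequence[float], z: Sequence[float], gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel between two vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def kernel_rows(data: List[Sequence[float]], start: int, stop: int,
                gamma: float = 0.5) -> List[List[float]]:
    """Compute kernel matrix rows start..stop-1 against the whole dataset."""
    return [[rbf(data[i], z, gamma) for z in data]
            for i in range(start, stop)]


def parallel_kernel_matrix(data: List[Sequence[float]], n_workers: int = 4,
                           gamma: float = 0.5) -> List[List[float]]:
    """Split the rows into contiguous blocks and compute them concurrently."""
    n = len(data)
    block = max(1, math.ceil(n / n_workers))
    ranges = [(s, min(s + block, n)) for s in range(0, n, block)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() preserves the order of the input ranges, so the blocks
        # can simply be concatenated.
        blocks = pool.map(lambda r: kernel_rows(data, r[0], r[1], gamma),
                          ranges)
        return [row for blk in blocks for row in blk]
```

Each worker needs the full dataset but only produces its own rows. This shows the trade-off named above: the per-worker memory for results shrinks, while distributing the data becomes the communication cost.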
Online Learning
Online learning promises to handle very
large datasets because data can be streamed from
disk or another source, so the entire dataset does not
have to be held in memory. We investigate fast online
algorithms that achieve good generalization ability
after only one pass over the data.
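The single-pass, streaming setting can be sketched with a very simple learner: a linear classifier updated by one hinge-loss gradient step per example, with only the weight vector kept in memory. This is a generic baseline for illustration, not one of the algorithms under investigation; the synthetic `stream_examples` generator stands in for a disk reader, and the learning rate and all names are assumptions.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Example = Tuple[List[float], int]  # (features incl. bias term, label +1/-1)


def stream_examples(n: int, seed: int = 0) -> Iterator[Example]:
    """Hypothetical data source: yields examples one at a time.

    Draws up to n points in the square [-1, 1]^2, labeled by the sign of
    x0 + x1; points too close to the boundary are skipped.
    """
    rng = random.Random(seed)
    for _ in range(n):
        x0, x1 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        s = x0 + x1
        if abs(s) < 0.2:
            continue
        yield [x0, x1, 1.0], (1 if s > 0 else -1)


def train_one_pass(stream: Iterable[Example], dim: int,
                   lr: float = 0.1) -> List[float]:
    """Visit every example exactly once; never store the dataset."""
    w = [0.0] * dim
    for x, y in stream:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin < 1.0:  # hinge loss is active: take a gradient step
            for i in range(dim):
                w[i] += lr * y * x[i]
    return w


def predict(w: List[float], x: List[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```

Calling `train_one_pass(stream_examples(2000), dim=3)` consumes the generator lazily, so memory use stays constant no matter how many examples the source produces.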
Large Scale Transduction
Transduction and semi-supervised learning methods can help
improve generalization ability in learning problems through
the use of unlabeled data, such as the test inputs, during learning.
However, many such
algorithms are infeasibly slow. We investigate how to scale
algorithms in this domain to large datasets.
Semantic Extraction
Semantic extraction is the task of extracting semantic information from a
document that is in a human-readable format. We are investigating the ability of machine learning algorithms to extract such
semantic information in the form of semantic tags.
Mass Spectrometry Analysis
Mass spectrometry, a core technology in the field of proteomics, is commonly
used in a high-throughput fashion to identify proteins in a mixture. Currently,
the primary bottleneck in this type of experiment is computational. Existing algorithms
for interpreting mass spectra are slow and fail to identify a large proportion of given spectra.
Protein Classification and Ranking
Machine learning algorithms can be used to solve the problem of
classifying proteins into superfamilies and folds from sequence data,
or returning a ranked list of sequences that are likely to be
evolutionarily related to a query sequence. This is of interest because
two sequences that are descended from a common ancestral sequence are likely
to fill similar functional roles in the cell.
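As a toy illustration of ranking sequences against a query, the sketch below scores each database sequence by the cosine similarity of its k-mer count vector with that of the query (a normalized spectrum-kernel-style similarity) and sorts by score. This is a deliberately simple stand-in, not the method used in this project; the function names are assumptions and the sequences in the usage note are made-up strings.

```python
import math
from collections import Counter
from typing import Dict, List


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count every length-k substring of the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_similarity(a: str, b: str, k: int = 3) -> float:
    """Cosine similarity between the k-mer count vectors of a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    dot = sum(ca[m] * cb[m] for m in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values()) *
                     sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def rank_by_similarity(query: str, database: Dict[str, str],
                       k: int = 3) -> List[str]:
    """Return database names sorted from most to least similar to query."""
    scored = [(spectrum_similarity(query, seq, k), name)
              for name, seq in database.items()]
    # Highest score first; ties are broken alphabetically by name.
    return [name for _, name in sorted(scored, key=lambda t: (-t[0], t[1]))]
```

For instance, ranking the query `"MKVLAAGIVG"` against an identical string, a one-residue variant `"MKVLAAGLVG"`, and an unrelated string `"PPPPQQQQWW"` places them in that order. Detecting remote evolutionary relationships generally needs richer models than raw k-mer overlap, which is where learned classifiers and rankers come in.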