Machine LearningOur Machine Learning team has been at the forefront of machine learning developments, including deep learning, support vector machines, and semantic analysis, for over a decade. We develop innovative technologies integrated into NEC’s products and services. Machine learning is the critical technology for data analytics and artificial intelligence. Recent progress in this field opens opportunities for various new applications.

Deep learning will maintain prominence with more robust model architectures, training methods, and optimization techniques. Enhanced interpretability and explainability will be imperative, especially for AI systems in critical domains like healthcare and finance. Addressing bias and ensuring fairness in AI algorithms will be a top priority, leading to the development of tools and guidelines for ethical AI. Federated learning, quantum computing’s potential impact, and the growth of edge computing will diversify ML applications.

Natural language processing will continue to advance, driving progress in conversational AI, while healthcare, finance, education, and creative industries will witness profound AI integration. As quantum computing matures, it could revolutionize machine learning, while edge computing and federated learning will expand AI’s reach across various domains. Our machine learning research will produce innovation across industries, including more accurate medical diagnoses, safer autonomous systems, and efficient energy use while enabling personalized education and AI-generated creativity.

Read our news and publications from our world-class team of researchers from our Machine Learning department.

Posts

Prediction of Non-Muscle Invasive Bladder Cancer Recurrence using Machine Learning of Quantitative Nuclear Features

Non-muscle invasive bladder cancer (NMIBC) generally has a good prognosis, however, recurrence after transurethral resection (TUR), the standard primary treatment, is a major problem. Clinical management after TUR has been based on risk classification using clinicopathological factors, but these classifications are not complete. In this study, we attempted to predict early recurrence of NMIBC based on machine learning of quantitative morphological features. In general, structural, cellular, and nuclear atypia are evaluated to determine cancer atypia. However, since it is difficult to accurately quantify structural atypia from TUR specimens, in this study, we used only nuclear atypia and analyzed it using feature extraction followed by classification using Support Vector Machine and Random Forest machine learning algorithms. For the analysis, 125 patients diagnosed with NMIBC were used, data from 95 patients were randomly selected for the training set, and data from 30 patients were randomly selected for the test set. The results showed that the support vector machine-based model predicted recurrence within 2 years after TUR with a probability of 90% and the random forest-based model with probability of 86.7%. In the future, the system can be used to objectively predict NMIBC recurrence after TUR.

Dual Projection Generative Adversarial Networks for Conditional Image Generation

Conditional Generative Adversarial Networks (cGANs) extend the standard unconditional GAN framework to learning joint data-label distributions from samples, and have been established as powerful generative models capable of generating high-fidelity imagery. A challenge of training such a model lies in properly infusing class information into its generator and discriminator. For the discriminator, class conditioning can be achieved by either (1) directly incorporating labels as input or (2) involving labels in an auxiliary classification loss. In this paper, we show that the former directly aligns the class-conditioned fake-and-real data distributions P (image|class) (data matching), while the latter aligns data-conditioned class distributions P (class|image) (label matching). Although class separability does not directly translate to sample quality and becomes a burden if classification itself is intrinsically difficult, the discriminator cannot provide useful guidance for the generator if features of distinct classes are mapped to the same point and thus become inseparable. Motivated by this intuition, we propose a Dual Projection GAN (P2GAN) model that learns to balance between data matching and label matching. We then propose an improved cGAN model with Auxiliary Classification that directly aligns the fake and real conditionals P (class|image) by minimizing their f-divergence. Experiments on a synthetic Mixture of Gaussian (MoG) dataset and a variety of real-world datasets including CIFAR100, ImageNet, and VGGFace2 demonstrate the efficacy of our proposed models.

Learning Higher-order Object Interactions for Keypoint-based Video Understanding

Action recognition is an important problem that requires identifying actions in video by learning complex interactions across scene actors and objects. However, modern deep-learning based networks often require significant computation and may capture scene context using various modalities that further increases compute costs. Efficient methods such as those used for AR/VR often only use human-keypoint information but suffer from a loss of scene context that hurts accuracy. In this paper, we describe an action-localization method, KeyNet, that uses only the keypoint data for tracking and action recognition. Specifically, KeyNet introduces the use of object based keypoint information to capture context in the scene. Our method illustrates how to build a structured intermediate representation that allows modeling higher-order interactions in the scene from object and human keypoints without using any RGB information. We find that KeyNet is able to track and classify human actions at just 5 FPS. More importantly, we demonstrate that object keypoints can be modeled to recover any loss in context from using keypoint information over AVA action and Kinetics datasets.

Towards Robustness of Deep Neural Networks via Networks via Regularization

Recent studies have demonstrated the vulnerability of deep neural networks against adversarial examples. In-spired by the observation that adversarial examples often lie outside the natural image data manifold and the intrinsic dimension of image data is much smaller than its pixel space dimension, we propose to embed high-dimensional input images into a low-dimensional space and apply regularization on the embedding space to push the adversarial examples back to the manifold. The proposed framework is called Embedding Regularized Classifier (ER-Classifier), which improves the adversarial robustness of the classifier through embedding regularization. Besides improving classification accuracy against adversarial examples, the framework can be combined with detection methods to detect adversarial examples. Experimental results on several benchmark datasets show that, our proposed framework achieves good performance against strong adversarial at-tack methods.

A Silicon Photonic-Electronic Neural Network for Fiber Nonlinearity Compensation

In optical communication systems, fibre nonlinearity is the major obstacle in increasing the transmission capacity. Typically, digital signal processing techniques and hardware are used to deal with optical communication signals, but increasing speed and computational complexity create challenges for such approaches. Highly parallel, ultrafast neural networks using photonic devices have the potential to ease the requirements placed on digital signal processing circuits by processing the optical signals in the analogue domain. Here we report a silicon photonic–electronic neural network for solving fibre nonlinearity compensation in submarine optical-fibre transmission systems. Our approach uses a photonic neural network based on wavelength-division multiplexing built on a silicon photonic platform compatible with complementary metal–oxide–semiconductor technology. We show that the platform can be used to compensate for optical fibre nonlinearities and improve the quality factor of the signal in a 10,080 km submarine fibre communication system. The Q-factor improvement is comparable to that of a software-based neural network implemented on a workstation assisted with a 32-bit graphic processing unit.

Overcoming Poor Word Embeddings with Word Definitions

Modern natural language understanding models depend on pretrained subword embeddings, but applications may need to reason about words that were never or rarely seen during pretraining. We show that examples that depend critically on a rarer word are more challenging for natural language inference models. Then we explore how a model could learn to use definitions, provided in natural text, to overcome this handicap. Our model’s understanding of a definition is usually weaker than a well-modeled word embedding, but it recovers most of the performance gap from using a completely untrained word.

DECODE: A Deep-learning Framework for Condensing Enhancers and Refining Boundaries with Large-scale Functional Assays

MotivationMapping distal regulatory elements, such as enhancers, is a cornerstone for elucidating how genetic variations may influence diseases. Previous enhancer-prediction methods have used either unsupervised approaches or supervised methods with limited training data. Moreover, past approaches have implemented enhancer discovery as a binary classification problem without accurate boundary detection, producing low-resolution annotations with superfluous regions and reducing the statistical power for downstream analyses (e.g. causal variant mapping and functional validations). Here, we addressed these challenges via a two-step model called Deep-learning framework for Condensing enhancers and refining boundaries with large-scale functional assays (DECODE). First, we employed direct enhancer-activity readouts from novel functional characterization assays, such as STARR-seq, to train a deep neural network for accurate cell-type-specific enhancer prediction. Second, to improve the annotation resolution, we implemented a weakly supervised object detection framework for enhancer localization with precise boundary detection (to a 10 bp resolution) using Gradient-weighted Class Activation Mapping.ResultsOur DECODE binary classifier outperformed a state-of-the-art enhancer prediction method by 24% in transgenic mouse validation. Furthermore, the object detection framework can condense enhancer annotations to only 13% of their original size, and these compact annotations have significantly higher conservation scores and genome-wide association study variant enrichments than the original predictions. Overall, DECODE is an effective tool for enhancer classification and precise localization.

Nonlinear Impairment Compensation using Neural Networks

Neural networks are attractive for nonlinear impairment compensation applications in communication systems. In this paper, several approaches to reduce computational complexity of the neural network-based algorithms are presented.

Vehicle Run-Off-Road Event Automatic Detection by Fiber Sensing Technology

We demonstrate a new application of fiber-optic-sensing and machine learning techniques for vehicle run-off-road events detection to enhance roadway safety and efficiency. The proposed approach achieves high accuracy in a testbed under various experimental conditions.

Disentangled Recurrent Wasserstein Auto-Encoder

Learning disentangled representations leads to interpretable models and facilitates data generation with style transfer, which has been extensively studied on static data such as images in an unsupervised learning framework. However, only a few works have explored unsupervised disentangled sequential representation learning due to challenges of generating sequential data. In this paper, we propose recurrent Wasserstein Autoencoder (R-WAE), a new framework for generative modeling of sequential data. R-WAE disentangles the representation of an input sequence into static and dynamic factors (i.e., time-invariant and time-varying parts). Our theoretical analysis shows that, R-WAE minimizes an upper bound of a penalized form of the Wasserstein distance between model distribution and sequential data distribution, and simultaneously maximizes the mutual information between input data and different disentangled latent factors, respectively. This is superior to (recurrent) VAE which does not explicitly enforce mutual information maximization between input data and disentangled latent representations. When the number of actions in sequential data is available as weak supervision information, R-WAE is extended to learn a categorical latent representation of actions to improve its disentanglement. Experiments on a variety of datasets show that our models outperform other baselines with the same settings in terms of disentanglement and unconditional video generation both quantitatively and qualitatively.