Bandit learning is a class of machine learning methods based on the multi-armed bandit problem, in which an agent must repeatedly choose among different options to maximize reward under limited information. At NECLA, this technique can be used to optimize dynamic decision-making in uncertain environments such as energy markets, sensor scheduling, and distributed resource allocation. Bandit learning balances exploration (trying new strategies) against exploitation (using the best known strategies) to adaptively improve performance in real-time systems.
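The exploration-exploitation trade-off described above can be sketched with a minimal epsilon-greedy bandit. This is an illustrative example, not NECLA's implementation: the Bernoulli reward arms and the parameter values are assumptions chosen for the demo.

```python
import random

def epsilon_greedy_bandit(reward_fn, n_arms, n_rounds, epsilon=0.1, seed=0):
    """Epsilon-greedy: with prob. epsilon explore a random arm,
    otherwise exploit the arm with the highest estimated mean reward."""
    rng = random.Random(seed)
    counts = [0] * n_arms          # pulls per arm
    values = [0.0] * n_arms        # running mean reward per arm
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        r = reward_fn(arm, rng)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean
        total += r
    return values, total

# Toy environment: three arms with hidden success probabilities.
probs = [0.2, 0.5, 0.8]
values, total = epsilon_greedy_bandit(
    lambda a, rng: 1.0 if rng.random() < probs[a] else 0.0,
    n_arms=3, n_rounds=5000)
```

Over many rounds the estimated values concentrate near the true arm probabilities, and the agent directs most pulls to the best arm while still occasionally sampling the others.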

Posts

Decentralized Transactive Energy Auctions with Bandit Learning

Power systems worldwide have been embracing the rapid growth of distributed energy resources. Distributed energy resources, such as electric vehicles, rooftop photovoltaic panels, and home battery systems, commonly exist at the distribution level and cannot be controlled by a centralized entity such as a utility. However, a large number of distributed energy resources have the potential to reshape the power generation landscape when their owners (prosumers) are allowed to send electricity back to the grid. Transactive energy paradigms are emerging to orchestrate the coordination of prosumers and consumers by enabling the exchange of energy among them. In this paper, we propose a transactive energy auction framework based on blockchain technology for creating trustworthy and transparent transactive environments in distribution networks, without relying on a centralized entity to clear transactions. Moreover, we propose intelligent decentralized decision-making strategies based on bandit learning that let market participants locally decide their energy prices in auctions. Under the blockchain framework, the bandit learning approach provides market participants with greater benefits than trading energy through a centralized entity, as supported by preliminary simulation results obtained on our blockchain-based platform.
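The paper's idea of letting each participant learn its own auction price can be sketched by treating a grid of candidate prices as bandit arms and realized profit as the reward. The sketch below uses UCB1 for illustration; the price grid, the clearing-probability model, and all numbers are hypothetical assumptions, not the authors' market model.

```python
import math
import random

def ucb1_price_selection(price_grid, reward_fn, n_rounds, seed=0):
    """UCB1 over candidate prices: pick the price maximizing
    estimated mean profit plus an exploration bonus."""
    rng = random.Random(seed)
    n = len(price_grid)
    counts = [0] * n
    means = [0.0] * n
    for t in range(n_rounds):
        if t < n:
            arm = t  # pull each arm once to initialize
        else:
            arm = max(range(n), key=lambda a:
                      means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = reward_fn(price_grid[arm], rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
    best = max(range(n), key=lambda a: means[a])
    return price_grid[best], means

# Hypothetical market: an offer clears with probability decreasing in
# price; profit equals the price when the offer clears, else zero.
def profit(price, rng):
    p_clear = max(0.0, 1.0 - price / 10.0)
    return price if rng.random() < p_clear else 0.0

best_price, means = ucb1_price_selection(
    [1.0, 3.0, 5.0, 7.0, 9.0], profit, n_rounds=5000)
```

In this toy market the expected profit is price × (1 − price/10), which peaks at a price of 5.0, so the learner's pulls concentrate on that arm without any participant needing a centralized clearing model.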