To this end, I am currently working on Project Vasudha, where I formulate scenarios with Microsoft's consumer-side and producer-side partners so that these algorithms can be deployed cost-effectively. For example:

  • Consumer-Centric Scenarios:
    • SSE Airtricity, Ireland: Formulated the problem of reducing the carbon footprint and maximizing the profit of consumers in Ireland by scheduling batteries to charge and discharge at the right times. Built a gym-compatible environment and addressed the problem with stochastic (Simulated Annealing), traditional (Mixed Integer Linear Programming / Model Predictive Control), and reinforcement learning techniques (Deep Q-Network (DQN) and its variants, Proximal Policy Optimization (PPO), and offline RL via Conservative Q-Learning (CQL)).
    • Schneider Electric: Formulated the microgrid profit-maximization scenario, where the microgrid (here, a smart building) consists of a consumer, solar generation, and battery storage. The microgrid is connected to a utility grid for its electricity supply, and the consumer tries to minimize the power drawn from the utility in order to lower its electricity bill. Designed the battery controller as part of the EnCortex Decision Management Framework: we take the abstractions provided by EnCortex, feed the relevant battery and global constraints to the gym-compatible environment, and solve the objective using the algorithms mentioned above. Experimented with behavior cloning (using expert RL agents to compensate for limited data) and with pre-training and fine-tuning models on new battery configurations and new grids, so that existing models can be reused in the future with shorter training times.
  • Producer-Centric Scenarios:
    • Ayana Renewables / Vestas / Constellation (24/7) Optimization: Developed the MILP formulation for the scenario in which a producer with solar and wind farms wants to participate in the day-ahead market (DAM) while holding an RTC contract with a consumer. The producer tries to maximize the total profit earned from the contract and the market combined, while accounting for deviation settlement mechanism (DSM) penalties. Also contributed to correcting the RL formulation used for volume allocation (DDPG and TD3, chosen for the continuous action space) and modeled simple market bidding strategies that are used in practice.
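
To make the battery-scheduling formulation from the consumer-centric scenarios more concrete, here is a minimal sketch of a gym-style environment with a simple price-threshold baseline. It is an illustration only, not the Project Vasudha or EnCortex code: the class `BatteryEnv`, the functions `threshold_policy` and `run_episode`, and every parameter value are hypothetical, the interface only mimics Gym's `reset`/`step` convention without importing the library, and the reward covers profit alone (carbon footprint is left out).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BatteryEnv:
    """Toy battery-arbitrage environment with a Gym-style reset/step API."""
    prices: List[float]          # hypothetical price per kWh at each time step
    capacity_kwh: float = 10.0   # assumed usable battery capacity
    max_rate_kwh: float = 5.0    # assumed max energy moved per time step
    efficiency: float = 0.9      # assumed charging efficiency
    t: int = field(default=0, init=False)
    soc_kwh: float = field(default=0.0, init=False)

    def reset(self) -> Tuple[float, float]:
        self.t = 0
        self.soc_kwh = 0.0
        return self._obs()

    def _obs(self) -> Tuple[float, float]:
        # Observation: (state of charge, current price); price is 0 after the end.
        price = self.prices[self.t] if self.t < len(self.prices) else 0.0
        return (self.soc_kwh, price)

    def step(self, action: int):
        # Actions: 0 = idle, 1 = charge from the grid, 2 = discharge to the grid.
        price = self.prices[self.t]
        reward = 0.0
        if action == 1:
            room = self.capacity_kwh - self.soc_kwh
            bought = min(self.max_rate_kwh, room / self.efficiency)
            self.soc_kwh += bought * self.efficiency
            reward = -price * bought   # pay for the energy bought
        elif action == 2:
            sold = min(self.max_rate_kwh, self.soc_kwh)
            self.soc_kwh -= sold
            reward = price * sold      # earn from the energy sold
        self.t += 1
        done = self.t >= len(self.prices)
        return self._obs(), reward, done, {}


def threshold_policy(obs: Tuple[float, float], low: float, high: float) -> int:
    """Charge when the price is at or below `low`, discharge at or above `high`."""
    _, price = obs
    if price <= low:
        return 1
    if price >= high:
        return 2
    return 0


def run_episode(env: BatteryEnv, low: float, high: float) -> float:
    """Roll out the threshold policy for one episode and return the total profit."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(threshold_policy(obs, low, high))
        total += reward
    return total


if __name__ == "__main__":
    env = BatteryEnv(prices=[1.0, 1.0, 5.0, 5.0])
    print(run_episode(env, low=1.0, high=5.0))
```

A rule-based rollout like this is only a baseline; in a setup of this kind, an RL agent such as DQN would replace `threshold_policy` and choose among the same three discrete actions, while an MILP or MPC solver would optimize the full schedule against the same constraints.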