Reward Biased Maximum Likelihood Estimation:
The exploration-exploitation trade-off remains a challenging issue in reinforcement learning. Reward-Biased Maximum Likelihood Estimation (RBMLE) is a novel class of model-based learning algorithms that can be applied to a variety of learning tasks, including Markov decision processes, stochastic bandits, linear quadratic systems, and contextual bandits. Theoretical analysis shows that RBMLE achieves regret bounds comparable to those of state-of-the-art methods. Empirical results show that RBMLE outperforms existing techniques, including Upper Confidence Bound (UCB) and Thompson Sampling.
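As a minimal sketch of the reward-biasing idea, consider a multi-armed bandit with unit-variance Gaussian rewards. Maximizing the log-likelihood plus a bias term alpha(t) times the best achievable mean reward reduces, in this special case, to playing the arm with the largest index p_i + alpha(t) / (2 N_i), where p_i is the empirical mean and N_i the pull count of arm i. The function name, the choice alpha(t) = bias_scale * log(t), and the two-armed example are illustrative assumptions, not the authors' reference implementation.

```python
import math
import random


def rbmle_gaussian_bandit(true_means, horizon, bias_scale=1.0, seed=0):
    """Run an RBMLE-style index policy on a unit-variance Gaussian bandit.

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    num_arms = len(true_means)
    pulls = [0] * num_arms
    reward_sums = [0.0] * num_arms

    for t in range(1, horizon + 1):
        if t <= num_arms:
            # Pull every arm once so that each empirical mean is defined.
            arm = t - 1
        else:
            # Assumed bias schedule: grows logarithmically with time.
            alpha = bias_scale * math.log(t)

            def index(i):
                empirical_mean = reward_sums[i] / pulls[i]
                # Reward bias: rarely pulled arms get a larger bonus.
                return empirical_mean + alpha / (2.0 * pulls[i])

            arm = max(range(num_arms), key=index)

        reward = rng.gauss(true_means[arm], 1.0)
        pulls[arm] += 1
        reward_sums[arm] += reward

    return pulls
```

Calling rbmle_gaussian_bandit([0.0, 2.0], horizon=2000) returns the per-arm pull counts; with means this far apart, the arm with the larger mean should receive most of the pulls.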
Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems
Akshay Mete, R. Singh & P. R. Kumar, NeurIPS 2022
Reward Biased Maximum Likelihood Estimation
Akshay Mete, R. Singh & P. R. Kumar, CSS 2022 (Invited Paper)
Reward Biased Maximum Likelihood Estimation for Reinforcement Learning
Akshay Mete, R. Singh & P. R. Kumar, L4DC 2021
Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits
Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar, AAAI 2021
Exploration Through Reward Biasing: Reward-Biased Maximum Likelihood Estimation for Stochastic Multi-Armed Bandits
Xi Liu, Ping-Chun Hsieh, Yu-Heng Hung, A. Bhattacharya, P. R. Kumar, ICML 2020
Safe and Multi-Objective Reinforcement Learning:
We solve an open problem in safe RL: how to design algorithms that learn policies with zero or bounded constraint violation. In contrast to previous algorithms based on optimism in the face of uncertainty (OFU), we add different forms of pessimism to OFU, which guarantees zero or bounded violation under different mild assumptions. We also solve a second open problem in safe RL: how to design policy gradient-based algorithms with fast global convergence. By exploiting the hidden convexity of the problem and developing an anchor-changing regularized natural policy gradient (NPG) framework, we improve convergence without further assumptions. We then extend these results to more general multi-objective RL, including smooth concave scalarization and minimax scalarization.
Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning (NeurIPS 2022)
Policy Optimization for Constrained MDPs with Provable Fast Global Convergence (ArXiv 2021)
Learning from Few Samples:
Motivated by the problem of learning with small sample sizes, this paper shows how to incorporate into support-vector machines (SVMs) those properties that have made convolutional neural networks (CNNs) successful. Particularly important is the ability to incorporate domain knowledge of invariances, e.g., translational invariance of images.
Cyber-Security for Networked Cyber-Physical Systems (CPS)
Networked Cyber-Physical Systems (CPS) are control systems in which multiple computing nodes and diverse agents interact with the physical world. These agents can be intelligent systems or robotic agents deployed in real-world applications. CPS combine physical components, such as plants, processes, or systems, with cyber components, such as software, code, and computation, which are interconnected through a wired or wireless network. The integration and cooperation of physical and cyber components benefit CPS; however, the presence of networked cyber and physical components renders them vulnerable to cyber-attacks and other disruptive events.
Because many networked CPS support critical infrastructure or are classified as safety-critical systems, it is crucial to examine whether these systems can be secured against cyber threats and made resilient to potential disruptions. Networked CPS, including autonomous vehicles, chemical process control systems, Unmanned Aerial Vehicles (UAVs), Unmanned Ground Vehicles (UGVs), and smart energy systems, typically rely on sensor measurements to enable closed-loop control. As a result, these control systems are susceptible to cyber-attacks in which malicious agents compromise the sensors or the networks that transmit sensor measurements.
We developed a general-purpose methodology, called “Dynamic Watermarking (DW)”, for defending networked CPS against arbitrary cyber-attacks. Our DW method has been tested on prototypes of a chemical process control system (coupled water tanks), a mechanical system (helicopter), a power system with power electronics (grid-tied inverter), an industrial adjustable speed drive system, and a vehicular system (autonomous car).
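The core idea of dynamic watermarking, as described in the literature, is that the controller superimposes a small private random signal (the watermark) on its control input and then checks whether the reported sensor measurements respond to that signal in the statistically expected way. The sketch below illustrates this on a hypothetical scalar linear plant x[t+1] = a x[t] + b u[t] + w[t] with unit-variance noise. The plant parameters, the feedback gain, the attack model (a sensor that reports a simulated plant and cannot see the watermark), and all names are assumptions for illustration; this is not the code used on the prototypes.

```python
import random


def watermark_statistics(attacked, steps=5000, a=0.9, b=1.0, gain=0.5, seed=0):
    """Return (correlation, power) of the watermark test residuals.

    With honest sensors the correlation should be near 0 and the power near
    the process-noise variance (1.0 here). A sensor that ignores the
    watermark shifts both statistics.
    """
    rng = random.Random(seed)
    x = 0.0  # true plant state
    z = 0.0  # measurement reported to the controller
    corr_sum = 0.0
    power_sum = 0.0

    for _ in range(steps):
        e = rng.gauss(0.0, 1.0)  # private watermark, known only to the controller
        u = -gain * z + e        # control input = feedback + watermark
        x = a * x + b * u + rng.gauss(0.0, 1.0)  # true plant update

        if attacked:
            # Assumed attack: report a simulated plant driven by the
            # feedback law only, since the attacker cannot observe e.
            z_next = a * z + b * (-gain * z) + rng.gauss(0.0, 1.0)
        else:
            z_next = x  # honest sensor reports the true state

        # Residual: the part of the report not explained by the model and
        # the applied input. For an honest sensor it equals the process noise.
        residual = z_next - a * z - b * u
        corr_sum += e * residual
        power_sum += residual * residual
        z = z_next

    return corr_sum / steps, power_sum / steps
```

In this toy setting, an honest sensor yields residuals that are uncorrelated with the watermark, while the simulated-plant attack leaves a residual of the form noise minus b times the watermark, so both the correlation and the power move away from their nominal values and the attack can be flagged by thresholding either statistic.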
Lantian Shangguan, Kenny Chour, Woo Hyun Ko, Jaewon Kim, Gopal Krishna Kamath, Bharadwaj Satchidanandan, Swaminathan Gopalswamy, and P. R. Kumar. “Dynamic watermarking for cybersecurity of autonomous vehicles.” IEEE Transactions on Industrial Electronics (2022).
Jaewon Kim, Akshay Mete, and P. R. Kumar. “Safe Control of Networked Chemical Process Plants Under Cyber-Attacks.” In 2022 AIChE Annual Meeting. AIChE, 2022.
Faris Alotaibi, Hasan Ibrahim, Jaewon Kim, and Prasad Enjeti. “Designing an Intrusion Proof Adjustable Speed Drive System Controlling a Critical Process.” In 2022 IEEE 13th International Symposium on Power Electronics for Distributed Generation Systems (PEDG), pp. 1-7. IEEE, 2022.
Hasan Ibrahim, Jaewon Kim, Prasad Enjeti, P. R. Kumar, and Le Xie. “Detection of cyber attacks in grid-tied pv systems using dynamic watermarking.” In 2022 IEEE Green Technologies Conference (GreenTech), pp. 57-61. IEEE, 2022.
Hasan Ibrahim, Jorge Ramos-Ruiz, Jaewon Kim, Woo Hyun Ko, Tong Huang, Prasad Enjeti, P. R. Kumar, and Le Xie. “An active detection scheme for sensor spoofing in grid-tied pv systems.” In 2021 IEEE Energy Conversion Congress and Exposition (ECCE), pp. 1433-1439. IEEE, 2021.
Tong Huang, Jorge Ramos-Ruiz, Woo-Hyun Ko, Jaewon Kim, Prasad Enjeti, P. R. Kumar, and Le Xie. “Enabling secure peer-to-peer energy transactions through dynamic watermarking in electric distribution grids: Defending the distribution system against sophisticated cyberattacks with a provable guarantee.” IEEE Electrification Magazine 9, no. 3 (2021): 55-64.
Jorge Ramos-Ruiz, Hasan Ibrahim, Jaewon Kim, Woo Hyun Ko, Tong Huang, Prasad Enjeti, P. R. Kumar, and Le Xie. “Validation of a Robust Cyber Shield for a Grid Connected PV Inverter System via Digital Watermarking Principle.” In 2021 IEEE 12th International Symposium on Power Electronics for Distributed Generation Systems (PEDG), pp. 1-6. IEEE, 2021.
Jaewon Kim, Woo-Hyun Ko, and P. R. Kumar. “Cyber-Security through Dynamic Watermarking for 2-rotor Aerial Vehicle Flight Control Systems.” In 2021 International Conference on Unmanned Aircraft Systems (ICUAS), pp. 1277-1283. IEEE, 2021.
Jorge Ramos-Ruiz, Jaewon Kim, Woo-Hyun Ko, Tong Huang, Prasad Enjeti, P. R. Kumar, and Le Xie. “An active detection scheme for cyber attacks on grid-tied PV systems.” In 2020 IEEE CyberPELS (CyberPELS), pp. 1-6. IEEE, 2020.
Jaewon Kim and P. R. Kumar. “Security of Control Systems with Erroneous Observations.” IFAC-PapersOnLine 53, no. 2 (2020): 2225-2230.
Jaewon Kim, Woo-Hyun Ko, and P. R. Kumar. “Cyber-security with dynamic watermarking for process control systems.” In 2019 AIChE Annual Meeting. AIChE, 2019.
Suppressing Epileptic Seizures using the power of Neural Networks
An epileptic seizure is defined as transient synchronous neuronal activity that may temporarily lead to loss of motor functions. About 50 million people worldwide suffer from epilepsy, and about 1.9% of deaths among epileptic patients are due to prolonged seizures. Currently, medication and surgery are the dominant treatments; however, both have severe downsides in terms of loss of quality of life. A new approach is based on stimulating the brain using a chip (an electroceutical device) implanted at the epicenter of seizure activity. The problem, then, is to design a detection and prediction algorithm for this closed-loop electroceutical device, which will be used to treat epilepsy and potentially other brain disorders.
A Multi-Harmonic Method for Real Time Capacitor Condition Monitoring in Adjustable Speed Drive Systems for Industry 4.0
Capacitor age monitoring is essential in power electronics, since the age of a capacitor significantly affects the performance of the overall system. The life of a capacitor is measured using its equivalent series resistance (ESR), and the capacitor needs to be replaced once its ESR doubles. Typical Adjustable Speed Drives (ASDs) have capacitors in the DC link, and ASD failures are commonly attributed to wear and tear of these DC-link capacitors. The proposed multi-harmonic method estimates the ESR of the DC-link capacitor and its capacitance using DC-link voltage and current measurements obtained from the ASD, so no additional sensors or signals are required for capacitor condition monitoring. The method has been shown to work under both balanced and unbalanced voltage operating conditions, and we envision its deployment as a cloud-based algorithm that monitors several ASD systems in an industrial setting, in line with the Industry 4.0 standard.
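One plausible way to estimate ESR and capacitance from ripple harmonics is sketched below. It assumes a series ESR-C model of the capacitor, whose impedance at angular frequency w is ESR - j/(wC), and it assumes that the current samples are the ripple current flowing into the capacitor. The impedance at each harmonic is the ratio of the voltage and current phasors; its real part gives the ESR and its imaginary part gives the capacitance. The function names, the single-bin DFT, and the averaging over harmonics are illustrative choices and not necessarily the proposed method.

```python
import cmath
import math


def harmonic_phasor(samples, freq_hz, sample_rate_hz):
    """Complex amplitude of `samples` at `freq_hz` (single-bin DFT).

    Assumes the record spans a whole number of periods of `freq_hz`.
    """
    step = -2j * math.pi * freq_hz / sample_rate_hz
    total = sum(s * cmath.exp(step * k) for k, s in enumerate(samples))
    return 2.0 * total / len(samples)


def estimate_esr_and_capacitance(voltage, current, harmonics_hz, sample_rate_hz):
    """Average per-harmonic estimates of ESR (ohms) and capacitance (farads)."""
    esr_values = []
    cap_values = []
    for freq_hz in harmonics_hz:
        # Impedance at this harmonic: voltage phasor over current phasor.
        impedance = (harmonic_phasor(voltage, freq_hz, sample_rate_hz)
                     / harmonic_phasor(current, freq_hz, sample_rate_hz))
        # Series ESR-C model: Z = ESR - j / (2*pi*f*C).
        esr_values.append(impedance.real)
        cap_values.append(-1.0 / (2.0 * math.pi * freq_hz * impedance.imag))
    return sum(esr_values) / len(esr_values), sum(cap_values) / len(cap_values)


def needs_replacement(esr_now, esr_when_new):
    """Replacement rule from the text: replace once the ESR has doubled."""
    return esr_now >= 2.0 * esr_when_new
```

Using more than one harmonic gives several independent estimates of the same two parameters, which is why the sketch averages them; a weighted average or least-squares fit would be a natural refinement.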