Multi-objective deep reinforcement learning approach for ATM cash replenishment planning

Journal of Computer Science & Systems Biology

ISSN: 0974-7230

Open Access


Joint Event on 5th World Machine Learning and Deep Learning Congress and World Congress on Computer Science, Machine Learning and Big Data

August 30-31, 2018 Dubai, UAE

Nabil Belgasmi

Banque de Tunisie, Tunisia

Posters & Accepted Abstracts: J Comput Sci Syst Biol

Abstract:

The standard reinforcement learning framework optimizes a single performance objective: it maximizes the expected return computed from scalar rewards, which come either from a univariate environment response to the agent's actions or from a weighted aggregation of a multivariate response. In many real-world situations, however, trade-offs must be made among multiple conflicting objectives that differ in order of magnitude, measurement unit, and business-specific context (e.g., costs, lead time, quality of service, profits). Aggregating such sub-rewards into a scalar reward assumes perfect knowledge of the decision maker's preferences and of the way she perceives the importance of each objective. In this study, we consider the problem of learning the best ATM cash replenishment policies in an uncertain multi-objective context, given an arbitrary history of cash withdrawals that may be non-stationary and may contain outliers. We propose a model-free multi-objective deep reinforcement learning approach that allows us to compete against the human decision maker and to find, for each ATM, the best policy that outperforms the current human policy. The idea is to disaggregate the performance of a replenishment policy into a vector of objective functions. The performance of the human policy then becomes a multi-dimensional reference point (Rh). The task of the deep reinforcement learning algorithm is to find a policy that generates a set of performance points that Pareto-dominate the current human reference point (Rh).
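To make the dominance criterion concrete, the following minimal Python sketch checks whether a candidate policy's performance vector Pareto-dominates a reference point such as Rh, and contrasts that test with a weighted-sum aggregation. It is an illustration only, not the method used in the study: the three objectives (holding cost, replenishment cost, cash-out days), the numeric values, the weights, and the function names are all hypothetical, and every objective is assumed to be minimized.

```python
from typing import Sequence


def dominates(candidate: Sequence[float], reference: Sequence[float]) -> bool:
    """Return True if `candidate` Pareto-dominates `reference`.

    Assumption: every objective is minimized. Negate any objective that
    should be maximized (e.g. profit) before calling.
    """
    if len(candidate) != len(reference):
        raise ValueError("objective vectors must have the same length")
    no_worse = all(c <= r for c, r in zip(candidate, reference))
    strictly_better = any(c < r for c, r in zip(candidate, reference))
    return no_worse and strictly_better


def weighted_sum(objectives: Sequence[float], weights: Sequence[float]) -> float:
    """Scalarize an objective vector with fixed weights (lower is better)."""
    return sum(w * o for w, o in zip(weights, objectives))


# Hypothetical performance vectors:
# (holding cost, replenishment cost, cash-out days)
r_h = (120.0, 40.0, 3.0)       # human reference point Rh
policy_a = (100.0, 40.0, 2.0)  # no worse on every objective, better on two
policy_b = (60.0, 55.0, 3.0)   # cheaper holding, costlier replenishment

a_dominates = dominates(policy_a, r_h)  # True
b_dominates = dominates(policy_b, r_h)  # False: one objective got worse

# The scalar ranking of policy_b against Rh depends on the chosen weights.
equal_weights = (1.0, 1.0, 1.0)
skewed_weights = (1.0, 5.0, 1.0)
b_better_equal = weighted_sum(policy_b, equal_weights) < weighted_sum(r_h, equal_weights)
b_better_skewed = weighted_sum(policy_b, skewed_weights) < weighted_sum(r_h, skewed_weights)
```

With these made-up numbers, `policy_b` beats Rh under equal weights but loses under the skewed weights, whereas the dominance test gives a verdict that does not depend on any weighting. This is the sense in which comparing against a reference point avoids having to encode the decision maker's preferences in advance.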

Biography:

Nabil Belgasmi holds a PhD and an Engineering degree in Computer Science from Manouba University. He is a full-stack Data Scientist at Banque de Tunisie, Tunisia. He is involved in three main activities: (1) applied R&D, (2) data analytics technology watch, and (3) data science consulting. He has delivered many successful data science proofs of concept (POCs) and quick wins, including credit scoring, forecasting, cash planning, anomaly/fraud detection, customer profiling, and intelligent transaction scoring and monitoring. He is a member of the Industrial Editorial Board of the journal Engineering Applications of Artificial Intelligence (EAAI).

E-mail: belgasmi.nabil@gmail.com

 

