
The Cost of Learning Fast with Reinforcement Learning for Edge Cache Allocation

32nd International Teletraffic Congress (ITC 32), Ph.D. Workshop, Osaka, Japan, (2020)

Abstract

We study data-driven cache allocation in Multi-Tenant Edge Computing: a Network Operator (NO) owns storage at the Edge and dynamically allocates it to third-party application Content Providers (CPs). CPs can cache part of their catalog and satisfy users' requests locally, thus reducing inter-domain traffic. The objective of the NO is to find the cache allocation that minimizes the total inter-domain traffic, which constitutes an operational cost. Since CPs' traffic is encrypted, the NO's allocation strategy is based solely on the amount of traffic measured. In this exploratory work, we solve this problem via Reinforcement Learning (RL). RL has mainly been intended to be trained in simulation before being applied in real scenarios. We instead employ RL online, training it directly while the system is up and running. An important factor emerges in this case: in order to learn the optimal cache allocation, the NO needs to perturb the allocation several times and measure how the inter-domain traffic changes; each such perturbation incurs a perturbation cost. While this cost has no physical meaning in simulation, it cannot be ignored in a live system. We explore the trade-off between perturbing the system heavily in order to learn a good allocation faster, and learning more slowly to reduce the perturbation cost. We show results from simulation and make the entire code available as open-source.
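The exploration/perturbation trade-off described above can be illustrated with a minimal sketch. This is not the paper's method: it assumes a toy model in which the NO chooses among a few fixed candidate allocations (a multi-armed bandit with epsilon-greedy exploration), each allocation has an unknown expected inter-domain traffic cost observable only through noisy measurements, and every change of allocation incurs a fixed perturbation cost. All numbers and names (`true_cost`, `perturbation_cost`, `measure`) are illustrative assumptions.

```python
import random

random.seed(0)

# Hypothetical setup (not from the paper): three candidate cache
# allocations, each with an unknown expected inter-domain traffic cost.
true_cost = [1.0, 0.6, 0.8]     # hidden from the NO
perturbation_cost = 0.05        # paid each time the allocation is changed

def measure(arm):
    """Noisy traffic measurement for a given allocation (assumed model)."""
    return true_cost[arm] + random.gauss(0, 0.1)

def run(epsilon, steps=2000):
    """Epsilon-greedy bandit; returns total cost, perturbations included."""
    counts = [0] * len(true_cost)
    est = [0.0] * len(true_cost)    # running-mean cost estimates
    current = 0
    total = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(len(true_cost))      # explore
        else:
            arm = min(range(len(true_cost)), key=lambda a: est[a])  # exploit
        if arm != current:
            total += perturbation_cost  # the cost of perturbing a live system
            current = arm
        cost = measure(arm)
        counts[arm] += 1
        est[arm] += (cost - est[arm]) / counts[arm]
        total += cost
    return total

# Higher epsilon learns the good allocation faster but perturbs the
# system more often; lower epsilon perturbs less but learns more slowly.
for eps in (0.3, 0.05):
    print(f"epsilon={eps}: total cost {run(eps):.1f}")
```

Even in this toy setting, the best epsilon depends on the ratio between the perturbation cost and the traffic savings of the optimal allocation, which is the tension the abstract highlights.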
