CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, vol.33, no.21, 2021 (SCI-Expanded)
Learning-based routing algorithms are good candidates for two-dimensional Network-on-Chip (NoC) architectures because they can make good path selection decisions by combining the current and past states of the network traffic. On the other hand, they generally send packets through nonminimal paths to detour around congested areas in an attempt to minimize the communication cost. Because the applied learning model needs time to gather enough feedback to adapt to traffic changes, nonminimal paths may take more time than the congested minimal paths. In this work, we incorporate a probabilistic method into a Q-learning-based NoC routing algorithm for selecting a minimal or nonminimal path, in order to minimize the negative effect of the learning duration. We also consider errors in the architecture and propose a fault-tolerance mechanism that detects both transient and permanent link errors. We compared our method against a standard Q-learning-based routing algorithm on several traffic models in terms of throughput and latency. The results show that our method outperforms its counterpart by up to 30% in latency and 7% in throughput.
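The idea of mixing minimal and nonminimal choices in Q-learning-based routing can be illustrated with a small sketch. The abstract does not give the actual probabilistic rule, so the code below is only an assumption-laden illustration, not the authors' method: it uses the textbook Q-routing update (estimated delivery time per neighbor) on a 2D mesh, and a hypothetical fixed probability `p_minimal` that restricts the choice to neighbors on a minimal path. All names (`QRouter`, `p_minimal`, `alpha`, `queue_delay`, `link_delay`) are illustrative, and fault tolerance is not modeled.

```python
import random


def neighbors(node, width, height):
    """Return the neighbors of node (x, y) in a width x height 2D mesh."""
    x, y = node
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(nx, ny) for nx, ny in candidates
            if 0 <= nx < width and 0 <= ny < height]


def manhattan(a, b):
    """Hop distance between two mesh nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


class QRouter:
    """Tabular Q-routing with an assumed probabilistic minimal-path bias."""

    def __init__(self, width, height, alpha=0.5, p_minimal=0.7, seed=None):
        self.width, self.height = width, height
        self.alpha = alpha          # learning rate (assumed value)
        self.p_minimal = p_minimal  # assumed probability of forcing a minimal hop
        self.rng = random.Random(seed)
        self.q = {}                 # (node, dest, neighbor) -> estimated delivery time

    def q_value(self, node, dest, nbr):
        # Unseen entries default to the congestion-free hop count via nbr.
        return self.q.get((node, dest, nbr), 1.0 + manhattan(nbr, dest))

    def best_estimate(self, node, dest):
        """Smallest estimated delivery time from node to dest."""
        if node == dest:
            return 0.0
        return min(self.q_value(node, dest, n)
                   for n in neighbors(node, self.width, self.height))

    def choose_next_hop(self, node, dest):
        if node == dest:
            raise ValueError("packet is already at its destination")
        all_nbrs = neighbors(node, self.width, self.height)
        # Neighbors that are one hop closer to dest lie on a minimal path.
        min_nbrs = [n for n in all_nbrs
                    if manhattan(n, dest) < manhattan(node, dest)]
        # With probability p_minimal, only minimal hops are considered;
        # otherwise any neighbor (minimal or nonminimal) may be picked.
        pool = min_nbrs if self.rng.random() < self.p_minimal else all_nbrs
        return min(pool, key=lambda n: self.q_value(node, dest, n))

    def update(self, node, dest, nbr, queue_delay, link_delay=1.0):
        """Standard Q-routing update after forwarding a packet to nbr."""
        target = queue_delay + link_delay + self.best_estimate(nbr, dest)
        old = self.q_value(node, dest, nbr)
        self.q[(node, dest, nbr)] = old + self.alpha * (target - old)


router = QRouter(4, 4, alpha=0.5, p_minimal=1.0, seed=0)
hop_minimal = router.choose_next_hop((0, 0), (2, 0))
# Report heavy queueing on the minimal hop, then allow nonminimal choices.
router.update((0, 0), (2, 0), hop_minimal, queue_delay=10.0)
router.p_minimal = 0.0
hop_after_congestion = router.choose_next_hop((0, 0), (2, 0))
```

In this toy run the router first takes the only minimal hop; once that hop reports a long queue and nonminimal choices are allowed, the learned estimates steer the packet to the detour neighbor. Keeping `p_minimal` above zero limits how often such detours are taken while the estimates are still unreliable, which is the trade-off the abstract describes.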