HAFTA: Highly adaptive fault-tolerant routing algorithm for two-dimensional network-on-chips

Ipek, Anil; TOSUN, SÜLEYMAN; Ozdemir, SUAT

doi:10.1002/cpe.6378

Ipek A., TOSUN S., Ozdemir S.

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, vol.33, no.21, 2021 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 33 Issue: 21
Publication Date: 2021
DOI: 10.1002/cpe.6378
Journal Name: CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, zbMATH, Civil Engineering Abstracts
Keywords: congestion, fault-tolerance, mesh topology, network-on-chip, Q-learning
Hacettepe University Affiliated: Yes

Abstract

Learning-based routing algorithms are good candidates for two-dimensional Network-on-Chip (NoC) architectures because they can make good path-selection decisions by combining the current and past states of the network traffic. On the other hand, they generally send packets through nonminimal paths to detour around congested areas in an attempt to minimize the communication cost. Because the learning model needs time to gather enough feedback to adapt to traffic changes, nonminimal paths may take longer than the congested minimal paths. In this work, we incorporate a probabilistic method into a Q-learning-based NoC routing algorithm for selecting a minimal or nonminimal path, in order to minimize the negative effect of the learning duration. We also consider errors in the architecture and propose a fault-tolerance mechanism that detects both transient and permanent link errors. We compared our method against a standard Q-learning-based routing algorithm on several traffic models in terms of throughput and latency. The results show that our method outperforms its counterpart by up to 30% in latency and up to 7% in throughput.
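
The general idea of combining Q-learning-based routing with a probabilistic choice between minimal and nonminimal paths can be illustrated with a minimal sketch. This is not the HAFTA algorithm itself, whose details the abstract does not give. It assumes a classic Q-routing style update (each router keeps an estimated delivery time per destination and neighbor) on a 2D mesh, and a hypothetical parameter `p_minimal` giving the probability of restricting the choice to minimal-path neighbors. All function names, the table layout, and the learning rate `alpha` are assumptions made for illustration, and fault detection is not modeled.

```python
import random

def minimal_neighbors(cur, dst):
    """Neighbors of `cur` that lie on a shortest mesh path to `dst` (cur != dst)."""
    (x, y), (dx, dy) = cur, dst
    out = []
    if dx != x:
        out.append((x + (1 if dx > x else -1), y))
    if dy != y:
        out.append((x, y + (1 if dy > y else -1)))
    return out

def all_neighbors(cur, width, height):
    """All in-bounds mesh neighbors of `cur`, minimal or not."""
    x, y = cur
    cand = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in cand if 0 <= a < width and 0 <= b < height]

def choose_next_hop(q, cur, dst, width, height, p_minimal, rng=random):
    """Pick the neighbor with the lowest estimated delivery time.

    With probability `p_minimal` (hypothetical parameter) only minimal-path
    neighbors are considered; otherwise every neighbor is a candidate.
    `q` maps (cur, dst, neighbor) -> estimated delivery time.
    """
    if rng.random() < p_minimal:
        pool = minimal_neighbors(cur, dst)
    else:
        pool = all_neighbors(cur, width, height)
    return min(pool, key=lambda n: q.get((cur, dst, n), 0.0))

def q_update(q, cur, dst, nxt, local_delay, nxt_best_estimate, alpha=0.5):
    """Q-routing style update: move the estimate toward the observed local
    delay plus the chosen neighbor's own best estimate to `dst`."""
    old = q.get((cur, dst, nxt), 0.0)
    target = local_delay + nxt_best_estimate
    q[(cur, dst, nxt)] = old + alpha * (target - old)
    return q[(cur, dst, nxt)]
```

In this sketch, setting `p_minimal` close to 1 keeps packets on minimal paths while the estimates are still unreliable, and lowering it lets the router detour around congestion once the table reflects actual traffic. How the probability is actually chosen in the published method is not stated in the abstract.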