An Enhanced Integer Linear Programming Model for Communication‐Aware and Load‐Balanced Neuron Grouping in Manycore Neural Network Accelerators


Çakin A., Dilek S., Tosun S.

CONCURRENCY COMPUTATION PRACTICE AND EXPERIENCE, vol.38, no.7, pp.1-12, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 38 Issue: 7
  • Publication Date: 2026
  • DOI: 10.1002/cpe.70676
  • Journal Name: CONCURRENCY COMPUTATION PRACTICE AND EXPERIENCE
  • Journal Indexes: Scopus, Science Citation Index Expanded (SCI-EXPANDED), Compendex, INSPEC, zbMATH
  • Page Numbers: pp.1-12
  • Hacettepe University Affiliated: Yes

Abstract

Efficient neuron grouping is critical to minimize the communication cost between processing elements and ensure balanced computational workloads, especially in modern hardware accelerators. This paper presents a novel Integer Linear Programming (ILP)‐based method for communication‐aware and load‐balanced neuron grouping for neural network accelerators. The proposed ILP formulation supports fully connected and pruned neural networks, making it suitable for contemporary sparse architectures. To evaluate the effectiveness of the proposed method, we conducted extensive experiments on a diverse set of neural network benchmarks with varying sizes and connectivity patterns. Results demonstrate that our method significantly reduces execution time, achieving speedups of up to three orders of magnitude for fully connected networks, while maintaining optimality. Unlike prior ILP‐based approaches, which fail due to excessive memory consumption on large networks, our method successfully computes solutions for previously intractable benchmarks. However, ILP remains a computational bottleneck for large pruned networks, indicating the need for further scalability improvements. These findings establish our solver‐optimized ILP formulation as a practical tool for neuron grouping in high‐performance Network‐on‐Chip‐based manycore accelerators, bridging the gap between mathematical optimization and real‐world hardware constraints.
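To illustrate the optimization problem the abstract describes, the sketch below brute-forces a tiny instance of communication-aware, load-balanced neuron grouping: partition neurons into equal-size groups so that the total communication volume crossing the group boundary is minimized. The neuron count, edge weights, and helper names are hypothetical examples, not the paper's formulation; the paper's ILP encodes this same objective and balance constraint and solves it exactly with a solver rather than by enumeration.

```python
from itertools import combinations

# Hypothetical toy instance: 6 neurons with weighted connections.
# An edge (i, j) contributes its weight to the communication cost
# whenever i and j are mapped to different processing elements.
edges = {
    (0, 1): 5, (0, 2): 3, (1, 2): 4,   # densely connected cluster {0, 1, 2}
    (3, 4): 5, (3, 5): 3, (4, 5): 4,   # densely connected cluster {3, 4, 5}
    (2, 3): 1, (0, 5): 1,              # weak cross-cluster links
}

def cut_cost(group_a):
    """Total communication volume crossing the group boundary."""
    a = set(group_a)
    return sum(w for (i, j), w in edges.items() if (i in a) != (j in a))

def best_balanced_grouping(n=6, group_size=3):
    """Enumerate all balanced 2-way groupings and return a minimum-cut one.
    Enforcing group_size neurons per group is the load-balance constraint;
    an ILP solver searches the same space via branch-and-bound instead."""
    return min(combinations(range(n), group_size), key=cut_cost)

best = best_balanced_grouping()
print(sorted(best), cut_cost(best))  # → [0, 1, 2] 2 (clusters kept intact)
```

The exhaustive search is exponential in the neuron count, which is exactly why an ILP formulation with solver-side pruning, as proposed in the paper, is needed for realistic network sizes.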