QAID: A Comprehensive Framework for Evaluating AI Coding Assistants Beyond Vendor Claims


Esirik B., GÖKALP AYDIN E.

IEEE Software, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Publication Date: 2026
  • DOI Number: 10.1109/ms.2026.3678750
  • Journal Name: IEEE Software
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, ABI/INFORM, Compendex, INSPEC
  • Hacettepe University Affiliated: Yes

Abstract

AI-powered coding assistants have advanced faster than conventional evaluation methods can assess them, creating a gap between vendor claims and actual performance. This study presents QAID (Quality Assessment for AI Development Tools), a comprehensive, vendor-neutral framework that extends ISO/IEC 25010 with AI-specific criteria and evaluates coding assistants across six dimensions: Generation Ability, Linguistic Capability, Operational Quality, Interaction Quality, Trustworthiness, and Sustainability. We evaluated four leading tools, ChatGPT (GPT-5.1), Gemini (3 Pro), Claude (Sonnet 4.5), and DeepSeek (v3.2), using dual stakeholder weightings (expert and user) and performance measures. Results reveal that no single tool dominates: GPT leads in linguistic capability (73.13), Gemini in generation ability (67.62), Claude in trustworthiness (80.58) and operational quality (73.10), while DeepSeek excels in sustainability (86.46). Cross-platform validation shows strong correlations with independent benchmarks (Spearman's ρ = 0.90-1.00, p < 0.01), and rankings remain stable across weighting schemes (Cronbach's α = 0.81-0.84). QAID enables the first systematic, evidence-based tool selection tailored to organizational context rather than marketing claims.
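The abstract describes two computations: a per-tool composite score obtained by weighting six dimension scores, and a Spearman rank correlation against independent benchmarks. The sketch below is illustrative only, not the authors' implementation; the dimension names come from the abstract, while the example weights and per-tool scores are hypothetical placeholders.

```python
# Illustrative sketch of a QAID-style evaluation (NOT the paper's code).
# Dimension names follow the abstract; weights and scores are invented.

DIMENSIONS = ["generation", "linguistic", "operational",
              "interaction", "trustworthiness", "sustainability"]

def composite(scores, weights):
    """Weighted sum of per-dimension scores on a 0-100 scale;
    the weights are assumed to sum to 1."""
    return sum(scores[d] * weights[d] for d in DIMENSIONS)

def spearman_rho(xs, ys):
    """Spearman's rank correlation coefficient (no-ties formula)."""
    def ranks(values):
        order = sorted(range(len(values)), key=values.__getitem__)
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical expert weighting over the six dimensions.
expert_w = {"generation": 0.25, "linguistic": 0.15, "operational": 0.20,
            "interaction": 0.10, "trustworthiness": 0.20,
            "sustainability": 0.10}

# Hypothetical per-dimension scores for a single tool.
tool_scores = {"generation": 67.6, "linguistic": 70.0, "operational": 65.0,
               "interaction": 60.0, "trustworthiness": 72.0,
               "sustainability": 55.0}

print(round(composite(tool_scores, expert_w), 2))  # weighted composite
print(spearman_rho([1, 2, 3, 4], [1, 2, 3, 4]))    # identical rankings -> 1.0
```

Swapping in the user weighting instead of `expert_w` yields the second composite ranking the paper compares; running `spearman_rho` on a tool ranking against an external benchmark ranking mirrors the cross-platform validation step.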