IEEE Software, 2026 (SCI-Expanded, Scopus)
AI-powered coding assistants have advanced faster than conventional evaluation methods can track, creating gaps between vendor claims and actual performance. This study presents QAID (Quality Assessment for AI Development Tools), a comprehensive, vendor-neutral framework that extends ISO/IEC 25010 with AI-specific criteria and evaluates coding assistants across six dimensions: Generation Ability, Linguistic Capability, Operational Quality, Interaction Quality, Trustworthiness, and Sustainability. We evaluated four leading tools, ChatGPT (GPT-5.1), Gemini (3 Pro), Claude (Sonnet 4.5), and DeepSeek (v3.2), using dual stakeholder weighting (expert/user) and performance measures. Results reveal that no single tool dominates: ChatGPT leads in linguistic capability (73.13), Gemini in generation ability (67.62), Claude in trustworthiness (80.58) and operational quality (73.10), and DeepSeek in sustainability (86.46). Cross-platform validation shows strong correlations with independent benchmarks (Spearman’s ρ = 0.90–1.00, p < 0.01), and rankings remain stable across weighting schemes (Cronbach’s α = 0.81–0.84). QAID thereby enables the first systematic, evidence-based tool selection tailored to organizational context rather than marketing claims.
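For concreteness, the dual-weighting aggregation can be sketched in a few lines. The sketch below is illustrative, not the paper's actual computation: the six dimension names come from the framework and the four dimension-leading scores are those reported above, but every other score, both weight vectors, and the qaid_score helper are hypothetical placeholders.

```python
# Minimal sketch of QAID-style dual stakeholder weighting.
# All values other than the four dimension-leading scores reported in
# the abstract are illustrative placeholders, not the paper's data.
from scipy.stats import spearmanr

DIMENSIONS = ["generation", "linguistic", "operational",
              "interaction", "trustworthiness", "sustainability"]

# Per-dimension scores on a 0-100 scale (mostly hypothetical).
scores = {
    "ChatGPT":  [65.0, 73.13, 70.0, 72.0, 75.0, 60.0],
    "Gemini":   [67.62, 70.0, 68.0, 69.0, 74.0, 62.0],
    "Claude":   [66.0, 71.0, 73.10, 70.0, 80.58, 64.0],
    "DeepSeek": [63.0, 68.0, 67.0, 66.0, 72.0, 86.46],
}

# Hypothetical stakeholder weight vectors; each sums to 1.0.
expert_w = [0.25, 0.15, 0.20, 0.10, 0.20, 0.10]
user_w   = [0.20, 0.20, 0.15, 0.20, 0.15, 0.10]

def qaid_score(dim_scores, weights):
    """Composite score: weighted sum over the six dimensions."""
    return sum(s * w for s, w in zip(dim_scores, weights))

# Rank tools under each weighting scheme (highest composite first).
expert_rank = sorted(scores, key=lambda t: -qaid_score(scores[t], expert_w))
user_rank   = sorted(scores, key=lambda t: -qaid_score(scores[t], user_w))

# Rank agreement between schemes, analogous in spirit to the
# Spearman cross-validation reported in the abstract.
rho, p = spearmanr([expert_rank.index(t) for t in scores],
                   [user_rank.index(t) for t in scores])
print(f"Spearman rho between weighting schemes: {rho:.2f} (p = {p:.3f})")
```

In use, an organization would substitute its own weight vector; the aggregation itself is deliberately simple, and the weights are what encode organizational context.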