Hype to Quality: Assessing Generative AI Products Before Use

Esirik B. E., GÖKALP AYDIN E.

32nd European Conference on Systems, Software and Services Process Improvement, EuroSPI 2025, Riga, Latvia, 17-19 September 2025, vol. 2657 CCIS, pp. 54-68 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
Volume: 2657 CCIS
DOI: 10.1007/978-3-032-04288-0_4
City of Publication: Riga
Country of Publication: Latvia
Pages: 54-68
Keywords: Generative AI, Large Language Model, Quality Assessment, SLR, Software Quality, User-centered Quality
Hacettepe University Affiliated: Yes

Abstract

Generative Artificial Intelligence (GenAI) technologies have rapidly permeated personal applications and industries owing to unprecedented efficiency gains. This swift adoption has been accompanied by accelerated technological evolution and a proliferation of diverse generative solutions on the market. Consequently, users struggle to evaluate GenAI-enabled products against their specific requirements: traditional software quality assessment frameworks inadequately address the unique characteristics of AI systems and often prioritize technical metrics over alignment with user requirements, which becomes apparent only in practical use. This research comprehensively examined current evaluation approaches that enable users to assess GenAI products objectively. Through a systematic literature review (SLR) of 42 studies, we identified and analyzed methodologies that facilitate meaningful comparisons of GenAI solutions based on requirement fulfillment. Our findings reveal four primary evaluation approaches (benchmark-based, model-based, human-assisted, and automated) and establish a three-layer quality taxonomy that distinguishes traditional software, AI-enabled, and GenAI-specific quality attributes. Our analysis emphasizes user-centered quality evaluation paradigms that bridge the gap between technical performance metrics and actual user values across diverse application contexts.