Hype to Quality: Assessing Generative AI Products Before Use


Esirik B. E., GÖKALP AYDIN E.

32nd European Conference on Systems, Software and Services Process Improvement, EuroSPI 2025, Riga, Latvia, 17 - 19 September 2025, vol.2657 CCIS, pp.54-68, (Full Text)

  • Publication Type: Conference Paper / Full Text
  • Volume: 2657 CCIS
  • DOI Number: 10.1007/978-3-032-04288-0_4
  • City: Riga
  • Country: Latvia
  • Page Numbers: pp.54-68
  • Keywords: Generative AI, Large Language Model, Quality Assessment, SLR, Software Quality, User-centered Quality
  • Hacettepe University Affiliated: Yes

Abstract

Generative Artificial Intelligence (GenAI) technologies have rapidly permeated personal applications and industries owing to unprecedented efficiency gains. This swift adoption has been accompanied by accelerated technological evolution and a market proliferation of diverse generative solutions. Consequently, users struggle to evaluate GenAI-enabled products against their specific requirements: traditional software quality assessment frameworks inadequately address the unique characteristics of AI systems and often prioritize technical metrics over alignment with user requirements, which becomes apparent only through practical use. This research comprehensively examined current evaluation approaches with the aim of enabling users to objectively assess GenAI products. Through a systematic literature review (SLR) of 42 studies, we identified and analyzed methodologies that facilitate meaningful comparisons of GenAI solutions based on requirement fulfillment. Our findings reveal four primary evaluation approaches (benchmark-based, model-based, human-assisted, and automated) and establish a three-layer quality taxonomy that distinguishes traditional software, AI-enabled, and GenAI-specific quality attributes. Our analysis emphasizes user-centered quality evaluation paradigms that bridge the gap between technical performance metrics and actual user values across diverse application contexts.