Overview
Many companies are experimenting with generative AI and large language models (LLMs) or already use them to deliver services. NielsenIQ has launched a product that simulates human evaluations of new products, and Largo.ai predicts how target audiences will react to new offerings.
Amid the technological excitement, questions about the quality of these outputs are often neglected.
The idea of using LLMs in marketing emerged shortly after their development (see Qian et al. 2025). One of their main advantages is the ability to generate in silico samples, i.e., synthetic data that mimic human responses to questionnaires and interviews at a fraction of the cost (Arora et al. 2024). Previous qualitative analyses of LLM outputs have produced mixed findings (Sarstedt et al. 2023). Some studies in marketing and related disciplines report close agreement between synthetic data and human responses (Brand et al. 2023, Li et al. 2023), while others find discrepancies ranging from minor (Goli & Singh 2023, Arora et al. 2024) to severe (Gao et al. 2024). These studies assessed the quality of synthetic samples with simple or limited metrics, such as accuracy, means and variances, and, less frequently, AUC or Kullback-Leibler divergence.
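To make the metrics named above concrete, the following minimal Python sketch compares a human and a synthetic sample on a single rating item by mean, variance and Kullback-Leibler divergence. The 5-point scale, the ratings, the additive (Laplace) smoothing with alpha = 1 and all function names are assumptions for illustration only; they are not taken from the cited studies or from our evaluation framework.

```python
import math
from collections import Counter

SCALE = range(1, 6)  # assumed 5-point rating scale (hypothetical)

def response_distribution(responses, scale=SCALE, alpha=1.0):
    """Share of responses per scale point, with additive (Laplace) smoothing.

    alpha > 0 keeps every category's probability positive, so the KL
    divergence below stays finite when a sample never uses some point.
    """
    counts = Counter(responses)
    total = len(responses) + alpha * len(scale)
    return [(counts.get(k, 0) + alpha) / total for k in scale]

def mean_and_variance(responses):
    """Sample mean and unbiased sample variance (n - 1 denominator)."""
    n = len(responses)
    mean = sum(responses) / n
    var = sum((x - mean) ** 2 for x in responses) / (n - 1)
    return mean, var

def kl_divergence(p, q):
    """D_KL(P || Q) in nats, with P the human reference and Q the synthetic sample."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical ratings on a 1-5 scale; illustrative only, not real study data.
human = [1, 2, 2, 3, 3, 3, 4, 4, 4, 5]
synthetic = [3, 3, 3, 4, 4, 4, 4, 4, 5, 5]

h_mean, h_var = mean_and_variance(human)
s_mean, s_var = mean_and_variance(synthetic)
kl = kl_divergence(response_distribution(human), response_distribution(synthetic))

print(f"human:     mean={h_mean:.2f} var={h_var:.2f}")
print(f"synthetic: mean={s_mean:.2f} var={s_var:.2f}")
print(f"KL(human || synthetic) = {kl:.3f} nats")
```

In this toy data the synthetic sample is shifted upward (mean 3.90 vs. 3.10) and far less dispersed (variance 0.54 vs. 1.43), and the KL divergence is about 0.150 nats. The KL value depends on the smoothing choice: without smoothing it would be infinite here, because the synthetic sample never uses the two lowest scale points that some humans chose.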
Companies considering the use of LLMs therefore often have to base their decisions on anecdotal, qualitative evidence. This creates uncertainty not only for the companies themselves but also for their customers, who must trust the providers without knowing the strengths and weaknesses of in silico data for their specific use cases. At the same time, benchmarks play a central role in the development of AI systems (Sculley et al. 2025). Until a robust method for evaluating performance exists, the further development of LLMs for marketing will remain hampered.
In September 2025, the first roundtable discussion on best practices for synthetic data in market research took place as part of SwissAI Weeks. In April 2026, we presented the evaluation framework for quantitative and qualitative data at the AI Agents Summit in Lucerne. The meeting notes and the evaluation framework can be found in the attached documents.