How data science helps to better understand customer feedback.

Data Analytics

Mar 31, 2025

Authors

Johannes Dauter

Expert Data Analytics

How would you rate this app? Tell us something about this product! How satisfied were you with our car rental service? Most people will find these and similar phrases familiar. We encounter them constantly in everyday life, and they are an integral part of many business models, especially in e-commerce, digital services, and online marketing. Customer and user experiences are particularly valuable in B2C business.

The aim of such customer surveys is usually either to identify weaknesses in the product or to generate what is known as "electronic word of mouth" (eWOM). These goals are not mutually exclusive, but the strategy for how, where, and when customers are asked for feedback can differ.

Customer Survey for Generating Electronic Word of Mouth

To generate eWOM, it is essential to focus on public forums where other potential customers can read the feedback. Typically, companies try to maximize three factors: “volume”, “valence” and “dispersion”.

Volume is the total amount of generated customer feedback.

Valence is the polarity (also referred to as "sentiment") of the customer feedback, that is, whether it is positive or negative.

Dispersion is the extent of feedback distribution across multiple platforms.

In an ideal scenario, a company generates a large amount of positive feedback on multiple platforms. Thus, customer surveys can be used as a marketing tool and ideally serve as a trust signal for potential customers. Examples include product reviews, which reflect authentic experiences of real customers and can significantly influence the purchasing decision.
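The three factors can be made concrete with a minimal sketch. The sample data, the sentiment scale from -1 to 1, and the choice of normalized Shannon entropy as a dispersion measure are assumptions for illustration; the article itself only defines dispersion as the spread of feedback across platforms.

```python
from collections import Counter
from math import log

# Hypothetical sample data: (platform, sentiment score in [-1, 1]).
reviews = [
    ("app_store", 0.8),
    ("app_store", 0.6),
    ("review_portal", -0.4),
    ("social_media", 0.2),
]


def ewom_metrics(reviews):
    """Return volume, valence and dispersion for (platform, score) pairs."""
    volume = len(reviews)
    if volume == 0:
        return {"volume": 0, "valence": 0.0, "dispersion": 0.0}

    # Valence: mean sentiment score over all feedback items.
    valence = sum(score for _, score in reviews) / volume

    # Dispersion (assumption): normalized Shannon entropy of the platform
    # shares. 0.0 means all feedback sits on one platform, 1.0 means it is
    # spread evenly across all observed platforms.
    counts = Counter(platform for platform, _ in reviews)
    if len(counts) > 1:
        entropy = -sum((n / volume) * log(n / volume) for n in counts.values())
        dispersion = entropy / log(len(counts))
    else:
        dispersion = 0.0

    return {"volume": volume, "valence": valence, "dispersion": dispersion}


print(ewom_metrics(reviews))
```

A simpler dispersion measure, such as the plain number of platforms with at least one review, would serve the same purpose if the distribution within platforms does not matter.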

Customer Survey for Gathering Product Improvements

To identify product weaknesses, customer feedback does not necessarily have to be published. It can also be collected anonymously, as the insights serve exclusively for internal development. The type of feedback that is useful also differs from the marketing-oriented case: while marketing prefers positive feedback, critical reviews are usually far more valuable for product improvement.

This type of evaluation of customer feedback is interesting for various companies:

Companies with an iterative product development process, where the product is continuously developed, and insights from the last generation are incorporated into the new generation.

Companies offering a long-term service and looking to uncover problems and optimization potentials along their processes.

With suitable methods, customer feedback can be evaluated systematically and product improvements can be derived from it. Below, we show how such a system can be set up, using example requirements, methods, and technologies.

Customer Sentiment Analysis

What exactly is customer feedback, and how can its contents be classified? These questions are addressed by the research area of “Customer Sentiment Analysis.” The term describes the analysis of customer opinions regarding brands, products, topics, or services and is an established topic in data science. Customer Sentiment Analysis combines algorithms, psychology, and linguistics to systematically extract information from customer feedback.

The following terms are relevant to understand how feedback, for example from an online review, can be systematically analyzed:

Opinion: An online review can contain several opinions. An opinion has two components: a sentiment target and a sentiment.
Sentiment target: The entity about which the opinion holder expresses their opinion. Sometimes also referred to as a topic.
Sentiment: The sentiment lies on a spectrum between positive and negative.
Opinion holder: The person expressing an opinion. This is not necessarily the same person as the author of the review.
Time of opinion: The time when the opinion was expressed. This is needed to trace changes in opinion over time and to determine which product version was reviewed.

In summary, an opinion holder, for example, creates an online review at a defined point in time. This review contains one or more opinions. Each opinion contains a sentiment target with an associated sentiment. This theory can be applied to an example:

“(1) Overall, I am satisfied with the vehicle, but there are some points that my wife and I did not like. (2) I love the extremely cozy interior. (3) What really annoys me is the sensitive lane-keep assistant. (4) My wife finds the load edge at the trunk too high.”

This review contains four opinions:

Opinion	Sentiment Target	Sentiment	Opinion Holder
1	Overall vehicle	Rational positive	Author
2	Interior	Emotional positive	Author
3	Lane-keep assistant	Emotional negative	Author
4	Trunk	Rational negative	Wife

It makes a fundamental difference whether an opinion is expressed from a rational standpoint or whether emotions influence the opinion. Together with the general sentiment, this creates a broad spectrum in which the given opinion can be classified. So, it is crucial to capture the sentiment of the opinion holder as accurately as possible.
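The terms above map naturally onto a small data structure. The following is a minimal sketch that encodes the four opinions from the example review; the class and field names are assumptions chosen for illustration, and the time of opinion is left empty because the example review does not state one.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Opinion:
    target: str                          # sentiment target
    polarity: str                        # "positive" or "negative"
    basis: str                           # "rational" or "emotional"
    holder: str                          # opinion holder
    expressed_on: Optional[date] = None  # time of opinion, if known


# The example review, split into its four opinions.
review = [
    Opinion("Overall vehicle", "positive", "rational", "Author"),
    Opinion("Interior", "positive", "emotional", "Author"),
    Opinion("Lane-keep assistant", "negative", "emotional", "Author"),
    Opinion("Trunk", "negative", "rational", "Wife"),
]


def pain_points(opinions):
    """Return the sentiment targets of all negative opinions."""
    return [op.target for op in opinions if op.polarity == "negative"]


print(pain_points(review))  # ['Lane-keep assistant', 'Trunk']
```

Keeping the opinion holder and the rational/emotional basis as separate fields makes it possible to filter or weight opinions later, for example by treating emotionally charged statements differently from sober ones.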

Analysis Methods of Customer Sentiment Analysis

The overarching goal of the analysis is to classify opinions, for example, by sentiment target, sentiment, or opinion holder. Generally, two approaches can be distinguished:

Rule-based analysis: For example, using a so-called sentiment lexicon. Here, rules are provided based on human expertise, such as which adjectives have a positive or negative connotation. However, these models typically struggle with context. The word “long”, for example, is positive in “long battery life” but negative in “long loading time.”

Statistical models: These models are based on probabilities of words and word combinations, which are then used for classification. Often, these models need to be trained. In that case, a dataset is labeled to establish a ground truth, and the model learns relationships between words and labels from it. When the model is fed new data, it applies the learned relationships. Statistical models are therefore more labor-intensive to set up but typically yield more reliable results. There are also models that do not require task-specific labels, for example through unsupervised or zero-shot learning. Large language models (LLMs) are an example of this class.
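The difference between the two approaches can be sketched with the “long” example. The lexicon entries and the four labeled training samples below are hypothetical, and the statistical model is a deliberately tiny multinomial Naive Bayes classifier with add-one smoothing, chosen here only because it is easy to follow; it is one possible statistical model, not a recommendation.

```python
from collections import Counter, defaultdict
from math import log

# --- Rule-based: hypothetical sentiment lexicon with fixed word scores ---
LEXICON = {"love": 1, "cozy": 1, "annoys": -1, "long": -1}


def lexicon_score(text):
    """Sum the lexicon scores of all words; unknown words count as 0."""
    return sum(LEXICON.get(w.strip(".,!?"), 0) for w in text.lower().split())


# The lexicon cannot see context: both phrases get the same score.
print(lexicon_score("long battery life"))  # -1
print(lexicon_score("long loading time"))  # -1

# --- Statistical: Naive Bayes trained on hypothetical labeled samples ---
TRAINING = [
    ("long battery life", "positive"),
    ("great battery life", "positive"),
    ("long loading time", "negative"),
    ("slow loading time", "negative"),
]


def train_naive_bayes(samples):
    """Count words per label; these counts are the whole 'model'."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total_samples = sum(label_counts.values())
    best_label, best_logp = None, float("-inf")
    for label in label_counts:
        logp = log(label_counts[label] / total_samples)
        total_words = sum(word_counts[label].values())
        for w in text.lower().split():
            # Add-one (Laplace) smoothing avoids zero probabilities.
            logp += log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


model = train_naive_bayes(TRAINING)
print(classify("long battery life", model))  # positive
print(classify("long loading time", model))  # negative
```

The classifier separates the two phrases because the surrounding words (“battery”, “loading”) carry the learned signal, which is exactly the context a fixed lexicon entry for “long” lacks.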

Challenges in Customer Sentiment Analysis

Detecting sarcasm is not trivial for conventional methods: taken literally, a sarcastic statement often sounds positive although the opposite is meant. Rhetorical questions also pose a challenge, as they can be misread as genuine inquiries. The general mood and emotional state of the opinion holder also influence how objective an opinion is, so a mechanism is needed that can recognize and classify these aspects. Handling multilingual data is not straightforward either, as translations can distort the data.

Customer Sentiment Analysis with LLMs

One possible way to address these challenges is the use of LLMs. Suppose we have an international product, such as a vehicle, for which we have collected customer feedback. We would like to use this feedback for development purposes and present the customers' biggest pain points directly to the development teams. Here, an LLM can be used to clean, structure, and ultimately classify the data. In this case, the sentiment target of each piece of feedback would be assigned to the responsible development team.

[Figure: Customer Sentiment Analysis Process]

The above figure exemplifies what such a process could look like. The LLM can be given additional context at every process step. It can be told which classes are available for classification and what each development team is responsible for. For highly technical and specialized feedback, a sentiment lexicon can be included to help classify the terminology. Subjectivity, sarcasm, colloquial language, and rhetorical questions can also be interpreted, especially when the LLM is explicitly instructed to do so.
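A minimal sketch of the classification step might look as follows. The team names, their responsibilities, the prompt wording, and the JSON output format are all assumptions for illustration. The actual LLM call is passed in as a function (`call_llm`), so no specific provider or API is implied; `fake_llm` is a stand-in that returns a canned answer.

```python
import json

# Hypothetical development teams and their responsibilities.
TEAMS = {
    "interior": "seats, materials, cabin comfort",
    "driver_assistance": "lane-keep assistant, cruise control, parking aids",
    "body": "trunk, doors, load edge",
}


def build_prompt(feedback, teams=TEAMS):
    """Build an instruction that gives the LLM the classes and context."""
    team_lines = "\n".join(f"- {name}: {scope}" for name, scope in teams.items())
    return (
        "Split the customer feedback into individual opinions.\n"
        "For each opinion return a JSON object with the keys "
        '"target", "sentiment" (positive/negative) and "team".\n'
        "Interpret sarcasm and rhetorical questions by their intended meaning.\n"
        f"Allowed teams:\n{team_lines}\n"
        "Answer with a JSON list only.\n\n"
        f"Feedback: {feedback}"
    )


def classify_feedback(feedback, call_llm, teams=TEAMS):
    """Send the prompt to the LLM and keep only well-formed opinions."""
    raw = call_llm(build_prompt(feedback, teams))
    opinions = json.loads(raw)  # raises ValueError if the answer is not JSON
    # Discard entries that are malformed or name a team outside the list.
    return [o for o in opinions if isinstance(o, dict) and o.get("team") in teams]


def fake_llm(prompt):
    """Stand-in for a real LLM call; returns a fixed, hypothetical answer."""
    return json.dumps([
        {"target": "interior", "sentiment": "positive", "team": "interior"},
        {"target": "lane-keep assistant", "sentiment": "negative",
         "team": "driver_assistance"},
        {"target": "price", "sentiment": "negative", "team": "marketing"},
    ])


for opinion in classify_feedback("I love the cozy interior, but ...", fake_llm):
    print(opinion["team"], opinion["sentiment"])
```

Validating the answer against the list of allowed teams matters because an LLM may return classes that were never offered; in this sketch such entries are simply dropped, though routing them to manual review would be an equally reasonable choice.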

LLMs are typically not deterministic systems: because their output is sampled, the same input can yield different results across runs. This can cause problems in some applications. However, in customer sentiment analysis, the advantages in interpreting natural language outweigh this drawback. Moreover, it can be argued that interpreting an opinion is not deterministic anyway and always leaves some room for interpretation. Finally, such an LLM process can be complemented with other statistical methods.
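One simple statistical complement is to classify the same feedback several times and take a majority vote, which also yields an agreement rate as a rough confidence signal. This is a sketch under assumptions: the function names are hypothetical, and `flaky_classifier` merely imitates a non-deterministic LLM by cycling through fixed answers.

```python
from collections import Counter
from itertools import cycle


def majority_vote(classify_once, feedback, runs=5):
    """Classify the same feedback several times and return the most
    frequent label together with the share of runs that agreed on it."""
    votes = Counter(classify_once(feedback) for _ in range(runs))
    label, count = votes.most_common(1)[0]
    return label, count / runs


# Stand-in for a non-deterministic LLM: cycles through fixed answers.
_answers = cycle(["negative", "negative", "positive", "negative", "negative"])


def flaky_classifier(feedback):
    return next(_answers)


label, agreement = majority_vote(
    flaky_classifier, "Great, the lane assistant beeps all the time."
)
print(label, agreement)  # negative 0.8
```

Feedback with a low agreement rate can be flagged for manual review, which is often where sarcastic or ambiguous statements end up.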

Customer Sentiment Analysis with CarByte

Many customers are concerned about topics such as data security and the deployment of LLMs. CarByte has gained experience with the use of LLMs and works actively with customers to find a solution for each individual case. We also offer technical expertise in customer sentiment analysis and are happy to advise on best practices and on soliciting customer feedback. Contact us to explore further use cases for your company!
