
How Text Analytics Works


When customers are ecstatic or disappointed with their experience of your company, they are likely to leave comments full of useful information for your business. These comments can appear anywhere, from surveys to social reviews and status updates. For many brands, however, the volume of this data quickly overwhelms human analysts: thousands of pieces of free-form feedback arriving across a wide variety of channels. That's where text analytics comes in. It draws insights from this unstructured data by taking sentiment-rich comments and sorting them into business-relevant categories.

But how does text analytics actually work?

Text analytics involves two types of analysis: topic analysis and sentiment analysis. Combining the two lets you see which business topics attract the most negative sentiment, so that you can focus your improvement efforts there.
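As a rough illustration of how the two analyses combine, the Python sketch below assumes each phrase has already been tagged with one topic and one sentiment, then ranks topics by their share of negative mentions. The sample data and the function name `rank_topics_by_negativity` are hypothetical; in practice the tags would come from your own topic and sentiment models.

```python
from collections import defaultdict

# Hypothetical output of the two analyses: each tagged phrase carries one
# topic label (topic analysis) and one sentiment label (sentiment analysis).
tagged_phrases = [
    ("Staff Friendliness", "positive"),
    ("Staff Friendliness", "positive"),
    ("Wait Time", "negative"),
    ("Wait Time", "negative"),
    ("Wait Time", "positive"),
    ("Pricing", "negative"),
    ("Pricing", "positive"),
]


def rank_topics_by_negativity(phrases):
    """Return (topic, negative_share, mentions) tuples, most negative first."""
    counts = defaultdict(lambda: {"negative": 0, "total": 0})
    for topic, sentiment in phrases:
        counts[topic]["total"] += 1
        if sentiment == "negative":
            counts[topic]["negative"] += 1
    ranking = [
        (topic, c["negative"] / c["total"], c["total"])
        for topic, c in counts.items()
    ]
    # Sort by share of negative mentions, breaking ties by mention volume.
    ranking.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return ranking


for topic, share, total in rank_topics_by_negativity(tagged_phrases):
    print(f"{topic}: {share:.0%} negative across {total} mentions")
```

Ranking by share rather than raw count keeps a rarely mentioned but consistently negative topic from being buried under high-volume ones; the tie-break on volume is just one reasonable choice.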

Topic analysis categorizes phrases within customer feedback into business-relevant topics. For example, “the sales associate was really nice” would be categorized under the topic “Staff Friendliness”. There are generally two ways to accomplish this: a manually built, rules-based approach and machine learning techniques.

In the manual, rules-based approach, analysts and linguists create each rule by hand. A rule could state that a clause in which two words such as “friendly” and “employee” co-occur is placed under the “Staff Friendliness” topic. Rules can also examine word order and the grammatical relationships between keywords. Because every rule is written by hand, setup requires considerable investment, but the upshot is that the categorized comments are highly precise.
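A minimal sketch of a rules-based topic tagger follows. The rules, their keywords, and the `ordered` flag are hypothetical illustrations of the co-occurrence and word-order rules described above; real rule sets written by linguists are far larger and also encode grammatical relationships, which this sketch does not attempt.

```python
import re

# Hypothetical rules: a topic applies when all of its keywords co-occur in a
# clause. If "ordered" is True, the keywords must also appear in that order.
RULES = [
    {"topic": "Staff Friendliness", "keywords": ["friendly", "employee"], "ordered": False},
    {"topic": "Staff Friendliness", "keywords": ["associate", "nice"], "ordered": True},
    {"topic": "Wait Time", "keywords": ["long", "wait"], "ordered": True},
]


def tokenize(clause):
    """Lowercase the clause and split it into word tokens."""
    return re.findall(r"[a-z']+", clause.lower())


def matches(rule, tokens):
    """Check keyword co-occurrence and, for ordered rules, keyword order."""
    positions = []
    for keyword in rule["keywords"]:
        if keyword not in tokens:
            return False
        positions.append(tokens.index(keyword))  # first occurrence only
    return not rule["ordered"] or positions == sorted(positions)


def tag_topics(clause):
    """Return the sorted list of topics whose rules match the clause."""
    tokens = tokenize(clause)
    return sorted({rule["topic"] for rule in RULES if matches(rule, tokens)})


print(tag_topics("The sales associate was really nice"))  # ['Staff Friendliness']
print(tag_topics("I had a long wait at the counter"))      # ['Wait Time']
```

Every new phrasing needs another hand-written rule, which is where the setup cost mentioned above comes from; in exchange, each match is easy to audit.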

Machine learning offers two main approaches: supervised classification and clustering. To set up supervised classification, an analyst manually goes through a sample set of comments and assigns topics to each one. This annotated data set is then used to train the classifier, which can then automatically tag new comments based on what it has learned. While annotating a data set may be less labour-intensive than creating rules, the drawback is that such classifiers only work well when there are fewer than 10 topics.
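To make the supervised workflow concrete, here is a small sketch that trains a multinomial Naive Bayes classifier on an annotated sample and then tags new comments. Naive Bayes is an illustrative choice of classifier, not necessarily what any particular product uses, and the annotated sample is made up and far too small for real use.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTopicClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, annotated):
        self.topic_counts = Counter()            # comments per topic
        self.word_counts = defaultdict(Counter)  # word counts per topic
        self.vocabulary = set()
        for text, topic in annotated:
            self.topic_counts[topic] += 1
            for word in tokenize(text):
                self.word_counts[topic][word] += 1
                self.vocabulary.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.topic_counts.values())
        vocab_size = len(self.vocabulary)
        best_topic, best_score = None, -math.inf
        for topic, doc_count in self.topic_counts.items():
            # log P(topic) + sum of log P(word | topic)
            score = math.log(doc_count / total_docs)
            topic_total = sum(self.word_counts[topic].values())
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # ignore words never seen during training
                count = self.word_counts[topic][word]
                score += math.log((count + 1) / (topic_total + vocab_size))
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


# Hypothetical annotated sample: (comment, topic assigned by an analyst).
annotated_sample = [
    ("the cashier was friendly and helpful", "Staff Friendliness"),
    ("staff were rude to me", "Staff Friendliness"),
    ("we waited forty minutes in line", "Wait Time"),
    ("the line was slow and the wait was long", "Wait Time"),
    ("prices are too high for what you get", "Pricing"),
    ("great value, prices were fair", "Pricing"),
]

classifier = NaiveBayesTopicClassifier().fit(annotated_sample)
print(classifier.predict("the staff were so friendly"))       # Staff Friendliness
print(classifier.predict("the wait in line was ridiculous"))  # Wait Time
```

The analyst's effort goes into labelling examples rather than writing rules; the classifier generalizes from whichever words co-occur with each label.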

Clustering, as its name suggests, groups similar comments together: comments that share particular high-frequency words end up in the same cluster. This works well for news, where articles in which the same names and places occur frequently are most likely covering the same story. The Voice of the Customer, however, is much more varied. Customer feedback offers far more ways of saying the same thing than the comparatively rigid structure and vocabulary of news writing.
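The sketch below illustrates this with a simple single-link clustering scheme: two texts land in the same cluster when the Jaccard similarity of their content words reaches a threshold. The stop-word list, the 0.3 threshold, and the sample texts are assumptions for illustration. Headlines that repeat the same names and places group neatly, while three customer comments describing the same slow service share no words and stay apart.

```python
import re
from itertools import combinations

# Hypothetical, deliberately small stop-word list.
STOP_WORDS = {"the", "a", "an", "in", "of", "to", "and", "was", "is", "at", "on", "for"}


def content_words(text):
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS}


def jaccard(a, b):
    """Shared words divided by all distinct words across both texts."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(texts, threshold=0.3):
    """Single-link clustering using a union-find structure."""
    parent = list(range(len(texts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    words = [content_words(t) for t in texts]
    for i, j in combinations(range(len(texts)), 2):
        if jaccard(words[i], words[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i, text in enumerate(texts):
        groups.setdefault(find(i), []).append(text)
    return list(groups.values())


headlines = [
    "Mayor Smith opens new bridge in Springfield",
    "Springfield bridge opening draws crowds for Mayor Smith",
    "Storm closes schools across Riverside",
    "Riverside schools closed as storm hits",
]
comments = [
    "The wait was way too long",
    "I stood in line for ages",
    "Took forever to get served",
]

print(len(cluster(headlines)))  # 2: one cluster per news story
print(len(cluster(comments)))   # 3: same complaint, no shared words
```

Word-overlap clustering cannot tell that “ages”, “forever”, and “too long” express the same complaint, which is exactly the weakness the paragraph above describes for customer feedback.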

Sentiment analysis tags phrases as having positive or negative sentiment. “The sales associate was really nice” would be tagged as positive. It uses techniques analogous to those of topic analysis: rule- or dictionary-based methods, and machine learning.

Dictionary-based sentiment analysis is very easy to set up. It is similar to pulling all the words out of a dictionary and assigning positive or negative sentiment to each one. The sentiment of a word, however, changes with context. You would usually think of swear words as conveying negative sentiment, but in the gaming community, for example, things are fuzzier: positive words are often used ironically, and negative words can carry positive sentiment in context.
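A bare-bones dictionary-based scorer shows both the appeal and the weakness. The tiny lexicon below is hypothetical, and it uses gaming slang such as “sick” and “insane” as a milder stand-in for the swear-word example: because the lexicon marks those words as negative, enthusiastic praise comes out negative. The `neutral` label for clauses with no lexicon hits is an addition for this sketch.

```python
import re

# Hypothetical general-purpose lexicon: +1 for positive words, -1 for negative.
LEXICON = {
    "nice": 1, "great": 1, "love": 1, "helpful": 1,
    "bad": -1, "terrible": -1, "rude": -1, "sick": -1, "insane": -1,
}


def lexicon_sentiment(clause):
    """Sum word scores; the sign of the total decides the label."""
    words = re.findall(r"[a-z']+", clause.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment("The sales associate was really nice"))         # positive
print(lexicon_sentiment("That combo was sick, the new map is insane"))  # negative, though a gamer likely meant praise
```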

To account for context, supervised machine learning offers a much better way of assigning sentiment. As with the supervised classification described for topic analysis, it involves taking a sample set of clauses from the context you're interested in (for example, comments from a specific gaming community) and manually labelling each clause as positive or negative. From this annotated data set, the algorithm can then assign sentiment to new clauses based on what it has learned.
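The sketch below trains a binary perceptron on a handful of hypothetical, hand-labelled clauses from a gaming community. The perceptron is a deliberately simple supervised learner swapped in for brevity, not a description of any production system, and the training data and function names are illustrative. Because “sick” appears in a positively labelled clause, the trained model treats it as a positive signal, unlike the dictionary scorer.

```python
import re
from collections import defaultdict


def features(clause):
    """Bag-of-words features plus a constant bias feature."""
    return re.findall(r"[a-z']+", clause.lower()) + ["<bias>"]


def train_perceptron(annotated, epochs=10):
    """Binary perceptron: label +1 means positive, -1 means negative."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for clause, label in annotated:
            feats = features(clause)
            prediction = 1 if sum(weights[f] for f in feats) > 0 else -1
            if prediction != label:
                # Nudge each feature weight towards the correct label.
                for f in feats:
                    weights[f] += label
    return weights


def predict(weights, clause):
    score = sum(weights.get(f, 0.0) for f in features(clause))
    return "positive" if score > 0 else "negative"


# Hypothetical annotated clauses from a gaming community.
annotated_clauses = [
    ("that clutch play was sick", 1),
    ("this new map is insane", 1),
    ("sick graphics on this update", 1),
    ("the servers keep crashing", -1),
    ("matchmaking is broken again", -1),
    ("so much lag tonight", -1),
]

weights = train_perceptron(annotated_clauses)
print(predict(weights, "the boss fight was sick"))  # positive
print(predict(weights, "servers are broken"))       # negative
```

With realistic volumes of labelled data, the same idea applies with stronger models; the key point is that the labels, not a fixed dictionary, define what each word means in that community.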

At Medallia, we combine rule-based topic analysis with supervised classification and clustering to capture as much feedback as possible while keeping the categorized comments precise. For sentiment analysis, we use the flexibility of supervised machine learning. Since topics and sentiment vary widely by industry, we build industry- and context-specific text analytics approaches so that we capture only business-relevant topics with accurate sentiment.

To find out more about text analytics, take a look at our whitepaper.

Photo credit: Marie Buyens