Just a few weeks ago, we were demonstrating why it is not helpful to use the sentiment scores provided by one of the largest and most respected suppliers of social media data. The tweet we pulled for one of our clients said, “I want [the client’s product] so bad right now!” The supplier scored this post as negative, presumably because its algorithm keyed on the word “bad.”
One of the things I dread is meeting with suppliers of social media data. As Yogi Berra so aptly said, “It’s déjà vu all over again.” The meeting usually starts with a demonstration of the beautiful user interface that a developer deftly created. The prospective vendor walks through the product and all of the wonderful insights that can be gleaned, which are about the same as those offered by the several hundred other suppliers in the space.
“So what about the data?”
Of course, all the suppliers have access to about the same data, with some variation in how the data is cleansed of spam and non-human content. The real issues arise with the sentiment algorithms and the category definitions. Usually, the supplier will claim that its sentiment algorithm delivers 80-90% accuracy. Of course, no third party validates this number. The harsh reality is that in practice, the number is far lower.
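If you want to sanity-check a claimed accuracy number yourself, a blind spot-check goes a long way: pull a random sample of the vendor’s scored posts, have people re-label them without seeing the vendor’s scores, and compare. Here is a minimal sketch; the posts and labels are invented for illustration:

```python
import random

# Hypothetical spot-check data: (post_text, vendor_label, human_label).
# In practice, pull a few hundred random posts from the vendor's feed and
# have two or three people label them blind, without the vendor's scores.
scored_posts = [
    ("I want this product so bad right now!", "negative", "positive"),
    ("Terrible customer service, never again.", "negative", "negative"),
    ("Just tried the new flavor. EXCELLENT.", "positive", "positive"),
    # ... a few hundred more
]

sample = random.sample(scored_posts, k=min(300, len(scored_posts)))
agreement = sum(vendor == human for _, vendor, human in sample)
print(f"Vendor accuracy on blind-labeled sample: {agreement / len(sample):.0%}")
```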
The supplier will then usually say that it is possible to train their algorithms with human input in order to increase accuracy. This amounts to asking clients to supply their intellectual capital so that the vendor, and potentially the clients’ competitors, can benefit. This service model is vexing.
As Porter Novelli has determined over the years, two components of social media analysis are unique to our clients and the culture that surrounds their industries, stakeholders and customers. The conversations around our clients’ brands and products are fundamentally different from the conversations consumers have about, say, healthcare or technology.
Those two components are the dictionary and the grammar. The dictionary identifies the keywords and phrases relevant to the client. It includes the obvious suspects: the brand and product names, competitors and substitutes. Less obviously, it needs to include the terms used in the conversations, and their valence. “Low calorie” might be very important, and positive, in the health foods category, but it carries a totally different connotation in a conversation about premium ice cream.
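To make the idea concrete, here is a minimal sketch of what a category-aware dictionary might look like; the brand names, terms and valence weights are all invented for illustration:

```python
# A category-aware dictionary: the same phrase carries different valence
# depending on the conversation it appears in. All entries are illustrative.
DICTIONARIES = {
    "health_foods": {
        "brands": ["ClientBrand", "RivalBrand"],  # hypothetical names
        "valence": {"low calorie": +1.0, "filling": +0.5, "bland": -1.0},
    },
    "premium_ice_cream": {
        "brands": ["ClientBrand", "RivalBrand"],
        "valence": {"low calorie": -0.5, "rich": +1.0, "creamy": +1.0},
    },
}

def term_valence(term: str, category: str) -> float:
    """Look up a term's valence in the dictionary for a given category."""
    return DICTIONARIES[category]["valence"].get(term.lower(), 0.0)

print(term_valence("Low calorie", "health_foods"))       # prints  1.0
print(term_valence("Low calorie", "premium_ice_cream"))  # prints -0.5
```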
Furthermore, grammar differs observably across categories. What this means is that a simple algorithm that scores a comment as positive whenever a positive term appears within five words of the client’s brand or product will work 80% of the time in some categories, and only 40% of the time in others. Here is a common example:
“Hi all – I found the new [client product] in the grocery store the other day (thanks for the heads up here). Tonight I was feeling “munch-y” with plenty of pp left, and tried a wedge with half a serving (11) Pretzel Slims from Trader Joe’s – which are those really thin and crispy pretzel crackers – I think they are sold everywhere under other names. Well, the sweet, creamy, crunchy, salty combo for so little pp’s (2) was EXCELLENT. Plus the protein really helped curb the munchies – just wanted to share.”
In this case, the product mention is about 70 words away from the first clearly positive word; yet the post should obviously be scored as positive.
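To see why the naive rule misses it, here is a minimal sketch of the kind of proximity scoring described above, run against an abridged version of that post; the word list and window size are illustrative, and any real vendor’s implementation would differ:

```python
# Naive proximity scorer: call a post positive if a positive term appears
# within WINDOW words of the product mention. Word lists are illustrative.
WINDOW = 5
POSITIVE = {"excellent", "great", "love", "amazing", "best"}

def naive_score(text: str, product: str) -> str:
    words = text.lower().split()
    product_idx = [i for i, w in enumerate(words) if product in w]
    positive_idx = [i for i, w in enumerate(words) if w.strip('".,!()') in POSITIVE]
    if any(abs(p - q) <= WINDOW for p in product_idx for q in positive_idx):
        return "positive"
    return "neutral"

post = ("Hi all - I found the new [client product] in the grocery store "
        "the other day ... Well, the sweet, creamy, crunchy, salty combo "
        "for so little pp's (2) was EXCELLENT. Plus the protein really "
        "helped curb the munchies - just wanted to share.")
print(naive_score(post, "product"))  # "neutral": EXCELLENT sits outside the window
```

The post is plainly positive, but with a five-word window the scorer never sees the connection between the product and the praise.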
Best-in-class social media analysis requires machine learning as well as human scoring to make sure that the scoring algorithms are solid. Additionally, as conversations evolve and change, ongoing human review and scoring are required to ensure that the algorithms do not degrade and that they keep up with the conversation.
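In practice, that can be a standing review loop: keep sampling the model’s output, have humans re-score the sample, and retrain when agreement drops. Here is a minimal sketch of one such cycle, where `model`, `human_label` and `retrain` are hypothetical stand-ins for the actual classifier, labeling workflow and training pipeline:

```python
RETRAIN_THRESHOLD = 0.75  # illustrative cutoff; set from your own baseline

def review_cycle(model, recent_posts, human_label, retrain):
    """One human-in-the-loop cycle: sample, re-score, retrain on drift.

    `model`, `human_label` and `retrain` are stand-ins for whatever
    classifier, labeling workflow and training pipeline are actually in use.
    """
    sample = recent_posts[:200]                      # this period's sample
    predicted = [model(post) for post in sample]     # machine scores
    labels = [human_label(post) for post in sample]  # fresh human scores
    agreement = sum(p == h for p, h in zip(predicted, labels)) / len(sample)
    if agreement < RETRAIN_THRESHOLD:
        # The conversation has drifted away from what the model learned;
        # fold the new human labels back into training.
        model = retrain(model, sample, labels)
    return model, agreement
```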
We have observed that these conversations differ with each client’s unique business and category. And that is why I don’t expect off-the-shelf sentiment analysis to get better anytime soon.