Linguistic Diversity in Social Listening & Market Research
As more and more organisations carry out social listening and market research into different markets, understanding languages is more important than ever. Paolo Vacca explores how to understand linguistic diversity and why having native language experts on your side is always a good thing.
It has become somewhat clichéd to begin any piece of writing about language diversity by stating the obvious: “we live in a globalised world”. But the fact of the matter is that today’s companies are more likely than ever to deal with partners and customers that are based overseas and speak one or more other languages.
There is an extensive literature on the importance of language in marketing and advertising, but very little is said about language diversity in digital research. At Media Measurement, we pride ourselves on working in 40+ language markets, from widely spoken ones such as Spanish and Chinese to “smaller” languages like Slovenian and Icelandic. We enable our clients to be at the heart of the media conversation even when current technology offers very limited (if not simply inadequate) solutions for less widely spoken languages. To do this, we rely on in-house expertise and an awareness of how different languages are structured to maximise the quality of our research results.
The majority of technological solutions in the data industry are – for one reason or another – designed around widely spoken languages. This limits the resources available to give data analysts and their clients information about audiences that speak less common languages. Digital market researchers often find themselves constrained by a landscape of technology solutions that – although incredibly powerful and insightful for mainstream languages – fail to make the most of content produced in languages that do not fit the model of, for instance, English and French. This is the case for many morphologically complex languages.
Let me explain.
In English, we (usually) mark the plural by adding an s at the end of the word – e.g. house – houses. In linguistics, that s is called a morpheme. Generally speaking, a morpheme is the smallest grammatical unit in a language. Free-standing words can be morphemes: “house” is a morpheme. But also the last s in houses is a morpheme because it represents a small unit that is saying: this is more than one house. This one specifically is a bound morpheme. It is “bound” because s on its own does not mean anything in English.
English has plenty of these morphemes: words like late-st, farm-ing, walk-ed, un-happi-ness contain bound morphemes. Yet, if compared with other languages, English is actually quite “simple”. Morphologically complex languages attach a lot of these small endings to express meaning. This means that, when looking at online content, words appear in a very wide range of forms.
Finnish is a very good example. “House” is “Talo” in Finnish, whereas “houses” is “Talot”. So far so good, nothing mind-blowing here.
But things become a little bit more complicated when we put the word house in a sentence. Let’s have a look at some examples:
- Talo on iso. (The house is big)
- Se on kylmä talossa. (It’s cold in the house)
- Menemme taloon. (We are walking to the house)
In the second and third examples, talo became talossa and taloon. Yes, you guessed it: -ssa and -on are bound morphemes. They describe the role that the word house plays in each sentence. In the second one, -ssa says that the house is the place in which it is cold; in the third example, -on tells us that the house is the place we are heading to. If we wanted to say that we are walking from the house, we would say something like Menemme talosta – another morpheme!
These morphemes in Finnish mark the grammatical case – or, in simpler words, the role that each word has in the sentence. This phenomenon is well known to trained linguists, but not always familiar to media analysts working on different markets. Yet this type of “linguistic awareness” is of the utmost importance when setting up search term strings designed to capture an all-inclusive corpus of media content. Oversimplified content searches that do not take linguistic diversity into account risk missing important pieces of data, which will affect the final output.
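To make the point concrete, here is a minimal sketch in Python of how a search term might be widened to cover case forms. It is an illustration only, not a description of any particular tool: the names `CASE_ENDINGS`, `build_pattern` and `find_mentions` are hypothetical, the list of endings is a small subset chosen for this example, and the approach only works for a regular back-vowel noun like talo. Many Finnish words change their stem when endings are added, which is exactly where a native language expert is needed.

```python
import re

# Hypothetical, incomplete list of endings for a regular back-vowel noun
# such as "talo". It ignores vowel harmony and stem changes, so it is
# an illustration rather than a full Finnish paradigm.
CASE_ENDINGS = ["", "t", "n", "a", "ssa", "sta", "on", "lla", "lta", "lle"]


def build_pattern(stem, endings=CASE_ENDINGS):
    """Compile a regex matching the stem followed by any listed ending."""
    # Longest endings first; the empty ending matches the bare word.
    ordered = sorted(endings, key=len, reverse=True)
    alternation = "|".join(re.escape(e) for e in ordered)
    return re.compile(rf"\b{re.escape(stem)}(?:{alternation})\b", re.IGNORECASE)


def find_mentions(posts, stem):
    """Return the posts that contain the stem in any of the listed forms."""
    pattern = build_pattern(stem)
    return [post for post in posts if pattern.search(post)]


posts = [
    "Talo on iso.",
    "Se on kylmä talossa.",
    "Menemme taloon.",
    "Menemme talosta.",
    "Talous kasvaa.",  # "talous" (economy) is a different word
]

# A naive search for the dictionary form only.
exact = [p for p in posts if re.search(r"\btalo\b", p, re.IGNORECASE)]

# A search that also covers the listed case forms.
inflected = find_mentions(posts, "talo")

print(len(exact), "post(s) found by the exact search")
print(len(inflected), "post(s) found by the case-aware search")
```

On this tiny sample, the exact search finds only the first post, while the case-aware search also picks up talossa, taloon and talosta without matching the unrelated word talous.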
Such challenges apply not only to complex morphology, but also to syntax (i.e. the arrangement of words in a sentence). If we wanted to look at consumers’ behaviour in Germany, and we set up a search for all social media posts containing an explicit action taken by the user, we couldn’t rely on English syntactic structures alone. Let’s say I wanted to look for all social media posts mentioning the phrase “I have bought/purchased” – regardless of the products being bought. The process is pretty straightforward in English, but in German or Dutch the phrase might also be structured as “I have (something) bought”, or “have I something bought”. The searches need to take into account the fact that “have” and “bought” do not necessarily appear next to one another in a sentence. This type of knowledge enables analysts to maximise the quality of the content researched and therefore the data collected and presented to the client.
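As a rough sketch of what such a search could look like, the Python snippet below checks whether the German equivalents of “I have” and “bought” occur in the same sentence with a limited number of words in between. The German word forms (ich, habe, gekauft), the function name `has_purchase_statement` and the `max_gap` parameter are assumptions made for this illustration. It does not cover every word order (for example subordinate clauses, where the participle comes before the auxiliary), so it should be read as a starting point rather than a complete solution.

```python
import re


def has_purchase_statement(post, max_gap=8):
    """Return True if "ich habe" or "habe ich" is followed by "gekauft"
    within max_gap intervening words in the same sentence."""
    # Split into rough sentences so the two words must belong together.
    for sentence in re.split(r"[.!?]+", post.lower()):
        tokens = re.findall(r"\w+", sentence)
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            # Accept both "I have ..." and the inverted "have I ..." order.
            if pair in (("ich", "habe"), ("habe", "ich")):
                window = tokens[i + 2 : i + 3 + max_gap]
                if "gekauft" in window:
                    return True
    return False


posts = [
    "Ich habe gestern ein neues Handy gekauft.",   # I have (something) bought
    "Gestern habe ich ein neues Handy gekauft!",   # have I (something) bought
    "Ich habe Hunger. Er hat ein Auto gekauft.",   # no first-person purchase
]

matches = [p for p in posts if has_purchase_statement(p)]
print(matches)
```

A plain phrase search for the two words side by side would miss both of the first two posts, even though each is a clear statement of purchase.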
The brief examples I’ve explored here should show just how central an understanding of linguistics is to social listening and market research. Simply relying on a single technology solution – one that’s designed around widely spoken languages – might be fine for mainstream languages, but not for all. Having an expert on hand who understands how languages are constructed will be a huge time saver in the long run, and will ultimately provide more valuable insights for your organisation.
Drop us an email at firstname.lastname@example.org if you would benefit from our approach.
Media Measurement is a leading digital research consultancy. We breathe life into data, telling meaningful stories that inspire fresh thinking. Today’s global digital village holds billions of conversations connecting people, ideas and opinions – this revolution is a force for good.
We work with organisations who have a positive mission. Who dare to challenge the status quo. Who recognise the opportunities hidden in digital data so they can act – when and where it matters.
MM. INSPIRING DIGITAL INTELLIGENCE.