Comtec’s James Brown Explores What ChatGPT Can’t Do

James Brown from Comtec has penned some thoughts on the pros and cons of ChatGPT

By now, we’re probably all familiar with ChatGPT, the remarkable creation by OpenAI that has earned its stripes as a powerful tool for facilitating translations.

But, like any tool, ChatGPT has its boundaries. It’s vital to understand where it shines, where it stumbles – in some cases quite spectacularly (and potentially dangerously) – and why.

In this article we’ll look at the world of ChatGPT, exploring its capabilities and limitations in the realm of translations; we’ll also look at the intricacies of translation, a highly skilled, nuanced, and expert art beyond mere linguistic conversion.

The good news: ChatGPT can help streamline the translation process

Hailed by some as a time-saving, free resource, ChatGPT is a prime example of how AI can potentially simplify the somewhat intricate process of translating content into a foreign language.

Here are some of the plus points for ChatGPT:

  • It is a large language model (LLM) trained on a huge dataset.
  • It uses a transformer architecture, which allows it to take much broader context into account.
  • As a learning model, it is improving all the time.
  • A recent academic paper released by Microsoft concluded that ChatGPT performs “competitively” versus machine translation for high-resource languages. These are contexts where sufficient data resources exist for a particular language pair – i.e. involving the most widely spoken languages – to build machine translation tools. English is by far the most highly resourced language, and Western European languages are good examples of languages with a lot of coverage.
  • All this supports the view that an AI tool such as ChatGPT (currently 3.5 or 4) can carry out translations with remarkable speed and accuracy, at least for high-resource languages such as most European languages.

So, we have a free tool that can – with the right input – handle translations quickly and accurately.
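What does “the right input” look like? As a minimal sketch (not a Comtec workflow, and purely illustrative – the helper name and instruction wording are assumptions), a chat-based LLM translation request might be framed like this:

```python
# Illustrative sketch of framing a translation request for a chat-based LLM.
# The helper name and instruction wording are hypothetical, not a real API.

def build_translation_messages(text, source_lang, target_lang, tone="neutral"):
    """Return a chat-style message list asking a model to translate `text`."""
    system = (
        f"You are a professional translator. Translate from {source_lang} "
        f"into {target_lang}, preserving meaning and a {tone} tone. "
        "Where an idiom has no direct equivalent, use the closest natural one."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]

messages = build_translation_messages(
    "It's a piece of cake.", "English", "Korean", tone="informal"
)
# These messages would then be sent to the model (e.g. via a chat
# completions endpoint); the network call itself is omitted here.
```

Notice that the instructions – not just the source text – do much of the work, which is exactly why input quality matters so much.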

But there’s one more trait of ChatGPT that is hailed by some as its superpower: it can create content directly in a target language, skipping the need for translations altogether.

At this point, you might be thinking, “This is great! ChatGPT can write my article/proposal/product description/social media post/blog/other content. No need for a human translator at all.”

Erm, not quite.

There are plenty of scenarios where a tool like ChatGPT just doesn’t cut the mustard, and where to rely on it – without any human intervention at all – would be potentially disastrous.

Let’s take a look at those now.

Where AI struggles

There are plenty of stumbling blocks for AI-based content creation, with the consequences ranging from mistranslation (and whatever the impact from that might be) to sounding inauthentic/unnatural or producing biased, offensive, or inaccurate content.

Let’s take a closer look.

Cultural nuances

Language is not just a concatenation of words and grammar rules, but also a living tapestry of culture, history, and context.

While ChatGPT’s algorithms excel in deciphering words and sentence structures, they often stumble when trying to grasp the intricate cultural nuances that significantly influence the meaning of a message.

In English – the most resourced language – AI models are good at colloquialisms and slang, making real-time interactions with chatbots as natural-sounding as possible. But let’s think about some commonly used English idioms and what their equivalents are in other languages:

  • quiet as a mouse = quiet as a fish in Hebrew
  • pull someone’s leg = pull someone’s hair in Spanish
  • it’s raining cats and dogs = it’s raining chair legs in Greek
  • it’s a piece of cake = like eating cold porridge in Korean
  • empty-handed = with Hunain’s slippers in Arabic

For many cultures, these commonly held truths or colloquial expressions will be the product of a whole mix of cultural inputs – religion, lifestyle, diet, and even the weather.

The data needed to accurately and reliably produce the exact meaning in a natural-sounding way is extensive.

It is easier to achieve in high-resource languages, but a monumental challenge for low-resource languages, with potentially disastrous consequences if you get it wrong.

Strategic and creative content

Another area where ChatGPT falters is in handling strategic and creative content.

Coming up with ideas, solutions to problems, persuasive writing, emotion-based writing… These are examples of highly complex thinking that requires a vast amount of context and a certain type of intelligence.

And suppose we fast-forward to a world where AI plays an ever-expanding role in content creation. In that case, there’s a genuine risk of diluting the very things we find engaging – humour, original thought, and individual interpretation.

In fact, a recent article by Fast Company argues that AI is actually making us boring. Thinking about the business applications of ChatGPT and translations, boring is the last thing we want for marketing campaigns, storytelling, or strategic planning.

If we’re looking for cut-through, impact, and relevance, we should keep involving real human brains.

Specialised translations: the high-risk zone

One of the most crucial areas where ChatGPT should be approached with caution is specialised translations, especially those involving legal or regulated content. Remember, unlike neural machine translation engines, large language models were not built to serve solely as translation tools.

Precision and accuracy become paramount in these domains, and errors can have severe consequences.

From courtroom translations to legal negotiations, contract writing, terms and conditions, all that pesky “small print” language matters. And if you get it wrong, it really matters.

The result?

Apart from obvious errors, confusion, and embarrassment, poor prompting (and checking) can lead to mistakes with drastic consequences.

Hallucinations: an accuracy horror story

ChatGPT’s eagerness to please famously led it to cite six bogus cases in a personal injury suit filed in New York.

The lawyer was suing an airline on behalf of his client and used ChatGPT to prepare the paperwork. Unfortunately, ChatGPT invented six fake cases and used them as benchmarks in the filing.

This is an example of “hallucination”, where AI gathers a bunch of data, interprets it as best it can, but ultimately comes up with the wrong answer. A kind of “2 + 2 = 5” situation, where the result is a falsehood presented as (a very believable) fact.

The bogus cases debacle was one of the first times a ChatGPT hallucination appeared in the courtroom, resulting in fines for the lawyers involved and – much more costly – serious damage to their reputations.

You might also have read recently about the Microsoft article produced using AI that branded a famous NBA star “useless” in his obituary.

If these sorts of errors in English are slipping the net of detail-oriented lawyers and editors, imagine the risks when producing content in a second language!

Ethical and bias concerns

ChatGPT is based on 300 billion words or roughly 570GB of input data. And all of that data predates 2021.

That means that the models that fuel ChatGPT are based on data collected at least two years ago but which, in many cases, dates much earlier than that. These vast data sources contain obvious biases, such as those related to gender and race.

Why is this risky for translations?

Well, think of social norms and how quickly these progress and change. Not only may ChatGPT be using data or models that contain inherent biases, but it may also be perpetuating them in 2023, as its biased output joins the pool of content from which future models learn. It might be learning to be biased.

Basically, ChatGPT is a snapshot in time, unable to account for anything after a certain date unless a user supplies new data in the prompt. In recognition of this potential trip-up, OpenAI has just released a new feature this month that allows ChatGPT to browse the internet. Time will tell how this improves the output.

In terms of generating content, especially content that might be tricky to double-check, how will you know that what ChatGPT has produced isn’t racist?

Languages where ChatGPT falls short

The topic of bias becomes a particularly thorny issue for low-resource languages, where there’s generally a lack of native linguistic input or model training.

A study by Cornell University earlier this year found that ChatGPT perpetuates gender bias for low-resource languages such as Bengali (among others).

But Bengali is the 7th most widely spoken language in the world, so there’s a real risk for content creators using AI models with a Western, English-language bias of producing results that aren’t just incorrect but hugely offensive.

The consensus in the translation community, particularly in the field of research, is that low-resource equals low quality.

Examples include:

  • Mandarin Chinese: ChatGPT was woefully under-trained in Chinese languages. Mandarin is a tonal language with thousands of characters and an intricate writing system. AI often grapples with accurately capturing the nuances and context required for content creation in Mandarin.
  • Arabic: Arabic is a language with intricate grammatical and contextual dependencies. The flexibility in word order and multiple dialects pose substantial challenges for AI in generating coherent Arabic content. In fact, two other LLMs have been developed – Jais and Jais-chat – that outperform ChatGPT, thanks to having been specifically trained on Arabic.
  • Indigenous and lesser-studied languages: As ChatGPT heavily relies on extensive training datasets, languages with limited digital presence and documentation pose a real problem for the tool. Here is a direct quote from the Māori news website Te Ao: “Admittedly, ChatGPT could never pass for a sane human in Māori.”
  • Languages with complex scripts: Japanese, Thai, or languages written in Devanagari (such as Hindi) can be demanding for AI models. Some scripts are incredibly sensitive, and the smallest error can mean huge differences in meaning and understanding. A Japanese language school recently put ChatGPT to the test, translating English to Japanese. The results were far from convincing, with their final piece of advice being: “Not terrible, but if you are sending a sensitive or important message, perhaps [it is] better to avoid using AI.”

In conclusion: use with care

To maximise the benefits of ChatGPT, AI should be seen as a complement to human translation expertise rather than a replacement.

It’s easy to forget, but as a technology it’s still incredibly new: GPT-4 was launched only in March of this year, following ChatGPT’s initial launch in November 2022 on GPT-3.5. It certainly has the potential to help translators and marketers alike in the future, but for now, the human touch remains indispensable in scenarios where precision, cultural sensitivity, and context comprehension are paramount.

A second thing to remember is that the output is only ever as good as the input. Indeed, knowing how to write prompts is a skill in itself: to get the most out of LLMs, you need a unique combination of both linguistic and technical expertise to create a training model, refining and improving the prompts over time.
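As a sketch of what that refinement might look like in practice, a prompt can bake in reusable brand rules – for instance, terms that must never be translated. (The glossary contents, helper name, and wording below are hypothetical examples, not Comtec’s actual prompts.)

```python
# Illustrative only: encoding brand terminology rules into a reusable
# prompt template. Glossary entries and wording are hypothetical.

BRAND_GLOSSARY = {
    "Comtec": "keep as-is",
    "ChatGPT": "keep as-is",
}

def build_brand_prompt(text, target_lang, glossary=BRAND_GLOSSARY):
    """Build a single prompt string with terminology rules baked in."""
    rules = "; ".join(f"'{term}': {rule}" for term, rule in glossary.items())
    return (
        f"Translate the text below into {target_lang}. "
        f"Terminology rules: {rules}. "
        "Match a friendly, expert brand voice.\n\n"
        f"Text: {text}"
    )

prompt = build_brand_prompt("Comtec helps brands go global.", "French")
```

A template like this is only a starting point: each rule, tone instruction, and example typically needs testing and revision against real outputs – which is where the linguistic-plus-technical expertise comes in.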

While it might be tempting to “have a go”, fine-tuning the prompts can be time-consuming and laborious. If you want content to truly match and reflect your brand, it’s well worth calling in the experts.

And that’s really the key takeaway here: as AI continues to advance, it is crucial to strike a balance between automation and human expertise, ensuring that technology serves as an aid to human communication rather than a replacement. It is important not to be scared of this technology, but rather to try and embrace it, understand it, and learn how to use it correctly and ethically.

How can Comtec help?

We offer a whole range of services designed to help brands get the most out of the technology available without compromising on accuracy or relevance.

We pride ourselves on hiring the best of the best – we’ve personally vetted every single linguist who works for us. All are native speakers and most live in the target-language country, meaning they have their finger on the pulse when it comes to trends in advertising, messaging, politics, and pop culture.

Covering more than 250 languages, these are just some of the services we can help with:

  • AI prompt writing: writing the actual prompts for whichever LLM you use, so that the outputs are as accurate, relevant, and in keeping with your brand tone of voice as possible.
  • AI prompt training: training your data or marketing team on prompt writing, so that you have an in-house solution you can always use with confidence.
  • Proof-reading and editing: reviewing your AI generated content, checking for accuracy, style, tone of voice, and cultural fit.
  • Localisation audits: checking over your current translations for cultural fit and accuracy, and giving you pointers and advice where needed.
  • Translation management: taking a bird’s eye view of your translation needs and solutions you currently have in place, and making any recommendations to help streamline the process or improve it for you.

You can find out more about what Comtec can offer you here.
