
Amid the hype surrounding the term ‘Artificial Intelligence’ (AI), it’s natural to ask how you can actually define what AI is (and isn’t).

Many tech providers offer genuinely innovative and valuable solutions under the banner of AI, whilst others use that same banner to sell what is essentially technological ‘snake oil’ – and it can be challenging to cut through the noise and spot the difference.

The first thing to say is that AI, as a term, is not new.

Arguments still rage about exactly what the term describes, but taking its broadest possible definition (i.e. ‘any algorithm that mimics human-like behaviour’), it’s clear that examples fitting this category go back at least 50 years. From the Ghosts that chase Pac-Man (each with their own unique way of pursuing the player) to the Deep Blue computer that beat Garry Kasparov at chess, from Spotify’s music recommendation algorithms all the way through to facial recognition on your iPhone and ChatGPT, human-like computer programmes have permeated our lives for a generation or more.

Why is this important?

Because although the Ghosts from Pac-Man are cool and, crucially, are technically AI, they’re no longer the cutting edge of technology. But any naïve or insidious marketeer who wants to repackage Pac-Man and sell it for the latest platforms could quite legitimately say that their new version ‘leverages AI technology to unlock the game’s true potential’. Sound familiar?

A more relevant example would be to take one of the algorithms we use to assist in our quantitative analysis, such as a clustering algorithm used for segmentation. Clustering algorithms belong to the family of machine learning algorithms, all of which fit under this broad AI definition. You could therefore legitimately market your segmentations as ‘AI-enhanced’, which, while technically true, could easily be interpreted as something completely different.
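
To make that concrete, here is a minimal sketch of the kind of clustering that often sits behind an ‘AI-enhanced’ segmentation – classic k-means run on standardised survey ratings. The survey features, respondent counts and cluster count are purely illustrative, not a description of any particular vendor’s (or our own) approach.

```python
# Minimal sketch of the clustering behind many "AI-enhanced" segmentations.
# The survey features, respondent count and cluster count are illustrative only.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Hypothetical survey responses: one row per respondent,
# columns = attitudinal ratings on a 1-7 scale.
responses = rng.integers(1, 8, size=(500, 6)).astype(float)

# Standardise so no single question dominates the distance metric.
scaled = StandardScaler().fit_transform(responses)

# Classic k-means clustering: a decades-old machine learning technique
# that nonetheless sits under the broad "AI" umbrella.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
segments = kmeans.fit_predict(scaled)

print("Respondents per segment:", np.bincount(segments))
```
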

So how can you look past this marketing ploy to separate the digital wheat from the chaff?

At Branding Science, we’ve developed a series of questions to ask when onboarding AI tools – both those we’ve built ourselves and those from our AI suppliers – that we’d like to share with you:
WHERE, AND FOR HOW LONG, IS THE DATA STORED?
There are a variety of standards that different businesses adhere to, based on local regulations, voluntary data-protection frameworks and internal compliance policies. Understanding your company’s rules about data storage, and ensuring that your AI providers store data in accordance with those rules, is crucial (for example, ensuring that a provider has EU servers to handle personal data created in the EU).
WHAT IS THE PROCESS FOR HANDLING ADVERSE EVENTS (AEs)?
At Branding Science, we ensure that everyone who interacts with AI is trained in adverse event handling. This may not be the case for many AI suppliers, so it is vital to establish whether your providers have an appropriately robust AE handling process – and, if not, to ensure that anyone who interacts with AI outputs is appropriately trained. This includes AEs contained in the training data, as may well be the case for models trained on social media data, such as ChatGPT.
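
As a purely illustrative example, a first-pass automated screen for possible AE mentions in AI outputs might look something like the sketch below. The keyword list is a placeholder, and no such screen replaces trained human reviewers and proper pharmacovigilance processes.

```python
# Purely illustrative first-pass screen for possible adverse event (AE)
# mentions in AI-generated text. The keyword list is a placeholder - a real
# AE process relies on trained human reviewers, not a simple keyword match.
AE_KEYWORDS = {"side effect", "hospitalised", "hospitalized",
               "reaction", "overdose", "stopped taking"}

def flag_for_ae_review(model_output: str) -> bool:
    """Return True if the text should be routed to a human AE reviewer."""
    text = model_output.lower()
    return any(keyword in text for keyword in AE_KEYWORDS)

outputs = [
    "Respondents described the device as easy to use.",
    "One respondent said they stopped taking the drug after a severe reaction.",
]
for text in outputs:
    print(flag_for_ae_review(text), "-", text)
```
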
HOW ARE BIASES MITIGATED IN THE MODEL OUTPUTS?
Understanding where biases exist in the outputs of your AI tools is very important, but also extremely challenging. Only those who have had access to the training data set and have benchmarked the outputs can identify which biases exist in the model. That’s why it’s vital to understand whether any such benchmarking has happened and, if it has revealed biases, how those are mitigated.
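
One simple form this benchmarking can take is comparing a model’s error rate across subgroups on a labelled test set. The sketch below uses synthetic records purely to illustrate the idea – in practice it requires exactly the access to outputs and subgroup labels that this question is probing for.

```python
# Illustrative bias benchmark: compare a model's error rate across subgroups.
# The records below are synthetic; real benchmarking needs genuine model
# outputs, ground truth and subgroup labels.
from collections import defaultdict

# (subgroup, model_prediction, ground_truth) - hypothetical records
records = [
    ("group_a", 1, 1), ("group_a", 0, 1), ("group_a", 1, 1),
    ("group_b", 0, 1), ("group_b", 0, 1), ("group_b", 1, 1),
]

errors = defaultdict(list)
for group, prediction, truth in records:
    errors[group].append(int(prediction != truth))

# A large gap between groups is a signal that mitigation is needed.
for group, errs in errors.items():
    print(f"{group}: error rate = {sum(errs) / len(errs):.2f}")
```
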
HOW IS THE RISK OF HALLUCINATIONS MINIMISED?
Hallucinations – misleading or otherwise fabricated responses that a model reports with absolute confidence – are a huge problem for many large language models (LLMs) such as ChatGPT. It is possible to put guard rails in place, such as requiring an LLM to cite its sources, but even these are not completely foolproof, so it’s critical to understand what safeguards are in place.
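
As an illustration of the citation guard rail, the verification step can be as simple as checking that every source an LLM cites actually exists in a known corpus before trusting the answer. The document IDs and the ‘[doc-NN]’ citation convention below are placeholders, not any real provider’s API.

```python
# Illustrative guard rail: verify that any sources an LLM cites actually
# exist in a known corpus before trusting its answer. The document IDs and
# the "[doc-NN]" citation convention are placeholders, not a real API.
import re

KNOWN_SOURCES = {"doc-01", "doc-02", "doc-07"}

def unverified_citations(answer: str) -> set:
    """Return cited IDs that aren't in the corpus (possible hallucinations)."""
    cited = set(re.findall(r"\[(doc-\d+)\]", answer))
    return cited - KNOWN_SOURCES

answer = "Adherence improved in the trial [doc-02], as confirmed in [doc-99]."
print(unverified_citations(answer))  # {'doc-99'} -> flag for human review
```
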
WHAT IS THE TRAINING DATA SET FOR THE MODEL?
As the old computer science saying goes: “garbage in, garbage out”. In other words, any AI tool is only as good as the data it is trained on, so understanding what the training data represents (and therefore what the model’s outputs can represent) is the single most important question to ask about any AI product you work with.

The AI frontier is a brave new world, and technological advancements in the field are moving far too quickly for regulators to keep up. This means that, to save yourself headaches now (from ‘snake oil’ sellers) and down the line (from future regulatory decisions), you must develop a widespread understanding of AI terminology and robust guidelines for the safe and effective use of AI within your business. Anyone who does not act now will find themselves looking back in a few years’ time, wondering how they got left behind.

Written 100% by humans, with no AI input

This article was written by:

Gabe Musker

Data Scientist

[email protected]
