The Wrong and Right way to use ChatGPT in Data Analytics


Section 1: The Rise of Large Language Models in Data Analysis

In this section, we explore the revolution brought about by Large Language Models (LLMs) like ChatGPT in the field of data analysis. We delve into the reasons why these AI-powered tools are competing with human analysts, focusing on aspects such as scalability, cost-efficiency, and consistent performance.

I. Introduction to Large Language Models (LLMs)

Large Language Models (LLMs) like GPT-3 and its successor GPT-4, the models behind ChatGPT, have dramatically transformed our approach to text-based tasks over the last six months. But before we get into the details, let's first try to understand what LLMs are in simple terms.

Large language models, such as GPT (Generative Pre-trained Transformer), are like super-brainy computer programs that are really good at understanding and generating text in human-like ways. They get good at this by reading and learning from huge amounts of text data - imagine reading billions of books, articles, and web pages!

They can do a lot of things with language, like translating from one language to another, summarizing big pieces of text, and having conversations like a human. They're so good because they spend a lot of time learning from the text first, and then they get specialized training for specific tasks.

Let's think about these language programs like they're solving a word puzzle. Their job is to guess the next word based on the words they've already seen. The program picks the word it thinks is most likely to come next, and it keeps doing this until it has a full sentence or paragraph.

Now, let's talk about how these language programs are built. They use a structure called a Transformer. This architecture was introduced in 2017 (in the paper "Attention Is All You Need") and is the backbone of all state-of-the-art language models. In this simplified view, the Transformer has seven important parts. Let's break them down:

  1. Inputs: This is what you or another user gives to the computer. But since computers can't understand words, they turn these words into numbers, sort of like a code.
  2. Positional Encoding: This is how the computer knows the order of words in a sentence. Think of it like remembering where each word should go in a sentence.
  3. Encoder: This part takes the words (now turned into numbers) and tries to understand their meanings and the overall context. It's a bit like solving a complex puzzle and seeing the bigger picture.
  4. Outputs: This is how the computer guesses the next word. It's like playing a game where you have to predict the next word based on the previous words. To help it do this, it's trained on tons of text data from the internet, books, and Wikipedia.
  5. Output Embeddings: This is similar to the inputs but it's for the output text. The computer changes its text into numbers (again!), which it can understand better. It also uses a way to measure how well it's doing (called a loss function), which helps it get better and better at generating text.
  6. Decoder: This part takes the input and the output and generates the final text. It's like the part of the program that puts all the puzzle pieces together to form the final picture.
  7. Linear Layer and Softmax: These parts transform the output into the format the computer can use and also give probabilities to each word, which helps the program decide the most likely next word. It's like having a lot of options and picking the one with the highest chance of being correct.

So, to sum it up, these language models like GPT read and learn from tons of text, use that knowledge to predict the next word in a sentence, and then use a fancy structure called a Transformer to generate human-like text. And they do this all by turning words into numbers and back again.
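
To make the word-puzzle idea concrete, here is a minimal sketch of that prediction loop in Python. The "model" is just a hand-made table of scores over a tiny vocabulary (a real LLM computes these scores with a Transformer over the whole context), but the softmax-then-pick-the-next-word loop is the same idea.

```python
import numpy as np

# Toy vocabulary and a hand-made "model": a lookup table of next-word logits
# keyed by the previous word. A real LLM computes these scores with a
# Transformer over the entire context, but the prediction loop is the same idea.
vocab = ["the", "cat", "sat", "on", "mat", "."]
next_logits = {
    "the": np.array([0.1, 2.5, 0.2, 0.1, 2.0, 0.1]),  # after "the": probably "cat"
    "cat": np.array([0.1, 0.1, 3.0, 0.2, 0.1, 0.1]),  # after "cat": probably "sat"
    "sat": np.array([0.2, 0.1, 0.1, 3.0, 0.1, 0.1]),  # after "sat": probably "on"
    "on":  np.array([0.5, 0.1, 0.1, 0.1, 3.0, 0.1]),  # after "on": probably "mat"
    "mat": np.array([0.1, 0.1, 0.1, 0.1, 0.1, 3.0]),  # after "mat": probably "."
    ".":   np.array([0.5, 0.1, 0.1, 0.1, 0.1, 0.1]),
}

def softmax(logits):
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return exp / exp.sum()

tokens = ["the"]
while tokens[-1] != "." and len(tokens) < 10:
    probs = softmax(next_logits[tokens[-1]])     # probabilities for every word in the vocabulary
    tokens.append(vocab[int(np.argmax(probs))])  # greedy choice: take the most likely next word

print(" ".join(tokens))  # -> "the cat sat on mat ."
```

Real models usually sample from these probabilities rather than always taking the top word, which is why the same prompt can produce different answers from run to run.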

More powerful models like ChatGPT can do way more than that. They have been additionally trained with human feedback so that they can hold dialogues: instead of simply predicting the next word, they try to make the entire answer a helpful, understandable response to the user's instruction (this approach is called InstructGPT). A classic example is the dialogue you might have with ChatGPT, where the model responds intelligently, factoring in context and complexity. For instance, if you ask it, "What is the weather like today?" it would likely respond, "As an AI model, I don't have real-time access to current data such as weather. You might want to check a reliable weather forecasting site." This exemplifies not only the model's understanding of language but also its ability to recognize its limitations and guide users appropriately.

II. The Revolution in Data Analysis

Data analysis is usually like doing a giant jigsaw puzzle. It takes a lot of patience, time, and effort to clean, sort, and make sense of the data. But now, LLMs are changing this game. Instead of a human having to look at thousands of entries in a spreadsheet, an LLM can check and summarize huge sets of data with just a little help from a human. This means we can get our results quicker, and the computer might even notice things that a human might miss.

Why LLMs are Super Handy

LLMs bring a lot of cool things to the table, making them a good choice when compared to human data checkers.

A. They Can Handle A LOT of Data!

One of the biggest things about LLMs is that they can handle so much data. They can go through and make sense of tons of data that would take a human forever to understand. Let's think about Twitter for a second. Twitter has hundreds of millions of tweets every day. It would take a group of humans many lifetimes to read and understand all these tweets. But an LLM can go through all this in a few hours. The amount of data an LLM can handle is just way beyond what a human can do.
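
As a concrete (and deliberately simplified) illustration of that scale argument, here is a hedged sketch of classifying the sentiment of a batch of posts with the OpenAI Python client. The model name, prompt wording, and the sample tweets are assumptions; rate limiting, batching, and error handling are left out.

```python
# Hypothetical sketch: classifying the sentiment of a batch of tweets with an
# LLM via the OpenAI Python client. The model name, prompt wording, and the
# `tweets` list are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tweets = [
    "Loving the new update, the app feels so much faster!",
    "Customer support kept me on hold for an hour. Unacceptable.",
    "It's fine I guess. Nothing special.",
]

def classify_sentiment(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; use whichever is available to you
        messages=[
            {"role": "system",
             "content": "Classify the sentiment of the tweet as positive, negative, or neutral. Reply with one word."},
            {"role": "user", "content": text},
        ],
        temperature=0,  # keep the labelling as deterministic as possible
    )
    return response.choices[0].message.content.strip().lower()

for tweet in tweets:
    print(classify_sentiment(tweet), "|", tweet)
```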

B. They're Wallet-Friendly

LLMs are also easier on your pocketbook compared to humans. Once trained, an LLM can do its job at a fraction of the cost of a human. For example, if you think about how much you'd pay a team of people to work for a year, it could be hundreds of thousands of dollars. But once an LLM like ChatGPT is trained, it could do the same job at a much lower cost, even when you count the costs of running and maintaining it. This could save you a lot of money over time, making LLMs an attractive choice.

C. They're Consistent

LLMs are always on the ball. They don't get tired or emotional, and they can work around the clock. For example, imagine you had to keep track of what people are saying about a big brand on social media. This is a 24/7 job. Humans would need to take turns, and they might not always be consistent because they could be tired, biased, or just make a mistake. An LLM can do this job non-stop, without breaks, and always stay on the mark. In one study, an LLM-based tool classifying sentiment in social media posts was right 90% of the time, even when it checked millions of posts over several weeks.

D. Challenges

But of course, there are still lots of open questions, and the biggest worry is the gap between what the computer understands and what a human understands. This raises questions about whether the insights given by the AI are reliable or not. And if the AI and human interpretations don't overlap much, does it mean we're losing the human touch in data analysis?

There is active research going on into these questions; here is one example: https://arxiv.org/pdf/2306.13298.pdf. In this study, researchers wanted to see who is better at understanding: humans or LLMs. They used a small sample of reviews for the Alexa app and had both a human and the AI models ChatGPT 3.5 and GPT-4 classify and explain these reviews.

The study showed that the human and ChatGPT 3.5 had similar classifications in about one-third of cases. The human and GPT-4 were slightly less aligned. The two AI models agreed more than half of the time. However, there was only agreement across all three (the human and both AI models) in about one-fifth of cases.

In comparing how humans and the AIs reasoned, they saw that humans relied heavily on their personal experiences. The AIs, as expected, based their reasoning on the specific words in the reviews and the functional parts of the app.

What does this mean?

The results suggest that humans and AIs can work together effectively, each bringing their strengths to the table, rather than competing against each other. But researchers need to keep an eye on how they use these AI tools in their work to create a future where AI and humans both help enrich research.

These examples show why LLMs are great for understanding data. While they have their own challenges, their ability to handle large amounts of data, cost-effectiveness, and consistency make them super useful in our world that runs on data. We'll now take a deeper look at the challenges of using these powerful tools and talk about how to manage these effectively.

Section 2: The Roadblocks: Challenges in Using LLMs

Here, we talk about the biggest hurdles we face when using AI tools for examining data.

I. Making Sense of 'Hallucinations'

In the world of AI and these advanced tools, 'hallucinations' are situations where the AI produces conclusions or interpretations that aren't actually in the original data. It's like the AI seeing things that aren't really there. This can happen a lot when the AI is looking at huge amounts of data and misunderstands what it's examining, causing it to create "imaginary" facts or insights.

Let's picture an example. Imagine we're using an AI tool to look at a collection of customer reviews for a shop. The reviews have lots of different information, like what products were bought, where the customer lives, the rating they gave, and their feedback.

In this situation, a 'hallucination' might happen if, for example, the AI mistakenly says that all customers from a certain place (let's say New York) are really unhappy with a certain product. The AI might say something like, "Customers in New York have constantly given Product X a rating of 1 out of 5, showing they're really unhappy."

But when you look closer at the data, you might find that this isn't true. Maybe customers in New York have given a range of ratings for Product X, and overall, their feelings about the product are mixed and not as negative as the AI said.

This is what we call a 'hallucination' - the AI has created a fact (that all New York customers are unhappy with Product X) that isn't really in the data. It might have done this because it's oversimplified or misunderstood the data.

These 'hallucinations' can be a big problem when using AI tools to look at data, as they can lead to misleading conclusions and wrong decisions. It's really important to understand this risk so we can make the most of these AI tools while being aware of their limitations.
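
One practical safeguard is to never let a claim like that stand on its own: re-check it against the raw data. Below is a minimal sketch in pandas; the DataFrame, column names, and values are all hypothetical.

```python
# Minimal sketch: verifying an LLM's claim ("customers in New York consistently
# rate Product X 1 out of 5") against the raw review data. The DataFrame,
# column names, and values are hypothetical.
import pandas as pd

reviews = pd.DataFrame({
    "city":    ["New York", "New York", "New York", "Boston", "New York"],
    "product": ["Product X", "Product X", "Product X", "Product X", "Product X"],
    "rating":  [1, 4, 3, 5, 5],
})

ny_x = reviews[(reviews["city"] == "New York") & (reviews["product"] == "Product X")]
print(ny_x["rating"].describe())      # mean, min, max: quickly falsifies "always 1/5"
print(ny_x["rating"].value_counts())  # a distribution of ratings, not a single value
```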

Why do LLMs hallucinate?

This million-dollar question is still an open topic for many researchers. Here is a good recent piece of research on it: https://arxiv.org/pdf/2305.14552.pdf

Researchers experimented with these LLM AI models to see how they make decisions and found two main reasons why they sometimes make mistakes:

  1. They often say something is true if they've seen it before in the data they were trained on. For example, if an AI was trained with text that said "Birds can fly," and you asked it whether birds could fly, it would likely say yes, even if your question contained other information suggesting that might not be the case (e.g. if the bird was a penguin).
  2. If they can't find relevant information in their training data, they rely on how often certain words appear in their training data to make decisions. For instance, if the term "birds can fly" appears more frequently than "penguins can't fly," the AI might still wrongly say that penguins can fly.

But does an LLM know that it's lying?

You might think that to avoid this, the AI should know whether what it's saying is true or false. In fact, it does. For example, if the AI says "The sun orbits the Earth," it's more likely to correct itself later, but if it says a true statement like "The Earth orbits the sun," it might move on to talk about other planets. This suggests that the AI has some understanding of truth and falsehood.

But just because the AI knows something is false doesn't mean it won't say it. There is another interesting paper, https://arxiv.org/pdf/2304.13734.pdf, that identifies three reasons why this might happen:

First, when the AI generates a sentence, it does it one word at a time and commits to each word. For example, if it starts a sentence with "Pluto is the smallest," it's hard to finish that sentence correctly. Instead, it might say something like "Pluto is the smallest dwarf planet in our solar system," which isn't true – Pluto is actually the second largest dwarf planet.

Second, sometimes there are many incorrect ways to finish a sentence and only a few correct ways. In those cases, the AI might choose an incorrect finish because it sounds more likely.

Lastly, the AI doesn't always choose the most likely next word, but sometimes randomly samples from different options. This can lead to incorrect statements as well.

II. Lack of Understanding of the Full Picture

LLMs like ChatGPT are really good at picking up patterns and creating human-like text based on those patterns. But what they're not good at is truly understanding the whole picture like a human can. They can only work with the information they've been trained on and don't know about anything beyond that.

Think of these language models like parrots. Parrots can copy human speech really well, but they don't truly get what they're saying. In the same way, an LLM can create text based on patterns it's seen, but it doesn't deeply understand what that text means.

Let's say you're using an LLM to go through social media posts and figure out what people think about a recent popular movie. Now imagine this movie had a surprise ending that a lot of people were talking and joking about online.

The LLM might notice words and phrases related to the movie, but it wouldn't get the surprise ending or why it's important because it can't watch and understand movies. It doesn't have real-world experiences or knowledge about the bigger picture. So, its analysis might overlook what the audience is really feeling and just focus on how often certain words or phrases are used in the posts, without understanding what they mean in this context.

In this case, the LLM might come to an incorrect or unrelated conclusion because it can't understand the importance of the surprise ending in the real world, even though this is key to figuring out how people feel about the movie.

This inability to understand the bigger picture is a big hurdle when using LLMs to analyze data. So, it's important to know about these limitations and realize that human supervision might be needed to get an accurate analysis.

However, recent studies show that it's possible to train LLMs on complex data like images and video; such models are called multimodal. (Spoiler: GPT-4 can do this!)

For example, in this study: https://arxiv.org/pdf/2301.13823.pdf, researchers found a way to get these text-focused LLMs to understand both images and words. The approach allows them to chat about images, generate text based on images, and even find specific images based on the conversation. To illustrate, imagine having a chat with the computer about the different types of birds that a particular feeder can attract. The computer could then generate text about these birds and even show pictures of them, all in the same conversation!

III. Difficulty in Handling Ambiguous Queries

One of the challenges that LLMs often face is dealing with ambiguous queries. When we say "ambiguous queries", we mean questions or commands that are unclear, open to multiple interpretations, or lack the necessary detail for a specific response.

Let's bring this to life with a simple example. Imagine you're running a café and using an LLM to analyze your sales data. One day, you ask the model, "What's my best-selling product?" On the surface, this seems like a straightforward question. However, from an analytical perspective, it's actually quite ambiguous.

There are several ways the LLM could interpret "best-selling". Do you mean the product that sells the most units? The product that brings in the most revenue? The product with the highest profit margin? The most popular item per customer visit? Or perhaps the item that is most often sold with other products? Without explicit clarification, the LLM has to guess what you mean by "best selling".

Based on its programming and training, the LLM might default to interpreting "best-selling" as the product with the highest unit sales. So, it gives you the answer: "Coffee is your best-selling product."

But what if you were really interested in knowing the product that brings in the most revenue? In this case, due to the higher price point, it might be a specific lunch combo. The LLM's response, while technically correct based on its interpretation, doesn't actually answer your intended question. This example highlights how ambiguous queries can lead to unhelpful or misleading analysis.
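
To see how much the interpretation matters, here is a small sketch that computes three reasonable readings of "best-selling" from the same hypothetical café data. The numbers are made up, and each definition can crown a different product.

```python
# Sketch: three different (all reasonable) readings of "best-selling" computed
# from the same hypothetical café sales data. Column names and numbers are
# made up for illustration.
import pandas as pd

sales = pd.DataFrame({
    "product":    ["Coffee", "Lunch Combo", "Croissant"],
    "units_sold": [1200, 350, 800],
    "unit_price": [3.0, 12.0, 2.5],
    "unit_cost":  [0.8, 7.0, 1.0],
})
sales["revenue"] = sales["units_sold"] * sales["unit_price"]
sales["profit"]  = sales["units_sold"] * (sales["unit_price"] - sales["unit_cost"])

print("Most units sold:", sales.loc[sales["units_sold"].idxmax(), "product"])  # Coffee
print("Most revenue:   ", sales.loc[sales["revenue"].idxmax(), "product"])     # Lunch Combo
print("Most profit:    ", sales.loc[sales["profit"].idxmax(), "product"])      # Coffee
```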

This challenge underlines the importance of being specific and clear when interacting with LLMs, especially during data analysis. It's not enough to ask the right questions – you must also ask the questions right. This might feel like a slight inconvenience, but with practice, crafting unambiguous queries becomes second nature and significantly improves the utility of LLMs in data analysis.

To handle such problems, you may need a smart system around the LLM that can ask for additional clarification when needed; a rough sketch of this idea follows.
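
Here is one way such a guardrail might look, reusing the same hypothetical OpenAI client setup as earlier: the system prompt tells the model to ask a clarifying question instead of guessing. The prompt wording and model name are assumptions, and a real system would loop until the clarification is answered.

```python
# Sketch of a "clarify before answering" wrapper. Prompt wording and model name
# are assumptions; a real system would keep the conversation going until the
# user has answered the model's clarifying question.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a data-analysis assistant. If the user's request is ambiguous "
    "(e.g. 'best-selling' could mean units, revenue, or profit), do not guess: "
    "ask one short clarifying question instead."
)

def answer_or_clarify(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(answer_or_clarify("What's my best-selling product?"))
# Expected behaviour: the model asks something like
# "Do you mean by units sold, revenue, or profit margin?"
```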

IV. Limited Real-Time Data Access

One important limitation to consider when using Large Language Models like ChatGPT in data analysis is their inability to access and analyze real-time data.

But what does this mean exactly?

Imagine that you're watching a live football game, and you want to comment on the action as it happens. You would need to see the game in real-time to do that effectively. In the same way, some data analysis tasks require "watching the game" in real-time - that is, they need to have access to the most current data, as it is being generated, to provide the most accurate and useful insights.

However, LLMs like ChatGPT do not have this "real-time access". When these models are trained, they learn from a static snapshot of data. This data, no matter how vast and varied it might be, represents the past and does not include information that is generated after the model's training. Therefore, the model doesn't "know" anything about events or data that have occurred since its last training update.

Let's consider an example for clarity. Suppose you're using an LLM to analyze social media sentiments about a newly launched product. The opinions and sentiments on social media can change rapidly based on recent events, new information, or emerging trends. However, an LLM trained on data up until, say, January 2023, won't have information or knowledge about any posts or sentiments expressed after that date. This lack of real-time data access limits the LLM's ability to provide up-to-the-minute analysis of the sentiments towards your product.

To overcome this limitation, data analysis strategies using LLMs often need to involve periodic model updates or retraining with the most recent data. However, this is a complex process and can involve significant computational resources.

Thus, while LLMs offer powerful data analysis capabilities, their limited access to real-time data is a key consideration when determining the most appropriate use cases and strategies for their deployment.

OpenAI (the vendor behind ChatGPT) recently added plugins to mitigate that issue, but plugins are still not a cure when you need to access your internal data or read BI reports.
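
Another common workaround (a pattern you build yourself, not an OpenAI feature) is to fetch the fresh data from your own systems and place it directly in the prompt. The sketch below assumes the OpenAI Python client; fetch_recent_posts is a hypothetical stand-in for your real database or API query, and the model name is assumed.

```python
# Sketch of the "bring your own fresh data" pattern: fetch recent records from
# your own systems and include them in the prompt, so the model reasons over
# current data instead of its training snapshot.
from openai import OpenAI

client = OpenAI()

def fetch_recent_posts() -> list[str]:
    # Hypothetical stand-in: replace with a real query against your data
    # warehouse, social listening tool, or BI export.
    return [
        "Just tried the new product, battery life is a huge improvement.",
        "The latest firmware update bricked my device, very frustrated.",
    ]

posts = "\n".join(f"- {p}" for p in fetch_recent_posts())

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model
    messages=[
        {"role": "system", "content": "Summarise the overall sentiment of these posts in two sentences."},
        {"role": "user", "content": posts},
    ],
)
print(response.choices[0].message.content)
```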

V. Lack of Common Sense Reasoning

When it comes to analyzing data, humans have the distinct advantage of possessing common sense reasoning. This ability allows us to infer things that are not explicitly stated, understand abstract concepts, and make judgments based on our understanding of the world. It's this common sense that often guides us when we're interpreting data or making decisions based on that data.

Large Language Models (LLMs), on the other hand, lack this innate ability for common sense reasoning. They don't understand the world in the way humans do. Instead, LLMs generate responses based on patterns they have learned from vast amounts of data. However, without an underlying comprehension of the world, these models can sometimes produce outputs that, to a human, seem nonsensical or incorrect.

For example, consider a scenario where an LLM is tasked with analyzing weather data for a certain region. If you were to ask the LLM, "Has it ever rained while the temperature was above the boiling point of water?", a human with common sense reasoning would instantly know that this is highly unlikely, as water evaporates at such high temperatures. However, an LLM might simply respond that it doesn't have enough information to answer, or it may try to find similar instances in the data it was trained on, potentially leading to incorrect or misleading responses.

This example highlights the challenge posed by the lack of common sense reasoning in LLMs. It underscores the importance of understanding this limitation when interpreting outputs from these models, particularly when using them for data analysis. This awareness allows us to critically evaluate the output of an LLM and apply our own common sense reasoning to arrive at accurate conclusions.

One way to enhance the reasoning ability of LLMs is through a method called "few-shot learning," which involves showing the model a few examples of a task and letting it learn from those. When the examples include the steps for reaching a conclusion, the LLM's reasoning ability improves even more.

But how well LLMs can handle tasks that require common sense reasoning is still up for debate, with some studies, like this one: https://arxiv.org/pdf/2304.11490.pdf, supporting their capability and others questioning it. This study tried to address some of the limitations of previous evaluations by not restricting the LLM's answers to single-word or multiple-choice completions and by providing examples with step-by-step reasoning toward an answer.

So, the conclusion is that LLMs can do better at common sense reasoning with the right prompting. This is a big deal because it doesn't require additional training or large new datasets, making it a flexible approach. And if the prompts help LLMs give better responses, this can improve their overall reasoning in a wide range of everyday tasks.
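
As an illustration of that kind of prompting, here is a minimal few-shot prompt whose examples spell out the reasoning before the answer (often called chain-of-thought prompting). The examples and wording are purely illustrative.

```python
# Minimal sketch of a few-shot prompt whose examples include the reasoning
# steps before the answer. The examples are illustrative; the full string
# would be sent to the LLM as the user message.
FEW_SHOT_PROMPT = """\
Q: It rained all morning and the picnic was outdoors. Did the picnic likely go ahead as planned?
Reasoning: Picnics are outdoor events. Rain makes outdoor events unpleasant, so organisers usually postpone or move them.
A: Probably not.

Q: The temperature today was 45 degrees Celsius. Is it likely that people wore heavy winter coats outside?
Reasoning: 45 degrees Celsius is extremely hot. People wear winter coats to stay warm in cold weather, not in heat.
A: No.

Q: Has it ever rained while the temperature was above the boiling point of water?
Reasoning:"""

# The model is expected to continue the pattern: reason that liquid water
# cannot persist above 100 degrees Celsius at normal pressure, then answer "No."
print(FEW_SHOT_PROMPT)
```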

VI. Conclusion

As we navigate the fascinating world of Large Language Models in data analysis, it is crucial to acknowledge the potential roadblocks along the way. Despite the many strengths of LLMs, they are not infallible. Like all tools, they come with their own set of challenges.

The concept of 'hallucinations', where the LLM generates interpretations or facts that don't exist in the data, serves as a prominent example of these challenges. Other obstacles include the LLMs' difficulty in handling ambiguous queries, their lack of real-world contextual understanding, limited real-time data access, and the absence of common sense reasoning.

Understanding these limitations is not meant to dissuade us from using LLMs; quite the opposite. Acknowledging and understanding these challenges is the first step toward using these tools more effectively. Only by recognizing their limitations can we find ways to navigate around them and still extract value.

In the next sections, we will explore strategies to tackle these challenges. We will delve into practical approaches and techniques to ensure that we reap the benefits of LLMs while mitigating the potential risks. So, stay tuned as we continue this journey toward making data analysis more accessible, efficient, and insightful with the help of Large Language Models.