Generative artificial intelligences (AIs) promise to become important tools for journalists and publishers. These technologies shine particularly in analysing data, evaluating survey responses, and dissecting comments and sentiments on various topics.
Yet, while generative AIs promise effectiveness, we must sidestep the allure of the “wow-effect” trap.
One of the most striking wow-effects of the new AI models is, without a doubt, their capability to perform data analysis. When we began using GPT-4o, newly launched by OpenAI on May 13, we saw video recordings of the new model responding to very simple prompts that simply made us say, “wow”. These prompts usually go like this: “Conduct an in-depth analysis of this data, identify trends, perform high-level statistical analysis, create visualisations.” Then someone attaches a CSV file – since GPT-4o accepts multimodal input, combining files and text, like its predecessor – and presses enter.
GPT-4o then performs what appears to be a kind of “miracle”. Indeed, the speed and the results are astounding: the AI analyses the file, extracts important information, identifies trends, searches for and verifies correlations. However, as with all results obtained swiftly, a closer look reveals that things are not always as easy as they seem.
Generative AIs like ChatGPT, Gemini, Mistral, and Claude can process large volumes of text data quickly and efficiently. They can identify patterns, extract key themes, and assess the sentiments expressed in written responses. This capability is definitely useful in analysing massive files or handling surveys and open answers or comments, where the sheer volume of data can be overwhelming. But even these tools need preprocessing.
Preprocessing involves cleaning the data, such as removing duplicates, correcting typos, and standardising the text format.
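For readers who prefer to do this step locally before uploading anything, the same cleaning can be sketched in a few lines of Python. The rows and field names below are invented for illustration:

```python
# Minimal preprocessing sketch: trim whitespace and drop exact duplicates.
# The survey rows and column names here are purely illustrative.
def preprocess(rows):
    """Standardise text fields and remove duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        # Standardise: strip stray whitespace from every cell.
        row = {k: v.strip() for k, v in row.items()}
        key = tuple(sorted(row.items()))
        if key in seen:  # drop exact duplicates
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned

rows = [
    {"rating": "5", "comment": "Great event "},
    {"rating": "5", "comment": "Great event"},  # duplicate after trimming
    {"rating": "4", "comment": "Useful"},
]
print(preprocess(rows))
```

Real files will need more than this (typo correction, format normalisation), but even a rough pass like this reduces the chances of the AI misreading the data.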
Although these machines can perform some of these tasks with a simple command, it’s better to work step by step. This is a good strategy when working with humans, and it proves to be one with generative AIs too.
First, anonymise the file you want to analyse. For example, when analysing survey responses from the AI event for journalists I organised in Milan, I first downloaded the CSV file from the Google Form hosting the survey. Then I anonymised the file, manually removing names, emails left with consent, and any details from open answers that could lead to personal identification. This task cannot be delegated to an online-connected AI.
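A minimal local sketch of this anonymisation step, assuming hypothetical “Name” and “Email” columns and using only Python’s standard library:

```python
import csv
import io
import re

# Hypothetical column names; adapt to your own survey export.
SENSITIVE_COLUMNS = {"Name", "Email"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def anonymise(reader):
    """Drop identifying columns and redact emails left in open answers."""
    for row in reader:
        row = {k: v for k, v in row.items() if k not in SENSITIVE_COLUMNS}
        row = {k: EMAIL_RE.sub("[redacted]", v) for k, v in row.items()}
        yield row

# Invented sample data standing in for the downloaded CSV.
raw = io.StringIO(
    "Name,Email,Feedback\n"
    "Anna,anna@example.com,Contact me at anna@example.com\n"
)
rows = list(anonymise(csv.DictReader(raw)))
print(rows)
```

A script like this never sends anything over the network, which is exactly the point: identification details should be gone before the file ever reaches an online service.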
Before uploading the file, it’s advisable to create a complex prompt explaining the file.
This is a case study I personally ran with GPT-4o in ChatGPT. The file I wanted to analyse is a CSV containing the responses to a survey conducted after an event about AIs (see the full video), to understand the impact of the event and how the audience evaluated it.
A good prompt to do this job is structured as follows:
To start the prompt, explain the context of the file you’re about to upload. Then briefly describe each column of the file.
Explain to the machine, as you would to a human who has never seen the file, what to expect in the various columns.
When there are multiple-choice answers, specify that the person who answered was allowed to select all that apply. This is particularly important when there are many choices and when punctuation used in the responses could occasionally be interpreted by the AI as a CSV separator.
When there are open answers, clarify this well, and remind the machine that it might also find irrelevant responses.
Finally, provide additional instructions to the AI. Explain your goal, ask it to ensure the data is formatted correctly, request that it clean the data to correct any errors and standardise the response formats, describe the output you expect and, for extra safety, ask for data anonymisation in case you missed something.
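Putting these pieces together, a prompt following this structure might look like the sketch below. The column names are illustrative, not those of the actual survey:

```text
Context: I am uploading a CSV file with responses to a post-event survey
about an AI workshop for journalists. Each row is one respondent.

Columns:
- "Attendance": whether the respondent attended in person or remotely.
- "Rating": overall evaluation of the event, from 1 to 5.
- "Topics": multiple choice; respondents could select all that apply,
  so a single cell may contain several values separated by commas.
- "Feedback": open answer; it may contain irrelevant or empty responses.

Instructions: check that the data is formatted correctly, clean it by
correcting errors and standardising response formats, anonymise any
personal detail I may have missed, then summarise ratings, trends and
recurring themes. Start with a statistical summary table.
```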
Now, load the file, associate the prompt, and press enter.
With this elaborate prompt we expect an equally elaborate output, and that is indeed what we get.
Note that the AI recognises that the survey I uploaded is in Italian and recognises the various sections, meaning the first part was successful.
It then explains the next steps.
Finding no missing data in the cells, ChatGPT can proceed. If it had found any, it would have handled them according to the provided instructions or asked for more information. We always have to read the output carefully.
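If you want to double-check the AI’s claim about missing data, a report like this is easy to produce yourself. A minimal sketch with invented rows:

```python
# A quick local check for missing cells, mirroring what the AI reports.
# The rows and column names are illustrative.
rows = [
    {"Attendance": "in person", "Rating": "5"},
    {"Attendance": "", "Rating": "4"},       # missing attendance
    {"Attendance": "remote", "Rating": ""},  # missing rating
]

def missing_report(rows):
    """Count empty cells per column so you can verify the AI's claims."""
    report = {}
    for row in rows:
        for col, value in row.items():
            if not value.strip():
                report[col] = report.get(col, 0) + 1
    return report

print(missing_report(rows))
```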
Next, a standardisation process of the responses begins. In this case, the task is easy, but again, remember that it wouldn’t be if the file were much more complex. ChatGPT begins its work, anticipating what it will do. Everything we are seeing here as output is produced without further human intervention.
First, it provides a statistical summary, starting arbitrarily with the evaluation of the event. Here, it’s crucial to verify data consistency, for example, the total votes.
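For example, you can recompute the vote totals and the average yourself to verify the AI’s summary. A sketch with made-up ratings:

```python
from collections import Counter

# Illustrative ratings column; replace with your own survey data.
ratings = [5, 4, 5, 3, 5, 4, 2]

counts = Counter(ratings)            # distribution of votes
total = sum(counts.values())         # must equal the number of respondents
average = sum(r * n for r, n in counts.items()) / total

print(counts)
print(total)
print(round(average, 2))
```

If the total the AI reports does not match the number of respondents in your file, something went wrong in the parsing, often because of separators inside multiple-choice answers.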
ChatGPT then proposes visualisations, like the one you see here.
It creates some, and you can ask for more or for different forms, for example to change a bar chart into a pie chart or to generate a pie chart from other data.
Next, performing sentiment analysis, ChatGPT analyses the open responses and identifies so-called “stop words”, Italian words that are not significant for understanding the text. This is not an easy task, and we will return to it in the second part of this guide, as sentiment analysis requires specific work to be effective.
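To get an intuition of what stop-word removal does, here is a deliberately tiny sketch with a hand-made Italian stop-word list and invented answers; a real analysis needs a proper lexicon and much more care:

```python
from collections import Counter

# A tiny, hand-made Italian stop-word list for illustration only.
STOP_WORDS = {"il", "la", "di", "e", "che", "un", "una", "per", "molto"}

# Invented open answers standing in for the survey's open column.
answers = [
    "Un evento molto interessante",
    "La parte pratica e il dibattito erano interessanti",
]

words = [w.lower() for a in answers for w in a.split()]
keywords = Counter(w for w in words if w not in STOP_WORDS)
print(keywords.most_common(3))
```

Filtering out the stop words leaves only the content-bearing words, which is what makes theme extraction and sentiment scoring workable.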
So, I ask the AI to skip columns G and H and to suggest further analyses: it’s always worthwhile to ask these chatbots for other ideas.
For example, as suggested, you can ask whether there are correlations between different variables (in this case, there are none), ask whether they are strong or weak, and propose other questions.
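A correlation check like this can also be reproduced locally to verify the AI’s answer. A minimal Pearson-correlation sketch with invented numbers:

```python
import math

# Hypothetical numeric columns: event rating and venue rating.
x = [5, 4, 5, 3, 2]
y = [4, 4, 5, 3, 2]

def pearson(a, b):
    """Pearson correlation coefficient; values near 0 mean no linear relation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    sa = math.sqrt(sum((ai - ma) ** 2 for ai in a))
    sb = math.sqrt(sum((bi - mb) ** 2 for bi in b))
    return cov / (sa * sb)

r = pearson(x, y)
print(round(r, 2))  # close to 1 = strong positive correlation
```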
For instance, in this case, we can ask whether people who attended the event in person enjoyed it more or less than those who participated remotely.
ChatGPT calculates the average ratings by splitting the respondents into the two groups, then evaluates the distribution of votes and finally briefly answers the posed question.
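This group comparison is simple to recompute yourself as a sanity check. A sketch with hypothetical attendance data:

```python
from collections import defaultdict

# Illustrative rows: attendance mode and event rating.
rows = [
    ("in person", 5), ("in person", 4), ("in person", 5),
    ("remote", 4), ("remote", 3), ("remote", 5),
]

# Group the ratings by attendance mode, then average each group.
groups = defaultdict(list)
for mode, rating in rows:
    groups[mode].append(rating)

averages = {mode: sum(r) / len(r) for mode, r in groups.items()}
print(averages)
```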
Now it’s time to see, in general, how creating effective prompts is crucial for maximising the potential of generative AIs in data analysis and survey evaluation. A well-structured prompt, combined with the method I’ve shown so far, can significantly enhance the quality and relevance of the AI’s output. Here are some tips and best practices for creating effective prompts:
1. Be clear and specific: state exactly what you want the AI to do and on which data.
2. Avoid ambiguity: use precise wording, so that instructions cannot be interpreted in more than one way.
3. Break down complex tasks: split a large request into smaller, sequential steps.
4. Provide examples: show a sample of the input and, where possible, of the output you expect.
5. Specify output format: say whether you want a table, a list, a summary, or a chart.
6. Set boundaries: tell the AI what to ignore, such as irrelevant answers or sensitive columns.
Be sure not to make these mistakes:
Vagueness: vague instructions can lead to irrelevant or incomplete results.
Overloading the prompt: avoid including too many instructions at once. This can overwhelm the AI and result in a less focused analysis.
Ignoring data quality: ensure that the data is clean and well-prepared before submitting it for analysis. Poor data quality can skew the results. Ask the AI to help you in cleaning the data if you don’t know how to do that task.
Lack of context: failing to provide sufficient context can lead to misinterpretation. Always include background information and explain the purpose of the analysis.
In short, once we have built the premises for working with the chosen artificial intelligence, human imagination can guide the work. You might even conclude by asking it to write a draft of an executive summary or an abstract that sums up in discursive form everything we have seen.
The more complex the file, the more useful this type of setup is to avoid problems.
Source of the cover photo: Mika Baumeister via Unsplash
Alberto Puliafito is an Italian journalist, director and media analyst, Slow News’ editor-in-chief. He also works as digital transformation and monetisation consultant with Supercerchio, an independent studio.