Defining what makes a good story is devilishly tricky. There are so many factors – the pace and style of the narrative, the power of the insights uncovered, how unique the sources are…
The art of picking out the good stories is why editors are paid the aggressively mediocre bucks.
Scaling that process is trickier still. Until now. A French start-up called Deepnews.ai is hoping to solve that problem with the use of artificial intelligence.
Their model aims to assess the quality of any text and score it on a scale of one to five. So far it has proven to be eerily accurate, although its creators are not quite sure why.
“We don’t know exactly where the model is looking because it has created its own mesh of 25 million parameters,” says Frederic Filloux, the co-founder of the company.
Deepnews.ai’s main products are semi-automated newsletters on specific topics ranging from fake news to the future of food and facial recognition. They also have a weekly Digest, which sends you the top 25 articles on a chosen subject.
The Fix spoke to Frederic about the tech behind the product, their newsletter strategy and how AI could be used elsewhere in the media industry.
This interview has been edited and condensed.
ZP: How did you start your project?
FF: In 2017 I was a Journalism Fellow at Stanford. I arrived at the end of 2016, during the election with its wave of fake news. I realized we needed to do something at scale. We decided to find a way to surface and spotlight quality journalism using tools that could work at scale.
After various tries, we settled on an artificial intelligence system. We use a deep neural network, a kind of model that was earlier used primarily to recognize images. We recomposed it to find patterns of quality, feeding the model with a couple of million stories which were labelled ‘Very good,’ ‘Okay,’ ‘Average,’ ‘Not so good,’ ‘Very Bad’ and ‘Terrible.’
ZP: That sorting was by humans, right?
FF: Yes. We asked the model to find quality in the best stories and so on. We also submitted smaller samples of articles to journalism students to be able to, if you will, recalibrate the model. For instance, we took 10,000 stories scored by the machine and 10,000 stories scored by humans – and then compared them.
After a year of work that included 100 versions of the model and 500 hours of intense computational work on Google Cloud, we were able to come up with a model that was around 85% accurate. By accurate I mean the model could analyse a story in English and say how good it was on a scale from 1 to 5.
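The comparison Filloux describes, machine scores checked against human scores, can be illustrated with a small sketch. This is not Deepnews.ai's code: the function name `agreement_rate`, the `tolerance` parameter and the sample scores are hypothetical, and the sketch assumes both groups scored the same stories on the same 1-to-5 scale.

```python
def agreement_rate(machine_scores, human_scores, tolerance=0):
    """Share of stories where machine and human scores differ by at most `tolerance`.

    Hypothetical helper for illustration; the interview does not say how
    the reported accuracy figure was actually computed.
    """
    if len(machine_scores) != len(human_scores):
        raise ValueError("both lists must score the same stories")
    if not machine_scores:
        raise ValueError("at least one scored story is required")
    matches = sum(
        1
        for machine, human in zip(machine_scores, human_scores)
        if abs(machine - human) <= tolerance
    )
    return matches / len(machine_scores)


# Made-up scores for ten stories (1 = worst, 5 = best).
machine = [5, 4, 2, 1, 3, 4, 5, 2, 1, 3]
human = [5, 4, 2, 2, 3, 4, 5, 2, 1, 4]

exact = agreement_rate(machine, human)                   # identical scores only
within_one = agreement_rate(machine, human, tolerance=1)  # off by at most one point
print(exact, within_one)
```

Whether "accurate" means an exact match or a score within one point of the human rating changes the figure considerably, which is why the sketch exposes the tolerance as a parameter.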
We are building a magnet that you put next to a glass of heavily polluted water. This polluted water is a metaphor for the Internet today — it contains lots of bad stuff but also lots of good stuff. Since we can separate the bad from the good, we’re going to look for the good.
ZP: What products do you have?
FF: We launched the first product in June. It’s called Deepnews Digest. You can subscribe for free on www.deepnews.ai/digest. It’s a free weekly newsletter that scans thousands of sources on a subject related to the current news cycle.
First, we crawl the sources from which we are going to collect stories. On the recent topic of the American elections, we collected some 5,000 articles. After that, we submit these articles through the algorithm and in three seconds the model scores them and shows the 25 best stories, which we put in the Deepnews Digest.
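The crawl, score and select steps described above can be sketched roughly as follows. The scoring function here is a toy stand-in based on vocabulary variety, not the Deepnews.ai model, and the names `top_stories` and `toy_score` are hypothetical.

```python
import heapq


def toy_score(text):
    """Placeholder scorer returning a value between 1 and 5.

    Assumption for illustration only: it rewards varied vocabulary.
    The real model is a neural network whose criteria are not public.
    """
    words = text.lower().split()
    if not words:
        return 1.0
    diversity = len(set(words)) / len(words)
    return 1.0 + 4.0 * diversity


def top_stories(articles, score_fn, n=25):
    """Return the n highest-scoring articles, best first."""
    return heapq.nlargest(n, articles, key=score_fn)


# Stand-ins for the thousands of crawled articles.
articles = [
    "the the the the",
    "senate passes election security bill",
    "vote vote count",
]

best = top_stories(articles, toy_score, n=2)
print(best)
```

In the workflow described in the interview, `n` would be 25 and the input would be the roughly 5,000 articles collected for the week's topic.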
Last week we launched our first commercial product called Deepnews Distils in which we cover topics like autonomous vehicles, misinformation, space exploration and so on. We launched it in the form of mostly automated newsletters that will be available with paid subscriptions.
ZP: How does your model define quality journalism?
FF: We don’t exactly know what the model is looking at because it has created its own mesh of 25 million parameters.
ZP: It’s a beast you cannot control?
FF: It’s a black box, but that’s how machine learning works. We suspect the model looks at the structure of sentences, the richness and density of the vocabulary. I think it is ready to detect the source of a story, what is a quote, what a person is talking about, the variety of sources, the variety of topics covered.
ZP: Despite being automatic, do you still post-edit your newsletter before sending it?
FF: That’s a good question. The newsletters we do are mostly automated. It’s around 90% accurate but still, out of 50 stories picked by the algorithm, there could be one that has nothing to do with the topic and can be a mistake.
Should that happen, our journalists will remove it from the list based on human judgment. We still can’t do without human judgment. Maybe in a couple of years, but not right now.
ZP: How do you correct for the mechanism potentially picking articles from different media that are essentially the same and may bore the reader?
FF: That’s tricky. Yeah, it’s very important. We have seen that the model is pretty good at finding original stories. After analyzing millions of articles, the model is able to tell stories apart, even when some of them are nearly identical. This is our way of finding original content.
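Filloux does not explain how the model separates original stories from near-copies. One common technique for the same problem, shown here purely as an illustration and not as Deepnews.ai's method, is to compare overlapping word sequences ("shingles") and drop any story that is too similar to one already kept. The function names and the 0.5 threshold are assumptions.

```python
def shingles(text, k=3):
    """Set of overlapping k-word sequences found in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items divided by all items."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(stories, threshold=0.5):
    """Keep each story only if it is not too similar to an earlier kept one."""
    kept = []
    kept_shingles = []
    for story in stories:
        current = shingles(story)
        if all(jaccard(current, seen) < threshold for seen in kept_shingles):
            kept.append(story)
            kept_shingles.append(current)
    return kept


stories = [
    "the senate passed the election security bill on tuesday",
    "the senate passed the election security bill on tuesday night",
    "spacex launched a new batch of satellites",
]

unique = drop_near_duplicates(stories)
print(unique)
```

A filter like this keeps the first version it sees, so in practice the stories would be ordered by quality score before filtering.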
ZP: How many people are subscribed to the newsletter?
FF: Right now we have a few thousand for the Deepnews Digest. The other was just launched last week [the interview was recorded in early February – Editor]. So we have maybe a few hundred.
ZP: Could you tell me more about your new commercial product?
FF: We are going to select a whole bunch of specific topics. These have to be important, fast and forward-looking subjects. We are not interested in the past.
Currently, we have newsletters on autonomous cars, fact-checking, space exploration, facial recognition and the future of food. We are going to launch a lot of other topics.
ZP: Have you considered selling your technology as a service?
FF: Our next step is to launch around 30 newsletters and make some money with them. It’s not going to make us millionaires but we will be able to build a sustainable company and fund our R&D.
With this R&D, we want to build a system, an API, into which any organization producing or curating information will be able to plug its feed of information, and we will assess its quality.