In April this year, Springer Nature published the first machine-generated scientific book: “Lithium-Ion Batteries – A Machine-Generated Summary of Current Research”, by Beta Writer. According to the preface, the book summarizes more than 150 research articles published from 2016 to 2018 and provides an informative and concise overview of recent research in the field. A press release from Springer Nature explains that the publisher collaborated with researchers from Goethe University Frankfurt/Main to develop a state-of-the-art algorithm, called Beta Writer. The online version of the book is now available from Springer Nature as a free download. This essay reviews the book and examines some of the questions raised by this approach to publishing.
Despite the seemingly prevalent hype around artificial intelligence (AI), the technology is inexorably progressing towards a future where it underlies every technology with which humans interact.
AI is increasingly finding applications in manufacturing, healthcare, retail, the arts, and countless other sectors. This is partly driven by the information explosion, but also by fierce competition for AI supremacy between the big five (Amazon, Apple, Google, Facebook, and Microsoft).
After the OS wars and the browser wars, the AI wars are upon us. A very real side effect of this virtual warfare is generalized angst about robots taking our jobs, manipulating the news, or generally “taking over”. Some of these fears are indeed well justified; manufacturing, for instance, has long been on the path to full automation. Other sectors such as retail and transportation are massively affected, and some even believe that the world economy is currently going through a fourth industrial revolution. They may be right.
By now most are quite familiar with this narrative and should not be surprised by anything AI does (or claims to do). Did it not compose music, create a painting, and even invent a sport? Nevertheless, the recent announcement by Springer Nature of a machine-generated book on lithium-ion batteries came as a shock to us. Surely, the work of synthesizing decades of research by large teams of scientists spread across the globe could not be as simple as setting an AI loose on a large pile of articles?
Are we (and our hard-earned Ph.D.s) on the verge of being replaced by a robot powered by the same algorithm that suggests we buy socks right after we have bought shoes? The AI system, dubbed Beta Writer, was developed under the direction of Prof. Christian Chiarcos, director of the Applied Computational Linguistics (ACoLi) lab of Goethe University Frankfurt.
The algorithm selected and analyzed various articles and, based on their similarities, clustered them in order to arrange them into coherent chapters and sections. Each chapter has a summary written by the AI algorithm.
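The actual Beta Writer pipeline is far more sophisticated than anything we could reproduce here, but the cluster-then-summarize idea it rests on can be illustrated with a toy sketch: represent each article as a bag of words, group articles by cosine similarity into "chapters", and extract the most representative sentences from each group. All function names and the similarity threshold below are our own inventions for illustration, not part of the published system.

```python
import re
from collections import Counter

def bag_of_words(text):
    # Crude document representation: lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (sum(v * v for v in a.values()) ** 0.5) * (sum(v * v for v in b.values()) ** 0.5)
    return num / den if den else 0.0

def cluster(docs, threshold=0.2):
    # Greedy single-pass clustering: each document joins the first
    # cluster whose founding member it resembles, else starts a new one.
    clusters = []
    for doc in docs:
        vec = bag_of_words(doc)
        for c in clusters:
            if cosine(vec, bag_of_words(c[0])) >= threshold:
                c.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters

def summarize(docs, n=1):
    # Extractive summary: score each sentence by the frequency of its
    # words across the whole cluster and keep the top-n sentences.
    joined = " ".join(docs)
    freq = bag_of_words(joined)
    sentences = re.split(r"(?<=[.!?])\s+", joined)
    ranked = sorted(sentences, key=lambda s: -sum(freq[w] for w in bag_of_words(s)))
    return " ".join(ranked[:n])
```

Feeding in a handful of abstracts about, say, anode degradation and cathode materials yields separate clusters, each reduced to its highest-scoring sentence; the real system then composes these summaries into chapter sections.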
As we began to read this book, we almost immediately realized a number of things. First, its very informative, clear preface was written by a human. Second, this machine-generated book is, in fact, not a book.
To be sure, the summarization algorithms do a good job of capturing the core message of the original work, and the composition of the Introduction, Conclusion, and Related Work sections of each chapter is quite accurate, notwithstanding the occasional odd phrasing (Springer Nature chose not to copyedit the book so that AI researchers could see both successes and failures). For instance, the sentence “Without an further mixing process of lithium salts and retains homogeneous cation distribution, the material could be obtained with the coprecipitation of Li+ with transition metal ions” could have used some light editing under normal circumstances. But we must admit that, as editors, we have read much worse writing by very human scientists.
But the reason this is not a book is that a scholarly text is much more than a collection of facts tied together by grammatically correct sentences. Even in the age of Twitter, our brains still require a narrative, a story, to absorb information effectively.
To quote Jeff Bigham, an associate professor at Carnegie Mellon’s Human-Computer Interaction Institute, “It’s much harder to create something that a human reader finds valuable”. The researchers, of course, never set out to create the next Hemingway in silico. As they repeatedly emphasize, this work is a proof-of-concept, a way to start a conversation. In that, they have succeeded. Although the algorithm uses a rather pedestrian language style, Andrew Liszewski points out that the 180 pages are much easier to digest than the hundreds of thousands of documents spawned from every last bit of research. Fair enough.
Chiarcos and co-workers also envision this as a tool to help tackle information overload, a perennial complaint by researchers. Here, we are considerably more skeptical as information overload is not just an issue of quantity. The main challenge presented by the deluge of published research is to determine quality.
Journals, over the span of their existence, have developed intricate systems of curation and have been the principal means by which we signal quality. It is this quality signal that Beta Writer fails to provide. Its book, despite the prowess displayed by Beta’s human designers, remains a flat collection of summarized information, with no indication of which item is more trustworthy than the rest.
The authors did stress that all papers fed into the summarizer were peer-reviewed. However, peer review is never carried out in isolation, but within the context of a specific journal. Reviewers take into account the perceived quality of the journal when providing a report. This crucial bit of information, this other dimension of curation, is entirely missing from this work as far as we can see.
In any case, a conversation has indeed started. How it will develop and down which rabbit hole this particular niche of artificial intelligence will drag publishing remains to be seen. And while no one seriously believes that the next Great American Novel will be written by a few hundred grams of silicon, something is coming.
We must think carefully about how to use such tools and tread lightly as we embed them into our workflows. Information-processing tools are easily abused and can become misinformation tools; we embed them within the established workflows of science communication at our own peril.
Written by Hakim Meskine and Babak Mostaghaci