After releasing what may well have been the most comprehensive report on the state of AI in 2019, Air Street Capital and RAAIS founder Nathan Benaich and AI angel investor and UCL IIPP visiting professor Ian Hogarth are back for more.
In the State of AI Report 2020, Benaich and Hogarth have outdone themselves. While the structure and themes of the report remain largely intact, its size has grown by about 30%. This is a lot, especially considering that their 2019 report was already a 136-slide journey through all things AI.
The State of AI Report 2020 is 177 slides long. It covers technological breakthroughs and their capabilities; the supply, demand and concentration of talent working in the field; major platforms, funding and applications for AI-driven innovation today and tomorrow; a special section on AI policy; and predictions for AI.
ZDNet caught up with Benaich and Hogarth to discuss their findings.
AI democratization and industrialization: Open source and MLOps
We set out by discussing the rationale for such a substantial undertaking, which Benaich and Hogarth admitted took up a good deal of their time. They said they feel that their combined industry, research, investment and policy backgrounds, together with the positions they currently hold, give them a unique vantage point. Producing this report is their way of connecting the dots and giving something of value back to the AI ecosystem as a whole.
Coincidentally, Gartner’s 2020 Hype Cycle for AI was also released a few days back. Gartner identifies what it calls two megatrends that dominate the AI landscape in 2020: democratization and industrialization. Some of Benaich and Hogarth’s findings were about the enormous cost of training AI models and the limited availability of research. This seems to contradict Gartner’s position, or at least to imply a different definition of democratization.
Benaich noted that there are different ways of looking at democratization. One of them is the extent to which AI research is open and reproducible. As the duo’s findings show, it is not: only 15% of AI research papers publish their code, and that has not changed much since 2016.
Hogarth added that AI as an academic field has traditionally had an open ethos, but the ongoing adoption by industry is changing that. Companies are recruiting more and more researchers (another theme that the report covers), and there is a clash of cultures going on as companies want to protect their IP. Notable organizations criticized for not publishing code include OpenAI and DeepMind:
“There’s only so closed that you can get without some kind of major setback. But at the same time, I think the data clearly shows that they definitely find ways to be closed when it’s convenient,” Hogarth said.
As far as industrialization goes, Benaich and Hogarth pointed to their findings with regard to MLOps. MLOps, short for machine learning operations, is the equivalent of DevOps for ML models: taking them from development to production and managing their life cycle in terms of improvements, fixes, redeployments and so on.
Some of the most popular and fastest-growing GitHub projects in 2020 are related to MLOps, the duo pointed out. Hogarth also added that for startups, for example, it is probably easier to get started with AI today than it was a few years ago in terms of tool availability and infrastructure maturity. But there is a difference when it comes to training models like GPT3:
“If you wanted to start some kind of AGI research business today, the bar is probably higher in terms of computational requirements. Especially if you believe in the scaling hypothesis, the idea of taking approaches like GPT3 and continuing to scale them up. It’s getting more and more expensive and less and less accessible to new entrants without large amounts of capital.
The second thing that organizations with very large amounts of capital can do is run many experiments and iterate on large experiments without having to worry too much about the cost of training. So there is a degree to which you can be more experimental with these large models if you have more capital.
Clearly, that biases you somewhat towards these almost brute-force approaches of just applying more scale, capital and data to the problem. But I think if you buy the scaling hypothesis, then it is a fruitful area of progress that should not be dismissed just because it does not have deep intellectual insight at the heart of it.”
How to compete in AI
This is another key finding in the report: large models, large companies and massive training costs dominate the hottest area of AI today, NLP (Natural Language Processing). Based on variables released by Google et al., research has estimated the cost of training NLP models at around $1 per 1,000 parameters.
This means that a model like OpenAI’s GPT3, hailed as the latest and greatest achievement in AI, could have cost tens of millions to train. Experts suggest the likely budget was $10 million. This clearly shows that not everyone can aspire to produce something like GPT3. The question is: is there another way? Benaich and Hogarth think so and have an example to show.
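The per-parameter rule of thumb quoted above can be turned into a quick back-of-the-envelope calculation. The sketch below is only an illustration: the function name and the two parameter counts are assumptions chosen for the example, not figures from the report, and the estimate is strictly linear in model size.

```python
# Back-of-the-envelope training cost from the "$1 per 1,000 parameters" rule of thumb.
USD_PER_1000_PARAMETERS = 1.0  # rate quoted in the article


def estimated_training_cost_usd(num_parameters: int) -> float:
    """Linear estimate: cost grows in proportion to the parameter count."""
    return num_parameters / 1000 * USD_PER_1000_PARAMETERS


# Parameter counts are assumptions for illustration, not taken from the report.
examples = {
    "110M-parameter model": 110_000_000,
    "175B-parameter model": 175_000_000_000,
}

for name, num_parameters in examples.items():
    print(f"{name}: ~${estimated_training_cost_usd(num_parameters):,.0f}")
```

Read literally, the rule gives roughly $110,000 for the smaller model and $175 million for one with 175 billion parameters, the size OpenAI reported for GPT3. That is well above the $10 million budget experts suggest, so the rule is probably best treated as an order-of-magnitude heuristic rather than a precise price tag.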
PolyAI is a London-based company active in voice assistants. They produced and open sourced a conversational AI model (technically, a pre-trained contextual re-ranker based on transformers) that outperforms Google’s BERT model in conversational applications. PolyAI’s model not only performs much better than Google’s, but it also required a fraction of the parameters to train, which means a fraction of the cost.
The obvious question is: How did PolyAI do it? The answer can also be an inspiration to others. Benaich noted that the task of detecting intent and understanding what someone on the phone is trying to accomplish by calling is solved in a much better way by treating this problem as what is called a contextual ranking problem:
“That is, given a kind of menu of potential options that a caller might be trying to accomplish, based on our understanding of this domain, we can design a more appropriate model that can do a better job of learning customer intent from data than just trying to take a general purpose model, in this case BERT.
BERT can do OK in various conversational applications, but it just does not have the kind of engineering guardrails or engineering nuances that can make it robust in a real world domain. In order to make models work in production, you actually need to do more engineering than research. And almost by definition, engineering is not interesting to most researchers.”
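To make the idea of a contextual ranking problem concrete, here is a minimal sketch of ranking a fixed menu of intents against a caller's utterance. It is a toy illustration and not PolyAI's model: simple bag-of-words cosine similarity stands in for a learned transformer encoder, and the menu entries, intent names and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_intents(utterance: str, menu: dict[str, str]) -> list[tuple[str, float]]:
    """Score every intent on the menu against the utterance, best match first."""
    query = bag_of_words(utterance)
    scored = [(intent, cosine(query, bag_of_words(description)))
              for intent, description in menu.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical domain menu: the options a caller to a restaurant might want.
MENU = {
    "book_table": "book reserve a table for dinner",
    "opening_hours": "what time do you open and close today",
    "cancel_booking": "cancel my existing booking or reservation",
}

ranking = rank_intents("I would like to reserve a table for two", MENU)
print(ranking[0][0])  # top-ranked intent
```

The design point is that the model never generates free text: it only orders a closed set of domain-specific options, which is where the domain knowledge and the guardrails come in.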
Long story short: You know your domain better than anyone else. If you can document and make use of this knowledge, and have the necessary engineering rigor, you can do more with less. This points once again to the topic of using domain knowledge in AI, which is what critics of the brute force approach, also known as the “scaling hypothesis”, emphasize.
What the proponents of the scaling hypothesis seem to think, simply put, is that intelligence is an emergent phenomenon related to scale. Therefore, if at some point models like GPT3 get big enough and complex enough, the holy grail of AI, and maybe of science and technology in general, artificial general intelligence (AGI), can be achieved.
On the way to general AI?
How to make progress in AI, and the topic of AGI, is at least as much about philosophy as it is about science and technology. Benaich and Hogarth approach it in a holistic way, prompted by criticism of models like GPT3. The most prominent critic of approaches like GPT3 is Gary Marcus. Marcus has been consistent in his critique of the models that preceded GPT3, as the “brute force” approach does not appear to change regardless of scale.
Benaich referred to Marcus’ critique and summarized it. GPT3 is an amazing language model that can take a prompt and output a sequence of text that is readable and understandable and in many cases relevant to what the prompt was. What’s more, we must add, GPT3 can even be applied to other domains, such as writing software code, which is a topic in itself.
However, there are many examples where GPT3 goes off track, either in a way that expresses bias or by simply producing irrelevant results. An interesting point is how we can measure the performance of models like GPT3. Benaich and Hogarth note in their report that existing benchmarks for NLP, such as GLUE and SuperGLUE, are now being beaten by language models.
These benchmarks are intended to compare the performance of AI language models against humans in a range of tasks that span logic, common sense understanding and lexical semantics. A year ago, the human baseline in GLUE was beaten by one point. Today, GLUE is reliably beaten, and its more challenging sibling SuperGLUE is almost beaten too.
This can be interpreted in a number of ways. One way would be to say that AI language models are just as good as humans now. However, the kind of shortcomings that Marcus points out show that this is not the case. Maybe what this means is that we need a new benchmark. Researchers from Berkeley have published a new benchmark, which attempts to capture some of these issues across different tasks.
Benaich noted that an interesting extension of what GPT3 could do relates to the discussion around PolyAI. It is the aspect of injecting some kind of toggle into the model that allows it to have some guardrails, or at least to tune what kind of output it can create from a given input. There are various ways you could possibly do this, he went on to add.
Earlier, the use of knowledge bases and knowledge graphs was discussed. Benaich also mentioned a kind of learned intent variable that could be used to inject this sort of control over a more general-purpose generator. Benaich believes that the critical view is certainly valid to some extent and points to what models like GPT3 would need in order to be useful in production environments.
Causality, the next frontier in AI
Hogarth, for his part, noted that Marcus is “almost a professional critic of organizations like DeepMind and OpenAI.” While it is very healthy to have these critical perspectives when there is a ruthless hype cycle around some of this work, he went on to add, OpenAI has one of the more thoughtful approaches to policy around this.
Hogarth emphasized the underlying difference in philosophy between proponents and critics of the scaling hypothesis. However, he went on to add, if the critics are wrong, then we may have a very smart but not very well-adjusted AGI on our hands, as evidenced by some of these early cases of bias when scaling these models:
“So I think it’s up to organizations like OpenAI, if they want to follow this approach, to tell us all how to do it safely, because it’s not yet clear from their research agenda. How do you marry AI safety with this kind of ‘throw more data and compute at the problem and AGI will emerge’ approach?”
This discussion touched on another part of the State of AI Report 2020. Some researchers, Benaich and Hogarth noted, feel that progress in mature areas of machine learning is stagnant. Others call for advancing causal reasoning, arguing that adding this element to machine learning methods could overcome barriers.
Hogarth said causality is arguably at the core of much of human progress. From an epistemological perspective, causal reasoning has given us the scientific method, and it is at the heart of all our best world models. So the work that people like Judea Pearl have been pioneering in bringing causality to machine learning is exciting. It feels like the biggest potential disruption of the general trend towards larger and larger correlation-driven models:
“I think if you can crack causality, you can start building a pretty powerful scaffolding of knowledge upon knowledge and get machines to really contribute to our own knowledge bases and scientific processes. So I think it’s very exciting. There is a reason why some of the smartest people in machine learning spend weekends and evenings working on it.
But I think it is still in its infancy as an area of attention for the commercial community. We really only found one or two examples of it being used in the wild in our report this year: one by Faculty, a London-based machine learning company, and one by Babylon Health.”
If you thought that was enough groundbreaking AI research and applications for one report, you would be wrong. The State of AI Report 2020 is a wealth of references, and we will revisit it soon with more insights from Benaich and Hogarth.