I work at (and on) Robin Labs, where I have the good fortune to be developing our Robin assistant and to see it converse with - and fulfill tasks for - some 2 million users. This work has been incredibly rewarding, and it feels important: I strongly believe that, by making technology accessible via a more natural interface, we can help people remain more human. In the process, I’ve come to appreciate the great power of the conversational-agent medium and the extent to which users are hungry for technology that can understand them naturally. But for all that potential, and despite recent progress in machine learning, our progress in this field is still hampered by the relative crudeness of our tools. It is obvious that Natural Language Understanding is in its infancy and there is much work to be done. Our goal, therefore, is to keep making impactful contributions to the field, bridging the human-machine gap.

The way to advance is through research, but I believe that Language and Dialogue call specifically for product-first research, i.e., rapid iteration of hypotheses in a real product setting, with live users. To product managers, that may sound like a risky proposition, but without risk, there is no innovation.

This product-first perspective makes research priorities clear. We now have some decent tools for text classification, but these are nowhere near sufficient to create agents that can communicate intelligently. Here are some areas that are particularly in need of disruption right now:

Better tools to benchmark conversational agents

Quality of dialogue is notoriously difficult to measure, and the Turing test is deeply flawed. The lack of adequate tools is hampering the entire field’s progress. One idea is to offer an open “chatbot playground” to the community, where agents are exposed to live users and receive relative scores in terms of retention and engagement. See a more detailed proposal here.
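To make the "relative scores" idea concrete, one simple option is an Elo-style rating computed from head-to-head engagement comparisons between agents in the playground. This is only a sketch under assumed conditions: the agent names and win rates below are invented, and in practice the "win" signal would come from real retention/engagement data rather than a simulator.

```python
import random

def elo_update(rating_a, rating_b, a_won, k=32):
    """Standard Elo update: shift both ratings toward the observed outcome."""
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Toy playground with three hypothetical agents; "bot_a" wins an
# engagement comparison 70% of the time against either rival.
random.seed(0)
ratings = {"bot_a": 1500.0, "bot_b": 1500.0, "bot_c": 1500.0}
for _ in range(2000):
    a, b = random.sample(list(ratings), 2)
    if a == "bot_a":
        p_a_wins = 0.7
    elif b == "bot_a":
        p_a_wins = 0.3
    else:
        p_a_wins = 0.5
    a_won = random.random() < p_a_wins
    ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], a_won)
```

The appeal of a relative scheme like this is that it sidesteps the need for an absolute quality metric: agents only ever need to be compared against each other on the same live traffic.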

Better dialogue models

Strong language models are necessary but not sufficient for the creation of powerful dialogue models (intuition: dialogue is a protocol for communicating information, so it is not just a statistical problem). For instance, sequence-to-sequence networks have been shown to work well for machine translation, but not for dialogue. In my view, a more promising approach is to learn higher-level, language-independent discourse models [1] [2] (again, from real interactions with real users), which can also be combined with language models in differentiable end-to-end architectures. Specifically, since some discriminative discourse models are already available, we may be able to train rich generative dialogue models using GANs (Generative Adversarial Networks) and/or Reinforcement Learning.
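As a toy illustration of what "language-independent" means here, consider a minimal discourse model in the spirit of dialogue-act modeling [1]: a bigram model over dialogue acts that never sees any words at all. The annotated conversations below are invented for the example; a real model would be trained on labeled user interactions and be far richer than bigrams.

```python
from collections import Counter, defaultdict

# Hypothetical dialogue-act-annotated conversations (acts only, no words:
# the model is deliberately language-independent).
dialogues = [
    ["greeting", "greeting", "question", "answer", "thanks", "goodbye"],
    ["greeting", "question", "answer", "question", "answer", "goodbye"],
    ["question", "answer", "thanks", "goodbye"],
]

# Estimate P(next act | current act) from bigram counts.
transitions = defaultdict(Counter)
for acts in dialogues:
    for cur, nxt in zip(acts, acts[1:]):
        transitions[cur][nxt] += 1

def predict_next_act(act):
    """Most likely dialogue act to follow `act` under the learned model."""
    return transitions[act].most_common(1)[0][0]
```

Even this trivial model captures discourse structure (a question is followed by an answer, thanks by a goodbye) with zero dependence on the surface language, which is exactly the layer that could be combined with a language model for generation.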

Reinforcement Learning (RL) of dialogue

Rather than learning from prior conversations, a potentially more potent paradigm is to learn while conversing with users. Beyond the computational aspects, this approach also requires devising a user experience in which users are incentivized to make the agent understand them, and the agent is rewarded when it succeeds. In other words, dialogue RL involves machine learning and product challenges that may seem quite daunting. Still, it is well worth the effort: imagine talking to a bot that actually becomes smarter in the course of - and thanks to - the conversation! In summary, RL has the potential to radically transform language and dialogue learning with the users’ help, while keeping them motivated and rewarded in the process!
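In its simplest form, "learning while conversing" can be framed as a bandit problem: for a given user intent, the agent tries different response strategies and keeps whatever real users reward. Everything concrete below is an assumption for illustration - the strategy names, their success rates, and the simulated user are all hypothetical; in production the reward would come from live signals such as a completed task or a turn that needed no rephrasing.

```python
import random

# Hypothetical response strategies for one user intent, with the
# (unknown to the agent) probability that a user rewards each one.
STRATEGIES = ["clarify", "guess", "confirm_then_act"]
TRUE_SUCCESS_RATE = {"clarify": 0.3, "guess": 0.2, "confirm_then_act": 0.8}

def run_bandit(steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: learn reward estimates from simulated users."""
    rng = random.Random(seed)
    counts = {s: 0 for s in STRATEGIES}
    values = {s: 0.0 for s in STRATEGIES}  # running mean reward per strategy
    for _ in range(steps):
        # Mostly exploit the best-known strategy, occasionally explore.
        if rng.random() < epsilon:
            s = rng.choice(STRATEGIES)
        else:
            s = max(values, key=values.get)
        reward = 1.0 if rng.random() < TRUE_SUCCESS_RATE[s] else 0.0
        counts[s] += 1
        values[s] += (reward - values[s]) / counts[s]  # incremental mean
    return values
```

The agent literally gets smarter over the course of the interaction: after enough turns, its reward estimates single out the strategy users actually respond to. Real dialogue RL is of course harder - state, credit assignment over turns, safety of exploration - but the feedback loop is the same.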

Now that the goals are clear, let’s get to work!


1. Stolcke, A., Ries, K., Coccaro, N., Shriberg, E., Bates, R., Jurafsky, D., Taylor, P., Martin, R., et al. (2000). "Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech." Computational Linguistics, 26(3), 339. doi:10.1162/089120100561737
2. Galitsky, B. A., & Kuznetsov, S. O. (2008). "Learning communicative actions of conflicting human agents." Journal of Experimental & Theoretical Artificial Intelligence, 20(4), 277-317.