Update: I joined the Amazon Alexa team in 2013
I am a machine learning manager and individual contributor at Amazon, leading a team working on conversational AI for Alexa. I am hiring software engineers and applied scientists in San Francisco. Previously I was a research associate at the Cambridge University Machine Intelligence Lab, in the Dialogue Systems Group headed by Prof. Steve Young.
I was part of the EU CLASSiC project (Computational Learning in Adaptive Systems for Spoken Conversation), which focuses on statistical methods for data-driven semantic parsing, dialogue management and natural language generation.
I completed my Ph.D. thesis in 2008 under the supervision of Prof. Marilyn Walker, at the Computer Science Department of the University of Sheffield, United Kingdom. I obtained a Master of
Engineering and Computer Science in 2004 from the Université Catholique de
Louvain in Belgium.
I have been working on statistical methods for natural language understanding, natural language generation and opinion mining. These problems require learning structured prediction models from a large amount of annotated data. I have been especially interested in crowdsourcing for collecting data, in order to model the wide range of speaking styles found in natural language.
Research Interests:
- Scalable and maintainable conversational AI
- Domain expansion for spoken language understanding in the absence of data
- Deep learning to generate complex utterance responses
- Expressive language generation and text-to-speech synthesis
- Learning to detect mood, emotion and personality for user modelling
Journal articles (Google Scholar):
- François Mairesse and Steve Young. Stochastic Language Generation in Dialogue Using Factored Language Models. Computational Linguistics, 40(4), 2014.
- François Mairesse and Marilyn Walker. Controlling User Perceptions of Linguistic Style: Trainable Generation of Personality Traits. Computational Linguistics, 37(3), 2011.
- Kai Yu, Heiga Zen, François Mairesse and Steve Young. Context Adaptive Training with Factorized Decision Trees for HMM-based Statistical Parametric Speech Synthesis. Speech Communication, 53(6), pages 914-923, 2011.
- François Mairesse and Marilyn Walker. Towards Personality-Based User Adaptation: Psychologically Informed Stylistic Language Generation. User Modeling and User-Adapted Interaction, 20(3), pages 227-278, 2010.
- Steve Young, Milica Gasic, Simon Keizer, François Mairesse, Jost Schatzmann, Blaise Thomson and Kai Yu. The Hidden Information State Model: a practical framework for POMDP-based spoken dialogue management. Computer Speech and Language, 24(2), pages 150-174, 2010.
- François Mairesse, Marilyn Walker, Matthias Mehl and Roger Moore. Using Linguistic Cues for the Automatic Recognition of Personality in Conversation and Text. Journal of Artificial Intelligence Research (JAIR), 30, pages 457-500, 2007.
- Marilyn Walker, Amanda Stent, François Mairesse and Rashmi Prasad. Individual and Domain Adaptation in Sentence Planning for Dialogue. Journal of Artificial Intelligence Research (JAIR), 30, pages 413-456, 2007.
Peer-reviewed publications at international conferences:
- F. Mairesse, P. Raccuglia and S. Vitaladevuni. Search-based Evaluation from Truth Transcripts for Voice Search Applications. In Proceedings
of SIGIR, Pisa, July 2016.
- F. Mairesse, J. Polifroni and G. Di Fabbrizio. Can Prosody Inform Sentiment
Analysis? Experiments on Short Spoken Reviews. In Proceedings
of ICASSP, Kyoto, March 2012.
- J. Polifroni and F. Mairesse. Using Latent Topic Features
for Named Entity Extraction in Search Queries. In Proceedings
of Interspeech, Florence, August 2011.
- F. Jurcicek, S. Keizer, M. Gasic, F. Mairesse, B. Thomson,
K. Yu, and S. Young. Real User
Evaluation of Spoken Dialogue Systems using Amazon Mechanical
Turk. In Proceedings of Interspeech, Florence, August 2011.
- F. Mairesse, M. Gasic, F. Jurcicek, S. Keizer, B. Thomson,
K. Yu and S. Young. Phrase-based Statistical
Language Generation using Graphical Models and Active Learning. In Proceedings of the 48th Annual Meeting of the Association for
Computational Linguistics (ACL), Uppsala, July 2010.
- F. Lefevre, F. Mairesse and
S. Young. Cross-Lingual Spoken Language Understanding from Unaligned Data
using Discriminative Classification Models and Machine
Translation. In Proceedings of Interspeech, Makuhari, September 2010.
- K. Yu, H. Zen, F. Mairesse and S. Young. Context adaptive
training with factorized decision trees for HMM-based speech
synthesis (best paper). In Proceedings of Interspeech, Makuhari, September 2010.
- K. Yu, F. Mairesse and S. Young. Word-level
Emphasis Modelling in HMM-based Speech Synthesis. In Proceedings of
ICASSP, Dallas, 2010.
- F. Mairesse, M. Gasic, F. Jurcicek, S. Keizer, B. Thomson, K. Yu and
S. Young. Spoken Language Understanding from Unaligned Data using
Discriminative Classification Models. In Proceedings of ICASSP, Taipei, 2009.
- S. Keizer, M. Gasic, F. Mairesse, B. Thomson, K. Yu and
S. Young. Modelling user behaviour in the
HIS-POMDP dialogue manager. In Proceedings of SLT, Goa, 2008.
- François Mairesse and Marilyn Walker. Trainable Generation of Big-Five Personality Styles through Data-driven Parameter Estimation. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL), Columbus, June 2008.
- M. Gasic, S. Keizer, F. Mairesse, J. Schatzmann, B. Thomson, K. Yu and
S. Young.
Training and Evaluation of the HIS POMDP Dialogue
System in Noise. In Proceedings of SIGDial, Columbus, 2008.
- B. Thomson, K. Yu, M. Gasic, S. Keizer, F. Mairesse, J. Schatzmann and
S. Young. Evaluating semantic-level confidence scores with multiple
hypotheses. In Proceedings of Interspeech, Brisbane, 2008.
- B. Thomson, M. Gasic, S. Keizer, F. Mairesse, J. Schatzmann, K. Yu and
S. Young.
User study of the Bayesian Update of Dialogue State
approach to dialogue management. In Proceedings of Interspeech, Brisbane, 2008.
- François Mairesse and Marilyn Walker. A Personality-based Framework for Utterance Generation in Dialogue Applications. In Proceedings of the AAAI Spring Symposium on Emotion, Personality, and Social Behavior, Palo Alto, March 2008.
- François Mairesse and Marilyn Walker. PERSONAGE: Personality Generation for Dialogue. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), Prague, June 2007.
- François Mairesse and Marilyn Walker. Words Mark the Nerds: Computational Models of Personality Recognition through Language. In Proceedings of the 28th Annual Conference of the Cognitive Science Society (CogSci 2006), pages 543-548, Vancouver, July 2006.
- François Mairesse and Marilyn Walker. Automatic Recognition of Personality in Conversation. In Proceedings of HLT-NAACL 2006, New York City, June 2006.
- Emma Barker, Ryuichiro Higashinaka, François Mairesse, Robert Gaizauskas, Marilyn Walker and Jonathan Foster. Simulating Cub Reporter Dialogues: The collection of naturalistic human-human dialogues for information access to text archives. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2006), Genoa, May 2006.
- François Mairesse and Marilyn Walker. Learning to Personalize Spoken Generation for Dialogue Systems. In Proceedings of Interspeech'2005 - Eurospeech: 9th European Conference on Speech Communication and Technology, pages 1881-1884, Lisbon, September 2005.
- François Mairesse and Marilyn Walker. Generating Individualized Utterances for Dialogue Systems. In Proceedings of the Symposium on Dialogue Modelling and Generation (as part of the Annual Meeting of the Society for Text & Discourse), Amsterdam, July 2005.
Online demos:
- CamInfo: The Cambridge
Tourist Information Dialogue System (requires a microphone)
This Java applet is an interface to our group's live dialogue system,
which provides information
about most places in Cambridge, including pubs, restaurants, colleges, museums,
etc. The system can also be called on +44 1223 852 453. The system implements the Hidden Information State (HIS) framework, i.e. it relies
on partially observable Markov decision processes (POMDPs) to reason over multiple hypotheses
about the user input, which are provided by the ATK speech recogniser. Some functionalities of Personage are used for language
generation (e.g., syntactic aggregation, WordNet synonym
selection). The speech synthesiser is an HTS voice trained on
emphasis-dependent context features using the two-pass context clustering
- Automatic personality recognition
What does your language reveal
about you? The personality recognition models can estimate your scores along
the 5 main personality dimensions based on your input text. Models are detailed in this paper.
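As a rough illustration of what "reasoning over multiple hypotheses" means in the CamInfo description above, here is a minimal Python sketch of a Bayesian belief update over an N-best list of recognition hypotheses. It is not the HIS model itself: the goal names, the `update_belief` function, the fixed `error_rate` and the confidence scores are all hypothetical, and the user goal is assumed to stay the same between turns.

```python
from typing import Dict, List, Tuple


def update_belief(belief: Dict[str, float],
                  nbest: List[Tuple[str, float]],
                  error_rate: float = 0.2) -> Dict[str, float]:
    """Return the posterior over user goals after one N-best observation.

    belief:     prior probability of each candidate user goal (at least two goals)
    nbest:      (hypothesised goal, recogniser confidence) pairs for this turn
    error_rate: assumed probability that a hypothesis does not match the true goal
    """
    goals = list(belief)
    n_goals = len(goals)
    unnormalised = {}
    for goal in goals:
        # Likelihood of the whole N-best list given this goal: each hypothesis
        # contributes its confidence, weighted by how well it matches the goal.
        likelihood = 0.0
        for hypothesis, confidence in nbest:
            if hypothesis == goal:
                match = 1.0 - error_rate
            else:
                match = error_rate / (n_goals - 1)
            likelihood += confidence * match
        unnormalised[goal] = belief[goal] * likelihood
    total = sum(unnormalised.values())
    return {goal: value / total for goal, value in unnormalised.items()}


# Hypothetical venue types; the prior is uniform before the user speaks.
goals = ["pub", "restaurant", "museum"]
prior = {goal: 1.0 / len(goals) for goal in goals}

# Turn 1: the recogniser favours "pub"; turn 2: it slightly favours "restaurant".
belief_turn1 = update_belief(prior, [("pub", 0.6), ("restaurant", 0.3)])
belief_turn2 = update_belief(belief_turn1, [("restaurant", 0.5), ("pub", 0.4)])
print(belief_turn1)
print(belief_turn2)
```

Because the belief keeps probability mass on every goal, a single noisy recognition result in the second turn shifts the distribution without discarding the evidence gathered in the first.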
Datasets and software packages:
Here are various human-annotated datasets and freely available software. Feel free to use and modify them for non-commercial purposes.
- BAGEL training and evaluation data
This contains the 404 semantically aligned utterances used for training
and evaluating the BAGEL statistical language generator, together
with the naturalness and informativeness ratings of 1616 utterances
generated using different learning configurations,
i.e. using active learning and random sampling. More details
in this paper.
- Emphasis-annotated ARCTIC database
for speaker AWB
This corpus contains word-level emphasis annotations for the first 597 utterances (set A) of the
ARCTIC speech database, i.e. the words or phrases perceived as the focus of speaker
AWB's utterances.
- Personage: Language Generation with Personality
The Personage generator can produce personality-rich utterances for presenting information
in the restaurant domain, by varying the target personality scores along the Big Five traits. Personage is based on supervised machine learning models predicting generation parameters from human
personality ratings, detailed in this and this paper. The generator has been used in a wide range of publications by the UCSC NLDS Group. Download the Java stand-alone generator (65 MB, unsupported) and the Personage manual for more details.
- Personage dataset: a personality-annotated corpus
This dataset contains 580 utterances annotated with personality/stylistic ratings from human judges, for each Big Five trait. The data also includes the generation decisions made for each utterance, as well as the intermediary content plan tree, sentence plan tree and syntactic structures. Naturalness ratings are also included. This data was used for evaluating the Personage generator, as well as for training parameter estimation models (Mairesse & Walker, 2007, 2008). More details in the Personage dataset readme file.
- EAR speaker features and personality ratings
Prosodic, LIWC and MRC features extracted from each speaker of the EAR dataset, as used in the JAIR 2007 paper. One file per personality trait, for observed and self-reported ratings.
- Personality Recognizer v1.02 (no longer supported)
This Java command-line application extracts psycholinguistic features from multiple text files and runs the included models to compute personality scores for all Big Five traits.
- jMRC - MRC Psycholinguistic Database Java Interface v0.9
This Java interface allows you to query the MRC Psycholinguistic Database from your Java programs, providing psycholinguistic features for over 150,000 words.
Talks:
- How Amazon Alexa Works
  - Machine Learning Innovation Summit, San Francisco, 06/06/2017.
- The Benefits of Statistical Language Understanding and Generation in Dialogue
  - CSAIL Seminar, MIT, 26/09/2011.
- Crowdsourcing a Statistical Language Generator using Phrase-based Factored Language Models
  - TALC Seminar at LORIA, Nancy, 23/11/2010.
  - 48th Annual Meeting of the Association for Computational Linguistics (ACL), Uppsala, Sweden, 14/07/2010.
- Trainable Generation of Personality through Data-driven Parameter Estimation
  - NLIP Seminar at the Computer Laboratory, Cambridge University, 21/11/2008.
  - 46th Annual Meeting of the Association for Computational Linguistics (ACL), Columbus OH, 16/06/2008.
- Generating Language with Personality
  - SRI International's Artificial Intelligence Center, Menlo Park, 03/04/2008.
  - AAAI Spring Symposium on Emotion, Personality and Social Behavior, Stanford University, 26/03/2008.
  - Psychology Department, University of Texas at Austin, 14/11/2007.
  - Machine Intelligence Lab Seminar, Department of Engineering, Cambridge University, 22/10/2007.
  - Computing Department Seminar at the Open University, Milton Keynes.
  - 45th Annual Meeting of the Association for Computational Linguistics (ACL), Prague, 26/06/2007.
  - NLP Group Talk, Sheffield, 20/03/2007.
- Computational Models of Personality Recognition through Language
  - 28th Annual Conference of the Cognitive Science Society (CogSci 2006), Vancouver, 29/07/2006.
  - HLT-NAACL 2006 Conference, New York City, 05/06/2006.
  - NLP Group Talk, Sheffield, 16/05/2006.
- Learning Individual Adaptation in Dialogue Systems
  - Symposium on Dialogue Modelling and Generation, Amsterdam, 07/07/2005.
  - NLP Group Talk, Sheffield, 10/05/2005.