Guiding enumerative synthesis with large language models

Talk by Elizabeth Polgreen (University of Edinburgh)

Generative AI is great at generating code: it’s fast, scalable, and integrated into your IDE. But the problem is, it sometimes gives incorrect answers. Before ChatGPT, there was formal program synthesis: algorithms that combine enumeration and deductive reasoning to generate code that is guaranteed to satisfy a specification. Formal synthesis is great at solving small problems precisely; it never gives incorrect answers. But the problem is, it’s just not very scalable. So what if we could combine the two?

In this talk, I will present a novel enumerative synthesis algorithm that integrates calls to a large language model (LLM) into a weighted probabilistic search. This allows the synthesizer to provide the LLM with information about the progress of the enumerator, and the LLM to provide the enumerator with syntactic guidance, in an iterative loop. We embed this search within CounterExample-Guided Inductive Synthesis (CEGIS), guaranteeing that all results are correct.
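
To make the shape of this loop concrete, the sketch below (in Python-like pseudocode) shows one plausible way the pieces could fit together. The helper names llm_suggest_weights, sample_program, satisfies, and verify are placeholders for illustration only; they are not the interfaces of the actual tool.

    def llm_guided_cegis(spec, grammar, max_iterations=10000):
        counterexamples = []
        # Ask the LLM for initial weights over the grammar's production rules.
        weights = llm_suggest_weights(spec, grammar, counterexamples)
        for i in range(max_iterations):
            # Weighted probabilistic enumeration: sample a candidate program
            # from the grammar according to the current weights.
            candidate = sample_program(grammar, weights)
            # Cheap check against the counterexamples gathered so far.
            if all(satisfies(candidate, cex) for cex in counterexamples):
                cex = verify(candidate, spec)   # full verification call
                if cex is None:
                    return candidate            # correct on the whole specification
                counterexamples.append(cex)
            # Periodically feed the enumerator's progress back to the LLM
            # and refresh the syntactic guidance.
            if i % 500 == 499:
                weights = llm_suggest_weights(spec, grammar, counterexamples)
        return None

Because every returned candidate must pass the verification step, any answer the loop produces satisfies the specification, which is what gives the CEGIS-based approach its correctness guarantee.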

An evaluation on benchmarks from the Syntax-Guided Synthesis (SyGuS) competition shows that state-of-the-art formal synthesis algorithms easily outperform GPT-3.5 as a stand-alone tool for formal synthesis. However, our approach, which integrates the LLM into an enumerative synthesis algorithm, shows significant performance gains over the LLM alone, the enumerative synthesizer alone, and the winning SyGuS competition tool.