Parsing as a lifting problem and the Chomsky-Schützenberger representation theorem - École polytechnique Accéder directement au contenu
Communication Dans Un Congrès Année : 2023

Parsing as a lifting problem and the Chomsky-Schützenberger representation theorem

Résumé

Building on our work on type refinement systems, we continue developing the thesis that many kinds of deductive systems may be usefully modelled as functors and derivability as a lifting problem, focusing in this work on derivability in context-free grammars. We begin by explaining how derivations in any context-free grammar may be naturally encoded by a functor of operads from a freely generated operad into a certain "operad of spliced words". This motivates the introduction of a more general notion of context-free grammar over any category, defined as a finite species S equipped with a color denoting the start symbol and a functor of operads p : Free S → W[C] into the operad of spliced arrows in C, generating a context-free language of arrows. We show that many standard properties of context-free grammars can be formulated within this framework, thereby admitting simpler analysis, and that usual closure properties of context-free languages generalize to context-free languages of arrows. One advantage of considering parsing as a lifting problem is that it enables a dual fibrational perspective on the functor p via the notion of displayed operad, corresponding to a lax functor of operads W[C] → Span(Set). We show that displayed free operads admit an explicit inductive definition, using this to give a reconstruction of Leermakers' generalization of the CYK parsing algorithm. We then turn to the Chomsky-Schützenberger Representation Theorem. We start by explaining that a non-deterministic finite state automaton over words, or more generally over arrows of a category, can be seen as a category Q equipped with a pair of objects denoting initial and accepting states and a functor of categories Q → C satisfying the unique lifting of factorizations (ULF) property and the finite fiber property, recognizing a regular language of arrows. Then, we explain how to extend this notion of automaton to functors of operads, which generalize tree automata, allowing us to lift an automaton over a category to an automaton over its operad of spliced arrows. We show that every context-free grammar over a category can be pulled back along a non-deterministic finite state automaton over the same category, and hence that context-free languages are closed under intersection with regular languages. The last and important ingredient is the identification of a left adjoint C[−] : Operad → Cat to the operad of spliced arrows functor W[−] : Cat → Operad. This construction builds the contour category C[O] of any operad O, whose arrows have a geometric interpretation as "oriented contours" of operations. A direct consequence of the contour / splicing adjunction is that every pointed finite species induces a universal context-free grammar, generating a language of tree contour words. Finally, we prove a generalization of the Chomsky-Schützenberger Representation Theorem, establishing that any context-free language of arrows over a category C is the functorial image of the intersection of a C-chromatic tree contour language and a regular language.
Fichier principal
Vignette du fichier
paper-mfps.pdf (290.4 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)

Dates et versions

hal-03702762 , version 1 (23-06-2022)

Identifiants

Citer

Paul-André Melliès, Noam Zeilberger. Parsing as a lifting problem and the Chomsky-Schützenberger representation theorem. MFPS 2022 - 38th conference on Mathematical Foundations for Programming Semantics, Jul 2022, Ithaca, NY, United States. ⟨10.46298/entics.10508⟩. ⟨hal-03702762⟩
561 Consultations
517 Téléchargements

Altmetric

Partager

Gmail Facebook X LinkedIn More