||Parsing with unification grammars is inefficient due to the expressive power of the grammars. Most unification-based parsing algorithms are extensions of context-free (CF) parsing algorithms, and few have been specially designed for unification-style grammars. |
We have developed an efficient parsing algorithm for unification grammars which takes full advantage of the expressiveness of the grammar. Our algorithm (called LC) is a variation of Left-corner parsing, and it exhibits significantly improved average-case performance as compared with previous unification-based parsers. Efficiency of our LC algorithm comes from two factors. First is the representation and architecture of LINK. LINK is a syntax-semantics integrated unification-based system which dynamically combines syntax (grammar) and semantics (domain knowledge). And LINK utilizes all available information at any given point during parsing. Second is the expectation-based Left-corner parsing strategy. By utilizing expectations, the algorithm can eliminate unsuccessful parses which will not fit the left-context (previous word(s) in the sentence).
The central focus of this thesis is the formalization and the proof of correctness of the LC algorithm. To do so, we specify the algorithm using the constraint-based grammar formalism presented in (Shieber, 1992). In the formulation, the LC algorithm is characterized as an optimization of the abstract parsing algorithm developed in (Shieber, 1992). Then by using Shieber's proof of correctness of his algorithm, we prove the correctness of our LC algorithm by reducing LC to his algorithm.
In formulating the proof of correctness, we discovered a difficulty in Shieber's as well as the LC algorithm, in which, for certain grammars, the algorithms may spuriously create nonminimal derivations in addition to the minimal ones. As it turns out, the nonminimal derivation problem raises important issues concerning some of the basic notions in unification grammar and unification-based parsing. We discuss this nonminimal derivation problem in depth, including the sources and the possible solutions.
Finally, we present the empirical result obtained from running LINK on a corpus of example sentences taken from real-world texts. The results indicate that, for the limited domain texts, LINK achieved a linear time average-case performance. This is a marked improvement over other unification-based parsing algorithms.