1 What is Parse?
2 Different types of parsers
3 Parser generator
4 Parsers require correctly structured input
A parser analyzes a given text or source code with the help of a lexer. The lexer (also called a lexical scanner) reads the input and breaks it down into tokens. Tokens are the smallest units that the parser understands: character strings or input symbols to which the formal grammar assigns a type. For example, the string 123 is recognized as a token of type number. The parser then performs its core task: it checks the syntax of the token sequence and builds a structure from it, typically represented as a parse tree. This structure is the basis for further processing of the data.
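The lexing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token types (NUMBER, IDENT, OP) and the `tokenize` function are hypothetical names for a tiny arithmetic language, not part of any particular parser.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    type: str
    text: str


# Hypothetical token types for a tiny arithmetic language (an assumption).
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[+\-*/()=]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> List[Token]:
    """Break the input into typed tokens; whitespace is dropped."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            # lastgroup is the name of the alternative that matched, i.e. the token type.
            tokens.append(Token(match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("x = 123 + 4"):
    print(token.type, token.text)
```

Here the string 123 comes out as a single token of type NUMBER, which is the kind of typed unit a parser would then consume.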
Different types of parsers
There are two main types of parsers: top-down parsers and bottom-up parsers. They differ in where they start and end when building the syntax tree.
Top-down parsers: Top-down parsers (e.g. LL and LF parsers) derive the input from the start symbol down to the individual tokens. The analysis runs from the source text as a whole to the functions and expressions it contains, and finally to the tokens within them.
Bottom-up parsers: Bottom-up parsers (e.g. the various LR parsers) begin with the tokens, i.e. the leaves of the tree. By reducing groups of tokens, the parser works its way up to larger constructs such as expressions and functions until it reaches the start symbol, which signals that the input has been completely analyzed.
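To make the top-down direction concrete, here is a minimal sketch of a recursive-descent parser, one common hand-written form of top-down parsing. The grammar, the `Parser` class, and the nested-tuple tree format are assumptions chosen for illustration; the tree is a simplified syntax tree rather than a full parse tree.

```python
import re

# Assumed grammar (start symbol: expr):
#   expr   -> term (("+" | "-") term)*
#   term   -> factor (("*" | "/") factor)*
#   factor -> NUMBER | "(" expr ")"

NUMBER = re.compile(r"[0-9]+")


def tokenize(source):
    # Simplified lexer: runs of digits, or any single non-space character.
    return re.findall(r"[0-9]+|\S", source)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        # Start at the start symbol and work down towards the tokens.
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token is not None and NUMBER.fullmatch(token):
            return ("num", int(token))
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


def parse(source):
    return Parser(tokenize(source)).parse()


print(parse("1 + 2 * 3"))
```

Each grammar rule becomes one method, and the call chain expr, term, factor mirrors the derivation from the start symbol to the tokens. A bottom-up parser would instead shift tokens onto a stack and reduce them to larger constructs until only the start symbol remains.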
Parser generator
A parser generator automatically creates an efficient parser from a given formal grammar. Similarly, scanner generators produce a lexical scanner from a formal description of the tokens. These tools are used in compiler construction; full-fledged compiler generators, by contrast, are still considered experimental.
Parsers require correctly structured input
A parser usually relies on its input complying with a certain syntax. For example, instructions must follow a standardized format in order to be recognized correctly. XML is a widely used markup language for structuring information hierarchically. The format can be read directly by humans, and by machines using an XML parser. However, the XML parser only works if the document is well-formed. If an unexpected character appears at the very beginning, for instance, the syntactic analysis of the entire document can fail. Most parsers report where they encounter incorrect syntax. This helps with troubleshooting and also helps avoid many errors during development.
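The behavior described above can be observed with the XML parser in Python's standard library, `xml.etree.ElementTree`. The helper `check_xml` and the sample documents are hypothetical and serve only as a sketch; `ET.fromstring` and `ET.ParseError` (with its `position` attribute) are the actual standard-library names.

```python
import xml.etree.ElementTree as ET


def check_xml(text):
    """Return a short report on whether the text is well-formed XML."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # The parser reports where the syntax error was found.
        line, column = err.position
        return f"syntax error at line {line}, column {column}: {err}"
    return f"well-formed, root element <{root.tag}>"


print(check_xml("<note><to>Ada</to></note>"))   # well-formed document
print(check_xml("x<note><to>Ada</to></note>"))  # unexpected character first
print(check_xml("<note><to>Ada</note>"))        # missing closing tag
```

In the second and third calls the parser stops at the first error and does not return a partial tree, which is why a single stray character can make the whole document unreadable for the program.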