PostgreSQL Source Code (98): Customized Interaction Between lex and yacc

2023-02-02 11:27:13

1 Background I: lex %option prefix

PostgreSQL uses %option prefix="core_yy", which replaces the "yy" prefix with "core_yy" in the following names: yy_create_buffer, yy_delete_buffer, yy_flex_debug, yy_init_buffer, yy_flush_buffer, yy_load_buffer_state, yy_switch_to_buffer, yyin, yyleng, yylex, yylineno, yyout, yyrestart, yytext, yywrap, yyalloc, yyrealloc, yyfree.

So the yylex that lex provides is named core_yylex in PG.
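To make the renaming concrete, here is a minimal C sketch that models the effect of %option prefix with #define macros. This is an illustrative assumption about the mechanism, and it uses plain globals for simplicity rather than a reentrant scanner. Inside the "scanner" the code still says yylex and yytext, but the symbols that actually get defined are core_yylex and core_yytext, so outside code must use those names. The token code 1, the "SELECT" text and the function call_from_outside are made up for the example.

```c
#include <assert.h>

/* Model of %option prefix="core_yy": in-scanner names map to prefixed ones. */
#define yylex  core_yylex
#define yytext core_yytext

char *yytext;                      /* actually defines core_yytext */

int yylex(void)                    /* actually defines core_yylex */
{
    static char buf[] = "SELECT";  /* pretend we just scanned this word */
    yytext = buf;
    return 1;                      /* hypothetical token code */
}

/* Hypothetical caller outside the scanner: it must use the prefixed name. */
int call_from_outside(void)
{
    return core_yylex();
}
```

For example, call_from_outside() returns 1 and leaves core_yytext pointing at "SELECT".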

‘-PPREFIX, --prefix=PREFIX, %option prefix="PREFIX"’
changes the default ‘yy’ prefix used by flex for all globally-visible variable and function names to instead be ‘PREFIX’. For example, ‘--prefix=foo’ changes the name of yytext to footext. It also changes the name of the default output file from lex.yy.c to lex.foo.c. Here is a partial list of the names affected:

    yy_create_buffer
    yy_delete_buffer
    yy_flex_debug
    yy_init_buffer
    yy_flush_buffer
    yy_load_buffer_state
    yy_switch_to_buffer
    yyin
    yyleng
    yylex
    yylineno
    yyout
    yyrestart
    yytext
    yywrap
    yyalloc
    yyrealloc
    yyfree
(If you are using a C++ scanner, then only yywrap and yyFlexLexer are affected.) Within your scanner itself, you can still refer to the global variables and functions using either version of their name; but externally, they have the modified name.

This option lets you easily link together multiple flex programs into the same executable. Note, though, that using this option also renames yywrap(), so you now must either provide your own (appropriately-named) version of the routine for your scanner, or use %option noyywrap, as linking with ‘-lfl’ no longer provides one for you by default.

https://www.cs.virginia.edu/~cr4bd/flex-manual/Code_002dLevel-And-API-Options.html#Code_002dLevel-And-API-Options

2 Background II: yacc %name-prefix

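Both lex and yacc let you set a prefix for their built-in functions and variables, so that one program can contain several independent parsers.

PostgreSQL uses %name-prefix="base_yy", which affects yyparse, yylex, yyerror, yynerrs, yylval, yylloc, yychar and yydebug.

So the yylex function called by the yacc-generated parser is actually base_yylex.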
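The same idea on the bison side, as a hedged sketch: the macros stand in for what %name-prefix="base_yy" does, the token codes are hypothetical, and the yyparse body is a stand-in that only counts tokens instead of a generated LALR parser. The parser body keeps calling yylex(), which really lands in the hand-written base_yylex.

```c
#include <assert.h>

/* Model of %name-prefix="base_yy" for the two names that matter here. */
#define yyparse base_yyparse
#define yylex   base_yylex

static const int tokens[] = {258, 259, 0};  /* hypothetical codes; 0 = end */
static int pos;
static int ntokens;

/* Supplied by the user under the prefixed name. */
int base_yylex(void)
{
    return tokens[pos++];
}

/* Stand-in for the generated parser (defines base_yyparse). */
int yyparse(void)
{
    ntokens = 0;
    while (yylex() != 0)           /* really calls base_yylex */
        ntokens++;
    return 0;                      /* 0 = success, as in bison */
}
```

Calling base_yyparse() returns 0 and leaves ntokens at 2.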

But lex provides core_yylex while the parser calls base_yylex. How does the parser reach core_yylex? See the next section.

The renamed symbols include yyparse, yylex, yyerror, yynerrs, yylval, yylloc, yychar and yydebug. 

If you use a push parser, yypush_parse, yypull_parse, yypstate, yypstate_new and yypstate_delete will also be renamed. The renamed macros include YYSTYPE, YYLTYPE, and YYDEBUG, which is treated specifically — more about this below.

https://www.gnu.org/software/bison/manual/bison.html#Multiple-Parsers

3 yylex and yyparse

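  • yyparse (base_yyparse after renaming) is the yacc entry point; in PG it is called from raw_parser.
  • yylex is the lex entry point. The yacc side defines its own base_yylex, which calls core_yylex to get each token and its value from lex.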
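The call chain in the two bullets above can be modelled in plain C. This is a sketch under stated assumptions: every body is a stand-in, raw_parser_sketch and struct scan_state are hypothetical names, tokens come from a pre-built array instead of real scanning, and the functions pass only the yyscanner handle (the value and location pointers discussed in 3.1 are left out).

```c
#include <assert.h>

typedef void *core_yyscan_t;         /* opaque scanner handle */

struct scan_state {                  /* hypothetical scanner internals */
    const int *tokens;               /* pre-made token codes; 0 = end of input */
    int pos;
};

/* lex side: returns the next token from the state behind the handle */
static int core_yylex(core_yyscan_t yyscanner)
{
    struct scan_state *st = yyscanner;
    return st->tokens[st->pos++];
}

/* hand-written glue on the yacc side: forwards to the lex side */
static int base_yylex(core_yyscan_t yyscanner)
{
    return core_yylex(yyscanner);
}

/* stand-in for the generated parser: consumes tokens up to end of input */
static int base_yyparse(core_yyscan_t yyscanner)
{
    while (base_yylex(yyscanner) != 0)
        ;                            /* a real parser would shift/reduce here */
    return 0;                        /* 0 = success, as in bison */
}

/* simplified stand-in for raw_parser(): set up scanner state, then parse */
static int raw_parser_sketch(const int *tokens)
{
    struct scan_state st = {tokens, 0};

    if (base_yyparse(&st) != 0)
        return -1;
    return st.pos - 1;               /* tokens consumed, excluding the end marker */
}
```

For example, raw_parser_sketch((const int[]){258, 259, 260, 0}) returns 3. The point is the shape: the scanner state is created once and passed down as an opaque handle, and each layer simply hands it on, which is what %parse-param and %lex-param automate.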

3.1 Parameters of yylex and yyparse

First, note a few important settings in gram.y:

gram.y

%pure-parser
%name-prefix="base_yy"
%locations

%parse-param {core_yyscan_t yyscanner}
%lex-param   {core_yyscan_t yyscanner}

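How these settings change the two function signatures (each row adds one option on top of the rows above it):

| Setting | yylex | yyparse |
| --- | --- | --- |
| Default (no options) | int yylex() | int yyparse () |
| After %parse-param / %lex-param | int yylex(core_yyscan_t yyscanner) | int yyparse (core_yyscan_t yyscanner) |
| After %pure-parser | int yylex(YYSTYPE *lvalp, core_yyscan_t yyscanner) | int yyparse (core_yyscan_t yyscanner) |
| After %locations | int yylex(YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner) | int yyparse (core_yyscan_t yyscanner) |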

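What is YYSTYPE: a token's semantic value is stored in yylval, whose type is YYSTYPE. What is YYLTYPE: a token's location is stored in yylloc, whose type is YYLTYPE. In a non-pure parser yylval and yylloc are global variables; with %pure-parser they become local variables of yyparse, which is why yylex receives them as the pointers lvalp and llocp.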
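A minimal sketch of that pure-parser calling convention, with simplified, assumed types (YYSTYPE holds only an int, YYLTYPE is just a token index). struct scan_state and the toy "sum the token values" grammar are hypothetical. Because yylval and yylloc live in yyparse's stack frame, two parses on separate states do not interfere.

```c
#include <assert.h>

/* Simplified types, assumed for illustration only. */
typedef union { int ival; } YYSTYPE;
typedef int YYLTYPE;
typedef void *core_yyscan_t;

/* Hypothetical scanner/parse state reached through the handle. */
struct scan_state {
    const int *vals;     /* token values; 0 marks end of input */
    int pos;             /* index of the next token */
    int sum;             /* "parse result" of this toy grammar */
    YYLTYPE last_loc;    /* location of the last token seen */
};

/* Pure + locations shape: value and location come back via pointers. */
static int yylex(YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner)
{
    struct scan_state *st = yyscanner;
    int v = st->vals[st->pos];

    if (v == 0)
        return 0;                    /* end of input */
    lvalp->ival = v;                 /* token value */
    *llocp = st->pos;                /* token location */
    st->pos++;
    return 1;                        /* hypothetical token code */
}

/* Stand-in for the generated yyparse: yylval/yylloc are locals, not globals. */
static int yyparse(core_yyscan_t yyscanner)
{
    struct scan_state *st = yyscanner;
    YYSTYPE yylval;
    YYLTYPE yylloc;

    st->sum = 0;
    while (yylex(&yylval, &yylloc, yyscanner) != 0) {
        st->sum += yylval.ival;      /* a real parser pushes yylval on its value stack */
        st->last_loc = yylloc;
    }
    return 0;                        /* 0 = success, as in bison */
}
```

Parsing a state over {5, 7, 0} leaves sum at 12 and last_loc at 1, regardless of any other state parsed before or after it.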

The %union declaration in gram.y is compiled into union YYSTYPE, whose first member is core_YYSTYPE core_yystype:

union YYSTYPE
{
#line 230 "gram.y"

	core_YYSTYPE core_yystype;
	/* these fields must match core_YYSTYPE: */
	int			ival;
	char	   *str;
	const char *keyword;

	char		chr;
	bool		boolean;
...
...
#line 604 "gram.h"

};
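The comment "these fields must match core_YYSTYPE" relies on union layout: every member of a union starts at offset 0, so a value the scanner writes into core_yystype.ival can be read by a grammar action as ival. Below is a small check with abridged copies of both unions; the member lists are cut down, so treat them as assumptions rather than PostgreSQL's exact headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Abridged copy of the scanner's value union (assumption). */
typedef union core_YYSTYPE {
    int         ival;
    char       *str;
    const char *keyword;
} core_YYSTYPE;

/* Abridged copy of the grammar's YYSTYPE (assumption). */
typedef union YYSTYPE {
    core_YYSTYPE core_yystype;
    /* these fields must match core_YYSTYPE: */
    int          ival;
    char        *str;
    const char  *keyword;

    char         chr;
    bool         boolean;
} YYSTYPE;

/* Write through the scanner's view, read back through the grammar's view. */
static int ival_seen_by_grammar(int v)
{
    YYSTYPE val;

    val.core_yystype.ival = v;       /* what the scanner does */
    return val.ival;                 /* what an action reading <ival> sees */
}
```

ival_seen_by_grammar(42) returns 42, because core_yystype.ival and ival occupy the same bytes.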

3.2 Call flow from yyparse into base_yylex

int
base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner)
{
  ...
  cur_token = core_yylex(&(lvalp->core_yystype), llocp, yyscanner);
  ...
}

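Here lvalp and &lvalp->core_yystype hold the same address, because YYSTYPE is a union and every member of a union starts at offset 0. core_yylex therefore writes the token value directly into the YYSTYPE that yyparse passed in.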
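A compact sketch of this forwarding, using abridged stand-in types (assumptions, not PostgreSQL's headers) and a fake core_yylex that always reports the literal 42 at a made-up location 7. The elided parts of the real base_yylex are simply omitted. The sketch shows that lvalp and &lvalp->core_yystype coincide and that the value core_yylex wrote is visible through the outer union.

```c
#include <assert.h>
#include <stddef.h>

/* Abridged stand-ins for the real types (assumption for illustration). */
typedef union { int ival; char *str; const char *keyword; } core_YYSTYPE;
typedef union {
    core_YYSTYPE core_yystype;
    int ival;                        /* lines up with core_YYSTYPE.ival */
    char *str;
    const char *keyword;
} YYSTYPE;
typedef int YYLTYPE;
typedef void *core_yyscan_t;

/* Stand-in for the flex scanner: it only knows core_YYSTYPE. */
static int core_yylex(core_YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner)
{
    (void) yyscanner;                /* unused in this sketch */
    lvalp->ival = 42;                /* pretend we scanned the literal 42 */
    *llocp = 7;                      /* hypothetical location */
    return 1;                        /* hypothetical token code */
}

/* Glue with the same shape as the excerpt above (elided parts omitted). */
static int base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner)
{
    int cur_token = core_yylex(&(lvalp->core_yystype), llocp, yyscanner);

    return cur_token;
}
```

So a grammar action that reads the token as <ival> sees 42 without any copying between the two unions.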
