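Introduction to Compilers_ 字节宝

Preface

While looking into LLVM-level optimization for iOS recently, I ran into a lot of gaps in my own knowledge, so I dove headfirst into the world of compilers. This post shares what I have learned over the past few weeks.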

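1. Environment Setup

I recommend a Tencent Cloud lightweight server; remember to choose a Linux image.

The tools you need are gcc and Python.

A freshly deployed cloud server can usually run the demo as is. If gcc is not installed,

install it with apt-get: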

sudo apt-get install build-essential

2. The Source Language: TinyC

TinyC keywords

void, int, while, if, else, return, break, continue, print

That is all: just 9 simple keywords.
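To make the keyword list concrete, here is a minimal sketch of how a lexer might tell TinyC keywords apart from ordinary identifiers. The names `TOKEN_RE` and `classify` are made up for this illustration and are not taken from the TinyC source; the sketch also treats every symbol as a single character, so `==` would come out as two tokens.

```python
import re

# The nine TinyC keywords listed above.
KEYWORDS = {"void", "int", "while", "if", "else",
            "return", "break", "continue", "print"}

# Hypothetical token pattern: a name, an integer, or any other single
# non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def classify(source):
    """Return (kind, text) pairs, tagging keywords apart from identifiers."""
    tokens = []
    for text in TOKEN_RE.findall(source):
        if text in KEYWORDS:
            kind = "keyword"
        elif text[0].isalpha() or text[0] == "_":
            kind = "identifier"
        elif text.isdigit():
            kind = "number"
        else:
            kind = "symbol"
        tokens.append((kind, text))
    return tokens


if __name__ == "__main__":
    for kind, text in classify("while (i < 10) { i = i + 1; }"):
        print(kind, text)
```

With only nine reserved words, a plain set lookup is enough; a real lexer would also need multi-character operators and string literals.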

A classic TinyC program

#include "for_gcc_build.hh" // only for gcc, TinyC will ignore it.

int main() {
    int i;
    i = 0;
    while (i < 10) {
        i = i + 1;
        if (i == 3 || i == 5) {
            continue;
        }
        if (i == 8) {
            break;
        }
        print("%d! = %d", i, factor(i));
    }
    return 0;
}

int factor(int n) {
    if (n < 2) {
        return 1;
    }
    return n * factor(n - 1);
}

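The first line of the code above, #include "for_gcc_build.hh", is there only so that the file can also be compiled with gcc; the TinyC compiler treats that line as a comment and ignores it. The source of for_gcc_build.hh is as follows: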

#include <stdio.h>
#include <string.h>
#include <stdarg.h>

void print(char *format, ...) {
    va_list args;
    va_start(args, format);
    vprintf(format, args);
    va_end(args);
    puts("");
}

int readint(char *prompt) {
    int i;
    printf("%s", prompt);
    scanf("%d", &i);
    return i;
}

#define auto
#define short
#define long
#define float
#define double
#define char
#define struct
#define union
#define enum
#define typedef
#define const
#define unsigned
#define signed
#define extern
#define register
#define static
#define volatile
#define switch
#define case
#define for
#define do
#define goto
#define default
#define sizeof

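This file provides the print and readint functions. It also #defines every keyword that C supports but TinyC does not as an empty macro, so that gcc and the TinyC compiler behave roughly the same on a TinyC source file. The point of compiling with gcc is to test the TinyC compiler and to have a result to compare its output against.

Let us first compile and run the typical TinyC source file above with gcc. Save the two listings as tinyc.c and for_gcc_build.hh in the same directory, open a terminal, cd into that directory, and enter: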

$ gcc -o tinyc tinyc.c
$ ./tinyc

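Running it prints the factorials of 1, 2, 4, 6 and 7, one per line: 3 and 5 are skipped by continue, and the loop exits at 8 via break.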

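3. The Intermediate Code: Pcode

The Pcode source code is provided up front.

From here on I assume you are already familiar with compiler concepts such as IR (as in LLVM), NFA, and DFA.

3.1 The Pcode virtual machine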

x = 1 + 2 * 3;

can be translated into the following Pcode:

push 1
push 2
push 3
mul
add
pop x

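The Pcode virtual machine is a stack machine that stands in for the back-end operations of real CPU architectures such as x86 and arm64.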
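To see why that instruction order produces the right answer, here is a minimal sketch of a stack machine that executes just the four instructions used above (push, pop, add, mul). It is not the real pysim.py; the function name `run` and the variable table are assumptions made for this illustration.

```python
def run(program):
    """Execute a list of Pcode-like instruction strings; return the variables."""
    stack = []      # operand stack
    variables = {}  # variable name -> integer value
    for line in program:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        op, args = parts[0], parts[1:]
        if op == "push":
            # An operand is either an integer literal or a variable name.
            operand = args[0]
            if operand.lstrip("-").isdigit():
                stack.append(int(operand))
            else:
                stack.append(variables[operand])
        elif op == "pop":
            variables[args[0]] = stack.pop()
        elif op in ("add", "mul"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "add" else left * right)
        else:
            raise ValueError(f"unsupported instruction: {line}")
    return variables


if __name__ == "__main__":
    program = ["push 1", "push 2", "push 3", "mul", "add", "pop x"]
    print(run(program))  # {'x': 7}
```

Because mul runs while 2 and 3 are on top of the stack, the multiplication happens before the addition, which is exactly the precedence of the original expression.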

3.2 Writing asm code

; int a, b, c, d;
var a, b, c, d

; a = 4 * 2;
push 4
push 2
mul
pop a

; b = a * 2
push a
push 2
mul
pop b

; c = a * b
push a
push b
mul
pop c

; call sum(a, b, c)
push a
push b
push c
$sum
exit 0
FUNC @sum:
    arg a,b,c
    push a
    push b
    push c
    add
    add
    ret 0
ENDFUNC

Anyone who has studied LLVM will recognize this: isn't it just intermediate code?

In the terminal, enter:

python pysim.py pcode_1.asm -d

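A full-screen terminal is recommended for this.

In the simulator's output we can see the equivalent of a basic C variable declaration and a user-defined function sum: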

int a,b,c,d;
sum(1, 2);
...

int sum(int a, int b) {
    return a + b;
}

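Pcode's function-call procedure is modeled on the C stdcall calling convention on 32-bit x86, and the overall flow is essentially the same. stdcall differs mainly in two ways:

1. Arguments are pushed onto the stack from right to left.

2. The function's return value is stored in the EAX register rather than on top of the stack.

The blogger clover_toeic has written a very in-depth introduction to how C function calls work; interested readers can find it at: http://www.cnblogs.com/clover-toeic/p/3755401.html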
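The two differences can be illustrated from the Pcode side with a small sketch: arguments are pushed left to right, and the result ends up on top of the stack instead of in a register. The article does not show how pysim.py actually lays out its call frames, so the frame handling here is an assumption, and `call` and `sum3` are hypothetical names. The function body is modeled as a plain Python function for brevity.

```python
def call(stack, arg_names, body):
    """Call a function Pcode-style.

    The caller has already pushed the arguments left to right, so the last
    argument is on top of the stack. The return value is left on the stack.
    """
    count = len(arg_names)
    start = len(stack) - count
    # Bind the top `count` stack slots to the parameter names, in push order.
    frame = dict(zip(arg_names, stack[start:]))
    del stack[start:]          # assumption: the arguments are removed on return
    stack.append(body(frame))  # the result goes on top of the stack


def sum3(frame):
    # Stands in for the body: push a, push b, push c, add, add
    return frame["a"] + frame["b"] + frame["c"]


if __name__ == "__main__":
    stack = []
    for value in (8, 16, 128):  # a = 4 * 2, b = a * 2, c = a * b
        stack.append(value)
    call(stack, ["a", "b", "c"], sum3)
    print(stack)  # [152]
```

After the call, the caller finds the result where an expression value would normally be, so a call can be used inside a larger expression without any special handling.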

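4. The Compilation Pipeline

The compilation pipeline can be summed up in a single diagram.

Normally, a complete compiler for a full, Turing-complete language needs all of the stages shown in it.

That is very unfriendly to beginners, which is why the entry-level TinyC exists.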
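As a rough illustration of the front-end stages that such pipelines commonly include (lexical analysis, syntax analysis, code generation), here is a minimal sketch that compiles a single assignment statement into Pcode. It only handles names, integers, `+`, `*`, and parentheses; `tokenize`, `Compiler`, and `compile_statement` are hypothetical names, and this is not how the real TinyC compiler is implemented.

```python
import re


def tokenize(source):
    """Lexical analysis: names, integers, and one-character symbols.

    Characters outside this small subset are silently ignored by this sketch.
    """
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+*();]", source)


class Compiler:
    """Syntax analysis and code generation in one recursive-descent pass."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.code = []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def statement(self):
        # statement := NAME '=' expr ';'
        target = self.take()
        self.take("=")
        self.expr()
        self.take(";")
        self.code.append(f"pop {target}")

    def expr(self):
        # expr := term ('+' term)*
        self.term()
        while self.peek() == "+":
            self.take("+")
            self.term()
            self.code.append("add")

    def term(self):
        # term := factor ('*' factor)*
        self.factor()
        while self.peek() == "*":
            self.take("*")
            self.factor()
            self.code.append("mul")

    def factor(self):
        # factor := NUMBER | NAME | '(' expr ')'
        if self.peek() == "(":
            self.take("(")
            self.expr()
            self.take(")")
            return
        token = self.take()
        if not (token.isdigit() or token[0].isalpha() or token[0] == "_"):
            raise SyntaxError(f"unexpected token {token!r}")
        self.code.append(f"push {token}")


def compile_statement(source):
    compiler = Compiler(tokenize(source))
    compiler.statement()
    return compiler.code


if __name__ == "__main__":
    print("\n".join(compile_statement("x = 1 + 2 * 3;")))
```

Operator precedence falls out of the grammar: because term is parsed inside expr, the mul instruction is always emitted before the enclosing add.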
