An LLVM Toy Frontend Written in Rust: A Beginner's Guide

Translator's note

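The code in this article was compiled with fairly old versions of the peg and peg_syntax_ext crates. With current versions, the source needs minimal changes. I have inserted the changed sections under spoilers, alongside the article's text. To build the code, install the nightly Rust compiler.
The complete edited source can be downloaded here: https://github.com/arktur04/rust-llvm-toy-frontend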

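I'm currently working on a compiler that is written in Rust and emits LLVM IR. LLVM's API looks a bit intimidating to newcomers, and there aren't many tutorials (and they are all written in C++, so it isn't always obvious how to do the same thing in Rust). I wish someone had helped me when I was starting out with all of this. This article is what I would have wanted to be shown back then.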

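In Rust, the best way to interact with LLVM is through the llvm-sys crate. A kind soul has posted documentation for it online. Of course, you should also work through the LLVM guide, because it helps you understand how LLVM "thinks". This post is basically a translation of a subset of that guide into Rust.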

The complete source code for this guide is available online.

Getting a working development environment

First, here is how to get started with LLVM:

    # `curl` is just so we can next install Rust
    sudo apt-get -y install clang curl llvm-3.8-dev
    curl https://sh.rustup.rs -sSf | sh

    # The `llvm-sys` crate expects something called `llvm-config` on your PATH.
    sudo ln -s /usr/bin/llvm-config-3.8 /usr/bin/llvm-config

If you are working on a fresh Ubuntu install (you may need to run apt-get update first), you are all set and can get started. Otherwise, you can work inside a Vagrant virtual machine using this Vagrantfile:

    Vagrant.configure("2") do |config|
      config.vm.box = "bento/ubuntu-16.04"
    end

You can get started by running cargo init llvm-example --bin and putting the following (taken from llvm-sys) into src/main.rs:

    //! Construct a function that does nothing in LLVM IR.

    extern crate llvm_sys as llvm;

    use std::ptr;

    fn main() {
        unsafe {
            // Set up a context, module and builder in that context.
            let context = llvm::core::LLVMContextCreate();
            let module = llvm::core::LLVMModuleCreateWithName(b"nop\0".as_ptr() as *const _);
            let builder = llvm::core::LLVMCreateBuilderInContext(context);

            // Get the type signature for void nop(void);
            // Then create it in our module.
            let void = llvm::core::LLVMVoidTypeInContext(context);
            let function_type = llvm::core::LLVMFunctionType(void, ptr::null_mut(), 0, 0);
            let function = llvm::core::LLVMAddFunction(module, b"nop\0".as_ptr() as *const _, function_type);

            // Create a basic block in the function and set our builder to generate
            // code in it.
            let bb = llvm::core::LLVMAppendBasicBlockInContext(context, function, b"entry\0".as_ptr() as *const _);
            llvm::core::LLVMPositionBuilderAtEnd(builder, bb);

            // Emit a `ret void` into the function
            llvm::core::LLVMBuildRetVoid(builder);

            // Dump the module as IR to stdout.
            llvm::core::LLVMDumpModule(module);

            // Clean up. Values created in the context mostly get cleaned up there.
            llvm::core::LLVMDisposeBuilder(builder);
            llvm::core::LLVMDisposeModule(module);
            llvm::core::LLVMContextDispose(context);
        }
    }

And in Cargo.toml:

    [package]
    name = "llvm-example"
    version = "0.1.0"
    authors = ["Ulysse Carion <ulysse@ulysse.io>"]

    [[bin]]
    name = "main"

    [dependencies]
    llvm-sys = "0.2"

You should get the following:

    vagrant@vagrant:/vagrant$ cargo run
       Compiling llvm-example v0.1.0 (file:///vagrant)
         Running `target/debug/main`
    ; ModuleID = 'nop'

    define void @nop() {
    entry:
      ret void
    }

Hooray! Now we can start writing our own programs.

A slightly less trivial program

To start, let's compile a program that does nothing but return an exit code, by returning an integer from the main function.

Here is how I did it (we will need a parser soon, so I added one right away; I used the peg crate):

Translator's note: the contents of Cargo.toml:

    [package]
    name = "llvm-example"
    version = "0.1.0"
    authors = ["Ulysse Carion <ulysse@ulysse.io>"]

    [[bin]]
    name = "main"

    [dependencies]
    llvm-sys = "38"
    peg = "0.5.4"
    peg-syntax-ext = "0.5.2"

    #![feature(plugin)]
    #![plugin(peg_syntax_ext)]

    extern crate llvm_sys as llvm;

    use std::ffi::CString;
    use std::fs::File;
    use std::io::Read;
    use std::ptr;

    fn main() {
        let mut input = String::new();
        let mut f = File::open("in.ex").unwrap();
        f.read_to_string(&mut input).unwrap();

        let parsed_input = parser::program(&input).unwrap();

        unsafe {
            codegen(parsed_input);
        }
    }

    peg! parser(r#"
        #[pub]
        program -> String
            = i:int_literal "\n" { i }

        int_literal -> String
            = [0-9]+ { match_str.to_owned() }
    "#);

    unsafe fn codegen(input: String) {
        let context = llvm::core::LLVMContextCreate();
        let module = llvm::core::LLVMModuleCreateWithName(b"example_module\0".as_ptr() as *const _);
        let builder = llvm::core::LLVMCreateBuilderInContext(context);

        // In LLVM, you get your types from functions.
        let int_type = llvm::core::LLVMInt64TypeInContext(context);
        let function_type = llvm::core::LLVMFunctionType(int_type, ptr::null_mut(), 0, 0);
        let function = llvm::core::LLVMAddFunction(module, b"main\0".as_ptr() as *const _, function_type);

        let entry_name = CString::new("entry").unwrap();
        let bb = llvm::core::LLVMAppendBasicBlockInContext(context, function, entry_name.as_ptr());
        llvm::core::LLVMPositionBuilderAtEnd(builder, bb);

        // The juicy part: construct a `LLVMValue` from a Rust value:
        let int_value: u64 = input.parse().unwrap();
        let int_value = llvm::core::LLVMConstInt(int_type, int_value, 0);

        llvm::core::LLVMBuildRet(builder, int_value);

        // Instead of dumping to stdout, let's write out the IR to `out.ll`
        let out_file = CString::new("out.ll").unwrap();
        llvm::core::LLVMPrintModuleToFile(module, out_file.as_ptr(), ptr::null_mut());

        llvm::core::LLVMDisposeBuilder(builder);
        llvm::core::LLVMDisposeModule(module);
        llvm::core::LLVMContextDispose(context);
    }

Translator's note: the modified parser:

    peg! parser(r#"
        #[pub]
        program -> String
            = i:int_literal "\n" { i }

        int_literal -> String
            = n:$([0-9]+) { n.to_owned() }
    "#);

It works! Let's check:

    vagrant@vagrant:/vagrant$ cat in.ex
    42
    vagrant@vagrant:/vagrant$ cargo run
         Running `target/debug/main`
    vagrant@vagrant:/vagrant$ lli-3.8 out.ll ; echo $?
    42

Cool! Here is what out.ll looks like:

    ; ModuleID = 'example_module'

    define i64 @main() {
    entry:
      ret i64 42
    }
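The two listings so far hand names to LLVM in two different ways: as a byte-string literal with an explicit trailing NUL (b"nop\0") and as an owned CString. The following is a minimal, standard-library-only sketch suggesting that both describe the same C string. The helper names literal_name and names_match are made up for this illustration and are not part of llvm-sys.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: a name passed as a byte-string literal with an
/// explicit trailing NUL, as in `b"nop\0".as_ptr() as *const _`.
fn literal_name() -> *const c_char {
    b"entry\0".as_ptr() as *const c_char
}

/// Hypothetical helper: returns true when the literal and the owned
/// `CString` describe the same NUL-terminated string.
fn names_match() -> bool {
    // Bind the CString to a variable so that its buffer outlives any
    // pointer obtained from it with `as_ptr()`.
    let owned = CString::new("entry").unwrap();
    // Safety: `literal_name` points at a static, NUL-terminated buffer.
    let from_literal = unsafe { CStr::from_ptr(literal_name()) };
    owned.as_c_str() == from_literal
}

fn main() {
    println!("same C string: {}", names_match());
    // `CString::new` refuses input that contains an interior NUL byte.
    println!("interior NUL rejected: {}", CString::new("bad\0name").is_err());
}
```

The literal form avoids an allocation but relies on remembering the \0 by hand; the CString form checks the input for you, as long as the CString stays bound to a variable for the duration of the call that uses its pointer.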

Arithmetic

Let's add support for adding, subtracting, multiplying, and dividing numbers. To do that, we need to extend our grammar. Let's introduce an enum to represent the AST:

    pub enum Expr {
        Add(Box<Expr>, Box<Expr>),
        Sub(Box<Expr>, Box<Expr>),
        Mul(Box<Expr>, Box<Expr>),
        Div(Box<Expr>, Box<Expr>),
        Literal(String),
    }

We also need to extend the grammar:

    // `product` and `sum` are that way to get operator precedence
    peg! parser(r#"
        use super::Expr;

        #[pub]
        program -> Expr
            = e:expression "\n" { e }

        expression -> Expr
            = sum

        sum -> Expr
            = a:product _ "+" _ b:sum { Expr::Add(Box::new(a), Box::new(b)) }
            / a:product _ "-" _ b:sum { Expr::Sub(Box::new(a), Box::new(b)) }
            / product

        product -> Expr
            = a:int_literal _ "*" _ b:product { Expr::Mul(Box::new(a), Box::new(b)) }
            / a:int_literal _ "/" _ b:product { Expr::Div(Box::new(a), Box::new(b)) }
            / int_literal

        int_literal -> Expr
            = [0-9]+ { Expr::Literal(match_str.to_owned()) }

        _ = " "*
    "#);

Translator's note: the modified parser:

    // `product` and `sum` are that way to get operator precedence
    peg! parser(r#"
        use super::Expr;

        #[pub]
        program -> Expr
            = e:expression "\n" { e }

        expression -> Expr
            = sum

        sum -> Expr
            = a:product _ "+" _ b:sum { Expr::Add(Box::new(a), Box::new(b)) }
            / a:product _ "-" _ b:sum { Expr::Sub(Box::new(a), Box::new(b)) }
            / product

        product -> Expr
            = a:int_literal _ "*" _ b:product { Expr::Mul(Box::new(a), Box::new(b)) }
            / a:int_literal _ "/" _ b:product { Expr::Div(Box::new(a), Box::new(b)) }
            / int_literal

        int_literal -> Expr
            = n:$([0-9]+) { Expr::Literal(n.to_owned()) }

        _ = " "*
    "#);
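To see how the sum/product layering yields operator precedence, here is a rough hand-written equivalent of the grammar that uses only the standard library. It is a sketch for illustration, not what the peg crate generates: the Parser struct, its methods, and parse_program are names invented here, and the backtracking is simplified (a failed right-hand side rejects the whole input).

```rust
#[derive(Debug, PartialEq)]
pub enum Expr {
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
    Literal(String),
}

/// Hypothetical hand-written parser; `pos` is a byte offset into `input`.
struct Parser<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    /// `_ = " "*`
    fn spaces(&mut self) {
        while self.input.get(self.pos) == Some(&b' ') {
            self.pos += 1;
        }
    }

    /// Matches `_ op _`, or rewinds and returns false when `op` is absent.
    fn operator(&mut self, op: u8) -> bool {
        let start = self.pos;
        self.spaces();
        if self.input.get(self.pos) == Some(&op) {
            self.pos += 1;
            self.spaces();
            true
        } else {
            self.pos = start;
            false
        }
    }

    /// `int_literal = [0-9]+`
    fn int_literal(&mut self) -> Option<Expr> {
        let start = self.pos;
        while self.input.get(self.pos).map_or(false, |b| b.is_ascii_digit()) {
            self.pos += 1;
        }
        if self.pos == start {
            return None;
        }
        let text = String::from_utf8_lossy(&self.input[start..self.pos]).into_owned();
        Some(Expr::Literal(text))
    }

    /// `product`: binds tighter because `sum` is built out of products.
    fn product(&mut self) -> Option<Expr> {
        let lhs = Box::new(self.int_literal()?);
        if self.operator(b'*') {
            Some(Expr::Mul(lhs, Box::new(self.product()?)))
        } else if self.operator(b'/') {
            Some(Expr::Div(lhs, Box::new(self.product()?)))
        } else {
            Some(*lhs)
        }
    }

    /// `sum`: recurses on its right-hand side, exactly like the grammar.
    fn sum(&mut self) -> Option<Expr> {
        let lhs = Box::new(self.product()?);
        if self.operator(b'+') {
            Some(Expr::Add(lhs, Box::new(self.sum()?)))
        } else if self.operator(b'-') {
            Some(Expr::Sub(lhs, Box::new(self.sum()?)))
        } else {
            Some(*lhs)
        }
    }
}

/// `program = expression "\n"`
pub fn parse_program(src: &str) -> Option<Expr> {
    let mut parser = Parser { input: src.as_bytes(), pos: 0 };
    let expr = parser.sum()?;
    if parser.input[parser.pos..] == [b'\n'] { Some(expr) } else { None }
}

fn main() {
    println!("{:?}", parse_program("1 + 2 * 3\n"));
    println!("{:?}", parse_program("8 - 4 - 2\n"));
}
```

One consequence worth knowing: because each rule recurses on its right-hand side, an input such as 8 - 4 - 2 groups as 8 - (4 - 2) rather than (8 - 4) - 2. That is fine for a toy language, but it differs from the usual left-to-right convention.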

Next, we generate code. You can specify strings such as "addtmp"; they are used as part of the name of the corresponding register in the IR.

    // When you write out instructions in LLVM, you get back `LLVMValueRef`s. You
    // can then use these references in other instructions.
    unsafe fn codegen_expr(context: LLVMContextRef, builder: LLVMBuilderRef, expr: Expr) -> LLVMValueRef {
        match expr {
            Expr::Literal(int_literal) => {
                let int_type = llvm::core::LLVMInt64TypeInContext(context);
                llvm::core::LLVMConstInt(int_type, int_literal.parse().unwrap(), 0)
            },

            Expr::Add(lhs, rhs) => {
                let lhs = codegen_expr(context, builder, *lhs);
                let rhs = codegen_expr(context, builder, *rhs);

                let name = CString::new("addtmp").unwrap();
                llvm::core::LLVMBuildAdd(builder, lhs, rhs, name.as_ptr())
            },

            Expr::Sub(lhs, rhs) => {
                let lhs = codegen_expr(context, builder, *lhs);
                let rhs = codegen_expr(context, builder, *rhs);

                let name = CString::new("subtmp").unwrap();
                llvm::core::LLVMBuildSub(builder, lhs, rhs, name.as_ptr())
            },

            Expr::Mul(lhs, rhs) => {
                let lhs = codegen_expr(context, builder, *lhs);
                let rhs = codegen_expr(context, builder, *rhs);

                let name = CString::new("multmp").unwrap();
                llvm::core::LLVMBuildMul(builder, lhs, rhs, name.as_ptr())
            },

            Expr::Div(lhs, rhs) => {
                let lhs = codegen_expr(context, builder, *lhs);
                let rhs = codegen_expr(context, builder, *rhs);

                let name = CString::new("divtmp").unwrap();
                llvm::core::LLVMBuildUDiv(builder, lhs, rhs, name.as_ptr())
            },
        }
    }

Now you can run programs like 10 * 4 + 20 / 2 - 8! Pretty cool, isn't it?

Variables

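We will take the simple route and assume that programs never do anything annoying, such as referencing an undefined variable. We simply keep variables in registers and record them in a HashMap<String, LLVMValueRef>. This works because the program has only one execution path.

We extend the language and the parser: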

    pub enum Expr {
        Literal(String),
        Ref(String),
        Assign(String, Box<Expr>),
        Add(Box<Expr>, Box<Expr>),
        Sub(Box<Expr>, Box<Expr>),
        Mul(Box<Expr>, Box<Expr>),
        Div(Box<Expr>, Box<Expr>),
    }

    peg! parser(r#"
        use super::Expr;

        #[pub]
        program -> Vec<Expr>
            = e:(expression ** "\n") "\n" { e }

        expression -> Expr
            = i:identifier _ "=" _ s:sum { Expr::Assign(i, Box::new(s)) }
            / sum

        sum -> Expr
            = a:product _ "+" _ b:sum { Expr::Add(Box::new(a), Box::new(b)) }
            / a:product _ "-" _ b:sum { Expr::Sub(Box::new(a), Box::new(b)) }
            / product

        product -> Expr
            = a:ref_or_literal _ "*" _ b:product { Expr::Mul(Box::new(a), Box::new(b)) }
            / a:ref_or_literal _ "/" _ b:product { Expr::Div(Box::new(a), Box::new(b)) }
            / ref_or_literal

        ref_or_literal -> Expr
            = i:identifier { Expr::Ref(i) }
            / int_literal

        identifier -> String
            = [a-zA-Z]+ { match_str.to_owned() }

        int_literal -> Expr
            = [0-9]+ { Expr::Literal(match_str.to_owned()) }

        _ = " "*
    "#);

Translator's note: the modified parser:

    peg! parser(r#"
        use super::Expr;

        #[pub]
        program -> Vec<Expr>
            = e:(expression ** "\n") "\n" { e }

        expression -> Expr
            = i:identifier _ "=" _ s:sum { Expr::Assign(i, Box::new(s)) }
            / sum

        sum -> Expr
            = a:product _ "+" _ b:sum { Expr::Add(Box::new(a), Box::new(b)) }
            / a:product _ "-" _ b:sum { Expr::Sub(Box::new(a), Box::new(b)) }
            / product

        product -> Expr
            = a:ref_or_literal _ "*" _ b:product { Expr::Mul(Box::new(a), Box::new(b)) }
            / a:ref_or_literal _ "/" _ b:product { Expr::Div(Box::new(a), Box::new(b)) }
            / ref_or_literal

        ref_or_literal -> Expr
            = i:identifier { Expr::Ref(i) }
            / int_literal

        identifier -> String
            = n:$([a-zA-Z]+) { n.to_owned() }

        int_literal -> Expr
            = n:$([0-9]+) { Expr::Literal(n.to_owned()) }

        _ = " "*
    "#);

Next, we add support for the two new expressions:

    unsafe fn codegen_expr(context: LLVMContextRef, builder: LLVMBuilderRef, names: &mut HashMap<String, LLVMValueRef>, expr: Expr) -> LLVMValueRef {
        match expr {
            // ...

            Expr::Ref(name) => {
                *names.get(&name).unwrap()
            },

            Expr::Assign(name, expr) => {
                let new_value = codegen_expr(context, builder, names, *expr);
                names.insert(name, new_value);
                new_value
            },
        }
    }

And we tweak the codegen function slightly:

    let int_type = llvm::core::LLVMInt64TypeInContext(context);
    let zero = llvm::core::LLVMConstInt(int_type, 0, 0);

    let mut names = HashMap::new();
    let mut return_value = zero; // return value on empty program
    for expr in input {
        return_value = codegen_expr(context, builder, &mut names, expr);
    }
    llvm::core::LLVMBuildRet(builder, return_value);

Voilà! Let's check:

    vagrant@vagrant:/vagrant$ cat in.ex
    a = 3
    b = 76
    a + b
    vagrant@vagrant:/vagrant$ cargo run
         Running `target/debug/main`
    vagrant@vagrant:/vagrant$ cat out.ll
    ; ModuleID = 'example_module'

    define i64 @main() {
    entry:
      ret i64 79
    }
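The names map and the "last expression is the return value" loop can be tried out without LLVM. Below is a small tree-walking evaluator with the same shape as codegen_expr, under the assumption that plain u64 values stand in for LLVMValueRef. The functions eval_expr and run_program are illustrative names, not part of the article's code.

```rust
use std::collections::HashMap;

pub enum Expr {
    Literal(String),
    Ref(String),
    Assign(String, Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
}

/// Hypothetical evaluator shaped like `codegen_expr`, except that `names`
/// maps to plain `u64` values (an assumption made for this sketch).
fn eval_expr(names: &mut HashMap<String, u64>, expr: &Expr) -> u64 {
    match expr {
        Expr::Literal(text) => text.parse().unwrap(),
        // Like the article's code, this panics on an undefined variable.
        Expr::Ref(name) => *names.get(name).unwrap(),
        Expr::Assign(name, value) => {
            let new_value = eval_expr(names, value);
            names.insert(name.clone(), new_value);
            new_value
        }
        // Wrapping arithmetic, so overflow wraps around instead of panicking.
        Expr::Add(lhs, rhs) => eval_expr(names, lhs).wrapping_add(eval_expr(names, rhs)),
        Expr::Sub(lhs, rhs) => eval_expr(names, lhs).wrapping_sub(eval_expr(names, rhs)),
        Expr::Mul(lhs, rhs) => eval_expr(names, lhs).wrapping_mul(eval_expr(names, rhs)),
        // Unsigned division, as with `LLVMBuildUDiv`; panics on a zero divisor.
        Expr::Div(lhs, rhs) => eval_expr(names, lhs) / eval_expr(names, rhs),
    }
}

/// Evaluates each expression in order and returns the last result.
pub fn run_program(program: &[Expr]) -> u64 {
    let mut names = HashMap::new();
    let mut return_value = 0; // return value on empty program
    for expr in program {
        return_value = eval_expr(&mut names, expr);
    }
    return_value
}

fn main() {
    let var = |name: &str| Box::new(Expr::Ref(name.to_owned()));
    let lit = |text: &str| Box::new(Expr::Literal(text.to_owned()));
    // a = 3, b = 76, a + b
    let program = vec![
        Expr::Assign("a".to_owned(), lit("3")),
        Expr::Assign("b".to_owned(), lit("76")),
        Expr::Add(var("a"), var("b")),
    ];
    println!("{}", run_program(&program));
}
```

The sample out.ll contains ret i64 79 and no add instruction, presumably because the builder folds operations whose operands are all constants. The evaluator here does the equivalent work directly.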

If

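With if, things get a bit more complicated. The simplest way to make it work is to store local variables on the stack and let LLVM do the optimizing. In LLVM, you create a stack variable with the alloca instruction and then read and write it with the load and store instructions.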

To do this, we extend the language and the grammar once more by adding new parser rules:

    expression -> Expr
        = if_expression
        / i:identifier _ "=" _ s:expression { Expr::Assign(i, Box::new(s)) }
        / sum

    if_expression -> Expr
        = "if" _ e:expression _ "{\n" _ then_body:statements _ "}" _ "else" _ "{\n" _ else_body:statements _ "}" {
            Expr::If(Box::new(e), then_body, else_body)
        }

And we add a new kind of AST node:

    pub enum Expr {
        Literal(String),
        Ref(String),
        Assign(String, Box<Expr>),
        Add(Box<Expr>, Box<Expr>),
        Sub(Box<Expr>, Box<Expr>),
        Mul(Box<Expr>, Box<Expr>),
        Div(Box<Expr>, Box<Expr>),
        If(Box<Expr>, Vec<Expr>, Vec<Expr>),
    }

Finally, we generate code for the if expression:

    unsafe fn codegen_expr(context: LLVMContextRef, builder: LLVMBuilderRef, func: LLVMValueRef, names: &mut HashMap<String, LLVMValueRef>, expr: Expr) -> LLVMValueRef {
        match expr {
            // ...

            Expr::If(condition, then_body, else_body) => {
                let condition_value = codegen_expr(context, builder, func, names, *condition);
                let int_type = llvm::core::LLVMInt64TypeInContext(context);
                let zero = llvm::core::LLVMConstInt(int_type, 0, 0);

                // `is_nonzero` is a 1-bit integer
                let name = CString::new("is_nonzero").unwrap();
                let is_nonzero = llvm::core::LLVMBuildICmp(builder, llvm::LLVMIntPredicate::LLVMIntNE, condition_value, zero, name.as_ptr());

                // It's fine to create blocks first, and then fill them in later.
                let entry_name = CString::new("entry").unwrap();
                let then_block = llvm::core::LLVMAppendBasicBlockInContext(context, func, entry_name.as_ptr());
                let else_block = llvm::core::LLVMAppendBasicBlockInContext(context, func, entry_name.as_ptr());
                let merge_block = llvm::core::LLVMAppendBasicBlockInContext(context, func, entry_name.as_ptr());

                llvm::core::LLVMBuildCondBr(builder, is_nonzero, then_block, else_block);

                llvm::core::LLVMPositionBuilderAtEnd(builder, then_block);
                let mut then_return = zero;
                for expr in then_body {
                    then_return = codegen_expr(context, builder, func, names, expr);
                }
                llvm::core::LLVMBuildBr(builder, merge_block);

                llvm::core::LLVMPositionBuilderAtEnd(builder, else_block);
                let mut else_return = zero;
                for expr in else_body {
                    else_return = codegen_expr(context, builder, func, names, expr);
                }
                llvm::core::LLVMBuildBr(builder, merge_block);

                // Position the builder so that it's ready to work on the next
                // expression.
                llvm::core::LLVMPositionBuilderAtEnd(builder, merge_block);
                zero
            }
        }
    }

That is a lot of code, but it does what we expect. Now we can run a program like this:

    a = 1
    if a {
        a = 42
    } else {
        a = 13
    }
    a

which generates this IR:

    ; ModuleID = 'example_module'

    define i64 @main() {
    entry:
      %a = alloca i64
      store i64 1, i64* %a
      %a1 = load i64, i64* %a
      %is_nonzero = icmp ne i64 %a1, 0
      br i1 %is_nonzero, label %entry2, label %entry3

    entry2:                                           ; preds = %entry
      store i64 42, i64* %a
      br label %entry4

    entry3:                                           ; preds = %entry
      store i64 13, i64* %a
      br label %entry4

    entry4:                                           ; preds = %entry3, %entry2
      %a5 = load i64, i64* %a
      ret i64 %a5
    }
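Why move variables from a name-to-value map into stack slots at all? A toy simulation may help. It is not how LLVM works internally, and every name in it (Stmt, walk_like_codegen, run_with_slots, sample_program) is made up for this illustration. The first function visits both branches the way a code generator does; the second only follows the branch taken at run time, like the store instructions above.

```rust
use std::collections::HashMap;

/// Hypothetical, stripped-down statement type for this illustration only.
enum Stmt {
    Assign(&'static str, u64),
    If {
        cond: &'static str,
        then_body: Vec<Stmt>,
        else_body: Vec<Stmt>,
    },
}

/// Mimics code generation with a name -> value map: both branches are
/// visited while generating code, so whichever is visited last wins.
fn walk_like_codegen(stmts: &[Stmt], names: &mut HashMap<&'static str, u64>) {
    for stmt in stmts {
        match stmt {
            Stmt::Assign(name, value) => {
                names.insert(*name, *value);
            }
            Stmt::If { then_body, else_body, .. } => {
                walk_like_codegen(then_body, names);
                walk_like_codegen(else_body, names);
            }
        }
    }
}

/// Mimics execution with stack slots: a store happens only on the path
/// that is actually taken at run time.
fn run_with_slots(stmts: &[Stmt], slots: &mut HashMap<&'static str, u64>) {
    for stmt in stmts {
        match stmt {
            Stmt::Assign(name, value) => {
                slots.insert(*name, *value);
            }
            Stmt::If { cond, then_body, else_body } => {
                // Assumption of this sketch: an unset variable reads as 0.
                let is_nonzero = slots.get(*cond).copied().unwrap_or(0) != 0;
                let taken = if is_nonzero { then_body } else { else_body };
                run_with_slots(taken, slots);
            }
        }
    }
}

/// a = 1; if a { a = 42 } else { a = 13 }
fn sample_program() -> Vec<Stmt> {
    vec![
        Stmt::Assign("a", 1),
        Stmt::If {
            cond: "a",
            then_body: vec![Stmt::Assign("a", 42)],
            else_body: vec![Stmt::Assign("a", 13)],
        },
    ]
}

fn main() {
    let program = sample_program();
    let mut names = HashMap::new();
    walk_like_codegen(&program, &mut names);
    let mut slots = HashMap::new();
    run_with_slots(&program, &mut slots);
    println!("name map: a = {}, stack slots: a = {}", names["a"], slots["a"]);
}
```

With the map, the binding for a ends up being whatever the else branch assigned, regardless of the condition. With slots, the value depends on the path actually executed, which is the behavior the generated IR has.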

But we are not done yet. Right now, the if "expression" always evaluates to zero. Instead, it should evaluate to then_return when the then path is taken, and to else_return otherwise.

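How do we get LLVM to keep track of which path was taken? By using a "phi" node. You give the phi instruction a list of (block, value) pairs, and the phi node returns the value that corresponds to the block executed just before it.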
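As a mental model only, a phi node can be pictured as a lookup keyed by the predecessor block. The sketch below is a toy, with an invented Phi struct and resolve method; it models an instruction of the form %iftmp = phi i64 [ 11, %entry7 ], [ 40, %entry8 ] and is not LLVM's implementation.

```rust
/// Toy model of a phi node: a list of (predecessor block, value) pairs.
struct Phi {
    incoming: Vec<(&'static str, i64)>,
}

impl Phi {
    /// Returns the value paired with the block control came from, if any.
    fn resolve(&self, predecessor: &str) -> Option<i64> {
        self.incoming
            .iter()
            .find(|pair| pair.0 == predecessor)
            .map(|pair| pair.1)
    }
}

/// Models `%iftmp = phi i64 [ 11, %entry7 ], [ 40, %entry8 ]`.
fn inner_iftmp() -> Phi {
    Phi { incoming: vec![("entry7", 11), ("entry8", 40)] }
}

fn main() {
    let phi = inner_iftmp();
    for block in ["entry7", "entry8", "entry"] {
        println!("from {}: {:?}", block, phi.resolve(block));
    }
}
```

In real IR, every predecessor of the block that contains the phi must appear in the list, so the None case of this model has no counterpart in a well-formed module.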

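Here is how we finish off if. Note that we have to update then_block and else_block: the phi node needs the last block of each "then"/"else" construct, whereas until now then_block and else_block referred to the first block of the "then" and "else" branches.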

    // ...

    // This is mostly the same code as before, just note the new calls to
    // `LLVMGetInsertBlock`.

    llvm::core::LLVMPositionBuilderAtEnd(builder, then_block);
    let mut then_return = zero;
    for expr in then_body {
        then_return = codegen_expr(context, builder, func, names, expr);
    }
    llvm::core::LLVMBuildBr(builder, merge_block);
    let then_block = llvm::core::LLVMGetInsertBlock(builder);

    llvm::core::LLVMPositionBuilderAtEnd(builder, else_block);
    let mut else_return = zero;
    for expr in else_body {
        else_return = codegen_expr(context, builder, func, names, expr);
    }
    llvm::core::LLVMBuildBr(builder, merge_block);
    let else_block = llvm::core::LLVMGetInsertBlock(builder);

    // Insert the phi node
    llvm::core::LLVMPositionBuilderAtEnd(builder, merge_block);
    let phi_name = CString::new("iftmp").unwrap();
    let phi = llvm::core::LLVMBuildPhi(builder, int_type, phi_name.as_ptr());

    let mut values = vec![then_return, else_return];
    let mut blocks = vec![then_block, else_block];

    llvm::core::LLVMAddIncoming(phi, values.as_mut_ptr(), blocks.as_mut_ptr(), 2);
    phi
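The reason for re-reading the insert block can be seen in a miniature model in which blocks are just numbers. ToyBuilder, its fields, and the two-variant Expr below are hypothetical stand-ins invented for this sketch; only the bookkeeping pattern corresponds to the listing above.

```rust
/// Hypothetical miniature AST: leaves and `if` expressions only.
enum Expr {
    Leaf,
    If(Box<Expr>, Box<Expr>, Box<Expr>),
}

/// Toy stand-in for the LLVM builder: blocks are numbered in creation
/// order (block 0 plays the role of `entry`), and `current` is the block
/// the builder is positioned in.
struct ToyBuilder {
    next_block: usize,
    current: usize,
    /// One entry per `if`: (block where `then` starts, block where it ends).
    then_spans: Vec<(usize, usize)>,
}

impl ToyBuilder {
    fn new() -> Self {
        ToyBuilder { next_block: 1, current: 0, then_spans: Vec::new() }
    }

    fn append_block(&mut self) -> usize {
        let id = self.next_block;
        self.next_block += 1;
        id
    }
}

fn codegen(builder: &mut ToyBuilder, expr: &Expr) {
    if let Expr::If(condition, then_body, else_body) = expr {
        codegen(builder, condition);
        let then_block = builder.append_block();
        let else_block = builder.append_block();
        let merge_block = builder.append_block();

        builder.current = then_block;
        codegen(builder, then_body);
        // Counterpart of `LLVMGetInsertBlock`: a nested `if` may have
        // left the builder in a different block than `then_block`.
        builder.then_spans.push((then_block, builder.current));

        builder.current = else_block;
        codegen(builder, else_body);

        builder.current = merge_block;
    }
}

/// if _ { if _ { _ } else { _ } } else { _ }
fn nested_sample() -> Expr {
    let inner = Expr::If(Box::new(Expr::Leaf), Box::new(Expr::Leaf), Box::new(Expr::Leaf));
    Expr::If(Box::new(Expr::Leaf), Box::new(inner), Box::new(Expr::Leaf))
}

fn main() {
    let mut builder = ToyBuilder::new();
    codegen(&mut builder, &nested_sample());
    println!("{:?}", builder.then_spans);
}
```

For the nested sample, the outer then branch starts in block 1 but ends in block 6, the inner merge block. A phi that listed block 1 as its predecessor would name a block that never branches to the outer merge block, which is why the listing asks the builder where it currently is.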

And there you have it, an amazing compiler:

    vagrant@vagrant:/vagrant$ cat in.ex
    a = 1
    b = 0
    c = if a {
        if b {
            11
        } else {
            40
        }
    } else {
        if b {
            10
        } else {
            20
        }
    }
    c + 2
    vagrant@vagrant:/vagrant$ cargo run
         Running `target/debug/main`
    vagrant@vagrant:/vagrant$ lli-3.8 out.ll ; echo $?
    42

Cool! Here is the code generated for this sample input program:

    ; ModuleID = 'example_module'

    define i64 @main() {
    entry:
      %a = alloca i64
      %b = alloca i64
      %c = alloca i64
      store i64 1, i64* %a
      store i64 0, i64* %b
      %a1 = load i64, i64* %a
      %is_nonzero = icmp ne i64 %a1, 0
      br i1 %is_nonzero, label %entry2, label %entry3

    entry2:                                           ; preds = %entry
      %b5 = load i64, i64* %b
      %is_nonzero6 = icmp ne i64 %b5, 0
      br i1 %is_nonzero6, label %entry7, label %entry8

    entry3:                                           ; preds = %entry
      %b10 = load i64, i64* %b
      %is_nonzero11 = icmp ne i64 %b10, 0
      br i1 %is_nonzero11, label %entry12, label %entry13

    entry4:                                           ; preds = %entry14, %entry9
      %iftmp16 = phi i64 [ %iftmp, %entry9 ], [ %iftmp15, %entry14 ]
      store i64 %iftmp16, i64* %c
      %c17 = load i64, i64* %c
      %addtmp = add i64 %c17, 2
      ret i64 %addtmp

    entry7:                                           ; preds = %entry2
      br label %entry9

    entry8:                                           ; preds = %entry2
      br label %entry9

    entry9:                                           ; preds = %entry8, %entry7
      %iftmp = phi i64 [ 11, %entry7 ], [ 40, %entry8 ]
      br label %entry4

    entry12:                                          ; preds = %entry3
      br label %entry14

    entry13:                                          ; preds = %entry3
      br label %entry14

    entry14:                                          ; preds = %entry13, %entry12
      %iftmp15 = phi i64 [ 10, %entry12 ], [ 20, %entry13 ]
      br label %entry4
    }

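Notice the pattern the blocks form: apart from the entry block, they come in groups of three, with the then branch first, then the else branch, then the merge block (recognizable by its phi instruction). This is a consequence of the fact that every time we encounter an "if" expression, we append three new blocks to main. The triples of blocks are laid out in the order in which the AST is traversed.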

That's all! I hope you now have enough of a foundation to carry on by yourself.
