Fleet 编译器架构

本文档详细介绍 Fleet 编程语言编译器的内部架构、设计决策和实现细节。

📋 目录

🏗️ 总体架构

Fleet 编译器采用传统的三阶段架构:前端、中端、后端。

源代码 (.fleet)
    ↓
┌─────────────┐
│    前端     │  词法分析 → 语法分析 → AST
│  (Frontend) │
└─────────────┘
    ↓
┌─────────────┐
│    中端     │  语义分析 → 类型检查 → HIR
│  (Middle)   │
└─────────────┘
    ↓
┌─────────────┐
│    后端     │  代码生成 → 优化 → 机器码
│  (Backend)  │
└─────────────┘
    ↓
可执行文件

核心组件

  1. 词法分析器 (Lexer) - 将源代码转换为 Token 流
  2. 语法分析器 (Parser) - 将 Token 流转换为抽象语法树 (AST)
  3. 语义分析器 (Semantic Analyzer) - 进行语义检查和类型推断
  4. 类型检查器 (Type Checker) - 验证类型正确性
  5. 代码生成器 (Code Generator) - 生成 LLVM IR
  6. 优化器 (Optimizer) - 优化生成的代码

技术栈

  • 实现语言: Rust
  • 解析器: Pest (PEG 解析器)
  • 后端: LLVM (通过 Inkwell 绑定)
  • 构建系统: Cargo

🔍 前端:词法分析和语法分析

词法分析器

Fleet 使用 Pest 解析器生成器进行词法和语法分析。

Token 定义

// 数字
number = { digit+ }
float = { digit+ ~ "." ~ digit+ ~ (^"e" ~ ("+" | "-")? ~ digit+)? }

// 标识符
identifier = { (letter | "_") ~ (letter | digit | "_")* }

// 字符串
string = { "\"" ~ (!"\"" ~ any)* ~ "\"" }

// 关键字
keyword = { 
    "fn" | "let" | "var" | "if" | "else" | "match" | 
    "struct" | "enum" | "trait" | "impl" | "loop" |
    "return" | "break" | "continue"
}

// 操作符
operator = { 
    "==" | "!=" | "<=" | ">=" | "&&" | "||" |
    "+=" | "-=" | "*=" | "/=" | "%=" |
    "+" | "-" | "*" | "/" | "%" | "!" |
    "<" | ">" | "&" | "|" | "^" | "~" |
    "<<" | ">>" | "=" | "." | "::" | "->" | "=>"
}

词法优先级

Fleet 解决了元组访问与浮点数的语法冲突:

// 高优先级规则:元组访问
tuple_access = { "." ~ digit }

// 低优先级规则:浮点数
float_literal = { digit+ ~ "." ~ digit+ }

语法分析器

AST 节点定义

#[derive(Debug, Clone)]
pub enum Statement {
    VariableDeclaration {
        name: String,
        type_annotation: Option,
        value: Expression,
        mutable: bool,
    },
    FunctionDeclaration {
        name: String,
        parameters: Vec,
        return_type: Option,
        body: Vec,
    },
    ExpressionStatement(Expression),
    Return(Option),
}

#[derive(Debug, Clone)]
pub enum Expression {
    Literal(Literal),
    Identifier(String),
    BinaryOperation {
        left: Box,
        operator: BinaryOperator,
        right: Box,
    },
    FunctionCall {
        function: Box,
        arguments: Vec,
    },
    FieldAccess {
        object: Box,
        field: String,
    },
    TupleAccess {
        tuple: Box,
        index: usize,
    },
}

语法规则

program = { SOI ~ statement* ~ EOI }

statement = {
    variable_declaration |
    function_declaration |
    struct_declaration |
    enum_declaration |
    trait_declaration |
    impl_block |
    expression_statement
}

expression = {
    assignment_expression |
    logical_or_expression
}

logical_or_expression = {
    logical_and_expression ~ ("||" ~ logical_and_expression)*
}

logical_and_expression = {
    equality_expression ~ ("&&" ~ equality_expression)*
}

// ... 更多语法规则

错误恢复

Fleet 编译器实现了智能的错误恢复机制:

impl Parser {
    fn parse_with_recovery(&mut self) -> Result> {
        let mut errors = Vec::new();
        let mut statements = Vec::new();

        while !self.is_at_end() {
            match self.parse_statement() {
                Ok(stmt) => statements.push(stmt),
                Err(error) => {
                    errors.push(error);
                    self.synchronize(); // 错误恢复
                }
            }
        }

        if errors.is_empty() {
            Ok(AST { statements })
        } else {
            Err(errors)
        }
    }

    fn synchronize(&mut self) {
        // 跳到下一个语句边界
        while !self.is_at_end() {
            if self.previous().token_type == TokenType::Semicolon {
                return;
            }

            match self.peek().token_type {
                TokenType::Fn | TokenType::Let | TokenType::Var => return,
                _ => self.advance(),
            }
        }
    }
}

🧠 中端:语义分析和类型检查

符号表

Fleet 使用分层符号表管理作用域:

#[derive(Debug)]
pub struct SymbolTable {
    scopes: Vec,
    current_scope: usize,
}

#[derive(Debug)]
pub struct Scope {
    symbols: HashMap,
    parent: Option,
}

#[derive(Debug, Clone)]
pub struct Symbol {
    pub name: String,
    pub symbol_type: Type,
    pub kind: SymbolKind,
    pub location: SourceLocation,
}

#[derive(Debug, Clone)]
pub enum SymbolKind {
    Variable { mutable: bool },
    Function { parameters: Vec, return_type: Type },
    Struct { fields: Vec },
    Enum { variants: Vec },
    Trait { methods: Vec },
}

类型系统

类型定义

#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    // 基础类型
    Int,
    Float,
    String,
    Bool,
    Char,

    // 复合类型
    Array(Box),
    Tuple(Vec),
    Map(Box, Box),

    // 用户定义类型
    Struct(String),
    Enum(String),

    // 函数类型
    Function {
        parameters: Vec,
        return_type: Box,
    },

    // 泛型类型
    Generic(String),

    // 特殊类型
    Unit,
    Unknown, // 用于类型推断
}

类型推断

Fleet 实现了 Hindley-Milner 风格的类型推断:

pub struct TypeInferencer {
    constraints: Vec,
    substitutions: HashMap,
    next_type_var: usize,
}

#[derive(Debug, Clone)]
pub struct TypeConstraint {
    pub left: Type,
    pub right: Type,
    pub location: SourceLocation,
}

impl TypeInferencer {
    pub fn infer_expression(&mut self, expr: &Expression) -> Result {
        match expr {
            Expression::Literal(lit) => Ok(self.infer_literal(lit)),
            Expression::Identifier(name) => self.lookup_variable(name),
            Expression::BinaryOperation { left, op, right } => {
                let left_type = self.infer_expression(left)?;
                let right_type = self.infer_expression(right)?;
                self.infer_binary_operation(left_type, *op, right_type)
            },
            Expression::FunctionCall { function, arguments } => {
                let func_type = self.infer_expression(function)?;
                let arg_types: Result, _> = arguments
                    .iter()
                    .map(|arg| self.infer_expression(arg))
                    .collect();
                let arg_types = arg_types?;
                self.infer_function_call(func_type, arg_types)
            },
            // ... 其他表达式类型
        }
    }

    fn unify(&mut self, t1: Type, t2: Type) -> Result<(), TypeError> {
        match (t1, t2) {
            (Type::Unknown, t) | (t, Type::Unknown) => {
                // 类型变量统一
                Ok(())
            },
            (Type::Int, Type::Int) => Ok(()),
            (Type::Float, Type::Float) => Ok(()),
            (Type::Array(t1), Type::Array(t2)) => self.unify(*t1, *t2),
            (Type::Tuple(types1), Type::Tuple(types2)) => {
                if types1.len() != types2.len() {
                    return Err(TypeError::TupleLengthMismatch);
                }
                for (t1, t2) in types1.into_iter().zip(types2.into_iter()) {
                    self.unify(t1, t2)?;
                }
                Ok(())
            },
            _ => Err(TypeError::TypeMismatch(t1, t2)),
        }
    }
}

Trait 系统

Trait 解析

pub struct TraitResolver {
    traits: HashMap,
    implementations: HashMap<(String, String), ImplBlock>, // (Type, Trait) -> Impl
}

impl TraitResolver {
    pub fn resolve_method_call(
        &self,
        receiver_type: &Type,
        method_name: &str,
    ) -> Result {
        // 1. 查找直接方法
        if let Some(method) = self.find_inherent_method(receiver_type, method_name) {
            return Ok(MethodResolution::Inherent(method));
        }

        // 2. 查找 trait 方法
        let mut candidates = Vec::new();
        for (trait_name, trait_def) in &self.traits {
            if trait_def.has_method(method_name) {
                if let Some(impl_block) = self.implementations.get(&(receiver_type.name(), trait_name.clone())) {
                    candidates.push((trait_name.clone(), impl_block));
                }
            }
        }

        match candidates.len() {
            0 => Err(TraitError::MethodNotFound),
            1 => Ok(MethodResolution::Trait(candidates[0].clone())),
            _ => Err(TraitError::AmbiguousMethod),
        }
    }
}

⚙️ 后端:代码生成

LLVM 集成

Fleet 使用 LLVM 作为代码生成后端:

pub struct CodeGenerator<'ctx> {
    context: &'ctx Context,
    module: Module<'ctx>,
    builder: Builder<'ctx>,
    functions: HashMap>,
    variables: HashMap>,
}

impl<'ctx> CodeGenerator<'ctx> {
    pub fn generate_function(&mut self, func: &FunctionDeclaration) -> Result<(), CodeGenError> {
        // 创建函数签名
        let param_types: Vec = func.parameters
            .iter()
            .map(|p| self.type_to_llvm(&p.param_type))
            .collect::, _>>()?;

        let return_type = match &func.return_type {
            Some(t) => self.type_to_llvm(t)?,
            None => self.context.void_type().into(),
        };

        let fn_type = return_type.fn_type(¶m_types, false);
        let function = self.module.add_function(&func.name, fn_type, None);

        // 创建基本块
        let entry_block = self.context.append_basic_block(function, "entry");
        self.builder.position_at_end(entry_block);

        // 生成函数体
        for statement in &func.body {
            self.generate_statement(statement)?;
        }

        // 如果没有显式返回,添加默认返回
        if !self.current_block_has_terminator() {
            match &func.return_type {
                Some(_) => return Err(CodeGenError::MissingReturn),
                None => { self.builder.build_return(None); },
            }
        }

        self.functions.insert(func.name.clone(), function);
        Ok(())
    }

    fn generate_expression(&mut self, expr: &Expression) -> Result, CodeGenError> {
        match expr {
            Expression::Literal(lit) => self.generate_literal(lit),
            Expression::Identifier(name) => {
                let ptr = self.variables.get(name)
                    .ok_or_else(|| CodeGenError::UndefinedVariable(name.clone()))?;
                Ok(self.builder.build_load(*ptr, name))
            },
            Expression::BinaryOperation { left, op, right } => {
                let left_val = self.generate_expression(left)?;
                let right_val = self.generate_expression(right)?;
                self.generate_binary_operation(left_val, *op, right_val)
            },
            Expression::FunctionCall { function, arguments } => {
                self.generate_function_call(function, arguments)
            },
            // ... 其他表达式类型
        }
    }
}

Trait 方法的静态分发

impl<'ctx> CodeGenerator<'ctx> {
    fn generate_trait_method_call(
        &mut self,
        receiver: &Expression,
        trait_name: &str,
        method_name: &str,
        args: &[Expression],
    ) -> Result, CodeGenError> {
        let receiver_type = self.infer_expression_type(receiver)?;

        // 生成混淆的方法名:Type_Trait__method
        let mangled_name = format!("{}_{}__{}", 
            receiver_type.name(), 
            trait_name, 
            method_name
        );

        // 查找函数
        let function = self.functions.get(&mangled_name)
            .ok_or_else(|| CodeGenError::UndefinedFunction(mangled_name.clone()))?;

        // 生成参数
        let mut call_args = vec![self.generate_expression(receiver)?];
        for arg in args {
            call_args.push(self.generate_expression(arg)?);
        }

        // 生成函数调用
        let call_site = self.builder.build_call(*function, &call_args, "trait_call");

        match call_site.try_as_basic_value() {
            Either::Left(value) => Ok(value),
            Either::Right(_) => Err(CodeGenError::VoidValue),
        }
    }
}

内存布局

结构体布局

impl<'ctx> CodeGenerator<'ctx> {
    fn generate_struct_type(&self, struct_def: &StructDefinition) -> Result, CodeGenError> {
        let field_types: Vec = struct_def.fields
            .iter()
            .map(|field| self.type_to_llvm(&field.field_type))
            .collect::, _>>()?;

        Ok(self.context.struct_type(&field_types, false))
    }

    fn generate_struct_access(
        &mut self,
        struct_expr: &Expression,
        field_name: &str,
    ) -> Result, CodeGenError> {
        let struct_value = self.generate_expression(struct_expr)?;
        let struct_type = self.infer_expression_type(struct_expr)?;

        if let Type::Struct(struct_name) = struct_type {
            let struct_def = self.get_struct_definition(&struct_name)?;
            let field_index = struct_def.get_field_index(field_name)
                .ok_or_else(|| CodeGenError::UndefinedField(field_name.to_string()))?;

            let field_ptr = self.builder.build_struct_gep(
                struct_value.into_pointer_value(),
                field_index as u32,
                &format!("{}_field", field_name),
            )?;

            Ok(self.builder.build_load(field_ptr, field_name))
        } else {
            Err(CodeGenError::NotAStruct)
        }
    }
}

枚举布局

Fleet 使用标记联合 (Tagged Union) 来表示枚举:

// 枚举内存布局:
// struct EnumLayout {
//     tag: i32,           // 变体标识符
//     data: union {       // 数据联合
//         variant1_data,
//         variant2_data,
//         ...
//     }
// }

impl<'ctx> CodeGenerator<'ctx> {
    fn generate_enum_type(&self, enum_def: &EnumDefinition) -> Result, CodeGenError> {
        // 计算最大变体大小
        let max_size = enum_def.variants
            .iter()
            .map(|variant| self.calculate_variant_size(variant))
            .max()
            .unwrap_or(0);

        // 创建标记联合结构
        let tag_type = self.context.i32_type();
        let data_type = self.context.i8_type().array_type(max_size as u32);

        Ok(self.context.struct_type(&[tag_type.into(), data_type.into()], false))
    }
}

🧹 内存管理

Fleet 采用自动内存管理,结合引用计数和垃圾回收:

引用计数

// 每个对象都有引用计数头
struct ObjectHeader {
    ref_count: AtomicUsize,
    type_info: *const TypeInfo,
}

// 自动引用计数操作
impl<'ctx> CodeGenerator<'ctx> {
    fn generate_retain(&mut self, value: BasicValueEnum<'ctx>) {
        if self.needs_ref_counting(&value) {
            let retain_fn = self.get_runtime_function("fleet_retain");
            self.builder.build_call(retain_fn, &[value], "retain");
        }
    }

    fn generate_release(&mut self, value: BasicValueEnum<'ctx>) {
        if self.needs_ref_counting(&value) {
            let release_fn = self.get_runtime_function("fleet_release");
            self.builder.build_call(release_fn, &[value], "release");
        }
    }
}

循环检测

// 运行时循环检测器
pub struct CycleDetector {
    mark_stack: Vec<*mut ObjectHeader>,
    marked_objects: HashSet<*mut ObjectHeader>,
}

impl CycleDetector {
    pub fn collect_cycles(&mut self) {
        // Mark phase: 标记所有可达对象
        self.mark_roots();

        // Sweep phase: 回收未标记的对象
        self.sweep_unmarked();

        // 清理标记
        self.clear_marks();
    }
}

🚀 优化策略

编译时优化

常量折叠

impl Optimizer {
    fn fold_constants(&mut self, expr: &mut Expression) {
        match expr {
            Expression::BinaryOperation { left, op, right } => {
                self.fold_constants(left);
                self.fold_constants(right);

                if let (Expression::Literal(left_lit), Expression::Literal(right_lit)) = (left.as_ref(), right.as_ref()) {
                    if let Some(result) = self.evaluate_constant_operation(left_lit, *op, right_lit) {
                        *expr = Expression::Literal(result);
                    }
                }
            },
            // ... 其他表达式类型
        }
    }
}

死代码消除

impl Optimizer {
    fn eliminate_dead_code(&mut self, statements: &mut Vec) {
        let mut live_variables = HashSet::new();

        // 反向分析,标记活跃变量
        for stmt in statements.iter().rev() {
            match stmt {
                Statement::Return(Some(expr)) => {
                    self.mark_expression_variables(expr, &mut live_variables);
                },
                Statement::VariableDeclaration { name, value, .. } => {
                    if live_variables.contains(name) {
                        self.mark_expression_variables(value, &mut live_variables);
                    }
                },
                // ... 其他语句类型
            }
        }

        // 移除死代码
        statements.retain(|stmt| self.is_statement_live(stmt, &live_variables));
    }
}

运行时优化

内联缓存

// 方法调用的内联缓存
struct InlineCache {
    receiver_type: Option,
    method_ptr: Option<*const u8>,
    call_count: usize,
}

impl InlineCache {
    fn call_method(&mut self, receiver: &dyn Any, method_name: &str) -> Result {
        let receiver_type = receiver.type_id();

        if Some(receiver_type) == self.receiver_type {
            // 缓存命中,直接调用
            if let Some(method_ptr) = self.method_ptr {
                return unsafe { self.call_cached_method(method_ptr, receiver) };
            }
        }

        // 缓存未命中,查找方法
        let method_ptr = self.lookup_method(receiver_type, method_name)?;

        // 更新缓存
        self.receiver_type = Some(receiver_type);
        self.method_ptr = Some(method_ptr);
        self.call_count += 1;

        unsafe { self.call_cached_method(method_ptr, receiver) }
    }
}

❌ 错误处理

错误类型层次

#[derive(Debug, Clone)]
pub enum CompilerError {
    LexError(LexError),
    ParseError(ParseError),
    SemanticError(SemanticError),
    TypeError(TypeError),
    CodeGenError(CodeGenError),
}

#[derive(Debug, Clone)]
pub struct ParseError {
    pub message: String,
    pub location: SourceLocation,
    pub expected: Vec,
    pub found: String,
}

#[derive(Debug, Clone)]
pub struct TypeError {
    pub message: String,
    pub location: SourceLocation,
    pub expected_type: Type,
    pub actual_type: Type,
}

错误恢复和报告

pub struct ErrorReporter {
    errors: Vec,
    source_map: SourceMap,
}

impl ErrorReporter {
    pub fn report_error(&mut self, error: CompilerError) {
        match &error {
            CompilerError::TypeError(type_error) => {
                self.report_type_error(type_error);
            },
            CompilerError::ParseError(parse_error) => {
                self.report_parse_error(parse_error);
            },
            // ... 其他错误类型
        }

        self.errors.push(error);
    }

    fn report_type_error(&self, error: &TypeError) {
        let source_line = self.source_map.get_line(error.location.line);

        eprintln!("Type Error at {}:{}", error.location.line, error.location.column);
        eprintln!("  {}", error.message);
        eprintln!("  Expected: {}", error.expected_type);
        eprintln!("  Found:    {}", error.actual_type);
        eprintln!();
        eprintln!("  {} | {}", error.location.line, source_line);
        eprintln!("  {} | {}^", " ".repeat(error.location.line.to_string().len()), " ".repeat(error.location.column));
    }
}

📊 性能指标

编译性能

  • 词法分析: ~1M tokens/秒
  • 语法分析: ~100K AST nodes/秒
  • 类型检查: ~50K expressions/秒
  • 代码生成: ~10K LLVM instructions/秒

生成代码性能

  • 函数调用开销: ~2-5 CPU 周期
  • Trait 方法分发: ~5-10 CPU 周期(静态分发)
  • 内存分配: ~50-100 CPU 周期
  • 垃圾回收暂停: <1ms(增量回收)

🔧 调试和分析工具

AST 可视化

impl ASTVisualizer {
    pub fn visualize(&self, ast: &AST) -> String {
        let mut output = String::new();
        self.visualize_node(&ast.root, 0, &mut output);
        output
    }

    fn visualize_node(&self, node: &ASTNode, depth: usize, output: &mut String) {
        let indent = "  ".repeat(depth);
        output.push_str(&format!("{}{:?}\n", indent, node.kind));

        for child in &node.children {
            self.visualize_node(child, depth + 1, output);
        }
    }
}

性能分析

pub struct CompilerProfiler {
    phase_times: HashMap,
    memory_usage: HashMap,
}

impl CompilerProfiler {
    pub fn time_phase(&mut self, phase_name: &str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let result = f();
        let elapsed = start.elapsed();

        self.phase_times.insert(phase_name.to_string(), elapsed);
        result
    }

    pub fn report(&self) {
        println!("Compilation Profile:");
        for (phase, time) in &self.phase_times {
            println!("  {}: {:?}", phase, time);
        }
    }
}

🔗 相关主题

  • 语法参考 - 完整语法规范
  • 类型系统 - 类型系统设计
  • 性能优化 - 性能优化指南
  • 贡献指南 - 如何参与编译器开发