Fleet 编译器架构
本文档详细介绍 Fleet 编程语言编译器的内部架构、设计决策和实现细节。
📋 目录
🏗️ 总体架构
Fleet 编译器采用传统的三阶段架构:前端、中端、后端。
源代码 (.fleet)
↓
┌─────────────┐
│ 前端 │ 词法分析 → 语法分析 → AST
│ (Frontend) │
└─────────────┘
↓
┌─────────────┐
│ 中端 │ 语义分析 → 类型检查 → HIR
│ (Middle) │
└─────────────┘
↓
┌─────────────┐
│ 后端 │ 代码生成 → 优化 → 机器码
│ (Backend) │
└─────────────┘
↓
可执行文件
核心组件
- 词法分析器 (Lexer) - 将源代码转换为 Token 流
- 语法分析器 (Parser) - 将 Token 流转换为抽象语法树 (AST)
- 语义分析器 (Semantic Analyzer) - 进行语义检查和类型推断
- 类型检查器 (Type Checker) - 验证类型正确性
- 代码生成器 (Code Generator) - 生成 LLVM IR
- 优化器 (Optimizer) - 优化生成的代码
技术栈
- 实现语言: Rust
- 解析器: Pest (PEG 解析器)
- 后端: LLVM (通过 Inkwell 绑定)
- 构建系统: Cargo
🔍 前端:词法分析和语法分析
词法分析器
Fleet 使用 Pest 解析器生成器进行词法和语法分析。
Token 定义
// 数字
number = { digit+ }
float = { digit+ ~ "." ~ digit+ ~ (^"e" ~ ("+" | "-")? ~ digit+)? }
// 标识符
identifier = { (letter | "_") ~ (letter | digit | "_")* }
// 字符串
string = { "\"" ~ (!"\"" ~ any)* ~ "\"" }
// 关键字
keyword = {
"fn" | "let" | "var" | "if" | "else" | "match" |
"struct" | "enum" | "trait" | "impl" | "loop" |
"return" | "break" | "continue"
}
// 操作符
operator = {
"==" | "!=" | "<=" | ">=" | "&&" | "||" |
"+=" | "-=" | "*=" | "/=" | "%=" |
"+" | "-" | "*" | "/" | "%" | "!" |
"<" | ">" | "&" | "|" | "^" | "~" |
"<<" | ">>" | "=" | "." | "::" | "->" | "=>"
}
词法优先级
Fleet 解决了元组访问与浮点数的语法冲突:
// 高优先级规则:元组访问
tuple_access = { "." ~ digit }
// 低优先级规则:浮点数
float_literal = { digit+ ~ "." ~ digit+ }
语法分析器
AST 节点定义
#[derive(Debug, Clone)]
pub enum Statement {
VariableDeclaration {
name: String,
type_annotation: Option,
value: Expression,
mutable: bool,
},
FunctionDeclaration {
name: String,
parameters: Vec,
return_type: Option,
body: Vec,
},
ExpressionStatement(Expression),
Return(Option),
}
#[derive(Debug, Clone)]
pub enum Expression {
Literal(Literal),
Identifier(String),
BinaryOperation {
left: Box,
operator: BinaryOperator,
right: Box,
},
FunctionCall {
function: Box,
arguments: Vec,
},
FieldAccess {
object: Box,
field: String,
},
TupleAccess {
tuple: Box,
index: usize,
},
}
语法规则
program = { SOI ~ statement* ~ EOI }
statement = {
variable_declaration |
function_declaration |
struct_declaration |
enum_declaration |
trait_declaration |
impl_block |
expression_statement
}
expression = {
assignment_expression |
logical_or_expression
}
logical_or_expression = {
logical_and_expression ~ ("||" ~ logical_and_expression)*
}
logical_and_expression = {
equality_expression ~ ("&&" ~ equality_expression)*
}
// ... 更多语法规则
错误恢复
Fleet 编译器实现了智能的错误恢复机制:
impl Parser {
fn parse_with_recovery(&mut self) -> Result> {
let mut errors = Vec::new();
let mut statements = Vec::new();
while !self.is_at_end() {
match self.parse_statement() {
Ok(stmt) => statements.push(stmt),
Err(error) => {
errors.push(error);
self.synchronize(); // 错误恢复
}
}
}
if errors.is_empty() {
Ok(AST { statements })
} else {
Err(errors)
}
}
fn synchronize(&mut self) {
// 跳到下一个语句边界
while !self.is_at_end() {
if self.previous().token_type == TokenType::Semicolon {
return;
}
match self.peek().token_type {
TokenType::Fn | TokenType::Let | TokenType::Var => return,
_ => self.advance(),
}
}
}
}
🧠 中端:语义分析和类型检查
符号表
Fleet 使用分层符号表管理作用域:
#[derive(Debug)]
pub struct SymbolTable {
scopes: Vec,
current_scope: usize,
}
#[derive(Debug)]
pub struct Scope {
symbols: HashMap,
parent: Option,
}
#[derive(Debug, Clone)]
pub struct Symbol {
pub name: String,
pub symbol_type: Type,
pub kind: SymbolKind,
pub location: SourceLocation,
}
#[derive(Debug, Clone)]
pub enum SymbolKind {
Variable { mutable: bool },
Function { parameters: Vec, return_type: Type },
Struct { fields: Vec },
Enum { variants: Vec },
Trait { methods: Vec },
}
类型系统
类型定义
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
// 基础类型
Int,
Float,
String,
Bool,
Char,
// 复合类型
Array(Box),
Tuple(Vec),
Map(Box, Box),
// 用户定义类型
Struct(String),
Enum(String),
// 函数类型
Function {
parameters: Vec,
return_type: Box,
},
// 泛型类型
Generic(String),
// 特殊类型
Unit,
Unknown, // 用于类型推断
}
类型推断
Fleet 实现了 Hindley-Milner 风格的类型推断:
pub struct TypeInferencer {
constraints: Vec,
substitutions: HashMap,
next_type_var: usize,
}
#[derive(Debug, Clone)]
pub struct TypeConstraint {
pub left: Type,
pub right: Type,
pub location: SourceLocation,
}
impl TypeInferencer {
pub fn infer_expression(&mut self, expr: &Expression) -> Result {
match expr {
Expression::Literal(lit) => Ok(self.infer_literal(lit)),
Expression::Identifier(name) => self.lookup_variable(name),
Expression::BinaryOperation { left, op, right } => {
let left_type = self.infer_expression(left)?;
let right_type = self.infer_expression(right)?;
self.infer_binary_operation(left_type, *op, right_type)
},
Expression::FunctionCall { function, arguments } => {
let func_type = self.infer_expression(function)?;
let arg_types: Result, _> = arguments
.iter()
.map(|arg| self.infer_expression(arg))
.collect();
let arg_types = arg_types?;
self.infer_function_call(func_type, arg_types)
},
// ... 其他表达式类型
}
}
fn unify(&mut self, t1: Type, t2: Type) -> Result<(), TypeError> {
match (t1, t2) {
(Type::Unknown, t) | (t, Type::Unknown) => {
// 类型变量统一
Ok(())
},
(Type::Int, Type::Int) => Ok(()),
(Type::Float, Type::Float) => Ok(()),
(Type::Array(t1), Type::Array(t2)) => self.unify(*t1, *t2),
(Type::Tuple(types1), Type::Tuple(types2)) => {
if types1.len() != types2.len() {
return Err(TypeError::TupleLengthMismatch);
}
for (t1, t2) in types1.into_iter().zip(types2.into_iter()) {
self.unify(t1, t2)?;
}
Ok(())
},
_ => Err(TypeError::TypeMismatch(t1, t2)),
}
}
}
Trait 系统
Trait 解析
pub struct TraitResolver {
traits: HashMap,
implementations: HashMap<(String, String), ImplBlock>, // (Type, Trait) -> Impl
}
impl TraitResolver {
pub fn resolve_method_call(
&self,
receiver_type: &Type,
method_name: &str,
) -> Result {
// 1. 查找直接方法
if let Some(method) = self.find_inherent_method(receiver_type, method_name) {
return Ok(MethodResolution::Inherent(method));
}
// 2. 查找 trait 方法
let mut candidates = Vec::new();
for (trait_name, trait_def) in &self.traits {
if trait_def.has_method(method_name) {
if let Some(impl_block) = self.implementations.get(&(receiver_type.name(), trait_name.clone())) {
candidates.push((trait_name.clone(), impl_block));
}
}
}
match candidates.len() {
0 => Err(TraitError::MethodNotFound),
1 => Ok(MethodResolution::Trait(candidates[0].clone())),
_ => Err(TraitError::AmbiguousMethod),
}
}
}
⚙️ 后端:代码生成
LLVM 集成
Fleet 使用 LLVM 作为代码生成后端:
pub struct CodeGenerator<'ctx> {
context: &'ctx Context,
module: Module<'ctx>,
builder: Builder<'ctx>,
functions: HashMap>,
variables: HashMap>,
}
impl<'ctx> CodeGenerator<'ctx> {
pub fn generate_function(&mut self, func: &FunctionDeclaration) -> Result<(), CodeGenError> {
// 创建函数签名
let param_types: Vec = func.parameters
.iter()
.map(|p| self.type_to_llvm(&p.param_type))
.collect::, _>>()?;
let return_type = match &func.return_type {
Some(t) => self.type_to_llvm(t)?,
None => self.context.void_type().into(),
};
let fn_type = return_type.fn_type(¶m_types, false);
let function = self.module.add_function(&func.name, fn_type, None);
// 创建基本块
let entry_block = self.context.append_basic_block(function, "entry");
self.builder.position_at_end(entry_block);
// 生成函数体
for statement in &func.body {
self.generate_statement(statement)?;
}
// 如果没有显式返回,添加默认返回
if !self.current_block_has_terminator() {
match &func.return_type {
Some(_) => return Err(CodeGenError::MissingReturn),
None => { self.builder.build_return(None); },
}
}
self.functions.insert(func.name.clone(), function);
Ok(())
}
fn generate_expression(&mut self, expr: &Expression) -> Result, CodeGenError> {
match expr {
Expression::Literal(lit) => self.generate_literal(lit),
Expression::Identifier(name) => {
let ptr = self.variables.get(name)
.ok_or_else(|| CodeGenError::UndefinedVariable(name.clone()))?;
Ok(self.builder.build_load(*ptr, name))
},
Expression::BinaryOperation { left, op, right } => {
let left_val = self.generate_expression(left)?;
let right_val = self.generate_expression(right)?;
self.generate_binary_operation(left_val, *op, right_val)
},
Expression::FunctionCall { function, arguments } => {
self.generate_function_call(function, arguments)
},
// ... 其他表达式类型
}
}
}
Trait 方法的静态分发
impl<'ctx> CodeGenerator<'ctx> {
fn generate_trait_method_call(
&mut self,
receiver: &Expression,
trait_name: &str,
method_name: &str,
args: &[Expression],
) -> Result, CodeGenError> {
let receiver_type = self.infer_expression_type(receiver)?;
// 生成混淆的方法名:Type_Trait__method
let mangled_name = format!("{}_{}__{}",
receiver_type.name(),
trait_name,
method_name
);
// 查找函数
let function = self.functions.get(&mangled_name)
.ok_or_else(|| CodeGenError::UndefinedFunction(mangled_name.clone()))?;
// 生成参数
let mut call_args = vec![self.generate_expression(receiver)?];
for arg in args {
call_args.push(self.generate_expression(arg)?);
}
// 生成函数调用
let call_site = self.builder.build_call(*function, &call_args, "trait_call");
match call_site.try_as_basic_value() {
Either::Left(value) => Ok(value),
Either::Right(_) => Err(CodeGenError::VoidValue),
}
}
}
内存布局
结构体布局
impl<'ctx> CodeGenerator<'ctx> {
fn generate_struct_type(&self, struct_def: &StructDefinition) -> Result, CodeGenError> {
let field_types: Vec = struct_def.fields
.iter()
.map(|field| self.type_to_llvm(&field.field_type))
.collect::, _>>()?;
Ok(self.context.struct_type(&field_types, false))
}
fn generate_struct_access(
&mut self,
struct_expr: &Expression,
field_name: &str,
) -> Result, CodeGenError> {
let struct_value = self.generate_expression(struct_expr)?;
let struct_type = self.infer_expression_type(struct_expr)?;
if let Type::Struct(struct_name) = struct_type {
let struct_def = self.get_struct_definition(&struct_name)?;
let field_index = struct_def.get_field_index(field_name)
.ok_or_else(|| CodeGenError::UndefinedField(field_name.to_string()))?;
let field_ptr = self.builder.build_struct_gep(
struct_value.into_pointer_value(),
field_index as u32,
&format!("{}_field", field_name),
)?;
Ok(self.builder.build_load(field_ptr, field_name))
} else {
Err(CodeGenError::NotAStruct)
}
}
}
枚举布局
Fleet 使用标记联合 (Tagged Union) 来表示枚举:
// 枚举内存布局:
// struct EnumLayout {
// tag: i32, // 变体标识符
// data: union { // 数据联合
// variant1_data,
// variant2_data,
// ...
// }
// }
impl<'ctx> CodeGenerator<'ctx> {
fn generate_enum_type(&self, enum_def: &EnumDefinition) -> Result, CodeGenError> {
// 计算最大变体大小
let max_size = enum_def.variants
.iter()
.map(|variant| self.calculate_variant_size(variant))
.max()
.unwrap_or(0);
// 创建标记联合结构
let tag_type = self.context.i32_type();
let data_type = self.context.i8_type().array_type(max_size as u32);
Ok(self.context.struct_type(&[tag_type.into(), data_type.into()], false))
}
}
🧹 内存管理
Fleet 采用自动内存管理,结合引用计数和垃圾回收:
引用计数
// 每个对象都有引用计数头
struct ObjectHeader {
ref_count: AtomicUsize,
type_info: *const TypeInfo,
}
// 自动引用计数操作
impl<'ctx> CodeGenerator<'ctx> {
fn generate_retain(&mut self, value: BasicValueEnum<'ctx>) {
if self.needs_ref_counting(&value) {
let retain_fn = self.get_runtime_function("fleet_retain");
self.builder.build_call(retain_fn, &[value], "retain");
}
}
fn generate_release(&mut self, value: BasicValueEnum<'ctx>) {
if self.needs_ref_counting(&value) {
let release_fn = self.get_runtime_function("fleet_release");
self.builder.build_call(release_fn, &[value], "release");
}
}
}
循环检测
// 运行时循环检测器
pub struct CycleDetector {
mark_stack: Vec<*mut ObjectHeader>,
marked_objects: HashSet<*mut ObjectHeader>,
}
impl CycleDetector {
pub fn collect_cycles(&mut self) {
// Mark phase: 标记所有可达对象
self.mark_roots();
// Sweep phase: 回收未标记的对象
self.sweep_unmarked();
// 清理标记
self.clear_marks();
}
}
🚀 优化策略
编译时优化
常量折叠
impl Optimizer {
fn fold_constants(&mut self, expr: &mut Expression) {
match expr {
Expression::BinaryOperation { left, op, right } => {
self.fold_constants(left);
self.fold_constants(right);
if let (Expression::Literal(left_lit), Expression::Literal(right_lit)) = (left.as_ref(), right.as_ref()) {
if let Some(result) = self.evaluate_constant_operation(left_lit, *op, right_lit) {
*expr = Expression::Literal(result);
}
}
},
// ... 其他表达式类型
}
}
}
死代码消除
impl Optimizer {
fn eliminate_dead_code(&mut self, statements: &mut Vec) {
let mut live_variables = HashSet::new();
// 反向分析,标记活跃变量
for stmt in statements.iter().rev() {
match stmt {
Statement::Return(Some(expr)) => {
self.mark_expression_variables(expr, &mut live_variables);
},
Statement::VariableDeclaration { name, value, .. } => {
if live_variables.contains(name) {
self.mark_expression_variables(value, &mut live_variables);
}
},
// ... 其他语句类型
}
}
// 移除死代码
statements.retain(|stmt| self.is_statement_live(stmt, &live_variables));
}
}
运行时优化
内联缓存
// 方法调用的内联缓存
struct InlineCache {
receiver_type: Option,
method_ptr: Option<*const u8>,
call_count: usize,
}
impl InlineCache {
fn call_method(&mut self, receiver: &dyn Any, method_name: &str) -> Result {
let receiver_type = receiver.type_id();
if Some(receiver_type) == self.receiver_type {
// 缓存命中,直接调用
if let Some(method_ptr) = self.method_ptr {
return unsafe { self.call_cached_method(method_ptr, receiver) };
}
}
// 缓存未命中,查找方法
let method_ptr = self.lookup_method(receiver_type, method_name)?;
// 更新缓存
self.receiver_type = Some(receiver_type);
self.method_ptr = Some(method_ptr);
self.call_count += 1;
unsafe { self.call_cached_method(method_ptr, receiver) }
}
}
❌ 错误处理
错误类型层次
#[derive(Debug, Clone)]
pub enum CompilerError {
LexError(LexError),
ParseError(ParseError),
SemanticError(SemanticError),
TypeError(TypeError),
CodeGenError(CodeGenError),
}
#[derive(Debug, Clone)]
pub struct ParseError {
pub message: String,
pub location: SourceLocation,
pub expected: Vec,
pub found: String,
}
#[derive(Debug, Clone)]
pub struct TypeError {
pub message: String,
pub location: SourceLocation,
pub expected_type: Type,
pub actual_type: Type,
}
错误恢复和报告
pub struct ErrorReporter {
errors: Vec,
source_map: SourceMap,
}
impl ErrorReporter {
pub fn report_error(&mut self, error: CompilerError) {
match &error {
CompilerError::TypeError(type_error) => {
self.report_type_error(type_error);
},
CompilerError::ParseError(parse_error) => {
self.report_parse_error(parse_error);
},
// ... 其他错误类型
}
self.errors.push(error);
}
fn report_type_error(&self, error: &TypeError) {
let source_line = self.source_map.get_line(error.location.line);
eprintln!("Type Error at {}:{}", error.location.line, error.location.column);
eprintln!(" {}", error.message);
eprintln!(" Expected: {}", error.expected_type);
eprintln!(" Found: {}", error.actual_type);
eprintln!();
eprintln!(" {} | {}", error.location.line, source_line);
eprintln!(" {} | {}^", " ".repeat(error.location.line.to_string().len()), " ".repeat(error.location.column));
}
}
📊 性能指标
编译性能
- 词法分析: ~1M tokens/秒
- 语法分析: ~100K AST nodes/秒
- 类型检查: ~50K expressions/秒
- 代码生成: ~10K LLVM instructions/秒
生成代码性能
- 函数调用开销: ~2-5 CPU 周期
- Trait 方法分发: ~5-10 CPU 周期(静态分发)
- 内存分配: ~50-100 CPU 周期
- 垃圾回收暂停: <1ms(增量回收)
🔧 调试和分析工具
AST 可视化
impl ASTVisualizer {
pub fn visualize(&self, ast: &AST) -> String {
let mut output = String::new();
self.visualize_node(&ast.root, 0, &mut output);
output
}
fn visualize_node(&self, node: &ASTNode, depth: usize, output: &mut String) {
let indent = " ".repeat(depth);
output.push_str(&format!("{}{:?}\n", indent, node.kind));
for child in &node.children {
self.visualize_node(child, depth + 1, output);
}
}
}
性能分析
pub struct CompilerProfiler {
phase_times: HashMap,
memory_usage: HashMap,
}
impl CompilerProfiler {
pub fn time_phase(&mut self, phase_name: &str, f: impl FnOnce() -> T) -> T {
let start = Instant::now();
let result = f();
let elapsed = start.elapsed();
self.phase_times.insert(phase_name.to_string(), elapsed);
result
}
pub fn report(&self) {
println!("Compilation Profile:");
for (phase, time) in &self.phase_times {
println!(" {}: {:?}", phase, time);
}
}
}