Architecture Decisions
This document explains the key architectural decisions made in the BSharp project, their rationale, and their implications for contributors.
Core Design Philosophy
BSharp is designed as a modular, extensible C# parser and analysis toolkit written in Rust. The architecture prioritizes:
- Correctness - Accurate parsing of C# syntax
- Performance - Efficient parsing and analysis of large codebases
- Maintainability - Clear module boundaries and minimal coupling
- Extensibility - Easy addition of new language features and analyzers
Parser Architecture
Why nom Parser Combinators?
Decision: Use the nom parser combinator library as the foundation for parsing.
Rationale:
- Composability: Small, focused parsers combine to handle complex syntax
- Type Safety: Rust's type system catches parser errors at compile time
- Performance: Zero-copy parsing with minimal allocations
- Testability: Individual parser functions are easily unit tested
- Maintainability: Declarative style is easier to understand than hand-written parsers
Trade-offs:
- Learning curve for contributors unfamiliar with parser combinators
- Error messages require additional work (addressed with nom-supreme)
Implementation:
- Core parsing infrastructure: `src/bsharp_parser/src/helpers/`
- Parser implementations: `src/bsharp_parser/src/`
- All parsers return the `BResult<I, O>` type alias
Error Handling Strategy
Decision: Use `ErrorTree` from the `nom-supreme` crate for all parser errors.
Rationale:
- Rich Context: Tree structure preserves full parse failure path
- Better Diagnostics: Context annotations via the `.context()` method
- Integration: Seamless integration with nom combinators
- Debugging: Pretty-printing via `format_error_tree()`
Evolution:
- Initially used a custom `BSharpParseError` type
- Migrated to `ErrorTree` for better diagnostics
- Custom error type deprecated and removed
Implementation:
```rust
pub type BResult<I, O> = IResult<I, O, ErrorTree<I>>;
```
Helper Functions (in src/bsharp_parser/src/helpers/)
- `context()` - Adds contextual information
- `cut()` - Commits to a parse branch (prevents misleading backtracking)
- `bws()` - Whitespace-aware wrapper with error context
- `bdelimited()` - Delimited parsing with cut on the closing delimiter
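A minimal sketch of how these pieces compose, written against plain nom and nom-supreme. The `ws` wrapper and `using_directive` parser below are illustrative stand-ins (for `bws()` and a real BSharp parser respectively), not code from the repository:

```rust
use nom::character::complete::{alpha1, multispace0};
use nom::sequence::delimited;
use nom::{IResult, Parser};
use nom_supreme::error::ErrorTree;
use nom_supreme::tag::complete::tag;
use nom_supreme::ParserExt;

// Mirrors the BResult alias described above.
pub type BResult<I, O> = IResult<I, O, ErrorTree<I>>;

// Stand-in for bws(): wraps a parser so surrounding whitespace is consumed.
fn ws<'a, O>(
    inner: impl FnMut(&'a str) -> BResult<&'a str, O>,
) -> impl FnMut(&'a str) -> BResult<&'a str, O> {
    delimited(multispace0, inner, multispace0)
}

// Illustrative parser for `using <identifier>;` with a context annotation,
// so failures report "using directive" in the error tree.
fn using_directive(input: &str) -> BResult<&str, &str> {
    delimited(ws(tag("using")), ws(alpha1), ws(tag(";")))
        .context("using directive")
        .parse(input)
}
```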
Module Organization
Decision: Separate the parser crate from the syntax (AST) crate, and keep analysis in its own crate.
Structure:
src/
├── bsharp_parser/ # Parser implementations and public facade
│ ├── src/
│ │ ├── expressions/ # Expression parsers
│ │ ├── keywords/ # Keyword parsing (modularized)
│ │ ├── helpers/ # Parsing utilities (bws, cut, context, directives, ...)
│ │ ├── facade.rs # Public Parser facade
│ │ └── ...
├── bsharp_syntax/ # AST node definitions and shared syntax types
│ └── src/ # (re-exported by bsharp_parser as `syntax`)
├── bsharp_analysis/ # Analysis framework and workspace
│ └── src/
└── bsharp_cli/ # CLI entry and subcommands
Rationale:
- Separation of Concerns: Infrastructure vs implementation
- Reusability: Helpers used across all parsers
- API Clarity: The `syntax` module is the public API
- Testing: Infrastructure can be tested independently
Keyword Modularization
Decision: Organize keywords by category in dedicated modules.
Structure:
src/bsharp_parser/src/keywords/
├── mod.rs # Keyword infrastructure
├── access_keywords.rs # public, private, protected, internal
├── accessor_keywords.rs # get, set, init, add, remove
├── type_keywords.rs # class, struct, interface, enum, record
├── modifier_keywords.rs # static, abstract, virtual, sealed
├── flow_control_keywords.rs # if, else, switch, case, default
├── iteration_keywords.rs # for, foreach, while, do
├── expression_keywords.rs # new, this, base, typeof, sizeof
├── linq_query_keywords.rs # from, where, select, orderby
└── ...
Rationale:
- Maintainability: Easy to find and update keyword parsers
- Consistency: Uniform keyword parsing strategy
- Word Boundaries: All keywords use the `keyword()` helper for boundary checking
- Prevents Bugs: Avoids partial matches (e.g., "int" vs "int32")
Implementation:
- The `keyword()` function enforces `[A-Za-z0-9_]` word boundaries
- Parsers are grouped under `src/bsharp_parser/src/keywords/`
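A hypothetical sketch of the word-boundary check the `keyword()` helper performs; the real helper's signature and error type may differ:

```rust
use nom::bytes::complete::tag;
use nom::character::complete::satisfy;
use nom::combinator::not;
use nom::sequence::terminated;
use nom::IResult;

// Hypothetical word-boundary-aware keyword parser: after matching the keyword,
// it fails if the next character is in [A-Za-z0-9_], so "int" does not match
// the prefix of "int32" or "integer".
fn keyword<'a>(kw: &'static str) -> impl FnMut(&'a str) -> IResult<&'a str, &'a str> {
    terminated(
        tag(kw),
        not(satisfy(|c: char| c.is_ascii_alphanumeric() || c == '_')),
    )
}

// keyword("int")("int x")   -> Ok((" x", "int"))
// keyword("int")("int32 x") -> Err(...)   (boundary check rejects the match)
```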
AST Design
Naming Convention
Decision: Use PascalCase names without 'Syntax' suffix for all AST nodes.
Examples:
- `ClassDeclaration` (not `ClassDeclarationSyntax`)
- `MethodDeclaration` (not `MethodDeclarationSyntax`)
- `ExpressionStatement` (not `ExpressionStatementSyntax`)
- `IfStatement` (not `IfStatementSyntax`)
Rationale:
- Clarity: Shorter, clearer names
- Roslyn Inspiration: Mirrors Roslyn's structure where appropriate
- Consistency: Uniform naming across entire codebase
- User Preference: An explicit, documented design decision
Implications:
- All AST node types follow this convention
- Test code uses these names
- Documentation uses these names
- Breaking change from earlier versions with 'Syntax' suffix
AST Ownership Model
Decision: Parent nodes own their children; no circular references.
Structure:
```rust
pub struct ClassDeclaration {
    pub attributes: Vec<AttributeList>,
    pub modifiers: Vec<Modifier>,
    pub name: Identifier,
    pub type_parameters: Option<Vec<TypeParameter>>,
    pub primary_constructor_parameters: Option<Vec<Parameter>>,
    pub base_types: Vec<Type>,
    pub body_declarations: Vec<ClassBodyDeclaration>, // Owned
    pub documentation: Option<XmlDocumentationComment>,
    pub constraints: Option<Vec<TypeParameterConstraintClause>>,
}
```
Rationale:
- Rust Ownership: Leverages Rust's ownership system
- Memory Safety: No reference cycles or lifetime complexity
- Simplicity: Clear ownership semantics
- Traversal: Navigation traits provide search without ownership issues
Trade-offs:
- Cannot directly reference parent from child
- Navigation requires traversal from root
- Mitigated by the `AstNavigate` and `FindDeclarations` traits
Zero-Copy Parsing
Decision: Minimize string allocations during parsing where possible.
Implementation:
- String slices reference original input
- Identifiers store an owned `String` for convenience
- Literals preserve their original format as `String`
Rationale:
- Performance: Reduces allocation overhead
- Memory Efficiency: Lower memory footprint
- Trade-off: Some allocations necessary for AST lifetime
Spans and Location Tracking
Decision: Track source locations via spans for precise diagnostics and tooling.
Implementation:
- The `Span` type, based on `nom_locate::LocatedSpan`, lives in `src/bsharp_parser/src/syntax/span.rs` and is re-exported through the public parser API.
- The parser facade supports `parse_with_spans()`, which returns both the AST and a span table for mapping nodes back to source locations.
- Error reporting uses spans to include line/column information and highlight ranges via `format_error_tree()`.
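A minimal illustration of what `nom_locate::LocatedSpan` provides; the alias name mirrors the documented `Span` type, and the accessors shown are standard `LocatedSpan` methods:

```rust
use nom_locate::LocatedSpan;

// Alias mirroring the documented Span type.
pub type Span<'a> = LocatedSpan<&'a str>;

fn main() {
    let input = Span::new("class Point { }");
    // LocatedSpan carries byte offset, line, and column alongside the slice,
    // which is what diagnostics and the span table are built from.
    println!(
        "offset={} line={} column={}",
        input.location_offset(),
        input.location_line(),
        input.get_utf8_column()
    );
}
```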
Rationale:
- Diagnostics: Accurate error locations and ranges.
- Tooling: Enables IDE features, navigation, and source mapping.
- Testing: Stable, comparable locations for snapshot tests.
See also: docs/syntax/spans.md.
Analysis Framework
Framework-Driven Architecture
Decision: Implement a pipeline-based analysis framework with passes, rules, and visitors.
Structure:
src/bsharp_analysis/src/
├── framework/ # Core analysis infrastructure
│ ├── pipeline.rs # Analysis pipeline orchestration
│ ├── passes.rs # Analysis pass trait and phases
│ ├── rules.rs # Rule trait and rulesets
│ ├── walker.rs # AST walker and visitor pattern
│ ├── registry.rs # Analyzer registry
│ └── session.rs # Analysis session and state
├── passes/ # Concrete analysis passes
├── rules/ # Concrete analysis rules
├── artifacts/ # Analysis artifacts (symbols, metrics, CFG)
└── ...
Rationale:
- Extensibility: Easy to add new analyzers
- Composability: Passes and rules compose via registry
- Performance: Single-pass traversal for local rules
- Configurability: Enable/disable passes and rules via config
Phases:
- Index - Symbol indexing and scope building
- Local - Single-pass local rules and metrics collection
- Global - Cross-file analysis (dependencies, etc.)
- Semantic - Type checking and semantic rules
- Reporting - Report generation and formatting
Visitor Pattern
Decision: Use visitor pattern for AST traversal.
Implementation:
```rust
pub trait Visit {
    fn enter(&mut self, node: &NodeRef, session: &mut AnalysisSession);
    fn exit(&mut self, node: &NodeRef, session: &mut AnalysisSession) {}
}

pub struct AstWalker {
    visitors: Vec<Box<dyn Visit>>,
}
```
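As a usage sketch under the trait above, a visitor might count method declarations during a single walk. The `NodeRef::MethodDeclaration` variant name is an assumption for illustration:

```rust
// Hypothetical visitor: counts method declarations during one traversal.
// The NodeRef variant and the unused session parameter follow the shapes
// described in this document, not a confirmed API.
struct MethodCounter {
    count: usize,
}

impl Visit for MethodCounter {
    fn enter(&mut self, node: &NodeRef, _session: &mut AnalysisSession) {
        if matches!(node, NodeRef::MethodDeclaration(_)) {
            self.count += 1;
        }
    }
}
```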
Rationale:
- Separation of Concerns: Traversal logic separate from analysis logic
- Composability: Multiple visitors in single traversal
- Performance: Single pass for multiple analyses
- Extensibility: Easy to add new visitors
Query API
Decision: Use a typed Query API over a minimal `NodeRef` to traverse the AST. This is the current traversal API; "legacy" refers only to the older navigation traits that the Query API replaced.
Implementation:
- `NodeRef` enumerates coarse node categories (compilation unit, namespaces, declarations, methods, statements, expressions) and now includes top-level items such as file-scoped namespaces, using directives, global using directives, and global attributes.
- `Children` provides child enumeration for `NodeRef`.
- `Extract<T>` enables `Query::of<T>()` to yield typed nodes without extending `NodeRef` for every concrete type.
- Macro helpers `impl_extract_expr!` and `impl_extract_stmt!` simplify adding `Extract` impls for expression/statement variants.
- Location: `src/bsharp_syntax/src/query/` (re-exported as `bsharp_analysis::framework::Query`)
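A hypothetical usage sketch: only `Query::of` and `filter_typed` are named in this document, and their exact signatures, the root argument, and the final collection step are assumptions for illustration:

```rust
// Hypothetical: collect all public method declarations in a compilation unit.
// Modifier::Public and the iterator-style chaining are illustrative only.
fn find_public_methods(unit: &CompilationUnit) -> Vec<&MethodDeclaration> {
    Query::of::<MethodDeclaration>(unit)
        .filter_typed(|m| m.modifiers.contains(&Modifier::Public))
        .collect()
}
```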
Rationale:
- Composability: Typed filters via `Query::filter_typed`.
- Maintainability: Avoids wide trait surfaces and duplicated traversal.
- Performance: Focused walkers remain available for hot paths.
- Determinism: Traversal order and artifact hashing remain stable.
See also:
- `docs/parser/navigation.md` (Query API overview)
- `docs/analysis/traversal-guide.md` (using Query in passes)
- `docs/development/query-cookbook.md` (recipes)
Formatting and Emitters
Decision: Implement formatting via an Emit trait with per-node emitters in bsharp_syntax.
Implementation:
- The `Emit` trait and emitters live under `src/bsharp_syntax/src/emitters/` (e.g., `emitters/declarations/*`, `emitters/expressions/*`, `emitters/statements/*`).
- Formatting is separated from parsing; emitters reconstruct code from the AST with consistent whitespace and trivia handling.
- Trivia and XML doc emitters are under `emitters/trivia/`.
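A sketch of what a per-node emitter could look like; the actual `Emit` trait signature and the `BreakStatement` node name are assumptions, and the real trait likely threads formatting state (indentation, trivia) rather than a bare string:

```rust
// Hypothetical Emit trait shape and a per-node impl for illustration only.
pub trait Emit {
    fn emit(&self, out: &mut String);
}

impl Emit for BreakStatement {
    fn emit(&self, out: &mut String) {
        // A trivial node: the emitter reconstructs the source form directly.
        out.push_str("break;");
    }
}
```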
Rationale:
- Separation of Concerns: Parsing and formatting evolve independently.
- Consistency: Centralized formatting rules for all nodes.
- Extensibility: Adding a new node implies an `Emit` impl in a known location.
See also: docs/syntax/formatter.md.
Workspace Loading
Multi-Format Support
Decision: Support loading from .sln, .csproj, or directory.
Implementation:
```rust
pub struct WorkspaceLoader;

impl WorkspaceLoader {
    pub fn from_path(path: &Path) -> Result<Workspace>;
    pub fn from_path_with_options(path: &Path, opts: WorkspaceLoadOptions) -> Result<Workspace>;
}
```
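A usage sketch against the facade above; the `projects` and `name` fields iterated here are assumptions about the `Workspace` shape, not its confirmed API:

```rust
use std::path::Path;

// Hypothetical driver: load a workspace from a path and list what was found.
// Error handling is deferred to the caller via the same Result alias as above.
fn load_and_list(path: &Path) -> Result<Workspace> {
    let workspace = WorkspaceLoader::from_path(path)?;
    for project in &workspace.projects {
        println!("project: {}", project.name);
    }
    Ok(workspace)
}
```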
Rationale:
- Flexibility: Support different entry points
- IDE Integration: Match IDE project loading behavior
- Incremental Analysis: Load only what's needed
Features:
- Solution file (.sln) parsing
- Project file (.csproj) parsing with XML
- Transitive ProjectReference following
- Source file discovery with glob patterns
- Deterministic project ordering
Error Resilience
Decision: Continue loading workspace even if individual projects fail.
Implementation:
- Failed projects recorded as stubs with error messages
- Workspace loading succeeds with partial results
- Errors accessible via the `Project::errors` field
Rationale:
- Robustness: Don't fail entire workspace for one bad project
- User Experience: Show what can be analyzed
- Debugging: Error messages preserved for investigation
Testing Strategy
External Test Organization
Decision: Externalize tests; in the current workspace they live under src/bsharp_tests/ rather than inline #[cfg(test)] modules.
Structure:
src/bsharp_tests/src/
├── parser/
│ ├── expressions/
│ ├── statements/
│ ├── declarations/
│ └── types/
├── cli/
└── integration/
Rationale:
- Separation: Test code separate from implementation
- Organization: Clear structure mirrors crates
- Compilation: Tests don't bloat production binaries
Note: A future migration to top-level tests/ may be considered.
Test Helpers
Decision: Provide expect_ok() helper for readable test failures.
Implementation:
```rust
pub fn expect_ok<T>(input: &str, result: BResult<&str, T>) -> T {
    match result {
        Ok((_, value)) => value,
        Err(e) => {
            eprintln!("{}", format_error_tree(&input, &e));
            panic!("Parse failed");
        }
    }
}
```
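An example of how a test might use the helper; `parse_expression` and the `Expression::Binary` variant are placeholders for whichever parser and node a given test targets:

```rust
#[test]
fn parses_binary_expression() {
    // parse_expression and Expression::Binary are placeholders; any parser
    // returning BResult works with expect_ok().
    let input = "1 + 2 * 3";
    let expr = expect_ok(input, parse_expression(input));
    assert!(matches!(expr, Expression::Binary(_)));
}
```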
Rationale:
- Diagnostics: Pretty-printed errors on failure
- Debugging: Shows parse failure context
- Consistency: Uniform test error reporting
Snapshot Testing
Decision: Use the `insta` crate for snapshot testing.
Implementation:
- `Cargo.toml` includes `insta` in dev-dependencies
- Snapshot tests for complex AST structures
- JSON serialization for comparison
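A minimal snapshot test sketch with `insta`; `parse_compilation_unit` is a placeholder entry point, and `assert_json_snapshot!` assumes insta's "json" feature plus a `serde::Serialize` impl on the AST:

```rust
#[test]
fn snapshot_simple_class() {
    // parse_compilation_unit is a placeholder parser entry point.
    let input = "public class Point { }";
    let ast = expect_ok(input, parse_compilation_unit(input));
    // The snapshot captures the serialized AST; reviewers diff it on change.
    insta::assert_json_snapshot!(ast);
}
```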
Rationale:
- Regression Prevention: Catch unintended AST changes
- Review: Visual diff of AST changes
- Maintenance: Update snapshots when intentional
Performance Considerations
Parallel Analysis
Decision: Optional parallel analysis via rayon feature.
Implementation:
```toml
[features]
parallel_analysis = ["rayon"]
```
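A sketch of how the feature gate might switch between sequential and parallel traversal with rayon; `SourceFile`, `FileReport`, and `analyze_file` are illustrative placeholders, not the project's API:

```rust
// Illustrative only: the types and analyze_file are placeholders.
#[cfg(feature = "parallel_analysis")]
fn analyze_all(files: &[SourceFile]) -> Vec<FileReport> {
    use rayon::prelude::*;
    // With the feature enabled, files are analyzed across a thread pool.
    files.par_iter().map(analyze_file).collect()
}

#[cfg(not(feature = "parallel_analysis"))]
fn analyze_all(files: &[SourceFile]) -> Vec<FileReport> {
    // Without the feature, the same code path runs sequentially.
    files.iter().map(analyze_file).collect()
}
```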
Rationale:
- Scalability: Faster analysis for large workspaces
- Optional: Not required for single-file use cases
- Trade-off: Adds dependency and complexity
Incremental Parsing
Decision: Not implemented yet; designed for future addition.
Future Design:
- Cache parsed ASTs by file hash
- Reparse only changed files
- Incremental analysis based on change scope
Rationale:
- Performance: Critical for IDE integration
- Complexity: Requires careful cache invalidation
- Priority: Deferred until core features stable
CLI Design
Subcommand Structure
Decision: Use clap with subcommands for different operations.
Commands:
- `parse` - Parse a C# file to JSON
- `tree` - Generate an AST visualization (Mermaid/DOT)
- `analyze` - Run analysis and generate a report
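A sketch of this subcommand layout using clap's derive API; the argument names and handlers are illustrative, not the CLI's exact interface:

```rust
use clap::{Parser, Subcommand};
use std::path::PathBuf;

#[derive(Parser)]
#[command(name = "bsharp")]
struct Cli {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// Parse a C# file to JSON
    Parse { file: PathBuf },
    /// Generate an AST visualization (Mermaid/DOT)
    Tree { file: PathBuf },
    /// Run analysis and generate a report
    Analyze { path: PathBuf },
}

fn main() {
    // Dispatch to real handlers would replace the println! calls below.
    match Cli::parse().command {
        Command::Parse { file } => println!("parse {}", file.display()),
        Command::Tree { file } => println!("tree {}", file.display()),
        Command::Analyze { path } => println!("analyze {}", path.display()),
    }
}
```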
Rationale:
- Clarity: Each command has clear purpose
- Extensibility: Easy to add new commands
- Discoverability: `--help` shows all options
- Consistency: Follows common CLI patterns
Output Formats
Decision: Support multiple output formats (JSON, pretty-JSON, SVG).
Implementation:
- JSON for machine consumption
- Pretty-JSON for human readability
- SVG for visualization
Rationale:
- Integration: JSON for tool integration
- Debugging: Pretty-JSON for manual inspection
- Visualization: SVG for understanding AST structure
Future Extensibility
Planned Enhancements
- Incremental Parsing
  - Cache parsed ASTs
  - Reparse only changed regions
  - Critical for IDE integration
- Language Server Protocol (LSP)
  - IDE integration
  - Real-time diagnostics
  - Code completion
- More Analysis Passes
  - Nullability analysis
  - Lifetime analysis
  - Security analysis
- Code Transformation
  - AST modification API
  - Code generation from AST
  - Refactoring support
Design for Extension
Principles:
- Trait-Based: Use traits for extensibility points
- Registry Pattern: Dynamic registration of analyzers
- Configuration: Enable/disable features via config
- Versioning: Stable API with clear versioning
Lessons Learned
What Worked Well
- Parser Combinators: Excellent for composability and testing
- Module Organization: Clear boundaries reduce coupling
- Error Context: `ErrorTree` provides excellent diagnostics
- External Tests: Clean separation improves maintainability
What We'd Do Differently
- Earlier Keyword Modularization: Should have organized keywords by category from the start
- Error Type Migration: Earlier adoption of `ErrorTree` would have saved refactoring
- Documentation: More inline documentation from the beginning
Recent Refactoring
Major refactoring improvements completed:
- Expression precedence chain builder implemented
- Statement group deduplication completed
- Consistent error recovery with `skip_to_member_boundary_top_level()`
- Whitespace handling standardization via the `bws()` combinator
- Keyword modularization by category
Contributing Guidelines
When adding new features, follow these architectural principles:
- Use Existing Patterns: Follow established parser patterns
- Add Tests: External tests under `src/bsharp_tests/`
- Document Decisions: Update this file for significant changes
- Error Context: Add `.context()` calls for debugging
- Naming Convention: PascalCase without the 'Syntax' suffix
- Keyword Boundaries: Use the `keyword()` helper for all keywords
See docs/development/contributing.md for detailed contribution guidelines.