Why Is Crystal Compilation So Slow?

TL;DR

Crystal's slow compilation is primarily due to LLVM optimization and code generation, not its own parsing or semantic analysis. The language's type system and limited resources make incremental compilation difficult. Practical speedups can be achieved by using -O3 instead of --release.

Key Takeaways

  • Over 96% of Crystal compilation time is spent in the codegen stage, dominated by LLVM's run_passes and target_machine_emit_to_file functions
  • Crystal's type inference system, where the caller determines types, prevents effective splitting of code into reusable modules
  • The --release flag enables --single-module, creating one massive LLVM module that slows compilation but improves optimization
  • Using -O3 instead of --release enables parallelization and can speed up compilation while sacrificing some optimization
  • Crystal's compilation challenges stem from both language design and limited development resources compared to corporate-backed languages

Tags

crystal

Introduction

The Crystal programming language is notorious for its slow compilation times.

But have you ever wondered where Crystal actually spends most of its compilation time?

Figure: Crystal uses LLVM as its backend

The Crystal Compilation Pipeline

The Crystal compiler's compilation process consists of the following stages:

  1. new_program - Creating the program object
  2. parse - Lexical analysis and parsing
  3. semantic - Semantic analysis
  4. codegen - Generating object files

These four stages (plus a final cleanup step) are visible in the compiler's Compiler#compile method, shown here in simplified form:
module Crystal
  class Compiler
    def compile(source : Source | Array(Source), output_filename : String) : Result
      source = [source] unless source.is_a?(Array)
      # 1 new_program
      program = new_program(source)

      # 2 parse
      node = parse program, source

      # 3 semantic
      node = program.semantic node, cleanup: !no_cleanup?

      # 4 codegen
      units = codegen program, node, source, output_filename unless @no_codegen

      # 5 cleanup
      # ... omission ...
      Result.new program, node
    end
  end
end

After this, linking is performed by the standard linker.

Command-Line Options for Compilation Statistics

Crystal provides a command-line option, -s (short for --stats), that displays per-stage compilation time statistics:

crystal build -s hoge.cr

However, this option doesn't show the execution time of the native LLVM functions, so it was insufficient for this article's investigation.

To get to the heart of the matter, I used print debugging to measure the compilation time.
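
Concretely, this meant wrapping each stage in a timer and printing the elapsed time. Here is a minimal sketch of the idea, using the parse stage from the compile method shown above (the timing wrapper is my own illustration, not actual compiler code):

# Sketch: wrap a compiler stage in Time.measure and print the
# elapsed wall-clock time to stderr.
elapsed = Time.measure do
  node = parse program, source
end
STDERR.puts "parse: #{elapsed.total_seconds}s"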

Native LLVM Functions Called During Codegen

During the codegen stage, the following native LLVM functions are called:

  • LibLLVM.run_passes
    • Applies optimization passes to LLVM IR
  • LibLLVM.target_machine_emit_to_file
    • Generates object files

I measured the execution time of these functions using print debugging as well.

Results

Here are the results from compiling the Crystal compiler itself:

Stage                                      Time (seconds)
new_program                                   0.000388207
parse                                         0.000065000
semantic                                     12.552620028
codegen                                     355.245409133
  - LibLLVM.run_passes                      252.340241198
  - LibLLVM.target_machine_emit_to_file      93.280652845
cleanup                                       0.000013180
total                                       367.798495548

Let me visualize this with a bar chart:

Figure: Compilation time breakdown

NOTE: This graph is from the original article and may differ slightly from the latest compiler.

Were the results what you expected?

  • Lexical analysis and parsing take virtually no time!
  • Semantic analysis (including type inference) also takes relatively little time!

In fact, the vast majority of the compilation time is spent in codegen, specifically in:

  • LibLLVM.run_passes
  • LibLLVM.target_machine_emit_to_file

These are external LLVM function calls that happen outside of Crystal's control!

In this case of building the Crystal compiler itself with --release, the majority of compilation time was spent on LLVM optimization and code generation.

This might be a somewhat surprising result, don't you think?

How to Speed Up the Crystal Compiler

The parts of the Crystal compiler implemented in Crystal—namely lexical analysis, parsing, and semantic analysis—are already sufficiently fast. This means that to achieve further speedups, we would need hardcore approaches such as:

  1. Introducing parallelization even in release builds
  2. Optimizing LLVM itself (specifically for Crystal)
  3. Improving Crystal to generate LLVM IR that's easier for LLVM to process

However, since these approaches aren't very practical for everyday use, let me introduce a more accessible method:

Use -O3 Instead of --release

In the Crystal compiler, specifying --release is equivalent to specifying both -O3 and --single-module. If you're willing to sacrifice some whole-program optimization, you can pass only -O3: without --single-module the code stays split into multiple compilation units, so codegen can run in parallel, which speeds up compilation in many cases.
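
For example, reusing the hoge.cr placeholder from earlier:

# Equivalent to --release: one huge LLVM module, sequential codegen
crystal build -O3 --single-module hoge.cr

# -O3 alone: same optimization level per module, but codegen can run
# in parallel across multiple compilation units
crystal build -O3 hoge.cr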

From here on, there's a bit of a speculative element to the discussion.

Why Crystal Doesn't Have Incremental Compilation or Shared Library Support

Crystal's --release Mode Includes --single-module

Crystal struggles with splitting code into separate compilation units and reusing the results. In particular, --release builds enable --single-module, which compiles everything into one massive LLVM module for optimization.

For comparison, Rust performs separate compilation for each crate even with --release. In Rust, you need to explicitly use -C lto=fat to get behavior similar to Crystal's, where the entire LLVM IR is optimized together.
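
As a reference point, this is roughly how you would opt into that Crystal-like whole-program optimization in a standard Cargo project:

# Cargo.toml: "fat" LTO merges and optimizes the whole program's
# LLVM IR together, similar in spirit to Crystal's --single-module
[profile.release]
lto = "fat"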

Crystal's Weak Caching Mechanism

Crystal does have a mechanism that caches LLVM bitcode files (.bc) and object files on a per-type basis during normal builds, and can reuse object files only when the bitcode is completely unchanged.

This allows the compiler to skip the expensive object file generation step in some cases.

However, even in such cases, lexical analysis, parsing, and semantic analysis cannot be skipped. The comparison only happens after generating .bc files. And as we'll discuss later, cases where the bitcode is completely unchanged are actually quite rare.
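
If you want to see this cache on your own machine, the compiler reports its location through crystal env (the directory itself is machine-specific):

crystal env CRYSTAL_CACHE_DIR

The per-type .bc and .o files from previous builds live under this directory.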

Crystal Is a Statically-Typed Language Where the Caller Determines Types

Why can't Crystal split packages into multiple LLVM IR modules, precompile them, and reuse the results?

The main reason is that Crystal has strong type inference and union types, and the concrete types of methods change depending on the calling context.

Crystal is an unusual statically-typed language where the caller determines the types, enabling duck typing. However, the trade-off is that type signatures need to be inferred with every compilation.
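
Here is a minimal example of what "the caller determines the types" means in practice (the method name is hypothetical):

# `add` has no type annotations; the compiler instantiates it once per
# combination of argument types that actually appears at a call site.
def add(a, b)
  a + b
end

add(1, 2)       # instantiated as add(Int32, Int32) : Int32
add(1.5, 2.5)   # instantiated as add(Float64, Float64) : Float64
add("a", "b")   # instantiated as add(String, String) : String

Without seeing every call site, the compiler cannot know which instantiations of add to generate, which is exactly what makes precompiling a module in isolation so hard.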

Type IDs Change with Each Compilation

The Crystal compiler assigns a number to every class in order to resolve types. On each compilation, every type that appears is assigned such a number. Say class A gets the number "10" in one compilation; after a small change to the code and a recompile, "10" might be assigned to a different class. Linking object files produced by different compilations would therefore mix inconsistent type IDs, and conditional branches based on types would no longer work correctly.
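
You can peek at these numbers with crystal_type_id, an internal method that is not part of the public API, so treat this purely as an illustration:

# Illustrative only: the concrete IDs are unspecified and may differ
# between compiler versions and even between compilations.
puts 1.crystal_type_id        # the ID assigned to Int32 in this build
puts "hello".crystal_type_id  # a different ID for String
puts [1, 2].crystal_type_id   # and another for Array(Int32)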

Additionally, when loading multiple Crystal shared libraries simultaneously, there's the problem of runtime functions being multiply defined.

This makes it difficult for Crystal to split code into parts, precompile them, and reuse them later.

But is this an inherent characteristic of the Crystal language? Let's consider this from a more social context.

The Crystal Language Community and Resource Constraints

Crystal is known as a language with Ruby-like concise syntax that delivers excellent performance.

However, the Crystal development team has limited resources. While there is a dedicated team at Manas.Tech and community contributors worldwide, the resources are still limited compared to large corporations.

For instance, imagine if Apple were developing Crystal.

Apple engineers might make changes to clang/LLVM itself to significantly improve compilation speed.

Or, like Swift, they might define a proper ABI and create an intermediate language or binary format well-suited to Crystal. Similar to how Swift has SIL (Swift Intermediate Language) as an intermediate representation before converting to LLVM IR, Crystal could have its own optimized intermediate language. This would enable comparing modules at that stage, resolving types, and generating object files from there. (Though I'm not entirely sure if this is possible within the LLVM framework.)

However, the Crystal compiler we have isn't like that. It generates monolithic, massive LLVM IR and delegates all optimization to LLVM. For package management, downloading source code directly from GitHub is the mainstream approach.

There still seems to be room for improvement.

Slow compilation with fast execution is not purely a property of the Crystal language itself; it also stems from the development team's resource constraints. In other words, if significant resources were invested in its development in the future, these issues could well improve.

Conclusion

Designing an ABI specification or intermediate language for Crystal is extremely difficult. However, if someone achieves this, it could become Crystal 2.0 or Crystal 3.0.

Even without going that far, finding ways to split the generated LLVM IR into multiple modules, or mangling function names and global variables, would represent significant progress.

Crystal doesn't have as vibrant a library ecosystem as some other languages. The reasons aren't entirely clear, but as the environment for code reuse improves, techniques for improving compilation speed may develop alongside it.

That's all for this article. Thank you for reading to the end!


This article was originally written in 2024 and revised in December 2025. It was translated from Japanese to English using Claude Sonnet.
