Sway compiler optimizations #7080

xunilrj · 2025-04-11T17:14:13Z

Description

This PR is a prequel to #7015.

Debug trait auto-implementation is causing some performance issues. So this PR will try to compensate by doing the following optimizations:

1 - filter_dummy_methods was cloning TyFunctionDecl needlessly. Another small optimization: instead of calling ConcurrentSlab::get(...), which clones and then drops an Arc, there is a new ConcurrentSlab::map(...) method that avoids all that.

2 - A lot of time was being spent on node_dependencies. One of the primary wastes was hashing nodes multiple times. To avoid that, each node now memoizes its own hash, and a Hasher that does nothing is used. This means that each node's hash is calculated only once.

3 - The third optimization is a little more controversial. It addresses the fact that we clone TyFunctionDecl a LOT, and some fields are more expensive to clone than others, for example body, parameters and constraints. To alleviate this excessive cloning, these fields are now behind an Arc (because the LSP needs them to be thread safe), and there is a "COW" mechanism specifically for them when they need mutation.

Detailed analysis

The first step is to compile forc and run Valgrind. My analysis was done using the debug build; values may vary on the release build, of course.

cd test/src/e2e_vm_tests/test_programs/should_pass/language/intrinsics/transmute
cargo r -p forc
valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes /home/xunilrj/github/sway/target/debug/forc build

The first thing that caught my attention was a function called filter_dummy_methods, which is called around 150k times. Inside it, there is a useless call to TyFunctionDecl::clone(). If Valgrind is correct, this clone accounts for 10% of the compilation time. This explains the first optimization.
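As a rough illustration of the first optimization, here is a minimal sketch of a slab that offers both a `get` returning a cloned `Arc` and a `map` that only borrows the item. `Slab`, `FunctionDecl`, `is_dummy` and `filter_dummy` are hypothetical stand-ins invented for this example; they are not the real `ConcurrentSlab` or `TyFunctionDecl` definitions.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical, simplified stand-in for the compiler's slab (not the real `ConcurrentSlab`).
pub struct Slab<T> {
    items: RwLock<Vec<Arc<T>>>,
}

impl<T> Slab<T> {
    pub fn new() -> Self {
        Slab { items: RwLock::new(Vec::new()) }
    }

    /// Stores a value and returns its index.
    pub fn insert(&self, value: T) -> usize {
        let mut items = self.items.write().unwrap();
        items.push(Arc::new(value));
        items.len() - 1
    }

    /// Hands out a new `Arc` handle: the refcount is incremented here
    /// and decremented again when the caller drops the handle.
    pub fn get(&self, index: usize) -> Arc<T> {
        self.items.read().unwrap()[index].clone()
    }

    /// Runs `f` on a borrowed item while the read lock is held.
    /// No `Arc` is cloned and the item itself is never copied.
    pub fn map<R>(&self, index: usize, f: impl FnOnce(&T) -> R) -> R {
        let items = self.items.read().unwrap();
        f(&items[index])
    }
}

/// Hypothetical stand-in for a function declaration with an expensive body.
#[derive(Clone)]
pub struct FunctionDecl {
    pub name: String,
    pub is_dummy: bool,
    pub body: Vec<String>,
}

/// Keeps only the ids of non-dummy functions, reading one flag per item
/// instead of cloning each declaration.
pub fn filter_dummy(slab: &Slab<FunctionDecl>, ids: &[usize]) -> Vec<usize> {
    ids.iter()
        .copied()
        .filter(|&id| !slab.map(id, |decl| decl.is_dummy))
        .collect()
}
```

The trade-off in this sketch is that `map` holds the read lock for the duration of the closure, so the closure should stay short.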

Another 20% of the time is spent inside order_ast_nodes_by_dependency.

Inside this function, there is a function called recursively_depends_on that corresponds to 10% of the compilation, and more than half of it is spent hashing nodes. This explains the second optimization.
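The memoized hash idea can be sketched as follows. This is only an illustration under assumptions: `MemoizedNode`, `PassThroughHasher` and `NodeSet` are hypothetical names, and the real dependency nodes in the compiler are more complex than a single string.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{BuildHasherDefault, Hash, Hasher};

/// Hypothetical node wrapper: the hash is computed once, at construction.
pub struct MemoizedNode {
    pub name: String,
    cached_hash: u64,
}

impl MemoizedNode {
    pub fn new(name: &str) -> Self {
        // The expensive hashing happens exactly once per node.
        let mut hasher = DefaultHasher::new();
        name.hash(&mut hasher);
        MemoizedNode { name: name.to_string(), cached_hash: hasher.finish() }
    }
}

impl PartialEq for MemoizedNode {
    fn eq(&self, other: &Self) -> bool {
        self.name == other.name
    }
}

impl Eq for MemoizedNode {}

impl Hash for MemoizedNode {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed only the cached value; the node contents are not walked again.
        state.write_u64(self.cached_hash);
    }
}

/// Hasher that does no mixing of its own: it returns the `u64` it was given.
#[derive(Default)]
pub struct PassThroughHasher(u64);

impl Hasher for PassThroughHasher {
    fn finish(&self) -> u64 {
        self.0
    }

    /// Fallback for callers that write raw bytes; not used by `MemoizedNode`.
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = self.0.rotate_left(8) ^ u64::from(b);
        }
    }

    fn write_u64(&mut self, value: u64) {
        self.0 = value;
    }
}

/// A set whose lookups reuse each node's cached hash.
pub type NodeSet = HashSet<MemoizedNode, BuildHasherDefault<PassThroughHasher>>;
```

With this shape, every insert or lookup in `NodeSet` costs one `write_u64` call for hashing, regardless of how large the node is.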

Another source of waste is the amount of TyFunctionDecl::clone that we call. It amounts to 12% of the compilation time. The problem is that it is much harder to remove/optimize these clones.

I assume that a lot of these clones are not needed. One of the reasons is the way SubstTypes and other monomorphization mechanisms were built: we first clone something, and then we try to mutate the cloned version. I don't have a number, but I imagine that in a lot of cases neither the cloning nor the mutation was needed. Maybe someone can come up with a test to see if we can avoid the cloning/mutation. I bet that this would improve performance tremendously.

What I did was pick all the fields that are costly to clone (body, parameters, call_path, type_parameters, return_type, and where_clause) and put them behind an Arc. If my assumption is correct, making them cheap to clone will improve performance, because we will just clone their Arc and never mutate them.
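To make the cost difference concrete, here is a small sketch of a declaration whose expensive fields sit behind an `Arc`. `SharedFunctionDecl` and its two fields are assumptions chosen for illustration; the real `TyFunctionDecl` has more fields and different types.

```rust
use std::sync::Arc;

/// Hypothetical declaration: the costly fields are shared, not owned.
#[derive(Clone)]
pub struct SharedFunctionDecl {
    pub name: String,
    pub body: Arc<Vec<String>>,
    pub parameters: Arc<Vec<String>>,
}

impl SharedFunctionDecl {
    pub fn new(name: &str, body: Vec<String>, parameters: Vec<String>) -> Self {
        SharedFunctionDecl {
            name: name.to_string(),
            body: Arc::new(body),
            parameters: Arc::new(parameters),
        }
    }
}

/// Cloning copies `name` but only bumps the refcounts of `body` and
/// `parameters`, so the cost no longer depends on the body size.
pub fn cheap_clone(decl: &SharedFunctionDecl) -> SharedFunctionDecl {
    decl.clone()
}
```

After `cheap_clone`, both declarations point at the same `body` allocation, which can be checked with `Arc::ptr_eq`.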

Unfortunately, there is no easy way to avoid cloning in some cases. For example, when monomorphizing, our algorithms don't know "from the outside" that a body will never change, so we trigger the COW mechanism and end up cloning the item inside the Arc, even when we don't need to.
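The limitation described above can be reproduced with the standard library's copy-on-write helper, `Arc::make_mut`. This is a sketch only: `rename_all` is a hypothetical function, and the PR's actual COW mechanism may be implemented differently.

```rust
use std::sync::Arc;

/// Hypothetical mutation pass: replaces every statement equal to `from` with `to`.
/// `Arc::make_mut` is called up front, before we know whether anything will change.
pub fn rename_all(body: &mut Arc<Vec<String>>, from: &str, to: &str) {
    // If the Arc is shared, this deep-copies the vector; if it is unique,
    // it hands back a mutable reference to the existing allocation.
    let statements = Arc::make_mut(body);
    for statement in statements.iter_mut() {
        if statement.as_str() == from {
            *statement = to.to_string();
        }
    }
}
```

When the `Arc` is shared, the call copies the whole vector even if no statement matches `from`, which is the wasted clone the paragraph describes. When the `Arc` is unique, the mutation happens in place.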

Checklist

I have linked to any relevant issues.
I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation where relevant (API docs, the reference, and the Sway book).
- If my change requires substantial documentation changes, I have requested support from the DevRel team
I have added tests that prove my fix is effective or that my feature works.
I have added (or requested a maintainer to add) the necessary Breaking* or New Feature labels where relevant.
I have done my best to ensure that my PR adheres to the Fuel Labs Code Review Standards.
I have requested a review from the relevant team or maintainers.

codspeed-hq · 2025-04-11T17:27:50Z

CodSpeed Performance Report

Merging #7080 will improve performances by 41.79%

Comparing xunilrj/sway-compiler-optimizations (3b3e11e) with master (8c132f1)

Summary

⚡ 1 improvements
✅ 21 untouched benchmarks

Benchmarks breakdown

| | Benchmark | `BASE` | `HEAD` | Change |
|---|---|---|---|---|
| ⚡ | `compile` | 5.1 s | 3.6 s | +41.79% |

JoshuaBatty

Great work!

xunilrj temporarily deployed to fuel-sway-bot April 11, 2025 17:14 — with GitHub Actions Inactive

xunilrj self-assigned this Apr 11, 2025

xunilrj temporarily deployed to fuel-sway-bot April 11, 2025 23:40 — with GitHub Actions Inactive

xunilrj had a problem deploying to fuel-sway-bot April 12, 2025 11:10 — with GitHub Actions Error

xunilrj temporarily deployed to fuel-sway-bot April 12, 2025 11:11 — with GitHub Actions Inactive

xunilrj changed the title ~~removing needless clones, and memoizing hash on dependencies~~ Sway compiler optimizations Apr 13, 2025

xunilrj marked this pull request as ready for review April 14, 2025 11:21

xunilrj requested review from a team as code owners April 14, 2025 11:21

xunilrj mentioned this pull request Apr 14, 2025

Debug trait and its auto implementation #7015

Merged

8 tasks

IGI-111 previously approved these changes Apr 14, 2025

IGI-111 requested a review from a team April 14, 2025 11:44

IGI-111 temporarily deployed to fuel-sway-bot April 14, 2025 11:44 — with GitHub Actions Inactive

IGI-111 enabled auto-merge (squash) April 14, 2025 14:55

IGI-111 temporarily deployed to fuel-sway-bot April 14, 2025 14:55 — with GitHub Actions Inactive

xunilrj added 6 commits April 14, 2025 14:26

removing needless clones, and memoizing hash on dependencies

51f102c

remove callgrind file

cad6177

optimize TyFunctionDecl::clone using COW over Arc

c021f8d

fix tests

3c80a59

fix typo

488f353

disabling Arc optimization

3b3e11e

xunilrj dismissed IGI-111’s stale review via 3b3e11e April 14, 2025 17:26

xunilrj force-pushed the xunilrj/sway-compiler-optimizations branch from 496a6b0 to 3b3e11e Compare April 14, 2025 17:26

xunilrj temporarily deployed to fuel-sway-bot April 14, 2025 17:27 — with GitHub Actions Inactive

JoshuaBatty approved these changes Apr 14, 2025

JoshuaBatty requested review from a team April 14, 2025 23:44

kayagokalp approved these changes Apr 15, 2025

IGI-111 merged commit 2b06d79 into master Apr 15, 2025
41 checks passed

IGI-111 deleted the xunilrj/sway-compiler-optimizations branch April 15, 2025 06:20
