Skip to content

Sway compiler optimizations #7080

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 15, 2025
Merged

Conversation

xunilrj
Copy link
Contributor

@xunilrj xunilrj commented Apr 11, 2025

Description

This PR is a prequel to #7015.

Debug trait auto-implementation is causing some performance issues. So this PR will try to compensate by doing the following optimizations:

1 - filter_dummy_methods is cloning TyFunctionDecl needlessly. Another small optimization is that instead of calling ConcurrentSlab::get(...) and cloning and dropping an Arc. There is a new ConcurrentSlab::map(...) method that avoids all that.

2 - A lot of time was being spent on node_dependencies. One the primary wastes was hashing nodes multiple times. To avoid that, each node will "memoize" its own hash, and a Hasher that does nothing is being used. This means that each node's hash will only be calculated once.

3 - The third optimization is a little bit more polemic. It involved the fact that we clone TyFunctionDecl a LOT. And some fields are more expensive than others, for example, body, parameters and constraints. To alleviate these excessive clonings, these fields are now behind an Arc (because LSP need them to be thread safe), and there is a "COW` mechanism specifically for them when they need mutation.

Detailed analysis

The first step is to compile forc and run Valgrind. My analysis was done using the debug build; values may vary on the release build, of course.

cd test/src/e2e_vm_tests/test_programs/should_pass/language/intrinsics/transmute
cargo r -p for
valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes /home/xunilrj/github/sway/target/debug/forc build

The first that caught my attention was a function called filter_dummy_methods that is called around 150k times. Inside of it, there is a useless call to TyFunctinDecl::clone(). If Valgrind is correct, this clone is spending 10% of the compilation. This explains the first optimization.

image

Another 20% of the time is spent inside order_ast_nodes_by_dependency.

image

Inside this function, there is a function called recursively_depends_on that corresponds to 10% of the compilation, and more than half of it is spent hashing nodes. This explains the second optimization.

image

Another source of waste is the amount of TyFunctionDecl::clone that we call. It amounts to 12% of the compilation time. The problem is that it is much harder to remove/optimize these clones.

I assume that a lot of these clones are not needed. One of the reasons is the way SubtsType and other monomorphization mechanisms were built; we first need to clone something, and then we try to mutate the cloned version. I don´t have a number, but I imagine in a lot of cases, the cloning and the mutation were not needed. Maybe someone can come up with a test to see if we can avoid cloning/mutation. I bet that this will increase performance tremendously.

image

But what I did was I chose all the fields that are costly to be cloned: body, parameters, call_path, type_parameters, return_type, and where_clause; and I put them behind an Arc. If my assumption is correct, making them cheap to be cloned will improve performance because we will just clone their Arc, and never mutate them.

Unfortunately, there is no easy way to avoid cloning in some cases. For example, when monomorphizing, our algorithms don´t know "from the outside" that a body will never change, and we trigger the COW mechanism and end up cloning the item inside the Arc, even when we don´t need it.

image

Checklist

  • I have linked to any relevant issues.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation where relevant (API docs, the reference, and the Sway book).
  • I have added tests that prove my fix is effective or that my feature works.
  • I have added (or requested a maintainer to add) the necessary Breaking* or New Feature labels where relevant.
  • I have done my best to ensure that my PR adheres to the Fuel Labs Code Review Standards.
  • I have requested a review from the relevant team or maintainers.

@xunilrj xunilrj self-assigned this Apr 11, 2025
Copy link

codspeed-hq bot commented Apr 11, 2025

CodSpeed Performance Report

Merging #7080 will improve performances by 41.79%

Comparing xunilrj/sway-compiler-optimizations (3b3e11e) with master (8c132f1)

Summary

⚡ 1 improvements
✅ 21 untouched benchmarks

Benchmarks breakdown

Benchmark BASE HEAD Change
compile 5.1 s 3.6 s +41.79%

@xunilrj xunilrj changed the title removing needless clones, and memoizing hash on dependencies Sway compiler optimizations Apr 13, 2025
@xunilrj xunilrj marked this pull request as ready for review April 14, 2025 11:21
@xunilrj xunilrj requested review from a team as code owners April 14, 2025 11:21
IGI-111
IGI-111 previously approved these changes Apr 14, 2025
@IGI-111 IGI-111 requested a review from a team April 14, 2025 11:44
@IGI-111 IGI-111 enabled auto-merge (squash) April 14, 2025 14:55
@xunilrj xunilrj force-pushed the xunilrj/sway-compiler-optimizations branch from 496a6b0 to 3b3e11e Compare April 14, 2025 17:26
Copy link
Member

@JoshuaBatty JoshuaBatty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

@JoshuaBatty JoshuaBatty requested review from a team April 14, 2025 23:44
@IGI-111 IGI-111 merged commit 2b06d79 into master Apr 15, 2025
41 checks passed
@IGI-111 IGI-111 deleted the xunilrj/sway-compiler-optimizations branch April 15, 2025 06:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants