Sway compiler optimizations #7080
Merged
CodSpeed Performance Report: merging #7080 will improve performance by 41.79%.
IGI-111
previously approved these changes
Apr 14, 2025
496a6b0
to
3b3e11e
Compare
JoshuaBatty
approved these changes
Apr 14, 2025
Great work!
kayagokalp
approved these changes
Apr 15, 2025
Description
This PR is a prequel to #7015.
`Debug` trait auto-implementation is causing some performance issues, so this PR tries to compensate with the following optimizations:

1 - `filter_dummy_methods` was cloning `TyFunctionDecl` needlessly. Another small optimization: instead of calling `ConcurrentSlab::get(...)`, which clones and then drops an `Arc`, there is a new `ConcurrentSlab::map(...)` method that avoids all that.

2 - A lot of time was being spent in `node_dependencies`. One of the primary wastes was hashing nodes multiple times. To avoid that, each node now memoizes its own hash, and a `Hasher` that does nothing is used. This means that each node's hash is calculated only once.

3 - The third optimization is a little more controversial. We clone `TyFunctionDecl` a LOT, and some fields are more expensive to clone than others, for example `body`, `parameters` and `constraints`. To alleviate this excessive cloning, these fields are now behind an `Arc` (because the LSP needs them to be thread safe), and there is a copy-on-write (COW) mechanism specifically for them when they need mutation.

Detailed analysis
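The hash memoization from the second optimization can be pictured with a small, self-contained sketch. This is not the compiler's code: `Node`, `PassThroughHasher`, `NodeSet` and `count_unique` are hypothetical names, and the hash is cached eagerly at construction, which is an assumption made to keep the example short.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{BuildHasherDefault, Hash, Hasher};

/// Hypothetical stand-in for a dependency-graph node (not the compiler's real type).
struct Node {
    name: String,
    /// Hash of `name`, computed once at construction and reused afterwards.
    cached_hash: u64,
}

impl Node {
    fn new(name: &str) -> Self {
        let mut hasher = DefaultHasher::new();
        name.hash(&mut hasher);
        Node {
            name: name.to_string(),
            cached_hash: hasher.finish(),
        }
    }
}

impl PartialEq for Node {
    fn eq(&self, other: &Self) -> bool {
        self.name == other.name
    }
}
impl Eq for Node {}

impl Hash for Node {
    // Feed only the memoized value; the node's contents are never re-hashed.
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.cached_hash);
    }
}

/// Hasher that does no work: it returns the `u64` it was handed.
#[derive(Default)]
struct PassThroughHasher(u64);

impl Hasher for PassThroughHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, _bytes: &[u8]) {
        unreachable!("Node::hash only calls write_u64");
    }
    fn write_u64(&mut self, value: u64) {
        self.0 = value;
    }
}

/// Set of nodes that relies on the cached hash instead of re-hashing.
type NodeSet = HashSet<Node, BuildHasherDefault<PassThroughHasher>>;

/// Counts distinct names by inserting nodes into a `NodeSet`.
fn count_unique(names: &[&str]) -> usize {
    let mut seen = NodeSet::default();
    for &name in names {
        seen.insert(Node::new(name));
    }
    seen.len()
}
```

With this shape, every lookup or insertion costs one `write_u64` call rather than a walk over the node's contents, which is the effect the PR is after. Calling `count_unique(&["foo", "bar", "foo"])` returns 2.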
The first step is to compile `forc` and run Valgrind. My analysis was done using the debug build; values may vary on the release build, of course.

The first thing that caught my attention was a function called `filter_dummy_methods`, which is called around 150k times. Inside it, there is a useless call to `TyFunctionDecl::clone()`. If Valgrind is correct, this `clone` accounts for 10% of the compilation time. This explains the first optimization.

Another 20% of the time is spent inside `order_ast_nodes_by_dependency`. Inside this function, there is a function called `recursively_depends_on` that corresponds to 10% of the compilation time, and more than half of that is spent hashing nodes. This explains the second optimization.

Another source of waste is the number of `TyFunctionDecl::clone` calls we make. They amount to 12% of the compilation time. The problem is that it is much harder to remove or optimize these clones.

I assume that a lot of these clones are not needed. One of the reasons is the way `SubstTypes` and other monomorphization mechanisms were built: we first need to clone something, and then we try to mutate the cloned version. I don't have a number, but I imagine that in a lot of cases neither the cloning nor the mutation was needed. Maybe someone can come up with a test to see if we can avoid the cloning/mutation. I bet that this would increase performance tremendously.

What I did instead was pick all the fields that are costly to clone: `body`, `parameters`, `call_path`, `type_parameters`, `return_type`, and `where_clause`; and put them behind an `Arc`. If my assumption is correct, making them cheap to clone will improve performance, because we will just clone their `Arc` and never mutate them.

Unfortunately, there is no easy way to avoid cloning in some cases. For example, when monomorphizing, our algorithms don't know "from the outside" that a `body` will never change, so we trigger the COW mechanism and end up cloning the item inside the `Arc`, even when we don't need to.

Checklist
`Breaking*` or `New Feature` labels where relevant.
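As an illustration of the `Arc`-plus-COW behaviour discussed in the analysis, here is a minimal sketch built on the standard library's `Arc::make_mut`. It assumes a hypothetical `FnDecl` type with a single `Arc`-wrapped `body` field and a hypothetical `body_mut` accessor; whether the PR's own COW mechanism uses `Arc::make_mut` is not stated in the description.

```rust
use std::sync::Arc;

/// Hypothetical, heavily simplified stand-in for a function declaration.
#[derive(Clone)]
struct FnDecl {
    /// Expensive field kept behind an `Arc`, so cloning `FnDecl` only bumps a counter.
    body: Arc<Vec<String>>,
}

impl FnDecl {
    fn new(statements: &[&str]) -> Self {
        FnDecl {
            body: Arc::new(statements.iter().map(|s| s.to_string()).collect()),
        }
    }

    /// Copy-on-write access: the inner `Vec` is cloned only if the `Arc` is shared.
    fn body_mut(&mut self) -> &mut Vec<String> {
        Arc::make_mut(&mut self.body)
    }
}

/// Cloning the declaration shares the same body allocation.
fn clone_shares_body() -> bool {
    let original = FnDecl::new(&["ret"]);
    let copy = original.clone();
    Arc::ptr_eq(&original.body, &copy.body)
}

/// Mutating the copy detaches it; the original is left untouched.
/// Returns (original length, copy length, still shared?).
fn mutation_detaches_copy() -> (usize, usize, bool) {
    let original = FnDecl::new(&["ret"]);
    let mut copy = original.clone();
    copy.body_mut().push("nop".to_string());
    (
        original.body.len(),
        copy.body.len(),
        Arc::ptr_eq(&original.body, &copy.body),
    )
}

/// Asking for mutable access already clones a shared body,
/// even if nothing is changed afterwards.
fn cow_access_without_change_clones() -> bool {
    let original = FnDecl::new(&["ret"]);
    let mut copy = original.clone();
    let _ = copy.body_mut(); // no modification performed
    !Arc::ptr_eq(&original.body, &copy.body)
}
```

The last function mirrors the drawback described above: code that requests mutable access "just in case" pays for a full clone of the shared body, because the COW layer cannot know that no change will follow.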