-
Notifications
You must be signed in to change notification settings - Fork 5.9k
8342103: C2 compiler support for Float16 type and associated operations #21490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into |
@jatin-bhateja This change is no longer ready for integration - check the PR body for details. |
@jatin-bhateja The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
Can we add the JMH micro benchmark that you added recently for FP16 as well ? or has it intentionally not been included? |
Hi Jatin, could you also include the idealization tests here - test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java and ConvF2HFIdealizationTests.java in this PR? |
We should move the |
*/ | ||
|
||
/* | ||
* @test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Jatin, is there any reason why these have been kept under the Float
folder and not a separate Float16
folder?
To expand on that point, a few weeks back I took a look at what porting Float16 from java.lang in the lworld+fp16 branch of Valhalla to the jdk.incubator.vector package in JDK 24 would look like: the result were favorable and the diffs are attached to JDK-8341260. Before the work in this PR proceeds, I think the java.lang -> jdk.incubator.vector move of Float16 should occur first. This will allow leaner reviews and better API separation. I can get an updated PR of the move prepared within the next few days. |
Good point, we should separate the Java changes from the intrinsic + HotSpot changes. |
PS Along those lines, see for a non-intrinsified port of Float16 to the vector API. |
/contributor add @Bhavana-Kilambi |
/contributor add @jddarcy |
@jatin-bhateja |
/contributor add @PaulSandoz |
@jatin-bhateja this pull request can not be integrated into git checkout float16_support
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push |
@jatin-bhateja |
@jatin-bhateja |
/contributor add @rgiulietti |
@jatin-bhateja |
As I noted on Joe's PR, I like the fact that the intrinsics are decoupled from the box class. I'm now wondering if there is another simplification possible (as I claimed to Joe!) which is to reduce the number of intrinsics, ideally down to conversions (to and from HF). For example, Same argument for max, min, add, mul, etc. I'm not saying the current PR is wrong, but I would like to know if it could be simplified, either now or later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, thanks for tackling this!
Ok, lots of style comments.
But again:
I would have loved to see this split up into these parts:
- Scalar
- Scalar optimizations (value, ideal, identity)
- Vector
This will again take many many week to get reviewed because it is a 3k+ change with lots of details.
Do you have any tests for the scalar constant folding optimizations? I did not find them.
Hey @eme64 , The patch includes IR framework-based scalar constant folding test points.
Regarding vector operation inferencing, we are taking the standard route by adding new Vector IR and associated VectorNode::Opcode / making routine changes without changing the auto-vectorization core. Each new vector operation is backed by IR framework-based tests. Our target is to get this integrated before JDK24-RDP1, your help and reviews will be highly appreciated. Best Regards |
test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java
Outdated
Show resolved
Hide resolved
I heard no argument about why you did not split this up. Please do that in the future. It is hard to review well when there is this much code. If it is really necessary, then sure. Here it does not seem necessary to deliver all at once.
Here I only see the use of very trivial values. I think we need more complicated cases. What about these:
It would for example be nice if you could iterate over all inputs. FP16 with 2 inputs is only 32bits, that can be iterated in just a few seconds. Then you can run the computation with constants in the interpreter, and compare to the results in compiled code. |
ScalarFloat16OperationsTest.java |
Maybe I'm not making myself clear here. The test vectors will never constant fold - the values you read from an array load will always be the full range of their type, and not a constant. And you added constant folding IGVN optimizations. So we should test both:
It starts with something as simple as your constant folding of addition:
Which uses this code:
You are doing the addition in This is the simple stuff. Then there are more complex cases.
Is this adequately tested over the whole range of inputs? Of course the inputs have to be constant, otherwise if you only do array loads, the values are obviously variable, i.e. they would fail at the You do have some constant folding tests like this:
But this is only 2 examples for min. It does not cover all cases by a long shot. It covers 2 "nice" cases. I do not think that is sufficient. Often the bugs are hiding in special cases. Testing is really important to me. I've made the experience myself where I did not test optimizations well and later it can turn into a bug. Comments like these do not give me much confidence:
What do you think @Bhavana-Kilambi @PaulSandoz ? |
Then please do something about it! Your comments are helpful. But they do not answer my request for better test coverage. Yes, |
test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java
Outdated
Show resolved
Hide resolved
test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorConvChain.java
Outdated
Show resolved
Hide resolved
set_result(_gvn.transform(new ReinterpretHF2SNode(result))); | ||
return true; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line can be removed?
} | ||
|
||
@Benchmark | ||
public short cosineSimilarityDoubleRoundingFP16() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jatin-bhateja , I understand that the PR is in draft stage but just wanted to put out this comment so that these changes (if needed) are included in your next PS as well :)
This test fails with an "invalid cast type" error in this function -
jdk/src/hotspot/share/opto/castnode.cpp
Line 470 in dfa5620
fatal("unreachable. Invalid cast type."); |
macRes
at line #238 the compiler generates a ConstraintCastNode and it fails to match the half_float type. We need to add isa_half_float() as another condition in this routine and define a new Cast node for half-float.
d2bba9d
to
746c970
Compare
ins_pipe( pipe_slow ); | ||
%} | ||
|
||
instruct ReplHF_reg(vec dst, regF src, rRegI rtmp) %{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jatin-bhateja , there are JTREG tests testing Replicate with immediate FP16 values (Rep1HF_imm
backend rule) but I noticed that there are no JTREG tests for testing these rules - Rep1HF_reg
and Rep1HF_short_reg
. Should those be added as well? Something like -
float f = 10.0f;
for (int i = 0; i < SIZE: ++i) {
res[i] = Float16.fma(a[i], b[i], Float16.valueOf(f)); // where f is a loop invariant float variable defined out of the loop
}
@jatin-bhateja just ping me here if you think I should have a look at it again ;) |
cfded1e
to
94f5b98
Compare
This PR will split into separate scalar and vectorization support to ease the review process. Closing this PR. |
|
Hi All,
This patch adds C2 compiler support for various Float16 operations added by PR#22128
Following is the summary of changes included with this patch:-
Kindly review and share your feedback.
Best Regards,
Jatin
Progress
Integration blocker
Issue
Reviewers
Contributors
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21490/head:pull/21490
$ git checkout pull/21490
Update a local copy of the PR:
$ git checkout pull/21490
$ git pull https://git.openjdk.org/jdk.git pull/21490/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 21490
View PR using the GUI difftool:
$ git pr show -t 21490
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21490.diff
Using Webrev
Link to Webrev Comment