Skip to content

Update/clarify documentation of generic constants #5473

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: trunk
Choose a base branch
from

Conversation

geoffromer
Copy link
Contributor

No description provided.

@geoffromer geoffromer requested a review from zygoloid May 13, 2025 19:29
@github-actions github-actions bot requested a review from chandlerc May 13, 2025 19:29
Copy link
Contributor

@zygoloid zygoloid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

struct SymbolicConstant : Printable<SymbolicConstant> {
// The constant instruction that defines the value of this symbolic constant.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also use the "defines the value of a constant" terminology in the definition of the InstConstantKind values, so if you're changing it here, can you change it there too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Note that I made a couple of drive-by fixes while I was there.

@geoffromer geoffromer requested a review from zygoloid May 14, 2025 18:36
Comment on lines +70 to +74
// This instruction represents a symbolic or concrete constant whenever its
// operands are constant. Otherwise, it is non-constant. For example, a
// `TupleValue` defines a constant if and only if its operands are constants.
// Constant evaluation support for types with this constant kind is provided
// automatically.
// `TupleValue` represents a constant if and only if its operands represent
// constants. Constant evaluation support for types with this constant kind is
// provided automatically.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the pre-existing description here is wrong. How about this:

Suggested change
// This instruction represents a symbolic or concrete constant whenever its
// operands are constant. Otherwise, it is non-constant. For example, a
// `TupleValue` defines a constant if and only if its operands are constants.
// Constant evaluation support for types with this constant kind is provided
// automatically.
// `TupleValue` represents a constant if and only if its operands represent
// constants. Constant evaluation support for types with this constant kind is
// provided automatically.
// This instruction can represent a symbolic or concrete constant, and has a
// constant value of the same kind whenever its operands are constant.
// Otherwise, it is non-constant. For example, a `TupleValue` has a constant
// value if and only if its operands have constant values, and its constant
// value is always represented by a `TupleValue` instruction. Constant
// evaluation support for types with this constant kind is provided
// automatically.

Comment on lines +76 to +78
// This instruction always represents a constant value. This is the same as
// `WheneverPossible`, except that the operands are known in advance to always
// be constant. For example, `IntValue`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one was right before, and the changes are incorrect (probably because the description of WheneverPossible was wrong).

Suggested change
// This instruction always represents a constant value. This is the same as
// `WheneverPossible`, except that the operands are known in advance to always
// be constant. For example, `IntValue`.
// This instruction always has a constant value of the same kind. This is the
// same as `WheneverPossible`, except that the operands are known in advance
// to always be constant. For example, `IntValue`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a guess about where the disconnect might be: the original version would be correct, and my new version would be wrong, if we add the restriction that only a canonicalized inst can represent a constant value. Is that the issue? If so, I've intentionally tried to avoid that restriction so that these can be self-contained properties of inst values (or more precisely inst graphs), whereas canonicalization is a non-local property of IDs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see where you're coming from. But I don't think that matches well with the model and terminology that we're using.

Whether particular inst IDs are the canonical instruction that defines a constant value (that is, whether constant_values().GetConstantInstId(inst_id) == inst_id) is a fundamental part of what we're trying to document here. Moreover, when we talk about instructions, we're generally talking about a thing with identity that lives in a SemIR::File, not a SemIR::Inst object (or a typed inst object) -- the latter are probably best thought of more as potential instructions or views of instructions, and the actual instructions are the things in the inst value store. In particular, instructions have IDs and constant values and such, which are not part of the Inst value.

I think the new version of this comment is also wrong even with your approach of using "represents a constant value" to mean "instruction has the right value that it could be the canonical representation of its constant value" -- just because the operands are constant (have constant values), doesn't mean that they are represented by canonical IDs, which is what we need for the instruction to be in its canonical representation. For example, an ArrayType whose bound_id is the expression 1 + 1 does not "represent a constant value" because it's not in the proper form to represent a constant value (it's not a "constant inst" in the terminology you used in constant.h), but we use Always for ArrayType because it always has a constant value of the same kind (in this case, an ArrayType whose bound_id is the IntValue 2).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see where you're coming from. But I don't think that matches well with the model and terminology that we're using.

If so, I can revert to the previous version, but I could use some clarification on a couple of points.

Whether particular inst IDs are the canonical instruction that defines a constant value (that is, whether constant_values().GetConstantInstId(inst_id) == inst_id) is a fundamental part of what we're trying to document here.

How so? None of the places where we use this enum seem to care about that property.

Moreover, when we talk about instructions, we're generally talking about a thing with identity that lives in a SemIR::File, not a SemIR::Inst object (or a typed inst object) -- the latter are probably best thought of more as potential instructions or views of instructions, and the actual instructions are the things in the inst value store. In particular, instructions have IDs and constant values and such, which are not part of the Inst value.

I think the new version of this comment is also wrong even with your approach of using "represents a constant value" to mean "instruction has the right value that it could be the canonical representation of its constant value" -- just because the operands are constant (have constant values), doesn't mean that they are represented by canonical IDs, which is what we need for the instruction to be in its canonical representation.

For what it's worth, the model I had in mind wasn't "the instruction value could be the canonical representation of its constant value", it was more like "the canonicalized instruction value could be the canonical representation of its constant value", where canonicalization consists of recursively replacing IDs with their canonical counterparts, but not performing any evaluation.

For example, an ArrayType whose bound_id is the expression 1 + 1 does not "represent a constant value" because it's not in the proper form to represent a constant value (it's not a "constant inst" in the terminology you used in constant.h), but we use Always for ArrayType because it always has a constant value of the same kind (in this case, an ArrayType whose bound_id is the IntValue 2).

It looks to me like ArrayType is Conditional, not Always, and in fact it's the example we use above to explain Conditional. From a quick skim, I'm not seeing an obvious way to transpose your example to any of the Always inst kinds -- they all look like their operands are fully-evaluated by construction. So it seems like the inst kinds that use this enum might be following a definition closer to mine, where Always means that no further evaluation is needed.

On the other hand, the code that currently reads Always seems to be following a definition closer to yours: if I understand correctly, TryEvalTypedInst will recursively evaluate the operands of an Always inst; it just doesn't then go on to do any kind-specific evaluation logic. However, that does suggest that the original documentation was a little imprecise: it's not just that the inst always has a constant value of the same kind, it's that its constant value can be computed by evaluating and canonicalizing each operand. However, in practice that may be a distinction without a difference.

Side musing: I wonder if it would be cleaner to factor this into two separate enums: one that chooses the TryEvalTypedInst algorithm, and one that's responsible for specifying whether insts can be/have a constant value. For example:

enum class EvalAlgorithm {
  // Always returns NotConstant.
  // Corresponds to `None`.
  NoEval,
  // Always returns the inst_id as a constant.
  // Corresponds to `Unique`
  Unique,
  // Replaces operands with their canonicalized constant values.
  // Corresponds to `Always` and `WheneverPossible`
  ReplaceOnly,
  // Uses PerformDelayedAction.
  // Corresponds to `InstAction`
  DelayedAction,
  // Uses 3-argument EvalConstantInst.
  // Corresponds to the subset of `Indirect`, `SymbolicOnly` and `Conditional`
  // inst kinds that have constant_needs_inst_id == true.
  CustomWithId,
  // Uses 2-argument EvalConstantInst.
  // Corresponds to the subset of `Indirect`, `SymbolicOnly` and `Conditional`
  // inst kinds that have constant_needs_inst_id == false.
  CustomWithoutId,
};

enum class InstConstantValueKind {
  // Corresponds to `Never`.
  Never,
  // Corresponds to `Indirect`.
  CanHaveConstantValue,
  // Corresponds to `SymbolicOnly`, `InstAction`, `Conditional`, and `WheneverPossible`.
  CanBeConstantValue,
  // Corresponds to `Always`.
  MustHaveConstantValue,
  // Corresponds to `Unique`.
  MustBeConstantValue,
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether particular inst IDs are the canonical instruction that defines a constant value (that is, whether constant_values().GetConstantInstId(inst_id) == inst_id) is a fundamental part of what we're trying to document here.

How so? None of the places where we use this enum seem to care about that property.

I think the place where this matters the most is for Unique: the key property that Unique gives is that constant_values().GetConstantInstId(inst_id) == inst_id for every instruction of that kind. On reflection, it might not be such a big deal for other instructions, except that it's a part of establishing their canonical form.

For what it's worth, the model I had in mind wasn't "the instruction value could be the canonical representation of its constant value", it was more like "the canonicalized instruction value could be the canonical representation of its constant value", where canonicalization consists of recursively replacing IDs with their canonical counterparts, but not performing any evaluation.

Oh, OK. For me, the new terminology isn't communicating that well -- I find it unclear to describe an instruction as "represent[ing] a constant" if that instruction representation can never be the representation that's used for a constant because it has non-canonical field values -- but maybe a different term would work better with this same style of approach. Maybe a good term (eg, from lambda calculus / term rewrite systems) to use would be to talk about whether an expression is in reduced form or not.

That said, the process by which we evaluate is to first canonicalize the fields and then attempt to reduce the instruction, so talking about instructions that are reduced but not canonical doesn't really match the evaluator's logic. But maybe nonetheless separating canonicalization from reduction here would make the explanation clearer.

It looks to me like ArrayType is Conditional, not Always, and in fact it's the example we use above to explain Conditional.

Oops, I guess I saw the InstIsType::Always and thought it was the InstConstantKind!

From a quick skim, I'm not seeing an obvious way to transpose your example to any of the Always inst kinds -- they all look like their operands are fully-evaluated by construction.

I think ClassType is an example where this happens -- hopefully a correct one this time! If we form a ClassType with a non-canonical SpecificId, evaluation will replace that with a corresponding canonical SpecificId (with the arguments canonicalized) when forming the canonical ClassType.

Side musing: I wonder if it would be cleaner to factor this into two separate enums: one that chooses the TryEvalTypedInst algorithm, and one that's responsible for specifying whether insts can be/have a constant value.

I think probably the best test for that kind of approach would be to see what impact it has on the consumers of this information. It appears to me that the two choices aren't orthogonal, and I'm worried it'd add complexity to write the code in a way that either tries to support all the combinations or, say, static_asserts on invalid combinations, when many of them never come up or don't make sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants