Skip to content

Change invariant list of subclassed AST in ast fields, to covariant Sequence #13942

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
hunterhogan opened this issue May 4, 2025 · 6 comments

Comments

@hunterhogan
Copy link
Contributor

In stdlib/ast.pyi, if a subclass of ast.AST has a field with a "collection" type, then the type is builtins.list. In some cases, because list is invariant, the feedback from type checkers is not helpful. Consider the following simple code.

from ast import FunctionDef, Pass, arguments

list_stmt_subclasses = [Pass()]

FunctionDef("chicken", arguments([], [], None, [], [], None, []), list_stmt_subclasses, [], None, None, [])

Mypy reports:

Argument 3 to "FunctionDef" has incompatible type "list[Pass]"; expected "list[stmt]" Mypy(arg-type)

"List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
Consider using "Sequence" instead, which is covariant

Pylance reports:

Argument of type "list[Pass]" cannot be assigned to parameter "body" of type "list[stmt]" in function "__init__"
  "list[Pass]" is not assignable to "list[stmt]"
    Type parameter "_T@list" is invariant, but "Pass" is not the same as "stmt"
    Consider switching from "list" to "Sequence" which is covariant Pylance(reportArgumentType)

When I use a stub file that defines body: Sequence[stmt] in all the right places, the above code does not lead to diagnostic reports.

Change list to Sequence for subclassed types

If I understand correctly, even though ast.Pass is a subclass of ast.stmt, the invariance of list requires ast.stmt, not a subclass. In contrast, the covariance of Sequence permits substituting subclasses for the base class. I'm not a trained programmer, and I don't have professional programming experience. In the past, when I have encountered this issue and got help from an AI assistant, the "solution" was always

list_stmt_subclasses = [cast(ast.stmt, element) for element in list_stmt_subclasses]

Those kinds of statements have no value, increase complexity, and increase the risk of bugs.

If the solution is to change some list to Sequence in the stub file, then I happen to have already a list (no pun intended) of subclasses and fields.

About the list contents:

  1. Classes as of Python 3.13 without differentiating other versions.
  2. No deprecated classes.
  3. Includes list if the type is subclassed, such as list[ast.expr] but not list[ast.arg].
  4. All typing.TypeAlias from the stub file were replaced with the, uh, true name (what is that called?). That might not have affected this list, but I'm not sure.
  5. Includes list[str] because str could be subclassed, but I don't really have the knowledge to analyze its inclusion or exclusion.
Class Field Type
Assign targets list[ast.expr]
AsyncFor body list[ast.stmt]
AsyncFor orelse list[ast.stmt]
AsyncFunctionDef body list[ast.stmt]
AsyncFunctionDef decorator_list list[ast.expr]
AsyncFunctionDef type_params list[ast.type_param]
AsyncWith body list[ast.stmt]
BoolOp values list[ast.expr]
Call args list[ast.expr]
ClassDef bases list[ast.expr]
ClassDef body list[ast.stmt]
ClassDef decorator_list list[ast.expr]
ClassDef type_params list[ast.type_param]
Compare comparators list[ast.expr]
Compare ops list[ast.cmpop]
Delete targets list[ast.expr]
Dict keys list[ast.expr]
Dict values list[ast.expr]
ExceptHandler body list[ast.stmt]
For body list[ast.stmt]
For orelse list[ast.stmt]
FunctionDef body list[ast.stmt]
FunctionDef decorator_list list[ast.expr]
FunctionDef type_params list[ast.type_param]
FunctionType argtypes list[ast.expr]
Global names list[str]
If body list[ast.stmt]
If orelse list[ast.stmt]
Interactive body list[ast.stmt]
JoinedStr values list[ast.expr]
List elts list[ast.expr]
MatchClass kwd_attrs list[str]
MatchClass kwd_patterns list[ast.pattern]
MatchClass patterns list[ast.pattern]
MatchMapping keys list[ast.expr]
MatchMapping patterns list[ast.pattern]
MatchOr patterns list[ast.pattern]
MatchSequence patterns list[ast.pattern]
Module body list[ast.stmt]
Module type_ignores list[ast.type_ignore]
Nonlocal names list[str]
Set elts list[ast.expr]
Try body list[ast.stmt]
Try finalbody list[ast.stmt]
Try handlers list[ast.excepthandler]
Try orelse list[ast.stmt]
TryStar body list[ast.stmt]
TryStar finalbody list[ast.stmt]
TryStar handlers list[ast.excepthandler]
TryStar orelse list[ast.stmt]
Tuple elts list[ast.expr]
TypeAlias type_params list[ast.type_param]
While body list[ast.stmt]
While orelse list[ast.stmt]
With body list[ast.stmt]
arguments defaults list[ast.expr]
arguments kw_defaults list[ast.expr]
comprehension ifs list[ast.expr]
match_case body list[ast.stmt]

Coda

I replaced list with Sequence in an ast-related project, but my inexpert code might not be relevant to this conversation.

@JelleZijlstra
Copy link
Member

Yes, this is a classic variance problem. But these objects are actually lists in AST produced by Python itself, so I don't think we should change the stubs.

In your first example the solution is to declare the type as list_stmt_subclasses: list[ast.stmt] = [Pass()].

@hunterhogan
Copy link
Contributor Author

But these objects are actually lists in AST produced by Python itself, so I don't think we should change the stubs.

I don't understand what you mean. Or more likely, I didn't properly explain what I mean. I am talking about the objects passed to __init__ as a parameter/field for a subclass. So, those objects are whatever the user makes them.

In your first example the solution is to declare the type as list_stmt_subclasses: list[ast.stmt] = [Pass()].

Thank you for ensuring that I was aware of that solution.

I don't have a different simple example or a merely complex example. I only have an absurdly complex example.

My ast modules were originally developed in another package, and if you go to this commit in mapFolding, that is immediately before I switched to Sequence. I have a handful of invariance problems in "mapFolding/someAssemblyRequired/_toolGrab.py". But the real problems are when I try to use the simple tools in compositions, such as in "mapFolding/someAssemblyRequired/Z0Z_makeSomeModules.py". In the next commit, I changed my code to Sequence and it resolved a ton of issues. Because the ast module still expects list however, I had to add list() in some places when I called ast.

That code has significantly improved and is now in https://github.com/hunterhogan/astToolkit. "astToolkit/_toolGrab.py". mapFolding uses it...

... but I don't have enough technical knowledge to either 1) describe the situation properly or 2) show you a clear demonstration. Furthermore, it was only three days ago that I deciphered the documentation for ._field_types, so astToolkit uses an unnecessarily complex system in "toolFactory/". It is almost impossible to believe I can understand anything about Python after looking at that code. Finally, I have other typing issues that I have no idea how to resolve that are probably as easy as knowing how to use ._fields, so it is absurd to think I might be partially right about changing list to something else.

:(


This would require a lot of effort, so I don't think you should do it.

  1. Clone astToolkit, Py >= 3.12. pip install -e .[testing] -e .\toolFactory (or ./toolFactory).
  2. In astToolkit, go to the toolFactory, and add return on this line https://github.com/hunterhogan/astToolkit/blob/fac2c29f4b4f305543a7876b770d46a867f2cd57/toolFactory/astFactory.py#L244, which will have the effect of not changing list to Sequence on 19 different "attributes". (In the ast module, they are called "fields". At the moment, I am confused whether they are attributes or fields or parameters or arguments or something else: in that code, right now, they are "attributes".)
  3. Go to toolFactory/Z0Z_makeAstTools.py and run the module to re-generate some modules in astToolkit.
  4. Now what? IDK. Because I separated the generic logic from mapFolding, most of the composable statements that have invariance problems are in mapFolding. You would need to clone/install mapFolding but use your version of astToolkit instead of PyPI. Then you would have to rummage through "mapFolding/someAssemblyRequired". I guess "mapFolding/someAssemblyRequired/Z0Z_makeSomeModules.py" would have examples, but it also has other typing problems. And that module is a prototype: I haven't converged... the code is not DRY, for example.

I think it all boils down to:

  • you certainly will pass list[ast.alias] to the ast.ImportFrom() constructor
  • but no one will ever pass list[ast.stmt], a builtins.list in which each element is literally class ast.stmt, to the ast.FunctionDef() constructor.
  • If two things behave the same, they should be "labeled" similarly. If two things behave differently, they should have "labels" that differentiate them.

@hunterhogan
Copy link
Contributor Author

I suspect I do not have a sufficient understanding of the difference between __init__ and storing data.

But these objects are actually lists in AST

Maybe I should see that you are pointing out that the data is stored in a list, therefore the field type should be a list.

This situation feels weirdly similar to ordering food in Xi'an. I only knew basic Beijing-er Mandarin, which is practically a foreign language in Xi'an. I tried to order beef dumplings and I got cabbage soup. I didn't have the words to properly differentiate my thoughts, and I couldn't hear pronunciation differences when she explained things to me.

Well, maybe the jiaozi I want is for the __init__ method to sometimes be labeled a Sequence.

class BoolOp(expr):
    if sys.version_info >= (3, 10):
        __match_args__ = ("op", "values")
    op: boolop
    values: list[expr]
    if sys.version_info >= (3, 13):
        def __init__(self, op: boolop, values: Sequence[expr] = ..., **kwargs: Unpack[_Attributes]) -> None: ...
    else:
        def __init__(self, op: boolop, values: Sequence[expr], **kwargs: Unpack[_Attributes]) -> None: ...

    if sys.version_info >= (3, 14):
        def __replace__(self, *, op: boolop = ..., values: Sequence[expr] = ..., **kwargs: Unpack[_Attributes]) -> Self: ...

@JelleZijlstra
Copy link
Member

maybe the jiaozi I want is for the __init__ method to sometimes be labeled a Sequence

The problem with that is now something like op = BoolOp(values=()) gets accepted. Then at runtime, op.values is a tuple, but type checkers will think op.values is a list.

@hunterhogan
Copy link
Contributor Author

maybe the jiaozi I want is for the __init__ method to sometimes be labeled a Sequence

The problem with that is now something like op = BoolOp(values=()) gets accepted. Then at runtime, op.values is a tuple, but type checkers will think op.values is a list.

Ah. If I knew how to read C, then I would have known that the method doesn't enforce the input type and allows tuples. I need to ruminate on this.

@hunterhogan
Copy link
Contributor Author

hunterhogan commented May 10, 2025

If using a tuple instead of a list is a problem because of the type checker, but not because of the interpreter behavior, then a list and a tuple should have the same outcome. Since they don't have the same outcome, should we create an issue in github.com/python/cpython/?

ast.BoolOp field values as list

Code

import ast
thisIsMyTaskIndexCodeBlock = ast.If(ast.BoolOp(ast.Or()
		, values=[ast.Compare(ast.Name('leaf1ndex'), ops=[ast.NotEq()], comparators=[ast.Name('taskDivisions')])
				, ast.Compare(ast.BinOp(ast.Name('leafConnectee'), op=ast.Mod(), right=ast.Name('taskDivisions')), ops=[ast.Eq()], comparators=[ast.Name('taskIndex')])
				])
	, body=[ast.Pass()])

astModule = ast.Module([thisIsMyTaskIndexCodeBlock])
ast.fix_missing_locations(astModule)
print(ast.unparse(astModule))
print()
print(ast.dump(thisIsMyTaskIndexCodeBlock, annotate_fields=True, indent=4))

Print

if leaf1ndex != taskDivisions or leafConnectee % taskDivisions == taskIndex:
    pass

If(
    test=BoolOp(
        op=Or(),
        values=[
            Compare(
                left=Name(id='leaf1ndex', ctx=Load()),
                ops=[
                    NotEq()],
                comparators=[
                    Name(id='taskDivisions', ctx=Load())]),
            Compare(
                left=BinOp(
                    left=Name(id='leafConnectee', ctx=Load()),
                    op=Mod(),
                    right=Name(id='taskDivisions', ctx=Load())),
                ops=[
                    Eq()],
                comparators=[
                    Name(id='taskIndex', ctx=Load())])]),
    body=[
        Pass()])

ast.BoolOp field values as inferred tuple

Code

import ast
thisIsMyTaskIndexCodeBlock = ast.If(ast.BoolOp(ast.Or()
		, values=(ast.Compare(ast.Name('leaf1ndex'), ops=[ast.NotEq()], comparators=[ast.Name('taskDivisions')])
				, ast.Compare(ast.BinOp(ast.Name('leafConnectee'), op=ast.Mod(), right=ast.Name('taskDivisions')), ops=[ast.Eq()], comparators=[ast.Name('taskIndex')])
				))
	, body=[ast.Pass()])

astModule = ast.Module([thisIsMyTaskIndexCodeBlock])
ast.fix_missing_locations(astModule)
print(ast.unparse(astModule))
print()
print(ast.dump(thisIsMyTaskIndexCodeBlock, annotate_fields=True, indent=4))

Print

if leaf1ndex != taskDivisions or leafConnectee % taskDivisions == taskIndex:
    pass

If(
    test=BoolOp(op=Or(), values=(<ast.Compare object at 0x0000024FE6C81590>, <ast.Compare object at 0x0000024FE6C80BD0>)),
    body=[
        Pass()])

ast.BoolOp field values as explicit tuple

Code

import ast
thisIsMyTaskIndexCodeBlock = ast.If(ast.BoolOp(ast.Or()
		, values=tuple([ast.Compare(ast.Name('leaf1ndex'), ops=[ast.NotEq()], comparators=[ast.Name('taskDivisions')])
				, ast.Compare(ast.BinOp(ast.Name('leafConnectee'), op=ast.Mod(), right=ast.Name('taskDivisions')), ops=[ast.Eq()], comparators=[ast.Name('taskIndex')])
				]))
	, body=[ast.Pass()])
astModule = ast.Module([thisIsMyTaskIndexCodeBlock])
ast.fix_missing_locations(astModule)
print(ast.unparse(astModule))
print()
print(ast.dump(thisIsMyTaskIndexCodeBlock, annotate_fields=True, indent=4))

Print

if leaf1ndex != taskDivisions or leafConnectee % taskDivisions == taskIndex:
    pass

If(
    test=BoolOp(op=Or(), values=(<ast.Compare object at 0x0000024FE78F9D90>, <ast.Compare object at 0x0000024FE6C35310>)),
    body=[
        Pass()])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants