[Feature Request] Support system generated ingest pipelines for bulk update operations #18276

Open
q-andy opened this issue May 13, 2025 · 0 comments · May be fixed by #18277
Labels
enhancement Enhancement or improvement to existing feature or request Indexing Indexing, Bulk Indexing and anything related to indexing untriaged

Comments

q-andy (Contributor) commented May 13, 2025

Is your feature request related to a problem? Please describe

Ingest pipelines are currently applied inconsistently to single and bulk document update operations, as described in #17742. The result is inconsistent document processing, particularly when an update request generates multiple index operations (e.g., upsert or doc_as_upsert cases): certain flag combinations trigger ingest pipelines, while others don't.

System ingest pipelines, introduced in #17817, are intended to apply processor transformations, such as embedding generation for semantic fields, while abstracting away pipeline setup for users. On top of the update inconsistency described above, this introduces more surface area for confusion. For example, semantic field users may bulk update their semantic text field without knowing that it relies on system ingest pipelines to generate embeddings under the hood. Because the pipelines are not triggered, the text field and its underlying embedding fall out of sync, which degrades search quality.

We propose a partial solution to the general case described in #17742: resolve and execute system pipelines for all update requests so that this behavior is consistent. Much of this work also carries over to resolving the general case of the original issue.

Describe the solution you'd like

Support system ingest pipelines for bulk update operations

Update Request Type Classification

  • Introduce a method to expose all child index requests associated with an update operation
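
As a rough illustration of the idea above, here is a minimal, language-neutral sketch in Python. The class and method names (`IndexRequest`, `UpdateRequest`, `child_index_requests`) are hypothetical stand-ins chosen for this example and are not the actual OpenSearch API; the sketch only assumes that an update may carry a partial-document request, an upsert request, or both.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexRequest:
    """Hypothetical stand-in for a child index request."""
    source: dict


@dataclass
class UpdateRequest:
    """Hypothetical stand-in for an update operation in a bulk request."""
    doc: Optional[IndexRequest] = None      # partial document, if any
    upsert: Optional[IndexRequest] = None   # upsert document, if any
    doc_as_upsert: bool = False

    def child_index_requests(self) -> List[IndexRequest]:
        # Expose every child index request that is actually present,
        # in a stable order (doc first, then upsert).
        return [r for r in (self.doc, self.upsert) if r is not None]


update = UpdateRequest(
    doc=IndexRequest({"text": "new value"}),
    upsert=IndexRequest({"text": "initial value"}),
)
children = update.child_index_requests()
```

Keeping the order stable matters for the slot mapping described later, since a child's position in this list can serve as its inner index.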

Pipeline Resolution Enhancement

  • Use resolveSystemIngestPipeline to resolve only the system ingest pipeline while setting the others to NOOP
  • Based on the update request's fields, extract its child index requests and conditionally resolve ALL pipelines, ONLY system ingest pipelines, or no pipelines at all
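
The three resolution outcomes can be sketched as follows. This is an illustrative Python model, not the proposed implementation: `ResolutionMode`, `ResolvedPipelines`, `resolve_pipelines`, and the `NOOP` placeholder value are all hypothetical names, and the rule that picks a mode from the update request's fields is deliberately left out because the issue does not spell it out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

NOOP = "_none"  # assumed placeholder meaning "do not run a pipeline"


class ResolutionMode(Enum):
    ALL = "all"                  # default, final, and system pipelines
    SYSTEM_ONLY = "system_only"  # system pipeline only, others set to NOOP
    NONE = "none"                # nothing is resolved


@dataclass
class ResolvedPipelines:
    default: str = NOOP
    final: str = NOOP
    system: str = NOOP


def resolve_pipelines(
    mode: ResolutionMode,
    index_default: Optional[str],
    index_final: Optional[str],
    index_system: Optional[str],
) -> ResolvedPipelines:
    """Resolve pipelines for one child index request according to `mode`."""
    if mode is ResolutionMode.NONE:
        return ResolvedPipelines()
    system = index_system or NOOP
    if mode is ResolutionMode.SYSTEM_ONLY:
        # Mirrors the idea behind resolveSystemIngestPipeline:
        # keep the system pipeline, force the others to NOOP.
        return ResolvedPipelines(system=system)
    return ResolvedPipelines(
        default=index_default or NOOP,
        final=index_final or NOOP,
        system=system,
    )


system_only = resolve_pipelines(
    ResolutionMode.SYSTEM_ONLY, "my-default", "my-final", "semantic-system"
)
```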

Slot Management

  • Introduce innerSlot to track individual child index requests within an update operation
  • Use innerSlot to map pipeline execution results back to the correct child request using (slot, innerSlot) pairs
  • Maintain proper error handling and response mapping so that both parent and child operations map back to their original bulk request slot
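
To make the (slot, innerSlot) mapping concrete, here is a small Python sketch under stated assumptions. The helper names (`flatten`, `run_pipelines`, `fake_processor`) are hypothetical, each bulk item is modeled as a plain list of child documents, and the policy of recording only the first failure per parent slot is an assumption made for illustration rather than something the issue specifies.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, int]  # (slot, inner_slot)


def flatten(bulk_items: List[List[dict]]) -> List[Tuple[int, int, dict]]:
    """Assign every child request a (slot, inner_slot) pair."""
    return [
        (slot, inner_slot, child)
        for slot, children in enumerate(bulk_items)
        for inner_slot, child in enumerate(children)
    ]


def run_pipelines(
    bulk_items: List[List[dict]],
    processor: Callable[[dict], dict],
) -> Tuple[Dict[Key, dict], Dict[int, Exception]]:
    """Run `processor` on each child and key results by (slot, inner_slot).

    Failures are keyed by the parent slot so the error can be reported
    against the original bulk item (assumed policy: first failure wins).
    """
    results: Dict[Key, dict] = {}
    failures: Dict[int, Exception] = {}
    for slot, inner_slot, child in flatten(bulk_items):
        try:
            results[(slot, inner_slot)] = processor(child)
        except Exception as exc:
            failures.setdefault(slot, exc)
    return results, failures


def fake_processor(doc: dict) -> dict:
    # Stand-in for a system pipeline: derives a field from "text".
    return {**doc, "text_length": len(doc["text"])}


bulk = [
    [{"text": "doc"}, {"text": "upsert"}],  # slot 0: update with two children
    [{"title": "no text field"}],           # slot 1: processor will fail
]
results, failures = run_pipelines(bulk, fake_processor)
```

With this layout, the result for the upsert child of the first bulk item is found at `results[(0, 1)]`, while the failure for the second item is reported once under slot `1`.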

Related component

Indexing

Describe alternatives you've considered

No response

Additional context

No response

@q-andy q-andy added enhancement Enhancement or improvement to existing feature or request untriaged labels May 13, 2025
@github-actions github-actions bot added the Indexing Indexing, Bulk Indexing and anything related to indexing label May 13, 2025
@q-andy q-andy linked a pull request May 13, 2025 that will close this issue