
feat: use non-streaming tool calls for streaming tool calls #48


Draft: nathan-weinberg wants to merge 1 commit into main from streaming-tool-call-fix

Conversation

@nathan-weinberg (Collaborator) commented Apr 30, 2025

Fixes #47

Summary by Sourcery

Modify OpenAI compatibility layer to force non-streaming mode for tool calls due to llama.cpp limitations

New Features:

  • Add explicit handling of streaming for tool calls in OpenAI-compatible API

Bug Fixes:

  • Resolve issue with streaming tool calls in llama.cpp by forcing non-streaming mode

Enhancements:

  • Improve tool call compatibility by overriding stream parameter

@sourcery-ai bot commented Apr 30, 2025

Reviewer's Guide

This pull request modifies the OpenAI compatibility layer to disable streaming when tool calls are requested. It updates the payload dictionary within the convert_chat_completion_request function to explicitly set stream=False if the incoming request includes tools, accommodating a limitation in llama.cpp.
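The override described here can be sketched as follows. This is a minimal illustration, not the actual code: the real convert_chat_completion_request in src/ramalama_stack/openai_compat.py builds the payload from many more request fields, and the dict-based signature below is assumed for brevity.

```python
def convert_chat_completion_request(request: dict) -> dict:
    """Build the llama.cpp-bound payload from an OpenAI-style chat request.

    Sketch only: the real function maps many more OpenAI fields.
    """
    payload = dict(request)
    # llama.cpp cannot stream tool-call responses, so force
    # non-streaming mode whenever tools are present in the request.
    if payload.get("tools"):
        payload["stream"] = False
    return payload
```

A caller that still needs to honor a client's stream=True request would then re-wrap the non-streaming response as a streaming-compatible one, which is the workaround issue #47 describes.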

File-Level Changes

Change: Disabled streaming for tool calls
Details: Forced the stream parameter to False in the request payload when tools are present in the request
Files: src/ramalama_stack/openai_compat.py

Assessment against linked issues

  • #47: Work around llama.cpp's lack of streaming tool call support by faking streaming responses for tool calls (i.e., ensure that when a streaming tool call is requested, the provider actually performs a non-streaming tool call and returns the response in a streaming-compatible way).
  • #47: Update the provider's code so that tool calls work correctly in streaming mode, passing the Llama Stack OpenAI API verification tests for streaming tool calls.


@nathan-weinberg force-pushed the streaming-tool-call-fix branch 2 times, most recently from a4310f5 to 4d0f2ca on April 30, 2025 12:49
@nathan-weinberg (Collaborator, Author)

@bbrowning I tested this here - how can I generate the Markdown report from the generated JSON?

@bbrowning

  • Download and unzip the artifacts from the ramalama verification job at https://github.com/instructlab/lls-openai-client/actions/runs/14745352763; at the bottom of that page you'll see the "openai-api-verification-results" artifacts.

  • Clone meta-llama/llama-stack locally, set up a venv, pip install it, etc. (basically, have a working llama-stack dev environment).

  • Copy the ramalama.json and ramalama-llama-stack.json from the artifacts zip into your llama-stack clone's tests/verifications/test_results/ directory.

  • Run the following from the root of your llama-stack clone directory:

    • python tests/verifications/generate_report.py --provider ramalama ramalama-llama-stack

This could be made easier, but up until now all providers were in-tree so it made sense to have the report generation in-tree as well.

@nathan-weinberg force-pushed the streaming-tool-call-fix branch 2 times, most recently from 56dc2d7 to 9a95420 on May 1, 2025 18:21
@nathan-weinberg marked this pull request as ready for review on May 1, 2025 18:32
@sourcery-ai bot left a comment

Hey @nathan-weinberg - I've reviewed your changes - here's some feedback:

  • Consider adding a link to relevant llama.cpp documentation or an issue tracker regarding the lack of streaming tool call support for future reference.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


@nathan-weinberg marked this pull request as draft on May 2, 2025 03:02
@nathan-weinberg force-pushed the streaming-tool-call-fix branch from 9a95420 to 9e99639 on May 2, 2025 03:07
@nathan-weinberg force-pushed the streaming-tool-call-fix branch from 9e99639 to 1c44fe8 on May 7, 2025 17:34
Successfully merging this pull request may close these issues.

RamaLama provider fails streaming tool call verification tests