
feat: use non-streaming tool calls for streaming tool calls #48


Draft: nathan-weinberg wants to merge 1 commit into main from streaming-tool-call-fix

Conversation

@nathan-weinberg (Collaborator) commented Apr 30, 2025

Fixes #47

Summary by Sourcery

Modify OpenAI compatibility layer to force non-streaming mode for tool calls due to llama.cpp limitations

New Features:

  • Add explicit handling of streaming for tool calls in OpenAI-compatible API

Bug Fixes:

  • Resolve issue with streaming tool calls in llama.cpp by forcing non-streaming mode

Enhancements:

  • Improve tool call compatibility by overriding stream parameter

@sourcery-ai bot commented Apr 30, 2025

Reviewer's Guide

This pull request modifies the OpenAI compatibility layer to disable streaming when tool calls are requested. It updates the payload dictionary within the convert_chat_completion_request function to explicitly set stream=False if the incoming request includes tools, accommodating a limitation in llama.cpp.
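The override described here can be sketched as follows. This is a minimal illustration, not the actual code: the real convert_chat_completion_request in src/ramalama_stack/openai_compat.py builds the payload from many more request fields, and the dict-based signature below is assumed for brevity.

```python
def convert_chat_completion_request(request: dict) -> dict:
    """Build the llama.cpp-bound payload from an OpenAI-style chat request.

    Sketch only: the real function maps many more OpenAI fields.
    """
    payload = dict(request)
    # llama.cpp cannot stream tool-call responses, so force
    # non-streaming mode whenever tools are present in the request.
    if payload.get("tools"):
        payload["stream"] = False
    return payload
```

A caller that still needs to honor a client's stream=True request would then re-wrap the non-streaming response as a streaming-compatible one, which is the workaround issue #47 describes.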

File-Level Changes

Change: Disabled streaming for tool calls
Details: Forced the stream parameter to False in the request payload when tools are present in the request
Files: src/ramalama_stack/openai_compat.py

Assessment against linked issues

  • #47: Work around llama.cpp's lack of streaming tool call support by faking streaming responses for tool calls (i.e., ensure that when a streaming tool call is requested, the provider actually performs a non-streaming tool call and returns the response in a streaming-compatible way).
  • #47: Update the provider's code so that tool calls work correctly in streaming mode, passing the Llama Stack OpenAI API verification tests for streaming tool calls.


@nathan-weinberg force-pushed the streaming-tool-call-fix branch 2 times, most recently from a4310f5 to 4d0f2ca on April 30, 2025 12:49
@nathan-weinberg (Collaborator, Author)

@bbrowning I tested this here - how can I generate the Markdown report from the generated JSON?

@bbrowning

  • Download and unzip the artifacts from the ramalama verification job at https://github.com/instructlab/lls-openai-client/actions/runs/14745352763; at the bottom of that page you'll see the "openai-api-verification-results" artifacts.

  • Clone meta-llama/llama-stack locally, set up a venv, pip install it, etc. (basically, have a working llama-stack dev environment).

  • Copy the ramalama.json and ramalama-llama-stack.json from the artifacts zip into your llama-stack clone's tests/verifications/test_results/ directory.

  • Run the following from the root of your llama-stack clone directory:

    • python tests/verifications/generate_report.py --provider ramalama ramalama-llama-stack

This could be made easier, but up until now all providers were in-tree so it made sense to have the report generation in-tree as well.

@nathan-weinberg force-pushed the streaming-tool-call-fix branch 2 times, most recently from 56dc2d7 to 9a95420 on May 1, 2025 18:21
@nathan-weinberg marked this pull request as ready for review on May 1, 2025 18:32
@sourcery-ai bot left a comment

Hey @nathan-weinberg - I've reviewed your changes - here's some feedback:

  • Consider adding a link to relevant llama.cpp documentation or an issue tracker regarding the lack of streaming tool call support for future reference.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


@nathan-weinberg marked this pull request as draft on May 2, 2025 03:02
@nathan-weinberg force-pushed the streaming-tool-call-fix branch from 9a95420 to 9e99639 on May 2, 2025 03:07
@nathan-weinberg force-pushed the streaming-tool-call-fix branch from 9e99639 to 1c44fe8 on May 7, 2025 17:34
Successfully merging this pull request may close these issues.

RamaLama provider fails streaming tool call verification tests