Add Mistral AI Chat Completion support to Inference Plugin #128538


Draft · wants to merge 6 commits into base: main

Conversation

@Jan-Kazlouski-elastic (Contributor) commented May 27, 2025:

This change extends the existing Mistral AI provider integration so that completion (both streaming and non-streaming) and chat_completion (streaming only) can be executed through the inference API.
The changes were tested against the following models:

  • mistral-large-latest
  • mistral-small-latest

Notes:

  • Mistral returns at least five different formats of non-streaming errors, even though its documentation mentions only one. MistralErrorEntity handles the formats of the four most common errors: Unauthorized, Bad Request, Not Found, and Unprocessable Entity.
  • The format of streaming errors is not defined in the Mistral documentation and was never observed during testing. Mid-stream errors are assumed to follow the OpenAI format.
  • Changes were made to the common OpenAI response handler so that it can handle more error codes. That might affect behavior for other providers, but it is a better solution than duplicating the handler.
  • Mistral AI doesn't recognize the "stream_options" field despite it being present in the OpenAI schema, so I added a provider-based solution that omits it for Mistral (see the sketch after this list).
  • Task settings being passed as parameters but never used is a pre-existing pattern that might need improvement, e.g. not passing them as a parameter at all.
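
To illustrate the stream_options handling, here is a simplified sketch (the constant and method names follow the diff below; the no-op Mistral override itself is illustrative and actual class names may differ):

```java
// Default (OpenAI-style) request entity: writes "stream_options" when streaming.
protected void fillStreamOptionsFields(XContentBuilder builder) throws IOException {
    builder.startObject(STREAM_OPTIONS_FIELD);   // "stream_options"
    builder.field(INCLUDE_USAGE_FIELD, true);    // "include_usage": true
    builder.endObject();
}

// Mistral request entity: overridden to omit the field entirely,
// since Mistral rejects "stream_options".
@Override
protected void fillStreamOptionsFields(XContentBuilder builder) throws IOException {
    // intentionally empty
}
```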

Examples of RQ/RS from local testing:

Create Completion Endpoint

Success:

RQ:
PUT {{base-url}}/_inference/completion/mistral-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "inference_id": "mistral-completion",
    "task_type": "completion",
    "service": "mistral",
    "service_settings": {
        "model": "mistral-small-latest",
        "rate_limit": {
            "requests_per_minute": 240
        }
    }
}

Unauthorized:

RQ:
PUT {{base-url}}/_inference/completion/mistral-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{invalid-mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Received an authentication error status code for request from inference entity id [mistral-completion] status [401]. Error message: [Unauthorized]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Received an authentication error status code for request from inference entity id [mistral-completion] status [401]. Error message: [Unauthorized]"
        }
    },
    "status": 400
}

Not Found:

RQ:
PUT {{base-url}}/_inference/completion/mistral-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"
        }
    },
    "status": 400
}

Invalid Model:

RQ:
PUT {{base-url}}/_inference/completion/mistral-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "wrong-model-name"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Received a bad request status code for request from inference entity id [mistral-completion] status [400]. Error message: [Invalid model: wrong-model-name]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Received a bad request status code for request from inference entity id [mistral-completion] status [400]. Error message: [Invalid model: wrong-model-name]"
        }
    },
    "status": 400
}
Perform Non-Streaming Completion

Success:

RQ:
POST {{base-url}}/_inference/completion/mistral-completion
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS:
{
    "completion": [
        {
            "result": "The sentence you've provided is the opening line of William Gibson's seminal cyberpunk novel *Neuromancer*. This vivid and evocative description sets the tone for the dystopian, high-tech, low-life world that the novel explores. The imagery of a \"dead channel\" on a television screen suggests a sense of emptiness, static, and the absence of clear signals or information, which can be seen as a metaphor for the fragmented and often chaotic nature of the future Gibson envisions.\n\nThe use of such a striking opening line is characteristic of Gibson's style, which often blends technological and cultural references to create a rich, immersive atmosphere. *Neuromancer* is known for its influence on the cyberpunk genre and its prescient exploration of themes related to artificial intelligence, virtual reality, and the digital age."
        }
    ]
}

Not Found:

RQ:
POST {{base-url}}/_inference/completion/mistral-completion
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"
            }
        ],
        "type": "status_exception",
        "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"
    },
    "status": 404
}
Perform Streaming Completion

Success:

RQ:
POST {{base-url}}/_inference/completion/mistral-completion/_stream
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS:
event: message
data: {"completion":[{"delta":"The"}]}

event: message
data: {"completion":[{"delta":" sentence"},{"delta":" you"}]}

event: message
data: {"completion":[{"delta":"'ve"}]}

event: message
data: {"completion":[{"delta":" provided"}]}

event: message
data: {"completion":[{"delta":" is"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" opening"}]}

event: message
data: {"completion":[{"delta":" line"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" William"}]}

event: message
data: {"completion":[{"delta":" Gibson"}]}

event: message
data: {"completion":[{"delta":"'s"}]}

event: message
data: {"completion":[{"delta":" seminal"}]}

event: message
data: {"completion":[{"delta":" cyber"}]}

event: message
data: {"completion":[{"delta":"punk"}]}

event: message
data: {"completion":[{"delta":" novel"}]}

event: message
data: {"completion":[{"delta":" *"}]}

event: message
data: {"completion":[{"delta":"Ne"}]}

event: message
data: {"completion":[{"delta":"u"}]}

event: message
data: {"completion":[{"delta":"rom"}]}

event: message
data: {"completion":[{"delta":"ancer"}]}

event: message
data: {"completion":[{"delta":"*."}]}

event: message
data: {"completion":[{"delta":" This"}]}

event: message
data: {"completion":[{"delta":" vivid"}]}

event: message
data: {"completion":[{"delta":" and"}]}

event: message
data: {"completion":[{"delta":" evoc"}]}

event: message
data: {"completion":[{"delta":"ative"}]}

event: message
data: {"completion":[{"delta":" description"}]}

event: message
data: {"completion":[{"delta":" sets"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" tone"}]}

event: message
data: {"completion":[{"delta":" for"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" dyst"}]}

event: message
data: {"completion":[{"delta":"op"}]}

event: message
data: {"completion":[{"delta":"ian"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" high"}]}

event: message
data: {"completion":[{"delta":"-tech"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" low"}]}

event: message
data: {"completion":[{"delta":"-life"}]}

event: message
data: {"completion":[{"delta":" world"}]}

event: message
data: {"completion":[{"delta":" that"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" novel"}]}

event: message
data: {"completion":[{"delta":" explores"}]}

event: message
data: {"completion":[{"delta":"."}]}

event: message
data: {"completion":[{"delta":" The"}]}

event: message
data: {"completion":[{"delta":" imagery"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" \""}]}

event: message
data: {"completion":[{"delta":"dead"}]}

event: message
data: {"completion":[{"delta":" channel"}]}

event: message
data: {"completion":[{"delta":"\""}]}

event: message
data: {"completion":[{"delta":" on"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" television"}]}

event: message
data: {"completion":[{"delta":" screen"}]}

event: message
data: {"completion":[{"delta":" suggests"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" sense"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" empt"}]}

event: message
data: {"completion":[{"delta":"iness"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" static"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" and"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" absence"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" clear"}]}

event: message
data: {"completion":[{"delta":" signals"}]}

event: message
data: {"completion":[{"delta":" or"}]}

event: message
data: {"completion":[{"delta":" information"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" which"}]}

event: message
data: {"completion":[{"delta":" can"}]}

event: message
data: {"completion":[{"delta":" be"}]}

event: message
data: {"completion":[{"delta":" seen"}]}

event: message
data: {"completion":[{"delta":" as"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" metaphor"}]}

event: message
data: {"completion":[{"delta":" for"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" fragmented"}]}

event: message
data: {"completion":[{"delta":" and"}]}

event: message
data: {"completion":[{"delta":" often"}]}

event: message
data: {"completion":[{"delta":" confusing"},{"delta":" reality"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" characters"}]}

event: message
data: {"completion":[{"delta":" in"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" story"}]}

event: message
data: {"completion":[{"delta":".\n\n"}]}

event: message
data: {"completion":[{"delta":"Gib"}]}

event: message
data: {"completion":[{"delta":"son"}]}

event: message
data: {"completion":[{"delta":"'s"}]}

event: message
data: {"completion":[{"delta":" use"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" such"}]}

event: message
data: {"completion":[{"delta":" imagery"}]}

event: message
data: {"completion":[{"delta":" is"}]}

event: message
data: {"completion":[{"delta":" characteristic"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" cyber"}]}

event: message
data: {"completion":[{"delta":"punk"}]}

event: message
data: {"completion":[{"delta":" genre"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" which"}]}

event: message
data: {"completion":[{"delta":" often"}]}

event: message
data: {"completion":[{"delta":" blends"}]}

event: message
data: {"completion":[{"delta":" advanced"}]}

event: message
data: {"completion":[{"delta":" technology"}]}

event: message
data: {"completion":[{"delta":" with"}]}

event: message
data: {"completion":[{"delta":" social"}]}

event: message
data: {"completion":[{"delta":" decay"}]}

event: message
data: {"completion":[{"delta":" and"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" sense"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" alien"}]}

event: message
data: {"completion":[{"delta":"ation"}]}

event: message
data: {"completion":[{"delta":"."}]}

event: message
data: {"completion":[{"delta":" The"}]}

event: message
data: {"completion":[{"delta":" \""}]}

event: message
data: {"completion":[{"delta":"port"}]}

event: message
data: {"completion":[{"delta":"\""}]}

event: message
data: {"completion":[{"delta":" mentioned"}]}

event: message
data: {"completion":[{"delta":" could"}]}

event: message
data: {"completion":[{"delta":" refer"}]}

event: message
data: {"completion":[{"delta":" to"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" space"}]}

event: message
data: {"completion":[{"delta":"port"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" se"}]}

event: message
data: {"completion":[{"delta":"aport"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" or"}]}

event: message
data: {"completion":[{"delta":" even"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" data"}]}

event: message
data: {"completion":[{"delta":" port"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" adding"}]}

event: message
data: {"completion":[{"delta":" to"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" ambiguity"}]}

event: message
data: {"completion":[{"delta":" and"}]}

event: message
data: {"completion":[{"delta":" futur"}]}

event: message
data: {"completion":[{"delta":"istic"}]}

event: message
data: {"completion":[{"delta":" feel"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" setting"}]}

event: message
data: {"completion":[{"delta":"."}]}

event: message
data: [DONE]

Not Found:

RQ:
POST {{base-url}}/_inference/completion/mistral-completion/_stream
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS:
event: error
data: {"error":{"root_cause":[{"type":"status_exception","reason":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"}],"type":"status_exception","reason":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"},"status":404}
Create Chat Completion Endpoint

Success:

RQ:
PUT {{base-url}}/_inference/chat_completion/mistral-chat-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "inference_id": "mistral-chat-completion",
    "task_type": "chat_completion",
    "service": "mistral",
    "service_settings": {
        "model": "mistral-small-latest",
        "rate_limit": {
            "requests_per_minute": 240
        }
    }
}

Unauthorized:

RQ:
PUT {{base-url}}/_inference/chat_completion/mistral-chat-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{invalid-mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Received an authentication error status code for request from inference entity id [mistral-chat-completion] status [401]. Error message: [Unauthorized]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Received an authentication error status code for request from inference entity id [mistral-chat-completion] status [401]. Error message: [Unauthorized]"
        }
    },
    "status": 400
}

Not Found:

RQ:
PUT {{base-url}}/_inference/chat_completion/mistral-chat-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [Not Found]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [Not Found]"
        }
    },
    "status": 400
}

Invalid Model:

RQ:
PUT {{base-url}}/_inference/chat_completion/mistral-chat-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "invalid-model-name"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [Invalid model: invalid-model-name]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [Invalid model: invalid-model-name]"
        }
    },
    "status": 400
}
Perform Streaming Chat Completion

Success:

RQ:
POST {{base-url}}/_inference/chat_completion/mistral-chat-completion/_stream
{
    "model": "mistral-small-latest",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": 2
}
RS:
event: message
data: {"id":"c79e758e9d1a4e89866c9165701496f2","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"mistral-small-latest","object":"chat.completion.chunk"}

event: message
data: {"id":"c79e758e9d1a4e89866c9165701496f2","choices":[{"delta":{"content":"Deep"},"index":0}],"model":"mistral-small-latest","object":"chat.completion.chunk"}

event: message
data: {"id":"c79e758e9d1a4e89866c9165701496f2","choices":[{"delta":{"content":" learning"},"finish_reason":"length","index":0}],"model":"mistral-small-latest","object":"chat.completion.chunk","usage":{"completion_tokens":2,"prompt_tokens":8,"total_tokens":10}}

event: message
data: [DONE]

Invalid Model:

RQ:
POST {{base-url}}/_inference/chat_completion/mistral-chat-completion/_stream
{
    "model": "invalid-model-name",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": 2
}
RS:
event: error
data: {"error":{"code":"bad_request","message":"Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [Invalid model: invalid-model-name]","type":"mistral_error"}}

Negative Max Tokens:

RQ:
POST {{base-url}}/_inference/chat_completion/mistral-chat-completion/_stream
{
    "model": "mistral-small-latest",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": -1
}
RS:
event: error
data: {"error":{"code":"unprocessable_entity","message":"Received an input validation error response for request from inference entity id [mistral-chat-completion] status [422]. Error message: [Input should be greater than or equal to 0]","type":"mistral_error"}}

Not Found:

RQ:
POST {{base-url}}/_inference/chat_completion/mistral-chat-completion/_stream
{
    "model": "mistral-small-latest",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": 2
}
RS:
event: error
data: {"error":{"code":"not_found","message":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [Not Found]","type":"mistral_error"}}
  • Have you signed the contributor license agreement?
  • Have you followed the contributor guidelines?
  • If submitting code, have you built your formula locally prior to submission with gradle check?
  • If submitting code, is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed.
  • If submitting code, have you checked that your submission is for an OS and architecture that we support?
  • If you are submitting this code for a class then read our policy for that.

@elasticsearchmachine added the v9.1.0 and external-contributor (Pull request authored by a developer outside the Elasticsearch team) labels on May 27, 2025
@jonathan-buttner (Contributor) left a comment:

Looking good. I left a few suggestions.

Could you update the description of the PR so that the example requests are formatted?

Let's also wrap them in code blocks using the three backticks. Thanks for making them collapsible sections though!

builder.startObject(STREAM_OPTIONS_FIELD);
builder.field(INCLUDE_USAGE_FIELD, true);
builder.endObject();
fillStreamOptionsFields(builder);
Contributor:

Just an FYI, we have some in-flight changes that'll affect how we do this: #128592

Contributor Author:

Thank you for the heads up! I inspected the changes in the linked PR; they don't affect my changes. We're good.

@@ -0,0 +1,5 @@
pr: 128538
summary: "[ML] Add Mistral Chat Completion support to the Inference Plugin"
Contributor:

Suggested change
summary: "[ML] Add Mistral Chat Completion support to the Inference Plugin"
summary: "Added Mistral Chat Completion support to the Inference Plugin"

Contributor Author:

Fixed.

action.execute(inputs, timeout, listener);
} else {
listener.onFailure(createInvalidModelException(model));
switch (model) {
Contributor:

Just a reminder that I believe this will cause failure when backporting to 8.19 because of the JDK version difference.

Contributor Author:

Yes, I forgot about the JDK version difference. Fixed now.

XContentParser jsonParser = XContentFactory.xContent(XContentType.JSON)
.createParser(XContentParserConfiguration.EMPTY, response.body())
) {
var responseMap = jsonParser.map();
Contributor:

I think we're eventually going to refactor the error parsing logic (across all the services) to not parse at all. How about we convert the bytes to a UTF-8 string and return that?

Contributor Author:

Done. Also renamed it to MistralErrorResponse because it isn't really an entity. Do let me know if you prefer MistralErrorResponseEntity.
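
Roughly, the relevant part now looks like this (simplified sketch; the exact method signature is assumed):

```java
import java.nio.charset.StandardCharsets;

// Keep the raw error body as a UTF-8 string instead of parsing it into a map.
private static ErrorResponse fromResponse(HttpResult response) {
    return new MistralErrorResponse(new String(response.body(), StandardCharsets.UTF_8));
}
```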

@@ -86,11 +88,21 @@ protected void checkForFailureStatusCode(Request request, HttpResult result) thr
throw new RetryException(false, buildError(AUTHENTICATION, request, result));
} else if (statusCode >= 300 && statusCode < 400) {
throw new RetryException(false, buildError(REDIRECTION, request, result));
} else if (statusCode == 422) {
Contributor:

I think it might be confusing looking at this when the openai docs don't mention these error codes. How about we do this:

Let's push the successful status code check up into the base class (let's do that in a separate PR)

        if (result.isSuccessfulResponse()) {
            return;
        }

I briefly looked at all the response handlers and they're duplicating those lines anyway. If the call in the base class succeeds we can skip calling checkForFailureStatusCode for the child classes.

I think we can then override this method in the child class and add these error code checks after calling super.checkForFailureStatusCode()

@Jan-Kazlouski-elastic (Contributor Author) commented May 29, 2025:

That approach will lead to duplication, because this check will have to be present in both the streaming and non-streaming handler classes. The streaming handler class has this hierarchy:

MistralUnifiedChatCompletionResponseHandler
  extends OpenAiUnifiedChatCompletionResponseHandler
  extends OpenAiChatCompletionResponseHandler
  extends OpenAiResponseHandler

The non-streaming handler has this hierarchy:

MistralCompletionResponseHandler
  extends OpenAiChatCompletionResponseHandler
  extends OpenAiResponseHandler

The closest common ancestor is OpenAiChatCompletionResponseHandler, but the streaming handler also needs to inherit logic from OpenAiUnifiedChatCompletionResponseHandler.
Do we really want to introduce such duplication? It would be literally one-to-one, because the errors are the same, which is not very good practice. Please correct me if I'm wrong.

> I think we can then override this method in the child class and add these error code checks after calling super.checkForFailureStatusCode()

Also, I'd make the call to super in the default branch of the check in the child class, because the superclass always throws an exception (defaulting to an undefined error), while the child class check can default to calling the super method. See the sketch below.
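
Concretely, something like this in the child class (sketch; the 422 message constant name is assumed):

```java
@Override
protected void checkForFailureStatusCode(Request request, HttpResult result) throws RetryException {
    int statusCode = result.response().getStatusLine().getStatusCode();
    if (statusCode == 422) {
        // Mistral-specific: Unprocessable Entity (input validation error)
        throw new RetryException(false, buildError(VALIDATION_ERROR, request, result));
    } else {
        // The superclass covers the common codes and throws for everything else,
        // defaulting to an undefined error, so it acts as our default branch.
        super.checkForFailureStatusCode(request, result);
    }
}
```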

import org.elasticsearch.xpack.inference.services.mistral.response.MistralErrorResponseEntity;

/**
* Handles non-streaming chat completion responses for Mistral models, extending the OpenAI chat completion response handler.
Contributor:

Suggested change
* Handles non-streaming chat completion responses for Mistral models, extending the OpenAI chat completion response handler.
* Handles non-streaming completion responses for Mistral models, extending the OpenAI chat completion response handler.

Contributor Author:

Fixed. Thanks.

@@ -58,6 +58,20 @@ public MistralEmbeddingsModel(MistralEmbeddingsModel model, MistralEmbeddingsSer
setPropertiesFromServiceSettings(serviceSettings);
}

protected void setPropertiesFromServiceSettings(MistralEmbeddingsServiceSettings serviceSettings) {
Contributor:

Does this need to be protected? Can we make it private? I think we could potentially leak the `this` context if this class is extended.
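
For example (illustrative):

```java
// The hazard: a constructor that calls an overridable method dispatches to the
// subclass override before the subclass's own fields are initialized.
class Base {
    Base() {
        setProperties(); // virtual call from the constructor
    }

    protected void setProperties() {}
}

class Sub extends Base {
    private final String name = "assigned only after super() returns";

    @Override
    protected void setProperties() {
        System.out.println(name); // prints "null" when invoked from Base()
    }
}
```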

Contributor Author:

I don't see a reason for it to be protected. Changed it to private. Same for completions.

* 2.0.
*/

package org.elasticsearch.xpack.inference.services.openai;
Contributor:

Let's move this class to the mistral package.

Contributor Author:

Good catch. Moved it to the appropriate package.

@@ -0,0 +1,51 @@
/*
Contributor:

@prwhelan any suggestions on how to intentionally encounter a midstream error while testing?

* Handles streaming chat completion responses and error parsing for Mistral inference endpoints.
* Adapts the OpenAI handler to support Mistral's simpler error schema with fields like "message" and "http_status_code".
*/
public class MistralUnifiedChatCompletionResponseHandler extends OpenAiUnifiedChatCompletionResponseHandler {
Contributor:

If the mid-stream errors are in the same format as OpenAI's, how about we refactor the OpenAiUnifiedChatCompletionResponseHandler so that we can replace some of the strings that reference OpenAI specifically? I think it's just the name of the parser:

"open_ai_error",
true,
args -> Optional.ofNullable((OpenAiErrorResponse) args[0])
);
private static final ConstructingObjectParser<OpenAiErrorResponse, Void> ERROR_BODY_PARSER = new ConstructingObjectParser<>(
"open_ai_error",

Maybe we could extract those classes and rename them to be more generic?
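
e.g. extract and rename them with a provider-neutral name, something like (sketch; StreamingErrorResponse is a hypothetical name):

```java
private static final ConstructingObjectParser<Optional<StreamingErrorResponse>, Void> ERROR_PARSER =
    new ConstructingObjectParser<>(
        "streaming_error",
        true,
        args -> Optional.ofNullable((StreamingErrorResponse) args[0])
    );
```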

@Jan-Kazlouski-elastic (Contributor Author) commented May 29, 2025:

Done.
But we're not positive that mid-stream errors are in the same format as OpenAI's; that is just assumed from the fact that Mistral exposes an OpenAI-style API.

…mistral-chat-completion-integration

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
@Jan-Kazlouski-elastic (Contributor Author):

Hi @jonathan-buttner,
I addressed the comments and all fixes are done. I replied to the one comment related to the OpenAI handler hierarchy (#128538 (comment)) to discuss it a bit further.

I also updated the section related to testing. Three backticks didn't work for me when I was creating the initial PR comment, but when I applied them afterwards they worked perfectly. Not sure why, but I'll remember it for future PRs.

@Jan-Kazlouski-elastic (Contributor Author):

New error format:

Create Completion Endpoint

Not Found:

{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
        }
    },
    "status": 400
}

Unauthorized:

{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Received an authentication error status code for request from inference entity id [mistral-completion] status [401]. Error message: [{\n  \"message\":\"Unauthorized\",\n  \"request_id\":\"a580d263fb1521778782b22104efb415\"\n}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Received an authentication error status code for request from inference entity id [mistral-completion] status [401]. Error message: [{\n  \"message\":\"Unauthorized\",\n  \"request_id\":\"a580d263fb1521778782b22104efb415\"\n}]"
        }
    },
    "status": 400
}

Invalid Model:

{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Received a bad request status code for request from inference entity id [mistral-completion] status [400]. Error message: [{\"object\":\"error\",\"message\":\"Invalid model: wrong-model-name\",\"type\":\"invalid_model\",\"param\":null,\"code\":\"1500\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Received a bad request status code for request from inference entity id [mistral-completion] status [400]. Error message: [{\"object\":\"error\",\"message\":\"Invalid model: wrong-model-name\",\"type\":\"invalid_model\",\"param\":null,\"code\":\"1500\"}]"
        }
    },
    "status": 400
}
Perform Non-Streaming Completion

Not Found:

{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
    },
    "status": 404
}
Perform Streaming Completion

Not Found:

event: error
data: {"error":{"root_cause":[{"type":"status_exception","reason":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"}],"type":"status_exception","reason":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"},"status":404}
Create Chat Completion Endpoint

Not Found:

{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
        }
    },
    "status": 400
}

Unauthorized:

{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Received an authentication error status code for request from inference entity id [mistral-chat-completion] status [401]. Error message: [{\n  \"message\":\"Unauthorized\",\n  \"request_id\":\"409ddf538d3f1a55bfe4b7324fe01676\"\n}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Received an authentication error status code for request from inference entity id [mistral-chat-completion] status [401]. Error message: [{\n  \"message\":\"Unauthorized\",\n  \"request_id\":\"409ddf538d3f1a55bfe4b7324fe01676\"\n}]"
        }
    },
    "status": 400
}

Invalid Model:

{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [{\"object\":\"error\",\"message\":\"Invalid model: invalid-model-name\",\"type\":\"invalid_model\",\"param\":null,\"code\":\"1500\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [{\"object\":\"error\",\"message\":\"Invalid model: invalid-model-name\",\"type\":\"invalid_model\",\"param\":null,\"code\":\"1500\"}]"
        }
    },
    "status": 400
}
Perform Streaming Chat Completion

Not Found:

event: error
data: {"error":{"code":"not_found","message":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]","type":"mistral_error"}}

Negative Max Tokens:

event: error
data: {"error":{"code":"unprocessable_entity","message":"Received an input validation error response for request from inference entity id [mistral-chat-completion] status [422]. Error message: [Input should be greater than or equal to 0]","type":"mistral_error"}}

Invalid Model:

event: error
data: {"error":{"code":"bad_request","message":"Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [{\"object\":\"error\",\"message\":\"Invalid model: invalid-model-name\",\"type\":\"invalid_model\",\"param\":null,\"code\":\"1500\"}]","type":"mistral_error"}}
