Add Mistral AI Chat Completion support to Inference Plugin #128538

Jan-Kazlouski-elastic · 2025-05-27T22:21:12Z

Change to existing Mistral AI provider integration allowing completion (both streaming and non-streaming) and chat_completion (only streaming) to be executed as part of inference API.
Changes were tested against next models:

mistral-large-latest
mistral-small-latest

Notes:

Mistral returns at least 5 different formats of non-streaming errors. While in their documentation only 1 type is mentioned. However MistralErrorEntity handles error formats for 4 most common errors: Unauthorized, Bad Request, Not Found, Unprocessable Entity.
Format of streaming errors is not defined in Mistral documentation and wasn't received during testing. Assuming mid stream errors follow Open AI format.
Changes were made to common OpenAI Response Handler allowing it to handle more error codes. That might affect behavior for other providers, but it is better solution than having duplication.
Mistral AI doesn't recognize "stream_options" field despite it being present in OpenAI schema. I added provider-based solution to not include it for Mistral.
Task settings not being passed as parameters but not being used is already present solution that might need improvement, e.g. not passing it as parameter.

Examples of RQ/RS from local testing:

Create Completion Endpoint

Success:

RQ:
PUT {{base-url}}/_inference/completion/mistral-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "inference_id": "mistral-completion",
    "task_type": "completion",
    "service": "mistral",
    "service_settings": {
        "model": "mistral-small-latest",
        "rate_limit": {
            "requests_per_minute": 240
        }
    }
}

Unauthorized:

RQ:
PUT {{base-url}}/_inference/completion/mistral-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{invalid-mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Received an authentication error status code for request from inference entity id [mistral-completion] status [401]. Error message: [Unauthorized]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Received an authentication error status code for request from inference entity id [mistral-completion] status [401]. Error message: [Unauthorized]"
        }
    },
    "status": 400
}

Not Found:

RQ:
PUT {{base-url}}/_inference/completion/mistral-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"
        }
    },
    "status": 400
}

Invalid Model:

RQ:
PUT {{base-url}}/_inference/completion/mistral-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "wrong-model-name"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Received a bad request status code for request from inference entity id [mistral-completion] status [400]. Error message: [Invalid model: wrong-model-name]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Received a bad request status code for request from inference entity id [mistral-completion] status [400]. Error message: [Invalid model: wrong-model-name]"
        }
    },
    "status": 400
}

Perform Non-Streaming Completion

Success:

RQ:
POST {{base-url}}/_inference/completion/mistral-completion
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS:
{
    "completion": [
        {
            "result": "The sentence you've provided is the opening line of William Gibson's seminal cyberpunk novel *Neuromancer*. This vivid and evocative description sets the tone for the dystopian, high-tech, low-life world that the novel explores. The imagery of a \"dead channel\" on a television screen suggests a sense of emptiness, static, and the absence of clear signals or information, which can be seen as a metaphor for the fragmented and often chaotic nature of the future Gibson envisions.\n\nThe use of such a striking opening line is characteristic of Gibson's style, which often blends technological and cultural references to create a rich, immersive atmosphere. *Neuromancer* is known for its influence on the cyberpunk genre and its prescient exploration of themes related to artificial intelligence, virtual reality, and the digital age."
        }
    ]
}

Not Found:

RQ:
POST {{base-url}}/_inference/completion/mistral-completion
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"
            }
        ],
        "type": "status_exception",
        "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"
    },
    "status": 404
}

Perform Streaming Completion

Success:

RQ:
POST {{base-url}}/_inference/completion/mistral-completion/_stream
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS:
event: message
data: {"completion":[{"delta":"The"}]}

event: message
data: {"completion":[{"delta":" sentence"},{"delta":" you"}]}

event: message
data: {"completion":[{"delta":"'ve"}]}

event: message
data: {"completion":[{"delta":" provided"}]}

event: message
data: {"completion":[{"delta":" is"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" opening"}]}

event: message
data: {"completion":[{"delta":" line"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" William"}]}

event: message
data: {"completion":[{"delta":" Gibson"}]}

event: message
data: {"completion":[{"delta":"'s"}]}

event: message
data: {"completion":[{"delta":" seminal"}]}

event: message
data: {"completion":[{"delta":" cyber"}]}

event: message
data: {"completion":[{"delta":"punk"}]}

event: message
data: {"completion":[{"delta":" novel"}]}

event: message
data: {"completion":[{"delta":" *"}]}

event: message
data: {"completion":[{"delta":"Ne"}]}

event: message
data: {"completion":[{"delta":"u"}]}

event: message
data: {"completion":[{"delta":"rom"}]}

event: message
data: {"completion":[{"delta":"ancer"}]}

event: message
data: {"completion":[{"delta":"*."}]}

event: message
data: {"completion":[{"delta":" This"}]}

event: message
data: {"completion":[{"delta":" vivid"}]}

event: message
data: {"completion":[{"delta":" and"}]}

event: message
data: {"completion":[{"delta":" evoc"}]}

event: message
data: {"completion":[{"delta":"ative"}]}

event: message
data: {"completion":[{"delta":" description"}]}

event: message
data: {"completion":[{"delta":" sets"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" tone"}]}

event: message
data: {"completion":[{"delta":" for"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" dyst"}]}

event: message
data: {"completion":[{"delta":"op"}]}

event: message
data: {"completion":[{"delta":"ian"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" high"}]}

event: message
data: {"completion":[{"delta":"-tech"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" low"}]}

event: message
data: {"completion":[{"delta":"-life"}]}

event: message
data: {"completion":[{"delta":" world"}]}

event: message
data: {"completion":[{"delta":" that"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" novel"}]}

event: message
data: {"completion":[{"delta":" explores"}]}

event: message
data: {"completion":[{"delta":"."}]}

event: message
data: {"completion":[{"delta":" The"}]}

event: message
data: {"completion":[{"delta":" imagery"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" \""}]}

event: message
data: {"completion":[{"delta":"dead"}]}

event: message
data: {"completion":[{"delta":" channel"}]}

event: message
data: {"completion":[{"delta":"\""}]}

event: message
data: {"completion":[{"delta":" on"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" television"}]}

event: message
data: {"completion":[{"delta":" screen"}]}

event: message
data: {"completion":[{"delta":" suggests"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" sense"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" empt"}]}

event: message
data: {"completion":[{"delta":"iness"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" static"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" and"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" absence"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" clear"}]}

event: message
data: {"completion":[{"delta":" signals"}]}

event: message
data: {"completion":[{"delta":" or"}]}

event: message
data: {"completion":[{"delta":" information"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" which"}]}

event: message
data: {"completion":[{"delta":" can"}]}

event: message
data: {"completion":[{"delta":" be"}]}

event: message
data: {"completion":[{"delta":" seen"}]}

event: message
data: {"completion":[{"delta":" as"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" metaphor"}]}

event: message
data: {"completion":[{"delta":" for"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" fragmented"}]}

event: message
data: {"completion":[{"delta":" and"}]}

event: message
data: {"completion":[{"delta":" often"}]}

event: message
data: {"completion":[{"delta":" confusing"},{"delta":" reality"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" characters"}]}

event: message
data: {"completion":[{"delta":" in"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" story"}]}

event: message
data: {"completion":[{"delta":".\n\n"}]}

event: message
data: {"completion":[{"delta":"Gib"}]}

event: message
data: {"completion":[{"delta":"son"}]}

event: message
data: {"completion":[{"delta":"'s"}]}

event: message
data: {"completion":[{"delta":" use"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" such"}]}

event: message
data: {"completion":[{"delta":" imagery"}]}

event: message
data: {"completion":[{"delta":" is"}]}

event: message
data: {"completion":[{"delta":" characteristic"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" cyber"}]}

event: message
data: {"completion":[{"delta":"punk"}]}

event: message
data: {"completion":[{"delta":" genre"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" which"}]}

event: message
data: {"completion":[{"delta":" often"}]}

event: message
data: {"completion":[{"delta":" blends"}]}

event: message
data: {"completion":[{"delta":" advanced"}]}

event: message
data: {"completion":[{"delta":" technology"}]}

event: message
data: {"completion":[{"delta":" with"}]}

event: message
data: {"completion":[{"delta":" social"}]}

event: message
data: {"completion":[{"delta":" decay"}]}

event: message
data: {"completion":[{"delta":" and"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" sense"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" alien"}]}

event: message
data: {"completion":[{"delta":"ation"}]}

event: message
data: {"completion":[{"delta":"."}]}

event: message
data: {"completion":[{"delta":" The"}]}

event: message
data: {"completion":[{"delta":" \""}]}

event: message
data: {"completion":[{"delta":"port"}]}

event: message
data: {"completion":[{"delta":"\""}]}

event: message
data: {"completion":[{"delta":" mentioned"}]}

event: message
data: {"completion":[{"delta":" could"}]}

event: message
data: {"completion":[{"delta":" refer"}]}

event: message
data: {"completion":[{"delta":" to"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" space"}]}

event: message
data: {"completion":[{"delta":"port"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" se"}]}

event: message
data: {"completion":[{"delta":"aport"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" or"}]}

event: message
data: {"completion":[{"delta":" even"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" data"}]}

event: message
data: {"completion":[{"delta":" port"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" adding"}]}

event: message
data: {"completion":[{"delta":" to"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" ambiguity"}]}

event: message
data: {"completion":[{"delta":" and"}]}

event: message
data: {"completion":[{"delta":" futur"}]}

event: message
data: {"completion":[{"delta":"istic"}]}

event: message
data: {"completion":[{"delta":" feel"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" setting"}]}

event: message
data: {"completion":[{"delta":"."}]}

event: message
data: [DONE]

Not Found:

RQ:
POST {{base-url}}/_inference/completion/mistral-completion/_stream
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS:
event: error
data: {"error":{"root_cause":[{"type":"status_exception","reason":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"}],"type":"status_exception","reason":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"},"status":404}

Create Chat Completion Endpoint

Success:

RQ:
PUT {{base-url}}/_inference/chat_completion/mistral-chat-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "inference_id": "mistral-chat-completion",
    "task_type": "chat_completion",
    "service": "mistral",
    "service_settings": {
        "model": "mistral-small-latest",
        "rate_limit": {
            "requests_per_minute": 240
        }
    }
}

Unauthorized:

RQ:
PUT {{base-url}}/_inference/chat_completion/mistral-chat-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{invalid-mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Received an authentication error status code for request from inference entity id [mistral-chat-completion] status [401]. Error message: [Unauthorized]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Received an authentication error status code for request from inference entity id [mistral-chat-completion] status [401]. Error message: [Unauthorized]"
        }
    },
    "status": 400
}

Not Found:

RQ:
PUT {{base-url}}/_inference/chat_completion/mistral-chat-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [Not Found]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [Not Found]"
        }
    },
    "status": 400
}

Invalid Model:

RQ:
PUT {{base-url}}/_inference/chat_completion/mistral-chat-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "invalid-model-name"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [Invalid model: invalid-model-name]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [Invalid model: invalid-model-name]"
        }
    },
    "status": 400
}

Perform Streaming Chat Completion

Success:

RQ:
POST {{base-url}}/_inference/chat_completion/mistral-chat-completion/_stream
{
    "model": "mistral-small-latest",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": 2
}
RS:
event: message
data: {"id":"c79e758e9d1a4e89866c9165701496f2","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"mistral-small-latest","object":"chat.completion.chunk"}

event: message
data: {"id":"c79e758e9d1a4e89866c9165701496f2","choices":[{"delta":{"content":"Deep"},"index":0}],"model":"mistral-small-latest","object":"chat.completion.chunk"}

event: message
data: {"id":"c79e758e9d1a4e89866c9165701496f2","choices":[{"delta":{"content":" learning"},"finish_reason":"length","index":0}],"model":"mistral-small-latest","object":"chat.completion.chunk","usage":{"completion_tokens":2,"prompt_tokens":8,"total_tokens":10}}

event: message
data: [DONE]

Invalid Model:

RQ:
POST {{base-url}}/_inference/chat_completion/mistral-chat-completion/_stream
{
    "model": "invalid-model-name",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": 2
}
RS:
event: error
data: {"error":{"code":"bad_request","message":"Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [Invalid model: invalid-model-name]","type":"mistral_error"}}

Negative Max Tokens:
RQ:
POST {{base-url}}/_inference/chat_completion/mistral-chat-completion/_stream
{
    "model": "mistral-small-latest",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": -1
}
RS:
event: error
data: {"error":{"code":"unprocessable_entity","message":"Received an input validation error response for request from inference entity id [mistral-chat-completion] status [422]. Error message: [Input should be greater than or equal to 0]","type":"mistral_error"}}

Not Found:

RQ:
POST {{base-url}}/_inference/chat_completion/mistral-chat-completion/_stream
{
    "model": "mistral-small-latest",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": 2
}
RS:
event: error
data: {"error":{"code":"not_found","message":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [Not Found]","type":"mistral_error"}}

- Have you signed the contributor license agreement?
- Have you followed the contributor guidelines?
- If submitting code, have you built your formula locally prior to submission with gradle check?
- If submitting code, is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed.
- If submitting code, have you checked that your submission is for an OS and architecture that we support?
- If you are submitting this code for a class then read our policy for that.

jonathan-buttner

Looking good. I left a few suggestions.

Could you update the description of the PR so that the example requests are formatted?

Let's also wrap them in code blocks using the three backticks. Thanks for making them collapsable sections though!

jonathan-buttner · 2025-05-29T15:06:00Z

...a/org/elasticsearch/xpack/inference/external/unified/UnifiedChatCompletionRequestEntity.java

-            builder.startObject(STREAM_OPTIONS_FIELD);
-            builder.field(INCLUDE_USAGE_FIELD, true);
-            builder.endObject();
+            fillStreamOptionsFields(builder);


Just an FYI we have some inflight changes that'll affect how we do this: #128592

Thank you for the heads up! I inspected the changes in the attached PR. They don't affect my changes. We're good.

jonathan-buttner · 2025-05-29T15:09:47Z

docs/changelog/128538.yaml

@@ -0,0 +1,5 @@
+pr: 128538
+summary: "[ML] Add Mistral Chat Completion support to the Inference Plugin"


Suggested change

summary: "[ML] Add Mistral Chat Completion support to the Inference Plugin"

summary: "Added Mistral Chat Completion support to the Inference Plugin"

jonathan-buttner · 2025-05-29T15:59:30Z

...ference/src/main/java/org/elasticsearch/xpack/inference/services/mistral/MistralService.java

-            action.execute(inputs, timeout, listener);
-        } else {
-            listener.onFailure(createInvalidModelException(model));
+        switch (model) {


Just a reminder that I believe this will cause failure when backporting to 8.19 because of the JDK version difference.

Yes, I forgot about JDK version. Fixed now.

jonathan-buttner · 2025-05-29T16:11:27Z

.../org/elasticsearch/xpack/inference/services/mistral/response/MistralErrorResponseEntity.java

+            XContentParser jsonParser = XContentFactory.xContent(XContentType.JSON)
+                .createParser(XContentParserConfiguration.EMPTY, response.body())
+        ) {
+            var responseMap = jsonParser.map();


I think we're going to eventually refactor the error parsing logic to not parse at all in the future (across all the services). How about we convert the bytes to a utf8 string and return that?

Done. Also renamed it to MistralErrorResponse because it isn't really an entity. Do let me know if you prefer MistralErrorResponseEntity.

jonathan-buttner · 2025-05-29T16:23:53Z

...e/src/main/java/org/elasticsearch/xpack/inference/services/openai/OpenAiResponseHandler.java

@@ -86,11 +88,21 @@ protected void checkForFailureStatusCode(Request request, HttpResult result) thr
            throw new RetryException(false, buildError(AUTHENTICATION, request, result));
        } else if (statusCode >= 300 && statusCode < 400) {
            throw new RetryException(false, buildError(REDIRECTION, request, result));
+        } else if (statusCode == 422) {


I think it might be confusing looking at this when the openai docs don't mention these error codes. How about we do this:

Let's push the successful status code check up into the base class (let's do that in a separate PR)

if (result.isSuccessfulResponse()) { return; }

I briefly looked at all the response handlers and they're duplicating those lines anyway. If the call in the base class succeeds we can skip calling checkForFailureStatusCode for the child classes.

I think we can then override this method in the child class and add these error code checks after calling super.checkForFailureStatusCode()

That approach will lead to duplication because while this check will have to be present in both streaming and non-streaming handler classes. Streaming handler class has this hierarchy:

MistralUnifiedChatCompletionResponseHandler
extends
OpenAiUnifiedChatCompletionResponseHandler
extends
OpenAiChatCompletionResponseHandler
extends
OpenAiResponseHandler

Non-streaming handler has this hierarchy:
MistralCompletionResponseHandler
extends
OpenAiChatCompletionResponseHandler
extends
OpenAiResponseHandler

Closest common ancestor is OpenAiChatCompletionResponseHandler, but streaming handler also needs to take in logic from OpenAiUnifiedChatCompletionResponseHandler.
Do we really want to introduce such duplication? It will be literally 1 to 1 because errors are the same. Not very good practice. Please correct me if i'm wrong.

I think we can then override this method in the child class and add these error code checks after calling super.checkForFailureStatusCode()

Also I'd do the call to super in the default branch of the the check in child class because superclass always throws an exception, defaulting in undefined error, while child class check can default to calling super method.

jonathan-buttner · 2025-05-29T16:25:32Z

.../org/elasticsearch/xpack/inference/services/openai/MistralChatCompletionResponseHandler.java

+import org.elasticsearch.xpack.inference.services.mistral.response.MistralErrorResponseEntity;
+
+/**
+ * Handles non-streaming chat completion responses for Mistral models, extending the OpenAI chat completion response handler.


Suggested change

* Handles non-streaming chat completion responses for Mistral models, extending the OpenAI chat completion response handler.

* Handles non-streaming completion responses for Mistral models, extending the OpenAI chat completion response handler.

Fixed. Thanks.

jonathan-buttner · 2025-05-29T16:32:28Z

...va/org/elasticsearch/xpack/inference/services/mistral/embeddings/MistralEmbeddingsModel.java

@@ -58,6 +58,20 @@ public MistralEmbeddingsModel(MistralEmbeddingsModel model, MistralEmbeddingsSer
        setPropertiesFromServiceSettings(serviceSettings);
    }

+    protected void setPropertiesFromServiceSettings(MistralEmbeddingsServiceSettings serviceSettings) {


Does this need to be protected? Can we make it private? I think we could potentially leak the this context if this class is extended.

Don't see a reason for it to be protected. Changed to private. Same as for completions.

jonathan-buttner · 2025-05-29T16:33:35Z

.../org/elasticsearch/xpack/inference/services/openai/MistralChatCompletionResponseHandler.java

+ * 2.0.
+ */
+
+package org.elasticsearch.xpack.inference.services.openai;


Let's move this class to the mistral package.

Good catch. Moved it to appropriate package.

jonathan-buttner · 2025-05-29T16:37:07Z

...sticsearch/xpack/inference/services/mistral/MistralUnifiedChatCompletionResponseHandler.java

@@ -0,0 +1,51 @@
+/*


@prwhelan any suggestions on how to intentionally encounter a midstream error while testing?

jonathan-buttner · 2025-05-29T17:06:58Z

...sticsearch/xpack/inference/services/mistral/MistralUnifiedChatCompletionResponseHandler.java

+ * Handles streaming chat completion responses and error parsing for Mistral inference endpoints.
+ * Adapts the OpenAI handler to support Mistral's simpler error schema with fields like "message" and "http_status_code".
+ */
+public class MistralUnifiedChatCompletionResponseHandler extends OpenAiUnifiedChatCompletionResponseHandler {


If the midstream errors are in the same format as openai, how about we refactor the OpenAiUnifiedChatCompletionResponseHandler so that we can replace some of the strings that reference openai specifically? I think it's just the name of the parser:

elasticsearch/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/openai/OpenAiUnifiedChatCompletionResponseHandler.java

Lines 115 to 120 in 2830768

"open_ai_error",

true,

args -> Optional.ofNullable((OpenAiErrorResponse) args[0])

);

private static final ConstructingObjectParser<OpenAiErrorResponse, Void> ERROR_BODY_PARSER = new ConstructingObjectParser<>(

"open_ai_error",

Maybe we could extract those classes and rename them to be more generic?

Done.
But we're not positive that midstream errors are in the same format as openai. It just assumed of the fact that Mistral uses OpenAI type API

…mistral-chat-completion-integration # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java

…sponse entity

Jan-Kazlouski-elastic · 2025-05-29T21:34:40Z

Hi @jonathan-buttner
I addressed the comments. All fixes are done. I replied to the one comment related to the OpenAI handlers hierarchy #128538 (comment) to discuss it a bit further.

Also I updated the section related to testing. Three backticks didn't work for me when I was creating initial PR comment, but when I applied them afterwards - worked perfectly. Not sure why, but I will remember for the future PR creation.

Jan-Kazlouski-elastic · 2025-05-29T22:04:19Z

New error format:

Create Completion Endpoint

Not Found:

{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
        }
    },
    "status": 400
}

Unauthorized:

{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Received an authentication error status code for request from inference entity id [mistral-completion] status [401]. Error message: [{\n  \"message\":\"Unauthorized\",\n  \"request_id\":\"a580d263fb1521778782b22104efb415\"\n}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Received an authentication error status code for request from inference entity id [mistral-completion] status [401]. Error message: [{\n  \"message\":\"Unauthorized\",\n  \"request_id\":\"a580d263fb1521778782b22104efb415\"\n}]"
        }
    },
    "status": 400
}

Invalid Model:

{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Received a bad request status code for request from inference entity id [mistral-completion] status [400]. Error message: [{\"object\":\"error\",\"message\":\"Invalid model: wrong-model-name\",\"type\":\"invalid_model\",\"param\":null,\"code\":\"1500\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Received a bad request status code for request from inference entity id [mistral-completion] status [400]. Error message: [{\"object\":\"error\",\"message\":\"Invalid model: wrong-model-name\",\"type\":\"invalid_model\",\"param\":null,\"code\":\"1500\"}]"
        }
    },
    "status": 400
}

Perform Non-Streaming Completion

Not Found:

{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
    },
    "status": 404
}

Perform Streaming Completion

Not Found:

event: error
data: {"error":{"root_cause":[{"type":"status_exception","reason":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"}],"type":"status_exception","reason":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"},"status":404}

Create Chat Completion Endpoint

Not Found:

{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
        }
    },
    "status": 400
}

Unauthorized:

{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Received an authentication error status code for request from inference entity id [mistral-chat-completion] status [401]. Error message: [{\n  \"message\":\"Unauthorized\",\n  \"request_id\":\"409ddf538d3f1a55bfe4b7324fe01676\"\n}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Received an authentication error status code for request from inference entity id [mistral-chat-completion] status [401]. Error message: [{\n  \"message\":\"Unauthorized\",\n  \"request_id\":\"409ddf538d3f1a55bfe4b7324fe01676\"\n}]"
        }
    },
    "status": 400
}

Invalid Model:

{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [{\"object\":\"error\",\"message\":\"Invalid model: invalid-model-name\",\"type\":\"invalid_model\",\"param\":null,\"code\":\"1500\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [{\"object\":\"error\",\"message\":\"Invalid model: invalid-model-name\",\"type\":\"invalid_model\",\"param\":null,\"code\":\"1500\"}]"
        }
    },
    "status": 400
}

Perform Streaming Chat Completion

Not Found:

event: error
data: {"error":{"code":"not_found","message":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]","type":"mistral_error"}}

Negative Max Tokens:

event: error
data: {"error":{"code":"unprocessable_entity","message":"Received an input validation error response for request from inference entity id [mistral-chat-completion] status [422]. Error message: [Input should be greater than or equal to 0]","type":"mistral_error"}}

Invalid Model:

event: error
data: {"error":{"code":"bad_request","message":"Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [{\"object\":\"error\",\"message\":\"Invalid model: invalid-model-name\",\"type\":\"invalid_model\",\"param\":null,\"code\":\"1500\"}]","type":"mistral_error"}}

Add Mistral AI Chat Completion support to Inference Plugin

f7dc246

elasticsearchmachine added v9.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels May 27, 2025

Add changelog file

0aa8da8

github-actions bot deployed to docs-preview May 27, 2025 22:41 View deployment

Fix tests and typos

c3a8716

github-actions bot deployed to docs-preview May 27, 2025 23:37 View deployment

jonathan-buttner requested changes May 29, 2025

View reviewed changes

Jan-Kazlouski-elastic added 2 commits May 29, 2025 20:45

Merge remote-tracking branch 'refs/remotes/origin/main' into feature/…

69f16b3

…mistral-chat-completion-integration # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java

Refactor Mistral chat completion integration and add tests

91f8ccf

github-actions bot deployed to docs-preview May 29, 2025 19:07 View deployment

Refactor Mistral error response handling and extract StreamingErrorRe…

ff81e36

…sponse entity

github-actions bot deployed to docs-preview May 29, 2025 21:08 View deployment

		@@ -0,0 +1,5 @@
		pr: 128538
		summary: "[ML] Add Mistral Chat Completion support to the Inference Plugin"

	summary: "[ML] Add Mistral Chat Completion support to the Inference Plugin"
	summary: "Added Mistral Chat Completion support to the Inference Plugin"

	* Handles non-streaming chat completion responses for Mistral models, extending the OpenAI chat completion response handler.
	* Handles non-streaming completion responses for Mistral models, extending the OpenAI chat completion response handler.

	"open_ai_error",
	true,
	args -> Optional.ofNullable((OpenAiErrorResponse) args[0])
	);
	private static final ConstructingObjectParser<OpenAiErrorResponse, Void> ERROR_BODY_PARSER = new ConstructingObjectParser<>(
	"open_ai_error",

Add Mistral AI Chat Completion support to Inference Plugin #128538

Are you sure you want to change the base?

Add Mistral AI Chat Completion support to Inference Plugin #128538

Conversation

Jan-Kazlouski-elastic commented May 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jonathan-buttner left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Jan-Kazlouski-elastic May 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Jan-Kazlouski-elastic May 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Jan-Kazlouski-elastic commented May 29, 2025

Uh oh!

Jan-Kazlouski-elastic commented May 29, 2025

Uh oh!

Uh oh!

Jan-Kazlouski-elastic commented May 27, 2025 •

edited

Loading

Jan-Kazlouski-elastic May 29, 2025 •

edited

Loading

Jan-Kazlouski-elastic May 29, 2025 •

edited

Loading