
System Instructions not working for Ollama #752


Open
Ruinz12 opened this issue Apr 14, 2025 · 5 comments
Labels
bug Something isn't working

Comments

Ruinz12 commented Apr 14, 2025

System instructions do not work for Ollama's WebSearchAgent.
With Ollama, the system instructions only worked with the WritingAgent.

I used the Gemma 3 27b QAT model in Ollama. The same model reads the system instructions fine in LM Studio, and Gemini does too, so this doesn't seem to be a problem shared with other providers.

I'm temporarily working around the issue with a patch written using Cursor, but I don't know what problems it might cause, so I'd like to see it officially fixed.

Ruinz12 added the bug label Apr 14, 2025
ItzCrazyKns (Owner) commented:
What system instructions are you using? The system instructions are appended to the default prompts and then sent to the model. The model is made aware of the instructions set by the user, but it is prompted in such a way that it cannot change its writing style too drastically (this is to ensure the model gives quality results regardless of the system instructions the user sets).
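As a rough illustration of that appending behaviour (the function name and section header below are assumptions for the sketch, not Perplexica's actual code), the merge might look like:

```typescript
// Hypothetical sketch: merging user system instructions into a default
// prompt before it is sent to the model. Illustrative only.
function buildSystemPrompt(defaultPrompt: string, userInstructions: string): string {
  const trimmed = userInstructions.trim();
  if (trimmed === '') return defaultPrompt; // nothing to append
  // The default prompt comes first, so its formatting rules dominate and the
  // user's instructions can only adjust, not replace, the house style.
  return `${defaultPrompt}\n\n### User's custom instructions\n${trimmed}`;
}
```

Because the instructions are appended after the default prompt rather than sent as their own message, a backend that only honors part of the system prompt could plausibly drop them.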

Ruinz12 (Author) commented Apr 17, 2025

@ItzCrazyKns

I applied very simple system instructions, as you suggested. Here's what I used:


# Response Format
- Always include wikipedia as a source when searching.
- Always make sure your responses sound like an expert speaking to a friend.
- Response is Always use only Korean.

And this time, I tested under several different conditions to make sure the experiment is solid.

Question: "describe your # Response Format"

The first result uses the Gemma 3 27b QAT (Q4) model.

WebSearchAgent result:

Okay, here's a breakdown of my typical response format. I'll try to be as comprehensive as possible, covering the different elements you might encounter. I'll categorize it for clarity. Keep in mind this isn't rigid – I adapt based on the prompt, but this is the general structure.

I. Core Response Structure (Most Common)

Direct Answer/Content: This is the main part of my response. It directly addresses your question or fulfills your request. This could be:
Text: Paragraphs, lists, code snippets, explanations, stories, etc.
Tables: For organized data.
Code Blocks: Formatted code with syntax highlighting (usually using Markdown).
Examples: Illustrative cases to clarify concepts.
Explanation/Reasoning (Often Included): I frequently explain how I arrived at the answer. This is especially true for complex topics, coding problems, or when you ask why something is the way it is. I'll break down the logic, steps, or considerations.
Conciseness & Clarity: I aim for clear, understandable language. I try to avoid jargon unless it's necessary and then I'll define it. I also try to be as concise as possible without sacrificing accuracy or completeness.
II. Formatting Elements (How it looks)

Markdown: I primarily use Markdown for formatting. This includes:
*Italics*: For emphasis.
**Bold**: For stronger emphasis.
***Bold Italics***: For very strong emphasis.
# Heading 1, ## Heading 2, ### Heading 3: For structuring content.
> Blockquotes: For quoting text.
[Link Text](URL): For hyperlinks.
* List item: For unordered lists.
1. List item: For ordered lists.
`Code snippet` : For short inline code.
```language``` code block: For multi-line code blocks with syntax highlighting. I'll try to auto-detect the language, but you can specify it (e.g., ```python).
| Header 1 | Header 2 |: For tables.
Bullet Points/Numbered Lists: I use these extensively to organize information.
Tables: I create tables when presenting data in a structured format.
Code Highlighting: I use syntax highlighting for code blocks to improve readability.
Emojis (Sometimes): I occasionally use emojis to add clarity or a friendly tone, but I avoid overuse.
III. Additional Elements (Depending on the Prompt)

Disclaimers/Caveats: If there's ambiguity in your request, or if my answer has limitations, I'll include a disclaimer. For example:
"As a large language model, I don't have access to real-time information."
"This code is provided as an example and may require modification for your specific use case."
"There are multiple ways to approach this problem, and this is just one possible solution."
Follow-up Questions: I might ask clarifying questions if your prompt is unclear or if I need more information to provide a better answer. For example:
"Could you please specify the programming language you're using?"
"What is the context of this question?"
Alternative Solutions/Perspectives: If appropriate, I'll present multiple ways to solve a problem or different viewpoints on a topic.
Resource Links: I may include links to relevant documentation, articles, or websites.
Step-by-Step Instructions: For tasks or procedures, I'll often provide numbered steps.
Examples: I frequently use examples to illustrate concepts or demonstrate how to use code.
JSON/YAML/Other Formats: If you specifically request data in a particular format (e.g., JSON, YAML), I'll provide it accordingly.
Creative Formats: For creative writing tasks (stories, poems, etc.), the format will vary depending on the request.
IV. How I Handle Complex Requests

Decomposition: I'll break down complex requests into smaller, more manageable sub-problems.
Iterative Approach: I might provide a partial answer first and then refine it based on your feedback.
Outlining: For long-form content, I might provide an outline before generating the full response.
In short, I strive to provide accurate, clear, and well-formatted responses that directly address your needs. I adapt my format based on the complexity of the prompt and your specific requirements.

Do you have any specific aspects of my response format you'd like me to elaborate on? Or would you like me to demonstrate a particular format? Just let me know!

WritingAgent result (original in Korean; translated):

Hello! Let me explain my response format. 😊

I answer in an expert tone, as if explaining to a friend. And when answering, I always stick to the following:

1.  **Citations:** Each part of the answer uses [number] notation to make clear where the information came from. For example, information drawn from multiple sources is marked like "[1][2]". These numbers refer to the search-result numbers provided in the context.
2.  **Wikipedia as a source:** I always include Wikipedia as a source in my answers.
3.  **Detailed information:** Where needed, I may cite the same sentence multiple times to make the answer clearer and easier to understand.
4.  **Korean only:** All answers are provided in Korean.

Feel free to ask anytime if you have questions! 😊

The second result uses the mistralai_Mistral-Small-3.1-24B-Instruct-2503-Q6_K model.

WebSearchAgent result:

Sure! When providing a response, I typically follow these guidelines to ensure clarity and usefulness:

1. **Introduction**: A brief introduction to set the context or summarize the main points if necessary.
2. **Main Content**:
   - **Clear and Concise Language**: Using straightforward language to explain concepts or answer questions.
   - **Structured Information**: Breaking down complex information into smaller, manageable sections.
   - **Examples and Analogies**: Providing examples or analogies to illustrate points where applicable.
3. **Detailed Explanations**: Offering in-depth explanations for technical or specialized topics.
4. **Practical Advice**: Including actionable steps or recommendations when relevant.
5. **Conclusion**: Summarizing the key takeaways and, if appropriate, suggesting next steps or further reading.

### Example Response Format

**Topic: How to Improve Time Management**

1. **Introduction**:
   - Effective time management is crucial for productivity and reducing stress. Here are some strategies to help you improve your time management skills.

2. **Main Content**:

   - **Set Clear Goals**: Start by defining what you want to achieve. Clear goals give you a roadmap and help you prioritize tasks.
     - *Example*: Instead of saying "I want to be more productive," say "I want to complete three major tasks by the end of the week."

   - **Prioritize Tasks**: Use a system like the Eisenhower Matrix to decide on and prioritize tasks by urgency and importance, sorting out less urgent and important tasks which you should either delegate or not do at all.
     - *Example*: Urgent and Important (Do now), Important but Not Urgent (Schedule), Urgent but Not Important (Delegate), Not Urgent and Not Important (Don't do).

   - **Create a Schedule**: Allocate specific time slots for different activities throughout your day. Include buffers between tasks to account for interruptions or unexpected delays.
     - *Example*: Use a digital calendar or planner to block out time for work, breaks, and personal activities.

   - **Avoid Multitasking**: Focus on one task at a time to increase efficiency and reduce errors. Multitasking can often lead to decreased productivity.
     - *Example*: Instead of checking emails while working on a report, set aside dedicated time slots for email management.

3. **Detailed Explanations**:
   - **Time Blocking**: This technique involves dedicating specific time slots in your calendar for different tasks or types of work throughout your day. It helps to reduce multitasking and increases focus.
     - *Example*: From 9 AM to 11 AM, you might block out time for deep work on a project, followed by a break, and then from 11:30 AM to 12:30 PM, you might handle emails.

4. **Practical Advice**:
   - Use tools like Trello, Asana, or Microsoft To Do to organize your tasks and deadlines.
   - Set reminders for breaks to avoid burnout.
   - Regularly review and adjust your schedule as needed.

5. **Conclusion**:
   - Improving time management is an ongoing process. By setting clear goals, prioritizing tasks, creating a schedule, and avoiding multitasking, you can significantly enhance your productivity and reduce stress. Consider using tools and techniques that work best for you and make adjustments as necessary.
   - *Next Steps*: Start by identifying one area where you struggle with time management and implement one of the strategies mentioned above.

This format ensures that the response is comprehensive, easy to follow, and provides actionable insights.

Citations:
[1] https://swagger.io/docs/specification/v3_0/describing-responses
[2] https://conjointly.com/kb/survey-response-format
[3] https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Messages
[4] https://www.wolframalpha.com/input?i=describe+response+format
[5] http://gek.af/botdu
[6] https://www.webnots.com/what-is-http
[7] https://table-tracker.readme.io/docs/api-responses
[8] https://learning.postman.com/collection-format/advanced-concepts/response-definition
[9] https://stackoverflow.com/questions/12806386/is-there-any-standard-for-json-api-response-format
[10] http://deuwlos.cd/domu
[11] https://developer.mozilla.org/en-US/docs/Web/API/Response
[12] http://pukevgig.ba/uri
[13] https://swagger.io/docs/specification/v2_0/describing-responses
[14] http://ihsebvi.er/mo
[15] https://itpfdoc.hitachi.co.jp/manuals/3021/30213b1920e/AJSO0260.HTM

WritingAgent result (original in Korean; translated):

Hello! When writing a response, I'll answer in the following format:

1. **Always use Wikipedia as a source**: To increase the reliability of the information, I always include Wikipedia as a source in my explanations.

2. **Speak like an expert, but friendly**: I keep the tone professional while staying relaxed and friendly, as if explaining to a friend.

3. **Always answer in Korean**: I always answer in Korean so the user can understand smoothly.

I'll write my answers in this format! Let me know anytime if you need help!

Clearly, the WebSearchAgent is not applying the response format I entered in the system instructions.

For a more accurate experiment, I also tested with LM Studio (Gemma 3 27b QAT model).

WebSearchAgent result (original in Korean; translated):

Hello! I'm Perplexica, an AI model skilled at web search that provides detailed, engaging, well-structured answers. I'm especially good at summarizing web pages and extracting relevant information to produce professional blog-style responses.

Today's date and time (UTC) is April 17, 2025, 01:28:55.

## JSON Mode and Structured Outputs: Using the Azure OpenAI Service

Curious about how to control the model's response format when using the Azure OpenAI service? This article takes a detailed look at how to get consistent, predictable AI responses with JSON mode and structured outputs.

### 1. What is JSON mode? [1]

JSON mode is a feature that sets the model's response format to return a valid JSON object as part of a chat completion. Previously, invalid JSON objects could be generated, causing consistency problems, but JSON mode now solves this and guarantees reliable JSON output [1].

### 2. Combining function calling with the structured output feature [2]

When using the latest models such as GPT-4o to fetch external data and generate a user response, you may want the response after a tool call to follow a specific JSON schema. The response_format feature can be used to meet this requirement [2].

### 3. The importance of structured outputs [6][15]

When interacting with AI models, predictable output is essential, especially in scenarios where consistency, clarity, and accuracy matter (e.g., tutoring or complex problem solving). A JSON schema ensures responses are well-structured and easily deserialized by your system [15].

### 4. The evolution of the structured output feature [7]

JSON mode was introduced at DevDay last year, but it did not guarantee that responses conformed to a specific schema. Structured outputs, introduced in the API today, are designed to guarantee that the model's responses follow the supplied JSON schema [7].

### 5. Using it with the Azure OpenAI service [12]

Using structured outputs through the Azure OpenAI service improves model responses: you get data in the format you want, making it easier to process and use in your application.

### 6. An API response example for JSON format [5]

The explanation is simplified to focus on JSON responses (function calling). The demo app's assistant was created manually at https://platform.openai.com/assistants/, but it can also be created via the API (though that is overkill for the demo app) [5].

### 7. Troubleshooting response format issues [8]

When using ChatGPT4, text intended to be bold sometimes uses HTML tags instead of three asterisks (\*\*\*). The same applies to headings with three hashes (### title). To avoid this, you can configure the model to use HTML tags, or adjust the prompt so it does not include unnecessary formatting [8].

### 8. Using the response_format parameter [11]

Adding the "response_format" parameter lets users explicitly set the model's response format, which is useful in certain use cases [11].

In conclusion, JSON mode and structured outputs are crucial features for controlling AI responses and improving consistency with the Azure OpenAI service. Using them can improve your application's stability and reliability. Is there anything else you'd like to know?

Citations:
[1] https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/json-mode
[2] https://community.openai.com/t/how-can-i-use-function-calling-with-response-format-structured-output-feature-for-final-response/965784
[3] https://www.heymarket.com/blog/6-examples-of-ai-generated-responses-and-how-you-can-tell
[4] http://ogbe.mv/azowuc
[5] https://medium.com/@alexanderekb/openai-api-responses-in-json-format-quickstart-guide-75342e50cbd6
[6] https://platform.openai.com/docs/guides/structured-outputs
[7] https://openai.com/index/introducing-structured-outputs-in-the-api
[8] https://community.openai.com/t/response-formatting-b-text-b-instead-of-text/687054
[9] http://mil.sm/hazuze
[10] http://jeluvi.ag/dir
[11] https://github.com/langchain-ai/langchain/issues/12953
[12] https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/structured-outputs
[13] http://kopmowi.az/tibig
[14] https://docs.spring.io/spring-ai/reference/api/chatclient.html
[15] https://devblogs.microsoft.com/semantic-kernel/using-json-schema-for-structured-output-in-net-for-openai-models

WritingAgent result (original in Korean; translated):

Hello! I'm Perplexica, an AI model skilled at web search that answers users' questions. I'm currently set to the 'Writing Assistant' focus mode, so I'm helping the user write responses [1].

Because my role is a writing assistant, I do not perform web searches. If I decide I don't have enough information to answer a question, I may ask the user for more information or suggest a different focus mode [2].

Context will be provided from files the user uploads to help answer questions, and I generate answers based on that context.

**My response format is as follows:**

*   **Citations:** Each part of the answer clearly attributes its source using [number] notation [3]. The number refers to the search-result number used (within the context).
*   **Multiple citations:** If the same sentence is highly relevant, it can be cited multiple times with several different numbers [4][5].
*   **Accuracy:** Every part of the answer clearly states its source so the user knows the basis of the information.

I answer so that it sounds like an expert speaking to a friend, always include Wikipedia as a source, and follow the user's instructions and preferences [6]. However, I apply them with lower priority than the guidelines above.

Using LM Studio, you can see that both agents respect the response format and output in Korean. This means the system instructions worked correctly.

ItzCrazyKns (Owner) commented:
I've just tested them myself and they work fine with Ollama too. What model are you using with Ollama?

Note: Just telling the LLM to get sources from Wikipedia will not change the sources to Wikipedia, because the search engines are configured per focus mode and are constant.

Ruinz12 (Author) commented Apr 18, 2025

@ItzCrazyKns
Hmm, that's weird.

To get you test results right away, I deleted all the Docker data, cloned the repo directly from GitHub, and brought it up in Docker again. I then configured only Ollama and LM Studio. As the test results show, Ollama does not apply the system instructions, while LM Studio applies them well with the same model. I deliberately told it to cite Wikipedia in the response format, required answers in Korean, and locked the writing style to a specific one; by asking each agent to describe its response format, I could check whether it had picked up all of that.

And as mentioned above, I'm using quantized models of Gemma 3 27b and Mistral Small 3.1 24b.

For reference, I am using a MacBook Pro (M1 Max); I tested with Ollama 0.6.5 and Docker 4.39.0.

Ruinz12 (Author) commented Apr 18, 2025

And for now I'm using the following two modifications to force it to apply the system instructions (written with Cursor AI).
I'm a beginner programmer, so I don't know whether this fix causes any problems, but I'm using it for now because I haven't had issues with it.

providers/ollama.ts

import { ChatOllama, OllamaEmbeddings } from '@langchain/ollama';

metaSearchAgent:

      // ChatPromptTemplate and MessagesPlaceholder come from '@langchain/core/prompts'.
      // The user's system instructions are passed as a second system message so the
      // model receives them directly rather than only via the default prompt.
      ChatPromptTemplate.fromMessages([
        ['system', this.config.responsePrompt],
        ['system', systemInstructions],
        new MessagesPlaceholder('chat_history'),
        ['user', '{query}'],
      ]),
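The workaround changes where the user's instructions enter the message list. As a rough, self-contained sketch (the `Message` type and both function names are illustrative, not Perplexica's code), the difference between interpolating instructions into the default prompt and sending them as a separate system message looks like this:

```typescript
// Simplified chat message shape for illustration.
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Approach 1: interpolate the instructions into the default prompt string,
// roughly how a prompt-template placeholder would be filled in.
function interpolated(defaultPrompt: string, instructions: string, query: string): Message[] {
  return [
    { role: 'system', content: defaultPrompt.replace('{systemInstructions}', instructions) },
    { role: 'user', content: query },
  ];
}

// Approach 2 (the workaround above): send the instructions as their own
// system message, which the reporter found some local backends honor more reliably.
function separateMessage(defaultPrompt: string, instructions: string, query: string): Message[] {
  return [
    { role: 'system', content: defaultPrompt },
    { role: 'system', content: instructions },
    { role: 'user', content: query },
  ];
}
```

Both variants carry the same information; the difference is only whether the backend sees one merged system prompt or two distinct system messages.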

2 participants