
gpuFillInfo not implemented on darwin #5465

Open
ImLunaHey opened this issue May 26, 2025 · 3 comments
Labels
bug, unconfirmed

Comments

@ImLunaHey

LocalAI version:

v2.29.0 (fd17a33)

Environment, CPU architecture, OS, and Version:

Darwin Alexiss-Mac-mini.local 24.3.0 Darwin Kernel Version 24.3.0: Thu Jan 2 20:22:58 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T8132 arm64

Describe the bug

When trying to use chat models, I get an error.

To Reproduce

  • Install LocalAI on macOS
  • Add the smolvlm-256m-instruct model
  • Try to chat with the model (for example via the API call sketched below)
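
For the last step, the chat can be triggered from the web UI or directly against the OpenAI-compatible endpoint the server exposes (the POST to /v1/chat/completions visible in the logs below). A minimal sketch, assuming the default port 8080 and the model name from step 2:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "smolvlm-256m-instruct", "messages": [{"role": "user", "content": "hi"}]}'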

Expected behavior

No error.

Logs

11:48AM ERR guessDefaultsFromFile(TotalAvailableVRAM): gpuFillInfo not implemented on darwin

Additional context

N/A

ImLunaHey added the bug and unconfirmed labels on May 26, 2025
@mudler (Owner) commented May 26, 2025

Thanks for opening the issue. However, that error shouldn't really be fatal. Is inference working? If not, can you paste the output when running with the --debug flag?
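
A quick way to capture that output in full, assuming the binary is on PATH as local-ai (as in the logs below):

local-ai --debug 2>&1 | tee localai-debug.log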

@ImLunaHey (Author) commented

Nothing seems to be working. I'm not sure if it's related to the above error or something else it's hitting.

Here's a clean run. I deleted the configuration and models directories and started again.

xo@Alexiss-Mac-mini Downloads % local-ai                 
2:06PM INF Setting logging to info
2:06PM INF Starting LocalAI using 10 threads, with models path: /Users/xo/Downloads/models
2:06PM INF LocalAI version: v2.29.0 (fd17a3312c4c1f5688152eff227e27d9b7bce365)
2:06PM INF Preloading models from /Users/xo/Downloads/models
2:06PM INF core/startup process completed!
2:06PM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
^C
xo@Alexiss-Mac-mini Downloads % local-ai --debug
2:06PM DBG Setting logging to debug
2:06PM INF Starting LocalAI using 10 threads, with models path: /Users/xo/Downloads/models
2:06PM INF LocalAI version: v2.29.0 (fd17a3312c4c1f5688152eff227e27d9b7bce365)
2:06PM INF Preloading models from /Users/xo/Downloads/models
2:06PM DBG Extracting backend assets files to /tmp/localai/backend_data
2:06PM DBG processing api keys runtime update
2:06PM DBG processing external_backends.json
2:06PM DBG external backends loaded from external_backends.json
2:06PM INF core/startup process completed!
2:06PM DBG No configuration file found at /tmp/localai/upload/uploadedFiles.json
2:06PM DBG No configuration file found at /tmp/localai/config/assistants.json
2:06PM DBG No configuration file found at /tmp/localai/config/assistantsFile.json
2:06PM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
2:06PM INF Success ip=192.168.0.108 latency=4.313334ms method=GET status=200 url=/
2:06PM INF Success ip=192.168.0.108 latency="40.375µs" method=GET status=200 url=/static/assets/highlightjs.css
2:06PM INF Success ip=192.168.0.108 latency="9.667µs" method=GET status=200 url=/static/general.css
2:06PM INF Success ip=192.168.0.108 latency="510.584µs" method=GET status=200 url=/static/assets/font1.css
2:06PM INF Success ip=192.168.0.108 latency="12.708µs" method=GET status=200 url=/static/assets/font2.css
2:06PM INF Success ip=192.168.0.108 latency="15.791µs" method=GET status=200 url=/static/assets/tailwindcss.js
2:06PM INF Success ip=192.168.0.108 latency="14.166µs" method=GET status=200 url=/static/assets/fontawesome/css/fontawesome.css
2:06PM INF Success ip=192.168.0.108 latency="13.375µs" method=GET status=200 url=/static/assets/fontawesome/css/brands.css
2:06PM INF Success ip=192.168.0.108 latency="17.542µs" method=GET status=200 url=/static/assets/tw-elements.css
2:06PM INF Success ip=192.168.0.108 latency="9.292µs" method=GET status=200 url=/static/assets/flowbite.min.js
2:06PM INF Success ip=192.168.0.108 latency="29.584µs" method=GET status=200 url=/static/assets/fontawesome/css/solid.css
2:06PM INF Success ip=192.168.0.108 latency="9.791µs" method=GET status=200 url=/static/assets/htmx.js
2:06PM INF Success ip=192.168.0.108 latency="7.292µs" method=GET status=200 url=/static/assets/highlightjs.js
2:06PM INF Success ip=192.168.0.108 latency="24.75µs" method=GET status=200 url=/static/assets/alpine.js
2:06PM INF Success ip=192.168.0.108 latency="24.375µs" method=GET status=200 url=/static/assets/marked.js
2:06PM INF Success ip=192.168.0.108 latency="8µs" method=GET status=200 url=/static/assets/tw-elements.js
2:06PM INF Success ip=192.168.0.108 latency="26.416µs" method=GET status=200 url=/static/assets/purify.js
2:06PM INF Success ip=192.168.0.108 latency="14.458µs" method=GET status=200 url=/static/logo_horizontal.png
2:06PM INF Success ip=192.168.0.108 latency="30.875µs" method=GET status=200 url=/static/assets/fontawesome/webfonts/fa-solid-900.woff2
2:06PM INF Success ip=192.168.0.108 latency=3.698042ms method=GET status=200 url=/static/assets/UcCO3FwrK3iLTeHuS_fvQtMwCp50KnMw2boKoduKmMEVuLyfMZg.ttf
2:06PM INF Success ip=192.168.0.108 latency="34.958µs" method=GET status=200 url=/static/assets/fontawesome/webfonts/fa-brands-400.woff2
2:06PM INF Success ip=192.168.0.108 latency="17.292µs" method=GET status=200 url=/static/assets/UcCO3FwrK3iLTeHuS_fvQtMwCp50KnMw2boKoduKmMEVuFuYMZg.ttf
2:06PM INF Success ip=192.168.0.108 latency="31.833µs" method=GET status=200 url=/static/assets/UcCO3FwrK3iLTeHuS_fvQtMwCp50KnMw2boKoduKmMEVuGKYMZg.ttf
2:06PM INF Success ip=192.168.0.108 latency="25.708µs" method=GET status=200 url=/static/favicon.svg
2:06PM INF Success ip=192.168.0.108 latency=512.776792ms method=GET status=200 url=/browse/
2:06PM INF Success ip=192.168.0.108 latency="10.583µs" method=GET status=200 url=/static/assets/highlightjs.js
2:06PM INF Success ip=192.168.0.108 latency="27.791µs" method=GET status=200 url=/static/assets/alpine.js
2:06PM INF Success ip=192.168.0.108 latency="4.959µs" method=GET status=200 url=/static/assets/tailwindcss.js
2:06PM INF Success ip=192.168.0.108 latency="5.875µs" method=GET status=200 url=/static/assets/marked.js
2:06PM INF Success ip=192.168.0.108 latency="39.292µs" method=GET status=200 url=/static/assets/purify.js
2:06PM INF Success ip=192.168.0.108 latency="4.375µs" method=GET status=200 url=/static/assets/htmx.js
2:06PM INF Success ip=192.168.0.108 latency="2.042µs" method=GET status=200 url=/static/assets/flowbite.min.js
2:06PM INF Success ip=192.168.0.108 latency="15.666µs" method=GET status=200 url=/static/assets/tw-elements.js
2:06PM INF Success ip=192.168.0.108 latency="2.459µs" method=GET status=200 url=/static/general.css
2:06PM INF Success ip=192.168.0.108 latency="2.958µs" method=GET status=200 url=/static/assets/highlightjs.css
2:06PM INF Success ip=192.168.0.108 latency="3µs" method=GET status=200 url=/static/assets/font2.css
2:06PM INF Success ip=192.168.0.108 latency="2.125µs" method=GET status=200 url=/static/assets/font1.css
2:06PM INF Success ip=192.168.0.108 latency="2.042µs" method=GET status=200 url=/static/assets/tw-elements.css
2:06PM INF Success ip=192.168.0.108 latency="2.875µs" method=GET status=200 url=/static/assets/fontawesome/css/fontawesome.css
2:06PM INF Success ip=192.168.0.108 latency="2.834µs" method=GET status=200 url=/static/assets/fontawesome/css/solid.css
2:06PM INF Success ip=192.168.0.108 latency="10.709µs" method=GET status=200 url=/static/assets/fontawesome/css/brands.css
2:06PM INF Success ip=192.168.0.108 latency="2.292µs" method=GET status=200 url=/static/logo_horizontal.png
2:06PM DBG UI job submitted to install  : localai@smolvlm-256m-instruct

2:06PM INF Success ip=192.168.0.108 latency=1.363208ms method=POST status=200 url=/browse/install/model/localai@smolvlm-256m-instruct
2:06PM DBG Config overrides map[mmproj:mmproj-SmolVLM-256M-Instruct-Q8_0.gguf parameters:map[model:SmolVLM-256M-Instruct-Q8_0.gguf]]
2:06PM DBG Checking "mmproj-SmolVLM-256M-Instruct-Q8_0.gguf" exists and matches SHA
2:06PM INF Downloading "https://huggingface.co/ggml-org/SmolVLM-256M-Instruct-GGUF/resolve/main/mmproj-SmolVLM-256M-Instruct-Q8_0.gguf"
2:06PM INF Success ip=192.168.0.108 latency="450.833µs" method=GET status=200 url=/browse/job/progress/469656dc-3a32-11f0-bfd5-5aa3f0aee386
2:06PM INF Success ip=192.168.0.108 latency="19.333µs" method=GET status=200 url=/browse/job/progress/469656dc-3a32-11f0-bfd5-5aa3f0aee386
2:06PM INF Success ip=192.168.0.108 latency="14.083µs" method=GET status=200 url=/browse/job/progress/469656dc-3a32-11f0-bfd5-5aa3f0aee386
2:06PM INF Success ip=192.168.0.108 latency="17.958µs" method=GET status=200 url=/browse/job/progress/469656dc-3a32-11f0-bfd5-5aa3f0aee386
2:06PM INF File "/Users/xo/Downloads/models/mmproj-SmolVLM-256M-Instruct-Q8_0.gguf" downloaded and verified
2:06PM DBG Checking "SmolVLM-256M-Instruct-Q8_0.gguf" exists and matches SHA
2:06PM INF Success ip=192.168.0.108 latency="9.834µs" method=GET status=200 url=/browse/job/progress/469656dc-3a32-11f0-bfd5-5aa3f0aee386
2:06PM INF Downloading "https://huggingface.co/ggml-org/SmolVLM-256M-Instruct-GGUF/resolve/main/SmolVLM-256M-Instruct-Q8_0.gguf"
2:06PM INF Success ip=192.168.0.108 latency="11.75µs" method=GET status=200 url=/browse/job/progress/469656dc-3a32-11f0-bfd5-5aa3f0aee386
2:06PM INF Success ip=192.168.0.108 latency="19.834µs" method=GET status=200 url=/browse/job/progress/469656dc-3a32-11f0-bfd5-5aa3f0aee386
2:06PM INF Success ip=192.168.0.108 latency="14.042µs" method=GET status=200 url=/browse/job/progress/469656dc-3a32-11f0-bfd5-5aa3f0aee386
2:06PM INF Downloading /Users/xo/Downloads/models/SmolVLM-256M-Instruct-Q8_0.gguf.partial: 74.1 MiB/166.9 MiB (22.20%) ETA: 17.518772145s
2:06PM INF Success ip=192.168.0.108 latency="12.375µs" method=GET status=200 url=/browse/job/progress/469656dc-3a32-11f0-bfd5-5aa3f0aee386
2:06PM INF Success ip=192.168.0.108 latency="8.209µs" method=GET status=200 url=/browse/job/progress/469656dc-3a32-11f0-bfd5-5aa3f0aee386
2:06PM INF Success ip=192.168.0.108 latency="8.75µs" method=GET status=200 url=/browse/job/progress/469656dc-3a32-11f0-bfd5-5aa3f0aee386
2:06PM INF File "/Users/xo/Downloads/models/SmolVLM-256M-Instruct-Q8_0.gguf" downloaded and verified
2:06PM DBG Written config file /Users/xo/Downloads/models/smolvlm-256m-instruct.yaml
2:06PM DBG Written gallery file /Users/xo/Downloads/models/._gallery_smolvlm-256m-instruct.yaml
2:06PM DBG guessDefaultsFromFile: modelPath is empty
2:06PM INF Preloading models from /Users/xo/Downloads/models

  Model name: smolvlm-256m-instruct                                           


2:06PM INF Success ip=192.168.0.108 latency="10.167µs" method=GET status=200 url=/browse/job/progress/469656dc-3a32-11f0-bfd5-5aa3f0aee386
2:06PM DBG no processing model found for job : 469656dc-3a32-11f0-bfd5-5aa3f0aee386

2:06PM DBG JOB finished  : &{Deletion:false FileName: Error:<nil> Processed:true Message:completed Progress:100 TotalFileSize: DownloadedFileSize: GalleryModelName:localai@smolvlm-256m-instruct}

2:06PM INF Success ip=192.168.0.108 latency="38.375µs" method=GET status=200 url=/browse/job/469656dc-3a32-11f0-bfd5-5aa3f0aee386
2:06PM INF Success ip=192.168.0.108 latency=2.219417ms method=GET status=200 url=/chat/
2:06PM INF Success ip=192.168.0.108 latency="27.584µs" method=GET status=200 url=/static/assets/highlightjs.js
2:06PM INF Success ip=192.168.0.108 latency="11.667µs" method=GET status=200 url=/static/assets/tailwindcss.js
2:06PM INF Success ip=192.168.0.108 latency="17.333µs" method=GET status=200 url=/static/assets/alpine.js
2:06PM INF Success ip=192.168.0.108 latency="6.375µs" method=GET status=200 url=/static/assets/purify.js
2:06PM INF Success ip=192.168.0.108 latency="32.5µs" method=GET status=200 url=/static/assets/marked.js
2:06PM INF Success ip=192.168.0.108 latency="9.833µs" method=GET status=200 url=/static/assets/flowbite.min.js
2:06PM INF Success ip=192.168.0.108 latency="19µs" method=GET status=200 url=/static/assets/htmx.js
2:06PM INF Success ip=192.168.0.108 latency="35.291µs" method=GET status=200 url=/static/assets/highlightjs.css
2:06PM INF Success ip=192.168.0.108 latency="23.291µs" method=GET status=200 url=/static/general.css
2:06PM INF Success ip=192.168.0.108 latency="56.792µs" method=GET status=200 url=/static/assets/font1.css
2:06PM INF Success ip=192.168.0.108 latency="101.875µs" method=GET status=200 url=/static/assets/font2.css
2:06PM INF Success ip=192.168.0.108 latency="15.625µs" method=GET status=200 url=/static/assets/tw-elements.css
2:06PM INF Success ip=192.168.0.108 latency="31.709µs" method=GET status=200 url=/static/assets/fontawesome/css/fontawesome.css
2:06PM INF Success ip=192.168.0.108 latency="11.75µs" method=GET status=200 url=/static/assets/fontawesome/css/solid.css
2:06PM INF Success ip=192.168.0.108 latency="35.291µs" method=GET status=200 url=/static/assets/fontawesome/css/brands.css
2:06PM INF Success ip=192.168.0.108 latency="22.208µs" method=GET status=200 url=/static/chat.js
2:06PM INF Success ip=192.168.0.108 latency="28.875µs" method=GET status=200 url=/static/logo_horizontal.png
2:06PM DBG context local model name not found, setting to the first model first model name=smolvlm-256m-instruct
2:06PM ERR guessDefaultsFromFile(TotalAvailableVRAM): gpuFillInfo not implemented on darwin
2:06PM DBG guessDefaultsFromFile: NGPULayers set NGPULayers=99999999
2:06PM DBG guessDefaultsFromFile: template already set name=smolvlm-256m-instruct
2:06PM DBG Chat endpoint configuration read: &{PredictionOptions:{BasicModelRequest:{Model:SmolVLM-256M-Instruct-Q8_0.gguf} Language: Translate:false N:0 TopP:0x1400f20bd90 TopK:0x1400f20bd98 Temperature:0x1400f20bda0 Maxtokens:0x1400f20bdd0 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0x1400f20bdc8 TypicalP:0x1400f20bdc0 Seed:0x1400f20bde0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 ClipSkip:0 Tokenizer:} Name:smolvlm-256m-instruct F16:0x1400f20bc10 Threads:0x1400f20bd80 Debug:0x1400f33bb30 Roles:map[] Embeddings:0x1400f20bdd9 Backend: TemplateConfig:{Chat:<|im_start|>
{{.Input -}}
Assistant:  ChatMessage:{{if eq .RoleName "assistant"}}Assistant{{else if eq .RoleName "system"}}System{{else if eq .RoleName "user"}}User{{end}}: {{.Content }}<end_of_utterance>
 Completion:{{-.Input}}
 Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil> Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_COMPLETION FLAG_ANY FLAG_CHAT] KnownUsecases:<nil> PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0x1400f20bdb8 MirostatTAU:0x1400f20bdb0 Mirostat:0x1400f20bda8 NGPULayers:0x140100bd3a0 MMap:0x1400f20bc11 MMlock:0x1400f20bdd9 LowVRAM:0x1400f20bdd9 Grammar: StopWords:[<|im_end|> <dummy32000> </s> <| <end_of_utterance> <|endoftext|>] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0x140100bd340 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj:mmproj-SmolVLM-256M-Instruct-Q8_0.gguf FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[] Description: Usage: Options:[]}
2:06PM DBG Parameters: &{PredictionOptions:{BasicModelRequest:{Model:SmolVLM-256M-Instruct-Q8_0.gguf} Language: Translate:false N:0 TopP:0x1400f20bd90 TopK:0x1400f20bd98 Temperature:0x1400f20bda0 Maxtokens:0x1400f20bdd0 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0x1400f20bdc8 TypicalP:0x1400f20bdc0 Seed:0x1400f20bde0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 ClipSkip:0 Tokenizer:} Name:smolvlm-256m-instruct F16:0x1400f20bc10 Threads:0x1400f20bd80 Debug:0x1400f33bb30 Roles:map[] Embeddings:0x1400f20bdd9 Backend: TemplateConfig:{Chat:<|im_start|>
{{.Input -}}
Assistant:  ChatMessage:{{if eq .RoleName "assistant"}}Assistant{{else if eq .RoleName "system"}}System{{else if eq .RoleName "user"}}User{{end}}: {{.Content }}<end_of_utterance>
 Completion:{{-.Input}}
 Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil> Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_COMPLETION FLAG_ANY FLAG_CHAT] KnownUsecases:<nil> PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0x1400f20bdb8 MirostatTAU:0x1400f20bdb0 Mirostat:0x1400f20bda8 NGPULayers:0x140100bd3a0 MMap:0x1400f20bc11 MMlock:0x1400f20bdd9 LowVRAM:0x1400f20bdd9 Grammar: StopWords:[<|im_end|> <dummy32000> </s> <| <end_of_utterance> <|endoftext|>] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0x140100bd340 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj:mmproj-SmolVLM-256M-Instruct-Q8_0.gguf FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[] Description: Usage: Options:[]}
2:06PM DBG templated message for chat: User: hi<end_of_utterance>

2:06PM DBG Prompt (before templating): User: hi<end_of_utterance>

2:06PM DBG Template found, input modified to: <|im_start|>
User: hi<end_of_utterance>
Assistant: 
2:06PM DBG Prompt (after templating): <|im_start|>
User: hi<end_of_utterance>
Assistant: 
2:06PM DBG Stream request received
2:06PM INF Success ip=192.168.0.108 latency=126.236959ms method=POST status=200 url=/v1/chat/completions
2:06PM DBG Sending chunk: {"created":1748264817,"object":"chat.completion.chunk","id":"5be94af6-5fc1-4edb-8c9f-66abb763138d","model":"smolvlm-256m-instruct","choices":[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

2:06PM DBG Loading from the following backends (in order): [llama-cpp llama-cpp-fallback whisper piper silero-vad huggingface]
2:06PM INF Trying to load the model 'smolvlm-256m-instruct' with the backend '[llama-cpp llama-cpp-fallback whisper piper silero-vad huggingface]'
2:06PM INF [llama-cpp] Attempting to load
2:06PM INF BackendLoader starting backend=llama-cpp modelID=smolvlm-256m-instruct o.model=SmolVLM-256M-Instruct-Q8_0.gguf
2:06PM DBG Loading model in memory from file: /Users/xo/Downloads/models/SmolVLM-256M-Instruct-Q8_0.gguf
2:06PM DBG Loading Model smolvlm-256m-instruct with gRPC (file: /Users/xo/Downloads/models/SmolVLM-256M-Instruct-Q8_0.gguf) (backend: llama-cpp): {backendString:llama-cpp model:SmolVLM-256M-Instruct-Q8_0.gguf modelID:smolvlm-256m-instruct assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0x1400f290588 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
2:06PM DBG [llama-cpp-fallback] llama-cpp variant available
2:06PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-fallback
2:06PM DBG GRPC Service for smolvlm-256m-instruct will be running at: '127.0.0.1:49201'
2:06PM DBG GRPC Service state dir: /var/folders/2y/qgy1xk9s3q19_jn5_dkz8rxh0000gn/T/go-processmanager3678604483
2:06PM DBG GRPC Service Started
2:06PM DBG Wait for the service to start up
2:06PM DBG Options: ContextSize:8192 Seed:1683608517 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:10 MMProj:"mmproj-SmolVLM-256M-Instruct-Q8_0.gguf"
2:07PM DBG GRPC(smolvlm-256m-instruct-127.0.0.1:49201): stderr dyld[2252]: Library not loaded: @rpath/libutf8_validity.dylib
2:07PM DBG GRPC(smolvlm-256m-instruct-127.0.0.1:49201): stderr   Referenced from: <4DBDA67D-9E21-3E84-8410-4EF66B6E235D> /private/tmp/localai/backend_data/backend-assets/lib/libprotobuf.29.3.0.dylib
2:07PM DBG GRPC(smolvlm-256m-instruct-127.0.0.1:49201): stderr   Reason: tried: '/private/tmp/localai/backend_data/backend-assets/lib/libutf8_validity.dylib' (no such file), '/private/tmp/localai/backend_data/backend-assets/lib/../lib/libutf8_validity.dylib' (no such file), '/private/tmp/localai/backend_data/backend-assets/lib/../lib/libutf8_validity.dylib' (no such file), '/opt/homebrew/lib/libutf8_validity.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/lib/libutf8_validity.dylib' (no such file), '/opt/homebrew/lib/libutf8_validity.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/lib/libutf8_validity.dylib' (no such file), '/tmp/localai/backend_data/backend-assets/lib/libutf8_validity.dylib' (no such file), '/tmp/localai/backend_data/backend-assets/lib/libutf8_validity.dylib' (no such file)
2:07PM ERR failed starting/connecting to the gRPC service error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:49201: connect: connection refused\""
2:07PM DBG GRPC Service NOT ready
2:07PM ERR [llama-cpp] Failed loading model, trying with fallback 'llama-cpp-fallback', error: failed to load model with internal loader: grpc service not ready
2:07PM DBG Loading model in memory from file: /Users/xo/Downloads/models/SmolVLM-256M-Instruct-Q8_0.gguf
2:07PM DBG Loading Model smolvlm-256m-instruct with gRPC (file: /Users/xo/Downloads/models/SmolVLM-256M-Instruct-Q8_0.gguf) (backend: llama-cpp-fallback): {backendString:llama-cpp model:SmolVLM-256M-Instruct-Q8_0.gguf modelID:smolvlm-256m-instruct assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0x1400f290588 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
2:07PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-fallback
2:07PM DBG GRPC Service for smolvlm-256m-instruct will be running at: '127.0.0.1:49222'
2:07PM DBG GRPC Service state dir: /var/folders/2y/qgy1xk9s3q19_jn5_dkz8rxh0000gn/T/go-processmanager973037600
2:07PM DBG GRPC Service Started
2:07PM DBG Wait for the service to start up
2:07PM DBG Options: ContextSize:8192 Seed:1683608517 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:10 MMProj:"mmproj-SmolVLM-256M-Instruct-Q8_0.gguf"
2:07PM DBG GRPC(smolvlm-256m-instruct-127.0.0.1:49222): stderr dyld[2259]: Library not loaded: @rpath/libutf8_validity.dylib
2:07PM DBG GRPC(smolvlm-256m-instruct-127.0.0.1:49222): stderr   Referenced from: <4DBDA67D-9E21-3E84-8410-4EF66B6E235D> /private/tmp/localai/backend_data/backend-assets/lib/libprotobuf.29.3.0.dylib
2:07PM DBG GRPC(smolvlm-256m-instruct-127.0.0.1:49222): stderr   Reason: tried: '/private/tmp/localai/backend_data/backend-assets/lib/libutf8_validity.dylib' (no such file), '/private/tmp/localai/backend_data/backend-assets/lib/../lib/libutf8_validity.dylib' (no such file), '/private/tmp/localai/backend_data/backend-assets/lib/../lib/libutf8_validity.dylib' (no such file), '/opt/homebrew/lib/libutf8_validity.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/lib/libutf8_validity.dylib' (no such file), '/opt/homebrew/lib/libutf8_validity.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/lib/libutf8_validity.dylib' (no such file), '/tmp/localai/backend_data/backend-assets/lib/libutf8_validity.dylib' (no such file), '/tmp/localai/backend_data/backend-assets/lib/libutf8_validity.dylib' (no such file)
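
The dyld lines above are the actual failure: the bundled libprotobuf dylib references libutf8_validity.dylib via @rpath, none of the searched paths contain it, so the backend process exits before its gRPC service comes up, which is why the dial to 127.0.0.1 is refused. On macOS this can be double-checked with otool; a sketch, assuming the backend asset paths from the log:

# list the dynamic-library dependencies of the bundled protobuf
otool -L /tmp/localai/backend_data/backend-assets/lib/libprotobuf.29.3.0.dylib

# check whether the missing dependency was extracted next to it
ls /tmp/localai/backend_data/backend-assets/lib/ | grep utf8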

@ImLunaHey (Author) commented

Using Ollama directly in the CLI, I get no issues.
