v2.29.0

I am thrilled to announce the release of LocalAI v2.29.0! This update focuses heavily on refining our container image strategy, making default images leaner and providing clearer options for users needing specific features or hardware acceleration. We've also added support for new models like Qwen3, enhanced existing backends, and introduced experimental endpoints such as video generation!

⚠️ Important: Breaking Changes

This release includes significant changes to container image tagging and contents. Please review carefully:

  • Python Dependencies Moved: Images containing extra Python dependencies (like those for diffusers) now require the -extras suffix (e.g., latest-gpu-nvidia-cuda-12-extras). Default images are now slimmer and do not include these dependencies.
  • FFmpeg is Now Standard: All core images now include FFmpeg. The separate -ffmpeg tags have been removed. If you previously used an -ffmpeg tagged image, simply switch to the corresponding base image tag (e.g., latest-gpu-hipblas-ffmpeg becomes latest-gpu-hipblas).

Some examples are shown below. Note that the CI is still publishing the images, so they won't be available until the jobs have completed; the installation scripts will be updated as soon as the images are publicly available.

CPU only image:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest

NVIDIA GPU Images:

# CUDA 12.0 with core features
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12

# CUDA 12.0 with extra Python dependencies
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12-extras

# CUDA 11.7 with core features
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-11

# CUDA 11.7 with extra Python dependencies
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-11-extras

# NVIDIA Jetson (L4T) ARM64
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64

AMD GPU Images (ROCm):

# ROCm with core features
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas

# ROCm with extra Python dependencies
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas-extras

Intel GPU Images (oneAPI):

# Intel GPU with FP16 support
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f16

# Intel GPU with FP16 support and extra dependencies
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f16-extras

# Intel GPU with FP32 support
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f32

# Intel GPU with FP32 support and extra dependencies
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f32-extras

Vulkan GPU Images:

# Vulkan with core features
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan

AIO Images (pre-downloaded models):

# CPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu

# NVIDIA CUDA 12 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12

# NVIDIA CUDA 11 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-11

# Intel GPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel-f16

# AMD GPU version
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas

For more information about the AIO images and pre-downloaded models, see Container Documentation.

Key Changes in v2.29.0

📦 Container Image Overhaul

  • -extras Suffix: Images with additional Python dependencies are now identified by the -extras suffix.
  • Default Images: Standard tags (like latest, latest-gpu-nvidia-cuda-12) now provide core LocalAI functionality without the extra Python libraries.
  • FFmpeg Inclusion: FFmpeg is bundled in all images, simplifying setup for multimedia tasks.
  • New latest-* Tags: Added specific latest tags for various GPU architectures:
    • latest-gpu-hipblas (AMD ROCm)
    • latest-gpu-intel-f16 (Intel oneAPI FP16)
    • latest-gpu-intel-f32 (Intel oneAPI FP32)
    • latest-gpu-nvidia-cuda-12 (NVIDIA CUDA 12)
    • latest-gpu-vulkan (Vulkan)

🚀 New Features & Enhancements

  • Qwen3 Model Support: Officially integrated support for the Qwen3 model family.
  • Experimental Auto GPU Offload: LocalAI can now attempt to automatically detect GPUs and configure optimal layer offloading for llama.cpp and CLIP (see the configuration sketch after this list).
  • Whisper.cpp GPU Acceleration: Updated whisper.cpp and enabled GPU support via cuBLAS (NVIDIA) and Vulkan. SYCL and Hipblas support are in progress.
  • Experimental Video Generation: Introduced a /video/generations endpoint (see the request sketch after this list). Stay tuned for compatible model backends!
  • Installer Uninstall Option: The install.sh script now includes a --uninstall flag for easy removal (see the example after this list).
  • Expanded Hipblas Targets: Added support for a wider range of AMD GPU architectures: gfx803, gfx900, gfx906, gfx908, gfx90a, gfx942, gfx1010, gfx1030, gfx1032, gfx1100, gfx1101, gfx1102.
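
To illustrate the auto GPU offload, here is a minimal model configuration sketch; the model name and file are placeholders, and the exact fields may vary with your setup:

# model YAML — with auto-detection, gpu_layers can simply be omitted
name: my-model
parameters:
  model: my-model.gguf
# gpu_layers: 35  # no longer required; inferred when a GPU is detected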
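
Since the video endpoint is experimental and no compatible backend ships yet, the request below is only a sketch of what a call could look like, assuming the OpenAI-style JSON bodies LocalAI already uses; the model name is a hypothetical placeholder:

# Hypothetical request — "my-video-model" stands in for a future compatible backend
curl http://localhost:8080/video/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "my-video-model", "prompt": "A timelapse of clouds over a mountain range"}'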
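
The uninstall flag makes cleanup a one-liner; passing arguments through sh -s -- is the usual pattern for piped installer scripts:

# Install LocalAI with the official script
curl https://localai.io/install.sh | sh

# ...and remove it again with the new --uninstall flag
curl https://localai.io/install.sh | sh -s -- --uninstall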

🧹 Backend Updates

  • AutoGPTQ Backend Removed: This backend has been dropped due to being discontinued upstream.
  • llama.cpp: added experimental support for automatic GPU detection and layer offloading.

The Complete Local Stack for Privacy-First AI

With LocalAGI rejoining LocalAI alongside LocalRecall, our ecosystem provides a complete, open-source stack for private, secure, and intelligent AI operations:

LocalAI

The free, Open Source OpenAI alternative. Acts as a drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required.

Link: https://github.com/mudler/LocalAI
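
As a quick illustration of the drop-in compatibility, a plain OpenAI-style request works against a local instance; with the AIO images the gpt-4 model name maps to a pre-downloaded model, otherwise substitute a model you have installed:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'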

LocalAGI

A powerful Local AI agent management platform. Serves as a drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI.

Link: https://github.com/mudler/LocalAGI

LocalRecall

A RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Designed to work alongside LocalAI and LocalAGI.

Link: https://github.com/mudler/LocalRecall

Join the Movement! ❤️

A massive THANK YOU to our incredible community! LocalAI has passed 32,500 stars, and LocalAGI has already rocketed past 650 stars!

As a reminder, LocalAI is real FOSS (Free and Open Source Software) and its sibling projects are community-driven and not backed by VCs or a company. We rely on contributors donating their spare time. If you love open-source, privacy-first AI, please consider starring the repos, contributing code, reporting bugs, or spreading the word!

👉 Check out the reborn LocalAGI v2 today: https://github.com/mudler/LocalAGI

Let's continue building the future of AI, together! 🙌

Full changelog 👇


What's Changed

Breaking Changes 🛠

  • chore(autogptq): drop archived backend by @mudler in #5214
  • chore(ci): build only images with ffmpeg included, simplify tags by @mudler in #5251
  • chore(ci): strip 'core' in the image suffix, identify python-based images with 'extras' by @mudler in #5353

Bug fixes 🐛

  • fix: bark-cpp: assign FLAG_TTS to bark-cpp backend by @M0Rf30 in #5186
  • fix(talk): Talk interface sends content-type headers to chatgpt by @baflo in #5200
  • fix: installation script compatibility with fedora 41 and later, fedora headless unclear errors by @Bloodis94 in #5239
  • fix(stablediffusion-ggml): Build with DSD CUDA, HIP and Metal flags by @richiejp in #5236
  • fix(install/gpu):Fix docker not being able to leverage the GPU on systems that have SELinux Enforced by @Bloodis94 in #5252
  • fix(aio): Fix copypasta in download files for gpt-4 model by @richiejp in #5276
  • fix(diffusers): consider options only in form of key/value by @mudler in #5277
  • fix(gpu): do not assume gpu being returned has node and mem by @mudler in #5310
  • fix(hipblas): do not build all cpu-specific flags by @mudler in #5322

Exciting New Features 🎉

  • chore(ci): add latest images for core by @mudler in #5198
  • feat(install.sh): allow to uninstall with --uninstall by @mudler in #5202
  • chore: bump grpc limits to 50MB by @mudler in #5212
  • feat(llama.cpp/clip): inject gpu options if we detect GPUs by @mudler in #5243
  • feat(install): added complete process for installing nvidia drivers on fedora without pulling X11 by @Bloodis94 in #5246
  • feat(video-gen): add endpoint for video generation by @mudler in #5247
  • feat(llama.cpp): estimate vram usage by @mudler in #5299
  • chore(defaults): enlarge defaults, drop gpu layers which is infered by @mudler in #5308
  • fix(arm64): do not build instructions which are not available by @mudler in #5318
  • feat(whisper.cpp): gpu support by @mudler in #5344

🧠 Models

  • chore(model gallery): add suno-ai bark-cpp model by @M0Rf30 in #5187
  • chore(model gallery): add menlo_rezero-v0.1-llama-3.2-3b-it-grpo-250404 by @mudler in #5194
  • chore(model gallery): add thedrummer_rivermind-12b-v1 by @mudler in #5195
  • chore(model gallery): add dreamgen_lucid-v1-nemo by @mudler in #5196
  • chore(model gallery): add qwen2.5-14b-instruct-1m by @mudler in #5201
  • chore(model gallery): add ibm-granite_granite-3.3-8b-instruct by @mudler in #5204
  • chore(model gallery): add ibm-granite_granite-3.3-2b-instruct by @mudler in #5205
  • chore(model gallery): add readyart_amoral-fallen-omega-gemma3-12b by @mudler in #5206
  • chore(model gallery): add google-gemma-3-27b-it-qat-q4_0-small by @mudler in #5207
  • chore(model gallery): add pictor-1338-qwenp-1.5b by @mudler in #5208
  • chore(model gallery) add llama_3.3_70b_darkhorse-i1 by @mudler in #5222
  • chore(model gallery) add amoral-gemma3-1b-v2 by @mudler in #5223
  • chore(model gallery): add starrysky-12b-i1 by @mudler in #5224
  • chore(model gallery): add soob3123_veritas-12b by @mudler in #5241
  • chore(model gallery): add l3.3-geneticlemonade-unleashed-v2-70b by @mudler in #5249
  • chore(model gallery): add l3.3-genetic-lemonade-sunset-70b by @mudler in #5250
  • chore(model gallery): add nvidia_openmath-nemotron-32b by @mudler in #5260
  • chore(model gallery): add nvidia_openmath-nemotron-1.5b by @mudler in #5261
  • chore(model gallery): add nvidia_openmath-nemotron-7b by @mudler in #5262
  • chore(model gallery): add nvidia_openmath-nemotron-14b by @mudler in #5263
  • chore(model gallery): add nvidia_openmath-nemotron-14b-kaggle by @mudler in #5264
  • chore(model gallery): add qwen3-30b-a3b by @mudler in #5269
  • chore(model gallery): add qwen3-32b by @mudler in #5270
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #5268
  • chore(model gallery): add qwen3-14b by @mudler in #5271
  • chore(model gallery): add qwen3-8b by @mudler in #5272
  • chore(model gallery): add qwen3-4b by @mudler in #5273
  • chore(model gallery): add qwen3-1.7b by @mudler in #5274
  • chore(model gallery): add qwen3-0.6b by @mudler in #5275
  • chore(model gallery): add mlabonne_qwen3-14b-abliterated by @mudler in #5281
  • chore(model gallery): add mlabonne_qwen3-8b-abliterated by @mudler in #5282
  • chore(model gallery): add mlabonne_qwen3-4b-abliterated by @mudler in #5283
  • chore(model gallery): add qwen3-30b-a3b-abliterated by @mudler in #5285
  • chore(model gallery): add qwen3-8b-jailbroken by @mudler in #5286
  • chore(model gallery): add fast-math-qwen3-14b by @mudler in #5287
  • chore(model gallery): add microsoft_phi-4-mini-reasoning by @mudler in #5288
  • chore(model gallery): add josiefied-qwen3-8b-abliterated-v1 by @mudler in #5293
  • chore(model gallery): add furina-8b by @mudler in #5294
  • chore(model gallery): add microsoft_phi-4-reasoning-plus by @mudler in #5295
  • chore(model gallery): add microsoft_phi-4-reasoning by @mudler in #5296
  • chore(model gallery): add shuttleai_shuttle-3.5 by @mudler in #5297
  • chore(model gallery): add webthinker-qwq-32b-i1 by @mudler in #5298
  • chore(model gallery): add planetoid_27b_v.2 by @mudler in #5301
  • chore(model gallery): add genericrpv3-4b by @mudler in #5302
  • chore(model gallery): add comet_12b_v.5-i1 by @mudler in #5303
  • chore(model gallery): add amoral-qwen3-14b by @mudler in #5304
  • chore(model gallery): add qwen-3-32b-medical-reasoning-i1 by @mudler in #5305
  • chore(model gallery): add smoothie-qwen3-8b by @mudler in #5306
  • chore(model gallery): add qwen3-30b-a1.5b-high-speed by @mudler in #5311
  • chore(model gallery): add kalomaze_qwen3-16b-a3b by @mudler in #5312
  • chore(model gallery): add rei-v3-kto-12b by @mudler in #5313
  • chore(model gallery): add allura-org_remnant-qwen3-8b by @mudler in #5317
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #5321
  • chore(model gallery): add huihui-ai_qwen3-14b-abliterated by @mudler in #5324
  • chore(model gallery): add goekdeniz-guelmez_josiefied-qwen3-8b-abliterated-v1 by @mudler in #5325
  • chore(model gallery): add claria-14b by @mudler in #5326
  • chore(model gallery): add qwen3-14b-griffon-i1 by @mudler in #5330
  • chore(model gallery): add qwen3-4b-esper3-i1 by @mudler in #5332
  • chore(model gallery): add servicenow-ai_apriel-nemotron-15b-thinker by @mudler in #5333
  • chore(model gallery): add cognition-ai_kevin-32b by @mudler in #5334
  • chore(model gallery): add qwen3-14b-uncensored by @mudler in #5335
  • chore(model gallery): add symiotic-14b-i1 by @mudler in #5336
  • chore(model gallery): add gemma-3-12b-fornaxv.2-qat-cot by @mudler in #5337
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #5346
  • chore(model gallery): add gryphe_pantheon-proto-rp-1.8-30b-a3b by @mudler in #5347
  • chore(model gallery): add qwen_qwen2.5-vl-7b-instruct by @mudler in #5348
  • chore(model gallery): add qwen_qwen2.5-vl-72b-instruct by @mudler in #5349

📖 Documentation and examples

  • chore(docs): improve installer.sh docs by @mudler in #5232
  • docs(Vulkan): Add GPU docker documentation for Vulkan by @sredman in #5255
  • docs: update docs for DisableWebUI flag by @Mohit-Gaur in #5256
  • fix(CUDA):Add note for how to run CUDA with SELinux by @sredman in #5259

👒 Dependencies

  • chore: ⬆️ Update ggml-org/llama.cpp to 80f19b41869728eeb6a26569957b92a773a2b2c6 by @localai-bot in #5183
  • chore: ⬆️ Update ggml-org/llama.cpp to 015022bb53387baa8b23817ac03743705c7d472b by @localai-bot in #5192
  • chore: ⬆️ Update ggml-org/llama.cpp to 2f74c354c0f752ed9aabf7d3a350e6edebd7e744 by @localai-bot in #5203
  • chore: ⬆️ Update ggml-org/llama.cpp to 6408210082cc0a61b992b487be7e2ff2efbb9e36 by @localai-bot in #5211
  • chore: ⬆️ Update ggml-org/llama.cpp to 00137157fca3d17b90380762b4d7cc158d385bd3 by @localai-bot in #5218
  • chore: ⬆️ Update ggml-org/llama.cpp to 6602304814e679cc8c162bb760a034aceb4f8965 by @localai-bot in #5228
  • chore: ⬆️ Update ggml-org/llama.cpp to 1d735c0b4fa0551c51c2f4ac888dd9a01f447985 by @localai-bot in #5233
  • chore(deps): bump mxschmitt/action-tmate from 3.19 to 3.21 by @dependabot in #5231
  • chore: ⬆️ Update ggml-org/llama.cpp to 658987cfc9d752dca7758987390d5fb1a7a0a54a by @localai-bot in #5234
  • chore: ⬆️ Update ggml-org/llama.cpp to ecda2ec4b347031a9b8a89ee2efc664ce63f599c by @localai-bot in #5238
  • chore: ⬆️ Update ggml-org/llama.cpp to 226251ed56b85190e18a1cca963c45b888f4953c by @localai-bot in #5240
  • chore: ⬆️ Update ggml-org/llama.cpp to 295354ea6848a77bdee204ee1c971d9b92ffcca9 by @localai-bot in #5245
  • chore: ⬆️ Update ggml-org/llama.cpp to 77d5e9a76a7b4a8a7c5bf9cf6ebef91860123cba by @localai-bot in #5254
  • chore: ⬆️ Update ggml-org/llama.cpp to ced44be34290fab450f8344efa047d8a08e723b4 by @localai-bot in #5258
  • chore(deps): bump appleboy/scp-action from 0.1.7 to 1.0.0 by @dependabot in #5265
  • chore: ⬆️ Update ggml-org/llama.cpp to 5f5e39e1ba5dbea814e41f2a15e035d749a520bc by @localai-bot in #5267
  • chore: ⬆️ Update ggml-org/llama.cpp to e2e1ddb93a01ce282e304431b37e60b3cddb6114 by @localai-bot in #5278
  • fix: vllm missing logprobs by @wyattearp in #5279
  • chore: ⬆️ Update ggml-org/llama.cpp to 3e168bede4d27b35656ab8026015b87659ecbec2 by @localai-bot in #5284
  • chore: ⬆️ Update ggml-org/llama.cpp to d7a14c42a1883a34a6553cbfe30da1e1b84dfd6a by @localai-bot in #5292
  • chore(deps): bump llama.cpp to 1d36b3670b285e69e58b9d687c770a2a0a192194 by @mudler in #5307
  • chore: ⬆️ Update ggml-org/llama.cpp to 36667c8edcded08063ed51c7d57e9e086bbfc903 by @localai-bot in #5300
  • fix: use rice when embedding large binaries by @mudler in #5309
  • chore: ⬆️ Update ggml-org/llama.cpp to 9fdfcdaeddd1ef57c6d041b89cd8fb7048a0f028 by @localai-bot in #5316
  • chore(deps): bump mxschmitt/action-tmate from 3.21 to 3.22 by @dependabot in #5319
  • chore(deps): bump llama.cpp to b34c859146630dff136943abc9852ca173a7c9d6 by @mudler in #5323
  • chore: ⬆️ Update ggml-org/llama.cpp to 91a86a6f354aa73a7aab7bc3d283be410fdc93a5 by @localai-bot in #5329
  • chore: ⬆️ Update ggml-org/llama.cpp to 814f795e063c257f33b921eab4073484238a151a by @localai-bot in #5331
  • chore: ⬆️ Update ggml-org/llama.cpp to f05a6d71a0f3dbf0730b56a1abbad41c0f42e63d by @localai-bot in #5340
  • chore(deps): bump whisper.cpp by @mudler in #5338
  • feat: Add sycl support for whisper.cpp by @mudler in #5341
  • chore: ⬆️ Update ggml-org/llama.cpp to 33eff4024084d1f0c8441b79f7208a52fad79858 by @localai-bot in #5343
  • chore: ⬆️ Update ggml-org/llama.cpp to 15e6125a397f6086c1dfdf7584acdb7c730313dc by @localai-bot in #5345
  • chore: ⬆️ Update ggml-org/whisper.cpp to 2e310b841e0b4e7cf00890b53411dd9f8578f243 by @localai-bot in #4785
  • chore: ⬆️ Update ggml-org/llama.cpp to 9a390c4829cd3058d26a2e2c09d16e3fd12bf1b1 by @localai-bot in #5351
  • chore(deps): bump dependabot/fetch-metadata from 2.3.0 to 2.4.0 by @dependabot in #5355
  • chore(deps): bump securego/gosec from 2.22.3 to 2.22.4 by @dependabot in #5356


Full Changelog: v2.28.0...v2.29.0