v2.29.0

I am thrilled to announce the release of LocalAI v2.29.0! This update focuses heavily on refining our container image strategy, making default images leaner and providing clearer options for users needing specific features or hardware acceleration. We've also added support for new models like Qwen3, enhanced existing backends, and introduced experimental endpoints such as video generation!

⚠️ Important: Breaking Changes

This release includes significant changes to container image tagging and contents. Please review carefully:

  • Python Dependencies Moved: Images containing extra Python dependencies (like those for diffusers) now require the -extras suffix (e.g., latest-gpu-nvidia-cuda-12-extras). Default images are now slimmer and do not include these dependencies.
  • FFmpeg is Now Standard: All core images now include FFmpeg. The separate -ffmpeg tags have been removed. If you previously used an -ffmpeg tagged image, simply switch to the corresponding base image tag (e.g., latest-gpu-hipblas-ffmpeg becomes latest-gpu-hipblas).

Some examples are shown below. Note that the CI is still publishing the images, so they won't be available until the jobs have completed; the installation scripts will be updated as soon as the images are publicly available.

CPU only image:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest

NVIDIA GPU Images:

# CUDA 12.0 with core features
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12

# CUDA 12.0 with extra Python dependencies
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12-extras

# CUDA 11.7 with core features
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-11

# CUDA 11.7 with extra Python dependencies
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-11-extras

# NVIDIA Jetson (L4T) ARM64
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64

AMD GPU Images (ROCm):

# ROCm with core features
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas

# ROCm with extra Python dependencies
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas-extras

Intel GPU Images (oneAPI):

# Intel GPU with FP16 support
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f16

# Intel GPU with FP16 support and extra dependencies
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f16-extras

# Intel GPU with FP32 support
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f32

# Intel GPU with FP32 support and extra dependencies
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f32-extras

Vulkan GPU Images:

# Vulkan with core features
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan

AIO Images (pre-downloaded models):

# CPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu

# NVIDIA CUDA 12 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12

# NVIDIA CUDA 11 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-11

# Intel GPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel-f16

# AMD GPU version
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas

For more information about the AIO images and pre-downloaded models, see Container Documentation.

Key Changes in v2.29.0

📦 Container Image Overhaul

  • -extras Suffix: Images with additional Python dependencies are now identified by the -extras suffix.
  • Default Images: Standard tags (like latest, latest-gpu-nvidia-cuda-12) now provide core LocalAI functionality without the extra Python libraries.
  • FFmpeg Inclusion: FFmpeg is bundled in all images, simplifying setup for multimedia tasks.
  • New latest-* Tags: Added specific latest tags for various GPU architectures:
    • latest-gpu-hipblas (AMD ROCm)
    • latest-gpu-intel-f16 (Intel oneAPI FP16)
    • latest-gpu-intel-f32 (Intel oneAPI FP32)
    • latest-gpu-nvidia-cuda-12 (NVIDIA CUDA 12)
    • latest-gpu-vulkan (Vulkan)

🚀 New Features & Enhancements

  • Qwen3 Model Support: Officially integrated support for the Qwen3 model family.
  • Experimental Auto GPU Offload: LocalAI can now attempt to automatically detect GPUs and configure optimal layer offloading for llama.cpp and CLIP (see the configuration sketch after this list).
  • Whisper.cpp GPU Acceleration: Updated whisper.cpp and enabled GPU support via cuBLAS (NVIDIA) and Vulkan. SYCL and Hipblas support are in progress.
  • Experimental Video Generation: Introduced a /video/generations endpoint (see the request sketch after this list). Stay tuned for compatible model backends!
  • Installer Uninstall Option: The install.sh script now includes a --uninstall flag for easy removal (see the example after this list).
  • Expanded Hipblas Targets: Added support for a wider range of AMD GPU architectures: gfx803, gfx900, gfx906, gfx908, gfx90a, gfx942, gfx1010, gfx1030, gfx1032, gfx1100, gfx1101, gfx1102.
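
To illustrate the auto GPU offload, here is a minimal model configuration sketch; the model name and file are placeholders, and the exact fields may vary with your setup:

# model YAML — with auto-detection, gpu_layers can simply be omitted
name: my-model
parameters:
  model: my-model.gguf
# gpu_layers: 35  # no longer required; inferred when a GPU is detected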
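
Since the video endpoint is experimental and no compatible backend ships yet, the request below is only a sketch of what a call could look like, assuming the OpenAI-style JSON bodies LocalAI already uses; the model name is a hypothetical placeholder:

# Hypothetical request — "my-video-model" stands in for a future compatible backend
curl http://localhost:8080/video/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "my-video-model", "prompt": "A timelapse of clouds over a mountain range"}'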
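
The uninstall flag makes cleanup a one-liner; passing arguments through sh -s -- is the usual pattern for piped installer scripts:

# Install LocalAI with the official script
curl https://localai.io/install.sh | sh

# ...and remove it again with the new --uninstall flag
curl https://localai.io/install.sh | sh -s -- --uninstall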

🧹 Backend Updates

  • AutoGPTQ Backend Removed: This backend has been dropped due to being discontinued upstream.
  • llama.cpp: added experimental support for automatic GPU detection and layer offloading.

The Complete Local Stack for Privacy-First AI

With LocalAGI rejoining LocalAI alongside LocalRecall, our ecosystem provides a complete, open-source stack for private, secure, and intelligent AI operations:

LocalAI

The free, Open Source OpenAI alternative. Acts as a drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required.

Link: https://github.com/mudler/LocalAI
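
As a quick illustration of the drop-in compatibility, a plain OpenAI-style request works against a local instance; with the AIO images the gpt-4 model name maps to a pre-downloaded model, otherwise substitute a model you have installed:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'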

LocalAGI

A powerful Local AI agent management platform. Serves as a drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI.

Link: https://github.com/mudler/LocalAGI

LocalRecall

A RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Designed to work alongside LocalAI and LocalAGI.

Link: https://github.com/mudler/LocalRecall

Join the Movement! ❤️

A massive THANK YOU to our incredible community! LocalAI has passed 32,500 stars, and LocalAGI has already rocketed past 650 stars!

As a reminder, LocalAI is real FOSS (Free and Open Source Software) and its sibling projects are community-driven and not backed by VCs or a company. We rely on contributors donating their spare time. If you love open-source, privacy-first AI, please consider starring the repos, contributing code, reporting bugs, or spreading the word!

👉 Check out the reborn LocalAGI v2 today: https://github.com/mudler/LocalAGI

Let's continue building the future of AI, together! 🙌

Full changelog 👇


What's Changed

Breaking Changes 🛠

  • chore(autogptq): drop archived backend by @mudler in #5214
  • chore(ci): build only images with ffmpeg included, simplify tags by @mudler in #5251
  • chore(ci): strip 'core' in the image suffix, identify python-based images with 'extras' by @mudler in #5353

Bug fixes 🐛

  • fix: bark-cpp: assign FLAG_TTS to bark-cpp backend by @M0Rf30 in #5186
  • fix(talk): Talk interface sends content-type headers to chatgpt by @baflo in #5200
  • fix: installation script compatibility with fedora 41 and later, fedora headless unclear errors by @Bloodis94 in #5239
  • fix(stablediffusion-ggml): Build with DSD CUDA, HIP and Metal flags by @richiejp in #5236
  • fix(install/gpu):Fix docker not being able to leverage the GPU on systems that have SELinux Enforced by @Bloodis94 in #5252
  • fix(aio): Fix copypasta in download files for gpt-4 model by @richiejp in #5276
  • fix(diffusers): consider options only in form of key/value by @mudler in #5277
  • fix(gpu): do not assume gpu being returned has node and mem by @mudler in #5310
  • fix(hipblas): do not build all cpu-specific flags by @mudler in #5322

Exciting New Features 🎉

  • chore(ci): add latest images for core by @mudler in #5198
  • feat(install.sh): allow to uninstall with --uninstall by @mudler in #5202
  • chore: bump grpc limits to 50MB by @mudler in #5212
  • feat(llama.cpp/clip): inject gpu options if we detect GPUs by @mudler in #5243
  • feat(install): added complete process for installing nvidia drivers on fedora without pulling X11 by @Bloodis94 in #5246
  • feat(video-gen): add endpoint for video generation by @mudler in #5247
  • feat(llama.cpp): estimate vram usage by @mudler in #5299
  • chore(defaults): enlarge defaults, drop gpu layers which is infered by @mudler in #5308
  • fix(arm64): do not build instructions which are not available by @mudler in #5318
  • feat(whisper.cpp): gpu support by @mudler in #5344

🧠 Models

  • chore(model gallery): add suno-ai bark-cpp model by @M0Rf30 in #5187
  • chore(model gallery): add menlo_rezero-v0.1-llama-3.2-3b-it-grpo-250404 by @mudler in #5194
  • chore(model gallery): add thedrummer_rivermind-12b-v1 by @mudler in #5195
  • chore(model gallery): add dreamgen_lucid-v1-nemo by @mudler in #5196
  • chore(model gallery): add qwen2.5-14b-instruct-1m by @mudler in #5201
  • chore(model gallery): add ibm-granite_granite-3.3-8b-instruct by @mudler in #5204
  • chore(model gallery): add ibm-granite_granite-3.3-2b-instruct by @mudler in #5205
  • chore(model gallery): add readyart_amoral-fallen-omega-gemma3-12b by @mudler in #5206
  • chore(model gallery): add google-gemma-3-27b-it-qat-q4_0-small by @mudler in #5207
  • chore(model gallery): add pictor-1338-qwenp-1.5b by @mudler in #5208
  • chore(model gallery) add llama_3.3_70b_darkhorse-i1 by @mudler in #5222
  • chore(model gallery) add amoral-gemma3-1b-v2 by @mudler in #5223
  • chore(model gallery): add starrysky-12b-i1 by @mudler in #5224
  • chore(model gallery): add soob3123_veritas-12b by @mudler in #5241
  • chore(model gallery): add l3.3-geneticlemonade-unleashed-v2-70b by @mudler in #5249
  • chore(model gallery): add l3.3-genetic-lemonade-sunset-70b by @mudler in #5250
  • chore(model gallery): add nvidia_openmath-nemotron-32b by @mudler in #5260
  • chore(model gallery): add nvidia_openmath-nemotron-1.5b by @mudler in #5261
  • chore(model gallery): add nvidia_openmath-nemotron-7b by @mudler in #5262
  • chore(model gallery): add nvidia_openmath-nemotron-14b by @mudler in #5263
  • chore(model gallery): add nvidia_openmath-nemotron-14b-kaggle by @mudler in #5264
  • chore(model gallery): add qwen3-30b-a3b by @mudler in #5269
  • chore(model gallery): add qwen3-32b by @mudler in #5270
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #5268
  • chore(model gallery): add qwen3-14b by @mudler in #5271
  • chore(model gallery): add qwen3-8b by @mudler in #5272
  • chore(model gallery): add qwen3-4b by @mudler in #5273
  • chore(model gallery): add qwen3-1.7b by @mudler in #5274
  • chore(model gallery): add qwen3-0.6b by @mudler in #5275
  • chore(model gallery): add mlabonne_qwen3-14b-abliterated by @mudler in #5281
  • chore(model gallery): add mlabonne_qwen3-8b-abliterated by @mudler in #5282
  • chore(model gallery): add mlabonne_qwen3-4b-abliterated by @mudler in #5283
  • chore(model gallery): add qwen3-30b-a3b-abliterated by @mudler in #5285
  • chore(model gallery): add qwen3-8b-jailbroken by @mudler in #5286
  • chore(model gallery): add fast-math-qwen3-14b by @mudler in #5287
  • chore(model gallery): add microsoft_phi-4-mini-reasoning by @mudler in #5288
  • chore(model gallery): add josiefied-qwen3-8b-abliterated-v1 by @mudler in #5293
  • chore(model gallery): add furina-8b by @mudler in #5294
  • chore(model gallery): add microsoft_phi-4-reasoning-plus by @mudler in #5295
  • chore(model gallery): add microsoft_phi-4-reasoning by @mudler in #5296
  • chore(model gallery): add shuttleai_shuttle-3.5 by @mudler in #5297
  • chore(model gallery): add webthinker-qwq-32b-i1 by @mudler in #5298
  • chore(model gallery): add planetoid_27b_v.2 by @mudler in #5301
  • chore(model gallery): add genericrpv3-4b by @mudler in #5302
  • chore(model gallery): add comet_12b_v.5-i1 by @mudler in #5303
  • chore(model gallery): add amoral-qwen3-14b by @mudler in #5304
  • chore(model gallery): add qwen-3-32b-medical-reasoning-i1 by @mudler in #5305
  • chore(model gallery): add smoothie-qwen3-8b by @mudler in #5306
  • chore(model gallery): add qwen3-30b-a1.5b-high-speed by @mudler in #5311
  • chore(model gallery): add kalomaze_qwen3-16b-a3b by @mudler in #5312
  • chore(model gallery): add rei-v3-kto-12b by @mudler in #5313
  • chore(model gallery): add allura-org_remnant-qwen3-8b by @mudler in #5317
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #5321
  • chore(model gallery): add huihui-ai_qwen3-14b-abliterated by @mudler in #5324
  • chore(model gallery): add goekdeniz-guelmez_josiefied-qwen3-8b-abliterated-v1 by @mudler in #5325
  • chore(model gallery): add claria-14b by @mudler in #5326
  • chore(model gallery): add qwen3-14b-griffon-i1 by @mudler in #5330
  • chore(model gallery): add qwen3-4b-esper3-i1 by @mudler in #5332
  • chore(model gallery): add servicenow-ai_apriel-nemotron-15b-thinker by @mudler in #5333
  • chore(model gallery): add cognition-ai_kevin-32b by @mudler in #5334
  • chore(model gallery): add qwen3-14b-uncensored by @mudler in #5335
  • chore(model gallery): add symiotic-14b-i1 by @mudler in #5336
  • chore(model gallery): add gemma-3-12b-fornaxv.2-qat-cot by @mudler in #5337
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #5346
  • chore(model gallery): add gryphe_pantheon-proto-rp-1.8-30b-a3b by @mudler in #5347
  • chore(model gallery): add qwen_qwen2.5-vl-7b-instruct by @mudler in #5348
  • chore(model gallery): add qwen_qwen2.5-vl-72b-instruct by @mudler in #5349

📖 Documentation and examples

  • chore(docs): improve installer.sh docs by @mudler in #5232
  • docs(Vulkan): Add GPU docker documentation for Vulkan by @sredman in #5255
  • docs: update docs for DisableWebUI flag by @Mohit-Gaur in #5256
  • fix(CUDA):Add note for how to run CUDA with SELinux by @sredman in #5259

👒 Dependencies

  • chore: ⬆️ Update ggml-org/llama.cpp to 80f19b41869728eeb6a26569957b92a773a2b2c6 by @localai-bot in #5183
  • chore: ⬆️ Update ggml-org/llama.cpp to 015022bb53387baa8b23817ac03743705c7d472b by @localai-bot in #5192
  • chore: ⬆️ Update ggml-org/llama.cpp to 2f74c354c0f752ed9aabf7d3a350e6edebd7e744 by @localai-bot in #5203
  • chore: ⬆️ Update ggml-org/llama.cpp to 6408210082cc0a61b992b487be7e2ff2efbb9e36 by @localai-bot in #5211
  • chore: ⬆️ Update ggml-org/llama.cpp to 00137157fca3d17b90380762b4d7cc158d385bd3 by @localai-bot in #5218
  • chore: ⬆️ Update ggml-org/llama.cpp to 6602304814e679cc8c162bb760a034aceb4f8965 by @localai-bot in #5228
  • chore: ⬆️ Update ggml-org/llama.cpp to 1d735c0b4fa0551c51c2f4ac888dd9a01f447985 by @localai-bot in #5233
  • chore(deps): bump mxschmitt/action-tmate from 3.19 to 3.21 by @dependabot in #5231
  • chore: ⬆️ Update ggml-org/llama.cpp to 658987cfc9d752dca7758987390d5fb1a7a0a54a by @localai-bot in #5234
  • chore: ⬆️ Update ggml-org/llama.cpp to ecda2ec4b347031a9b8a89ee2efc664ce63f599c by @localai-bot in #5238
  • chore: ⬆️ Update ggml-org/llama.cpp to 226251ed56b85190e18a1cca963c45b888f4953c by @localai-bot in #5240
  • chore: ⬆️ Update ggml-org/llama.cpp to 295354ea6848a77bdee204ee1c971d9b92ffcca9 by @localai-bot in #5245
  • chore: ⬆️ Update ggml-org/llama.cpp to 77d5e9a76a7b4a8a7c5bf9cf6ebef91860123cba by @localai-bot in #5254
  • chore: ⬆️ Update ggml-org/llama.cpp to ced44be34290fab450f8344efa047d8a08e723b4 by @localai-bot in #5258
  • chore(deps): bump appleboy/scp-action from 0.1.7 to 1.0.0 by @dependabot in #5265
  • chore: ⬆️ Update ggml-org/llama.cpp to 5f5e39e1ba5dbea814e41f2a15e035d749a520bc by @localai-bot in #5267
  • chore: ⬆️ Update ggml-org/llama.cpp to e2e1ddb93a01ce282e304431b37e60b3cddb6114 by @localai-bot in #5278
  • fix: vllm missing logprobs by @wyattearp in #5279
  • chore: ⬆️ Update ggml-org/llama.cpp to 3e168bede4d27b35656ab8026015b87659ecbec2 by @localai-bot in #5284
  • chore: ⬆️ Update ggml-org/llama.cpp to d7a14c42a1883a34a6553cbfe30da1e1b84dfd6a by @localai-bot in #5292
  • chore(deps): bump llama.cpp to 1d36b3670b285e69e58b9d687c770a2a0a192194 by @mudler in #5307
  • chore: ⬆️ Update ggml-org/llama.cpp to 36667c8edcded08063ed51c7d57e9e086bbfc903 by @localai-bot in #5300
  • fix: use rice when embedding large binaries by @mudler in #5309
  • chore: ⬆️ Update ggml-org/llama.cpp to 9fdfcdaeddd1ef57c6d041b89cd8fb7048a0f028 by @localai-bot in #5316
  • chore(deps): bump mxschmitt/action-tmate from 3.21 to 3.22 by @dependabot in #5319
  • chore(deps): bump llama.cpp to b34c859146630dff136943abc9852ca173a7c9d6 by @mudler in #5323
  • chore: ⬆️ Update ggml-org/llama.cpp to 91a86a6f354aa73a7aab7bc3d283be410fdc93a5 by @localai-bot in #5329
  • chore: ⬆️ Update ggml-org/llama.cpp to 814f795e063c257f33b921eab4073484238a151a by @localai-bot in #5331
  • chore: ⬆️ Update ggml-org/llama.cpp to f05a6d71a0f3dbf0730b56a1abbad41c0f42e63d by @localai-bot in #5340
  • chore(deps): bump whisper.cpp by @mudler in #5338
  • feat: Add sycl support for whisper.cpp by @mudler in #5341
  • chore: ⬆️ Update ggml-org/llama.cpp to 33eff4024084d1f0c8441b79f7208a52fad79858 by @localai-bot in #5343
  • chore: ⬆️ Update ggml-org/llama.cpp to 15e6125a397f6086c1dfdf7584acdb7c730313dc by @localai-bot in #5345
  • chore: ⬆️ Update ggml-org/whisper.cpp to 2e310b841e0b4e7cf00890b53411dd9f8578f243 by @localai-bot in #4785
  • chore: ⬆️ Update ggml-org/llama.cpp to 9a390c4829cd3058d26a2e2c09d16e3fd12bf1b1 by @localai-bot in #5351
  • chore(deps): bump dependabot/fetch-metadata from 2.3.0 to 2.4.0 by @dependabot in #5355
  • chore(deps): bump securego/gosec from 2.22.3 to 2.22.4 by @dependabot in #5356


Full Changelog: v2.28.0...v2.29.0