Welcome to the Jarvis AI Assistant project! 🎙️ This AI-powered assistant can perform various tasks, such as providing weather reports 🌦️, summarizing news 📰, sending emails 📧, CAG, and more, all through voice commands. Below, you'll find detailed instructions on how to set up, use, and interact with this assistant. 🔧

✅ Voice Activation: Activates listening mode. 🎤

✅ Speech Recognition: Recognizes and processes user commands from speech input. 🗣️

✅ AI Responses: Provides AI-generated responses as text-to-speech output. 🎶

✅ Task Execution: Handles multiple tasks, including:
- 📧 Sending emails
- 🌦️ Summarizing weather reports
- 📊 Data analysis using CSV files
- 🧑🏻‍💻 Personalized chat
- 📰 Reading news headlines
- 🖼️ Image generation
- 📦 Database functions
- 📱 Phone call automation using ADB
- 🤖 AI-based task execution
- 💡 Website & application automation
- 🧠 Retrieval-Augmented Generation (RAG) for knowledge-based interactions on various topics

✅ Timeout Handling: Automatically deactivates listening mode after 5 minutes of inactivity. ⏳

✅ Automatic Input Processing: If no "stop" command is detected within 60 seconds, the input is finalized and sent to the AI model for processing. ⚙️

✅ Multiple Function Calls: Calls multiple functions simultaneously, even if their inputs and outputs are unrelated. 🔄
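
The two timeouts above can be modeled as a small state machine. The sketch below is hypothetical (the `ListeningSession` class is not from the project) and assumes the 60-second window is measured from the first phrase of the current input; the clock is injectable so the behavior can be checked without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional

INACTIVITY_TIMEOUT_S = 5 * 60  # deactivate listening mode after 5 minutes idle
FINALIZE_TIMEOUT_S = 60        # finalize input if no "stop" arrives within 60 s


@dataclass
class ListeningSession:
    """Hypothetical helper that buffers recognized phrases and applies both timeouts."""
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    active: bool = False
    buffer: List[str] = field(default_factory=list)
    last_activity: float = 0.0  # time of the most recent recognized phrase
    input_started: float = 0.0  # assumed start of the 60 s window: first buffered phrase

    def activate(self) -> None:
        """Enter listening mode (e.g. after the activation phrase is recognized)."""
        self.active = True
        self.buffer.clear()
        self.last_activity = self.input_started = self.clock()

    def hear(self, phrase: str) -> Optional[str]:
        """Buffer a phrase; on "stop", return the finalized prompt for the AI model."""
        if not self.active:
            return None
        now = self.clock()
        self.last_activity = now
        if phrase.strip().lower() == "stop":
            return self._finalize()
        if not self.buffer:
            self.input_started = now
        self.buffer.append(phrase.strip())
        return None

    def tick(self) -> Optional[str]:
        """Call periodically: auto-finalize after 60 s, deactivate after 5 min idle."""
        if not self.active:
            return None
        now = self.clock()
        if self.buffer and now - self.input_started >= FINALIZE_TIMEOUT_S:
            return self._finalize()
        if now - self.last_activity >= INACTIVITY_TIMEOUT_S:
            self.active = False
        return None

    def _finalize(self) -> Optional[str]:
        """Join buffered phrases into one prompt and reset the buffer."""
        prompt = " ".join(self.buffer)
        self.buffer.clear()
        return prompt or None
```

In a real loop, whatever `hear` or `tick` returns (when not `None`) would be handed to the AI model.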

Before running the project, make sure you have the following installed:

✅ Python 3.9 or later 🐍

✅ Required libraries (listed in `requirements.txt`) 📄

1. Create a `.env` file in the root directory of the project and add your API keys and other configuration variables to it:
   ```
   Weather_api=your_weather_api_key
   News_api=your_news_api_key
   Sender_email=your_email
   Receiver_email=subject_email
   Password_email=email_password
   ```
2. Install the system requirements:
   ```bash
   bash ./intialize.sh
   ```
3. Set up API keys & passwords:
   - 🌩️ WEATHER API - Get weather data.
   - 📰 NEWS API - Fetch the latest news headlines.
   - 📧 GMAIL PASSWORD - Generate an app password for sending emails.
   - 🧠 OLLAMA - Download models from Ollama (manual setup). Install the models:
     ```bash
     ollama run gemma3:4b
     ollama run granite3.1-dense:2b
     ollama pull nomic-embed-text
     ```
   - PortAudio - Install PortAudio, which is required for working with sound.
   - 🔮 GEMINI AI - API access for function execution.
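
Once the `.env` file exists, its keys need to reach the Python code. A common choice is `load_dotenv()` from the python-dotenv package; the stdlib-only sketch below is a hypothetical stand-in (the helpers `parse_env` and `load_config` are not from the project) that reads the keys from the example above and fails fast if any of them is missing.

```python
import os
from pathlib import Path
from typing import Dict

# Key names taken from the .env example above.
REQUIRED_KEYS = ("Weather_api", "News_api", "Sender_email",
                 "Receiver_email", "Password_email")


def parse_env(text: str) -> Dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks, comments, and lines without '='."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_config(path: Path = Path(".env")) -> Dict[str, str]:
    """Load .env into os.environ (existing variables win) and check required keys."""
    for key, value in parse_env(path.read_text(encoding="utf-8")).items():
        os.environ.setdefault(key, value)
    missing = [k for k in REQUIRED_KEYS if not os.environ.get(k)]
    if missing:
        raise KeyError(f"missing keys in {path}: {', '.join(missing)}")
    return {k: os.environ[k] for k in REQUIRED_KEYS}
```

Letting real environment variables take precedence over the file makes it easy to override a single key (for example, in CI) without editing `.env`.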

**gemma3:4b** (via Ollama)

| Property | Value |
|---|---|
| Architecture | gemma3 |
| Parameters | 4.3B |
| Context length | 8192 |
| Embedding length | 2560 |
| Quantization | Q4_K_M |
| Stop sequence | `<end_of_turn>` |
| Temperature | 0.1 |
| License | Gemma Terms of Use (last modified February 21, 2024) |

**granite3.1-dense:2b** (via Ollama)

| Property | Value |
|---|---|
| Architecture | granite |
| Parameters | 2.5B |
| Context length | 131072 |
| Embedding length | 2048 |
| Quantization | Q4_K_M |
| System prompt | "Knowledge Cutoff Date: April 2024. You are Granite, developed by IBM." |
| License | Apache License, Version 2.0, January 2004 |
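
To query one of these local models from Python, a minimal sketch can call Ollama's local REST API (`POST /api/generate` on the default port 11434) with the temperature listed for gemma3 above. This is an illustration, not the project's own client; `build_request` and `generate` are hypothetical helper names, and an Ollama server must be running with the model already pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "gemma3:4b",
                  temperature: float = 0.1) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama model."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON object instead of a stream
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the prompt to the running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, **kwargs), timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Usage: `generate("Summarize today's headlines in one sentence.", model="granite3.1-dense:2b")`.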

**Gemini models**

| Model | Inputs | Output | Description |
|---|---|---|---|
| gemini-2.0-flash | Audio, images, videos, and text | Text, images (experimental), and audio (coming soon) | Next-generation features, speed, thinking, realtime streaming, and multimodal generation |
| gemini-2.0-flash-lite | Audio, images, videos, and text | Text | A Gemini 2.0 Flash model optimized for cost efficiency and low latency |
| gemini-2.0-pro-exp-02-05 | Audio, images, videos, and text | Text | Google's most powerful Gemini 2.0 model |
| gemini-1.5-flash | Audio, images, videos, and text | Text | Fast and versatile performance across a diverse variety of tasks |

```bash
git clone https://github.com/ganeshnikhil/J.A.R.V.I.S.2.0.git
cd J.A.R.V.I.S.2.0
pip install -r requirements.txt
streamlit run ui.py
```

🚀 Switched to Gemini AI-powered function calling, which allows multiple functions to be called simultaneously for better efficiency! ⚙️ If Gemini AI fails to generate function calls, the system automatically falls back to an Ollama-based model for reliable execution.
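
The Gemini-first, Ollama-fallback flow can be sketched as follows. This is a hypothetical outline, not the project's implementation: the two planners are placeholder callables standing in for the Gemini and Ollama clients, and running unrelated calls on a thread pool is one assumed way to execute them "simultaneously".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

FunctionCall = Tuple[str, Dict[str, Any]]      # (function name, keyword arguments)
Planner = Callable[[str], List[FunctionCall]]  # maps a user request to function calls


def plan_with_fallback(request: str, primary: Planner, fallback: Planner) -> List[FunctionCall]:
    """Ask the primary planner (e.g. Gemini); use the fallback (e.g. Ollama) on error or no calls."""
    try:
        calls = primary(request)
        if calls:
            return calls
    except Exception:
        pass  # e.g. network error, exhausted quota, malformed model output
    return fallback(request)


def run_calls(calls: List[FunctionCall], registry: Dict[str, Callable[..., Any]]) -> List[Any]:
    """Run independent calls concurrently; results keep the order of `calls`."""
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = [pool.submit(registry[name], **args) for name, args in calls]
        return [f.result() for f in futures]
```

Keeping planning and execution separate means the fallback model only has to produce the same `(name, args)` list; the execution side does not care which model produced it.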

🔹 AI Model Used: Gemini AI 🧠

✅ Higher accuracy

✅ Structured data processing

✅ Reliable AI-driven interactions

💡 Retrieval-Augmented Generation (RAG) dynamically loads the markdown knowledge files relevant to the queried topic, reducing hallucinations and improving response accuracy.
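
Topic-based loading of markdown knowledge files can be illustrated with a simple keyword-overlap ranker. This is a swapped-in technique for illustration only: the project pulls `nomic-embed-text`, so its actual retrieval likely uses embeddings. The helper names and the knowledge-folder layout here are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case word tokens; underscores and punctuation act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_knowledge(query: str, docs: Dict[str, str], top_k: int = 2) -> List[str]:
    """Return names of the top_k docs sharing the most words with the query."""
    query_words = tokenize(query)
    scored = []
    for name, text in docs.items():
        # Match on the file name (e.g. "solar_system.md") as well as its contents.
        score = len(query_words & tokenize(Path(name).stem + " " + text))
        if score:
            scored.append((score, name))
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, then by name
    return [name for _, name in scored[:top_k]]


def load_docs(folder: Path) -> Dict[str, str]:
    """Read every markdown file in a (hypothetical) knowledge folder."""
    return {p.name: p.read_text(encoding="utf-8") for p in sorted(folder.glob("*.md"))}


def build_prompt(query: str, docs: Dict[str, str], top_k: int = 2) -> str:
    """Prepend only the selected knowledge files to the user's question."""
    context = "\n\n".join(docs[name] for name in select_knowledge(query, docs, top_k))
    return f"Answer using the context below.\n\n{context}\n\nQuestion: {query}"
```

Only the matching files enter the prompt, which keeps the context short enough for small local models.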

🔹 Integrated Android Debug Bridge (ADB) to enable voice-controlled phone automation! 🎙️

✅ Make phone calls ☎️

✅ Open apps & toggle settings 📲

✅ Access phone data & remote operations 🛠️

📌 Windows
```powershell
winget install --id=Google.AndroidSDKPlatformTools -e
```
📌 Linux
```bash
sudo apt install adb
```
📌 Mac
```bash
brew install android-platform-tools
```
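
With ADB installed and a device connected (USB debugging enabled), voice commands can be mapped onto ordinary `adb shell` commands. The sketch below is an assumption about how this could be wired up, not the project's code. It uses the standard `am start` call intent and `monkey` launcher invocation; the helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import List


def call_command(number: str) -> List[str]:
    """Build the adb command that places a call via the ACTION_CALL intent."""
    digits = re.sub(r"[^\d+]", "", number)  # keep only digits and '+'
    if not digits:
        raise ValueError(f"not a phone number: {number!r}")
    return ["adb", "shell", "am", "start",
            "-a", "android.intent.action.CALL", "-d", f"tel:{digits}"]


def launch_app_command(package: str) -> List[str]:
    """Build the adb command that launches an app by package name via monkey."""
    return ["adb", "shell", "monkey", "-p", package,
            "-c", "android.intent.category.LAUNCHER", "1"]


def run_adb(cmd: List[str]) -> str:
    """Run an adb command and return its stdout; raises if adb is missing or fails."""
    if shutil.which("adb") is None:
        raise FileNotFoundError("adb not found on PATH; install platform-tools first")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Usage: `run_adb(call_command("+91 98765-43210"))` dials the number on the connected phone.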

✨ Deeper mobile integration 📱

✨ Advanced AI-driven automation 🤖

✨ Improved NLP-based command execution 🧠

✨ Multi-modal interactions (text + voice + image) 🖼️

🚀 Stay tuned for future updates! 🔥
## Gemini Model Comparison
The following table provides a comparison of various Gemini models with respect to their rate limits:
| Model | RPM | TPM | RPD |
|------------------------------------- |-----:|----------:| -----:|
| **Gemini 2.0 Flash** | 15 | 1,000,000 | 1,500 |
| **Gemini 2.0 Flash-Lite Preview** | 30 | 1,000,000 | 1,500 |
| **Gemini 2.0 Pro Experimental 02-05** | 2 | 1,000,000 | 50 |
| **Gemini 2.0 Flash Thinking Experimental** | 10 | 4,000,000 | 1,500 |
| **Gemini 1.5 Flash** | 15 | 1,000,000 | 1,500 |
| **Gemini 1.5 Flash-8B** | 15 | 1,000,000 | 1,500 |
| **Gemini 1.5 Pro** | 2 | 32,000 | 50 |
| **Imagen 3** | -- | -- | -- |
- RPM: Requests per minute
- TPM: Tokens per minute
- RPD: Requests per day
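
These limits translate directly into pacing. At 15 RPM (Gemini 2.0 Flash), a steady rate is one request every 60 / 15 = 4 seconds, and at that pace the 1,500 RPD quota runs out after 1,500 / 15 = 100 minutes of continuous use. A client-side sliding-window limiter avoids hitting either limit. The `RateLimiter` class below is a hypothetical sketch, not part of the project or of any Gemini SDK.

```python
import time
from collections import deque
from typing import Callable, Deque


class RateLimiter:
    """Client-side sliding-window limiter for requests per minute and per day."""

    def __init__(self, rpm: int, rpd: int, clock: Callable[[], float] = time.monotonic):
        self.rpm, self.rpd, self.clock = rpm, rpd, clock
        self.minute: Deque[float] = deque()  # timestamps within the last 60 s
        self.day: Deque[float] = deque()     # timestamps within the last 24 h

    def _evict(self, now: float) -> None:
        """Drop timestamps that have left their window."""
        while self.minute and now - self.minute[0] >= 60:
            self.minute.popleft()
        while self.day and now - self.day[0] >= 86_400:
            self.day.popleft()

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        self._evict(now)
        waits = [0.0]
        if len(self.minute) >= self.rpm:
            waits.append(60 - (now - self.minute[0]))
        if len(self.day) >= self.rpd:
            waits.append(86_400 - (now - self.day[0]))
        return max(waits)

    def record(self) -> None:
        """Record that a request was just sent."""
        now = self.clock()
        self.minute.append(now)
        self.day.append(now)
```

Before each API call, `time.sleep(limiter.wait_time())` followed by `limiter.record()` keeps the client within, for example, `RateLimiter(rpm=15, rpd=1500)`.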
The project focuses mainly on using small models and free-tier API models to achieve accurate agentic behavior, so that it can also run on low-spec systems.