
Exploring M.AI 0.1: The Future of Personal AI Assistance Without the Price Tag

6/12/2026 · 4 min read


M.AI 0.1

A Personal AI Assistant That Runs on Your Infrastructure — With Zero Paid Dependencies

Voice + Text • 8 LLM Providers • Persistent Memory • GitHub Awareness • Project Automation

What It Is

M.AI 0.1 is a self-hosted personal AI assistant — voice and text — that orchestrates eight free LLM providers, remembers everything across sessions through a three-tier memory system, sees across the developer's entire GitHub portfolio, and executes real work on real projects through a sandboxed skill runner. It is governed by a written 19-rule constitution embedded in its system prompt, and its entire operating cost is zero.

This is not a wrapper around one API. It is a full assistant platform: routing engine, health monitoring, memory with vector embeddings, agentic tool execution, GitHub repository intelligence, and project integration — deployed with a single Docker Compose file that runs identically on a laptop, a Hostinger VPS, or Oracle's free ARM tier.

The Problem It Solves

Personal AI assistants today force a choice: pay monthly subscriptions for a hosted assistant you don't control, or run weak local models on hardware you don't have. Either way, your conversations, your keys, and your data live on someone else's terms.

M.AI 0.1 takes a third path: orchestrate the generous free tiers of eight different LLM providers — Gemini, Groq, Cerebras, SambaNova, OpenRouter, Hugging Face and more — behind one intelligent router, so the combined capacity rivals a paid plan while the cost stays at exactly zero. The infrastructure is yours. The keys are yours. The memory is yours.

Core Capabilities

• Voice and text chat through a browser UI with a live 3D orb visualization (Three.js)

• Speech-to-text via browser Web Speech API with Groq Whisper as cloud fallback — dual STT paths

• Text-to-speech via the browser's Speech Synthesis API: zero cost, and usable offline when the browser has locally installed voices

• Multi-provider LLM routing across 8 providers and 20+ models, selected per task type

• GitHub repository awareness — connects to the developer's GitHub account and answers questions about any project in the portfolio

• Three-tier memory: session memory, project knowledge, and persistent long-term memory with vector embeddings

• Agentic tool execution — up to 8 reasoning rounds per request, with native function calling on Gemini

• Project skill runner — imported projects expose skills (dev servers, builds, pipelines) the assistant can execute on command

• Persistent conversations that survive page refreshes and restarts

Hidden Technical Strengths

GitHub Portfolio Intelligence

M.AI connects directly to the developer's GitHub account and can inspect every repository in the portfolio — code, structure, READMEs, and project metadata. Ask it what a project does, how two projects differ, which repo contains a particular feature, or what the tech stack of any project is, and it answers from the actual source — not from memory or guesswork. This turns the assistant into a living index of the developer's entire body of work: a visitor, collaborator, or client can ask about any project and get an accurate, current answer drawn from the repositories themselves.

Capability-Aware Routing Engine

The router does not just fail over blindly. Every provider declares a capability matrix — what it can do (chat, code, vision, reasoning, embeddings, transcription) and how large its context window is. For each incoming task, the router checks key availability, current health status, capability fit, and whether the payload would exceed 90% of the provider's context window. Only then does it dispatch. Task-specific orderings live in a single config file: voice replies route to the fastest provider, deep reasoning routes to DeepSeek-R1, code routes to Qwen Coder.
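The selection step described above can be sketched roughly as follows. This is a minimal illustration, not M.AI's actual code: the type names, field names, and the `pickProvider` function are all assumptions made for the example. It also reflects the health rule stated in the next section, so a degraded provider is moved to the back of the ordering rather than excluded.

```typescript
// Hypothetical shapes: the post does not show M.AI's real types or config format.
type Capability = "chat" | "code" | "vision" | "reasoning" | "embeddings" | "transcription";

interface ProviderSpec {
  name: string;
  capabilities: Capability[];
  contextWindow: number; // in tokens
  hasKey: boolean;       // an API key is available for this request
  healthy: boolean;      // false while the provider is marked degraded
}

interface RoutedTask {
  needs: Capability;
  estimatedTokens: number;
}

// A payload may fill at most 90% of a provider's context window.
const CONTEXT_HEADROOM = 0.9;

// Walk a task-specific ordering and return the first provider that has a key,
// fits the capability, and has room for the payload. Degraded providers are
// moved to the back of the line rather than excluded.
function pickProvider(ordering: ProviderSpec[], task: RoutedTask): ProviderSpec | undefined {
  const eligible = ordering.filter(
    (p) =>
      p.hasKey &&
      p.capabilities.some((c) => c === task.needs) &&
      task.estimatedTokens <= p.contextWindow * CONTEXT_HEADROOM
  );
  const healthy = eligible.filter((p) => p.healthy);
  const degraded = eligible.filter((p) => !p.healthy);
  return healthy.concat(degraded)[0];
}
```

Passing a different `ordering` array per task type (voice, reasoning, code) would give the task-specific preference the post describes, while the four checks stay the same for every task.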

Self-Healing Provider Health System

Every provider call updates a Redis-backed health tracker. Three consecutive failures mark a provider as degraded — and the router silently reorders around it. A background timer probes degraded providers every five minutes and restores them automatically when they recover. Health state never blocks a call; it only informs preference. If Redis itself is down, an in-memory fallback takes over. The system degrades gracefully at every layer.
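As a rough sketch of the bookkeeping involved, the tracker below keeps state in process memory, which corresponds only to the fallback path the post mentions and not to the Redis-backed version. The class, method, and constant names are assumptions for illustration; only the three-failure threshold and the five-minute probe interval come from the description above.

```typescript
// In-memory sketch. The post describes a Redis-backed tracker; this mirrors
// only the in-process fallback, and every name here is hypothetical.
const FAILURE_THRESHOLD = 3;             // consecutive failures before "degraded"
const PROBE_INTERVAL_MS = 5 * 60 * 1000; // probe degraded providers every five minutes

interface HealthEntry {
  consecutiveFailures: number;
  degraded: boolean;
  lastCheckedAt: number; // ms timestamp of the last failure or failed probe
}

class HealthTracker {
  private entries: { [provider: string]: HealthEntry } = {};

  private entryFor(provider: string): HealthEntry {
    let entry = this.entries[provider];
    if (!entry) {
      entry = { consecutiveFailures: 0, degraded: false, lastCheckedAt: 0 };
      this.entries[provider] = entry;
    }
    return entry;
  }

  // Any success, from live traffic or from a probe, restores the provider.
  recordSuccess(provider: string): void {
    const entry = this.entryFor(provider);
    entry.consecutiveFailures = 0;
    entry.degraded = false;
  }

  recordFailure(provider: string, now: number): void {
    const entry = this.entryFor(provider);
    entry.consecutiveFailures += 1;
    entry.lastCheckedAt = now;
    if (entry.consecutiveFailures >= FAILURE_THRESHOLD) entry.degraded = true;
  }

  isDegraded(provider: string): boolean {
    return this.entryFor(provider).degraded;
  }

  // Degraded providers whose last failure or failed probe is at least five minutes old.
  dueForProbe(now: number): string[] {
    return Object.keys(this.entries).filter((name) => {
      const entry = this.entries[name];
      return !!entry && entry.degraded && now - entry.lastCheckedAt >= PROBE_INTERVAL_MS;
    });
  }
}
```

A background timer could call `dueForProbe(Date.now())`, send a cheap request to each returned provider, and report the outcome through `recordSuccess` or `recordFailure`. The router would read `isDegraded` only to reorder candidates, never to refuse a call.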

BYOK: Keys That Are Never Stored on the Server

API keys are entered in the browser, stored in localStorage, and sent as per-request headers. The server reads the header, uses the key for exactly one upstream call, and discards it. Nothing is persisted server-side. A user can point the hosted instance at their own keys knowing the host holds each key only in memory for the duration of a single request, a security model most commercial AI products cannot offer.
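The server side of that per-request flow might look something like the sketch below. The header name `x-llm-api-key` and both function names are invented for the example, since the post does not specify them. The point it illustrates is scope: the key exists only as a local variable for one upstream call.

```typescript
// Hypothetical header name: the post does not say what M.AI calls it.
const KEY_HEADER = "x-llm-api-key";

type RequestHeaders = { [name: string]: string | string[] | undefined };

// Pull the caller's key out of the request headers, or return null if absent.
function readRequestKey(headers: RequestHeaders): string | null {
  const raw = headers[KEY_HEADER];
  const first = Array.isArray(raw) ? raw[0] : raw;
  const key = first ? first.trim() : "";
  return key.length > 0 ? key : null;
}

// Hand the key to exactly one upstream call. It is never assigned to module
// state, cached, or logged, so it goes out of scope when the call returns.
function withRequestKey<T>(headers: RequestHeaders, callUpstream: (apiKey: string) => T): T {
  const key = readRequestKey(headers);
  if (key === null) {
    throw new Error("Missing " + KEY_HEADER + " header");
  }
  return callUpstream(key);
}
```

Because `callUpstream` is generic, it can just as well return a promise for an asynchronous provider request. Either way the handler keeps no reference to the key once the call has been made.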

Tool-Forwarding Intelligence

Function-calling schemas are forwarded only to providers that actually support them (Gemini). For providers that do not, tools are stripped before dispatch — preventing the schema-validation errors that plague naive multi-provider setups. The orchestrator then executes returned tool calls and feeds results back into the loop, up to 8 rounds.
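A simplified sketch of both halves, the schema stripping and the bounded tool loop, is shown below. All types and function names are hypothetical, the loop is written synchronously for brevity, and the behavior after the eighth round is a guess, since the post does not say what the orchestrator does when the cap is reached.

```typescript
// Hypothetical types and names throughout; only the behavior comes from the post.
interface ToolSchema { name: string; description: string }
interface ChatPayload { messages: string[]; tools?: ToolSchema[] }

// Assumption: a static allow-list. The post names Gemini as the provider
// with native function calling.
const TOOL_CAPABLE_PROVIDERS = ["gemini"];
const MAX_TOOL_ROUNDS = 8;

// Forward tool schemas only where they are supported; otherwise drop them.
function preparePayload(provider: string, payload: ChatPayload): ChatPayload {
  if (TOOL_CAPABLE_PROVIDERS.indexOf(provider) !== -1) return payload;
  return { messages: payload.messages };
}

type ModelTurn =
  | { kind: "answer"; text: string }
  | { kind: "tool_call"; tool: string; input: string };

// Run the model, execute any tool it asks for, feed the result back, and stop
// at the first plain answer or after eight rounds. Synchronous for brevity;
// a real orchestrator would await network calls here.
function runToolLoop(
  callModel: (history: string[]) => ModelTurn,
  runTool: (tool: string, input: string) => string,
  prompt: string
): string {
  const history = [prompt];
  for (let round = 0; round < MAX_TOOL_ROUNDS; round++) {
    const turn = callModel(history);
    if (turn.kind === "answer") return turn.text;
    history.push("tool " + turn.tool + " returned: " + runTool(turn.tool, turn.input));
  }
  // Assumption: what happens after the cap is not described in the post.
  return "Stopped after " + MAX_TOOL_ROUNDS + " tool rounds without a final answer.";
}
```

Returning a fresh object from `preparePayload` instead of deleting the `tools` field in place leaves the original payload untouched, so the same request can be retried against a tool-capable provider later.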

Physical Resource Caps, Not Financial Limits

Project skills run as child processes with explicit CPU, RAM, timeout, and concurrency limits defined per skill in the project manifest. The system never says 'you've hit your quota' — limits are physics, not finance. This is a deliberate constitutional rule.
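To make the idea concrete, here is one way the per-skill caps could be represented and checked. The field names and units are assumptions, because the post lists the four kinds of cap but not how the manifest spells them. The sketch covers only the timeout, memory, and concurrency checks; CPU limits are normally enforced by the operating system or container runtime, so they appear here as a declared value only.

```typescript
// Hypothetical manifest fields: the post lists the four kinds of cap
// (CPU, RAM, timeout, concurrency) but not their names or units.
interface SkillCaps {
  cpuPercent: number;
  memoryMb: number;
  timeoutMs: number;
  maxConcurrent: number;
}

interface SkillUsage { memoryMb: number; elapsedMs: number }

// Which physical cap, if any, a running skill has exceeded.
function breachedCap(caps: SkillCaps, usage: SkillUsage): "timeout" | "memory" | null {
  if (usage.elapsedMs > caps.timeoutMs) return "timeout";
  if (usage.memoryMb > caps.memoryMb) return "memory";
  return null;
}

// Admission gate for the concurrency cap: a skill starts only if a slot is free.
class SkillSlots {
  private running: { [skill: string]: number } = {};

  tryAcquire(skill: string, caps: SkillCaps): boolean {
    const current = this.running[skill] || 0;
    if (current >= caps.maxConcurrent) return false;
    this.running[skill] = current + 1;
    return true;
  }

  release(skill: string): void {
    const current = this.running[skill] || 0;
    this.running[skill] = Math.max(0, current - 1);
  }
}
```

A runner built this way would call `tryAcquire` before spawning the child process, poll `breachedCap` while it runs, and call `release` when it exits or is killed. Nothing in that path counts usage against a quota.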

A Written Constitution

Nineteen rules are embedded in the system prompt and loaded at startup: amplify the human rather than replace them; log everything; a panic stop command is always valid; zero paid services ever; more providers equals more capability; self-modification requires passing a six-stage gate with human approval. The assistant's behavior is governed by a document, not by vibes.

Three-Tier Memory With Embedding Search

Tier one is in-process session memory. Tier two is project knowledge loaded at boot. Tier three is long-term memory persisted as JSON with vector embeddings — recalled by cosine-similarity search, with keyword search as fallback when embeddings are unavailable. The assistant remembers past conversations across restarts and retrieves them by meaning, not just exact words.
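The tier-three recall path can be illustrated with a small sketch. The record shape and the `recall` signature are assumptions, and the keyword scoring (counting how many query words appear in the text) is just one simple fallback, not necessarily the one M.AI uses. Producing the query embedding is left to the caller; passing `null` stands in for the case where embeddings are unavailable.

```typescript
// Hypothetical record shape: the post says only that long-term memory is
// persisted as JSON with vector embeddings.
interface MemoryRecord { text: string; embedding?: number[] }

// Cosine similarity: dot(a, b) / (|a| * |b|), defined as 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topK most relevant records. With a query embedding, rank by
// cosine similarity; without one, fall back to counting matching words.
function recall(
  records: MemoryRecord[],
  query: string,
  queryEmbedding: number[] | null,
  topK: number
): MemoryRecord[] {
  const scored: { record: MemoryRecord; score: number }[] = [];
  if (queryEmbedding !== null) {
    for (const record of records) {
      if (record.embedding) {
        scored.push({ record, score: cosineSimilarity(record.embedding, queryEmbedding) });
      }
    }
  } else {
    const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
    for (const record of records) {
      const text = record.text.toLowerCase();
      const hits = words.filter((w) => text.indexOf(w) !== -1).length;
      if (hits > 0) scored.push({ record, score: hits });
    }
  }
  scored.sort((x, y) => y.score - x.score);
  return scored.slice(0, topK).map((s) => s.record);
}
```

A linear scan like this is fine for a personal assistant's memory file of a few thousand entries. A much larger store would usually call for an approximate nearest-neighbor index instead.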

Project Integration via Manifest

Any project can be onboarded through an intake wizard. Its manifest declares skills, content, and resource caps; the orchestrator loads it at boot, injects its context into the system prompt, and exposes its skills over HTTP. The first integrated project is Palawan Creator — a complete Instagram automation system the assistant can drive by voice.

Zero-Cost Operating Model

Every provider in the chain offers a genuinely free tier with no credit card required:

• Google Gemini — flagship models plus free embeddings, 1M-token context

• Groq: Llama 70B and Whisper STT, served at very low latency

• Cerebras and SambaNova — additional Llama-class capacity on specialized hardware

• OpenRouter — free access to DeepSeek-R1 and Qwen3-235B class models

• Hugging Face — embeddings and LLM fallback

• GitHub API — free repository access for portfolio intelligence

One provider's free tier is a toy. Eight free tiers behind an intelligent router is a production assistant.

Deployment

• One docker-compose.yml runs identically on a laptop, Hostinger VPS, or Oracle's Always Free ARM tier — no special cases

• Three containers: Node.js orchestrator, Next.js web UI (standalone build for minimal image), and Redis — all with health checks

• Live in production at spaienoids.com behind Nginx with Let's Encrypt SSL

• Builds on both ARM and x86 — the Oracle free tier can serve as automatic failover

• Redis optional; projects mounted read-only; secrets injected at deploy time, never committed

M.AI 0.1 is what a personal assistant looks like when you own the infrastructure, the memory, and the rules, and pay nothing for any of it.