
Building an AI Assistant That Lives in Your Browser: A Portfolio Experiment

How I created a privacy-first, browser-native AI that can answer questions about my creative work—and what I learned about the future of personal AI assistants.

Jul 10, 2025

Imagine an AI assistant that knows your entire creative portfolio, runs completely offline, and never sends your data to anyone else's servers. This isn't a distant future—it's what I've built for my own website, and it's changing how I think about personal AI.

The Problem with Traditional AI Assistants

We're living in the golden age of AI assistants, but there's a catch: they all live in someone else's cloud. Whether it's ChatGPT, Claude, or any other service, your conversations flow through corporate servers, your data gets processed by third parties, and you're always dependent on an internet connection.

For a portfolio website—a deeply personal space where I showcase my creative work—this felt wrong. I wanted something different: an AI that could intelligently discuss my projects, understand my creative process, and help visitors explore my work, all while respecting privacy and working entirely offline.

So I built one.

What I Created

My portfolio now features an AI assistant that can answer questions about my work, suggest relevant projects, and engage in thoughtful conversations about art, technology, and creativity. But here's the remarkable part: it runs entirely in your browser.

No servers. No data collection. No internet required after the initial load.

[AI-generated image: a futuristic browser interface with an AI assistant overlay on a portfolio website]

The User Experience

When you visit my portfolio and navigate to the chat interface, here's what happens:

  • One-time setup: On your first visit, the AI models download directly to your browser (around 1-2GB total)
  • Lightning-fast subsequent visits: Once cached, the assistant loads instantly—faster than most websites
  • Complete offline capability: After initial setup, everything works without an internet connection
  • Persistent conversations: Your chat history persists as you navigate around the site
  • Global availability: The assistant's status is visible from any page, and you can return to conversations anytime
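The persistence behavior above can be sketched with a small storage wrapper. This is a minimal illustration, not the site's actual implementation: the `ChatStore` class and the `"chat-history"` key are hypothetical, and any object with `getItem`/`setItem` (such as the browser's `localStorage`) can back it.

```typescript
// Minimal sketch of persistent chat history; ChatStore and the
// "chat-history" key are illustrative, not the production code.
interface StringStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

class ChatStore {
  constructor(private storage: StringStorage, private key = "chat-history") {}

  // Read the full history, or an empty list if nothing is stored yet.
  load(): ChatMessage[] {
    const raw = this.storage.getItem(this.key);
    return raw ? (JSON.parse(raw) as ChatMessage[]) : [];
  }

  // Append one message and write the history back to storage.
  append(message: ChatMessage): void {
    const history = this.load();
    history.push(message);
    this.storage.setItem(this.key, JSON.stringify(history));
  }
}
```

In a browser, `new ChatStore(window.localStorage)` would let conversations survive page navigation, which is all "persistence" means here.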

The Technical Architecture

Building a browser-native AI assistant required solving several complex challenges. Here's how I did it:

RAG (Retrieval-Augmented Generation) System

The heart of the assistant is a sophisticated RAG implementation that processes all my portfolio content:

  • Content Processing: Every blog post, project description, and piece of creative work gets converted into searchable vector embeddings
  • Intelligent Chunking: Using LangChain's advanced text splitting, content is broken down intelligently while preserving context
  • Hybrid Search: The system combines semantic similarity with category-specific boosting to find the most relevant information
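As a rough sketch of the hybrid-scoring idea, a chunk's final score can be its cosine similarity to the query embedding plus a per-category boost. The function names, weights, and boost values below are illustrative assumptions, not the production code:

```typescript
interface Chunk {
  text: string;
  category: string;      // e.g. "blog", "project", "artwork"
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Hybrid score: semantic similarity plus a category-specific boost.
function hybridScore(query: number[], chunk: Chunk, boosts: Record<string, number>): number {
  return cosine(query, chunk.embedding) + (boosts[chunk.category] ?? 0);
}

// Return the k highest-scoring chunks for a query embedding.
function topK(query: number[], chunks: Chunk[], boosts: Record<string, number>, k: number): Chunk[] {
  return [...chunks]
    .sort((x, y) => hybridScore(query, y, boosts) - hybridScore(query, x, boosts))
    .slice(0, k);
}
```

The boost table is what makes the search "category-aware": a question about a project can favor `project` chunks even when a blog post is semantically closer.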

WebLLM Integration

The real magic happens with WebLLM, which brings large language models directly to the browser:

  • Embedding Model: snowflake-arctic-embed-m-q0f32-MLC-b4 for semantic search
  • Language Model: gemma-2-2b-it-q4f32_1-MLC-1k for response generation
  • Hardware Acceleration: Automatically leverages GPU acceleration via WebGPU when available
  • Cross-Platform Performance: Optimized for various devices and browsers
  • Optimized Performance: Smart token management and context window optimization

WebGPU Acceleration

The assistant requires WebGPU for AI inference; this isn't just a performance optimization, it's essential for functionality. WebLLM cannot run without WebGPU support, so making sure your browser has it enabled is critical:

Desktop Browsers:

  • Chrome/Edge: WebGPU enabled by default on recent versions
  • Safari (macOS): Enable in Settings > Advanced > "Show features for web developers", then Settings > Feature Flags > Enable WebGPU
  • Firefox: Experimental support, may require flags

Mobile Devices:

  • Safari (iOS): Settings > Apps > Safari > Advanced > Feature Flags > Enable WebGPU (iOS 18+)
  • Chrome (Android): Generally enabled by default on recent versions

Important: If WebGPU is not available in your browser, the AI assistant will not function. The system will detect WebGPU availability and guide you through enabling it if needed.
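Detection along these lines is straightforward: WebGPU exposes itself as `navigator.gpu`, and requesting an adapter confirms usable hardware. A minimal sketch follows; the `checkWebGPU` helper and its messages are illustrative, not the site's actual code:

```typescript
// Shape of the WebGPU entry point we probe for; in a real browser
// this is navigator.gpu. The helper below is an illustrative sketch.
interface GPULike {
  requestAdapter(): Promise<object | null>;
}

async function checkWebGPU(nav: { gpu?: GPULike }): Promise<{ ok: boolean; reason?: string }> {
  if (!nav.gpu) {
    return { ok: false, reason: "WebGPU is not exposed; enable it via your browser's feature flags." };
  }
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { ok: false, reason: "WebGPU is exposed but no suitable GPU adapter was found." };
  }
  return { ok: true };
}
```

In the browser you would call `checkWebGPU(navigator)` before initializing WebLLM and surface `reason` to the user instead of failing silently.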

[Diagram: flow from user query → content retrieval → AI response]

Multi-Context React Architecture

To ensure excellent performance and user experience, I built a sophisticated state management system:

// Simplified architecture overview
PortfolioAssistantProvider
├── AIAssistantProvider (manages WebLLM engine)
└── ChatHistoryProvider (manages conversations)

This architecture provides:

  • Global state persistence across page navigation
  • Performance optimization through intelligent memoization
  • Separation of concerns for maintainable code
  • Progressive loading with clear user feedback

The Numbers: Performance That Surprised Me

After extensive testing across different devices and network conditions, the results were impressive:

Hardware Performance

  • NVIDIA RTX Desktop: Models load in 30-45 seconds, responses in 2-5 seconds (with WebGPU)
  • Apple M1 MacBook: Excellent efficiency, 45-60 second initial load (requires WebGPU)
  • Modern smartphones: Usable performance, though initial download takes longer (WebGPU required)
  • Subsequent visits: 1-3 second load times across all devices
  • WebGPU Requirement: essential for all WebLLM functionality, not optional

Browser Compatibility Requirements

WebGPU is mandatory for the AI assistant to function:

  • WebGPU is what lets your browser run WebLLM models at all
  • Check your browser's compatibility and enable feature flags if needed
  • Without it, the assistant cannot initialize or respond to queries

Browser Recommendations:

  • Desktop: Chrome or Edge (best WebGPU support out of the box)
  • macOS: Safari with WebGPU enabled, or Chrome
  • iOS: Safari with the WebGPU feature flag enabled (iOS 18+)
  • Android: Chrome with latest updates

Network Impact

  • Initial download: 1-2GB (one-time, cached permanently)
  • Runtime bandwidth: Zero—everything runs offline
  • Cache efficiency: 99% hit rate for returning visitors

User Experience Metrics

  • Time to first interaction: Around 60 seconds (first visit), around 3 seconds (returning)
  • Response quality: High relevance with source citations
  • Conversation persistence: 100% across page navigation
  • Error rate: Less than 1% thanks to robust context management

[Image: performance visualization with flowing geometric patterns representing data flow and processing speeds]

What Makes This Special

Complete Privacy

This isn't just marketing speak—the assistant literally cannot send your data anywhere:

  • No network requests during operation
  • Local-only processing for all AI inference
  • Browser storage for conversation history
  • Your device, your data with full control

Offline-First Design

After the initial model download, the assistant works completely offline:

  • No internet dependency for core functionality
  • Airplane mode compatible once initialized
  • Reliable performance regardless of network conditions
  • Future-proof against service outages or API changes

Portfolio-Aware Intelligence

Unlike generic AI assistants, this one deeply understands my work:

  • Project-specific knowledge about my creative process
  • Technical expertise in the tools and technologies I use
  • Creative context about my artistic approach and influences
  • Interactive exploration of interconnected projects and themes

[Image: abstract visualization of interconnected nodes and pathways representing user engagement patterns]

Challenges and Limitations

Building this system wasn't without obstacles:

Technical Challenges

Model Size vs. Performance: Finding the right balance between capability and download size required extensive testing. The 1024-token context window of Gemma-2-2b requires careful prompt engineering.

Context Window Management: With limited tokens available, I had to build sophisticated truncation and prioritization systems to ensure relevant information reaches the model.
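The truncation idea can be sketched as a simple token-budget loop: chunks arrive sorted by relevance, and the most relevant ones are packed into the prompt until the budget is spent. The ~4-characters-per-token estimate and the function names below are illustrative assumptions:

```typescript
// Rough token estimate: ~4 characters per token (illustrative heuristic,
// not a real tokenizer).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Pack relevance-sorted chunks into a fixed token budget,
// most relevant first; chunks that don't fit are skipped.
function packContext(chunksByRelevance: string[], budgetTokens: number): string[] {
  const selected: string[] = [];
  let used = 0;
  for (const chunk of chunksByRelevance) {
    const cost = estimateTokens(chunk);
    if (used + cost > budgetTokens) continue;
    selected.push(chunk);
    used += cost;
  }
  return selected;
}
```

With a 1024-token window shared between the system prompt, retrieved context, and the response, the budget passed in here is only a fraction of the full window.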

Cross-Platform Compatibility: Different browsers and devices handle WebLLM differently, requiring robust fallback mechanisms. WebGPU support varies across platforms, with some requiring manual feature flag activation.

Performance Optimization: Ensuring consistent performance across devices whose WebGPU implementations vary in maturity and speed. Because WebLLM depends on WebGPU, the system detects missing support up front rather than attempting a fallback.

Current Limitations

Potential for Hallucinations: Like all AI systems, the assistant can occasionally generate plausible-sounding but incorrect information. I've built in safeguards and clear disclaimers.

Knowledge Cutoff: The assistant only knows about content in my portfolio—it can't access real-time information or recent updates not yet processed.

Hardware Requirements: The assistant depends on WebGPU support in the browser and cannot function without it.

Browser Compatibility & Setup

For the AI assistant to run at all, your browser must support WebGPU acceleration:

Quick Setup Guide:

  1. Chrome/Edge (Desktop & Android): Usually works out of the box with recent versions
  2. Safari (macOS):
    • Go to Settings > Advanced > Check "Show features for web developers"
    • Then Settings > Feature Flags > Enable WebGPU
  3. Safari (iOS):
    • iOS 18+: Settings > Apps > Safari > Advanced > Feature Flags > Enable WebGPU
    • iOS 17 and below: Settings > Safari > Advanced > Feature Flags > Enable WebGPU
  4. Firefox: Limited experimental support, may require manual configuration

The assistant automatically detects whether WebGPU is available and walks you through enabling it if needed; since WebLLM depends on WebGPU, there is no slower fallback mode.

The Future of Personal AI

This experiment has convinced me that browser-native AI represents a fundamental shift in how we'll interact with personalized information:

Privacy by Default

When AI runs locally, privacy isn't a policy promise—it's a technical reality. Your data never leaves your device because it doesn't need to.

Personalization Without Surveillance

The assistant learns about my work through explicit content processing, not behavioral tracking or data mining.

Resilient and Independent

Local AI doesn't depend on corporate APIs, subscription models, or network connectivity. Once you have it, you own it.

[Image: conceptual design of distributed geometric patterns representing decentralized AI networks]

Looking Ahead

This is just the beginning. I'm already working on several enhancements:

Visual Understanding

Adding image processing capabilities so the assistant can discuss my visual artwork and sculptures in detail.

Real-Time Updates

Building a system to automatically incorporate new projects and blog posts without manual intervention.

Expanded Context

Exploring larger models and more sophisticated context management to enable longer, more nuanced conversations.

Open Source Components

Planning to release key components as open-source tools to help others build similar systems.

Building Your Own

Interested in creating something similar? Here are the key technologies and patterns that made this possible:

Essential Technologies

  • WebLLM: Browser-native LLM inference
  • LangChain: RAG pipeline and text processing
  • Vector Embeddings: Semantic search capabilities
  • React Context: State management and persistence

Key Design Patterns

  • Progressive Enhancement: Works without JavaScript, enhanced with AI
  • Graceful Degradation: Useful even when AI features aren't available
  • Privacy-First Architecture: Local processing with no external dependencies
  • Performance Optimization: Lazy loading, intelligent caching, and token management
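The lazy-loading pattern named above can be sketched as a one-function memoized initializer, so an expensive load (such as a model download) runs at most once per session. This is an illustrative sketch, not the site's actual code:

```typescript
// Lazy, cached initializer: the expensive async load runs once,
// and every later call reuses the same in-flight or settled promise.
function lazyOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => (cached ??= load());
}
```

Wrapping engine initialization this way means navigating between pages never re-triggers the load, which is what makes the "instant" return visits possible.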

Why This Matters

We're at an inflection point in AI development. The current paradigm—powerful but centralized AI services—offers incredible capabilities at the cost of privacy, independence, and control.

Browser-native AI represents a different path: one where AI enhances your digital life without requiring you to surrender your data or depend on external services. It's AI that serves you, not the other way around.

[Image: abstract composition contrasting centralized structures with distributed organic patterns]

For creators, artists, and professionals, this opens up possibilities we're just beginning to explore. Imagine:

  • Designers with AI assistants trained exclusively on their portfolio and creative process
  • Writers with AI companions that understand their voice and stylistic preferences
  • Researchers with AI tools trained on their personal knowledge base and methodologies
  • Teachers with AI assistants that know their curriculum and teaching style

The Bigger Picture

This portfolio assistant is more than a cool technical demo—it's a proof of concept for a more democratic, privacy-respecting future of AI. A future where sophisticated AI capabilities don't require surrendering your data to tech giants or depending on their continued goodwill.

The technology exists today. The tools are available. The only question is whether we'll choose to build AI systems that serve us as individuals, or continue down the path of centralized AI that treats us as products.

Try It Yourself

Ready to experience the future of personal AI? Visit my portfolio's chat interface and start a conversation. Ask about my projects, my creative process, or the technical details of how this assistant works.

Remember: everything happens locally on your device. Your questions, the AI's responses, and your conversation history never leave your browser.

Go to chat

[Image: call-to-action design with gradient flows and interactive elements suggesting engagement]


Technical Note: This system represents ongoing research into privacy-preserving AI. While I've implemented extensive safeguards, the assistant may occasionally generate inaccurate information. For critical project details or collaboration inquiries, please contact me directly.

Performance Note: Initial model download requires a modern browser and stable internet connection. Subsequent uses are much faster and work completely offline.

Want to discuss the technical implementation, explore collaboration opportunities, or share your own experiments with local AI? The assistant is ready to chat, or feel free to reach out directly!


This post is part of an ongoing series about the intersection of art, technology, and privacy. Follow along as I continue exploring how creative technologists can build more human-centered AI systems.