
Building an AI Assistant That Lives in Your Browser: A Portfolio Experiment

How I created a privacy-first, browser-native AI that can answer questions about my creative work—and what I learned about the future of personal AI assistants.

Jul 10, 2025

Imagine an AI assistant that knows your entire creative portfolio, runs completely offline, and never sends your data to anyone else's servers. This isn't a distant future—it's what I've built for my own website, and it's changing how I think about personal AI.

The Problem with Traditional AI Assistants

We're living in the golden age of AI assistants, but there's a catch: they all live in someone else's cloud. Whether it's ChatGPT, Claude, or any other service, your conversations flow through corporate servers, your data gets processed by third parties, and you're always dependent on an internet connection.

For a portfolio website—a deeply personal space where I showcase my creative work—this felt wrong. I wanted something different: an AI that could intelligently discuss my projects, understand my creative process, and help visitors explore my work, all while respecting privacy and working entirely offline.

So I built one.

What I Created

My portfolio now features an AI assistant that can answer questions about my work, suggest relevant projects, and engage in thoughtful conversations about art, technology, and creativity. But here's the remarkable part: it runs entirely in your browser.

No servers. No data collection. No internet required after the initial load.

[AI-generated image: a futuristic browser interface with an AI assistant overlay on a portfolio website]

The User Experience

When you visit my portfolio and navigate to the chat interface, here's what happens:

  • One-time setup: On your first visit, the AI models download directly to your browser (around 1-2GB total)
  • Lightning-fast subsequent visits: Once cached, the assistant loads instantly—faster than most websites
  • Complete offline capability: After initial setup, everything works without an internet connection
  • Persistent conversations: Your chat history persists as you navigate around the site
  • Global availability: The assistant's status is visible from any page, and you can return to conversations anytime
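The persistence behavior above can be sketched with a small storage wrapper. This is a minimal illustration, not the site's actual implementation: the `ChatStore` class and the `"chat-history"` key are hypothetical, and any object with `getItem`/`setItem` (such as the browser's `localStorage`) can back it.

```typescript
// Minimal sketch of persistent chat history; ChatStore and the
// "chat-history" key are illustrative, not the production code.
interface StringStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

class ChatStore {
  constructor(private storage: StringStorage, private key = "chat-history") {}

  // Read the full history, or an empty list if nothing is stored yet.
  load(): ChatMessage[] {
    const raw = this.storage.getItem(this.key);
    return raw ? (JSON.parse(raw) as ChatMessage[]) : [];
  }

  // Append one message and write the history back to storage.
  append(message: ChatMessage): void {
    const history = this.load();
    history.push(message);
    this.storage.setItem(this.key, JSON.stringify(history));
  }
}
```

In a browser, `new ChatStore(window.localStorage)` would let conversations survive page navigation, which is all "persistence" means here.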

The Technical Architecture

Building a browser-native AI assistant required solving several complex challenges. Here's how I did it:

RAG (Retrieval-Augmented Generation) System

The heart of the assistant is a sophisticated RAG implementation that processes all my portfolio content:

  • Content Processing: Every blog post, project description, and piece of creative work gets converted into searchable vector embeddings
  • Intelligent Chunking: Using LangChain's advanced text splitting, content is broken down intelligently while preserving context
  • Hybrid Search: The system combines semantic similarity with category-specific boosting to find the most relevant information
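As a rough sketch of the hybrid-scoring idea, a chunk's final score can be its cosine similarity to the query embedding plus a per-category boost. The function names, weights, and boost values below are illustrative assumptions, not the production code:

```typescript
interface Chunk {
  text: string;
  category: string;      // e.g. "blog", "project", "artwork"
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Hybrid score: semantic similarity plus a category-specific boost.
function hybridScore(query: number[], chunk: Chunk, boosts: Record<string, number>): number {
  return cosine(query, chunk.embedding) + (boosts[chunk.category] ?? 0);
}

// Return the k highest-scoring chunks for a query embedding.
function topK(query: number[], chunks: Chunk[], boosts: Record<string, number>, k: number): Chunk[] {
  return [...chunks]
    .sort((x, y) => hybridScore(query, y, boosts) - hybridScore(query, x, boosts))
    .slice(0, k);
}
```

The boost table is what makes the search "category-aware": a question about a project can favor `project` chunks even when a blog post is semantically closer.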

WebLLM Integration

The real magic happens with WebLLM, which brings large language models directly to the browser:

  • Embedding Model: snowflake-arctic-embed-m-q0f32-MLC-b4 for semantic search
  • Language Model: gemma-2-2b-it-q4f32_1-MLC-1k for response generation
  • Hardware Acceleration: Automatically leverages GPU acceleration via WebGPU when available
  • Cross-Platform Performance: Optimized for various devices and browsers
  • Optimized Performance: Smart token management and context window optimization

WebGPU Acceleration

The assistant requires WebGPU for AI inference; this isn't just a performance optimization, it's essential for functionality. WebLLM cannot run without WebGPU support, so making sure your browser has it enabled is critical:

Desktop Browsers:

  • Chrome/Edge: WebGPU enabled by default on recent versions
  • Safari (macOS): Enable in Settings > Advanced > "Show features for web developers", then Settings > Feature Flags > Enable WebGPU
  • Firefox: Experimental support, may require flags

Mobile Devices:

  • Safari (iOS): Settings > Apps > Safari > Advanced > Feature Flags > Enable WebGPU (iOS 18+)
  • Chrome (Android): Generally enabled by default on recent versions

Important: If WebGPU is not available in your browser, the AI assistant will not function. The system will detect WebGPU availability and guide you through enabling it if needed.
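Detection along these lines is straightforward: WebGPU exposes itself as `navigator.gpu`, and requesting an adapter confirms usable hardware. A minimal sketch follows; the `checkWebGPU` helper and its messages are illustrative, not the site's actual code:

```typescript
// Shape of the WebGPU entry point we probe for; in a real browser
// this is navigator.gpu. The helper below is an illustrative sketch.
interface GPULike {
  requestAdapter(): Promise<object | null>;
}

async function checkWebGPU(nav: { gpu?: GPULike }): Promise<{ ok: boolean; reason?: string }> {
  if (!nav.gpu) {
    return { ok: false, reason: "WebGPU is not exposed; enable it via your browser's feature flags." };
  }
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { ok: false, reason: "WebGPU is exposed but no suitable GPU adapter was found." };
  }
  return { ok: true };
}
```

In the browser you would call `checkWebGPU(navigator)` before initializing WebLLM and surface `reason` to the user instead of failing silently.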

[Diagram: flow from user query → content retrieval → AI response]

Multi-Context React Architecture

To ensure excellent performance and user experience, I built a sophisticated state management system:

// Simplified architecture overview
PortfolioAssistantProvider
├── AIAssistantProvider (manages WebLLM engine)
└── ChatHistoryProvider (manages conversations)

This architecture provides:

  • Global state persistence across page navigation
  • Performance optimization through intelligent memoization
  • Separation of concerns for maintainable code
  • Progressive loading with clear user feedback

The Numbers: Performance That Surprised Me

After extensive testing across different devices and network conditions, the results were impressive:

Hardware Performance

  • NVIDIA RTX Desktop: Models load in 30-45 seconds, responses in 2-5 seconds (with WebGPU)
  • Apple M1 MacBook: Excellent efficiency, 45-60 second initial load (requires WebGPU)
  • Modern smartphones: Usable performance, though initial download takes longer (WebGPU required)
  • Subsequent visits: 1-3 second load times across all devices
  • WebGPU Requirement: essential for all WebLLM functionality, not optional

Browser Compatibility Requirements

WebGPU is mandatory for the AI assistant to function:

  • WebGPU is what lets your browser run WebLLM models at all
  • Check your browser's compatibility and enable feature flags if needed
  • Without it, the assistant cannot initialize or respond to queries

Browser Recommendations:

  • Desktop: Chrome or Edge (best WebGPU support out of the box)
  • macOS: Safari with WebGPU enabled, or Chrome
  • iOS: Safari with the WebGPU feature flag enabled (iOS 18+)
  • Android: Chrome with latest updates

Network Impact

  • Initial download: 1-2GB (one-time, cached permanently)
  • Runtime bandwidth: Zero—everything runs offline
  • Cache efficiency: 99% hit rate for returning visitors

User Experience Metrics

  • Time to first interaction: Around 60 seconds (first visit), around 3 seconds (returning)
  • Response quality: High relevance with source citations
  • Conversation persistence: 100% across page navigation
  • Error rate: Less than 1% thanks to robust context management

[Image: performance visualization with flowing geometric patterns representing data flow and processing speeds]

What Makes This Special

Complete Privacy

This isn't just marketing speak—the assistant literally cannot send your data anywhere:

  • No network requests during operation
  • Local-only processing for all AI inference
  • Browser storage for conversation history
  • Your device, your data with full control

Offline-First Design

After the initial model download, the assistant works completely offline:

  • No internet dependency for core functionality
  • Airplane mode compatible once initialized
  • Reliable performance regardless of network conditions
  • Future-proof against service outages or API changes

Portfolio-Aware Intelligence

Unlike generic AI assistants, this one deeply understands my work:

  • Project-specific knowledge about my creative process
  • Technical expertise in the tools and technologies I use
  • Creative context about my artistic approach and influences
  • Interactive exploration of interconnected projects and themes

[Image: abstract visualization of interconnected nodes and pathways representing user engagement patterns]

Challenges and Limitations

Building this system wasn't without obstacles:

Technical Challenges

Model Size vs. Performance: Finding the right balance between capability and download size required extensive testing. The 1024-token context window of Gemma-2-2b requires careful prompt engineering.

Context Window Management: With limited tokens available, I had to build sophisticated truncation and prioritization systems to ensure relevant information reaches the model.
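The truncation idea can be sketched as a simple token-budget loop: chunks arrive sorted by relevance, and the most relevant ones are packed into the prompt until the budget is spent. The ~4-characters-per-token estimate and the function names below are illustrative assumptions:

```typescript
// Rough token estimate: ~4 characters per token (illustrative heuristic,
// not a real tokenizer).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Pack relevance-sorted chunks into a fixed token budget,
// most relevant first; chunks that don't fit are skipped.
function packContext(chunksByRelevance: string[], budgetTokens: number): string[] {
  const selected: string[] = [];
  let used = 0;
  for (const chunk of chunksByRelevance) {
    const cost = estimateTokens(chunk);
    if (used + cost > budgetTokens) continue;
    selected.push(chunk);
    used += cost;
  }
  return selected;
}
```

With a 1024-token window shared between the system prompt, retrieved context, and the response, the budget passed in here is only a fraction of the full window.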

Cross-Platform Compatibility: Different browsers and devices handle WebLLM differently, requiring robust fallback mechanisms. WebGPU support varies across platforms, with some requiring manual feature flag activation.

Performance Optimization: Ensuring consistent performance across devices whose WebGPU implementations vary in maturity and speed. Because WebLLM depends on WebGPU, the system detects missing support up front rather than attempting a fallback.

Current Limitations

Potential for Hallucinations: Like all AI systems, the assistant can occasionally generate plausible-sounding but incorrect information. I've built in safeguards and clear disclaimers.

Knowledge Cutoff: The assistant only knows about content in my portfolio—it can't access real-time information or recent updates not yet processed.

Hardware Requirements: The assistant depends on WebGPU support in the browser and cannot function without it.

Browser Compatibility & Setup

For the AI assistant to run at all, your browser must support WebGPU acceleration:

Quick Setup Guide:

  1. Chrome/Edge (Desktop & Android): Usually works out of the box with recent versions
  2. Safari (macOS):
    • Go to Settings > Advanced > Check "Show features for web developers"
    • Then Settings > Feature Flags > Enable WebGPU
  3. Safari (iOS):
    • iOS 18+: Settings > Apps > Safari > Advanced > Feature Flags > Enable WebGPU
    • iOS 17 and below: Settings > Safari > Advanced > Feature Flags > Enable WebGPU
  4. Firefox: Limited experimental support, may require manual configuration

The assistant automatically detects whether WebGPU is available and walks you through enabling it if needed; since WebLLM depends on WebGPU, there is no slower fallback mode.

The Future of Personal AI

This experiment has convinced me that browser-native AI represents a fundamental shift in how we'll interact with personalized information:

Privacy by Default

When AI runs locally, privacy isn't a policy promise—it's a technical reality. Your data never leaves your device because it doesn't need to.

Personalization Without Surveillance

The assistant learns about my work through explicit content processing, not behavioral tracking or data mining.

Resilient and Independent

Local AI doesn't depend on corporate APIs, subscription models, or network connectivity. Once you have it, you own it.

[Image: conceptual design of distributed geometric patterns representing decentralized AI networks]

Looking Ahead

This is just the beginning. I'm already working on several enhancements:

Visual Understanding

Adding image processing capabilities so the assistant can discuss my visual artwork and sculptures in detail.

Real-Time Updates

Building a system to automatically incorporate new projects and blog posts without manual intervention.

Expanded Context

Exploring larger models and more sophisticated context management to enable longer, more nuanced conversations.

Open Source Components

Planning to release key components as open-source tools to help others build similar systems.

Building Your Own

Interested in creating something similar? Here are the key technologies and patterns that made this possible:

Essential Technologies

  • WebLLM: Browser-native LLM inference
  • LangChain: RAG pipeline and text processing
  • Vector Embeddings: Semantic search capabilities
  • React Context: State management and persistence

Key Design Patterns

  • Progressive Enhancement: Works without JavaScript, enhanced with AI
  • Graceful Degradation: Useful even when AI features aren't available
  • Privacy-First Architecture: Local processing with no external dependencies
  • Performance Optimization: Lazy loading, intelligent caching, and token management
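The lazy-loading pattern named above can be sketched as a one-function memoized initializer, so an expensive load (such as a model download) runs at most once per session. This is an illustrative sketch, not the site's actual code:

```typescript
// Lazy, cached initializer: the expensive async load runs once,
// and every later call reuses the same in-flight or settled promise.
function lazyOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => (cached ??= load());
}
```

Wrapping engine initialization this way means navigating between pages never re-triggers the load, which is what makes the "instant" return visits possible.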

Why This Matters

We're at an inflection point in AI development. The current paradigm—powerful but centralized AI services—offers incredible capabilities at the cost of privacy, independence, and control.

Browser-native AI represents a different path: one where AI enhances your digital life without requiring you to surrender your data or depend on external services. It's AI that serves you, not the other way around.

[Image: abstract composition contrasting centralized structures with distributed organic patterns]

For creators, artists, and professionals, this opens up possibilities we're just beginning to explore. Imagine:

  • Designers with AI assistants trained exclusively on their portfolio and creative process
  • Writers with AI companions that understand their voice and stylistic preferences
  • Researchers with AI tools trained on their personal knowledge base and methodologies
  • Teachers with AI assistants that know their curriculum and teaching style

The Bigger Picture

This portfolio assistant is more than a cool technical demo—it's a proof of concept for a more democratic, privacy-respecting future of AI. A future where sophisticated AI capabilities don't require surrendering your data to tech giants or depending on their continued goodwill.

The technology exists today. The tools are available. The only question is whether we'll choose to build AI systems that serve us as individuals, or continue down the path of centralized AI that treats us as products.

Try It Yourself

Ready to experience the future of personal AI? Visit my portfolio's chat interface and start a conversation. Ask about my projects, my creative process, or the technical details of how this assistant works.

Remember: everything happens locally on your device. Your questions, the AI's responses, and your conversation history never leave your browser.

Go to chat

[Image: call-to-action design with gradient flows and interactive elements suggesting engagement]


Technical Note: This system represents ongoing research into privacy-preserving AI. While I've implemented extensive safeguards, the assistant may occasionally generate inaccurate information. For critical project details or collaboration inquiries, please contact me directly.

Performance Note: Initial model download requires a modern browser and stable internet connection. Subsequent uses are much faster and work completely offline.

Want to discuss the technical implementation, explore collaboration opportunities, or share your own experiments with local AI? The assistant is ready to chat, or feel free to reach out directly!


This post is part of an ongoing series about the intersection of art, technology, and privacy. Follow along as I continue exploring how creative technologists can build more human-centered AI systems.