Field Notes & Write-Ups — Prasiddha Thapaliya

// TL;DR NIJI Chat is a privacy-first, fully offline AI chat application built with Flutter and powered by Google's Gemma model running entirely on-device. No API keys, no cloud calls, no data leaving the user's phone. Available on iOS and Android, NIJI demonstrates that powerful generative AI doesn't require an internet connection — or trust in a third party.

View NIJI Chat on the App Store → View NIJI Chat on Google Play →

What Is NIJI?

NIJI (日本語で "rainbow") is a cross-platform chat application where every message is processed locally on the device. There is no server, no subscription, and no network request made when you hit send. The model runs in-process — on your CPU or NPU — and the response is generated entirely within the app sandbox.

The idea originated from a simple frustration: every AI assistant available demanded either a cloud account, a paid API key, or both. For users who handle sensitive conversations — medical, legal, personal — that model is a non-starter. NIJI was built to prove the alternative is not only possible, but polished.

100%

On-Device

0

Cloud Calls

2

Platforms

The Problem with Cloud AI

Most AI chat applications send your prompts to a remote server — typically OpenAI, Anthropic, or Google. This means every message you type is:

Transmitted over the network (interception risk)
Stored on a third-party server (retention policy risk)
Potentially used for model training (privacy policy risk)
Subject to outages, rate limits, and paid usage caps

For enterprise users, healthcare workers, or anyone handling confidential data, cloud AI is simply not an option. The privacy tradeoff is too large.

Architecture: Flutter + Gemma

NIJI is built with Flutter for the UI and cross-platform targeting, and Google's Gemma as the underlying language model. Gemma's smaller variants (2B parameters) are specifically designed for on-device inference — optimized for mobile CPU and NPU execution with quantization support.

// Frontend

Flutter (Dart) — cross-platform UI
Custom chat interface components
Streaming token output UI
Conversation history management

// AI Runtime

Google Gemma 2B (INT4 quantized)
MediaPipe LLM Inference API
On-device model loading
Native CPU/NPU acceleration

// iOS Build

Xcode — iOS packaging & signing
TestFlight — beta distribution
App Store Connect — production
Core ML acceleration support

// Android Build

Android Studio — APK/AAB builds
Play Console — internal testing
GPU/NPU delegate config
ProGuard & R8 optimization

The key integration point is MediaPipe's LLM Inference API, which handles the heavy lifting of model loading, tokenization, and inference scheduling. Flutter communicates with the native runtime via platform channels, keeping the Dart layer focused on UI and conversation logic while the native layer manages the model lifecycle.

Cross-Platform Deployment

One of Flutter's core promises is "write once, run anywhere" — but on-device AI adds real complexity to that equation. Each platform has its own inference backend, hardware acceleration model, and performance constraints.

iOS Deployment

On iOS, the model runs via Core ML and the Apple Neural Engine (ANE) on devices with A12 Bionic or newer. The quantized Gemma 2B model fits within the memory budget for most modern iPhones and processes tokens at a comfortable rate for real-time chat. The app was distributed through TestFlight for beta testing before submission to the App Store.

Android Deployment

Android inference is handled through MediaPipe's GPU delegate on capable devices, falling back to CPU inference on older hardware. The APK packaging required careful ProGuard configuration to ensure the native inference libraries were not stripped during optimization. Performance varies significantly across the Android device landscape — a known challenge with on-device AI.

Privacy Benefits

Zero network dependency: The app functions identically with no internet connection — on a plane, in a secure facility, or in airplane mode.
No telemetry or logging: No analytics SDKs, no crash reporting that transmits data, no session tokens sent to external services.
Air-gap capable: Once the model is loaded, the app can run on a device that has never connected to the internet — suitable for high-security environments.
Conversation isolation: All history is stored in local app storage, encrypted by the OS sandbox. Deleting the app removes all data completely.
No account required: No sign-up, no email, no identity attached to usage.

Performance Considerations

On-device LLMs are not without trade-offs. The Gemma 2B model, while compact by LLM standards, still requires:

~1.5 GB of RAM for model weights in INT4 quantized form
3–6 seconds for initial model load on mid-range hardware
~15–30 tokens/sec generation speed on modern iPhones with ANE
Thermal throttling during extended sessions on thinner devices

These constraints are real but manageable. For the conversational use case NIJI targets, the response latency is acceptable — and it is a latency that comes with complete privacy rather than exposing data to a cloud provider.

The Future of Local AI

NIJI is a proof of concept for what I believe is the inevitable direction of AI applications: models that live on the device, not in a data center. As hardware continues to improve — Apple's Neural Engine, Qualcomm's Hexagon NPU, and Google's Tensor chips — the performance gap between cloud and on-device inference will continue to narrow.

The roadmap for NIJI includes:

Multi-modal support — image and document analysis without cloud upload
Model selection UI — swap between Gemma, Llama, and Phi variants
Encrypted local conversation export for personal knowledge management
Enterprise edition with MDM-compatible deployment for secure organizations

The broader vision: every sensitive conversation deserves to stay private. AI that respects that principle isn't a niche product — it's the future of responsible AI deployment.

"Privacy is not a feature — it's the foundation. NIJI was built on the premise that powerful AI and complete data sovereignty are not mutually exclusive. You should never have to choose between a smart assistant and keeping your conversations private." — Prasiddha Thapaliya, NorthBridge IT Solutions

// TL;DR As an IT technician working primarily on a MacBook Air M2, I hit a wall: ARM architecture meant no native x86 Windows — limiting my diagnostic and virtualization work. Rather than spin up a cloud VM, I built a custom PC from scratch to gain real hands-on experience. Phase 1 covered the hardware build and Windows 11 installation. Phase 2 evolved it into a virtual enterprise network lab running pfSense, Windows 11, and Linux Mint VMs with policy-based firewall rules.

PHASE 1 Hardware Build & OS Deployment ✓ Complete

The Problem

My MacBook Air M2 is a great daily driver — but its ARM-based architecture meant that installing Windows required ARM-specific ISO images, cutting me off from the full x86 Windows environment I needed for proper IT diagnostics. Cloud VMs like Azure were an option, but I wanted the real thing: hands-on hardware experience from scratch.

MacBook Air M2 — the primary machine that started it all

Hardware Specifications

Every component was chosen deliberately — balancing budget, performance headroom for virtualization, and the absence of a discrete GPU (which the Ryzen 5600G's integrated Radeon handles cleanly).

Component	Spec
Processor	AMD Ryzen 5 5600G — 6C/12T, 4.4 GHz, integrated Radeon graphics
Motherboard	MSI B550M PRO-VDH WiFi ProSeries (PCIe 4.0, M.2, Wi-Fi 6)
RAM	TEAMGROUP T-Create Expert 32GB DDR4 3200MHz CL16
Storage	Timetec 512GB SSD M.2 SATA III 3D NAND
Power Supply	Thermaltake Smart BX1 650W 80+ Bronze
Case	Thermaltake Tempered Glass Micro ATX Gaming Case
Thermal Paste	Corsair TM30 Performance Thermal Paste
Accessories	Amazon Basics 6-Outlet Surge Protector (790 Joule)

The 5600G was a strategic pick — strong multi-core performance for nested virtualization with no discrete GPU needed, keeping costs down. 32GB of RAM ensures real headroom when running multiple VMs simultaneously under VMware or VirtualBox.

Parts collection — all components laid out before assembly

BIOS Configuration

Two critical configurations were made during the initial BIOS setup:

A-XMP Profile enabled — unlocks the RAM's maximum supported frequency of 3200MHz. Without this, DDR4 defaults to JEDEC speeds (typically 2133MHz), leaving performance on the table.
TPM 2.0 activated — required for Windows 11's Secure Boot and system integrity features. The B550M is TPM 2.0 compliant out of the box once enabled in firmware.

BIOS configuration — A-XMP enabled, TPM 2.0 active

OS Installation

A clean installation of Windows 11 Home (version 24H2) was performed from a standard x86 ISO — exactly the environment that wasn't accessible on the M2. No bloatware, no upgrades-in-place. Fresh slate, optimized for diagnostics and future virtualization work.

Windows 11 Home (24H2) — clean installation complete

Phase 1 — Status Update

Hardware assembly & troubleshooting — CPU, RAM, PSU, storage, thermals all validated
System-level diagnostics — CPU, memory, and storage health confirmed post-build
BIOS/UEFI configuration — A-XMP and TPM 2.0 configured for peak performance & Win11 compatibility
Windows 11 Home (24H2) — clean install with post-install performance optimization
Virtualization setup — VMware / Hyper-V with Ubuntu & Kali (moved to Phase 2)
Security config & benchmarking — BitLocker, Secure Boot, Defender (moved to Phase 2)

PHASE 2 Virtual Enterprise Network Lab ✓ Complete

Phase 2 transformed OSmith from a workstation into a simulated enterprise network — firewall enforcement, selective device-level access control, DNS filtering, and cross-platform policy testing, all running in VMware Workstation.

Virtual Machine Setup

Three VMs were deployed on VMware Workstation using a Custom (VMnet11) host-only network, simulating a real internal LAN where all devices communicate through a managed gateway.

🛡️

pfSense 2.7.2

Virtual firewall & DHCP server

🪟

Windows 11 VM

Standard client workstation

🐧

Linux Mint VM

Admin / test machine

All VMs sit on VMnet11. pfSense manages IP allocation via 192.168.1.0/24 and handles all traffic routing between the virtual LAN and the WAN (NAT).

VMware Workstation — three VMs on VMnet11 custom network

pfSense Configuration

pfSense was accessed via its WebConfigurator at https://192.168.1.1 from a client VM. Key interface and service configuration:

WAN: Connected to NAT for external internet access
LAN: Connected to VMnet11 at 192.168.1.1/24
DHCP Server: Enabled for LAN — auto-assigns IPs to client VMs
DNS Resolver: Configured with host overrides for local hostname resolution

pfSense WebConfigurator — LAN/WAN interface overview

Blocking ChatGPT Across All Devices

The first rule test was a global block: prevent all LAN clients from reaching chat.openai.com by adding a LAN firewall rule targeting the FQDN alias.

Result: both Windows 11 and Linux Mint received "Site can't be reached" and "Server not found" errors — DNS resolution was intercepted at the gateway before any HTTPS request was made.

Both VMs blocked from ChatGPT — firewall rule in effect

Conditional Access: Windows Blocked, Linux Allowed

The real test was demonstrating per-device policy enforcement — a core enterprise firewall concept. The goal: Linux Mint can reach ChatGPT, Windows 11 cannot.

Implementation

Created a Firewall Alias Ubuntu_Allow pointing to Linux Mint's IP address
Created a Firewall Alias Block_ChatGPTs targeting the FQDN chat.openai.com
Rule 1 (Allow): Ubuntu_Allow → Block_ChatGPTs — placed at top of rule list
Rule 2 (Block): Entire LAN subnet → Block_ChatGPTs — catches everything else

Rule ordering is critical in pfSense — the first matching rule wins. The allow rule must sit above the block rule, otherwise the allow is never evaluated.

pfSense LAN firewall rules — allow rule above block rule

Result

❌ Windows 11 — DNS resolution fails, site unreachable
✅ Linux Mint — accesses and logs into ChatGPT normally

Connectivity Verification

CLI tools were used throughout to verify routing behavior and rule effectiveness: ping, ip a, nslookup, dig, and traceroute — confirming that DNS was being resolved (or blocked) at the pfSense gateway, not the client OS.

CLI connectivity tests — ping & nslookup confirming rule behavior

Phase 2 — Achievements

Configured virtual LAN with DHCP, DNS, and firewall segmentation using pfSense
Enforced domain-based filtering (FQDN aliases) at gateway level
Demonstrated policy-based access per device using ordered firewall rules
Proved packet flow and DNS behavior using CLI tools and browser validation

Tech Stack & Skills Gained

// Virtualization

VMware Workstation Pro
Custom VMnet host-only networking
Linux Mint & Windows 11 VMs
Multi-OS cross-platform testing

// Firewall & Networking

pfSense 2.7.2 LAN/WAN config
DHCP server & DNS resolver
FQDN-based firewall aliases
Ordered rule enforcement

// Network Security

Policy-based access control
DNS blackhole filtering
Rule testing & verification
Client isolation via gateway

// Diagnostics

ping, ip a, traceroute
dig, nslookup
Browser-based DNS validation
Lab documentation & reporting

"This project demonstrated my ability to build, secure, and manage a cross-platform virtual network using pfSense, Linux, and Windows in a structured, test-driven lab environment. It reflects real-world enterprise firewall concepts and hands-on troubleshooting skills." — Prasiddha Thapaliya, April 2025

// TL;DR XAPI is a privacy-first AI API gateway I built from scratch. It sits between you and major AI providers, scanning every message for 33+ types of sensitive data before forwarding — so the AI never sees your real information. Bring your own API keys, own your data, and never pay a cent for the platform itself.

Why I Built This

AI agents and coding assistants are part of everyday workflows now — personal projects and production work. But the data you send to these APIs is rarely sanitized. Passports, dates of birth, phone numbers, credit card numbers, API keys — they all go through unredacted. I needed something that would scrub sensitive personal data before it left my machine, every single time, without me having to think about it.

What started as a personal tool is now something I'm opening up for others to use. The filter focuses on personally identifiable information — names, emails, phone numbers, credit cards, SSNs, passport numbers. It does not catch infrastructure details like database hostnames or connection strings. If you're pasting those, that's on you. But for the PII that leaks accidentally — the stuff most people don't even realize they're sharing — XAPI catches it.

The idea was simple on paper: a gateway. You send your message through XAPI, it scans for 33+ types of sensitive data, redacts it all, and only then forwards the cleaned version to OpenAI or whichever provider you use. The AI never touches your real data.

33+

PII Types Detected

0

Data Logged

API-First

By Design

I had a few non-negotiables when building this:

Works with ANY AI provider — no vendor lock-in
You bring your own API keys — we never resell, never mark up
Scans and redacts automatically, every single request, no exceptions
Free for personal use — 1,000 requests a month, no credit card required
Accessible as a direct API endpoint for developers and AI agents

I called it XAPI. And I called the philosophy behind it "Sovereign Governance" — the idea that you keep full control over what information leaves your environment, even when using cloud-based AI services.

Breaking Down the Architecture

This project ended up being the most full-stack thing I've ever built. Two major halves — frontend and backend — plus a machine learning microservice in the middle that handles the actual privacy filtering.

// Frontend

React (JavaScript) + Vite
Landing page, API key management
Dark/light themes
Admin console with health monitoring
Mobile app via Capacitor (iOS + Android)

// Backend API

Node.js / Express — cluster mode
Passwordless OTP login (email-based)
JWT auth + automatic monthly hit reset
AES-256 encrypted API key vault
22+ REST endpoints

// PII Filter Pipeline

Python / FastAPI microservice
ONNX ML model — 33+ PII types
Circuit breaker (fail-open after 3 failures)
In-flight request cap
Admin-togglable strict mode

// Infrastructure

PostgreSQL — user data & key vault
Redis — OTP rate limiting & sessions
Nginx — reverse proxy + load balancing
Docker Compose — 6-service deploy
CI/CD — single-command deploy

The frontend is pure static files served via CDN. The backend is a containerized API server with no public-facing ports — everything goes through Nginx. The PII filter runs as a separate microservice so it can scale independently. And all six services — frontend, backend, PII filter, PostgreSQL, Redis, and Nginx — live in a single Docker Compose stack.

The Hardest Problem: ML in Production

Let me be honest: getting a machine learning model to run at request-time without making the user wait was brutal.

The ONNX model that detects PII is fast — but not instant. When you're processing chat messages, every millisecond counts. I had to solve a few things:

Keep-alive connection pooling: Instead of opening a new TCP connection to the PII filter for every request, the backend maintains persistent HTTP connections. This alone cut latency significantly — no TCP handshake overhead per request.
In-flight request cap: If 100 users all hit send at the same time, the PII filter would get overwhelmed. I added a concurrency cap so load spikes don't crash the service.
Circuit breaker: Three consecutive failures? The breaker trips. For the next 30 seconds, requests pass through unredacted — keeping the service alive even when the ML layer is struggling. If privacy is non-negotiable, admins can toggle strict mode: reject ALL requests instead of passing through.

The result: PII scans are fast enough to be imperceptible, and the system gracefully degrades instead of crashing. That's the kind of resilience engineering I genuinely enjoy.

Passwordless Auth & Encrypted Keys

I didn't want users to create passwords. Passwords leak. They get reused. They're a liability. So I built a passwordless email OTP system — you enter your email, get a one-time code, and you're in. No passwords to create, remember, or leak.

The OTP system has its own set of protections: rate-limited requests, anti-brute-force on the verify endpoint (too many wrong attempts burns the code), and automatic expiry. It's simple on the surface, but the abuse prevention took real thought.

For API keys — the ones users store for OpenAI, DeepSeek, and Anthropic — I went all-in on encryption. Every key is encrypted with AES-256-GCM at rest using a server-side key that's never exposed to clients. The key is decrypted only at the moment a proxy request is made, in memory, for the duration of that single request. It's never logged, never stored, and never returned to the frontend. Even if someone dumped the database, they'd get ciphertext.

Admin Tooling

Running a live service with real users means you need visibility. I built admin tooling for real-time user management, daily and monthly active user counts, server health metrics (CPU, memory, uptime, database pool status), PII filter circuit breaker status, and OTP email usage tracking.

There's also a full account deletion workflow — users submit deletion requests, admins approve or decline, and approved deletions trigger automated email notifications and full data anonymization. GDPR-friendly by design.

API Compatibility

XAPI exposes an OpenAI-compatible /v1/chat/completions endpoint. Any tool that speaks OpenAI's API format — Cursor, AI agents, automation scripts — can point at XAPI and get privacy filtering with zero code changes. There's also an Ollama-compatible /api/chat endpoint and streaming via Server-Sent Events. Authentication uses platform API keys for programmatic access.

What Makes This Different

Most AI platforms do one of three things: sell you their own API access (you can't use your own keys), don't filter your data at all (whatever you paste goes straight to them), or charge monthly subscriptions. XAPI does none of that.

You bring your own keys. Your data gets filtered before it leaves. The platform is free for personal use — 1,000 requests per month, automatic monthly reset, no payment infrastructure, no credit cards. It's a privacy tool first, an AI platform second.

Real Stuff People Use It For

Support teams using AI to draft responses to customer emails without exposing names and contact details
Researchers analyzing documents without sending identifiable data to third-party AI services
AI agents and automation scripts that need privacy filtering baked into every API call automatically

What I Learned

This project stretched me across the entire stack in ways I didn't fully anticipate. I went in thinking it was a backend challenge — build an API, proxy some requests, done. It turned into:

Systems design: Microservice boundaries, connection pooling, circuit breakers, fail-open vs fail-closed tradeoffs
Cryptography: AES-256-GCM encryption at rest, key lifecycle management, never logging decrypted values
Authentication architecture: Passwordless OTP flows, JWT session management, brute-force protection, dual auth paths for humans and machines
ML deployment: Running ONNX models in production, managing inference latency, building resilience around an ML service that can fail
DevOps: Docker Compose orchestration, Nginx reverse proxy config, CI/CD pipelines, health checks, environment-aware security
Full-stack: React frontend, Node.js backend, Python microservice, PostgreSQL, Redis — all talking to each other in production

But the biggest lesson? Building privacy-first software is harder than building regular software — and it should be. Every design decision, every data flow, every log statement gets extra scrutiny. "Does this need to exist? Does this need to be stored? Does this need to be transmitted?" Those questions slow you down. But they also make the final product something you can genuinely stand behind.

Where This Goes Next

XAPI is live, running on cloud infrastructure with real users. But I'm not done. On the roadmap: on-premise deployment for businesses with strict data residency requirements, expanded PII detection categories with multi-language support, self-hosted AI model support (Ollama, vLLM) with the same privacy filter, usage analytics dashboards for individual users, and team accounts with shared encrypted key vaults.

The broader vision hasn't changed: every AI conversation deserves to be private. The tools to make that happen shouldn't require enterprise contracts or complex setup. XAPI is my answer to that — a privacy shield that anyone can use, with any AI provider, for free.

"XAPI started as a personal tool and grew into a full platform that genuinely solves a privacy problem for anyone using AI. Building it meant working across the entire stack — React, Node.js, PostgreSQL, Redis, Docker, Nginx, and a machine learning pipeline in production. Privacy isn't a feature — it's the foundation." — Prasiddha Thapaliya, NorthBridge IT Solutions

TechnicalWrite-Ups

Technical
Write-Ups