edge-llm

TypeScript monorepo for running LLMs on-device — WebLLM, Transformers.js, hybrid inference, tool calling, and LoRA fine-tuning.

TypeScript · WebGPU · WebLLM · Transformers.js · MLX · ONNX · React

Overview

edge-llm is a TypeScript monorepo for running LLMs as close to the user as possible. It provides a unified API across three runtime tiers — WebLLM (WebGPU), Transformers.js (WASM), and traditional API fallback — with automatic capability detection and hot-swapping between them. The goal: use the fastest available runtime without the app knowing or caring which one is active.
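The tiered fallback described above can be sketched roughly like this. The type and function names here are illustrative, not edge-llm's actual API; the probing comments show one plausible way to detect each capability in a browser.

```typescript
// Illustrative sketch of tiered runtime selection (not edge-llm's real API).

type RuntimeTier = "webllm" | "transformersjs" | "api";

interface Capabilities {
  webgpu: boolean; // e.g. "gpu" in navigator && a WebGPU adapter was acquired
  wasm: boolean;   // e.g. typeof WebAssembly === "object"
}

// Pick the fastest tier the environment supports, falling back in order:
// WebGPU (WebLLM) -> WASM (Transformers.js) -> server API.
function selectRuntime(caps: Capabilities): RuntimeTier {
  if (caps.webgpu) return "webllm";
  if (caps.wasm) return "transformersjs";
  return "api";
}

// In a browser, capabilities would be probed roughly like:
//   const webgpu = "gpu" in navigator &&
//     !!(await navigator.gpu.requestAdapter());
//   const wasm = typeof WebAssembly === "object";
```

Because the selection is a pure function of detected capabilities, it can be re-run at any time (e.g. after a GPU adapter is lost) to hot-swap tiers without the calling code changing.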

Architecture

The monorepo is split into four packages:

Key Features

Why I Built It

Cloud LLMs are powerful but come with latency, cost, and privacy tradeoffs that don’t make sense for every interaction. A lot of what apps need — form filling, entity extraction, classification, basic tool use — can run locally on modern hardware. I wanted a framework that makes that easy without giving up the ability to fall back to a server when needed.

The fine-tuning piece came from wanting to ship models that are actually good at specific tool schemas rather than relying on generic instruction-following and hoping for the best.
