edge-llm
Project · Active


TypeScript monorepo for running LLMs on-device. WebLLM, Transformers.js, hybrid inference, tool calling, and LoRA fine-tuning.

TypeScript · WebGPU · WebLLM · Transformers.js · MLX · ONNX · React

Overview

edge-llm is a TypeScript monorepo for running LLMs as close to the user as possible. It provides a unified API across three runtime tiers (WebLLM via WebGPU, Transformers.js via WASM, and traditional API fallback) with automatic capability detection and hot-swapping between them. The goal: use the fastest available runtime without the app knowing or caring which one is active.
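The detection-and-fallback flow described above can be sketched roughly as follows. This is an illustrative sketch, not edge-llm's actual API: the names (`detectCapabilities`, `pickTier`, `RuntimeTier`) are hypothetical, and the real library's probing is certainly more thorough.

```typescript
// Hypothetical sketch of tiered runtime selection. Probe the environment,
// then pick the fastest tier; the calling app never needs to know which won.
type RuntimeTier = "webgpu" | "wasm" | "api";

interface Capabilities {
  webgpu: boolean;
  wasm: boolean;
}

// Detect what the current environment supports. In a browser this would
// request a WebGPU adapter; outside one, both probes fail gracefully.
async function detectCapabilities(): Promise<Capabilities> {
  const webgpu =
    typeof navigator !== "undefined" &&
    "gpu" in navigator &&
    (await (navigator as any).gpu.requestAdapter()) !== null;
  const wasm = typeof WebAssembly !== "undefined";
  return { webgpu, wasm };
}

// Fastest available tier wins; a plain HTTP API is the universal fallback.
function pickTier(caps: Capabilities): RuntimeTier {
  if (caps.webgpu) return "webgpu"; // WebLLM via WebGPU
  if (caps.wasm) return "wasm";     // Transformers.js via WASM
  return "api";                     // traditional server fallback
}
```

Hot-swapping then amounts to re-running detection (e.g. when a GPU context is lost) and routing subsequent requests through whichever tier `pickTier` returns.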

Architecture

The monorepo is split into four packages:

Key Features

Why I Built It

Cloud LLMs are powerful, but they come with latency, cost, and privacy tradeoffs that don't make sense for every interaction. Much of what apps need (form filling, entity extraction, classification, basic tool use) can run locally on modern hardware. I wanted a framework that makes that easy without giving up the ability to fall back to a server when needed.

The fine-tuning piece came from wanting to ship models that are actually good at specific tool schemas, rather than relying on generic instruction-following and hoping for the best.

