This is a simple POC for a Chrome extension that shows how to run LLMs locally on the web. It leverages Transformers.js, which brings state-of-the-art machine learning to the web and lets us run models directly in the browser.
The extension supports three features:

- Sentiment Analysis: analyses the sentiment of the provided text.
- Summarization: summarizes the content of the currently open tab.
- Question Answering: answers questions based on the content of the currently open tab.

I assume you know how to load a local extension in Chrome. If not, please follow this guide: Load Local Extension
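Each feature maps to a Transformers.js pipeline task. The service worker shown later looks up the model for a task in an `AVAILABLE_MODELS` list imported from `../utils/data`, and that file's contents are not shown in this post. Below is a minimal sketch of what such a registry could look like. It assumes each entry has `task` and `model` fields, because that is how the service worker reads it. Only the sentiment-analysis model id comes from this post. The summarization and question-answering model ids, the `feature` field and the `modelForTask` helper are illustrative assumptions.

```typescript
// The three pipeline tasks used by the extension.
type Task = "text-classification" | "summarization" | "question-answering";

interface ModelOption {
  feature: string; // hypothetical: label shown in the popup
  task: Task;
  model: string; // Hugging Face model id
}

// Hypothetical registry; only the first model id is taken from this post.
const AVAILABLE_MODELS: ModelOption[] = [
  {
    feature: "Sentiment Analysis",
    task: "text-classification",
    model: "Xenova/distilbert-base-uncased-finetuned-sst-2-english",
  },
  {
    feature: "Summarization",
    task: "summarization",
    model: "Xenova/distilbart-cnn-6-6", // assumed model id
  },
  {
    feature: "Question Answering",
    task: "question-answering",
    model: "Xenova/distilbert-base-cased-distilled-squad", // assumed model id
  },
];

// Look up the model registered for a task; throw instead of crashing later on `undefined`.
function modelForTask(task: Task): string {
  const item = AVAILABLE_MODELS.find((el) => el.task === task);
  if (!item) {
    throw new Error(`No model registered for task "${task}"`);
  }
  return item.model;
}
```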
The crux of the setup is the manifest file, which makes sure we have the right permissions and access to the Chrome APIs. Here are the necessary manifest settings. The paths point to the `dist` folder.
```json
{
  "permissions": ["scripting", "activeTab", "tabs", "storage"],
  "background": {
    "service_worker": "./service-worker.js",
    "type": "module"
  },
  "host_permissions": ["https://*/*"],
  "action": {
    "default_popup": "./src/components/popup/popup.html",
    "default_title": "AI Extension"
  },
  "content_security_policy": {
    "extension_pages": "script-src 'self' 'wasm-unsafe-eval'; object-src 'self';",
    "sandbox": "sandbox allow-scripts allow-forms allow-popups allow-modals; script-src 'self' 'unsafe-inline' 'unsafe-eval'; child-src 'self';"
  },
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": ["./content.js"]
    }
  ]
}
```
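A manifest is easy to break while editing, so a small check can catch missing entries before you load the extension. The helper below is hypothetical: `checkManifest` and `REQUIRED_PERMISSIONS` are not part of the original project. It inspects a parsed manifest for three things: the permissions listed above, a module service worker, and the `'wasm-unsafe-eval'` CSP source. Manifest V3 extension pages need that source before they can compile WebAssembly, which ONNX Runtime Web relies on.

```typescript
// Only the manifest fields this check looks at.
interface ManifestLike {
  permissions?: string[];
  background?: { service_worker?: string; type?: string };
  content_security_policy?: { extension_pages?: string };
}

// Permissions declared in the manifest above.
const REQUIRED_PERMISSIONS = ["scripting", "activeTab", "tabs", "storage"];

// Returns human-readable problems; an empty list means the check passed.
function checkManifest(manifest: ManifestLike): string[] {
  const problems: string[] = [];
  const granted = new Set(manifest.permissions ?? []);
  for (const permission of REQUIRED_PERMISSIONS) {
    if (!granted.has(permission)) {
      problems.push(`missing permission "${permission}"`);
    }
  }
  if (!manifest.background?.service_worker) {
    problems.push("missing background.service_worker");
  }
  // The service worker uses `import`, so it must be registered as a module.
  if (manifest.background?.type !== "module") {
    problems.push('background.type should be "module"');
  }
  // WebAssembly compilation in MV3 extension pages requires 'wasm-unsafe-eval'.
  const csp = manifest.content_security_policy?.extension_pages ?? "";
  if (!csp.includes("'wasm-unsafe-eval'")) {
    problems.push("extension_pages CSP should allow 'wasm-unsafe-eval'");
  }
  return problems;
}
```

In a build script you could run it on `JSON.parse(fs.readFileSync("manifest.json", "utf8"))` and fail the build if the returned list is non-empty.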
I built the extension's UI with React, TypeScript and Tailwind.

To streamline things, I used Vite to scaffold the project.
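I configured Vite to output separate assets because the extension is not a single bundled app. Chrome loads the popup, the service worker and the content script separately, so each one needs its own entry point. Note how the entry points are defined explicitly in the `input` option.

Here is the modified config: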
```ts
import { defineConfig } from "vite";
import { resolve } from "path";
import react from "@vitejs/plugin-react";
import copy from "rollup-plugin-copy";

export default defineConfig({
  plugins: [react()],
  build: {
    rollupOptions: {
      plugins: [
        // Copy manifest.json into dist/ next to the build output
        copy({
          targets: [{ src: "./manifest.json", dest: "dist/" }],
        }),
      ],
      // One entry point per piece of the extension
      input: {
        popup: resolve(__dirname, "./src/components/popup/popup.html"),
        "service-worker": resolve(__dirname, "./src/worker/service-worker.js"),
        content: resolve(__dirname, "./src/content.ts"),
      },
      output: {
        dir: "dist",
        // Stable file names (no hashes) so manifest.json can reference them
        entryFileNames: "[name].js",
      },
    },
  },
});
```
The UI is simple: it lets the user select a model, load it, and then use it.
Once loaded, the model files stay in the browser's cache, so the user can keep using the model even when offline.
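The select, load and use flow can be modelled as a small state machine. This keeps the popup from, for example, running a model that is still downloading. The sketch below uses hypothetical state and event names (`ModelState`, `ModelEvent`, `reduce`). It also assumes progress values are percentages in the 0 to 100 range.

```typescript
// Hypothetical popup states for the "select a model, load it, then use it" flow.
type ModelState =
  | { kind: "idle" }
  | { kind: "loading"; task: string; progress: number }
  | { kind: "ready"; task: string }
  | { kind: "error"; message: string };

type ModelEvent =
  | { type: "select"; task: string }
  | { type: "progress"; progress: number }
  | { type: "loaded" }
  | { type: "failed"; message: string };

function reduce(state: ModelState, event: ModelEvent): ModelState {
  switch (event.type) {
    case "select":
      // Selecting a (new) model always restarts loading from 0%.
      return { kind: "loading", task: event.task, progress: 0 };
    case "progress":
      // Ignore stray progress updates unless a load is in flight.
      if (state.kind !== "loading") return state;
      return { ...state, progress: Math.min(100, Math.max(0, event.progress)) };
    case "loaded":
      if (state.kind !== "loading") return state;
      return { kind: "ready", task: state.task };
    case "failed":
      return { kind: "error", message: event.message };
    default:
      return state;
  }
}
```

In React, `reduce` can be passed directly to `useReducer(reduce, { kind: "idle" })`, and the `progress` events can be fed from the pipeline's `progress_callback`.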
We use the `@xenova/transformers` package to do the heavy lifting of loading and running the model in the browser.
There are two ways to load and interact with the models:
Web Worker: this is the best approach when building applications that run in the browser, because it moves the work off the main thread.
To load the model, we create a `web-worker.ts` file that houses the logic to load the LLM in the browser.
Here is a simple example:
```ts
import { pipeline, env, PipelineType } from "@xenova/transformers";

// Skip initial check for local models, since we are not loading any local models.
env.allowLocalModels = false;

// Due to a bug in onnxruntime-web, we must disable multithreading for now.
// See https://github.com/microsoft/onnxruntime/issues/14445 for more information.
env.backends.onnx.wasm.numThreads = 1;

type PipelineModel = ReturnType<typeof pipeline>;

class PipelineSingleton {
  // The task we want to run
  static task = "text-classification" as PipelineType;
  // The model used for that task
  static model = "Xenova/distilbert-base-uncased-finetuned-sst-2-english";
  static instance: null | PipelineModel = null;

  // Create a single instance of the loaded model, so we never need to load it again
  static async getInstance(progress_callback: Function | undefined) {
    if (this.instance === null) {
      this.instance = pipeline(this.task, this.model, {
        // Callback used to track loading progress
        progress_callback,
      });
    }
    return this.instance;
  }
}

// Listen for messages from the main thread
self.addEventListener("message", async (event) => {
  console.log("Web worker received", event);

  // Retrieve the classification pipeline. When called for the first time,
  // this loads the pipeline and saves it for future use.
  const classifier = await PipelineSingleton.getInstance((progress: unknown) => {
    // Loading progress arrives here. It is not forwarded, because the
    // main thread only expects the final result:
    // console.log(progress);
    // self.postMessage(progress);
  });

  // eslint-disable-next-line @typescript-eslint/ban-ts-comment
  // @ts-ignore
  const output = await classifier(event.data.text);

  // Send the output back to the main thread
  self.postMessage({
    status: "complete",
    output: output,
  });
});
```
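The worker above leaves its progress callback empty because the main thread treats every message as a final result. One fix is to give each message a `status` discriminator, so the UI can tell progress updates from results. This is a sketch under assumptions. The `progress` variant assumes Transformers.js calls `progress_callback` with objects carrying `status`, `file` and `progress` fields, so verify this against the version you use. The `error` variant and the `describeWorkerMessage` helper are hypothetical.

```typescript
// Hypothetical message protocol between web-worker.ts and the UI.
type WorkerMessage =
  | { status: "progress"; file: string; progress: number }
  | { status: "complete"; output: Array<{ label: string; score: number }> }
  | { status: "error"; error: string };

// Turn a worker message into a line of UI text; pure, so it is easy to test.
function describeWorkerMessage(message: WorkerMessage): string {
  switch (message.status) {
    case "progress":
      return `Loading ${message.file}: ${Math.round(message.progress)}%`;
    case "complete": {
      const top = message.output[0];
      return top ? `${top.label} (${top.score.toFixed(2)})` : "No result";
    }
    case "error":
      return `Error: ${message.error}`;
    default:
      return "Unknown message";
  }
}
```

With this protocol, the worker can forward updates using `progress_callback: (data) => self.postMessage(data)`, and the UI can render whatever `describeWorkerMessage` returns. Any other statuses the library reports would need their own variants.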
To consume it, we create an instance of the web worker pointing to this file:
```tsx
import { useEffect, useRef, type FC } from "react";

const App: FC = () => {
  const worker = useRef<Worker | undefined>(undefined);

  useEffect(() => {
    // Create an instance of the worker
    worker.current = new Worker(
      new URL("../../worker/web-worker.ts", import.meta.url),
      { name: "worker" }
    );

    // Listen for messages coming from the web worker
    worker.current.onmessage = (event) => {
      console.log("Received from web worker", event.data.output);
      alert(
        `Hello from WEB WORKER ${event.data.status} -- ${event.data.output[0].label}`
      );
    };

    // Post a message to the web worker
    worker.current.postMessage({ text: "I am happy" });

    // Stop the worker when the component unmounts
    return () => worker.current?.terminate();
  }, []);

  return <div>Test</div>;
};
```
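A single `onmessage` handler works for one fire-and-forget request, but it cannot match replies to requests when several are in flight. A common pattern is to tag each request with an id and wrap the round trip in a Promise. This sketch assumes the worker is changed (hypothetically) to echo the id back, e.g. `self.postMessage({ id: event.data.id, status: "complete", output })`. `WorkerLike` and `classify` are illustrative names. `WorkerLike` covers only the subset of the `Worker` interface the helper uses, so the helper can be tested without a browser.

```typescript
// The subset of the Worker interface this helper needs.
interface WorkerLike {
  postMessage(message: unknown): void;
  addEventListener(type: "message", listener: (event: { data: any }) => void): void;
  removeEventListener(type: "message", listener: (event: { data: any }) => void): void;
}

let nextId = 0;

// Send { id, text } and resolve with the output of the "complete" reply carrying the same id.
function classify(worker: WorkerLike, text: string): Promise<unknown> {
  const id = nextId++;
  return new Promise((resolve) => {
    const onMessage = (event: { data: any }) => {
      // Ignore replies to other requests and non-final messages.
      if (event.data?.id !== id || event.data?.status !== "complete") return;
      worker.removeEventListener("message", onMessage);
      resolve(event.data.output);
    };
    worker.addEventListener("message", onMessage);
    worker.postMessage({ id, text });
  });
}
```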
Service Worker: another way to offload tasks from the main thread and do any heavy processing. We use this approach because it fits well with the extension flow and is simpler to consume. To create and consume a simple service worker in an extension, we need to do the following:
Create a `service-worker.js` file that houses the logic to load and run the LLM.
```js
import { pipeline, env } from "@xenova/transformers";
import { AVAILABLE_MODELS } from "../utils/data";
import { getInnerHTMLFromPage } from "../utils/utils";

if (
  typeof ServiceWorkerGlobalScope !== "undefined" &&
  self instanceof ServiceWorkerGlobalScope
) {
  // Skip initial check for local models, since we are not loading any local models.
  env.allowLocalModels = false;

  // Due to a bug in onnxruntime-web, we must disable multithreading for now.
  // See https://github.com/microsoft/onnxruntime/issues/14445 for more information.
  env.backends.onnx.wasm.numThreads = 1;

  class PipelineSingleton {
    // One pipeline (promise) per task, created on first use
    static instance = {
      "text-classification": null,
      "question-answering": null,
      summarization: null,
    };

    static async getInstance(task, progress_callback) {
      if (this.instance[task] === null) {
        console.log("Creating new instance", task);
        const item = AVAILABLE_MODELS.find((el) => el.task === task);
        this.instance[task] = pipeline(task, item.model, { progress_callback });
      }
      return this.instance[task];
    }
  }

  // Listen for messages from the UI, process them, and send the result back.
  chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
    // A plain string is treated as text to classify; otherwise expect { task, query }.
    const { task, query } =
      typeof message === "string"
        ? { task: "text-classification", query: message }
        : message;

    // Run model prediction asynchronously
    (async function () {
      try {
        const model = await PipelineSingleton.getInstance(task);
        // text-classification and summarization take a single text input;
        // question-answering pipelines take (question, context) instead.
        const result = await model(query);
        console.log("Service worker result", result);
        // Send the response back to the UI
        sendResponse(result);
      } catch (error) {
        sendResponse({ error: String(error) });
      }
    })();

    // Return true to indicate we will send a response asynchronously.
    // See https://stackoverflow.com/a/46628145 for more information.
    return true;
  });
}
```
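`PipelineSingleton` stores the promise returned by `pipeline(...)`, not the awaited result. Requests that arrive while a model is still downloading therefore reuse the same load instead of starting a second one. The same idea can be factored into a small generic cache. This is a sketch, and `LazyCache` is a hypothetical name. Unlike the original, it also forgets a failed load so that the next request can retry.

```typescript
// Create at most one value per key, caching the in-flight promise.
class LazyCache<K, V> {
  private readonly entries = new Map<K, Promise<V>>();
  private readonly load: (key: K) => Promise<V>;

  constructor(load: (key: K) => Promise<V>) {
    this.load = load;
  }

  get(key: K): Promise<V> {
    let entry = this.entries.get(key);
    if (entry === undefined) {
      entry = this.load(key);
      // On failure, drop the entry so a later call can retry the load.
      // Callers still receive the rejection through the returned promise.
      entry.catch(() => this.entries.delete(key));
      this.entries.set(key, entry);
    }
    return entry;
  }
}
```

In the service worker this could replace the hand-written `instance` map, e.g. `new LazyCache((task) => pipeline(task, modelFor(task)))`, where `modelFor` is a hypothetical lookup into `AVAILABLE_MODELS`.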
Now to consume it in React code we can just call:
```ts
chrome.runtime.sendMessage("What a dull day", (response: any) => {
  console.log("Response from service worker", response);
  alert(`Hello from SERVICE WORKER ${JSON.stringify(response)}`);
});
```
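The callback above does not check what came back. If no listener responds, Chrome still invokes the callback, with `undefined`, and sets `chrome.runtime.lastError`. Below is a sketch of a Promise wrapper that validates the reply as a text-classification result, i.e. an array of `{ label, score }`. It assumes the sender is injected so the helper can be tested outside Chrome. `askServiceWorker`, `SendMessage` and `Label` are hypothetical names.

```typescript
// Callback-style sender, matching how chrome.runtime.sendMessage is called above.
type SendMessage = (message: unknown, callback: (response: unknown) => void) => void;

interface Label {
  label: string;
  score: number;
}

// Wrap the callback in a Promise and reject on anything that is not a label list.
function askServiceWorker(send: SendMessage, text: string): Promise<Label[]> {
  return new Promise((resolve, reject) => {
    send(text, (response) => {
      const ok =
        Array.isArray(response) &&
        response.every((r) => typeof r?.label === "string" && typeof r?.score === "number");
      if (ok) {
        resolve(response as Label[]);
      } else {
        reject(new Error(`Unexpected response: ${JSON.stringify(response)}`));
      }
    });
  });
}
```

Inside the extension this would be called as `askServiceWorker((msg, cb) => chrome.runtime.sendMessage(msg, cb), "What a dull day")`. In Manifest V3, `chrome.runtime.sendMessage` also returns a Promise when called without a callback, which is an alternative to hand-wrapping it.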
The last and most crucial step is to tell our extension to load this service worker by adding the `background` property to the manifest file. The path here is relative to the `dist` folder.
```json
"background": {
  "service_worker": "./service-worker.js",
  "type": "module"
}
```
To build and load the extension:

1. Run `npm install`.
2. Run `npm run build`.
3. Make sure the `manifest.json` file is in the `dist` folder.
4. Load the `dist` folder in Chrome as a local extension.

