No server needed: running AI in the browser
Recent advancements in browser technologies and optimizations in AI models make it possible to run AI directly in the browser, without the need for heavy server-side computations.
Introducing the built-in web AI APIs
The new Built-in AI web APIs allow developers to run AI models natively in the browser, without relying on external libraries. These APIs provide a simple and consistent interface for working with AI models, making it easy to integrate AI capabilities into web applications. There are currently seven APIs available, of which the first six can already be experimented with in Chrome.
All these APIs provide a high-level and similar interface, making it easy to switch between them.
Checking availability and downloading models
The general flow for using these APIs is as follows:
- Check if the API is supported in the browser with an if ("APIName" in window) check.
- Check the availability of the API and the model by testing the return value of await APIName.availability(), which can be one of the following:
  - unavailable: the API is not supported in the browser or the model is not available.
  - available: the API is supported and the model is available and ready to use.
  - downloadable: the API is supported but the model is not available and can be downloaded.
  - downloading: the API is supported but the model is currently being downloaded.
- If the API is available, downloadable, or downloading, you can create an object to interact with the API by using the result of await APIName.create(options). You can optionally pass a callback function in the options to be notified of the download progress, which is particularly useful for large models that are not yet downloaded (see the sketch after this list).
- Finally, you can ask the obtained object to perform the desired operation by calling the appropriate method with the required parameters. Many APIs also support a streaming variant to get the result as a stream of chunks, which is useful for large inputs or outputs.
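To make the download and streaming parts of this flow concrete, here is a minimal sketch using the Translator API. It assumes the monitor option with its downloadprogress event and the translateStreaming() method work as described in Chrome's documentation; longEnglishText and outputElement are placeholders for your own data and DOM element.

// Create a translator, reporting download progress if the model needs to be fetched
const translator = await Translator.create({
  sourceLanguage: "en",
  targetLanguage: "fr",
  monitor(m) {
    m.addEventListener("downloadprogress", (e) => {
      // e.loaded is a value between 0 and 1
      console.log(`Model download: ${Math.round(e.loaded * 100)}%`);
    });
  }
});

// Stream the result chunk by chunk instead of waiting for the full translation
const stream = translator.translateStreaming(longEnglishText);
for await (const chunk of stream) {
  outputElement.textContent += chunk;
}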
Basic example
Let’s take a look at an example of using the Translator API to translate a text from English to French:
// Check if the Translator API is available in the browser
// (this snippet is assumed to run inside an async function)
if (!("Translator" in window)) {
alert("Translator API is not available.");
return;
}
// Set up the options for the Translator API
const options = {
sourceLanguage: "en",
targetLanguage: "fr"
};
// Check the availability of the Translator API and of the model for this language pair
const availability = await Translator.availability(options);
if (availability === "unavailable") {
alert("Translator API is not available or the model is not available.");
return;
}
// Create a Translator object with the desired options. You could also pass a callback here to be notified of the download progress.
const translator = await Translator.create(options);
// Ask the Translator object to translate a text
const result = await translator.translate("Hello, world!");
console.log(result);
// The output should be: "Bonjour, monde !"
There may be some differences between the APIs, but the flow is generally the same.
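As an illustration that the flow carries over, here is a hedged sketch of the same steps with the Summarizer API. The option values (type, format, length) are assumptions based on the current experimental documentation and may evolve; longArticleText is a placeholder.

// Same flow, different API: check support, check availability, create, then use
if ("Summarizer" in window) {
  const availability = await Summarizer.availability();
  if (availability !== "unavailable") {
    const summarizer = await Summarizer.create({
      type: "key-points",   // assumed option values; check the current spec
      format: "plain-text",
      length: "short"
    });
    const summary = await summarizer.summarize(longArticleText);
    console.log(summary);
  }
}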
A more practical example: seamless international chat experience
The full range of use cases for these models has yet to be explored, and we will likely see new ones emerge in the coming years as more models are added. Here’s an example that demonstrates how we can already combine them for richer experiences on the web.
We start from a WebRTC-based text chat application. The goal is to use the Language Detection API and Translator API to translate messages on the fly between users speaking different languages in an instant messaging application.
On every message sent and received, we run the Language Detector model:
async function detectLanguage(text, logElement) {
  // Display the detected language above the message
  const languageDetectorElm = document.createElement("p");
  languageDetectorElm.className = "language-detection";
  logElement.insertBefore(languageDetectorElm, logElement.firstChild);

  if (!("LanguageDetector" in self)) {
    languageDetectorElm.textContent = "Language detection not supported";
    return null;
  }
  const availability = await LanguageDetector.availability();
  if (availability === "unavailable") {
    languageDetectorElm.textContent = "Language detection not available";
    return null;
  }

  const detector = await LanguageDetector.create();
  const results = await detector.detect(text);
  if (results.length === 0) {
    languageDetectorElm.textContent = "No language detected";
    return null;
  }

  // Results are ordered by confidence; keep the most likely language
  const result = results[0];
  let lang = result.detectedLanguage;
  if (Intl && Intl.DisplayNames) {
    // Turn the language code into a human-readable name
    const languageNames = new Intl.DisplayNames(["en"], { type: "language" });
    lang = languageNames.of(result.detectedLanguage);
  }
  languageDetectorElm.textContent = `(Detected: ${lang} with ${Math.round(result.confidence * 100)}% confidence)`;
  return result.detectedLanguage;
}
For this demo, we indicate to the user which language has been detected and with how much confidence (returned as a percentage). This indication is inserted as a new element before the message content.
The language returned is a language code, like en for English. This isn’t very useful for someone who has a very different word for English in their own language, or who uses a completely different alphabet. Fortunately, we can use another recent web API to get the language name in the user’s language: the Intl.DisplayNames API.
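For example, the same code ("en") can be rendered in the reader’s own locale; here navigator.language stands in for the user’s preferred language:

// "anglais" when the user's locale is French, "English" when it is English, etc.
const languageNames = new Intl.DisplayNames([navigator.language], { type: "language" });
console.log(languageNames.of("en"));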
Now that we can detect languages, we can store the last language used by the user and their peer. To go further, we should store the language used by each person in the conversation and propose manual selection of the language in case there’s ambiguity. Knowing the input and output languages is enough to proceed to the second step: translating received messages.
let translator = null;

async function updateTranslator() {
  const translatorStatusElm = document.getElementById("translator-status");
  const availability = await Translator.availability({
    sourceLanguage: lastLanguageReceived,
    targetLanguage: currentLanguageSpoken
  });
  if (availability === "unavailable") {
    translatorStatusElm.textContent = `Translator status: Translation from ${lastLanguageReceived} to ${currentLanguageSpoken} is unavailable`;
    translator = null;
    return null;
  }
  translatorStatusElm.textContent = `Translator status: Preparing translation from ${lastLanguageReceived} to ${currentLanguageSpoken}...`;
  translator = await Translator.create({
    sourceLanguage: lastLanguageReceived,
    targetLanguage: currentLanguageSpoken
  });
  translatorStatusElm.textContent = `Translator status: Translation from ${lastLanguageReceived} to ${currentLanguageSpoken} is ready`;
  return translator;
}

function translateMessage(text, logElement) {
  const languageDetectorElm = logElement.querySelector(".language-detection");
  if (translator === null) {
    languageDetectorElm.textContent += " - Translation not available";
    return;
  }
  translator.translate(text).then(translated => {
    const p = logElement.querySelector("p:not(.language-detection)");
    if (p && translated && translated.length > 0) {
      p.textContent = `Peer (translated): ${translated}`;
    }
  }).catch(error => {
    // Surface translation errors next to the detection info
    languageDetectorElm.textContent += ` - Translation failed: ${error}`;
  });
}
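To tie the pieces together, here is a sketch of how these functions could be wired into the chat’s message handlers. The data channel, the appendMessage helper, and the way currentLanguageSpoken and lastLanguageReceived are updated are assumptions about the surrounding application, not part of the built-in APIs themselves.

let currentLanguageSpoken = "en"; // updated from the user's own messages
let lastLanguageReceived = null;  // updated from the peer's messages

// Hypothetical handler for messages received over the WebRTC data channel
dataChannel.onmessage = async (event) => {
  const text = event.data;
  const logElement = appendMessage(`Peer: ${text}`); // hypothetical DOM helper
  const detected = await detectLanguage(text, logElement);
  if (detected && detected !== lastLanguageReceived) {
    lastLanguageReceived = detected;
    await updateTranslator();
  }
  if (lastLanguageReceived && lastLanguageReceived !== currentLanguageSpoken) {
    translateMessage(text, logElement);
  }
};

// Hypothetical send path: track the language the local user writes in
async function sendMessage(text) {
  dataChannel.send(text);
  const logElement = appendMessage(`Me: ${text}`);
  const detected = await detectLanguage(text, logElement);
  if (detected && detected !== currentLanguageSpoken) {
    currentLanguageSpoken = detected;
    await updateTranslator();
  }
}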
The original message content is replaced by the translation on the fly. The time it takes to run the model can be very short, so the user likely won’t have time to read the original message before it’s translated. To further improve this basic demo, we should address content jumps (also known as Cumulative Layout Shift) and allow users to revert to the original, non-translated message.
The final result:

For more examples, you can check the official documentation, my demo website, and its source code.
What’s the benefit of local AI models?
There are several valid reasons why you might want to run AI models on the device instead of on a remote server.
The most obvious reason is to ensure functionality while offline, or when the remote server is unavailable. Modern web technologies like Cache API and Service Workers have enabled fully functional offline web applications for many years. The use cases for offline web apps are limited, as many web services rely on Internet connectivity; but in places where connection is uncertain, this can have real value. However, you need to understand the limitations of an offline-first approach. The local AI models are described as built-in, but most still need to be downloaded on the fly. For example, you might have a Summarizer model ready for English, but suddenly need to use it for another language, requiring a new model download. So, when working on an offline-ready application, the scope of what is possible while offline needs to be clearly defined and anticipated, with preloading of resources like models.
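For example, an offline-ready application could trigger the model download ahead of time, while connectivity is still available. This sketch reuses the Translator API from earlier and assumes the needed language pair is known in advance:

// Warm up the translation model while the user is still online
async function preloadTranslationModel(sourceLanguage, targetLanguage) {
  if (!("Translator" in window)) return;
  const availability = await Translator.availability({ sourceLanguage, targetLanguage });
  if (availability === "downloadable") {
    // Creating the translator triggers the model download so it is ready offline
    await Translator.create({ sourceLanguage, targetLanguage });
  }
}

preloadTranslationModel("en", "fr");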
Another consideration is data privacy. If an application involves working with sensitive data, it may be more acceptable—and sometimes required—that this data is processed locally. Imagine, for example, a webmail service that summarizes long emails, or a dating website that translates messages written in various languages on the fly.
Then there is the cost. Hosting an AI service is not cheap, and the costs scale with the number of users. You don’t have to worry about that when the feature is offloaded to the user’s device. On the other hand, your users need the proper browser and hardware to use that feature, so it might be more limiting.
Finally, we can predict that the future of AI will be some kind of hybrid approach. Mainstream PC and smartphone manufacturers are already starting to integrate AI-dedicated equipment into consumer hardware, so it’s only a matter of time before we have ways to make use of that power in web applications. However, there will still be technical constraints for a long time that require the use of remote AI models. It seems likely that we will have to use both approaches together in the years to come.
Conclusion
This post explored how to run AI directly in the browser with the built-in web AI APIs. Its biggest advantages are privacy and cost-effectiveness, and we believe we should explore more in this direction. In the meantime, server-side AI remains the most reliable option for running AI workloads, especially for complex tasks that require significant computational resources. However, if built-in web AI continues to improve, it will become a viable alternative for more complex use cases. So, let’s keep an eye on this exciting development.