WebNNUtils/OnnxConverter
WebNNUtils/OnnxConverter is a WebNN compiler for producing pure JavaScript (JS) models from .onnx models, enabling machine learning inference in web browsers using the WebNN API without framework overhead.
Overview
While JS ML frameworks can evaluate models in the browser, they require shipping the framework itself and performing expensive load-time preprocessing, both of which add latency. During preprocessing, frameworks determine input shapes for operators, partition operators between CPU and accelerated execution, and optimize the model graph.
With OnnxConverter there is no such overhead. At compile time, OnnxConverter takes a static .onnx file and emits JS code that will build an equivalent WebNN graph. The resulting JS code can be used directly in the browser without any additional framework dependencies.
Privacy and Security
🔒 Client-Side Processing Only
All model conversion and JavaScript generation operations run exclusively in your local environment. No model data, weights, or intellectual property is transmitted to or stored on any external servers. This makes the tool safe for:
- Proprietary models
- Sensitive intellectual property
- Enterprise environments with strict data governance
- Any scenario requiring complete data privacy
Generated Output
OnnxConverter generates several files:
- A JavaScript file (.js) containing the WebNN graph building function
- One or more binary files (.bin) containing the model weights
- Optional JSON metadata files for debugging
The core output is a JavaScript function with the following signature:
function loadModelGraph(operand_input, weights_buffers, builder) {
  // Generated WebNN graph building code
  return output_operand; // or {output1: operand1, output2: operand2}
}
Where:
- operand_input - Input operands for the model
- weights_buffers - Array of ArrayBuffers containing the model weights
- builder - The WebNN MLGraphBuilder object
Installation
Install the required Python dependencies:
pip install -U onnx protobuf numpy
Command-Line Interface
Syntax
python OnnxConverter.py MODEL_FILE [OPTIONS]
Required Arguments
| Argument | Description |
|---|---|
| model_file | Path to the input ONNX model file (.onnx) |
Optional Arguments
| Argument | Description |
|---|---|
| --weights_file | Path to the output weights file (default: weights.bin) |
| --output_file | Path to the output JavaScript file (default: model.js) |
| --max_weights_size | Maximum size of a single weights file in MB (default: 1024) |
Basic Usage
Step 1: Convert the Model
python OnnxConverter.py mobilenetv2-12.onnx --output_file mobilenet/mobilenet.js --weights_file mobilenet/weights.bin
This command generates:
- mobilenet/mobilenet.js - WebNN graph building function
- mobilenet/weights0.bin - Model weights (may be split into multiple files)
Step 2: Post-Processing (Required)
The generated JavaScript requires post-processing to handle code ordering and CPU/GPU partitioning:
# Reorder operations to ensure proper dependency resolution
python ReorderModel.py mobilenet/mobilenet.js
# Partition operations between CPU and GPU execution
python CPUGraphPartitioner.py mobilenet/mobilenet.js
Step 3: Load Weights in Browser
Load the weights file(s) and pass them to your WebNN model:
const weights_file = 'weights0.bin';
let cache = await caches.open("weights");
let weights_response = await cache.match(weights_file);
if (!weights_response) {
  await cache.add(weights_file);
  weights_response = await cache.match(weights_file);
}
const weights_buffer = await weights_response.arrayBuffer();
// Create WebNN context and builder
const context = await navigator.ml.createContext({'deviceType': 'gpu'});
const builder = new MLGraphBuilder(context);
// Install CPU operations polyfill
InstallCpuOps(builder);
// Load the model graph
const model_output = loadModelGraph(input_operand, [weights_buffer], builder);
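Once loadModelGraph has returned, the output operand can be compiled and executed with the standard WebNN API. The sketch below assumes a single-output model; the input name, tensor shapes, and 1000-element output buffer are hypothetical placeholders, and the compute() call reflects one revision of the WebNN API (newer implementations may use MLTensor and dispatch() instead):
// Create the model's input operand before calling loadModelGraph.
// NOTE: the input name and dimensions here are hypothetical; use your model's own.
const input_operand = builder.input('input', {dataType: 'float32', dimensions: [1, 3, 224, 224]});
// Compile the graph from the operand returned by loadModelGraph.
const graph = await builder.build({'output': model_output});
// Run one inference; buffer sizes must match the model's input/output shapes.
const inputs = {'input': new Float32Array(1 * 3 * 224 * 224)};
const outputs = {'output': new Float32Array(1000)};
const results = await context.compute(graph, inputs, outputs);
console.log(results.outputs['output']);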
Architecture and Processing Pipeline
Shape Inference Handling
OnnxConverter does not perform traditional shape inference. Instead, it generates WebNN graph nodes that handle dynamic shapes at runtime. The WebNN graph builder is augmented with polyfilled methods that can work with JavaScript numbers (not just tensors).
For example, an ONNX graph with shape operations:
// Pseudocode
a = tensor.shape();
b = a * 2;
This generates WebNN nodes that call shape() and mul() operations, where mul is polyfilled to handle the JavaScript numbers returned by shape().
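For illustration only, the emitted JS for that pseudocode might follow a pattern like the one below; this is a sketch, not the exact generated code, and it assumes the polyfilled shape() and number-aware mul() provided by CpuOps.js:
// Illustrative sketch of the emitted pattern; 'tensor' stands for an upstream operand.
const a = builder.shape(tensor); // polyfilled: yields plain JavaScript numbers
const b = builder.mul(a, 2);     // polyfilled mul accepts JavaScript numbers as well as operands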
CPU Operations Polyfill
Before calling loadModelGraph, install the CPU operations polyfill:
const context = await navigator.ml.createContext({'deviceType': 'gpu'});
const builder = new MLGraphBuilder(context);
InstallCpuOps(builder);
The polyfill (defined in CpuOps.js) augments MLGraphBuilder with operators that work on JavaScript numbers, enabling operations like shape manipulation and constant folding.
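The internals of CpuOps.js are not reproduced here, but a simplified sketch of what InstallCpuOps could do, assuming it simply attaches number-aware helpers to the builder instance, might look like this:
// Simplified sketch (assumption, not the actual CpuOps.js implementation):
// attach helpers that operate on plain JavaScript numbers/arrays instead of operands.
function InstallCpuOps(builder) {
  builder.cpu_mul = (a, b) => Array.isArray(a) ? a.map((x) => x * b) : a * b;
  builder.cpu_add = (a, b) => Array.isArray(a) ? a.map((x) => x + b) : a + b;
  // ...additional number-aware operators installed in the same way.
}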
Processing Pipeline
The complete conversion process involves three stages:
- OnnxConverter.py - Main compiler that emits JS graph building code
- ReorderModel.py - Fixes code generation where input operands are computed after being needed
- CPUGraphPartitioner.py - Annotates operations that need CPU execution with a cpu_ prefix
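As a purely hypothetical illustration of that last step, a generated call that must run on the CPU would be rewritten to its cpu_-prefixed counterpart:
// Before partitioning, the generated code might contain (hypothetical):
//   const dims = builder.shape(input_0);
// After CPUGraphPartitioner.py, the same call is marked for CPU execution:
//   const dims = builder.cpu_shape(input_0);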
Multi-File Weight Support
For large models, weights can be automatically split across multiple files:
python OnnxConverter.py large_model.onnx --max_weights_size 512
This creates multiple weight files (weights0.bin, weights1.bin, etc.) that must all be loaded:
const weight_files = ['weights0.bin', 'weights1.bin'];
const weights_buffers = [];
for (const file of weight_files) {
  const response = await fetch(file);
  weights_buffers.push(await response.arrayBuffer());
}
const model_output = loadModelGraph(input_operand, weights_buffers, builder);
Supported ONNX Operations
OnnxConverter supports a comprehensive set of ONNX operations including:
Core Operations
- Convolution: Conv (with groups, padding, strides, dilations)
- Activation: Relu, Sigmoid, Softmax, LeakyRelu
- Pooling: GlobalAveragePool
- Arithmetic: Add, Sub, Mul, Div, Pow, Sqrt
Tensor Operations
- Shape Manipulation: Reshape, Transpose, Unsqueeze, Squeeze
- Data Movement: Concat, Gather, Slice, Expand
- Comparison: Equal, Less, Where
Advanced Operations
- Quantization: DequantizeLinear, DynamicQuantizeLinear, MatMulInteger
- Matrix Operations: MatMul
- Specialized: DepthToSpace, ConstantOfShape, ReduceMean
Software-Implemented Operations
Some operations are implemented in JavaScript rather than native WebNN:
- Slice: Limited implementation for constant slicing
- Squeeze: Dynamic axis squeezing
- Range: Sequence generation
Debugging and Analysis
Code Generation Issues
If the generated code has dependency issues, use the processing pipeline:
python OnnxConverter.py model.onnx --output_file model.js
python ReorderModel.py model.js
python CPUGraphPartitioner.py model.js
Understanding Generated Code
The generated JavaScript contains:
- Constant definitions for model weights
- WebNN operation calls building the computation graph
- Input/output handling for the model
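A trimmed, hypothetical excerpt of what a generated loadModelGraph body tends to look like is shown below; the layer names, shapes, byte offsets, and descriptor keys are invented for illustration and will differ for any real model:
function loadModelGraph(operand_input, weights_buffers, builder) {
  // Constant definitions: weights sliced out of the binary buffer(s).
  const conv0_weight = builder.constant(
      {dataType: 'float32', dimensions: [32, 3, 3, 3]},
      new Float32Array(weights_buffers[0], 0, 32 * 3 * 3 * 3));
  // WebNN operation calls building the computation graph.
  const conv0 = builder.conv2d(operand_input, conv0_weight,
      {strides: [2, 2], padding: [1, 1, 1, 1]});
  const relu0 = builder.relu(conv0);
  // ...many more layers...
  // Output handling: return the final operand (or an object of named outputs).
  return relu0;
}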
Best Practices
- Always Post-Process: Run ReorderModel.py and CPUGraphPartitioner.py on generated code
- Install Polyfills: Use InstallCpuOps() before loading models
- Handle Multiple Weights: Account for split weight files in large models
- Cache Weights: Use browser caching for weight files to improve loading performance
- Test Locally: Serve files from a local HTTP server so fetch and Cache API requests work correctly
Limitations
- No dynamic shape inference (shapes resolved at runtime)
- Requires post-processing pipeline for complex models
- Limited support for some ONNX operations
- CPU fallback required for certain operation types