We propose TESLA (Time-Efficient Sparse and Lightweight Adaptation), a strategy for real-time mobile applications that skips adaptation on selected batches to increase the inference sample rate. The method balances model accuracy against inference speed by accumulating domain-informative samples from the non-adapted batches and adapting on them only sparsely.
Jun 3, 2024
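The skip-and-accumulate idea described for TESLA can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `SparseAdapter`, the fixed `adapt_every` schedule, the use of prediction entropy as the "domain-informative" score, and the bounded buffer are all assumptions made for illustration. As a simplification, samples from the batch that triggers adaptation also enter the buffer.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a probability vector (assumed informativeness score)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class SparseAdapter:
    """Hypothetical scheduler: infer on every batch, adapt only on every k-th one."""

    def __init__(self, adapt_every: int, capacity: int,
                 adapt_fn: Callable[[List[object]], None]) -> None:
        self.adapt_every = adapt_every   # assumed fixed skip schedule
        self.capacity = capacity         # max number of buffered samples
        self.adapt_fn = adapt_fn         # user-supplied adaptation step
        self._heap: List[Tuple[float, int, object]] = []  # min-heap keyed by score
        self._seen = 0                   # unique tie-breaker for heap entries
        self._batches = 0

    def _remember(self, score: float, sample: object) -> None:
        """Keep only the `capacity` highest-scoring samples seen so far."""
        self._seen += 1
        item = (score, self._seen, sample)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif score > self._heap[0][0]:
            heapq.heapreplace(self._heap, item)

    def step(self, batch: Sequence[object],
             predict: Callable[[object], Sequence[float]]):
        """Run inference on `batch`; return (outputs, adapted_flag)."""
        self._batches += 1
        outputs = [predict(x) for x in batch]
        # Accumulate informative samples regardless of whether we adapt now.
        for x, probs in zip(batch, outputs):
            self._remember(entropy(probs), x)
        # Sparse adaptation: only on every `adapt_every`-th batch.
        if self._batches % self.adapt_every == 0 and self._heap:
            self.adapt_fn([sample for _, _, sample in self._heap])
            self._heap.clear()
            return outputs, True
        return outputs, False
```

In use, `adapt_fn` would wrap whatever lightweight update the model supports (for example, refreshing normalization statistics), and `predict` would return class probabilities for one sample. Skipped batches then cost only a forward pass plus a heap update.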
Exploring model adaptation methods that enable real-time inference on extremely resource-constrained devices.
Apr 27, 2024