Journal article in IEEE Access, 2023

DNN partitioning for inference throughput acceleration at the edge

Thomas Feltin
Leo Marche
Juan-Antonio Cordero-Fuertes
Frank Brockners
Thomas Heide Clausen

Abstract

Deep neural network (DNN) inference on streaming data requires computing resources that satisfy the inference throughput requirements. However, latency- and privacy-sensitive deep learning applications cannot afford to offload computation to remote clouds, because of the implied transmission cost and the lack of trust in third-party cloud providers. Among the solutions to increase performance while keeping computation in a constrained environment, hardware acceleration can be onerous, and model optimization requires extensive design effort while hindering accuracy. DNN partitioning is a third, complementary approach: it distributes the inference workload over several available edge devices, taking into account the edge network properties and the DNN structure, with the objective of maximizing the inference throughput (number of inferences per second). This paper introduces a method to predict inference and transmission latencies for multi-threaded distributed DNN deployments, and defines an optimization process to maximize the inference throughput. A branch-and-bound solver is then presented and analyzed to quantify the achieved performance and complexity. This analysis leads to the definition of the acceleration region, which describes deterministic conditions on the DNN and network properties under which DNN partitioning is beneficial. Finally, experimental results confirm the simulations and show inference throughput improvements in sample edge deployments.
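The core optimization can be illustrated with a small sketch: enumerate cut points of a chain DNN across the available edge devices and keep the split that maximizes the pipelined throughput, modeled here as the inverse of the slowest stage's period (its compute time plus the time to transmit its output activations to the next device). This is a minimal, naive exhaustive search rather than the paper's branch-and-bound solver, and the bottleneck throughput model, function names, and example numbers are illustrative assumptions rather than the paper's exact formulation.

from itertools import combinations

def stage_period(layers, times, out_size, bandwidth, last_stage):
    # Period of one pipeline stage: compute time of its layers plus the time
    # to transmit the last layer's output activations to the next device.
    compute = sum(times[l] for l in layers)
    transmit = 0.0 if last_stage else out_size[layers[-1]] / bandwidth
    return compute + transmit

def best_partition(times, out_size, bandwidth, n_devices):
    # Exhaustive search over cut points of a chain DNN, maximizing pipelined
    # throughput = 1 / (slowest stage period). Returns (throughput, cut points).
    n_layers = len(times)
    best = (1.0 / sum(times), ())  # baseline: run everything on one device
    for k in range(1, n_devices):  # k cuts -> k + 1 pipeline stages
        for cuts in combinations(range(1, n_layers), k):
            bounds = (0, *cuts, n_layers)
            periods = [
                stage_period(range(bounds[i], bounds[i + 1]), times, out_size,
                             bandwidth, last_stage=(i == len(bounds) - 2))
                for i in range(len(bounds) - 1)
            ]
            throughput = 1.0 / max(periods)
            if throughput > best[0]:
                best = (throughput, cuts)
    return best

# Illustrative numbers: a 6-layer chain, per-layer compute latency in seconds,
# per-layer output size in MB, a 10 MB/s link, and up to 3 identical devices.
times = [0.02, 0.05, 0.04, 0.03, 0.06, 0.01]
out_size = [0.5, 0.2, 0.1, 0.3, 0.05, 0.0]
print(best_partition(times, out_size, bandwidth=10.0, n_devices=3))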
Main file
IEEE Access.pdf (32.41 MB)
Origin: Files produced by the author(s)
License: CC BY - Attribution

Dates and versions

hal-04008199, version 1 (28-02-2023)

License

Attribution (CC BY)

Identifiers

  • HAL Id: hal-04008199
  • DOI: 10.1109/ACCESS.2023.3244497

Cite

Thomas Feltin, Leo Marche, Juan-Antonio Cordero-Fuertes, Frank Brockners, Thomas Heide Clausen. DNN partitioning for inference throughput acceleration at the edge. IEEE Access, In press, pp.1-1. ⟨10.1109/ACCESS.2023.3244497⟩. ⟨hal-04008199⟩

Collections

X IP_PARIS
