
Title
Efficient Inference using Deep Convolutional Neural Networks on Resource-Constrained Platforms

Author
Motamedi, Mohammad

Subject
Computer engineering, Electrical engineering

Classification

Library
Center and Library of Islamic Studies in European Languages

Location
Province: Qom; City: Qom

Library contact: 025-32910706

NATIONAL BIBLIOGRAPHY NUMBER

Number
TLpq2269078980

LANGUAGE OF THE ITEM

Language of Text, Soundtrack etc.
English

TITLE AND STATEMENT OF RESPONSIBILITY

Title Proper
Efficient Inference using Deep Convolutional Neural Networks on Resource-Constrained Platforms
General Material Designation
[Thesis]
First Statement of Responsibility
Motamedi, Mohammad
Subsequent Statement of Responsibility
Ghiasi, Soheil

PUBLICATION, DISTRIBUTION, ETC.

Name of Publisher, Distributor, etc.
University of California, Davis
Date of Publication, Distribution, etc.
2019

PHYSICAL DESCRIPTION

Specific Material Designation and Extent of Item
123

DISSERTATION (THESIS) NOTE

Dissertation or thesis details and type of degree
Ph.D.
Body granting the degree
University of California, Davis
Text preceding or following the note
2019

SUMMARY OR ABSTRACT

Text of Note
Deep Convolutional Neural Networks (CNNs) exhibit remarkable performance in many pattern recognition, segmentation, classification, and comprehension tasks that were widely considered open problems for most of computing history. For example, CNNs have been shown to outperform humans in certain visual object recognition tasks. Given the significant potential of CNNs for advancing autonomy and intelligence in systems, the Internet of Things (IoT) research community has witnessed a surge in demand for CNN-enabled data processing, technically referred to as inference, for critical tasks such as visual, voice, and language comprehension. Inference using modern CNNs involves billions of operations on millions of parameters, so their deployment requires significant compute, storage, and energy resources. However, such resources are scarce in many resource-constrained IoT applications.
Designing an efficient CNN architecture is the first step in alleviating this problem. The use of asymmetric kernels, breadth-control techniques, and reduce-expand structures is among the most important approaches for decreasing a CNN's parameter budget and computational intensity. Architectural efficiency can be further improved by eliminating ineffective neurons through pruning and by quantizing the parameters to decrease the model size. Hardware-driven optimization is the subsequent step in addressing the computational demands of deep neural networks. Mobile Systems-on-Chip (SoCs), which usually include a mobile GPU, a DSP, and a number of CPU cores, are strong candidates for CNN inference on embedded platforms. Depending on the application, it is also possible to develop customized FPGA-based and ASIC-based accelerators. ASIC-based acceleration drastically outperforms the other approaches in both power consumption and execution time. However, this approach is reasonable only if designing a new chip is economically justifiable for the target application.
This dissertation aims to bridge the gap between the computational demands of CNNs and the computational capabilities of embedded platforms. We contend that one must strike a judicious balance between the functional requirements of a CNN and its resource requirements for an IoT application to be able to utilize it. We investigate several concrete formulations of this broad concept and propose effective approaches for addressing the identified challenges. First, we target platforms equipped with reconfigurable fabric, such as Field-Programmable Gate Arrays (FPGAs), and offer a framework for generating optimized FPGA-based CNN accelerators. Our solution leverages an analytical approach to characterizing and exploring the accelerator design space, through which it synthesizes an efficient accelerator for a given CNN on a specific FPGA. Second, we investigate the problem of CNN inference on mobile SoCs, propose effective approaches for CNN parallelization targeting such platforms, and explore the underlying tradeoffs. Finally, in the last part of this dissertation, we investigate using an existing optimized CNN model to automatically generate a competitive CNN for an IoT application whose objects of interest are a fraction of the categories the original CNN was designed to classify, such that the resource requirements of inference with the synthesized CNN are proportionally scaled down. We use the term resource scalability to refer to this concept and propose solutions for the automated synthesis of context-aware, resource-scalable CNNs that meet the functional requirements of the target IoT application at a fraction of the resource requirements of the original CNN.
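To make the reduce-expand idea in the abstract concrete, the following sketch compares the weight count of a standard 3x3 convolution against a squeeze/expand module of the kind the abstract describes. The channel sizes (256 in/out, a 32-channel squeeze, 128+128 expand) are illustrative assumptions, not figures from the dissertation.

```python
# Hedged sketch: parameter-count comparison between a standard 3x3
# convolution and a reduce-expand (squeeze/expand) module. Channel
# sizes below are illustrative only, not the dissertation's numbers.

def conv_params(k_h, k_w, c_in, c_out):
    """Weight count of a k_h x k_w convolution (biases omitted)."""
    return k_h * k_w * c_in * c_out

# Standard layer: 3x3 convolution, 256 -> 256 channels.
standard = conv_params(3, 3, 256, 256)

# Reduce-expand module: a 1x1 "squeeze" shrinks the channel count,
# then parallel 1x1 and 3x3 "expand" convolutions restore it.
squeeze = conv_params(1, 1, 256, 32)
expand_1x1 = conv_params(1, 1, 32, 128)
expand_3x3 = conv_params(3, 3, 32, 128)
reduce_expand = squeeze + expand_1x1 + expand_3x3

print(standard)                   # 589824 weights
print(reduce_expand)              # 49152 weights
print(standard / reduce_expand)   # 12.0x reduction
```

The narrow squeeze layer is what drives the saving: the expensive 3x3 kernels see only 32 input channels instead of 256, which is why such modules shrink the parameter budget without reducing the module's output width.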
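The abstract's first contribution is an analytical exploration of the FPGA accelerator design space. The toy sketch below illustrates that style of exploration: enumerate loop-unroll factors for a convolution layer, estimate cycle count and multiplier usage with a closed-form cost model, and keep the fastest feasible point. The cost model, layer sizes, and DSP budget are simplified assumptions for illustration, not the dissertation's actual model.

```python
# Hedged sketch of analytical design-space exploration for an FPGA CNN
# accelerator. The cost model and resource numbers are simplified
# assumptions, not the dissertation's framework.

def explore(c_out, c_in, spatial, dsp_budget):
    """Pick unroll factors (p_out, p_in) that minimize estimated cycles
    while keeping the multiplier count within the DSP budget."""
    best = None
    for p_out in range(1, c_out + 1):
        for p_in in range(1, c_in + 1):
            dsps = p_out * p_in  # one multiplier per parallel MAC (assumed)
            if dsps > dsp_budget:
                continue
            # Cycles: ceil-divide each channel loop by its unroll factor,
            # times the number of output pixels.
            cycles = (-(-c_out // p_out)) * (-(-c_in // p_in)) * spatial
            if best is None or cycles < best[0]:
                best = (cycles, p_out, p_in)
    return best

# Example: 64 output channels, 32 input channels, 28x28 output pixels,
# 128 DSP slices available (all illustrative values).
cycles, p_out, p_in = explore(64, 32, 28 * 28, 128)
print(p_out, p_in, cycles)
```

Because the model is closed-form, the whole space can be swept exhaustively in milliseconds; real frameworks of this kind add bandwidth and on-chip-memory constraints to the same style of formulation rather than simulating each candidate design.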

TOPICAL NAME USED AS SUBJECT

Computer engineering
Electrical engineering

PERSONAL NAME - PRIMARY RESPONSIBILITY

Ghiasi, Soheil
Motamedi, Mohammad

ELECTRONIC LOCATION AND ACCESS

Electronic name
Read the full text

