Title
Hardware Architecture Design for Regular Convolutional Neural Networks Targeting Resource-Constrained Devices with an Automated Framework

Author
Hailesellasie, Muluken T.

Subject
Computer engineering, Electrical engineering

Classification

Library
Center and Library of Islamic Studies in European Languages

Location
Province: Qom - City: Qom

Library contact: 025-32910706

NATIONAL BIBLIOGRAPHY NUMBER

Number
TL51381

LANGUAGE OF THE ITEM

Language of Text, Soundtrack etc.
English

TITLE AND STATEMENT OF RESPONSIBILITY

Title Proper
Hardware Architecture Design for Regular Convolutional Neural Networks Targeting Resource-Constrained Devices with an Automated Framework
General Material Designation
[Thesis]
First Statement of Responsibility
Hailesellasie, Muluken T.
Subsequent Statement of Responsibility
Hasan, Syed Rafay

PUBLICATION, DISTRIBUTION, ETC.

Name of Publisher, Distributor, etc.
Tennessee Technological University
Date of Publication, Distribution, etc.
2019

GENERAL NOTES

Text of Note
158 p.

DISSERTATION (THESIS) NOTE

Dissertation or thesis details and type of degree
Ph.D.
Body granting the degree
Tennessee Technological University
Text preceding or following the note
2019

SUMMARY OR ABSTRACT

Text of Note
The popularity of deep learning has increased radically in the past few years due to its promising results. The Convolutional Neural Network (CNN) is one of the most widely used deep learning algorithms in various computer vision applications. While the performance of CNNs is impressive, their deployment on current embedded technology is a challenge, since CNN models are computation- and memory-intensive. Hence, there is a growing need for hardware-based solutions in these embedded technologies that can make real-time computer vision on resource-constrained devices a reality. Due to their reconfigurability, high performance, and low power consumption, embedded systems with Field Programmable Gate Arrays (FPGAs) are becoming a hardware platform of choice for many deep learning applications. In this work, we explore various hardware architectures and design strategies to improve computation time and minimize the required computing resources. To alleviate the computational intensity of CNN models, we first propose an efficient convolutional layer architecture with improved computation time per convolution. The proposed architecture finds a trade-off between latency and resource consumption by distributing the input data across a number of memory blocks; because these parallel memory blocks can be read simultaneously, a reduction in clock cycles is achieved. Subsequently, we propose an architecture that computes feature maps in parallel using a custom-designed data flow. The proposed strategy obtains a substantial computation speedup over the state of the art for the same CNN model. On the other hand, while flexible architectures that can serve various models are needed, existing architectures are tailored or optimized to a particular CNN architecture. To address this limitation, we propose a novel and highly flexible hardware architecture that can process most regular CNN variants and achieves better resource utilization. We propose processing cores implemented both with and without multipliers. Fixed-point and power-of-2 quantization schemes are also developed to significantly reduce the on-chip memory space and the logic needed on the target device. With a substantial reduction in on-chip memory and an increase in performance and power efficiency, our results demonstrate that the proposed architecture is well suited to resource-constrained devices. To enhance the usability of the proposed architecture for deep learning practitioners and to improve the scalability of the proposed design, a framework that auto-generates a CNN processor in the form of a synthesized hardware intellectual property (IP) is proposed. The framework optimizes the hardware IP based on the model workload and the target device specifications. It employs a memory-traffic optimization algorithm that yields higher performance and an on-chip fitting optimization that improves resource utilization efficiency. Our results demonstrate that the proposed framework is effective in reducing design time and in optimizing the performance and resource consumption of the hardware IP.
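
To make the power-of-2 quantization mentioned in the abstract concrete, the short Python sketch below (not taken from the thesis; the function name, exponent range, and storage format are illustrative assumptions) rounds each weight to the nearest signed power of two, so that a hardware multiply-accumulate can be replaced by a shift-and-add and weights can be stored as compact (sign, exponent) pairs.

import numpy as np

def quantize_power_of_2(weights, min_exp=-8, max_exp=0):
    # Illustrative sketch, not the thesis's exact scheme: round each nonzero
    # weight to the nearest signed power of two, clipping exponents to
    # [min_exp, max_exp]. Zero weights stay zero. Storing only (sign, exponent)
    # shrinks weight memory, and multiplying by 2**exponent is a bit shift.
    signs = np.sign(weights)
    mags = np.abs(weights)
    exps = np.full(weights.shape, float(min_exp))
    nonzero = mags > 0
    exps[nonzero] = np.clip(np.round(np.log2(mags[nonzero])), min_exp, max_exp)
    quantized = signs * np.where(nonzero, 2.0 ** exps, 0.0)
    return quantized, signs.astype(np.int8), exps.astype(np.int8)

# Example: a few convolution weights before and after quantization.
w = np.array([0.37, -0.05, 0.0, 1.2, -0.61])
q, sign, exp = quantize_power_of_2(w)
print(q)  # [ 0.5 -0.0625  0.  1.  -0.5 ]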

UNCONTROLLED SUBJECT TERMS

Subject Term
Computer engineering
Subject Term
Electrical engineering

PERSONAL NAME - PRIMARY RESPONSIBILITY

Hailesellasie, Muluken T.

PERSONAL NAME - SECONDARY RESPONSIBILITY

Hasan, Syed Rafay

CORPORATE BODY NAME - SECONDARY RESPONSIBILITY

Tennessee Technological University

ELECTRONIC LOCATION AND ACCESS

Electronic name
Read the full text

