Title
Hardware Architecture Design for Regular Convolutional Neural Networks Targeting Resource-Constrained Devices with an Automated Framework

Author
Hailesellasie, Muluken T.

Subject
Computer engineering, Electrical engineering

Classification

Library
Center and Library of Islamic Studies in European Languages

Location
Province: Qom - City: Qom

Library contact: 025-32910706

NATIONAL BIBLIOGRAPHY NUMBER

Number
TL51381

LANGUAGE OF THE ITEM

Language of Text, Soundtrack etc.
English

TITLE AND STATEMENT OF RESPONSIBILITY

Title Proper
Hardware Architecture Design for Regular Convolutional Neural Networks Targeting Resource-Constrained Devices with an Automated Framework
General Material Designation
[Thesis]
First Statement of Responsibility
Hailesellasie, Muluken T.
Subsequent Statement of Responsibility
Hasan, Syed Rafay

PUBLICATION, DISTRIBUTION, ETC.

Name of Publisher, Distributor, etc.
Tennessee Technological University
Date of Publication, Distribution, etc.
2019

GENERAL NOTES

Text of Note
158 p.

DISSERTATION (THESIS) NOTE

Dissertation or thesis details and type of degree
Ph.D.
Body granting the degree
Tennessee Technological University
Text preceding or following the note
2019

SUMMARY OR ABSTRACT

Text of Note
The popularity of deep learning has increased radically in the past few years due to its promising results. The Convolutional Neural Network (CNN) is one of the most widely used deep learning algorithms in various computer vision applications. While the performance of CNNs is impressive, their deployment on current embedded technology is a challenge, since CNN models are computation- and memory-intensive. Hence, there is a growing need for hardware-based solutions in these embedded technologies that can make real-time computer vision on resource-constrained devices a reality. Due to their reconfigurability, high performance, and low power consumption, embedded systems with Field Programmable Gate Arrays (FPGAs) are becoming a hardware platform of choice for many deep learning applications. In this work, we explore various hardware architectures and design strategies to improve computation time and minimize the required computing resources. To alleviate the computational intensity of CNN models, we first propose an efficient convolutional layer architecture with improved computation time per convolution. The proposed architecture finds a trade-off between latency and resource consumption by distributing the input data across a number of memory blocks; because these parallel memory blocks can be read simultaneously, a reduction in clock cycles is achieved. Subsequently, we propose an architecture that computes feature maps in parallel using a custom-designed data flow. The proposed strategy obtains a substantial computation speedup over the state of the art for the same CNN model. On the other hand, while flexible architectures that can serve various models are needed, existing architectures are tailored or optimized to a particular CNN architecture. To address this limitation, we propose a novel and highly flexible hardware architecture that can process most regular CNN variants and achieves better resource utilization. We propose processing cores implemented both with and without multipliers. Fixed-point and power-of-2 quantization schemes are also developed to significantly reduce the on-chip memory space and the logic needed on the target device. With a substantial reduction in on-chip memory and an increase in performance and power efficiency, our results demonstrate that the proposed architecture is well suited to resource-constrained devices. To enhance the usability of the proposed architecture for deep learning practitioners and to improve the scalability of the proposed design, a framework that auto-generates a CNN processor in the form of a synthesized hardware intellectual property (IP) is proposed. The framework optimizes the hardware IP based on the model workload and the target device specifications. It employs a memory-traffic optimization algorithm that yields higher performance and an on-chip fitting optimization that improves resource utilization efficiency. Our results demonstrate that the proposed framework is effective in reducing design time and in optimizing the performance and resource consumption of the hardware IP.
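
To make the power-of-2 quantization mentioned in the abstract concrete, the short Python sketch below (not taken from the thesis; the function name, exponent range, and storage format are illustrative assumptions) rounds each weight to the nearest signed power of two, so that a hardware multiply-accumulate can be replaced by a shift-and-add and weights can be stored as compact (sign, exponent) pairs.

import numpy as np

def quantize_power_of_2(weights, min_exp=-8, max_exp=0):
    # Illustrative sketch, not the thesis's exact scheme: round each nonzero
    # weight to the nearest signed power of two, clipping exponents to
    # [min_exp, max_exp]. Zero weights stay zero. Storing only (sign, exponent)
    # shrinks weight memory, and multiplying by 2**exponent is a bit shift.
    signs = np.sign(weights)
    mags = np.abs(weights)
    exps = np.full(weights.shape, float(min_exp))
    nonzero = mags > 0
    exps[nonzero] = np.clip(np.round(np.log2(mags[nonzero])), min_exp, max_exp)
    quantized = signs * np.where(nonzero, 2.0 ** exps, 0.0)
    return quantized, signs.astype(np.int8), exps.astype(np.int8)

# Example: a few convolution weights before and after quantization.
w = np.array([0.37, -0.05, 0.0, 1.2, -0.61])
q, sign, exp = quantize_power_of_2(w)
print(q)  # [ 0.5 -0.0625  0.  1.  -0.5 ]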

UNCONTROLLED SUBJECT TERMS

Subject Term
Computer engineering
Subject Term
Electrical engineering

PERSONAL NAME - PRIMARY RESPONSIBILITY

Hailesellasie, Muluken T.

PERSONAL NAME - SECONDARY RESPONSIBILITY

Hasan, Syed Rafay

CORPORATE BODY NAME - SECONDARY RESPONSIBILITY

Tennessee Technological University

ELECTRONIC LOCATION AND ACCESS

Electronic name
Read the full text

