Title

Power, Performance and Scalability for Big Data Query Languages:

Author

Wang, Jin

Subject

Classification

Library

Center and Library of Islamic Studies in European Languages

Location

Province: Qom, City: Qom

Library contact: 025-32910706

NATIONAL BIBLIOGRAPHY NUMBER

Number

TL8m50s7jz

LANGUAGE OF THE ITEM

Language of Text, Soundtrack, etc.

English

TITLE AND STATEMENT OF RESPONSIBILITY

Title Proper

Power, Performance and Scalability for Big Data Query Languages:

General Material Designation

[Thesis]

First Statement of Responsibility

Wang, Jin

Title Proper by Another Author

The Machine Learning Challenge

Subsequent Statement of Responsibility

Zaniolo, Carlo

PUBLICATION, DISTRIBUTION, ETC.

Name of Publisher, Distributor, etc.

UCLA

Date of Publication, Distribution, etc.

2020

DISSERTATION (THESIS) NOTE

Body granting the degree

UCLA

Text preceding or following the note

2020

SUMMARY OR ABSTRACT

Text of Note

In the Big Data era, there is a resurgence of interest in using Datalog to express data analysis applications that require recursive computations. However, the use of non-monotonic aggregates in recursion raises difficult semantic issues. Recent theoretical advances such as monotonic aggregation and Pre-Mappability (PreM) provide formal semantics for the use of aggregates in recursive Datalog rules, enabling the expression of a wide spectrum of advanced analytical tasks, such as graph analysis, data mining, machine learning and stream processing. In this dissertation, we explore the opportunities and issues created by these advances, including the expressiveness of Datalog in advanced applications and their optimization to achieve superior performance and scalability. First, we find that Datalog serves as an efficient query language that simplifies the writing of machine learning applications and provides a unified environment for their development and deployment on multiple platforms. Following this route, we propose a declarative machine learning framework of tested effectiveness on top of Apache Spark. We present an in-depth theoretical analysis that shows how key ML algorithms can be expressed and efficiently implemented by recursive Datalog programs that use aggregates in recursion, thereby achieving both formal and efficient operational semantics. We also present the compilation and optimization techniques we developed to support the complex recursive queries required by ML applications in distributed shared-nothing architectures. Next, we present theoretical results showing that programs computing any aggregates on sets of facts of predictable cardinality are equivalent to stratified programs in which the pre-computation of the cardinality of the set is followed by a stratum where recursive rules use only monotonic constructs.
Finally, we investigate how to improve the parallelism of semi-naive evaluation of recursive Datalog programs on shared-memory multi-core machines, and discuss the prototype system we have developed and the high performance levels it delivers.
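
Illustrative note (not part of the catalogued abstract): the two ideas the abstract leans on, semi-naive evaluation and an aggregate applied inside recursion, can be sketched in a few lines of Python. This is a minimal single-threaded sketch under stated assumptions, not the dissertation's system: the function names `transitive_closure` and `shortest_paths`, the tuple-based edge encoding, and the Datalog rules shown in the docstrings are hypothetical, and edge weights are assumed to be non-negative so that the fixpoint terminates.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Semi-naive evaluation of the (assumed) rules
         tc(X, Y) <- edge(X, Y).
         tc(X, Y) <- tc(X, Z), edge(Z, Y).
    Only facts derived in the previous round (delta) are joined again.
    """
    succ = defaultdict(set)
    for x, y in edges:
        succ[x].add(y)

    total = set(edges)   # all facts derived so far
    delta = set(edges)   # facts that were new in the last round
    while delta:
        derived = {(x, y) for (x, z) in delta for y in succ.get(z, ())}
        delta = derived - total   # keep only genuinely new facts
        total |= delta
    return total


def shortest_paths(weighted_edges):
    """Same fixpoint, with a min aggregate applied inside the recursion:
         sp(X, Y, min<D>) <- edge(X, Y, D).
         sp(X, Y, min<D>) <- sp(X, Z, D1), edge(Z, Y, D2), D = D1 + D2.
    Assumes non-negative weights (illustrative assumption).
    """
    out = defaultdict(list)
    for x, y, d in weighted_edges:
        out[x].append((y, d))

    best = {}    # current minimum distance per (source, target)
    delta = {}   # pairs whose minimum improved in the last round
    for x, y, d in weighted_edges:
        if d < best.get((x, y), float("inf")):
            best[(x, y)] = d
            delta[(x, y)] = d

    while delta:
        new_delta = {}
        for (x, z), d1 in delta.items():
            for y, d2 in out.get(z, ()):
                d = d1 + d2
                # Non-minimal distances are discarded immediately
                # instead of being materialized and filtered afterwards.
                if d < best.get((x, y), float("inf")):
                    best[(x, y)] = d
                    new_delta[(x, y)] = d
        delta = new_delta
    return best
```

In the second function, keeping only the current minimum per pair is what keeps the intermediate result small; without it, every path length would have to be enumerated before taking the minimum, which does not terminate on cyclic graphs.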

PERSONAL NAME - PRIMARY RESPONSIBILITY

Wang, Jin

PERSONAL NAME - SECONDARY RESPONSIBILITY

Zaniolo, Carlo

CORPORATE BODY NAME - SECONDARY RESPONSIBILITY

UCLA

ELECTRONIC LOCATION AND ACCESS

Electronic name

[Thesis]

276903
