عنوان

Taming Evasions in Machine Learning Based Detection Pipelines

پدید آورنده

Kantchelian, Alex

موضوع

رده

کتابخانه

Center and Library of Islamic Studies in European Languages

محل استقرار

استان: Qom ـ شهر: Qom

تماس با کتابخانه : 32910706-025

NATIONAL BIBLIOGRAPHY NUMBER

Number

TL1n7599wp

LANGUAGE OF THE ITEM

.Language of Text, Soundtrack etc

انگلیسی

TITLE AND STATEMENT OF RESPONSIBILITY

Title Proper

Taming Evasions in Machine Learning Based Detection Pipelines

General Material Designation

[Thesis]

First Statement of Responsibility

Kantchelian, Alex

Subsequent Statement of Responsibility

Joseph, Anthony DTygar, J. D.

.PUBLICATION, DISTRIBUTION, ETC

Date of Publication, Distribution, etc.

2016

DISSERTATION (THESIS) NOTE

Body granting the degree

Joseph, Anthony DTygar, J. D.

Text preceding or following the note

2016

SUMMARY OR ABSTRACT

Text of Note

This thesis presents and evaluates three mitigation techniques for evasion attacks against machine learning based detection pipelines. Machine learning based detection pipelines provide much of the security in modern computerized system. For instance, these pipelines are responsible for the detection of undesirable content on computing platforms and Internet-based services, such as malicious software and email spam. By its adversarial nature, the security application domain exhibits a permanent arms race between attackers who aim to avoid, or evade, detection and the pipeline's maintainers whose aim is to catch all undesirable content. The first part of this thesis examines a defense technique for the concrete application domain of comment spam on social media. We propose content complexity, a compression-based normalized measure of textual redundancy that is mostly insensitive to the underlying language used and adversarial word spelling variations. We demonstrate on a real dataset of tens of millions of comments that content complexity alone achieves 15 percentage points higher precision than a state-of-the-art detection system. The second part of this thesis takes a quantitative approach to evasion and introduces one machine learning algorithm and one learning framework for building hardened detection pipelines. Both techniques are generic and suitable for a large class of application domains. We propose the convex polytope machine, a non-linear large-scale learning algorithm which aims at finding a large-margin polytope separator and thereby decrease the effectiveness of evasion attacks. We show that as a general purpose machine learning algorithm, the convex polytope machine displays an outstanding trade-off between classification accuracy and computational efficiency. We also demonstrate on a benchmark handwritten digit recognition task that the convex polytope machine is quantitatively as evasion-resistant as a classic neural network. We finally introduce adversarial boosting, a boosting-inspired framework for iteratively building ensemble classifiers that are hardened against evasion attacks. Adversarial boosting operates by repeatedly constructing evasion attacks and adding the corresponding corrective sub-classifiers to the ensemble. We implement this technique for decision tree sub-classifiers by constructing the first exact and approximate automatic evasion algorithms for tree ensembles. For our benchmark task, the adversarially boosted tree ensemble is respectively five times and two times less evasion-susceptible than regular tree ensembles and the convex polytope machine.

PERSONAL NAME - PRIMARY RESPONSIBILITY

Graim, Kiley Schmidt

PERSONAL NAME - SECONDARY RESPONSIBILITY

Kantchelian, Alex

CORPORATE BODY NAME - SECONDARY RESPONSIBILITY

UC Berkeley

ELECTRONIC LOCATION AND ACCESS

Electronic name