• Home
  • Advanced Search
  • Directory of Libraries
  • About lib.ir
  • Contact Us
  • History

عنوان
Convex Approaches to Text Summarization

پدید آورنده
Gawalt, Brian

موضوع

رده

کتابخانه
Center and Library of Islamic Studies in European Languages

محل استقرار
استان: Qom ـ شهر: Qom

Center and Library of Islamic Studies in European Languages

تماس با کتابخانه : 32910706-025

NATIONAL BIBLIOGRAPHY NUMBER

Number
TL5px8t6xj

LANGUAGE OF THE ITEM

.Language of Text, Soundtrack etc
انگلیسی

TITLE AND STATEMENT OF RESPONSIBILITY

Title Proper
Convex Approaches to Text Summarization
General Material Designation
[Thesis]
First Statement of Responsibility
Gawalt, Brian
Subsequent Statement of Responsibility
El Ghaoui, Laurent

.PUBLICATION, DISTRIBUTION, ETC

Date of Publication, Distribution, etc.
2012

DISSERTATION (THESIS) NOTE

Body granting the degree
El Ghaoui, Laurent
Text preceding or following the note
2012

SUMMARY OR ABSTRACT

Text of Note
This dissertation presents techniques for the summarization and exploration of text documents. Many approaches taken towards analysis of news media can be analogized to well-defined, well-studied problems from statistical machine learning. The problem of feature selection, for classification and dimensionality reduction tasks, is formulated to help assist with these media analysis tasks. Taking advantage of L1 regularization, convex programs can be used to efficiently solve these feature selection problems efficiently. There is a demonstrated potential to conduct media analysis at a scale commensurate with the growing volume of data available to news consumers. There is first a presentation of an example text mining over a vector space model. Given two news articles on a related theme, a series of additional articles are pulled from a large pool of candidates to help link these two input items. The novel algorithm used is based on finding the documents whose vector representations are nearest the convex combinations of the inputs. Comparisons to competing algorithms show performance matching a state-of-the-art method, at a lower computational complexity. Design of a relational database for typical text mining tasks is discussed. The architecture trades off the organizational and data quality advantages of normalization versus the performance boosts from replicating entity attributes across tables. The vector space model of text is implemented explicitly as a three-column table. The predictive framework, connecting news analysis tasks to feature selection and classification problems, is then explicitly explored. The validity of this analogy is tested with a particular task: given a query term and a corpus of news articles, provide a short list of word tokens which distinguish how this word appears within the corpus. Example summary lists were produced by five algorithms, and presented to volunteer readers. Evidence suggests that an implementation of L1-regularized logistic regression model, trained over the documents with labels indicating the presence or absence of the query word, selected word-features best summarizing the query. To contend with tasks that do not lend themselves this a predictive framework, a sparse variant of latent semantic indexing is investigated. [cont.]

PERSONAL NAME - PRIMARY RESPONSIBILITY

Wyman, Dana Elizabeth

PERSONAL NAME - SECONDARY RESPONSIBILITY

Gawalt, Brian

CORPORATE BODY NAME - SECONDARY RESPONSIBILITY

UC Berkeley

ELECTRONIC LOCATION AND ACCESS

Electronic name
 مطالعه متن کتاب 

p

[Thesis]
276903

a
Y

Proposal/Bug Report

Warning! Enter The Information Carefully
Send Cancel
This website is managed by Dar Al-Hadith Scientific-Cultural Institute and Computer Research Center of Islamic Sciences (also known as Noor)
Libraries are responsible for the validity of information, and the spiritual rights of information are reserved for them
Best Searcher - The 5th Digital Media Festival