Title

Applied text analysis with Python

Author

Benjamin Bengfort, Rebecca Bilbro, and Tony Ojeda.

Subject

Machine learning; Natural language processing (Computer science); Python (Computer program language); COMPUTERS -- Programming Languages -- Python

Classification

QA76.73.P98

Library

Center and Library of Islamic Studies in European Languages

Location

Province: Qom; City: Qom

Library contact: 025-32910706

INTERNATIONAL STANDARD BOOK NUMBER

ISBN

1491962992

ISBN

1491963018

ISBN

1491963042

ISBN

9781491962992

ISBN

9781491963012

ISBN

9781491963043

Erroneous ISBN

9781491963043

TITLE AND STATEMENT OF RESPONSIBILITY

Title Proper

Applied text analysis with Python :

General Material Designation

[Book]

Other Title Information

enabling language-aware data products with machine learning /

First Statement of Responsibility

Benjamin Bengfort, Rebecca Bilbro, and Tony Ojeda.

EDITION STATEMENT

Edition Statement

First edition.

PUBLICATION, DISTRIBUTION, ETC.

Place of Publication, Distribution, etc.

Sebastopol, CA :

Name of Publisher, Distributor, etc.

O'Reilly Media,

Date of Publication, Distribution, etc.

[2018]

PHYSICAL DESCRIPTION

Specific Material Designation and Extent of Item

1 online resource (xviii, 310 pages) :

Other Physical Details

illustrations

INTERNAL BIBLIOGRAPHIES/INDEXES NOTE

Text of Note

Includes bibliographical references and index.

CONTENTS NOTE

Text of Note

Cover; Copyright; Table of Contents; Preface; Computational Challenges of Natural Language; Linguistic Data: Tokens and Words; Enter Machine Learning; Tools for Text Analysis; What to Expect from This Book; Who This Book Is For; Code Examples and GitHub Repository; Conventions Used in This Book; Using Code Examples; O'Reilly Safari; How to Contact Us; Acknowledgments; Chapter 1. Language and Computation; The Data Science Paradigm; Language-Aware Data Products; The Data Product Pipeline; Language as Data; A Computational Model of Language; Language Features; Contextual Features.

Text of Note

Structural Features; Conclusion; Chapter 2. Building a Custom Corpus; What Is a Corpus?; Domain-Specific Corpora; The Baleen Ingestion Engine; Corpus Data Management; Corpus Disk Structure; Corpus Readers; Streaming Data Access with NLTK; Reading an HTML Corpus; Reading a Corpus from a Database; Conclusion; Chapter 3. Corpus Preprocessing and Wrangling; Breaking Down Documents; Identifying and Extracting Core Content; Deconstructing Documents into Paragraphs; Segmentation: Breaking Out Sentences; Tokenization: Identifying Individual Tokens; Part-of-Speech Tagging; Intermediate Corpus Analytics.

Text of Note

Corpus Transformation; Intermediate Preprocessing and Storage; Reading the Processed Corpus; Conclusion; Chapter 4. Text Vectorization and Transformation Pipelines; Words in Space; Frequency Vectors; One-Hot Encoding; Term Frequency-Inverse Document Frequency; Distributed Representation; The Scikit-Learn API; The BaseEstimator Interface; Extending TransformerMixin; Pipelines; Pipeline Basics; Grid Search for Hyperparameter Optimization; Enriching Feature Extraction with Feature Unions; Conclusion; Chapter 5. Classification for Text Analysis; Text Classification.

Text of Note

Identifying Classification Problems; Classifier Models; Building a Text Classification Application; Cross-Validation; Model Construction; Model Evaluation; Model Operationalization; Conclusion; Chapter 6. Clustering for Text Similarity; Unsupervised Learning on Text; Clustering by Document Similarity; Distance Metrics; Partitive Clustering; Hierarchical Clustering; Modeling Document Topics; Latent Dirichlet Allocation; Latent Semantic Analysis; Non-Negative Matrix Factorization; Conclusion; Chapter 7. Context-Aware Text Analysis; Grammar-Based Feature Extraction; Context-Free Grammars.

Text of Note

Syntactic Parsers; Extracting Keyphrases; Extracting Entities; n-Gram Feature Extraction; An n-Gram-Aware CorpusReader; Choosing the Right n-Gram Window; Significant Collocations; n-Gram Language Models; Frequency and Conditional Frequency; Estimating Maximum Likelihood; Unknown Words: Back-off and Smoothing; Language Generation; Conclusion; Chapter 8. Text Visualization; Visualizing Feature Space; Visual Feature Analysis; Guided Feature Engineering; Model Diagnostics; Visualizing Clusters; Visualizing Classes; Diagnosing Classification Error; Visual Steering; Silhouette Scores and Elbow Curves.

SUMMARY OR ABSTRACT

Text of Note

From news and speeches to informal chatter on social media, natural language is one of the richest and most underutilized sources of data. Not only does it come in a constant stream, always changing and adapting in context; it also contains information that is not conveyed by traditional data sources. The key to unlocking natural language is through the creative application of text analytics. This practical book presents a data scientist's approach to building language-aware products with applied machine learning. You'll learn robust, repeatable, and scalable techniques for text analysis with Python, including contextual and linguistic feature engineering, vectorization, classification, topic modeling, entity resolution, graph analysis, and visual steering. By the end of the book, you'll be equipped with practical methods to solve any number of complex real-world problems. Preprocess and vectorize text into high-dimensional feature representations. Perform document classification and topic modeling. Steer the model selection process with visual diagnostics. Extract key phrases, named entities, and graph structures to reason about data in text. Build a dialog framework to enable chatbots and language-driven interaction. Use Spark to scale processing power and neural networks to scale model complexity.--Provided by publisher.

ACQUISITION INFORMATION NOTE

Source for Acquisition/Subscription Address

Safari Books Online

Stock Number

CL0500000981

OTHER EDITION IN ANOTHER MEDIUM

Title

Applied text analysis with Python.

TOPICAL NAME USED AS SUBJECT

Machine learning.

Natural language processing (Computer science)

Python (Computer program language)

COMPUTERS -- Programming Languages -- Python.

SUBJECT CATEGORY (Provisional)

COM051360

DEWEY DECIMAL CLASSIFICATION

Number

005.133

LIBRARY OF CONGRESS CLASSIFICATION

Class number

QA76.73.P98

PERSONAL NAME - PRIMARY RESPONSIBILITY

Bengfort, Benjamin,1984-

PERSONAL NAME - ALTERNATIVE RESPONSIBILITY

Bilbro, Rebecca

Ojeda, Tony

ORIGINATING SOURCE

Date of Transaction

20200823033040.0

Cataloguing Rules (Descriptive Conventions)

ELECTRONIC LOCATION AND ACCESS

Electronic name

[Book]


