[RubyML | RubyDataScience | RubyInterop]

Awesome NLP with Ruby

Useful resources for text processing in Ruby

This curated list collects awesome resources, libraries, and information sources on the computational processing of texts in human languages with the Ruby programming language. The field is often referred to as NLP, Computational Linguistics, or HLT (Human Language Technology), and it overlaps with Artificial Intelligence, Machine Learning, Information Retrieval, Text Mining, Knowledge Extraction, and other related disciplines.

This list comes from our day-to-day work on Language Models and NLP Tools. Read why this list is awesome. Our FAQ describes the important decisions behind the list and gives answers you may find useful.

Our main goal is to promote Ruby as a tool for NLP-related tasks. Your help, suggestions, and contributions are welcome! We kindly ask you to study the Contribution section. Follow us on Twitter and please spread the word using the #RubyNLP hashtag!

NLP Pipeline Subtasks

An NLP Pipeline starts with plain text.

Pipeline Generation

Multipurpose Engines

On-line APIs

Language Identification

Language Identification is one of the first crucial steps in every NLP Pipeline.
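To illustrate the idea, here is a minimal sketch that guesses a language by counting frequent function words. The `PROFILES` table and the `guess_language` helper are hypothetical names made up for this example, and the word lists are tiny; real libraries typically rely on character n-gram models trained on much larger data.

```ruby
# Hypothetical mini-profiles: a few frequent function words per language.
PROFILES = {
  "en" => %w[the and of to is in that it],
  "de" => %w[der die und das ist nicht ein zu],
  "fr" => %w[le la et les des est un une]
}.freeze

# Returns the language code whose profile matches the most tokens,
# or nil when no profile word occurs in the text at all.
def guess_language(text)
  tokens = text.downcase.scan(/[[:alpha:]]+/)
  scores = PROFILES.transform_values do |words|
    tokens.count { |token| words.include?(token) }
  end
  best, score = scores.max_by { |_, s| s }
  score.zero? ? nil : best
end

puts guess_language("The cat is in the house")
puts guess_language("Der Hund ist nicht in dem Haus")
puts guess_language("Le chat est sur la table")
```

Short or mixed-language inputs are easy to misclassify with such a small profile, which is one reason to prefer a dedicated tool in production.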

Segmentation

Tools for Tokenization, Word and Sentence Boundary Detection and Disambiguation.
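As a rough illustration of what such tools do, the following sketch splits text into sentences and tokens with plain regular expressions. The helper names `split_sentences` and `tokenize` are assumptions made for this example and do not come from any library in this list; the rules are deliberately naive and will mis-split abbreviations such as "Dr.".

```ruby
# Naive sentence boundary detection: split on whitespace that follows
# a full stop, exclamation mark, or question mark.
def split_sentences(text)
  text.strip.split(/(?<=[.!?])\s+/)
end

# Naive tokenization: a token is either a run of letters/digits
# (optionally with inner apostrophes) or a single punctuation character.
def tokenize(sentence)
  sentence.scan(/[[:alnum:]]+(?:'[[:alnum:]]+)*|[^[:alnum:][:space:]]/)
end

text = "Ruby is fun. Isn't it? Let's parse text!"

split_sentences(text).each do |sentence|
  p tokenize(sentence)
end
# ["Ruby", "is", "fun", "."]
# ["Isn't", "it", "?"]
# ["Let's", "parse", "text", "!"]
```

Disambiguation (deciding whether a period ends a sentence or belongs to an abbreviation) is exactly the part this sketch leaves out.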

Lexical Processing

Stemming

Stemming is the term used in information retrieval for the process of reducing word forms to some base representation. Stemming should be distinguished from Lemmatization, since stems do not necessarily have linguistic motivation.
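A toy suffix-stripping stemmer shows why stems need not be real words. This is only a sketch under simple assumptions: the `SUFFIXES` list and the `naive_stem` helper are invented for this example and are far cruder than established algorithms such as Porter's.

```ruby
# Hypothetical suffix list, checked in order (longer suffixes first).
SUFFIXES = %w[ing ed ly es s].freeze

# Strips the first matching suffix, but only if at least three
# characters remain, so that short words are left untouched.
def naive_stem(word)
  w = word.downcase
  SUFFIXES.each do |suffix|
    if w.end_with?(suffix) && w.length - suffix.length >= 3
      return w.delete_suffix(suffix)
    end
  end
  w
end

%w[running studies argued cats is].each do |word|
  puts "#{word} -> #{naive_stem(word)}"
end
# running -> runn
# studies -> studi
# argued -> argu
# cats -> cat
# is -> is
```

Outputs like "runn" and "studi" are useful as index keys for retrieval even though they are not dictionary entries.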

Lemmatization

Lemmatization is the process of finding the base (dictionary) form of a word. Lemmas are often collected in dictionaries.
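In its simplest form, lemmatization can be sketched as a dictionary lookup. The `LEMMAS` table and the `lemmatize` helper below are hypothetical and hold only a handful of entries; real lemmatizers use large lexicons and usually need the part of speech to choose the right lemma.

```ruby
# Hypothetical mini-lexicon mapping inflected forms to their lemmas.
LEMMAS = {
  "went"     => "go",
  "was"      => "be",
  "mice"     => "mouse",
  "children" => "child",
  "studies"  => "study"
}.freeze

# Looks up the lowercased word; unknown words are returned unchanged.
def lemmatize(word)
  key = word.downcase
  LEMMAS.fetch(key, key)
end

%w[Mice went studies table].each do |word|
  puts "#{word} -> #{lemmatize(word)}"
end
# Mice -> mouse
# went -> go
# studies -> study
# table -> table
```

Unlike a suffix-stripping stemmer, the lookup handles irregular forms such as "went", at the price of needing a dictionary.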

Lexical Statistics: Counting Types and Tokens
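As a small worked example of the terminology: tokens are the running words of a text, types are the distinct word forms. The sketch below counts both using only core Ruby; `lexical_stats` is a helper name invented for this illustration, and its tokenization (lowercased runs of letters) is a simplifying assumption.

```ruby
# Returns token count, type count, type-token ratio, and per-type frequencies.
def lexical_stats(text)
  tokens = text.downcase.scan(/[[:alpha:]]+/)
  frequencies = tokens.each_with_object(Hash.new(0)) do |token, counts|
    counts[token] += 1
  end
  {
    tokens: tokens.size,
    types: frequencies.size,
    type_token_ratio: tokens.empty? ? 0.0 : frequencies.size.to_f / tokens.size,
    frequencies: frequencies
  }
end

stats = lexical_stats("The cat sat on the mat and the dog sat too")
puts "tokens: #{stats[:tokens]}"                 # tokens: 11
puts "types:  #{stats[:types]}"                  # types:  8
puts "the:    #{stats[:frequencies]['the']}"     # the:    3
puts "TTR:    #{stats[:type_token_ratio].round(3)}"
```

The type-token ratio is sensitive to text length, so it is best compared across samples of similar size.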

Filtering Stop Words
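Stop word filtering can be sketched in a few lines with Ruby's standard `Set`. The `STOP_WORDS` list and the `remove_stop_words` helper are assumptions for this example; the list is intentionally tiny, whereas the libraries in this section ship curated lists for many languages.

```ruby
require "set"

# Hypothetical, deliberately short English stop word list.
STOP_WORDS = Set.new(%w[a an and the of on in is to too]).freeze

# Drops tokens whose lowercased form is in the stop word set.
def remove_stop_words(tokens)
  tokens.reject { |token| STOP_WORDS.include?(token.downcase) }
end

p remove_stop_words(%w[The cat sat on the mat])
# ["cat", "sat", "mat"]
```

A `Set` keeps membership checks fast even when the stop word list grows to several hundred entries.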

Phrasal Level Processing

Syntactic Processing

Constituency Parsing

Semantic Analysis

Pragmatical Analysis

High Level Tasks

Spelling and Error Correction

Text Alignment

Machine Translation

Dialog Systems

Sentiment Analysis

Numbers, Dates, and Time Parsing

Named Entity Recognition

Text-to-Speech-to-Text

Linguistic Resources

Machine Learning Libraries

Machine Learning Algorithms in pure Ruby or written in other programming languages with appropriate bindings for Ruby.

For a more up-to-date list, please see the Awesome ML with Ruby list.

Data Visualization

Please refer to the Data Visualization section on the Data Science with Ruby list.

Optical Character Recognition

Text Extraction

Full Text Search, Information Retrieval, Indexing

Language Aware String Manipulation

Libraries for language-aware string manipulation, i.e., search, pattern matching, case conversion, transcoding, and regular expressions that need information about the underlying language.
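Case conversion is a good example of why the language matters. Core Ruby (2.4 and later) accepts a `:turkic` option on `String#upcase` and `String#downcase` for the dotted and dotless i of Turkish and Azerbaijani. The `upcase_for` helper and the `TURKIC_LANGUAGES` list below are hypothetical wrappers written for this sketch.

```ruby
# Hypothetical list of language codes that need Turkic case mapping.
TURKIC_LANGUAGES = %w[tr az].freeze

# Upcases text with the case-mapping rules of the given language code.
def upcase_for(text, lang)
  TURKIC_LANGUAGES.include?(lang) ? text.upcase(:turkic) : text.upcase
end

puts upcase_for("istanbul", "en")  # ISTANBUL
puts upcase_for("istanbul", "tr")  # İSTANBUL (capital I with dot above)
puts "Straße".upcase               # STRASSE (full Unicode case mapping)
```

Without the language hint, a Turkish place name would be uppercased with the wrong letter, which then breaks search and comparison.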

Articles, Posts, Talks, and Presentations

Projects and Code Examples

Books

Community

Needs your Help!

All projects in this section are really important for the community but need more attention. If you have spare time and dedication, please spend some hours on the code here.

Contributing

We are very glad to see you in this section and highly appreciate any help!

We also care about the quality of this list. If you want to contribute, please:

Some of the open tasks for contributors are listed in the todo file. You may want to start there.

License

Creative Commons Zero 1.0

Awesome NLP with Ruby by Andrei Beliankou and Contributors.

To the extent possible under law, the person who associated CC0 with Awesome NLP with Ruby has waived all copyright and related or neighboring rights to Awesome NLP with Ruby.

You should have received a copy of the CC0 legalcode along with this work. If not, see https://creativecommons.org/publicdomain/zero/1.0/.