Browse free open source Text Processing software and projects below. Use the toggles on the left to filter open source Text Processing software by OS, license, language, programming language, and project status.

  • 1
    fastText

    Library for fast text classification and representation

    fastText is a free, open-source, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to fit even on mobile devices. Text classification is a core problem in many applications, such as spam detection, sentiment analysis, or smart replies. In this tutorial, we describe how to build a text classifier with the fastText tool. The goal of text classification is to assign documents (such as emails, posts, text messages, product reviews, etc.) to one or multiple categories. Such categories can be review scores, spam vs. non-spam, or the language in which the document was typed. Nowadays, the dominant approach to building such classifiers is machine learning, that is, learning classification rules from examples. To build such classifiers, we need labeled data, which consists of documents and their corresponding categories (or tags, or labels).
    Downloads: 2 This Week
    See Project
  • 2
    Stanford CoreNLP

    Stanford CoreNLP, a Java suite of core NLP tools

    CoreNLP is your one-stop shop for natural language processing in Java! CoreNLP enables users to derive linguistic annotations for text, including token and sentence boundaries, parts of speech, named entities, numeric and time values, dependency and constituency parses, coreference, sentiment, quote attributions, and relations. CoreNLP currently supports 6 languages: Arabic, Chinese, English, French, German, and Spanish. The centerpiece of CoreNLP is the pipeline. Pipelines take in raw text, run a series of NLP annotators on the text, and produce a final set of annotations. Pipelines produce CoreDocuments, data objects that contain all of the annotation information, accessible with a simple API, and serializable to a Google Protocol Buffer.
    Downloads: 1 This Week
    See Project
  • 3
    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout AI-driven data processing tool written in C++20, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, enabling efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to expand its capabilities, focusing on versatile data extraction, platform support, and seamless integration with various systems. DocWire SDK is dedicated to streamlining data processing, reducing development time and costs, and harnessing the potential of AI. Its advancements promise a superior experience compared to its predecessor, DocToText.
    Downloads: 11 This Week
    See Project
  • 4

    dbacl - digramic Bayesian classifier

    commandline multiclass email and text filter

    dbacl is a general purpose digramic Bayesian text classifier. It can learn text documents you provide, and then compare new input with the learned categories. It can be used for spam filtering, or within your own shell scripts. Sometimes it plays che
    Downloads: 8 This Week
    See Project
  • 5
    ArabicDiacritizer

    An automatic restoration of Arabic diacritic marks

    This software restores Arabic diacritical marks. It is based mainly on deep neural network architectures. The algorithm generates diacritized text with determined end case. The algorithm is described in detail in: Ilyes Rebai and Yassine BenAyed, 'Text-to-speech synthesis system with Arabic diacritic recognition system', Computer Speech & Language, 2015. We would appreciate it if you cite our related work.

    Installation:
    - Extract the archive "ArabicDiacritizer Setup.rar".
    - Install the application using "Setup.exe".
    - Put an Arabic text in the Text Box.
    - Start the diacritization process.

    If the following problem occurs: <Access to the path '..\ArabicDiacritizer v1.0\text.data' is denied>
    - Go to the path "Program Files\ArabicDiacritizer\ArabicDiacritizer v1.0\".
    - Right-click on "ArabicDiacritizer".
    - Choose "Run as administrator".

    For further information, please contact: rebai_ily
    Downloads: 5 This Week
    See Project
  • 6
    Voice is a text-to-speech program with many features, including: reading text, rich text, and Word documents aloud; a custom greeting; a professional document editor; clipboard monitoring and processing; and a good-looking animated character.
    Downloads: 3 This Week
    See Project
  • 7
    This project is devoted to the development of natural language processing tools and resources for the Lingala language, which is spoken by tens of millions of people in central Africa.
    Downloads: 2 This Week
    See Project
  • 8
    TextMarker
    TextMarker is now developed and hosted at Apache UIMA (http://uima.apache.org/textmarker.html). TextMarker is a UIMA-based tool for information extraction and more. The full-featured editor of the rule language and the build process of UIMA descriptors are complemented with components for visualization, explanation, testing, and rule learning.
    Downloads: 1 This Week
    See Project
  • 9
    Pylero
    Pylero is an open-source Python-based text generator.
    Downloads: 1 This Week
    See Project
  • 10
    Machine translation engine based on a dependency grammar and XML interchange format. The Spanish-Basque (es-eu) translation is ready.
    Downloads: 1 This Week
    See Project
  • 11
    A Python module that provides algorithms for advanced search: basically all you need to build a search engine.
    Downloads: 0 This Week
    See Project
  • 12
    Auvai is a Java API and Java Swing based application for text-to-speech conversion of Unicode Tamil. The planned direction for this API and application is to support text-to-speech conversion for all "Indic" languages.
    Downloads: 0 This Week
    See Project
  • 13
    Bi-gram applications based on language models produced by SRILM from a Chinese Wikipedia corpus, including a Chinese word segmenter, a word-based (not character-based) Traditional-Simplified Chinese converter, and a Chinese syllable-to-word converter.
    Downloads: 0 This Week
    See Project
  • 14
    The book index generator automatically generates the back-of-book index for Thai books.
    Downloads: 0 This Week
    See Project
  • 15
    Concrete Voice is a text-to-speech program. It can read the time, announce the weather, read text files, save text files to audio files, open any text file (supports all text encoding formats), and offers many more advanced features!
    Downloads: 0 This Week
    See Project
  • 16
    Consilium Sentence Suggestions Tools

    Consilium – User Defined sentence Suggestion Tool.

    Many tools on the market provide spelling or grammar correction while you write documents, but very few offer sentence completion based on previously entered text. Those that do suggest completions for sentences that everyone commonly uses in the same way. In reality, writing style varies from person to person. Our aim is to provide a sentence suggestion tool that completes sentences according to the data the user has previously entered. The suggestion for the same sentence or word will therefore differ from person to person, depending on what that person has entered before. This makes it easier to type documents, SMS messages, emails, chat messages, and so on.
    Downloads: 0 This Week
    See Project
  • 17
    A Java application for statistical analysis and systematic manipulation of natural language texts.
    Downloads: 0 This Week
    See Project
  • 18
    A simple intelligent editor.
    Downloads: 0 This Week
    See Project
  • 19
    "Java Artificial Intelligence Markup Language PAD" is a tool that manages ProgramD AI (on local or remote machines) and AIML files with real-time previews and it provides a network support to test AI capabilities over many network protocols.
    Downloads: 0 This Week
    See Project
  • 20
    When translating becomes a game! The text to translate can be selected graphically. Several dictionaries can be sorted according to the context. A large choice of matching strategies is available. The OCR engine is tunable.
    Downloads: 0 This Week
    See Project
  • 21
    Leseratte is a Java parser for German written language. Currently, it contains a German lexicon (based on the Wiktionary), inflexion rules, a grammar and a parser. (Semantics component planned.) Usable as a Java library, also provides a graphical UI.
    Downloads: 0 This Week
    See Project
  • 22
    The Information Extraction Plugin allows the use of information extraction (IE) techniques within RapidMiner. It can be seen as an interface between natural language and IE or data mining methods, extracting interesting information from documents.
    Downloads: 0 This Week
    See Project
  • 23
    SemNotes

    Semantic Note-taking tool for KDE

    SemNotes is a semantic note-taking tool for KDE4, built on top of Nepomuk-KDE. The tool is still under development, but it is already usable, provided that KDE4 is installed and Nepomuk is running.
    Downloads: 0 This Week
    See Project
  • 24
    TF-IDF.jar is a Java archive that measures the TF-IDF of each document in a document collection (corpus). The jar can be used to (a) get all the terms in the corpus; (b) get the document frequency (DF) and inverse document frequency (IDF) of all the terms in the corpus; (c) get the TF-IDF of each document in the corpus; and (d) get each term with its frequency (number of occurrences), term frequency (TF), and TF-IDF in every document.
    Downloads: 0 This Week
    See Project
  • 25
    TextBlob

    TextBlob is a Python library for processing textual data

    Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more. TextBlob provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, and translation. TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both. It supports word inflection (pluralization and singularization) and lemmatization, as well as spelling correction. New models or languages can be added through extensions. It also comes with WordNet integration. If you only intend to use TextBlob’s default models (no model overrides), you can pass the lite argument. This downloads only those corpora needed for basic functionality. TextBlob is also available as a conda package.
    Downloads: 0 This Week
    See Project
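For readers unfamiliar with the quantities listed in the TF-IDF.jar entry (24) above, the following is a minimal Python sketch of how terms, DF, IDF, TF, and TF-IDF can be computed for a small corpus. It is not the jar's code or API. The function name `tf_idf`, the sample corpus, and the exact formulas (TF as count divided by document length, IDF as the natural log of N over DF) are assumptions chosen for illustration; TF-IDF.jar may define them differently.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return (df, idf, per_doc) for a list of tokenized documents.

    Assumed definitions (hypothetical; TF-IDF.jar may differ):
      tf(t, d) = occurrences of t in d / number of tokens in d
      idf(t)   = ln(N / df(t)), where N is the number of documents
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in corpus for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    per_doc = []
    for doc in corpus:
        counts = Counter(doc)
        total = len(doc)
        # For every term in this document: raw count, TF, and TF-IDF.
        per_doc.append({
            term: {"count": c, "tf": c / total, "tfidf": (c / total) * idf[term]}
            for term, c in counts.items()
        })
    return df, idf, per_doc


# Illustrative corpus, already split into tokens.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat".split(),
    "cats and dogs".split(),
]
df, idf, per_doc = tf_idf(corpus)

print(sorted(df))            # (a) all terms in the corpus
print(df["the"], idf["the"]) # (b) DF and IDF of one term
print(per_doc[0]["the"])     # (d) count, TF, and TF-IDF of "the" in document 0
```

Under these assumptions, a term that appears in every document gets an IDF of zero, so only terms that distinguish documents contribute to the score.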