Showing 157 open source projects for "spark"

  • 1
    Spark NLP

    State of the Art Natural Language Processing

    Spark NLP is an open source library that brings state-of-the-art Natural Language Processing (NLP), including scalable large language models (LLMs), to Apache Spark. The full code base, including pre-trained models and pipelines, is open under the Apache 2.0 license. The project describes itself as the only NLP library built natively on Apache Spark and the most widely used NLP library in the enterprise. Spark ML provides a set of machine learning applications that can be built using two main...
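    Spark ML pipelines, which Spark NLP builds on, chain two kinds of stages: transformers, which map one dataset to another, and estimators, whose `fit` learns from data and returns a transformer. The pure-Python sketch below imitates that structure so the flow is visible without a cluster. It is not Spark NLP's API: `Tokenizer`, `Normalizer`, `VocabularyEstimator`, `Pipeline`, and `PipelineModel` are hypothetical stand-ins, and a list of dicts plays the role of a DataFrame.

```python
from typing import Dict, List

Row = Dict[str, object]


class Tokenizer:
    """Hypothetical transformer: splits the `text` column on whitespace."""

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, "tokens": str(r["text"]).split()} for r in rows]


class Normalizer:
    """Hypothetical transformer: lowercases tokens and drops non-alphanumeric characters."""

    def transform(self, rows: List[Row]) -> List[Row]:
        out = []
        for r in rows:
            cleaned = ["".join(c for c in t.lower() if c.isalnum()) for t in r["tokens"]]
            out.append({**r, "normalized": [t for t in cleaned if t]})
        return out


class VocabularyModel:
    """Transformer produced by VocabularyEstimator.fit: maps tokens to integer ids."""

    def __init__(self, vocab: Dict[str, int]):
        self.vocab = vocab

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, "ids": [self.vocab[t] for t in r["normalized"] if t in self.vocab]} for r in rows]


class VocabularyEstimator:
    """Hypothetical estimator: fit() learns a vocabulary and returns a VocabularyModel."""

    def fit(self, rows: List[Row]) -> VocabularyModel:
        vocab: Dict[str, int] = {}
        for r in rows:
            for t in r["normalized"]:
                vocab.setdefault(t, len(vocab))  # next unused id for unseen tokens
        return VocabularyModel(vocab)


class PipelineModel:
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows: List[Row]) -> PipelineModel:
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):  # estimators are fitted on the data seen so far
                stage = stage.fit(rows)
            rows = stage.transform(rows)  # this stage's output feeds the next stage
            fitted.append(stage)
        return PipelineModel(fitted)


data = [{"text": "Spark NLP scales!"}, {"text": "NLP at scale."}]
model = Pipeline([Tokenizer(), Normalizer(), VocabularyEstimator()]).fit(data)
print([r["ids"] for r in model.transform(data)])  # [[0, 1, 2], [1, 3, 4]]
```

    Fitting returns a new `PipelineModel` in which every estimator has been replaced by the transformer it produced, so the same fitted pipeline can then be applied to new data.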
  • 2
    Apache Spark

    A unified analytics engine for large-scale data processing

    Apache Spark is a unified engine for large-scale data processing, offering APIs for batch jobs, streaming, machine learning, and graph computation. It builds on resilient distributed datasets (RDDs) and the newer DataFrame/Dataset abstractions to provide fault-tolerant, in-memory computation across clusters. Spark’s execution engine handles scheduling, shuffles, caching, and data locality so users can focus on transformations rather than infrastructure plumbing. With Spark Streaming...
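    Spark distinguishes transformations, which only record how a dataset is derived (its lineage), from actions, which trigger execution; the lineage is also what lets Spark recompute lost partitions. The plain-Python sketch below imitates that split with a hypothetical `MiniRDD` class on a single machine. It is a teaching model under those assumptions, not PySpark: there is no cluster, shuffle, or caching, and in PySpark an RDD would instead come from `SparkContext.parallelize`.

```python
from functools import reduce as fold
from typing import Callable, Iterable, List, Optional


class MiniRDD:
    """Hypothetical single-process imitation of an RDD.

    Data is split into partitions; transformations only append a
    per-partition function to the lineage, and actions execute it.
    """

    def __init__(self, partitions: List[list], lineage: Optional[List[Callable]] = None):
        self.partitions = partitions
        self.lineage = lineage or []

    @classmethod
    def parallelize(cls, data: Iterable, num_partitions: int = 2) -> "MiniRDD":
        items = list(data)
        size = max(1, -(-len(items) // num_partitions))  # ceiling division
        return cls([items[i:i + size] for i in range(0, len(items), size)])

    # Transformations: return a new MiniRDD with a longer lineage; nothing runs yet.
    def map(self, f: Callable) -> "MiniRDD":
        return MiniRDD(self.partitions, self.lineage + [lambda part: [f(x) for x in part]])

    def filter(self, pred: Callable) -> "MiniRDD":
        return MiniRDD(self.partitions, self.lineage + [lambda part: [x for x in part if pred(x)]])

    # Actions: replay the whole lineage on every partition.
    def _compute(self) -> List[list]:
        results = []
        for part in self.partitions:
            for step in self.lineage:
                part = step(part)
            results.append(part)
        return results

    def collect(self) -> list:
        return [x for part in self._compute() for x in part]

    def reduce(self, f: Callable):
        partials = [fold(f, part) for part in self._compute() if part]  # reduce each partition
        return fold(f, partials)  # then combine the partial results


calls = []


def square(x):
    calls.append(x)  # record every time the function actually runs
    return x * x


rdd = MiniRDD.parallelize(range(1, 7), num_partitions=3)  # [[1, 2], [3, 4], [5, 6]]
evens_squared = rdd.filter(lambda x: x % 2 == 0).map(square)
print(len(calls))                                # 0: only lineage was recorded
print(evens_squared.collect())                   # [4, 16, 36]
print(evens_squared.reduce(lambda a, b: a + b))  # 56
print(len(calls))                                # 6: each action recomputed the lineage
```

    Because nothing is cached here, each action reruns `square`; avoiding that repeated work for reused datasets is what Spark's caching is for.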
  • 3
    SageMaker Spark

    A Spark library for Amazon SageMaker

    SageMaker Spark is an open-source Spark library for Amazon SageMaker. With SageMaker Spark you construct Spark ML Pipelines using Amazon SageMaker stages. These pipelines interleave native Spark ML stages and stages that interact with SageMaker training and model hosting. With SageMaker Spark, you can train on Amazon SageMaker from Spark DataFrames using Amazon-provided ML algorithms like K-Means clustering or XGBoost, and make predictions on DataFrames against SageMaker endpoints hosting your...
  • 4
    SageMaker Spark Container

    Docker image used to run data processing workloads

    Apache Spark™ is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing. The SageMaker Spark Container is a Docker image used to run batch data...
  • 5
    Cassandra Spark Connector

    Apache Spark to Apache Cassandra connector

    The Apache Cassandra Spark Connector allows Spark jobs (RDDs or DataFrames/Datasets) to read from and write to Cassandra tables. Compatible with Apache Cassandra (v2.1+), Spark 1.0–3.5, and Scala 2.11–2.13, it supports mapping Cassandra rows to Scala case classes, saving results back to Cassandra, and executing arbitrary CQL within Spark applications.
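    Mapping rows to case classes amounts to binding each column to the field of the same name. As a rough illustration in Python, with dataclasses standing in for Scala case classes and dicts standing in for rows read from Cassandra, the hypothetical `map_rows` helper below does that binding, ignores extra columns, and fails loudly when a field has no matching column. It does not reproduce the connector's real mapping rules, such as any column-name translation or type conversion it performs.

```python
from dataclasses import dataclass, fields
from typing import Iterable, List, Type, TypeVar

T = TypeVar("T")


def map_rows(cls: Type[T], rows: Iterable[dict]) -> List[T]:
    """Hypothetical helper: build one `cls` instance per row, binding columns to fields by name."""
    names = [f.name for f in fields(cls)]
    result = []
    for row in rows:
        missing = [n for n in names if n not in row]
        if missing:
            raise KeyError(f"row {row!r} has no column for fields {missing}")
        result.append(cls(**{n: row[n] for n in names}))  # extra columns are ignored
    return result


@dataclass
class WordCount:
    word: str
    count: int


rows = [
    {"word": "spark", "count": 3, "source": "logs"},  # "source" has no matching field
    {"word": "cassandra", "count": 5},
]
print(map_rows(WordCount, rows))
# [WordCount(word='spark', count=3), WordCount(word='cassandra', count=5)]
```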
  • 6
    .NET for Apache Spark

    A free, open-source, and cross-platform big data analytics framework

    .NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular aspects of Apache Spark: DataFrames and Spark SQL for working with structured data, and Spark Structured Streaming for working with streaming data. .NET for Apache Spark is compliant with .NET Standard, a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write .NET...
  • 8
    Spark

    Web-based, cross-platform and full-featured Remote Administration Tool

    Spark is a web-based, cross-platform and full-featured Remote Administration Tool (RAT) written in Go that allows you to monitor and control all your devices from anywhere, at any time.
  • 10
    Deequ

    Deequ is a library built on top of Apache Spark

    Deequ is a library built atop Apache Spark that enables defining “unit tests for data” — that is, formal constraints or checks on datasets to ensure data quality along dimensions such as completeness, uniqueness, value ranges, correlations, etc. It can scale to large datasets (billions of rows) by translating those data checks into Spark jobs. Deequ supports advanced features like a metrics repository for storing computed statistics over time, anomaly detection of data quality metrics...
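    To make "unit tests for data" concrete, the sketch below computes three of the metric kinds named above (completeness, uniqueness, value range) over an in-memory list of rows and checks each against a constraint. All function names are hypothetical and the orders are toy data; Deequ itself declares constraints on a Spark DataFrame and computes the metrics as Spark jobs, which this single-machine sketch does not attempt.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def completeness(rows: List[Row], col: str) -> float:
    """Fraction of rows in which `col` is present and not None."""
    if not rows:
        return 1.0
    return sum(r.get(col) is not None for r in rows) / len(rows)


def distinct_ratio(rows: List[Row], col: str) -> float:
    """Distinct non-null values divided by non-null values; 1.0 means no duplicates."""
    values = [r[col] for r in rows if r.get(col) is not None]
    return len(set(values)) / len(values) if values else 1.0


def in_range_ratio(rows: List[Row], col: str, lo: float, hi: float) -> float:
    """Fraction of non-null values of `col` that lie in [lo, hi]."""
    values = [r[col] for r in rows if r.get(col) is not None]
    return sum(lo <= v <= hi for v in values) / len(values) if values else 1.0


Check = Tuple[str, Callable[[List[Row]], float], Callable[[float], bool]]


def run_checks(rows: List[Row], checks: List[Check]) -> List[str]:
    """Compute each metric and return a message for every constraint that fails."""
    failures = []
    for description, metric, constraint in checks:
        value = metric(rows)
        if not constraint(value):
            failures.append(f"{description}: metric={value:.2f}")
    return failures


orders = [
    {"id": 1, "amount": 20.0},
    {"id": 2, "amount": None},
    {"id": 2, "amount": 999.0},
]
checks: List[Check] = [
    ("id is complete", lambda rs: completeness(rs, "id"), lambda v: v == 1.0),
    ("id has no duplicates", lambda rs: distinct_ratio(rs, "id"), lambda v: v == 1.0),
    ("amount is at least 90% complete", lambda rs: completeness(rs, "amount"), lambda v: v >= 0.9),
    ("amount lies in [0, 500]", lambda rs: in_range_ratio(rs, "amount", 0, 500), lambda v: v == 1.0),
]
for failure in run_checks(orders, checks):
    print(failure)
```

    Separating the metric from the constraint mirrors the idea of storing computed metrics over time: the same `completeness` value could be logged and compared across runs, not just checked once.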
  • 11
    Synapse Machine Learning

    Simple and distributed Machine Learning

    SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. SynapseML builds on Apache Spark and SparkML to enable new kinds of machine learning, analytics, and model deployment workflows. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Open Neural Network Exchange (ONNX), LightGBM, Cognitive Services, Vowpal Wabbit...
  • 12
    fugue

    A unified interface for distributed computing

    Fugue is a unified interface for distributed computing that lets users execute Python, Pandas, and SQL code on Spark, Dask, and Ray with minimal rewrites.
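    The idea behind such a unified interface is to write logic once against plain data types and let a pluggable engine decide how to execute it. The sketch below illustrates that separation with a hypothetical `transform` helper and two toy engines, a serial loop and a thread pool, standing in for backends such as Spark, Dask, or Ray. It is not Fugue's API and does no real distribution.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, object]
PartitionFn = Callable[[List[Row]], List[Row]]


def local_engine(fn: PartitionFn, partitions: List[List[Row]]) -> List[List[Row]]:
    """Run `fn` on each partition, one after another."""
    return [fn(p) for p in partitions]


def threaded_engine(fn: PartitionFn, partitions: List[List[Row]]) -> List[List[Row]]:
    """Run `fn` on partitions concurrently; stands in for a distributed backend."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fn, partitions))  # map() preserves partition order


ENGINES = {"local": local_engine, "threads": threaded_engine}


def transform(rows: List[Row], fn: PartitionFn, engine: str = "local", num_partitions: int = 2) -> List[Row]:
    """Hypothetical: partition `rows`, apply `fn` with the chosen engine, and re-join."""
    size = max(1, -(-len(rows) // num_partitions))  # ceiling division
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]
    return [row for part in ENGINES[engine](fn, partitions) for row in part]


# Business logic written once, with no reference to any engine.
def add_total(rows: List[Row]) -> List[Row]:
    return [{**r, "total": round(r["price"] * r["qty"], 2)} for r in rows]


data = [{"price": 2.5, "qty": 4}, {"price": 10.0, "qty": 3}, {"price": 1.25, "qty": 8}]
print(transform(data, add_total, engine="local") == transform(data, add_total, engine="threads"))  # True
```

    Switching backends changes only the `engine` argument; `add_total` stays untouched, which is the "minimal rewrites" property the description refers to.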
  • 13
    Volcano

    A Cloud Native Batch System (Project under CNCF)

    Volcano is a batch system built on Kubernetes. It provides a suite of mechanisms that are commonly required by many classes of batch and elastic workloads, including machine learning/deep learning, bioinformatics/genomics, and other "big data" applications. These types of applications typically run on generalized domain frameworks like TensorFlow, Spark, Ray, PyTorch, MPI, etc., which Volcano integrates with. Volcano builds upon a decade and a half of experience running a wide variety of high...
  • 14
    Apache Kyuubi

    Apache Kyuubi is a distributed and multi-tenant gateway

    Apache Kyuubi™ is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses. Kyuubi provides a pure SQL gateway through Thrift JDBC/ODBC interface for end-users to manipulate large-scale data with pre-programmed and extensible Spark SQL engines. This "out-of-the-box" model minimizes the barriers and costs for end-users to use Spark at the client side. At the server-side, Kyuubi server and engines' multi-tenant architecture provides the administrators...
  • 15
    Alire

    Command-line tool from the Alire project and supporting library

    Alire is a source-based package manager for the Ada and SPARK programming languages. It facilitates the building and sharing of projects within the Ada community, allowing developers to easily manage dependencies and publish their own libraries or programs. Alire aims to streamline the development process for Ada and SPARK by providing a standardized approach to package management.
  • 16
    ChatALL

    Concurrently chat with ChatGPT, Bing Chat, Bard, Alpaca, Vicuna, etc.

    Concurrently chat with ChatGPT, Bing Chat, Bard, Alpaca, Vicuna, Claude, ChatGLM, MOSS, iFlytek Spark, ERNIE and more, and discover the best answers. Large Language Model (LLM) based AI bots are amazing. However, their behavior can be random, and different bots excel at different tasks. If you want the best experience, don't try them one by one. ChatALL (Chinese name: 齐叨) can send a prompt to several AI bots concurrently, helping you discover the best results.
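    Sending one prompt to several bots at once is a fan-out/fan-in pattern. The asyncio sketch below shows it with hypothetical stub bots that sleep instead of calling any chatbot service; neither the stubs nor `ask_all` are ChatALL's code. `asyncio.gather(..., return_exceptions=True)` keeps one failing bot from discarding the other answers, and `asyncio.wait_for` bounds how long a slow bot can hold up the results.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Bot = Callable[[str], Awaitable[str]]


def make_stub_bot(name: str, delay: float, fail: bool = False) -> Bot:
    """Hypothetical stand-in for a chatbot client: waits `delay` seconds, then answers or fails."""
    async def bot(prompt: str) -> str:
        await asyncio.sleep(delay)
        if fail:
            raise RuntimeError(f"{name} is unavailable")
        return f"{name} received: {prompt!r}"
    return bot


async def ask_all(prompt: str, bots: Dict[str, Bot], timeout: float = 1.0) -> Dict[str, str]:
    """Send `prompt` to every bot concurrently; collect each answer or an error message."""
    names = list(bots)
    tasks = [asyncio.wait_for(bots[n](prompt), timeout) for n in names]
    results = await asyncio.gather(*tasks, return_exceptions=True)  # order matches `names`
    return {
        n: (f"error: {r!r}" if isinstance(r, BaseException) else r)
        for n, r in zip(names, results)
    }


bots = {
    "bot_a": make_stub_bot("bot_a", 0.05),
    "bot_b": make_stub_bot("bot_b", 0.01),
    "bot_c": make_stub_bot("bot_c", 0.02, fail=True),
    "bot_d": make_stub_bot("bot_d", 5.0),  # too slow: cancelled by the timeout
}
answers = asyncio.run(ask_all("hello", bots, timeout=0.2))
for name, answer in answers.items():
    print(name, "->", answer)
```

    Total latency is bounded by the slowest bot or the timeout, whichever comes first, rather than by the sum of all bots' response times.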
  • 17
    MLflow

    Open source platform for the machine learning lifecycle

    MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc), wherever you currently run ML code (e.g. in notebooks, standalone applications or the cloud).
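    Experiment tracking, at its simplest, means recording for each run the parameters that went in and the metrics that came out, so runs can be compared later. The stdlib-only sketch below shows that bookkeeping with a hypothetical `RunTracker` that writes one JSON file per run. It is not MLflow's API: MLflow exposes calls such as `mlflow.log_param` and `mlflow.log_metric` for this, plus a tracking server, UI, and model packaging that the sketch omits. The accuracy values are placeholders, not real results.

```python
import json
import tempfile
import uuid
from pathlib import Path
from typing import Dict, List


class RunTracker:
    """Hypothetical minimal tracker: one JSON file per run under `root`."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def log_run(self, params: Dict[str, object], metrics: Dict[str, float]) -> str:
        run_id = uuid.uuid4().hex
        record = {"run_id": run_id, "params": params, "metrics": metrics}
        (self.root / f"{run_id}.json").write_text(json.dumps(record))
        return run_id

    def runs(self) -> List[dict]:
        return [json.loads(p.read_text()) for p in self.root.glob("*.json")]

    def best_run(self, metric: str, higher_is_better: bool = True) -> dict:
        ranked = sorted(self.runs(), key=lambda r: r["metrics"][metric], reverse=higher_is_better)
        return ranked[0]


tracker = RunTracker(Path(tempfile.mkdtemp()) / "experiment-1")
# Placeholder metrics; a real run would compute accuracy by training and evaluating a model.
for lr, acc in [(0.1, 0.81), (0.01, 0.87), (0.001, 0.84)]:
    tracker.log_run(params={"learning_rate": lr}, metrics={"accuracy": acc})
print(tracker.best_run("accuracy")["params"])  # {'learning_rate': 0.01}
```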
  • 18
    Daft

    Distributed DataFrame for Python designed for the cloud

    Daft is a framework for ETL, analytics and ML/AI at scale. Its familiar Python DataFrame API is designed to outperform Spark in both performance and ease of use. Daft plugs directly into your ML/AI stack through efficient zero-copy integrations with essential Python libraries such as PyTorch and Ray. It also allows requesting GPUs as a resource for running models. Daft runs locally with a lightweight multithreaded backend. When your local machine is no longer sufficient, it scales seamlessly to run out...
  • 19
    sparkmagic

    Jupyter magics and kernels for working with remote Spark clusters

    Sparkmagic is a set of tools for interactively working with remote Spark clusters in Jupyter notebooks. Sparkmagic interacts with remote Spark clusters through a REST server. Features include automatic visualization of SQL queries in the PySpark, Spark and SparkR kernels; an easy visual interface for interactively constructing visualizations, with no code required; the ability to capture the output of SQL queries as Pandas dataframes to interact with other Python libraries (e.g. matplotlib); and the ability to send local files...
  • 20
    Numba

    NumPy aware dynamic Python compiler using LLVM

    ... function, and Numba does the rest. Numba is designed to be used with NumPy arrays and functions. Numba generates specialized code for different array data types and layouts to optimize performance. Special decorators can create universal functions that broadcast over NumPy arrays just like NumPy functions do. Numba also works great with Jupyter notebooks for interactive computing, and with distributed execution frameworks, like Dask and Spark.
  • 21
    sparklyr

    R interface for Apache Spark

    sparklyr is an R package that provides seamless interfacing with Apache Spark clusters—either local or remote—while letting users write code in familiar R paradigms. It supplies a dplyr-compatible backend, Spark machine learning pipelines, SQL integration, and I/O utilities to manipulate and analyze large datasets distributed across cluster environments.
  • 22
    Explorer

    Series (one-dimensional) and dataframes (two-dimensional)

    Explorer brings series (one-dimensional) and data frames (two-dimensional) to Elixir for fast data exploration.
  • 23
    XGBoost

    Scalable and Flexible Gradient Boosting

    ... can be used for Python, Java, Scala, R, C++ and more. It can run on a single machine, Hadoop, Spark, Dask, Flink and most other distributed environments, and is capable of solving problems beyond billions of examples.
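    The core of gradient boosting, which XGBoost scales up, is an additive loop: each new weak learner is fit to the negative gradient of the loss for the current ensemble (for squared error, simply the residuals) and added with a shrinkage factor, the learning rate. The toy sketch below shows only that loop, using one-feature decision stumps on synthetic data; all names are hypothetical, and it leaves out what makes XGBoost distinctive, such as second-order gradient statistics, a regularized tree objective, sparsity-aware split finding, and distributed training.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Pick the single split on x whose two side-means best fit the residuals (least squares)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not right:
            continue  # a split must leave something on both sides
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # every x is identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump: Stump, x: float) -> float:
    t, lv, rv = stump
    return lv if x <= t else rv


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    """Squared-error gradient boosting: each stump is fit to the current residuals."""
    base = sum(ys) / len(ys)  # round 0 predicts the mean
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For loss (y - p)^2 / 2 the negative gradient w.r.t. p is exactly the residual y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x: float) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys) -> float:
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


xs = [float(i) for i in range(20)]
ys = [0.0 if x < 7 else 5.0 if x < 14 else 2.0 for x in xs]  # synthetic step-shaped target
for rounds in (1, 10, 100):
    print(rounds, round(mse(fit_boosted(xs, ys, n_rounds=rounds), xs, ys), 4))
```

    No single stump can represent the three-level target, but the sum of many shrunken stumps can, which is why the training error keeps falling as rounds are added.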
  • 24
    almond

    A Scala kernel for Jupyter

    ..., and vice versa. Almond exposes APIs to interact with Jupyter front-ends. Call them from notebooks… or from your own libraries. Several plotting libraries are already available to plot things from notebooks, such as plotly-scala or Vegas. Load the Spark version of your choice, create a Spark session, and start using it from your notebooks.
  • 25
    Apache Iceberg

    Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data while making it possible for engines like Spark, Trino, Flink, Presto, Hive, and Impala to safely work with the same tables, at the same time. The core Java library that tracks table snapshots and metadata is complete, but still evolving. Current work is focused on adding row-level deletes and upserts, and integration work with new engines like Flink and Hive...
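    Safe concurrent access from several engines rests on snapshot metadata: each commit produces a new snapshot listing the table's data files, and a commit succeeds only if the table still points at the snapshot the writer started from. The sketch below models that optimistic-concurrency idea in memory; `Table`, `Snapshot`, `CommitConflict`, and `commit_with_retry` are hypothetical names, and real Iceberg commits swap metadata files through a catalog rather than a Python lock.

```python
import threading
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    data_files: FrozenSet[str]


class CommitConflict(Exception):
    """Raised when the table changed since the writer read its base snapshot."""


class Table:
    def __init__(self):
        self._lock = threading.Lock()  # stands in for the catalog's atomic pointer swap
        self.history: List[Snapshot] = [Snapshot(0, None, frozenset())]

    def current(self) -> Snapshot:
        return self.history[-1]

    def commit(self, base: Snapshot, added: Iterable[str] = (), removed: Iterable[str] = ()) -> Snapshot:
        with self._lock:
            if self.current().snapshot_id != base.snapshot_id:
                raise CommitConflict(f"base {base.snapshot_id} is stale; current is {self.current().snapshot_id}")
            files = (base.data_files - frozenset(removed)) | frozenset(added)
            new = Snapshot(base.snapshot_id + 1, base.snapshot_id, files)
            self.history.append(new)  # old snapshots stay readable
            return new


def commit_with_retry(table: Table, added: Iterable[str], attempts: int = 3) -> Snapshot:
    """Re-read the latest snapshot and retry if another writer committed first."""
    added = list(added)
    for _ in range(attempts):
        try:
            return table.commit(table.current(), added=added)
        except CommitConflict:
            continue
    raise CommitConflict("gave up after retries")


table = Table()
reader_view = table.current()                  # a reader pins snapshot 0
base_a = base_b = table.current()              # two writers start from the same snapshot
table.commit(base_a, added=["a.parquet"])      # writer A commits first
try:
    table.commit(base_b, added=["b.parquet"])  # writer B's base is now stale
except CommitConflict as exc:
    print("conflict:", exc)
commit_with_retry(table, added=["b.parquet"])  # B retries on top of A's snapshot
print(sorted(table.current().data_files))      # ['a.parquet', 'b.parquet']
print(sorted(reader_view.data_files))          # []: the pinned snapshot is unchanged
```

    Readers never block writers here: a reader keeps using the immutable snapshot it started with, while writers race only on the cheap pointer check.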