spark free download - SourceForge

Showing 20 open source projects for "spark"

View related business solutions

Scala Mac Clear Filters & Widen Search

Our Free Plans just got better! | Auth0
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now
Gen AI apps are built with MongoDB Atlas
Build gen AI apps with an all-in-one modern database: MongoDB Atlas

MongoDB Atlas provides built-in vector search and a flexible document model so developers can build, scale, and run gen AI apps without stitching together multiple databases. From LLM integration to semantic search, Atlas simplifies your AI architecture—and it’s free to get started.

Start Free
1

Spark NLP

State of the Art Natural Language Processing

Experience the power of large language models like never before, unleashing the full potential of Natural Language Processing (NLP) with Spark NLP, the open source library that delivers scalable LLMs. The full code base is open under the Apache 2.0 license, including pre-trained models and pipelines. The only NLP library built natively on Apache Spark. The most widely used NLP library in the enterprise. Spark ML provides a set of machine learning applications that can be built using two main...

Downloads: 1 This Week

Last Update: 6 days ago
See Project
2

Apache Spark

A unified analytics engine for large-scale data processing

Apache Spark is a unified engine for large-scale data processing, offering APIs for batch jobs, streaming, machine learning, and graph computation. It builds on resilient distributed datasets (RDDs) and the newer DataFrame/Dataset abstractions to provide fault-tolerant, in-memory computation across clusters. Spark’s execution engine handles scheduling, shuffles, caching, and data locality so users can focus on transformations rather than infrastructure plumbing. With Spark Streaming...

Downloads: 1 This Week

Last Update: 2025-10-26
See Project
3

SageMaker Spark

A Spark library for Amazon SageMaker

SageMaker Spark is an open-source Spark library for Amazon SageMaker. With SageMaker Spark you construct Spark ML Pipelines using Amazon SageMaker stages. These pipelines interleave native Spark ML stages and stages that interact with SageMaker training and model hosting. With SageMaker Spark, you can train on Amazon SageMaker from Spark DataFrames using Amazon-provided ML algorithms like K-Means clustering or XGBoost, and make predictions on DataFrames against SageMaker endpoints hosting your...

Downloads: 1 This Week

Last Update: 2024-02-22
See Project
4

Cassandra Spark Connector

Apache Spark to Apache Cassandra connector

The Apache Cassandra Spark Connector allows Spark jobs (RDDs or DataFrames/Datasets) to read from and write to Cassandra tables. Compatible with Apache Cassandra (v2.1+), Spark 1.0–3.5, and Scala 2.11–2.13, it supports mapping Cassandra rows to Scala case classes, saving results back to Cassandra, and executing arbitrary CQL within Spark applications.

Downloads: 0 This Week

Last Update: 2025-08-04
See Project
Simple, Secure Domain Registration
Get your domain at wholesale price. Cloudflare offers simple, secure registration with no markups, plus free DNS, CDN, and SSL integration.

Register or renew your domain and pay only what we pay. No markups, hidden fees, or surprise add-ons. Choose from over 400 TLDs (.com, .ai, .dev). Every domain is integrated with Cloudflare's industry-leading DNS, CDN, and free SSL to make your site faster and more secure. Simple, secure, at-cost domain registration.

Sign up for free
5

Deequ

Deequ is a library built on top of Apache Spark

Deequ is a library built atop Apache Spark that enables defining “unit tests for data” — that is, formal constraints or checks on datasets to ensure data quality along dimensions such as completeness, uniqueness, value ranges, correlations, etc. It can scale to large datasets (billions of rows) by translating those data checks into Spark jobs. Deequ supports advanced features like a metrics repository for storing computed statistics over time, anomaly detection of data quality metrics...

Downloads: 6 This Week

Last Update: 2025-11-03
See Project
6

Apache Kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway

Apache Kyuubi™ is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses. Kyuubi provides a pure SQL gateway through Thrift JDBC/ODBC interface for end-users to manipulate large-scale data with pre-programmed and extensible Spark SQL engines. This "out-of-the-box" model minimizes the barriers and costs for end-users to use Spark at the client side. At the server-side, Kyuubi server and engines' multi-tenant architecture provides the administrators...

Downloads: 3 This Week

Last Update: 2025-05-27
See Project
7

Synapse Machine Learning

Simple and distributed Machine Learning

SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. SynapseML builds on Apache Spark and SparkML to enable new kinds of machine learning, analytics, and model deployment workflows. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Open Neural Network Exchange (ONNX), LightGBM, The Cognitive Services, Vowpal Wabbit...

Downloads: 2 This Week

Last Update: 2025-09-16
See Project
8

Scio

A Scala API for Apache Beam and Google Cloud Dataflow

Scio is a Scala API developed by Spotify that builds on Apache Beam to enable expressive batch and streaming data pipelines, optimized for running on Google Cloud Dataflow. Inspired by Spark and Scalding, it provides scalable, type‑safe, and production-grade data processing, with built-in support for BigQuery, Pub/Sub, Cassandra, Elasticsearch, Redis, TensorFlow IO, and more.

Downloads: 2 This Week

Last Update: 2025-10-23
See Project
9

almond

A Scala kernel for Jupyter

..., and vice versa. Almond exposes APIs to interact with Jupyter front-ends. Call them from notebooks… or from your own libraries. Several plotting libraries are already available to plot things from notebooks, such as plotly-scala or Vegas. Load the Spark version of your choice, create a Spark session, and start using it from your notebooks.

Downloads: 0 This Week

Last Update: 3 days ago
See Project
Build Securely on Azure with Proven Frameworks
Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.

Download Now
10

Feathr

A scalable, unified data and AI engineering platform for enterprise

Feathr is a data and AI engineering platform that is widely used in production at LinkedIn for many years and was open sourced in 2022. It is currently a project under LF AI & Data Foundation. Define data and feature transformations based on raw data sources (batch and streaming) using Pythonic APIs. Register transformations by names and get transformed data(features) for various use cases including AI modeling, compliance, go-to-market and more. Share transformations and data(features)...

Downloads: 0 This Week

Last Update: 2023-06-12
See Project
11

SnappyData

Memory optimized analytics database, based on Apache Spark

SnappyData (aka TIBCO ComputeDB) is a distributed, in-memory optimized analytics database. SnappyData delivers high throughput, low latency, and high concurrency for a unified analytics workload. By fusing an in-memory hybrid database inside Apache Spark, it provides analytic query processing, mutability/transactions, access to virtually all big data sources and stream processing all in one unified cluster. One common use case for SnappyData is to provide analytics at interactive speeds over...

Downloads: 0 This Week

Last Update: 2024-10-15
See Project
12

CoolplaySpark

Spark Cool Play: Spark source code analysis, Spark class library, etc.

CoolplaySpark is a learning and practice repository designed to help users understand and work with Apache Spark. It serves as a companion resource for the book 深入理解Spark核心思想与源码分析 (In-Depth Understanding of Spark’s Core Concepts and Source Code Analysis). The project contains annotated examples, explanations, and exercises that guide learners through Spark’s architecture, execution model, and source code internals. It is particularly valuable for developers who want to strengthen...

Downloads: 2 This Week

Last Update: 3 days ago
See Project
13

Spark JobServer

REST job server for Apache Spark

Spark Job Server offers a RESTful interface for submitting, managing, and running jobs or contexts on Apache Spark. Rather than requiring every application to embed Spark or manage Spark contexts manually, this server abstracts a long-lived service where clients can upload JARs, start and stop contexts, submit jobs synchronously or asynchronously, and manage named objects (RDDs / DataFrames) across job executions. It supports multiple modes (transient jobs, persistent contexts for reuse...

Downloads: 0 This Week

Last Update: 2025-09-18
See Project
14

osm4scala

Reading OpenStreetMap Pbf files.

Scala and polyglot Spark library (Scala, PySpark, SparkSQL, ... ) focused on reading OpenStreetMap Pbf files.

Downloads: 0 This Week

Last Update: 2022-12-26
See Project
15

SZT-bigdata

SZT‑bigdata is an open source project

SZT‑bigdata is an open-source project analyzing real Shenzhen metro (subway) card usage data using big‑data frameworks like Spark, Hadoop, Hive, Kafka, Flink, ClickHouse, HBase, and Elasticsearch. Aimed at exploring transit passenger flow patterns and system optimization using a variety of Scala-based technologies.

Downloads: 0 This Week

Last Update: 2025-08-04
See Project
16

TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library written in Scala that runs on top of Apache Spark. It was developed with a focus on accelerating machine learning developer productivity through machine learning automation, and an API that enforces compile-time type-safety, modularity, and reuse. Through automation, it achieves accuracies close to hand-tuned models with almost 100x reduction in time.

Downloads: 4 This Week

Last Update: 2024-08-09
See Project
17

apache spark data pipeline osDQ

osDQ dedicated to create apache spark based data pipeline using JSON

This is an offshoot project of open source data quality (osDQ) project https://sourceforge.net/projects/dataquality/ This sub project will create apache spark based data pipeline where JSON based metadata (file) will be used to run data processing , data pipeline , data quality and data preparation and data modeling features for big data. This uses java API of apache spark. It can run in local mode also. Get json example at https://github.com/arrahtech/osdq-spark How to run Unzip...

Downloads: 0 This Week

Last Update: 2019-01-20
See Project
18

Cosmos DB Spark

Apache Spark Connector for Azure Cosmos DB

Azure Cosmos DB Spark is the official connector for Azure CosmosDB and Apache Spark. The connector allows you to easily read to and write from Azure Cosmos DB via Apache Spark DataFrames in Python and Scala. It also allows you to easily create a lambda architecture for batch-processing, stream-processing, and a serving layer while being globally replicated and minimizing the latency involved in working with big data.

Downloads: 0 This Week

Last Update: 2023-12-21
See Project
19

PredictionIO

Machine learning server for building predictive applications

Apache PredictionIO is an open-source machine learning server designed to simplify the process of building and deploying predictive engines. It offers a scalable infrastructure with support for multiple ML algorithms, event data collection, and deployment workflows. Developers can use templates or build custom engines, making it a flexible solution for integrating machine learning into applications.

Downloads: 0 This Week

Last Update: 2025-07-01
See Project
20

Apache PredictionIO

Machine learning server for developers and ML engineers

... for comprehensive predictive analytics; speed up machine learning modeling with systematic processes and pre-built evaluation measures; support machine learning and data processing libraries such as Spark MLLib and OpenNLP; implement your own machine learning models and seamlessly incorporate them into your engine; simplify data infrastructure management.

Downloads: 0 This Week

Last Update: 2021-05-10
See Project