.NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular Dataframe and SparkSQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard - a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write .NET code allowing you to reuse all the knowledge, skills, code, and libraries you already have as a .NET developer. .NET for Apache Spark runs on Windows, Linux, and macOS using .NET Core, or Windows using .NET Framework. It also runs on all major cloud providers including Azure HDInsight Spark, Amazon EMR Spark, AWS & Azure Databricks.
Features
- Apache Spark™ is a general-purpose distributed processing engine for analytics over large data sets
- Processing tasks are distributed over a cluster of nodes
- Data is cached in-memory, to reduce computation time
- Apache Spark is often used for high-volume data preparation pipelines, such as extract, transform, and load (ETL) processes
- Large streams of data can be processed in real-time with Apache Spark, such as monitoring streams of sensor data or analyzing financial transactions to detect fraud
- Apache Spark can reduce the cost and time involved in building machine learning models