Apache Spark
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Here are 8,337 public repositories matching this topic...
Server for the ListenBrainz project, including the front-end (JavaScript/React) code that it serves and all of the data processing components that LB uses.
Updated Jun 12, 2024 - Python
A large-scale entity and relation database supporting aggregation of properties
Updated Jun 12, 2024 - Java
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Updated Jun 12, 2024 - C++
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
Updated Jun 12, 2024 - Java
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Updated Jun 12, 2024 - Scala
IT Knowledge Base from 20 years in DevOps, Linux, Cloud, Big Data, AWS, GCP etc - gradually porting my large private knowledge base to public
Updated Jun 12, 2024 - Shell
Spark accelerator framework; it enables secondary indices to remote data stores.
Updated Jun 12, 2024 - Scala
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Updated Jun 12, 2024 - Python
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Updated Jun 12, 2024 - Python
Prime Number Generator using PySpark
Updated Jun 12, 2024 - Python
Created by Matei Zaharia
Released May 26, 2014
- Followers: 417
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia