Apache Spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Here are 8,337 public repositories matching this topic...

apache / incubator-uniffle

Uniffle is a high performance, general purpose Remote Shuffle Service.

rss spark mapreduce shuffle tez remote-shuffle-service

Updated Jun 12, 2024
Java

projectnessie / nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

git java data spark aws-lambda iceberg

Updated Jun 12, 2024
Java

apache / spark-kubernetes-operator

Apache Spark Kubernetes Operator

java kubernetes spark

Updated Jun 12, 2024
Java

apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator

rust spark arrow datafusion

Updated Jun 12, 2024
Rust

metabrainz / listenbrainz-server

Server for the ListenBrainz project, including the front-end (javascript/react) code that it serves and all of the data processing components that LB uses.

react python music typescript database web big-data spark listenbrainz-server

Updated Jun 12, 2024
Python

apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

bigquery real-time sql database spark hive hadoop etl snowflake olap query-engine redshift dbt elt iceberg hudi delta-lake lakehouse

Updated Jun 12, 2024
Java

tobymao / sqlglot

Python SQL Parser and Transpiler

Updated Jun 12, 2024
Python

gchq / Gaffer

A large-scale entity and relation database supporting aggregation of properties

big-data spark hadoop graph accumulo hbase graph-database parquet aggregation

Updated Jun 12, 2024
Java

ytsaurus / ytsaurus

YTsaurus is a scalable and fault-tolerant open-source big data platform.

sql big-data spark clickhouse distributed-database lakehouse olap-database ytsaurus

Updated Jun 12, 2024
C++

apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.

big-data spark flink real-time-analytics data-ingestion table-store paimon streaming-datalake

Updated Jun 12, 2024
Java

delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

big-data spark analytics acid delta-lake

Updated Jun 12, 2024
Scala

apache / spark

Apache Spark - A unified analytics engine for large-scale data processing

python java r scala sql big-data spark jdbc

Updated Jun 12, 2024
Scala

HariSekhon / Knowledge-Base

IT Knowledge Base from 20 years in DevOps, Linux, Cloud, Big Data, AWS, GCP etc - gradually porting my large private knowledge base to public

Updated Jun 12, 2024
Shell

apache / celeborn

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.

spark bigdata shuffle

Updated Jun 12, 2024
Java

opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices to remote data stores.

spark compute opensearch secondary-index

Updated Jun 12, 2024
Scala

apachecn / .github

ApacheCN open-source organization: announcements, introduction, members, events, and contact channels

python spark ml pytorch solidity dl

Updated Jun 12, 2024
CSS

J-sephB-lt-n / useful-code-snippets

A searchable collection of useful little pieces of code

python shell bash cloud spark ec2 graph virtual-machine gcp pyspark dataproc streamlit rustworkx

Updated Jun 12, 2024
Python

mage-ai / mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

python data-science data machine-learning sql spark pipeline etl pipelines orchestration artificial-intelligence data-engineering data-integration dbt elt transformation data-pipelines reverse-etl

Updated Jun 12, 2024
Python

moj-analytical-services / splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

data-science spark record-linkage entity-resolution fuzzy-matching deduplication em-algorithm data-matching deduplicate-data duckdb uk-gov-data-science

Updated Jun 12, 2024
Python

FistGang / PrimeSpark

Prime Number Generator using PySpark

spark pyspark prime-numbers sieve-of-eratosthenes

Updated Jun 12, 2024
Python

Created by Matei Zaharia

Released May 26, 2014

Followers: 417
Repository: apache/spark
Website: spark.apache.org

Related Topics