Performance
Benchmarks

We compared Xorbits with Dask, Pandas API on Spark, and Modin on Ray using the TPC-H benchmark at scale factors 100 (~100 GB dataset) and 1000 (~1 TB dataset). The cluster for TPC-H SF100 consisted of one r6i.large instance as the supervisor and 5 r6i.4xlarge instances as the workers. The cluster for TPC-H SF1000 consisted of one r6i.large instance as the supervisor and 16 r6i.8xlarge instances as the workers.
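
To put the two cluster sizes in perspective, the short sketch below adds up the worker resources for each setup. The per-instance figures are the sizes AWS publishes for the r6i family (8 GiB of memory per vCPU); the helper name `worker_capacity` is only illustrative and is not part of Xorbits.

```python
# Published AWS sizes for the r6i family: (vCPUs, memory in GiB).
R6I_SPECS = {
    "r6i.large": (2, 16),
    "r6i.4xlarge": (16, 128),
    "r6i.8xlarge": (32, 256),
}


def worker_capacity(instance_type, count):
    """Return total (vCPUs, memory in GiB) across `count` worker instances."""
    vcpus, mem_gib = R6I_SPECS[instance_type]
    return vcpus * count, mem_gib * count


# Worker pools as described in the benchmark setup (supervisor not counted).
sf100_vcpus, sf100_mem = worker_capacity("r6i.4xlarge", 5)
sf1000_vcpus, sf1000_mem = worker_capacity("r6i.8xlarge", 16)

print(f"SF100 workers:  {sf100_vcpus} vCPUs, {sf100_mem} GiB memory")
print(f"SF1000 workers: {sf1000_vcpus} vCPUs, {sf1000_mem} GiB memory")
```

In both setups the aggregate worker memory is several times the approximate raw dataset size, so out-of-memory failures point to how a framework handles intermediate data rather than to an undersized cluster.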


TPC-H SF100: Xorbits vs. Dask

Dask is a well-known "Pandas-like" framework for scaling Python workloads. The graph below shows the compute times of Xorbits and Dask for the TPC-H queries (excluding I/O). Q21 is excluded because Dask ran out of memory on it. Across the remaining queries, Xorbits was 7.3x faster than Dask.

[Figure: Dask benchmark]
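
The text does not say how the overall speedup figure was aggregated. One plausible approach, sketched here purely as an assumption, is to take the ratio of total compute time over the queries that completed on both systems. The function name and the timings are hypothetical placeholders, not the benchmark's measurements.

```python
def overall_speedup(baseline_times, candidate_times):
    """Ratio of total baseline time to total candidate time.

    Both arguments map query name -> seconds, with None marking a query
    that failed. Only queries that completed on both systems are counted.
    """
    common = [
        q for q in baseline_times
        if baseline_times[q] is not None and candidate_times.get(q) is not None
    ]
    if not common:
        raise ValueError("no query completed on both systems")
    baseline_total = sum(baseline_times[q] for q in common)
    candidate_total = sum(candidate_times[q] for q in common)
    return baseline_total / candidate_total, common


# Hypothetical timings in seconds, NOT the numbers behind the published result.
dask_times = {"Q1": 40.0, "Q2": 20.0, "Q21": None}  # None: query did not finish
xorbits_times = {"Q1": 10.0, "Q2": 5.0, "Q21": 30.0}

speedup, used = overall_speedup(dask_times, xorbits_times)
print(f"{speedup:.1f}x faster over {used}")
```

A ratio of totals is dominated by the longest-running queries; a geometric mean of per-query ratios would weight every query equally and can give a noticeably different figure.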

TPC-H SF100: Xorbits vs. Pandas API on Spark

Spark is a well-known framework for fast, large-scale data processing. The graph below shows the compute times of Xorbits and Pandas API on Spark for the TPC-H queries (excluding I/O). Across all queries, the two systems performed similarly, but Xorbits offered much better API compatibility: Pandas API on Spark failed on Q1, Q4, Q7, and Q21, and ran out of memory on Q20.

[Figure: Spark benchmark]
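
Results like these depend on timing only the computation and on recording each failure instead of aborting the whole run. The harness below is a minimal sketch of that idea, not the harness used for this benchmark: `run_query` and the toy queries are hypothetical, and a real distributed out-of-memory condition often surfaces as a killed worker rather than a Python `MemoryError`.

```python
import time


def run_query(name, compute, data):
    """Time `compute(data)` only; `data` is loaded beforehand, so I/O is excluded."""
    start = time.perf_counter()
    try:
        compute(data)
    except MemoryError:
        return {"query": name, "status": "oom", "seconds": None}
    except Exception as exc:  # e.g. an unsupported API call
        return {"query": name, "status": f"failed: {type(exc).__name__}", "seconds": None}
    return {"query": name, "status": "ok", "seconds": time.perf_counter() - start}


# Toy stand-ins for real queries, used only to exercise the harness.
def q_ok(rows):
    return sum(rows)


def q_unsupported(rows):
    raise NotImplementedError("API not supported")


def q_oom(rows):
    raise MemoryError


data = list(range(1000))  # "loaded" before any timing starts
results = [
    run_query(name, fn, data)
    for name, fn in [("Qa", q_ok), ("Qb", q_unsupported), ("Qc", q_oom)]
]
for row in results:
    print(row)
```

For lazily evaluated frameworks such as Dask, `compute` has to force execution of the result; otherwise the timer only measures how long it takes to build the task graph.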

TPC-H SF100: Xorbits vs. Modin

Modin is another "Pandas-like" framework that claims to scale Pandas by "changing a single line of code." The graph below shows the compute times of Xorbits and Modin for the TPC-H queries (excluding I/O). Because Modin hung on the first query, we ran the queries individually to lower memory usage. Even so, Modin ran out of memory on most of the queries that involve heavy data shuffles, which makes the performance difference less clear-cut. Xorbits was still found to be 3.2x faster than Modin.

[Figure: Modin benchmark]

TPC-H SF1000: Xorbits

Xorbits passed all the queries consecutively, whereas Dask, Pandas API on Spark, and Modin failed on most of them. We therefore cannot yet compare performance at this scale, and we plan to try again later.

[Figure: Spark benchmark]

© 2022-2023 Xprobe Inc. All Rights Reserved.