Xorbits aims to be as user-friendly, intuitive, and efficient as possible for deployment and execution.

Amazon EKS is a managed Kubernetes service that allows you to run Kubernetes on the AWS cloud and in on-premises data centers. As a cloud-native, standard solution, EKS has become the mainstream choice for deploying applications and services. This tutorial will help you quickly become familiar with deploying and running Xorbits on Amazon EKS, and with analyzing your running jobs through an interesting example. I hope you enjoy this journey.
Prepare EKS cluster
Install kubectl
kubectl is a command-line tool for communicating with the Kubernetes API server, which you can use to conveniently manage your Kubernetes cluster.
Refer to Install kubectl to install it.
Install eksctl
eksctl is a command-line tool that makes it easy to create and manage Kubernetes clusters on Amazon EKS. It offers simpler and more intuitive options than creating and managing an EKS cluster manually through the AWS Console.
Refer to Install eksctl to install it.
Create an EKS cluster
Refer to Create cluster to create an EKS cluster using eksctl.
Please note that this tutorial uses the Xorbits image published in our Docker Hub namespace, so your EKS cluster must be able to reach Docker Hub to pull the image.
By default, EKS clusters created by eksctl have access to public networks. For more information and specific configuration options, refer to the eksctl documentation.
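As a rough illustration of the step above, the sketch below assembles an eksctl create cluster command from Python. The cluster name, region, node count, and instance type are placeholder assumptions rather than values this tutorial prescribes, and the helper names are hypothetical; check the eksctl documentation for the options that suit your account.

```python
import shlex
import subprocess


def eksctl_create_command(name, region, nodes, node_type):
    """Build the argv for `eksctl create cluster` (all values are placeholders)."""
    return [
        "eksctl", "create", "cluster",
        "--name", name,
        "--region", region,
        "--nodes", str(nodes),
        "--node-type", node_type,
    ]


def create_cluster(name, region, nodes, node_type, execute=False):
    """Print the command, and run it only when `execute` is True."""
    cmd = eksctl_create_command(name, region, nodes, node_type)
    print(shlex.join(cmd))
    if execute:
        # Creating a cluster takes several minutes and incurs AWS charges.
        subprocess.run(cmd, check=True)
    return cmd


# Prints the command without running it; every value here is hypothetical.
create_cluster("xorbits-demo", "us-east-2", 5, "m5.xlarge")
```

Choose a node count and instance type large enough for the worker and supervisor resources you plan to request when deploying Xorbits.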
Install AWS Load Balancer
The final step in preparation is to set up an AWS Load Balancer. Xorbits uses an Ingress to expose its web endpoint on EKS. Whether your EKS cluster runs on AWS Fargate or Amazon EC2, the Ingress provides a URL that proxies requests to Xorbits.
Deploy Xorbits
Now that you have a complete EKS cluster, let’s start deploying Xorbits. Deploying Xorbits to an existing EKS cluster is quite simple.
- Install Xorbits 0.1.1 or later with the Kubernetes SDK and S3 dependencies
$ pip install 'xorbits[kubernetes,aws]>=0.1.1'
- Deploy
from kubernetes import config
from xorbits.deploy.kubernetes import new_cluster
cluster = new_cluster(
    config.new_client_from_config(),
    worker_num=5,
    worker_cpu=16, worker_mem='128g',
    supervisor_cpu=4, supervisor_mem='8g')
That’s it! Once the deployment is complete, you should see the log message "Xorbits endpoint http://<ingress_service_ip>:80 is ready!" in the console. This confirms that a Xorbits cluster with 1 supervisor and 5 workers has been successfully deployed to your EKS cluster.
Xorbits deploys through a Kubernetes API client, which is created by the config.new_client_from_config() API. If you have multiple Kubernetes contexts configured, ensure that your current kubectl context is set to your EKS context. Run this command to check:
$ kubectl config current-context
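The same check can be made from Python before deploying. The sketch below is a minimal illustration built on list_kube_config_contexts and the context argument of new_client_from_config from the Kubernetes Python client. The helper names are hypothetical, and the heuristic for recognising an EKS context (a name ending in eksctl.io or starting with arn:aws:eks:) is an assumption about how eksctl and the AWS CLI usually name contexts, so adjust it to your own kubeconfig.

```python
def looks_like_eks(context_name):
    """Heuristic (an assumption): eksctl contexts usually end with '.eksctl.io',
    and contexts written by `aws eks update-kubeconfig` are cluster ARNs."""
    return (context_name.endswith(".eksctl.io")
            or context_name.startswith("arn:aws:eks:"))


def choose_context(context_names, active_name):
    """Return the active context if it looks like EKS, otherwise the first
    EKS-looking context; raise LookupError if none is found."""
    if looks_like_eks(active_name):
        return active_name
    for name in context_names:
        if looks_like_eks(name):
            return name
    raise LookupError("no EKS context found in kubeconfig")


def eks_api_client():
    """Create a Kubernetes API client pinned to the chosen EKS context."""
    from kubernetes import config  # imported lazily; needs the kubernetes package

    contexts, active = config.list_kube_config_contexts()
    name = choose_context([c["name"] for c in contexts], active["name"])
    return config.new_client_from_config(context=name)
```

The client returned by eks_api_client() could then be passed to new_cluster in place of config.new_client_from_config().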
Manage Xorbits cluster
Get Xorbits namespace
$ kubectl get namespaces
Get the namespace string that starts with the prefix xorbits-ns-. For example:
$ kubectl get namespaces
NAME STATUS AGE
default Active 38d
kube-node-lease Active 38d
kube-public Active 38d
kube-system Active 38d
xorbits-ns-cc53e351744f4394b20180a0dafd8b91 Active 4m5s
The last entry in the output above is the namespace in which Xorbits runs. If more than one namespace has this prefix, you can usually tell them apart by the third column, AGE.
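The selection rule described above (prefix match, then age) can be sketched in Python. This is an illustration only: the helper names are hypothetical, the sample timestamps and the second Xorbits namespace are made up, and list_namespaces assumes the list_namespace call of the Kubernetes Python client's CoreV1Api.

```python
from datetime import datetime, timezone

XORBITS_NS_PREFIX = "xorbits-ns-"


def newest_xorbits_namespace(namespaces):
    """Pick the most recently created namespace with the Xorbits prefix.

    `namespaces` is an iterable of (name, creation_time) pairs.
    """
    candidates = [(created, name) for name, created in namespaces
                  if name.startswith(XORBITS_NS_PREFIX)]
    if not candidates:
        raise LookupError("no namespace starts with " + XORBITS_NS_PREFIX)
    return max(candidates)[1]


def list_namespaces(api_client):
    """Fetch (name, creation_time) pairs through the Kubernetes Python client."""
    from kubernetes import client  # imported lazily; needs the kubernetes package

    items = client.CoreV1Api(api_client).list_namespace().items
    return [(ns.metadata.name, ns.metadata.creation_timestamp) for ns in items]


# Sample data; timestamps and the "hypothetical" namespace are invented.
sample = [
    ("default", datetime(2023, 3, 1, tzinfo=timezone.utc)),
    ("kube-system", datetime(2023, 3, 1, tzinfo=timezone.utc)),
    ("xorbits-ns-0000hypothetical", datetime(2023, 4, 7, 9, 0, tzinfo=timezone.utc)),
    ("xorbits-ns-cc53e351744f4394b20180a0dafd8b91",
     datetime(2023, 4, 8, 12, 0, tzinfo=timezone.utc)),
]
print(newest_xorbits_namespace(sample))
```

With a real cluster, newest_xorbits_namespace(list_namespaces(api_client)) would return the namespace of the latest deployment.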
Check Xorbits pods status
$ kubectl get po -n <your_namespace>
For example:
$ kubectl get po -n xorbits-ns-cc53e351744f4394b20180a0dafd8b91
NAME READY STATUS RESTARTS AGE
xorbitssupervisor-7589b8ff4b-2kw9j 1/1 Running 0 5m23s
xorbitsworker-5f8db4f798-b9xlk 1/1 Running 0 5m22s
xorbitsworker-5f8db4f798-hx2zx 1/1 Running 0 5m22s
xorbitsworker-5f8db4f798-jzxm6 1/1 Running 0 5m22s
xorbitsworker-5f8db4f798-pr9lc 1/1 Running 0 5m22s
xorbitsworker-5f8db4f798-xlc25 1/1 Running 0 5m22s
The supervisor and worker pod names start with the prefixes xorbitssupervisor and xorbitsworker, respectively.
If any errors occur during pod startup, you can use any command supported by kubectl to view and troubleshoot them. For example, you can use kubectl describe pod <pod_name> -n <your_namespace> and kubectl logs <pod_name> -n <your_namespace> to check the details.
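The pod check can also be scripted. The following sketch classifies pods by the name prefixes mentioned above and reports any that are not in the Running phase. The helper names are hypothetical, list_pods assumes the list_namespaced_pod call of the Kubernetes Python client's CoreV1Api, and a Running phase is a weaker signal than the READY column, so treat this as a first-pass check only.

```python
from collections import Counter


def pod_role(pod_name):
    """Map a pod name to its Xorbits role based on the name prefix."""
    if pod_name.startswith("xorbitssupervisor"):
        return "supervisor"
    if pod_name.startswith("xorbitsworker"):
        return "worker"
    return "other"


def unhealthy_pods(pods):
    """Return the names of pods whose phase is not 'Running'.

    `pods` is an iterable of (name, phase) pairs.
    """
    return [name for name, phase in pods if phase != "Running"]


def role_counts(pods):
    """Count running pods per role, e.g. to confirm 1 supervisor and 5 workers."""
    return Counter(pod_role(name) for name, phase in pods if phase == "Running")


def list_pods(api_client, namespace):
    """Fetch (name, phase) pairs through the Kubernetes Python client."""
    from kubernetes import client  # imported lazily; needs the kubernetes package

    items = client.CoreV1Api(api_client).list_namespaced_pod(namespace).items
    return [(pod.metadata.name, pod.status.phase) for pod in items]
```

For any name returned by unhealthy_pods, kubectl describe pod and kubectl logs remain the tools for finding the cause.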
Check Xorbits endpoint
$ kubectl get ingress -n <your_namespace>
For example:
$ kubectl get ingress -n xorbits-ns-cc53e351744f4394b20180a0dafd8b91
NAME CLASS HOSTS ADDRESS PORTS AGE
xorbits-ingress alb * k8s-xorbitsn-xorbitsi-25a70c9131-156674448.us-east-2.elb.amazonaws.com 80 5m28s
The Xorbits service endpoint is exposed through an Ingress in EKS. The endpoint shown in the ADDRESS column of the output above is the same as the endpoint in the log that appears in the console. You can open the endpoint in your web browser to access the Xorbits web UI. Additionally, you can use the xorbits.init() method to start a session and submit jobs to your Xorbits cluster. For example:
import xorbits
xorbits.init('http://<ingress_service_ip>:80')
# your code here, for example:
import xorbits.pandas as pd
print(pd.DataFrame({'col': [1, 2, 3]}).sum())
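A newly created load balancer address may take a little while before it starts answering, so it can help to wait for the endpoint before calling xorbits.init(). The sketch below is one way to do that with the standard library only. The function names, retry count, and delay are assumptions chosen for illustration, not part of Xorbits.

```python
import time
import urllib.request


def endpoint_url(address, port=80):
    """Build the Xorbits endpoint URL from the Ingress ADDRESS value."""
    return "http://{}:{}".format(address, port)


def http_probe(url):
    """Return True if an HTTP GET to `url` succeeds within 5 seconds."""
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except OSError:  # covers URLError, HTTPError, DNS and connection failures
        return False


def wait_until_reachable(url, attempts=30, delay=10.0,
                         probe=http_probe, sleep=time.sleep):
    """Poll `url` until the probe succeeds or `attempts` are used up.

    `probe` and `sleep` are injectable so the loop can be tested offline.
    """
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A typical use would be to call wait_until_reachable(endpoint_url('<ingress_service_ip>')) and pass the same URL to xorbits.init() once it returns True.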
Check Web UI
Open the Ingress address in your browser to access the Xorbits web UI. The web UI provides an overview of the cluster status, resource usage, session monitoring, and task view. For example, one page displays the status of all the workers, their resource usage, and the total usage amount.
Shut down Xorbits cluster
To delete your Xorbits cluster, you can simply delete the Kubernetes namespace. This will remove all the resources related to the cluster.
$ kubectl delete namespace <your_namespace>
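Because deleting a namespace removes everything in it, a small guard against deleting the wrong one can be worthwhile. The sketch below is a hypothetical helper, not part of Xorbits: it only proceeds when the name carries the xorbits-ns- prefix, and it assumes the delete_namespace call of the Kubernetes Python client's CoreV1Api.

```python
XORBITS_NS_PREFIX = "xorbits-ns-"


def check_deletable(namespace):
    """Return the namespace unchanged, or raise if it is not a Xorbits one."""
    if not namespace.startswith(XORBITS_NS_PREFIX):
        raise ValueError("refusing to delete non-Xorbits namespace: " + namespace)
    return namespace


def delete_xorbits_namespace(api_client, namespace):
    """Delete the namespace, and with it every resource of the Xorbits cluster."""
    from kubernetes import client  # imported lazily; needs the kubernetes package

    core = client.CoreV1Api(api_client)
    return core.delete_namespace(check_deletable(namespace))
```

Calling delete_xorbits_namespace with kube-system or default raises ValueError before any request is sent to the cluster.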
Running Example
Here we provide an example that reads the MovieLens dataset from AWS S3, processes it with Xorbits, and produces some simple data analysis results. To read data from S3, the s3fs dependency needs to be installed on the Xorbits Kubernetes cluster, which is done by adding the pip option in the deployment interface. Note that this requires Xorbits v0.1.1 or later.
For example:
from kubernetes import config
from xorbits.deploy.kubernetes import new_cluster
cluster = new_cluster(
    config.new_client_from_config(),
    worker_num=5,
    worker_cpu=4, worker_mem='16g',
    supervisor_cpu=4, supervisor_mem='8g',
    pip=['s3fs'])
The data analysis example code is as follows. It requires that your Kubernetes cluster can access PyPI and the AWS S3 bucket where your data is stored. You can download the MovieLens dataset from here.
import xorbits
import xorbits.pandas as pd

# Connect to the Xorbits cluster deployed above.
xorbits.init('<your Xorbits endpoint>')

# Credentials used to read from your S3 bucket.
storage_options = {
    "key": "<your AWS access key id>",
    "secret": "<your AWS secret access key>",
}
bucket = "s3://<your S3 bucket>/"

movies = pd.read_csv(
    bucket + "movies.csv",
    storage_options=storage_options)
ratings = pd.read_csv(
    bucket + "ratings.csv",
    storage_options=storage_options)

# Mean rating and number of ratings per movie.
movie_ratings = ratings.groupby(
    'movieId', as_index=False)\
    .agg({'rating': ['mean', 'count']})
movie_ratings.columns = ['movieId', 'rating', 'count']

# Keep movies with more than 100 ratings, then take the 100 best-rated.
movie_ratings = movie_ratings[movie_ratings['count'] > 100]
top_100_movies = movie_ratings.sort_values(
    'rating', ascending=False)[:100]

# Attach the titles and print the result.
top_100_movies_detail = \
    top_100_movies.merge(movies[['movieId', 'title']])
print(top_100_movies_detail)
This code analyzes the ratings of all movies in the MovieLens dataset and outputs the 100 highest-rated movies among those with more than 100 ratings.
Example of output:
movieId rating count title
0 171011 4.483096 1124 Planet Earth II (2016)
1 159817 4.464797 1747 Planet Earth (2006)
2 318 4.413576 81482 Shawshank Redemption, The (1994)
3 170705 4.398599 1356 Band of Brothers (2001)
4 171495 4.326715 277 Cosmos
.. ... ... ... ...
95 48516 4.121237 25343 Departed, The (2006)
96 903 4.120822 15945 Vertigo (1958)
97 1280 4.120717 3181 Raise the Red Lantern (Da hong deng long gao g...
98 260 4.120189 68717 Star Wars: Episode IV - A New Hope (1977)
99 191997 4.117021 329 The Hounds of Baskerville (2012)
[100 rows x 4 columns]
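One note on the storage_options dictionary used in the example: rather than pasting credentials into the script, you may prefer to read them from the environment. The sketch below assumes the standard AWS variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set on the machine where the script runs; the function name is hypothetical.

```python
import os


def s3_storage_options(env=None):
    """Build the `storage_options` dict from AWS environment variables."""
    env = os.environ if env is None else env
    try:
        return {
            "key": env["AWS_ACCESS_KEY_ID"],
            "secret": env["AWS_SECRET_ACCESS_KEY"],
        }
    except KeyError as missing:
        # Fail early with a clear message instead of a late S3 access error.
        raise RuntimeError(
            "environment variable {} is not set".format(missing.args[0])
        ) from None
```

The result can be passed as storage_options=s3_storage_options() to the pd.read_csv calls in place of the hard-coded dictionary.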
You can open the Xorbits endpoint in your browser to analyze the code that was just executed. In the Sessions -> Task ID section of the webpage, you can see the execution graph corresponding to this code in Xorbits. If there are multiple sessions and tasks, the latest Session ID and Task ID will always be at the bottom of the page.
In the Supervisors / Workers -> endpoint -> LOGS section of the webpage, you can see the execution logs. Xorbits on Kubernetes enables DEBUG level logs, and you can download these logs for analysis and debugging by clicking the SAVE button.
Feel free to explore the Xorbits web UI, and we will continue to enhance its capabilities in the future.
Conclusion
This tutorial has explained in detail how to deploy, manage, and run a Xorbits cluster on Amazon EKS. It also provided a simple data analysis example to show how to run specific code and view execution details, such as running logs and the execution graph, on the web UI. Xorbits aims to be as user-friendly, intuitive, and efficient as possible for deployment and execution. We look forward to you trying it out.