Tutorial: Deploy and Run Xorbits on Amazon EKS

Xorbits aims to be as user-friendly, intuitive, and efficient as possible for deployment and execution.

Chengjie Li
  16 January 2023

Amazon EKS is a managed Kubernetes service that lets you run Kubernetes on the AWS cloud and in on-premises data centers. As a cloud-native, standard solution, EKS has become a mainstream choice for deploying applications and services. This tutorial shows how to deploy and run Xorbits on Amazon EKS and how to analyze your running jobs with a worked example. I hope you enjoy the journey.

Prepare EKS cluster

Install kubectl

kubectl is a command-line tool that allows you to communicate with the Kubernetes API server, which you can use to conveniently manage your Kubernetes cluster.

Refer to Install kubectl to install it.

Install eksctl

eksctl is a command-line tool that makes it easy to create and manage Kubernetes clusters on Amazon EKS. It provides some simple and intuitive options to create and manage an EKS cluster compared to doing it manually through the AWS Console.

Refer to Install eksctl to install it.
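Before moving on, it can help to confirm that both tools are actually on your PATH. The following is a minimal sketch using only the Python standard library; the helper name find_cli_tools is hypothetical and not part of Xorbits or either tool.

```python
import shutil


def find_cli_tools(names=("kubectl", "eksctl")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Report any tool that still needs to be installed.
missing = [name for name, path in find_cli_tools().items() if path is None]
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("kubectl and eksctl are both installed.")
```

This only checks that the executables can be found; it does not verify their versions or your AWS credentials.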

Create an EKS cluster

Refer to Create cluster to create an EKS cluster using eksctl.

Note that this tutorial uses the Xorbits image published in our Docker Hub namespace, so your EKS cluster must be able to reach Docker Hub to pull the image.

By default, EKS clusters created by eksctl have access to public networks. For more information and specific configuration options, refer to the eksctl documentation.

Install AWS Load Balancer

The final step in preparation is to set up an AWS Load Balancer. Xorbits uses a Kubernetes Ingress to expose its web endpoint on EKS. Whether your EKS cluster runs on AWS Fargate or Amazon EC2, the Ingress provides a URL for Xorbits.

Deploy Xorbits

Now that you have a complete EKS cluster, let’s start deploying Xorbits. Deploying Xorbits to an existing EKS cluster is quite simple.

  1. Install Xorbits 0.1.1 or later with the Kubernetes SDK and S3 dependencies
    $ pip install 'xorbits[kubernetes,aws]>=0.1.1'
    
  2. Deploy
    from kubernetes import config
    from xorbits.deploy.kubernetes import new_cluster

    # Create one supervisor and five workers, using the current kubectl context.
    cluster = new_cluster(
        config.new_client_from_config(),
        worker_num=5,
        worker_cpu=16, worker_mem='128g',
        supervisor_cpu=4, supervisor_mem='8g')
    

That’s it! Once the deployment is complete, you should see the log message "Xorbits endpoint http://<ingress_service_ip>:80 is ready!" in the console. This confirms that a Xorbits cluster with 1 supervisor and 5 workers has been successfully deployed to your EKS cluster.
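If pods stay pending, a common first check is whether the node group can fit what was requested. The sketch below simply adds up the CPU and memory asked for in the new_cluster call above, assuming one supervisor and memory values in the "<integer>g" form used in this tutorial. The helper total_request is hypothetical, and real scheduling also depends on per-node allocatable resources and other pods in the cluster.

```python
def total_request(worker_num, worker_cpu, worker_mem, supervisor_cpu, supervisor_mem):
    """Return (total CPUs, total memory in 'g' units) for one supervisor plus the workers."""

    def mem_g(value):
        # Only handles the "<integer>g" form used in this tutorial.
        return int(value.rstrip("g"))

    cpus = worker_num * worker_cpu + supervisor_cpu
    memory = worker_num * mem_g(worker_mem) + mem_g(supervisor_mem)
    return cpus, memory


cpus, memory = total_request(5, 16, "128g", 4, "8g")
print(f"Requested: {cpus} CPUs, {memory}g memory")  # Requested: 84 CPUs, 648g memory
```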

Xorbits uses the Kubernetes API client for deployment, which is created by config.new_client_from_config(). If you have multiple Kubernetes contexts configured, make sure your current kubectl context is your EKS context. Run this command to check:

$ kubectl config current-context
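The same check can be done from Python with the Kubernetes client. This is a sketch rather than part of the Xorbits API: summarize_contexts and current_context_summary are hypothetical helper names, and the code assumes a kubeconfig file in the default location.

```python
def summarize_contexts(contexts, active):
    """Return the active context name, its cluster, and the names of the other contexts."""
    active_name = active["name"]
    return {
        "active": active_name,
        "cluster": active["context"].get("cluster"),
        "others": [c["name"] for c in contexts if c["name"] != active_name],
    }


def current_context_summary():
    """Read the local kubeconfig and summarize its contexts."""
    # Imported here so summarize_contexts stays usable without the kubernetes package.
    from kubernetes import config

    contexts, active = config.list_kube_config_contexts()
    return summarize_contexts(contexts, active)


# Usage (requires a kubeconfig file):
#   print(current_context_summary())
```

If the active context is not your EKS cluster, you can switch contexts with kubectl, or pass the context name explicitly, since config.new_client_from_config() accepts a context argument.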

Manage Xorbits cluster

Get Xorbits namespace

$ kubectl get namespaces

Find the namespace whose name starts with the prefix xorbits-ns-. For example:

$ kubectl get namespaces
NAME                                          STATUS   AGE
default                                       Active   38d
kube-node-lease                               Active   38d
kube-public                                   Active   38d
kube-system                                   Active   38d
xorbits-ns-cc53e351744f4394b20180a0dafd8b91   Active   4m5s

The last entry in the output above is the namespace in which Xorbits is running. If more than one namespace has this prefix, use the AGE column (the third column) to identify the most recently created one.
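When several Xorbits namespaces exist, picking the newest one can be scripted. The sketch below assumes the xorbits-ns- prefix described above; the helper names and the sample namespace names are hypothetical, and list_namespaces needs the kubernetes package and a working kubeconfig.

```python
from datetime import datetime, timezone

XORBITS_NS_PREFIX = "xorbits-ns-"


def newest_xorbits_namespace(namespaces):
    """Given (name, creation_time) pairs, return the newest Xorbits namespace name, or None."""
    candidates = [
        (created, name)
        for name, created in namespaces
        if name.startswith(XORBITS_NS_PREFIX)
    ]
    return max(candidates)[1] if candidates else None


def list_namespaces():
    """Fetch (name, creation_time) pairs from the cluster in the current context."""
    from kubernetes import client, config

    config.load_kube_config()
    items = client.CoreV1Api().list_namespace().items
    return [(ns.metadata.name, ns.metadata.creation_timestamp) for ns in items]


# Offline demo with made-up namespaces; against a cluster, pass list_namespaces() instead.
sample = [
    ("default", datetime(2022, 12, 9, tzinfo=timezone.utc)),
    ("xorbits-ns-older", datetime(2023, 1, 15, tzinfo=timezone.utc)),
    ("xorbits-ns-newer", datetime(2023, 1, 16, tzinfo=timezone.utc)),
]
print(newest_xorbits_namespace(sample))
```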

Check Xorbits pods status

$ kubectl get po -n <your_namespace>

For example:

$ kubectl get po -n xorbits-ns-cc53e351744f4394b20180a0dafd8b91
NAME                                 READY   STATUS    RESTARTS   AGE
xorbitssupervisor-7589b8ff4b-2kw9j   1/1     Running   0          5m23s
xorbitsworker-5f8db4f798-b9xlk       1/1     Running   0          5m22s
xorbitsworker-5f8db4f798-hx2zx       1/1     Running   0          5m22s
xorbitsworker-5f8db4f798-jzxm6       1/1     Running   0          5m22s
xorbitsworker-5f8db4f798-pr9lc       1/1     Running   0          5m22s
xorbitsworker-5f8db4f798-xlc25       1/1     Running   0          5m22s

Supervisor pod names start with the prefix xorbitssupervisor, and worker pod names start with the prefix xorbitsworker.

If errors occur during pod startup, you can use any kubectl command to inspect and troubleshoot them. For example, run kubectl describe pod <pod_name> -n <your_namespace> and kubectl logs <pod_name> -n <your_namespace> to check the details.
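For a quick health summary, the pod list can also be read through the Kubernetes Python client. This sketch relies on the pod-name prefixes described above; summarize_pods and list_pods are hypothetical helpers, and list_pods needs the kubernetes package and a working kubeconfig.

```python
from collections import Counter


def summarize_pods(pods):
    """Count pods per (role, phase); the role is inferred from the pod-name prefix."""

    def role(name):
        if name.startswith("xorbitssupervisor"):
            return "supervisor"
        if name.startswith("xorbitsworker"):
            return "worker"
        return "other"

    return Counter((role(name), phase) for name, phase in pods)


def list_pods(namespace):
    """Fetch (pod name, phase) pairs for one namespace."""
    from kubernetes import client, config

    config.load_kube_config()
    items = client.CoreV1Api().list_namespaced_pod(namespace).items
    return [(pod.metadata.name, pod.status.phase) for pod in items]


# Offline demo using the pod names from the example output above.
sample = [
    ("xorbitssupervisor-7589b8ff4b-2kw9j", "Running"),
    ("xorbitsworker-5f8db4f798-b9xlk", "Running"),
    ("xorbitsworker-5f8db4f798-hx2zx", "Running"),
]
for (role, phase), count in sorted(summarize_pods(sample).items()):
    print(role, phase, count)
```

Any pod whose phase is not Running is a good candidate for kubectl describe and kubectl logs.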

Check Xorbits endpoint

$ kubectl get ingress -n <your_namespace>

For example:

$ kubectl get ingress -n xorbits-ns-cc53e351744f4394b20180a0dafd8b91
NAME              CLASS   HOSTS   ADDRESS                                                                  PORTS   AGE
xorbits-ingress   alb     *       k8s-xorbitsn-xorbitsi-25a70c9131-156674448.us-east-2.elb.amazonaws.com   80      5m28s

The Xorbits service endpoint is exposed by the Ingress in EKS. The value in the ADDRESS column of the output above is the same endpoint that appears in the console log during deployment. You can open the endpoint in your web browser to access the Xorbits web UI. Additionally, you can use the xorbits.init() method to start a session and submit jobs to your Xorbits cluster. For example:

import xorbits

xorbits.init('http://<ingress_service_ip>:80')

# your code here, for example:
import xorbits.pandas as pd
print(pd.DataFrame({'col': [1, 2, 3]}).sum())
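A newly created load balancer can take a little while before it starts answering, so it may be worth checking that the endpoint responds before calling xorbits.init(). The following is a standard-library sketch; is_reachable is a hypothetical helper, and it only confirms that some HTTP server answers at the URL, not that it is Xorbits.

```python
import http.client
import urllib.error
import urllib.request


def is_reachable(url, timeout=5):
    """Return True if an HTTP server answers at url, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return False


# Usage (replace the placeholder with the address from the ADDRESS column):
#   if is_reachable("http://<ingress_service_ip>:80"):
#       xorbits.init("http://<ingress_service_ip>:80")
```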

Check Web UI

Open the Ingress address in your browser to access the Xorbits web UI. The web UI provides an overview of the cluster status, resource usage, session monitoring, and task view. For example:

[Screenshot: Workers page of the Xorbits web UI]

This page displays information about the status of all the workers, resource usage, and total usage amount.

Shut down Xorbits cluster

To delete your Xorbits cluster, you can simply delete the Kubernetes namespace. This will remove all the resources related to the cluster.

$ kubectl delete namespace <your_namespace>
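Because deleting a namespace removes everything in it, a scripted version benefits from a guard against deleting the wrong one. This sketch assumes the xorbits-ns- prefix described earlier; check_deletable and delete_xorbits_namespace are hypothetical helpers built on the Kubernetes Python client.

```python
XORBITS_NS_PREFIX = "xorbits-ns-"


def check_deletable(namespace):
    """Raise ValueError unless the name looks like a Xorbits-created namespace."""
    if not namespace.startswith(XORBITS_NS_PREFIX):
        raise ValueError(f"refusing to delete non-Xorbits namespace: {namespace!r}")
    return namespace


def delete_xorbits_namespace(namespace):
    """Delete the namespace (and with it the whole Xorbits cluster) after the prefix check."""
    from kubernetes import client, config

    config.load_kube_config()
    client.CoreV1Api().delete_namespace(check_deletable(namespace))


# Usage (irreversible, so double-check the name first):
#   delete_xorbits_namespace("<your_namespace>")
```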

Running Example

Here is an example that reads the MovieLens dataset from AWS S3, processes it with Xorbits, and produces some simple analysis results. To read data from S3, the s3fs dependency needs to be installed on the Xorbits Kubernetes cluster, which you can do by passing the pip option to the deployment interface. Note that this requires Xorbits v0.1.1 or later. For example:

from kubernetes import config
from xorbits.deploy.kubernetes import new_cluster
cluster = new_cluster(
    config.new_client_from_config(), 
    worker_num=5, 
    worker_cpu=4, worker_mem='16g', 
    supervisor_cpu=4, supervisor_mem='8g', 
    pip=['s3fs'])

The data analysis code is shown below. It requires that your Kubernetes cluster can access PyPI (to install s3fs) and the AWS S3 bucket where your data is stored. You can download the MovieLens dataset from the GroupLens website.

import xorbits
import xorbits.pandas as pd

xorbits.init('<your Xorbits endpoint>')

storage_options = {
    "key": "<your AWS access key id>",
    "secret": "<your AWS secret access key>",
}

bucket = "s3://<your S3 bucket>/"

movies = pd.read_csv(
    bucket + "movies.csv", 
    storage_options=storage_options)

ratings = pd.read_csv(
    bucket + "ratings.csv", 
    storage_options=storage_options)

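# Compute the mean rating and the number of ratings for each movie.
movie_ratings = ratings.groupby(
    'movieId', as_index=False)\
    .agg({'rating': ['mean', 'count']})
# Flatten the two-level column index produced by the aggregation.
movie_ratings.columns = ['movieId', 'rating', 'count']
# Keep only movies with more than 100 ratings.
movie_ratings = movie_ratings[movie_ratings['count'] > 100]
# Take the 100 highest-rated movies and attach their titles.
top_100_movies = movie_ratings.sort_values(
    'rating', ascending=False)[:100]
top_100_movies_detail = \
    top_100_movies.merge(movies[['movieId', 'title']])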

print(top_100_movies_detail)

This code analyzes the ratings of all movies in the MovieLens dataset and outputs the 100 highest-rated movies among those with more than 100 ratings.

Example of output:

    movieId    rating  count                                              title
0    171011  4.483096   1124                             Planet Earth II (2016)
1    159817  4.464797   1747                                Planet Earth (2006)
2       318  4.413576  81482                   Shawshank Redemption, The (1994)
3    170705  4.398599   1356                            Band of Brothers (2001)
4    171495  4.326715    277                                             Cosmos
..      ...       ...    ...                                                ...
95    48516  4.121237  25343                               Departed, The (2006)
96      903  4.120822  15945                                     Vertigo (1958)
97     1280  4.120717   3181  Raise the Red Lantern (Da hong deng long gao g...
98      260  4.120189  68717          Star Wars: Episode IV - A New Hope (1977)
99   191997  4.117021    329                   The Hounds of Baskerville (2012)

[100 rows x 4 columns]
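To check the aggregation logic without a cluster or an S3 bucket, the same steps can be run locally on a few rows. The sketch below uses plain pandas and made-up data, on the assumption that xorbits.pandas behaves like pandas for these calls. The function top_rated is hypothetical, and it uses pandas named aggregation instead of renaming the columns afterwards.

```python
import pandas as pd


def top_rated(ratings, movies, min_count, top_n):
    """Return the top_n movies by mean rating among those with more than min_count ratings."""
    stats = ratings.groupby("movieId", as_index=False).agg(
        rating=("rating", "mean"), count=("rating", "count"))
    stats = stats[stats["count"] > min_count]
    top = stats.sort_values("rating", ascending=False)[:top_n]
    # Join on the shared movieId column to attach titles.
    return top.merge(movies[["movieId", "title"]])


# Tiny made-up dataset: movie 3 has the best rating but only one vote.
ratings = pd.DataFrame({
    "movieId": [1, 1, 1, 2, 2, 3],
    "rating": [5.0, 4.0, 3.0, 2.0, 4.0, 5.0],
})
movies = pd.DataFrame({"movieId": [1, 2, 3], "title": ["A", "B", "C"]})

print(top_rated(ratings, movies, min_count=1, top_n=2))
```

Movie 3 is dropped by the count filter even though its mean rating is the highest, which is the reason for the threshold of 100 ratings in the full example.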

You can open the Xorbits endpoint in your browser to inspect the job that just ran. In the Sessions -> Task ID section of the web UI, you can see the execution graph that Xorbits built for this code. If there are multiple sessions and tasks, the latest Session ID and Task ID are always at the bottom of the page.

[Screenshot: task execution graph in the Xorbits web UI]

In the Supervisors / Workers -> endpoint -> LOGS section of the web UI, you can see the execution logs. Xorbits on Kubernetes enables DEBUG-level logs, and you can download them for analysis and debugging by clicking the SAVE button.

[Screenshot: LOGS page of the Xorbits web UI]

Feel free to explore the Xorbits web UI, and we will continue to enhance its capabilities in the future.

Conclusion

This tutorial has explained how to deploy, manage, and run a Xorbits cluster on Amazon EKS. It has also walked through a simple data analysis example to show how to run code and view execution details, such as logs and the execution graph, in the web UI. Xorbits aims to be as user-friendly, intuitive, and efficient as possible for deployment and execution. We look forward to you trying it out.


© 2022-2023 Xprobe Inc. All Rights Reserved.