KlonetAI: An AI agent for intelligent interaction with Klonet.

Lizonghang/KlonetAI


Klonet is designed to support the development and testing of new network protocols and applications, such as distributed artificial intelligence and federated learning, in a realistic environment. Klonet can emulate various network scenarios, such as wireless, mobile, satellite, and optical networks, and provide fine-grained control over network parameters such as bandwidth, delay, jitter, and packet loss. Klonet can also integrate with real devices and applications, such as routers, switches, sensors, and smartphones, to create hybrid network experiments.

To appear at NSDI '24:

Tie Ma, Long Luo, Hongfang Yu, et al., "Klonet: An Easy-to-Use and Scalable Platform for Computer Networks Education." In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI '24), Santa Clara, CA, USA, 2024.

KlonetAI provides an AI agent for intelligent interaction with the Klonet platform. Here we give some examples to show the magic of KlonetAI.

NOTE: You may need a VPN service to access 🤗.

import tool
import tutorial
from tool.klonet import kai

# For Klonet backend.
project_name = "klonetai"  # Replace it with your own project name.
user_name = "<Registered-Username>"  # Replace it with your own user name.
host_ip = "<Host-IP>"  # Replace it with Klonet backend IP.
port = <Port>  # Replace it with Klonet backend port number.

# Collect the tools you need.
tools = [obj() for obj in tool.base]
tools += [obj() for obj in tool.klonet]
tools += [obj() for obj in tool.topo]
tools += [obj() for obj in tool.gpt]
tools += [obj() for obj in tutorial.base]

# Log in to the Klonet backend.
kai.klonet_login(project_name, user_name, host_ip, port)

# Create a KlonetAI agent.

# ============ Use OpenAI ChatGPT model ============
# Replace `OpenAI_API_Key` in key.py with your own.
# The model name could be "gpt-3.5-turbo", "gpt-3.5-turbo-16k", and "gpt-4".
kai.create_agent(agent_name="openai", tools=tools, openai_model="gpt-3.5-turbo-16k")

# ============ Use 🤗 models for free ! ============
# Replace `Huggingface_API_Key` in key.py with your own.
# The agent name could be "starcoder" (recommended), "codellama-13b-hf" (4k), "codeLlama-34b-instruct-hf" (4k), 
# or "phind-codellama-34b-v2".
# kai.create_agent(agent_name="starcoder", tools=tools)

# ======== Use customized cloud LLM service ========
# prompt = "LIKirin/klonetai-prompts"

# Replace `ZhiPuAI_API_Key` in key.py with your own.
# The model name could be "chatglm_pro", "chatglm_std", "chatglm_lite", or "chatglm_lite_32k".
# from agent.chatglm import ChatGLMAgent
# custom_agent = ChatGLMAgent("chatglm_lite_32k", chat_prompt_template=prompt, run_prompt_template=prompt, additional_tools=tools)

# Replace `AI_Studio_APT_Key` in key.py with your own.
# The model name could be "ernie-bot" or "ernie-bot-turbo".
# from agent.erniebot import ErnieBotAgent
# custom_agent = ErnieBotAgent("ernie-bot-turbo", chat_prompt_template=prompt, run_prompt_template=prompt, additional_tools=tools)

# Replace `DashScope_API_Key` in key.py with your own.
# The model name could be "qwen-turbo", "qwen-plus".
# from agent.dashscope import DashScopeAgent
# custom_agent = DashScopeAgent("qwen-turbo", chat_prompt_template=prompt, run_prompt_template=prompt, additional_tools=tools)

# Replace `SparkDesk_AppId`, `SparkDesk_API_Secret`, `SparkDesk_API_Key` in key.py with your own.
# The model is forced to spark v2.
# from agent.sparkdesk import SparkDesk
# custom_agent = SparkDesk(chat_prompt_template=prompt, run_prompt_template=prompt, additional_tools=tools)

# kai.create_agent(agent=custom_agent)

# ========== Use customized local model ===========
# Coming soon!

Next, we can use the agent to interact with Klonet. For example:

Basic Usage

Check the image list

>>> kai.chat("List all available images.")
Outputs:

==Explanation from the agent==
I will use the tool `klonet_get_all_images` to fetch a list of all available images.


==Code generated by the agent==
images = klonet_get_all_images()
print(images)


==Result==
nvidia_klonet_image_latest
floodlight
ryu
l2fwd
ubuntu
snort
udt
quagga
ovs

Create a network with star topology

>>> kai.chat("Create a star network with 5 host nodes and 1 ovs switch, and deploy it. "
"Set node positions and link IPs to form a star shape.")
Outputs:

==Explanation from the agent==
To create a star network with 5 host nodes and 1 OVS switch, you can use the following code:


==Code generated by the agent==
# Add the OVS switch
klonet_add_node("s1", "ovs", x=350, y=350)

# Add the host nodes
klonet_add_node("h1", "ubuntu", x=200, y=200)
klonet_add_node("h2", "ubuntu", x=500, y=200)
klonet_add_node("h3", "ubuntu", x=200, y=500)
klonet_add_node("h4", "ubuntu", x=500, y=500)
klonet_add_node("h5", "ubuntu", x=350, y=50)

# Add links between the host nodes and the switch
klonet_add_link("h1", "s1", "l1", "10.0.0.2/24")
klonet_add_link("h2", "s1", "l2", "10.0.0.3/24")
klonet_add_link("h3", "s1", "l3", "10.0.0.4/24")
klonet_add_link("h4", "s1", "l4", "10.0.0.5/24")
klonet_add_link("h5", "s1", "l5", "10.0.0.6/24")

# Deploy the network
klonet_deploy_network()


==Result==
A new node (name: s1, image: switch/ovs, resource limit: {'cpu': None, 'mem': None}) have been added to the network.
A new node (name: h1, image: host/ubuntu, resource limit: {'cpu': None, 'mem': None}) have been added to the network.
A new node (name: h2, image: host/ubuntu, resource limit: {'cpu': None, 'mem': None}) have been added to the network.
A new node (name: h3, image: host/ubuntu, resource limit: {'cpu': None, 'mem': None}) have been added to the network.
A new node (name: h4, image: host/ubuntu, resource limit: {'cpu': None, 'mem': None}) have been added to the network.
A new node (name: h5, image: host/ubuntu, resource limit: {'cpu': None, 'mem': None}) have been added to the network.
A link with name (l1) was added between nodes h1 (IP: 10.0.0.2/24) and s1
A link with name (l2) was added between nodes h2 (IP: 10.0.0.3/24) and s1
A link with name (l3) was added between nodes h3 (IP: 10.0.0.4/24) and s1
A link with name (l4) was added between nodes h4 (IP: 10.0.0.5/24) and s1
A link with name (l5) was added between nodes h5 (IP: 10.0.0.6/24) and s1
Deployment progress: 0 %
...
Deployment progress: 100 %
Deploy project klonetai success.
>>> kai.chat("Show the network.")
Outputs:

==Explanation from the agent==
To view the current network topology, you can use the `klonet_view_topo` tool. Here's an example code:


==Code generated by the agent==
klonet_view_topo()


==Result==
Nodes: {'h1': 'host/ubuntu', 'h2': 'host/ubuntu', 'h3': 'host/ubuntu', 'h4': 'host/ubuntu', 'h5': 'host/ubuntu', 's1': 'switch/ovs'}
Links: {'l1': ('h1', 's1'), 'l2': ('h2', 's1'), 'l3': ('h3', 's1'), 'l4': ('h4', 's1'), 'l5': ('h5', 's1')}
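
The star layout used in this example can also be planned programmatically, for instance to check node positions and link IPs before asking the agent to deploy them. The following is a minimal sketch, not part of KlonetAI: `plan_star` and its parameters are hypothetical names, the hosts are placed evenly on a circle (the agent above chose its own coordinates), and the code only prints calls in the form shown in the transcript rather than contacting a Klonet backend.

```python
import math


def plan_star(num_hosts, center=(350, 350), radius=200,
              subnet="10.0.0", first_host_octet=2):
    """Plan a star: one OVS switch in the center, hosts evenly spaced on a circle.

    Hypothetical helper for illustration; returns plain tuples only.
    """
    nodes = [("s1", "ovs", center[0], center[1])]
    links = []
    for i in range(num_hosts):
        angle = 2 * math.pi * i / num_hosts
        x = round(center[0] + radius * math.cos(angle))
        y = round(center[1] + radius * math.sin(angle))
        host = f"h{i + 1}"
        nodes.append((host, "ubuntu", x, y))
        # Host IPs start at .2, as in the transcript above.
        links.append((host, "s1", f"l{i + 1}", f"{subnet}.{first_host_octet + i}/24"))
    return nodes, links


nodes, links = plan_star(5)
for name, image, x, y in nodes:
    print(f'klonet_add_node("{name}", "{image}", x={x}, y={y})')
for src, dst, link, ip in links:
    print(f'klonet_add_link("{src}", "{dst}", "{link}", "{ip}")')
```

The link names and IPs produced this way match the ones in the agent's code above (`l1` with `10.0.0.2/24` through `l5` with `10.0.0.6/24`); only the coordinates differ.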

Klonet Screenshot

Command execution

# Now, let us execute some commands on these hosts:
>>> kai.chat("All host nodes ping h1.")
Outputs:

==Explanation from the agent==
To ping host `h1` from all the other host nodes, you can use the `klonet_command_exec` tool. Here's the code:


==Code generated by the agent==
output_h2 = klonet_command_exec("h2", "ping -c 3 10.0.0.2")
output_h3 = klonet_command_exec("h3", "ping -c 3 10.0.0.2")
output_h4 = klonet_command_exec("h4", "ping -c 3 10.0.0.2")
output_h5 = klonet_command_exec("h5", "ping -c 3 10.0.0.2")

print(output_h2)
print(output_h3)
print(output_h4)
print(output_h5)


==Result==
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.780 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.073 ms
64 bytes from 10.0.0.2: icmp_seq=3 ttl=64 time=0.064 ms

--- 10.0.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2055ms
rtt min/avg/max/mdev = 0.064/0.305/0.780/0.336 ms
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.699 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.085 ms
64 bytes from 10.0.0.2: icmp_seq=3 ttl=64 time=0.062 ms

--- 10.0.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2041ms
rtt min/avg/max/mdev = 0.062/0.282/0.699/0.295 ms
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.756 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.054 ms
64 bytes from 10.0.0.2: icmp_seq=3 ttl=64 time=0.063 ms

--- 10.0.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2028ms
rtt min/avg/max/mdev = 0.054/0.291/0.756/0.328 ms
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.809 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.057 ms
64 bytes from 10.0.0.2: icmp_seq=3 ttl=64 time=0.089 ms

--- 10.0.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2055ms
rtt min/avg/max/mdev = 0.057/0.318/0.809/0.347 ms
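
When many hosts are pinged at once, it can be easier to reduce each output to its packet loss and average round-trip time. The sketch below assumes the Linux `ping` summary format shown in the result above; `parse_ping` is a hypothetical helper, not a KlonetAI tool.

```python
import re


def parse_ping(output):
    """Extract packet loss (%) and average RTT (ms) from Linux ping output."""
    loss = re.search(r"(\d+(?:\.\d+)?)% packet loss", output)
    rtt = re.search(
        r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms", output
    )
    if loss is None or rtt is None:
        raise ValueError("unrecognized ping output")
    return {"loss_percent": float(loss.group(1)), "avg_rtt_ms": float(rtt.group(2))}


# Summary lines copied from the first ping result above.
sample = """--- 10.0.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2055ms
rtt min/avg/max/mdev = 0.064/0.305/0.780/0.336 ms"""

stats = parse_ping(sample)
print(stats)  # {'loss_percent': 0.0, 'avg_rtt_ms': 0.305}
```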

Link Configuration

>>> kai.chat("Set the bandwidth and the delay between h1 and s1 to 5mbps and 10us, respectively.")
Outputs:

==Explanation from the agent==
To set the bandwidth and delay between `h1` and `s1`, you can use the `klonet_configure_link` tool. Here's the code:


==Code generated by the agent==
klonet_configure_link(link_name="l1", node_name="h1", bandwidth=5000, delay=10)


==Result==
{'code': 1, 'msg': '链路TC配置成功'}
Link l1 (on the h1 side) is configured with: {'link': 'l1', 'ne': 'h1', 'bw_kbps': 5000, 'delay_us': 10}
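
In the transcript above, the agent translated "5mbps" into `bandwidth=5000` and "10us" into `delay=10`, and the result reports the values as `bw_kbps` and `delay_us`. The conversion can be sketched as follows; `to_kbps` and `to_us` are hypothetical helpers, and the set of accepted unit suffixes is an assumption made for illustration.

```python
import re

# Conversion factors to the units reported in the result (kbps and microseconds).
_BW_TO_KBPS = {"kbps": 1, "mbps": 1000, "gbps": 1000000}
_DELAY_TO_US = {"us": 1, "ms": 1000, "s": 1000000}


def _convert(text, factors, what):
    """Parse '<number><unit>' and scale it with the given factor table."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*", text)
    if match is None or match.group(2).lower() not in factors:
        raise ValueError(f"cannot parse {what}: {text!r}")
    return round(float(match.group(1)) * factors[match.group(2).lower()])


def to_kbps(text):
    return _convert(text, _BW_TO_KBPS, "bandwidth")


def to_us(text):
    return _convert(text, _DELAY_TO_US, "delay")


print(to_kbps("5mbps"), to_us("10us"))  # 5000 10
```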
>>> kai.chat("Start iperf server on h2 and measure the bandwidth between h1 and h2.")
Outputs:

==Explanation from the agent==
To start the iperf server on `h2` and measure the bandwidth between `h1` and `h2`, you can use the following code:


==Code generated by the agent==
# Start the iperf server on h2
klonet_command_exec("h2", "iperf -s -D")

# Measure the bandwidth between h1 and h2
output = klonet_command_exec("h1", "iperf -c 10.0.0.3")
print(output)


==Result==
------------------------------------------------------------
Client connecting to 10.0.0.3, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 50366 connected with 10.0.0.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  6.75 MBytes  5.65 Mbits/sec
>>> kai.chat("Reset link l1.")
Outputs:

==Explanation from the agent==
To reset link `l1`, you can use the `klonet_reset_link` tool. Here's the code:


==Code generated by the agent==
klonet_reset_link("l1")


==Result==
Link l1 has been reset.
>>> kai.chat("Start iperf server on h2 and measure the bandwidth between h1 and h2.")
Outputs:

==Explanation from the agent==
To start the iperf server on `h2` and measure the bandwidth between `h1` and `h2`, you can use the following code:


==Code generated by the agent==
# Start the iperf server on h2
klonet_command_exec("h2", "iperf -s -D")

# Measure the bandwidth between h1 and h2
output = klonet_command_exec("h1", "iperf -c 10.0.0.3")
print(output)


==Result==
------------------------------------------------------------
Client connecting to 10.0.0.3, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 49338 connected with 10.0.0.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  31.4 GBytes  26.9 Gbits/sec
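
The two iperf runs show the effect of the link configuration: 5.65 Mbits/sec with the 5 Mbps limit in place and 26.9 Gbits/sec after the reset. To compare such runs in code, the reported bandwidth can be extracted from the text. This is a minimal sketch assuming the iperf report format shown above; `iperf_bits_per_sec` is a hypothetical helper.

```python
import re

_UNIT = {"K": 1e3, "M": 1e6, "G": 1e9}


def iperf_bits_per_sec(output):
    """Return the bandwidth from the last report line of iperf output, in bits/sec."""
    matches = re.findall(r"([\d.]+)\s+([KMG])bits/sec", output)
    if not matches:
        raise ValueError("no bandwidth report found")
    value, unit = matches[-1]
    return float(value) * _UNIT[unit]


# Report lines copied from the two results above.
limited = "[  3]  0.0-10.0 sec  6.75 MBytes  5.65 Mbits/sec"
unlimited = "[  3]  0.0-10.0 sec  31.4 GBytes  26.9 Gbits/sec"

before = iperf_bits_per_sec(limited)
after = iperf_bits_per_sec(unlimited)
print(f"limited: {before / 1e6:.2f} Mbit/s, reset: {after / 1e9:.1f} Gbit/s")
```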

SSH Service

>>> kai.chat("Start ssh service on all host nodes.")
Outputs:

==Explanation from the agent==
To start the SSH service on all host nodes, you can use the following code:


==Code generated by the agent==
klonet_enable_ssh_service('h1')
klonet_enable_ssh_service('h2')
klonet_enable_ssh_service('h3')
klonet_enable_ssh_service('h4')
klonet_enable_ssh_service('h5')


==Result==
Starting SSH service is time-consuming, please be patient...
SSH service on h1 started success.
Starting SSH service is time-consuming, please be patient...
SSH service on h2 started success.
Starting SSH service is time-consuming, please be patient...
SSH service on h3 started success.
Starting SSH service is time-consuming, please be patient...
SSH service on h4 started success.
Starting SSH service is time-consuming, please be patient...
SSH service on h5 started success.

Now, we can enter these host nodes through SSH.

Port Mapping

>>> kai.chat("Export port 5678 of all host nodes to the host machine, start by port 9200.")
Outputs:

==Explanation from the agent==
To export port 5678 of the host nodes to the host machine, starting from port 9200, you can use the following code:


==Code generated by the agent==
klonet_port_mapping('h1', 5678, 9200)
klonet_port_mapping('h2', 5678, 9201)
klonet_port_mapping('h3', 5678, 9202)
klonet_port_mapping('h4', 5678, 9203)
klonet_port_mapping('h5', 5678, 9204)


==Result==
Port mapping on h1 success.
Port mapping on h2 success.
Port mapping on h3 success.
Port mapping on h4 success.
Port mapping on h5 success.

Now, let us check the port map of h1:

>>> kai.chat("Show the port map of h1.")
Outputs:

==Explanation from the agent==
To show the port map of host node h1, you can use the following code:


==Code generated by the agent==
klonet_get_port_mapping("h1")


==Result==
Port map: {'5678': [9200]}
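
The agent assigned consecutive host-machine ports starting at 9200, one per node. That assignment can be sketched as below; `plan_port_mapping` is a hypothetical helper that only prints calls in the form used in the transcript, and the range check is an added safeguard rather than documented Klonet behavior.

```python
def plan_port_mapping(nodes, node_port, first_host_port):
    """Assign consecutive host-machine ports, one per node, starting at first_host_port."""
    last_host_port = first_host_port + len(nodes) - 1
    if first_host_port < 1 or last_host_port > 65535:
        raise ValueError("host port range falls outside 1-65535")
    return [(node, node_port, first_host_port + i) for i, node in enumerate(nodes)]


plan = plan_port_mapping(["h1", "h2", "h3", "h4", "h5"], 5678, 9200)
for node, node_port, host_port in plan:
    print(f"klonet_port_mapping('{node}', {node_port}, {host_port})")
```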

Query the IP address

>>> kai.chat("What is the IP of h1?")
Outputs:

==Explanation from the agent==
To retrieve the IP address of `h1`, you can use the `klonet_get_ip` tool. Here's the code:


==Code generated by the agent==
ip_address = klonet_get_ip('h1')
print(f"The IP address of h1 is {ip_address}.")


==Result==
The IP address of h1 is 10.0.0.2.

Query the host machine IPs of given nodes

>>> kai.chat("Which host machine is h1 deployed on?")
Outputs:

==Explanation from the agent==
To find out which host machine host node h1 is deployed on, you can use the following code:


==Code generated by the agent==
klonet_get_worker_ip("h1")


==Result==
Worker IP: {'h1': '192.168.1.16'}
>>> kai.chat("Where are the nodes deployed on?")
Outputs:

==Explanation from the agent==
To find out where the nodes are deployed on, you can use the following code:


==Code generated by the agent==
klonet_get_worker_ip()


==Result==
Worker IP: {'h1': '192.168.1.16', 'h2': '192.168.1.16', 'h3': '192.168.1.16', 'h4': '192.168.1.16', 'h5': '192.168.1.16', 's1': '192.168.1.16'}
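
For larger deployments spread across several host machines, it can help to invert this mapping and list the nodes per worker. A minimal sketch, using the dictionary printed in the result above; `nodes_by_worker` is a hypothetical helper.

```python
from collections import defaultdict


def nodes_by_worker(worker_ip):
    """Invert a {node: worker_ip} mapping into {worker_ip: sorted node names}."""
    grouped = defaultdict(list)
    for node, ip in worker_ip.items():
        grouped[ip].append(node)
    return {ip: sorted(names) for ip, names in grouped.items()}


# Mapping copied from the result above.
worker_ip = {
    "h1": "192.168.1.16", "h2": "192.168.1.16", "h3": "192.168.1.16",
    "h4": "192.168.1.16", "h5": "192.168.1.16", "s1": "192.168.1.16",
}
placement = nodes_by_worker(worker_ip)
print(placement)
```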

Traffic injection

>>> kai.chat("Use iperf on all hosts to inject traffic to h1. Launch them asynchronously by running them at the background.")
Outputs:

==Explanation from the agent==
To use iperf on all hosts to inject traffic to h1 and launch them asynchronously by running them in the background, you can use the following code:


==Code generated by the agent==
# Start the iperf server on h1
klonet_command_exec("h1", "iperf -s -D")

# Launch the iperf client on each host to inject traffic to h1 in the background
klonet_command_exec("h2", "iperf -c 10.0.0.2 -t 10 &")
klonet_command_exec("h3", "iperf -c 10.0.0.2 -t 10 &")
klonet_command_exec("h4", "iperf -c 10.0.0.2 -t 10 &")
klonet_command_exec("h5", "iperf -c 10.0.0.2 -t 10 &")


==Result==
iperf: ignoring extra argument -- &
------------------------------------------------------------
Client connecting to 10.0.0.2, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.6 port 50498 connected with 10.0.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  31.4 GBytes  26.9 Gbits/sec
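
The first line of the result, `iperf: ignoring extra argument -- &`, suggests that in this run the trailing `&` reached iperf as a literal argument instead of being interpreted by a shell, so the client ran in the foreground. One possible workaround is to wrap the command in an explicit shell invocation. The sketch below only builds the command string; `background` is a hypothetical helper, it assumes the node image provides `sh` and `nohup`, and whether it works depends on how `klonet_command_exec` tokenizes the command, which this document does not specify.

```python
import shlex


def background(command, log_path="/dev/null"):
    """Wrap a command so that a shell on the node runs it detached in the background."""
    inner = f"nohup {command} > {shlex.quote(log_path)} 2>&1 &"
    # Quote the whole inner command so it reaches `sh -c` as a single argument.
    return f"sh -c {shlex.quote(inner)}"


wrapped = background("iperf -c 10.0.0.2 -t 10")
print(wrapped)  # sh -c 'nohup iperf -c 10.0.0.2 -t 10 > /dev/null 2>&1 &'
```

The wrapped string would then be passed as the command, for example for `h2` through `h5`, in place of the bare `iperf ... &` form.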

File Upload

>>> kai.chat("Upload /Users/lizh/Downloads/config.txt to h1.")
Outputs:

==Explanation from the agent==
To upload the file "config.txt" to host node h1, I will use the `klonet_file_upload` tool. Here's the code:


==Code generated by the agent==
klonet_file_upload("/Users/lizh/Downloads/config.txt", "h1", "/home")


==Result==
h1's config.txt upload success!

Delete this network

>>> kai.run("Delete project.")
Outputs:

==Explanation from the agent==
I will use the following tool: `klonet_destroy_project` to delete the project.


==Code generated by the agent==
klonet_destroy_project()


==Result==
Destruction progress: 0 %
...
Destruction progress: 100 %
This project has been deleted.

KAI Dashboard

For ease of use, we provide a web UI called the KAI Dashboard. Run `python chatbox.py` to start it.

Let's explore the KAI Dashboard through a practical use case that launches MXNet distributed training.

First, we create a star network with 6 nodes and interconnect them with an ovs switch.

You: Create a star network with 6 nodes from the image mxnet_latest, and interconnect them with an ovs switch.
KAI: 

==Explanation from the agent==
To create a star network with 6 nodes from the image mxnet_latest and interconnect them with an OVS switch, I will use the following code:

==Code generated by the agent==

# Add the OVS switch
klonet_add_node("s1", "ovs", x=350, y=350)

# Add the host nodes
klonet_add_node("h1", "mxnet_latest", x=200, y=200)
klonet_add_node("h2", "mxnet_latest", x=500, y=200)
klonet_add_node("h3", "mxnet_latest", x=200, y=500)
klonet_add_node("h4", "mxnet_latest", x=500, y=500)
klonet_add_node("h5", "mxnet_latest", x=350, y=50)
klonet_add_node("h6", "mxnet_latest", x=350, y=650)

# Add links between the host nodes and the switch
klonet_add_link("h1", "s1", "l1", "10.0.0.2/24")
klonet_add_link("h2", "s1", "l2", "10.0.0.3/24")
klonet_add_link("h3", "s1", "l3", "10.0.0.4/24")
klonet_add_link("h4", "s1", "l4", "10.0.0.5/24")
klonet_add_link("h5", "s1", "l5", "10.0.0.6/24")
klonet_add_link("h6", "s1", "l6", "10.0.0.7/24")

# Deploy the network
klonet_deploy_network()

==Result==
A new node (name: s1, image: switch/ovs, resource limit: {'cpu': None, 'mem': None}) have been added to the network.
A new node (name: h1, image: 192.168.1.16:5024/wudx/mxnet:latest, resource limit: {'cpu': None, 'mem': None}) have been added to the network.
A new node (name: h2, image: 192.168.1.16:5024/wudx/mxnet:latest, resource limit: {'cpu': None, 'mem': None}) have been added to the network.
A new node (name: h3, image: 192.168.1.16:5024/wudx/mxnet:latest, resource limit: {'cpu': None, 'mem': None}) have been added to the network.
A new node (name: h4, image: 192.168.1.16:5024/wudx/mxnet:latest, resource limit: {'cpu': None, 'mem': None}) have been added to the network.
A new node (name: h5, image: 192.168.1.16:5024/wudx/mxnet:latest, resource limit: {'cpu': None, 'mem': None}) have been added to the network.
A new node (name: h6, image: 192.168.1.16:5024/wudx/mxnet:latest, resource limit: {'cpu': None, 'mem': None}) have been added to the network.
A link with name (l1) was added between nodes h1 (IP: 10.0.0.2/24) and s1
A link with name (l2) was added between nodes h2 (IP: 10.0.0.3/24) and s1
A link with name (l3) was added between nodes h3 (IP: 10.0.0.4/24) and s1
A link with name (l4) was added between nodes h4 (IP: 10.0.0.5/24) and s1
A link with name (l5) was added between nodes h5 (IP: 10.0.0.6/24) and s1
A link with name (l6) was added between nodes h6 (IP: 10.0.0.7/24) and s1
Deployment progress: 0 %
...
Deployment progress: 100 %
Deploy project klonetai success.

Topo for MXNET

Then, we launch MXNet distributed training jobs on these nodes.

You: Launch 1 mxnet scheduler, 1 mxnet server, and 4 mxnet workers on these nodes. The scheduler should be launched on the node with IP 10.0.0.2. The executable script is at /root/cnn.py. Run them as daemon processes and redirect the outputs and error info to /root/output.log.
KAI:

==Explanation from the agent==
To launch 1 MXNet scheduler, 1 MXNet server, and 4 MXNet workers on the nodes, you can use the following code:

==Code generated by the agent==

# Start the MXNet scheduler
klonet_command_exec("h1", "DMLC_ROLE=scheduler DMLC_PS_ROOT_URI=10.0.0.2 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=1 DMLC_NUM_WORKER=4 python /root/cnn.py > /root/output.log 2>&1 &")

# Start the MXNet server
klonet_command_exec("h2", "DMLC_ROLE=server DMLC_PS_ROOT_URI=10.0.0.2 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=1 DMLC_NUM_WORKER=4 python /root/cnn.py > /root/output.log 2>&1 &")

# Start the MXNet workers
klonet_command_exec("h3", "DMLC_ROLE=worker DMLC_PS_ROOT_URI=10.0.0.2 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=1 DMLC_NUM_WORKER=4 python /root/cnn.py > /root/output.log 2>&1 &")
klonet_command_exec("h4", "DMLC_ROLE=worker DMLC_PS_ROOT_URI=10.0.0.2 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=1 DMLC_NUM_WORKER=4 python /root/cnn.py > /root/output.log 2>&1 &")
klonet_command_exec("h5", "DMLC_ROLE=worker DMLC_PS_ROOT_URI=10.0.0.2 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=1 DMLC_NUM_WORKER=4 python /root/cnn.py > /root/output.log 2>&1 &")
klonet_command_exec("h6", "DMLC_ROLE=worker DMLC_PS_ROOT_URI=10.0.0.2 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=1 DMLC_NUM_WORKER=4 python /root/cnn.py > /root/output.log 2>&1 &")

==Result==

Finally, let's see what happens. In the tool card named "Command Execution", we enter the command cat ~/output.log, enter the node h3, and click "Run".

You: Run commands cat ~/output.log on nodes h3.
KAI: 

h3$ cat ~/output.log
Start training, my rank is 3.
[Time 3.529][Epoch 0][Iteration 1] Test Acc 0.1010
[Time 6.602][Epoch 0][Iteration 2] Test Acc 0.2156
[Time 9.709][Epoch 0][Iteration 3] Test Acc 0.1214
[Time 12.774][Epoch 0][Iteration 4] Test Acc 0.1980
[Time 15.984][Epoch 0][Iteration 5] Test Acc 0.1743
...
[Time 315.532][Epoch 0][Iteration 100] Test Acc 0.9017

Now we terminate the MXNet jobs in a similar way, but this time we check the hosts item in the selector so that the command runs on all host nodes.

You: pkill python
KAI:

h1$ pkill python
...
h6$ pkill python
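
The launch commands in this use case differ only in `DMLC_ROLE`; the other `DMLC_*` variables, the script, and the log redirection are identical on every node. The sketch below generates such a plan. `mxnet_launch_plan` is a hypothetical helper, its defaults are taken from the commands shown above, and it returns `(node, command)` pairs without executing anything.

```python
def mxnet_launch_plan(hosts, scheduler_ip, num_servers, num_workers,
                      script="/root/cnn.py", log="/root/output.log", port=9092):
    """Pair each host with a launch command: 1 scheduler, then servers, then workers."""
    roles = ["scheduler"] + ["server"] * num_servers + ["worker"] * num_workers
    if len(roles) != len(hosts):
        raise ValueError(f"need {len(roles)} hosts, got {len(hosts)}")
    plan = []
    for host, role in zip(hosts, roles):
        env = (
            f"DMLC_ROLE={role} DMLC_PS_ROOT_URI={scheduler_ip} "
            f"DMLC_PS_ROOT_PORT={port} DMLC_NUM_SERVER={num_servers} "
            f"DMLC_NUM_WORKER={num_workers}"
        )
        # Redirect stdout and stderr to the log and run in the background.
        plan.append((host, f"{env} python {script} > {log} 2>&1 &"))
    return plan


plan = mxnet_launch_plan(["h1", "h2", "h3", "h4", "h5", "h6"], "10.0.0.2", 1, 4)
for host, command in plan:
    print(f"{host}: {command}")
```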

Read Online Tutorials

>>> doc = kai.run("Summarize how to manually launch MXNET jobs from "
"https://mxnet.apache.org/versions/1.9.1/api/faq/distributed_training")
print(len(doc))
==Explanation from the agent==
I will use the following tool: `summarizer` to summarize the tutorial text from the given URL.


==Code generated by the agent==
url = "https://mxnet.apache.org/versions/1.9.1/api/faq/distributed_training"
tutorial = text_downloader(url)
summary = summarizer(context=tutorial, query="How to manually launch MXNET jobs?")
print(f"The summary is: {summary}")


==Result==
The summary is: To manually launch MXNet jobs, you can set the required environment variables and execute the training command on each machine. Here is an example for launching jobs locally on Linux/Mac machines:


shell
export COMMAND='python example/gluon/image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync'
DMLC_ROLE=server DMLC_PS_ROOT_URI=127.0.0.1 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=2 $COMMAND &
DMLC_ROLE=server DMLC_PS_ROOT_URI=127.0.0.1 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=2 $COMMAND &
DMLC_ROLE=scheduler DMLC_PS_ROOT_URI=127.0.0.1 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=2 $COMMAND &
DMLC_ROLE=worker DMLC_PS_ROOT_URI=127.0.0.1 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=2 $COMMAND &
DMLC_ROLE=worker DMLC_PS_ROOT_URI=127.0.0.1 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=2 $COMMAND



Make sure to replace `$COMMAND` with the appropriate command for your training job. This example assumes that you have the `image_classification.py` script accessible on all machines in the cluster.

Note that starting all jobs on the same machine is not recommended for production environments.
1199
>>> kai.run(f'''
=== Tutorials ===
{doc}

=== Task to be executed ===
Create a star network with 6 host nodes and 1 ovs node, then run
a MXNET scheduler, two MXNET server, and three MXNET workers on 
the host nodes. The scheduler is launched on host node with IP 192.168.1.22. 
Do not use ListComp and BinOp, write the codes line by line instead.
''')
==Explanation from the agent==
I will use the following tools: `klonet_add_node` to add the host and ovs nodes, `klonet_add_link` to create the network links, and `klonet_command_exec` to run the MXNET scheduler, servers, and workers on the host nodes.


==Code generated by the agent==
# Add the host nodes
h1 = klonet_add_node(name="h1", image="ubuntu")
h2 = klonet_add_node(name="h2", image="ubuntu")
h3 = klonet_add_node(name="h3", image="ubuntu")
h4 = klonet_add_node(name="h4", image="ubuntu")
h5 = klonet_add_node(name="h5", image="ubuntu")
h6 = klonet_add_node(name="h6", image="ubuntu")

# Add the ovs node
s1 = klonet_add_node(name="s1", image="ovs")

# Create network links
klonet_add_link(src_node="h1", dst_node="s1")
klonet_add_link(src_node="h2", dst_node="s1")
klonet_add_link(src_node="h3", dst_node="s1")
klonet_add_link(src_node="h4", dst_node="s1")
klonet_add_link(src_node="h5", dst_node="s1")
klonet_add_link(src_node="h6", dst_node="s1")

# Run MXNET scheduler, servers, and workers on the host nodes
klonet_command_exec(node_name="h1", command="DMLC_ROLE=scheduler DMLC_PS_ROOT_URI=192.168.1.22 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=3 python example/gluon/image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync")
klonet_command_exec(node_name="h2", command="DMLC_ROLE=server DMLC_PS_ROOT_URI=192.168.1.22 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=3 python example/gluon/image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync")
klonet_command_exec(node_name="h3", command="DMLC_ROLE=server DMLC_PS_ROOT_URI=192.168.1.22 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=3 python example/gluon/image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync")
klonet_command_exec(node_name="h4", command="DMLC_ROLE=worker DMLC_PS_ROOT_URI=192.168.1.22 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=3 python example/gluon/image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync")
klonet_command_exec(node_name="h5", command="DMLC_ROLE=worker DMLC_PS_ROOT_URI=192.168.1.22 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=3 python example/gluon/image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync")
klonet_command_exec(node_name="h6", command="DMLC_ROLE=worker DMLC_PS_ROOT_URI=192.168.1.22 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=3 python example/gluon/image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync")


==Result==
Calling Klonet Add Node API ...
A new node (name: h1, image: ubuntu, cpu_limit: None, mem_limit: None) have been added to the network.
Calling Klonet Add Node API ...
A new node (name: h2, image: ubuntu, cpu_limit: None, mem_limit: None) have been added to the network.
Calling Klonet Add Node API ...
A new node (name: h3, image: ubuntu, cpu_limit: None, mem_limit: None) have been added to the network.
Calling Klonet Add Node API ...
A new node (name: h4, image: ubuntu, cpu_limit: None, mem_limit: None) have been added to the network.
Calling Klonet Add Node API ...
A new node (name: h5, image: ubuntu, cpu_limit: None, mem_limit: None) have been added to the network.
Calling Klonet Add Node API ...
A new node (name: h6, image: ubuntu, cpu_limit: None, mem_limit: None) have been added to the network.
Calling Klonet Add Node API ...
A new node (name: s1, image: ovs, cpu_limit: None, mem_limit: None) have been added to the network.
Calling Klonet Add Link API ...
A link with name (None) was added between nodes h1 (ip: ) and s1 (ip: )
Calling Klonet Add Link API ...
A link with name (None) was added between nodes h2 (ip: ) and s1 (ip: )
Calling Klonet Add Link API ...
A link with name (None) was added between nodes h3 (ip: ) and s1 (ip: )
Calling Klonet Add Link API ...
A link with name (None) was added between nodes h4 (ip: ) and s1 (ip: )
Calling Klonet Add Link API ...
A link with name (None) was added between nodes h5 (ip: ) and s1 (ip: )
Calling Klonet Add Link API ...
A link with name (None) was added between nodes h6 (ip: ) and s1 (ip: )
Calling Klonet Exec Command API ...
Feedback from h1 after calling DMLC_ROLE=scheduler DMLC_PS_ROOT_URI=192.168.1.22 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=3 python example/gluon/image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync.
Calling Klonet Exec Command API ...
Feedback from h2 after calling DMLC_ROLE=server DMLC_PS_ROOT_URI=192.168.1.22 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=3 python example/gluon/image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync.
Calling Klonet Exec Command API ...
Feedback from h3 after calling DMLC_ROLE=server DMLC_PS_ROOT_URI=192.168.1.22 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=3 python example/gluon/image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync.
Calling Klonet Exec Command API ...
Feedback from h4 after calling DMLC_ROLE=worker DMLC_PS_ROOT_URI=192.168.1.22 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=3 python example/gluon/image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync.
Calling Klonet Exec Command API ...
Feedback from h5 after calling DMLC_ROLE=worker DMLC_PS_ROOT_URI=192.168.1.22 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=3 python example/gluon/image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync.
Calling Klonet Exec Command API ...
Feedback from h6 after calling DMLC_ROLE=worker DMLC_PS_ROOT_URI=192.168.1.22 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=3 python example/gluon/image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync.
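
Because the agent writes each launch command separately, it is worth checking that the declared `DMLC_NUM_SERVER` and `DMLC_NUM_WORKER` values agree with the number of commands per role; in the code above they do (one scheduler, two servers, three workers). A minimal sketch of such a check follows. `dmlc_roles_consistent` is a hypothetical helper, and the sample commands are shortened stand-ins (`train.py` is a placeholder script name) rather than the full commands from the transcript.

```python
import re
from collections import Counter


def dmlc_roles_consistent(commands):
    """Check that per-role command counts match the declared DMLC_NUM_* values."""
    roles = Counter()
    declared = set()
    for command in commands:
        env = dict(re.findall(r"\b(DMLC_[A-Z_]+)=(\S+)", command))
        roles[env["DMLC_ROLE"]] += 1
        declared.add((int(env["DMLC_NUM_SERVER"]), int(env["DMLC_NUM_WORKER"])))
    if len(declared) != 1:
        # Commands disagree on the cluster size (or there are no commands).
        return False
    servers, workers = next(iter(declared))
    return (roles["scheduler"] == 1
            and roles["server"] == servers
            and roles["worker"] == workers)


def make(role):
    # Shortened stand-in for the commands above; train.py is a placeholder.
    return (f"DMLC_ROLE={role} DMLC_PS_ROOT_URI=192.168.1.22 DMLC_PS_ROOT_PORT=9092 "
            "DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=3 python train.py")


commands = [make("scheduler")] + [make("server")] * 2 + [make("worker")] * 3
print(dmlc_roles_consistent(commands))       # True
print(dmlc_roles_consistent(commands[:-1]))  # False: one worker is missing
```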

This tool is currently under development. Please stay tuned for updates!
