AutoGLM Open-Source Tutorial: How to Build a Free AI-Powered Mobile Agent

Zhipu AI has recently open-sourced AutoGLM, its mobile AI agent framework. It lets developers build systems that visually understand smartphone screens and carry out real interactions on Android devices from natural language commands.

Unlike traditional automation tools that rely on fixed scripts, AutoGLM lets an AI model observe the screen, reason about UI elements, plan actions, and operate real mobile applications dynamically, all through a fully open-source workflow.

This guide explains how to deploy and use AutoGLM entirely for free, without requiring any commercial API subscriptions.

Project Repository:
https://github.com/zai-org/Open-AutoGLM

1. What Makes AutoGLM Different?

AutoGLM is not a standard chatbot or macro tool. It combines:

Multimodal large language models
Visual UI understanding
Real-time Android control
Autonomous task planning

With AutoGLM, an AI agent can:

Recognize buttons, text, and layout on a phone screen
Decide what step to take next
Perform real taps, scrolls, and text input
Complete full workflows across mobile apps

This transforms mobile devices into AI-operable systems, rather than manually driven tools.
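The observe, decide, and act cycle described above can be sketched in a few lines. This is only a conceptual illustration, not AutoGLM's actual implementation: the Action class, the run_agent_loop function, and the three callables are hypothetical names, and stand-in functions replace the real phone and model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step chosen by the model (hypothetical shape, for illustration only)."""
    kind: str            # e.g. "tap", "type", or "finish"
    argument: str = ""   # e.g. screen coordinates or the text to enter


def run_agent_loop(
    task: str,
    capture_screen: Callable[[], bytes],
    choose_action: Callable[[str, bytes, List[Action]], Action],
    perform: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat observe -> decide -> act until the model signals completion."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screen()                       # observe the screen
        action = choose_action(task, screenshot, history)   # reason and plan
        history.append(action)
        if action.kind == "finish":
            break
        perform(action)                                     # act on the device
    return history


# Demonstration with stand-ins instead of a real phone and a real model.
scripted = iter([Action("tap", "540,1200"), Action("type", "coffee"), Action("finish")])
performed: List[Action] = []
trace = run_agent_loop(
    task="search for coffee",
    capture_screen=lambda: b"fake-png-bytes",
    choose_action=lambda task, shot, hist: next(scripted),
    perform=performed.append,
)
print([action.kind for action in trace])
```

The max_steps cap is a design choice in this sketch: it keeps a confused model from looping forever on a real device.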

2. System Requirements

To run AutoGLM locally, you need:

A computer running Windows, macOS, or Linux
An Android phone (Android 7.0 or later)
Python 3.10 or higher
ADB (Android Debug Bridge)
A stable network connection

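The agent code itself does not need a GPU, because it only talks to ADB and to a model endpoint. A GPU matters for serving the 9B model in section 8; without one, inference is much slower.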

3. Preparing the Python Environment

First, verify that Python is installed and reports version 3.10 or higher (on some systems the command is python3 rather than python):

python --version

If not installed, download it from:

https://www.python.org
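The same check can be done from inside Python, which is handy at the top of a setup script. A minimal sketch, assuming the 3.10 minimum stated in the requirements above; meets_minimum and MINIMUM_PYTHON are illustrative names, not part of AutoGLM.

```python
import sys
from typing import Sequence

MINIMUM_PYTHON = (3, 10)  # taken from the requirements list in this guide


def meets_minimum(version: Sequence[int], minimum: Sequence[int] = MINIMUM_PYTHON) -> bool:
    """Return True when the (major, minor, ...) version is at least the minimum."""
    return tuple(version[:2]) >= tuple(minimum)


current = sys.version_info
status = "OK" if meets_minimum(current) else "too old"
print(f"Python {current.major}.{current.minor}: {status}")
```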

4. Installing Android Debug Bridge (ADB)

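ADB is the command-line tool that lets your computer send commands to an Android device. AutoGLM uses it to capture the screen and to inject taps, swipes, and text.

Download Android Platform Tools, extract the archive, and add the platform-tools folder to your PATH: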

https://developer.android.com/tools/releases/platform-tools

Verify installation:

adb version
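A setup script can run the same check programmatically. The sketch below assumes the first line of the output has the usual "Android Debug Bridge version X.Y.Z" form; parse_adb_version and installed_adb_version are illustrative helper names.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_adb_version(output: str) -> Optional[str]:
    """Extract the version number from the output of `adb version`."""
    match = re.search(r"Android Debug Bridge version (\d+(?:\.\d+)*)", output)
    return match.group(1) if match else None


def installed_adb_version() -> Optional[str]:
    """Return the adb version string, or None if adb is not on PATH."""
    if shutil.which("adb") is None:
        return None
    result = subprocess.run(
        ["adb", "version"], capture_output=True, text=True, check=False
    )
    return parse_adb_version(result.stdout)


print(installed_adb_version() or "adb not found on PATH")
```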

5. Enabling USB Debugging on Android

Follow these steps on your phone:

Open Settings
Navigate to About Phone
Tap Build Number repeatedly (typically about seven times) until Developer Mode activates
Enable USB Debugging inside Developer Options

Connect your phone over USB, accept the "Allow USB debugging" prompt on the device, and confirm that it is listed with the state "device":

adb devices
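When several phones are attached, it helps to parse this output rather than read it by eye. A minimal sketch, assuming the usual layout of one serial and one state per line; the serial numbers in the sample are made up.

```python
from typing import Dict


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Map each device serial to its state from the output of `adb devices`."""
    devices: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and "* daemon ..." status messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


sample_output = """List of devices attached
emulator-5554\tdevice
R58M12ABCDE\tunauthorized
"""
found = parse_adb_devices(sample_output)
ready = [serial for serial, state in found.items() if state == "device"]
print(found)
print("ready:", ready)
```

A state of "unauthorized" usually means the USB debugging prompt on the phone has not been accepted yet.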

6. Installing ADB Keyboard for Automated Input

AutoGLM types text through ADB Keyboard, a virtual input method that accepts text sent over ADB.

Steps:

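Download the ADB Keyboard APK (the AutoGLM GitHub README links to it)
Install it on your device, for example with adb install followed by the path to the APK
Enable it as an input method in the phone's language and input settings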

This allows the AI agent to enter text across different applications.
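The keyboard can also be enabled and checked over ADB instead of through the Settings app. The sketch below only builds the commands. The component name com.android.adbkeyboard/.AdbIME is an assumption taken from the ADB Keyboard project's documentation, and the helper names are illustrative.

```python
from typing import List, Optional

# Assumed component name, taken from the ADB Keyboard project's documentation.
ADB_KEYBOARD_IME = "com.android.adbkeyboard/.AdbIME"


def adb_args(args: List[str], serial: Optional[str] = None) -> List[str]:
    """Prefix arguments with `adb` and, optionally, `-s <serial>` to pick a device."""
    target = ["-s", serial] if serial else []
    return ["adb", *target, *args]


def enable_keyboard_command(serial: Optional[str] = None) -> List[str]:
    """Command that enables ADB Keyboard as an input method."""
    return adb_args(["shell", "ime", "enable", ADB_KEYBOARD_IME], serial)


def list_enabled_command(serial: Optional[str] = None) -> List[str]:
    """Command that prints the ids of the enabled input methods, one per line."""
    return adb_args(["shell", "ime", "list", "-s"], serial)


def keyboard_enabled(ime_list_output: str) -> bool:
    """Check the output of the list command for the ADB Keyboard id."""
    enabled = {line.strip() for line in ime_list_output.splitlines()}
    return ADB_KEYBOARD_IME in enabled


print(" ".join(enable_keyboard_command()))
print(" ".join(list_enabled_command()))
```

Each command list can be passed to subprocess.run(command, capture_output=True, text=True), and the captured stdout of the list command can be fed to keyboard_enabled.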

7. Downloading and Installing AutoGLM

Clone the repository:

git clone https://github.com/zai-org/Open-AutoGLM.git
cd Open-AutoGLM

Install dependencies:

pip install -r requirements.txt
pip install -e .
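A quick way to confirm that the editable install worked is to check that the phone_agent package (the one imported in the SDK example in section 9) can be found by the interpreter. A minimal sketch; is_importable is an illustrative helper name.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


print("phone_agent importable:", is_importable("phone_agent"))
```

If this prints False, the install probably went into a different Python environment than the one running the check.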

8. Running the AutoGLM Model Locally

AutoGLM currently provides two main models:

AutoGLM-Phone-9B (Chinese-tuned)
AutoGLM-Phone-9B-Multilingual (global language support)

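To serve the model locally, start vLLM's OpenAI-compatible server (vLLM may need to be installed separately, for example with pip install vllm). The value of --served-model-name is the name that clients pass later as --model: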

python3 -m vllm.entrypoints.openai.api_server \
  --served-model-name autoglm-phone \
  --model zai-org/AutoGLM-Phone-9B \
  --port 8000

Once the server is running, the local OpenAI-compatible endpoint is:

http://localhost:8000/v1
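Before starting the agent, it is worth confirming that the server answers and exposes the expected model name. OpenAI-compatible servers such as vLLM list their models at the /models path under the base URL. A minimal sketch using only the standard library; model_ids and fetch_model_ids are illustrative names.

```python
import json
import urllib.request
from typing import Any, Dict, List


def model_ids(models_response: Dict[str, Any]) -> List[str]:
    """Collect model ids from an OpenAI-style model listing."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_model_ids(base_url: str, timeout: float = 5.0) -> List[str]:
    """Query `<base_url>/models` and return the served model names."""
    url = base_url.rstrip("/") + "/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return model_ids(json.load(response))


# Offline demonstration with a response shaped like the server's answer.
sample = {"object": "list", "data": [{"id": "autoglm-phone", "object": "model"}]}
print(model_ids(sample))
```

With the server from this section running, fetch_model_ids("http://localhost:8000/v1") should include "autoglm-phone", the name given to --served-model-name.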

9. Sending Commands to the AI Agent

Interactive Execution Mode

python main.py --base-url http://localhost:8000/v1 --model autoglm-phone

Then issue commands such as:

Open Google Maps and search nearby restaurants

Your Android device will perform the action automatically.

One-Shot Command Execution

python main.py --base-url http://localhost:8000/v1 --model autoglm-phone "Open YouTube and play trending videos"
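One-shot commands are easy to launch from another script. The sketch below builds the argument list using only the flags shown in this guide; one_shot_command is an illustrative helper, and it assumes the script runs from inside the Open-AutoGLM folder.

```python
import shlex
import sys
from typing import List


def one_shot_command(task: str, base_url: str, model: str) -> List[str]:
    """Build the argument list for a single-task run of main.py."""
    return [sys.executable, "main.py", "--base-url", base_url, "--model", model, task]


command = one_shot_command(
    task="Open YouTube and play trending videos",
    base_url="http://localhost:8000/v1",
    model="autoglm-phone",
)
print(shlex.join(command))
```

Passing the list to subprocess.run(command, check=True) avoids shell quoting problems when a task contains quotes or other special characters.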

Python SDK Integration Example

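from phone_agent import PhoneAgent
from phone_agent.model import ModelConfig

# Point the agent at the local vLLM server started in section 8.
config = ModelConfig(
    base_url="http://localhost:8000/v1",
    model_name="autoglm-phone",  # must match --served-model-name
)

agent = PhoneAgent(model_config=config)

# Run one natural-language task on the connected device and show the outcome.
result = agent.run("Open Gmail and refresh inbox")
print(result)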

This mode is ideal for building automation pipelines and experiments.
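For a pipeline, a small wrapper can run several tasks in sequence and record which ones failed. This is a sketch, not part of the AutoGLM SDK: run_tasks and SupportsRun are illustrative names, and a fake agent stands in for PhoneAgent so the example runs without a phone.

```python
from typing import Any, Dict, Iterable, List, Protocol


class SupportsRun(Protocol):
    """Anything with a run(task) method, such as the PhoneAgent shown above."""
    def run(self, task: str) -> Any: ...


def run_tasks(agent: SupportsRun, tasks: Iterable[str]) -> List[Dict[str, Any]]:
    """Run tasks one after another, recording the result or the error of each."""
    report: List[Dict[str, Any]] = []
    for task in tasks:
        try:
            report.append({"task": task, "ok": True, "result": agent.run(task)})
        except Exception as error:  # keep going so one failure does not stop the batch
            report.append({"task": task, "ok": False, "result": repr(error)})
    return report


class FakeAgent:
    """Stand-in for a real agent, used only for this demonstration."""
    def run(self, task: str) -> str:
        if "fail" in task:
            raise RuntimeError("simulated failure")
        return f"done: {task}"


report = run_tasks(FakeAgent(), ["Open Gmail and refresh inbox", "fail on purpose"])
for row in report:
    print(row)
```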

10. Wireless Control Over Local Network

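To operate the phone without USB, first switch ADB on the phone to TCP/IP mode while it is still connected by cable, then connect to the phone's IP address on the local network (192.168.1.10 is an example; use your device's address):

adb tcpip 5555
adb connect 192.168.1.10:5555

After connection, AutoGLM can operate the device over the network. This setup is useful for: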

Remote device labs
Distributed testing
Continuous mobile automation
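When connecting to many devices from a script, it helps to build the address and interpret the reply in one place. The sketch below assumes the common reply wording ("connected to ..." or "already connected to ..." on success); the exact messages can differ between adb versions, and both helper names are illustrative.

```python
def device_address(host: str, port: int = 5555) -> str:
    """Format the host:port string that `adb connect` expects."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid TCP port: {port}")
    return f"{host}:{port}"


def connect_succeeded(output: str) -> bool:
    """Interpret the reply of `adb connect` (assumed wording, see the note above)."""
    return "connected to" in output.lower()


address = device_address("192.168.1.10")
print(["adb", "connect", address])
print(connect_succeeded(f"connected to {address}"))
print(connect_succeeded(f"failed to connect to '{address}': Connection refused"))
```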

11. Supported Application Types

AutoGLM works with a growing list of mainstream applications, including:

Messaging platforms
E-commerce apps
Navigation and travel services
Short-video and streaming platforms

You can display all supported apps using:

python main.py --list-apps

12. Practical Use Cases

AutoGLM can be applied to:

Automated mobile testing
Cross-app workflow automation
Data collection and monitoring
Accessibility assistance
AI research on mobile interaction
Large-scale device orchestration

It enables real-world AI-driven mobile operation, not just simulated environments.

Frequently Asked Questions (FAQ)

Is AutoGLM free for personal and commercial use? The project is open source and costs nothing to download. Check the license terms in the repository and on the model pages for the exact conditions before using it commercially.

Does AutoGLM require cloud APIs? No. All inference can run locally. You may optionally connect external APIs if needed.

Can AutoGLM run on CPU only? The agent code needs no GPU, but serving the 9B model on a CPU is far slower than on a GPU, so expect long waits at each step.

Is iOS supported? No. AutoGLM relies on ADB and currently supports Android devices only.

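Does AutoGLM upload phone data automatically? No. The agent sends screenshots and task text only to the endpoint given by --base-url. With the local server from this guide, that data stays on your machine unless you point the agent at an external service.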