AutoGLM Open-Source Tutorial: How to Build a Free AI-Powered Mobile Agent
Zhipu AI has recently open-sourced its mobile AI Agent framework AutoGLM, enabling developers to build intelligent systems that can visually understand smartphone screens and execute real interactions on Android devices using natural language commands.
Unlike traditional automation tools that rely on fixed scripts, AutoGLM allows an AI model to observe the screen, reason about UI elements, plan actions, and operate real mobile applications dynamically, all through a fully open-source workflow.
This guide explains how to deploy and use AutoGLM entirely for free, without requiring any commercial API subscriptions.
Project Repository:
https://github.com/zai-org/Open-AutoGLM
1. What Makes AutoGLM Different?
AutoGLM is not a standard chatbot or macro tool. It combines:
- Multimodal large language models
- Visual UI understanding
- Real-time Android control
- Autonomous task planning
With AutoGLM, an AI agent can:
- Recognize buttons, text, and layout on a phone screen
- Decide what step to take next
- Perform real taps, scrolls, and text input
- Complete full workflows across mobile apps
This transforms mobile devices into AI-operable systems, rather than manually driven tools.
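The observe, decide, act cycle described above can be pictured as a small loop. The sketch below is purely illustrative: `Action`, `run_agent_loop`, and the three callables are hypothetical names rather than AutoGLM's actual internals, and the "screen" is faked as a plain string instead of a screenshot.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step chosen by the planner (hypothetical structure)."""
    kind: str           # e.g. "tap", "type", "done"
    argument: str = ""


def run_agent_loop(
    observe: Callable[[], str],
    plan: Callable[[str, str], Action],
    act: Callable[[Action], None],
    task: str,
    max_steps: int = 10,
) -> List[Action]:
    """Repeat observe -> plan -> act until the planner returns "done"."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = observe()            # 1. look at the current screen
        action = plan(task, screen)   # 2. decide the next step
        if action.kind == "done":
            break
        act(action)                   # 3. perform it on the device
        history.append(action)
    return history


# Toy stand-ins so the loop runs without a phone or a model.
state = {"screen": "home"}


def fake_observe() -> str:
    return state["screen"]


def fake_plan(task: str, screen: str) -> Action:
    # Pretend the task is finished once the "maps" screen is showing.
    if screen == "home":
        return Action("tap", "Maps icon")
    return Action("done")


def fake_act(action: Action) -> None:
    state["screen"] = "maps"


steps = run_agent_loop(fake_observe, fake_plan, fake_act, "Open Maps")
print(steps)
```

In a real deployment the observer would capture a screenshot, the planner would be the multimodal model, and the executor would issue device commands; the `max_steps` cap is a design choice here to keep a confused planner from looping forever.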
2. System Requirements
To run AutoGLM locally, you need:
- A computer running Windows, macOS, or Linux
- An Android phone (Android 7.0 or later)
- Python 3.10 or higher
- ADB (Android Debug Bridge)
- A stable network connection
A GPU improves inference speed but is not mandatory for basic testing.
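Two of the requirements above can be checked from a script before going further. This is a minimal sketch, not part of AutoGLM: `python_ok` and `preflight` are hypothetical helper names, and the check only confirms that an `adb` executable is on the PATH, not that a phone is connected.

```python
import shutil
import sys
from typing import Dict, Tuple

MIN_PYTHON: Tuple[int, int] = (3, 10)  # taken from the requirements list


def python_ok(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """True if the given (major, minor) version meets the minimum."""
    return tuple(version[:2]) >= MIN_PYTHON


def preflight() -> Dict[str, bool]:
    """Check the interpreter version and whether adb is on the PATH."""
    return {
        "python>=3.10": python_ok(),
        "adb on PATH": shutil.which("adb") is not None,
    }


for name, ok in preflight().items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```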
3. Preparing the Python Environment
First, verify that Python is installed:
python --version
If it is not installed, download it from the official Python website (python.org).
4. Installing Android Debug Bridge (ADB)
ADB allows direct interaction between your computer and mobile device.
Download Android Platform Tools from:
https://developer.android.com/tools/releases/platform-tools
Verify installation:
adb version
5. Enabling USB Debugging on Android
Follow these steps on your phone:
- Open Settings
- Navigate to About Phone
- Tap Build Number repeatedly until Developer Mode activates
- Enable USB Debugging inside Developer Options
Connect your phone and confirm recognition:
adb devices
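When scripting the setup, it can help to turn the `adb devices` output into a structure you can test against. The sketch below assumes the usual output layout (a "List of devices attached" header followed by one serial and state per line); `parse_adb_devices` is a hypothetical helper and the serial numbers in `SAMPLE` are made up.

```python
from typing import Dict, List


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Map each device serial to its state ("device", "unauthorized", ...)."""
    devices: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and "* daemon ..." status messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def ready_devices(output: str) -> List[str]:
    """Serials that are connected and authorized for debugging."""
    return [s for s, state in parse_adb_devices(output).items() if state == "device"]


# Made-up sample output for illustration only.
SAMPLE = (
    "List of devices attached\n"
    "R58M12345AB\tdevice\n"
    "emulator-5554\tunauthorized\n"
    "\n"
)
print(ready_devices(SAMPLE))
```

A state of `unauthorized` usually means the debugging prompt on the phone has not been accepted yet. To run this against a live setup, pass in the text captured from `subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout`.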
6. Installing ADB Keyboard for Automated Input
AutoGLM requires a special virtual keyboard for programmatic typing.
Steps:
- Download the ADB Keyboard APK from the AutoGLM GitHub repo
- Install it on your device
- Enable it as a system input method
This allows the AI agent to enter text across different applications.
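The last step, enabling the keyboard as an input method, can also be done from the computer with Android's `ime` shell command. The sketch below only builds the command lines and checks a captured listing. It assumes the IME identifier `com.android.adbkeyboard/.AdbIME`, which is the one used by the open-source ADBKeyBoard project and is not stated in this guide, so verify it against the output of `adb shell ime list -s` on your own device. The helper names are hypothetical.

```python
from typing import List, Optional

# Assumption: IME id used by the ADBKeyBoard project; confirm on your device.
ADB_KEYBOARD_IME = "com.android.adbkeyboard/.AdbIME"


def ime_setup_commands(
    ime_id: str = ADB_KEYBOARD_IME, serial: Optional[str] = None
) -> List[List[str]]:
    """Build the adb commands that enable and then select the keyboard."""
    base = ["adb"] + (["-s", serial] if serial else [])
    return [
        base + ["shell", "ime", "enable", ime_id],
        base + ["shell", "ime", "set", ime_id],
    ]


def ime_is_enabled(ime_list_output: str, ime_id: str = ADB_KEYBOARD_IME) -> bool:
    """Check the text printed by `adb shell ime list -s` for the keyboard."""
    return ime_id in {line.strip() for line in ime_list_output.splitlines()}


for cmd in ime_setup_commands():
    print(" ".join(cmd))
```

Each command list can be handed to `subprocess.run(cmd, check=True)` once a device is connected.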
7. Downloading and Installing AutoGLM
Clone the repository:
git clone https://github.com/zai-org/Open-AutoGLM.git
cd Open-AutoGLM
Install dependencies:
pip install -r requirements.txt
pip install -e .
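After the install commands finish, a quick way to confirm that the editable install worked is to ask Python whether it can locate the package. This sketch assumes the package is importable as `phone_agent`, the name used in the SDK example later in this guide; `is_importable` is a hypothetical helper.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """True if Python can locate a top-level module without importing it."""
    return importlib.util.find_spec(module_name) is not None


print("phone_agent importable:", is_importable("phone_agent"))
```

If this prints `False`, the most likely cause is running the check from a different Python environment than the one used for `pip install`.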
8. Running the AutoGLM Model Locally
AutoGLM currently provides two main models:
- AutoGLM-Phone-9B (Chinese-tuned)
- AutoGLM-Phone-9B-Multilingual (global language support)
To serve the model locally using vLLM:
python3 -m vllm.entrypoints.openai.api_server \
--served-model-name autoglm-phone \
--model zai-org/AutoGLM-Phone-9B \
--port 8000
Once running, your local inference API endpoint becomes:
http://localhost:8000/v1
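Before pointing the agent at this endpoint, you may want to confirm that the server is up and exposing the expected model name. The sketch below queries the `/models` route that OpenAI-compatible servers, including vLLM's, provide under the base URL. `parse_model_ids` and `list_served_models` are hypothetical helper names, and the sample payload is a trimmed illustration of the response shape, not captured output.

```python
import json
import urllib.error
import urllib.request
from typing import List


def parse_model_ids(payload: str) -> List[str]:
    """Extract model ids from an OpenAI-style /models JSON response."""
    data = json.loads(payload)
    return [item["id"] for item in data.get("data", [])]


def list_served_models(base_url: str, timeout: float = 5.0) -> List[str]:
    """Return the model ids served at base_url, or [] if unreachable."""
    url = base_url.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_model_ids(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Server not running, network error, or a non-JSON reply.
        return []


# Illustrative payload in the shape the /models route returns.
SAMPLE_PAYLOAD = '{"object": "list", "data": [{"id": "autoglm-phone", "object": "model"}]}'
print(parse_model_ids(SAMPLE_PAYLOAD))
```

Calling `list_served_models("http://localhost:8000/v1")` against the running server should include `autoglm-phone`, the value passed to `--served-model-name`.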
9. Sending Commands to the AI Agent
Interactive Execution Mode
python main.py --base-url http://localhost:8000/v1 --model autoglm-phone
Then issue commands such as:
Open Google Maps and search nearby restaurants
Your Android device will perform the action automatically.
One-Shot Command Execution
python main.py --base-url http://localhost:8000/v1 --model autoglm-phone "Open YouTube and play trending videos"
Python SDK Integration Example
from phone_agent import PhoneAgent
from phone_agent.model import ModelConfig
config = ModelConfig(
base_url="http://localhost:8000/v1",
model_name="autoglm-phone"
)
agent = PhoneAgent(model_config=config)
agent.run("Open Gmail and refresh inbox")
This mode is ideal for building automation pipelines and experiments.
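For pipelines, it is often useful to run several tasks in sequence and keep going when one fails. The sketch below wraps any object that has a `run(task)` method, which is the only part of the SDK it relies on. `run_tasks` and `FakeAgent` are hypothetical names, and because this guide does not say what `agent.run` returns, the return value is stored as-is.

```python
from typing import Any, Dict, Iterable, List


def run_tasks(agent: Any, tasks: Iterable[str]) -> List[Dict[str, Any]]:
    """Run each task in order, recording success or the error raised."""
    results: List[Dict[str, Any]] = []
    for task in tasks:
        try:
            output = agent.run(task)
            results.append({"task": task, "ok": True, "output": output})
        except Exception as exc:  # keep the batch going after a failure
            results.append({"task": task, "ok": False, "output": repr(exc)})
    return results


class FakeAgent:
    """Stand-in for a real agent so the example runs without a phone."""

    def run(self, task: str) -> str:
        if "fail" in task:
            raise RuntimeError("simulated failure")
        return f"finished: {task}"


report = run_tasks(FakeAgent(), ["Open Gmail and refresh inbox", "fail on purpose"])
for entry in report:
    print(entry)
```

To use it with the real SDK, pass the `agent` created in the example above in place of `FakeAgent()`.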
10. Wireless Control Over Local Network
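To operate the phone without USB, connect to it by IP address. Replace the example address below with your phone's own; the phone must already accept ADB over TCP/IP, for example after running adb tcpip 5555 while connected via USB: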
adb connect 192.168.1.10:5555
After connection, AutoGLM can operate the device remotely. This setup is useful for:
- Remote device labs
- Distributed testing
- Continuous mobile automation
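When managing several phones, the wireless connection command from this section can be generated rather than typed by hand. The sketch below validates the address before building the `adb connect` command line; `adb_connect_command` is a hypothetical helper, and it handles IPv4 addresses only to keep the example short.

```python
import ipaddress
from typing import List


def adb_connect_command(host: str, port: int = 5555) -> List[str]:
    """Build an `adb connect` command after validating host and port."""
    ipaddress.IPv4Address(host)  # raises ValueError on a malformed address
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return ["adb", "connect", f"{host}:{port}"]


# Example addresses only; substitute the addresses of your own devices.
for phone in ["192.168.1.10", "192.168.1.11"]:
    print(" ".join(adb_connect_command(phone)))
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting issues.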
11. Supported Application Types
AutoGLM works with a growing list of mainstream applications, including:
- Messaging platforms
- E-commerce apps
- Navigation and travel services
- Short-video and streaming platforms
You can display all supported apps using:
python main.py --list-apps
12. Practical Use Cases
AutoGLM can be applied to:
- Automated mobile testing
- Cross-app workflow automation
- Data collection and monitoring
- Accessibility assistance
- AI research on mobile interaction
- Large-scale device orchestration
It enables real-world AI-driven mobile operation, not just simulated environments.
Frequently Asked Questions (FAQ)
Is AutoGLM free for personal and commercial use?
Yes. AutoGLM is fully open-source and can be used for both personal and commercial projects under its license terms.
Does AutoGLM require cloud APIs?
No. All inference can run locally. You may optionally connect external APIs if needed.
Can AutoGLM run on CPU only?
Yes, but execution and reasoning will be slower than GPU-based setups.
Is iOS supported?
No. AutoGLM relies on ADB and currently supports Android devices only.
Does AutoGLM upload phone data automatically?
No. All visual analysis and reasoning occur locally unless explicitly routed to external services.
