Framework for building LLM agent benchmark environments in a Python-centric way.
Crab is a framework designed by Camel AI for building environments for large language model (LLM) agents and benchmarking agents in them. It supports cross-platform environments that can be deployed in memory, in Docker containers, on virtual machines, or across distributed physical machines. Agent environments and actions are defined through a Python-centric interface, and a built-in benchmarking suite provides fine-grained evaluation metrics.
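To make the idea of a Python-centric action definition concrete, here is a minimal sketch of how a plain Python function could be registered as an agent action, with its parameter schema read from type hints. It is an illustration only: the `action` decorator, the `ACTIONS` registry, and the `dispatch` helper are hypothetical names written for this example and are not taken from Crab's actual API.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Hypothetical registry for this sketch; not part of Crab's API.
ACTIONS: dict[str, dict[str, Any]] = {}


def action(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register `func` as an agent action and record its parameter types."""
    hints = get_type_hints(func)
    params = {
        # Fall back to "any" when a parameter has no usable annotation.
        name: getattr(hints.get(name), "__name__", "any")
        for name in inspect.signature(func).parameters
    }
    ACTIONS[func.__name__] = {
        "description": inspect.getdoc(func) or "",
        "parameters": params,
        "run": func,
    }
    return func


@action
def click(x: int, y: int) -> str:
    """Click the screen at pixel coordinates (x, y)."""
    return f"clicked at ({x}, {y})"


def dispatch(name: str, **kwargs: Any) -> Any:
    """Look up a registered action by name and call it with keyword arguments."""
    return ACTIONS[name]["run"](**kwargs)
```

With this sketch, `ACTIONS["click"]["parameters"]` holds the names and types an agent would need to call the action, and `dispatch("click", x=3, y=4)` runs it.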
- Cross-platform & Multi-environment Deployment
- Unified Interface for Environment Access
- Python-native Configuration
- Novel Benchmarking Suite
- Fine-grained Graph Evaluator

- Benchmarking LLM Agents
- Cross-environment Testing
- Multimodal Data Handling
- Agent Environment Simulation
- Python-based Agent Development
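One way to picture the fine-grained graph evaluator listed above is as a directed acyclic graph of checkpoints, where an agent earns partial credit for each checkpoint it reaches. The sketch below assumes that model; the function `completion_ratio`, the checkpoint names, and the state dictionary are all hypothetical and do not reproduce Crab's own evaluator.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

State = Dict[str, object]
Check = Callable[[State], bool]


def completion_ratio(
    checks: Dict[str, Check],
    deps: Dict[str, List[str]],
    state: State,
) -> float:
    """Return the fraction of checkpoints completed.

    A checkpoint counts as completed only if its own check passes
    and every checkpoint it depends on has also been completed.
    """
    done: Set[str] = set()
    # static_order() yields each node after all of its prerequisites.
    for node in TopologicalSorter(deps).static_order():
        prerequisites_met = all(p in done for p in deps.get(node, []))
        if prerequisites_met and checks[node](state):
            done.add(node)
    return len(done) / len(checks)


# Hypothetical three-step task: open an app, type a message, send it.
checks: Dict[str, Check] = {
    "open_app": lambda s: s.get("app_open") is True,
    "type_text": lambda s: s.get("text") == "hello",
    "send": lambda s: s.get("sent") is True,
}
deps: Dict[str, List[str]] = {
    "open_app": [],
    "type_text": ["open_app"],
    "send": ["type_text"],
}

partial = completion_ratio(
    checks, deps, {"app_open": True, "text": "hello", "sent": False}
)
```

Here `partial` reflects two of three checkpoints reached, which gives a finer signal than a single pass/fail result for the whole task.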