End-to-end web agent powered by large multimodal models for real-world task automation
WebVoyager is a web agent that uses large multimodal models (LMMs) to complete complex web tasks autonomously. Given a user instruction, it observes screenshots and the textual content of the current page, formulates the next action, and executes it on a real website. WebVoyager outperforms existing solutions by combining multiple input modalities with interaction on actual websites, which makes it suitable for a range of real-world applications.
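
The observe, decide, execute cycle described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not WebVoyager's actual code: the names `Observation`, `Action`, `run_agent`, and the action kinds `"click"` and `"answer"` are hypothetical, and the model call and browser are replaced by stub functions so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    screenshot: bytes  # raw image bytes of the current page (empty in the stub)
    text: str          # textual content extracted from the page


@dataclass
class Action:
    kind: str          # hypothetical action names, e.g. "click" or "answer"
    argument: str = ""


def run_agent(
    instruction: str,
    observe: Callable[[], Observation],
    decide: Callable[[str, Observation, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Loop: observe the page, let the model pick an action, execute it."""
    history: List[Action] = []
    for _ in range(max_steps):
        obs = observe()                              # screenshot + page text
        action = decide(instruction, obs, history)   # stands in for the LMM call
        history.append(action)
        if action.kind == "answer":                  # the agent reports it is done
            break
        execute(action)                              # apply the action to the site
    return history


def demo() -> List[Action]:
    """Run the loop against a two-page stub instead of a real browser."""
    pages = ["search page", "results page"]
    state = {"page": 0}

    def observe() -> Observation:
        return Observation(screenshot=b"", text=pages[state["page"]])

    def decide(instruction: str, obs: Observation, history: List[Action]) -> Action:
        # A real agent would send the instruction, screenshot, and text to an LMM.
        if obs.text == "search page":
            return Action("click", "search button")
        return Action("answer", "done")

    def execute(action: Action) -> None:
        # Every non-answer action advances the stub to the next page.
        state["page"] = min(state["page"] + 1, len(pages) - 1)

    return run_agent("find something", observe, decide, execute)


if __name__ == "__main__":
    for step in demo():
        print(step.kind, step.argument)
```

In a real setup, `observe` and `execute` would wrap a browser automation tool and `decide` would call a multimodal model. The `max_steps` cap is a design choice that keeps the agent from looping indefinitely on a task it cannot finish.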