Anthropic Automates Prompt Engineering with New Claude Features

In 2023, prompt engineering emerged as a sought-after role in the AI sector. However, Anthropic appears to be developing tools that could partially automate this process.

On Tuesday, Anthropic introduced new features designed to help developers create more effective applications using its language model, Claude. These features, detailed in a company blog post, let developers use Claude 3.5 Sonnet to generate, test, and evaluate prompts. By applying prompt engineering techniques, developers can craft improved inputs and enhance Claude's responses for specialized tasks.

While language models are generally adaptable when given various tasks, minor adjustments to prompt wording can significantly improve outcomes. Typically, one would need to determine this optimal wording independently or employ a prompt engineer. However, this new feature provides rapid feedback, potentially streamlining the process of identifying improvements.

Anthropic has introduced a new Evaluate tab within its Console, an experimental platform for developers that Anthropic hopes will attract businesses looking to build products with Claude. This addition complements the prompt generator feature launched in May, which expands brief task descriptions into comprehensive prompts using Anthropic's prompt engineering methods. While these tools may not completely replace prompt engineers, Anthropic suggests they will assist newcomers and streamline work for experienced professionals.

The Evaluate section allows developers to assess their AI application’s prompt effectiveness across various scenarios. Users can either upload real-world examples to a test suite or request Claude to generate diverse AI-created test cases. This feature enables side-by-side comparisons of different prompts’ effectiveness and includes a five-point scale for rating sample responses.
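The workflow described above — running several prompt variants across a shared test suite, pairing up their outputs, and rating each response — can be sketched in a few lines. This is a minimal illustration of the comparison loop, not Anthropic's implementation; `call_model` is a hypothetical stub standing in for a real Claude API call.

```python
# Sketch of the side-by-side evaluation loop the Evaluate tab automates.
# call_model is a stubbed placeholder; a real version would call the Claude API.

def call_model(prompt: str, test_input: str) -> str:
    """Placeholder model call that just echoes its inputs."""
    return f"[response to {test_input!r} under prompt {prompt!r}]"

def run_suite(prompt: str, test_cases: list[str]) -> dict[str, str]:
    """Run one prompt variant across every test case in the suite."""
    return {case: call_model(prompt, case) for case in test_cases}

def compare(prompt_a: str, prompt_b: str, test_cases: list[str]):
    """Pair up responses from two prompt variants for side-by-side review."""
    a = run_suite(prompt_a, test_cases)
    b = run_suite(prompt_b, test_cases)
    return [(case, a[case], b[case]) for case in test_cases]

if __name__ == "__main__":
    cases = ["Summarize this support ticket", "Draft a refund email"]
    # Each row holds one test case plus both variants' responses,
    # ready for a human reviewer to score on a five-point scale.
    for case, resp_a, resp_b in compare("Be concise.", "Answer in detail.", cases):
        print(case, "|", resp_a, "|", resp_b)
```

Uploading real-world examples would simply replace the `cases` list; swapping a prompt variant re-runs the whole suite at once, which is the time-saver the feature advertises.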

A prompt being fed generated data to find good and bad responses.

Anthropic’s blog post illustrates a scenario where a developer noticed their application was producing overly brief responses across multiple test cases. The developer was able to modify a single line in their prompt to generate longer answers and apply this change simultaneously to all test cases. This capability could significantly reduce time and effort for developers, particularly those with limited prompt engineering expertise.

In an interview at Google Cloud Next earlier this year, Anthropic's CEO and co-founder, Dario Amodei, emphasized the critical role of prompt engineering in driving widespread enterprise adoption of generative AI. "It sounds simple, but 30 minutes with a prompt engineer can often make an application work when it wasn't before," Amodei said.