Gen AI Application Evaluation with OpenAI Platform: A Practical Tutorial
Before deploying your application to production, it is critical to run evaluations to make sure your application prompt is accurate and reliable. In this post, we will learn how to run automated evaluations on our Gen AI application “ClearComms” using the OpenAI API platform. (What is ClearComms? See here.)
First and foremost, create an account on the OpenAI platform and get an API key.
Test Data
Here is the ClearComms Sample Data: a Google Sheet with input/output pairs.
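If you export the sheet for the Evals workflow, each row becomes one JSON object per line (JSONL). The sketch below is illustrative only: the field names input and expected_output and the email content are my placeholders, not the actual ClearComms data.

{"item": {"input": "tell the team standup moved to 10", "expected_output": "Hi team,\n\nPlease note that our daily standup has moved to 10:00 AM.\n\nBest regards,\nClearComms"}}
{"item": {"input": "ask bob for the Q3 report", "expected_output": "Hi Bob,\n\nCould you please share the Q3 report when you have a moment?\n\nBest regards,\nClearComms"}}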
Evaluation Steps
Step 1: Provide the system prompt in the “Playground” tab and dry-run it with some test data (or dry-run from code, as sketched below the two options)
Option 1: Use Gen AI to create your prompt
Option 2: If you already have a prompt, use it directly
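If you prefer to dry-run from code rather than the Playground UI, here is a minimal sketch using the official openai Python SDK; the model choice, the system prompt, and the sample input are placeholders, not the actual ClearComms configuration.

# Dry-run the system prompt against a single test input.
# SYSTEM_PROMPT and the user message are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are ClearComms, an assistant that rewrites rough notes "
    "as clear, polite emails."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "tell the team standup moved to 10"},
    ],
)
print(response.choices[0].message.content)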
Step 2: Create a new evaluation and import the test data into the platform
Under the Dashboard tab, click Evaluations in the left-side panel to create a new evaluation
Import the test data into the platform
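The import can also be done programmatically: the Files API accepts evaluation data as a JSONL file uploaded with purpose "evals". A sketch, assuming the sheet has been exported to clearcomms_tests.jsonl (a hypothetical filename):

# Upload the exported test data so an eval run can reference it by file ID.
from openai import OpenAI

client = OpenAI()

test_file = client.files.create(
    file=open("clearcomms_tests.jsonl", "rb"),
    purpose="evals",
)
print(test_file.id)  # keep this ID for the run in Step 4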
Step 3: Create some test criteria
I created string checks, semantic-similarity checks, and criteria-match checks. Here are the configurations of some of the checks I used for the evaluation criteria (a code-level sketch of equivalent criteria follows the list):
Semantic Similarity Check
Salutation Check
Signature Check
String Check
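For reference, here is roughly how such criteria look when the evaluation is created through the Evals API instead of the dashboard. This is a sketch based on the documented API shape at the time of writing: the criterion names and thresholds are my assumptions, the item_schema fields match the hypothetical JSONL above, and exact field names may vary by SDK version.

# Sketch: create an eval with a string check (salutation) and a
# semantic-similarity check. Schema fields and thresholds are assumptions.
from openai import OpenAI

client = OpenAI()

eval_obj = client.evals.create(
    name="ClearComms checks",
    data_source_config={
        "type": "custom",
        "item_schema": {
            "type": "object",
            "properties": {
                "input": {"type": "string"},
                "expected_output": {"type": "string"},
            },
            "required": ["input", "expected_output"],
        },
        "include_sample_schema": True,  # lets criteria reference the model output
    },
    testing_criteria=[
        {
            # Salutation check: the generated email should contain a greeting
            # ("ilike" is a case-insensitive match against the reference).
            "type": "string_check",
            "name": "Salutation Check",
            "input": "{{ sample.output_text }}",
            "operation": "ilike",
            "reference": "hi",
        },
        {
            # Semantic similarity between the generated and expected output.
            "type": "text_similarity",
            "name": "Semantic Similarity Check",
            "input": "{{ sample.output_text }}",
            "reference": "{{ item.expected_output }}",
            "evaluation_metric": "fuzzy_match",
            "pass_threshold": 0.8,
        },
    ],
)
print(eval_obj.id)  # keep this ID for the run in Step 4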
Step 4: Run the tests
Click the Run button to run the test criteria against the Google Sheet test data
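The run can also be started from code. A sketch, reusing the eval ID from Step 3 and the uploaded file ID from Step 2 (both are placeholders here); the system prompt template is again illustrative.

# Sketch: start an eval run that generates a completion for each test item
# and scores it against the criteria defined on the eval.
from openai import OpenAI

client = OpenAI()

run = client.evals.runs.create(
    "eval_abc123",  # eval ID from Step 3 (placeholder)
    name="ClearComms round 1",
    data_source={
        "type": "completions",
        "model": "gpt-4o-mini",
        "input_messages": {
            "type": "template",
            "template": [
                {"role": "system", "content": "You are ClearComms..."},  # placeholder prompt
                {"role": "user", "content": "{{ item.input }}"},
            ],
        },
        "source": {"type": "file_id", "id": "file-xyz789"},  # file ID from Step 2 (placeholder)
    },
)
print(run.report_url)  # link to the run's results in the dashboard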
Evaluation Round 1
Evaluation Summary
Evaluation Detailed Results: some tests failed!
Evaluation Round 2
I refined the prompt a couple of times to address the failing checks and re-ran the tests.
Evaluation Summary:
Evaluation Detailed Results: all tests passed!
_________________________________
How to Build the ClearComms App
Coming soon…