Evaluating search results

The Tonita API also lets you evaluate search performance for a batch of search requests. The evaluation API determines the ground-truth relevance of the listings returned for each request and computes summary statistics such as precision@k.

To perform an evaluation, first package the search requests you want to evaluate into a list of SearchRequest objects:

import tonita
from tonita.datatypes.eval import EvalStatus
from tonita.datatypes.search_request import SearchRequest

# Create a list of search requests to evaluate.
search_requests = [
    SearchRequest(
        query="sunny 1 bedroom on a quiet street near parks",
        categories=["apartment"],
    ),
    SearchRequest(
        query="brownstone with a large parlor floor for entertaining",
        categories=["townhouse"],
    ),
]

You’ll notice that the fields of SearchRequest are the same as the arguments you pass to tonita.search() when performing a search, with one exception: here, we do not provide a value for max_results. As of this writing, the Tonita evaluation API automatically computes precision@k for a preselected set of values of k: 1 through 10, then in steps of 5 up to 25 (that is, 15, 20, and 25).
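For reference, precision@k is conventionally the fraction of the top k returned listings that are relevant. The sketch below computes it from a list of per-listing relevance judgments ordered by rank, for the values of k listed above. The names EVAL_K_VALUES, precision_at_k, and precision_at_all_k are hypothetical and not part of the Tonita library. It assumes k is always the denominator, and it only reports values of k that do not exceed the number of returned listings; Tonita's own handling of short result lists may differ.

```python
# Values of k described above: 1 through 10, then 15, 20, and 25.
EVAL_K_VALUES = list(range(1, 11)) + list(range(15, 26, 5))


def precision_at_k(ratings: list[bool], k: int) -> float:
    """Fraction of the top-k listings (ordered by rank) judged relevant.

    Hypothetical helper. Assumption: k is always the denominator, even if
    fewer than k listings were returned.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return sum(ratings[:k]) / k


def precision_at_all_k(ratings: list[bool]) -> dict[str, float]:
    """Precision@k for each k in EVAL_K_VALUES not exceeding len(ratings).

    String keys mirror the precision_at_k dict in the example output below.
    """
    return {
        str(k): precision_at_k(ratings, k)
        for k in EVAL_K_VALUES
        if k <= len(ratings)
    }
```

Applied to the ratings of the second query in the example output below ([True, False]), this gives {"1": 1.0, "2": 0.5}, which matches the metrics reported there.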

With your SearchRequests in hand, you can package them into a SubmitEvalRequest (along with any email addresses that should be notified when the evaluation completes) and submit them:

submit_response = tonita.eval.submit(
    search_requests=search_requests,
    notification_email_addresses=["user@example.com"],
)

# Example return value:
# SubmitEvalResponse(
#     eval_id="b8c2abf34217"
# )

Evaluations can take time, especially when they contain many search requests. Therefore, submitting an evaluation returns an ID for the evaluation rather than its results; in the example above, this ID is "b8c2abf34217". Pass this ID to tonita.eval.retrieve() to check the status of the evaluation:

tonita.eval.retrieve(eval_id="b8c2abf34217")

# Example return value:
# RetrieveEvalResponse(
#     status=<EvalStatus.SUBMITTED: 1>,
#     query_results=None,
# )

Note that if you provided any email addresses when submitting the evaluation, Tonita will also send a notification to those addresses as soon as the evaluation is complete.
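If you would rather wait for completion from a script, one option is to poll tonita.eval.retrieve() until the status changes. The sketch below is a hypothetical helper, wait_for_eval, which is not part of the Tonita library; it takes the retrieve function and the status to wait for (for example, EvalStatus.COMPLETED) as arguments, so it can be exercised without network access. Only the SUBMITTED and COMPLETED statuses appear in the examples on this page, so the helper does not check for any failure status: an evaluation that never reaches the target status ends in a TimeoutError. The polling interval and timeout defaults are arbitrary.

```python
import time
from typing import Any, Callable


def wait_for_eval(
    retrieve: Callable[..., Any],
    eval_id: str,
    done_status: Any,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> Any:
    """Poll retrieve(eval_id=...) until the response status equals done_status.

    Hypothetical helper, not part of the Tonita library.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = retrieve(eval_id=eval_id)
        if response.status == done_status:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"evaluation {eval_id} not complete after {timeout_seconds} s"
            )
        time.sleep(poll_seconds)


# Usage with the Tonita client (requires network access and credentials):
# response = wait_for_eval(
#     tonita.eval.retrieve, "b8c2abf34217", EvalStatus.COMPLETED
# )
```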

If the evaluation is complete, passing the ID will retrieve the evaluation results. (Note that the following example does not correspond to the search requests we submitted above.)

tonita.eval.retrieve(eval_id="ee776e02c4a7")

# Example return value:
# RetrieveEvalResponse(
#     status=<EvalStatus.COMPLETED: 3>,
#     query_results=[
#         QueryResult(
#             search_request=SearchRequest(
#                 query="asian-inspired vegetarian dish",
#                 max_results=10,
#                 categories=None,
#                 facet_restrictions=None,
#             ),
#             metrics=Metrics(precision_at_k={"1": 1.0, "2": 1.0}),
#             listing_results=[
#                 ListingResult(
#                     listing_id="887123",
#                     rank=0,
#                     score=1.206,
#                     rating=True,
#                 ),
#                 ListingResult(
#                     listing_id="592823",
#                     rank=1,
#                     score=1.183,
#                     rating=True,
#                 )
#             ]
#         ),
#         QueryResult(
#             search_request=SearchRequest(
#                 query="light lunch option with bread but is not a sandwich",
#                 max_results=10,
#                 categories=["beginner-friendly"],
#                 facet_restrictions=None,
#             ),
#             metrics=Metrics(
#                 precision_at_k={
#                     "1": 1.0,
#                     "2": 0.5,
#                 }
#             ),
#             listing_results=[
#                 ListingResult(
#                     listing_id="222905",
#                     rank=0,
#                     score=1.0438610315322876,
#                     rating=True,
#                 ),
#                 ListingResult(
#                     listing_id="17824",
#                     rank=1,
#                     score=1.0326957702636719,
#                     rating=False,
#                 )
#             ]
#         )
#     ]
# )

Here, we have the evaluation results for two search requests run against a corpus of cooking recipes. The results for each search request are packaged in a QueryResult dataclass, which contains the search request, the returned listings (as ListingResults with their rank, score, and relevance rating), and precision@k for multiple values of k.
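To summarize a completed evaluation, you might average each precision@k value across all of its QueryResults. The sketch below is a hypothetical helper, mean_precision_at_k, not part of the Tonita library; it assumes only that each QueryResult exposes metrics.precision_at_k as a dict keyed by k as a string, as in the example output above. A value of k that is missing for some queries is averaged over the queries that report it.

```python
from statistics import mean
from typing import Any


def mean_precision_at_k(query_results: list[Any]) -> dict[str, float]:
    """Average each precision@k value over the QueryResults that report it.

    Hypothetical helper. Assumes result.metrics.precision_at_k is a dict
    mapping k (as a string) to a float, as in the example output.
    """
    values_by_k: dict[str, list[float]] = {}
    for result in query_results:
        for k, value in result.metrics.precision_at_k.items():
            values_by_k.setdefault(k, []).append(value)
    # Sort keys numerically so that "10" comes after "2".
    return {k: mean(values_by_k[k]) for k in sorted(values_by_k, key=int)}


# Usage with the Tonita client (requires network access and credentials):
# response = tonita.eval.retrieve(eval_id="ee776e02c4a7")
# if response.status == EvalStatus.COMPLETED:
#     print(mean_precision_at_k(response.query_results))
```

For the two queries in the example output above, this returns {"1": 1.0, "2": 0.75}.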