Performing search
The Tonita search API supports two kinds of search:
Given a query string, Tonita will return the most relevant listings. Restrictions based on facet value and category can also be applied to narrow the set of eligible listings and make search more precise.
Given the ID of a listing, Tonita will return the listings most similar to it in the vector space we’ve constructed for your data.
Before proceeding with search, we first need a corpus populated with listings to search over. Please see our guides on corpora and listings, or see the Quickstart.
Note
Search is a corpus-specific operation. This means that whenever a search is performed, it is performed only over the listings in a single specified corpus. In particular, you cannot perform search over the listings of multiple corpora in a single API call.
However, developers have complete control over how their corpora are organized, and are therefore free to create a corpus that contains whichever listings they like.
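Because each call covers a single corpus, searching several corpora means making one call per corpus. The following is a minimal sketch of that pattern, not part of the Tonita library: `search_each_corpus` and `fake_search` are hypothetical helpers, and the "boston" corpus ID is made up. The search function is passed in as an argument so that the sketch runs without network access.

```python
from typing import Any, Callable, Dict, Iterable


def search_each_corpus(
    search_fn: Callable[..., Any],
    corpus_ids: Iterable[str],
    **search_kwargs: Any,
) -> Dict[str, Any]:
    """Call search_fn once per corpus and key the responses by corpus ID."""
    return {
        corpus_id: search_fn(corpus_id=corpus_id, **search_kwargs)
        for corpus_id in corpus_ids
    }


# Hypothetical stand-in for a real search call (assumption for illustration).
def fake_search(query: str, max_results: int, corpus_id: str) -> str:
    return f"{max_results} results for {query!r} in {corpus_id}"


responses = search_each_corpus(
    fake_search,
    ["new_york", "boston"],
    query="sunny 1 bedroom",
    max_results=2,
)
for corpus_id, response in responses.items():
    print(corpus_id, "->", response)
```

The responses are kept separate per corpus rather than merged, since nothing here establishes that scores from different corpora are comparable.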
Performing search given a query string
Suppose you want to find the listings most relevant to a given query string. Let’s start with a simple example:
tonita.search(
    query="sunny 1 bedroom on a quiet street near parks",
    max_results=2,
    categories=["apartment"],
    corpus_id="new_york",
)
Here, we simply want the two listings most relevant to our query string. The results are returned in a SearchResponse object such as the following:
# Example return value:
# SearchResponse(
#     items=[
#         SearchResponseItem(
#             listing_id="qd12309mc",
#             score=0.92,
#             categories=["apartment", "co-op"],
#             snippets=[
#                 Snippet(
#                     display_string="Oversized windows face south."
#                 ),
#                 Snippet(
#                     display_string="Only two blocks from Central Park!"
#                 ),
#             ],
#         ),
#         SearchResponseItem(
#             listing_id="bn32358ss",
#             score=0.88,
#             categories=["apartment", "condo"],
#             snippets=[
#                 Snippet(
#                     display_string="Overlooks a beautiful park."
#                 ),
#                 Snippet(
#                     display_string="Located on a high floor."
#                 ),
#             ],
#         ),
#     ]
# )
A SearchResponse contains the search results as an array of SearchResponseItems. Each SearchResponseItem contains the following information for a single relevant listing:
listing_id: The ID of the listing.
score: The relevance score of the listing.
categories: The matching categories that the listing belongs to (currently these are not populated in SearchResponse).
snippets: Information extracted from the listing’s data that explains the listing’s relevance.
Note that SearchResponseItems are sorted in descending order of relevance score.
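As a rough illustration of how these fields might be consumed, the sketch below turns a response into one display line per result. The dataclasses are local stand-ins that mirror the field names described above; they are not the classes shipped with the Tonita library, and `summarize` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List


# Local stand-ins mirroring the documented fields (assumption: the real
# classes expose attributes with these same names).
@dataclass
class Snippet:
    display_string: str


@dataclass
class SearchResponseItem:
    listing_id: str
    score: float
    categories: List[str] = field(default_factory=list)
    snippets: List[Snippet] = field(default_factory=list)


@dataclass
class SearchResponse:
    items: List[SearchResponseItem] = field(default_factory=list)


def summarize(response: SearchResponse) -> List[str]:
    """Return one readable line per result, in the order received."""
    lines = []
    for rank, item in enumerate(response.items, start=1):
        reasons = "; ".join(s.display_string for s in item.snippets)
        lines.append(f"{rank}. {item.listing_id} ({item.score:.2f}): {reasons}")
    return lines


response = SearchResponse(
    items=[
        SearchResponseItem(
            listing_id="qd12309mc",
            score=0.92,
            snippets=[
                Snippet(display_string="Oversized windows face south."),
                Snippet(display_string="Only two blocks from Central Park!"),
            ],
        ),
        SearchResponseItem(
            listing_id="bn32358ss",
            score=0.88,
            snippets=[Snippet(display_string="Overlooks a beautiful park.")],
        ),
    ]
)

for line in summarize(response):
    print(line)
```

Since items arrive sorted by descending relevance score, the helper preserves the incoming order instead of re-sorting.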
Performing retrieval only
Search can be thought of as progressing in two stages:
A retrieval stage, where listings are retrieved along with raw scores;
A rescoring stage, where we refine the scores of the listings that were retrieved.
The retrieval stage is very fast, whereas the rescoring stage can take more time. To perform retrieval only, set the retrieval_only flag of tonita.search() to True in a given call:
tonita.search(
    query="sunny 1 bedroom on a quiet street near parks",
    max_results=2,
    categories=["apartment"],
    retrieval_only=True,
    corpus_id="new_york",
)
Note, however, that the ranking of the listings will typically change after scores are refined in the rescoring stage, so a retrieval-only search may order results differently from a full search.
Attention
At this time, the retrieval_only flag applies only to searches with a query, and its default value is False.
For searches where a listing ID is specified, only raw scores are ever returned, so the retrieval_only flag does not apply; richer rescoring options are coming soon.
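To make the effect of rescoring on ranking concrete, here is a small sketch that compares the position of each listing before and after rescoring. The listing IDs and scores are made up for illustration, and `rank_changes` is a hypothetical helper, not part of the Tonita library.

```python
from typing import Dict, List, Tuple

ScoredListing = Tuple[str, float]  # (listing_id, score)


def rank_changes(
    retrieved: List[ScoredListing],
    rescored: List[ScoredListing],
) -> Dict[str, int]:
    """Map each listing ID to (rank before - rank after).

    A positive value means the listing moved up after rescoring.
    """

    def ranks(results: List[ScoredListing]) -> Dict[str, int]:
        ordered = sorted(results, key=lambda pair: pair[1], reverse=True)
        return {listing_id: pos for pos, (listing_id, _) in enumerate(ordered)}

    before = ranks(retrieved)
    after = ranks(rescored)
    return {lid: before[lid] - after[lid] for lid in before if lid in after}


# Made-up raw and refined scores for three listings.
retrieved = [("a", 0.81), ("b", 0.79), ("c", 0.60)]
rescored = [("a", 0.70), ("b", 0.90), ("c", 0.55)]

print(rank_changes(retrieved, rescored))
```

In this made-up case, listing "b" overtakes listing "a" once scores are refined, while "c" keeps its position.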
Performing search given a listing ID
The search API also allows us to search by providing a listing ID (in place of a query). In this case, we will return the listings most like the one specified. More technically, the listings we retrieve will be similar in a real-world sense, as captured in the vector space we’ve constructed for your data.
To find listings similar to some listing (say, with ID "foo"), simply pass that listing’s ID to tonita.search():
tonita.search(
    listing_id="foo",
    max_results=2,
    corpus_id="new_york",
)
Here, we are asking for the two listings most similar to the listing with ID “foo” in vector space. The return value will be a SearchResponse, just as above.
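For intuition about what "most similar in vector space" can mean, the following sketch ranks listings by cosine similarity to a target listing. This is only an illustration: the similarity measure, the two-dimensional vectors, and the `most_similar` helper are all assumptions, and nothing here describes how Tonita actually builds or compares its vectors.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(
    listing_id: str,
    vectors: Dict[str, List[float]],
    max_results: int,
) -> List[Tuple[str, float]]:
    """Return up to max_results other listings, most similar first."""
    target = vectors[listing_id]
    scored = [
        (other_id, cosine_similarity(target, vec))
        for other_id, vec in vectors.items()
        if other_id != listing_id  # never return the query listing itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:max_results]


# Toy vectors, made up for illustration.
vectors = {
    "foo": [1.0, 0.0],
    "bar": [0.9, 0.1],
    "baz": [0.0, 1.0],
    "qux": [0.5, 0.5],
}

print([listing_id for listing_id, _ in most_similar("foo", vectors, 2)])
```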
Restrictions
Category restrictions
Search can be made more precise by specifying category restrictions. Recall that we can specify categories to associate with each listing we upload (see Managing listings). For a given search, we can specify the categories whose listings we’re interested in through the categories parameter of tonita.search().
Let’s go back to our previous search, and suppose that we’re interested in condominium apartments specifically. We specify this in the call:
tonita.search(
    query="sunny 1 bedroom on a quiet street near parks",
    max_results=2,
    categories=["condo"],
    corpus_id="my_corpus_id",
)
The results will now only contain those listings that belong to the “condo” category:
# Example return value:
# SearchResponse(
#     items=[
#         SearchResponseItem(
#             listing_id="bn32358ss",
#             score=0.88,
#             categories=["apartment", "condo"],
#             snippets=[
#                 Snippet(
#                     display_string="Overlooks a beautiful park."
#                 ),
#                 Snippet(
#                     display_string="Located on a high floor."
#                 ),
#             ],
#         ),
#         SearchResponseItem(
#             listing_id="ql81799cs",
#             score=0.71,
#             categories=["apartment", "condo"],
#             snippets=[
#                 Snippet(
#                     display_string="Quiet with nearby green space."
#                 ),
#             ],
#         ),
#     ]
# )
We can also apply category restrictions when searching by listing ID:
tonita.search(
    listing_id="wb78321mn",
    max_results=2,
    categories=["apartment", "condo"],
    corpus_id="my_corpus_id",
)
Facet restrictions
We can also make our search results more precise by specifying facet restrictions.
Facets can be thought of as properties of the entity that a listing represents. For example, consider a listing for an apartment. Facets might include the number of bedrooms, the number of bathrooms, its price, whether it has central A/C, etc. You can thus think of facet restrictions as applying filters based on these properties.
Recall that facet data is provided when you first upload a listing. Tonita will work with you to determine exactly which data you upload will be used as facets that are eligible for restrictions.
Facet restrictions are a way to express preference for listings with certain facet values; listings that satisfy more facet restrictions will tend to have higher scores. We also optionally provide a way to specify exactly how much weight search should assign to each restriction, allowing you greater control over your search results.
Facet restrictions are specified by passing an array of dictionaries to the facet_restrictions parameter when calling tonita.search(). Each restriction dictionary can have the following keys:
"name": Required. The name of the facet. For example, this might be “price” or “num_bedrooms” or “genre”.
"type": Required. The type of the facet’s value across listings. Valid values are:
    "STRING"
    "NUMERIC"
    "BOOLEAN"
    This determines how the facet value is handled (i.e., what operations are valid for it). For example, a numeric “price” facet can be compared ordinally, whereas that may not be the case for a string “genre”.
"operation": Required. The operation for the restriction. For example, if the value of some numeric facet must be greater than or equal to 9, the operation would be "GREATER_THAN_EQUAL". Valid values and the facet types they can be applied to are:
    "EQUAL": numeric, string, boolean
    "LESS_THAN": numeric
    "LESS_THAN_EQUAL": numeric
    "GREATER_THAN": numeric
    "GREATER_THAN_EQUAL": numeric
    "ONE_OF": string
"value": Required. The value used for the restriction. Note that this is not necessarily the same as the value of the facet for listings. For example, a restriction that movies can be any one of three genres would set “value” to ["comedy", "drama", "thriller"], but the facet value for a given movie that satisfies the restriction might just be “drama” (assuming every movie has only one genre). On the other hand, a restriction that the facet for “director” be “Jane Doe” would set the “value” field of the restriction to “Jane Doe”.
"weight": Optional. An importance weight for the facet. This weight does not need to be normalized across facet restrictions. Weights must be provided either for all of the restrictions or for none of them. If no weights are provided, equal weights across restrictions will be assumed. Facets with zero or negative weight will be ignored.
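The rules above can be checked locally before a search is sent. The sketch below is one possible client-side check, written under the assumption that catching mistakes early is useful; `validate_facet_restrictions` is a hypothetical helper and not part of the Tonita library, and the service may well perform its own validation.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = {"name", "type", "operation", "value"}

# Operation -> facet types it can be applied to, as listed above.
VALID_TYPES_BY_OPERATION = {
    "EQUAL": {"NUMERIC", "STRING", "BOOLEAN"},
    "LESS_THAN": {"NUMERIC"},
    "LESS_THAN_EQUAL": {"NUMERIC"},
    "GREATER_THAN": {"NUMERIC"},
    "GREATER_THAN_EQUAL": {"NUMERIC"},
    "ONE_OF": {"STRING"},
}


def validate_facet_restrictions(restrictions: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for i, r in enumerate(restrictions):
        missing = REQUIRED_KEYS - set(r)
        if missing:
            problems.append(f"restriction {i}: missing keys {sorted(missing)}")
            continue
        allowed = VALID_TYPES_BY_OPERATION.get(r["operation"])
        if allowed is None:
            problems.append(f"restriction {i}: unknown operation {r['operation']!r}")
        elif r["type"] not in allowed:
            problems.append(
                f"restriction {i}: {r['operation']} cannot be applied to {r['type']}"
            )
    # Weights must be given for all restrictions or for none of them.
    weighted = sum(1 for r in restrictions if "weight" in r)
    if 0 < weighted < len(restrictions):
        problems.append("weights must be provided for all restrictions or none")
    return problems


good = [
    {"name": "price", "type": "NUMERIC", "operation": "LESS_THAN", "value": 3000},
]
bad = [
    {"name": "genre", "type": "STRING", "operation": "GREATER_THAN",
     "value": "drama", "weight": 1.0},
    {"name": "price", "type": "NUMERIC", "operation": "EQUAL", "value": 10},
]

print(validate_facet_restrictions(good))
for problem in validate_facet_restrictions(bad):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.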
Consider the following example set of facet restrictions for books:
facet_restrictions = [
    {
        "name": "pages",
        "type": "NUMERIC",
        "operation": "LESS_THAN_EQUAL",
        "value": 500,
        "weight": 3.14,
    },
    {
        "name": "language",
        "type": "STRING",
        "operation": "ONE_OF",
        "value": ["english", "portuguese", "korean"],
        "weight": 2.72,
    },
]
Here, we have two restrictions: one that the book have at most 500 pages, and one that its language be English, Portuguese, or Korean. The importance weights indicate that the restriction on the number of pages is slightly more important than the restriction on language.
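To see what these two restrictions ask for, the sketch below checks them against a book's facet values locally and expresses the weights as shares of their total. It is an illustration only: the comparison semantics are inferred from the operation names, treating a missing facet as unsatisfied is an assumption, `satisfies` and `relative_weights` are hypothetical helpers, and none of this describes how Tonita turns restrictions and weights into scores.

```python
import operator
from typing import Any, Dict, List

facet_restrictions = [
    {"name": "pages", "type": "NUMERIC", "operation": "LESS_THAN_EQUAL",
     "value": 500, "weight": 3.14},
    {"name": "language", "type": "STRING", "operation": "ONE_OF",
     "value": ["english", "portuguese", "korean"], "weight": 2.72},
]

# Local reading of each operation name (assumption based on the names).
OPERATIONS = {
    "EQUAL": operator.eq,
    "LESS_THAN": operator.lt,
    "LESS_THAN_EQUAL": operator.le,
    "GREATER_THAN": operator.gt,
    "GREATER_THAN_EQUAL": operator.ge,
    "ONE_OF": lambda facet_value, allowed: facet_value in allowed,
}


def satisfies(restriction: Dict[str, Any], facets: Dict[str, Any]) -> bool:
    """Check one restriction against a listing's facet values."""
    name = restriction["name"]
    if name not in facets:
        return False  # assumption: a missing facet does not satisfy anything
    compare = OPERATIONS[restriction["operation"]]
    return compare(facets[name], restriction["value"])


def relative_weights(restrictions: List[Dict[str, Any]]) -> Dict[str, float]:
    """Express positive weights as shares that sum to 1."""
    positive = {r["name"]: r["weight"] for r in restrictions if r["weight"] > 0}
    total = sum(positive.values())
    return {name: weight / total for name, weight in positive.items()}


book = {"pages": 500, "language": "korean"}  # made-up facet values
for restriction in facet_restrictions:
    print(restriction["name"], satisfies(restriction, book))
print(relative_weights(facet_restrictions))
```

A 500-page book satisfies the first restriction because the operation is "LESS_THAN_EQUAL" rather than "LESS_THAN".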
To perform search with these restrictions, simply pass them in with the call:
tonita.search(
    query="18th century classic where the main character has a redemption arc",
    max_results=10,
    facet_restrictions=facet_restrictions,
    corpus_id="books",
)