Managing listings

This section walks through how to work with listings.

Every listing belongs to a corpus. Therefore, before proceeding with this section on listings, see the guide section on corpora.

Adding listings

Suppose we have an active corpus and we now want to add listings to it.

Note

Recall that a corpus is automatically active at creation, and that listings can only be added to active corpora.

This is done by calling tonita.listings.add(). Before uploading any listings data, though, let’s look at how that data should be formatted.

Formatting your data

Whether we’re adding data from an in-memory dictionary or reading it from a file, the listings data must be formatted as a JSON map (object), as follows:

Each key should be a string giving the ID of the listing. The value associated with each key should itself be a dict with the following keys:

"data" is a required key that should map to a dict containing the listing’s data. You have significant flexibility in choosing what type of data about your listings to provide in this dict. For example, you may choose to provide data about facets of your listing in the form of key-value pairs, which can then be used to retrieve more precise search results. You may also choose to provide arrays of text, or structured data in the form of yet more maps. Tonita will work with you to determine how best to handle the data you provide us.
"categories" is an optional key that should specify one or more category strings that can be attached to each listing. These category strings can also be used to restrict searches within a corpus by specifying the list of categories allowed during search. What constitutes a category is entirely up to you; a product search engine may use categories to represent different types of products (e.g., household appliances, furniture, clothing, etc.), while a vacation rental company may use categories to represent different neighborhoods or types of home (e.g., apartment, cabin, villa, etc.).

Attention

The data that you provide must be JSON-serializable. We cannot process arbitrary Python objects.
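It can help to check a batch against these rules before uploading it. The following is a minimal sketch that uses only the standard library. validate_listings is a hypothetical helper, not part of the tonita library. It also assumes "categories" is given as a list of strings, as in the examples in this guide.

```python
import json
from typing import Any


def validate_listings(listings: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means the batch looks well formed."""
    problems = []
    for listing_id, entry in listings.items():
        if not isinstance(listing_id, str):
            problems.append(f"{listing_id!r}: listing ID must be a string")
            continue
        if not isinstance(entry, dict):
            problems.append(f"{listing_id}: value must be a dict")
            continue
        # "data" is required and must itself be a dict.
        if not isinstance(entry.get("data"), dict):
            problems.append(f"{listing_id}: missing or non-dict 'data'")
        # "categories" is optional; this sketch assumes a list of strings when present.
        categories = entry.get("categories")
        if categories is not None and not (
            isinstance(categories, list) and all(isinstance(c, str) for c in categories)
        ):
            problems.append(f"{listing_id}: 'categories' must be a list of strings")
        # Everything must be JSON-serializable (no arbitrary Python objects such as sets).
        try:
            json.dumps(entry)
        except (TypeError, ValueError) as exc:
            problems.append(f"{listing_id}: not JSON-serializable ({exc})")
    return problems
```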

Let’s take a look at an example JSON object.

{
    "h197233sm": {
        "data": {
            "bedrooms": 1,
            "bathrooms": 1,
            "in_unit_laundry": false,
            "neighborhood": "Hell's Kitchen",
            "price": 400,
            "notes": [
                "superintendent is very responsive",
                "rooftop needs maintenance"
            ]
        },
        "categories": ["apartment", "co-op"]
    },
    "h298921md": {
        "data": {
            "bedrooms": 1,
            "bathrooms": 1,
            "in_unit_laundry": true,
            "neighborhood": "Williamsburg",
            "price": 800,
            "reviews": [
                {
                    "user": "foo_user",
                    "rating": 5,
                    "review_text": "really awesome place"
                },
                {
                    "user": "baz_user",
                    "rating": 4.5,
                    "review_text": "Would stay again!"
                }
            ]
        },
        "categories": ["apartment", "condo"]
    }
}

Here, we’re adding two listings: "h197233sm" and "h298921md". You’ll notice that both listings provide values for the "data" and "categories" keys and that the two listings have some facet data in common: the number of bedrooms, the number of bathrooms, whether there’s in-unit laundry, the neighborhood, and the price.

However, notice that while "h197233sm" keeps an array of notes in its data, "h298921md" keeps an array of reviews.

This is fine! While it may be useful in some cases for listings in the same corpus to share identical fields, there’s no requirement that they do. Even with the relatively more complex "notes" and "reviews" data provided, this entire structure is still a valid JSON that respects the rules stated above, and can therefore be processed just fine. Now let’s see how we can upload this data.

Uploading your data

Listings can be added in one of two ways by calling tonita.listings.add():

By providing an in-memory dictionary.
By providing a path to a JSON file on disk.

Note

Listing IDs must be unique within a corpus. Two listings may therefore share the same ID as long as they belong to different corpora.

Attention

Uploading data for a listing under an ID that already exists in the corpus will overwrite the previous data; see Updating existing data.
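Re-using an ID replaces the earlier data. If several batches are headed for the same corpus, you may want to check them for ID collisions first. This is a minimal sketch, and duplicate_listing_ids is a hypothetical helper that is not part of the tonita library:

```python
from collections import Counter


def duplicate_listing_ids(*batches: dict) -> set[str]:
    """Return listing IDs that appear in more than one of the given batches."""
    # Keys are unique within a single dict, so any count above 1 means a cross-batch collision.
    counts = Counter(listing_id for batch in batches for listing_id in batch)
    return {listing_id for listing_id, n in counts.items() if n > 1}
```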

Suppose that we’ve stored the data above for the listings "h197233sm" and "h298921md" in a variable, data_for_two_listings. We can then upload this data to the corpus we specify with the following call:

tonita.listings.add(
    data=data_for_two_listings,
    corpus_id="my_corpus_id"
)

Note

To make clear which corpus we are working with, we passed the corpus ID in with the call. However, you may find it convenient to set the corpus ID at the beginning of a series of calls having to do with that corpus. See our guide on providing API keys and corpus IDs for more information.

You may upload any number of listings this way. If you’re planning to upload a very large batch of listings, however, we recommend:

Uploading your data in smaller batches (of between 1000 and 10000 listings each, depending on the size of your data objects);
Passing a requests.Session with your call. See our guide on using Session objects.
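For data already in memory, one way to follow the batching recommendation is to slice the dictionary into fixed-size pieces. This sketch assumes each piece is sent in its own tonita.listings.add() call. chunk_listings is hypothetical and separate from the library’s tonita.split_listing_batch() utility, whose signature is not covered here.

```python
from itertools import islice
from typing import Iterator


def chunk_listings(listings: dict, batch_size: int = 1000) -> Iterator[dict]:
    """Yield successive sub-dicts holding at most batch_size listings each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    items = iter(listings.items())
    # islice pulls up to batch_size items at a time; an empty dict ends the loop.
    while batch := dict(islice(items, batch_size)):
        yield batch


# Hypothetical usage with the client (all_listings is a placeholder name):
# for batch in chunk_listings(all_listings, batch_size=5000):
#     tonita.listings.add(data=batch, corpus_id="my_corpus_id")
```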

You may also upload your data from a JSON file, or from a directory containing JSON files.

For example, suppose that the data for the two listings above is stored in a file with path path/to/data_for_two_listings.json. We can upload the data by passing this file path:

tonita.listings.add(
    json_path="path/to/data_for_two_listings.json",
    corpus_id="my_corpus_id"
)
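Your listings may start out in memory even though you would rather upload from disk. In that case, the standard json module can write them in the expected shape. This is a minimal sketch: write_listings_file is a hypothetical helper, and the path is only an example.

```python
import json
from pathlib import Path


def write_listings_file(listings: dict, path: str) -> str:
    """Write listings to `path` as JSON and return the path as a string."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create parent folders if needed
    with out.open("w", encoding="utf-8") as f:
        json.dump(listings, f, indent=2)  # raises TypeError on non-serializable values
    return str(out)


# Hypothetical usage with the client:
# json_path = write_listings_file(data_for_two_listings, "path/to/data_for_two_listings.json")
# tonita.listings.add(json_path=json_path, corpus_id="my_corpus_id")
```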

Alternatively, suppose that data for many listings are split across multiple JSON files in a directory. To upload data from all JSON files in this directory, simply pass the directory’s path to json_path.

Attention

If you provide a directory path when uploading listings, the directory must contain only valid JSON files. If any file in the directory cannot be parsed as JSON, the upload will be halted and an error will be raised.
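A single unparseable file halts the whole upload, so a quick pre-flight scan can prevent a failed call. This sketch assumes that only the files directly inside the directory are read, so subdirectories are skipped. find_unparseable_json is hypothetical. Stray files such as editor backups or .DS_Store are reported as well, which is intended.

```python
import json
from pathlib import Path


def find_unparseable_json(directory: str) -> list[str]:
    """Return the names of files in `directory` that do not parse as JSON."""
    bad = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # assumption: nested directories are not part of the upload
        try:
            with path.open(encoding="utf-8") as f:
                json.load(f)
        except (json.JSONDecodeError, UnicodeDecodeError):
            bad.append(path.name)
    return bad
```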

Note

For splitting a large JSON file into chunks, we provide the tonita.split_listing_batch() utility with our library.

However you choose to upload your data, tonita.listings.add() returns an AddListingsResponse. This object maps the ID of each listing you attempted to upload to an AddSingleListingResult, which indicates whether that listing’s data was uploaded successfully and, if not, why the upload failed.

# Example return value:
# AddListingsResponse(
#     results={
#         "h197233sm": AddSingleListingResult(
#             success=True,
#             error_message=""
#         ),
#         "h298921md": AddSingleListingResult(
#             success=True,
#             error_message=""
#         )
#     }
# )
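When you act on the response, you will usually want the listings that did not succeed. The sketch below relies only on the success and error_message fields shown in the example output. SingleResult is a hypothetical stand-in for AddSingleListingResult so the code runs on its own, and failed_listings is a hypothetical helper. The same helper should also work on the delete and recover responses described later, which share these two fields.

```python
from dataclasses import dataclass


@dataclass
class SingleResult:
    """Hypothetical stand-in exposing the two fields shown in the example output."""
    success: bool
    error_message: str = ""


def failed_listings(results: dict) -> dict[str, str]:
    """Map each listing ID whose result has success=False to its error message."""
    return {
        listing_id: result.error_message
        for listing_id, result in results.items()
        if not result.success
    }


# Hypothetical usage with the client:
# response = tonita.listings.add(data=data_for_two_listings, corpus_id="my_corpus_id")
# failed = failed_listings(response.results)
```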

Updating existing data

A given listing can be updated by uploading data under the same listing ID. That is, a listing with stale data does not have to be deleted first.

However, it is important to note that uploading data under the same ID as a listing that already exists will OVERWRITE THE PREVIOUS DATA FOR THAT LISTING ID.

The state of a listing

Every listing has a state, which can be either active or inactive. When a listing is first uploaded, its state is active. Deleting a listing sets its state to inactive, and recovering an inactive listing sets its state back to active. After seven (7) days of continuous inactivity, an inactive listing “expires”: it is removed and can no longer be recovered.
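The lifecycle above can be summarized as a small state machine. This is a toy model for illustration only, not the service’s implementation. States are plain strings, and the seven-day window is expressed in seconds. Expiry is modeled by raising LookupError, whereas the real client raises TonitaBadRequestError, as described under Deleting and recovering. Whether expiry happens exactly at the seven-day mark is also an assumption.

```python
RECOVERY_WINDOW_SECONDS = 7 * 24 * 60 * 60  # seven days = 604,800 seconds

# (current state, action) -> new state, per the rules described above.
TRANSITIONS = {
    ("ACTIVE", "delete"): "INACTIVE",
    ("INACTIVE", "recover"): "ACTIVE",
}


def transition(state: str, action: str, seconds_inactive: float = 0.0) -> str:
    """Toy model: return the state after applying `action` to a listing."""
    if state == "INACTIVE" and seconds_inactive >= RECOVERY_WINDOW_SECONDS:
        # Assumed boundary: the listing has expired and no longer exists.
        raise LookupError("listing has expired and cannot be recovered")
    # In this model, any other combination (e.g. recovering an active listing) leaves the state unchanged.
    return TRANSITIONS.get((state, action), state)
```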

Listing

We can list all of the listings in a given corpus by calling tonita.listings.list().

For example, suppose our corpus contains three listings: "listing_1", "listing_2", and "listing_3". We can list them:

tonita.listings.list(corpus_id="my_corpus_id")

# Example return value:
# ListListingsResponse(
#     results={
#         'listing_1': <State.ACTIVE: 'ACTIVE'>, 
#         'listing_2': <State.ACTIVE: 'ACTIVE'>,
#         'listing_3': <State.ACTIVE: 'ACTIVE'>
#     },
#     next_listing_id=None
# )

The return value will be a ListListingsResponse whose results field will be a map from listing IDs to their respective states. These results will be sorted in lexicographical order of listing ID.
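Lexicographical order compares IDs character by character, so numeric suffixes may not sort the way you expect. The sketch below assumes the service’s ordering matches plain string comparison, which is what Python’s sorted() does for ASCII IDs. padded_id is a hypothetical helper that shows how zero-padding keeps string order and numeric order aligned.

```python
def padded_id(n: int, prefix: str = "listing_", width: int = 6) -> str:
    """Zero-pad the numeric part so lexicographic order matches numeric order."""
    return f"{prefix}{n:0{width}d}"


raw_ids = ["listing_2", "listing_10", "listing_1"]
# Plain string comparison puts "listing_10" before "listing_2".
print(sorted(raw_ids))  # ['listing_1', 'listing_10', 'listing_2']

padded = [padded_id(n) for n in (2, 10, 1)]
print(sorted(padded))  # ['listing_000001', 'listing_000002', 'listing_000010']
```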

Limiting the number of listings and pagination

By default, tonita.listings.list() returns at most 1000 listings. You can change this by passing a value for limit in the call. For example, to list only the first two listings (in lexicographical order), we can set limit=2:

tonita.listings.list(limit=2, corpus_id="my_corpus_id")

# Example return value:
# ListListingsResponse(
#     results={
#         'listing_1': <State.ACTIVE: 'ACTIVE'>, 
#         'listing_2': <State.ACTIVE: 'ACTIVE'>,
#     },
#     next_listing_id='listing_3'
# )

Notice that the ListListingsResponse also contains a field called next_listing_id. Because our previous call listed only the first two listings in the corpus, this field holds the next listing ID in lexicographical order. If no listings come after those returned, next_listing_id will be None.

We may also choose to start listing the items in our corpus from somewhere other than the beginning (in terms of lexicographical order). For example, suppose we wanted to list starting from "listing_2". We would indicate this by passing that ID to the start_listing_id parameter:

tonita.listings.list(start_listing_id="listing_2", corpus_id="my_corpus_id")

# Example return value:
# ListListingsResponse(
#     results={
#         'listing_2': <State.ACTIVE: 'ACTIVE'>, 
#         'listing_3': <State.ACTIVE: 'ACTIVE'>,
#     },
#     next_listing_id=None
# )

We see that, because there is no listing with ID after "listing_3" in terms of lexicographical order, next_listing_id is None.

Together, the start_listing_id parameter and the next_listing_id field in the ListListingsResponse allow you to “page” through listings in a given corpus. For example, suppose we have a corpus with the following listings:

tonita.listings.list(corpus_id="my_corpus_id")

# Example return value:
# ListListingsResponse(
#     results={
#         'listing_1': <State.ACTIVE: 'ACTIVE'>, 
#         'listing_2': <State.ACTIVE: 'ACTIVE'>,
#         'listing_3': <State.ACTIVE: 'ACTIVE'>,
#         'listing_4': <State.ACTIVE: 'ACTIVE'>,
#         'listing_5': <State.ACTIVE: 'ACTIVE'>
#     },
#     next_listing_id=None
# )

We can page through these listings, two at a time, as follows:

tonita.listings.list(limit=2, corpus_id="my_corpus_id")

# Example return value:
# ListListingsResponse(
#     results={
#         'listing_1': <State.ACTIVE: 'ACTIVE'>, 
#         'listing_2': <State.ACTIVE: 'ACTIVE'>,
#     },
#     next_listing_id='listing_3'
# )

tonita.listings.list(
    start_listing_id="listing_3",
    limit=2, 
    corpus_id="my_corpus_id"
)

# Example return value:
# ListListingsResponse(
#     results={
#         'listing_3': <State.ACTIVE: 'ACTIVE'>, 
#         'listing_4': <State.ACTIVE: 'ACTIVE'>,
#     },
#     next_listing_id='listing_5'
# )

tonita.listings.list(
    start_listing_id="listing_5",
    limit=2, 
    corpus_id="my_corpus_id"
)

# Example return value:
# ListListingsResponse(
#     results={
#         'listing_5': <State.ACTIVE: 'ACTIVE'>,
#     },
#     next_listing_id=None
# )
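The paging pattern above generalizes to a loop that keeps following next_listing_id until it is None. This is a minimal sketch. iter_listing_states is hypothetical, and it takes the list function as an argument (for example, tonita.listings.list) so that it can be exercised without network access. It passes start_listing_id only after the first page, so it makes no assumption about how the client treats an explicit None.

```python
from typing import Callable, Iterator, Optional


def iter_listing_states(list_fn: Callable, corpus_id: str, page_size: int = 1000) -> Iterator[tuple]:
    """Yield (listing_id, state) pairs for every listing, fetching one page at a time."""
    start: Optional[str] = None
    while True:
        kwargs = {"limit": page_size, "corpus_id": corpus_id}
        if start is not None:
            kwargs["start_listing_id"] = start
        page = list_fn(**kwargs)
        yield from page.results.items()
        if page.next_listing_id is None:
            break  # no listings remain after this page
        start = page.next_listing_id


# Hypothetical usage with the client:
# for listing_id, state in iter_listing_states(tonita.listings.list, "my_corpus_id", page_size=2):
#     print(listing_id, state)
```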

Viewing

You can view the data for a batch of listings by calling tonita.listings.get(). Suppose, for example, that we added two listings with the following data:

{
    "h801923vs": {
        "data": {
            "bedrooms": 0,
            "bathrooms": 1,
            "price": 200
        },
        "categories": ["apartment", "co-op"]
    },
    "h662852lg": {
        "data": {
            "bedrooms": 3,
            "bathrooms": 2,
            "price": 4000
        }
    }
}

tonita.listings.get(
    listing_ids=["h801923vs", "h662852lg"], 
    corpus_id="my_corpus_id"
)

# Example return value:
# GetListingsResponse(
#     results={
#         "h801923vs": GetSingleListingResult(
#             success=True,
#             data={
#                 "categories": ["apartment", "co-op"],
#                 "data": {"bedrooms": 0, "bathrooms": 1, "price": 200},
#             },
#             state=<State.ACTIVE: 'ACTIVE'>,
#             seconds_to_expiration=None,
#             error_message="",
#         ),
#         "h662852lg": GetSingleListingResult(
#             success=True,
#             data={
#                 "categories": None,
#                 "data": {"bedrooms": 3, "bathrooms": 2, "price": 4000},
#             },
#             state=<State.ACTIVE: 'ACTIVE'>,
#             seconds_to_expiration=None,
#             error_message="",
#         )
#     }
# )

The return value will be a GetListingsResponse whose results field is a dictionary mapping each requested listing ID to a GetSingleListingResult. Each GetSingleListingResult contains information about a single listing and has the following fields:

success: A boolean with value True if this listing was successfully retrieved. This is False if the listing does not exist for this corpus, or if there was an error in retrieving it.
data: A dict with two keys, "data" and "categories", whose values are exactly those provided for this listing when it was added to the corpus ("categories" is None if none were provided). This is None if success is False.
state: The state of the listing. This will be None if success is False.
seconds_to_expiration: If state indicates that the listing is inactive, this value gives the time (in seconds) remaining before the listing expires and becomes unrecoverable. For active listings, this is None.
error_message: If there was an issue with retrieving the data for the listing, this field will contain an error message explaining what went wrong.

Deleting and recovering

Listings can be deleted in batches. Suppose we have the following listings:

tonita.listings.list(corpus_id="my_corpus_id")

# Example return value:
# ListListingsResponse(
#     results={
#         'listing_1': <State.ACTIVE: 'ACTIVE'>, 
#         'listing_2': <State.ACTIVE: 'ACTIVE'>,
#         'listing_3': <State.ACTIVE: 'ACTIVE'>
#     },
#     next_listing_id=None
# )

Let’s delete the first two listings:

tonita.listings.delete(
    listing_ids=["listing_1", "listing_2"],
    corpus_id="my_corpus_id"
)

The call returns a DeleteListingsResponse. For a successful deletion, it looks like this:

# Example return value:
# DeleteListingsResponse(
#     results={
#         'listing_1': DeleteSingleListingResult(
#             success=True, 
#             error_message=''
#         ),
#         'listing_2': DeleteSingleListingResult(
#             success=True, 
#             error_message=''
#         )
#     }
# )

The sole field of the returned DeleteListingsResponse is results, a dictionary that maps the ID of each listing for which deletion was attempted to a DeleteSingleListingResult. Each DeleteSingleListingResult has two fields:

success: Indicates whether the deletion was successful;
error_message: If the deletion failed, the value of this field will contain a message explaining why.

Strictly speaking, however, calling tonita.listings.delete() only schedules the listings for deletion by making them inactive. Inactive listings remain recoverable for seven (7) days after they first become inactive. We can see the new states by calling tonita.listings.list():

tonita.listings.list(corpus_id="my_corpus_id")

# Example return value:
# ListListingsResponse(
#     results={
#         'listing_1': <State.INACTIVE: 'INACTIVE'>, 
#         'listing_2': <State.INACTIVE: 'INACTIVE'>,
#         'listing_3': <State.ACTIVE: 'ACTIVE'>
#     },
#     next_listing_id=None
# )

Looking at the data for one of these newly inactive listings, we can see how much time it has left before it expires and becomes unrecoverable:

tonita.listings.get(listing_ids=["listing_1"], corpus_id="my_corpus_id")

# Example return value:
# GetListingsResponse(
#     results={
#         "listing_1": GetSingleListingResult(
#             success=True,
#             data={"categories": None, "data": {"foo": "bar"}},
#             state=<State.INACTIVE: 'INACTIVE'>,
#             seconds_to_expiration=604787.02,
#             error_message="",
#         )
#     }
# )
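The value of seconds_to_expiration is easier to read as a duration. This minimal sketch uses datetime.timedelta from the standard library. describe_expiration is a hypothetical helper, and it rounds to whole seconds.

```python
from datetime import timedelta
from typing import Optional


def describe_expiration(seconds_to_expiration: Optional[float]) -> Optional[str]:
    """Render seconds_to_expiration as a readable duration; None (active listing) stays None."""
    if seconds_to_expiration is None:
        return None
    return str(timedelta(seconds=round(seconds_to_expiration)))


print(describe_expiration(604787.02))  # 6 days, 23:59:47
```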

To recover a batch of listings (i.e., make listings active), call tonita.listings.recover():

tonita.listings.recover(
    listing_ids=["listing_1", "listing_2"],
    corpus_id="my_corpus_id"
)

The return value, RecoverListingsResponse, is similar in schema to the DeleteListingsResponse above:

# Example return value:
# RecoverListingsResponse(
#     results={
#         'listing_1': RecoverSingleListingResult(
#             success=True, 
#             error_message=''
#         ),
#         'listing_2': RecoverSingleListingResult(
#             success=True, 
#             error_message=''
#         )
#     }
# )
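To recover everything that is currently inactive, you can filter the states returned by tonita.listings.list(). The sketch below compares each state’s value to "INACTIVE". This assumes the library’s State is a standard Enum, as its repr <State.INACTIVE: 'INACTIVE'> suggests. The State class here is a hypothetical stand-in so the code runs on its own, and inactive_ids is a hypothetical helper. A single list() call returns at most 1000 listings by default, so larger corpora need to be paged through with start_listing_id.

```python
from enum import Enum


class State(Enum):
    """Hypothetical stand-in mirroring the reprs shown in this guide."""
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"


def inactive_ids(states: dict) -> list[str]:
    """Return the IDs whose state value is "INACTIVE", i.e. candidates for recovery."""
    return [listing_id for listing_id, state in states.items() if state.value == "INACTIVE"]


# Possible use with the real client (not executed here):
# response = tonita.listings.list(corpus_id="my_corpus_id")
# tonita.listings.recover(listing_ids=inactive_ids(response.results), corpus_id="my_corpus_id")
```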

If a listing is continuously inactive for seven (7) days, however, it cannot be recovered. Trying to do so will raise a TonitaBadRequestError since the listing no longer exists.

Attention

It might take some time for an expired listing to be removed completely from our databases. Therefore, the ID of a recently expired listing may not be immediately available to re-use.