Retail Search best practices for high performance: Product catalog best practices

Shrish_marnad


Many of us already know Google Search, which provides relevant results with low latency while accommodating the user’s personal preferences and history. The results are relevant enough to each individual that users rarely browse to the next page of results. Retail Search is a Google Cloud service that gives retailers similar Google Search-style capabilities over their own products.

When onboarding onto Retail Search, the primary driver for quality search results and performance is the data. Retail Search performance (relevancy, ranking, and revenue optimization) is extremely sensitive to the data that's uploaded, including catalog and product info, user events, etc.

We have multiple dashboards and data quality checks in place to notify us of any issues or potential flaws in the data or its formatting. If these are overlooked, the model will not be trained accurately, and if you start an A/B test, Retail Search is not guaranteed to perform or give the expected outcome. It would seem like Retail Search is not working as expected, while in fact the issue is almost always something to do with the catalog or user events data.

To make this process easier, we've put together a list of best practices for onboarding data to Retail Search. In this blog, we'll cover product catalog best practices. Use the links below to jump to the other articles in this series covering best practices for user events, integrations and configurations, and A/B experiments.

  1. Product catalog best practices (you're here!)
  2. User events best practices
  3. Integration and configuration best practices
  4. A/B experiments best practices

Retail Search Product Catalog best practices

1. Product structure

There are three product-level types:

  • Primaries can be individual (SKU-level) items or groups of similar items (SKU groups).
  • Variant items are versions of a SKU-group primary product. Variants can only be individual (SKU-level) items. For example, if the primary product is "V-neck shirt", variants could be "Brown V-neck shirt, size XL" and "White V-neck shirt, size S". Primaries and variants are sometimes described as parent and child items.
  • Collection items are bundles of primary products or variant products. For example, a collection might be a jewelry set with a necklace, earrings, and ring. Collections are only available in Retail Search and are not widely used.

Using the three product-level types, there are three main product classification hierarchies:

  1. primary - variant
  2. primary only
  3. collections

In the primary - variant structure, the primary is almost always only a placeholder of (common) information and the variants are the actual SKUs which can be purchased. For example: 

  • A t-shirt, as the primary product, can include the brand, common attributes, description, etc., and the variants have the differentiating info within the primary product, such as color, size, price, etc. If the primary product is "V-neck shirt", variants could be "Brown V-neck shirt, size XL" and "White V-neck shirt, size S".
  • A phone, as the primary product, can include the common information about the phone features, edition, brand, etc., and the variants have the differentiating info within the primary product, such as the specific RAM, screen size, battery capacity, etc. If the primary product is "Pixel 8 Pro", variants could be "Obsidian Pixel 8 Pro, 256 GB" and "Porcelain Pixel 8 Pro, 512 GB".

In primary-only product catalog hierarchies, the primary product itself is the sellable SKU. Examples include a unique pair of headphones that comes in only one color, or a specific design of jewelry item. These items cannot be part of any primary - variant structure.
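To make the two hierarchies concrete, here is a minimal Python sketch that models a few catalog records as plain dictionaries and groups variants under their primaries. The field names (id, type, primaryProductId, title) are assumed to mirror the Product schema, while the product IDs and the group_variants helper are hypothetical and exist only for illustration.

```python
# Sample records. Field names are assumed to mirror the Product schema;
# the IDs and titles are made up for illustration.
catalog = [
    {"id": "vneck", "type": "PRIMARY", "title": "V-neck shirt"},
    {"id": "vneck-brown-xl", "type": "VARIANT", "primaryProductId": "vneck",
     "title": "Brown V-neck shirt, size XL"},
    {"id": "vneck-white-s", "type": "VARIANT", "primaryProductId": "vneck",
     "title": "White V-neck shirt, size S"},
    # Primary-only item: the primary itself is the sellable SKU.
    {"id": "headphones-1", "type": "PRIMARY", "title": "Studio headphones"},
]


def group_variants(products):
    """Return {primary_id: [variant_id, ...]} for every PRIMARY product."""
    groups = {p["id"]: [] for p in products if p["type"] == "PRIMARY"}
    for p in products:
        if p["type"] == "VARIANT":
            parent = p.get("primaryProductId")
            if parent not in groups:
                # A variant that points at a missing primary is a data error.
                raise ValueError(f"variant {p['id']} has no primary {parent!r}")
            groups[parent].append(p["id"])
    return groups


groups = group_variants(catalog)
```

A primary with an empty variant list (here, the headphones) is a primary-only product, while a variant whose primary is missing is reported as an error before upload.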

When planning your catalog hierarchy, you need to decide if your catalog should contain only primaries or primaries and variants. The key point to remember is that prediction and search results only return primary items.

For products that have variants, it's recommended to structure them as primary - variant, as there are multiple advantages, including:

  • The search page will show more diverse results to end users. If the variants were treated as primary products instead, the search results page would fill up with near-identical products.
  • The products will have a richer ranking scheme, as a primary with variants will rank better if a particular variant is getting more engagement. This helps with re-ranking and revenue optimization.
  • The catalog is easier to maintain. If an attribute changes for a group of products that differ only by size, you can change it once at the primary level instead of changing multiple primaries.
  • API features and search response fields such as variant rollup keys and retrievable fields are supported only for variants.
  • The search response contains minimal details for the primary and more details for the variants, so you will need to enrich the response with extra details. Retail Search can return these details if the fields are marked as retrievable.

2. URL correctness

The product.uri field is the canonical URL that links directly to the product detail page. It should be publicly crawlable and not behind any login/auth wall, because the backend crawls the page and derives as much information as possible from it, which is used for relevance and popularity scoring. The backend also determines how the URI was interacted with on the web (e.g. backlinks). It's recommended to use the same top-level domain name across all product URIs.

  • If you have the same product listed on multiple banner sites, then please consider using the multi-entity feature. Please contact the account team about this.
  • If you use a different URL in the product catalog than in the actual site, then please make sure the two URLs refer to the same product and have almost identical information.
  • It's also strongly recommended to not re-use a product URL once a product is deleted. Instead, have unique URLs for each product.
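A quick pre-upload check can catch two of these URL issues: URIs spread across different hosts, and the same URI used by more than one product. The sketch below is a rough approximation that uses only the Python standard library. It compares full hostnames, which is stricter than comparing top-level domain names, and the uri_report helper and sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def uri_report(products):
    """Return (host -> product count, sorted list of URIs used more than once)."""
    hosts = Counter()
    seen = Counter()
    for p in products:
        uri = p.get("uri", "")
        # hostname is None for empty or malformed URIs; count those under "".
        host = (urlsplit(uri).hostname or "").lower()
        hosts[host] += 1
        seen[uri] += 1
    duplicates = sorted(u for u, n in seen.items() if n > 1)
    return dict(hosts), duplicates


# Made-up sample data for illustration.
sample = [
    {"id": "1", "uri": "https://www.example.com/p/1"},
    {"id": "2", "uri": "https://www.example.com/p/2"},
    {"id": "3", "uri": "https://outlet.example.net/p/3"},
    {"id": "4", "uri": "https://www.example.com/p/1"},
]
hosts, duplicates = uri_report(sample)
```

More than one host in the report, or any entry in the duplicates list, is a prompt to review the catalog before import. Neither is proof of an error on its own.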

3. Product availability correctness

The availability field is set by the inventory update system as the product stock state changes. It's recommended to keep track of all the products that are in IN_STOCK and OUT_OF_STOCK state.

If the majority of your products are OUT_OF_STOCK, the search response will contain many out-of-stock (OOS) products, and recall will drop once a filter is added. If a product has gone out of stock but its catalog state is still IN_STOCK, users will see the product as available but will probably face issues at the time of purchase or add to cart. This has more of an effect on the customer experience than on model training. It's recommended to keep the Product.availability field as up-to-date as possible using the patchProduct APIs or the import APIs with an updateMask, so that only the availability field is changed.
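One way to find stale availability values is to compare the catalog against your inventory system before sending updates. The following is a minimal sketch under stated assumptions: stock_by_id is a hypothetical mapping from product ID to units on hand, stale_availability is a made-up helper, and only the IN_STOCK and OUT_OF_STOCK states from the text are considered. It does not call the Retail API. It only computes which records would need an availability update.

```python
TRACKED_STATES = {"IN_STOCK", "OUT_OF_STOCK"}


def stale_availability(products, stock_by_id):
    """Return {product_id: expected_state} where catalog and stock disagree."""
    fixes = {}
    for product in products:
        state = product.get("availability")
        units = stock_by_id.get(product["id"])
        # Skip other states and products with no inventory record.
        if state not in TRACKED_STATES or units is None:
            continue
        expected = "IN_STOCK" if units > 0 else "OUT_OF_STOCK"
        if state != expected:
            fixes[product["id"]] = expected
    return fixes


# Made-up sample data for illustration.
catalog = [
    {"id": "a", "availability": "IN_STOCK"},
    {"id": "b", "availability": "OUT_OF_STOCK"},
    {"id": "c", "availability": "IN_STOCK"},
]
fixes = stale_availability(catalog, {"a": 0, "b": 5, "c": 3})
```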

4. Use native fields when possible instead of custom attributes

The Product info structure/schema can be referenced here. The schema is quite extensive and accounts for a wide range of fields that are normally used, including: brands, audience, materials, size, etc. 


For all other product attributes that are not part of the Product info schema, we recommend using the Product.attributes (custom attributes).

The native Product fields like title, description, brands, etc. have a bigger impact on the searchability and indexability, as compared to the custom attributes.

In other words, the backend understands the native fields much better than the custom attributes, and it takes the native field info into account when optimizing for relevance. Therefore, it's highly recommended to use the native fields (i.e. map your Product information to native fields) as much as possible, and to use custom attributes only when no native field fits.

For example, setting the brands in the Product.brands field has a much higher impact on search and recall, than setting the same info in a custom attribute. For an attribute like “sleeve length,” which is not natively supported, it's recommended to use custom attributes.
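As a rough illustration of this mapping, the sketch below routes keys from a source record into native fields where one exists and places everything else under attributes. The NATIVE_FIELDS set is a partial, illustrative list rather than the full schema, the to_product helper is hypothetical, and the {"text": [...]} / {"numbers": [...]} shape for custom attribute values is assumed from the Product schema.

```python
# Partial, illustrative list of native Product fields (not the full schema).
NATIVE_FIELDS = {"title", "description", "brands", "materials", "sizes"}


def to_product(raw):
    """Map a source record to native fields, with leftovers as custom attributes."""
    product = {}
    attributes = {}
    for key, value in raw.items():
        if key in NATIVE_FIELDS:
            product[key] = value
            continue
        values = value if isinstance(value, list) else [value]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        # Assumed custom attribute shape: numeric values go under "numbers",
        # everything else is stored as text.
        if is_numeric:
            attributes[key] = {"numbers": values}
        else:
            attributes[key] = {"text": [str(v) for v in values]}
    if attributes:
        product["attributes"] = attributes
    return product


product = to_product({
    "title": "V-neck shirt",
    "brands": ["Acme"],          # native field
    "sleeve_length": "short",    # no native equivalent, so a custom attribute
    "weight_grams": 180,
})
```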

5. Importance of the brand field

The brand field in the product info, which is by default searchable, indexable, and facetable, is a strong signal for ranking and relevance. A good percentage of search queries are of the form, “brand query” or “query brand,” and arguably, brand is one of the most heavily used facets.

Click and purchase conversion rates are heavily affected by whether the product has the correct brand. So it's important to populate the brand field with the correct info and, if possible, never leave it blank. Even more detrimental is filling the brand field with placeholders like “NA”, “Not available”, or “Miscellaneous”. This strongly associates the product with the text in the brands field, which can lead to wrong product understanding and bad recall.

If a particular product is not associated with any brand at all, then it's recommended to leave the field empty. But take care that these empty-brand products make up only a small percentage of the catalog.
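A small cleanup pass can strip placeholder brands before upload and report how much of the catalog ends up without a brand. This is a minimal sketch: the PLACEHOLDER_BRANDS list is an assumption that you should extend with the fillers found in your own data, and both helper functions are hypothetical.

```python
# Assumed placeholder values, compared case-insensitively. Extend as needed.
PLACEHOLDER_BRANDS = {"na", "n/a", "not available", "miscellaneous", "none", "unknown"}


def clean_brands(brands):
    """Drop blank and placeholder entries, keeping real brand names."""
    cleaned = []
    for brand in brands:
        name = brand.strip()
        if name and name.lower() not in PLACEHOLDER_BRANDS:
            cleaned.append(name)
    return cleaned


def empty_brand_share(products):
    """Fraction of products with no usable brand after cleaning."""
    if not products:
        return 0.0
    empty = sum(1 for p in products if not clean_brands(p.get("brands", [])))
    return empty / len(products)
```

For example, clean_brands(["NA", " Acme "]) keeps only "Acme". If empty_brand_share returns a large fraction, that suggests the source data needs fixing rather than the field being legitimately empty.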

6. Importance of the audience field

The audience info field of the Product info has two subfields: Audience.gender and Audience.ageGroup. It is highly recommended to fill these fields with the appropriate data, which will help the model understand the product's intended audience.

This will play a big part when personalization is enabled. Having gender and ageGroup will help segment the products better and will help the model to recall the right product for the appropriate user.

The Audience data is also helpful for queries like “shirts for women” or “mens socks”. With the audience info populated, product understanding is much better and the model is better able to recall the right products for gender-specific queries.
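To see how complete the audience data is, you can list the products that are missing either subfield. In this sketch the JSON key names genders and ageGroups are an assumption about how the audience info is represented in your export, so they are passed as parameters. The missing_audience helper is hypothetical.

```python
def missing_audience(products, gender_key="genders", age_key="ageGroups"):
    """Return {product_id: [missing audience keys]} for incomplete products.

    The default key names are assumptions; adjust them to match your schema.
    """
    missing = {}
    for product in products:
        audience = product.get("audience") or {}
        gaps = [key for key in (gender_key, age_key) if not audience.get(key)]
        if gaps:
            missing[product["id"]] = gaps
    return missing


# Made-up sample data for illustration.
sample = [
    {"id": "a", "audience": {"genders": ["female"], "ageGroups": ["adult"]}},
    {"id": "b", "audience": {"genders": ["male"]}},
    {"id": "c"},
]
gaps = missing_audience(sample)
```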

7. Look for products with duplicate titles

The Product.title is probably the most important field, as most of the search queries would have a huge overlap with what is set as the Product.title. It's probably the first information that the end users would see and interact with in the Detail page view, so it's good practice to keep the Product.title unique and have text information that is most relevant to the Product.

Having two primary products with the same title affects the searchability and relevance of the returned results. If there are two separate primary products with significant differences, it's recommended to keep the titles different. If the products are the same but differ only in a few aspects like color, size, etc., then it's recommended to structure the products as PRIMARY and VARIANT types.
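Duplicate titles are easy to detect before upload. The sketch below groups primary products by a normalized title (lowercased, with whitespace collapsed) and returns only the titles shared by more than one primary. The normalization rule and the duplicate_primary_titles helper are assumptions made for illustration, and variants are ignored because they are expected to resemble their primary.

```python
from collections import defaultdict


def duplicate_primary_titles(products):
    """Return {normalized_title: [primary ids]} for titles used more than once."""
    by_title = defaultdict(list)
    for product in products:
        # Products without an explicit type are treated as primaries here.
        if product.get("type", "PRIMARY") != "PRIMARY":
            continue
        key = " ".join(product.get("title", "").lower().split())
        by_title[key].append(product["id"])
    return {title: ids for title, ids in by_title.items() if len(ids) > 1}


# Made-up sample data for illustration.
sample = [
    {"id": "p1", "type": "PRIMARY", "title": "V-neck Shirt"},
    {"id": "p2", "type": "PRIMARY", "title": "v-neck  shirt"},
    {"id": "v1", "type": "VARIANT", "title": "V-neck shirt"},
    {"id": "p3", "type": "PRIMARY", "title": "Crew-neck shirt"},
]
duplicates = duplicate_primary_titles(sample)
```

Each group in the result is a candidate either for more distinctive titles or for restructuring into one primary with variants.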

8. Language settings

Retail Search supports multiple languages. More info here. The main thing to note is that the catalog and search query need to be in the same language. There's no cross-language translation of query or catalog info. For example, if your catalog is in Spanish, the search query also needs to be in Spanish.

So, it's important to mark the language code in the product info accordingly, otherwise it will default to English (en-US). This is important for search controls like “spellCorrectionSpec”, where if the language is not set, it will lead to undesired behavior. This is also extremely important for query intent understanding.
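Because an unset language code silently falls back to the default, it can help to list the products whose language code is missing or differs from the language you expect for the catalog. This is a minimal sketch that assumes the field is exported under the key languageCode. The language_mismatches helper is hypothetical.

```python
def language_mismatches(products, expected):
    """Return ids of products whose languageCode is unset or not `expected`."""
    wanted = expected.lower()
    return [
        product["id"]
        for product in products
        if (product.get("languageCode") or "").lower() != wanted
    ]


# Made-up sample data for a Spanish-language catalog.
sample = [
    {"id": "a", "languageCode": "es-ES"},
    {"id": "b"},                           # unset, so it would take the default
    {"id": "c", "languageCode": "en-US"},
]
mismatched = language_mismatches(sample, "es-ES")
```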

9. Price info settings

The Product.priceInfo field needs to be as accurate and complete as possible. This price info is used to derive discount-related signals and is used in revenue optimization. This is particularly important for Browse queries.

For a primary-variant product structure, it's recommended to populate the price of at least one of the variants.

For a product that doesn't have product-level pricing, where all the pricing is in the local inventory (i.e. the search is always tied to a local inventory), it's recommended to set the product-level price info to the median of all the inventory-level prices.
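The median fallback can be computed with the Python standard library. In this sketch the local prices are passed in as a plain list, the currencyCode and price keys are assumed to match the priceInfo structure, and the product_level_price helper is hypothetical.

```python
from statistics import median


def product_level_price(local_prices, currency_code):
    """Build a product-level price entry from local inventory prices."""
    if not local_prices:
        raise ValueError("at least one local inventory price is required")
    # With an even number of prices, median() averages the two middle values.
    return {"currencyCode": currency_code, "price": median(local_prices)}


# Made-up prices from three stores.
price_info = product_level_price([19.99, 24.99, 21.49], "USD")
```

With these three store prices, the product-level price is the middle value, 21.49.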


In this blog, we covered product catalog best practices for high performance with Retail Search. Use the links below to jump to the other articles in this series covering best practices for user events, integrations and configurations, and A/B experiments.

  1. Product catalog best practices (you're here!)
  2. User events best practices
  3. Integration and configuration best practices
  4. A/B experiments best practices