Retail Search best practices for high performance: A/B experiments best practices

Retail Search is a Google Cloud service that lets retailers offer Google-quality search capabilities over their own product catalogs.

When onboarding onto Retail Search, the primary driver for quality search results and performance is the data. Retail Search performance (relevancy, ranking, and revenue optimization) is extremely sensitive to the data that's uploaded.

To help ensure retailers are utilizing Retail Search effectively, we've put together a list of best practices for onboarding data to Retail Search. In this blog, we cover A/B experiments best practices. Click the links below to jump to the other articles in this series, covering best practices for product catalog, user events, and integrations and configurations.

  1. Product catalog best practices 
  2. User events best practices 
  3. Integration and configuration best practices 
  4. A/B experiments best practices (you're here!)

A/B experiments best practices

1. Experiment id mapping recommendation

Experiment ids are used for A/B testing, where you can compare Retail Search against an existing search solution. They can also be used to run experiments on a site that has fully adopted Retail Search, where a new serving config, control, or boost spec needs to be tested against a control group.

The experiment ids field on user events is an array, which allows for more granular measurement. Consider the following use case:

  • Retail Search performance needs to be compared against a control group
  • The overall performance needs to be measured
  • Mobile-only performance also needs to be measured
  • Desktop-only performance also needs to be measured
  • Search and Recommendations performance needs to be measured separately as well

To achieve such granular and sliced measurements, we need a total of 10 experiment ids, and each event carries the subset that applies to it. For example, a search event from a mobile user in the control group would carry Control, Control_mobile, and Control_search in its experiment ids array.

Experiment ids for Control group of events | Experiment ids for Test (Retail Search) group of events | Scope of user events
Control | Google | All events
Control_mobile | Google_mobile | All mobile events
Control_desktop | Google_desktop | All desktop events
Control_search | Google_search | All search and related events
Control_recommendations | Google_recommendations | All recs and related events

If we want to measure the overall performance, we compare the metrics derived from events tagged Control with those derived from events tagged Google (Retail Search). If we want to measure mobile search performance, we compare the metrics derived from events tagged with both Control_mobile and Control_search against those tagged with both Google_mobile and Google_search.
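As a concrete sketch of how these ids could be attached to events, the snippet below builds the experiment ids array for a single search event and writes it with the Retail API Python client (google-cloud-retail). The build_experiment_ids helper and the bucket/device values are illustrative conventions, not part of the API; experiment_ids itself is a standard field on UserEvent.

```python
from google.cloud import retail_v2


def build_experiment_ids(bucket: str, device: str, surface: str) -> list[str]:
    """Return the experiment ids that apply to one event.

    bucket:  "Control" or "Google" (test), per the table above.
    device:  "mobile" or "desktop".
    surface: "search" or "recommendations".
    """
    return [bucket, f"{bucket}_{device}", f"{bucket}_{surface}"]


def write_search_event(project_id: str, visitor_id: str, query: str,
                       bucket: str, device: str) -> None:
    """Write one search user event tagged with its experiment ids."""
    client = retail_v2.UserEventServiceClient()
    event = retail_v2.UserEvent(
        event_type="search",
        visitor_id=visitor_id,
        search_query=query,
        # e.g. ["Control", "Control_mobile", "Control_search"]
        experiment_ids=build_experiment_ids(bucket, device, "search"),
    )
    request = retail_v2.WriteUserEventRequest(
        parent=f"projects/{project_id}/locations/global/catalogs/default_catalog",
        user_event=event,
    )
    client.write_user_event(request=request)
```

The same helper can be reused for recommendations events by passing "recommendations" as the surface, so every event carries the overall, device, and surface ids consistently.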

2. Category hierarchy

Make sure the same products have the same category hierarchy in the control and the test. If, for example, a t-shirt product on the control site has the category hierarchy [clothing > mens > tops > t shirts] while the same product sits under a different hierarchy on the test side, such as [mens > popular > tops], the two sites will return different search results and different category facets. This issue also affects the browse experience, because the page_category is the input to the browse call (along with filters).
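To illustrate why this matters for browse, here is a minimal sketch, assuming the Retail API Python client (google-cloud-retail), of a browse-style search call where the page's category hierarchy is passed in page_categories. The project id, serving config path, and visitor id are placeholders. If the control and test catalogs attach different hierarchies to the same products, the two sites issue different browse requests and therefore see different results and facets.

```python
from google.cloud import retail_v2

client = retail_v2.SearchServiceClient()
request = retail_v2.SearchRequest(
    placement=(
        "projects/PROJECT_ID/locations/global/catalogs/default_catalog"
        "/servingConfigs/default_search"  # PROJECT_ID is a placeholder
    ),
    visitor_id="visitor-123",
    query="",  # an empty query makes this a browse request
    # Category hierarchy of the page, e.g. the control site's t-shirt page:
    page_categories=["clothing > mens > tops > t shirts"],
)
response = client.search(request=request)
```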

3. UX parity before A/B testing

When preparing the site for A/B testing, before real user search/recommendations traffic is served to Retail Search (with correct experiment id mapping), it's important to ensure UI/UX parity between the ecommerce site running the control/legacy search backend and the site running the Retail Search backend.

Consider conducting a UX parity test that focuses only on the non-relevance and non-search-related aspects.

Given a search query, compare the search results pages from the Control search backend and the Retail Search backend. Some things to test for include:

Are the same number of facets showing up? If not, review the facet specs and attribute settings in Retail Search. This matters because facets help users filter and navigate from the initial search results to the desired product: better, more meaningful facets mean users take less time to find what they want, while poor facets lead to more clicks and scrolling, which hampers the search experience, hurts conversion and click-through rates, and can even result in search abandonment. Keeping the facets similar between the Control and Test sites ensures neither side gives users an unfair advantage when searching for products.
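As one way to review this, the sketch below (again assuming the google-cloud-retail Python client; the facet keys "brands" and "attributes.color" are illustrative and should match the attributes configured in your own catalog) requests facets explicitly via facet_specs and prints what comes back, so the facet sets exposed by the Control and Test sites can be compared side by side.

```python
from google.cloud import retail_v2

facet_specs = [
    retail_v2.SearchRequest.FacetSpec(
        facet_key=retail_v2.SearchRequest.FacetSpec.FacetKey(key="brands"),
        limit=20,
    ),
    retail_v2.SearchRequest.FacetSpec(
        facet_key=retail_v2.SearchRequest.FacetSpec.FacetKey(key="attributes.color"),
    ),
]

request = retail_v2.SearchRequest(
    placement=(
        "projects/PROJECT_ID/locations/global/catalogs/default_catalog"
        "/servingConfigs/default_search"  # PROJECT_ID is a placeholder
    ),
    visitor_id="visitor-123",
    query="t shirt",
    facet_specs=facet_specs,
)
response = retail_v2.SearchServiceClient().search(request=request)

# Facets are returned alongside the search results.
for facet in response.facets:
    print(facet.key, [v.value for v in facet.values])
```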

Sponsored product placements in search results are a common feature on many ecommerce sites, and these sponsored products are usually not part of the organic search results. Take care to ensure that the placement and the sponsored products shown on the search results page are nearly the same, if not identical, between the Control and Test sites. Otherwise, noise gets added to the revenue performance measurements, and the more the sponsored products differ between the two sites, the larger that noise can be.

Other miscellaneous UI aspects to consider:

  • Are the price and discount information the same between the Control and Test sites?
  • Is autocomplete suggesting the same completions for the search query?
  • Is the facet value ordering the same?
  • Are the products listed in the same style (list or grid), etc.?

In this blog, we covered A/B experiments best practices. Click the links below to jump to the other articles in this series, covering best practices for product catalog, user events, and integrations and configurations.

  1. Product catalog best practices 
  2. User events best practices 
  3. Integration and configuration best practices 
  4. A/B experiments best practices (you're here!)