Retail Search best practices for high performance: A/B experiments best practices

Retail Search is a Google Cloud service that lets retailers offer Google-quality search capabilities over their own product catalogs.

When onboarding onto Retail Search, the primary driver for quality search results and performance is the data. Retail Search performance (relevancy, ranking, and revenue optimization) is extremely sensitive to the data that's uploaded.

To help ensure retailers are utilizing Retail Search effectively, we've put together a list of best practices for onboarding data to Retail Search. In this blog, we cover A/B experiments best practices. Click the links below to jump to the other articles in this series, covering best practices for product catalog, user events, and integrations and configurations.

  1. Product catalog best practices 
  2. User events best practices 
  3. Integration and configuration best practices 
  4. A/B experiments best practices (you're here!)

A/B experiments best practices

1. Experiment id mapping recommendation

Experiment ids are used for A/B testing, where you can compare Retail Search against an existing search solution. They can also be used to run experiments on a site that has fully adopted Retail Search, where a new serving config, control, or boost spec needs to be tested against a control group.

The experiment ids field on user events is an array, which allows for more granular measurement. Consider the following use case:

  • Retail Search performance needs to be compared against a control group
  • The overall performance needs to be measured
  • Mobile-only performance also needs to be measured
  • Desktop-only performance also needs to be measured
  • Search and Recommendations performance needs to be measured separately as well

To achieve such granular and sliced measurements, we need a total of 10 experiment ids, and each event carries the subset that applies to it. For example, a search event from a mobile user in the control group would carry Control, Control_mobile, and Control_search in its experiment ids array.

Experiment ids for Control group of events | Experiment ids for Test (Retail Search) group of events | Scope of user events
Control | Google | All events
Control_mobile | Google_mobile | All mobile events
Control_desktop | Google_desktop | All desktop events
Control_search | Google_search | All search and related events
Control_recommendations | Google_recommendations | All recs and related events

If we want to measure the overall performance, we compare the metrics derived from events tagged Control with those derived from events tagged Google (Retail Search). If we want to measure mobile search performance, we compare the metrics derived from events tagged with both Control_mobile and Control_search against those tagged with both Google_mobile and Google_search.
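As a concrete sketch of how these ids could be attached to events, the snippet below builds the experiment ids array for a single search event and writes it with the Retail API Python client (google-cloud-retail). The build_experiment_ids helper and the bucket/device values are illustrative conventions, not part of the API; experiment_ids itself is a standard field on UserEvent.

```python
from google.cloud import retail_v2


def build_experiment_ids(bucket: str, device: str, surface: str) -> list[str]:
    """Return the experiment ids that apply to one event.

    bucket:  "Control" or "Google" (test), per the table above.
    device:  "mobile" or "desktop".
    surface: "search" or "recommendations".
    """
    return [bucket, f"{bucket}_{device}", f"{bucket}_{surface}"]


def write_search_event(project_id: str, visitor_id: str, query: str,
                       bucket: str, device: str) -> None:
    """Write one search user event tagged with its experiment ids."""
    client = retail_v2.UserEventServiceClient()
    event = retail_v2.UserEvent(
        event_type="search",
        visitor_id=visitor_id,
        search_query=query,
        # e.g. ["Control", "Control_mobile", "Control_search"]
        experiment_ids=build_experiment_ids(bucket, device, "search"),
    )
    request = retail_v2.WriteUserEventRequest(
        parent=f"projects/{project_id}/locations/global/catalogs/default_catalog",
        user_event=event,
    )
    client.write_user_event(request=request)
```

The same helper can be reused for recommendations events by passing "recommendations" as the surface, so every event carries the overall, device, and surface ids consistently.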

2. Category hierarchy

Make sure the same products have the same category hierarchy in the control and the test. If, for example, a t-shirt product on the control site has the category hierarchy [clothing > mens > tops > t shirts] while the same product sits under a different hierarchy on the test side, such as [mens > popular > tops], the two sites will return different search results and different category facets. This issue also affects the browse experience, because the page_category is the input to the browse call (along with filters).
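To illustrate why this matters for browse, here is a minimal sketch, assuming the Retail API Python client (google-cloud-retail), of a browse-style search call where the page's category hierarchy is passed in page_categories. The project id, serving config path, and visitor id are placeholders. If the control and test catalogs attach different hierarchies to the same products, the two sites issue different browse requests and therefore see different results and facets.

```python
from google.cloud import retail_v2

client = retail_v2.SearchServiceClient()
request = retail_v2.SearchRequest(
    placement=(
        "projects/PROJECT_ID/locations/global/catalogs/default_catalog"
        "/servingConfigs/default_search"  # PROJECT_ID is a placeholder
    ),
    visitor_id="visitor-123",
    query="",  # an empty query makes this a browse request
    # Category hierarchy of the page, e.g. the control site's t-shirt page:
    page_categories=["clothing > mens > tops > t shirts"],
)
response = client.search(request=request)
```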

3. UX parity before A/B testing

When preparing the site for A/B testing, before real user search/recommendations traffic is served to Retail Search (with correct experiment id mapping), it's important to ensure UI/UX parity between the ecommerce site running the control/legacy search backend and the site running the Retail Search backend.

Consider conducting a UX parity test that focuses only on the non-relevance and non-search-related aspects.

Given a search query, compare the search results pages from the Control search backend and the Retail Search backend. Some things to test for include:

Are the same number of facets showing up? If not, review the facet specs and attribute settings in Retail Search. This matters because facets help users filter and navigate from the initial search results to the desired product: better, more meaningful facets mean users take less time to find what they want, while poor facets lead to more clicks and scrolling, which hampers the search experience, hurts conversion and click-through rates, and can even result in search abandonment. Keeping the facets similar between the Control and Test sites ensures neither side gives users an unfair advantage when searching for products.
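As one way to review this, the sketch below (again assuming the google-cloud-retail Python client; the facet keys "brands" and "attributes.color" are illustrative and should match the attributes configured in your own catalog) requests facets explicitly via facet_specs and prints what comes back, so the facet sets exposed by the Control and Test sites can be compared side by side.

```python
from google.cloud import retail_v2

facet_specs = [
    retail_v2.SearchRequest.FacetSpec(
        facet_key=retail_v2.SearchRequest.FacetSpec.FacetKey(key="brands"),
        limit=20,
    ),
    retail_v2.SearchRequest.FacetSpec(
        facet_key=retail_v2.SearchRequest.FacetSpec.FacetKey(key="attributes.color"),
    ),
]

request = retail_v2.SearchRequest(
    placement=(
        "projects/PROJECT_ID/locations/global/catalogs/default_catalog"
        "/servingConfigs/default_search"  # PROJECT_ID is a placeholder
    ),
    visitor_id="visitor-123",
    query="t shirt",
    facet_specs=facet_specs,
)
response = retail_v2.SearchServiceClient().search(request=request)

# Facets are returned alongside the search results.
for facet in response.facets:
    print(facet.key, [v.value for v in facet.values])
```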

Sponsored product placements in search results are a common feature on many ecommerce sites, and these sponsored products are usually not part of the organic search results. Take care to ensure that the placement and the sponsored products shown on the search results page are nearly the same, if not identical, between the Control and Test sites. Otherwise, noise gets added to the revenue performance measurements, and the more the sponsored products differ between the two sites, the larger that noise can be.

Other miscellaneous UI aspects to consider:

  • Are the price and discount information the same between the Control and Test sites?
  • Is autocomplete suggesting the same completions for the search query?
  • Is the facet value ordering the same?
  • Are the products listed in the same style (list or grid), etc.?

In this blog, we covered A/B experiments best practices. Click the links below to jump to the other articles in this series, covering best practices for product catalog, user events, and integrations and configurations.

  1. Product catalog best practices 
  2. User events best practices 
  3. Integration and configuration best practices 
  4. A/B experiments best practices (you're here!)