Retail Search is a Google Cloud service that gives retailers Google Search-quality capabilities over their own product catalogs.
When onboarding onto Retail Search, the primary driver for quality search results and performance is the data. Retail Search performance (relevancy, ranking, and revenue optimization) is extremely sensitive to the data that's uploaded.
To help ensure retailers are utilizing Retail Search effectively, we've put together a list of best practices for onboarding data to Retail Search. In this blog, we cover A/B experiment best practices. Use the links below to jump to the other articles in this series, which cover best practices for product catalog, user events, and integrations and configurations.
Experiment ids are used for A/B testing, where you compare Retail Search against an existing search solution. They can also be used on a site that has fully adopted Retail Search, to test a new serving config, control, boost spec, etc. against a control group.
The experiment ids field in user events is an array, which allows for more granular measurement. Consider a use case where, in addition to overall performance, we want to measure mobile, desktop, search, and recommendations performance separately. To achieve such granular, sliced measurements, we might need a total of 10 experiment ids, four of which need to be sent in the event's experiment ids array for every event.
| Experiment ids for Control group of events | Experiment ids for Test (Retail Search) group of events | Scope of user events |
| --- | --- | --- |
| Control | Google | All events |
| Control_mobile | Google_mobile | All mobile events |
| Control_desktop | Google_desktop | All desktop events |
| Control_search | Google_search | All search and related events |
| Control_recommendations | Google_recommendations | All recs and related events |
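As a sketch of how an individual event might be tagged under this scheme (the helper, visitor id, and event dict shape are illustrative; Retail Search user events carry the ids in an `experimentIds` array field):

```python
def experiment_ids(group: str, device: str, surface: str) -> list:
    """Build the experiment ids for one user event.

    group:   "Control" or "Google" (the Retail Search test group)
    device:  "mobile" or "desktop"
    surface: "search" or "recommendations"
    """
    return [group, f"{group}_{device}", f"{group}_{surface}"]

# A mobile search event served by Retail Search would carry at least
# these ids (plus any other ids applicable to the event):
event = {
    "eventType": "search",
    "visitorId": "visitor-123",  # hypothetical visitor id
    "experimentIds": experiment_ids("Google", "mobile", "search"),
}
```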
If we want to measure overall performance, we compare the metrics derived from events with experiment ids Control and Google (Retail Search). If we want to measure mobile search performance, we compare the metrics derived from events with experiment ids Control_mobile + Control_search against those with Google_mobile + Google_search.
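A minimal sketch of that slicing, assuming "Control_mobile + Control_search" means events tagged with both ids (the event log and revenue values are hypothetical):

```python
# Hypothetical minimal event log: (experiment ids, revenue) pairs.
events = [
    (["Control", "Control_mobile", "Control_search"], 10.0),
    (["Google", "Google_mobile", "Google_search"], 12.0),
    (["Control", "Control_desktop", "Control_search"], 8.0),
    (["Google", "Google_desktop", "Google_recommendations"], 5.0),
]

def revenue_for(tags, events):
    """Sum revenue over events whose experiment ids contain all of `tags`."""
    return sum(rev for ids, rev in events if set(tags) <= set(ids))

# Overall: Control vs Google (Retail Search).
overall = (revenue_for(["Control"], events), revenue_for(["Google"], events))
# Mobile search slice: Control_mobile + Control_search vs Google_mobile + Google_search.
mobile_search = (
    revenue_for(["Control_mobile", "Control_search"], events),
    revenue_for(["Google_mobile", "Google_search"], events),
)
```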
Make sure the same products have the same category hierarchy in the control and the test. Say a t-shirt product has the category hierarchy [clothing > mens > tops > t shirts] on the control site but a different hierarchy, [mens > popular > tops], on the test site: this will produce different search results and different category facets between the control and the test sites. The issue also affects the browse experience, because page_category is an input to the browse call (along with filters).
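This parity can be checked before the experiment starts; a minimal sketch, assuming each catalog export is available as a mapping from product id to its list of category paths:

```python
def hierarchy_mismatches(control_catalog, test_catalog):
    """Return ids of products whose category hierarchies differ
    between the two catalogs (each: product id -> list of category paths)."""
    shared = control_catalog.keys() & test_catalog.keys()
    return {pid for pid in shared if control_catalog[pid] != test_catalog[pid]}

# The t-shirt example above would be flagged:
control = {"tshirt-1": ["clothing > mens > tops > t shirts"]}
test = {"tshirt-1": ["mens > popular > tops"]}
mismatched = hierarchy_mismatches(control, test)
```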
When preparing the site for A/B testing, before real user search/recommendations traffic is served to Retail Search (with the correct experiment id mapping), it's important to verify UI/UX parity between the ecommerce site backed by the control/legacy search backend and the site backed by Retail Search.
Consider conducting a UX parity test that focuses only on the non-relevance and non-search related aspects.
Given a search query, between the search result pages for the Control search backend and the Retail Search backend, some things to test for include:
Are the same number of facets showing up? If not, review the facet specs and attribute settings in Retail Search. This matters because facets help users filter and navigate to the desired product from the initial search results: better, more meaningful facets mean users find the desired product faster, while poorer facets mean more clicks and scrolling, which hampers the search experience, hurts conversion and click-through rates, and can lead to search abandonment. Having similar facets on the Control and Test sites ensures neither gives users an unfair advantage when searching for products.
Sponsored product placements in search results are a common feature on many ecommerce sites, and the sponsored products are usually not part of the organic search results. Take care that the placements and the sponsored products shown on the search results page are nearly the same, if not identical, between the Control site and the Test site. Otherwise, noise is added to the revenue performance metrics, and depending on how much the sponsored products differ between the Control and Test sites, that noise can be significant.
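The facet parity check above can be automated per test query; a minimal sketch, where the facet key sets are hypothetical and would in practice be extracted from each backend's search response:

```python
def facet_diff(control_facets, test_facets):
    """Compare the facet keys returned for the same query by the two backends."""
    return {
        "missing_in_test": control_facets - test_facets,
        "extra_in_test": test_facets - control_facets,
    }

# Example: the test backend drops "color" and adds "material".
diff = facet_diff({"brand", "color", "size"}, {"brand", "size", "material"})
```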
Other miscellaneous UI aspects to consider:
In this blog, we covered A/B experiment best practices. Use the links below to jump to the other articles in this series, which cover best practices for product catalog, user events, and integrations and configurations.