What are the best practices for API Design in Multilingual Applications?


Hi All,

I have a question about multilingual and internationalised (localised) applications, and how (if at all) this would affect API design.

If we take the example of a US news site which has a CMS, and we had a basic page structure with the following fields:

  • heading
  • author
  • firstParagraph
  • content

Now, if we want to provide the content in Spanish (language code "es"), what happens at the CMS side is we essentially create the page again with fields in the database specific to this language. So for Spanish, we would create something like:

  • heading_es
  • author_es
  • firstParagraph_es
  • content_es

Now a common way sites determine which language should be used is via URL parameters:

  • website.com/ for the standard (English) version
  • website.com/es/ for Spanish-speaking visitors.

What I'm curious to know is, from an API perspective / from an edge management perspective, for each language, do you need to create an API to manage each version of the content?

  • Articles API -
  • Articles - Spanish API - api.website.com/v1/articles_es/
  • ... API
  • ... - Spanish API

or would you break it down by language?

  • APIs
    • Articles - api.website.com/v1/articles/
    • ...
  • APIs Spanish
    • Articles - api.website.com/v1/articles_es/
    • ...

When calling an API, how do you define which API to call? Do you (or can you?) use some sort of witchcraft on the back end to route on your URL parameters, which could potentially give you something like one of the following:

  • api.website.com/v1/articles/
  • website.com/api/v1/articles/
  • website.com/api/v1/es/articles/
    or
    website.com/api/v1/articles_es/

Or is there some sort of hybrid thing:

  • api.website.com/v1/articles/
  • api.website.com/es/v1/articles_es/
  • website.com/api/es/v1/articles_es/

If it is unclear, please let me know and I can try to explain further, but I didn't want to write a million words.

Cheers

David


Not applicable

Yes, you could embed the language preference in the URL (/en/) or in a query param (?lang=en). But if you want "best practice", then I'd refer to the HTTP/1.1 specification.

RFC 2616 (since superseded by RFC 9110, which keeps both headers) allows for Accept-Language (client/API) and Content-Language (server) headers. When combined, these two headers allow for natural language negotiation.

For example the client (API) would indicate the user preference:

       Accept-Language: da, en-gb, en

This means that client prefers Danish (da), but will accept British English (en-gb) and other types of English (en).

An additional parameter ("q" for quality) is available; whether you support it or not is optional. Care should be exercised to ensure your parsing doesn't break whether "q" is provided or not.

A similar header is available on the server side. It labels the language(s) of the content in the response, which here also indicates what the server supports:

       Content-Language: en, de

In this example, whilst the user preferred Danish (da), the server only supports English and German (de). So the content should be delivered in English.

The server should also return the Content-Language header on a HEAD request, so that the client can easily interrogate the server for supported languages, offer a choice to the user, and then indicate that choice in the Accept-Language header.

As always, refer to the RFC (2616).
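
To make the negotiation above concrete, here is a minimal Python sketch. It is not taken from the RFC or from any particular framework; the function names (parse_accept_language, choose_language) and the default of "en" are assumptions made for illustration. It treats a missing "q" as 1.0 and falls back from a regional tag such as en-gb to its primary language.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a missing "q" means full preference
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # assumption: treat a malformed q as "not acceptable"
        if quality > 0:
            # Sort by descending quality, then by the order the client listed them.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(header, supported, default="en"):
    """Pick the first preferred language that the server can serve.

    `supported` is assumed to hold lowercase tags such as "en" or "de".
    """
    for tag in parse_accept_language(header):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # fall back from "en-gb" to "en"
        if primary in supported:
            return primary
    return default
```

With the headers from the example, choose_language("da, en-gb, en", ["en", "de"]) skips Danish and settles on English through the en-gb fallback.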

Ok, thanks Lee, so to confirm (in layman's terms):

  • so you create / define the language as a header param
  • pass the language value via the header (or multiple languages in a preferred order - da, en, de etc)
  • then you configure your back end server to do some witchcraft which maps the language code from the header (da, en, de etc) to whatever languages are in the back end system
  • and then the back end retrieves that language from the database and returns it as standard content

This means you would only have 1 API which covers all languages, and you would not create an API for each language.

So this means you would maintain your original URI structure for your APIs, i.e.:

  • api.website.com/v1/articles/
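
As a rough sketch of the "witchcraft" step in the list above, assuming the back end stores translations in suffixed fields such as heading_es as described in the original question: the function name localise_page, the BASE_FIELDS tuple and the fallback to the unsuffixed field are all assumptions for illustration, not a description of any real CMS.

```python
# Field names taken from the page structure in the original question.
BASE_FIELDS = ("heading", "author", "firstParagraph", "content")


def localise_page(row, language, default_language="en"):
    """Build a single-language page from a row that holds per-language fields.

    `row` is assumed to be a dict with unsuffixed fields for the default
    language and suffixed fields (e.g. "heading_es") for translations.
    """
    page = {}
    for field in BASE_FIELDS:
        column = field if language == default_language else f"{field}_{language}"
        value = row.get(column)
        if value is None:
            # Assumption: fall back to the default-language text when a
            # translation has not been entered yet.
            value = row.get(field)
        page[field] = value
    return page
```

The single /v1/articles/ endpoint would call something like this with the language picked from the Accept-Language header, so the URI never needs a language suffix.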

Not applicable

I think the answer from @Lee is a good one, but I don't think there is a single right answer to this question.

If you follow Lee's approach, you are really asserting that there is a single article, since there is a single URL. You are implying that the difference between the Spanish, Danish and English versions is really just a question of formatting: the meaning is the same. Ideally, you would be able to prove this with an automatic transformation between them such that, if I fetched the Spanish version, I could accurately produce the Danish version, or at least an equivalent of the Danish version, without going back to the server. For some documents, especially very simple ones, this might be the right model.

In your original approach, you are really asserting that each language version is a document in its own right, with its own URL. There may be a relationship between these documents (they have a common provenance), but they are separate and distinct. Some book translations are considered fine works of literature independent of their foreign-language originals. If your documents are more like this, then you might be better off following your original model.

Note also that you can combine these two approaches. You can have a set of resources at /articles/, and for each GET on one of them with an Accept-Language header, you can either respond with a redirect to the correct language document, or return the correct language document directly and include a Content-Location header with the URL of the one you returned. This last approach is kind of clever, but if you use it you need to be careful about the behavior of internet caches (including the browser itself). You can get the right behavior from a modern cache by including in the response a 'Vary' header that names Accept-Language (Vary: Accept-Language). If you forget to do this, users requesting Danish may get served Spanish out of an intermediate caching proxy.

The need for the Vary header applies to Lee's solution also.
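
A small Python sketch of the combined approach, showing only the status code and headers. The function name negotiated_response, the /articles/<language>/<id> URL layout and the choice of a 302 for the redirect variant are assumptions for illustration; the point is that both variants carry Vary: Accept-Language.

```python
def negotiated_response(article_id, language, body, redirect=False):
    """Return (status, headers, body) for a GET on /articles/<article_id>.

    `language` is the language already chosen from the Accept-Language header.
    """
    # Hypothetical URL of the language-specific document.
    location = f"/articles/{language}/{article_id}"
    if redirect:
        # Option 1: send the client to the language-specific document.
        headers = {"Location": location, "Vary": "Accept-Language"}
        return 302, headers, ""
    # Option 2: return the document directly and say which one it was.
    headers = {
        "Content-Language": language,
        "Content-Location": location,
        "Vary": "Accept-Language",  # keeps caches from serving es to a da request
    }
    return 200, headers, body
```

Either way, a cache that honours Vary will store the Danish and Spanish responses under separate keys.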

Not applicable

Ok, so taking what @Lee and @Martin said, would it make sense to structure your data and implement it like this, for example? (Again, I am a DEL, not a technical person, just trying to understand, so apologies if the code example isn't perfect.)

Scenario: I am a Spanish speaker reading an English news website.

I set my default language (Spanish) in my preferences and then the client would pass this value via:

Accept-Language: es

And then on the server you use / specify the languages available (in the preferred order)

Content-Language: en, es

Now, if I had a data structure like the one below, the server would take the language value passed from the client ("es" in this instance), parse the data and return the appropriate text based on the corresponding language code.

[{
    "articleId": "UUID1",
    "articleElements": [{
        "heading" : [{ // this is an array of "heading" values for each language
            "en": "This is the Article Heading",
            "es": "Este es el Artículo Denominación"
        }],
        "firstParagraph" : [{
            "en": "This is the first paragraph text of the article which is longer than a heading",
            "es": "Esta es la primera de texto de párrafo del artículo que es más largo que un encabezamiento"
        }]
    }]
}, {
    "articleId" : "UUID2",
    ...
}]

Is this a logical structure?

Would you only do this for the content creation (POST / PUT)?

Would it make sense to return all languages as part of the GET, or would you just return the language the client requested?

Is there a better way to do it?

I don't think this would require the use of language code params in the URL. How essential are these?

Maybe it is an SEO question, but would it be considered duplicate content?
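
For the "return only the requested language" option, a minimal Python sketch could look like the following. It assumes each article has the shape of the UUID1 entry in the structure above (every element wrapped in a one-item list holding a language-to-text map); the function name project_article and the fallback to "en" are assumptions for illustration.

```python
def project_article(article, language, fallback="en"):
    """Return a copy of the article that keeps one language per element."""
    elements = {}
    for group in article["articleElements"]:
        for name, translations in group.items():
            # The posted structure wraps each language map in a one-item list.
            texts = translations[0]
            # Assumption: use the fallback language when no translation exists.
            elements[name] = texts.get(language, texts.get(fallback))
    return {"articleId": article["articleId"], "articleElements": elements}
```

Under this sketch the full multi-language structure would only be stored on the server (or sent on POST / PUT), while a GET returns the flattened single-language version.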

Not applicable

Again, there is not a single right answer, but here is what I think I would do in your circumstances. It sounds to me that each of your language versions is created externally by a human translator. This suggests to me that there will be translation errors you will want to be able to fix later. It will also be more convenient if each of the translators can work independently, rather than having to assemble all the translations in the client before uploading to the server. If this is true (only you can decide—this is a question of user scenarios, not technology), I think it is simplest if you make each translation be a separate resource, rather than trying to combine all the translations in a single resource. So, especially for authors/translators, I think it will be easier if you create a hierarchy like:

/articles
  /en
    /article1.html
  /es
    /article1.html
  /da
    /article1.html

Each of these would be created and edited by doing a PUT or POST of a single-language document to the appropriate "folder" (e.g. /articles/en). For readers, you can let them access these documents directly using GET with a URL that includes the language segment (e.g. /articles/es/article1.html). You can also allow readers to access /articles/article1.html (no language segment in the URL). In this case you would use content negotiation as described before to decide which one to return (or redirect to). This is an optional addition. Let me emphasize again that we can't pick the best solution without a good understanding of your usage scenarios, which I'm only guessing at here.
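
A minimal Python sketch of the reader-side routing described above, under assumptions: the SUPPORTED tuple mirrors the three folders in the hierarchy, and the function name resolve_article_path and the default of "en" are made up for illustration. `preferred` stands for the client's languages from Accept-Language, best first.

```python
# Language folders from the hierarchy above.
SUPPORTED = ("en", "es", "da")


def resolve_article_path(path, preferred, default="en"):
    """Map a request path to the language-specific document path.

    Returns None for paths that are not article documents.
    """
    segments = [s for s in path.split("/") if s]
    if len(segments) == 3 and segments[0] == "articles" and segments[1] in SUPPORTED:
        # The URL already names a language, so serve that document as-is.
        return "/" + "/".join(segments)
    if len(segments) == 2 and segments[0] == "articles" and segments[1] not in SUPPORTED:
        # No language segment: pick the first preferred language we have.
        language = next((tag for tag in preferred if tag in SUPPORTED), default)
        return f"/articles/{language}/{segments[1]}"
    return None
```

The returned path could then be used either as a redirect target or as the document to serve directly, as discussed earlier in the thread.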