Browser languages, location and their use in Antifraud systems

29.11.24

Modern browsers support multiple languages for content display, interface localization and user preferences. This functionality, originally aimed at user convenience, has become an important part of antifraud systems. Analyzing language settings allows you to identify possible discrepancies between a user's declared profile and their actual device parameters.

#### What are browser languages?
Browser languages define which language is used to display web pages, the browser interface and other elements. They are represented as a list of preferred languages that the browser sends to the server in the Accept-Language HTTP header. The same information is also available in JavaScript through the navigator.language and navigator.languages properties.

#### How is a language fingerprint formed?
The language fingerprint includes the following data:
   1. **Accept-Language** - a list of preferred languages and their priority.
      - Example: en-US,en;q=0.9,ru;q=0.8
   2. **navigator.language** - the main language of the browser.
      - Example: en-US
   3. **navigator.languages** - an array of preferred languages.
      - Example: ["en-US", "en", "ru"]
   4. **Browser interface language** - the language in which the browser itself is displayed.
   5. **Operating system language** - the language set at the OS level.

Together these parameters form a combination that can be used to identify the user.
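To make the first item concrete, here is a minimal sketch of how a server might turn the Accept-Language header into an ordered list of languages. The function name `parseAcceptLanguage` is ours and is not taken from any particular antifraud product; a missing q-value is treated as 1, as HTTP defines it.

```javascript
// Parse an Accept-Language header into entries ordered by preference.
// Illustrative helper: the name and the return shape are assumptions.
function parseAcceptLanguage(header) {
  return header
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part, index) => {
      const [tag, ...params] = part.split(";").map((s) => s.trim());
      const qParam = params.find((p) => p.toLowerCase().startsWith("q="));
      // A language without an explicit weight has q = 1.
      const q = qParam ? Number.parseFloat(qParam.slice(2)) : 1;
      return { tag, q: Number.isNaN(q) ? 0 : q, index };
    })
    // Highest weight first; keep the header order for equal weights.
    .sort((a, b) => b.q - a.q || a.index - b.index)
    .map(({ tag, q }) => ({ tag, q }));
}

console.log(parseAcceptLanguage("en-US,en;q=0.9,ru;q=0.8"));
```

The resulting list can then be compared with the values that JavaScript reports in the browser.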

#### Use of languages in anti-fraud systems
Anti-fraud systems analyze language data to identify potentially suspicious users. Here are some key scenarios for their use:
   1. **Detecting inconsistencies:**
      - If the language of the browser interface does not match the language of the operating system.
      - If Accept-Language contains languages not used in the user's region.
      - If the user's preferred languages change too frequently.
   2. **Checking geolocation matching:**
      - The language settings are compared to the user's geographic location as determined by IP address or GPS.
      - For example, if the IP address points to France, but the language settings are set to Chinese, this may be suspicious.
   3. **Parameter spoofing detection:**
      - If the navigator.language and navigator.languages parameters conflict.
      - If the browser language changes on each new visit.
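The geolocation scenario can be sketched as a lookup against a reference table. The table `EXPECTED_LANGUAGES` below is hypothetical and tiny; a real system would more likely build it from observed traffic. The function only reports languages that are unusual for the country, it does not decide that the visit is fraudulent.

```javascript
// Hypothetical reference data: languages commonly expected per country code.
const EXPECTED_LANGUAGES = {
  FR: ["fr", "en"],
  FI: ["fi", "sv", "en"],
  DE: ["de", "en"],
};

// Reduce a tag such as "en-US" to its primary subtag "en".
function primarySubtag(tag) {
  return tag.toLowerCase().split("-")[0];
}

// Return the primary languages that are not expected for the IP country.
function unexpectedLanguages(ipCountry, languages) {
  const expected = EXPECTED_LANGUAGES[ipCountry.toUpperCase()];
  if (!expected) return []; // unknown country: no opinion
  const result = [];
  for (const tag of languages) {
    const lang = primarySubtag(tag);
    if (!expected.includes(lang) && !result.includes(lang)) {
      result.push(lang);
    }
  }
  return result;
}

// IP address in France, browser languages set to Chinese.
console.log(unexpectedLanguages("FR", ["zh-CN", "zh"]));
```

A non-empty result is best treated as one weak signal among many: travellers, expatriates and multilingual users produce the same picture legitimately.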

###### Methods for spoofing language parameters
Attackers trying to circumvent anti-fraud systems can change language settings in several ways:
   1. **Spoofing the Accept-Language HTTP header:**
      - Using a proxy or special software, the content of the header is changed to match the desired profile.
   2. **Changing browser settings:**
      - Using anti-detect browsers that allow arbitrary values of navigator.language and navigator.languages to be set.
   3. **Localization emulation:**
      - Changing the interface language in the browser to create the illusion of matching the region.
   4. **Partial masking:**
      - Adding extra languages to Accept-Language to create a more universal profile.

###### How do anti-fraud systems detect language spoofing?
   1. **Consistency analysis:**
      - Checking that all language parameters (HTTP header, navigator.language, interface language) are consistent.
   2. **Checking for rare combinations:**
      - Identifying unusual language preferences that rarely occur in real life (e.g. en-GB,ar-KW,ja-JP).
   3. **Comparing with benchmark data:**
      - Using a database to test the likelihood of using certain languages in a particular region.
   4. **Dynamic testing:**
      - Changing language settings on a website and checking the browser's response. For example, if the browser does not reflect the change, this may indicate spoofing.
   5. **Localization of errors and messages:**
      - Analyzing the language of the browser's system messages. This is difficult to spoof because the language of the messages depends on the OS configuration.
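As an illustration of the consistency analysis, the sketch below compares the three values a site can see: the Accept-Language header, navigator.language and navigator.languages. The function name and the wording of the issues are our own. It assumes that the header lists the same tags in the same order as navigator.languages, which is typical for a desktop browser with default behavior but is not guaranteed, since some browsers deliberately shorten the header.

```javascript
// Compare language values collected from the HTTP header and from JavaScript.
// Illustrative check only: a mismatch is a signal, not proof of spoofing.
function findInconsistencies({ acceptLanguage, language, languages }) {
  const issues = [];
  const norm = (tag) => tag.trim().toLowerCase();

  // navigator.language is expected to equal the first entry of navigator.languages.
  if (languages.length === 0 || norm(language) !== norm(languages[0])) {
    issues.push("navigator.language differs from navigator.languages[0]");
  }

  // Tags from the header in the order sent, with q-values and wildcards dropped.
  const headerTags = acceptLanguage
    .split(",")
    .map((part) => norm(part.split(";")[0]))
    .filter((tag) => tag.length > 0 && tag !== "*");
  const scriptTags = languages.map(norm);

  const sameLists =
    headerTags.length === scriptTags.length &&
    headerTags.every((tag, i) => tag === scriptTags[i]);
  if (!sameLists) {
    issues.push("Accept-Language does not match navigator.languages");
  }
  return issues;
}

// Header rewritten by a proxy, JavaScript values left untouched.
console.log(
  findInconsistencies({
    acceptLanguage: "fr-FR,fr;q=0.9",
    language: "en-US",
    languages: ["en-US", "en", "ru"],
  })
);
```

Here only the header was changed, so the header check fires while the navigator.language check stays silent.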

#### Analyze language fingerprints
Let's check what data a website can get about a user's location by opening the <a href="https://browserleaks.com/javascript" target="_blank">browserleaks.com/javascript</a> page in our Chrome browser:

![](/media/mdeditor/1_20241128195706999105.jpg)

This page shows what a site can learn about the date and time on the user's device, the time zone, locale, clock format, calendar type, numbering system, primary and preferred languages, and the synthesized voices available through the speech synthesis interface.
Now let's check the location information derived from the IP address by opening the <a href="https://browserleaks.com/ip" target="_blank">browserleaks.com/ip</a> page:

![](/media/mdeditor/2_20241128195808761231.jpg)

This page gives the site detailed information about the user's IP address and the languages in the HTTP headers.
Let's see what other location information a site can collect from the user's browser on our test page <a href="https://test-webapi.tech/language" target="_blank">test-webapi.tech/language</a>. We open the page in the Chrome browser on the server.
**Time Accuracy Check** section:

![](/media/mdeditor/3_20241128195835163205.png)

In this section we get the visitor's IP address, look up the Location for that IP address using a third-party service, and get the Coordinates and Browser Timezone from the user's browser.
**Important:** the conclusions about parameter mismatches on the test page are not definitive and are presented only as an example of analyzing the obtained data.
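A check of this kind might compare the time zone reported by the browser with the time zone that a geolocation database returns for the IP address. The sketch below is our own illustration and not the code of the test page. It compares UTC offsets rather than zone names, because different names (for example Europe/Berlin and Europe/Paris) can describe the same local time.

```javascript
// UTC offset in minutes of an IANA time zone at a given moment.
function utcOffsetMinutes(timeZone, date = new Date()) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hourCycle: "h23",
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    second: "2-digit",
  }).formatToParts(date);
  const get = (type) => Number(parts.find((p) => p.type === type).value);
  // Re-read the wall-clock time of the zone as if it were UTC.
  const asUtc = Date.UTC(
    get("year"), get("month") - 1, get("day"),
    get("hour"), get("minute"), get("second")
  );
  return Math.round((asUtc - date.getTime()) / 60000);
}

// True when the browser zone and the IP-derived zone share the same offset.
function timezoneMatches(browserZone, ipZone, date = new Date()) {
  return utcOffsetMinutes(browserZone, date) === utcOffsetMinutes(ipZone, date);
}

// In a browser the first argument would come from:
//   Intl.DateTimeFormat().resolvedOptions().timeZone
console.log(timezoneMatches("Europe/Moscow", "Europe/Helsinki"));
```

The IP-derived zone is assumed to come from a third-party database and can itself be wrong, which is one more reason to treat a mismatch as a hint rather than a verdict.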

In the **Geolocation Services Results** section, the page retrieves IP address information from multiple databases and compares it with the information obtained in the Time Accuracy Check section:

![](/media/mdeditor/4_20241128195917569229.png)

In the **Local Time Information** section, we see the time, date, and time zone directly from the visitor's browser:

![](/media/mdeditor/5_20241128200152671454.png)

The **Browser Language Settings** section determines the browser languages using JavaScript and HTTP headers in the main thread, a web worker and a service worker, then compares the values with each other:

![](/media/mdeditor/6_20241128200216624491.png)
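The comparison between execution contexts can be sketched as follows. We assume that each context has already sent back its own `language` and `languages` values (in a worker they are available as `self.navigator.language` and `self.navigator.languages`); how the reports are delivered is left out, and the shape of the `reports` object is our assumption.

```javascript
// Return the names of contexts whose language values differ from the first one.
function diffContexts(reports) {
  const names = Object.keys(reports);
  if (names.length === 0) return [];
  const signature = (r) => JSON.stringify([r.language, r.languages]);
  const baseline = signature(reports[names[0]]);
  return names.filter((name) => signature(reports[name]) !== baseline);
}

// Sample data: the value is overridden in the main thread only.
const reports = {
  mainThread: { language: "fi-FI", languages: ["fi-FI", "fi"] },
  webWorker: { language: "ru-RU", languages: ["ru-RU", "ru"] },
  serviceWorker: { language: "ru-RU", languages: ["ru-RU", "ru"] },
};
console.log(diffContexts(reports));
```

Spoofing that patches only the main thread tends to show up here, because workers read the values from their own navigator object.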

Note that the presence of the Russian language for a user from Finland may already be treated by antifraud systems as an atypical indicator, especially during mass account registrations.

Let's consider the next section - **System Messages Language Analysis:**

![](/media/mdeditor/7_20241128200239152793.png)

This section determines the language of error messages, of form validation messages, and of system messages in the user's browser.
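One way such a check could work is to read the browser's built-in validation message for an empty required field and compare its script with the declared language. The sketch below is an assumption about the approach, not the code of the test page: `readValidationMessage` relies on `validationMessage` being filled in for a detached input, and the `EXPECTED_SCRIPT` table is hypothetical and covers only a few languages.

```javascript
// Read the browser's localized "value missing" message from a throwaway input.
// `doc` is the page's document object.
function readValidationMessage(doc) {
  const input = doc.createElement("input");
  input.required = true; // empty and required, so the field is invalid
  return input.validationMessage;
}

// Decide whether a text is written mostly in Cyrillic or in Latin letters.
function dominantScript(text) {
  const cyrillic = (text.match(/\p{Script=Cyrillic}/gu) || []).length;
  const latin = (text.match(/\p{Script=Latin}/gu) || []).length;
  if (cyrillic === 0 && latin === 0) return "other";
  return cyrillic > latin ? "cyrillic" : "latin";
}

// Hypothetical table: script expected for a primary language subtag.
const EXPECTED_SCRIPT = { ru: "cyrillic", en: "latin", fi: "latin", de: "latin" };

// False when the message script contradicts the declared language.
function messageMatchesLanguage(message, language) {
  const expected = EXPECTED_SCRIPT[language.toLowerCase().split("-")[0]];
  return expected === undefined || dominantScript(message) === expected;
}

// In a browser: messageMatchesLanguage(readValidationMessage(document), navigator.language)
console.log(messageMatchesLanguage("Please fill out this field.", "ru-RU"));
```

Comparing scripts is deliberately crude: it separates Russian from English but cannot tell English from Finnish, so a production check would need real language detection.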

In the **Internationalization Details** section, we see the regional settings, calendar type, time zone, and numbering system that the browser takes from the system settings of the user's device. Date formats and the currency display format are also determined:

![](/media/mdeditor/8_20241128200257399041.png)

The currency display format depends on the language settings of the system:

![](/media/mdeditor/9_20241128200311647366.png)

![](/media/mdeditor/10_20241128200328909185.png)
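Most of these values are exposed by the standard `Intl` API. The sketch below shows how a page might collect them; the function name `intlDetails` and the choice of sample date and amount are ours. Calling it without an argument uses the default locale of the environment, which is what a fingerprinting script would normally do.

```javascript
// Collect locale-dependent details for a locale (or for the default locale).
function intlDetails(locale) {
  const options = new Intl.DateTimeFormat(locale).resolvedOptions();
  const sampleDate = new Date(Date.UTC(2024, 10, 29, 12, 0, 0));
  return {
    locale: options.locale,
    calendar: options.calendar,
    numberingSystem: options.numberingSystem,
    timeZone: options.timeZone,
    // The same date and amount render differently depending on the locale.
    dateSample: new Intl.DateTimeFormat(locale, { timeZone: "UTC" }).format(sampleDate),
    currencySample: new Intl.NumberFormat(locale, {
      style: "currency",
      currency: "USD",
    }).format(1234.56),
  };
}

console.log(intlDetails());        // default locale of the environment
console.log(intlDetails("de-DE")); // explicit locale for comparison
```

For the en-US locale the currency sample is formatted as $1,234.56, while other locales change the separators and the position of the currency sign, which is exactly the difference visible in the screenshots.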

Let's move on to the last section **Speech Synthesis Voices**. This section checks the available voices for speech synthesis in the browser:

![](/media/mdeditor/11_20241128200347268087.png)
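In a browser the list of voices is returned by `speechSynthesis.getVoices()`, where each voice has a `name` and a `lang` property. The list can be empty until the `voiceschanged` event fires. The sketch below is our own illustration of how such a list might be summarized and compared with the primary browser language; the sample voices are invented test data, not results from the screenshots.

```javascript
// Summarize a list of voices and check it against the primary language.
function summarizeVoices(voices, primaryLanguage) {
  const perLang = {};
  for (const voice of voices) {
    perLang[voice.lang] = (perLang[voice.lang] || 0) + 1;
  }
  // Some platforms report "en_US" instead of "en-US", so split on both.
  const primaryOf = (tag) => tag.toLowerCase().split(/[-_]/)[0];
  const primary = primaryOf(primaryLanguage);
  const hasPrimaryVoice = voices.some((v) => primaryOf(v.lang) === primary);
  return { total: voices.length, perLang, hasPrimaryVoice };
}

// Sample data shaped like SpeechSynthesisVoice objects.
const sampleVoices = [
  { name: "Sample Russian voice 1", lang: "ru-RU" },
  { name: "Sample Russian voice 2", lang: "ru-RU" },
];

// In a browser: summarizeVoices(speechSynthesis.getVoices(), navigator.language)
console.log(summarizeVoices(sampleVoices, "en-US"));
```

A profile that declares en-US but offers only Russian voices is the kind of combination the section is looking for.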

The set of voices and their number differ across browsers and devices. This parameter allows anti-fraud systems to identify users quite accurately and to flag suspicious visits, especially when compared with the browser's language settings. The following are a few examples from different devices:

Intel PC (Chrome browser)

![](/media/mdeditor/12_20241128200410776819.png)

Intel PC (Chromium browser)

![](/media/mdeditor/13_20241128200433904095.png)

Intel PC (Firefox browser)

![](/media/mdeditor/14_20241128200456147820.png)

AMD PC (Chrome browser)

![](/media/mdeditor/15_20241128200518993933.png)

AMD PC (Chromium browser)

![](/media/mdeditor/16_20241128200538283209.png)

AMD PC (Firefox browser)

![](/media/mdeditor/17_20241128200555713063.png)

In the test results, we see that the list of voices for speech synthesis differs both between devices and between browsers.
Now let's check the anti-detect browser by opening the page without a profile, with the basic settings:

<div class='d-flex gap-8'><img src='/media/mdeditor/18_STYLE!!!_20241128200636945995.jpg' class='mx-0' /></div>

The results of this test differ in several values:
- The anti-detect browser reports the time zone Europe/Moscow.
- It reports only the en-US language.
- It exposes only 2 voices, both from the Ru locale (similar to the Chromium browser).
- It identifies itself as the Chrome browser in the User-Agent header.

Let's try to apply the profile in the anti-detect browser and repeat the test:

<div class='d-flex gap-8'><img src='/media/mdeditor/19_STYLE!!!_20241128200741865166.jpg' class='mx-0' /></div>

After the profile is applied, the anti-detect browser correctly substitutes all language parameters for the specified location except one: the browser shows form validation messages in English, which is suspicious when Russian is the main browser language.

#### Language detection in the user's browser on the example of Amazon and Google

Let's check what language information the Amazon site receives from our browser. We open the home page in the Chrome browser:

![](/media/mdeditor/20_20241128200911396697.png)

A simple analysis of the JavaScript files shows that the site reads the preferred language and the available languages from our browser.
Now let's check what information Google collects about languages. We open the search page and look at the code:

![](/media/mdeditor/21_20241128200928492878.png)

The site accurately detected that we are using the ru language. Note that the browser is running on a server located in Germany, which Google also detects (ru-DE).

![](/media/mdeditor/22_20241128200951041106.png)

![](/media/mdeditor/23_20241128201007235412.png)

Google also collects information about the preferred language and the available languages. In addition, the site tracks voice language change events and the target translation language, and records which suggested language is chosen.

#### Conclusion
Analyzing browser language parameters provides antifraud systems with a significant amount of information for user verification. These parameters help detect inconsistencies between browser settings, operating system settings, HTTP headers, and geolocation, which can help recognize potentially suspicious behavior. The tests showed that methods of spoofing language settings, such as using anti-detect browsers or changing HTTP headers, are not always effective. Even minor differences, such as the language of system messages, can raise suspicion. Language settings thus become an important component of a comprehensive user profile assessment. Many antifraud systems actively use language fingerprinting as one of the main metrics for behavior analysis, data matching, and spoofing detection.

Browser language parameters are a powerful tool that helps antifraud systems identify inconsistencies and improve the security of checks. Despite attempts by attackers to spoof language data, systems detect such manipulations by analyzing parameter consistency and rare combinations. The language fingerprint generated from multiple parameters is widely used to identify users and detect suspicious activity. Using this data in combination with other mechanisms, such as geolocation and time zones, can significantly improve the accuracy of fraud detection.
In the next article we will continue the analysis of browser technologies used in antifraud systems.

*Bannykh M.V.*
