SlashID: Building a globally distributed Identity Platform

What do we mean by global distribution?

Global distribution means that you can store user data in any region where our cloud providers (currently GCP, more to come soon) are present.

Whether you want to store all or part of your userbase in Japan, Poland, Australia, or any other region, we have you covered at no extra fee and without a complex multi-tenant, multi-dashboard setup.

A large portion of our engineering team cut their teeth at CrowdStrike where ThreatGraph processes several trillion events per week, so we understand that building large-scale cloud systems is a complex and challenging task. For this reason, we have spent over 18 months building and stress-testing our infrastructure.

The SlashID multi-region environment has been live in production for over a year, with more than 3 million users across different regions and several hundred organizations.

Having evaluated several options including large-scale distributed databases like CockroachDB and PlanetScale, we believe our approach is unique and provides the lowest possible latency.

In a future blog post we’ll discuss the technical architecture in more detail but, for now, let’s see why this matters and how to use it.

Why does global availability matter?

Regulatory compliance

While not all user data is subject to data residency requirements, the regulatory environment is becoming progressively more stringent on data residency and protection.

A non-exhaustive list of countries that enforce data residency requirements for some user data/PII includes China, Russia, Germany, India, Brazil, Australia, Canada, France, South Korea, and Turkey.

Beyond current regulatory requirements, more and more customers prefer to keep their data close to their geography to future-proof their infrastructure in case of further regulatory or legal pressure.

Dedicated sections of our documentation discuss our approach to compliance.

Availability, performance, and disaster recovery

Authentication and authorization sit on the most critical application paths, including:

Logging in to your application
Checking user access to a specific resource
Online JSON Web Token (JWT) verification and session management
Retrieving and storing user data

End-users will immediately notice an outage in these services. If your identity provider is down, your entire application becomes unavailable.

We built SlashID’s identity and data storage around the principles of availability, performance, and disaster recovery from day 1. In this blog post, we’ll cover:

The drawbacks of single-region deployments
SlashID’s developer experience for handling multi-region

In a future blog post, we’ll discuss the technical aspects of how we built our infrastructure.

The drawbacks of single-region deployments

First, let’s take a look at the downsides of using a single-region deployment.

Latency impact

One of the biggest downsides of a single-region deployment is the increased latency for end-users.

Users have become accustomed to edge deployment and low latency whenever they use web apps and services. They expect the same level of performance when it comes to authentication. However, legacy Identity and Access Management (IAM) solutions with single-region deployments are affected by significant latency.

But what does ‘significant’ mean here? To give you a better perspective, we’ve measured cross-region latency across multiple locations.

We ran this test using netperf, a popular benchmark tool that can be used to measure the performance of many different types of networking. All machines except one were deployed in the Google Cloud Platform (GCP). One machine was deployed in DigitalOcean in Frankfurt to simulate cross-data center calls within one geographical location.

Data Center	Frankfurt (GCP)	Frankfurt (DigitalOcean)	London	Montreal	Los Angeles	Sydney
Frankfurt (GCP)	x	0.95ms	13.25ms	87.56ms	152.84ms	281.65ms
Frankfurt (DigitalOcean)	1.09ms	x	12.37ms	89.00ms	153.06ms	262.35ms
London	12.99ms	13.70ms	x	77.46ms	139.68ms	267.06ms
Montreal	88.57ms	89.16ms	78.13ms	x	75.52ms	199.45ms
Los Angeles	152.46ms	151.50ms	142.34ms	75.11ms	x	138.27ms
Sydney	281.33ms	265.31ms	272.41ms	199.69ms	138.26ms	x

In the best-case scenario, an HTTP call between Europe and North America (London to Montreal) adds about 77ms of latency per request. The numbers in the table above show the overhead of a single cross-region call: every request incurs that penalty on top of its normal execution time. The overhead scales linearly with the number of sequential requests, so performance deteriorates further when a single operation requires multiple calls.

For instance, let’s consider a scenario where a request is made between data centers in Europe (Frankfurt) and the US West Coast (Los Angeles). In this scenario, each request incurs roughly 152ms of overhead.

While it may not look like much, in practice, when you also account for the execution time of the request, you’ll often exceed 200ms. It is well documented that latency above 200ms is perceived by the user and contributes to user drop-off.
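To make the arithmetic concrete, here is a minimal sketch that adds up the cross-region overhead for a chain of sequential calls, using figures from the table above. The function name and the 50ms execution time are illustrative assumptions, not measurements.

```python
# Per-call cross-region overheads copied from the latency table above (ms).
CROSS_REGION_OVERHEAD_MS = {
    ("London", "Montreal"): 77.46,
    ("Frankfurt (GCP)", "Los Angeles"): 152.84,
    ("Frankfurt (GCP)", "Sydney"): 281.65,
}


def total_latency_ms(overhead_ms: float, sequential_calls: int, exec_ms: float) -> float:
    """Total time for `sequential_calls` calls made one after another.

    Each call pays the cross-region overhead plus its own execution time,
    so the total grows linearly with the number of calls.
    """
    return sequential_calls * (overhead_ms + exec_ms)


if __name__ == "__main__":
    overhead = CROSS_REGION_OVERHEAD_MS[("Frankfurt (GCP)", "Los Angeles")]
    for calls in (1, 2, 3):
        # exec_ms=50.0 is an assumed per-request execution time.
        print(calls, round(total_latency_ms(overhead, calls, exec_ms=50.0), 2))
```

With these assumptions, even a single Frankfurt to Los Angeles call lands just above the 200ms mark, and every additional sequential call adds the same amount again.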

The performance impact also becomes evident when a user’s data is persisted in one region and the user travels to another.

Risk of regional outages

Last year alone, the biggest cloud providers suffered multiple major outages, often affecting entire regions. AWS’s US-EAST-1 (the biggest AWS region in the US) experienced a 6-hour outage. A flood tanked a GCP region for 12 whole hours. Most companies using AWS and GCP to host their services were not prepared for these incidents and, as a result, suffered global outages for many hours. You might have noticed several tools and products not working during those days.

It’s hard (and beyond the scope of this blog post) to quantify the financial and reputational damage those products suffered during the AWS and GCP outages. However, one positive outcome is that the outages triggered multiple discussions about the importance of having a strong multi-region strategy. Unfortunately, multi-region deployment is not a quick and easy fix: it is hard to get right and, when not implemented from the get-go, it often requires a lot of architectural changes.

Regional outages of core infrastructure have occurred in the past. They will undoubtedly occur again. Using external services (e.g., identity providers) that are resilient to such outages should be part of every solid multi-region strategy.

Compliance

As mentioned earlier, depending on the countries you operate in, you may be subject to data residency or protection laws requiring user data to be localized wholly or in part in a given region.

A single-region deployment model would require you to spawn a separate instance of your services for each country you operate in, making implementation, customer support, and analytics significantly more challenging.

SlashID: Global deployment with the ease of single-region deployment

We built SlashID as a global identity platform deployed and replicated across GCP’s regions to achieve best-in-class resiliency with the lowest latency and highest availability.

Our infrastructure can bootstrap a new region in approximately 24 hours and we are present by default in 5 regions.

We want to give you the best developer experience, which also means handling all cross-region concerns so you won’t have to. As such, you can access cross-region functionality with the same abstraction as common single-region APIs. Unless specified otherwise, users will automatically be assigned to the region closest to them, minimizing latency and maximizing performance.
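The "closest region" rule can be pictured as picking the region with the lowest latency from the caller unless a region is requested explicitly. The sketch below only illustrates that idea using the Frankfurt (GCP) row of the latency table above. How SlashID actually routes requests is not described in this post, and pick_closest_region is a hypothetical helper.

```python
from typing import Mapping, Optional


def pick_closest_region(latency_ms: Mapping[str, float],
                        requested: Optional[str] = None) -> str:
    """Return the explicitly requested region, else the lowest-latency one."""
    if not latency_ms:
        raise ValueError("no regions available")
    if requested is not None:
        if requested not in latency_ms:
            raise ValueError(f"unknown region: {requested}")
        return requested
    return min(latency_ms, key=lambda region: latency_ms[region])


# Latencies as seen from a caller in Frankfurt (GCP), taken from the table above.
from_frankfurt = {
    "London": 13.25,
    "Montreal": 87.56,
    "Los Angeles": 152.84,
    "Sydney": 281.65,
}

if __name__ == "__main__":
    print(pick_closest_region(from_frankfurt))
    print(pick_closest_region(from_frankfurt, requested="Sydney"))
```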

We designed our system so that each region is autonomous. This reduces the chance of a global outage. If there is no connectivity between regions (due to network issues or regional outage), all regions will keep on working independently, and will eventually synchronize the data after network connections are restored.
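As a rough mental model of "work independently, synchronize later", the toy sketch below lets two in-memory replicas accept writes on their own and then reconcile with a last-writer-wins rule. This is a generic illustration of eventual consistency under assumed names (RegionReplica, sync_from) and an assumed conflict rule. It is not SlashID's replication mechanism, which this post does not describe.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class RegionReplica:
    """A toy per-region store: key -> (logical timestamp, value)."""
    name: str
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def write(self, key: str, value: str, timestamp: int) -> None:
        # Writes are accepted locally, without contacting other regions.
        # An entry is replaced only by one with a strictly newer timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def sync_from(self, other: "RegionReplica") -> None:
        # After connectivity returns, pull the other region's entries and
        # keep whichever version carries the newer timestamp.
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


europe = RegionReplica("europe")
japan = RegionReplica("japan")

# During a partition each region keeps serving writes on its own.
europe.write("user:1:city", "Berlin", timestamp=1)
japan.write("user:1:city", "Tokyo", timestamp=2)
japan.write("user:2:city", "Osaka", timestamp=1)

# Once the link is restored, a two-way sync makes both replicas converge.
europe.sync_from(japan)
japan.sync_from(europe)
```

The sketch ignores ties between equal timestamps and clock skew, both of which a production system has to handle.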

How does it work in practice?

By default, when a user is created, their data will be stored in the region geographically closest to the request to our API. Data residency is transparent for you: you can store and retrieve data without thinking about where it’s stored. SlashID will either read the data directly from the region where it’s stored or use a copy of the data replicated in a closer location.

You can try it yourself with a plain HTTP request:

curl -L -X POST 'https://api.slashid.com/persons' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'SlashID-OrgID: <ORG_ID_VALUE>' \
-H 'SlashID-API-Key: <API_KEY_VALUE>' \
--data-raw '{
  "handles": [
    {
      "type": "email_address",
      "value": "<EMAIL_ADDRESS>"
    }
  ],
  "attributes": {
    "end_user_no_access": {
      "is_vip": true,
      "secret_note": "this person is our VIP"
    },
    "end_user_read_write": {
      "address": {
        "city": "New York",
        "country": "USA",
        "postal_code": "10001",
        "street": "123 Main Street"
      },
      "credit_card_number": "1234-1234-1234-1234"
    }
  }
}'

The response will contain the person ID and the person’s region:

{
  "result": {
    "active": true,
    "person_id": "064d221c-b5cb-7c37-a908-0120768e52f5",
    "region": "us-iowa",
    "roles": []
  }
}

As shown earlier, when requests travel between data centers in the same geographical location, the latency overhead is negligible (~1ms). Thanks to that, you can call the SlashID API with a latency similar to calling your internal API.

If you want, you can also specify where the person’s data should be stored explicitly. For example, let’s create a person in Japan:

curl -L -X POST 'https://api.slashid.com/persons' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'SlashID-OrgID: <ORG_ID_VALUE>' \
-H 'SlashID-API-Key: <API_KEY_VALUE>' \
--data-raw '{
  "handles": [
    {
      "type": "email_address",
      "value": "<EMAIL_ADDRESS>"
    }
  ],
  "region": "asia-japan"
}'

{
  "result": {
    "active": true,
    "person_id": "064d2234-79b9-74f4-a810-061d691e84dd",
    "region": "asia-japan",
    "roles": []
  }
}
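The two requests above differ only in the optional region field. A minimal Python sketch of the same call, using only the standard library, could look like the following. The helper names are hypothetical. The endpoint, headers, and body shape are copied from the curl examples, and the request is built but not sent.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.slashid.com"


def build_create_person_request(org_id: str, api_key: str, email: str,
                                region: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST /persons request shown above."""
    body: dict = {"handles": [{"type": "email_address", "value": email}]}
    if region is not None:
        # Leaving "region" out lets SlashID pick the closest region.
        body["region"] = region
    return urllib.request.Request(
        url=f"{API_BASE}/persons",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "SlashID-OrgID": org_id,
            "SlashID-API-Key": api_key,
        },
        method="POST",
    )


def region_of(response_body: str) -> str:
    """Extract the region from a response shaped like the examples above."""
    return json.loads(response_body)["result"]["region"]
```

To actually send the request, pass the returned object to urllib.request.urlopen with real credentials in place, then feed the decoded response body to region_of.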

To achieve a multi-region setup with other Identity Providers, you must configure a separate project or organization for each region. In such a case, data retrieval and synchronization become difficult and error-prone: you will usually need to know which organization or project must be used to query a particular user. In the worst case, when you only have a user ID and don’t know where the data resides, you will need to iterate over all regions to find it.
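The worst case described above, where only a user ID is known, amounts to a linear search across regional projects. The sketch below simulates it with in-memory dictionaries standing in for per-region projects. All names are hypothetical, and a real setup would call each provider's API with region-specific credentials.

```python
from typing import Callable, Dict, Optional, Tuple

# A lookup takes a user ID and returns the user record, or None if absent.
RegionalLookup = Callable[[str], Optional[dict]]


def find_user_across_regions(
    user_id: str, lookups: Dict[str, RegionalLookup]
) -> Tuple[Optional[str], Optional[dict], int]:
    """Query each regional project in turn until the user is found.

    Returns (region, record, number_of_lookups_performed).
    """
    attempts = 0
    for region, lookup in lookups.items():
        attempts += 1
        record = lookup(user_id)
        if record is not None:
            return region, record, attempts
    return None, None, attempts


# Stand-ins for three separately configured regional projects.
europe_users: Dict[str, dict] = {"u-1": {"name": "first user"}}
us_users: Dict[str, dict] = {}
japan_users: Dict[str, dict] = {"u-2": {"name": "second user"}}

lookups: Dict[str, RegionalLookup] = {
    "europe": europe_users.get,
    "us": us_users.get,
    "japan": japan_users.get,
}

if __name__ == "__main__":
    # The user stored in the last region costs one call per region.
    print(find_user_across_regions("u-2", lookups))
```

Each miss is a full cross-region call, so the cost grows with the number of regions you operate in.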

When using SlashID, you can retrieve a person’s data without thinking about where it’s stored:

curl -L -X GET 'https://api.slashid.com/persons/064d2234-79b9-74f4-a810-061d691e84dd' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'SlashID-OrgID: <ORG_ID_VALUE>' \
-H 'SlashID-API-Key: <API_KEY_VALUE>'

{
  "result": {
    "active": true,
    "person_id": "064d2234-79b9-74f4-a810-061d691e84dd",
    "region": "asia-japan",
    "roles": []
  }
}

The same applies to user attributes:

curl -L -X GET 'https://api.slashid.com/persons/064d2250-acf8-74bd-a308-e4ecda7c8bf4/attributes' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'SlashID-OrgID: <ORG_ID_VALUE>' \
-H 'SlashID-API-Key: <API_KEY_VALUE>'

{
  "result": {
    "end_user_no_access": {
      "is_vip": true,
      "secret_note": "this person is our boss's friend"
    },
    "end_user_read_write": {
      "address": {
        "city": "New York",
        "country": "USA",
        "postal_code": "10001",
        "street": "123 Main Street"
      },
      "credit_card_number": "1234-1234-1234-1234"
    }
  }
}

We’ve also optimized for the corner case where at a point in time a user is far from the region where their data is stored. We’ll describe that in detail in an upcoming blog post focused on the multi-region architecture.

Replicating configuration

The other burden of manually dealing with multi-region deployments is keeping configuration consistent: for instance, messaging templates, authentication and authorization policies, and SAML/OIDC providers.

When you maintain separate projects or organizations for each region, you must configure each of them and keep their configurations synchronized, which adds significant overhead.

In SlashID, all organizations are global by design. There is no need to have an organization or a project for each region. You update the configuration in the same way as you would for a single-region organization. All the complexity of replicating organizations globally is handled by SlashID.

curl -L -X PATCH 'https://api.slashid.com/organizations/config' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'SlashID-OrgID: <ORG_ID_VALUE>' \
-H 'SlashID-API-Key: <API_KEY_VALUE>' \
--data-raw '{
  "token_duration": 86400
}'

By default, the configuration is stored synchronously in the region where the HTTP call originated. Other regions are updated asynchronously.

Sometimes, knowing that the configuration was stored in a single region is not enough, so SlashID provides the ability to wait synchronously for updates to be propagated. This is useful when you want to make sure that the configuration is stored in all regions before you proceed with another configuration update, which is especially important when your product has a global reach.

You can choose to wait until the configuration is stored in all regions by using the SlashID-Required-Consistency: all_regions header:

curl -L -X PATCH 'https://api.slashid.com/organizations/config' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'SlashID-OrgID: <ORG_ID_VALUE>' \
-H 'SlashID-API-Key: <API_KEY_VALUE>' \
-H 'SlashID-Required-Consistency: all_regions' \
-H 'SlashID-Required-Consistency-Timeout: 15' \
--data-raw '{
  "token_duration": 86400
}'

While global configuration replication typically completes in less than half a second, it can sometimes take a few seconds.

If you decide not to wait for global replication, our model propagates the organization structure with eventual consistency.
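The choice between the default behavior and waiting for all regions comes down to two optional headers. The sketch below builds the PATCH request from the curl examples with Python's standard library. The helper name and its parameters are hypothetical, the request is built but not sent, and the unit of the timeout value is not stated in this post, so the value is passed through unchanged.

```python
import json
import urllib.request
from typing import Optional

CONFIG_URL = "https://api.slashid.com/organizations/config"


def build_config_patch_request(org_id: str, api_key: str, config: dict,
                               wait_for_all_regions: bool = False,
                               timeout: Optional[int] = None) -> urllib.request.Request:
    """Build (but do not send) the configuration update shown above."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "SlashID-OrgID": org_id,
        "SlashID-API-Key": api_key,
    }
    if wait_for_all_regions:
        # Ask the API to respond only after every region has the update.
        headers["SlashID-Required-Consistency"] = "all_regions"
        if timeout is not None:
            # Same value as in the curl example; its unit is an open question here.
            headers["SlashID-Required-Consistency-Timeout"] = str(timeout)
    return urllib.request.Request(
        url=CONFIG_URL,
        data=json.dumps(config).encode("utf-8"),
        headers=headers,
        method="PATCH",
    )
```

Calling the helper without wait_for_all_regions reproduces the default request, where regions other than the originating one are updated asynchronously.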

The UI

Lastly, you can see all your users in the same dashboard (with role-based access control for different regions if compliance is a priority). There is no need to switch between multiple systems or tenants, which makes analytics and customer support easier.

What’s next?

If you haven’t tried it yet, we recommend you create a free account in SlashID and see for yourself what our API and SDK can do.

Explore our replication documentation to dig deeper into the technical details of our replication model.

If you like our solution and are considering migrating your identity system, test it out with a free account or reach out to us for any questions or comments at [email protected].