Cross-Cloud Commerce: Uniting AWS EKS and Azure for Enterprise Integration, Identity, and Advanced Traffic Control
From Single-Cloud to Dual-Cloud: How an E-Commerce Giant Bridged AWS EKS with Azure SQL & AD for Seamless Growth
1. Introduction
In a competitive retail environment, speed, scalability, and secure user experiences are paramount. Many e-commerce platforms initially choose a single cloud for their main site to reduce complexity. However, business demands often shift, requiring integration with third-party systems, advanced data warehousing, or specialized enterprise solutions. This scenario focuses on a global retailer, ShopSphere (the client's name has been changed at its request to reduce the security attack surface), which primarily operated its commerce site on AWS EKS but needed to extend certain functionality to Microsoft Azure. By using Azure SQL for data warehousing and analytics and Azure AD for enterprise SSO, ShopSphere combined the strengths of both clouds, and it eventually introduced a service mesh to handle advanced traffic shaping across microservices.
Key Themes:
Running a main commerce site on AWS EKS for reliable performance, elasticity, and strong ecosystem tooling.
Integrating with Azure for specialized enterprise capabilities—Azure AD for single sign-on, Azure SQL for data warehousing, and potential synergy with enterprise partners also on Azure.
Minimizing overhead while eventually leveraging a service mesh that shapes cross-cloud traffic, especially around the inventory microservices that sync data between AWS and Azure.
Achieving unified authentication for enterprise and B2B users by bridging Azure AD, while continuing to use AWS IAM roles for certain service accounts.
This case study delves into business drivers, technical architecture, implementation details, challenges, and outcomes. It also explores the role of service mesh adoption, enabling advanced routing between AWS and Azure microservices as the system matured.
2. Company Background
2.1 ShopSphere: A Growing Retailer
ShopSphere started as an online electronics and apparel retailer. Initially, they hosted everything in AWS—EC2-based monolithic deployments. Over the years, they containerized and migrated to Amazon EKS. By doing so, they scaled seamlessly for seasonal spikes—like holiday or back-to-school campaigns—and integrated with other AWS services:
Amazon RDS for OLTP databases
Amazon ElastiCache for session caching
Amazon S3 for product images
Amazon CloudFront for global CDN
2.2 Need for Azure Integration
However, as the company grew in the B2B and enterprise markets, it needed advanced data warehousing and wanted to unify user credentials with a partner ecosystem that relied heavily on Microsoft. Specifically:
Azure SQL: Some internal analytics and external partner reporting needed a robust data warehouse. ShopSphere’s data engineering team believed that Azure SQL Data Warehouse (whose capabilities now live on as the dedicated SQL pools of Azure Synapse Analytics), or simply a large Azure SQL instance, could serve as a staging ground for further analytics. They also had preliminary data-lake plans with Azure Data Lake or Synapse in mind.
Azure AD: Many enterprise partners used Azure AD for single sign-on. ShopSphere saw synergy in letting those enterprise customers or B2B partners log in with Azure AD credentials, bridging that identity to the commerce site’s own user model.
3. Initial Architecture on AWS
3.1 Monolith to Microservices on EKS
Before the Azure push, ShopSphere had a robust microservices environment on EKS:
Commerce Service: Handling the main storefront, product catalog, checkout.
User Service: Managing user profiles, authentication tokens, session logic.
Inventory Service: Tracking product stock across multiple warehouses.
Order Service: Handling order creation, statuses, payment callbacks.
All of these microservices used IAM Roles for Service Accounts (IRSA) to access S3, RDS, or ElastiCache without storing static credentials, which is the recommended practice for pod-level AWS access on EKS.
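The IRSA mechanism is worth making concrete: each Kubernetes service account is annotated with an IAM role ARN (the eks.amazonaws.com/role-arn annotation), and that role's trust policy allows sts:AssumeRoleWithWebIdentity only for tokens the cluster's OIDC issuer minted for that exact service account. The Python sketch below builds such a trust policy; the account ID, region, OIDC provider ID, namespace, and service account name are hypothetical placeholders, not ShopSphere's actual values.

```python
import json


def irsa_trust_policy(account_id: str, region: str, oidc_id: str,
                      namespace: str, service_account: str) -> dict:
    """Build an IAM trust policy that lets one Kubernetes service account
    assume a role through IRSA (IAM Roles for Service Accounts)."""
    # Issuer host of the EKS cluster's OIDC provider, without "https://".
    provider = f"oidc.eks.{region}.amazonaws.com/id/{oidc_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Only this namespace/service account may assume the role.
                        f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        # The projected token must be minted for AWS STS.
                        f"{provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    policy = irsa_trust_policy("111122223333", "us-east-1", "EXAMPLE0123456789",
                               "commerce", "inventory-sa")
    print(json.dumps(policy, indent=2))
```

Scoping the sub condition to a single namespace and service account is what keeps, for example, the inventory pods from assuming the order service's role.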
3.2 Single Sign-On with Amazon Cognito (Previously)
ShopSphere’s site used Amazon Cognito for user logins. This worked well for B2C customers, and they also integrated social logins (Google, Facebook). On the B2B partner side, however, they relied on a kludged approach, either generating unique accounts or relying on SAML, and it was inconsistent. Partner user management was a headache: some partners used SAML, while others wanted direct user provisioning.
4. Azure Extension: Enterprise Integration and Data Warehousing
4.1 Why Azure SQL?
Data engineering found Azure SQL or Azure Synapse to be a strong contender for warehousing large volumes of transaction logs and partner data. The impetus:
Better Partnerships: Some large enterprise customers mandated that their data land in an Azure environment, ensuring compliance or easy cross-company analytics.
BI Tools: The internal analytics team liked certain Microsoft Power BI features that integrated well with Azure SQL.
Trial: ShopSphere ran a small proof of concept (POC) with Azure SQL for certain historical analytics, comparing it with Amazon Redshift. The POC showed performance or cost advantages that favored Azure for certain workloads.
4.2 Azure AD for SSO
ShopSphere recognized that Azure AD would give them a more robust B2B sign-on flow. It would unify enterprise partner logins, letting partners use their existing corporate credentials, and bridging the site’s login with Azure AD could drastically reduce friction. Internally, some teams were also adopting Office 365 for collaboration, so having employees log in to the development environment with Azure AD was consistent with that move.
Trade-Off: They had to figure out how to unify Azure AD credentials with AWS resources, especially EKS, in a secure manner.
5. Unified Identity: Azure AD Bridge to AWS EKS
5.1 Approaches to Federating Azure AD with AWS IAM
ShopSphere’s engineers found multiple ways to integrate:
SAML Federation: Using AWS as a SAML service provider, mapping Azure AD users to IAM roles. This approach is typical for human access to AWS console or CLI.
OIDC: For pods in EKS, they already used IRSA, which relies on the cluster’s OIDC issuer being registered as an IAM OIDC identity provider. They could potentially extend this to accept Azure AD tokens as well, but that quickly became complicated.
Hybrid: Let user logins flow from Azure AD → a custom auth service → Cognito or IRSA mapping.
They ended up with a pragmatic solution:
Human Access: Azure AD issues SAML to AWS for employees or B2B partners needing console or developer CLI access.
Pod Access: EKS pods still used standard IRSA for AWS resources, but for external calls to the new Azure environment, the pods fetched short-lived tokens from an internal identity broker that integrated with Azure AD on the back end.
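To illustrate the human-access path: when Azure AD acts as the SAML identity provider for AWS, the assertion carries the attribute https://aws.amazon.com/SAML/Attributes/Role, whose values are comma-separated pairs of an IAM role ARN and a SAML provider ARN. The sketch below shows how a group-to-role mapping might produce those values and how they can be parsed back. The group names, role names, and mapping table are hypothetical; in practice the mapping is configured in Azure AD (for example, through app roles) rather than in application code.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from Azure AD group names to IAM role names.
GROUP_TO_ROLE: Dict[str, str] = {
    "grp-developers": "DeveloperReadOnly",
    "grp-partner-ops": "PartnerOrdersAccess",
}


def role_values_for_groups(groups: List[str], account_id: str,
                           provider_name: str) -> List[str]:
    """Build 'role ARN,provider ARN' values for the groups a user belongs to.
    Groups without a mapping grant no AWS role."""
    provider_arn = f"arn:aws:iam::{account_id}:saml-provider/{provider_name}"
    return [f"arn:aws:iam::{account_id}:role/{GROUP_TO_ROLE[g]},{provider_arn}"
            for g in groups if g in GROUP_TO_ROLE]


def parse_role_values(values: List[str]) -> List[Tuple[str, str]]:
    """Split each value into a (role ARN, provider ARN) pair.

    AWS accepts the two ARNs in either order, so each half is classified
    by its resource type rather than by its position.
    """
    pairs = []
    for value in values:
        parts = [p.strip() for p in value.split(",")]
        roles = [p for p in parts if ":role/" in p]
        providers = [p for p in parts if ":saml-provider/" in p]
        if len(parts) != 2 or len(roles) != 1 or len(providers) != 1:
            raise ValueError(f"malformed role value: {value!r}")
        pairs.append((roles[0], providers[0]))
    return pairs
```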
5.2 Minimal Overhead Implementation
They avoided rewriting large chunks of the user authentication logic. The e-commerce site’s front end integrated a new “Login with Azure AD” button for enterprise users. That flow triggered Azure AD’s OIDC sign-in, returning a token to a custom “auth” microservice in EKS. That microservice validated the token, created an internal session for the user, and used IRSA behind the scenes to assume any needed AWS roles for data access.
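The claim checks the auth microservice performs on the returned token can be sketched as follows. This is an illustration under stated assumptions: it targets Azure AD v2.0 ID tokens (issuer https://login.microsoftonline.com/{tenant-id}/v2.0), the function name is hypothetical, and it deliberately omits signature verification, which a real service must perform first against the tenant's published signing keys using a maintained JWT library.

```python
import base64
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def check_id_token_claims(token: str, tenant_id: str, client_id: str,
                          now: Optional[float] = None, leeway: int = 60) -> dict:
    """Validate the standard claims of an Azure AD v2.0 ID token.

    NOTE: this does NOT verify the signature. Verify it against the
    tenant's signing keys (JWKS) before trusting any claim.
    """
    try:
        _header, payload, _signature = token.split(".")
    except ValueError:
        raise ValueError("token is not a three-part JWT") from None
    claims = json.loads(_b64url_decode(payload))
    now = time.time() if now is None else now

    expected_issuer = f"https://login.microsoftonline.com/{tenant_id}/v2.0"
    if claims.get("iss") != expected_issuer:
        raise ValueError("unexpected issuer")
    if claims.get("aud") != client_id:
        raise ValueError("token was issued for a different application")
    if claims.get("exp", 0) + leeway < now:
        raise ValueError("token has expired")
    if claims.get("nbf", 0) - leeway > now:
        raise ValueError("token is not yet valid")
    return claims
```

Only after these checks (and the signature check) pass would the service create its internal session; a stable identifier such as the oid or sub claim is a natural key for linking the Azure AD identity to the site's own user record.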
Outcome: For typical B2B enterprise users, logging in via Azure AD felt native. For the microservices bridging to Azure SQL, they used an Azure AD service principal: EKS pods that needed to push data into Azure SQL obtained short-lived tokens for that principal from a local sidecar or from the identity broker that integrated with Azure AD.
6. Data Flow: AWS EKS to Azure SQL
6.1 Setting Up Secure Networking
Cross-cloud connectivity:
They considered both a site-to-site VPN and a dedicated interconnect (AWS Direct Connect paired with Azure ExpressRoute), and started with an IPsec-based site-to-site VPN.
In the future, they might invest in AWS Transit Gateway and Azure Virtual WAN for a more robust, scalable approach.
Trade-Off: IP overlap. They had to ensure the subnets used by EKS wouldn’t conflict with those in the Azure VNet where Azure SQL resided, so they reassigned address space: 10.10.x.x for AWS and 10.20.x.x for Azure.
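Overlaps like this are cheap to catch before anything is provisioned. The sketch below uses Python's standard ipaddress module to flag every overlapping pair in an address plan; reading "10.10.x.x" and "10.20.x.x" as /16 blocks is an assumption, and the extra VNet in the usage example is hypothetical.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def find_overlaps(cidrs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of named CIDR blocks whose address ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [(a, b) for (a, net_a), (b, net_b) in combinations(networks.items(), 2)
            if net_a.overlaps(net_b)]


if __name__ == "__main__":
    plan = {
        "aws-eks-vpc": "10.10.0.0/16",
        "azure-sql-vnet": "10.20.0.0/16",
        "azure-test-vnet": "10.10.8.0/22",  # hypothetical VNet reusing AWS space
    }
    print(find_overlaps(plan))  # [('aws-eks-vpc', 'azure-test-vnet')]
```

Running a check like this in CI against the full address plan (every range that must be routable across the tunnel) turns an overlap into a failed build rather than a multi-week rework.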
6.2 Real-Time or Batch Sync?
The e-commerce system needed some data (such as sales records) to appear in Azure SQL in near real time. They built a microservice that consumed order events from an Amazon SQS queue and forwarded the relevant data to Azure SQL over a secure connection. For historical logs and large data sets, they used daily batch exports to reduce overhead.
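The forwarding step can be sketched as follows. The event schema (type, orderId, lines), the set of relevant event types, and the dbo.SalesEvents table are all hypothetical. The database write is injected as a function so the logic can be tested without Azure SQL; in production it could wrap a DB-API cursor's executemany (pyodbc uses the ? placeholder style shown).

```python
import json
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical event types worth forwarding to the analytics warehouse.
RELEVANT_EVENTS = {"OrderPlaced", "OrderRefunded"}

# Parameterized insert; table and column names are illustrative.
INSERT_SQL = ("INSERT INTO dbo.SalesEvents (OrderId, EventType, Sku, Quantity, AmountCents) "
              "VALUES (?, ?, ?, ?, ?)")

Row = Tuple[str, str, str, int, int]


def to_rows(message_bodies: Iterable[str]) -> List[Row]:
    """Turn raw SQS message bodies into warehouse rows, one per order line."""
    rows: List[Row] = []
    for body in message_bodies:
        event = json.loads(body)
        if event.get("type") not in RELEVANT_EVENTS:
            continue  # other events (e.g. cart updates) stay on the AWS side
        for line in event["lines"]:
            rows.append((event["orderId"], event["type"], line["sku"],
                         int(line["quantity"]), int(line["amountCents"])))
    return rows


def forward(message_bodies: Sequence[str],
            execute_many: Callable[[str, List[Row]], None],
            batch_size: int = 500) -> int:
    """Write rows to Azure SQL in batches; returns the number of rows written."""
    rows = to_rows(message_bodies)
    for start in range(0, len(rows), batch_size):
        execute_many(INSERT_SQL, rows[start:start + batch_size])
    return len(rows)
```

Because SQS delivers messages at least once, such a consumer would typically delete messages only after the batch commits and make the insert idempotent (for example, keyed on order ID and SKU).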
Example Deployment manifest (service connecting to Azure SQL):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: azure-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: azure-exporter
  template:
    metadata:
      labels:
        app: azure-exporter
    spec:
      serviceAccountName: azure-exporter-sa
      containers:
        - name: exporter
          image: shopsphere/azure-exporter:1.0
          env:
            - name: AZURE_SQL_HOST
              value: "myazuresql.database.windows.net"
            - name: AZURE_SQL_DB
              value: "ShopSphereAnalytics"
            - name: TOKEN_ENDPOINT
              value: "https://login.microsoftonline.com/<tenant>/oauth2/v2.0/token"
The container obtains a short-lived access token from Azure AD through TOKEN_ENDPOINT and presents it when connecting to Azure SQL, which is configured for Azure AD authentication.
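A minimal sketch of that token flow, assuming the OAuth 2.0 client-credentials grant against the v2.0 TOKEN_ENDPOINT from the manifest and the Azure SQL scope https://database.windows.net/.default. The cache refreshes the token a few minutes before expiry, and odbc_token_struct packs it in the format the Microsoft ODBC driver expects for the SQL_COPT_SS_ACCESS_TOKEN (1256) connection attribute. The function names and the 300-second refresh margin are illustrative; how the client ID and secret reach the pod is not specified in the manifest and is an assumption here.

```python
import json
import struct
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional, Tuple

# Scope for Azure SQL access tokens issued by Azure AD.
AZURE_SQL_SCOPE = "https://database.windows.net/.default"
# ODBC connection attribute that carries an Azure AD access token.
SQL_COPT_SS_ACCESS_TOKEN = 1256


def fetch_token(token_endpoint: str, client_id: str, client_secret: str) -> Tuple[str, float]:
    """Client-credentials request; returns (access_token, lifetime_seconds)."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": AZURE_SQL_SCOPE,
    }).encode()
    # Passing data makes this a form-encoded POST.
    with urllib.request.urlopen(token_endpoint, data=body, timeout=10) as resp:
        payload = json.load(resp)
    return payload["access_token"], float(payload["expires_in"])


class TokenCache:
    """Reuse a token until shortly before it expires, then fetch a new one."""

    def __init__(self, fetch: Callable[[], Tuple[str, float]],
                 refresh_margin: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._margin = refresh_margin
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch()
            self._token, self._expires_at = token, now + lifetime
        return self._token


def odbc_token_struct(token: str) -> bytes:
    """Little-endian 4-byte length followed by the UTF-16-LE token bytes."""
    raw = token.encode("utf-16-le")
    return struct.pack(f"<I{len(raw)}s", len(raw), raw)

# Usage sketch (requires pyodbc and the Microsoft ODBC driver; CLIENT_ID and
# CLIENT_SECRET are hypothetical names):
# cache = TokenCache(lambda: fetch_token(os.environ["TOKEN_ENDPOINT"], CLIENT_ID, CLIENT_SECRET))
# conn = pyodbc.connect(
#     "Driver={ODBC Driver 18 for SQL Server};Server=tcp:myazuresql.database.windows.net,1433;"
#     "Database=ShopSphereAnalytics;Encrypt=yes",
#     attrs_before={SQL_COPT_SS_ACCESS_TOKEN: odbc_token_struct(cache.get())})
```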
6.3 Data Warehousing Gains
Once data reached Azure SQL, the data team ran advanced queries or connected Power BI. This made it easier to generate B2B partner reports and internal dashboards, and because many enterprise customers also used Power BI, reporting could be unified across companies.
7. Inventory Microservices: The Spark for a Service Mesh
7.1 Cross-Cloud Inventory Challenges
Inventory was crucial. Some stock data lived in AWS-based databases, while other data was stored in Azure SQL to feed partner solutions. Microservices in EKS needed to call certain Azure endpoints, and microservices in Azure needed to call back to EKS. They ended up spinning up a small set of container workloads on Azure Container Instances or Azure Kubernetes Service (AKS) to run partner integration logic.
Without a consistent routing layer, they manually configured IP addresses, DNS records, or load balancer endpoints. This overhead grew.
7.2 Decision to Adopt a Service Mesh
ShopSphere decided to introduce Istio (alternatively, they considered Linkerd or Consul) to unify traffic management. The main reasons:
mTLS across clusters: They wanted guaranteed encryption and mutual authentication between individual microservices, not just at the IPsec tunnel or through per-application TLS.
Advanced Traffic Shaping: They could do canary rollouts of new inventory logic in Azure while still directing the majority of traffic to AWS.
Observability: A mesh would give them standardized metrics and distributed tracing across clouds.
7.3 Implementation: Istio Multi-Cluster Federation
They followed Istio’s multi-primary, multi-network model, deploying Istio in both EKS and the new AKS environment. Each cluster ran its own control plane, and an east-west gateway in each cluster routed mesh traffic across the IPsec link.
For the inventory microservices, they made each cluster aware of the other’s service endpoints with Istio ServiceEntry resources, for example:
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: azure-inventory-service
spec:
  hosts:
    - azure-inventory.svc.mesh
  addresses:
    - 10.20.1.50
  ports:
    - name: http
      number: 80
      protocol: HTTP
  resolution: STATIC
  # MESH_INTERNAL: the Azure workloads run sidecars, so mesh mTLS applies
  # (Istio disables mTLS for MESH_EXTERNAL entries).
  location: MESH_INTERNAL
  endpoints:
    - address: <IP of azure service in the mesh gateway>
Outcome: The inventory microservices in AWS EKS could call the Azure counterpart as if it were a local service. The mesh handled mTLS, path-based routing, and consistent telemetry. They tested canary releases by gradually shifting 10% of traffic to the new Azure-based inventory logic.
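The 90/10 split described above maps onto a weighted route in an Istio VirtualService. The sketch below generates one as a Python dict, which can be serialized with json.dumps and applied with kubectl apply -f -, since kubectl accepts JSON manifests. The resource name and the AWS-side host inventory.commerce.svc.cluster.local are hypothetical; azure-inventory.svc.mesh is the host from the ServiceEntry above.

```python
import json


def canary_virtual_service(name: str, host: str, stable_host: str,
                           canary_host: str, canary_percent: int) -> dict:
    """Build an Istio VirtualService that splits HTTP traffic by weight.

    Both destinations must be known to the mesh (the Azure side, for
    example, through a ServiceEntry). Weights always sum to 100.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": name},
        "spec": {
            "hosts": [host],
            "http": [{
                "route": [
                    {"destination": {"host": stable_host}, "weight": 100 - canary_percent},
                    {"destination": {"host": canary_host}, "weight": canary_percent},
                ]
            }],
        },
    }


if __name__ == "__main__":
    aws_host = "inventory.commerce.svc.cluster.local"  # hypothetical
    vs = canary_virtual_service("inventory-canary", aws_host, aws_host,
                                "azure-inventory.svc.mesh", 10)
    print(json.dumps(vs, indent=2))  # pipe into: kubectl apply -f -
```

Re-applying with a larger canary_percent gives a gradual rollout, and re-applying with 0 sends all traffic back to the AWS-side service.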
8. Operational Overhead and Optimizations
8.1 Minimal Overhead Gains
By carefully orchestrating the environment, they avoided duplicating user management or re-architecting the entire codebase. The multi-cloud environment remained comprehensible:
AWS for the main commerce site.
Azure for data warehousing, AD-based SSO, partial microservice expansions.
8.2 Ongoing Observability
They used a combination of:
Prometheus in EKS for AWS-based metrics, plus Azure Monitor for AKS.
Some teams tested Thanos to unify the metrics globally.
Logs were centralized in the Elastic Stack, with Splunk evaluated as an alternative.
Service mesh logs and telemetry gave them a cross-cluster view of all microservice calls. This exposed unforeseen latencies, especially if the IPsec link was saturated or if certain calls still used plain public endpoints instead of the internal mesh route.
8.3 IAM vs. Azure AD: The Ongoing Tension
Although bridging Azure AD with AWS worked, it required ongoing vigilance. Client secrets and service principal credentials had to be rotated, and some teams confused an Azure AD user’s SSO token with an AWS IAM role for service accounts. Thorough documentation and training were essential to keep development teams from mixing up the two concepts.
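Rotation is easier to keep up with when expiries are checked automatically. The sketch below flags credentials that expire within a warning window; the credential names, dates, and 30-day window are hypothetical, and in practice the expiry dates would be read from the Azure AD app registrations (for example, via Microsoft Graph) or from a secrets manager.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def due_for_rotation(expiries: Dict[str, datetime], now: datetime,
                     warn_days: int = 30) -> List[str]:
    """Return the (sorted) names of credentials expiring within warn_days."""
    cutoff = now + timedelta(days=warn_days)
    return sorted(name for name, expires_at in expiries.items() if expires_at <= cutoff)


if __name__ == "__main__":
    # Hypothetical inventory of service principal secrets and their expiry dates.
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    inventory = {
        "azure-exporter-sp-secret": datetime(2024, 1, 20, tzinfo=timezone.utc),
        "identity-broker-sp-secret": datetime(2024, 6, 1, tzinfo=timezone.utc),
    }
    print(due_for_rotation(inventory, now))  # ['azure-exporter-sp-secret']
```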
9. Challenges Encountered
9.1 IP Overlaps
In the earliest attempts, they discovered that the EKS cluster subnets in 10.10.x.x collided with an Azure VNet that had been created with default settings. They had to tear down one side or the other, reassign subnets, and set up correct routes, which caused a multi-week delay.
9.2 Performance Under Load
When the inventory microservices in Azure needed to query large data sets from AWS, or vice versa, the IPsec link and the cross-cluster mesh routes added latency. They implemented caching strategies, replicating short-lived data to each side for local reads. Over time, they considered a dedicated private interconnect, along the lines of AWS Direct Connect or Azure ExpressRoute, to reduce this overhead.
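The caching strategy can be sketched as a small read-through cache with a time-to-live: repeated reads are served locally, and only misses or expired entries cross the IPsec link. The loader function and the 30-second TTL used in the check are hypothetical.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class ReadThroughCache(Generic[V]):
    """Serve repeated reads locally; call the remote loader only on a miss
    or after an entry's time-to-live has elapsed."""

    def __init__(self, loader: Callable[[Hashable], V], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, V]] = {}

    def get(self, key: Hashable) -> V:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh local copy, no cross-cloud call
        value = self._loader(key)  # e.g. a call to the other cloud's inventory API
        self._entries[key] = (now + self._ttl, value)
        return value
```

For inventory, the TTL bounds how stale a cached stock level can be, so an authoritative check would still be needed when an order is actually placed.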
9.3 Handling of Edge Cases in SSO
Some B2B partners used older SAML IdPs, while others used Azure AD natively. The integration was mostly smooth, but some enterprise user roles required advanced claim transformations in Azure AD or custom logic on the AWS side.
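One way to handle that variety is to normalize role claims per partner before they reach authorization logic. The sketch below reads roles from several possible claim names (the roles and groups claims Azure AD can emit, and the role claim URI that SAML-based IdPs commonly send) and maps them through a per-partner table. The partner IDs, role names, and mapping are hypothetical. Unmapped roles are dropped, so an unrecognized claim grants nothing.

```python
from typing import Dict, List, Set

# Claim names that may carry role information (illustrative selection).
ROLE_CLAIMS = (
    "roles",
    "groups",
    "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
)

# Hypothetical per-partner mapping from external role names to internal roles.
PARTNER_ROLE_MAP: Dict[str, Dict[str, str]] = {
    "partner-a": {"Purchasing.Buyer": "bulk_buyer", "Purchasing.Admin": "account_admin"},
    "partner-b": {"procurement": "bulk_buyer"},
}


def _as_list(value) -> List[str]:
    # A claim may arrive as a single string or as a list of strings.
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def internal_roles(partner: str, claims: dict) -> Set[str]:
    """Map a partner user's external role claims to internal commerce roles."""
    mapping = PARTNER_ROLE_MAP.get(partner, {})
    external = [r for name in ROLE_CLAIMS for r in _as_list(claims.get(name))]
    return {mapping[r] for r in external if r in mapping}
```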
10. Business Impact
10.1 B2B Growth and Partnerships
Thanks to the robust integration with Azure AD, ShopSphere signed several large enterprise deals. Potential clients recognized the convenience: their employees or systems could seamlessly log into the commerce portal, place bulk orders, or retrieve specialized data in Azure SQL for further analysis. The frictionless SSO approach gave them a strong selling point, reinforcing “ease of integration” as a brand advantage.
10.2 Enhanced Data Analytics and Collaboration
With the partial shift of data warehousing to Azure SQL, the data team enjoyed new capabilities, especially hooking up Power BI dashboards. They discovered patterns in product stock levels, shipping times, and seasonal demands faster. Some advanced analytics tasks even used Azure Data Lake for raw data storage. This synergy improved internal decision-making.
10.3 Scalability Gains with a Service Mesh
As traffic soared during holiday seasons, the service mesh approach let them:
Gracefully route certain calls from AWS to Azure-based microservices, or vice versa, balancing load.
Perform canary rollouts for new features that spanned cross-cloud boundaries, ensuring minimal risk for production.
Minimal Overhead: Despite the multi-cloud complexity, the design was intentionally simple where possible—one main EKS cluster in AWS, one smaller AKS environment in Azure, plus a robust IPsec link or potential future ExpressRoute. The service mesh gave them dynamic traffic control, but day-to-day management of the mesh was less burdensome once operational.
11. Roadmap and Future Plans
ShopSphere’s leadership is exploring:
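Direct Connect / ExpressRoute: A more stable, lower-latency alternative to the existing IPsec VPN. If cost and logistics permit, linking AWS Direct Connect and Azure ExpressRoute (typically through a shared colocation or cloud-exchange provider) would connect the networks at higher throughput.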
Deeper Azure Synapse Integration: Possibly migrating more analytics from on-prem or from AWS Redshift to a consolidated Azure Synapse environment for big data pipelines.
API Ecosystem: Exposing APIs to partners who exclusively live in Microsoft ecosystems. With the current bridging, they can easily provide partner-specific APIs with Azure AD–based auth.
Global Expansion: Some new markets might prefer local Azure regions. The multi-cloud approach could replicate to new geographies more quickly, letting them localize compliance or data residency.
12. Detailed Architecture Diagram for Org-Wide Implementation
Explanations:
Main commerce site remains on AWS EKS.
Azure AD provides single sign-on for enterprise users, bridging to AWS IAM where needed.
Data flows from EKS to Azure SQL across a site-to-site VPN or IPsec tunnel, integrated with the service mesh for advanced traffic shaping.
The inventory microservices might exist partially on AKS, partially on EKS, all controlled by an Istio multi-cluster mesh.
13. Best Practices Derived from This Case
13.1 Start Small, Validate Connectivity
When bridging AWS and Azure, test the subnets, NAT rules, and route tables carefully. Overlapping IPs or misconfigured route tables cause major delays.
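As a first smoke test, a small probe run from inside each cluster can confirm that TCP connections actually traverse the tunnel, security groups, and network security groups. The sketch below is hypothetical: the target names are illustrative, the addresses are taken from the examples earlier in this case study, and 1433 is the default Azure SQL port. A successful connect proves routing and firewall rules only, not TLS or authentication.

```python
import socket
from typing import Dict, Tuple


def probe(targets: Dict[str, Tuple[str, int]], timeout: float = 3.0) -> Dict[str, bool]:
    """Attempt a TCP connection to each (host, port); report reachability."""
    results: Dict[str, bool] = {}
    for name, (host, port) in targets.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:  # refused, timed out, DNS failure, no route, ...
            results[name] = False
    return results


if __name__ == "__main__":
    targets = {
        "azure-sql": ("myazuresql.database.windows.net", 1433),
        "azure-inventory": ("10.20.1.50", 80),
    }
    for name, ok in probe(targets).items():
        print(f"{name}: {'reachable' if ok else 'UNREACHABLE'}")
```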
13.2 Leverage Service Mesh for Cross-Cloud Traffic
Ensure you have a strong reason to adopt a mesh, as it introduces complexity. But for advanced routing, canary deployments, or observability across clusters, it’s invaluable.
13.3 Federate Identity in a Single Place
In this scenario, Azure AD became the “source of truth” for enterprise logins. This approach ensures fewer user management headaches, especially for B2B or employee accounts. Meanwhile, AWS remains the environment for day-to-day microservices, with IRSA for resource access.
13.4 Document Everything
Cross-cloud identity bridging is easy to confuse. Provide thorough, up-to-date documentation so new developers or partner engineers can navigate the system effectively.
14. Conclusion
In bridging AWS EKS and Azure for an e-commerce platform, ShopSphere effectively expanded their capabilities, unified user identity for enterprise partners, and unlocked new data analytics potential with Azure SQL. The integration process involved site-to-site VPN for secure cross-cloud networking, OIDC/SAML bridging with Azure AD for single sign-on, and a service mesh (Istio) to shape traffic around complex inventory microservices that spanned both clouds. Despite the inherent complexity of multi-cloud orchestration, the company maintained a relatively minimal operational overhead by carefully planning subnets, standardizing microservice patterns, and leveraging advanced features like IRSA on AWS and Azure AD–backed tokens for data warehousing tasks.
Key Takeaways:
Cloud Specialization: Leveraging each cloud’s strongest features—AWS for high-traffic commerce, Azure for enterprise identity and data warehousing—achieved synergy rather than duplication.
Unified SSO: Azure AD integration lowered friction for B2B partners, streamlining the onboarding of enterprise clients who already have stable AD environments.
Service Mesh: Introducing Istio allowed advanced cross-cloud traffic shaping and robust mTLS, bridging the AWS and Azure microservices into a cohesive environment.
Future Scalability: As demands grow, they can further optimize connectivity (perhaps with Direct Connect/ExpressRoute), expand analytics in Azure, or replicate the architecture to new regions, all while preserving a consistent SSO approach.
In the end, ShopSphere’s approach demonstrates the feasibility and business benefits of combining AWS-based e-commerce with Azure-based enterprise integration and data services. By aligning identity, networking, and microservice best practices, they capitalized on multi-cloud advantages without incurring paralyzing complexity—thus ensuring a smoother path for ongoing growth and innovation in an ever-evolving global retail landscape.