When we modified the proposal and included the term “Cloud,” we meant a cloud-agnostic platform comprising our own private cloud plus bare-metal servers in low-traffic regions. Allow us to give some insight.
1) Infrastructure
1.a. Private Cloud - The backbone of our tech stack is a private cloud infrastructure platform called “Scale Computing Platform”, https://www.scalecomputing.com/, which is similar to VMware.
Presently, we have Scale Computing clusters deployed on private racks that we rent from major data centres in 5 locations: Chicago, Sydney, London, Singapore, and Mumbai.
We plan to launch more Scale clusters at the other locations mentioned in the proposal, or to use bare-metal dedicated servers in low-volume regions (please read 1.b. Bare-metal Servers).
The Scale clusters we deploy comprise 3-6 server nodes plus a standby node for redundancy. These nodes run enterprise hardware with Intel Gold CPUs, NVMe SSDs, and high-speed networking.
1.b. Bare-metal Servers - We also propose to use bare-metal dedicated servers in low-traffic regions where scalability and redundancy are not a concern. This considerably lowers the total cost without affecting performance, and keeps latency minimal for the major user populations across the internet.
2) DNS/Load Balancing/Routing Based on Latency
We propose to use Cloudflare-based DNS and routing to handle the internet traffic and route it to the nearest servers. Cloudflare has already built an ecosystem that fits our requirements well, and it has shown superior performance in both our test and production environments. DNS-based load balancing also offers a much richer choice of vendors.
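To illustrate what "route it to the nearest servers" means in practice, here is a minimal sketch of latency-based selection. It is not Cloudflare's implementation or API. The function name `pick_region`, the health map, and the latency figures are hypothetical and used for illustration only; the region names are the five cluster locations listed above.

```python
from typing import Dict, Optional

# The five cluster locations mentioned in this post.
REGIONS = ["Chicago", "Sydney", "London", "Singapore", "Mumbai"]


def pick_region(latency_ms: Dict[str, float],
                healthy: Dict[str, bool]) -> Optional[str]:
    """Return the healthy region with the lowest measured latency.

    latency_ms: latency from the client to each region (hypothetical input).
    healthy:    result of a health check per region (hypothetical input).
    Returns None when no healthy region has a latency measurement.
    """
    candidates = [
        region for region in REGIONS
        if healthy.get(region, False) and region in latency_ms
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda region: latency_ms[region])
```

For example, a client that measures 30 ms to Singapore and 60 ms to Mumbai is sent to Singapore. If Singapore fails its health check, the same client is sent to Mumbai instead.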
3) Automation
We already have a functional, well-tested automation system created for public clouds, including AWS, using Terraform. We have already started working on extending this to our on-premise servers. This is a work in progress.
To give some additional insight: for system software upgrades, we use an extension of the Blue/Green methodology commonly used with Terraform. Whenever a new version of the endpoint software is publicly released, we take a canary testing approach: we upgrade the software in a low-traffic region, watch for any problems, and once we have confirmed that the new version works correctly, we deploy it to all other regions.
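The canary flow described above can be sketched roughly as follows. This is a simplified illustration, not our actual Terraform automation. The function `canary_rollout`, the `deploy` and `is_healthy` callbacks, and the traffic figures are all hypothetical.

```python
from typing import Callable, Dict, List


def canary_rollout(
    regions_by_traffic: Dict[str, int],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Upgrade the lowest-traffic region first, then the rest if it is healthy.

    regions_by_traffic: region name -> traffic volume (hypothetical figures).
    deploy:             callback that upgrades one region.
    is_healthy:         callback that checks a region after the upgrade.
    Returns the list of regions that were upgraded, in order.
    """
    # Sort regions from lowest to highest traffic; the first one is the canary.
    ordered = sorted(regions_by_traffic, key=regions_by_traffic.get)
    if not ordered:
        return []

    canary, rest = ordered[0], ordered[1:]
    deploy(canary)
    if not is_healthy(canary):
        # Stop here: every other region keeps running the old version.
        return [canary]

    for region in rest:
        deploy(region)
    return [canary] + rest
```

With traffic figures such as `{"London": 900, "Chicago": 700, "Mumbai": 120}`, Mumbai is upgraded first, and London and Chicago are only touched once Mumbai passes its health check.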
4) Support and Monitoring
We have a dedicated team of our own employees actively monitoring the systems and handling any issues proactively. They are also responsible for conducting security audits at regular intervals and upgrading systems as required.
We predominantly use Grafana dashboards to monitor our on-premise servers.
Below are some reasons why we chose the Scale Computing Platform as the core cloud infrastructure:
- Redundancy: There is always a standby hardware node that kicks in whenever there is a hardware failure, and high availability is ensured by the platform's self-healing machine intelligence.
- Managed Support: We pay for fully managed support from the vendor, including hardware replacement.
- Scalability: As with any cloud, we can easily scale the CPU, RAM, and disk of the instances.
- Backups: Backups are essentially OS images, which makes backup automation and spinning up new instances during software upgrades a lot easier. This is built into the ecosystem.
- High-Speed Network: Scale clusters are enterprise grade.
- No Single Point of Failure: The Scale Computing Platform is designed as a clustered architecture. When virtual instances are created, they are distributed across the nodes with high availability in mind.
In a nutshell, what we propose for Astar’s public endpoint is:
- A private cloud in 4-5 locations using Scale Computing clusters, and bare-metal dedicated servers at other locations, all in prominent, well-known data centers. We own and manage all the hardware inside the rack.
- DNS and traffic routing are handled by the Cloudflare ecosystem.
- Automation and CI/CD are handled by a combination of Terraform and Ansible solutions that we build ourselves.
- Support is provided by our in-house team, with monitoring through Grafana and other third-party tools as backup.