RadiumBlock Builders Program Application - Cloud-Independent, High-Performance, Geo-Distributed Public Infrastructure

RadiumBlock is a web3 infrastructure company providing high-performance infrastructure to the Polkadot ecosystem. We propose to run a high-performance, geographically distributed public RPC API service for the Astar ecosystem on our platform, allowing free community access. To our knowledge, our RPC service for Kusama and Polkadot is the fastest (lowest latency) and most geo-diverse available.

We manage private compute infrastructure in private data centers for our web2 customers, and we have been working on deploying endpoints on that infrastructure. We are happy to report that we can now deploy it in multiple geographies. Here is an updated proposal based on our own infrastructure, i.e. without using any cloud provider.

TL;DR: we are sharing our proposal details here:

@bLd759 @Maarten re-posting as a new topic because the previous topic was closed.

If you read the reply from @bLd759 in the previous topic, I think you can find the answer.

That is why the topic was closed.

Understood. This proposal is different and addresses the concern @bLd759 expressed about our use of cloud infrastructure. Hence, as @bLd759 encouraged, we are back with a solution based on private infrastructure. In that sense, having a separate topic seems appropriate too, given that the proposal is different from the earlier one.


I’m sorry but I don’t see any change to this proposal from the last one.

We have been working on our own private compute deployment using our existing data center locations (locations we have had for 10+ years) since before your feedback, and we have added more since then. We request that you please look at the budget section: the budget breakdown assumes geo-locations on our private compute server clusters, NOT AWS or any other cloud. This allows significantly more compute (~20x vs. AWS) and bandwidth (250TB at no cost, ~$31k on AWS) to be provided to the endpoint service, and the compute and bandwidth savings scale up as more geo-locations are factored in. All of this explicitly tries to address the cloud cost overrun and independence issues. It does increase our labor cost. We have tried to be transparent about our pricing model in the proposal. Please do ask us any further questions you may have.

Sorry again, but I see only a few words and the cost changed in your proposal (I can't compare with the previous one since the document has been edited); it's still totally cloud-oriented.
Is that all it takes to transform a proposal from a cloud to an on-premise solution?
Do you have some info on your technical stack?

We will work on adding information about the tech stack. Could you please help us understand what gives you the impression that the proposal is cloud-based? We would like to work on clearing up that misunderstanding.

Well, all your communication and strategy are oriented around cloud providers; your validators run on AWS.
Your proposal keeps all the same arguments of scalability, load balancing, and orchestration, which are obviously not the strong points of an on-premise architecture. The metrics shown are all AWS dashboards…

Please note that, beyond the technical architecture, grants for infrastructure can only be temporary: we are asking all our infra providers to be self-sustainable in the services they propose. Some already are; others are working on it. We expect a self-sustainability plan from all infra providers who apply for a grant.
Astar will become a DAO in the future, there will be no company behind to pay for invoices.


When we modified the proposal and included the term "cloud," we meant to create a cloud-agnostic platform comprising our own private cloud plus bare-metal servers in low-traffic regions. Let me give some insights.

1) Infrastructure

1.A) Private Cloud
——

Our tech stack backbone is a private cloud infrastructure platform called "Scale Computing Platform" (https://www.scalecomputing.com/), something similar to VMware.

Presently, we have Scale Computing clusters deployed on the private racks we rent from major data centers in 5 locations: Chicago, Sydney, London, Singapore, and Mumbai.

We are planning to launch more Scale clusters at other locations mentioned in the proposal, OR use bare-metal dedicated servers for low-volume regions (please read 1.B. Bare-metal Servers).

The Scale clusters we deploy comprise 3-6 node servers plus a stand-by node server for redundancy. We use enterprise hardware on these nodes, with Intel Gold CPUs, NVMe SSDs, and a high-speed network.

Below are some reasons why we chose Scale Computing Platform as the core cloud infrastructure:

  1. Redundancy - There is always a stand-by hardware node which will kick in whenever there is a hardware failure, and high availability is ensured by the platform's self-healing machine intelligence.
  2. Managed Support - Fully managed paid support directly from the vendor, including any hardware replacement.
  3. Scalability - We can scale the CPU, RAM, and disk of the instances as easily as in any cloud.
  4. Backups - Backups are essentially OS images, which makes backup automation and spinning up new instances during software upgrades a lot easier. This is built into the ecosystem.
  5. High-Speed Network - Scale clusters use enterprise-grade networking.
  6. No Single Point of Failure - Scale Computing Platform is designed as a clustered architecture. When virtual instances are created, they are spread across the nodes with high availability in mind.

1.B) Bare-metal Servers
—

We also propose to use bare-metal dedicated servers in low-traffic regions where scalability or redundancy is not a concern. This will considerably lower the total cost while keeping performance unaffected and latency minimal for the major population centers across the internet.

2) DNS/Load Balancing/Latency-Based Routing

We propose to use CloudFlare-based DNS/routing to handle internet traffic and route it to the nearest servers. CloudFlare has already created an ecosystem that fits our requirements really well, and our test and production environments have shown superior performance. Many DNS providers offer similar features, so there isn't much vendor lock-in.
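As a rough illustration of the latency-based routing policy described above (CloudFlare performs the actual measurement and steering for us), here is a minimal Python sketch of selecting the nearest healthy server; the endpoint names and latency figures are hypothetical:

```python
def pick_endpoint(latencies_ms: dict, healthy: dict) -> str:
    """Return the healthy endpoint with the lowest measured latency.

    `latencies_ms` maps endpoint name -> round-trip time in milliseconds;
    `healthy` maps endpoint name -> bool from a separate health check.
    """
    candidates = {ep: ms for ep, ms in latencies_ms.items() if healthy.get(ep)}
    if not candidates:
        raise RuntimeError("no healthy endpoints available")
    return min(candidates, key=candidates.get)

# Hypothetical measurements: London is closest but currently unhealthy,
# so traffic falls back to the next-fastest region.
latencies = {"chicago": 42.0, "london": 15.0, "singapore": 230.0}
health = {"chicago": True, "london": False, "singapore": True}
print(pick_endpoint(latencies, health))  # chicago
```

The same fallback behavior is what makes DNS-level steering attractive here: an unhealthy region simply drops out of the candidate set and users are routed to the next-best location.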

3) Automation

We already have a functional, well-tested automation system, created using Terraform for public clouds including AWS. We have already started working on extending this to our on-premise servers; this is a work in progress.

To give some additional insight: we use an extension of the Blue/Green methodology used with Terraform for system software upgrades. Whenever a new version of the endpoint software is publicly launched, we take the canary testing approach, i.e., we upgrade the software in a low-traffic region, watch for any mishaps, and, upon confirmation that the new version is working perfectly, deploy it to all other regions.
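The canary flow just described can be sketched as follows; `deploy` and `is_healthy` stand in for our actual Terraform runs and health checks, and the region names are purely illustrative:

```python
def canary_rollout(regions, canary, deploy, is_healthy, log=print):
    """Upgrade the canary region first; continue to the remaining
    regions only if the canary comes back healthy."""
    deploy(canary)
    if not is_healthy(canary):
        log(f"canary region {canary!r} unhealthy; aborting rollout")
        return [canary]
    rolled_out = [canary]
    for region in regions:
        if region == canary:
            continue  # already upgraded
        deploy(region)
        rolled_out.append(region)
    return rolled_out

# Example: a low-traffic region acts as the canary.
upgraded = []
canary_rollout(["mumbai", "chicago", "london"], "mumbai",
               deploy=upgraded.append, is_healthy=lambda r: True)
print(upgraded)  # ['mumbai', 'chicago', 'london']
```

The key property is that a bad release never reaches more than the one low-traffic region, so the blast radius of a failed upgrade stays small.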

4) Support and Monitoring
——

We have a 24/7 dedicated team of our own employees actively monitoring the systems and handling any issues proactively. They are also responsible for conducting security audits at regular intervals and performing upgrades as required.
We primarily use Grafana dashboards for our on-premise server monitoring. These dashboards will scale up to monitor multi-region endpoints.
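For illustration, a minimal probe of the kind such dashboards could be fed from might look like the sketch below. It assumes a Substrate-style JSON-RPC endpoint exposing the standard `system_health` method; the URL is a placeholder:

```python
import json
import time
import urllib.request

def build_health_request(url: str) -> urllib.request.Request:
    """Build a JSON-RPC request for the standard Substrate `system_health` call."""
    payload = json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "system_health",
        "params": [],
    }).encode()
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )

def probe(url: str, timeout: float = 5.0) -> float:
    """Return the round-trip latency of one health call, in seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(build_health_request(url), timeout=timeout) as resp:
        body = json.load(resp)
    if "result" not in body:
        raise RuntimeError(f"unexpected response: {body}")
    return time.monotonic() - start

# Placeholder URL; a real deployment would record probe() results into a
# time-series store that Grafana then visualizes per region.
# probe("https://astar.example-rpc.invalid")
```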

In a nutshell, what we propose for the Astar public endpoint is:

  1. A private cloud in 4-5 locations using Scale Computing clusters, plus bare-metal dedicated servers at other locations in prominent, well-known data centers. We own and manage all the hardware inside the rack.

  2. DNS and traffic routing handled by the CloudFlare ecosystem.

  3. Automation and CI/CD handled by a combination of Terraform/Ansible scripts; this part of the proposed solution is a work in progress.

  4. Support by our in-house team, and monitoring using Grafana and other third-party tools as backup.


@bLd759 do you all have any feedback on our proposal?

Astar team, we look forward to hearing from you all.
