Latency-Based Geo Forwarding + Round-Robin Load Balancing 3-Caddy/Garage Servers Cluster for TFGrid Dashboard Stacks #96

Closed
opened 2024-08-27 16:41:29 +00:00 by mik-tf · 7 comments
Owner

Situation

  • As explained and developed on GH issue: https://github.com/threefoldtech/grid_deployment/issues/56

Status

  • WIP: Bert is on it and making great progress.
    • Now compiling geoip2 into Caddy and running more tests

Update

  • Will keep this issue updated

Load balancing and certificates

  • Host a small but truly geo-distributed Garage cluster that exposes an S3 endpoint at 3 locations and one or more gateways
  • Store any certificate (in this case, for the geo forwarders) that x nodes need to share (via an S3 Caddy plugin).
    • Certificates need to be shared to remove the SPOF and to offer more than one location to forward from
    • Specs: random and cyclical as per round-robin, i.e. can't check geolocation, but regions are possible (see below)

Future Phase

Once this issue is done, we can open new issues for the future phase. See below.

  • Add functionality of zone records
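The certificate-sharing idea above can be modelled in a few lines. This is only a sketch under stated assumptions: `SharedStore`, `InMemoryStore`, and `getOrIssueCert` are hypothetical names, not Caddy's or Garage's API, and an in-memory map stands in for the replicated S3 bucket. It shows why a shared store removes the SPOF: whichever node obtains the certificate first publishes it, and every other node reuses it.

```typescript
// Hypothetical model of certificate sharing; not Caddy's or Garage's real API.
interface SharedStore {
  get(key: string): string | undefined;
  put(key: string, value: string): void;
}

// Stand-in for the replicated S3 bucket (assumption: all nodes see the same data).
class InMemoryStore implements SharedStore {
  private data = new Map<string, string>();
  get(key: string): string | undefined {
    return this.data.get(key);
  }
  put(key: string, value: string): void {
    this.data.set(key, value);
  }
}

// Reuse the stored certificate if one exists; otherwise issue one and publish it.
function getOrIssueCert(
  store: SharedStore,
  domain: string,
  issue: (domain: string) => string,
): string {
  const key = `certs/${domain}`;
  const existing = store.get(key);
  if (existing !== undefined) return existing;
  const cert = issue(domain);
  store.put(key, cert);
  return cert;
}

// Two "nodes" sharing one store: only the first call actually issues.
const store = new InMemoryStore();
let issuedCount = 0;
const fakeIssue = (domain: string) => `cert-${domain}-${++issuedCount}`;
const certOnNodeA = getOrIssueCert(store, "dashboard.geo.grid.tf", fakeIssue);
const certOnNodeB = getOrIssueCert(store, "dashboard.geo.grid.tf", fakeIssue);
console.log(certOnNodeA === certOnNodeB, issuedCount);
```

The model ignores concurrency: a real deployment would presumably also need some locking so that two nodes do not request a certificate at the same moment.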
mik-tf added the Story label 2024-08-27 16:41:29 +00:00
mik-tf added this to the tfgrid_3_15 project 2024-08-27 16:41:29 +00:00
mik-tf changed title from Load balancer for tfgrid validators to round-robin Load balancing for tfgrid stacks 2024-08-27 16:47:40 +00:00
Author
Owner

Update

The geoforwarder + basic load balancing part is done. We now need to clarify how to set up HTTPS.

Status

  • WIP
    • @coesensb is finalizing the HTTPS part, see below

Specs

  • Geoforwarder
    • 3 options: https://github.com/threefoldtech/grid_deployment/issues/56#issuecomment-2379452618
    • the user is directed to the closest dashboard among the available dashboards: be, us, sg, fin, etc.
  • Load balancing
    • Round robin at DNS provider level for each geoforwarder
  • HTTPS: the geoforwarder needs to direct the user to the proper dashboard URLs via HTTPS
    • setup: 3 instances of the geo-forwarder with DNS round-robin, set up with HTTPS
      • every server needs access to the certificate
        • we need shared storage across endpoints
          • Option: S3 server (Garage node) on every endpoint
            • Notes: we want it as FOSS as possible, but first we need it to WORK :)
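To illustrate the DNS-level round robin in the specs above, here is a small model of the rotation. It is a sketch, not the DNS provider's actual behaviour: `RoundRobinZone` is a hypothetical name, and the addresses are documentation-range placeholders rather than the real geoforwarder IPs. Each query returns the same set of records rotated by one, so successive clients tend to land on different servers.

```typescript
// Model of DNS round-robin: every query returns all records, rotated by one.
// The addresses below are placeholders (RFC 5737 range), not the real servers.
class RoundRobinZone {
  private offset = 0;
  private readonly records: string[];

  constructor(records: string[]) {
    if (records.length === 0) throw new Error("at least one record is required");
    this.records = [...records];
  }

  // Return the full record set, starting from the next record in the cycle.
  resolve(): string[] {
    const n = this.records.length;
    const answer = this.records.map((_, i) => this.records[(this.offset + i) % n]);
    this.offset = (this.offset + 1) % n;
    return answer;
  }
}

const zone = new RoundRobinZone(["192.0.2.1", "192.0.2.2", "192.0.2.3"]);
// Clients usually try the first record, so look at the first entry of each answer.
const firstChoices = [1, 2, 3, 4].map(() => zone.resolve()[0]);
console.log(firstChoices); // 192.0.2.1, 192.0.2.2, 192.0.2.3, then 192.0.2.1 again
```

Because any of the 3 servers can receive a given HTTPS request, each one must be able to present the same certificate, which is what motivates the shared storage requirement.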
mik-tf changed title from round-robin Load balancing for tfgrid stacks to Geo-forwarder + round-robin Load balancing for tfgrid stacks 2024-10-04 14:52:39 +00:00
Author
Owner

Update

  • As discussed with Bert, we will follow https://github.com/threefoldtech/grid_deployment/issues/56#issuecomment-2394004366: we use geolocation (2 different methods, also checking that the URL is responsive) and fall back to the quickest responsive stack.
    • This uses parallelism and is as fast as the other methods with fallbacks. Round-robin is used as the basic load balancing mechanism.
  • Still need to put the setup above online to test with Caddy
  • Still have to set up 3 PoC VMs to test and run the Caddy/S3/Garage setup (will use the existing Garage for the PoC) and build the custom Caddy with an S3 plugin
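The "geolocation first, fall back to the quickest responsive stack" logic described in this update could look roughly like the sketch below. It is not the script from the linked comment: `ProbeResult`, `chooseStack`, and `probeAll` are hypothetical names, and the probe function is passed in so that the selection step stays independent of how responsiveness is actually checked.

```typescript
// Result of probing one stack; latencyMs is null when the stack did not respond.
interface ProbeResult {
  url: string;
  latencyMs: number | null;
}

// Selection step: prefer the geolocated stack if it responded,
// otherwise fall back to the responsive stack with the lowest latency.
function chooseStack(results: ProbeResult[], preferredUrl?: string): string | null {
  const alive = results.filter(
    (r): r is ProbeResult & { latencyMs: number } => r.latencyMs !== null,
  );
  if (alive.length === 0) return null;
  if (preferredUrl !== undefined && alive.some((r) => r.url === preferredUrl)) {
    return preferredUrl;
  }
  return alive.reduce((best, r) => (r.latencyMs < best.latencyMs ? r : best)).url;
}

// Probe every stack in parallel; a rejected probe is recorded as unresponsive.
async function probeAll(
  urls: string[],
  probe: (url: string) => Promise<number>,
): Promise<ProbeResult[]> {
  return Promise.all(
    urls.map(async (url): Promise<ProbeResult> => {
      try {
        return { url, latencyMs: await probe(url) };
      } catch {
        return { url, latencyMs: null };
      }
    }),
  );
}
```

Since all probes run concurrently, the total wait is bounded by the slowest probe (or its timeout) rather than by the sum of all probes, which matches the note above about parallelism.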
Author
Owner

Update

We will use the simpler latency-based approach using JavaScript, as proposed by @scott. @lee confirmed it is OK to do this.

We are using a combination of Caddy + Garage for the certificates.

Details

  • geoforwarder issue
    • 3 caddy/garage server cluster
      • certificate
      • multiple caddy instances
        • need same certificate
      • garage
        • set lightweight s3 interface
        • 1 garage instance on each node with caddy
        • connection with caddy done locally
      • caddy makes certificate, garage links them to other caddy instances
    • latency-based
      • works on browser, javascript based
      • fine for first stage
      • todo, see what we do for our own backend
        • check for 3.16
    • round-robin load balancing
      • the 3 servers of the cluster point to the same domain and round-robin balancing is applied to the 3
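A browser-side latency measurement of the kind described above might be sketched as follows. This is not @scott's script: `measureLatency`, `median`, and `pickFastest` are hypothetical names, the 3-second timeout and 3 rounds are arbitrary choices, and the sketch assumes a `no-cors` request is acceptable because only the timing matters, not the response body.

```typescript
// Time one request to url; rejects on network error or when the timeout aborts it.
async function measureLatency(url: string, timeoutMs = 3000): Promise<number> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const start = performance.now();
  try {
    await fetch(url, { mode: "no-cors", cache: "no-store", signal: controller.signal });
    return performance.now() - start;
  } finally {
    clearTimeout(timer);
  }
}

// Median of several samples, to damp one-off network spikes.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Measure each candidate a few times in parallel; return the URL with the
// lowest median latency, or null if none of them responded.
async function pickFastest(urls: string[], rounds = 3): Promise<string | null> {
  const scored = await Promise.all(
    urls.map(async (url) => {
      try {
        const samples: number[] = [];
        for (let i = 0; i < rounds; i++) samples.push(await measureLatency(url));
        return { url, ms: median(samples) };
      } catch {
        return { url, ms: Infinity };
      }
    }),
  );
  const best = scored.reduce((a, b) => (b.ms < a.ms ? b : a), { url: "", ms: Infinity });
  return best.ms === Infinity ? null : best.url;
}

// Usage in a page (placeholder URLs, not the real dashboards):
// pickFastest(["https://dashboard-a.example", "https://dashboard-b.example"])
//   .then((url) => { if (url !== null) window.location.assign(url); });
```

Measuring from the browser reflects the latency the user actually experiences, which is why it is a reasonable first stage without any server-side geolocation.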
Author
Owner

Update

  • Scott improved the latency-based geo forwarding script
  • Bert implemented it on 3 caddy instances with garage for certification
  • Github Issue Link: https://github.com/threefoldtech/grid_deployment/issues/56#issuecomment-2407047037
  • Dashboard link to use the script: https://dashboard.geo.grid.tf

Upcoming

  • Test the 3 caddy instances setup
  • Revise based on test
  • Put to production
mik-tf changed title from Geo-forwarder + round-robin Load balancing for tfgrid stacks to Latency-Based Geo Forwarding + Round-Robin Load Balancing 3-Caddy/Garage Servers Cluster for TFGrid Dashboard Stacks 2024-10-11 16:49:39 +00:00
Author
Owner

Update

  • Testing and implementation done
  • Next step is to put to production: garage/caddy 3-node cluster for dashboard.geo.grid.tf
    • Should be done in the following days
Author
Owner

Update

  • Bert will try to make this happen in the next few days, depending on ops workload.

TODO

  • set up a geo-spread mini Garage cluster
  • set up Caddy with the S3 plugin
  • deploy the geo forwarding code

Issue on GH

  • https://github.com/threefoldtech/tf_operations/issues/2835
Author
Owner
Completed by Bert: https://github.com/threefoldtech/grid_deployment/issues/56
Reference: tfgrid/circle_engineering#96