Hi!

See this as my high-level public backlog. I don't track day-to-day activities like the tasks for play-with-ghost.com.

Sharing my challenges feels like the right thing to do, as I get so much from the open-source community. If this helps people better understand what we are building here, I'm glad to share it :)

Would you like to get involved? If you have solid skills 🤓 with Docker Swarm, Linux, bash and the gang*, and you would love to help a startup launch 🔥 a solid project, I would love to get to know you 🍻. Buzz me 👋 on Twitter @askpascalandy. You can see the things that are done and the things we still have to do here.

I’m looking for bright and caring people to join this journey with me.


Table of Contents

  1. Up and Running
  2. Must be built
  3. Backlog
  4. Exploring the idea

1) Up and running

To see how we roll (technically speaking) at FirePress, please check the post What kind of Back-End drives FirePress.

We are hosting between 30 and 60 websites / services at any given moment. Not that many at this point, as we are in the beta phase. I'm looking to define an official SLA for our stack.

In short, we have hosting challenges. Think static websites and blog/CMS (Ghost) sites. This site is actually running within a container at http://firepress.org/en/. The home page is running in another one at http://firepress.org/.

  • ✅ Our stack is cloud agnostic. No AWS/Azure/Google lock-in.
  • ✅ We use Ubuntu servers and deploy them via the Docker Machine CLI.
  • ✅ We configure our servers via a bash script. No need for Terraform at the moment, but we probably will some day.
  • ✅ We set UFW rules to work alongside Docker.
  • ✅ We run services via docker service create (well, 95% of the time).
  • ✅ We use the Resilio service to share a common folder between all nodes. Looking to switch… see below.
  • ✅ Reverse proxy to redirect public traffic.
  • ✅ We label Docker nodes and deploy services against those constraints (see the sketch right after this list).
  • ✅ Fancy bash scripts to launch services like:
    • Traefik
    • Percona (MySQL)
    • Ghost
    • Nginx
    • Portainer
    • Sematext
    • rClone
    • Most containers are built on Alpine.
  • ✅ We deploy each website via a unique ID and a fancy bash script.
  • ✅ We generate dynamic landing pages via a script from an HTML template. Nothing fancy yet, but great at this stage.
  • ✅ Our backup processes are solid.
    • Via cron
    • Interval: every 4 hours, every day
    • Compressed and encrypted before leaving the cluster for Backblaze B2
    • Notified in Slack when the backup is done
    • Keeping only the last 2 backups on the DB node
    • The Swarm Raft state is also backed up
  • ✅ A cron job runs docker system prune --all --force on each node.
  • ✅ A cron job backs up the Swarm Raft state.
  • ✅ Docker builds
    • Highly standardized for all containers
    • Tagging (edge, stable, version) is done automatically. We build our containers simply by running ./builder.sh + directory name
    • Versioning is A1. We use the tags edge and stable
  • ✅ We deploy our web apps with a PathPrefix (Traefik):
    • mycie.com/green/
    • mycie.com/blue/
    • mycie.com/yellow/
    • We use the Cloudflare CLI to create, update and delete zones, A records, CNAMEs, etc. via flarectn, which runs within a sporadic container
  • ✅ We contribute to making Docker a better place:
    • Feature request: show --global instance numbers with docker service
    • Fixed: https://github.com/moby/moby/issues/27670
    • Scheduler limits the number of containers to 40 per worker node (overlay network limit is 252 containers) | Swarm 1.12.1
    • Fixed: https://github.com/moby/moby/issues/26702
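Below is a minimal sketch of the label/constraint + PathPrefix combo, assuming Traefik 1.x label syntax. The node label (storage=ssd), service name (ghost_client123), network (traefik-net) and image tag are placeholders, not our actual values:

# label a node, then pin the service to nodes carrying that label
docker node update --label-add storage=ssd worker-1

# assumes an overlay network created with: docker network create --driver overlay traefik-net
docker service create \
  --name ghost_client123 \
  --network traefik-net \
  --constraint 'node.labels.storage == ssd' \
  --label traefik.port=2368 \
  --label 'traefik.frontend.rule=Host:mycie.com;PathPrefix:/blue/' \
  ghost:1-alpine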

WORK IN PROGRESS:

Go back to the Table of Contents


2) Must be built


⭕ MUST-01a) Monitoring stack
  • Metrics
    • Intel Snap | collects, processes, and publishes metrics
    • InfluxDB | stores metrics
    • Grafana | displays metrics visually
  • Logs: ELK (Elasticsearch, Logstash, Kibana)
  • To try | https://github.com/bvis/docker-prometheus-swarm
  • My requirements are not clear at the moment, as I have zero experience with this whole paging thing
  • Monitoring Percona MySQL DB performance (in Docker, of course)
  • Alert management (i.e. one node is not responsive)
    • I have no idea about the performance of my Percona DB
    • I have no alert if one node is down, etc.
⭕ MUST-01b) Logs
⭕ MUST-02) Traefik config

Traefik is a beast. So many configs!

  • Traefik allows me to automatically create HTTPS certificates for each site. But I can't make it work alongside Cloudflare's service. It's one or the other. I'm screwed, so I don't use SSL at the moment (a possible way out is sketched below).
  • Test ACME renewal
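In theory, Traefik's DNS challenge should let Let's Encrypt and Cloudflare coexist, since the certificate is validated through DNS records rather than HTTP. A hedged sketch, assuming Traefik 1.x and its Cloudflare DNS provider (the ACME flag names changed between 1.x releases, so verify them against your version):

docker service create \
  --name traefik \
  --publish 80:80 --publish 443:443 \
  --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
  -e CLOUDFLARE_EMAIL=admin@mycie.com \
  -e CLOUDFLARE_API_KEY=change-me \
  traefik:1.4 \
  --docker --docker.swarmmode \
  --entryPoints='Name:http Address::80' \
  --entryPoints='Name:https Address::443 TLS' \
  --acme --acme.email=admin@mycie.com \
  --acme.storage=/etc/traefik/acme.json \
  --acme.entryPoint=https \
  --acme.dnsProvider=cloudflare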
⭕ MUST-03) Caching websites / blogs

Challenge: add a Varnish caching container for each blog (or maybe one for every domain we host?).

  • CMO (current mode of operation): a request goes Traefik CTN > Ghost CTN > MySQL CTN
  • FMO (future mode of operation): I want Traefik CTN > Varnish cache CTN > (if the content is not cached...) > Ghost CTN > MySQL CTN (see the sketch below)
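One way to test the FMO chain: a Varnish service sitting on the same overlay network, with a tiny VCL that declares the Ghost container as its backend. The image, service names and ports below are assumptions to validate, not a final setup:

# minimal VCL: Varnish resolves the Ghost service by name on the overlay network
cat > default.vcl <<'EOF'
vcl 4.0;
backend default {
  .host = "ghost_client123";
  .port = "2368";
}
EOF

# NB: with a bind mount, default.vcl must exist on every node that can run the task
docker service create \
  --name varnish_client123 \
  --network traefik-net \
  --mount type=bind,src=$PWD/default.vcl,dst=/etc/varnish/default.vcl \
  --label traefik.port=80 \
  --label 'traefik.frontend.rule=Host:mycie.com' \
  million12/varnish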
⭕ MUST-04) DNS load balancing BEFORE hitting the Swarm cluster

Challenge: at the moment, Cloudflare points to ONE node. If this node crashes, all our sites go down!

  • Cloudflare is working on their load balancing solution, but let's be proactive. See this ticket.
  • We need a health check to see if our 3 managers are healthy and do round-robin sticky sessions between them. If one manager is not healthy, the round-robin system shall stop sending traffic to that node. If node Leader 1 is down, the system shall point traffic to node Leader 2 or 3 (health check). A rough sketch follows.
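Until Cloudflare ships load balancing, a cron-driven health check could do a poor man's failover on the A record. A rough sketch against the Cloudflare v4 API; the manager IPs, the /ping endpoint and the ZONE_ID / RECORD_ID variables are placeholders:

#!/bin/bash
# point the A record at the first healthy manager
for node in 10.0.0.1 10.0.0.2 10.0.0.3; do
  if curl -fsS --max-time 5 "http://$node/ping" > /dev/null; then
    curl -fsS -X PUT \
      "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID" \
      -H "X-Auth-Email: $CF_EMAIL" \
      -H "X-Auth-Key: $CF_API_KEY" \
      -H "Content-Type: application/json" \
      --data "{\"type\":\"A\",\"name\":\"mycie.com\",\"content\":\"$node\"}"
    break
  fi
done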

⭕ MUST-05A) Minio storage for our private Docker registry
  • All nodes in the cluster shall have access to the Minio bucket (a starting point is sketched below)
  • Would be nice to use Backblaze B2 as the storage provider - wip
  • To consider | https://github.com/cloudflavor/miniovol
  • Storage pricing is key. No AWS S3.
  • Backblaze is the best deal at the moment. I already use them for our backups.
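A starting point could look like this: one Minio service exposing an S3 API, and the registry pointed at it via its S3 storage driver. The bucket name, keys, network and mount path are placeholders, and the registry env vars should be double-checked against the registry:2 docs:

docker service create \
  --name minio \
  --network minio-net \
  --publish 9000:9000 \
  --mount type=bind,src=/mnt/minio-data,dst=/data \
  -e MINIO_ACCESS_KEY=change-me \
  -e MINIO_SECRET_KEY=change-me-too \
  minio/minio server /data

docker service create \
  --name registry \
  --network minio-net \
  --publish 5000:5000 \
  -e REGISTRY_STORAGE=s3 \
  -e REGISTRY_STORAGE_S3_BUCKET=docker-registry \
  -e REGISTRY_STORAGE_S3_REGION=us-east-1 \
  -e REGISTRY_STORAGE_S3_REGIONENDPOINT=http://minio:9000 \
  -e REGISTRY_STORAGE_S3_ACCESSKEY=change-me \
  -e REGISTRY_STORAGE_S3_SECRETKEY=change-me-too \
  registry:2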

Maybe REX-Ray.

Maybe Portworx and Minio together.

⭕ MUST-06) Deploy an HA MySQL database

Go back to the Table of Contents


3) Backlog (other stuff)



🙊 backlog-04) Container as an external hard drive

As a DevOps hero, I want to:
  • launch a new server,
  • run a container (or a service) as an external hard drive,
  • have my applications consume this data as if it had always existed.

User stories

  1. As a DevOps hero, I'm looking to have a container that I can deploy as a service via Docker Swarm. Let's call this service perm-data. When I run this service with --global, the data is synced to every host I have.

  2. As a DevOps hero, when perm-data runs on two or more hosts, the perm-data services must communicate with each other to get maximum performance. Your classic SQL DB use case.

  3. As a DevOps hero, I want to be able to completely shut down my data center knowing the data is stored in a 3rd party place like S3 or Backblaze B2.

Big challenges between 2 and 3.

  1. As a DevOps hero, I should not have to pre-define the size of perm-data. I want it to be dynamic. (But I'm OK living with a 50 GB volume during the PoC.)

  2. As a DevOps hero, I want my hundreds of web app services to mount the same perm-data. It's not one web app per perm-data. I want to manage one perm-data for a lot of web apps.

  3. As a DevOps hero, I want to run a single docker service create (global mode) command. All nodes then have the same data available to them, and I can mount it on N1, N2, Nx services like this (a full command is sketched below):

mount

For a Caddy container I would mount:

--mount type=bind,src=/mnt/statefuldir/client123/caddy/srv,dst=/srv

For a MySQL container I would mount:

--mount type=bind,src=/mnt/statefuldir/client123/mysql/var/lib/mysql,dst=/var/lib/mysql
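With perm-data in place, the single command from story 3 would boil down to something like this (the Caddy image is a placeholder):

docker service create \
  --name caddy_client123 \
  --mode global \
  --mount type=bind,src=/mnt/statefuldir/client123/caddy/srv,dst=/srv \
  abiosoft/caddy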

volume

Or, if I could mount sub-folders of Docker volumes, it would be even easier:

--mount type=volume,src=/statefulvol/client123/caddy/srv,dst=/srv
--mount type=volume,src=/statefulvol/client123/mysql/var/lib/mysql,dst=/var/lib/mysql

Challenge: to my knowledge and IMHO, I cannot mount different sub-folders of a volume into an application. That's why I prefer the idea of using a container where all the data is synced within the container, and having my apps consume the data from this container.

⭕ BACKL-xx) Deploy Piwik
⭕ BACKL-01) Swarm delay
⭕ BACKL-02) Zero-downtime deployments with rolling upgrades
⭕ BACKL-03) A better docker service update behaviour toward services with replicas = 1
⭕ BACKL-04) Find the best practice to update each node
  • At the moment the Docker daemon needs to restart... and the DB goes down for 1-2 minutes (see the sketch below)
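The drain/activate dance seems to be the standard way to do this; a sketch, assuming a node named worker-1. It does not fix the replicas = 1 DB downtime by itself, but at least it makes the window predictable:

docker node update --availability drain worker-1    # reschedule its tasks elsewhere
# ...upgrade packages / restart the Docker daemon on worker-1...
docker node update --availability active worker-1   # let tasks come back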
⭕ BACKL-05) Monitoring our DB | PMM
⭕ BACKL-06) Build multi-stage NodeApp
⭕ BACKL-07) Secrets
  • Passwords, credentials, ENV for each site
  • I need to do a major update to the workflow to use Docker Secrets (a sketch follows)
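A minimal sketch of the target workflow with Docker Secrets. The secret name is hypothetical, and the _FILE convention below is the one documented for the official mysql image; to confirm it also works with the Percona image:

# create the secret once, on a manager
printf '%s' 'S3cr3t!' | docker secret create client123_db_password -

# the service reads it from /run/secrets/ instead of a plain ENV variable
docker service create \
  --name mysql_client123 \
  --secret client123_db_password \
  -e MYSQL_ROOT_PASSWORD_FILE=/run/secrets/client123_db_password \
  percona:5.7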
⭕ BACKL-08) Redirect path domain.com/web to domain.com/web/ (trailing slash)

Go back to the Table of Contents


4) Exploring the idea


⭕ X1) Cluster crash mitigation

This is a big one. Let's say a whole cluster is not available for 6 hours, whatever the reason. Shall we, as a business, cry on Twitter that our server vendor is down? Absolutely not! Remember the S3 outage in early 2017? Shit happens, and I don't want this to happen to us at FirePress.

The idea here is that we would have two independent clusters running in two zones (data centres).

  • 50% of our clients are in NYC
  • 50% of our clients are in AMS

Let's say NYC crashes. Fuck. OK, no panic.

Deploy 100% of our clients to AMS.

The challenge is to do this very quickly. Database merging + picture merging.

Then, when things are back to normal, redistribute 50%/50%.

This setup also allows an easy transition from one cluster to a new one. I love it. Don't patch. Scrap it and start from scratch.

⭕ X2) Deploying servers
  • Build a Packer / Terraform routine to deploy new nodes (see also SCW Builder)
  • Minimize the manual processes (running bash scripts) to set up Docker Swarm join / Gluster and UFW rules for private networks
  • Make better use of Docker Machine so I can use eval more efficiently instead of switching between terminal windows. I have a bug doing this with eval at the moment.
⭕ X3) ChatOps

It would be nice to use Slack as a terminal. Why is that? Here is my use case.

I want to let non-technical folks (the operations side) run Docker stacks without having to set up their user/pass/local environment and all the pain that comes with welcoming a new user into your DevOps stack. I assume I could also prevent them from running some actions, like rm *.

⭕ X4) Use a VPN | OpenSSH

Go back to the Table of Contents

⭕ X5) CI/CD
  • Of course, one day it will make sense to get there
  • I don't feel the need for this at the moment; the Docker workflow by itself is solid enough
  • Would be great to rebuild the images every night (a possible crontab entry is sketched below)
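A nightly rebuild could be one crontab line reusing builder.sh; the path and the directory name (ghost) are placeholders:

# rebuild the Ghost image every night at 03:00
0 3 * * * cd /opt/firepress/builds && ./builder.sh ghost >> /var/log/nightly-build.log 2>&1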
⭕ X6) FaaS

Conclusion

With such a tool, as a DevOps hero, I can automate the whole application lifecycle, from provisioning servers up to having my apps in production, in one single command, and finally deserve my title.


Hi!

If you have solid skills 🤓 with Docker Swarm, Linux, bash and the gang*, and you would love to help a startup launch 🔥 a solid project, I would love to get to know you 🍻. Buzz me 👋 on Twitter @askpascalandy. You can see the things that are done and the things we still have to do here.

I’m looking for bright and caring people to join this journey with me.

Thanks in advance!
Pascal