A gentle introduction to cloud computing for Kaldiers

This is the first in a series of seven rather long posts: a hands-on crash course, with walkthroughs and explanations, in cloud HPC, and a gentle introduction to Google Cloud Platform (GCP), a vast collection of services for creating, automating and monitoring your applications.

Whenever you see a word or phrase in oval highlighting, we’re exposing you to GCP lingo: this is the term used in the GCP documentation. We’ll highlight the terms that you will eventually become familiar with, so take note of them. We’ll tread gently; do not jump ahead of yourself. I am an engineer experienced in deploying hardware solutions, and it still took me months to grok some of the more advanced concepts of the platform. GCP is vast, and I hope to be your guide to understanding it at your own pace, on the assumption that your main job is ASR science, not tinkering with hardware that does not even physically exist, and differs quite significantly from physical hardware.

What is BurrMill?

BurrMill is a suite of commands and other tools to run Kaldi experiments on a tight budget entirely in Google Cloud Platform (GCP). It can be used for other distributed computing loads, as long as their pattern of use is similar; the pattern is described below in this post. Our goal has been to let a user with no cloud computing experience get started training and evaluating their models very quickly, while learning this powerful platform as they progress.

How quickly is “very quickly?” Well, you can get it up and running in two or three days. The longest part will be waiting for your quota allotment, which can take up to 2 days, but we’ll keep you busy with other stuff in the meantime. You need not understand anything about GCP or clusters to start, and you can go from complete newbie to hero in a week.

We do not try to isolate you entirely from GCP; rather, you’ll gain more experience with the platform as you work. You can get up and running with BurrMill tools after learning only a few GCP concepts at a highly abstract level. You’ll then pick up the necessary platform features as you progress, both out of necessity and out of curiosity. Be open to changing the way you think of computers, networking and data storage, and you’ll gain new and exciting insights into computing at scale. The cloud is an entirely different kind of computing environment, where you can conjure a hundred working nodes out of nothing in under a minute and dismantle them all with one command, or save a backup copy of a disk with 1.5 TB of used space in a matter of minutes; with a physical disk, that task would take a tar or rsync job hours.

GCP is awesome!

Why did you create BurrMill?

While working for a small startup company (which is no longer either small or a startup, and we’ve since parted ways), at some point I needed more computing power than my 2-GPU, 16-core machine could crunch through in a reasonable time. Training one model took a little over a week: that’s too slow a pace for experimentation. At this point, we had two options: buy more computers and maintain an in-house cluster, or build one in the cloud. Running even two machines in tandem is already quite a task: you need to add fast (10 Gbps) Ethernet adapters to them, or better two each, ganged to work in parallel, buy a high-speed switch…

And consumer-grade hardware is not terribly reliable when constantly stressed to the limit. A couple of months into my work in the cloud, my physical machine lost one DIMM slot. Think about the time I had to spend disassembling a computer tightly packed with GPUs, thick power cables neatly routed with cable ties away from airflow, then removing the CPU cooling system to get to the motherboard and replace it, then reassembling everything and routing everything as neatly as it was done before. I was glad I did not have more of these boxes to maintain. Besides, more than 4 of them would overpower the normal office air conditioner and overheat. Since going cloud, I would never again plan a compute-intensive workload for an in-house computer; it’s that simple.

Of course, BurrMill did not start as the neat set of commands you are looking at now. I automated my own work, of course (this now goes by the trendy-silly buzzword “infrastructure as code”), but I knew the limitations, how and when to run the scripts, when it was safe and when it was not, and sometimes tweaked them to the situation. It took quite a lot of effort to rework these scripting bits into a consistent suite of tools, and I did all that in the hope that the product will be useful to the wider research community. I sincerely hope you’ll enjoy it.

Looks like it must be expensive?

No! And this is an inherent property of cloud computing: it is priced very reasonably compared to owning the hardware, if not cheaper. You pool your money with countless thousands of other users to get the best engineers to maintain, fix and replace datacenter-grade hardware for you. When you buy a $1K GPU, you have to keep it, in use or not, and then replace it in 2 years when a newer and faster one comes out (I canceled the planned upgrade of my 2×1080Ti box to the new 2000 series). In the cloud, you pay for what you use, by the minute or by the second. As a point of reference, the largest production model I trained, a 30-million parameter NNET3, took 48 hours to train and 4 hours to test-decode on 480 CPUs. At $0.006655 per vCPU-hour, the decode job’s CPU use comes to 0.006655 × 480 × 4 ≈ $12.78; some RAM was also necessary, but it, too, is very affordable. (I always decode many interim saved models, not only the final one.) The whole run cost under $300, with the number of GPUs progressing from 3 at the start to 18 at the end. A back-of-envelope calculation shows that my puny “powerhouse” box would huff and puff on this task for about 35 days. I would not even attempt it on one machine; I’d probably buy at least a second box and upgrade each to 3 GPUs, totaling $8K, and that’s for a DIY-assembled computer. Also, we had unmetered electricity with our lease, and a computer like this is a 1 kW space heater (the CPU and 2 GPU sensors read 900 W combined; the waste heat of RAM, voltage regulators and other parts probably added more than 100 W on top of that).

When your cloud computer is turned off, you are not paying for its CPU, GPU or RAM: not a cent, nada. You still pay for its disks: for example, the 10 and 20 GB SSDs that I keep provisioned at all times cost me $5.10 a month. Don’t be surprised at the small disk sizes: this is more than enough for daily work. If I need a larger disk, I just save my home disk as a snapshot (remember you should pay special attention to the oval-encircled words?), a form of permanent, cheaper storage, then resize the disk to whatever size I need today; when I no longer need the space, I save what I want to keep, kill the large work disk and restore the old, small one. (You cannot shrink a disk or restore a snapshot to a smaller disk: this is where your good old rsync has to come into play.) You can script this task for yourself, and it takes a couple of minutes. Changing the number of CPUs and the amount of RAM is a one-liner; see the sketch below.
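
For illustration, here is roughly how this snapshot-and-resize dance looks with the stock gcloud CLI. This is only a sketch: the disk, snapshot and instance names, the zone and the sizes are all made up, and BurrMill wraps these steps more conveniently.

    # Save the current home disk as a snapshot (cheap, durable storage).
    gcloud compute disks snapshot home-disk --zone=us-central1-a \
        --snapshot-names=home-backup-2020-05

    # Grow the live disk when more space is needed (growing only: disks
    # cannot shrink, which is where rsync comes back into play). After
    # this, grow the filesystem inside the VM, e.g. with resize2fs.
    gcloud compute disks resize home-disk --zone=us-central1-a --size=200GB

    # The promised one-liner to change the CPU count and RAM
    # (the instance must be stopped first).
    gcloud compute instances set-machine-type my-node --zone=us-central1-a \
        --machine-type=n1-standard-16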

In the end, 60 to 90% of your bill will be for GPU use, and about 95% for GPU, CPU and RAM combined. The rest of the costs are negligible.

If you are (or work for) a startup or small business, or even a moonlight researcher paying out of your own pocket, GCP is very affordable, if you use it right. And BurrMill’s design goal is to make sure you use it right.

The pattern of use that BurrMill employs is often called elastic: compute nodes are created only when they are needed, and dismantled as soon as they sit unused for some time (6 minutes, if you are curious about the exact number).
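
Under the hood, this is Slurm’s power saving machinery. Below is a minimal sketch of the relevant slurm.conf knobs; the script paths are hypothetical, and BurrMill’s actual configuration is more involved:

    # slurm.conf excerpt: elastic compute nodes.
    SuspendTime=360       # tear a node down after 6 idle minutes
    SuspendProgram=/opt/burrmill/node-delete.sh  # hypothetical: calls gcloud to delete the VM
    ResumeProgram=/opt/burrmill/node-create.sh   # hypothetical: calls gcloud to create the VM
    ResumeTimeout=300     # allow 5 minutes for a fresh node to boot and join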

What do I need?

Not much, really:

  1. Be familiar with SSH: this is how you connect to your cluster machines. We’ll touch on conveniently setting up keys, and on using our proxy program, which wraps a very secure IAP Tunnel. This way, none of the machines needs to have listening ports open to the Internet; you connect directly inside the firewall (see the sketch after this list).
  2. A Google account. Make sure to enable two-factor authentication on it, since raw computing power is a valuable commodity sought after by bitcoin miners. In some cases (as in mine), your training data may be PII-sensitive, too, so take this extra step to protect your account.
  3. A Web browser. Besides Cloud Console, GCP provides a small Debian-based virtual machine (called Cloud Shell officially, or DevShell in their open source code) free of charge, which you can use directly from the browser for up to 50 hours a week. This is the most secure way to run administrative tools, such as BurrMill commands.
  4. A machine from which you can comfortably connect to the cluster and run experiments. It can run any OS supported by Cloud SDK (Linux, Windows or Mac), as long as it has an SSH client with a decent terminal emulator. Cloud Shell is not designed for that: TTY emulation in a browser is neither precise nor comfortable enough, and you would likely exhaust the 50-hour-a-week quota.
  5. Optionally, your own Linux computer, if you are extending the BurrMill-packaged software inventory and need to debug its build process. Unless it runs Debian 10 “Buster”, your best bet is to use Docker for local build debugging, so that it mimics the target OS completely (we have a script for that). You can also use a machine in the cloud for this if you do not have Linux at hand, or to avoid the simulated Docker environment. Scripted builds are performed by the Cloud Build service, so you do not need a powerful computer capable of building a large software distribution, Kaldi being a prime example.
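
As promised in item 1, here is what a connection over an IAP Tunnel looks like with stock Cloud SDK tooling; the instance name and zone are made up, and BurrMill’s own proxy wrapper makes this more convenient:

    # SSH to a VM with no public IP: traffic is relayed through Google's
    # Identity-Aware Proxy, so the firewall admits only Google's IAP range.
    gcloud compute ssh login-node --zone=us-central1-a --tunnel-through-iap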

Kaldi workload pattern

We need to say a couple of words about Kaldi, even if you are after a different use for BurrMill. The Kaldi pipeline is controlled from a single computer by scripts that shard the workload into independent work units and dispatch these shards to a computing cluster via a Kaldi plugin script you name (see the sketch after the list below). The two main advantageous traits of this pattern are:

  1. The shards are independent of each other. They use a shared file system, but do not communicate directly (in other words, no MPI).
  2. The shards complete quickly; 2 to 20 minutes is the most sensible range.
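
Concretely, the plugin is selected in the cmd.sh file of a Kaldi recipe. Here is a sketch of what pointing it at a Slurm cluster might look like; slurm.pl ships with Kaldi, but the config file path and options are illustrative, not BurrMill’s exact setup:

    # cmd.sh: route all Kaldi job dispatch through Slurm.
    export train_cmd="slurm.pl --config conf/slurm.conf"
    export decode_cmd="slurm.pl --config conf/slurm.conf --mem 4G"
    export cuda_cmd="slurm.pl --config conf/slurm.conf --gpu 1"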

The two requirements above are not set in stone, but they reduce your cost of computation by ≈70% (yes, you read that right: you pay ≈0.3× the regular price), by using preemptible VMs, which GCP may stop at any time with a 30-second advance notice. For those of you who are after long-running MPI computations:

BurrMill uses Slurm as the workload manager

If you are familiar with Slurm, you can tailor it to your HPC use pattern. Currently, GCP offers computing nodes with sizes up to the following (a sketch after the list shows how to query the current offerings yourself):

  • 48 Skylake-X cores with 624 GB of RAM and, optionally, up to 4 P100 or T4 GPUs, or 8 V100 GPUs (N1). This is the only VM type that can have GPUs attached.
  • 80 Skylake-X cores with 3.75 TB of RAM (M1)
  • 30 Cascade Lake Xeon Platinum 9282 cores with 240 GB of RAM (C2)
  • 112 EPYC Rome cores with 896 GB of RAM (N2D)
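
These offerings change over time; a quick way to check what is available today (the zone is, of course, just an example):

    # List machine types offered in one zone, largest first.
    gcloud compute machine-types list \
        --filter="zone:us-central1-a" --sort-by=~guestCpus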

If you can shard your work into non-interacting units that fit within these sizes, either short-running or checkpointable (a checkpoint every 15 minutes is ideal; once an hour is probably the practical limit), you can reap pretty much the same cost benefit of preemptibility.
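
For a taste of what requesting a preemptible node involves, here is a hand-run equivalent of what the cluster does for you automatically; the names, zone and machine shape are made up:

    # Create a preemptible worker with one T4 GPU. GCP may reclaim it at
    # any time with a 30-second notice, at roughly 1/3 the on-demand price.
    gcloud compute instances create node-g1 --zone=us-central1-a \
        --machine-type=n1-standard-8 --preemptible \
        --accelerator=type=nvidia-tesla-t4,count=1 \
        --maintenance-policy=TERMINATE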

This was a digression: I had nothing but Kaldi in mind when starting the project, but let us know if you are using BurrMill to help you with non-Kaldi workloads.

Can I help you?

Yes, please, by all means do! You can help by opening a bug report if something does not work as expected, or contributing code. The project is licensed under the Apache 2.0 license; you may keep your copyright to yourself or assign it to “BurrMill Authors,” your choice. If you want to add a larger feature, please open an issue to discuss it first. Or, feel free to grab any open issue with the green “help wanted” label, but post a note to that issue that you are working on it.

Can you help me?

Sure, we’ll be happy to help. Post your question to the burrmill-users group, or open a GitHub issue if you believe you are affected by a bug or want to request a new feature. You can ask a short question in a comment down below, but if I believe it would benefit the community (and most questions would), I may prompt you to post it at one of the above places instead. Comments are a good place to report typos, ask for clarification of the writing, or just say hi.

If you are looking for consulting help (“can you build, fix and/or maintain the cluster for us?”), keep in mind that BurrMill is only a set of tools that helps you organize other software to do the computation work in a cost-efficient way, and that “other” software has its own professional paid support. Slurm is developed, maintained and supported by SchedMD, Inc., and GCP has various paid support options, too. If neither of these options appeals to you, please contact me. I can make no promises, but some of the project members may be able to contribute enough quality time to do the consulting work for you.

May I donate to support your work?

Thank you, that’s a generous offer, but no. Although financial support of OSS development is trending up, and that is certainly a good trend, we are not accepting donations.

Personally, not as a team member but as the writer of this blog (and an incurable cat person), I’m asking you to consider donating instead to an organization working for the humane treatment of humans, such as Amnesty International or Médecins Sans Frontières (a.k.a. Doctors Without Borders), before those working for the humane treatment of other living critters. Humans are the cutest of all animals! And there are way too many of us in the world who are doing much, much worse than the eggheads running their high-performance computations for a living or for the fun of it.
