Network as Software: CI/CD for NetDevOps
A question came to my mind a few weeks ago: can enterprises operate their networks the way they operate software today? In other words: can enterprises start to automate things without buying additional solutions?
In the previous post we reviewed what SDN is; in this post we’ll focus on enterprises with hundreds of physical legacy devices.
Should companies buy an out-of-the-box ready-to-use solution?
Enterprises usually don’t have time to develop robust tools. On the other side, commercial tools usually cannot cover all the requirements enterprises have. Moreover, commercial tools cannot easily be integrated within the business processes of the enterprises themselves, so enterprises are forced to change their operations to fit the commercial tools. And that’s not a good idea.
In other words: there is no ready-to-use solution, because you cannot find two different enterprises running the Network in the same way. Some solutions can be easily adapted to some of your use cases, but no solution can cover all your needs without customizations.
Be also aware that heavy customization usually makes it difficult to upgrade to the latest releases, and that solutions can be discontinued, both commercial and open source (history has given us many examples).
Because each enterprise is unique, there is no step-by-step procedure to move from a legacy approach to an automated solution. Usually the help of someone who has already been involved in a similar process is priceless.
How much does it cost to adapt an existing solution or to build a new one?
This question is wrong, of course. The right question is: how much can I save by implementing an SDN solution?
Here “save” refers to:
- Fewer hours spent operating the network;
- Fewer hours lost to unexpected downtime;
- Fewer hours spent on maintenance (and expected downtime);
- Fewer hours spent on each task.
So if the time (and cost) saved exceeds the cost of adapting or developing an SDN solution, then you should go for it.
Network as a Software: What exactly does it mean?
Developers are familiar with a set of tools and methodologies that can be reused by network engineers. Discussing the DevOps philosophy and all the related tools is out of the scope of this post, but let me mention some of them:
- CI/CD: the acronym stands for Continuous Integration/Continuous Delivery (or Deployment). Continuous Integration means that everyone can merge code changes multiple times a day, and each merge is automatically tested to assure that the code is working. Continuous Delivery starts after CI and automatically builds the software. Continuous Deployment is a step up: the code is automatically deployed to production without any manual step.
- git is the (client-side) tool that tracks changes in files and allows people working on the same software to coordinate.
- GitLab is a web-based git repository manager with CI/CD features.
CI/CD for legacy Networks
Handling networks as software means, in short and in practice:
- Store Network configurations in a git repository;
- Automatically test changes;
- Deploy changes to production, via a manual trigger or automatically.
The first prerequisite is a testing environment. How can we test network changes before sending them to production?
Usually MMRs (Mass Market Retailers) have a complete demo installation for testing purposes, and that could be a good start. But what about ISPs, enterprises, banks and so on? The answer is virtualization: many vendors provide virtual appliances that can be loaded into a network emulation solution like EVE-NG/UNetLab, Cisco VIRL or GNS3. In this post I’m using EVE-NG to emulate a network. By the way, many ISPs are already emulating their networks for testing or learning purposes.
The second prerequisite is managing configurations using git. There are many different approaches; I tried to abstract the network device and focus only on some parts of the configuration. Again, there is no perfect approach for everyone, so spend some time designing yours. For this demo I’m assuming:
- An enterprise running Cisco devices only;
- The physical network is 100% emulated into EVE-NG using equivalent devices;
- Each device name has a postfix ("-test" or "-prod") declaring whether the device is in production or in test;
- Virtual devices differ (in terms of ports and configuration) from the physical devices;
- We’re using the "replace config" feature, so we push the full configuration to the devices every time.
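The hostname postfix convention can be captured in a tiny helper. This is just a sketch; the function names are invented for illustration and are not part of the demo repository:

```python
# Sketch of the "-test"/"-prod" hostname convention described above.
# Function names are hypothetical, not taken from any tool mentioned here.

def to_test_name(hostname: str) -> str:
    """Map a production device name to its emulated counterpart."""
    if hostname.endswith("-prod"):
        return hostname[: -len("-prod")] + "-test"
    raise ValueError(f"unexpected hostname: {hostname}")

def to_prod_name(hostname: str) -> str:
    """Map an emulated device name back to the production one."""
    if hostname.endswith("-test"):
        return hostname[: -len("-test")] + "-prod"
    raise ValueError(f"unexpected hostname: {hostname}")

print(to_test_name("R1-prod"))  # R1-test
print(to_prod_name("R2-test"))  # R2-prod
```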
The demo used in this post is very simple: just two routers interconnected with a single cable. Both routers are managed using an out-of-band network.
Every time a user commits a change:
1. The test lab is built from scratch (EVE-NG exposes REST APIs, even though they are not documented).
2. The running configs are downloaded from the test devices, then some parts of each config are removed to generate "templates" for test.
3. New running configs are generated from templates and some variables.
4. New running configs are pushed to all test devices.
5. Some tests are run against the test environment to check if everything is working fine.
6. The test lab is destroyed.
7. Changes are now ready for production, manually or automatically.
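To make step 5 concrete, here is a sketch of what a reachability check could look like. The result parsing assumes NAPALM's ping output format ({"success": {...}} on success, {"error": ...} on failure); hostnames, credentials and addresses are placeholders:

```python
# Sketch of a reachability check for the test stage (step 5).
# The parsing assumes NAPALM's ping output format:
#   {"success": {"packet_loss": 0, ...}} or {"error": "..."}.

def ping_ok(result: dict) -> bool:
    """Return True if a NAPALM ping result reports zero packet loss."""
    success = result.get("success")
    return bool(success) and success.get("packet_loss") == 0

print(ping_ok({"success": {"probes_sent": 5, "packet_loss": 0}}))  # True
print(ping_ok({"error": "destination unreachable"}))               # False

# Against a live test device it could be used like this (requires napalm;
# device details are placeholders):
#
#   from napalm import get_network_driver
#   driver = get_network_driver("ios")
#   with driver("r1-test", "admin", "secret") as device:
#       assert ping_ok(device.ping("10.0.0.2")), "r1 cannot reach r2"
```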
The tools used in the demo
Many tools are involved to make it happen:
- EVE-NG: even if the APIs are not documented, it’s easy to find them using tcpdump (and yes, also because I developed UNetLab).
- Python 3: because I’m using NAPALM, Nornir and Jinja2, Python is mandatory; also, making HTTP API requests with Python is very easy.
- Jinja2: used to build running configs starting from templates and YAML files.
- YAML files: used to store the configuration for all network devices.
- NAPALM: used to send commands to network devices (NetMiko can also be used if NAPALM is missing a feature).
- Nornir: used to parallelize NAPALM requests.
- GitLab-EE: stores all files and triggers the pipeline on each commit (it’s installed on the EVE-NG server).
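To make the Jinja2/YAML combination concrete, here is a minimal sketch; the variable and template contents are invented for illustration and are not taken from the demo repository:

```python
import yaml
from jinja2 import Template

# Hypothetical per-device variables, as they could appear in a YAML file.
variables = yaml.safe_load("""
hostname: R1-test
interfaces:
  - name: GigabitEthernet0/1
    address: 10.0.0.1 255.255.255.0
""")

# Hypothetical Jinja2 template for a fragment of an IOS config.
template = Template("""hostname {{ hostname }}
{% for intf in interfaces -%}
interface {{ intf.name }}
 ip address {{ intf.address }}
 no shutdown
{% endfor -%}""")

config = template.render(**variables)
print(config)
```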
The pipeline in practice
Every commit to the repository stored on the GitLab server triggers the scripts defined in the ".gitlab-ci.yml" file. This file defines two environments (stages): test and deploy. Each commit starts a local gitlab-runner that runs the following pre-scripts:
- scripts/destroy_lab.py: using the EVE-NG APIs, it deletes any old stale lab.
- scripts/prepare_lab.py: using the EVE-NG APIs, it uploads the test lab, starts it, and waits until all nodes are ready to accept SSH connections.
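As an illustration of how these scripts can talk to EVE-NG, here is a sketch using plain HTTP calls and a TCP probe for SSH readiness. The server name is a placeholder, and the endpoint path in the usage note is an assumption based on the community API documentation (the APIs are undocumented, as noted above):

```python
import socket

# Placeholder EVE-NG server; adapt to your environment.
EVE_HOST = "eve-ng.example.com"

def api_url(path: str) -> str:
    """Build a full EVE-NG API URL from a path like '/auth/login'."""
    return f"https://{EVE_HOST}/api/{path.lstrip('/')}"

def ssh_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True when a node accepts TCP connections on the SSH port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(api_url("/auth/login"))

# A live session could then look like this (requires requests; the login
# endpoint and payload are assumptions, verify them with tcpdump):
#
#   import requests
#   session = requests.Session()
#   session.post(api_url("/auth/login"),
#                json={"username": "admin", "password": "eve"}, verify=False)
#   # ...upload and start the lab, then poll ssh_reachable() on each node.
```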
When the lab is up and running, the gitlab-runner runs:
- scripts/get_templates_from_test.py: via SSH (Nornir + NAPALM) it downloads all configurations, then removes the interface and router sections. The reason is obvious: the prod and test environments differ, so we need the commands that are unique to each device, or the replace config feature won’t work.
- scripts/make_configs_for_test.py: using Jinja2 and YAML files, it builds complete configurations for test devices.
- scripts/push_to_test.py: via SSH (Nornir + NAPALM) it pushes all configurations to test devices.
- scripts/test_lab_1.py: via SSH (Nornir + NAPALM) it tests that everything is fine.
- scripts/test_lab_2.py: via SSH (NAPALM) it runs the tasks not working under Nornir (yet).
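The template-extraction idea in scripts/get_templates_from_test.py can be illustrated with plain string processing: drop the top-level sections that differ between environments. This is a simplified sketch of the concept, not the actual script; it assumes IOS-style configs where section bodies are indented under the section header:

```python
# Simplified sketch: strip "interface" and "router" sections from an
# IOS-style config, so the remainder can serve as a device template.

def strip_sections(config, prefixes=("interface ", "router ")):
    """Remove top-level sections starting with one of the given prefixes,
    together with their indented body lines and closing '!'."""
    kept, skipping = [], False
    for line in config.splitlines():
        if line.startswith(prefixes):
            skipping = True    # section header to drop
            continue
        if skipping and (not line or line.startswith((" ", "!"))):
            continue           # body line or terminator of a dropped section
        skipping = False
        kept.append(line)
    return "\n".join(kept)

sample = """hostname R1-test
!
interface GigabitEthernet0/1
 ip address 10.0.0.1 255.255.255.0
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0
!
line vty 0 4
 login"""
print(strip_sections(sample))
```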
If the above scripts succeed, then a user can trigger a deploy to production. The after-script is always executed to remove the lab after the tests. If something goes wrong, the changes cannot go into production.
If the tests are ok and a user confirms the changes for production, the corresponding deploy scripts are executed.
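Putting the stages together, a ".gitlab-ci.yml" along these lines could drive the pipeline described above. This is a sketch built from the script names listed earlier; the actual file in the demo may be laid out differently:

```yaml
# Sketch of a possible .gitlab-ci.yml for this pipeline (assumption:
# the demo's real file may differ in layout).
stages:
  - test
  - deploy

before_script:
  - python3 scripts/destroy_lab.py   # delete any old stale lab
  - python3 scripts/prepare_lab.py   # upload and start the test lab

after_script:
  - python3 scripts/destroy_lab.py   # always tear the lab down

test:
  stage: test
  script:
    - python3 scripts/get_templates_from_test.py
    - python3 scripts/make_configs_for_test.py
    - python3 scripts/push_to_test.py
    - python3 scripts/test_lab_1.py
    - python3 scripts/test_lab_2.py

deploy:
  stage: deploy
  when: manual   # a privileged user must confirm the deploy
  script:
    - echo "push the tested configurations to the production devices here"
```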
The pipeline should be enhanced so that:
- the commit is reverted if tests fail (we should introduce merge requests, but that’s out of scope);
- some tests are added after the changes are applied to production, and eventually a rollback procedure;
- the final configs are stored in another git repository (they are not saved in this pipeline).
Even if it can appear a little complicated (and actually it is), this method keeps a lot of things under control:
- Every change is tested;
- Changes can go into production only after testing and a manual confirmation by a privileged user;
- Every change is tracked;
- Massive changes can be done in minutes.
But the real prerequisite is changing the network engineers’ mindset (and yes, teaching them programming).
…at the beginning of this writing, my wish was to use this post to talk about the Cisco ACI Multi-Pod solution, but then I thought it would be much better to discuss it using a pragmatic approach; it’s too boring to just talk about something that is already written somewhere, so let’s give it a spark!