A software company's success is dependent on its ability to ship new features, fix bugs, and improve code and infrastructure.
A tight feedback loop is essential, as it permits constant and speedy iteration. This necessitates that the codebase should always be in a deployable state so that new features can be rapidly shipped to production.
Achieving this can be difficult, as there are many working parts and it can be easy to introduce new bugs when shipping code changes.
Small changes don't seem to affect the state of the software in the short term, but in the long term they can have a big effect.
If small software companies want to be successful, they need to move fast. As they grow, they become slow, and that's when things get tricky.
Now, they
- have to coordinate their work more,
- need to communicate more, and
- have more people working on the same codebase.
This makes it more difficult to keep track of what is happening.
Thus, it is essential to have a team who handles shipping code changes. This team should be as small and efficient as possible so that they can rapidly iterate on code changes.
Furthermore, use feature flags to toggle new features on and off in production. This allows for quick and easy experimentation, as well as the ability to roll back changes if need be. Set up alerts to notify the team when you deploy new code. This way, they can monitor the effects of the changes and take action if necessary.
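To make the idea of a feature flag more concrete, here is a minimal sketch in Python. It assumes a simple in-memory flag store; the flag names, the `rollout_percent` field, and the `checkout` function are made up for this example. A real setup would more likely read flags from a configuration service so they can change without a deployment.

```python
import hashlib

# Hypothetical in-memory flag store, for illustration only.
FLAGS = {
    "new_checkout": {"enabled": True, "rollout_percent": 25},
    "dark_mode": {"enabled": False, "rollout_percent": 100},
}


def is_enabled(flag_name: str, user_id: str, flags: dict = FLAGS) -> bool:
    """Return True if the flag is switched on for this user."""
    flag = flags.get(flag_name)
    if flag is None or not flag["enabled"]:
        return False  # unknown or switched-off flags are always off
    # Hash flag name and user id so each user lands in a stable bucket 0..99.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < flag["rollout_percent"]


def checkout(user_id: str) -> str:
    # The new code path ships to production but only runs behind the flag.
    if is_enabled("new_checkout", user_id):
        return "new checkout flow"
    return "old checkout flow"
```

Rolling back then means setting `enabled` to `False` instead of reverting and redeploying the code.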
There are a few things that can make this process easier:
- Automate as much of the development process as possible.
- Make a separate team responsible for releasing code changes.
- Use feature flags to turn new features on and off in production.
- Set up alerts to notify the team when you deploy new code.
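A deployment alert can be as simple as a short message posted at the end of every deploy. The sketch below assumes a hypothetical `Deployment` record and leaves the actual delivery (chat webhook, pager, email) to a `send` function that you pass in; none of these names come from a specific tool.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class Deployment:
    service: str
    version: str
    author: str


def build_deploy_alert(deploy: Deployment, now: Optional[datetime] = None) -> str:
    """Serialize a deployment into a JSON payload for a chat or paging tool."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "text": f"{deploy.author} deployed {deploy.service} {deploy.version}",
        "service": deploy.service,
        "version": deploy.version,
        "deployed_at": now.isoformat(),
    })


def notify(deploy: Deployment, send: Callable[[str], None]) -> None:
    # `send` is whatever delivers the message; it is injected so the
    # alert logic stays independent of the tool you use.
    send(build_deploy_alert(deploy))
```

Calling `notify(Deployment("billing", "v1.4.2", "alice"), print)` would simply print the payload, which is handy for trying it out before wiring in a real channel.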
If you follow these tips, you can deploy code to the production environment 100 times a day with minimal disruption.
Continuous deployment of small changes
This insight, though not new, is a core element of the DevSecOps movement. Another way to reduce risk (next to growing teams) is to optimize the developer workflow for rapid delivery. Achieve this by increasing the number of people in the engineering department. This not only leads to an increase in the total number of deployments but also in the number of deployments per engineer.
What's even more remarkable is that this reduces the number of incidents, while the average number of rollbacks remains the same.
But be careful with these metrics. On paper they look great, but they do not correlate 100% with customer satisfaction or negative customer impact.
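If you want to track these metrics yourself, the arithmetic is straightforward. The following sketch uses a hypothetical `WeekStats` record, and the numbers in the usage note are invented purely to show the calculation.

```python
from dataclasses import dataclass


@dataclass
class WeekStats:
    engineers: int
    deployments: int
    incidents: int
    rollbacks: int


def deployments_per_engineer(stats: WeekStats) -> float:
    return stats.deployments / stats.engineers if stats.engineers else 0.0


def incident_rate(stats: WeekStats) -> float:
    """Share of deployments that led to an incident."""
    return stats.incidents / stats.deployments if stats.deployments else 0.0


def rollback_rate(stats: WeekStats) -> float:
    """Share of deployments that had to be rolled back."""
    return stats.rollbacks / stats.deployments if stats.deployments else 0.0
```

For a made-up week with 20 engineers, 500 deployments, 5 incidents, and 10 rollbacks, this gives 25 deployments per engineer, a 1% incident rate, and a 2% rollback rate. None of these numbers say anything about how customers experienced the changes, which is why they should not be read in isolation.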
Your goal should be to deploy many small changes. They are quicker to implement, quicker to validate, and, of course, quicker to roll back.
Further, small changes tend to have only a minor impact on your system compared to big changes.
Generally speaking, the process from development to deployment needs to be as smooth as possible. Any friction will result in developers batching up changes and releasing them all at once.
To mitigate the friction within your process, do this:
- Allow engineers to deploy a change without communicating it to a manager.
- Automate testing and deployment at every stage.
- Allow different developers to test simultaneously and multiple times.
- Offer numerous development and test systems.
Besides a frictionless development and deployment process, concentrate on a sophisticated, open-minded, and blameless engineering culture. Only then can you deploy to production 100 times per day (or even more).
Our engineering (& company) culture
At XALT, we have a specific image in mind when we talk about our development culture.
For us, a modern development culture is one that
- is based on trust,
- puts the customer at the center,
- uses data as a basis for decision-making,
- focuses on learning,
- is result- and team-oriented, and
- promotes continuous improvement.
This type of development culture enables our development team to work quickly, deliver high-quality code, and learn from mistakes.
This approach goes hand in hand with our entire corporate culture, regardless of department, team, or position. We also tend to challenge the status quo.
I know, this sounds a bit cheesy. But it's true. Allowing our team to focus on the problem at hand without any friction or unnecessary regulations has made us faster and more productive.
For example, our development, testing and deployment process looks like this.
It's pretty simple. Once one of our developers has created and tested a new code branch, all it takes is one more person to review the code and it is integrated into the production environment.
But the most important core element at XALT is trust! Let me explain that in more detail.
We trust our team
We trust our team in what they do and in the tools they use to accomplish a task. If things go wrong or something doesn’t work out, it doesn’t matter. We start our post-mortem process, find the root cause of the incident, fix it, and learn from our mistakes.
I know it's not just about development; testing and other parts are just as important.
Monitoring and testing
In order to get better, faster and ultimately make our users (or customers) happy, we constantly monitor and review our development processes.
In the event of an incident, it's not just a matter of getting the system up and running again, but also of making sure that something like this doesn't happen again.
That is why we have invested heavily in monitoring and auditing.
So we can
- get real-time insights into what's going on,
- identify problems and possible improvements,
- take corrective action when necessary, and
- recover more quickly from incidents.
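One simple way to spot problems in real time is to watch the error rate over the most recent requests. The sketch below is only an illustration of that idea: the class name, the window size, and the threshold are assumptions, and a real monitoring stack would offer far more than this.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks the error rate over the last `window` requests."""

    def __init__(self, window: int = 100, threshold: float = 0.05):
        self.outcomes = deque(maxlen=window)  # True means the request failed
        self.threshold = threshold

    def record(self, failed: bool) -> None:
        # Once the window is full, the oldest outcome drops out automatically.
        self.outcomes.append(failed)

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self) -> bool:
        return self.error_rate() > self.threshold
```

After a deployment, a rising error rate in this window is a signal to switch off a feature flag or roll back before more users are affected.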
We have also implemented an automatic daily backup solution for our core applications and infrastructure. So if something breaks, we can revert to a previous version, further reducing the risk.
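When reverting, the first question is which backup to restore. As a small sketch, assuming backups are identified by their timestamps (the function name and the dates in the usage note are hypothetical), you would pick the most recent one taken before the incident:

```python
from datetime import datetime
from typing import Optional, Sequence


def latest_backup_before(
    backups: Sequence[datetime], incident: datetime
) -> Optional[datetime]:
    """Return the newest backup taken strictly before the incident, if any."""
    candidates = [b for b in backups if b < incident]
    return max(candidates, default=None)
```

With daily backups on January 1, 2, and 3 and an incident on the afternoon of January 2, this returns the January 2 backup, provided it was taken before the incident started.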
Minimizing risk in a DevOps culture
To mitigate risk in day-to-day development, we employ the following tactics:
- Trunk-based development: This is a very simple branching model where all developers work on a single main development branch, the trunk. In Git, this is usually the default branch. All developers commit their changes to this branch and push them regularly. The main advantage of this branching model is that it reduces the risk of merge conflicts, because there is only one main development branch and changes are integrated frequently.
- Pull requests: With a pull request, you ask another person to review your code and merge it into their branch. This is usually used when you want to contribute to another project or when you want someone else to review your code.
- Code review: Code review involves manually checking the code for errors. This is usually done by a colleague or supervisor. Tools can support code reviews by automating parts of this process.
- Continuous Integration (CI): This is the process of automatically building and testing code changes. This is usually done with a CI server such as Jenkins. CI helps to find errors early and prevent them from flowing into the main code base.
- Continuous Deployment (CD): This is the process of automatically deploying code changes to a production environment.
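Taken together, CI and CD form a pipeline in which each stage must pass before the next one runs. The following toy sketch shows that control flow only; the stage names and the lambda placeholders are assumptions, and in practice a CI server such as Jenkins runs the real build, test, and deploy commands.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a function that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure."""
    log: List[str] = []
    for name, step in stages:
        ok = step()
        log.append(f"{name}: {'ok' if ok else 'failed'}")
        if not ok:
            break  # later stages, including the deployment, never run
    return log


# Placeholder stages; real ones would call your build and test tools.
stages: List[Stage] = [
    ("build", lambda: True),
    ("unit tests", lambda: True),
    ("deploy to production", lambda: True),
]
print(run_pipeline(stages))
```

Because the deployment is just the last stage, a failing test automatically keeps a broken change out of production.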
It is also important that we establish clear guidelines for our development team.
The guidelines at XALT:
- At least one other developer reviews all code changes before we add them to the main code base.
- We run a Continuous Integration server to build and test code changes before they are merged into the main code base.
- We use tools such as SonarQube to ensure code quality and provide feedback on potential improvements.
- We maintain a comprehensive automated test suite to find defects before they reach production.
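Guidelines like these can be expressed as a simple merge check. The sketch below is a hypothetical illustration, not a description of any particular tool: the `ChangeRequest` fields and the wording of the blockers are assumptions, and most Git hosting platforms offer comparable branch protection rules out of the box.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChangeRequest:
    author: str
    approvals: List[str] = field(default_factory=list)
    ci_passed: bool = False
    quality_gate_passed: bool = False


def merge_blockers(change: ChangeRequest) -> List[str]:
    """Return the reasons a change cannot be merged yet (empty means go)."""
    blockers: List[str] = []
    # An approval from the author does not count as a review.
    independent = [r for r in change.approvals if r != change.author]
    if not independent:
        blockers.append("needs a review from another developer")
    if not change.ci_passed:
        blockers.append("CI build or tests failed")
    if not change.quality_gate_passed:
        blockers.append("code quality check failed")
    return blockers
```

A change is ready for the main code base once `merge_blockers` returns an empty list.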
Summary
The success of a software company depends on its ability to regularly deliver new features, fix bugs, and improve code and infrastructure. This can be difficult because many components are being worked on at once, and new bugs can easily appear as code changes are released. There are a few things that can make this process easier: automate the process as much as possible, create a dedicated team responsible for releasing code changes, use feature flags to turn new features on and off in production, and set up alerts to notify the team when new code is deployed.
If you follow these tips, you should be able to go to production 100 times a day with minimal interruptions.