Hardware and Firmware Toolchains Simplified for Business Managers
Understanding hardware toolchains and how to manage their potential time-sinks
As a non-technical manager, you may be surprised when your CTO or an engineer tells you that something trivial on paper will take a long time to implement.
Some “trivial on-paper” tasks are known in advance but painful to schedule, while others frustratingly come out of the blue, blindsiding even experienced engineers.
How you react to setbacks, and how well you foresee potential delays, can be the difference between a startup surviving and failing.
A task that is easily overlooked, even by engineers themselves, is setting up the tools they need to carry out their work.
This article, written for non-technical founders and managers of hardware projects, looks at how toolchains cause friction in a project.
The article is split into two sections:
What is a Toolchain?
Time-Sinks After Setup
What is a Toolchain?
A toolchain is the set of software tools that a developer uses to carry out their technical tasks.
When starting a project, an engineer will usually set up their development environment.
A development environment is where firmware and software engineers write code, and it is part of the toolchain.
Underneath the development environment is the rest of the toolchain, which handles the following functionality:
Building the code
Flashing the code (for hardware engineers)
Debugging
The functions listed above are the core of a toolchain, and they are where a hardware engineer's toolchain differs most from a software engineer's.
(A comprehensive toolchain could also include tools to track the versions and changes of code, as well as automated testing.)
Toolchains are not trivial for software engineers either; however, for hardware (or firmware) engineers, the toolchain has challenges unique to physical products.
Building Code
Building the code means converting it into a format that can be loaded onto the device being worked on.
As build tools are specific to the device hardware, a lot of the setup process is spent reading documentation provided by the manufacturer of the device hardware.
The specific hardware in question depends on design decisions the technical team have made, particularly around a component called the microcontroller (for anyone interested, I go further into what a microcontroller is in this article).
The challenge of setting up a build system is installing the correct combination of software for the engineer’s operating system.
Flashing the Code
Flashing the code means putting the code that has been written and built onto the device.
If you are building a device with LEDs, and you want the light colour changed from red to blue, the change would be made in the development environment, built, and then flashed onto the device so that it follows the new configuration.
Updating With Hardware
Unlike conventional software, deploying firmware can require an additional physical device, called a debugging probe (or debugger), to flash firmware onto the device you are working on or to debug it.
A debugger physically connects the engineer’s computer to the device they are working on, and the type of debugger required depends on the chip that the firmware will be flashed onto.
(In case you come across these terms, ST-Link and Segger J-Link are both examples of debuggers.)
As with build systems, the challenge of setting up tools for flashing is having the correct software installed, with the added challenge of having the right drivers, settings and permissions for the computer’s USB ports to access the specific hardware on the device.
Updating Over-The-Air
After updating devices with debugging probes for the first time, engineers may also decide to build firmware that can be uploaded “over-the-air”, which means over Bluetooth (via a smartphone) or over the internet.
So long as the firmware builds are correct, over-the-air updates for internal company use can be straightforward using mobile apps provided by the hardware manufacturer.
Over-the-air updates become a challenge when you build customer-facing apps that are designed to carry them out.
Over-the-air firmware updates are an app feature; however, it is important that they are also considered and scheduled as a toolchain setup process, as you are integrating different parts of your app software to carry out the firmware flashing process.
The challenge of implementing app-based firmware updates is getting software from different vendors to work together (Android/iOS, Bluetooth, hardware manufacturers), and the process of testing, debugging and fixing the system is one that can cause delays.
It is vital to know what state your hardware ends up in when an over-the-air update fails, as in the field it will be your customer who deals with the issue.
As the manager of the project, it is therefore important to schedule enough time during system integration to test different firmware update scenarios, such as:
What happens when the device’s battery dies mid-update?
What happens when someone does the update on an older smartphone?
What happens if someone quits the app by accident mid-update?
If you sit and think about it, you can come up with several of these scenarios, and it is easy to fall into the trap of thinking customers will use your technology as delicately as your team would in their office.
Rushing the tests at this stage of the process will result in more time, disruption and money spent down the line if customers have to return their devices for you to fix or replace.
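For readers who want to see what testing these scenarios can look like, here is a minimal sketch in Python. It is a toy model, not real firmware: the `Device` class, the `receive_update` function and the firmware contents are all hypothetical, and it assumes one common design in which the device keeps running its old firmware until the new one has been fully received and verified. Your team's design may differ.

```python
import hashlib


class Device:
    """Toy model of a device with running firmware and a staging area (hypothetical)."""

    def __init__(self, firmware):
        self.running = firmware  # firmware the device currently boots
        self.staged = b""        # partially received update

    def receive_update(self, new_firmware, expected_sha256,
                       chunk_size=4, fail_after_chunks=None):
        """Receive an update in chunks; optionally simulate an interruption."""
        self.staged = b""
        chunks = [new_firmware[i:i + chunk_size]
                  for i in range(0, len(new_firmware), chunk_size)]
        for index, chunk in enumerate(chunks):
            if fail_after_chunks is not None and index >= fail_after_chunks:
                return False  # interrupted: battery died, app closed, and so on
            self.staged += chunk
        # Only switch firmware once the whole image has been verified.
        if hashlib.sha256(self.staged).hexdigest() != expected_sha256:
            return False
        self.running = self.staged
        return True


old = b"firmware-v1"
new = b"firmware-v2-blue-led"
digest = hashlib.sha256(new).hexdigest()

# Each scenario says after how many chunks the transfer is cut off (None = never).
scenarios = {
    "update completes": None,
    "battery dies mid-update": 2,
    "app closed straight away": 0,
}
for name, fail_after in scenarios.items():
    device = Device(old)
    ok = device.receive_update(new, digest, fail_after_chunks=fail_after)
    print(f"{name}: success={ok}, still usable={device.running in (old, new)}")
```

The point for a manager is the table of scenarios at the bottom: every row is a situation someone has to think of, reproduce on real hardware and confirm the device survives, which is where the scheduled time goes.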
Debugging
Debugging involves investigating issues with the system you are working on, and in the context of toolchains it can mean observing the system in real time.
Debugging gives engineers visibility into how the device is operating.
Just as with flashing, debugging probes can be used to debug the device you are working on.
Logging
If an engineer wants to know where in the code the device fails, they can add logging messages to different stages in the code.
A logging message can be as simple as “Bluetooth enabled”, which would be written just after Bluetooth initialisation.
They can then connect their device to a serial monitor (which can be in their development environment) where they can see the output of the device as it runs in real time.
If they were investigating why their device is not connecting to a smartphone, they may want to know whether it is crashing while Bluetooth is being set up. If they see the message “Bluetooth enabled” in the serial monitor, they know at the very least that their code is not crashing at that point.
The challenges with setting up monitoring tools to facilitate logging are similar to those of build and flashing tools.
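The “Bluetooth enabled” reasoning above can be sketched in a few lines of Python. This is only an illustration: the checkpoint messages and the captured output are made up, and real firmware logs will look different. It shows how the last message that appears in the serial monitor narrows down where the device stopped.

```python
# Hypothetical log messages, in the order the firmware is expected to print them.
EXPECTED_CHECKPOINTS = [
    "Power on",
    "Bluetooth enabled",
    "Advertising started",
    "Phone connected",
]


def last_checkpoint_reached(serial_output):
    """Return the last expected message found in the captured log, or None."""
    last_seen = None
    for checkpoint in EXPECTED_CHECKPOINTS:
        if any(checkpoint in line for line in serial_output):
            last_seen = checkpoint
        else:
            break  # stop at the first message that never appeared
    return last_seen


# Example capture: the device printed two messages and then went silent.
captured = ["[0.01] Power on", "[0.35] Bluetooth enabled"]
print(last_checkpoint_reached(captured))  # prints: Bluetooth enabled
```

In this made-up capture the device got past Bluetooth setup, so the engineer would look next at whatever happens between “Bluetooth enabled” and “Advertising started”.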
Signal Analysis
As well as monitoring log outputs, additional hardware can be used to analyse the signal outputs of the device.
Logic Analyser
A logic analyser is a small device that reads logic signals: essentially the low-voltage signals used for communication between different components on the hardware.
To investigate why your device is not outputting the expected motion sensor data, your engineer may connect their logic analyser to the device to see if the communication signals are physically being sent to and from the sensor.
Logic analysers range in price and are operated through software on a computer, so the setup process involves software installation.
Oscilloscopes
Like logic analysers, oscilloscopes can read logic signals, but they display the actual voltage of a signal over time, are capable of reading higher-voltage signals, have a bigger form factor and can be more expensive.
They typically do not need a software setup process; however, some junior engineers may need time to get used to them if they are to make full use of their functionality.
Time-Sinks After Setup
After the toolchain has been set up and product development is in full flow, we still need to factor in toolchain-related tasks.
Here are some examples of toolchain-related time-sinks to be mindful of after your team’s toolchain has been set up.
New Starters
It may seem obvious, but a new starter going through the same setup process at a later time, or on a different operating system, can cause delays.
It is important to ask new starters which operating system they are comfortable working on, and to ensure they have support with setup from colleagues working in similar domains.
Toolchain Upgrades
Occasionally, a wider system that is being used no longer supports the version of a toolchain your team is using.
I was in a team whose toolchain was 3 years out of date. When we were forced to upgrade it (due to a system incompatibility), the full migration process took 6 weeks to complete.
It took so long because, as in most embedded systems, different subsystems have to interact with each other, and with each upgrade step another feature stopped communicating correctly with the rest.
To mitigate a time-sink as severe as this, consider holding a toolchain review 2-4 times per year to plan low-impact upgrades that will not result in breaking changes.
If you are undergoing a major upgrade, ensure the engineering team test the new toolchain as thoroughly as possible, as a significant amount of the codebase could have been rewritten.
Consultancy Handover
A handover of work from a consultancy to an internal team is a transition that carries a significant risk of delays.
A big contributor to delays at handover is the consultancy using a different toolchain from your internal team.
If your consultant has written firmware in their toolchain that is a year and a half old and you have installed the most recent version, there is a high chance the firmware will not work on your system.
(A real-life example of this is the manufacturer “Espressif” and versions 5.1 and 5.2 of their toolchain.)
The resulting error messages will not clearly indicate that a version mismatch is the cause, so the debugging process will add even more friction to the handover.
To mitigate these issues, it is important to schedule a full walk-through of the toolchain setup with your consultant during handover meetings.
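One inexpensive safeguard a team could add is a small check that compares the toolchain version the firmware was written against with the version installed on the engineer's machine, and says so in plain language. The sketch below is illustrative only: the function names are hypothetical, the version numbers are simply the ones from the example above, and how the installed version is actually read depends on the toolchain in question.

```python
def parse_version(text):
    """Turn '5.1' or 'v5.1.2' into a tuple of integers such as (5, 1, 2)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def check_toolchain(pinned, installed):
    """Return a readable message saying whether the major.minor versions match."""
    if parse_version(pinned)[:2] == parse_version(installed)[:2]:
        return f"OK: installed toolchain {installed} matches pinned {pinned}"
    return (f"MISMATCH: project was built with toolchain {pinned}, "
            f"but {installed} is installed")


# Assumed values: the consultancy used 5.1, the internal team installed 5.2.
print(check_toolchain("5.1", "5.2"))
```

A message like this at the start of a build turns hours of puzzling over unclear errors into a one-line answer, provided the handover documents which version the consultancy actually used.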
Old Developer Hardware
When I was a junior at a startup, we were operating lean and I was using an old personal Mac.
The machine had not caused me any issues, so I never requested a more up-to-date machine.
When testing firmware after a consultancy handover, I was unable to read data from our device’s memory component.
I had connected the device to my Mac and was trying to read it through a serial monitor.
When I connected the device to an oscilloscope, I saw that the communication signals were operating as normal.
I kept running the test on my Mac and noticed that it worked 3 times out of 10.
My Mac was so old that the toolchain was working unreliably.
As uncommon as this scenario is, the lesson stands that an upfront spend on good-value equipment will mitigate faults caused by the developer’s own machine.
Turnover Of Staff
When an engineer who was involved in the setup of a codebase and toolchain leaves your team, they take with them a lot of knowledge that is not always easy to recover efficiently.
In startups, a team member leaving without having processes documented can cause disruption for months or even years.
A clear way of mitigating this is to document how toolchains were set up. If your team uses a platform like GitHub to version their software, having some notes written in the “ReadMe” section can save a lot of time in future.