Safety Plan and Safety Case
An integrated solution
2025-09-23 26 min Season 1 Episode 6
Description & Show Notes
Interestingly, even 14 years after the release of the first version of ISO 26262, there are still no standardized templates for the Safety Plan and the Safety Case. What we often see instead are simple lists of the work products defined in the standard. However, it’s questionable whether such lists actually meet the ISO’s requirements for these two deliverables. For the Safety Case, at least, they almost certainly do not.
In today’s episode, we’ll take a completely different approach—by presenting a solution that integrates both work products. In other words, a deliverable that serves as both a Safety Plan and a Safety Case.
Transcript
Hello and welcome to a new episode of Applied FuSa, the podcast for FuSa pragmatists. Interestingly, even 14 years after the release of the first version of ISO 26262, there are still no standardized templates for the Safety Plan and the Safety Case. What we often see instead are simple lists of the work products defined in the standard. However, it's questionable whether such lists actually meet the ISO's requirements for these two deliverables. For the Safety Case, at least, they almost certainly do not. In today's episode, we'll take a completely different approach—by presenting a solution that integrates both work products. In other words, a deliverable that serves as both a Safety Plan and a Safety Case.
We're going to present a document that we originally developed as a standard Safety Case. But then we realized something surprising: it also fully met all the requirements of a Safety Plan. At first, that caught us off guard. But on closer inspection, it quickly became clear: of course the document meets the requirements of both work products. After all, it describes all the safety measures necessary to make the product sufficiently safe. But let's take it step by step. Let's start at the beginning—with the idea for a new Safety Case template.
We, too, used to maintain a list of all safety-related work products as our Safety Case. Each work product came with some basic information—like the responsible person, review status, and links to the actual files. The common argument—still widely used today—is that all the information needed to demonstrate sufficient product safety can be found within those documents. But what's completely missing is any form of actual argumentation. A sound argument always consists of claims and evidence that supports those claims. A list of links to work products, however, is essentially just a list of potential pieces of evidence—without ever making a specific claim. That cannot—and will not—be sufficient.

So we set out to develop a generic form of argumentation—a structure that could be used in any project with just a few adjustments. We started thinking about what such a generic structure of argumentation could look like. After several unsuccessful attempts, we spontaneously decided—almost as a joke—to simulate a courtroom scenario. We imagined a situation where we had to defend ourselves in court. An accident had occurred, and during the investigation, the expert witness concluded that our camera system had been faulty—and had ultimately triggered the accident. Now it was our turn to explain ourselves. We had to provide solid proof that the sensor could be considered sufficiently safe.

Suddenly, we realized what we were actually doing: we were systematically looking at all potential root causes of safety-relevant failure events—evaluating which countermeasures we had implemented, whether those measures were sufficient, and whether we could provide the necessary proof. And that was it. We had found the solution. Because what is functional safety, if not this: for every safety-relevant malfunction, you identify and analyze all possible root causes—and then implement appropriate measures to reduce the risk to an acceptable level. And what is a Safety Case, if not the structured documentation of those failure modes, their root causes, the countermeasures, and—ultimately—their verification? We had reached our goal. The only thing left to do was to put the idea into practice—and see whether it would hold up in real-world scenarios. So we got to work right away. We chose the front camera as our example.
We began by trying to define the function of the front camera in an appropriate way. And as anyone who has ever defined a system function knows, there are many ways to do it: from plain-text descriptions to complex UML models that capture both static and dynamic behavior. Even Simulink models, in the end, are representations of the intended function. What we faced was a bit of a mess—one we had to sort out. And we quickly realized that the biggest challenge at first was this: not getting lost in the details. In the end, we decided to describe the function and its failure modes using very short, concise sentences. We deliberately avoided too much technical detail, as the goal was not to develop a safety concept. Rather, the aim was to structure all existing documentation in such a way that a clear line of argumentation would be possible for each relevant malfunction. So it is entirely sufficient to name these malfunctions in a simple form—which still ensures that each one is taken into account. Accordingly, the function of the front camera was defined as: "Send a list of relevant objects and describe each object using relevant physical parameters such as range, range rate, yaw, yaw rate, object class, etc."
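To make this concrete, here is a minimal sketch of what such an object-list interface could look like. This is an illustration under assumptions: the field names, units, and object classes are hypothetical and not taken from the actual project.

```python
from dataclasses import dataclass
from enum import Enum, auto

class ObjectClass(Enum):
    """Hypothetical object classes; the real taxonomy is project-specific."""
    VEHICLE = auto()
    PEDESTRIAN = auto()
    CYCLIST = auto()
    UNKNOWN = auto()

@dataclass
class DetectedObject:
    """One entry in the camera's object list, described by physical parameters."""
    range_m: float           # distance to the object, in meters
    range_rate_mps: float    # relative radial velocity, in m/s
    yaw_rad: float           # orientation of the object, in radians
    yaw_rate_radps: float    # change of orientation, in rad/s
    object_class: ObjectClass

# The camera function: per cycle, send a list of relevant objects.
ObjectList = list[DetectedObject]
```

From this definition, a number of potential failure modes could immediately be derived—keeping in mind that these can occur on three different levels.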
First, the object lists themselves can be faulty. We referred to this as the list level. Specifically, this means that objects may be missing from the list (missing object, or false negative object), or that objects may be included in the list that don't actually exist in reality (ghost object, or false positive object). Then, the description of an object can be incorrect. That's the object level. In other words, object parameters such as range, yaw, or object class may be wrong. For example, an underestimated range could lead to an unnecessary or incorrect braking maneuver. The lowest level is the communication or protocol level—typically CAN communication. Errors on this level can also lead to safety-relevant malfunctions.

We quickly realized how complex the whole matter would become. This was less due to the method itself, and more because even simple functions can have a whole range of malfunctions—all of which must first be identified and evaluated. For those malfunctions that can be considered not safety-relevant, a justification must at the very least be provided as to why they were classified that way. For all other malfunctions, the Safety Case had to be developed further. We therefore decided to build the Safety Case using the so-called GSN method. GSN stands for Goal Structuring Notation and is a method developed at the University of York for the graphical representation of argumentation structures. Its major advantage is that claims and evidence are presented graphically, which greatly improves the readability and clarity of the argument. It also offers a range of additional symbols for typical elements used in a line of reasoning—such as constraints.
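Before we come to that structure, the failure modes just described can be summarized in compact form. The following sketch is purely illustrative; in a real project this catalogue must be derived and reviewed systematically, and the protocol-level entries listed here (corruption, loss, delay) are typical CAN failure modes assumed for illustration.

```python
# Illustrative failure-mode catalogue, organized by the three levels.
FAILURE_MODES = {
    "list level": [
        "missing object (false negative object)",
        "ghost object (false positive object)",
    ],
    "object level": [
        "wrong range",        # e.g. underestimated range -> incorrect braking
        "wrong range rate",
        "wrong yaw / yaw rate",
        "wrong object class",
    ],
    "communication/protocol level": [
        "corrupted message",
        "lost message",
        "delayed message",
    ],
}
```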
So, we built a GSN structure for our camera function. Yes, for the function itself. We did not develop a separate GSN structure for each malfunction, but a single one for the function as a whole. The overarching claim (our main claim) was: "The function of the camera can be considered sufficiently safe." Our goal was now to use GSN to represent all potential failure causes down to a level of detail at which the evidence could reference existing documents. Below the main claim, there were therefore two groups of sub-claims. On one hand, the identified malfunctions themselves. For each of them, we added an element asserting that suitable measures had sufficiently reduced the risk—such as, for instance, "Sufficiently safe regarding ghost objects."

But there was one point we had previously overlooked. Regardless of the individual malfunctions, proof also had to be provided that the list itself was complete. What if a highly relevant malfunction had been forgotten? That would be a disaster. Therefore, we added a corresponding element to the GSN structure that read: "List of malfunctions is complete and correct." Claims of this kind are very important for a Safety Case and can be verified most easily by questioning everything. We generated a list of malfunctions. How do we know it is complete? How do we know all malfunctions are correctly described? And so on. You have to apply this questioning repeatedly, at all levels. And you will be surprised at how many possible causes of failure you discover in the process. We provided evidence for the completeness of the list of malfunctions through a review report. Of course, it's important to ensure that the review itself was properly conducted. Was the colleague who performed the review even qualified? Did he have the necessary expertise? Was he impartial? Yes, it can become very complex.

And very quickly, the question arose: "At what point can we stop asking 'Why?' or 'What if?', like a two-year-old child?" We did not find universally valid criteria and ultimately left the decision to the engineering judgment of the Functional Safety Manager. We still believe this is the right and reasonable approach, as in our view it will always be necessary for people with sufficient technical knowledge and experience to make the final decision as to whether a system can be considered sufficiently safe or not. This responsibility should not be handed over completely to acceptance criteria—tempting as that may be, for instance in terms of transparency and traceability. We believe it is more desirable to limit formalization to the information that a responsible person uses to make a decision. But the decision itself must ultimately be made by that person (and, of course, documented properly).
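As a sketch, the top of this structure can be written down as a small claim tree. The representation below is our own illustration in code, not official GSN syntax, and the file path is a hypothetical placeholder.

```python
from dataclasses import dataclass, field

@dataclass
class Goal:
    """A GSN goal (claim); leaf goals are closed by solutions (evidence links)."""
    claim: str
    children: list["Goal"] = field(default_factory=list)
    solutions: list[str] = field(default_factory=list)

camera_case = Goal(
    claim="The function of the camera can be considered sufficiently safe.",
    children=[
        Goal(claim="Sufficiently safe regarding ghost objects."),
        Goal(claim="Sufficiently safe regarding missing objects."),
        # ...one sub-goal per identified safety-relevant malfunction...
        Goal(
            claim="List of malfunctions is complete and correct.",
            solutions=["reviews/malfunction_list_review.pdf"],  # hypothetical path
        ),
    ],
)
```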
So, for the completeness of the list, we had planned to use review reports as evidence. Now we turned our attention to the individual malfunctions. As a first step, we added a claim under each malfunction in the GSN structure for each respective root cause—again phrased as a statement that the associated risk can be considered sufficiently low. At this point, we need to take a short detour and look more closely at the term "sufficient." What exactly does sufficient mean? "Sufficient" means that the system can be considered adequately safe with respect to a specific malfunction. There will always be a residual risk, but the overarching goal of functional safety is to reduce this residual risk to an acceptable level. For an E/E system like a sensor, this means that the functional safety concept (FSC) for the respective malfunction has been fully and correctly implemented. If a supplier can provide evidence of this, then the supplier's system can be classified as acceptably safe—again, with respect to the specific malfunction for which the FSC was implemented. In our example of the ghost object malfunction, this means that "sufficient safety measures are in place" precisely when the functional safety concept that the supplier of the camera system received for the ghost object malfunction has been implemented—and when there is also adequate evidence of it.
Back to our Safety Case. In the GSN structure, we included an element for the ghost object malfunction, among others. Naturally, the argument doesn't end there. On the next level, we added elements for each of the potential root causes of this malfunction—again formulated as positive claims stating that the risk associated with each cause can be considered sufficiently low. Similar to how we handled the malfunctions themselves, we also added an element asserting the completeness of the list of root causes. As evidence, we again used a link to the corresponding review report. In GSN notation, pieces of evidence are referred to as solutions, because each piece of evidence concludes a branch of the argument. The solution element is linked to the corresponding files—for instance, the review report.

The question now was: What kind of evidence can we provide for the individual root causes? After some deliberation—and many attempts that were discarded for various reasons—we came to the conclusion that there are primarily two branches relevant for each root cause. On one side is the safety analysis, which provides us with the list of safety measures that need to be implemented to sufficiently mitigate a malfunction and its causes. If the top-level event of a fault tree indeed represents the malfunction in question—and this is strongly recommended for a system-level FTA—and if all relevant safety measures are properly integrated into the fault tree, then a minimal cut set analysis can be used to assess the residual risk. This assessment should result in a Safety Analysis Report, which is subject to a peer review and, if successfully reviewed, formally approved. This approved report then serves as adequate evidence in the Safety Case for the completeness of the implemented safety measures.

However, this only covers completeness. What about the actual implementation? It is of little use to have an excellent safety analysis identifying the necessary safety measures if those measures are then implemented incompletely or incorrectly. So, beneath each identified root cause, we added individual elements for each safety measure. For the completeness of the safety measures, we used the Safety Analysis Report as well as the corresponding Review Report as evidence. And for each safety measure, we further added one element each for Specification, Implementation, and Verification:
• The Specification elements were linked to the respective requirements documents.
• The Implementation elements were linked to the relevant work products, such as system architecture, source code, schematics, etc.
• Finally, the Verification elements were linked to the appropriate Verification Reports.

With this, we had reached an important milestone: for a single function, we had successfully structured all evidence for the completeness and correctness of each root cause of every safety-relevant malfunction into a logical format. This enabled us to argue, in a traceable and structured manner, that the function can be considered sufficiently safe. We were very satisfied with this solution. Very satisfied!
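Sketched as data, one such root-cause branch could look as follows. Everything here is a hypothetical placeholder (the root cause, the measure, and all file names); the sketch only illustrates the two branches per root cause and the Specification/Implementation/Verification triple per measure.

```python
# One root-cause branch of the ghost-object malfunction, sketched as data.
# All names and file paths are hypothetical placeholders.
root_cause_branch = {
    "claim": "Root cause 'misclassified reflection': risk sufficiently low.",
    "completeness of safety measures": {
        "safety analysis report": "fta/ghost_object_fta_report.pdf",
        "review report": "reviews/ghost_object_fta_review.pdf",
    },
    "safety measures": [
        {
            "measure": "Plausibility check on object track age",
            "specification": "req/SYS-REQ-0815.reqif",              # requirements document
            "implementation": "src/plausibility_check.c",           # architecture, code, schematics, ...
            "verification": "test/reports/plausibility_check.pdf",  # verification report
        },
    ],
}
```

However, there was still one flaw — a crucial aspect that isn't explicitly derived from ISO 26262. I'm referring to possible interferences between implemented safety measures. In other words: how can we ensure that the safety measures do not interfere with each other in a way that might increase the risk again?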
In fact, we had identified a necessary improvement to our process, because up to that point, an analysis of potential interference between safety measures had not been planned or carried out. We therefore did two things: First, we adapted the process to include such an interference analysis. Second, we added a new element to the GSN structure, representing the Review Report of this interference review. To put it simply: we added a new claim stating that "No interactions exist between safety measures that could unacceptably increase the risk." As evidence, we linked this claim to the corresponding Review Report of the interference analysis.

Now we had done it. We looked at the structure, searched for further gaps in the argumentation, found none, and finally gave each other a high five. The structure felt complete. In a way, it was very complex, yet also lean. Complex, because a system can fail in a multitude of ways. But lean, because the structure ultimately describes this complex situation in an extremely organized and traceable manner. It's true—you can lose track. But just like fault trees, complex GSN structures can be broken down into multiple substructures. For example, one might create a separate GSN structure for each malfunction, which is then linked back to the main claim. Similarly, you can break down the root causes of a single malfunction into their own substructures. This approach doesn't just improve manageability. There is another important benefit that should not be underestimated: reusability. This becomes especially apparent when complex structures are modularized based on functional aspects, as the following example will illustrate.
In a project, certain malfunctions may be safety-relevant and mitigated using the same measures that were already applied in a previous project. In such cases, the corresponding substructure related to the malfunction can be fully reused in the new project. The line of argumentation remains the same, which makes sense—especially since it has already proven effective in practice. Only the links to the evidence files need to be updated, as these are typically specific to each project. We had already discussed that an analysis of potential interactions between safety measures is necessary, and that a positive outcome of this analysis is an important element of the Safety Case. In the same way, potential interactions between the safety concepts of different malfunctions must be analyzed—and ultimately also the interactions between the safety concepts of malfunctions of different functions.

In summary, the structure of the Safety Case is as follows:
• Our Safety Case is based on the general statement: "The product is sufficiently safe. All Functional Safety Concepts have been fully and correctly implemented."
• On the level below, there is—for each safety-relevant function—one element stating that any risk originating from this function has been sufficiently mitigated, and another element asserting the completeness of the list of functions.
• On the third level, each function element branches into the corresponding malfunction elements, along with an additional completeness element for the list of malfunctions. Important: a completeness element must be provided for each function, in parallel to the malfunction elements.
• The fourth level contains the root causes for each malfunction, as individual elements—again accompanied by an element for the completeness of the list of root causes. Note: each malfunction must have its own completeness element, placed directly beneath it and in parallel to the listed root causes.
• Two more levels to go, and we're done. The second-to-last level—directly beneath the root causes—contains elements for the safety measures, as previously described.
• And below that, serving as the argument for the effectiveness of the safety measures, we have—in GSN terms—the so-called Solutions: references to Safety Analysis Reports (as evidence for the identified safety measures), as well as to the specification, implementation, and verification of each measure.
• In parallel to all levels, additional structures may be added as needed to capture cross-influences between safety measures and interactions between safety concepts.

Finally, it should be mentioned that other process-relevant activities—such as the confirmation measures—can be added at appropriate places in the structure as needed. How do you determine where they are needed? Quite simply: whenever a claim is added to the GSN structure, ask: "What evidence do I need to support this claim?" And if you're working with lists of claims, then you must inevitably ask: "Is this list complete and correct—and what evidence supports that?" If all of this is taken into account, the result is a top-tier Safety Case. Wow… that sounds pretty complicated. But it isn't. It's complex, and not always easy to keep track of—but it's not complicated. And this approach has some advantages whose value should not be underestimated.
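Before we come to those advantages, here is a minimal sketch of that questioning applied mechanically: walk the claim tree and list every claim that is backed neither by evidence (a solution) nor by further sub-claims. The tree layout and all names are our own illustration, not a prescribed format.

```python
def open_claims(goal: dict) -> list[str]:
    """Return all claims that still need evidence or further sub-claims."""
    children = goal.get("sub", [])
    if not children and not goal.get("solutions"):
        return [goal["claim"]]
    gaps = []
    for child in children:
        gaps += open_claims(child)
    return gaps

case = {
    "claim": "The product is sufficiently safe.",
    "sub": [
        {"claim": "Sufficiently safe regarding ghost objects.",
         "sub": [{"claim": "Root cause 'misclassified reflection' mitigated."}]},
        {"claim": "List of malfunctions is complete and correct.",
         "solutions": ["reviews/malfunction_list_review.pdf"]},  # hypothetical path
    ],
}

print(open_claims(case))
# ["Root cause 'misclassified reflection' mitigated."]
```

Now to the advantages.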
First and foremost, the structure provides a true safety argument — a genuine line of reasoning that, if the development is sound, will be complete and correct. It is not just a disconnected list of work products with a few added properties and file links. No — by its very structure, it forms an argument, and thanks to the file links, it ultimately becomes real evidence. Secondly, due to the strict structure — organized by functions, their malfunctions, and the respective root causes — it is virtually guaranteed that the Safety Case will be complete. This, of course, assumes that the relevant work products, such as the safety analyses and the corresponding safety analysis reports, are themselves complete and correct. Thirdly, if the Safety Case is developed as described here, the likelihood is extremely high that any remaining safety gaps will be identified at an early stage, allowing countermeasures to be initiated in good time.
Another highly significant advantage of the Safety Case approach presented here is the fact that the basic structure can already be generated at the very beginning of a project—in principle, even during the quotation phase. As soon as a function is known, we can derive the potential failure modes. And we—as a supplier—can do this without customer support. There are established methods and tools for identifying failure modes. One well-known method is HAZOP. We will introduce another very efficient method, which we developed a few years ago based on UML models, in a separate podcast episode. If the basic structure—down to the level of failure modes, and possibly further levels in the case of reuse—can already be created at the start of a project, this greatly facilitates the planning of activities. Fundamentally, all activities within functional safety must be planned and executed to generate the necessary evidence.

And with this, we have reached a very important point mentioned at the beginning. With the Safety Case as described here, a separate Safety Plan is essentially no longer necessary. Behind every piece of evidence required for the Safety Case, there are already planned activities. If this is not the case, the evidence in the Safety Case can be marked accordingly, so that the planning of the missing activities can be overseen by the Functional Safety Manager. As soon as the files belonging to a piece of evidence exist, the links can be added to the Safety Case. This fulfills all the requirements of a Safety Plan: all functional safety–relevant activities are identified, planned, coordinated, and monitored. In other words, the Safety Case presented today encompasses all the characteristics of a Safety Plan. That is why I prefer the term "Safety Integrity Document." Planning, development, and the ultimate demonstration of safety integrity are inseparable, and this is clearly reflected in the Safety Case we have developed.

In conclusion, it is worth noting that we ultimately decided against using GSN. The main reason was the anticipated high training effort. The Functional Safety Managers would have had to learn not only GSN, but also the underlying Safety Case methodology, as well as the tool used to create GSN structures (Visio, which at the time was not a standard tool familiar to everyone). Instead, we created a requirements module and converted each GSN element into a requirement. This had the tremendous advantage that the template could be used directly by all responsible parties, since requirements management is a core competency that every employee must possess—including the use of tools such as DOORS or Polarion. We generated evidence by linking the requirements either to other requirements, to Jira tickets, or directly to specific files. The status of the requirements also proved to be very useful: when a requirement is marked as implemented, the corresponding evidence has been created; when a requirement is marked as verified, the evidence has been confirmed. And since tools like Polarion and Jira are also excellent for automatically generating metrics, the Safety Case could be used very effectively and efficiently to monitor the status of functional safety—even in very complex projects, and at any time.
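As a rough, tool-independent sketch of this status logic (DOORS, Polarion, and Jira track such statuses natively; the requirement texts and the metric below are hypothetical examples, not our project data):

```python
from collections import Counter

# Status semantics as described above:
#   "open"        -> evidence not yet created
#   "implemented" -> evidence has been created
#   "verified"    -> evidence has been confirmed
safety_case_reqs = {
    "SC-001 Ghost-object risk sufficiently mitigated": "verified",
    "SC-002 Missing-object risk sufficiently mitigated": "implemented",
    "SC-003 List of malfunctions complete and correct": "open",
}

def fusa_status(reqs: dict[str, str]) -> dict[str, float]:
    """Share of Safety Case claims per status - a simple progress metric."""
    counts = Counter(reqs.values())
    return {s: round(counts.get(s, 0) / len(reqs), 2)
            for s in ("open", "implemented", "verified")}

print(fusa_status(safety_case_reqs))
# {'open': 0.33, 'implemented': 0.33, 'verified': 0.33}
```

That said, this brings us to the topic of KPIs, which will be covered in a separate podcast. For today, that is enough input for you to digest. Stay safe and see you soon.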
Applied FuSa – a podcast for Functional Safety pragmatists. Get your new piece of FuSa every other week.