SimplifyCFG, part 1

- 8 mins read

Introduction

Code optimization is one of the most important steps in the compiler’s code generation pipeline. Optimization is performed in multiple passes, which are categorized into analysis passes and transformation passes. Analysis passes help the compiler gather information and understand the code structure. Examples of such passes include dominance tree construction and alias analysis, which provide insights into code. Transformation passes, on the other hand, are responsible for modifying the target program to improve its performance while ensuring correctness. These passes rely on the information obtained from analysis passes to make safe and effective optimizations.

A prominent example of a transformation pass is SimplifyCFG. This pass is applied repeatedly throughout the optimization pipeline and performs various tasks, including dead code elimination, speculative execution, and eliminate duplicate phi nodes. In fact, it handles so many operations that its source code spans approximately 8,500 lines. Here we will focus on explaining the default actions performed by this pass.

Let’s start

The SimplifyCFGPass class

The SimplifyCFG.h file defines the following class:

class SimplifyCFGPass : public PassInfoMixin<SimplifyCFGPass> {
  SimplifyCFGOptions Options;
  public:
      /// The default constructor configures the pass 
      /// to prioritize canonical IR over optimal IR
      SimplifyCFGPass();
      /// Construct a pass with optional optimizations.
      SimplifyCFGPass(const SimplifyCFGOptions &PassOptions);
 
      /// Run the pass over the function.
      PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);

The SimplifyCFGPass class inherits from PassInfoMixin<SimplifyCFGPass>, which is designed for use with the New Pass Manager. Additionally, the class contains an attribute named Options, which allows us to specify the optimizations we want the pass to perform, in addition to the default ones.

Invoking the pass

The pass manager invokes the pass by calling its run member function, which is defined in SimplifyCFG.h. This function takes two primary arguments:

  1. A reference to F, the function we want to optimize.
  2. The AM (AnalysisManager), a utility class that helps the pass invoke various analysis passes to gather information about the code.

In the code snippet, we can see that AM is used to call TargetIRAnalysis and AssumptionAnalysis. While we won’t delve deeply into these passes, it’s worth noting their roles:

  • TargetIRAnalysis: This is an LLVM analysis pass that provides target-specific information. The returned TTI (TargetTransformInfo) object exposes APIs for querying such information.
  • AssumptionAnalysis: This pass identifies and collects assumptions within the function F, enabling the pass to use these assumptions(llvm.assume’ Intrinsic) to guide its transformations.
PreservedAnalyses SimplifyCFGPass::run(Function &F,
                                       FunctionAnalysisManager &AM) {
  auto &TTI = AM.getResult<TargetIRAnalysis>(F);
  Options.AC = &AM.getResult<AssumptionAnalysis>(F);
  DominatorTree *DT = nullptr;

  if (RequireAndPreserveDomTree)
    DT = &AM.getResult<DominatorTreeAnalysis>(F);
  if (!simplifyFunctionCFG(F, TTI, DT, Options))
    return PreservedAnalyses::all();
  PreservedAnalyses PA;
  if (RequireAndPreserveDomTree)
    PA.preserve<DominatorTreeAnalysis>();
  return PA;
}

The dominator tree (DT) is initialized as nullptr. As previously mentioned, this pass offers a variety of options, one of which is maintaining the dominator tree. This behavior can be controlled using the simplifycfg-require-and-preserve-domtree flag. By default, the RequireAndPreserveDomTree option returns false unless explicitly set to true. When this option is not enabled, the DT will not be initialized.

  DominatorTree *DT = nullptr;
  if (RequireAndPreserveDomTree)
    DT = &AM.getResult<DominatorTreeAnalysis>(F);
}

We can verify that the DT is not initialized using gdb.

379       if (RequireAndPreserveDomTree)
(gdb) source commandLineOtionsPrettyPrinter.py ####load pretty printer i made :)####
(gdb) p RequireAndPreserveDomTree
$1 = Hidden: Hidden
NumOcurr: 0
Ocurrences: Optional
Formatting: Normal
Misc: None
FullyInitialized: true
last position at args 0
Additional vals 0
Argument name: 0x555556314818 "simplifycfg-require-and-preserve-domtree"
Type of arg<0x0>
Help description 0x5555563147b8 "Temporary development switch used to gradually uplift SimplifyCFG into preserving DomTree,"

#### call to operator bool() ####
(gdb) call RequireAndPreserveDomTree.operator bool() 
$2 = false 
(gdb) n #### jump line 380 DT was not initialized ####
381       if (!simplifyFunctionCFG(F, TTI, DT, Options))

}

The RequireAndPreserveDomTree flag is implemented as a command-line option of type bool, allowing us to inspect its value directly in gdb. This can be done by invoking the bool operator or by observing that the code skips the associated if statement during debugging.

Finally, we are ready to invoke the optimizations. The function simplifyFunctionCFG(F, TTI, DT, Options) is called to perform the optimizations. It returns a bool indicating whether any changes were made to the IR. If no changes were made, the pass will preserve all the analysis results.

  if (!simplifyFunctionCFG(F, TTI, DT, Options))
    return PreservedAnalyses::all(); //Preserve analysis if no changes were made.
}

Inside the simplifyFunctionCFG function, we can identify three primary calls for optimizations. These are the key functions that we will explain in detail:

  1. removeUnreachableBlocks:
    This function traverses the control flow graph (CFG) and removes all basic blocks that are unreachable within the graph.

  2. tailMergeBlocksWithSimilarFunctionTerminators

  3. iterativelySimplifyCFG:
    This function is closely related to the SimplifyCFGOptions class mentioned earlier. It performs the majority of the various optimizations provided by the SimplifyCFG pass. Examples of these optimizations include:

    • Simplifying switch statements.
    • Deleting dead basic blocks.
    • Sinking common code.
static bool simplifyFunctionCFGImpl(Function &F, const TargetTransformInfo &TTI,
                                    DominatorTree *DT,
                                    const SimplifyCFGOptions &Options) {
  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
 
  bool EverChanged = removeUnreachableBlocks(F, DT ? &DTU : nullptr);
  EverChanged |=
      tailMergeBlocksWithSimilarFunctionTerminators(F, DT ? &DTU : nullptr);
  EverChanged |= iterativelySimplifyCFG(F, TTI, DT ? &DTU : nullptr, Options);
 
}

removeUnreachableBlocks function

This is the first function called by the pass. The parameters passed to the function are as follows:

  1. A reference to F, the function that we want to optimize.
  2. DTU (DominatorTreeUpdater): This parameter is set to nullptr because we do not intend to preserve the dominator tree.
  3. MSSAU (MemorySSAUpdater): This parameter is initialized to its default value, nullptr.
bool llvm::removeUnreachableBlocks(Function &F, domtreeupdater *DTU,
                                   MemorySSAUpdater *MSSAU) {
  SmallPtrSet<BasicBlock *, 16> Reachable;
  bool Changed = markAliveBlocks(F, Reachable, DTU);
 
}
279 bool EverChanged = removeUnreachableBlocks(F, DT ? &DTU : nullptr);
(gdb) s
llvm::removeUnreachableBlocks (F=..., DTU=0x0, MSSAU=0x0) 
## DTU and MSSAU are nullptr 
}

The core of this pass lies in the markAliveBlocks function, which receives a set of basic blocks named Reachable. This function populates the Reachable set with the basic blocks that can be reached within the control flow graph.

Explanation of the markAliveBlocks Function

The markAliveBlocks function relies on three key variables:

  1. worklist: (vector of basic blocks) – Acts like a stack, storing blocks that need to be processed. SmallVector<BasicBlock*, 128>.
  2. BB: (basic block) – Represents the current basic block being analyzed. BasicBlock *BB.
  3. Reachable: (set of basic blocks) – Keeps track of the blocks that can be reached in the control flow graph (CFG).

The function traverses the CFG, adding the successors of the current basic block (BB) to the worklist and marking them as reachable. If a successor is already in the Reachable set, it will not be inserted again.

Apart from flagging reachable blocks, the function also performs checks on each basic block. These checks can modify the CFG by marking certain blocks as unreachable. Below are two scenarios where the CFG can change:

  1. Storing to a null or an undefined memory location – This can lead to the block being considered unreachable.
  2. An llvm.assume intrinsic evaluates to false – If an assumption is explicitly marked as false, the block becomes unreachable. The following example demonstrates this using the C++ [[assume]] attribute:

Visualization of How markAliveBlocks Works

Below are three examples demonstrating how the markAliveBlocks function operates in different scenarios:

  1. Example 1: Standard markAliveBlocks Execution

    • Shows how the function traverses the control flow graph and marks reachable blocks.
  2. Example 2: Marking a Block as Unreachable Due to an Undefined Store

    • Demonstrates a case where a block is removed because it stores a value to an undefined memory address.
  3. Example 3: Marking a Block as Unreachable Using llvm.assume

    • Illustrates how LLVM assumptions (llvm.assume) can result in a block being marked as unreachable when an assumption is known to be false.

In examples 2 and 3, we saw cases where markAliveBlocks returns Changed = true. This happens when blocks are marked as unreachable due to:

  • Storing to an undefined address.
  • An llvm.assume intrinsic evaluating to false.

Checking for Unreachable Blocks

Once we have the reachable set, the next step is verifying whether all blocks in the function are reachable:

  1. If all blocks are reachable (Reachable.size() == F.size()), the function simply returns Changed.
  2. If some blocks are unreachable, the function iterates over all blocks in F, checking whether they are in the Reachable set using:
    Reachable.count(&BB)
    
    • If count == 1, the block is reachable, and we continue.
    • If count == 0, the block is unreachable and is added to a vector BlocksToRemove.
    • If DTU (Dominator Tree Updater) is enabled, it also skips blocks pending deletion.

Removing Dead Blocks

After iterating through the function:

  • If BlocksToRemove is empty, the function returns Changed.
  • Otherwise, Changed is set to true, and DeleteDeadBlocks is called to eliminate the unreachable blocks.
 bool Changed = markAliveBlocks(F, Reachable, DTU);
 
  // If there are unreachable blocks in the CFG...
  if (Reachable.size() == F.size())
    return Changed;
  
  SmallSetVector<BasicBlock *, 8> BlocksToRemove;
  for (BasicBlock &BB : F) {
    
    if (Reachable.count(&BB))
      continue;
    // Skip already-deleted blocks
    if (DTU && DTU->isBBPendingDeletion(&BB))
      continue;
    BlocksToRemove.insert(&BB);
  }
 
  if (BlocksToRemove.empty())
    return Changed;
 
  Changed = true;
 
  DeleteDeadBlocks(BlocksToRemove.takeVector(), DTU);
 
  return Changed; 
}

Conclusion

In this discussion, we explored how the SimplifyCFGPass is invoked and examined the command-line options class in LLVM. We saw that by default, the dominator tree is not preserved in this pass. However, it can be preserved by enabling the --simplifycfg-require-and-preserve-domtree command-line option.

We then delved into the removeUnreachableBlocks function, which starts with the markAliveBlocks function. This function identifies reachable basic blocks by traversing the control flow graph (CFG). Any blocks that cannot be reached are flagged for removal. Furthermore, the function accounts for obvious undefined behavior (UB) cases that also make certain blocks unreachable.

The process continues with removeUnreachableBlocks, which compares the number of reachable blocks with the total number of blocks in the function. If any blocks are found to be unreachable, they are collected in a set called BlocksToRemove. These unreachable blocks are then removed using the DeleteDeadBlocks function.

In the next part, we will explore the tailMergeBlocksWithSimilarFunctionTerminators function and its role in further optimizing the control flow graph.


References

ReferenceDescriptionLink
1DominatorDominator
2Alias AnalysisAlias Analysis
3Simplify the CFGSimplify the CFG
4SimplifyCFG.hSimplifyCFG.h
5New Pass ManagerNew Pass Manager
6llvm::AnalysisManagerllvm::AnalysisManager
7llvm::TargetTransformInfollvm::TargetTransformInfo
8llvm::AssumptionAnalysisllvm::AssumptionAnalysis
9llvm.assumellvm.assume
10llvm::Functionllvm::Function
11Remove UnreachableRemove Unreachable
12llvm::DomTreeUpdaterllvm::DomTreeUpdater
13llvm::MemorySSAUpdaterllvm::MemorySSAUpdater
14markAliveBlocksmarkAliveBlocks
15llvm::SmallPtrSetllvm::SmallPtrSet
16BasicBlockBasicBlock
17markAliveBlocksmarkAliveBlocks
18assumeassume