bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	Update dma-generate pass to (1) work on blocks of instructions (instead of just	Uday Bondhugula	2019-03-29	1	-2/+72
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	loops), (2) take into account fast memory space capacity and lower 'dmaDepth' to fit, (3) add location information for debug info / errors - change dma-generate pass to work on blocks of instructions (start/end iterators) instead of 'for' loops; complete TODOs - allows DMA generation for straightline blocks of operation instructions interspersed b/w loops - take into account fast memory capacity: check whether memory footprint fits in fastMemoryCapacity parameter, and recurse/lower the depth at which DMA generation is performed until it does fit in the provided memory - add location information to MemRefRegion; any insufficient fast memory capacity errors or debug info w.r.t dma generation shows location information - allow DMA generation pass to be instantiated with a fast memory capacity option (besides command line flag) - change getMemRefRegion to return unique_ptr's - change getMemRefFootprintBytes to work on a 'Block' instead of 'ForInst' - other helper methods; add postDomInstFilter option for replaceAllMemRefUsesWith; drop forInst->walkOps, add Block::walkOps methods Eg. output $ mlir-opt -dma-generate -dma-fast-mem-capacity=1 /tmp/single.mlir /tmp/single.mlir:9:13: error: Total size of all DMA buffers' for this block exceeds fast memory capacity for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) { ^ $ mlir-opt -debug-only=dma-generate -dma-generate -dma-fast-mem-capacity=400 /tmp/single.mlir /tmp/single.mlir:9:13: note: 8 KiB of DMA buffers in fast memory space for this block for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) { PiperOrigin-RevId: 232297044
*	Define the AffineForOp and replace ForInst with it. This patch is largely ↵	River Riddle	2019-03-29	2	-7/+10
\| \| \| \| \| \|	mechanical, i.e. changing usages of ForInst to OpPointer<AffineForOp>. An important difference is that upon construction an AffineForOp no longer automatically creates the body and induction variable. To generate the body/iv, 'createBody' can be called on an AffineForOp with no body. PiperOrigin-RevId: 232060516
*	LoopFusion: insert the source loop nest slice at a depth in the destination ↵	MLIR Team	2019-03-29	1	-20/+94
\| \| \| \| \| \|	loop nest which preserves dependences (above any loop carried or other dependences). This is accomplished by updating the maximum destination loop depth based on dependence checks between source loop nest loads and stores which access the memref on which the source loop nest has a store op. In addition, prevent fusing in source loop nests which write to memrefs which escape or are live out. PiperOrigin-RevId: 231684492
*	Update tests using affine maps to not rely on specific map numbers in the ↵	River Riddle	2019-03-29	9	-267/+303
\| \| \| \| \| \|	output IR. This is necessary to remove the dependency on ForInst not numbering the AffineMap bounds it has custom formatting for. PiperOrigin-RevId: 231634812
*	Standardize the spelling of debug info to "debuginfo" in opt flags.	River Riddle	2019-03-29	1	-1/+1
\| \| \| \|	PiperOrigin-RevId: 231610337
*	Fold CallIndirectOp to CallOp when the callee operand is a known constant ↵	River Riddle	2019-03-29	1	-0/+13
\| \| \| \| \| \|	function. PiperOrigin-RevId: 231511697
*	Support fusing loop nests which require insertion into a new instruction ↵	MLIR Team	2019-03-29	1	-104/+282
\| \| \| \| \| \| \| \|	Block position while preserving dependences, opening up additional fusion opportunities. - Adds SSA Value edges to the data dependence graph used in the loop fusion pass. PiperOrigin-RevId: 231417649
*	Recommit: Define an AffineOps dialect as well as an AffineIfOp operation. ↵	River Riddle	2019-03-29	3	-6/+6
\| \| \| \| \| \|	Replace all instances of IfInst with AffineIfOp and delete IfInst. PiperOrigin-RevId: 231342063
*	Automated rollback of changelist 231318632.	Nicolas Vasilache	2019-03-29	3	-6/+6
\| \| \| \|	PiperOrigin-RevId: 231327161
*	Define an AffineOps dialect as well as an AffineIfOp operation. Replace all ↵	River Riddle	2019-03-29	3	-6/+6
\| \| \| \| \| \|	instances of IfInst with AffineIfOp and delete IfInst. PiperOrigin-RevId: 231318632
*	Drop unused result from affine map in test case - NFC	Uday Bondhugula	2019-03-29	1	-1/+1
\| \| \| \|	PiperOrigin-RevId: 231008044
*	More updates of tests to move towards single result affine maps.	Chris Lattner	2019-03-29	6	-190/+192
\| \| \| \|	PiperOrigin-RevId: 230991929
*	Update replaceAllMemRefUsesWith to generate single result affine_apply's for	Uday Bondhugula	2019-03-29	2	-128/+197
\| \| \| \| \| \| \| \| \| \| \|	index remapping - generate a sequence of single result affine_apply's for the index remapping (instead of one multi result affine_apply) - update dma-generate and loop-fusion test cases; while on this, change test cases to use single result affine apply ops - some fusion comment fix/cleanup PiperOrigin-RevId: 230985830
*	Update createAffineComputationSlice to generate single result affine maps	Uday Bondhugula	2019-03-29	1	-80/+94
\| \| \| \| \| \| \| \| \| \|	- Update createAffineComputationSlice to generate a sequence of single result affine apply ops instead of one multi-result affine apply - update pipeline-data-transfer test case; while on this, also update the test case to use only single result affine maps, and make it more robust to change. PiperOrigin-RevId: 230965478
*	Update dma-generate: update for multiple load/store op's per memref	Uday Bondhugula	2019-03-29	1	-21/+81
\| \| \| \| \| \| \| \| \| \|	- introduce a way to compute union using symbolic rectangular bounding boxes - handle multiple load/store op's to the same memref by taking a union of the regions - command-line argument to provide capacity of the fast memory space - minor change to replaceAllMemRefUsesWith to not generate affine_apply if the supplied index remap was identity PiperOrigin-RevId: 230848185
*	Incremental progress to move the testsuite towards single-result affine_apply	Chris Lattner	2019-03-29	4	-62/+64
\| \| \| \| \| \|	instructions. PiperOrigin-RevId: 230775607
*	Minor updates + cleanup to dma-generate	Uday Bondhugula	2019-03-29	1	-32/+38
\| \| \| \| \| \| \| \| \| \| \|	- switch some debug info to emitError - use a single constant op for zero index to make it easier to write/update test cases; avoid creating new constant op's for common zero index cases - test case cleanup This is in preparation for an upcoming major update to this pass. PiperOrigin-RevId: 230728379
*	Add a function pass to strip debug info from functions and instructions.	River Riddle	2019-03-29	1	-0/+23
\| \| \| \|	PiperOrigin-RevId: 230654315
*	Fix single producer check in loop fusion pass.	MLIR Team	2019-03-29	1	-0/+28
\| \| \| \|	PiperOrigin-RevId: 230565482
*	Update fusion cost model + some additional infrastructure and debug ↵	Uday Bondhugula	2019-03-29	1	-15/+14
\| \| \| \| \| \| \| \| \| \| \| \| \|	information for -loop-fusion - update fusion cost model to fuse while tolerating a certain amount of redundant computation; add cl option -fusion-compute-tolerance; evaluate memory footprint and intermediate memory reduction - emit debug info from -loop-fusion showing what was fused and why - introduce function to compute memory footprint for a loop nest - getMemRefRegion readability update - NFC PiperOrigin-RevId: 230541857
*	loop unroll update: unroll factor one for a single iteration loop	Uday Bondhugula	2019-03-29	1	-0/+12
\| \| \| \| \| \| \| \|	- unrolling a single iteration loop by a factor of one should promote its body into its parent; this makes it consistent with the behavior/expectation that unrolling a loop by a factor equal to its trip count makes the loop go away. PiperOrigin-RevId: 230426499
*	Allocate private/local buffers for slices accurately during fusion	Uday Bondhugula	2019-03-29	1	-145/+223
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- the size of the private memref created for the slice should be based on the memref region accessed at the depth at which the slice is being materialized, i.e., symbolic in the outer IVs up until that depth, as opposed to the region accessed based on the entire domain. - leads to a significant contraction of the temporary / intermediate memref whenever the memref isn't reduced to a single scalar (through store fwd'ing). Other changes - update to promoteIfSingleIteration - avoid introducing unnecessary identity map affine_apply from IV; makes it much easier to write and read test cases and pass output for all passes that use promoteIfSingleIteration; loop-fusion test cases become much simpler - fix replaceAllMemrefUsesWith bug that was exposed by the above update - 'domInstFilter' could be one of the ops erased due to a memref replacement in it. - fix getConstantBoundOnDimSize bug: a division by the coefficient of the identifier was missing (the latter need not always be 1); add lbFloorDivisors output argument - rename getBoundingConstantSizeAndShape -> getConstantBoundingSizeAndShape PiperOrigin-RevId: 230405218
*	Handle escaping memrefs in loop fusion pass:	MLIR Team	2019-03-29	1	-0/+58
\| \| \| \| \| \| \|	*) Do not remove loop nests which write to memrefs which escape the function. *) Do not remove memrefs which escape the function (e.g. are used in the return instruction). PiperOrigin-RevId: 230398630
*	AffineExpr pretty print - add missing handling to print expr * -1 as -expr	Uday Bondhugula	2019-03-29	3	-4/+4
\| \| \| \| \| \| \| \|	- print multiplication by -1 as unary negate; expressions like s0 * -1, d0 * -1 + d1 will now appear as -s0, -d0 + d1 resp. - a minor cleanup while on printAffineExprInternal PiperOrigin-RevId: 230222151
*	Add a constant folding hook to ExtractElementOp to fold extracting the ↵	River Riddle	2019-03-29	1	-0/+30
\| \| \| \| \| \|	element of a constant. This also adds a 'getValue' function to DenseElementsAttr and SparseElementsAttr to get the element at a constant index. PiperOrigin-RevId: 230098938
*	Fix test cases that were accessing out of bounds to start with (b/123072438)	Uday Bondhugula	2019-03-29	1	-33/+34
\| \| \| \| \| \| \| \| \|	- detected with memref-bound-check - fixes b/123072438; while on this, fix another test case which was reported out of bounds PiperOrigin-RevId: 229978187
*	LoopFusion: Creates private MemRefs which are used only by operations in the ↵	MLIR Team	2019-03-29	1	-119/+164
\| \| \| \| \| \| \| \| \| \|	fused loop. *) Enables reduction of private memref size based on MemRef region accessed by fused slice. *) Enables maximal fusion by creating a private memref to break a fusion-preventing dependence. *) Adds maximal fusion flag to enable fusing as much as possible (though it still fuses the minimum cost computation slice). PiperOrigin-RevId: 229936698
*	Fix AffineApply corner case	Nicolas Vasilache	2019-03-29	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \|	This CL adds a test reported by andydavis@ and fixes the corner case that appears when operands do not come from an AffineApply and no Dim composition is needed. In such cases, we would need to create an empty map which is disallowed. The composition in such cases becomes trivial: there is no composition. This CL also updates the name AffineNormalizer to AffineApplyNormalizer. PiperOrigin-RevId: 229819234
*	Update stale / target-specific information in comments - NFC	Uday Bondhugula	2019-03-29	1	-1/+1
\| \| \| \|	PiperOrigin-RevId: 229800834
*	Fix improperly indexed DimOp in LowerVectorTransfers.cpp	Nicolas Vasilache	2019-03-29	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This CL fixes a misunderstanding in how to build DimOp which triggered execution issues in the CPU path. The problem is that, given a `memref<?x4x?x8x?xf32>`, the expressions to construct the dynamic dimensions should be: `dim %arg, 0 : memref<?x4x?x8x?xf32>` `dim %arg, 2 : memref<?x4x?x8x?xf32>` and `dim %arg, 4 : memref<?x4x?x8x?xf32>` Before this CL, we would construct: `dim %arg, 0 : memref<?x4x?x8x?xf32>` `dim %arg, 1 : memref<?x4x?x8x?xf32>` `dim %arg, 2 : memref<?x4x?x8x?xf32>` and expect the other dimensions to be constants. This assumption seems consistent at first glance with the syntax of alloc: ``` %tensor = alloc(%M, %N, %O) : memref<?x4x?x8x?xf32> ``` But this was actually incorrect. This CL also makes the relevant functions available to EDSCs and removes duplication of the incorrect function. PiperOrigin-RevId: 229622766
*	Add a canonicalization pattern to remove Dealloc operations if the memref is ↵	River Riddle	2019-03-29	1	-0/+27
\| \| \| \| \| \|	an AllocOp that is only used by Dealloc operations. PiperOrigin-RevId: 229606558
*	Add canonicalization to remove AllocOps if there are no uses. AllocOp has ↵	River Riddle	2019-03-29	1	-5/+12
\| \| \| \| \| \|	side effects on the heap, but can still be deleted if it has zero uses. PiperOrigin-RevId: 229596556
*	LoopFusion improvements:	MLIR Team	2019-03-29	1	-10/+140
\| \| \| \| \| \| \| \|	*) Adds support for fusing into consumer loop nests with multiple loads from the same memref. *) Adds support for reducing slice loop trip count by projecting out destination loop IVs greater than destination loop depth. *) Removes dependence on src loop depth and simplifies cost model computation. PiperOrigin-RevId: 229575126
*	Add a canonicalization pattern for conditional branch to fold constant ↵	River Riddle	2019-03-29	1	-0/+16
\| \| \| \| \| \|	branch conditions. PiperOrigin-RevId: 229242007
*	LoopFusion: automate selection of source loop nest slice depth and ↵	MLIR Team	2019-03-29	1	-68/+276
\| \| \| \| \| \| \| \| \| \| \| \|	destination loop nest insertion depth based on a simple cost model (cost model can be extended/replaced at a later time). *) LoopFusion: Adds fusion cost function which compares the cost of the fused loop nest with the cost of the two unfused loop nests to determine if it is profitable to fuse the candidate loop nests. The fusion cost function is run for various combinations for src/dst loop depths attempting to find the minimum cost setting for src/dst loop depths which does not increase the computational cost when the loop nests are fused. Combinations of src/dst loop depth are evaluated attempting to maximize loop depth (i.e. take a bigger computation slice from the source loop nest, and insert it deeper in the destination loop nest for better locality). *) LoopFusion: Adds utility to compute op instance count for loop nests, sliced loop nests, and to compute the cost of a loop nest fused with another sliced loop nest. *) LoopFusion: canonicalizes slice bound AffineMaps (and updates related tests). *) Analysis::Utils: Splits getBackwardComputationSlice into two functions: one which calculates and returns the slice loop bounds for analysis by LoopFusion, and the other for insertion of the computation slice (once fusion has calculated the min-cost src/dst loop depths). *) Test: Adds multiple unit tests to test the new functionality. PiperOrigin-RevId: 229219757
*	Fix typo in lower_vector_transfers.mlir	Nicolas Vasilache	2019-03-29	1	-2/+2
\| \| \| \|	PiperOrigin-RevId: 229010160
*	[MLIR] Clip all access dimensions during LowerVectorTransfers	Nicolas Vasilache	2019-03-29	2	-79/+118
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This CL adds a short term remedy to an issue that was found during execution tests. Lowering of vector transfer ops uses the permutation map to determine which ForInst have been super-vectorized. During materialization to HW vector sizes however, some of those dimensions may be fully unrolled and do not appear in the permutation map. Such dimensions were then not clipped and may have accessed out of bounds. This CL conservatively clips all dimensions to ensure no out of bounds access. The longer term solution is still up for debate but will probably require either passing more information between Materialization and lowering, or just merging the 2 passes. PiperOrigin-RevId: 228980787
*	Drop -canonicalize from -dma-generate test case cmd	Uday Bondhugula	2019-03-29	1	-20/+21
\| \| \| \| \| \| \|	- should be testing on the output of -dma-generate and not '-dma-generate -canonicalize'; save trouble for those updating -canonicalize in the future! PiperOrigin-RevId: 228915192
*	Const fold splat vectors/tensors in standard add, sub, and mul ops	Lei Zhang	2019-03-29	2	-9/+108
\| \| \| \| \| \| \| \| \| \| \| \| \|	The const folding logic is structurally similar, so use a template to abstract the common part. Moved mul(x, 0) to a legalization pattern to be consistent with mul(x, 1). Also promoted getZeroAttr() to be a method on Builder since it is expected to be frequently used. PiperOrigin-RevId: 228891989
*	Uniformize composition of AffineApplyOp by construction	Nicolas Vasilache	2019-03-29	7	-88/+92
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This CL is the 5th on the path to simplifying AffineMap composition. This removes the distinction between normalized single-result AffineMap and more general composed multi-result map. One nice byproduct of making the implementation driven by single-result is that the multi-result extension is a trivial change: the implementation is still single-result and we just use: ``` unsigned idx = getIndexOf(...); map.getResult(idx); ``` This CL also fixes an AffineNormalizer implementation issue related to symbols. Namely it stops performing substitutions on symbols in AffineNormalizer and instead concatenates them all to be consistent with the call to `AffineMap::compose(AffineMap)`. This latter call to `compose` cannot perform simplifications of symbols coming from different maps based on positions only: i.e. dims are applied and renumbered but symbols must be concatenated. The only way to determine whether symbols from different AffineApply are the same is to look at the concrete values. The canonicalizeMapAndOperands is thus extended with behavior to support replacing operands that appear multiple times. Lastly, this CL demonstrates that the implementation is correct by rewriting ComposeAffineMaps using only `makeComposedAffineApply`. The implementation uses a matcher because AffineApplyOp are introduced as composed operations on the fly instead of iteratively forwardSubstituting. For this purpose, a walker would revisit freshly introduced AffineApplyOp. Regardless, ComposeAffineMaps is scheduled to disappear, this CL replaces the implementation based on iterative `forwardSubstitute` by a composed-by-construction `makeComposedAffineApply`. Remaining calls to `forwardSubstitute` will be removed in the next CL. PiperOrigin-RevId: 228830443
*	Add safeguard against FM explosion	Uday Bondhugula	2019-03-29	1	-0/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- FM has a worst case exponential complexity. For our purposes, this worst case is rarely expected, but could still appear due to improperly constructed constraints (e.g., a logical/memory error in other methods) or artificially created arbitrarily complex integer sets (adversarial / fuzz tests). Add a check to detect such an explosion in the number of constraints and conservatively return false from isEmpty() (instead of running out of memory or running for too long). - Add an artificial virus test case. PiperOrigin-RevId: 228753496
*	Implement branch-free single-division lowering of affine division/remainder	Alex Zinenko	2019-03-29	2	-0/+168
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implements the lowering of `floordiv`, `ceildiv` and `mod` operators from affine expressions to the arithmetic primitive operations. Integer division rules in affine expressions explicitly require rounding towards either negative or positive infinity unlike machine implementations that round towards zero. In the general case, implementing `floordiv` and `ceildiv` using machine signed division requires computing both the quotient and the remainder. When the divisor is positive, this can be simplified by adjusting the dividend and the quotient by one and switching signs. In the current use cases, we are unlikely to encounter affine expressions with negative divisors (affine divisions appear in loop transformations such as tiling that guarantee that divisors are positive by construction). Therefore, it is reasonable to use branch-free single-division implementation. In case of affine maps, divisors can only be literals so we can check the sign and implement the case for negative divisors when the need arises. The affine lowering pass can still fail when applied to semi-affine maps (division or modulo by a symbol). PiperOrigin-RevId: 228668181
*	Fix DMA overlap pass buffer mapping	Uday Bondhugula	2019-03-29	1	-11/+10
\| \| \| \| \| \| \| \| \|	- the double buffer should be indexed (iv floordiv step) % 2 and NOT (iv % 2); step wasn't being accounted for. - fix test cases, enable failing test cases PiperOrigin-RevId: 228635726
*	Fix affine expr flattener bug + improve simplification in a particular scenario	Uday Bondhugula	2019-03-29	2	-1/+13
\| \| \| \| \| \| \| \| \| \| \| \| \|	- fix visitDivExpr: constraints constructed for localVarCst used the original divisor instead of the simplified divisor; fix this. Add a simple test case in memref-bound-check that reproduces this bug - although this was encountered in the context of slicing for fusion. - improve mod expr flattening: when flattening mod expressions, cancel out the GCD of the numerator and denominator so that we can get a simpler flattened form along with a simpler floordiv local var for it PiperOrigin-RevId: 228539928
*	[MLIR] Make SuperVectorization use normalized AffineApplyOp	Nicolas Vasilache	2019-03-29	4	-201/+246
\| \| \| \| \| \| \| \| \|	Supervectorization does not plan on handling multi-result AffineMaps and non-canonical chains of > 1 AffineApplyOp. This CL uses the simpler single-result unbounded AffineApplyOp in the MaterializeVectors pass. PiperOrigin-RevId: 228469085
*	Introduce AffineMap::compose(AffineMap)	Nicolas Vasilache	2019-03-29	1	-3/+10
\| \| \| \| \| \| \| \| \| \| \|	This CL is the 2nd on the path to simplifying AffineMap composition. This CL uses the now accepted `AffineExpr::compose(AffineMap)` to implement `AffineMap::compose(AffineMap)`. Implications of keeping the simplification function in Analysis are documented where relevant. PiperOrigin-RevId: 228276646
*	Fix 0-d memref corner case for getMemRefRegion()	Uday Bondhugula	2019-03-29	1	-0/+8
\| \| \| \| \| \| \|	- fix crash on test/Transforms/canonicalize.mlir with -memref-bound-check PiperOrigin-RevId: 228268486
*	Introduce AffineExpr::compose(AffineMap)	Nicolas Vasilache	2019-03-29	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \|	This CL is the 1st on the path to simplifying AffineMap composition. This CL uses the now accepted AffineExpr.replaceDimsAndSymbols to implement `AffineExpr::compose(AffineMap)`. Arguably, `simplifyAffineExpr` should be part of IR and not Analysis but this CL does not yet pull the trigger on that. PiperOrigin-RevId: 228265845
*	Extend loop-fusion's slicing utility + other fixes / updates	Uday Bondhugula	2019-03-29	2	-18/+160
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- refactor toAffineFromEq and the code surrounding it; refactor code into FlatAffineConstraints::getSliceBounds - add FlatAffineConstraints methods to detect identifiers as mod's and div's of other identifiers - add FlatAffineConstraints::getConstantLower/UpperBound - Address b/122118218 (don't assert on invalid fusion depths cmdline flags - instead, don't do anything; change cmdline flags src-loop-depth -> fusion-src-loop-depth) - AffineExpr/Map print method update: don't fail on null instances (since we have a wrapper around a pointer, it's avoidable); rationale: dump/print methods should never fail if possible. - Update memref-dataflow-opt to add an optimization to avoid an unnecessary call to IsRangeOneToOne when it's trivially going to be true. - Add additional test cases to exercise the new support - update a few existing test cases since the maps are now generated uniformly with all destination loop operands appearing for the backward slice - Fix projectOut - fix wrong range for getBestElimCandidate. - Fix for getConstantBoundOnDimSize() - didn't show up in any test cases since we didn't have any non-hyperrectangular ones. PiperOrigin-RevId: 228265152
*	Convert expr - c * (expr floordiv c) to expr mod c in AffineExpr	Uday Bondhugula	2019-03-29	1	-16/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Detect 'mod' to replace the combination of floordiv, mul, and subtract when possible at construction time; when 'c' is a power of two, this reduces the number of operations; also more compact and readable. Update simplifyAdd for this. On a side note: - with the affine expr flattening we have, a mod expression like d0 mod c would be flattened into d0 - c * q, c * q <= d0 <= c * q + c - 1, with 'q' being added as the local variable (q = d0 floordiv c); as a result, a mod was turned into a floordiv whenever the expression was reconstructed back, i.e., as d0 - c * (d0 floordiv c); as a result of this change, we recover the mod back. - rename SimplifyAffineExpr -> SimplifyAffineStructures (pass had been renamed but the file hadn't been). PiperOrigin-RevId: 228258120