summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* Add intrinsics for constrained floating point operationsAndrew Kaylor2017-01-263-0/+108
| | | | | | | | | | | | | | This commit introduces a set of experimental intrinsics intended to prevent optimizations that make assumptions about the rounding mode and floating point exception behavior. These intrinsics will later be extended to specify flush-to-zero behavior. More work is also required to model instruction dependencies in machine code and to generate these instructions from clang (when required by pragmas and/or command line options that are not currently supported). Differential Revision: https://reviews.llvm.org/D27028 llvm-svn: 293226
* [IfConversion] Use reverse_iterator to simplify. NFCKyle Butt2017-01-261-70/+35
| | | | | | This simplifies skipping debug instructions and shrinking ranges. llvm-svn: 293202
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2017-01-262-396/+370
| | | | | | | | UseAA is enabled." This reverts commit r293184 which is failing in LTO builds llvm-svn: 293188
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵Nirav Dave2017-01-262-370/+396
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | enabled. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293184
* [DAGCombiner] Fold extract_subvector of undef to undef. Fold away inserting ↵Craig Topper2017-01-261-0/+8
| | | | | | undef subvectors. llvm-svn: 293152
* New OptimizationRemarkEmitter pass for MIRAdam Nemet2017-01-254-0/+191
| | | | | | | | | | | | | | | | | This allows MIR passes to emit optimization remarks with the same level of functionality that is available to IR passes. It also hooks up the greedy register allocator to report spills. This allows for interesting use cases like increasing interleaving on a loop until spilling of registers is observed. I still need to experiment whether reporting every spill scales but this demonstrates for now that the functionality works from llc using -pass-remarks*=<pass>. Differential Revision: https://reviews.llvm.org/D29004 llvm-svn: 293110
* SDag: fix how initial loads are formed when splitting vector ops.Tim Northover2017-01-251-1/+4
| | | | | | | | Later code expects the vector loads produced to be directly concatenable, which means we shouldn't pad anything except the last load produced with UNDEF. llvm-svn: 293088
* GlobalISel: rework getOrCreateVReg to avoid double lookup. NFC.Tim Northover2017-01-251-20/+20
| | | | | | Thanks to Quentin for suggesting the refactoring. llvm-svn: 293087
* DebugInfo: remove unused parameter from function. NFC.Tim Northover2017-01-252-4/+2
| | | | | | | I think it's a hold-over from some previous iteration, but it's never set to true in LLVM as it exists now. llvm-svn: 293086
* Add iterator_range<regclass_iterator> to {Target,MC}RegisterInfo, NFCKrzysztof Parzyszek2017-01-255-32/+16
| | | | llvm-svn: 293077
* Revert "Do not verify dominator tree if it has no roots"Chad Rosier2017-01-251-4/+0
| | | | | | | This reverts commit r293033, per Danny's comment. In short, we require domtrees to have roots at all times. llvm-svn: 293075
* Fix buildbot failures introduced by 293036Artur Pilipenko2017-01-251-2/+5
| | | | | | Fix unused variable, specify types explicitly to make VC compiler happy. llvm-svn: 293039
* [DAGCombiner] Match load by bytes idiom and fold it into a single load. ↵Artur Pilipenko2017-01-251-0/+268
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Attempt #2. The previous patch (https://reviews.llvm.org/rL289538) got reverted because of a bug. Chandler also requested some changes to the algorithm. http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html This is an updated patch. The key difference is that collectBitProviders (renamed to calculateByteProvider) now collects the origin of one byte, not the whole value. It simplifies the implementation and allows to stop the traversal earlier if we know that the result won't be used. From the original commit: Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the targets supports it. Assuming little endian target: i8 *a = ... i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24) => i32 val = *((i32)a) i8 *a = ... i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3] => i32 val = BSWAP(*((i32)a)) This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations. Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part: i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24) Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses. The general scheme is to match OR expressions by recursively calculating the origin of individual bytes which constitute the resulting OR value. If all the OR bytes come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed. Reviewed By: RKSimon, filcab, chandlerc Differential Revision: https://reviews.llvm.org/D27861 llvm-svn: 293036
* Do not verify dominator tree if it has no rootsSerge Pavlov2017-01-251-0/+4
| | | | | | | | | | | If dominator tree has no roots, the pass that calculates it is likely to be skipped. It occures, for instance, in the case of entities with linkage available_externally. Do not run tree verification in such case. Differential Revision: https://reviews.llvm.org/D28767 llvm-svn: 293033
* DAG: Recognize no-signed-zeros-fp-math attributeMatt Arsenault2017-01-251-1/+2
| | | | | | | | clang already emits this with -cl-no-signed-zeros, but codegen doesn't do anything with it. Treat it like the other fast math attributes, and change one place to use it. llvm-svn: 293024
* GlobalISel: Fix typo in error messageJustin Bogner2017-01-251-1/+1
| | | | llvm-svn: 293023
* DAGCombiner: Allow negating ConstantFP after legalizeMatt Arsenault2017-01-251-4/+10
| | | | llvm-svn: 293019
* Try to prevent build breakage by touching a CMakeLists.txt.Ahmed Bougacha2017-01-251-1/+0
| | | | | | | | | | | Looks like our cmake goop for handling .inc->td dependencies doesn't track the .td files. This manifests as cmake complaining about missing files since r293009. Force a rerun to avoid that. llvm-svn: 293012
* GlobalISel: Use the correct types when translating landingpad instructionsJustin Bogner2017-01-251-4/+8
| | | | | | | | | | | There was a bug here where we were using p0 instead of s32 for the selector type in the landingpad. Instead of hardcoding these types we should get the types from the landingpad instruction directly. Note that we replicate an assert from SDAG here to only support two-valued landingpads. llvm-svn: 292995
* Revert rL292621. Caused some internal build bot failures in apple.Wei Mi2017-01-241-171/+0
| | | | llvm-svn: 292984
* [SelectionDAG] Handle inverted conditions when splitting into multiple branches.Geoff Berry2017-01-242-15/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When conditional branches with complex conditions are split into multiple branches in SelectionDAGBuilder::FindMergedConditions, also handle inverted conditions. These may sometimes appear without having been optimized by InstCombine when CodeGenPrepare decides to sink and duplicate cmp instructions, causing them to have only one use. This problem can be increased by e.g. GVNHoist hiding more cmps from InstCombine by combining equivalent cmps from different blocks. For example codegen X & !(Y | Z) as: jmp_if_X TmpBB jmp FBB TmpBB: jmp_if_notY Tmp2BB jmp FBB Tmp2BB: jmp_if_notZ TBB jmp FBB Reviewers: bogner, MatzeB, qcolombet Subscribers: llvm-commits, hiraditya, mcrosier, sebpop Differential Revision: https://reviews.llvm.org/D28380 llvm-svn: 292944
* [SelectionDAG] Teach getNode to simplify a couple easy cases of ↵Craig Topper2017-01-241-0/+13
| | | | | | | | | | | | | | | | | | | | | EXTRACT_SUBVECTOR Summary: This teaches getNode to simplify extracting from Undef. This is similar to what is done for EXTRACT_VECTOR_ELT. It also adds support for extracting from CONCAT_VECTOR when we can reuse one of the inputs to the concat. These seem like simple non-target specific optimizations. For X86 we currently handle undef in extractSubvector, but not all EXTRACT_SUBVECTOR creations go through there. Ultimately, my motivation here is to simplify extractSubvector and remove custom lowering for EXTRACT_SUBVECTOR since we don't do anything but handle undef and BUILD_VECTOR optimizations, but those should be DAG combines. Reviewers: RKSimon, delena Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29000 llvm-svn: 292876
* LiveIntervalAnalysis: Calculate liveness even if a superreg is reserved.Matthias Braun2017-01-241-6/+13
| | | | | | | | | | | | | | | | | A register unit may be allocatable and non-reserved but some of the register(tuples) built with it are reserved. We still need to calculate liveness in this case. Note to out of tree targets: If you start seeing machine verifier errors with this commit, it probably means that you do not properly mark super registers of reserved register as reserved. See for example r292836 or r292870 for example on how to fix that. rdar://29996737 Differential Revision: https://reviews.llvm.org/D28881 llvm-svn: 292871
* [Analysis] Add LibFunc_ prefix to enums in TargetLibraryInfo. (NFC)David L. Jones2017-01-232-58/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The LibFunc::Func enum holds enumerators named for libc functions. Unfortunately, there are real situations, including libc implementations, where function names are actually macros (musl uses "#define fopen64 fopen", for example; any other transitively visible macro would have similar effects). Strictly speaking, a conforming C++ Standard Library should provide any such macros as functions instead (via <cstdio>). However, there are some "library" functions which are not part of the standard, and thus not subject to this rule (fopen64, for example). So, in order to be both portable and consistent, the enum should not use the bare function names. The old enum naming used a namespace LibFunc and an enum Func, with bare enumerators. This patch changes LibFunc to be an enum with enumerators prefixed with "LibFFunc_". (Unfortunately, a scoped enum is not sufficient to override macros.) There are additional changes required in clang. Reviewers: rsmith Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D28476 llvm-svn: 292848
* DAG: Don't fold vector extract into load if target doesn't want toMatt Arsenault2017-01-231-0/+5
| | | | | | | Fixes turning a 32-bit scalar load into an extending vector load for AMDGPU when dynamically indexing a vector. llvm-svn: 292842
* [AArch64][GlobalISel] Legalize narrow scalar fp->int conversions.Ahmed Bougacha2017-01-231-0/+14
| | | | | | | | | Since we're now avoiding operations using narrow scalar integer types, we have to legalize the integer side of the FP conversions. This requires teaching the legalizer how to do that. llvm-svn: 292828
* DAG: Allow legalization of fcanonicalize vector typesMatt Arsenault2017-01-231-0/+3
| | | | llvm-svn: 292814
* LiveRegUnits: Add accumulateBackward() functionMatthias Braun2017-01-211-0/+24
| | | | | | | | | | | | | | | | | | | Re-Commit r292543 with a fix for the situation when the chain end is MBB.end(). This function can be used to accumulate the set of all read and modified register in a sequence of instructions. Use this code in AArch64A57FPLoadBalancing::scavengeRegister() to prove the concept. - The AArch64A57LoadBalancing code is using a backwards analysis now which is irrespective of kill flags. This is the main motivation for this change. Differential Revision: http://reviews.llvm.org/D22082 llvm-svn: 292705
* GlobalISel: prevent heap use-after-free when looking up VReg.Tim Northover2017-01-201-1/+3
| | | | | | | | Translating the constant can create more VRegs, which can invalidate the reference into the DenseMap. So we have to look up the value again after all that's happened. llvm-svn: 292675
* [mips] Fix debug information for __thread variablePetar Jovanovic2017-01-202-1/+10
| | | | | | | | | | | This patch fixes debug information for __thread variable on Mips using .dtprelword and .dtpreldword directives. Patch by Aleksandar Beserminji. Differential Revision: http://reviews.llvm.org/D28770 llvm-svn: 292624
* [RegisterCoalescing] Recommit the patch "Remove partial redundent copy".Wei Mi2017-01-201-0/+171
| | | | | | | | | | | | | | | The recommit fixes a bug related with live interval update after the partial redundent copy is moved. The original patch is to solve the performance problem described in PR27827. Register coalescing sometimes cannot remove a copy because of interference. But if we can find a reverse copy in one of the predecessor block of the copy, the copy is partially redundent and we may remove the copy partially by moving it to the predecessor block without the reverse copy. Differential Revision: https://reviews.llvm.org/D28585 llvm-svn: 292621
* Revert "LiveRegUnits: Add accumulateBackward() function"Matthias Braun2017-01-201-24/+0
| | | | | | | | This seems to be breaking some bots. This reverts commit r292543. llvm-svn: 292574
* [AArch64][GlobalISel] Widen scalar int->fp conversions.Ahmed Bougacha2017-01-201-0/+22
| | | | | | | It's incorrect to ignore the higher bits of the integer source. Teach the legalizer how to widen it. llvm-svn: 292563
* [AMDGPU] Prevent spills before exec mask is restoredStanislav Mekhanoshin2017-01-201-2/+8
| | | | | | | | | | | | | Inline spiller can decide to move a spill as early as possible in the basic block. It will skip phis and label, but we also need to make sure it skips instructions in the basic block prologue which restore exec mask. Added isPositionLike callback in TargetInstrInfo to detect instructions which shall be skipped in addition to common phis, labels etc. Differential Revision: https://reviews.llvm.org/D27997 llvm-svn: 292554
* GlobalISel: Add a note about how we're being a bit loose with memory operandsJustin Bogner2017-01-201-0/+4
| | | | | | | The logic in r292461 is conservatively correct, but we should revisit this later. Add a TODO so we don't forget. llvm-svn: 292553
* [MIRParser] Allow generic register specification on operand.Ahmed Bougacha2017-01-201-12/+16
| | | | | | | | This completes r292321 by adding support for generic registers, e.g.: %2:_(s32) = G_ADD %0, %1 llvm-svn: 292550
* GlobalISel: Only set FailedISel on dropped dbg intrinsics when using fallbackJustin Bogner2017-01-201-2/+2
| | | | | | | | | It's easier to test the non-fallback path if we just drop these intrinsics for now, like we did before we added the fallback path. We'll obviously need to fix this properly, but the fixme for that is already here. llvm-svn: 292547
* GlobalISel: Pass the MachineFunction in to reportSelectionError directlyJustin Bogner2017-01-201-6/+6
| | | | | | | | Rather than trying to find MF based on the possibly-null MI we've passed in here, just pass it in directly. It's already available at all callers anyway. llvm-svn: 292544
* LiveRegUnits: Add accumulateBackward() functionMatthias Braun2017-01-201-0/+24
| | | | | | | | | | | | | | | | This function can be used to accumulate the set of all read and modified register in a sequence of instructions. Use this code in AArch64A57FPLoadBalancing::scavengeRegister() to prove the concept. - The AArch64A57LoadBalancing code is using a backwards analysis now which is irrespective of kill flags. This is the main motivation for this change. Differential Revision: http://reviews.llvm.org/D22082 llvm-svn: 292543
* CodeGen: Add/Factor out LiveRegUnits class; NFCIMatthias Braun2017-01-203-59/+106
| | | | | | | | | | | | | This is a set of register units intended to track register liveness, it is similar in spirit to LivePhysRegs. You can also think of this as the liveness tracking parts of the RegisterScavenger factored out into an own class. This was proposed in http://llvm.org/PR27609 Differential Revision: http://reviews.llvm.org/D21916 llvm-svn: 292542
* AArch64: fall back to DAG ISel for inline assembly.Tim Northover2017-01-191-0/+3
| | | | | | | We can't currently handle "calls" to inlineasm strings so it's better to let the DAG handle it than generate rubbish. llvm-svn: 292540
* [SelectionDAG] Improve knownbits handling of UMIN/UMAX (PR31293)Simon Pilgrim2017-01-191-3/+33
| | | | | | | | | | | | This patch improves the knownbits logic for unsigned integer min/max opcodes. For UMIN we know that the result will have the maximum of the inputs' known leading zero bits in the result, similarly for UMAX the maximum of the inputs' leading one bits. This is particularly useful for simplifying clamping patterns,. e.g. as SSE doesn't have a uitofp instruction we want to use sitofp instead where possible and for that we need to confirm that the top bit is not set. Differential Revision: https://reviews.llvm.org/D28853 llvm-svn: 292528
* [DAG] Don't increase SDNodeOrder for dbg.value/declare.Mikael Holmen2017-01-192-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: The SDNodeOrder is saved in the IROrder field in the SDNode, and this field may affects scheduling. Thus, letting dbg.value/declare increase the order numbers may in turn affect scheduling. Because of this change we also need to update the code deciding when dbg values should be output, in ScheduleDAGSDNodes.cpp/ProcessSDDbgValues. Dbg values now have the same order as the SDNode they are connected to, not the following orders. Test cases provided by Florian Hahn. Reviewers: bogner, aprantl, sunfish, atrick Reviewed By: atrick Subscribers: fhahn, probinson, andreadb, llvm-commits, MatzeB Differential Revision: https://reviews.llvm.org/D25318 llvm-svn: 292485
* Re-commit: [globalisel] Tablegen-erate current Register Bank InformationDaniel Sanders2017-01-191-3/+4
| | | | | | | | | | | | | | | | | | | | | Summary: Adds a RegisterBank tablegen class that can be used to declare the register banks and an associated tablegen pass to generate the necessary code. Changes since first commit attempt: * Added missing guards * Added more missing guards * Found and fixed a use-after-free bug involving Twine locals Reviewers: t.p.northover, ab, rovka, qcolombet Reviewed By: qcolombet Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka Differential Revision: https://reviews.llvm.org/D27338 llvm-svn: 292478
* GlobalISel: Implement widening for shiftsJustin Bogner2017-01-191-5/+9
| | | | llvm-svn: 292476
* GlobalISel: Implement narrowing for G_LOADJustin Bogner2017-01-191-0/+26
| | | | llvm-svn: 292461
* GlobalISel: Fix text wrapping in a comment. NFCJustin Bogner2017-01-191-2/+1
| | | | llvm-svn: 292460
* Add -debug-info-for-profiling to emit more debug info for sample pgo profile ↵Dehao Chen2017-01-193-8/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | collection Summary: SamplePGO binaries built with -gmlt to collect profile. The current -gmlt debug info is limited, and we need some additional info: * start line of all subprograms * linkage name of all subprograms * standalone subprograms (functions that has neither inlined nor been inlined) This patch adds these information to the -gmlt binary. The impact on speccpu2006 binary size (size increase comparing with -g0 binary, also includes data for -g binary, which does not change with this patch): -gmlt(orig) -gmlt(patched) -g 433.milc 4.68% 5.40% 19.73% 444.namd 8.45% 8.93% 45.99% 447.dealII 97.43% 115.21% 374.89% 450.soplex 27.75% 31.88% 126.04% 453.povray 21.81% 26.16% 92.03% 470.lbm 0.60% 0.67% 1.96% 482.sphinx3 5.77% 6.47% 26.17% 400.perlbench 17.81% 19.43% 73.08% 401.bzip2 3.73% 3.92% 12.18% 403.gcc 31.75% 34.48% 122.75% 429.mcf 0.78% 0.88% 3.89% 445.gobmk 6.08% 7.92% 42.27% 456.hmmer 10.36% 11.25% 35.23% 458.sjeng 5.08% 5.42% 14.36% 462.libquantum 1.71% 1.96% 6.36% 464.h264ref 15.61% 16.56% 43.92% 471.omnetpp 11.93% 15.84% 60.09% 473.astar 3.11% 3.69% 14.18% 483.xalancbmk 56.29% 81.63% 353.22% geomean 15.60% 18.30% 57.81% Debug info size change for -gmlt binary with this patch: 433.milc 13.46% 444.namd 5.35% 447.dealII 18.21% 450.soplex 14.68% 453.povray 19.65% 470.lbm 6.03% 482.sphinx3 11.21% 400.perlbench 8.91% 401.bzip2 4.41% 403.gcc 8.56% 429.mcf 8.24% 445.gobmk 29.47% 456.hmmer 8.19% 458.sjeng 6.05% 462.libquantum 11.23% 464.h264ref 5.93% 471.omnetpp 31.89% 473.astar 16.20% 483.xalancbmk 44.62% geomean 16.83% Reviewers: davidxl, echristo, dblaikie Reviewed By: echristo, dblaikie Subscribers: aprantl, probinson, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D25434 llvm-svn: 292457
* LiveIntervalAnalysis: Cleanup; NFCMatthias Braun2017-01-191-81/+64
| | | | | | | | - Fix doxygen comments: Do not repeat name, remove duplicated doxygen comment (on declaration + implementation), etc. - Use more range based for llvm-svn: 292455
* Treat segment [B, E) as not overlapping block with boundaries [A, B)Krzysztof Parzyszek2017-01-181-1/+6
| | | | llvm-svn: 292446
OpenPOWER on IntegriCloud