path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* [X86][AVX512] Tag VPMADD52/VPSADBW instruction scheduler classes | Simon Pilgrim | 2017-12-05 | 1 | -22/+25
  llvm-svn: 319772
* [DAGCombine] Handle big endian correctly in CombineConsecutiveLoads | Bjorn Pettersson | 2017-12-05 | 1 | -0/+7
  Summary: Code inspection found a fault in DAGCombiner::CombineConsecutiveLoads for big-endian targets.
  A BUILD_PAIR always has the least significant bits of the composite value in element 0. So when checking for consecutive loads on big-endian targets, we should check that the load to elt 1 is at the lower address and the load to elt 0 is at the higher address.
  Normally this bug only resulted in missed opportunities for doing the load combine. I guess that in some rare situation it could lead to faulty combines, but I've not seen that happen.
  Note that this patch will actually trigger load combine for some big-endian regression tests. One example is test/CodeGen/PowerPC/anon_aggr.ll, where we now get
    t76: i64,ch = load<LD8[FixedStack-9]
  instead of
    t37: i32,ch = load<LD4[FixedStack-10]>
    t35: i32,ch = load<LD4[FixedStack-9]>
    t41: i64 = build_pair t37, t35
  before legalization. Legalization then splits the LD8 into two loads, so the end result is the same. That should verify that the transformation is correct now.
  Reviewers: niravd, hfinkel
  Reviewed By: niravd
  Subscribers: nemanjai, llvm-commits
  Differential Revision: https://reviews.llvm.org/D40444
  llvm-svn: 319771
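The endianness rule described in the CombineConsecutiveLoads fix above can be illustrated with a small model. This is only a sketch in Python, not LLVM code: `load`, `build_pair`, and `can_combine` are hypothetical helper names, and the model assumes two equally sized halves.

```python
def load(memory: bytes, addr: int, size: int, big_endian: bool) -> int:
    """Load an unsigned integer of `size` bytes from `memory` at `addr`."""
    byteorder = "big" if big_endian else "little"
    return int.from_bytes(memory[addr:addr + size], byteorder)


def build_pair(elt0: int, elt1: int, half_bits: int = 32) -> int:
    """Model of BUILD_PAIR: element 0 supplies the least significant bits."""
    return (elt1 << half_bits) | elt0


def can_combine(addr_elt0: int, addr_elt1: int, half_bytes: int,
                big_endian: bool) -> bool:
    """Check that the two half-width loads are consecutive in the order
    required for one wide load to produce the same value."""
    if big_endian:
        # The high half (elt 1) must sit at the lower address.
        return addr_elt0 == addr_elt1 + half_bytes
    # Little endian: the low half (elt 0) sits at the lower address.
    return addr_elt1 == addr_elt0 + half_bytes
```

With eight bytes of memory, a big-endian pair whose elt 1 is loaded from offset 0 and elt 0 from offset 4 equals the single 8-byte load at offset 0; swapping the two addresses does not.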
* [X86][AVX512] Add missing scalar CMPSS/CMPSD logic scheduler classes | Simon Pilgrim | 2017-12-05 | 1 | -16/+21
  llvm-svn: 319770
* Bail out of a SimplifyCFG switch table opt at undef values. | Mikael Holmen | 2017-12-05 | 1 | -1/+1
  Summary: A true or false result is expected from a comparison, but it seems the possibility of undef was overlooked, which could lead to a failed assert. This patch fixes that by bailing out if we encounter undef.
  The bug is old and the assert has been there since the end of 2014, so it seems this is unusual enough to forego optimization.
  Patch by JesperAntonsson.
  Reviewers: spatel, eeckstein, hans
  Reviewed By: hans
  Subscribers: uabelho, llvm-commits
  Differential Revision: https://reviews.llvm.org/D40639
  llvm-svn: 319768
* [X86][AVX512] Cleanup bit logic scheduler classes | Simon Pilgrim | 2017-12-05 | 1 | -21/+24
  llvm-svn: 319767
* [DAGCombine] isLegalNarrowLoad function (NFC) | Sam Parker | 2017-12-05 | 1 | -42/+60
  Pull the checks upon the load out from ReduceLoadWidth into their own function.
  Differential Revision: https://reviews.llvm.org/D40833
  llvm-svn: 319766
* [X86][AVX512] Tag scalar CVT and CMP instruction scheduler classes | Simon Pilgrim | 2017-12-05 | 2 | -130/+150
  llvm-svn: 319765
* [InstCombine] Don't crash on out of bounds shifts | Igor Laevsky | 2017-12-05 | 1 | -13/+17
  Differential Revision: https://reviews.llvm.org/D40649
  llvm-svn: 319761
* [X86][AVX512] Tag VPCMP/VPCMPU instruction scheduler classes | Simon Pilgrim | 2017-12-05 | 1 | -42/+60
  Move hardcoded itinerary out to the instruction declarations.
  Not sure that IIC_SSE_ALU_F32P is the best schedule for integer comparisons, but I'm not going to change it right now.
  llvm-svn: 319760
* [X86][AVX512] Cleanup VPCMP scheduler classes | Simon Pilgrim | 2017-12-05 | 1 | -27/+30
  Move hardcoded itinerary out to the instruction declarations.
  Not sure that IIC_SSE_ALU_F32P is the best schedule for integer comparisons, but I'm not going to change it right now.
  llvm-svn: 319758
* [X86][AVX512] Tag VFIXUPIMM instructions scheduler classes | Simon Pilgrim | 2017-12-05 | 1 | -23/+30
  llvm-svn: 319757
* [SystemZ] set 'guessInstructionProperties = 0' and set flags as needed. | Jonas Paulsson | 2017-12-05 | 6 | -123/+131
  This has proven a healthy exercise, as many cases of incorrect instruction flags were corrected in the process. As part of this, IntrWriteMem was added to several SystemZ intrinsics.
  Furthermore, a bug was exposed in TwoAddress with this change (as incorrect hasSideEffects flags were removed and instructions could now be sunk), and the test case for that bugfix (r319646) is included here as test/CodeGen/SystemZ/twoaddr-sink.ll.
  One temporary test regression (one extra copy) which will hopefully go away in upcoming patches for similar cases: test/CodeGen/SystemZ/vec-trunc-to-i1.ll
  Review: Ulrich Weigand.
  https://reviews.llvm.org/D40437
  llvm-svn: 319756
* [Regalloc] Generate and store multiple regalloc hints. | Jonas Paulsson | 2017-12-05 | 3 | -54/+100
  MachineRegisterInfo used to allow just one regalloc hint per virtual register. This patch extends this to a vector of regalloc hints, which is filled in by common code with sorted copy hints. Such hints will make for more ID copies that can be removed.
  NB! This improvement is currently (and hopefully temporarily) *disabled* by default, except for SystemZ. The only reason for this is the big impact this has on tests, which has unfortunately proven unmanageable. All the tests had been updated a long while ago and were just waiting for review (which didn't happen), but now targets have to enable this themselves instead. Several targets could get a head start by downloading the test updates from the Phabricator review. Thanks to those who helped, and sorry you now have to do this step yourselves.
  This should be an improvement generally for any target!
  The target may still create its own hint, in which case this has highest priority and is stored first in the vector. If it has target-type, it will not be recomputed, as per the previous behaviour.
  The temporary hook enableMultipleCopyHints() will be removed as soon as all targets return true.
  Review: Quentin Colombet, Ulrich Weigand.
  https://reviews.llvm.org/D38128
  llvm-svn: 319754
* Re-commit "[cmake] Enable zlib support on windows" | Pavel Labath | 2017-12-05 | 1 | -3/+3
  This recommits r319533, which broke llvm-config --system-libs output. The reason was that I used find_library to search for the z library. This returns absolute paths, and when these paths made it into llvm-config, it produced nonsensical flags. To fix this, I hand-roll a search for the library in the same way that we search for the terminfo library a couple of lines below.
  This is a bit less flexible than the find_library option, as it does not allow the user to specify the path to the library at configure time (which is important on windows, as zlib is unlikely to be found in any of the standard places cmake searches), but I was able to guide the build to find it with appropriate values of the LIB and INCLUDE environment variables.
  Reviewers: compnerd, rnk, beanz, rafael
  Subscribers: llvm-commits, mgorny
  Differential Revision: https://reviews.llvm.org/D40779
  llvm-svn: 319751
* [Support/TarWriter] - Don't allow TarWriter to add the same file more than once. | George Rimar | 2017-12-05 | 1 | -0/+4
  This is for PR35460.
  Currently, when LLD adds files to TarWriter, it may pass the same file multiple times. For example, this happens for a clang reproduce file which specifies archive (.a) files more than once on the command line. This patch makes TarWriter ignore files with the same path, so it adds only the first one to the archive.
  Differential revision: https://reviews.llvm.org/D40606
  llvm-svn: 319750
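The "first path wins" behaviour described in the TarWriter commit above can be sketched as follows. This is an illustrative Python model under the assumption that deduplication is keyed on the exact path string; `DedupArchive` and its members are hypothetical names, not the TarWriter API.

```python
class DedupArchive:
    """Collects (path, data) entries, ignoring repeated paths."""

    def __init__(self) -> None:
        self._seen = set()   # paths already added
        self.entries = []    # entries in insertion order

    def append(self, path: str, data: bytes) -> bool:
        """Add an entry; return False if the path was already present."""
        if path in self._seen:
            return False     # keep only the first file with this path
        self._seen.add(path)
        self.entries.append((path, data))
        return True
```

Appending `lib/libfoo.a` twice leaves a single entry holding the data from the first call.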
* [X86] Fix a bug in handling GRXX subclasses in Domain Reassignment pass | Guy Blank | 2017-12-05 | 1 | -4/+4
  When trying to determine the correct Mask register class corresponding to a GPR register class, not all register classes were handled. This caused an assertion to be raised in some scenarios.
  Differential Revision: https://reviews.llvm.org/D40290
  llvm-svn: 319745
* [SelectionDAG] Use WidenTargetBoolean in WidenVecRes_MLOAD and WidenVecOp_MSTORE instead of implementing it manually and incorrectly. | Craig Topper | 2017-12-05 | 1 | -29/+2
  The CONCAT_VECTORS operand gets its type from getSetCCResultType, but if the mask type and the setcc have different scalar sizes this creates an illegal CONCAT_VECTORS operation. The concat type should be 2x the mask type, and then an extend should be added if needed.
  llvm-svn: 319744
* [X86] Use vector widening to support sign extend from i1 when the dest type is not 512-bits and vlx is not enabled. | Craig Topper | 2017-12-05 | 1 | -20/+31
  Previously we used a wider element type and truncated. But it's more efficient to keep the element type and drop unused elements.
  If BWI isn't supported and we have an i16 or i8 type, we'll extend it to be i32 and still use a truncate.
  llvm-svn: 319740
* Revert r319691: [globalisel][tablegen] Split atomic load/store into separate opcode and enable for AArch64. | Daniel Sanders | 2017-12-05 | 6 | -61/+17
  Some concerns were raised with the direction. Revert while we discuss it and look into an alternative.
  llvm-svn: 319739
* [X86] Fix a crash if avx512bw and xop are both enabled when the IR contains a v32i8 bitreverse. | Craig Topper | 2017-12-05 | 1 | -2/+3
  llvm-svn: 319737
* AMDGPU: Fix missing subtarget feature initializer | Matt Arsenault | 2017-12-05 | 1 | -0/+1
  llvm-svn: 319733
* AMDGPU: Fix crash when scheduling DBG_VALUE | Matt Arsenault | 2017-12-05 | 1 | -1/+5
  This calls handleMove with a DBG_VALUE instruction, which isn't tracked by LiveIntervals. I'm not sure this is the correct place to fix this. The generic scheduler seems to have more deliberate region selection that skips dbg_value.
  The test is also really hard to reduce. I haven't been able to figure out what exactly causes this particular case to try moving the dbg_value.
  llvm-svn: 319732
* [X86] Use vector widening to support zero extend from i1 when the dest type is not 512-bits and vlx is not enabled. | Craig Topper | 2017-12-05 | 1 | -9/+31
  Previously we used a wider element type and truncated. But it's more efficient to keep the element type and drop unused elements.
  If BWI isn't supported and we have an i16 or i8 type, we'll extend it to be i32 and still use a truncate.
  llvm-svn: 319728
* [X86] Don't use kunpck for vXi1 concat_vectors if the upper bits are undef. | Craig Topper | 2017-12-05 | 1 | -3/+5
  This can be efficiently selected by a COPY_TO_REGCLASS without the need for an extra instruction.
  llvm-svn: 319726
* [X86] Use getZeroVector and remove an unnecessary creation of an APInt before calling getConstant. NFCI | Craig Topper | 2017-12-05 | 1 | -4/+2
  The getConstant function can take care of creating the APInt internally.
  getZeroVector will take care of using the correct type for the build vector to avoid re-lowering.
  The test change here is because execution domain constraints apparently pass through undef inputs of a zeroing xor. So the different ordering of register allocation here caused the dependency to change.
  llvm-svn: 319725
* [X86] Rearrange some of the code around AVX512 sign/zero extends. NFCI | Craig Topper | 2017-12-05 | 1 | -12/+12
  Move the AVX512 code out of LowerAVXExtend. LowerAVXExtend has two callers but one of them pre-checks for AVX-512 so the code is only live from the other caller. So move the AVX-512 checks up to that caller for symmetry.
  Move all of the i1 input type code in Lower_AVX512ZeroExend together.
  llvm-svn: 319724
* MachineFrameInfo: Cleanup some parameter naming inconsistencies; NFC | Matthias Braun | 2017-12-05 | 1 | -17/+19
  Consistently use the same parameter names as the names of the affected fields. This avoids some unintuitive abbreviations like `isSS`.
  llvm-svn: 319722
* TwoAddressInstructionPass: Trigger -O0 behavior on optnone | Matthias Braun | 2017-12-05 | 1 | -0/+4
  While we cannot skip the whole TwoAddressInstructionPass even for -O0, there are some parts of the pass that are currently skipped at -O0 but not for optnone. Change this, as there is no reason to have those two hit different code paths here.
  llvm-svn: 319721
* AMDGPU/EG: Add a new FeatureFMA and use it to selectively enable FMA instruction | Jan Vesely | 2017-12-04 | 6 | -3/+23
  Only used by pre-GCN targets
  v2: fix predicate setting for FMA_Common
  Differential Revision: https://reviews.llvm.org/D40692
  llvm-svn: 319712
* AMDGPU: Disable fp64 support on pre GCN asics | Jan Vesely | 2017-12-04 | 3 | -14/+19
  It's not implemented. Passing the +fp64-fp16-denormal feature enables fp64 even on asics that don't support it.
  v2: fix hasFP64 query
  Differential Revision: https://reviews.llvm.org/D39931
  llvm-svn: 319709
* [msan] Add a fixme note for a minor deficiency. | Evgeniy Stepanov | 2017-12-04 | 1 | -0/+2
  llvm-svn: 319708
* Revert r319490 "XOR the frame pointer with the stack cookie when protecting the stack" | Hans Wennborg | 2017-12-04 | 6 | -55/+5
  This broke the Chromium build (crbug.com/791714). Reverting while investigating.
  > Summary: This strengthens the guard and matches MSVC.
  >
  > Reviewers: hans, etienneb
  >
  > Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits
  >
  > Differential Revision: https://reviews.llvm.org/D40622
  >
  > git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319490 91177308-0d34-0410-b5e6-96231b3b80d8
  llvm-svn: 319706
* AMDGPU: Fix creating invalid copy when adjusting dmask | Matt Arsenault | 2017-12-04 | 6 | -61/+129
  Move the entire optimization to one place. Before it was possible to adjust dmask without changing the register class of the output instruction, since they were done in separate places. Fix all lane sizes and move all of the optimization into the DAG folding.
  llvm-svn: 319705
* AMDGPU: Use return value of MorphNodeTo | Matt Arsenault | 2017-12-04 | 1 | -3/+1
  llvm-svn: 319704
* Re-submit r289925 (Update .debug_line section version to match DWARF version) | Paul Robinson | 2017-12-04 | 1 | -2/+12
  Set the .debug_line version to match the requested DWARF version, except with a maximum of v4 because we don't support v5 yet.
  Previously Chromium had issues with this patch; see PR31407. Chromium tool issues have been addressed, so hopefully this will go through this time.
  Patch by Katya Romanova!
  Differential Revision: https://reviews.llvm.org/D38002
  llvm-svn: 319699
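The version rule stated in the .debug_line commit above (match the requested DWARF version, capped at v4) amounts to a clamp. A minimal Python sketch follows; the function and constant names are hypothetical and do not come from the patch.

```python
# Highest .debug_line version supported at the time of the commit (v5 was not yet supported).
MAX_LINE_TABLE_VERSION = 4


def debug_line_version(requested_dwarf_version: int) -> int:
    """Return the .debug_line version for a requested DWARF version."""
    return min(requested_dwarf_version, MAX_LINE_TABLE_VERSION)
```

Requesting DWARF 2, 3, or 4 yields the same line-table version, while requesting DWARF 5 still yields 4.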
* DAG: Follow-up to r319692: check that the truncates' inputs have the same type | Hans Wennborg | 2017-12-04 | 1 | -1/+2
  MatchRotate assumes the types of LHS and RHS are equal, which is always the case when they come from an OR node, but here we're getting them from two different TRUNC nodes, so we have to check the types.
  llvm-svn: 319695
* DAG: Match truncated rotation (PR35487) | Hans Wennborg | 2017-12-04 | 1 | -0/+9
  If the truncation has been pushed past the or-node, look through it and truncate afterwards.
  Differential revision: https://reviews.llvm.org/D40792
  llvm-svn: 319692
* [globalisel][tablegen] Split atomic load/store into separate opcode and enable for AArch64. | Daniel Sanders | 2017-12-04 | 6 | -17/+61
  This patch splits atomics out of the generic G_LOAD/G_STORE and into their own G_ATOMIC_LOAD/G_ATOMIC_STORE.
  This is a pragmatic decision rather than a necessary one. Atomic load/store has little implementation in common with non-atomic load/store. They tend to be handled very differently throughout the backend. It also has the nice side-effect of slightly improving the common-case performance at ISel since there's no longer a need for an atomicity check in the matcher table.
  All targets have been updated to remove the atomic load/store check from the G_LOAD/G_STORE path. AArch64 has also been updated to mark G_ATOMIC_LOAD/G_ATOMIC_STORE legal.
  There is one issue with this patch though, which also affects the extending loads and truncating stores. The rules only match when an appropriate G_ANYEXT is present in the MIR. For example,
    (G_ATOMIC_STORE (G_TRUNC:s16 (G_ANYEXT:s32 (G_ATOMIC_LOAD:s16 X))))
  will match but:
    (G_ATOMIC_STORE (G_ATOMIC_LOAD:s16 X))
  will not. This shouldn't be a problem at the moment, but as we get better at eliminating extends/truncates we'll likely start failing to match in some cases. The current plan is to fix this in a patch that changes the representation of extending-load/truncating-store to allow the MMO to describe a different type to the operation.
  llvm-svn: 319691
* Move splitIndirectCriticalEdges() to BasicBlockUtils.h. | Hiroshi Yamauchi | 2017-12-04 | 2 | -159/+140
  Summary: Move splitIndirectCriticalEdges() from CodeGenPrepare to BasicBlockUtils.h so that it can be called from other places.
  Reviewers: davidxl
  Reviewed By: davidxl
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D40750
  llvm-svn: 319689
* [ConstantFold] Support vector index when factoring out GEP index into preceding dimensions | Haicheng Wu | 2017-12-04 | 1 | -26/+71
  Follow-up of r316824. This patch supports the vector type for both current and previous index when factoring out the current one into the previous one.
  Differential Revision: https://reviews.llvm.org/D39556
  llvm-svn: 319683
* [SCEV] Use a "Discovered" set instead of a "Visited" set; NFC | Sanjoy Das | 2017-12-04 | 1 | -4/+3
  Suggested by Max Kazantsev in https://reviews.llvm.org/D39361
  llvm-svn: 319679
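The commit above does not show the code it changes, so the following is only a generic Python sketch of the pattern its title names: in a worklist traversal, a node is recorded in a "discovered" set when it is pushed, rather than in a "visited" set when it is popped, so no node is ever queued twice. The graph representation and function name are assumptions made for illustration.

```python
def reachable(graph: dict, root: str) -> list:
    """Return every node reachable from `root`, each exactly once."""
    discovered = {root}      # nodes that have ever been pushed
    worklist = [root]
    order = []
    while worklist:
        node = worklist.pop()
        order.append(node)
        for succ in graph.get(node, ()):
            if succ not in discovered:
                # Mark at push time so duplicates never enter the worklist.
                discovered.add(succ)
                worklist.append(succ)
    return order
```

On a diamond-shaped graph the join node is pushed once even though two predecessors reach it, so the pop side needs no "already visited" check.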
* [SCEV] A different fix for PR33494 | Sanjoy Das | 2017-12-04 | 1 | -29/+26
  Summary: I don't think rL309080 is the right fix for PR33494 -- caching ExitLimit only hides the problem[0].
  The real issue is that because of how we forget SCEV expressions in ScalarEvolution::getBackedgeTakenInfo, in the test case for PR33494 computing the backedge for any loop invalidates the trip count for every other loop. This effectively makes the SCEV cache useless.
  I've instead made the SCEV expression invalidation in ScalarEvolution::getBackedgeTakenInfo less aggressive to fix this issue.
  [0]: One way to think about this is that rL309080 essentially augmented the backedge-taken-count cache with another equivalent exit-limit cache. The bug went away because we were explicitly not clearing the exit-limit cache in getBackedgeTakenInfo. But instead of doing all of that, we can just avoid clearing the backedge-taken-count cache.
  Reviewers: mkazantsev, mzolotukhin
  Subscribers: mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D39361
  llvm-svn: 319678
* [BypassSlowDivision] Improve our handling of divisions by constants | Sanjoy Das | 2017-12-04 | 1 | -7/+13
  (This reapplies r314253. r314253 was reverted on r314482 because of a correctness regression on P100, but that regression was identified to be something else.)
  Summary: Don't bail out on constant divisors for divisions that can be narrowed without introducing control flow. This gives us a 32 bit multiply instead of an emulated 64 bit multiply in the generated PTX assembly.
  Reviewers: jlebar
  Subscribers: jholewinski, mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D38265
  llvm-svn: 319677
* MachineVerifier: undef phi arg doesn't need to be live-out from predecessor | Matthias Braun | 2017-12-04 | 1 | -1/+2
  Differential Revision: https://reviews.llvm.org/D40756
  llvm-svn: 319674
* [CodeGen] Unify MBB reference format in both MIR and debug output | Francis Visoiu Mistrih | 2017-12-04 | 79 | -539/+568
  As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'.
  The MIR printer prints the IR name of a MBB only for block definitions.
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
  * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
  * grep -nr 'BB#' and fix
  Differential Revision: https://reviews.llvm.org/D40422
  llvm-svn: 319665
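The third sed command in the commit above rewrites numeric references such as `BB#5` to `%bb.5`. As a rough illustration of what that substitution does to a line of text, here is the same regular expression applied with Python's `re.sub`; the helper name is hypothetical and this is not part of the patch.

```python
import re


def rename_mbb_refs(text: str) -> str:
    """Rewrite 'BB#<number>' as '%bb.<number>', mirroring the sed rule
    s/BB#([0-9]+)/%bb.\\1/g from the commit message."""
    return re.sub(r"BB#([0-9]+)", r"%bb.\1", text)
```

References without a trailing number (for example a bare `BB#` in a format string) are left alone, which is why the commit lists a final manual `grep -nr 'BB#'` pass.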
* Fix function pointer tail calls in armv8-M.base | Pablo Barrio | 2017-12-04 | 1 | -0/+7
  Summary: The compiler fails with the following error message:
    fatal error: error in backend: ran out of registers during register allocation
  Tail call optimization for Armv8-M.base fails to meet all the required constraints when handling calls to function pointers where the arguments take up r0-r3. This is because the pointer to the function to be called can only be stored in r0-r3, but these are all occupied by arguments. This patch makes sure that tail call optimization does not try to handle this type of call.
  Reviewers: chill, MatzeB, olista01, rengolin, efriedma
  Reviewed By: olista01, efriedma
  Subscribers: efriedma, aemerson, javed.absar, llvm-commits, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D40706
  llvm-svn: 319664
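The register-pressure argument in the Armv8-M.base commit above can be reduced to a counting check. The Python sketch below is a deliberately simplified model: it assumes arguments fill r0 to r3 in order and ignores every other tail-call constraint the backend enforces. The function name is hypothetical.

```python
ARG_REGS = ("r0", "r1", "r2", "r3")


def can_tail_call_indirect(num_arg_regs_used: int) -> bool:
    """An indirect tail call needs one of r0-r3 left over to hold the
    function pointer; if the arguments occupy all four, none is free."""
    free_regs = ARG_REGS[num_arg_regs_used:]
    return len(free_regs) > 0
```

With three register arguments r3 is still free for the callee address; with four there is nowhere to put it, which is the case the patch excludes from tail call optimization.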
* Revert "[cmake] Enable zlib support on windows" | Pavel Labath | 2017-12-04 | 1 | -3/+3
  This reverts commit r319533 as it broke llvm-config --system-libs output and everything that depends on it (which is mostly out of tree or downstream folks, but includes a couple of llvm buildbots as well).
  I think I have a fix for this in D40779, but I want someone to review it first. In the meantime, I am reverting this change, as it seems to break a lot of people.
  llvm-svn: 319663
* [AMDGPU] SDWA: add support for PRESERVE into SDWA peephole. | Sam Kolton | 2017-12-04 | 2 | -285/+534
  Summary:
  Reviewers: arsenm, vpykhtin, rampitec
  Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye
  Differential Revision: https://reviews.llvm.org/D37817
  llvm-svn: 319662
* [Loop Predication] Teach LP about reverse loops | Anna Thomas | 2017-12-04 | 1 | -58/+135
  Summary: Currently, we only support predication for forward loops with a step of 1. This patch enables loop predication for reverse or countdown loops, which satisfy the following conditions:
  1. The step of the IV is -1.
  2. The loop has a single latch as B(X) = X <pred> latchLimit with pred as s> or u>
  3. The IV of the guard is the decrement IV of the latch condition (Guard is: G(X) = X-1 u< guardLimit).
  This patch was downstream for a while and is the last series of patches that's from our LP implementation downstream.
  Reviewers: apilipenko, mkazantsev, sanjoy
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D40353
  llvm-svn: 319659
* [NVPTX] Assign valid global names | Jonas Hahnfeld | 2017-12-04 | 2 | -2/+19
  PTX requires that identifiers consist only of [a-zA-Z0-9_$]. The existing pass already ensured this for globals and this patch adds the cleanup for functions with local linkage.
  However, there was a different problem in the case of collisions of the adjusted name: The ValueSymbolTable then automatically appended ".N" with increasing Ns to get a unique name while helping the ABI demangling. Special case this behavior to omit the dots and append N directly. This will always give us legal names according to the PTX requirements.
  Differential Revision: https://reviews.llvm.org/D40573
  llvm-svn: 319657
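The two steps described in the NVPTX commit above (restrict names to [a-zA-Z0-9_$], then resolve collisions by appending a number without a dot) can be sketched in Python. Two details are assumptions made for illustration and are not stated in the commit message: the replacement string `_$_` for illegal characters, and numbering that starts at 1. The function names are hypothetical.

```python
import re

_ILLEGAL = re.compile(r"[^a-zA-Z0-9_$]")


def clean_up_name(name: str, replacement: str = "_$_") -> str:
    """Replace every character PTX does not allow in an identifier.
    The default replacement string is an assumption, not taken from the commit."""
    return _ILLEGAL.sub(lambda m: replacement, name)


def assign_unique_name(name: str, taken: set) -> str:
    """Return a legal name not yet in `taken`, appending N directly
    (no '.') on collision, and record it in `taken`."""
    base = clean_up_name(name)
    candidate = base
    suffix = 0
    while candidate in taken:
        suffix += 1              # starting at 1 is an assumption
        candidate = f"{base}{suffix}"
    taken.add(candidate)
    return candidate
```

Because the suffix contains only digits, the de-duplicated name stays within the PTX character set, whereas a ".N" suffix would reintroduce an illegal dot.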