summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Replace CVT2MASK ISD opcode with PCMPGTM compared to zero.Craig Topper2018-01-085-24/+22
| | | | | | CVT2MASK is just checking the sign bit which can be represented with a comparison with zero. llvm-svn: 321985
* [X86] Add patterns to allow 512-bit BWI compare instructions to be used for ↵Craig Topper2018-01-082-6/+27
| | | | | | 128/256-bit compares when VLX is not available. llvm-svn: 321984
* [X86] Simplify some code in lower1BitVectorShuffle by relying on getNode's ↵Craig Topper2018-01-071-15/+2
| | | | | | ability to constant fold vector SIGN_EXTEND. llvm-svn: 321979
* [X86] Add VSHUFF32X4 and similar instructions to load folding tables.Craig Topper2018-01-071-0/+24
| | | | llvm-svn: 321978
* Revert "[SCCP] Manually fold branches on undef."Davide Italiano2018-01-071-26/+3
| | | | | | | I thought this was responsible for PR35723, but I was wrong, the issue lies elsewhere. Revert while I debug. llvm-svn: 321975
* [SLPVectorizer] Reintroduce std::stable_sort(properlyDominates()).Davide Italiano2018-01-071-9/+23
| | | | | | | | The approach was never discussed, I wasn't able to reproduce this non-determinism, and the original author went AWOL. After a discussion on the ML, Philip suggested to revert this. llvm-svn: 321974
* [X86] Revert accidental change to CMakeLists.txt in r321952Craig Topper2018-01-071-1/+3
| | | | | | I had removed the qualifiers around the autogenerated folding table so I could compare with the manual table, but didn't intend to commit the change. llvm-svn: 321971
* [DAG] Fix for Bug PR34620 - Allow SimplifyDemandedBits to look through bitcastsSimon Pilgrim2018-01-071-0/+6
| | | | | | | | | | Allow SimplifyDemandedBits to use TargetLoweringOpt::computeKnownBits to look through bitcasts. This can help simplifying in some cases where bitcasts of constants generated during or after legalization can't be folded away, and thus didn't get picked up by SimplifyDemandedBits. This fixes PR34620, where a redundant pand created during legalization from lowering and lshr <16xi8> wasn't being simplified due to the presence of a bitcasted build_vector as an operand. Committed on the behalf of @sameconrad (Sam Conrad) Differential Revision: https://reviews.llvm.org/D41643 llvm-svn: 321969
* [X86] Remove unneeded code from combineGatherScatter that used to delte ↵Craig Topper2018-01-071-11/+1
| | | | | | | | SIGN_EXTEND_INREG nodes created during legalization of v2i1/v4i1 masks on KNL. v2i1/v4i1 are now legal on KNL so no sign_extend_inreg is generated. llvm-svn: 321968
* [X86] Make v2i1 and v4i1 legal types without VLXCraig Topper2018-01-075-132/+132
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: There are few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX. Particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since need to fill the upper bits of the mask we have to fill with 0s at the promoted type. It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway. Everything is done on a larger register anyway. This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly. We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added. I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all. There's definitely room for improvement with some follow up patches. Reviewers: RKSimon, zvi, guyblank Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41560 llvm-svn: 321967
* [LV][VPlan] NFC patch to move LoopVectorizationPlanner class out of ↵Hal Finkel2018-01-074-271/+269
| | | | | | | | | | | | | | | | | | | | | | | | | LoopVectorize.cpp Another small step forward to move VPlan stuff outside of LoopVectorize.cpp. VPlanBuilder.h is renamed to LoopVectorizationPlanner.h LoopVectorizationPlanner class is moved from LoopVectorize.cpp to LoopVectorizationPlanner.h LoopVectorizationCostModel::VectorizationFactor class is moved to LoopVectorizationPlanner.h (used by the planner class) --- this needs further streamlining work in later patches and thus all I did was take it out of the CostModel class and moved to the header file. The callback function had to stay inside LoopVectorize.cpp since it calls an InnerLoopVectorizer member function declared in it. Next Steps: Make InnerLoopVectorizer, LoopVectorizationCostModel, and other classes more modular and more aligned with VPlan direction, in small increments. Previous step was: r320900 (https://reviews.llvm.org/D41045) Patch by Hideki Saito, thanks! Differential Revision: https://reviews.llvm.org/D41420 llvm-svn: 321962
* [CodeExtractor] Use subset of function attributes for extracted function.Florian Hahn2018-01-071-4/+74
| | | | | | | | | | | | | | | | | | | | | | | | In addition to target-dependent attributes, we can also preserve a white-listed subset of target independent function attributes. The white-list excludes problematic attributes, most prominently: * attributes related to memory accesses, as alloca instructions could be moved in/out of the extracted block * control-flow dependent attributes, like no_return or thunk, as the relerelevant instructions might or might not get extracted. Thanks @efriedma and @aemerson for providing a set of attributes that cannot be propagated. Reviewers: efriedma, davidxl, davide, silvas Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D41334 llvm-svn: 321961
* [PowerPC] Add an ISD::TRUNCATE to the legalization for ↵Craig Topper2018-01-072-14/+3
| | | | | | | | | | | | | | | | | | | | | ppc_is_decremented_ctr_nonzero Summary: I believe legalization is really expecting that ReplaceNodeResults will return something with the same type as the thing that's being legalized. Ultimately, it uses the output to replace the uses in the DAG so the type should match to make that work. There are two relevant cases here. When crbits are enabled, then i1 is a legal type and getSetCCResultType should return i1. In this case, the truncate will be between i1 and i1 and should be removed (SelectionDAG::getNode does this). Otherwise, getSetCCResultType will be i32 and the legalizer will promote the truncate to be i32 -> i32 which will be similarly removed. With this fixed we can remove some code from PromoteIntRes_SETCC that seemed to only exist to deal with the intrinsic being replaced with a larger type without changing the other operand. With the truncate being used for connectivity this doesn't happen anymore. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: nemanjai, llvm-commits, kbarton Differential Revision: https://reviews.llvm.org/D41654 llvm-svn: 321959
* [X86] Add the 16 and 8-bit CRC32 instructions to the load folding tables.Craig Topper2018-01-071-0/+3
| | | | llvm-svn: 321958
* [X86] Correct the load folding flags for xmm fp->mmx conversion instructions.Craig Topper2018-01-071-4/+4
| | | | | | The instructions that load 64-bits or an xmm register should be TB_NO_REVERSE to avoid the load being widened during unfold. The instructions that load 128-bits need to ensure 128-bit alignment. llvm-svn: 321956
* [X86] Add TB_NO_REVERSE to some scalar intrinsic instructions in the load ↵Craig Topper2018-01-071-4/+4
| | | | | | folding table. llvm-svn: 321955
* [X86] Don't put any EVEX_B instructions in the tablegen generated load ↵Craig Topper2018-01-071-0/+1
| | | | | | | | folding tables. EVEX_B means different things for memory and register forms. The instructions should not be considered equivalent. llvm-svn: 321954
* [X86] Add 128 and 256-bit VPOPCNTD/Q instructions to load folding tables.Craig Topper2018-01-071-0/+30
| | | | llvm-svn: 321953
* [X86] Add some 8 and 16-bit instructions to the load folding tables.Craig Topper2018-01-072-3/+7
| | | | llvm-svn: 321952
* [X86] Add EVEX vcvtph2ps to the load folding tables.Craig Topper2018-01-071-1/+4
| | | | llvm-svn: 321951
* [X86] Remove cvtps2ph xmm->xmm from store folding tables. Add the evex ↵Craig Topper2018-01-071-2/+3
| | | | | | | | versions of cvtps2ph to the store folding tables. The memory form of the xmm->xmm version only writes 64-bits. If we use it in the folding tables and its get used for a stack spill, only half the slot will be written. Then a reload may read all 128-bits which will pull in garbage. But without the spill the upper bits of the register would have been zero. By not folding we would preserve the zeros. llvm-svn: 321950
* [X86] Add CMP8ri8 to load folding tables.Craig Topper2018-01-071-0/+1
| | | | llvm-svn: 321949
* [X86] Remove assembler predicates from all AVX512 related feature flags.Craig Topper2018-01-061-26/+13
| | | | | | | | We don't do fine grained feature control like this on features prior to AVX512. We do still have checks in place in the assembly parser itself that prevents %zmm references or %xmm16-31 from being parsed without at least -mattr=avx512f. Same for rounding control and mask operands. That will prevent the table matcher from matching for any instructions that need those features and that's probably good enough. llvm-svn: 321947
* [X86] Remove memory forms of EVEX encoded vcvttss2si/vcvttsd2si from asm ↵Craig Topper2018-01-061-7/+20
| | | | | | | | matcher table. This is also needed to fix PR35837. llvm-svn: 321946
* [X86] Add load folding pattern to EVEX vcvttss2si/vcvtsd2si.Craig Topper2018-01-061-6/+7
| | | | llvm-svn: 321945
* [X86] Remove an unnecessary VCVTTSD2SIrrb/VCVTSS2SIrrb instruction with no ↵Craig Topper2018-01-061-27/+23
| | | | | | | | isel pattern that only existed for the assembler. Use VCVTTSD2SIrrb_Int instead. For consistency use the _Int version of VCVTTSD2SIrr_Int and VCVTTSD2SIrm_Int for the assembler as well. llvm-svn: 321944
* [InlineFunction] Preserve calling convention when forwarding VarArgs.Florian Hahn2018-01-061-0/+1
| | | | | | | | | | Reviewers: efriedma, rnk, davide Reviewed By: rnk, davide Differential Revision: https://reviews.llvm.org/D41556 llvm-svn: 321943
* [InlineFunction] Preserve attributes when forwarding VarArgs.Florian Hahn2018-01-061-4/+23
| | | | | | | | | | Reviewers: rnk, efriedma Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D41555 llvm-svn: 321942
* [ORC] Remove AsynchronousSymbolQuery while I debug an issue on one of theLang Hames2018-01-062-85/+0
| | | | | | builders. llvm-svn: 321941
* [InlineFunction] Inline vararg functions that do not access varargs.Florian Hahn2018-01-062-19/+27
| | | | | | | | | | | | | If the varargs are not accessed by a function, we can inline the function. Reviewers: dblaikie, chandlerc, davide, efriedma, rnk, hfinkel Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D41335 llvm-svn: 321940
* [X86] Remove memory forms of EVEX encoded vcvtsd2si/vcvtss2si from the ↵Craig Topper2018-01-061-5/+17
| | | | | | | | | | assembler matcher table We should always prefer the VEX encoded version of these instructions. There is no advantage to the EVEX version. Fixes PR35837. llvm-svn: 321939
* [InstCombine] relax use constraint for min/max (~a, ~b) --> ~min/max(a, b)Sanjay Patel2018-01-061-2/+2
| | | | | | | | | | | In the minimal case, this won't remove instructions, but it still improves uses of existing values. In the motivating example from PR35834, it does remove instructions, and sets that case up to be optimized by something like D41603: https://reviews.llvm.org/D41603 llvm-svn: 321936
* [x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325)Sanjay Patel2018-01-062-6/+7
| | | | | | | | | | | | | | | | This is the last step needed to fix PR33325: https://bugs.llvm.org/show_bug.cgi?id=33325 We're trading branch and compares for loads and logic ops. This makes the code smaller and hopefully faster in most cases. The 24-byte test shows an interesting construct: we load the trailing scalar elements into vector registers and generate the same pcmpeq+movmsk code that we expected for a pair of full vector elements (see the 32- and 64-byte tests). Differential Revision: https://reviews.llvm.org/D41714 llvm-svn: 321934
* [X86] Rename the EVEX encoded GFNI instructions to start with a 'V'. NFCCraig Topper2018-01-061-8/+8
| | | | | | This makes the names consistent with the mnemonics like every other instruction. llvm-svn: 321931
* [X86] When parsing rounding mode operands, provide a proper end location so ↵Craig Topper2018-01-061-5/+6
| | | | | | we don't crash when trying to print an error message using it. llvm-svn: 321930
* [X86] Call lowerShuffleAsRepeatedMaskAndLanePermute from ↵Craig Topper2018-01-061-0/+6
| | | | | | lowerV4I64VectorShuffle. llvm-svn: 321929
* [ORC] Yet more debugging output to diagnose test failures.Lang Hames2018-01-061-1/+4
| | | | llvm-svn: 321927
* [ORC] Temporarily adding some redundant asserts / debug output to aid inLang Hames2018-01-061-0/+14
| | | | | | debugging a tester failure. llvm-svn: 321920
* [Utils] Simplify salvageDebugInfo, NFCIVedant Kumar2018-01-051-34/+30
| | | | | | | | | Having a single call to findDbgUsers() allows salvageDebugInfo() to return earlier. Differential Revision: https://reviews.llvm.org/D41787 llvm-svn: 321915
* [X86] Add vcvtsd2sil/vcvtsd2siq etc. InstAliases to the EVEX-encoded ↵Craig Topper2018-01-051-10/+18
| | | | | | | | instructions. This matches their VEX equivalents. llvm-svn: 321912
* Re-land "Fix faulty assertion in debug info"Adrian McCarthy2018-01-052-2/+4
| | | | | | | | | | | | This had been reverted because the new test failed on non-X86 bots. I moved the new test to the appropriate subdirectory to correct this. Differential Revision: https://reviews.llvm.org/D41264 Original submission: r321122 (which was reverted by r321125) This reverts commit 3c1639b5703c387a0d8cba2862803b4e68dff436. llvm-svn: 321911
* [ORC] Re-apply just the AsynchronousSymbolLookup class from r321838 while ILang Hames2018-01-053-2/+89
| | | | | | investigate builder / test failures. llvm-svn: 321910
* [Hexagon] Even simpler patterns for sign- and zero-extending HVX vectorsKrzysztof Parzyszek2018-01-051-16/+4
| | | | | | Recommit r321897 with updated testcases. llvm-svn: 321908
* [DebugInfo] Align comments in debug_loc sectionBjorn Pettersson2018-01-051-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This commit updates the BufferByteStreamer, used by DebugLocStream to buffer bytes/comments to put in the debug_loc section, to make sure that the Buffer and Comments vectors are synced. Previously, when an SLEB128 or ULEB128 was emitted together with a comment, the vectors could be out-of-sync if the LEB encoding added several entries to the Buffer vectors, while we only added a single entry to the Comments vector. The goal with this is to get the comments in the debug_loc section in the .s file correctly aligned. Example (using ARM as target): Instead of .byte 144 @ sub-register DW_OP_regx .byte 128 @ 256 .byte 2 @ DW_OP_piece .byte 147 @ 8 .byte 8 @ sub-register DW_OP_regx .byte 144 @ 257 .byte 129 @ DW_OP_piece .byte 2 @ 8 .byte 147 @ .byte 8 @ we now get .byte 144 @ sub-register DW_OP_regx .byte 128 @ 256 .byte 2 @ .byte 147 @ DW_OP_piece .byte 8 @ 8 .byte 144 @ sub-register DW_OP_regx .byte 129 @ 257 .byte 2 @ .byte 147 @ DW_OP_piece .byte 8 @ 8 Reviewers: JDevlieghere, rnk, aprantl Reviewed By: aprantl Subscribers: davide, Ka-Ka, uabelho, aemerson, javed.absar, kristof.beyls, llvm-commits, JDevlieghere Differential Revision: https://reviews.llvm.org/D41763 llvm-svn: 321907
* Revert r321894: it requires a part of another commit that is not ready yetKrzysztof Parzyszek2018-01-051-19/+0
| | | | | | | Commit message: [Hexagon] Add patterns for sext_inreg of HVX vector types llvm-svn: 321904
* [X86] Add InstAliases for 'vmovd' with GR64 registers to select EVEX encoded ↵Craig Topper2018-01-051-0/+6
| | | | | | | | | | instructions as well. Without this we allow "vmovd %rax, %xmm0", but not "vmovd %rax, %xmm16" This exists due to continue a silly bug where really old versions of the GNU assembler required movd instead of movq on these instructions. This compatibility hack then crept forward to avx version too, but we didn't propagate it to avx512. llvm-svn: 321903
* Revert r321897: affected testcases were not updatedKrzysztof Parzyszek2018-01-051-4/+16
| | | | | | | Commit message: [Hexagon] Even simpler patterns for sign- and zero-extending HVX vectors llvm-svn: 321902
* dwarfdump: Match the --uuid output with that of Darwin dwarfdump.Adrian Prantl2018-01-051-1/+2
| | | | | | | | This option is widely used by scripts and there is no reason to break them. rdar://problem/36032398 llvm-svn: 321901
* [X86] Stop printing moves between VR64 and GR64 with 'movd' mnemonic. Use ↵Craig Topper2018-01-052-7/+9
| | | | | | | | 'movq' instead. This behavior existed to work with an old version of the gnu assembler on MacOS that only accepted this form. Newer versions of GNU assembler and the current LLVM derived version of the assembler on MacOS support movq as well. llvm-svn: 321898
* [Hexagon] Even simpler patterns for sign- and zero-extending HVX vectorsKrzysztof Parzyszek2018-01-051-16/+4
| | | | llvm-svn: 321897
OpenPOWER on IntegriCloud