path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* [NFC][PatternMatch] Refactor code into a proper "matcher for any integral constant"
  Roman Lebedev | 2019-07-22 | files: 1 | lines: -18/+1
  Having it as a proper matcher is better for reusability elsewhere (in a follow-up patch).
  llvm-svn: 366752
* AMDGPU: Don't use SDNodeXForm for DS offset output
  Matt Arsenault | 2019-07-22 | files: 1 | lines: -12/+12
  The xform has no real value when it is applied to the output of a complex pattern. The complex pattern was already creating TargetConstants with i16, so this was just unnecessary machinery. This allows GlobalISel to import the simple cases once the complex pattern is implemented.
  llvm-svn: 366743
* Temporarily Revert "[Attributor] Liveness analysis." as it's breaking the build.
  Eric Christopher | 2019-07-22 | files: 2 | lines: -175/+1
  This reverts commit 9285295f75a231dc446fa7cbc10a0a391b3434a5.
  llvm-svn: 366737
* [Attributor] Liveness analysis.
  Stefan Stipanovic | 2019-07-22 | files: 2 | lines: -1/+175
  Adds a liveness analysis abstract attribute that indicates which BasicBlocks are dead and can therefore be ignored. For now, only noreturn calls are considered.
  Reviewers: jdoerfert, uenoku
  Subscribers: hiraditya, llvm-commits
  Differential revision: https://reviews.llvm.org/D64162
  llvm-svn: 366736
* [X86] When using AND+PACKUS in lowerV16I8Shuffle, generate the build vector directly in v16i8 with the correct 0x00 or 0xFF elements rather than using another VT and bitcasting it.
  Craig Topper | 2019-07-22 | files: 1 | lines: -5/+4
  The build_vector will become a constant pool load. Using the desired type from the start ensures we don't generate a bitcast of the constant pool load that would then need to be folded into the load.

  While experimenting with another patch, I noticed that SimplifyDemandedBits can't handle the case where the load type and the constant pool type don't match. While we should probably fix that, this was a simple way to fix the issue I saw.
  llvm-svn: 366732
* [NFC][PowerPC] Change ADDIStocHA to ADDIStocHA8 to follow 64-bit naming convention
  Jason Liu | 2019-07-22 | files: 7 | lines: -19/+19
  Summary: Since we plan to add ADDIStocHA for 32-bit in a later patch, we decided to rename the 64-bit one first, following the naming convention of appending 8 to the opcode.
  Patch by: Xiangling_L
  Differential Revision: https://reviews.llvm.org/D64814
  llvm-svn: 366731
* [Attributor] NoAlias on return values.
  Stefan Stipanovic | 2019-07-22 | files: 1 | lines: -4/+109
  Port the noalias function return value attribute to the Attributor. This will be followed by a patch for call site and function arguments.
  Reviewers: jdoerfert
  Subscribers: lebedev.ri, hiraditya, llvm-commits
  Differential Revision: https://reviews.llvm.org/D63067
  llvm-svn: 366728
* Stub out TLOF for AIX and add support for common vars in assembly output.
  Sean Fertile | 2019-07-22 | files: 7 | lines: -6/+112
  Stubs out a TargetLoweringObjectFileXCOFF class, implementing only SelectSectionForGlobal for common symbols. Also adds an override of EmitGlobalVariable in PPCAIXAsmPrinter, which adds a number of defensive errors and adds support for emitting common globals.
  llvm-svn: 366727
* [SafeStack] Insert the deref after the offset
  Petr Hosek | 2019-07-22 | files: 1 | lines: -2/+2
  While debugging code that uses SafeStack, we noticed that LLVM produces invalid DWARF. Concretely, in the following example:

    #include <cstdio>
    #include <string>

    int main(int argc, char* argv[]) {
      std::string value = "";
      printf("%s\n", value.c_str());
      return 0;
    }

  DWARF would describe the value variable as being located at:

    DW_OP_breg14 R14+0, DW_OP_deref, DW_OP_constu 0x20, DW_OP_minus

  The assembly to get this variable is:

    leaq -32(%r14), %rbx

  The order of operations in the DWARF expression is incorrect in this case. Specifically, the deref is incorrect; it appears to be incorrectly re-inserted in replaceOneDbgValueForAlloca. With this change, which inserts the deref after the offset instead of before it, LLVM produces correct DWARF:

    DW_OP_breg14 R14-32

  Differential Revision: https://reviews.llvm.org/D64971
  llvm-svn: 366726
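To make the ordering problem concrete, here is a minimal C++ sketch of a DWARF-style stack machine that supports only the four operations quoted above. `Op`, `Step` and `evalLocation` are hypothetical names, not LLVM or DWARF library APIs, and the register value and memory contents used below are invented for illustration. It shows that dereferencing R14 before subtracting 0x20 yields a value loaded from memory, whereas `DW_OP_breg14 R14-32` yields the same address that `leaq -32(%r14), %rbx` computes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, heavily simplified model of a DWARF location expression:
// only the four operations named in the commit message are supported.
enum class Op { BregR14, Deref, Constu, Minus };

struct Step {
  Op op;
  int64_t operand; // offset for BregR14, constant for Constu, unused otherwise
};

// Evaluates the expression and returns the final top-of-stack value, which
// for a memory location description is the variable's address.
uint64_t evalLocation(const std::vector<Step> &expr, uint64_t r14,
                      const std::map<uint64_t, uint64_t> &memory) {
  std::vector<uint64_t> stack;
  for (const Step &s : expr) {
    switch (s.op) {
    case Op::BregR14: // push R14 + signed offset
      stack.push_back(r14 + static_cast<uint64_t>(s.operand));
      break;
    case Op::Deref: { // pop an address, push the value stored there
      uint64_t addr = stack.back();
      stack.pop_back();
      stack.push_back(memory.at(addr));
      break;
    }
    case Op::Constu: // push an unsigned constant
      stack.push_back(static_cast<uint64_t>(s.operand));
      break;
    case Op::Minus: { // pop top and second, push second - top
      uint64_t top = stack.back();
      stack.pop_back();
      uint64_t second = stack.back();
      stack.pop_back();
      stack.push_back(second - top);
      break;
    }
    }
  }
  return stack.back();
}
```

With R14 = 0x1000, the fixed expression evaluates to 0xfe0 (R14 - 32), while the old expression depends on whatever happens to be stored at 0x1000.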
* WholeProgramDevirt: Teach the pass to respect the global's alignment.
  Peter Collingbourne | 2019-07-22 | files: 1 | lines: -4/+7
  The bytes inserted before an overaligned global need to be padded according to the alignment set on the original global in order for the initializer to meet the global's alignment requirements. The previous implementation that padded to the pointer width happened to be correct for vtables on most platforms but may do the wrong thing if the vtable has a larger alignment.

  This issue is visible with a prototype implementation of HWASAN for globals, which will overalign all globals including vtables to 16 bytes.

  There is also no padding requirement for the bytes inserted after the global because they are never read from nor are they significant for alignment purposes, so stop inserting padding there.
  Differential Revision: https://reviews.llvm.org/D65031
  llvm-svn: 366725
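A minimal C++ sketch of the padding rule described above, assuming the combined global starts at an address aligned to at least the original global's alignment (a power of two). `alignTo` here is a standalone stand-in for the LLVM helper of the same name, and `paddedBytesBefore` is a hypothetical name. Rounding the inserted prefix up to the global's own alignment keeps the original initializer aligned, whereas rounding to the pointer width (8 bytes on a 64-bit target) does not when the global is 16-byte aligned.

```cpp
#include <cassert>
#include <cstdint>

// Rounds Size up to the next multiple of Align (Align must be a power of two).
uint64_t alignTo(uint64_t Size, uint64_t Align) {
  return (Size + Align - 1) & ~(Align - 1);
}

// Hypothetical helper: number of bytes actually emitted in front of a global
// when BytesBefore bytes of data are inserted there. Padding to the global's
// own alignment keeps the original initializer correctly aligned.
uint64_t paddedBytesBefore(uint64_t BytesBefore, uint64_t GlobalAlign) {
  return alignTo(BytesBefore, GlobalAlign);
}
```

For example, 4 inserted bytes in front of a 16-byte-aligned vtable become 16 bytes of prefix, while padding to the pointer width would give 8, leaving the vtable misaligned.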
* [PowerPC] Fix comment on MO_PLT Target Operand Flag. [NFC]
  Sean Fertile | 2019-07-22 | files: 1 | lines: -2/+2
  Patch by Xiangling Liao.
  llvm-svn: 366724
* [Object][XCOFF] Remove extra includes from XCOFF related files. [NFC]
  Sean Fertile | 2019-07-22 | files: 1 | lines: -5/+0
  Differential Revision: https://reviews.llvm.org/D60885
  llvm-svn: 366723
* LowerTypeTests: Teach the pass to respect global alignments.
  Peter Collingbourne | 2019-07-22 | files: 1 | lines: -19/+26
  We were previously ignoring alignment entirely when combining globals together in this pass. There are two main things that we need to do here: add additional padding before each global to meet the alignment requirements, and set the combined global's alignment to the maximum of all of the original globals' alignments.

  Since we now need to calculate layout as we go anyway, use the calculated layout to produce GlobalLayout instead of using StructLayout.
  Differential Revision: https://reviews.llvm.org/D65033
  llvm-svn: 366722
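The two layout rules above (pad before each global to its alignment, and align the combined global to the maximum member alignment) can be sketched in C++ as follows. `GlobalInfo`, `CombinedLayout` and `layoutGlobals` are hypothetical names; the sketch assumes power-of-two alignments and deliberately ignores any other padding the real pass may insert between globals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct GlobalInfo {
  uint64_t Size;
  uint64_t Align; // must be a power of two
};

struct CombinedLayout {
  std::vector<uint64_t> Offsets; // offset of each global in the combined global
  uint64_t TotalSize;
  uint64_t Align; // maximum of the member alignments
};

CombinedLayout layoutGlobals(const std::vector<GlobalInfo> &Globals) {
  CombinedLayout L{{}, 0, 1};
  for (const GlobalInfo &G : Globals) {
    // Pad before each global so that it starts at a multiple of its alignment.
    uint64_t Offset = (L.TotalSize + G.Align - 1) & ~(G.Align - 1);
    L.Offsets.push_back(Offset);
    L.TotalSize = Offset + G.Size;
    // The combined global must be at least as aligned as every member.
    L.Align = std::max(L.Align, G.Align);
  }
  return L;
}
```

For globals of (size, align) = (1, 1), (8, 8) and (4, 16), this places them at offsets 0, 8 and 16, for a combined size of 20 bytes and a combined alignment of 16.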
* Changes to emit CodeView debug info nested type records properly using MCStreamer directives
  Nilanjana Basu | 2019-07-22 | files: 2 | lines: -2/+23
  llvm-svn: 366720
* [SLPVectorizer] Fix some MSVC/cppcheck uninitialized variable warnings. NFCI.
  Simon Pilgrim | 2019-07-22 | files: 1 | lines: -3/+3
  llvm-svn: 366712
* Revert "Reland [ELF] Loose a condition for relocation with a symbol"
  Vlad Tsyrklevich | 2019-07-22 | files: 1 | lines: -0/+5
  This reverts commit r366686 as it appears to be causing buildbot failures on sanitizer-x86_64-linux-android and sanitizer-x86_64-linux.
  llvm-svn: 366708
* TableGen: Support physical register inputs > 255
  Matt Arsenault | 2019-07-22 | files: 1 | lines: -1/+4
  This was truncating register values that didn't fit in an unsigned char.

  Also switch the AMDGPU sendmsg intrinsics to use a TableGen pattern.
  llvm-svn: 366695
* [ARM][LowOverheadLoops] Revert remaining pseudos
  Sam Parker | 2019-07-22 | files: 1 | lines: -12/+56
  ARMLowOverheadLoops would fail an assertion if it did not find all the pseudo instructions that comprise the hardware loop. Instead of doing this, iterate through all the instructions of the function and revert any remaining pseudo instructions that haven't been converted.
  Differential Revision: https://reviews.llvm.org/D65080
  llvm-svn: 366691
* Reland [ELF] Loose a condition for relocation with a symbol
  Nikola Prica | 2019-07-22 | files: 1 | lines: -5/+0
  This patch was not the cause of the buildbot failure.

  The deleted code was introduced as a workaround for a bug in the gold linker (http://sourceware.org/PR16794). The test case given as the reason for this code, the one at the link above, now works with gold.

  This condition is too strict: when code is compiled with debug info, it forces the generation of numerous relocations with a symbol on architectures that do not have relocation addends.
  Reviewers: arsenm, espindola
  Reviewed By: MaskRay
  Differential Revision: https://reviews.llvm.org/D64327
  llvm-svn: 366686
* AMDGPU/GlobalISel: Remove unnecessary code
  Matt Arsenault | 2019-07-22 | files: 1 | lines: -4/+0
  The minnum/maxnum cases are dead, and the cvt is handled by the default case.
  llvm-svn: 366685
* [ARM] Fix for MVE VPT block pass
  David Green | 2019-07-22 | files: 1 | lines: -3/+18
  We need to ensure that the number of T's is correct when adding multiple instructions into the same VPT block.
  Differential revision: https://reviews.llvm.org/D65049
  llvm-svn: 366684
* [X86] EltsFromConsecutiveLoads - support common source loads (REAPPLIED)
  Simon Pilgrim | 2019-07-22 | files: 1 | lines: -5/+62
  This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load.

  A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, loads with a byte offset are then matched against a previous load that has already been confirmed to be a consecutive match.

  Next step towards PR16739: after this we just need to account for shuffling/repeated elements to create a vector load + shuffle.

  Fixed the out-of-bounds load assert identified in rL366501.
  Differential Revision: https://reviews.llvm.org/D64551
  llvm-svn: 366681
* Added address-space mangling for stack related intrinsics
  Christudasan Devadasan | 2019-07-22 | files: 5 | lines: -10/+21
  Modified the following 3 intrinsics: int_addressofreturnaddress, int_frameaddress & int_sponentry.
  Reviewed By: arsenm
  Differential Revision: https://reviews.llvm.org/D64561
  llvm-svn: 366679
* [IPRA][ARM] Make use of the "returned" parameter attribute
  Oliver Stannard | 2019-07-22 | files: 4 | lines: -0/+21
  ARM has code to recognise uses of the "returned" function parameter attribute, which guarantees that the value passed to the function in r0 will be returned in r0 unmodified. IPRA replaces the regmask on call instructions, so it needs to be told about this to avoid reverting the optimisation.
  Differential revision: https://reviews.llvm.org/D64986
  llvm-svn: 366669
* [AMDGPU] Save some work when an atomic op has no uses
  Jay Foad | 2019-07-22 | files: 1 | lines: -67/+70
  Summary: In the atomic optimizer, save doing a bunch of work and generating a bunch of dead IR in the fairly common case where the result of an atomic op (i.e. the value that was in memory before the atomic op was performed) is not used. NFC.
  Reviewers: arsenm, dstuttard, tpr
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64981
  llvm-svn: 366667
* [Loop Peeling] Fix the handling of branch weights of peeled off branches.
  Serguei Katkov | 2019-07-22 | files: 1 | lines: -62/+41
  The current algorithm for updating the branch weights of the latch block and its copies assumes that the number of peeled iterations is approximately equal to the trip count. However, this is not always true. According to the profitability check, we may decide to peel when doing so reduces the number of phi nodes. In this case the number of peeled iterations can be less than the estimated trip count.

  This patch introduces another way to set the branch weights of peeled-off branches. Let F be the weight of the edge from latch to header, and let E be the weight of the edge from latch to exit. F/(F+E) is the probability of staying in the loop and E/(F+E) is the probability of exiting. Then the estimated trip count is TC = F / E.

  For the I-th (counting from 0) peeled-off iteration, we set the weights of the peeled latch to (TC - I, 1). This gives a reasonable distribution: the exit probability, roughly 1/(TC - I), increases with each peeled iteration, while the estimated trip count of the remaining loop decreases by I. As a result, after peeling off N iterations the weights will be (F - N * E, E), and the trip count of the loop becomes F / E - N, i.e. TC - N.

  The idea is taken from the review of patch D63918 proposed by Philip.
  Reviewers: reames, mkuper, iajbar, fhahn
  Reviewed By: reames
  Subscribers: hiraditya, zzheng, llvm-commits
  Differential Revision: https://reviews.llvm.org/D64235
  llvm-svn: 366665
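The weighting scheme above can be sketched in C++ as follows. `peeledLatchWeights` and `remainingLoopWeights` are hypothetical helpers, not the functions in LoopPeel; the clamping to zero when more iterations are peeled than estimated is this sketch's own choice, not something stated in the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// F is the latch->header weight and E the latch->exit weight, so the
// estimated trip count is TC = F / E. The I-th peeled latch (counting
// from 0) gets the weights (TC - I, 1).
std::vector<std::pair<uint64_t, uint64_t>>
peeledLatchWeights(uint64_t F, uint64_t E, unsigned PeelCount) {
  uint64_t TC = F / E;
  std::vector<std::pair<uint64_t, uint64_t>> Weights;
  for (unsigned I = 0; I < PeelCount; ++I) {
    // Assumption of this sketch: clamp at 0 instead of wrapping around.
    uint64_t Back = TC > I ? TC - I : 0;
    Weights.push_back({Back, 1});
  }
  return Weights;
}

// Weights left on the remaining loop's latch after peeling N iterations:
// (F - N * E, E), i.e. an estimated trip count of TC - N.
std::pair<uint64_t, uint64_t> remainingLoopWeights(uint64_t F, uint64_t E,
                                                   unsigned N) {
  uint64_t Peeled = static_cast<uint64_t>(N) * E;
  return {F > Peeled ? F - Peeled : 0, E};
}
```

With F = 100 and E = 10 (TC = 10), peeling 3 iterations gives peeled latch weights (10, 1), (9, 1), (8, 1), and the remaining loop keeps (70, 10), an estimated trip count of 7.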
* [X86] SimplifyDemandedVectorEltsForTargetNode - Move SUBV_BROADCAST narrowing handling. NFCI.
  Simon Pilgrim | 2019-07-21 | files: 1 | lines: -19/+13
  Move the narrowing of SUBV_BROADCAST to where we handle all the other opcodes.
  llvm-svn: 366660
* [InstCombine] Update comment I missed in r366649. NFC
  Craig Topper | 2019-07-21 | files: 1 | lines: -1/+1
  llvm-svn: 366658
* [GISel]: Attach missing range metadata while translating G_LOADs
  Aditya Nandakumar | 2019-07-21 | files: 1 | lines: -2/+3
  https://reviews.llvm.org/D65048
  Attach range information to G_LOAD when it defines only one register.
  Reviewed by: arsenm
  llvm-svn: 366656
* [InstCombine] Remove insertRangeTest code that handles the equality case.
  Craig Topper | 2019-07-21 | files: 1 | lines: -4/+2
  For equality, the function called getTrue/getFalse with the VT of the comparison input, but getTrue/getFalse need the boolean VT. So if this code ever executed, it would assert.

  I believe these cases are removed by InstSimplify, so we don't get here. This patch therefore just adjusts an assert to exclude the equality case and removes the broken code.
  llvm-svn: 366649
* [InstCombine] Don't use AddOne/SubOne to see if two APInts are 1 apart. Use APInt operations instead. NFCI
  Craig Topper | 2019-07-21 | files: 1 | lines: -5/+9
  AddOne/SubOne create new Constant objects. That seems heavy for comparing ConstantInts, which wrap APInts. Just do the math on the APInts and compare them.
  llvm-svn: 366648
* [Codegen][SelectionDAG] X u% C == 0 fold: non-splat vector improvements
  Roman Lebedev | 2019-07-20 | files: 1 | lines: -35/+132
  Summary: Four things here:
  1. Generalize the fold to handle non-splat divisors. Reasonably trivial.
  2. Unban power-of-two divisors. I don't see any reason why they should be illegal.
     * There is no ban in Hacker's Delight.
     * I think the ban came from the same bug that caused the miscompile in the base patch: in `floor((2^W - 1) / D)` we were dividing by `D0` instead of `D`, and we **were** ensuring that `D0` is not `1`, which made sense.
  3. Unban `1` divisors. I no longer believe Hacker's Delight actually says that the fold is invalid for `D = 0`. Further considerations:
     * We know that
       * `(X u% 1) == 0` can be constant-folded to `1`,
       * `(X u% 1) != 0` can be constant-folded to `0`,
     * Also, we know that
       * `X u<= -1` can be constant-folded to `1`,
       * `X u> -1` can be constant-folded to `0`,
     * https://godbolt.org/z/7jnZJX https://rise4fun.com/Alive/oF6p
     * We know we will end up with the following: `(setule/setugt (rotr (mul N, P), K), Q)`
     * Therefore, for the given new DAG nodes and comparison predicates (`ule`/`ugt`), we will still produce the correct answer if `Q` is an all-ones constant and both `P` and `K` are *anything* other than `undef`.
     * The fold will indeed produce `Q = all-ones`.
  4. Try to re-splat the `P` and `K` vectors; we don't care about their values for the lanes where the divisor was `1`.
  Reviewers: RKSimon, hermord, craig.topper, spatel, xbolva00
  Reviewed By: RKSimon
  Subscribers: hiraditya, javed.absar, dexonsmith, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63963
  llvm-svn: 366637
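For a single 32-bit lane, the `(setule (rotr (mul N, P), K), Q)` form above can be sketched in C++ as below, writing the divisor as D = D0 * 2^K with D0 odd, P the multiplicative inverse of D0 modulo 2^32, and Q = floor((2^32 - 1) / D). This is a scalar illustration of the divisibility test, not the DAG code itself; `inverseMod2_32`, `rotr32` and `isDivisibleViaRotate` are hypothetical names. Note that for D = 1 it yields P = 1, K = 0 and an all-ones Q, so the comparison is always true, matching point 3.

```cpp
#include <cassert>
#include <cstdint>

// Multiplicative inverse of an odd D0 modulo 2^32 via Newton's iteration.
uint32_t inverseMod2_32(uint32_t D0) {
  uint32_t X = D0; // correct to 3 low bits because D0 * D0 == 1 (mod 8)
  for (int I = 0; I < 4; ++I)
    X *= 2 - D0 * X; // each step doubles the number of correct low bits
  return X;
}

// Rotate right; K == 0 is handled separately to avoid a shift by 32.
uint32_t rotr32(uint32_t V, unsigned K) {
  return K == 0 ? V : (V >> K) | (V << (32 - K));
}

// Returns true iff X % D == 0, using a multiply, a rotate and an unsigned
// compare instead of a division. D must be nonzero.
bool isDivisibleViaRotate(uint32_t X, uint32_t D) {
  unsigned K = 0;
  while (((D >> K) & 1) == 0)
    ++K; // D = D0 * 2^K with D0 odd
  uint32_t D0 = D >> K;
  uint32_t P = inverseMod2_32(D0);
  uint32_t Q = UINT32_MAX / D; // floor((2^32 - 1) / D)
  return rotr32(X * P, K) <= Q;
}
```

For D = 6 (D0 = 3, K = 1), P is 0xAAAAAAAB: 96 * P wraps to 32, which rotates to 16 and passes the compare, while 97 * P rotates to a value far above Q.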
* [X86][SSE] Use PSADBW to improve vXi8 sum reduction (PR42674)
  Simon Pilgrim | 2019-07-20 | files: 1 | lines: -7/+38
  As detailed on PR42674, we can reduce a vXi8 down until we have the final <8 x i8>, and then use PSADBW with zero to sum those values. We then extract the bottom i8, discarding any overflow from the upper bits of the i16 result.
  llvm-svn: 366636
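The final step above relies on the fact that the sum of absolute differences against zero is just the sum of the bytes. A scalar C++ model of one 64-bit PSADBW lane illustrates this; `psadbwLane` and `sumBytesMod256` are hypothetical names, and the sketch covers only the last <8 x i8> step, not the earlier vector reduction.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Scalar model of PSADBW on one 64-bit lane: sum of |A[i] - B[i]| over the
// 8 bytes, produced as a 16-bit value (at most 8 * 255 = 2040).
uint16_t psadbwLane(const std::array<uint8_t, 8> &A,
                    const std::array<uint8_t, 8> &B) {
  uint16_t Sum = 0;
  for (int I = 0; I < 8; ++I)
    Sum += static_cast<uint16_t>(std::abs(int(A[I]) - int(B[I])));
  return Sum;
}

// i8 sum reduction: PSADBW against zero, then keep only the bottom byte,
// discarding the overflow held in the upper bits of the 16-bit result.
uint8_t sumBytesMod256(const std::array<uint8_t, 8> &V) {
  const std::array<uint8_t, 8> Zero{};
  return static_cast<uint8_t>(psadbwLane(V, Zero));
}
```

For the bytes {200, 100, 50, 10, 1, 2, 3, 4}, the lane result is 370, and the extracted i8 is 370 mod 256 = 114, which is exactly what a wrapping i8 add reduction would produce.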
* [Local] Zap blockaddress without users in ConstantFoldTerminator.
  Florian Hahn | 2019-07-20 | files: 1 | lines: -0/+6
  If the blockaddress is not destroyed, the destination block will still be marked as having its address taken, limiting further transformations.

  I think there are other places where dead blockaddress constants are kept around; I'll look into that as a follow-up.
  Reviewers: craig.topper, brzycki, davide
  Reviewed By: brzycki, davide
  Differential Revision: https://reviews.llvm.org/D64936
  llvm-svn: 366633
* [GlobalISel][AArch64] Contract trivial same-size cross-bank copies into G_STOREs
  Jessica Paquette | 2019-07-20 | files: 1 | lines: -0/+49
  Sometimes, you can end up with cross-bank copies between same-sized GPRs and FPRs, which feed into G_STOREs. When these copies feed only into stores, they aren't necessary; we can just store using the original register bank.

  This provides some minor code size savings for some floating point SPEC benchmarks (around 0.2% for 453.povray and 450.soplex).

  This issue doesn't seem to show up due to regbankselect or anything similar. So, this patch introduces an early select function, `contractCrossBankCopyIntoStore`, which performs the contraction when possible. The selector then continues normally and selects the correct store opcode, eliminating needless copies along the way.
  Differential Revision: https://reviews.llvm.org/D65024
  llvm-svn: 366625
* [WebAssembly] Compute and export TLS block alignment
  Guanzhong Chen | 2019-07-19 | files: 2 | lines: -1/+11
  Summary: Add immutable WASM global `__tls_align`, which stores the alignment requirements of the TLS segment.

  Add `__builtin_wasm_tls_align()` intrinsic to get this alignment in Clang.

  The expected usage has now changed to:

    __wasm_init_tls(memalign(__builtin_wasm_tls_align(), __builtin_wasm_tls_size()));

  Reviewers: tlively, aheejin, sbc100, sunfish, alexcrichton
  Reviewed By: tlively
  Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D65028
  llvm-svn: 366624
* AMDGPU/GlobalISel: Legalize GEP for other 32-bit address spaces
  Matt Arsenault | 2019-07-19 | files: 1 | lines: -1/+3
  llvm-svn: 366621
* [AMDGPU] Autogenerate register sequences in tuples
  Stanislav Mekhanoshin | 2019-07-19 | files: 1 | lines: -272/+47
  Differential Revision: https://reviews.llvm.org/D65007
  llvm-svn: 366619
* [AMDGPU] Fixed occupancy calculation for gfx10
  Stanislav Mekhanoshin | 2019-07-19 | files: 4 | lines: -28/+19
  Differential Revision: https://reviews.llvm.org/D65010
  llvm-svn: 366616
* AMDGPU: Avoid custom predicates for stores with glue
  Matt Arsenault | 2019-07-19 | files: 1 | lines: -18/+24
  llvm-svn: 366613
* AMDGPU: Redefine setcc condition PatLeafs
  Matt Arsenault | 2019-07-19 | files: 3 | lines: -67/+36
  Avoid using custom code predicates.
  llvm-svn: 366609
* AMDGPU: Don't rely on m0 being -1 for GWS offsets
  Matt Arsenault | 2019-07-19 | files: 1 | lines: -4/+6
  This only works if the high bits of m0 are also 0, so m0 would have to be set to 0xffff.
  llvm-svn: 366608
* AMDGPU: Force s_waitcnt after GWS instructions
  Matt Arsenault | 2019-07-19 | files: 4 | lines: -5/+26
  This is apparently required to be the immediately following instruction, so force it into a bundle with a waitcnt.
  llvm-svn: 366607
* LiveIntervals: Fix handleMove asserting on BUNDLE
  Matt Arsenault | 2019-07-19 | files: 1 | lines: -1/+4
  The top-level BUNDLE instruction should behave as an ordinary instruction. It is supposed to have all relevant registers as implicit operands. Moving it should work as any other instruction.

  I believe the assert intended to avoid moving instructions inside bundles.
  llvm-svn: 366605
* Revert "Use the MachineBasicBlock symbol for a callbr target"
  Nick Desaulniers | 2019-07-19 | files: 1 | lines: -7/+2
  This reverts commit r366523/ccbffefccaff42b0d094c9ef0f49fc3e8c8456ea.

  Two regressions were immediately reported:
  - https://github.com/ClangBuiltLinux/linux/issues/614
  - https://github.com/ClangBuiltLinux/linux/issues/615

  Reported-by: nathanchance
  llvm-svn: 366600
* [AMDGPU] Allow register tuples to set asm names
  Stanislav Mekhanoshin | 2019-07-19 | files: 4 | lines: -139/+99
  This change reverts most of the previous register name generation. The real problem is that RegisterTuple does not generate asm names. Added an optional operand to RegisterTuple. This way we can simplify register name access and dramatically reduce the size of the backend's static tables.
  Differential Revision: https://reviews.llvm.org/D64967
  llvm-svn: 366598
* AMDGPU/GlobalISel: Fix MMO flags for kernel argument loads
  Matt Arsenault | 2019-07-19 | files: 1 | lines: -1/+1
  The DAG lowering sets dereferenceable and invariant, not nontemporal.
  llvm-svn: 366597
* AMDGPU/GlobalISel: Selection for fminnum/fmaxnum
  Matt Arsenault | 2019-07-19 | files: 1 | lines: -2/+4
  The v2f16 case doesn't work yet because the VOP3P complex patterns haven't been ported yet.
  llvm-svn: 366585
* AMDGPU/GlobalISel: Support arguments with multiple registers
  Matt Arsenault | 2019-07-19 | files: 2 | lines: -30/+47
  Handles structs used directly in argument lists.
  llvm-svn: 366584
* AMDGPU/GlobalISel: Rewrite lowerFormalArguments
  Matt Arsenault | 2019-07-19 | files: 4 | lines: -200/+374
  This should now handle everything except structs passed as multiple registers.

  I think most of the packing logic should be handled by handleAssignments, but I'm unclear on what the contract is for multiple registers. This is copying how x86 handles this.

  This does change the behavior of the test_sgpr_alignment0 amdgpu_vs test. I don't think shader arguments should try to follow the alignment, and registers need to be repacked. I also don't think it matters, since I think the pointers are packed to the beginning of the argument list anyway.
  llvm-svn: 366582
OpenPOWER on IntegriCloud