path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* [SystemZ] implement hasDivRemOp()Jonas Paulsson2017-11-062-0/+6
| | | | | | | | | SystemZ can do division and remainder in a single instruction for scalar integer types, which are now reflected by returning true in this hook for those cases. Review: Ulrich Weigand llvm-svn: 317477
* [AMDGPU] Fix assertion due to assuming pointer in default addr space is 32 bitYaxun Liu2017-11-061-5/+10
| | | | | | | | | | | | The backend assumes pointer in default addr space is 32 bit, which is not true for the new addr space mapping and causes assertion for unresolved functions. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39643 llvm-svn: 317476
* [mips] Add movep for microMIPS32R6 and fix microMIPS32r3 versionSimon Dardis2017-11-068-5/+51
| Previously, the 'movep' instruction was defined for microMIPS32r3 and shared that definition with microMIPS32R6. 'movep' was re-encoded for microMIPS32r6, so this patch provides the correct encoding.
| Secondly, correct the encoding of the 'rs' and 'rt' operands, which have an instruction-specific encoding for the registers those operands accept.
| Finally, correct the decoding of the 'dst_regs' operand, which was meant to extract the relevant field from the instruction but was actually extracting it from the already extracted field.
| Reviewers: atanasyan
| Differential Revision: https://reviews.llvm.org/D39495
| llvm-svn: 317475
* [LV][X86] update the cost of interleaving mem. access of floatsMohammed Agabaria2017-11-061-1/+4
| Recommit: This patch updates the costs of interleaved loads of v8f32 of stride 3 and 8. The location of the lit test has been fixed, so it works with make check-all.
| Differential Revision: https://reviews.llvm.org/D39403
| llvm-svn: 317471
* [mips] Fix PR35140Simon Dardis2017-11-061-4/+4
| | | | | | | | | | | | | | Mark all symbols involved with TLS relocations as being TLS symbols. This resolves PR35140. Thanks to Alex Crichton for reporting the issue! Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39591 llvm-svn: 317470
* [X86][AVX512] Improve lowering of AVX512 test intrinsicsUriel Korach2017-11-062-4/+20
| Added TESTM and TESTNM to the list of instructions that already zero the unused upper bits and do not need the redundant shift-left and shift-right instructions afterwards.
| Added a pattern for TESTM and TESTNM in iselLowering, so now icmp(neq,and(X,Y), 0) folds into TESTM and icmp(eq,and(X,Y), 0) folds into TESTNM.
| This commit is a preparation for lowering the test and testn X86 intrinsics to IR.
| Differential Revision: https://reviews.llvm.org/D38732
| llvm-svn: 317465
* [X86] Replace duplicate function call with variable. NFCUriel Korach2017-11-061-2/+2
| Change from:
|   if (N->getOperand(0).getValueType() == MVT::v8i32 ||
|       N->getOperand(0).getValueType() == MVT::v8f32)
| to:
|   EVT OpVT = N->getOperand(0).getValueType();
|   if (OpVT == MVT::v8i32 || OpVT == MVT::v8f32)
| Change-Id: I5a105f8710b73a828e6cfcd55fac2eae6153ce25
| llvm-svn: 317464
* X86 ISel: Basic support for variable-index vector permutationsZvi Rackover2017-11-061-0/+108
| Summary: Try to lower a BUILD_VECTOR composed of extract-extract chains that can be reasoned to be a permutation of a vector by indices in a non-constant vector.
| We saw this pattern created by ISPC, which resorts to creating it due to the requirement that shufflevector's mask operand be a *constant* vector.
| I didn't check this, but we could possibly use this pattern for lowering the X86 permute C intrinsics instead of llvm.x86 intrinsics.
| This change can be followed by more improvements:
| 1. Handle vectors with undef elements.
| 2. Utilize pshufb and zero-mask-blending to support more efficient construction of vectors with constant-0 elements.
| 3. Use smaller-element vectors of the same width, and "interpolate" the indices, when no native operation is available.
| Reviewers: RKSimon, craig.topper
| Reviewed By: RKSimon
| Subscribers: chandlerc, DavidKreitzer
| Differential Revision: https://reviews.llvm.org/D39126
| llvm-svn: 317463
* Revert "adding a pattern for broadcastm"Jina Nahias2017-11-061-2/+2
| | | | | | | This reverts commit r317457. Change-Id: If07f1fca1e3453d16c1dac906e87768661384e91 llvm-svn: 317462
* [ObjectYAML] Map relocation types for COFF ARMNT and ARM64Martin Storsjo2017-11-061-0/+48
| | | | | | Differential Revision: https://reviews.llvm.org/D39668 llvm-svn: 317459
* [x86][AVX512] Lowering Broadcastm intrinsics to LLVM IRJina Nahias2017-11-063-18/+27
| | | | | | | | | This patch, together with a matching clang patch (https://reviews.llvm.org/D38683), implements the lowering of X86 broadcastm intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38684 Change-Id: I709ac0b34641095397e994c8ff7e15d1315b3540 llvm-svn: 317458
* adding a pattern for broadcastmJina Nahias2017-11-061-2/+2
| | | | | Change-Id: I6551fb13879e098aed74de410e29815cf37d9ab5 llvm-svn: 317457
* [X86] Use EVEX encoded intrinsics for legacy FMA intrinsics when possible.Craig Topper2017-11-062-39/+49
| | | | llvm-svn: 317454
* [X86] Add scalar FMA ISD nodes without rounding mode. NFCCraig Topper2017-11-065-37/+92
| | | | | | Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers. llvm-svn: 317453
* [X86] Use EVEX encoded instructions for legacy scalar sqrt intrinsics.Craig Topper2017-11-062-10/+19
| | | | | | Fixes PR35161. llvm-svn: 317445
* [PassManager, SimplifyCFG] Revert r316908 and r316869.David L. Jones2017-11-061-7/+2
| | | | | | These cause Clang to crash with a segfault. See PR35210 for details. llvm-svn: 317444
* [X86] Add missing predicate to a pattern. NFCCraig Topper2017-11-051-0/+2
| | | | | | Other patterns had higher priority so this wasn't noticed. But we shouldn't be dependent on pattern order. llvm-svn: 317442
* [X86] Remove some more RCP and RSQRT patterns from InstrAVX512.td that I ↵Craig Topper2017-11-052-25/+12
| | | | | | missed in r317413. llvm-svn: 317441
* [X86] Fix outdated comment. NFCCraig Topper2017-11-051-1/+1
| | | | llvm-svn: 317440
* [LV/LAA] Avoid specializing a loop for stride=1 when this predicate implies aDorit Nuzman2017-11-051-1/+44
| | | | | | | | | | | | single-iteration loop This fixes PR34681. Avoid adding the "Stride == 1" predicate when we know that Stride >= Trip-Count. Such a predicate will effectively optimize a single or zero iteration loop, as Trip-Count <= Stride == 1. Differential Revision: https://reviews.llvm.org/D38785 llvm-svn: 317438
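The reasoning in this message can be sketched in a few lines of C++. This is only an illustrative model, not the code from the patch: `StrideFacts`, `worthAddingUnitStridePredicate` and `maxTripCountUnderUnitStride` are hypothetical names, and the single boolean stands in for whatever the analysis can actually prove about the symbolic stride.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of what is statically known about a symbolic stride.
struct StrideFacts {
  bool StrideAtLeastTripCount; // known: Stride >= TripCount
};

// If Stride >= TripCount is known, then assuming Stride == 1 forces
// TripCount <= 1, so the specialized loop would run at most once.
// Specializing for that case is not worthwhile.
bool worthAddingUnitStridePredicate(const StrideFacts &Facts) {
  return !Facts.StrideAtLeastTripCount;
}

// Upper bound on the iterations of the loop version guarded by Stride == 1.
uint64_t maxTripCountUnderUnitStride(const StrideFacts &Facts,
                                     uint64_t UnconstrainedMax) {
  return Facts.StrideAtLeastTripCount ? 1 : UnconstrainedMax;
}
```

With `StrideAtLeastTripCount` set, the bound collapses to a single iteration, which is the case the commit avoids specializing for.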
* [REVERT][LV][X86] update the cost of interleaving mem. access of floatsMohammed Agabaria2017-11-051-4/+1
| Reverted; the changes will be committed again after the failure is fixed.
| This patch contains an update of the costs of interleaved loads of v8f32 of stride 3 and 8.
| Differential Revision: https://reviews.llvm.org/D39403
| llvm-svn: 317433
* [LV][X86] update the cost of interleaving mem. access of floatsMohammed Agabaria2017-11-051-1/+4
| | | | | | | | This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317432
* [CGP] Fix the bug found by asan.Serguei Katkov2017-11-051-2/+2
| | | | | | Try to fix the asan failure introduced by r317429. llvm-svn: 317431
* [CGP] Extends the scope of optimizeMemoryInst optimizationSerguei Katkov2017-11-051-5/+438
| This is an implementation of PR26223.
| Currently the optimizeMemoryInst optimization tries to fold the address computation if all possible ways to compute the address are of the form
|   baseGV + base + scale * Index + offset
| where scale and offset are constants and baseGV, base and Index are exactly the same instructions if defined.
| The patch extends this optimization to allow different bases. In this case it tries to find/build a Phi node merging all possible bases and uses this Phi node as the base for the sunk address computation. It also supports Select instructions on the way.
| The main motivation for this scope extension is GCRelocateInst. If there is a relocation of a derived pointer, it will be represented as a relocation of base + offset. There will also be a Phi node merging the address computation for the relocated derived pointer and the derived pointer itself. If we have a Phi node merging the original base and the relocated base and can fold the address computation of the derived pointer, then we can potentially reduce the code size and the Phi node for the derived pointer. The latter can have a positive impact on the register allocator.
| Reviewers: efriedma, dberlin, mkazantsev, reames, john.brawn
| Reviewed By: john.brawn
| Subscribers: javed.absar, john.brawn, dneilson, llvm-commits
| Differential Revision: https://reviews.llvm.org/D36073
| llvm-svn: 317429
* [X86] Don't use RCP14 and RSQRT14 for reciprocal estimations or for legacy ↵Craig Topper2017-11-046-29/+35
| SSE rcp/rsqrt intrinsics when AVX512 features are enabled.
| Summary: AVX512 added RCP14 and RSQRT14 instructions, which improve accuracy over the legacy RCP and RSQRT instructions, but not by enough to remove the need for a Newton-Raphson refinement.
| Currently we use these new instructions for the legacy packed SSE intrinsics, but not the scalar intrinsics. And we use them for fast-math optimization of division and reciprocal sqrt.
| I think switching the legacy intrinsics may be surprising to the user, since it changes the answer based on which processor you're using regardless of any fastmath settings. It's also weird that we did something different between scalar and packed.
| As far as the reciprocal estimation goes, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc and icc and godbolt suggests they don't change which instructions they use here.
| This patch adds new X86ISD nodes for RCP14/RSQRT14 and uses those for the new intrinsics, leaving the old intrinsics to use the old instructions.
| Going forward I think our focus should be on:
| - Supporting 512-bit vectors, which will have to use RCP14/RSQRT14.
| - Using RSQRT28/RCP28 to remove the Newton-Raphson step on processors with AVX512ER.
| - Supporting double precision.
| Reviewers: zvi, DavidKreitzer, RKSimon
| Reviewed By: RKSimon
| Subscribers: llvm-commits
| Differential Revision: https://reviews.llvm.org/D39583
| llvm-svn: 317413
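To see why a 14-bit estimate still needs a refinement step, here is a minimal C++ sketch of one Newton-Raphson iteration for a reciprocal. It assumes the estimate has a relative error of about 2^-14; `roughReciprocal` only simulates such an estimate in software and is not the hardware instruction, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for f(x) = 1/x - A:  x1 = x0 * (2 - A * x0).
// If x0 = (1 + e) / A, then x1 = (1 - e * e) / A, so the relative error squares.
double refineReciprocal(double A, double Estimate) {
  return Estimate * (2.0 - A * Estimate);
}

// Relative error of Approx as an approximation of 1 / A.
double relativeError(double A, double Approx) {
  return std::fabs(A * Approx - 1.0);
}

// Simulated estimate with a relative error of 2^-14 (an assumption made
// for illustration only).
double roughReciprocal(double A) {
  return (1.0 / A) * (1.0 + std::ldexp(1.0, -14));
}
```

One step takes the error from roughly 2^-14 to roughly 2^-28. That is enough for single precision, which has a 24-bit significand, whereas the unrefined estimate is not.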
* [X86] Teach EVEX->VEX pass to turn SHUFI32X4/SHUFF32X4/SHUFI64X/SHUFF64X2 ↵Craig Topper2017-11-041-1/+19
| | | | | | | | into VPERM2F128/VPERM2I128. This recovers some of the tests that were changed by r317403. llvm-svn: 317410
* [AMDGPU] Remove hardcoded address space value from AMDGPULibFuncYaxun Liu2017-11-043-24/+29
| | | | | | | | | | | | AMDGPULibFunc hardcodes address space values of the old address space mapping, which causes invalid addrspacecast instructions and undefined functions in APPSDK sample MonteCarloAsianDP. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39616 llvm-svn: 317409
* [LTO][ThinLTO] Use the linker resolutions to mark global values as dso_local.Sean Fertile2017-11-045-10/+43
| Now that we have a way to mark GlobalValues as local, we can use the symbol resolutions that the linker plugin provides as part of the lto/thinlto link step to refine the compiler's view of which symbols will end up being local.
| Originally committed as r317374, but reverted in r317395 to update some missed tests.
| Differential Revision: https://reviews.llvm.org/D35702
| llvm-svn: 317408
* [X86] Teach shuffle lowering to use 256-bit SHUF128 when possible.Craig Topper2017-11-041-0/+10
| | | | | | | | This allows masked operations to be used and allows the register allocator to use YMM16-31 if necessary. As a follow up I'll look into teaching EVEX->VEX how to turn this back into PERM2X128 if any of the additional features don't work out. llvm-svn: 317403
* Revert "[LTO][ThinLTO] Use the linker resolutions to mark global values ..."Sean Fertile2017-11-045-43/+10
| Changes more tests than expected on one of the build bots. Reverting to investigate.
| This reverts https://llvm.org/svn/llvm-project/llvm/trunk@317374
| llvm-svn: 317395
* [CallSiteSplitting] clang-format my last commit. NFCI.Davide Italiano2017-11-041-3/+2
| | | | | | Thanks to Rui for pointing out. llvm-svn: 317393
* [CallSiteSplitting] Silence GCC's -Wparentheses. NFCI.Davide Italiano2017-11-031-2/+2
| | | | llvm-svn: 317385
* [X86] Give unary PERMI priority over SHUF128 in lowerV8I64VectorShuffle to ↵Craig Topper2017-11-031-4/+4
| | | | | | make it possible to fold a load. llvm-svn: 317382
* Move TargetFrameLowering.h to CodeGen where it's implementedDavid Blaikie2017-11-0371-71/+71
| | | | | | | | | | | This header already includes a CodeGen header and is implemented in lib/CodeGen, so move the header there to match. This fixes a link error with modular codegeneration builds - where a header and its implementation are circularly dependent and so need to be in the same library, not split between two like this. llvm-svn: 317379
* Invoke salvageDebugInfo from CodeGenPrepare's SinkCast()Adrian Prantl2017-11-032-1/+2
| | | | | | | | | | This preserves the debug info for the cast operation in the original location. rdar://problem/33460652 Reapplied r317340 with the test moved into an ARM-specific directory. llvm-svn: 317375
* [LTO][ThinLTO] Use the linker resolutions to mark global values as dso_local.Sean Fertile2017-11-035-10/+43
| Now that we have a way to mark GlobalValues as local, we can use the symbol resolutions that the linker plugin provides as part of the lto/thinlto link step to refine the compiler's view of which symbols will end up being local.
| Differential Revision: https://reviews.llvm.org/D35702
| llvm-svn: 317374
* Revert r317046, "Object: Move some code from ELF.h into ELF.cpp."Peter Collingbourne2017-11-031-263/+0
| | | | | | | This change resulted in a measured 1.5-2% perf regression linking chrome. llvm-svn: 317371
* [SimplifyCFG] When merging conditional stores, don't count the store we're ↵Craig Topper2017-11-031-1/+3
| merging against the PHINodeFoldingThreshold
| Merging conditional stores checks whether the code is if-convertible after the store is moved. But the store hasn't been moved yet, so it's being counted against the threshold.
| The patch adds 1 to the threshold comparison to make sure we don't count the store. I've adjusted a test to use a lower threshold to ensure we still do that conversion with the lower threshold.
| Differential Revision: https://reviews.llvm.org/D39570
| llvm-svn: 317368
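The off-by-one described here can be modelled as follows. This is a hedged sketch rather than the SimplifyCFG code itself: both function names are hypothetical, and the instruction count is assumed to include the store that is about to be moved out of the block.

```cpp
#include <cassert>

// Before the fix (as described above): the store that will be merged away
// is still in the block, so it is counted against the threshold.
bool fitsThresholdCountingStore(unsigned InstCountIncludingStore,
                                unsigned Threshold) {
  return InstCountIncludingStore <= Threshold;
}

// After the fix: compare against Threshold + 1 so the pending store is free.
bool fitsThresholdIgnoringStore(unsigned InstCountIncludingStore,
                                unsigned Threshold) {
  return InstCountIncludingStore <= Threshold + 1;
}
```

A block with two ordinary instructions plus the store passes a threshold of 2 only under the second comparison.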
* GCOV: Move GCOV from IR & Support into ProfileData to fix layeringDavid Blaikie2017-11-033-2/+2
| | | | | | | | This class was split between libIR and libSupport, which breaks under modular code generation. Move it into the one library that uses it, ProfileData, to resolve this issue. llvm-svn: 317366
* Recommit r317351 : Add CallSiteSplitting passJun Bum Lim2017-11-036-1/+510
| This recommits r317351 after fixing a buildbot failure.
| Original commit message:
| Summary: This change adds a pass which tries to split a call-site to pass more constrained arguments if its argument is predicated in the control flow, so that we can expose better context to the later passes (e.g., inliner, jump threading, or IPA-CP based function cloning, etc.). As of now we support two cases:
| 1) If a call site is dominated by an OR condition and if any of its arguments are predicated on this OR condition, try to split the condition with more constrained arguments. For example, in the code below, we try to split the call site since we can predicate the argument (ptr) based on the OR condition.
| Split from:
|   if (!ptr || c)
|     callee(ptr);
| to:
|   if (!ptr)
|     callee(null ptr) // set the known constant value
|   else if (c)
|     callee(nonnull ptr) // set non-null attribute in the argument
| 2) We can also split a call-site based on constant incoming values of a PHI. For example, from:
|   BB0:
|     %c = icmp eq i32 %i1, %i2
|     br i1 %c, label %BB2, label %BB1
|   BB1:
|     br label %BB2
|   BB2:
|     %p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
|     call void @bar(i32 %p)
| to:
|   BB0:
|     %c = icmp eq i32 %i1, %i2
|     br i1 %c, label %BB2-split0, label %BB1
|   BB1:
|     br label %BB2-split1
|   BB2-split0:
|     call void @bar(i32 0)
|     br label %BB2
|   BB2-split1:
|     call void @bar(i32 1)
|     br label %BB2
|   BB2:
|     %p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]
| llvm-svn: 317362
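Case 1 can be illustrated at the source level with a small C++ sketch. The pass itself works on LLVM IR, so this only demonstrates that the split form is equivalent; `callee`, `callBeforeSplit` and `callAfterSplit` are hypothetical names, not part of the patch.

```cpp
#include <cassert>

// Hypothetical callee: behaves differently for null and non-null pointers.
int callee(const int *Ptr) { return Ptr ? *Ptr : -1; }

// Original shape: one call site dominated by an OR condition.
int callBeforeSplit(const int *Ptr, bool C) {
  if (!Ptr || C)
    return callee(Ptr);
  return 0;
}

// Split shape: each call site now sees a more constrained argument.
int callAfterSplit(const int *Ptr, bool C) {
  if (!Ptr)
    return callee(nullptr); // argument is the known constant null
  else if (C)
    return callee(Ptr);     // argument is known to be non-null here
  return 0;
}
```

After the split, a later pass could inline or specialize each call with the extra knowledge about the argument.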
* Modularize: Include some required headersDavid Blaikie2017-11-032-1/+3
| | | | | | | | DenseMaps require the definition of a type to be available when using a pointer to that type as a key to know how many bits are available for tombstone/etc. llvm-svn: 317360
* Add llvm::for_each as a range-based extension to <algorithm> and make use ↵Aaron Ballman2017-11-033-28/+25
| | | | | | of it in some cases where it is a more clear alternative to std::for_each. llvm-svn: 317356
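A range-based wrapper of this kind can be sketched in standard C++ as below. `forEachInRange` is a hypothetical stand-in written for illustration; the exact signature of `llvm::for_each` is not shown in this log, so treat this as an assumption about its general shape.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical range-based wrapper around std::for_each: takes the whole
// range instead of a begin/end iterator pair.
template <typename Range, typename UnaryFunction>
UnaryFunction forEachInRange(Range &&R, UnaryFunction F) {
  return std::for_each(std::begin(R), std::end(R), std::move(F));
}

// Example use: sum the elements of a vector.
int sumOf(const std::vector<int> &Values) {
  int Total = 0;
  forEachInRange(Values, [&Total](int V) { Total += V; });
  return Total;
}
```

The call site names the container once, which is where the readability gain over the iterator-pair form comes from.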
* [X86] Promote athlon, athlon-xp, k8, and k8-sse3 to types instead of ↵Craig Topper2017-11-031-24/+16
| subtypes in getHostCPUName. NFCI
| This removes the athlon type and simplifies the string decoding. We only really need these type/subtype breaks where we need to match libgcc/compiler-rt, and these CPUs aren't part of that.
| I'm looking into moving some of this information to a .def file to share with clang's __builtin_cpu_is handling. And while these CPUs aren't part of that, the fewer lines I have to deal with in the .def file the better.
| llvm-svn: 317354
* Revert "Add CallSiteSplitting pass"Jun Bum Lim2017-11-036-509/+1
| | | | | | | | Revert due to Buildbot failure. This reverts commit r317351. llvm-svn: 317353
* Reland "Add support for writing 64-bit symbol tables for archives when ↵Jake Ehrlich2017-11-031-9/+55
| offsets become too large for 32-bit"
| Tests were failing because some bots were running out of address space and memory. Additionally, the test was very slow. These issues were solved by changing the test to take advantage of sparse files and restricting the test to run only on 64-bit systems.
| This should fix https://bugs.llvm.org//show_bug.cgi?id=34189
| This change makes it so that when writing a K_GNU style archive, if an offset that does not fit in 32 bits needs to be output, the archive is written in K_GNU64 style instead.
| Differential Revision: https://reviews.llvm.org/D36812
| llvm-svn: 317352
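The format switch described above amounts to a small decision, sketched here in C++. `ArchiveKind` and `pickSymbolTableKind` are hypothetical names chosen for illustration, and the sketch assumes the only trigger is a member offset that does not fit in an unsigned 32-bit field.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical model of the two GNU archive symbol table flavours.
enum class ArchiveKind { GNU, GNU64 };

// Fall back to the 64-bit symbol table only when a GNU-style archive
// contains an offset that cannot be stored in 32 bits.
ArchiveKind pickSymbolTableKind(ArchiveKind Requested,
                                uint64_t MaxMemberOffset) {
  const uint64_t Max32 = std::numeric_limits<uint32_t>::max();
  if (Requested == ArchiveKind::GNU && MaxMemberOffset > Max32)
    return ArchiveKind::GNU64;
  return Requested;
}
```

Small archives keep the 32-bit table, so existing consumers are unaffected unless the archive actually grows past the 4 GiB offset range.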
* Add CallSiteSplitting passJun Bum Lim2017-11-036-1/+509
| Summary: This change adds a pass which tries to split a call-site to pass more constrained arguments if its argument is predicated in the control flow, so that we can expose better context to the later passes (e.g., inliner, jump threading, or IPA-CP based function cloning, etc.). As of now we support two cases:
| 1) If a call site is dominated by an OR condition and if any of its arguments are predicated on this OR condition, try to split the condition with more constrained arguments. For example, in the code below, we try to split the call site since we can predicate the argument (ptr) based on the OR condition.
| Split from:
|   if (!ptr || c)
|     callee(ptr);
| to:
|   if (!ptr)
|     callee(null ptr) // set the known constant value
|   else if (c)
|     callee(nonnull ptr) // set non-null attribute in the argument
| 2) We can also split a call-site based on constant incoming values of a PHI. For example, from:
|   BB0:
|     %c = icmp eq i32 %i1, %i2
|     br i1 %c, label %BB2, label %BB1
|   BB1:
|     br label %BB2
|   BB2:
|     %p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
|     call void @bar(i32 %p)
| to:
|   BB0:
|     %c = icmp eq i32 %i1, %i2
|     br i1 %c, label %BB2-split0, label %BB1
|   BB1:
|     br label %BB2-split1
|   BB2-split0:
|     call void @bar(i32 0)
|     br label %BB2
|   BB2-split1:
|     call void @bar(i32 1)
|     br label %BB2
|   BB2:
|     %p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]
| Reviewers: davidxl, huntergr, chandlerc, mcrosier, eraman, davide
| Reviewed By: davidxl
| Subscribers: sdesmalen, ashutosh.nema, fhahn, mssimpso, aemerson, mgorny, mehdi_amini, kristof.beyls, llvm-commits
| Differential Revision: https://reviews.llvm.org/D39137
| llvm-svn: 317351
* [AArch64] Fix the number of iterations for the Newton seriesEvandro Menezes2017-11-031-1/+1
| | | | | | | | | The number of iterations was incorrectly determined for DP FP vector types and the tests were insufficient to flag this issue. Differential revision: https://reviews.llvm.org/D39507 llvm-svn: 317349
* The patch fixes PR35131Evgeny Stupachenko2017-11-031-3/+3
| | | | | | | | | | | | | Summary: Fix a misprint which led to false CTLZ recognition. Reviewers: craig.topper Differential Revision: https://reviews.llvm.org/D39585 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 317348
* Revert "Invoke salvageDebugInfo from CodeGenPrepare's SinkCast()"Adrian Prantl2017-11-032-2/+1
| | | | | | This reverts commit 317342 while investigating bot breakage. llvm-svn: 317345
* [CodeGen] Remove unnecessary semicolons to fix a warning. NFCCraig Topper2017-11-031-2/+2
| | | | llvm-svn: 317342