summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* Reapply "AMDGPU: Force inlining if LDS global address is used"Matt Arsenault2018-07-103-2/+87
| | | | | | This reverts commit r336623 llvm-svn: 336675
* [Hexagon] Add implicit uses even when untied explicit uses are presentKrzysztof Parzyszek2018-07-101-0/+24
| | | | | | | | | | | | | | | | | | An explicit untied use is not sufficient to maintain liveness of a register redefined in a predicated instruction. For example %1 = COPY %0 ... %1 = A2_paddif %2, %1, 1 could become $r1 = COPY $r0 ... $r1 = A2_paddif $p0, $r1, 1 and later $r1 = COPY $r0 ;; this is not really dead! ... $r1 = A2_paddif $p0, $r0, 1 llvm-svn: 336662
* [X86] Fast-isel tests for lowered truncation intrinsicsMikhail Dvoretckii2018-07-102-0/+115
| | | | | | | | | This patch adds fast-isel tests for the IR patterns produced for truncation intrinsics in rC336643. Differential Revision: https://reviews.llvm.org/D48822 llvm-svn: 336645
* [X86][SSE] Prefer BLEND(SHL(v,c1),SHL(v,c2)) over MUL(v, c3)Simon Pilgrim2018-07-104-40/+48
| | | | | | | | | | Now that rL336250 has landed, we should prefer 2 immediate shifts + a shuffle blend over performing a multiply. Despite the increase in instructions, this is quicker (especially for slow v4i32 multiplies), avoid loads and constant pool usage. It does mean however that we increase register pressure. The code size will go up a little but by less than what we save on the constant pool data. This patch also adds support for v16i16 to the BLEND(SHIFT(v,c1),SHIFT(v,c2)) combine, and also prevents blending on pre-SSE41 shifts if it would introduce extra blend masks/constant pool usage. Differential Revision: https://reviews.llvm.org/D48936 llvm-svn: 336642
* [X86] Regenerate vector-shuffle-512-v8.ll so the script will merge the 32 ↵Craig Topper2018-07-101-861/+387
| | | | | | and 64 bit checks together. NFC llvm-svn: 336641
* [X86] Correct vfixupimm load patterns to look for an integer load, not a ↵Craig Topper2018-07-101-4/+2
| | | | | | | | floating point load bitcasted to integer. DAG combine wouldn't let a floating point load bitcasted to integer exist. It would just be an integer load. llvm-svn: 336626
* [X86] Add test cases that show failure to fold load into vfixupimm ↵Craig Topper2018-07-101-1/+23
| | | | | | instructions due to bad isel pattern. llvm-svn: 336625
* Revert "AMDGPU: Force inlining if LDS global address is used"Vlad Tsyrklevich2018-07-103-87/+2
| | | | | | | This reverts commit r336587, it was causing test failures on the sanitizer bots. llvm-svn: 336623
* [WebAssembly] Support for binary atomic RMW instructionsHeejin Ahn2018-07-092-0/+1274
| | | | | | | | | | | | | | | | | | Summary: This adds support for binary atomic read-modify-write instructions: add, sub, and, or, xor, and xchg. This does not yet support translations of some of LLVM IR atomicrmw instructions (nand, max, min, umax, and umin) that do not have a direct counterpart in wasm instructions. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D49088 llvm-svn: 336615
* Fix line endings. NFCI.Simon Pilgrim2018-07-093-177/+177
| | | | llvm-svn: 336602
* [Power9] Add __float128 builtins for Rounding OperationsStefan Pintilie2018-07-091-0/+76
| | | | | | | | | | | | | | | Added __float128 support for a number of rounding operations: trunc rint nearbyint round floor ceil Differential Revision: https://reviews.llvm.org/D48415 llvm-svn: 336601
* [WebAssembly] Improve readability of load/stores and tests. NFC.Heejin Ahn2018-07-094-348/+659
| | | | | | | | | | | | | | | | | Summary: - Changed variable/function names to be more consistent - Improved comments in test files - Added more tests - Fixed a few typos - Misc. cosmetic changes Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D49087 llvm-svn: 336598
* [Power9] [LLVM] Add __float128 support for trunc to double round to oddStefan Pintilie2018-07-091-0/+10
| | | | | | | | | Add support for this builtin: double builtin_truncf128_round_to_odd(float128) Differential Revision: https://reviews.llvm.org/D48483 llvm-svn: 336595
* RenameIndependentSubregs: Fix handling of undef tied operandsMark Searles2018-07-091-0/+18
| | | | | | | | | Ensure that, if updating a tied operand pair, to only update that pair. Differential Revision: https://reviews.llvm.org/D49052 llvm-svn: 336593
* [globalisel][irtranslator] Add support for atomicrmw and (strong) cmpxchgDaniel Sanders2018-07-091-0/+265
| | | | | | | | | | | | | | | | | | | | Summary: This patch adds support for the atomicrmw instructions and the strong cmpxchg instruction to the IRTranslator. I've left out weak cmpxchg because LangRef.rst isn't entirely clear on what difference it makes to the backend. As far as I can tell from the code, it only matters to AtomicExpandPass which is run at the LLVM-IR level. Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, javed.absar Reviewed By: qcolombet Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D40092 llvm-svn: 336589
* AMDGPU: Force inlining if LDS global address is usedMatt Arsenault2018-07-093-2/+87
| | | | | | | | | | These won't work for the forseeable future. These aren't allowed from OpenCL, but IPO optimizations can make them appear. Also directly set the attributes on functions, regardless of the linkage rather than cloning functions like before. llvm-svn: 336587
* [X86][TLI] DAGCombine: Unfold variable bit-clearing mask to two shifts.Roman Lebedev2018-07-093-485/+487
| | | | | | | | | | | | | | | | | | | | | Summary: This adds a reverse transform for the instcombine canonicalizations that were added in D47980, D47981. As discussed later, that was worse at least for the code size, and potentially for the performance, too. https://rise4fun.com/Alive/Zmpl Reviewers: craig.topper, RKSimon, spatel Reviewed By: spatel Subscribers: reames, llvm-commits Differential Revision: https://reviews.llvm.org/D48768 llvm-svn: 336585
* [Power9] Add __float128 builtins for Round To OddStefan Pintilie2018-07-091-0/+82
| | | | | | | | | | | | GCC has builtins for these round to odd instructions: __float128 __builtin_sqrtf128_round_to_odd (__float128) __float128 __builtin_{add,sub,mul,div}f128_round_to_odd (__float128, __float128) __float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128) Differential Revision: https://reviews.llvm.org/D47550 llvm-svn: 336578
* [X86] In combineFMA, make sure we bitcast the result of isFNEG back the ↵Craig Topper2018-07-091-0/+28
| | | | | | | | expected type before creating the new FMA node. Previously, we were creating malformed SDNodes, but nothing noticed because the type constraints prevented isel from noticing. llvm-svn: 336566
* [X86][AVX] Regenerate AVX1 fast-isel tests. Simon Pilgrim2018-07-091-1900/+1181
| | | | | | Let the update script merge 32/64 tests where possible llvm-svn: 336565
* [Power9] Add __float128 support for compare operationsStefan Pintilie2018-07-091-0/+225
| | | | | | | | Added handling for the select f128. Differential Revision: https://reviews.llvm.org/D48294 llvm-svn: 336548
* [X86] Enhance combineFMA to look for FNEG behind an EXTRACT_VECTOR_ELT.Craig Topper2018-07-082-110/+32
| | | | llvm-svn: 336514
* [X86][SSE] Combine v16i8 SHL by constants to multipliesSimon Pilgrim2018-07-083-216/+177
| | | | | | | | Pre-AVX512 (which can perform a quick extend/shift/truncate), extending to 2 v8i16 for the PMULLW and then truncating is more performant than relying on the generic PBLENDVB vXi8 shift path and uses a similar amount of mask constant pool data. Differential Revision: https://reviews.llvm.org/D48963 llvm-svn: 336513
* [X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types.Craig Topper2018-07-081-92/+272
| | | | | | | | | | | | | | This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma. I had to add new rounding mode CodeGenOnly instructions to support isel when we can't find a movss to grab the upper bits from to use the b_Int instruction. Fast-isel tests have been updated to match new clang codegen. We are currently having trouble folding fneg into the new intrinsic. I'm going to correct that in a follow up patch to keep the size of this one down. A future patch will also remove the old intrinsics. llvm-svn: 336506
* [X86] Use a rounding mode other than 4 in the scalar fma intrinsic fast-isel ↵Craig Topper2018-07-081-72/+72
| | | | | | tests to match clang test cases. llvm-svn: 336505
* [X86] Regenerate PR14088 test. NFCI.Simon Pilgrim2018-07-071-13/+38
| | | | llvm-svn: 336496
* [DAGCombiner] Add EXTRACT_SUBVECTOR to SimplifyDemandedVectorEltsSimon Pilgrim2018-07-073-70/+37
| | | | | | | | As discussed on PR37989, this patch adds EXTRACT_SUBVECTOR handling to TargetLowering::SimplifyDemandedVectorElts and calls it from DAGCombiner::visitEXTRACT_SUBVECTOR. Differential Revision: https://reviews.llvm.org/D48825 llvm-svn: 336490
* NFC - Typo fixes in X86 flags-copy-lowering.mir testGabor Buella2018-07-071-3/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D48934 llvm-svn: 336484
* [MachineOutliner] Add missing liveness tracking info in MIR test.Yvan Roux2018-07-071-0/+1
| | | | | | This should bring the bots back to green state. llvm-svn: 336482
* Revert 336426 (and follow-ups 428, 440), it very likely caused PR38084.Nico Weber2018-07-063-213/+20
| | | | llvm-svn: 336453
* [ARM] ParallelDSP: added statistics, NFC.Sjoerd Meijer2018-07-0613-17/+18
| | | | | | | | | Added statistics for the number of SMLAD instructions created, and als renamed the pass name to -arm-parallel-dsp. Differential Revision: https://reviews.llvm.org/D48971 llvm-svn: 336441
* Commit rL336426 cause buildbot failuresDiogo N. Sampaio2018-07-062-6/+6
| | | | | | | | http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/50537/testReport/junit/LLVM/CodeGen_AArch64/FoldRedundantShiftedMasking_ll/ This removes the comments of the function label causing this error. llvm-svn: 336440
* [SelectionDAG] https://reviews.llvm.org/D48278Diogo N. Sampaio2018-07-063-20/+213
| | | | | | | | | | | | | | | | D48278 Allow to reduce redundant shift masks. For example: x1 = x & 0xAB00 x2 = (x >> 8) & 0xAB can be reduced to: x1 = x & 0xAB00 x2 = x1 >> 8 It only allows folding when the masks and shift values are constants. llvm-svn: 336426
* [X86] Remove FMA4 scalar intrinsics. Use llvm.fma intrinsic instead.Craig Topper2018-07-063-16/+28
| | | | | | | | The intrinsics can be implemented with a f32/f64 llvm.fma intrinsic and an insert into a zero vector. There are a couple regressions here due to SelectionDAG not being able to pull an fneg through an extract_vector_elt. I'm not super worried about this though as InstCombine should be able to do it before we get to SelectionDAG. llvm-svn: 336416
* [X86] Remove all of the avx512 masked packed fma intrinsics. Use llvm.fma or ↵Craig Topper2018-07-063-356/+1967
| | | | | | | | | | unmasked 512-bit intrinsics with rounding mode. This upgrades all of the intrinsics to use fneg instructions to convert fma into fmsub/fnmsub/fnmadd/fmsubadd. And uses a select instruction for masking. This matches how clang uses the intrinsics these days. llvm-svn: 336409
* [X86] Cleanup some of the avx512 masked fma tests to prepare for removing ↵Craig Topper2018-07-063-1217/+1099
| | | | | | | | | | and autoupgrading. -Split cases that call 2 intrinsics in the same case. -Remove testing mask3 and maskz intrinsics with an all ones mask. These won't be interesting after the upgrade. -Restore test cases for some intrinsics that are marked for deletion, but haven't been deleted yet. llvm-svn: 336408
* [Power9] Add __float128 library call for fremStefan Pintilie2018-07-061-0/+14
| | | | | | | | Power 9 does not have a hardware instruction for frem but we can call fmodf128. Differential Revision: https://reviews.llvm.org/D48552 llvm-svn: 336406
* [x86]Add a test case to show missed vfnmadd generation.Easwaran Raman2018-07-061-0/+25
| | | | llvm-svn: 336404
* [X86] Remove the last of the 'x86.fma.' intrinsics and autoupgrade them to ↵Craig Topper2018-07-054-184/+1586
| | | | | | | | 'llvm.fma'. Add upgrade tests for all. Still need to remove the AVX512 masked versions. llvm-svn: 336383
* [X86] Add SHUF128 to target shuffle decoding.Craig Topper2018-07-052-65/+127
| | | | | | Differential Revision: https://reviews.llvm.org/D48954 llvm-svn: 336376
* AMDGPU: Don't use spir_kernel in a testMatt Arsenault2018-07-051-3/+2
| | | | | | Also use verify-machineinstrs. llvm-svn: 336374
* AMDGPU/GlobalISel: Implement custom kernel arg loweringMatt Arsenault2018-07-052-20/+789
| | | | | | | | | | | | | Avoid using allocateKernArg / AssignFn. We do not want any of the type splitting properties of normal calling convention lowering. For now at least this exists alongside the IR argument lowering pass. This is necessary to handle struct padding correctly while some arguments are still skipped by the IR argument lowering pass. llvm-svn: 336373
* [Power9] Add lib calls for float128 operations with no equivalent PPC ↵Lei Huang2018-07-051-1/+146
| | | | | | | | | | | instructions Map the following instructions to the proper float128 lib calls: pow[i], exp[2], log[2|10], sin, cos, fmin, fmax Differential Revision: https://reviews.llvm.org/D48544 llvm-svn: 336361
* [X86][SSE] Add srem x, (1 << c) combine testsSimon Pilgrim2018-07-051-0/+227
| | | | | | Now that D45806 has landed we can start trying to avoid scalarizing srem by constant - these tests demonstrate some example cases. llvm-svn: 336360
* [AArch64, PowerPC, x86] add tests for signbit bit hacks; NFCSanjay Patel2018-07-053-0/+459
| | | | llvm-svn: 336348
* [AMDGPU] Add VALU to V_INTERP InstructionsRyan Taylor2018-07-051-0/+19
| | | | | | | | | | | | Wait states are not properly being inserted after buffer_store for v_interp instructions. Add VALU to V_INTERP instructions so that the GCNHazardRecognizer can check and insert the appropriate wait states when needed. Differential Revision: https://reviews.llvm.org/D48772 Change-Id: Id540c9b074fc69b5c1de6b182276aa089c74aa64 llvm-svn: 336339
* Partially revert r336268 in address-offsets.llKrasimir Georgiev2018-07-051-40/+40
| | | | | | | | | | | | | | Summary: There the typos are intentional, explicitly introduced to disable these cases in r280285. Reviewers: bkramer Reviewed By: bkramer Subscribers: dschuff, sbc100, jgravelle-google, aheejin, llvm-commits Differential Revision: https://reviews.llvm.org/D48962 llvm-svn: 336336
* [X86][SSE] Add extra v16i16 shl x,c -> pmullw testSimon Pilgrim2018-07-051-0/+27
| | | | | | We want to compare shifts with repeated vs non-repeated v8i16 shuffle masks (for PBLENDW ymm) llvm-svn: 336333
* [mips] Fix atomic operations at O0, v3Aleksandar Beserminji2018-07-054-428/+9037
| | | | | | | | | | | | | | | | | | | | | | | Similar to PR/25526, fast-regalloc introduces spills at the end of basic blocks. When this occurs in between an ll and sc, the stores can cause the atomic sequence to fail. This patch fixes the issue by introducing more pseudos to represent atomic operations and moving their lowering to after the expansion of postRA pseudos. This version addresses issues with the initial implementation and covers all atomic operations. This resolves PR/32020. Thanks to James Cowgill for reporting the issue! Patch By: Simon Dardis Differential Revision: https://reviews.llvm.org/D31287 llvm-svn: 336328
* [NEON] Fix combining of vldx_dup intrinsics with updating of base addressesIvan A. Kosarev2018-07-051-0/+43
| | | | | | | | | | | | | Resolves: Unsupported ARM Neon intrinsics in Target-specific DAG combine function for VLDDUP https://bugs.llvm.org/show_bug.cgi?id=38031 Related diff: D48439 Differential Revision: https://reviews.llvm.org/D48920 llvm-svn: 336325
OpenPOWER on IntegriCloud