summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/X86
Commit message (Collapse)AuthorAgeFilesLines
...
* Revert "[DAG] Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset"Nirav Dave2017-06-306-137/+177
| | | | | | | This reverts commit r306819 which appears be exposing underlying issues in a stage1 ppc64be build llvm-svn: 306820
* [DAG] Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffsetNirav Dave2017-06-306-177/+137
| | | | | | | | | | | | | | | | | | | | | | | | As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using generic checks. Also, propagate missing local handling from there to BaseIndexOffset checks. Tests of note: * test/CodeGen/X86/build-vector* - Improved. * test/CodeGen/BPF/undef.ll - Improved store alignment allows an additional store merge * test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a case we already do not handle well. Here, the DAG is improved, but scheduling causes a code size degradation. Reviewers: RKSimon, craig.topper, spatel, andreadb, filcab Subscribers: nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D34472 llvm-svn: 306819
* [X86] Updated 32-bit memcmp tests to run with/without SSE2Simon Pilgrim2017-06-301-347/+402
| | | | llvm-svn: 306816
* Remove redundant copy in recurrencesTaewook Oh2017-06-291-0/+232
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: If there is a chain of instructions formulating a recurrence, commuting operands can help removing a redundant copy. In the following example code, ``` BB#1: ; Loop Header %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13 ... BB#6: ; Loop Latch %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15 %vreg10<def,tied1> = ADD32rr %vreg1<kill,tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1,%vreg0 %vreg3<def,tied1> = ADD32rr %vreg2<kill,tied0>, %vreg10<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2,%vreg10 CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3 %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3 JL_1 <BB#1>, %EFLAGS<imp-use,kill> ``` Existing two-address generation pass generates following code: ``` BB#1: %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13 ... BB#6: Predecessors according to CFG: BB#5 BB#4 %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15 %vreg10<def> = COPY %vreg1<kill>; GR32:%vreg10,%vreg1 %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg0 %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10 %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2 CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3 %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3 JL_1 <BB#1>, %EFLAGS<imp-use,kill> JMP_1 <BB#7> ``` This is suboptimal because the assembly code generated has a redundant copy at the end of #BB6 to feed %vreg13 to BB#1: ``` .LBB0_6: addl %esi, %edi addl %ebx, %edi cmpl $10, %edi movl %edi, %esi jl .LBB0_1 ``` This redundant copy can be elimiated by making instructions in the recurrence chain to compute the value "into" the register that actually holds the feedback value. In this example, this can be achieved by commuting %vreg0 and %vreg1 to compute %vreg10. With that change, code after two-address generation becomes ``` BB#1: %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13 ... BB#6: derived from LLVM BB %bb7 Predecessors according to CFG: BB#5 BB#4 %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15 %vreg10<def> = COPY %vreg0<kill>; GR32:%vreg10,%vreg0 %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg1<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1 %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10 %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2 CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3 %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3 JL_1 <BB#1>, %EFLAGS<imp-use,kill> JMP_1 <BB#7> ``` and the final assembly does not have redundant copy: ``` .LBB0_6: addl %edi, %eax addl %ebx, %eax cmpl $10, %eax jl .LBB0_1 ``` Reviewers: qcolombet, MatzeB, wmi Reviewed By: wmi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31821 llvm-svn: 306758
* Revert "r306529 - [X86] Correct dwarf unwind information in function epilogue"Daniel Jasper2017-06-2955-1058/+243
| | | | | | | | | | I am 99% sure that this breaks the PPC ASAN build bot: http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/3112/steps/64-bit%20check-asan/logs/stdio If it doesn't go back to green, we can recommit (and fix the original commit message at the same time :) ). llvm-svn: 306676
* [GlobalISel][X86] Support vector type G_MERGE_VALUES selection.Igor Breger2017-06-292-0/+126
| | | | | | | | | | | | | | | | Summary: Support vector type G_MERGE_VALUES selection. For now G_MERGE_VALUES marked as legal for any type, so nothing to do in legalizer. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, guyblank, llvm-commits Differential Revision: https://reviews.llvm.org/D33958 llvm-svn: 306665
* [X86][SSE] Dropped -mcpu from palignr testsSimon Pilgrim2017-06-291-112/+147
| | | | | | | | Use triple and attribute only for consistency Add AVX tests as well llvm-svn: 306664
* [X86][SSE] Regenerate shuffle test with update_llc_test_checks.pySimon Pilgrim2017-06-291-5/+6
| | | | llvm-svn: 306663
* [X86][SSE] Dropped -mcpu from vector shift testsSimon Pilgrim2017-06-291-4/+3
| | | | | | Use triple and attribute only for consistency llvm-svn: 306662
* [X86][SSE] Dropped -mcpu from zero insertion testsSimon Pilgrim2017-06-291-9/+6
| | | | | | Use triple and attribute only for consistency llvm-svn: 306661
* [LLVM][X86][Goldmont] Adding new target-cpu: GoldmontMichael Zuckerman2017-06-294-0/+7
| | | | | | | | | | | | | | | | [LLVM SIDE] Connecting the GoldMont processor to his feature. Reviewers: 1. igorb 2. zvi 3. delena 4. RKSimon 5. craig.topper Differential Revision: https://reviews.llvm.org/D34504 llvm-svn: 306658
* [X86] Adding shuffle tests demonstrating missed vcompress opportunities. NFCZvi Rackover2017-06-292-0/+145
| | | | llvm-svn: 306646
* Another test commit.Chih-Hung Hsieh2017-06-281-4/+4
| | | | llvm-svn: 306567
* [X86] Added BSWAP tests for illegal i64/i128/i256 'wide' scalar integersSimon Pilgrim2017-06-281-0/+173
| | | | llvm-svn: 306546
* [X86][SSE] Dropped -mcpu from vector bswap testsSimon Pilgrim2017-06-281-7/+4
| | | | | | Use triple and attribute only for consistency llvm-svn: 306545
* [X86][LLVM][test]Expanding Supports lowerInterleavedStore() in ↵Michael Zuckerman2017-06-281-0/+58
| | | | | | | | | X86InterleavedAccess test. Exapnding the test to include AVX target. Adding base tast (to trunk) for Store strid=4 vf=32. llvm-svn: 306543
* [GlobalISel][X86] Test G_CONSTANT i32 0 TableGen'erated selection.NFC.Igor Breger2017-06-281-0/+21
| | | | llvm-svn: 306537
* [GlobalISel][X86] Support bitwise operations : G_AND, G_OR, G_XORIgor Breger2017-06-2810-0/+1098
| | | | | | | | | | | | | | Summary: Support G_AND, G_OR, G_XOR for i8/i16/i32/i64. Selection done via TableGen'erated code. Reviewers: zvi, guyblank, aymanmus, m_zuckerman Reviewed By: aymanmus Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34605 llvm-svn: 306533
* Reverting commit 306414 on behalf of @gadi.haberMichael Zuckerman2017-06-2830-4307/+3620
| | | | llvm-svn: 306532
* [X86][AVX2] Dropped -mcpu from avx2 arithmetic/intrinsics testsSimon Pilgrim2017-06-2810-384/+364
| | | | | | Use triple and attribute only for consistency llvm-svn: 306531
* [X86] Correct dwarf unwind information in function epiloguePetar Jovanovic2017-06-2855-243/+1058
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CFI instructions that set appropriate cfa offset and cfa register are now inserted in emitEpilogue() in X86FrameLowering. Majority of the changes in this patch: 1. Ensure that CFI instructions do not affect code generation. 2. Enable maintaining correct information about cfa offset and cfa register in a function when basic blocks are reordered, merged, split, duplicated. These changes are target independent and described below. Changed CFI instructions so that they: 1. are duplicable 2. are not counted as instructions when tail duplicating or tail merging 3. can be compared as equal Add information to each MachineBasicBlock about cfa offset and cfa register that are valid at its entry and exit (incoming and outgoing CFI info). Add support for updating this information when basic blocks are merged, split, duplicated, created. Add a verification pass (CFIInfoVerifier) that checks that outgoing cfa offset and register of predecessor blocks match incoming values of their successors. Incoming and outgoing CFI information is used by a late pass (CFIInstrInserter) that corrects CFA calculation rule for a basic block if needed. That means that additional CFI instructions get inserted at basic block beginning to correct the rule for calculating CFA. Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. Patch by Violeta Vukobrat. Differential Revision: https://reviews.llvm.org/D18046 llvm-svn: 306529
* [CGP] add specialization for memcmp expansion with only one basic blockSanjay Patel2017-06-271-46/+42
| | | | llvm-svn: 306485
* [CGP] eliminate a sub instruction in memcmp expansionSanjay Patel2017-06-271-46/+39
| | | | | | | | | | | | | | | | | | | | | | As noted in D34071, there are some IR optimization opportunities that could be handled by normal IR passes if this expansion wasn't happening so late in CGP. Regardless of that, it seems wasteful to knowingly produce suboptimal IR here, so I'm proposing this change: %s = sub i32 %x, %y %r = icmp ne %s, 0 => %r = icmp ne %x, %y Changing the predicate to 'eq' mimics what InstCombine would do, so that's just an efficiency improvement if we decide this expansion should happen sooner. The fact that the PowerPC backend doesn't eliminate the 'subf.' might be something for PPC folks to investigate separately. Differential Revision: https://reviews.llvm.org/D34416 llvm-svn: 306471
* Another test commitChih-Hung Hsieh2017-06-271-4/+4
| | | | llvm-svn: 306420
* Updated and extended the information about each instruction in HSW and SNB ↵Gadi Haber2017-06-2731-3632/+4339
| | | | | | | | | | | | | | | | | | | to include the following data: •static latency •number of uOps from which the instructions consists •all ports used by the instruction Reviewers:  RKSimon zvi aymanmus m_zuckerman Differential Revision: https://reviews.llvm.org/D33897 llvm-svn: 306414
* Recommitting rL305465 after fixing bug in TableGen in rL306251 & rL306371Ayman Musa2017-06-274-332/+13581
| | | | | | | | | | | | | | | [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions). AVX512 compare instructions return v*i1 types. In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type. Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes. The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class. When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction. Differential Revision: https://reviews.llvm.org/D33188 llvm-svn: 306402
* [X86][AVX512] Regenerate avx512 arithmetic testsSimon Pilgrim2017-06-271-129/+129
| | | | llvm-svn: 306389
* [GlobalISel][X86] Add fp32/62 legalizer, regbank-select, selection tests for ↵Igor Breger2017-06-2715-157/+970
| | | | | | G_FADD, G_FSUB, G_FMUL, G_FDIV. NFC. llvm-svn: 306370
* DAGCombine: Make sure we only eliminate trunc/extend when the scales of ↵Wolfgang Pieb2017-06-261-0/+35
| | | | | | | | | | | | truncation and extension match. This fixes PR33368. Reviewer: rksimon Differential Revision: https://reviews.llvm.org/D34069 llvm-svn: 306345
* [x86] add tests for missing sbb transforms; NFCSanjay Patel2017-06-261-0/+80
| | | | llvm-svn: 306337
* [X86][SSE] Check SSE2/SSE3 codegen tests on i686 and x86_64Simon Pilgrim2017-06-262-184/+456
| | | | llvm-svn: 306314
* [X86][SSE] Add combine tests for PMULDQ/PMULUDQSimon Pilgrim2017-06-261-0/+110
| | | | | | Found several missed optimizations while investigating replacing _mm_mul_epi32/_mm_mul_epu32 with generic implementations llvm-svn: 306302
* [X86][AVX-512] Don't raise inexact in ceil, floor, round, trunc.Ahmed Bougacha2017-06-261-8/+8
| | | | | | | | | | | | The non-AVX-512 behavior was changed in r248266 to match N1778 (C bindings for IEEE-754 (2008)), which defined the four functions to not raise the inexact exception ("rint" is still defined as raising it). Update the AVX-512 lowering of these functions to match that: it should not be different. llvm-svn: 306299
* [X86] Add test case for PR15981Simon Pilgrim2017-06-261-0/+59
| | | | llvm-svn: 306296
* [x86] transform vector inc/dec to use -1 constant (PR33483)Sanjay Patel2017-06-2620-1605/+1714
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert vector increment or decrement to sub/add with an all-ones constant: add X, <1, 1...> --> sub X, <-1, -1...> sub X, <1, 1...> --> add X, <-1, -1...> The all-ones vector constant can be materialized using a pcmpeq instruction that is commonly recognized as an idiom (has no register dependency), so that's better than loading a splat 1 constant. AVX512 uses 'vpternlogd' for 512-bit vectors because there is apparently no better way to produce 512 one-bits. The general advantages of this lowering are: 1. pcmpeq has lower latency than a memop on every uarch I looked at in Agner's tables, so in theory, this could be better for perf, but... 2. That seems unlikely to affect any OOO implementation, and I can't measure any real perf difference from this transform on Haswell or Jaguar, but... 3. It doesn't look like it from the diffs, but this is an overall size win because we eliminate 16 - 64 constant bytes in the case of a vector load. If we're broadcasting a scalar load (which might itself be a bug), then we're replacing a scalar constant load + broadcast with a single cheap op, so that should always be smaller/better too. 4. This makes the DAG/isel output more consistent - we use pcmpeq already for padd x, -1 and psub x, -1, so we should use that form for +1 too because we can. If there's some reason to favor a constant load on some CPU, let's make the reverse transform for all of these cases (either here in the DAG or in a later machine pass). This should fix: https://bugs.llvm.org/show_bug.cgi?id=33483 Differential Revision: https://reviews.llvm.org/D34336 llvm-svn: 306289
* [X86][LLVM][test]Expanding Supports lowerInterleavedStore() in ↵Michael Zuckerman2017-06-261-0/+59
| | | | | | | | X86InterleavedAccess test. Adding base tast (to trunk) for Store strid=4 vf=32. llvm-svn: 306286
* This reverts commit r306272.Serguei Katkov2017-06-262-54/+4
| | | | | | | | Revert "[MBP] do not rotate loop if it creates extra branch" It breaks the sanitizer build bots. Need to fix this. llvm-svn: 306276
* [MBP] do not rotate loop if it creates extra branchSerguei Katkov2017-06-262-4/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a last fix for the corner case of PR32214. Actually this is not really corner case in general. We should not do a loop rotation if we create an additional branch due to it. Consider the case where we have a loop chain H, M, B, C , where H is header with viable fallthrough from pre-header and exit from the loop M - some middle block B - backedge to Header but with exit from the loop also. C - some cold block of the loop. Let's H is determined as a best exit. If we do a loop rotation M, B, C, H we can introduce the extra branch. Let's compute the change in number of branches: +1 branch from pre-header to header -1 branch from header to exit +1 branch from header to middle block if there is such -1 branch from cold bock to header if there is one So if C is not a predecessor of H then we introduce extra branch. This change actually prohibits rotation of the loop if both true 1) Best Exit has next element in chain as successor. 2) Last element in chain is not a predecessor of first element of chain. Reviewers: iteratee, xur Reviewed By: iteratee Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34271 llvm-svn: 306272
* [X86] Add test case for PR15705Simon Pilgrim2017-06-251-0/+48
| | | | llvm-svn: 306246
* AVX-512: Fixed a crash during legalization of <3 x i8> typeElena Demikhovsky2017-06-251-0/+31
| | | | | | | | | The compiler fails with assertion during legalization of SETCC for <3 x i8> operands. The result is extended to <4 x i8> and then truncated <4 x i1>. It does not happen on AVX2, because the final result of SETCC is <4 x i32>. Differential Revision: https://reviews.llvm.org/D34503 llvm-svn: 306242
* [GlobalISel][X86] Support vector type G_EXTRACT selection.Igor Breger2017-06-252-0/+207
| | | | | | | | | | | | | | | | Summary: Support vector type G_EXTRACT selection. For now G_EXTRACT marked as legal for any type, so nothing to do in legalizer. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: guyblank Subscribers: guyblank, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33957 llvm-svn: 306240
* Add bitcast store-merge test.Nirav Dave2017-06-231-2/+29
| | | | llvm-svn: 306158
* [x86] fix value types for SBB transform (PR33560)Sanjay Patel2017-06-231-0/+25
| | | | | | | | | | | I'm not sure yet why this wouldn't fail in the simple case, but clearly I used the wrong value type with: https://reviews.llvm.org/rL306040 ...and the bug manifests with: https://bugs.llvm.org/show_bug.cgi?id=33560 llvm-svn: 306139
* [X86][AVX] Regenerate i256 bitcasted store testSimon Pilgrim2017-06-231-6/+20
| | | | | | Check on slow/fast unaligned memory targets llvm-svn: 306138
* Regenerate extract-store.ll testsSimon Pilgrim2017-06-231-6/+123
| | | | llvm-svn: 306131
* [x86] auto-generate complete checks; NFCSanjay Patel2017-06-231-41/+93
| | | | llvm-svn: 306114
* [x86] auto-generate complete checks; NFCSanjay Patel2017-06-231-85/+212
| | | | llvm-svn: 306113
* [x86] remove overridden target settings in test; NFCSanjay Patel2017-06-231-2/+0
| | | | | | r306109 was supposed to make this change, but I committed the wrong version. llvm-svn: 306110
* [x86] rename test file and auto-generate complete checks; NFCSanjay Patel2017-06-232-23/+35
| | | | | | | The command-line params override the target setting in the file itself, so delete that. Also, remove the cpu and arch because those don't matter and neither does the OS specification in the triple. llvm-svn: 306109
* [X86][AVX] Extended vector average testsSimon Pilgrim2017-06-231-411/+917
| | | | | | Added AVX1 tests and merged AVX1/AVX2/AVX512 checks where possible llvm-svn: 306107
OpenPOWER on IntegriCloud