path: root/llvm/test/CodeGen
Commit message, Author, Age, Files, Lines
...
* [X86][SSE] Regenerate sqrt testsSimon Pilgrim2017-01-221-13/+15
| | | | llvm-svn: 292764
* Fix test name. NFCI.Simon Pilgrim2017-01-221-6/+6
| | | | llvm-svn: 292763
* Fix some broken CHECK lines.Benjamin Kramer2017-01-2214-23/+23
| | | | | | The colon is important. llvm-svn: 292761
* [x86] avoid crashing with illegal vector type (PR31672)Sanjay Patel2017-01-221-0/+133
| | | | | | https://llvm.org/bugs/show_bug.cgi?id=31672 llvm-svn: 292758
* [X86] Don't allow commuting to form phsub operations.Craig Topper2017-01-211-4/+29
| | | | | | Fixes PR31714. llvm-svn: 292713
* [X86] Add test cases that show bad commuting being allowed to create a phsub operation.Craig Topper2017-01-211-0/+33
| | | | | | llvm-svn: 292712
* [NVPTX] Add explicit check for llvm.sqrt.f32 to intrinsics.ll.Justin Lebar2017-01-211-0/+8
| | | | | | Test-only change. llvm-svn: 292690
* [ValueTracking] recognize variations of 'clamp' to improve codegen (PR31693)Sanjay Patel2017-01-201-18/+8
| | | | | | By enhancing value tracking, we allow an existing min/max canonicalization to kick in and improve codegen for several targets that have min/max instructions. Unfortunately, recognizing min/max in value tracking may cause us to hit a hack in InstCombiner::visitICmpInst() more often: http://lists.llvm.org/pipermail/llvm-dev/2017-January/109340.html ...but I'm hoping we can remove that soon.
| | | | | | Correctness proofs based on Alive:
| | | | | | Name: smaxmin
| | | | | | Pre: C1 < C2
| | | | | | %cmp2 = icmp slt i8 %x, C2
| | | | | | %min = select i1 %cmp2, i8 %x, i8 C2
| | | | | | %cmp3 = icmp slt i8 %x, C1
| | | | | | %r = select i1 %cmp3, i8 C1, i8 %min
| | | | | | =>
| | | | | | %cmp2 = icmp slt i8 %x, C2
| | | | | | %min = select i1 %cmp2, i8 %x, i8 C2
| | | | | | %cmp1 = icmp sgt i8 %min, C1
| | | | | | %r = select i1 %cmp1, i8 %min, i8 C1
| | | | | | Name: sminmax
| | | | | | Pre: C1 > C2
| | | | | | %cmp2 = icmp sgt i8 %x, C2
| | | | | | %max = select i1 %cmp2, i8 %x, i8 C2
| | | | | | %cmp3 = icmp sgt i8 %x, C1
| | | | | | %r = select i1 %cmp3, i8 C1, i8 %max
| | | | | | =>
| | | | | | %cmp2 = icmp sgt i8 %x, C2
| | | | | | %max = select i1 %cmp2, i8 %x, i8 C2
| | | | | | %cmp1 = icmp slt i8 %max, C1
| | | | | | %r = select i1 %cmp1, i8 %max, i8 C1
| | | | | | ----------------------------------------
| | | | | | Optimization: smaxmin
| | | | | | Done: 1
| | | | | | Optimization is correct!
| | | | | | ----------------------------------------
| | | | | | Optimization: sminmax
| | | | | | Done: 1
| | | | | | Optimization is correct!
| | | | | | Name: umaxmin
| | | | | | Pre: C1 u< C2
| | | | | | %cmp2 = icmp ult i8 %x, C2
| | | | | | %min = select i1 %cmp2, i8 %x, i8 C2
| | | | | | %cmp3 = icmp ult i8 %x, C1
| | | | | | %r = select i1 %cmp3, i8 C1, i8 %min
| | | | | | =>
| | | | | | %cmp2 = icmp ult i8 %x, C2
| | | | | | %min = select i1 %cmp2, i8 %x, i8 C2
| | | | | | %cmp1 = icmp ugt i8 %min, C1
| | | | | | %r = select i1 %cmp1, i8 %min, i8 C1
| | | | | | Name: uminmax
| | | | | | Pre: C1 u> C2
| | | | | | %cmp2 = icmp ugt i8 %x, C2
| | | | | | %max = select i1 %cmp2, i8 %x, i8 C2
| | | | | | %cmp3 = icmp ugt i8 %x, C1
| | | | | | %r = select i1 %cmp3, i8 C1, i8 %max
| | | | | | =>
| | | | | | %cmp2 = icmp ugt i8 %x, C2
| | | | | | %max = select i1 %cmp2, i8 %x, i8 C2
| | | | | | %cmp1 = icmp ult i8 %max, C1
| | | | | | %r = select i1 %cmp1, i8 %max, i8 C1
| | | | | | ----------------------------------------
| | | | | | Optimization: umaxmin
| | | | | | Done: 1
| | | | | | Optimization is correct!
| | | | | | ----------------------------------------
| | | | | | Optimization: uminmax
| | | | | | Done: 1
| | | | | | Optimization is correct!
| | | | | | llvm-svn: 292660
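The clamp rewrites proved by Alive in the message above can also be spot-checked by brute force. The following is only an illustrative sketch, not part of the commit: the function names are hypothetical, plain Python integers stand in for iN values, and a small value range is used instead of the full i8 range to keep the loop fast.

```python
def maxmin_before(x, c1, c2):
    # %min = select (x < C2), x, C2 ; %r = select (x < C1), C1, %min
    lo = x if x < c2 else c2
    return c1 if x < c1 else lo

def maxmin_after(x, c1, c2):
    # %min = select (x < C2), x, C2 ; %r = select (%min > C1), %min, C1
    lo = x if x < c2 else c2
    return lo if lo > c1 else c1

def minmax_before(x, c1, c2):
    # %max = select (x > C2), x, C2 ; %r = select (x > C1), C1, %max
    hi = x if x > c2 else c2
    return c1 if x > c1 else hi

def minmax_after(x, c1, c2):
    # %max = select (x > C2), x, C2 ; %r = select (%max < C1), %max, C1
    hi = x if x > c2 else c2
    return hi if hi < c1 else c1

def clamp_rewrites_hold(values):
    """Check both rewrites for every (x, C1, C2) drawn from `values`."""
    for c1 in values:
        for c2 in values:
            for x in values:
                # Precondition "C1 < C2" for the maxmin form.
                if c1 < c2 and maxmin_before(x, c1, c2) != maxmin_after(x, c1, c2):
                    return False
                # Precondition "C1 > C2" for the minmax form.
                if c1 > c2 and minmax_before(x, c1, c2) != minmax_after(x, c1, c2):
                    return False
    return True
```

Passing a signed range such as `range(-16, 16)` mimics the smaxmin/sminmax cases and an unsigned range such as `range(0, 32)` mimics umaxmin/uminmax, since the comparisons behave the same on any totally ordered range.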
* AMDGPU/R600: Serialize vector trunc stores to private ASJan Vesely2017-01-201-15/+15
| | | | | | | | | | | Add DUMMY_CHAIN SDNode to denote stores of interest Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=28915 Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=30411 Differential Revision: https://reviews.llvm.org/D27964 llvm-svn: 292651
* [WebAssembly] Don't create bitcast-wrappers for varargs.Dan Gohman2017-01-201-0/+17
| | | | | | | | | WebAssembly varargs functions use a significantly different ABI than non-varargs functions, and the current code in WebAssemblyFixFunctionBitcasts doesn't handle that difference. For now, just avoid creating wrapper functions in the presence of varargs. llvm-svn: 292645
* [x86] add tests to show missed min/max vector codegen (PR31693)Sanjay Patel2017-01-201-12/+73
| | | | llvm-svn: 292640
* AArch64LoadStoreOptimizer: Update kill flags when merging storesMatthias Braun2017-01-202-133/+132
| | | | | | | | | | | | | | Kill flags need to be updated correctly when moving stores up/down to form store pair instructions. Those invalid flags have been ignored before but as of r290014 they are recognized when using -mllvm -verify-machineinstrs. Also simplifies test/CodeGen/AArch64/ldst-opt-dbg-limit.mir, renames it to ldst-opt.mir, and adds new tests for this change. Differential Revision: https://reviews.llvm.org/D28875 llvm-svn: 292625
* [RegisterCoalescing] Recommit the patch "Remove partial redundant copy".Wei Mi2017-01-203-0/+454
| | | | | | | | | | | | | | | The recommit fixes a bug related to the live interval update after the partially redundant copy is moved. The original patch solves the performance problem described in PR27827. Register coalescing sometimes cannot remove a copy because of interference. But if we can find a reverse copy in one of the predecessor blocks of the copy, the copy is partially redundant and we may remove the copy partially by moving it to the predecessor block without the reverse copy. Differential Revision: https://reviews.llvm.org/D28585 llvm-svn: 292621
* [Thumb] Add support for tMUL in the compare instruction peephole optimizer.Sjoerd Meijer2017-01-202-0/+186
| | | | | | | | | | | | | | | | | We also want to optimise tests like this: return a*b == 0. The MULS instruction is flag setting, so we don't need the CMP instruction but can instead branch on the result of the MULS. The generated instruction sequence for this example was: MULS, MOVS, MOVS, CMP. The MOVS instructions load the boolean values resulting from the select instruction, but these MOVS instructions are flag setting and were thus preventing this optimisation. Now we first reorder and move the MULS to before the CMP and generate the sequence MOVS, MOVS, MULS, CMP so that the optimisation can trigger. Reordering of the MULS and MOVS is safe to do because the subsequent MOVS instructions just set the CPSR register and don't use it, i.e. the CPSR is dead. Differential Revision: https://reviews.llvm.org/D27990 llvm-svn: 292608
* [AVX-512] Fix a couple test cases to not pass an undef mask to gather intrinsic.Craig Topper2017-01-201-6/+14
| | | | | | This could break if any future optimizations take advantage of the undef. llvm-svn: 292585
* [test] Remove an unwanted match for `XFAIL:`.Greg Parker2017-01-201-1/+1
| | | | llvm-svn: 292567
* [AArch64][GlobalISel] Widen scalar int->fp conversions.Ahmed Bougacha2017-01-201-6/+12
| | | | | | | It's incorrect to ignore the higher bits of the integer source. Teach the legalizer how to widen it. llvm-svn: 292563
* [AMDGPU] Prevent spills before exec mask is restoredStanislav Mekhanoshin2017-01-201-0/+78
| | | | | | | | | | | | | Inline spiller can decide to move a spill as early as possible in the basic block. It will skip phis and labels, but we also need to make sure it skips instructions in the basic block prologue which restore the exec mask. Added isPositionLike callback in TargetInstrInfo to detect instructions which shall be skipped in addition to common phis, labels etc. Differential Revision: https://reviews.llvm.org/D27997 llvm-svn: 292554
* [AArch64][GlobalISel] Split FP conversion legalizer tests. NFC.Ahmed Bougacha2017-01-203-88/+426
| | | | | | | | | Big functions with large vreg # are quite unwieldy to update. Change it to have one function per test (it does increase boilerplate, but makes the core hopefully more readable and maintainable). llvm-svn: 292552
* [AArch64][GlobalISel] Split legalizer combine tests. NFC.Ahmed Bougacha2017-01-201-69/+86
| | | | | | | | | | | Big functions with large vreg # are quite unwieldy to update. This test also relied on legal s8 operations which we're considering removing. Change it to have one function per test (it does increase boilerplate, but makes the core hopefully more readable and maintainable), and use 100% legal operations throughout. llvm-svn: 292551
* [MIRParser] Allow generic register specification on operand.Ahmed Bougacha2017-01-201-0/+3
| | | | | | | | This completes r292321 by adding support for generic registers, e.g.: %2:_(s32) = G_ADD %0, %1 llvm-svn: 292550
* AArch64: fall back to DAG ISel for inline assembly.Tim Northover2017-01-191-0/+10
| | | | | | | We can't currently handle "calls" to inlineasm strings so it's better to let the DAG handle it than generate rubbish. llvm-svn: 292540
* [SelectionDAG] Improve knownbits handling of UMIN/UMAX (PR31293)Simon Pilgrim2017-01-191-12/+2
| | | | | | | | | | | | This patch improves the knownbits logic for unsigned integer min/max opcodes. For UMIN we know that the result will have at least the maximum of the inputs' known leading zero bits; similarly, for UMAX, at least the maximum of the inputs' known leading one bits. This is particularly useful for simplifying clamping patterns, e.g. as SSE doesn't have a uitofp instruction we want to use sitofp instead where possible and for that we need to confirm that the top bit is not set. Differential Revision: https://reviews.llvm.org/D28853 llvm-svn: 292528
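The leading-bit rule described in the message above can be checked exhaustively on a small bit width. This is a hedged sketch only: it works on concrete 8-bit values rather than on LLVM's KnownBits representation, and the helper names are hypothetical.

```python
BITS = 8
MASK = (1 << BITS) - 1

def leading_zeros(v):
    """Number of leading zero bits of an unsigned BITS-wide value."""
    return BITS - v.bit_length()

def leading_ones(v):
    """Number of leading one bits, computed as leading zeros of the complement."""
    return leading_zeros(~v & MASK)

def umin_umax_rule_holds():
    for a in range(1 << BITS):
        for b in range(1 << BITS):
            # umin keeps at least the larger leading-zero count of its inputs.
            if leading_zeros(min(a, b)) < max(leading_zeros(a), leading_zeros(b)):
                return False
            # umax keeps at least the larger leading-one count of its inputs.
            if leading_ones(max(a, b)) < max(leading_ones(a), leading_ones(b)):
                return False
    return True
```

In the uitofp/sitofp case mentioned in the message, a single known leading zero on either UMIN operand is enough to show the top bit of the result is clear.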
* [XRay][Arm] Repair XRay table emission on Arm32 and add tests to identify such problem earlierSerge Rogatch2017-01-192-0/+12
| | | | | | | | | | | | | | | | | | | Summary: Emission of the XRay table was accidentally disabled for Arm32, but this bug was not detected at the time because testing of XRay had earlier (also by mistake) been disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future. This patch is one of a series, see also - https://reviews.llvm.org/D28623 Reviewers: rengolin, dberris Reviewed By: dberris Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown Differential Revision: https://reviews.llvm.org/D28624 llvm-svn: 292516
* [X86][SSE] Attempt to pre-truncate arithmetic operations that have already been extendedSimon Pilgrim2017-01-191-229/+81
| | | | | | | | As discussed on D28219 - it is profitable to combine trunc(binop(s/zext(x), s/zext(y))) to binop(trunc(s/zext(x)), trunc(s/zext(y))) assuming the trunc(ext()) will simplify further llvm-svn: 292493
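The combine above relies on truncation commuting with wrap-around arithmetic. A minimal sketch of that property follows, under stated assumptions: 8-bit inputs extended to 32 bits and truncated to 16 bits, with the set of operators picked for illustration (it is not claimed to match the opcodes the patch handles). All names are hypothetical.

```python
import operator

def trunc(v, bits):
    """Keep the low `bits` bits (also wraps negative Python ints)."""
    return v & ((1 << bits) - 1)

def zext8(v, to_bits):
    """Zero-extend an 8-bit pattern; the value is unchanged."""
    return v

def sext8(v, to_bits):
    """Sign-extend an 8-bit pattern to `to_bits` bits."""
    signed = v - 256 if v & 0x80 else v
    return trunc(signed, to_bits)

def pretruncate_is_safe(op, ext, wide=32, narrow=16):
    for x in range(256):
        for y in range(256):
            # trunc(binop(ext(x), ext(y)))
            late = trunc(trunc(op(ext(x, wide), ext(y, wide)), wide), narrow)
            # binop(trunc(ext(x)), trunc(ext(y)))
            early = trunc(op(trunc(ext(x, wide), narrow),
                             trunc(ext(y, wide), narrow)), narrow)
            if late != early:
                return False
    return True

# Operators assumed for this sketch: modular arithmetic and bitwise logic.
SAFE_OPS = [operator.add, operator.sub, operator.mul,
            operator.and_, operator.or_, operator.xor]
```

Operations whose low result bits depend on high input bits (right shifts or division, for example) do not have this property, which is why the sketch restricts itself to the operators listed.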
* [X86][SSE] Added tests for pre-truncating arithmetic operations that have already been extendedSimon Pilgrim2017-01-191-0/+127
| | | | | | | | As discussed on D28219 - it is profitable to combine trunc(binop(s/zext(x), s/zext(y))) to binop(trunc(s/zext(x)), trunc(s/zext(y))) assuming the trunc(ext()) will simplify further llvm-svn: 292487
* [DAG] Don't increase SDNodeOrder for dbg.value/declare.Mikael Holmen2017-01-192-0/+193
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: The SDNodeOrder is saved in the IROrder field in the SDNode, and this field may affect scheduling. Thus, letting dbg.value/declare increase the order numbers may in turn affect scheduling. Because of this change we also need to update the code deciding when dbg values should be output, in ScheduleDAGSDNodes.cpp/ProcessSDDbgValues. Dbg values now have the same order as the SDNode they are connected to, not the following orders. Test cases provided by Florian Hahn. Reviewers: bogner, aprantl, sunfish, atrick Reviewed By: atrick Subscribers: fhahn, probinson, andreadb, llvm-commits, MatzeB Differential Revision: https://reviews.llvm.org/D25318 llvm-svn: 292485
* [GlobalISel] Pointers are legal operands for G_SELECT on AArch64Kristof Beyls2017-01-191-0/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D28805 llvm-svn: 292481
* Recommitting unsigned saturation with a bugfix.Elena Demikhovsky2017-01-192-0/+253
| | | | | | | A test case that crashed is added to avx512-trunc.ll. (PR31589) llvm-svn: 292479
* GlobalISel: Implement widening for shiftsJustin Bogner2017-01-191-0/+44
| | | | llvm-svn: 292476
* [AVX-512] Add test cases that show where we are using two subvector inserts to broadcast a 128-bit subvector into a 512-bit vector.Craig Topper2017-01-192-0/+58
| | | | | | | | | | We'd be better off using something like SHUFF32X4. If the subvector comes from a load, we convert to SUBV_BROADCAST and use a broadcast instruction. But if there is no load we keep the inserts. I think we should create the SUBV_BROADCAST even without the load and let isel use the fallback patterns that are used if the load can't be folded. This will use the SHUFF32X4 or similar instruction for the 128-bit into 512-bit case and a single insert for 128 into 256 or 256 into 512. This should be fixed so subvector broadcast intrinsics can be replaced with native IR since some of those currently lower directly to SHUFF32X4. llvm-svn: 292475
* [AVX-512] Support ADD/SUB/MUL of mask vectorsCraig Topper2017-01-192-0/+204
| | | | | | | | | | | | | | | | | Summary: Currently we expand and scalarize these operations, but I think we should be able to implement ADD/SUB with KXOR and MUL with KAND. We already do this for scalar i1 operations so I just extended it to vectors of i1. Reviewers: zvi, delena Reviewed By: delena Subscribers: guyblank, llvm-commits Differential Revision: https://reviews.llvm.org/D28888 llvm-svn: 292474
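The reasoning in the message above (ADD/SUB as KXOR, MUL as KAND) follows from arithmetic modulo 2 on i1 lanes. A small sketch, assuming a mask vector is modelled as a Python integer with one bit per lane; the helper name is hypothetical and nothing here reflects the actual lowering code.

```python
import operator

def lanewise_i1(op, a, b, lanes):
    """Apply `op` to each 1-bit lane, wrap the result to 1 bit, and repack."""
    out = 0
    for i in range(lanes):
        x = (a >> i) & 1
        y = (b >> i) & 1
        out |= (op(x, y) & 1) << i  # "& 1" models i1 wrap-around
    return out

def mask_arith_matches_logic(lanes=4):
    for a in range(1 << lanes):
        for b in range(1 << lanes):
            if lanewise_i1(operator.add, a, b, lanes) != a ^ b:  # ADD == XOR
                return False
            if lanewise_i1(operator.sub, a, b, lanes) != a ^ b:  # SUB == XOR
                return False
            if lanewise_i1(operator.mul, a, b, lanes) != a & b:  # MUL == AND
                return False
    return True
```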
* AMDGPU: Disable some fneg combines unless nszMatt Arsenault2017-01-192-41/+106
| | | | | | | | | | | | For -(x + y) -> (-x) + (-y), if x == -y, this would change the result from -0.0 to 0.0. Since the fma/fmad combine is an extension of this problem it also applies there. fmul should be fine, and I don't think any of the unary operators or conversions should be a problem either. llvm-svn: 292473
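The signed-zero problem described above is easy to reproduce with ordinary IEEE-754 doubles. This sketch uses Python floats as a stand-in for the GPU's floating-point types, and the function names are hypothetical.

```python
import math

def is_negative_zero_or_negative(v):
    """True when the sign bit of v is set (distinguishes -0.0 from 0.0)."""
    return math.copysign(1.0, v) < 0

def fneg_distribution_example(x=1.0, y=-1.0):
    """Return (-(x + y), (-x) + (-y)) for a pair with x == -y."""
    before = -(x + y)     # x + y is +0.0, so negating it gives -0.0
    after = (-x) + (-y)   # -1.0 + 1.0 rounds to +0.0
    return before, after
```

The two results compare equal with `==`, so only a sign-bit test (or a later operation such as division by the result) can observe the difference; this is why the combine is only safe when signed zeros can be ignored (nsz).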
* AMDGPU: Remove modifiers from v_div_scale_*Matt Arsenault2017-01-192-3/+5
| | | | | | | | They seem to produce nonsense results when used. This should be applied to the release branch. llvm-svn: 292472
* [AVX-512] Use VSHUF instructions instead of two inserts as fallback for subvector broadcasts that can't fold the load.Craig Topper2017-01-191-12/+6
| | | | | | llvm-svn: 292466
* [AVX-512] Add additional test cases for broadcast intrinsics that demonstrate that we don't fold the loads to use a broadcast instruction.Craig Topper2017-01-194-0/+158
| | | | | | llvm-svn: 292465
* GlobalISel: Implement narrowing for G_LOADJustin Bogner2017-01-191-0/+10
| | | | llvm-svn: 292461
* Use an actual valid register in testMatthias Braun2017-01-191-2/+2
| | | | llvm-svn: 292459
* [NVPTX] Fix lowering of fp16 ISD::FNEG.Artem Belevich2017-01-191-0/+15
| | | | | | | | | There's no neg.f16 instruction, so negation has to be done via subtraction from zero. Differential Revision: https://reviews.llvm.org/D28876 llvm-svn: 292452
* Treat segment [B, E) as not overlapping block with boundaries [A, B)Krzysztof Parzyszek2017-01-181-0/+143
| | | | llvm-svn: 292446
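The rule in the commit title above is the usual half-open interval convention: an interval that starts exactly where another ends does not overlap it. A generic sketch follows; the function and parameter names are hypothetical and are not taken from the patched code.

```python
def half_open_overlap(seg_start, seg_end, blk_start, blk_end):
    """True when [seg_start, seg_end) and [blk_start, blk_end) share a point."""
    return seg_start < blk_end and blk_start < seg_end
```

With A < B < E, a segment [B, E) and a block [A, B) only touch at B, which belongs to the segment but not to the block, so the test is false.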
* [Hexagon] Remove dead defs from the live set when expanding wstoresKrzysztof Parzyszek2017-01-181-0/+216
| | | | llvm-svn: 292445
* Revert r291670 because it introduces a crash.Michael Kuperstein2017-01-182-231/+0
| | | | | | | | | r291670 doesn't crash on the original testcase from PR31589, but it crashes on a slightly more complex one. PR31589 has the new reproducer. llvm-svn: 292444
* [AArch64] Generate literals by the little endEvandro Menezes2017-01-1818-108/+108
| | | | | | | | | | | ARM seems to prefer that long literals be formed from their little end in order to promote the fusion of the instruction pairs MOV/MOVK and MOVK/MOVK on Cortex A57 and others (v. "Cortex A57 Software Optimisation Guide", section 4.14). Differential revision: https://reviews.llvm.org/D28697 llvm-svn: 292422
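To illustrate what "from the little end" means for a 64-bit literal, the sketch below emits a move sequence starting with the lowest non-zero 16-bit chunk. It is a simplified model under stated assumptions: only MOVZ/MOVK are modelled (no MOVN or logical immediates), and the tuple encoding and function names are hypothetical rather than the backend's actual expansion code.

```python
MASK64 = (1 << 64) - 1

def materialize_low_to_high(value):
    """Return [(mnemonic, imm16, shift)] building `value` from its low chunk upward."""
    value &= MASK64
    chunks = [((value >> shift) & 0xFFFF, shift) for shift in (0, 16, 32, 48)]
    nonzero = [(imm, shift) for imm, shift in chunks if imm != 0]
    if not nonzero:
        return [("movz", 0, 0)]
    # The first instruction zeroes the register; later ones patch in chunks.
    return [("movz" if i == 0 else "movk", imm, shift)
            for i, (imm, shift) in enumerate(nonzero)]

def evaluate(sequence):
    """Interpret a MOVZ/MOVK sequence and return the resulting register value."""
    reg = 0
    for mnemonic, imm, shift in sequence:
        if mnemonic == "movz":
            reg = imm << shift
        else:  # movk keeps all other bits
            reg = (reg & ~(0xFFFF << shift)) | (imm << shift)
    return reg & MASK64
```

For a value with four non-zero chunks this yields one MOVZ at shift 0 followed by MOVKs at shifts 16, 32 and 48, which is the ordering the message says the optimisation guide favours for fusion.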
* [AMDGPU] Do not allow register coalescer to create big superregsStanislav Mekhanoshin2017-01-182-9/+80
| | | | | | | | | | | | | | | | Limit register coalescer by not allowing it to artificially increase size of registers beyond dword. Such super-registers are in fact register sequences and not distinct HW registers. With more super-regs we would need to allocate adjacent registers and constrain regalloc more than needed. Moreover, our super registers are overlapping. For instance we have VGPR0_VGPR1_VGPR2, VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4 etc, which complicates register allocation even more, resulting in excessive spilling. Differential Revision: https://reviews.llvm.org/D28782 llvm-svn: 292413
* GlobalISel: Implement narrowing for G_STOREJustin Bogner2017-01-181-0/+12
| | | | | | | Legalize stores of types that are too wide by breaking them up into sequences of smaller stores. llvm-svn: 292412
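The idea of breaking a too-wide store into a sequence of smaller stores can be sketched on a byte buffer. This is an illustration only, assuming little-endian layout and a `bytearray` standing in for memory; the function name is hypothetical and does not correspond to the legalizer's API.

```python
def narrow_store(memory, addr, value, wide_bits, narrow_bits):
    """Store a wide_bits integer as consecutive narrow_bits little-endian stores."""
    if wide_bits % narrow_bits != 0 or narrow_bits % 8 != 0:
        raise ValueError("sizes must divide evenly into whole bytes")
    step = narrow_bits // 8
    mask = (1 << narrow_bits) - 1
    for i in range(wide_bits // narrow_bits):
        part = (value >> (i * narrow_bits)) & mask  # extract the i-th piece
        start = addr + i * step                     # each piece at a higher offset
        memory[start:start + step] = part.to_bytes(step, "little")
```

Storing a 128-bit value as two 64-bit pieces this way leaves the same bytes in memory as a single 128-bit little-endian store would.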
* Don't create a comdat group for a dropped def with initializerTeresa Johnson2017-01-181-0/+19
| | | | | | | | | | | | | | | | | | Non-prevailing weak/linkonce odr symbols will be dropped by ThinLTO to available_externally when possible. If they had an initializer in the global_ctors list, a comdat group was being created. This code already had logic to skip available_externally defs, but now the EliminateAvailableExternally pass will drop these symbols to declarations earlier. Change the check to skip all declarations for linker (which includes available_externally along with declarations). Reviewers: mehdi_amini Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28737 llvm-svn: 292408
* Fixed parser error on windows shell evaluation of RUN script lineSimon Pilgrim2017-01-181-3/+1
| | | | llvm-svn: 292363
* [X86][SSE] Simplify umax knownbits testSimon Pilgrim2017-01-181-1/+1
| | | | | | combineSRA doesn't detect sign-bit splats that it creates itself, so just use -1 as the demanded input so that it's already splatted llvm-svn: 292361
* [X86] Improve mul combine for negative multiplier (2^c - 1)Michael Zuckerman2017-01-181-0/+230
| | | | | | | | | | | This patch improves the mul instruction combine function (combineMul) by adding a new layer of logic. In this patch, we are adding the ability to fold (mul x, -((1 << c) - 1)) or (mul x, -((1 << c) + 1)) into (neg((x << c) - x)) or (neg((x << c) + x)), respectively. Differential Revision: https://reviews.llvm.org/D28232 llvm-svn: 292358
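The two folds in the message above are plain algebraic identities, which can be checked in wrap-around 32-bit arithmetic. A hedged sketch with hypothetical function names; the 32-bit width is an assumption chosen for the example.

```python
MASK32 = 0xFFFFFFFF

def mul_by_neg_pow2_minus_1(x, c):
    """x * -((1 << c) - 1) computed as neg((x << c) - x), wrapped to 32 bits."""
    return (-((x << c) - x)) & MASK32

def mul_by_neg_pow2_plus_1(x, c):
    """x * -((1 << c) + 1) computed as neg((x << c) + x), wrapped to 32 bits."""
    return (-((x << c) + x)) & MASK32

def folds_match_multiply(samples, shifts=range(1, 32)):
    for x in samples:
        for c in shifts:
            if mul_by_neg_pow2_minus_1(x, c) != (x * -((1 << c) - 1)) & MASK32:
                return False
            if mul_by_neg_pow2_plus_1(x, c) != (x * -((1 << c) + 1)) & MASK32:
                return False
    return True
```

For example, multiplying by -7 (c = 3) becomes a shift, a subtract and a negate instead of a multiply.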
* Revert "[XRay][Arm] Repair XRay table emission on Arm32 and add tests to identify such problem earlier"Renato Golin2017-01-182-12/+0
| | | | | | | | | | | This reverts commit r292210, as it broke the Thumb buildbot with: clang-5.0: error: the clang compiler does not support '-fxray-instrument on thumbv7-unknown-linux-gnueabihf'. llvm-svn: 292357