summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* [DAGCombine] Make sure we check the ResNo from UADDO before combiningAmaury Sechet2017-06-111-0/+24
| | | | | | | | | | | | Summary: UADDO has 2 result, and one must check the result no before doing any kind of combine. Without it, the transform is invalid. Reviewers: joerg Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34088 llvm-svn: 305162
* [X86][SSE] Extended PR32368 to SSE/AVX1/AVX2Simon Pilgrim2017-06-101-8/+142
| | | | llvm-svn: 305154
* [X86][AVX512] Added test case for PR32368Simon Pilgrim2017-06-101-0/+19
| | | | llvm-svn: 305153
* AMDGPU : Fix ISA Version Definitions.Wei Ding2017-06-101-0/+13
| | | | | | Differential Revision: http://reviews.llvm.org/D28531 llvm-svn: 305137
* [PowerPC] add memcmp test with one constant operand and equality cmp; NFCSanjay Patel2017-06-091-3/+29
| | | | llvm-svn: 305131
* [AArch64] Add fallback in FastISel fp16 conversionsI-Jui (Ray) Sung2017-06-091-0/+131
| | | | | | | | | | | | | | | | | Summary: - Fix assertion failures on F16 to/from int types in FastISel by falling back to regular ISel - Add a testcase of various conversion cases with FastISel (-O0) Reviewers: kristof.beyls, jmolloy, SjoerdMeijer Reviewed By: SjoerdMeijer Subscribers: SjoerdMeijer, llvm-commits, srhines, pirama, aemerson, rengolin, javed.absar, kristof.beyls Differential Revision: https://reviews.llvm.org/D33734 llvm-svn: 305127
* [AMDGPU] Add intrinsics for alignbit and alignbyte instructionsStanislav Mekhanoshin2017-06-091-0/+23
| | | | | | Differential Revision: https://reviews.llvm.org/D34046 llvm-svn: 305098
* [X86][SSE] Add support for PACKSS nodes to faux shuffle extractionSimon Pilgrim2017-06-091-273/+265
| | | | | | If the inputs won't saturate during packing then we can treat the PACKSS as a truncation shuffle llvm-svn: 305091
* Reland "[SelectionDAG] Enable target specific vector scalarization of calls ↵Simon Dardis2017-06-094-24/+1697
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | and returns" By target hookifying getRegisterType, getNumRegisters, getVectorBreakdown, backends can request that LLVM to scalarize vector types for calls and returns. The MIPS vector ABI requires that vector arguments and returns are passed in integer registers. With SelectionDAG's new hooks, the MIPS backend can now handle LLVM-IR with vector types in calls and returns. E.g. 'call @foo(<4 x i32> %4)'. Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for calls and returns if vector types were not legal. If vector types were legal, a single 128bit vector argument would be assigned to a single 32 bit / 64 bit integer register. By teaching the MIPS backend to inspect the original types, it can now implement the MIPS vector ABI which requires a particular method of scalarizing vectors. Previously, the MIPS backend relied on clang to scalarize types such as "call @foo(<4 x float> %a) into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3, i32 inreg %4)". This patch enables the MIPS backend to take either form for vector types. The previous version of this patch had a "conditional move or jump depends on uninitialized value". Reviewers: zoran.jovanovic, jaydeep, vkalintiris, slthakur Differential Revision: https://reviews.llvm.org/D27845 llvm-svn: 305083
* [AMDGPU] Fix for issue in alloca to vector promotion passDavid Stuttard2017-06-091-0/+131
| | | | | | | | | | | | | | | Summary: Alloca promotion pass not dealing with non-canonical input Added some additional checks so the pass simply backs-off forms it can't deal with (non-canonical) Also added some test cases in non-canonical form to check that it no longer crashes Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31710 llvm-svn: 305079
* Prevent RemoveDeadNodes from deleted already deleted node.Nirav Dave2017-06-091-0/+83
| | | | | | | | | | | | | | | | | | | | | This prevents against assertion errors like PR32659 which occur from a replacement deleting a node after it's been added to the list argument of RemoveDeadNodes. The specific failure from PR32659 does not currently happen, but it is still potentially possible. The underlying cause is that the callers of the change dfunction builds up a list of nodes to delete after having moved their uses and it possible that a move of a later node will cause a previously deleted nodes to be deleted. Reviewers: bkramer, spatel, davide Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33731 llvm-svn: 305070
* [ARM] Add scheduling info for VFMSOliver Stannard2017-06-091-5/+86
| | | | | | | | | | The scalar VFMS instructions did not have scheduling information attached (but VFMA did), which was causing assertion failures with the Cortex-A57 scheduling model and -fp-contract=fast. Differential Revision: https://reviews.llvm.org/D34040 llvm-svn: 305064
* RegAllocPBQP: Do not assign reserved physical registerMatthias Braun2017-06-081-0/+35
| | | | | | | | | | | | | | | | (0) RegAllocPBQP: Since getRawAllocationOrder() may return a collection that includes reserved physical registers, iterate to find an un-reserved physical register. (1) VirtRegMap: Enforce the invariant: "no reserved physical registers" in assignVirt2Phys(). Previously, this was checked only after the fact in VirtRegRewriter::rewrite. (2) MachineVerifier: updated the test per MatzeB's review. (3) +testcase Patch by Nick Johnson<Nicholas.Paul.Johnson@deshawresearch.com>! Differential Revision: https://reviews.llvm.org/D33947 llvm-svn: 305016
* [Hexagon] Skip mux generation when predicate register is undefinedKrzysztof Parzyszek2017-06-081-0/+27
| | | | llvm-svn: 305014
* [CGP] don't expand a memcmp with nobuiltin attributeSanjay Patel2017-06-081-6/+15
| | | | | | | | | | | | | | | | This matches the behavior used in the SDAG when expanding memcmp. For reference, we're intentionally treating the earlier fortified call transforms differently after: https://bugs.llvm.org/show_bug.cgi?id=23093 https://reviews.llvm.org/rL233776 One motivation for not transforming nobuiltin calls is that it can interfere with sanitizers: https://reviews.llvm.org/D19781 https://reviews.llvm.org/D19801 Differential Revision: https://reviews.llvm.org/D34043 llvm-svn: 305007
* AMDGPU: Use correct register names in inline assemblyMatt Arsenault2017-06-0814-401/+401
| | | | | | Fixes using physical registers in inline asm from clang. llvm-svn: 305004
* [PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64Guozhi Wei2017-06-082-14/+33
| | | | | | | | | | In PPCBoolRetToInt bool value is changed to i32 type. On ppc64 it may introduce an extra zero extension for the return value. This patch changes the integer type to i64 to avoid the zero extension on ppc64. This patch fixed PR32442. Differential Revision: https://reviews.llvm.org/D31407 llvm-svn: 305001
* [AMDGPU] Force qsads instrs to use different dest register than source registersMark Searles2017-06-083-37/+68
| | | | | | | | The V_MQSAD_PK_U16_U8, V_QSAD_PK_U16_U8, and V_MQSAD_U32_U8 take more than 1 pass in hardware. For these three instructions, the destination registers must be different than all sources, so that the first pass does not overwrite sources for the following passes. Differential Revision: https://reviews.llvm.org/D33783 llvm-svn: 304998
* [Power9] Exploit vector integer extend instructionsZaara Syeda2017-06-081-0/+90
| | | | | | | | | | | | | | This patch adds build vector patterns to exploit the vector integer extend instructions: vextsb2w - Vector Extend Sign Byte To Word vextsb2d - Vector Extend Sign Byte To Doubleword vextsh2w - Vector Extend Sign Halfword To Word vextsh2d - Vector Extend Sign Halfword To Doubleword vextsw2d - Vector Extend Sign Word To Doubleword Differential Revision: https://reviews.llvm.org/D33510 llvm-svn: 304992
* [PowerPC] add memcmp test with nobuiltin attr; NFCSanjay Patel2017-06-081-0/+15
| | | | | | | | | | | In SDAG, we don't expand libcalls with a nobuiltin attribute. It's not clear if that's correct from the existing code comment: "Don't do the check if marked as nobuiltin for some reason." ...adding a test here either way to show that there is currently a different behavior implemented in the CGP-based expansion. llvm-svn: 304991
* [x86] remove unused param from tests; NFCSanjay Patel2017-06-081-10/+10
| | | | llvm-svn: 304989
* [CGP / PowerPC] avoid multi-block overhead for simple memcmp expansionSanjay Patel2017-06-081-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The test diff for PowerPC shows we can better optimize if this case is one block. For x86, there's would be a substantial difference if CGP expansion was enabled because branches are assumed cheap and SDAG can't optimize across blocks. Instead of this: _cmp_eq8: movq (%rdi), %rax cmpq (%rsi), %rax je LBB23_1 ## BB#2: ## %res_block movl $1, %ecx jmp LBB23_3 LBB23_1: xorl %ecx, %ecx LBB23_3: ## %endblock xorl %eax, %eax testl %ecx, %ecx sete %al retq We get this: cmp_eq8: movq (%rdi), %rcx xorl %eax, %eax cmpq (%rsi), %rcx sete %al retq And that matches the optimal codegen that we get from the current expansion in SelectionDAGBuilder::visitMemCmpCall(). If this looks right, then I just need to confirm that vector-sized expansion will work from here, and we can enable CGP memcmp() expansion for x86. Ie, we'll bypass the power-of-2 special cases currently optimized in SDAG because we can lower the IR produced here optimally. Differential Revision: https://reviews.llvm.org/D34005 llvm-svn: 304987
* Add scheduler classes to integer/float horizontal operations.Andrew V. Tischenko2017-06-081-16/+16
| | | | | | | This patch will close PR32801. Differential Revision: https://reviews.llvm.org/D33203 llvm-svn: 304986
* [x86] add tests for memcmp expansion; NFCSanjay Patel2017-06-081-37/+293
| | | | | | | | | | | | We already had a test to demonstrate PR33325: https://bugs.llvm.org/show_bug.cgi?id=33325 I'm adding tests for general memcmp expansion (see D34005 / D33963) and: https://bugs.llvm.org/show_bug.cgi?id=33329 ...plus non-power-of-2 sizes, so we can see what that looks like currently or if expanded. llvm-svn: 304979
* This patch closes PR28513: an optimization of multiplication by different ↵Andrew V. Tischenko2017-06-084-360/+4253
| | | | | | | | constants. The initial patch was rejected: I fixed the issue and re-apply it. llvm-svn: 304972
* [ARM] GlobalISel: Add more tests. NFCDiana Picus2017-06-081-0/+149
| | | | | | | | Add a couple of tests to increase coverage for the TableGen'erated code, in particular for rules where 2 generic instructions may be combined into a single machine instruction. llvm-svn: 304971
* [Hexagon] Generate 'inbounds' GEPs in HexagonCommonGEPKrzysztof Parzyszek2017-06-071-0/+20
| | | | llvm-svn: 304937
* [CGP] avoid zext/trunc of a memcmp expansion compareSanjay Patel2017-06-071-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | This could be viewed as another shortcoming of the DAGCombiner: when both operands of a compare are zexted from the same source type, we should be able to compare the original types. The effect on PowerPC perf is likely unnoticeable, but there's a visible regression for x86 if we feed the suboptimal IR for memcmp expansion to the DAG: _cmp_eq4_zexted_to_i64: movl (%rdi), %ecx movl (%rsi), %edx xorl %eax, %eax cmpq %rdx, %rcx sete %al _cmp_eq4_better: movl (%rdi), %ecx xorl %eax, %eax cmpl (%rsi), %ecx sete %al llvm-svn: 304923
* [mips][dsp] Modify repl.ph to accept signed immediate valuesPetar Jovanovic2017-06-071-1/+11
| | | | | | | | | | | | Changed immediate type for repl.ph from uimm10 to simm10 as per the specs. Repl.qb still accepts uimm8. Both instructions now mimic the behaviour of GNU as. Patch by Stefan Maksimovic. Differential Revision: https://reviews.llvm.org/D33594 llvm-svn: 304918
* [X86] Add test to demonstrate inefficient lowering of v48i8 shuffle.Guy Blank2017-06-071-0/+49
| | | | llvm-svn: 304915
* AMDGPU/GlobalISel: Mark 32-bit G_SELECT as legalTom Stellard2017-06-071-0/+28
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33949 llvm-svn: 304910
* [x86] avoid flipping sign bits for vector icmp by using known bitsSanjay Patel2017-06-072-104/+34
| | | | | | | | | | | | | | | | If we know that both operands of an unsigned integer vector comparison are non-negative, then it's safe to directly use a signed-compare-greater-than instruction (the only non-equality integer vector compare predicate provided by SSE/AVX). We're intentionally not changing the condition code to signed in order to preserve the existing transforms that use min/max/psubus below here. This should solve PR33276: https://bugs.llvm.org/show_bug.cgi?id=33276 Differential Revision: https://reviews.llvm.org/D33862 llvm-svn: 304909
* [PowerPC] Eliminate integer compare instructions - vol. 5Nemanja Ivanovic2017-06-075-6/+567
| | | | | | | | Adds handling for i64 SETNE comparison (both sign and zero extended). Differential Revision: https://reviews.llvm.org/D33720 llvm-svn: 304907
* [mips] do not use FastISel when -mxgot is presentPetar Jovanovic2017-06-071-0/+3
| | | | | | | | | | | | | | | The clang compiler by default uses FastISel when invoked with -O0, which is also the default. In that case, passing of -mxgot does not get honored, i.e. the code path that is to deal with large got is not taken. Clang produces same output regardless of -mxgot being present or not. This change checks whether -mxgot is passed as an option, and turns off FastISel if it is. Patch by Stefan Maksimovic. Differential Revision: https://reviews.llvm.org/D33593 llvm-svn: 304906
* [ARM] GlobalISel: Purge G_SEQUENCEDiana Picus2017-06-075-64/+46
| | | | | | | | | | | | | | | | | According to the commit message from r296921, G_MERGE_VALUES and G_INSERT are to be preferred over G_SEQUENCE. Therefore, stop generating G_SEQUENCE in the ARM backend and remove the code dealing with it. This boils down to the code breaking up double values for the soft float calling convention. Use G_MERGE_VALUES + G_UNMERGE_VALUES instead of G_SEQUENCE + G_EXTRACT for it. This maps very nicely to VMOVDRR + VMOVRRD and simplifies the code in the instruction selector. There's one occurence of G_SEQUENCE left in arm-irtranslator.ll, but that is part of the target-independent code for translating constant structs. Therefore, it is beyond the scope of this commit. llvm-svn: 304902
* [PowerPC] Eliminate integer compare instructions - vol. 3Nemanja Ivanovic2017-06-079-39/+798
| | | | | | | | Adds handling for i32 SETNE comparison (both sign and zero extended). Differential Revision: https://reviews.llvm.org/D33718 llvm-svn: 304901
* [ARM] GlobalISel: Support G_XORDiana Picus2017-06-074-0/+168
| | | | | | | | | Same as the other binary operators: - legalize to 32 bits - map to GPRs - select to EORrr via TableGen'erated code llvm-svn: 304898
* evert "[mips] Fix test mips64fpldst.ll with machine verifier enabled"Simon Dardis2017-06-078-19/+41
| | | | | | | This reverts commit r301394. It broke some internal buildbots, reverting while the issue is being investigated. llvm-svn: 304896
* [X86][SSE] Fix an issue with PEXTRW/PEXTRB indices during shuffle combiningSimon Pilgrim2017-06-071-40/+4
| | | | | | We were checking that the index was in range of the destination vector type, not the (larger) source vector type llvm-svn: 304894
* [ARM] GlobalISel: Support G_ORDiana Picus2017-06-074-0/+168
| | | | | | | | | Same as the other binary operators: - legalize to 32 bits - map to GPRs - select ORRrr thanks to TableGen'erated code llvm-svn: 304890
* [ARM] GlobalISel: Support G_ANDDiana Picus2017-06-074-0/+170
| | | | | | | | | This is identical to the support for the other binary operators: - widen to s32 - map into GPR - select ANDrr (via TableGen'erated code) llvm-svn: 304885
* [CGP / PowerPC] use direct compares if there's only one load per block in ↵Sanjay Patel2017-06-071-20/+14
| | | | | | | | | | | | | | | | | memcmp() expansion I'd like to enable CGP memcmp expansion for x86, but the output from CGP would regress the special cases (memcmp(x,y,N) != 0 for N=1,2,4,8,16,32 bytes) that we already handle. I'm not sure if we'll actually be able to produce the optimal code given the block-at-a-time limitation in the DAG. We might have to just avoid those special-cases here in CGP. But regardless of that, I think this is a win for the more general cases. http://rise4fun.com/Alive/cbQ Differential Revision: https://reviews.llvm.org/D33963 llvm-svn: 304849
* [PowerPC] auto-generate full checks and increase test coverageSanjay Patel2017-06-061-77/+160
| | | | | | | 3 of the tests were testing exactly the same thing: memcmp(x, y, 16) != 0. I changed that to test 4, 7, and 16 bytes, so we can see how those differ. llvm-svn: 304838
* Added tests for X86InterleavedStore.Evgeny Stupachenko2017-06-061-0/+93
| | | | | | | | | | Reviewers: RKSimon, DavidKreitzer Differential Revision: https://reviews.llvm.org/D33684 Patch by: Aleen Farhana <Farhana.aleen@gmail.com> llvm-svn: 304834
* llc: Add ability to parse mir from stdinMatthias Braun2017-06-061-0/+20
| | | | | | | | - Add -x <language> option to switch between IR and MIR inputs. - Change MIR parser to read from stdin when filename is '-'. - Add a simple mir roundtrip test. llvm-svn: 304825
* Fix PR23384 (part 3 of 3)Evgeny Stupachenko2017-06-067-67/+67
| | | | | | | | | | | | | Summary: The patch makes instruction count the highest priority for LSR solution for X86 (previously registers had highest priority). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30562 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 304824
* MIRPrinter: Avoid assert() when printing empty INLINEASM strings.Matthias Braun2017-06-061-0/+12
| | | | | | | | | | | CodeGen uses MO_ExternalSymbol to represent the inline assembly strings. Empty strings for symbol names appear to be invalid. For now just special case the output code to avoid hitting an `assert()` in `printLLVMNameWithoutPrefix()`. This fixes https://llvm.org/PR33317 llvm-svn: 304815
* [mips] Add madd4 subtarget featurePetar Jovanovic2017-06-061-222/+233
| | | | | | | | | | | Addition of a feature and a predicate used to control generation of madd.fmt and similar instructions. Patch by Stefan Maksimovic. Differential Revision: https://reviews.llvm.org/D33400 llvm-svn: 304801
* [X86][AVX1] Split 256-bit vector non-temporal FastISel loads to keep it ↵Simon Pilgrim2017-06-061-6/+30
| | | | | | | | non-temporal (PR32744) Extension to D33728 llvm-svn: 304798
* AMDGPU/GlobalISel: Mark 32-bit G_ICMP as legalTom Stellard2017-06-061-0/+24
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33890 llvm-svn: 304797
OpenPOWER on IntegriCloud