summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* [AVX-512] Add more integer vector comparison tests with loads. Some of these ↵Craig Topper2016-09-091-0/+198
| | | | | | | | show opportunities where we can commute to fold loads. Commutes will be added in a followup commit. llvm-svn: 281012
* [X86] Add more baseline tests for "irregular" shuffles. NFC.Michael Kuperstein2016-09-091-13/+1136
| | | | | | | This adds more tests for shuffles where the output width does not match the input width and/or the output is generated from more than two inputs. llvm-svn: 281005
* Win64: Don't use REX prefix for direct tail callsHans Wennborg2016-09-083-3/+3
| | | | | | | | | | The REX prefix should be used on indirect jmps, but not direct ones. For direct jumps, the unwinder looks at the offset to determine if it's inside the current function. Differential Revision: https://reviews.llvm.org/D24359 llvm-svn: 281003
* [RDF] Further improve handling of multiple phis reached from shadowsKrzysztof Parzyszek2016-09-081-0/+40
| | | | llvm-svn: 280987
* [Hexagon] Expand sext- and zextloads of vector types, not just extloadsKrzysztof Parzyszek2016-09-081-0/+10
| | | | | | Recent change exposed this issue, breaking the Hexagon buildbots. llvm-svn: 280973
* AMDGPU: Try to commute when selecting s_addk_i32/s_mulk_i32Matt Arsenault2016-09-082-0/+32
| | | | llvm-svn: 280972
* AMDGPU: Support commuting with immediate in src0Matt Arsenault2016-09-0830-79/+80
| | | | llvm-svn: 280970
* Revert "[XRay] ARM 32-bit no-Thumb support in LLVM"Renato Golin2016-09-082-48/+0
| | | | | | | | | | And associated commits, as they broke the Thumb bots. This reverts commit r280935. This reverts commit r280891. This reverts commit r280888. llvm-svn: 280967
* [SDAGBuilder] Don't create a binary tree for switches in minsize modeJames Molloy2016-09-081-0/+34
| | | | | | This bloats codesize - all of the non-leaf nodes are extra code. llvm-svn: 280932
* [Thumb1] AND with a constant operand can be converted into BICJames Molloy2016-09-081-0/+18
| | | | | | | So model the cost of materializing the constant operand C as the minimum of C and ~C. llvm-svn: 280929
* [Thumb1] Fix cost calculation for complemented immediatesJames Molloy2016-09-081-0/+21
| | | | | | | | | | | | Materializing something like "-3" can be done as 2 instructions: MOV r0, #3 MVN r0, r0 This has a cost of 2, not 3. It looks like we were already trying to detect this pattern in TII::getIntImmCost(), but were taking the complement of the zero-extended value instead of the sign-extended value which is unlikely to ever produce a number < 256. There were no tests failing after changing this... :/ llvm-svn: 280928
* [SelectionDAG] Add BUILD_VECTOR support to computeKnownBits and ↵Simon Pilgrim2016-09-083-14/+10
| | | | | | | | | | | | SimplifyDemandedBits Add the ability to computeKnownBits and SimplifyDemandedBits to extract the known zero/one bits from BUILD_VECTOR, returning the known bits that are shared by every vector element. This is an initial step towards determining the sign bits of a vector (PR29079). Differential Revision: https://reviews.llvm.org/D24253 llvm-svn: 280927
* [DAGCombiner] Enable AND combines of splatted constant vectorsSimon Pilgrim2016-09-081-4/+2
| | | | | | | | Allow AND combines to use a vector splatted constant as well as a constant scalar. Preliminary part of D24253. llvm-svn: 280926
* Revert "[ARM] Lower UDIV+UREM to UDIV+MLS (and the same for SREM)"Pablo Barrio2016-09-081-43/+0
| | | | | | | | | | This reverts commit r280808. It is possible that this change results in an infinite loop. This is causing timeouts in some tests on ARM, and a Chromebook bot is failing. llvm-svn: 280918
* [CGP] Be less conservative about tail-duplicating a ret to allow tail callsMichael Kuperstein2016-09-081-0/+20
| | | | | | | | | | | | | | | | | CGP tail-duplicates rets into blocks that end with a call that feed the ret. This puts the call in tail position, potentially allowing the DAG builder to lower it as a tail call. To avoid tail duplication in cases where we won't form the tail call, CGP tried to predict whether this is going to be possible, and avoids doing it when lowering as a tail call will definitely fail. However, it was being too conservative by always throwing away calls to functions with a signext/zeroext attribute on the return type. Instead, we can use the same logic the builder uses to determine whether the attributes work out. Differential Revision: https://reviews.llvm.org/D24315 llvm-svn: 280894
* [XRay] ARM 32-bit no-Thumb support in LLVMDean Michael Berris2016-09-082-0/+48
| | | | | | | | | | | | This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter. This is one of 3 commits to different repositories of XRay ARM port. The other 2 are: 1. https://reviews.llvm.org/D23932 (Clang test) 2. https://reviews.llvm.org/D23933 (compiler-rt) Differential Revision: https://reviews.llvm.org/D23931 llvm-svn: 280888
* Shift-left (ISD::SHL) operation crashes on "DAG Legalization" phase.Elena Demikhovsky2016-09-071-0/+33
| | | | | | | | | | | https://llvm.org/bugs/show_bug.cgi?id=29058. While node legalization we tried to legalize its operands. If an operand node is replaced during legalization the user node may be destroyed. Differential Revision: https://reviews.llvm.org/D24244 llvm-svn: 280862
* [RDF] Fix liveness analysis for phi nodes with shadow usesKrzysztof Parzyszek2016-09-071-0/+64
| | | | | | | | Shadow uses need to be analyzed together, since each individual shadow will only have a partial reaching def. All shadows together may cover a given register ref, while each individual shadow may not. llvm-svn: 280855
* Rename test pr30298.ll to shrink_vmul_sse.ll, to make the name more ↵Wei Mi2016-09-071-0/+4
| | | | | | | | meaningful, NFC. Add PR number and comment in pr30298.ll to explain what is testing. llvm-svn: 280843
* Don't reduce the width of vector mul if the target doesn't support SSE2.Wei Mi2016-09-071-0/+43
| | | | | | | | | The patch is to fix PR30298, which is caused by rL272694. The solution is to bail out if the target has no SSE2. Differential Revision: https://reviews.llvm.org/D24288 llvm-svn: 280837
* Add more triple to conditional-tailcall.ll testHans Wennborg2016-09-071-1/+1
| | | | llvm-svn: 280835
* CodeGen: ensure that libcalls are always AAPCS CCSaleem Abdulrasool2016-09-072-2/+2
| | | | | | | The original commit was too aggressive about marking LibCalls as AAPCS. The libcalls contain libc/libm/libunwind calls which are not AAPCS, but C. llvm-svn: 280833
* X86: Fold tail calls into conditional branches where possible (PR26302)Hans Wennborg2016-09-071-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When branching to a block that immediately tail calls, it is possible to fold the call directly into the branch if the call is direct and there is no stack adjustment, saving one byte. Example: define void @f(i32 %x, i32 %y) { entry: %p = icmp eq i32 %x, %y br i1 %p, label %bb1, label %bb2 bb1: tail call void @foo() ret void bb2: tail call void @bar() ret void } before: f: movl 4(%esp), %eax cmpl 8(%esp), %eax jne .LBB0_2 jmp foo .LBB0_2: jmp bar after: f: movl 4(%esp), %eax cmpl 8(%esp), %eax jne bar .LBB0_1: jmp foo I don't expect any significant size savings from this (on a Clang bootstrap I saw 288 bytes), but it does make the code a little tighter. This patch only does 32-bit, but 64-bit would work similarly. Differential Revision: https://reviews.llvm.org/D24108 llvm-svn: 280832
* AMDGPU: Add hidden kernel arguments to runtime metadataYaxun Liu2016-09-071-31/+1326
| | | | | | | | OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden argument should be included in the runtime metadata. Also updated kernel argument kind metadata. Differential Revision: https://reviews.llvm.org/D23424 llvm-svn: 280829
* Fix typo in test - it should be masking bits0-15 not bit16Simon Pilgrim2016-09-071-1/+1
| | | | llvm-svn: 280816
* [X86][SSE] Added or combine tests for known bits of vectorsSimon Pilgrim2016-09-071-0/+51
| | | | | | Part of the yak shaving for D24253 llvm-svn: 280813
* [X86][SSE] Added and+or+zext combine tests for known bits of vectorsSimon Pilgrim2016-09-071-0/+32
| | | | | | Part of the yak shaving for D24253 llvm-svn: 280810
* [X86][SSE] Added and+or combine tests currently failing with vectorsSimon Pilgrim2016-09-071-1/+28
| | | | | | | | (and (or x, C), D) -> D if (C & D) == D Part of the yak shaving for D24253 llvm-svn: 280809
* [ARM] Lower UDIV+UREM to UDIV+MLS (and the same for SREM)Pablo Barrio2016-09-071-0/+43
| | | | | | | | | | | | | | | Summary: This saves a library call to __aeabi_uidivmod. However, the processor must feature hardware division in order to benefit from the transformation. Reviewers: scott-0, jmolloy, compnerd, rengolin Subscribers: t.p.northover, compnerd, aemerson, rengolin, samparker, llvm-commits Differential Revision: https://reviews.llvm.org/D24133 llvm-svn: 280808
* [mips] Disable the TImode shift libcalls for 32-bit targets.Vasileios Kalintiris2016-09-073-9/+9
| | | | | | | | | | | | | | Summary: The o32 ABI doesn't not support the TImode helpers. For the time being, disable just the shift libcalls as they break recursive builds on MIPS. Reviewers: sdardis Subscribers: llvm-commits, sdardis Differential Revision: https://reviews.llvm.org/D24259 llvm-svn: 280798
* [PowerPC] Fix address-offset folding for plain addiHal Finkel2016-09-071-0/+40
| | | | | | | | | | | | | | | | | | | | | | | When folding an addi into a memory access that can take an immediate offset, we were implicitly assuming that the existing offset was zero. This was incorrect. If we're dealing with an addi with a plain constant, we can add it to the existing offset (assuming that doesn't overflow the immediate, etc.), but if we have anything else (i.e. something that will become a relocation expression), we'll go back to requiring the existing immediate offset to be zero (because we don't know what the requirements on that relocation expression might be - e.g. maybe it is paired with some addis in some relevant way). On the other hand, when dealing with a plain addi with a regular constant immediate, the alignment restrictions (from the TOC base pointer, etc.) are irrelevant. I've added the test case from PR30280, which demonstrated the bug, but also demonstrates a missed optimization opportunity (i.e. we don't need the memory accesses at all). Fixes PR30280. llvm-svn: 280789
* AVX512F: FMA intrinsic + FNEG - sequence optimizationElena Demikhovsky2016-09-071-20/+15
| | | | | | | | | | | The previous commit (r280368 - https://reviews.llvm.org/D23313) does not cover AVX-512F, KNL set. FNEG(x) operation is lowered to (bitcast (vpxor (bitcast x), (bitcast constfp(0x80000000))). It happens because FP XOR is not supported for 512-bit data types on KNL and we use integer XOR instead. I added pattern match for integer XOR. Differential Revision: https://reviews.llvm.org/D24221 llvm-svn: 280785
* [AVX-512] Add support for commuting masked instructions in ↵Craig Topper2016-09-071-0/+52
| | | | | | findCommutedOpIndices. The default implementation doesn't skip the mask input or the preserved input. llvm-svn: 280781
* Revert "CodeGen: ensure that libcalls are always AAPCS CC"Saleem Abdulrasool2016-09-075-176/+171
| | | | | | | This reverts SVN r280683. Revert until I figure out why this is breaking lli tests. llvm-svn: 280778
* [DAGCombine] More fixups to SETCC legality checking (visitANDLike/visitORLike)Hal Finkel2016-09-061-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I might have called this "r246507, the sequel". It fixes the same issue, as the issue has cropped up in a few more places. The underlying problem is that isSetCCEquivalent can pick up select_cc nodes with a result type that is not legal for a setcc node to have, and if we use that type to create new setcc nodes, nothing fixes that (and so we've violated the contract that the infrastructure has with the backend regarding setcc node types). Fixes PR30276. For convenience, here's the commit message from r246507, which explains the problem is greater detail: [DAGCombine] Fixup SETCC legality checking SETCC is one of those special node types for which operation actions (legality, etc.) is keyed off of an operand type, not the node's value type. This makes sense because the value type of a legal SETCC node is determined by its operands' value type (via the TLI function getSetCCResultType). When the SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value type, or directly with the value type provided by TLI.getSetCCResultType. The first problem being fixed here is that DAGCombine had several places querying TLI.isOperationLegal on SETCC, but providing the return of getSetCCResultType, instead of the operand type directly. This does not mean what the author thought, and "luckily", most in-tree targets have SETCC with Custom lowering, instead of marking them Legal, so these checks return false anyway. The second problem being fixed here is that two of the DAGCombines could create SETCC nodes with arbitrary (integer) value types; specifically, those that would simplify: (setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3 (which is possible for some combinations of (op1, op2)) If the operands of the and|or node are actual setcc nodes, then this is not an issue (because the and|or must share the same type), but, the relevant code in DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls DAGCombiner::isSetCCEquivalent on each operand, and that function will recognise setcc-like select_cc nodes with other return types. And, thus, when creating new SETCC nodes, we need to be careful to respect the value-type constraint. This is even true before type legalization, because it is quite possible for the SELECT_CC node to have a legal type that does not happen to match the corresponding TLI.getSetCCResultType type. To be explicit, there is nothing that later fixes the value types of SETCC nodes (if the type is legal, but does not happen to match TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to work only because, either MVT::i1 is not legal, or it is what TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change, however. For the time being, restrict the relevant transformations to produce only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1 prior to type legalization). Fixes PR24636. llvm-svn: 280767
* [AMDGPU] Wave and register controlsKonstantin Zhuravlyov2016-09-061-0/+190
| | | | | | - Add missing test llvm-svn: 280749
* [AMDGPU] Wave and register controlsKonstantin Zhuravlyov2016-09-0618-62/+299
| | | | | | | | | | | | | | - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747
* AMDGPU : Add XNACK feature to GPUs that support it.Wei Ding2016-09-061-1/+2
| | | | | | Differential Revision: http://reviews.llvm.org/D24276 llvm-svn: 280742
* [RDF] Ignore undef use operandsKrzysztof Parzyszek2016-09-061-0/+55
| | | | llvm-svn: 280717
* [SelectionDAG] Simplify extract_subvector( insert_subvector ( Vec, In, Idx ↵Simon Pilgrim2016-09-061-9/+4
| | | | | | | | | | | | ), Idx ) -> In If we are extracting a subvector that has just been inserted then we should just use the original inserted subvector. This has come up in certain several x86 shuffle lowering cases where we are crossing 128-bit lanes. Differential Revision: https://reviews.llvm.org/D24254 llvm-svn: 280715
* [mips] Tighten FastISel restrictionsSimon Dardis2016-09-061-0/+14
| | | | | | | | | | | | | | | | | | LLVM PR/29052 highlighted that FastISel for MIPS attempted to lower arguments assuming that it was using the paired 32bit registers to perform operations for f64. This mode of operation is not supported for MIPSR6. This patch resolves the reported issue by adding additional checks for unsupported floating point unit configuration. Thanks to mike.k for reporting this issue! Reviewers: seanbruno, vkalintiris Differential Review: https://reviews.llvm.org/D23795 llvm-svn: 280706
* [PPC] Claim stack frame before storing into it, if no red zone is presentKrzysztof Parzyszek2016-09-065-21/+29
| | | | | | | | | | | | | Unlike PPC64, PPC32/SVRV4 does not have red zone. In the absence of it there is no guarantee that this part of the stack will not be modified by any interrupt. To avoid this, make sure to claim the stack frame first before storing into it. This fixes https://llvm.org/bugs/show_bug.cgi?id=26519. Differential Revision: https://reviews.llvm.org/D24093 llvm-svn: 280705
* [AVX-512] Fix masked VPERMI2PS isel when the index comes from a bitcast.Craig Topper2016-09-061-3/+2
| | | | | | We need to bitcast the index operand to a floating point type so that it matches the result type. If not then the passthru part of the DAG will be a bitcast from the index's original type to the destination type. This makes it very difficult to match. The other option would be to add 5 sets of patterns for every other possible type. llvm-svn: 280696
* [AVX-512] Add a test case to show that we don't select masked vpermi2ps when ↵Craig Topper2016-09-061-4/+17
| | | | | | | | | | the index operand comes from a bitcast. It doesn't work because we're looking for a bitcast from the v4i32 index operand to v4f32 for the passthru part of the DAG. But since the index is bitcasted from v2i64 and bitcasts fold, we actually have a bitcast from v2i64 to v4f32 in the passthru part of the DAG. Taken from optimized output from clang's test case. llvm-svn: 280695
* ARM: workaround bundled operation predicationSaleem Abdulrasool2016-09-061-0/+24
| | | | | | | | | | | | | | | | | | | | | | This is a Windows ARM specific issue. If the code path in the if conversion ends up using a relocation which will form a IMAGE_REL_ARM_MOV32T, we end up with a bundle to ensure that the mov.w/mov.t pair is not split up. This is normally fine, however, if the branch is also predicated, then we end up trying to predicate the bundle. For now, report a bundle as being unpredicatable. Although this is false, this would trigger a failure case previously anyways, so this is no worse. That is, there should not be any code which would previously have been if converted and predicated which would not be now. Under certain circumstances, it may be possible to "predicate the bundle". This would require scanning all bundle instructions, and ensure that the bundle contains only predicatable instructions, and converting the bundle into an IT block sequence. If the bundle is larger than the maximal IT block length (4 instructions), it would require materializing multiple IT blocks from the single bundle. llvm-svn: 280689
* [AVX-512] Fix v8i64 shift by immediate lowering on 32-bit targets.Craig Topper2016-09-063-17/+16
| | | | llvm-svn: 280684
* CodeGen: ensure that libcalls are always AAPCS CCSaleem Abdulrasool2016-09-065-171/+176
| | | | | | | | | | | | | All of the builtins are designed to be invoked with ARM AAPCS CC even on ARM AAPCS VFP CC hosts. Tweak the default initialisation to ARM AAPCS CC rather than C CC for ARM/thumb targets. The changes to the tests are necessary to ensure that the calling convention for the lowered library calls are honoured. Furthermore, these adjustments cause certain branch invocations to change to branch-and-link since the returned value needs to be moved across registers (d0 -> r0, r1). llvm-svn: 280683
* [AVX-512] Teach fastisel load/store handling to use EVEX encoded ↵Craig Topper2016-09-052-90/+314
| | | | | | | | instructions for 128/256-bit vectors and scalar single/double. Still need to fix the register classes to allow the extended range of registers. llvm-svn: 280682
* [X86] Update fast-isel store test to have more 256 and 512-bit test cases. ↵Craig Topper2016-09-051-18/+719
| | | | | | Add command lines for AVX and AVX512 feature sets. llvm-svn: 280681
* [X86] Update fast-isel vector load test to have more 256 and 512-bit test ↵Craig Topper2016-09-051-82/+651
| | | | | | cases. Add a command line for SKX features too. llvm-svn: 280680
OpenPOWER on IntegriCloud