path: root/llvm/test/CodeGen
Commit message | Author | Age | Files | Lines
* AMDGPU: Bring elf flags in sync with the spec | Konstantin Zhuravlyov | 2018-02-16 | 4 | -49/+124
| - Add MACH flags
| - Add XNACK flag
| - Add reserved flags
| - Minor cleanups in docs
|
| Differential Revision: https://reviews.llvm.org/D43356
| llvm-svn: 325399
* AMDGPU: Bring processors and features in sync with the spec | Konstantin Zhuravlyov | 2018-02-16 | 6 | -23/+21
| - Remove gfx800
| - Make iceland gfx802
| - Add xnack to gfx902
|
| Differential Revision: https://reviews.llvm.org/D43355
| llvm-svn: 325393
* [AArch64] Fix BITCAST lowering crash | Evandro Menezes | 2018-02-16 | 1 | -0/+28
| The data type is assumed to be a vector, but sometimes it is not, leading to an assertion failure. Add a simple test case to verify this.
|
| Differential revision: https://reviews.llvm.org/D42599
| llvm-svn: 325378
* AMDGPU/SI: Extend promoting alloca to vector to arrays of up to 16 elements | Changpeng Fang | 2018-02-16 | 7 | -27/+29
| Summary:
| This patch extends the promotion of alloca to vector to arrays of up to 16 elements. It also introduces an option, -disable-promote-alloca-to-vector, to switch promotion to vector off if needed.
|
| Reviewers: arsenm
| Differential Revision: https://reviews.llvm.org/D33559
| llvm-svn: 325372
* [X86] Only reorder srl/and on last DAG combiner run | Craig Topper | 2018-02-16 | 4 | -28/+33
| This seems to interfere with a target independent brcond combine that looks for the (srl (and X, C1), C2) pattern to enable TEST instructions. Once we flip, that combine doesn't fire and we end up exposing it to the X86 specific BT combine which causes us to emit a BT instruction. BT has lower throughput than TEST.
|
| We could try to make the brcond combine aware of the alternate pattern, but since the flip was just a code size reduction and not likely to enable other combines, it seemed easier to just delay it until after lowering.
|
| Differential Revision: https://reviews.llvm.org/D43201
| llvm-svn: 325371
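The two node orders mentioned in the commit above compute the same value for logical shifts, so the choice between them only affects which later combines fire. A minimal Python sketch of that equivalence on 32-bit values follows. The function names are illustrative and not LLVM APIs, and the exact flipped form, (and (srl X, C2), C1 >> C2), is an assumption about what the X86 combine produces.

```python
MASK32 = 0xFFFFFFFF  # model i32 values as Python ints masked to 32 bits

def srl_of_and(x, c1, c2):
    """(srl (and X, C1), C2): mask first, then logical shift right."""
    return ((x & c1) & MASK32) >> c2

def and_of_srl(x, c1, c2):
    """(and (srl X, C2), C1 >> C2): shift first, then apply the shifted mask."""
    return ((x & MASK32) >> c2) & ((c1 & MASK32) >> c2)
```

Both functions return the same result for any 32-bit x and c1 and any shift amount from 0 to 31.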
* AMDGPU/SI: Turn off GPR Indexing Mode immediately after the interested instruction. | Changpeng Fang | 2018-02-16 | 1 | -39/+23
| Summary:
| In the current implementation of GPR Indexing Mode, when the index is non-uniform, the s_set_gpr_idx_off instruction is incorrectly inserted after the loop. This causes instructions with vgpr operands (v_readfirstlane, for example) to read the wrong vgpr. This patch fixes the issue by inserting s_set_gpr_idx_on/off immediately around the instruction of interest.
|
| Reviewers: rampitec
| Differential Revision: https://reviews.llvm.org/D43297
| llvm-svn: 325355
* [SelectionDAG] Enable SimplifyDemandedVectorElts support for simplifying shuffle masks | Simon Pilgrim | 2018-02-16 | 3 | -55/+31
| Based on the DemandedElts mask and the UNDEF elements returned from the SimplifyDemandedVectorElts calls to the shuffle operands, we can attempt to simplify the shuffle mask.
|
| I had to be very conservative here, as accepting post-legalized shuffle masks could cause problems for targets that legalize UNDEF mask elements back to in-range values (PowerPC). Similarly, combining to identity shuffle masks could cause too much UNDEF information to disappear for later combines.
|
| llvm-svn: 325354
* [X86][SSE] Allow float domain crossing if we are merging 2 or more shuffles and the root started as a float domain shuffle | Simon Pilgrim | 2018-02-16 | 5 | -43/+26
| llvm-svn: 325349
* [mips] Remove codegen support from some 16 bit instructions | Simon Dardis | 2018-02-16 | 6 | -319/+266
| These instructions conflict with their full length variants for the purposes of FastISel, as they cannot be distinguished based on the number and type of operands and predicates.
|
| Reviewers: atanasyan
| Differential Revision: https://reviews.llvm.org/D41285
| llvm-svn: 325341
* [SelectionDAG] Add initial SimplifyDemandedVectorElts support for simplifying VSELECT operands | Simon Pilgrim | 2018-02-16 | 1 | -4/+0
| This just adds a basic pass through - we can add constant selection mask handling in a future patch to fully match InstCombine.
|
| llvm-svn: 325338
* [ARM] Return true in enableMultipleCopyHints(). | Jonas Paulsson | 2018-02-16 | 7 | -147/+144
| Enable multiple COPY hints to eliminate more COPYs during register allocation.
|
| Note that this is something all targets should do, see https://reviews.llvm.org/D38128.
|
| Review: Eli Friedman
| llvm-svn: 325327
* [LegalizeDAG] Fix legalization of SETCC | Mikhail Maltsev | 2018-02-16 | 1 | -0/+23
| Summary:
| Currently when expanding a SETCC node into a SELECT_CC, LLVM uses an incorrect type for determining BooleanContent of the result. This patch fixes the issue.
|
| Fixes PR36079.
|
| Reviewers: rogfer01, javed.absar, efriedma
| Reviewed By: efriedma
| Subscribers: llvm-commits
| Differential Revision: https://reviews.llvm.org/D43282
| llvm-svn: 325325
* [ARM] Materialise some boolean values to avoid a branch | Roger Ferrer Ibanez | 2018-02-16 | 20 | -713/+840
| This patch combines some cases of ARMISD::CMOV for integers that arise in comparisons of the form
|
|   a != b ? x : 0
|   a == b ? 0 : x
|
| and that currently (e.g. in Thumb1) are emitted as branches.
|
| Differential Revision: https://reviews.llvm.org/D34515
| llvm-svn: 325323
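To illustrate how a select of the form a != b ? x : 0 can be computed without a branch, here is a minimal Python sketch on 32-bit values. It uses a generic mask trick and is not claimed to be the instruction sequence the patch emits; the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF  # model i32 values as Python ints masked to 32 bits

def select_ne_branchless(a, b, x):
    """Compute (a != b ? x : 0) using only arithmetic and bitwise operations."""
    diff = (a - b) & MASK32                     # zero exactly when a == b
    nonzero = ((diff | -diff) & MASK32) >> 31   # 1 if diff != 0, else 0
    all_ones = (-nonzero) & MASK32              # 0xFFFFFFFF or 0
    return x & all_ones
```

The second form from the commit message, a == b ? 0 : x, yields the same value, so one materialisation covers both.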
* [X86] Allow CMOVs of constants to be sign extended from i32. | Craig Topper | 2018-02-16 | 2 | -19/+13
| Sign extending i32 constants only requires a REX prefix as does widening the CMOV. This is cheaper than the explicit sign extend op.
|
| llvm-svn: 325318
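The idea of widening a CMOV of constants can be modelled as pushing the sign extension into both arms of the select. A rough Python sketch, with hypothetical helper names, showing that the two forms agree:

```python
def sext32_to_64(c):
    """Sign extend a 32-bit bit pattern to a 64-bit bit pattern."""
    c &= 0xFFFFFFFF
    return (c | 0xFFFFFFFF00000000) if c & 0x80000000 else c

def sext_of_select(cond, c1, c2):
    """sext(cond ? c1 : c2): select on i32 constants, then extend the result."""
    return sext32_to_64(c1 if cond else c2)

def select_of_sext(cond, c1, c2):
    """cond ? sext(c1) : sext(c2): extend the constants, then select on i64."""
    return sext32_to_64(c1) if cond else sext32_to_64(c2)
```

Because the constants are known at compile time, the second form needs no extension at run time.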
* [X86] Don't zero_extend cmov up to i64, stop at i32. | Craig Topper | 2018-02-16 | 1 | -1/+1
| Zero extend from i32 to i64 is free. So extend from i16 to i32, and then use a free zero extend to finish.
|
| llvm-svn: 325317
* [X86] Add the test cases that were supposed to go with r325287. | Craig Topper | 2018-02-16 | 1 | -0/+242
| llvm-svn: 325306
* [AMDGPU] Combine adjacent waitcounts in a single strongest wait | Stanislav Mekhanoshin | 2018-02-15 | 1 | -2/+1
| Differential Revision: https://reviews.llvm.org/D43350
| llvm-svn: 325299
* [X86] Enable BT to be used in place of TEST for single bit checks under optsize | Craig Topper | 2018-02-15 | 1 | -6/+6
| We already do this for 64-bit when it won't fit into a 64-bit AND/TEST's immediate field. This adds an additional qualifier to do it for any single-bit constant larger than 8 bits under optsize.
|
| Differential Revision: https://reviews.llvm.org/D43346
| llvm-svn: 325290
* [DAGCombiner] Call ExtendUsesToFormExtLoad in (zext (and (load)))->(and (zextload)) even when the and does not have multiple uses | Craig Topper | 2018-02-15 | 1 | -3/+4
| Same for the sign extend case.
|
| Currently we check for multiple uses on the binop. Then we call ExtendUsesToFormExtLoad to capture SetCCs that use the load. So we only end up finding any setccs when the and has additional uses and the load is used by a setcc. I don't think the and having multiple uses is relevant here. I think we should only be checking for the load having multiple uses.
|
| This changes an NVPTX test because we now find that the load has a second use by a truncate, but ExtendUsesToFormExtLoad only looks at setccs it can extend. All other operations just check isTruncateFree. Maybe we should allow widening of an existing truncate even if it's not free?
|
| Differential Revision: https://reviews.llvm.org/D43063
| llvm-svn: 325289
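The fold named in the commit title rests on the fact that zero extension distributes over a bitwise AND. A minimal Python sketch of that identity, assuming a 16-bit load widened to a larger type; the helper names are hypothetical.

```python
def zext(value, from_bits):
    """Zero extend: keep the low from_bits bits, upper bits become zero."""
    return value & ((1 << from_bits) - 1)

def zext_of_and(loaded, c, bits=16):
    """(zext (and (load), C)): AND in the narrow type, then widen."""
    narrow = (loaded & c) & ((1 << bits) - 1)
    return zext(narrow, bits)

def and_of_zextload(loaded, c, bits=16):
    """(and (zextload), zext(C)): widen the load, then AND in the wide type."""
    return zext(loaded, bits) & zext(c, bits)
```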
* [ARM] Fix redirect in inline assembly test | Pablo Barrio | 2018-02-15 | 1 | -1/+1
| Summary: Fix silly mistake in a test
|
| Reviewers: gkistanova, apilipenko
| Subscribers: javed.absar, eraman, kristof.beyls, llvm-commits
| Differential Revision: https://reviews.llvm.org/D43342
| llvm-svn: 325283
* [X86] Add test cases for opportunities for using BT instead of TEST under optsize. | Craig Topper | 2018-02-15 | 1 | -0/+372
| llvm-svn: 325277
* [X86][SSE] Add saturated truncation tests for storing illegal v8i8 types | Simon Pilgrim | 2018-02-15 | 3 | -0/+1605
| Tests showing missed opportunities to use PACK instructions in cases where we need to truncate to illegal types for stores.
|
| llvm-svn: 325270
* bpf: fix a bug in dag2dag optimization for loads from readonly section | Yonghong Song | 2018-02-15 | 1 | -0/+53
| The reference '&' is missing in the function parameter. If there are back-to-back optimizations in terms of dag node list like below:
|
|   t29: i64,ch = load<LD4[bitcast (%struct.test_t* @test.t to i8*)+12](dereferenceable), zext from i32> t3, t43, undef:i64
|   t34: i64,ch = load<LD4[bitcast (%struct.test_t* @test.t to i8*)](dereferenceable), zext from i32> t3, t41, undef:i64
|
| The bug will trigger a segfault for the added test case remove_truncate_5.ll:
|
|   LLVMSymbolizer: error reading file: No such file or directory
|   #0 0x000000000241c4d9 (llc+0x241c4d9)
|   #1 0x000000000241c56a (llc+0x241c56a)
|   #2 0x000000000241aa50 (llc+0x241aa50)
|   ...
|   #22 0x0000000000fd5edf (llc+0xfd5edf)
|   #23 0x00007f0fe03bec05 __libc_start_main (/lib64/libc.so.6+0x21c05)
|   #24 0x0000000000fd3e69 (llc+0xfd3e69)
|   ...
|   Segmentation fault
|
| Signed-off-by: Yonghong Song <yhs@fb.com>
| llvm-svn: 325267
* [Hexagon] Fix lowering of formal arguments after r324737 | Krzysztof Parzyszek | 2018-02-15 | 1 | -0/+12
| Lowering of formal arguments needs to be aware of vararg functions.
|
| llvm-svn: 325255
* [ARM] Allow 64- and 128-bit types with 't' inline asm constraint | Pablo Barrio | 2018-02-15 | 2 | -0/+41
| Summary:
| In LLVM, 't' selects a floating-point/SIMD register and only supports 32-bit values. This is appropriately documented in the LLVM Language Reference Manual. However, this behaviour diverges from that of GCC, where 't' selects the s0-s31 registers and their qX and dX variants depending on additional operand modifiers (q/P).
|
| For example, the following C code:
|
|   #include <arm_neon.h>
|   float32x4_t a, b, x;
|   asm("vadd.f32 %0, %1, %2" : "=t" (x) : "t" (a), "t" (b))
|
| results in the following assembly if compiled with GCC:
|
|   vadd.f32 s0, s0, s1
|
| whereas LLVM will show "error: couldn't allocate output register for constraint 't'", since a, b, x are 128-bit variables, not 32-bit.
|
| This patch extends the use of 't' to mean that of GCC, thus allowing selection of the lower Q vector regs and their D/S variants. For example, the earlier code will now compile as:
|
|   vadd.f32 q0, q0, q1
|
| This behaviour still differs from that of GCC but I think it is actually more correct, since LLVM picks up the right register type based on the datatype of x, while GCC would need an extra operand modifier to achieve the same result, as follows:
|
|   asm("vadd.f32 %q0, %q1, %q2" : "=t" (x) : "t" (a), "t" (b))
|
| Since this is only an extension of functionality, existing code should not be affected by this change. Note that operand modifiers q/P are already supported by LLVM, so this patch should suffice to support inline assembly with constraint 't' originally built for GCC.
|
| Reviewers: grosbach, rengolin
| Reviewed By: rengolin
| Subscribers: rogfer01, efriedma, olista01, aemerson, javed.absar, eraman, kristof.beyls, llvm-commits
| Differential Revision: https://reviews.llvm.org/D42962
| llvm-svn: 325244
* [X86][SSE] combineTruncateWithSat - use truncateVectorWithPACK to chain PACKUS vXi32-vXi8 saturated truncation | Simon Pilgrim | 2018-02-15 | 1 | -142/+13
| We can use PACKSS/PACKUS to saturate each stage of the chain: PACKSSDW down to [-32768,32767] and then PACKUSWB to [0,255].
|
| llvm-svn: 325243
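The two-stage saturation described above can be checked on a single lane with a short Python sketch. The functions below model only the per-element clamping of PACKSSDW and PACKUSWB, not the packing of whole vectors; the names mirror the instructions for readability and are not real APIs.

```python
def saturate(v, lo, hi):
    """Clamp v into the inclusive range [lo, hi]."""
    return max(lo, min(hi, v))

def packssdw_lane(v):
    """One lane of PACKSSDW: signed i32 to signed i16 with saturation."""
    return saturate(v, -32768, 32767)

def packuswb_lane(v):
    """One lane of PACKUSWB: signed i16 to unsigned i8 with saturation."""
    return saturate(v, 0, 255)

def truncate_usat_i32_to_u8(v):
    """Chain both stages, as in the commit message."""
    return packuswb_lane(packssdw_lane(v))
```

Since [0,255] lies inside [-32768,32767], clamping in two stages gives the same result as clamping a signed i32 value directly to [0,255].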
* [X86][SSE] combineTruncateWithSat - use truncateVectorWithPACK to chain PACKSS vXi32-vXi8 saturated truncation | Simon Pilgrim | 2018-02-15 | 1 | -163/+14
| We can use PACKSS to saturate each stage of the chain: PACKSSDW down to [-32768,32767] and then PACKSSWB to [-128,127].
|
| PACKUS is a little trickier and will be handled in a separate patch.
|
| llvm-svn: 325235
* [SelectionDAG] Add initial implementation of TargetLowering::SimplifyDemandedVectorElts | Simon Pilgrim | 2018-02-15 | 10 | -312/+211
| This is mainly a move of simplifyShuffleOperands from DAGCombiner::visitVECTOR_SHUFFLE to create a more general purpose TargetLowering::SimplifyDemandedVectorElts implementation.
|
| Further features can be moved/added in future patches.
|
| Differential Revision: https://reviews.llvm.org/D42896
| llvm-svn: 325232
* [ARM] f16 vcmp fixes | Sjoerd Meijer | 2018-02-15 | 1 | -24/+50
| This adds f16 VCMP match rules and fixes the test cases.
|
| Differential Revision: https://reviews.llvm.org/D43291
| llvm-svn: 325228
* [X86] Regenerate test to show scheduling comments. NFC | Craig Topper | 2018-02-15 | 1 | -26/+26
| These must not have been printed the last time the test was regenerated.
|
| llvm-svn: 325198
* [X86] Reverse the operand order of invlpga in at&t syntax to match gas. | Craig Topper | 2018-02-14 | 1 | -11/+11
| llvm-svn: 325190
* [X86] Don't swap argument on BOUND instruction in at&t syntax. | Craig Topper | 2018-02-14 | 1 | -21/+21
| The bound instruction does not have reversed operands in gas.
|
| Fixes PR27653.
|
| Patch by Maya Madhavan.
|
| Differential Revision: https://reviews.llvm.org/D43243
| llvm-svn: 325178
* [Hexagon] Split HVX vector pair loads/stores, expand unaligned loads | Krzysztof Parzyszek | 2018-02-14 | 2 | -13/+39
| llvm-svn: 325169
* [X86][SSE] truncateVectorWithPACK - Use src type instead of dst to select between PACK*SDW/PACK*SWB | Simon Pilgrim | 2018-02-14 | 5 | -15/+13
| Try to keep PACK*SDW/PACK*SWB as wide as possible. This helps ComputeNumSignBits, as it can only peek through bitcasts to wider types. Pre-AVX2 codegen was already doing this, as it could peek through bitcasts/subvectors more easily than AVX2 could through shuffles.
|
| This shouldn't affect existing results, as calls to truncateVectorWithPACK ensure we have enough sign bits to pack to the same value, but it should make it possible to use truncateVectorWithPACK chains to perform saturation in combineTruncateWithSat with a future patch.
|
| llvm-svn: 325149
* [AMDGPU] Remove non-temporal flag from argument loads | Stanislav Mekhanoshin | 2018-02-14 | 4 | -28/+28
| Kernel arguments are likely read by all workitems and should not bypass the cache. Fixes a performance hit in sub-dword argument loads.
|
| Differential Revision: https://reviews.llvm.org/D43249
| llvm-svn: 325146
* [x86] add baseline vector compare tests for D42948; NFC | Sanjay Patel | 2018-02-14 | 1 | -0/+353
| llvm-svn: 325138
* [SelectionDAG][X86] Fix incorrect offset generated for VMASKMOV | Alexander Ivchenko | 2018-02-14 | 2 | -5/+5
| When creating the high MachineMemOperand for MSTORE/MLOAD we supply it with the original PointerInfo, while the pointer itself had been incremented. The patch adds the proper offset to the PointerInfo.
|
| llvm-svn: 325135
* [ARM] f16 stack spill/reloads | Sjoerd Meijer | 2018-02-14 | 1 | -0/+26
| This adds support for handling f16 stack spills/reloads.
|
| Differential Revision: https://reviews.llvm.org/D43280
| llvm-svn: 325130
* [X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346 | Lama Saba | 2018-02-14 | 2 | -0/+3304
| If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to obtain the data directly instead of accessing it from cache or memory.
|
| A "store forward block" occurs when a store cannot be forwarded to the load. The most typical case of a store forward block on the Intel Core microarchitecture is that a small store cannot be forwarded to a large load. The estimated penalty for a store forward block is ~13 cycles.
|
| This pass tries to recognize and handle cases where a "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store.
|
| The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies. Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
|
| Change-Id: Ic41aa9ade6512e0478db66e07e2fde41b4fb35f9
| llvm-svn: 325128
* [X86][SSE] Relax type legality for combineTruncateWithSat PACKSS/PACKUS truncation | Simon Pilgrim | 2018-02-14 | 2 | -496/+45
| While the AVX512 VTRUNCS/VTRUNCUS instructions require legal types, truncateVectorWithPACK handles cases with multiples of legal types through splitting/concatenation. So we just need to ensure that the src/dst scalar types are correct and leave truncateVectorWithPACK to handle the rest of it.
|
| llvm-svn: 325127
* [X86] Use EDI for retpoline when no scratch regs are left | Reid Kleckner | 2018-02-13 | 2 | -9/+47
| Summary:
| Instead of solving the hard problem of how to pass the callee to the indirect jump thunk without a register, just use a CSR. At a call boundary, there's nothing stopping us from using a CSR to hold the callee as long as we save and restore it in the prologue.
|
| Also, add tests for this mregparm=3 case. I wrote execution tests for __llvm_retpoline_push, but they never got committed as lit tests, either because I never rewrote them or because they got lost in merge conflicts.
|
| Reviewers: chandlerc, dwmw2
| Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
| Differential Revision: https://reviews.llvm.org/D43214
| llvm-svn: 325049
* [AMDGPU] Cleanup in memory legalizer tests. NFC. | Stanislav Mekhanoshin | 2018-02-13 | 5 | -527/+531
| llvm-svn: 325042
* [CodeGen] Print bundled instructions using the MIR syntax in -debug output | Francis Visoiu Mistrih | 2018-02-13 | 1 | -3/+3
| Old syntax:
|
|   BUNDLE implicit-def %r0, implicit-def %r1, implicit %r2
|     * %r0 = SOME_OP %r2
|     * %r1 = ANOTHER_OP internal %r0
|
| New syntax:
|
|   BUNDLE implicit-def %r0, implicit-def %r1, implicit %r2 {
|     %r0 = SOME_OP %r2
|     %r1 = ANOTHER_OP internal %r0
|   }
|
| llvm-svn: 325032
* [AMDGPU] Change constant addr space to 4 | Yaxun Liu | 2018-02-13 | 81 | -1183/+1183
| Differential Revision: https://reviews.llvm.org/D43170
| llvm-svn: 325030
* [DAGCombiner] Add one use check to fold (not (and x, y)) -> (or (not x), (not y)) | Craig Topper | 2018-02-13 | 1 | -7/+2
| Summary:
| If the and has an additional use we shouldn't invert it. That creates an additional instruction.
|
| While there, add a one-use check to the transform above that looked similar.
|
| Reviewers: spatel, RKSimon
| Reviewed By: RKSimon
| Subscribers: llvm-commits
| Differential Revision: https://reviews.llvm.org/D43225
| llvm-svn: 325019
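The fold in the commit above is De Morgan's law applied to bitwise operations. A minimal Python sketch on 32-bit values, with illustrative function names, confirms that both forms agree:

```python
MASK32 = 0xFFFFFFFF  # model i32 values as Python ints masked to 32 bits

def not_of_and(x, y):
    """(not (and x, y)): one AND followed by one NOT."""
    return ~(x & y) & MASK32

def or_of_nots(x, y):
    """(or (not x), (not y)): two NOTs followed by one OR."""
    return (~x | ~y) & MASK32
```

When the AND has another user it must be kept anyway, so the rewritten form adds two NOTs and an OR on top of it where a single NOT would do. That is the extra instruction the one-use check avoids.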
* [X86] Add combine to shrink 64-bit ands when one input is an any_extend and the other input guarantees upper 32 bits are 0. | Craig Topper | 2018-02-13 | 3 | -46/+45
| Summary: This gets the shift case from PR35792.
|
| Reviewers: spatel, RKSimon
| Reviewed By: RKSimon
| Subscribers: llvm-commits
| Differential Revision: https://reviews.llvm.org/D43222
| llvm-svn: 325018
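Why the 64-bit AND can be shrunk: if one operand is known to have its upper 32 bits clear, the upper half of the result is zero regardless of the other operand, so a 32-bit AND followed by a zero extension gives the same value. A rough Python sketch, with hypothetical function names:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def and64(x, y):
    """Full-width 64-bit AND."""
    return (x & y) & MASK64

def and32_then_zext(x, y):
    """Truncate both inputs to 32 bits, AND them, and zero extend the result."""
    return (x & MASK32) & (y & MASK32)
```

The two agree whenever y fits in 32 bits, even if the upper bits of x are arbitrary (as they would be for an any_extend).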
* [ARM] Allow half types in ConstantPool | Sjoerd Meijer | 2018-02-13 | 2 | -0/+169
| Change ARMConstantIslandPass to:
| - accept f16 literals as litpool entries,
| - if the litpool needs to be inserted in the middle of a big block, then we need to 4-byte align the next instruction in ARM mode.
|
| Differential Revision: https://reviews.llvm.org/D42784
| llvm-svn: 325012
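To show why a 2-byte f16 literal can force realignment, here is a small Python sketch of rounding an offset up to a 4-byte boundary. The helper name and the offsets used are illustrative assumptions, not taken from ARMConstantIslandPass.

```python
def align_to(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)

def next_instruction_offset(litpool_start, entry_sizes):
    """Offset of the first ARM-mode instruction after a literal pool."""
    end = litpool_start + sum(entry_sizes)
    return align_to(end, 4)
```

For example, a pool starting at offset 0x100 that holds a single 2-byte f16 entry ends at 0x102, so the next ARM instruction would be placed at 0x104.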
* [DAG] fix type of undef returned by getNode() | Sanjay Patel | 2018-02-13 | 1 | -0/+15
| The bug has been lying dormant, and apparently was never exposed until after rL324941, because we didn't return the correct result for shifts with undef operands.
|
| llvm-svn: 325010
* [X86] Rename function main->foo in CodeGen/X86/pr35316.ll. NFC | Alexander Ivchenko | 2018-02-13 | 1 | -2/+2
| Using "void main" might be confusing in some cases.
|
| llvm-svn: 324997
* [Thumb] Handle addressing mode AddrMode5FP16 | Sjoerd Meijer | 2018-02-13 | 1 | -0/+24
| This addressing mode wasn't checked, so we were running into an assert.
|
| Differential Revision: https://reviews.llvm.org/D43179
| llvm-svn: 324996