summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU/GlobalISel: Basic legality for load/storeMatt Arsenault2018-03-172-0/+253
| | | | llvm-svn: 327772
* [X86] Added support for nocf_check attribute for indirect Branch TrackingOren Ben Simhon2018-03-172-21/+88
| | | | | | | | | | | | | | | X86 Supports Indirect Branch Tracking (IBT) as part of Control-Flow Enforcement Technology (CET). IBT instruments ENDBR instructions used to specify valid targets of indirect call / jmp. The `nocf_check` attribute has two roles in the context of X86 IBT technology: 1. Appertains to a function - do not add ENDBR instruction at the beginning of the function. 2. Appertains to a function pointer - do not track the target function of this pointer by adding nocf_check prefix to the indirect-call instruction. This patch implements `nocf_check` context for Indirect Branch Tracking. It also auto generates `nocf_check` prefixes before indirect branchs to jump tables that are guarded by range checks. Differential Revision: https://reviews.llvm.org/D41879 llvm-svn: 327767
* [SystemZ] Add 'REQUIRES: asserts' to test case using debug output.Jonas Paulsson2018-03-171-0/+1
| | | | llvm-svn: 327766
* [SystemZ] computeKnownBitsForTargetNode() / ComputeNumSignBitsForTargetNode()Jonas Paulsson2018-03-177-2/+1265
| | | | | | | | | | | Improve/implement these methods to improve DAG combining. This mainly concerns intrinsics. Some constant operands to SystemZISD nodes have been marked Opaque to avoid transforming back and forth between generic and target nodes infinitely. Review: Ulrich Weigand llvm-svn: 327765
* [SelectionDAG] Handle big endian target BITCAST in computeKnownBits()Jonas Paulsson2018-03-171-0/+29
| | | | | | | | | | | | | | | The BITCAST handling in computeKnownBits() previously only worked for little endian. This patch reverses the iteration over elements for a big endian target which allows this to work in this case also. SystemZ test case. Review: Eli Friedman https://reviews.llvm.org/D44249 llvm-svn: 327764
* [MachineOutliner] Make KILLs invisibleJessica Paquette2018-03-161-0/+3
| | | | | | | | | At the point the outliner runs, KILLs don't impact anything, but they're still considered unique instructions. This commit makes them invisible like DebugValues so that they can still be outlined without impacting outlining decisions. llvm-svn: 327760
* [Hexagon] Avoid bank conflicts in post-RA schedulerKrzysztof Parzyszek2018-03-161-0/+156
| | | | | | | | | | Avoid scheduling two loads in such a way that they would end up in the same packet. If there is a load in a packet, try to schedule a non-load next. Patch by Brendon Cahoon. llvm-svn: 327742
* [Hexagon] Add lit testcases for atomic intrinsicsKrzysztof Parzyszek2018-03-163-0/+177
| | | | | | Patch by Ben Craig. llvm-svn: 327737
* [IR] Avoid the need to prefix MS C++ symbols with '\01'Reid Kleckner2018-03-161-0/+60
| | | | | | | | | | | | | | | | | | | | Now the Windows mangling modes ('w' and 'x') do not do any mangling for symbols starting with '?'. This means that clang can stop adding the hideous '\01' leading escape. This means LLVM debug logs are less likely to contain ASCII escape characters and it will be easier to copy and paste MS symbol names from IR. Finally. For non-Windows platforms, names starting with '?' still get IR mangling, so once clang stops escaping MS C++ names, we will get extra '_' prefixing on MachO. That's fine, since it is currently impossible to construct a triple that uses the MS C++ ABI in clang and emits macho object files. Differential Revision: https://reviews.llvm.org/D7775 llvm-svn: 327734
* [AMDGPU] Supported ds_write_b128 generation.Farhana Aleen2018-03-166-17/+43
| | | | | | | | | | | | | | Summary: This is a follow-on patch of https://reviews.llvm.org/D44210 Author: FarhanaAleen Reviewed By: msearles Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44319 llvm-svn: 327726
* [X86] Post process the DAG after isel to remove vector moves that were added ↵Craig Topper2018-03-164-5/+0
| | | | | | | | | | | | to zero upper bits. We previously avoided inserting these moves during isel in a few cases which is implemented using a whitelist of opcodes. But it's too difficult to generate a perfect list of opcodes to whitelist. Especially with AVX512F without AVX512VL using 512 bit vectors to implement some 128/256 bit operations. Since isel is done bottoms up, we'd have to check the VT and opcode and subtarget in order to determine whether an EXTRACT_SUBREG would be generated for some operations. So instead of doing that, this patch adds a post processing step that detects when the moves are unnecesssary after isel. At that point any EXTRACT_SUBREGs would have already been created and appear in the DAG. So then we just need to ensure the input to the move isn't one. Differential Revision: https://reviews.llvm.org/D44289 llvm-svn: 327724
* [AMDGPU][MC][GFX8][GFX9][DISASSEMBLER] Added "_e32" suffix to 32-bit VINTRP ↵Dmitry Preobrazhensky2018-03-161-43/+43
| | | | | | | | | | | opcodes See bug 36751: https://bugs.llvm.org/show_bug.cgi?id=36751 Differential Revision: https://reviews.llvm.org/D44529 Reviewers: artem.tamazov, arsenm llvm-svn: 327723
* [Hexagon] Fix zero-extending non-HVX bool vectorsKrzysztof Parzyszek2018-03-161-0/+69
| | | | llvm-svn: 327712
* [X86][Btver2] Add correct mul/imul schedule costsSimon Pilgrim2018-03-161-4/+4
| | | | | | Integer multiply is performed on the JMul function unit and i64 requires double pumping llvm-svn: 327707
* [X86][Btver2] Add correct lzcnt/tzcnt/popcnt schedule costsSimon Pilgrim2018-03-163-18/+18
| | | | | | Don't use WriteIMul defaults llvm-svn: 327706
* [ARM] FP16 codegen support for VSELSjoerd Meijer2018-03-161-1/+39
| | | | | | | | | This implements lowering of SELECT_CC for f16s, which enables codegen of VSEL with f16 types. Differential Revision: https://reviews.llvm.org/D44518 llvm-svn: 327695
* [SelectionDAG][ARM][X86] Teach PromoteIntRes_SETCC to do a better job ↵Craig Topper2018-03-153-317/+419
| | | | | | | | | | | | | | | | picking the result type for the setcc. Previously if getSetccResultType returned an illegal type we just fell back to using the default promoted type. This appears to have been to handle the case where for vectors getSetccResultType returns the input type, but the input type itself isn't legal and will need to be promoted. Without the legality check we would never reach a legal type. But just picking the promoted type to be the setcc type can create strange setccs where the result type is 128 bits and the operand type is 256 bits. If for example the result type was promoted to v8i16 from v8i1, but the input type was promoted from v8i23 to v8i32. We currently handle this with custom lowering code in X86. This legality check also caused us reject the getSetccResultType when the input type needed to be widened or split. Even though that result wouldn't have caused legalization to get stuck. This patch tries to fix this by detecting the getSetccResultType needs to be promoted. If its input type also needs to be promoted we'll try a ask for a new setcc result type based on its eventual promoted value. Otherwise we fall back to default type to promote to. For any other illegal values we might get back from the initial call to getSetccResultType we just keep and allow it to be re-legalized later via splitting or widening or scalarizing. llvm-svn: 327683
* [X86] Make sure we use FSUB instruction as the reference for operand order ↵Craig Topper2018-03-151-5/+19
| | | | | | | | in isAddSubOrSubAdd when recognizing subadd The FADD part of the addsub/subadd pattern can have its operands commuted, but when checking for fsubadd we were using the fadd as reference and commuting the fsub node. llvm-svn: 327660
* [X86] Add test case showing bad fmsubadd creation due to bad commuting.Craig Topper2018-03-151-0/+19
| | | | | | The code that creates fmsubadd from shuffle vector has some code to allow commuting the operands of the fadd node. This code was originally created when we only recognized fmaddsub. When fmsubadd support was added this code was not updated and is now commuting the fsub operands instead. llvm-svn: 327659
* [PPC] Avoid non-simple MVT in STBRX optimizationGuozhi Wei2018-03-151-0/+18
| | | | | | | | | | PR35402 triggered this case. It bswap and stores a 48bit value, current STBRX optimization transforms it into STBRX. Unfortunately 48bit is not a simple MVT, there is no PPC instruction to support it, and it can't be automatically expanded by llvm, so caused a crash. This patch detects the non-simple MVT and returns early. Differential Revision: https://reviews.llvm.org/D44500 llvm-svn: 327651
* [PowerPC] Optimize TLS initial-exec sequence to use X-Form loads/storesZaara Syeda2018-03-151-0/+169
| | | | | | | | | This patch adds new load/store instructions for integer scalar types which can be used for X-Form when fed by add with an @tls relocation. Differential Revision: https://reviews.llvm.org/D43315 llvm-svn: 327635
* [X86][Btver2] Remove JAny resource, and map system/microcoded instructions ↵Simon Pilgrim2018-03-155-123/+123
| | | | | | | | to JALU pipes Simplifies throughput to the issue width (1/2) instead of permitting any pipe (1/6) llvm-svn: 327632
* [X86][SSE] Introduce Float/Vector WriteMove, WriteLoad and Writetore ↵Simon Pilgrim2018-03-157-36/+36
| | | | | | | | | | | | scheduler classes As discussed on D44428 and PR36726, this patch splits off WriteFMove/WriteVecMove, WriteFLoad/WriteVecLoad and WriteFStore/WriteVecStore scheduler classes to permit vectors to be handled separately from gpr/scalar types. I've minimised the diff here by only moving various basic SSE/AVX vector instructions across - we can fix the rest when called for. This does fix the MOVDQA vs MOVAPS/MOVAPD discrepancies mentioned on D44428. Differential Revision: https://reviews.llvm.org/D44471 llvm-svn: 327630
* [X86] Regenerate schedule tests with zero latency commentsSimon Pilgrim2018-03-154-78/+78
| | | | llvm-svn: 327628
* [AArch64] Codegen tests for the Armv8.2-A FP16 intrinsicsSjoerd Meijer2018-03-157-0/+893
| | | | | | | | | This is a follow up of the AArch64 FP16 intrinsics work; the codegen tests had not been added yet. Differential Revision: https://reviews.llvm.org/D44510 llvm-svn: 327624
* [X86] Add test cases for 512-bit addsub from build_vector.Craig Topper2018-03-151-30/+260
| | | | | | | | There is no 512 bit addsub instruction, but we partially match it handle fmaddsub matching. We explicitly bail out for 512 bit vectors after failing the fmaddsub match, but we had no test coverage for that bail out. We might want to consider splitting and using 256 bit instructions instead of the long sequence seen here. llvm-svn: 327605
* [X86] Add support for matching FMSUBADD from build_vector.Craig Topper2018-03-151-431/+24
| | | | llvm-svn: 327604
* [AMDGPU] Waitcnt pass: Modify the waitcnt pass to propagate info in the case ↵Mark Searles2018-03-141-0/+26
| | | | | | | | of a single basic block loop. mergeInputScoreBrackets() does this for us; update it so that it processes the single bb's score bracket when processing the single bb's preds. It is, after all, a pred of itself, so it's score bracket is needed. Differential Revision: https://reviews.llvm.org/D44434 llvm-svn: 327583
* [FastISel] Sink local value materializations to first useReid Kleckner2018-03-1428-393/+561
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Local values are constants, global addresses, and stack addresses that can't be folded into the instruction that uses them. For example, when storing the address of a global variable into memory, we need to materialize that address into a register. FastISel doesn't want to materialize any given local value more than once, so it generates all local value materialization code at EmitStartPt, which always dominates the current insertion point. This allows it to maintain a map of local value registers, and it knows that the local value area will always dominate the current insertion point. The downside is that local value instructions are always emitted without a source location. This is done to prevent jumpy line tables, but it means that the local value area will be considered part of the previous statement. Consider this C code: call1(); // line 1 ++global; // line 2 ++global; // line 3 call2(&global, &local); // line 4 Today we end up with assembly and line tables like this: .loc 1 1 callq call1 leaq global(%rip), %rdi leaq local(%rsp), %rsi .loc 1 2 addq $1, global(%rip) .loc 1 3 addq $1, global(%rip) .loc 1 4 callq call2 The LEA instructions in the local value area have no source location and are treated as being on line 1. Stepping through the code in a debugger and correlating it with the assembly won't make much sense, because these materializations are only required for line 4. This is actually problematic for the VS debugger "set next statement" feature, which effectively assumes that there are no registers live across statement boundaries. By sinking the local value code into the statement and fixing up the source location, we can make that feature work. This was filed as https://bugs.llvm.org/show_bug.cgi?id=35975 and https://crbug.com/793819. This change is obviously not enough to make this feature work reliably in all cases, but I felt that it was worth doing anyway because it usually generates smaller, more comprehensible -O0 code. I measured a 0.12% regression in code generation time with LLC on the sqlite3 amalgamation, so I think this is worth doing. There are some special cases worth calling out in the commit message: 1. local values materialized for phis 2. local values used by no-op casts 3. dead local value code Local values can be materialized for phis, and this does not show up as a vreg use in MachineRegisterInfo. In this case, if there are no other uses, this patch sinks the value to the first terminator, EH label, or the end of the BB if nothing else exists. Local values may also be used by no-op casts, which adds the register to the RegFixups table. Without reversing the RegFixups map direction, we don't have enough information to sink these instructions. Lastly, if the local value register has no other uses, we can delete it. This comes up when fastisel tries two instruction selection approaches and the first materializes the value but fails and the second succeeds without using the local value. Reviewers: aprantl, dblaikie, qcolombet, MatzeB, vsk, echristo Subscribers: dotdash, chandlerc, hans, sdardis, amccarth, javed.absar, zturner, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D43093 llvm-svn: 327581
* [CodeGen] Use MIR syntax for MachineMemOperand printingFrancis Visoiu Mistrih2018-03-1415-47/+47
| | | | | | | | | | Get rid of the "; mem:" suffix and use the one we use in MIR: ":: (load 2)". rdar://38163529 Differential Revision: https://reviews.llvm.org/D42377 llvm-svn: 327580
* [X86] Add haswell testing for PR35635 as well.Simon Pilgrim2018-03-141-16/+34
| | | | | | To improve complete model testing for schedulers for instructions with multiple results. llvm-svn: 327572
* [AArch64] Emit CSR loads in the same order as storesFrancis Visoiu Mistrih2018-03-141-0/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Optionally allow the order of restoring the callee-saved registers in the epilogue to be reversed. The flag -reverse-csr-restore-seq generates the following code: ``` stp x26, x25, [sp, #-64]! stp x24, x23, [sp, #16] stp x22, x21, [sp, #32] stp x20, x19, [sp, #48] ; [..] ldp x24, x23, [sp, #16] ldp x22, x21, [sp, #32] ldp x20, x19, [sp, #48] ldp x26, x25, [sp], #64 ret ``` Note how the CSRs are restored in the same order as they are saved. One exception to this rule is the last `ldp`, which allows us to merge the stack adjustment and the ldp into a post-index ldp. This is done by first generating: ldp x26, x27, [sp] add sp, sp, #64 which gets merged by the arm64 load store optimizer into ldp x26, x25, [sp], #64 The flag is disabled by default. llvm-svn: 327569
* [X86] Add back fast-isel code for handling i8 shifts.Craig Topper2018-03-141-0/+12
| | | | | | | | | | I removed this in r316797 because the coverage report showed no coverage and I thought it should have been handled by the auto generated table. I now see that there is code that bypasses the table if the shift amount is out of bounds. This adds back the code. We'll codegen out of bounds i8 shifts to effectively (amount & 0x1f). The 0x1f is a strange quirk of x86 that shift amounts are always masked to 5-bits(except 64-bits). So if the masked value is still out bounds the result will be 0. Fixes PR36731. llvm-svn: 327540
* [AArch64] Keep track of MIFlags in the LoadStoreOptimizerFrancis Visoiu Mistrih2018-03-141-0/+99
| | | | | | | | | | | | | | | | Merging: * $x26, $x25 = frame-setup LDPXi $sp, 0 * $sp = frame-destroy ADDXri $sp, 64, 0 into an LDPXpost should preserve the flags from both instructions as following: * frame-setup frame-destroy LDPXpost Differential Revision: https://reviews.llvm.org/D44446 llvm-svn: 327533
* [X86] Teach X86TargetLowering::targetShrinkDemandedConstant to set ↵Craig Topper2018-03-143-11/+18
| | | | | | | | | | | | non-demanded bits if it helps created an and mask that can be matched as a zero extend. I had to modify the bswap recognition to allow unshrunk masks to make this work. Fixes PR36689. Differential Revision: https://reviews.llvm.org/D44442 llvm-svn: 327530
* [X86][AVX] Use WriteFShuffleLd for broadcast reg-mem instructionsSimon Pilgrim2018-03-142-10/+10
| | | | | | | | They shouldn't be treated as pure loads. Found while investigating D44428 llvm-svn: 327524
* SjLjEHPrepare: Don't reg-to-mem swifterror valuesArnold Schwaighofer2018-03-141-2/+31
| | | | | | | | | | | | | | | | | swifterror llvm values model the swifterror register as memory at the LLVM IR level. ISel will perform adhoc mem-to-reg on them. swifterror values are constraint in how they can be used. Spilling them to memory is not allowed. SjLjEHPrepare tried to lower swifterror values to memory which is unecessary since the back-end will spill and reload the register as neccessary (as long as clobbering calls are marked as such which is the case here) and further leads to invalid IR because swifterror values can't be stored to memory. rdar://38164004 llvm-svn: 327521
* [GlobalIsel][X86] Support for G_SDIV instructionAlexander Ivchenko2018-03-145-0/+607
| | | | | | | | Reviewed By: igorb Differential Revision: https://reviews.llvm.org/D44430 llvm-svn: 327520
* [X86][Btver2] Fix YMM shuffle, permute and permutevar scheduler costsSimon Pilgrim2018-03-142-15/+15
| | | | | | Account for ymm double pumping and add proper pshufb/permutevar support llvm-svn: 327510
* [X86][SSE] Use WriteFShuffleLd for MOVDDUP/MOVSHDUP/MOVSLDUP reg-mem ↵Simon Pilgrim2018-03-142-15/+18
| | | | | | | | | | instructions They shouldn't be treated as pure loads. Found while investigating D44428 llvm-svn: 327505
* [AArch64] Don't produce R_AARCH64_TLSLE_LDST32_TPREL_LO12_NCMartin Storsjo2018-03-141-2/+3
| | | | | | | | | Support for this relocation is missing in both LLD and GNU binutils at the moment. This reverts the ELF parts of SVN r327316. llvm-svn: 327503
* [GlobalISel][X86] Support G_LSHR/G_ASHR/G_SHLAlexander Ivchenko2018-03-1413-13/+2331
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support G_LSHR/G_ASHR/G_SHL. We have 3 variance for shift instructions : shift gpr, shift imm, shift 1. Currently GlobalIsel TableGen generate patterns for shift imm and shift 1, but with shiftCount i8. In G_LSHR/G_ASHR/G_SHL like LLVM-IR both arguments has the same type, so for now only shift i8 can use auto generated TableGen patterns. The support of G_SHL/G_ASHR enables tryCombineSExt from LegalizationArtifactCombiner.h to hit, which results in different legalization for the following tests: LLVM :: CodeGen/X86/GlobalISel/ext-x86-64.ll LLVM :: CodeGen/X86/GlobalISel/gep.ll LLVM :: CodeGen/X86/GlobalISel/legalize-ext-x86-64.mir -; X64-NEXT: movsbl %dil, %eax +; X64-NEXT: movl $24, %ecx +; X64-NEXT: # kill: def $cl killed $ecx +; X64-NEXT: shll %cl, %edi +; X64-NEXT: movl $24, %ecx +; X64-NEXT: # kill: def $cl killed $ecx +; X64-NEXT: sarl %cl, %edi +; X64-NEXT: movl %edi, %eax ..which is not optimal and should be addressed later. Rework of the patch by igorb Reviewed By: igorb Differential Revision: https://reviews.llvm.org/D44395 llvm-svn: 327499
* [GlobalIsel][X86] Support for G_ZEXT instructionAlexander Ivchenko2018-03-143-0/+812
| | | | | | | | Reviewed By: igorb Differential Revision: https://reviews.llvm.org/D44378 llvm-svn: 327482
* [X86] Re-generate test to get proper capitalization of its CHECK lines. NFCCraig Topper2018-03-131-41/+41
| | | | llvm-svn: 327462
* [X86] Rewrite LowerAVXCONCAT_VECTORS similar to how we handle vXi1 concats.Craig Topper2018-03-132-6/+0
| | | | | | | | | | This better able to detect undef and zeros pieces in the concat. Or cases when only one subvector is non-zero. This allows us to avoid silly things like double inserts into progressively larger undefs. This still builds 512 bit concats of 128 bits by building up through 256 bits first. But I don't know if that's best. We probably want to merge this with the vXi1 concat code since they are very similar. llvm-svn: 327454
* [DAGCombiner] Allow visitEXTRACT_SUBVECTOR to combine with BUILD_VECTORS ↵Craig Topper2018-03-137-165/+190
| | | | | | | | between LegalizeVectorOps and LegalizeDAG. BUILD_VECTORs aren't themselves legalized until LegalizeDAG so we should still be able to create an "illegal" one before that. This helps combine with BUILD_VECTORS that are introduced during LegalizeVectorOps due to unrolling. llvm-svn: 327446
* [MIR] Allow frame-setup and frame-destroy on the same instructionFrancis Visoiu Mistrih2018-03-131-0/+4
| | | | | | | | | | | | | | | Nothing prevents us from having both frame-setup and frame-destroy on the same instruction. When merging: * frame-setup OPCODE1 * frame-destroy OPCODE2 into * frame-setup frame-destroy OPCODE3 we want to be able to print and parse both flags. llvm-svn: 327442
* [x86] add test for WriteZero sched class instructions; NFCSanjay Patel2018-03-131-0/+92
| | | | | | | | Nops should have zero latency because there is no result. Idioms like 'xorps xmm0, xmm0' may have zero latency because they are handled without using an execution unit. llvm-svn: 327435
* [DAGCombine] visitREM - Don't assume that one divrem isn't driving anotherSimon Pilgrim2018-03-131-0/+30
| | | | | | | | | | | Under some circumstances the divrems won't have been combined together before getting to this code. So replace the assertion with a if() guard to not expand to X-((X/C)*C) to give the other combine chance to happen. Reduced from OSS-Fuzz #6883 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6883 llvm-svn: 327424
* [X86][Btver2] Split i8/i16/i32/i64 div/idiv costsSimon Pilgrim2018-03-131-14/+14
| | | | | | We were assuming a mixture of 32/64 division costs. llvm-svn: 327407
OpenPOWER on IntegriCloud