summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [AVX512] Add shuffle comment printing for masked VPERMPD/VPERMQ.Craig Topper2016-06-101-1/+9
| | | | llvm-svn: 272371
* [AVX512] Fix shuffle comment printing to handle the masked versions of some ↵Craig Topper2016-06-101-30/+46
| | | | | | shuffles. Previously we were printing the mask operands as the register names. llvm-svn: 272367
* AMDGPU: Fix trailing whitespaceMatt Arsenault2016-06-1010-40/+40
| | | | llvm-svn: 272364
* AMDGPU: v_cndmask_b32 does not def vccMatt Arsenault2016-06-101-2/+2
| | | | | | Fixes verifier errors after SIShrinkInstructions. llvm-svn: 272351
* AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_*permuteTom Stellard2016-06-101-2/+2
| | | | | | | | | | | | | | | Summary: This fixes a bug with ds_*permute instructions where if it was passed a constant address, then the offset operand would get assigned a register operand instead of an immediate. Reviewers: scchan, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19994 llvm-svn: 272349
* AMDGPU/SI: Use common topological sort algorithm in SIScheduleDAGMITom Stellard2016-06-091-64/+3
| | | | | | | | | | Reviewers: arsenm, axeldavy Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19823 llvm-svn: 272346
* AMDGPU: Fix flat atomicsMatt Arsenault2016-06-094-41/+90
| | | | | | | | The flat atomics could already be selected, but only when using flat instructions for global memory. Add patterns for flat addresses. llvm-svn: 272345
* AMDGPU: Fix i64 global cmpxchgMatt Arsenault2016-06-094-40/+82
| | | | | | | | | | This was using extract_subreg sub0 to extract the low register of the result instead of sub0_sub1, producing an invalid copy. There doesn't seem to be a way to use the compound subreg indices in tablegen since those are generated, so manually select it. llvm-svn: 272344
* Add aliases for mfvrsave/mtvrsave.Eric Christopher2016-06-091-0/+4
| | | | | | | Update a test as we're now going to emit it for easier reading of generated assembly as well. llvm-svn: 272339
* AMDGPU: Run verifer after insert waits passMatt Arsenault2016-06-091-1/+1
| | | | llvm-svn: 272338
* AMDGPU: Remove incorrect assertionMatt Arsenault2016-06-091-4/+0
| | | | | | | I'm still not sure under what circumstances the offset here is non-0, but private memory is not limited to 27-bits. llvm-svn: 272337
* AMDGPU: Properly initialize SIShrinkInstructionsMatt Arsenault2016-06-093-8/+6
| | | | llvm-svn: 272336
* [X86][AVX512] Added avx512 VPSLLDQ/VPSRLDQ instruction commentsSimon Pilgrim2016-06-091-0/+12
| | | | llvm-svn: 272319
* [X86][AVX512] Dropped avx512 VPSLLDQ/VPSRLDQ intrinsicsSimon Pilgrim2016-06-091-2/+0
| | | | | | Auto-upgrade to generic shuffles like sse/avx2 implementations now that we can lower to VPSLLDQ/VPSRLDQ llvm-svn: 272308
* [X86][AVX512] Fixed issue with v16i32 shuffles lowering to VPALIGNRSimon Pilgrim2016-06-091-1/+1
| | | | llvm-svn: 272307
* [X86][AVX512] Added support for lowering 512-bit vector shuffles to bit/byte ↵Simon Pilgrim2016-06-091-19/+41
| | | | | | | | shifts 512-bit VPSLLDQ/VPSRLDQ can only be used for avx512bw targets so lowerVectorShuffleAsShift had to be adjusted to include the subtarget llvm-svn: 272300
* [NVPTX] Add intrinsics for shfl instructions.Justin Lebar2016-06-091-1/+42
| | | | | | | | | | | | | | | Summary: Currently clang emits these instructions via inline (volatile) asm in the CUDA headers. Switching to intrinsics will let the optimizer reason across calls to these intrinsics. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D21160 llvm-svn: 272298
* AMDGPU/SI: Fix 32-bit fdiv loweringWei Ding2016-06-091-16/+53
| | | | | | | | | We were using the fast fdiv lowering for all division, implementation of IEEE754 fdiv is added. http://reviews.llvm.org/D20557 llvm-svn: 272292
* [SystemZ] Enable long displacement constraints for inline ASM operandsUlrich Weigand2016-06-092-9/+21
| | | | | | | | | | | | | | | | | | This enables use of the 'S' constraint for inline ASM operands on SystemZ, which allows for a memory reference with a signed 20-bit immediate displacement. This patch includes corresponding documentation and test case updates. I've changed the 'T' constraint to match the new behavior for 'S', as 'T' also uses a long displacement (though index constraints are still not implemented). I also changed 'm' to match the behavior for 'S' as this will allow for a wider range of displacements for 'm', though correct me if that's not the right decision. Author: colpell Differential Revision: http://reviews.llvm.org/D21097 llvm-svn: 272266
* [mips][microMIPS] Implement BOVC, BNVC, EXT, INS and JALRC instructionsHrvoje Varga2016-06-099-13/+241
| | | | | | Differential Revision: http://reviews.llvm.org/D11798 llvm-svn: 272259
* [Thumb] A branch is not part of an IT blockJames Molloy2016-06-091-1/+1
| | | | | | | | ReplaceTailWithBranchTo assumed that if an instruction is predicated, it must be part of an IT block. This is not correct for conditional branches. No testcase as this was triggered by the reverted patch r272017 - test coverage will occur when that patch is re-reverted and there is no known way to trigger this in the meantime. llvm-svn: 272258
* [AVX512] Remove masked_move/blendm intrinsic from back-end. Igor Breger2016-06-092-45/+1
| | | | | | | | This is complement patch to D21060. Differential Revision: http://reviews.llvm.org/D21174 llvm-svn: 272257
* [mips][microMIPS] Add CodeGen support for SEL.*, SELEQZ, SELNEZ, SELEQZ.*, ↵Zlatko Buljan2016-06-094-80/+137
| | | | | | | | SELNEZ.* and CMP.condn.fmt instructions Differential Revision: http://reviews.llvm.org/D20862 llvm-svn: 272256
* [AMDGPU] Disassembler: Support for sdwa instructionsSam Kolton2016-06-091-1/+5
| | | | | | | | | | Reviewers: vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D21129 llvm-svn: 272255
* [AVX512] Fix shuffle decode printing for several instructions with write ↵Craig Topper2016-06-091-3/+3
| | | | | | masks. There are still more bugs here with UNPCK and PALIGN for sure. But these were the easiest ones to fix. llvm-svn: 272252
* [Thumb] Select a BIC instead of AND if the immediate can be encoded more ↵James Molloy2016-06-091-1/+40
| | | | | | | | | | | | | | | | | | | | | | optimally negated If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead; int i(int a) { return a & 0xfffffeec; } Used to produce: ldr r1, [CONSTPOOL] ands r0, r1 CONSTPOOL: 0xfffffeec And now produces: movs r1, #255 adds r1, #20 ; Less costly immediate generation bics r0, r1 llvm-svn: 272251
* [X86] Bring consistent naming to the SSE/AVX and AVX512 PALIGNR ↵Craig Topper2016-06-094-39/+35
| | | | | | instructions. Then add shuffle decode printing for the EVEX forms which is made easier by having the naming structure more similar to other instructions. llvm-svn: 272249
* [X86] Fix bad comment in assert. NFCCraig Topper2016-06-091-1/+1
| | | | llvm-svn: 272248
* AArch64: support the `.arch` directive in the IASSaleem Abdulrasool2016-06-091-0/+28
| | | | | | | | | | Add support to the AArch64 IAS for the `.arch` directive. This allows the assembly input to use architectural functionality in part of a file. This is used in existing code like BoringSSL. Resolves PR26016! llvm-svn: 272241
* Apply most suggestions of clang-tidy's performance-unnecessary-value-paramBenjamin Kramer2016-06-088-14/+15
| | | | | | | Avoids unnecessary copies. All changes audited & pass tests with asan. No functional change intended. llvm-svn: 272190
* [AArch64][RegisterBankInfo] G_OR are fine on either GPR or FPR.Quentin Colombet2016-06-082-0/+63
| | | | | | | | | | Teach AArch64RegisterBankInfo that G_OR can be mapped on either GPR or FPR for 64-bit or 32-bit values. Add test cases demonstrating how this information is used to coalesce a computation on a single register bank. llvm-svn: 272170
* [ARM] MSR instructions implicitly set CPSROliver Stannard2016-06-082-0/+4
| | | | | | | | | | | | | | | The MSR instructions can write to the CPSR, but we did not model this fact, so we could emit them in the middle of IT blocks, changing the condition flags for later instructions in the block. The tests use two calls to llvm.write_register.i32 because it is valid to use these instructions at the end of an IT block, which if conversion does do in some cases. With two calls, the first clobbers the flags, so a branch has to be used to make the second one conditional. Differential Revision: http://reviews.llvm.org/D21139 llvm-svn: 272154
* [mips] Add a proper file header in MipsFastISel.cppVasileios Kalintiris2016-06-081-2/+15
| | | | llvm-svn: 272138
* [Hexagon] Modify HexagonExpandCondsets to handle subregistersKrzysztof Parzyszek2016-06-081-507/+454
| | | | | | | | | Also, switch to using functions from LiveIntervalAnalysis to update live intervals, instead of performing the updates manually. Re-committing r272045. llvm-svn: 272135
* [ARM] Remove redundant check. NFCDiana Picus2016-06-081-1/+1
| | | | | | isSwift is tested earlier and known to be false when we reach this code. llvm-svn: 272127
* Avoid copies of std::strings and APInt/APFloats where we only read from itBenjamin Kramer2016-06-088-12/+12
| | | | | | | | As suggested by clang-tidy's performance-unnecessary-copy-initialization. This can easily hit lifetime issues, so I audited every change and ran the tests under asan, which came back clean. llvm-svn: 272126
* [AVX512] Fix cvtusi2sd instruction Opcode, it should be 0x7B instead of 0x2A.Igor Breger2016-06-081-1/+1
| | | | llvm-svn: 272122
* [AArch64][RegisterBankInfo] Use the generic implementation of copyCost.Quentin Colombet2016-06-081-1/+2
| | | | | | Long term we may want to give high cost at FPR to/from GPR copies. llvm-svn: 272086
* [RegisterBankInfo] Add a size argument for the cost of copy.Quentin Colombet2016-06-082-4/+9
| | | | | | | The cost of a copy may be different based on how many bits we have to copy around. E.g., a 8-bit copy may be different than a 32-bit copy. llvm-svn: 272084
* AMDGPU: Add amdgpu-ps-wqm-outputs function attributesNicolai Haehnle2016-06-071-2/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The presence of this attribute indicates that VGPR outputs should be computed in whole quad mode. This will be used by Mesa for prolog pixel shaders, so that derivatives can be taken of shader inputs computed by the prolog, fixing a bug. The generated code could certainly be improved: if a prolog pixel shader is used (which isn't common in modern OpenGL - they're used for gl_Color, polygon stipples, and forcing per-sample interpolation), Mesa will use this attribute unconditionally, because it has to be conservative. So WQM may be used in the prolog when it isn't really needed, and furthermore a silly back-and-forth switch is likely to happen at the boundary between prolog and main shader parts. Fixing this is a bit involved: we'd first have to add a mechanism by which LLVM writes the WQM-related input requirements to the main shader part binary, and then Mesa specializes the prolog part accordingly. At that point, we may as well just compile a monolithic shader... Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130 Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D20839 llvm-svn: 272063
* Revert "Differential Revision: http://reviews.llvm.org/D20557"Eric Christopher2016-06-071-55/+17
| | | | | | | | | | | | | | | | Author: Wei Ding <wei.ding2@amd.com> Date: Tue Jun 7 19:04:44 2016 +0000 Differential Revision: http://reviews.llvm.org/D20557 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044 91177308-0d34-0410-b5e6-96231b3b80d8 as it was breaking the bots. This reverts commit r272044. llvm-svn: 272056
* [stack-protection] Add support for MSVC buffer security checkEtienne Bergeron2016-06-072-3/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch is adding support for the MSVC buffer security check implementation The buffer security check is turned on with the '/GS' compiler switch. * https://msdn.microsoft.com/en-us/library/8dbf701c.aspx * To be added to clang here: http://reviews.llvm.org/D20347 Some overview of buffer security check feature and implementation: * https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx * http://www.ksyash.com/2011/01/buffer-overflow-protection-3/ * http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html For the following example: ``` int example(int offset, int index) { char buffer[10]; memset(buffer, 0xCC, index); return buffer[index]; } ``` The MSVC compiler is adding these instructions to perform stack integrity check: ``` push ebp mov ebp,esp sub esp,50h [1] mov eax,dword ptr [__security_cookie (01068024h)] [2] xor eax,ebp [3] mov dword ptr [ebp-4],eax push ebx push esi push edi mov eax,dword ptr [index] push eax push 0CCh lea ecx,[buffer] push ecx call _memset (010610B9h) add esp,0Ch mov eax,dword ptr [index] movsx eax,byte ptr buffer[eax] pop edi pop esi pop ebx [4] mov ecx,dword ptr [ebp-4] [5] xor ecx,ebp [6] call @__security_check_cookie@4 (01061276h) mov esp,ebp pop ebp ret ``` The instrumentation above is: * [1] is loading the global security canary, * [3] is storing the local computed ([2]) canary to the guard slot, * [4] is loading the guard slot and ([5]) re-compute the global canary, * [6] is validating the resulting canary with the '__security_check_cookie' and performs error handling. Overview of the current stack-protection implementation: * lib/CodeGen/StackProtector.cpp * There is a default stack-protection implementation applied on intermediate representation. * The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie. * An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast). * Basic Blocks are added to every instrumented function to receive the code for handling stack guard validation and errors handling. * Guard manipulation and comparison are added directly to the intermediate representation. * lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp * lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp * There is an implementation that adds instrumentation during instruction selection (for better handling of sibbling calls). * see long comment above 'class StackProtectorDescriptor' declaration. * The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr). * 'getSDagStackGuard' returns the appropriate stack guard (security cookie) * The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'. * include/llvm/Target/TargetLowering.h * Contains function to retrieve the default Guard 'Value'; should be overriden by each target to select which implementation is used and provide Guard 'Value'. * lib/Target/X86/X86ISelLowering.cpp * Contains the x86 specialisation; Guard 'Value' used by the SelectionDAG algorithm. Function-based Instrumentation: * The MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instructions. * To support function-based instrumentation, this patch is * adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h), * If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the prologue. * modifying (SelectionDAGISel.cpp) do avoid producing basic blocks used for inline instrumentation, * generating the function-based instrumentation during the ISEL pass (SelectionDAGBuilder.cpp), * if FastISEL (not SelectionDAG), using the fallback which rely on the same function-based implemented over intermediate representation (StackProtector.cpp). Modifications * adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp) * adding support function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h) Results * IR generated instrumentation: ``` clang-cl /GS test.cc /Od /c -mllvm -print-isel-input ``` ``` *** Final LLVM Code input to ISel *** ; Function Attrs: nounwind sspstrong define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 { entry: %StackGuardSlot = alloca i8* <<<-- Allocated guard slot %0 = call i8* @llvm.stackguard() <<<-- Loading Stack Guard value call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot) <<<-- Prologue intrinsic call (store to Guard slot) %index.addr = alloca i32, align 4 %offset.addr = alloca i32, align 4 %buffer = alloca [10 x i8], align 1 store i32 %index, i32* %index.addr, align 4 store i32 %offset, i32* %offset.addr, align 4 %arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0 %1 = load i32, i32* %index.addr, align 4 call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false) %2 = load i32, i32* %index.addr, align 4 %arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2 %3 = load i8, i8* %arrayidx, align 1 %conv = sext i8 %3 to i32 %4 = load volatile i8*, i8** %StackGuardSlot <<<-- Loading Guard slot call void @__security_check_cookie(i8* %4) <<<-- Epilogue function-based check ret i32 %conv } ``` * SelectionDAG generated instrumentation: ``` clang-cl /GS test.cc /O1 /c /FA ``` ``` "?example@@YAHHH@Z": # @"\01?example@@YAHHH@Z" # BB#0: # %entry pushl %esi subl $16, %esp movl ___security_cookie, %eax <<<-- Loading Stack Guard value movl 28(%esp), %esi movl %eax, 12(%esp) <<<-- Store to Guard slot leal 2(%esp), %eax pushl %esi pushl $204 pushl %eax calll _memset addl $12, %esp movsbl 2(%esp,%esi), %esi movl 12(%esp), %ecx <<<-- Loading Guard slot calll @__security_check_cookie@4 <<<-- Epilogue function-based check movl %esi, %eax addl $16, %esp popl %esi retl ``` Reviewers: kcc, pcc, eugenis, rnk Subscribers: majnemer, llvm-commits, hans, thakis, rnk Differential Revision: http://reviews.llvm.org/D20346 llvm-svn: 272053
* Revert r272045 since GCC doesn't know how to compile it.Krzysztof Parzyszek2016-06-071-449/+507
| | | | llvm-svn: 272048
* [Hexagon] Modify HexagonExpandCondsets to handle subregistersKrzysztof Parzyszek2016-06-071-507/+449
| | | | | | | Also, switch to using functions from LiveIntervalAnalysis to update live intervals, instead of performing the updates manually. llvm-svn: 272045
* Differential Revision: http://reviews.llvm.org/D20557Wei Ding2016-06-071-17/+55
| | | | llvm-svn: 272044
* Reapply [AArch64] Fix isLegalAddImmediate() to return true for valid ↵Geoff Berry2016-06-071-2/+5
| | | | | | | | negative values. Originally reviewed here: http://reviews.llvm.org/D17463 llvm-svn: 272023
* [ARM] Accept conditional versions of BXNS and BLXNSOliver Stannard2016-06-071-0/+1
| | | | | | | | | These instructions end in "S" but are not flag-setting, so they need including in the list of special cases in the assembly parser. Differential Revision: http://reviews.llvm.org/D21077 llvm-svn: 272015
* [X86][SSE] Add general lowering of nontemporal vector loads (fixed bad merge)Simon Pilgrim2016-06-071-9/+37
| | | | | | | | | | | | | | Currently the only way to use the (V)MOVNTDQA nontemporal vector loads instructions is through the int_x86_sse41_movntdqa style builtins. This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch. We currently still fold nontemporal loads into suitable instructions, we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that its dealing with a nontemporal memory transaction. There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets. Differential Review: http://reviews.llvm.org/D20965 llvm-svn: 272011
* [X86][SSE] Add general lowering of nontemporal vector loadsSimon Pilgrim2016-06-072-0/+69
| | | | | | | | | | | | | | Currently the only way to use the (V)MOVNTDQA nontemporal vector loads instructions is through the int_x86_sse41_movntdqa style builtins. This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch. We currently still fold nontemporal loads into suitable instructions, we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that its dealing with a nontemporal memory transaction. There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets. Differential Review: http://reviews.llvm.org/D20965 llvm-svn: 272010
* [Thumb-1] Add optimized constant materialization for integers [256..512)James Molloy2016-06-072-0/+13
| | | | | | We can materialize these integers using a MOV; ADDi8 pair. llvm-svn: 272007
OpenPOWER on IntegriCloud