summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: R600 code splitting cleanupMatt Arsenault2016-03-1132-105/+93
| | | | | | | Move a few functions only used by R600 to R600 specific code, fix header macros to stop using R600, mark classes as final. llvm-svn: 263204
* AMDGPU: Materialize sign bits with bfrevMatt Arsenault2016-03-111-0/+24
| | | | | | | If a constant is the same as the reverse of an inline immediate, this is 4 bytes smaller than having to embed a 32-bit literal. llvm-svn: 263201
* AArch64: only try to use scaled fcvt ops on legal vector types.Tim Northover2016-03-101-1/+2
| | | | | | | Before we ended up calling getSimpleVectorType on a <3 x float>, which asserted. llvm-svn: 263169
* [x86] don't use a shuffle when a vselect will do; NFCISanjay Patel2016-03-101-16/+5
| | | | | | | | Looking at the IR definition of a masked load made me realize there was no reason to use a shuffle here, so we don't need to convert the format of the mask at all. llvm-svn: 263167
* [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ↵Simon Pilgrim2016-03-101-9/+24
| | | | | | | | | | | | ZERO_EXTEND_VECTOR_INREG Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns. Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion). Differential Revision: http://reviews.llvm.org/D17691 llvm-svn: 263159
* ARM: Support relative references using the PREL31 symbol variant.Peter Collingbourne2016-03-101-0/+3
| | | | | | Differential Revision: http://reviews.llvm.org/D17937 llvm-svn: 263156
* AArch64: remove pseudo-instructions used only for their patterns.Tim Northover2016-03-101-15/+42
| | | | | | | There's no real reason for these pseudos to exist, we should be writing real patterns even if it is slightly less convenient. NFC. llvm-svn: 263141
* AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsicsNicolai Haehnle2016-03-102-15/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa to implement the GL_ARB_shader_image_load_store extension. The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling and loads). However, this is not currently implemented. For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller" opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32 is actually implemented since GLSL also only has a vec4 variant of the store instructions, although it's conceivable that Mesa will want to be smarter about this in the future. BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which has a legacy name, pretends not to access memory, and does not capture the full flexibility of the instruction. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17277 llvm-svn: 263140
* [X86] Correctly select registers to pop into for x86_64Michael Kuperstein2016-03-101-1/+2
| | | | | | | | | | | | | | When trying to replace an add to esp with pops, we need to choose dead registers to pop into. Registers clobbered by the call and not imp-def'd by it should be safe. Except that it's not enough to check the register itself isn't defined, we also need to make sure no overlapping registers are defined either. This fixes PR26711. Differential Revision: http://reviews.llvm.org/D18029 llvm-svn: 263139
* [AArch64] Optimize compare and branch sequence when the compare's constant ↵Balaram Makam2016-03-101-25/+82
| | | | | | | | | | | | | | | | | | | | | | | operand is power of 2 Summary: Peephole optimization that generates a single TBZ/TBNZ instruction for test and branch sequences like in the example below. This handles the cases that miss folding of AND into TBZ/TBNZ during ISelLowering of BR_CC Examples: and w8, w8, #0x400 cbnz w8, L1 to tbnz w8, #10, L1 Reviewers: MatzeB, jmolloy, mcrosier, t.p.northover Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17942 llvm-svn: 263136
* [ARM] Cortex-R8 supportAlexandros Lamprineas2016-03-101-0/+12
| | | | | | | | | | | This patch adds Cortex-R8 to Target Parser and TableGen. It also adds CodeGen tests for the build attributes. Patch by Pablo Barrio. Differential Revision: http://reviews.llvm.org/D17925 llvm-svn: 263132
* AMDGPU/SI: Define S_GETREG IntrinsicChangpeng Fang2016-03-101-0/+12
| | | | | | | | | | | | | | Summary: Define s_getreg intrinsic to generate s_getreg instruction to read hardware registers. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D17892 llvm-svn: 263124
* ARM: follow up improvements for SVN r263118Saleem Abdulrasool2016-03-104-4/+14
| | | | | | | | | | | | | | The initial change was insufficiently complete for always getting the semantics of __builtin_longjmp correct. The builtin is translated into a `tInt_eh_sjlj_longjmp` DAG node. This node set R7 as clobbered. However, the code would then follow up with a clobber of R11. I had failed to notice the imp-def,kill on R7 in the isel. Unfortunately, it seems that it is not possible to conditionalise the Defs list via an !if. Instead, construct a new parallel WIN node and prefer that when targeting windows. This ensures that we now both correctly model the __builtin_longjmp as well as construct the frame in a more ABI conformant manner. llvm-svn: 263123
* Unified the handling of returns in the X87 stackifier so that the stackifierDavid L Kreitzer2016-03-101-90/+93
| | | | | | | | runs successfully on routines containing IRETs. This fixes PR26410. Differential Revision: http://reviews.llvm.org/D17643 llvm-svn: 263120
* ARM: correct __builtin_longjmp on WoASaleem Abdulrasool2016-03-101-1/+3
| | | | | | | | WoA uses r11 as the FP even though it is a pure thumb-2 environment in contrast to AAPCS which states r7. This adjusts __builtin_longjmp to not clobber r7 and to properly restore the frame pointer on execution. llvm-svn: 263118
* AVX-512: Fixed a bug in i1 vector zero extending. (Skylake-avx512)Elena Demikhovsky2016-03-101-23/+27
| | | | | | | | (failed on instruction selection phase) Differential Revision: http://reviews.llvm.org/D17924 llvm-svn: 263111
* [AMDGPU] Fix SMEM instructions encoding/operand namingsValery Pykhtin2016-03-105-37/+91
| | | | | | Differential Revision: http://reviews.llvm.org/D17651 llvm-svn: 263108
* [X86][AVX] Improve target shuffle combining of BLEND+zeroSimon Pilgrim2016-03-101-1/+2
| | | | | | | | The BLEND+zero combine was failing to combine equivalent BLEND masks. Follow up to D17483 and D17858 llvm-svn: 263105
* [X86][SSE] Basic combining of unary target shuffles of binary target shuffles.Simon Pilgrim2016-03-101-12/+18
| | | | | | | | | | | | This patch reorders the combining of target shuffle masks so that when a unary shuffle takes a binary shuffle as its input but only references one of its inputs it can correctly combine into a unary shuffle mask. This is starting to encroach on the purpose of resolveTargetShuffleInputs, but I don't want to remove it until we definitely know we won't need it for full binary shuffle combining. There is a lot more work before we can properly support binary target shuffle masks but this was an easy case to add support for. Differential Revision: http://reviews.llvm.org/D17858 llvm-svn: 263102
* AVX-512: Fixed a bug in shuffle for v64i8 typeElena Demikhovsky2016-03-101-0/+2
| | | | | | | | Operation SCALAR_TO_VECTOR for v64i8 and v32i16 should be lowered if BW feature is "on". Differential Revision: http://reviews.llvm.org/D17994 llvm-svn: 263097
* Add support for a preserve_most calling convention to the AArch64 backend.Roman Levenstein2016-03-105-1/+19
| | | | | | | | | | This change adds a support for a preserve_most calling convention to the AArch64 backend, similar to how it was done for X86-64. There is also a subsequent patch on top of this one to add a tail-calls support for this calling convention. Differential Revision: http://reviews.llvm.org/D18016 llvm-svn: 263092
* [x86] fix cost model inaccuracy for vector memory opsSanjay Patel2016-03-091-4/+4
| | | | | | | | | | | The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar has a completely double-pumped AVX implementation. But getting the cost model to reflect that is a much bigger problem. The small goal here is simply to improve on the lie that !AVX2 == SandyBridge. Differential Revision: http://reviews.llvm.org/D18000 llvm-svn: 263069
* [WebAssembly] Update known gcc test failuresDerek Schuff2016-03-091-3/+0
| | | | llvm-svn: 263068
* [x86, AVX] optimize masked loads with constant masksSanjay Patel2016-03-091-2/+44
| | | | | | | | Instead of a variable-blend instruction, form a blend with immediate because those are always cheaper. Differential Revision: http://reviews.llvm.org/D17899 llvm-svn: 263067
* [AArch64] Minor reformatting (NFC).Evandro Menezes2016-03-091-8/+6
| | | | llvm-svn: 263054
* This change adds co-processor condition branching and conditional traps to ↵Chris Dewhurst2016-03-097-42/+223
| | | | | | | | | | | | | | | | the Sparc back-end. This will allow inline assembler code to utilize these features, but no automatic lowering is provided, except for the previously provided @llvm.trap, which lowers to "ta 5". The change also separates out the different assembly language syntaxes for V8 and V9 Sparc. Previously, only V9 Sparc assembly syntax was provided. The change also corrects the selection order of trap disassembly, allowing, e.g. "ta %g0 + 15" to be rendered, more readably, as "ta 15", ignoring the %g0 register. This is per the sparc v8 and v9 manuals. Check-in includes many extra unit tests to check this works correctly on both V8 and V9 Sparc processors. Code Reviewed at http://reviews.llvm.org/D17960. llvm-svn: 263044
* [PPC] backend changes to generate xvabs[s,d]p and xvnabs[s,d]p instructionsKit Barton2016-03-091-0/+2
| | | | | | | This has to be committed before the FE changes Phabricator: http://reviews.llvm.org/D17837 llvm-svn: 263035
* [AArch64] Move helper functions into TII, so they can be reused elsewhere. NFC.Chad Rosier2016-03-093-47/+56
| | | | llvm-svn: 263032
* [AArch64] Minor cleanup/remove redundant code. NFC.Chad Rosier2016-03-092-12/+8
| | | | llvm-svn: 263024
* [TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC.Chad Rosier2016-03-098-13/+13
| | | | | | http://reviews.llvm.org/D17967 llvm-svn: 263021
* Fix build error due to unsigned compare >= 0 in r263008 (NFC)Teresa Johnson2016-03-091-1/+1
| | | | | | | | | | | | Fixes error from building with clang: /usr/local/google/home/tejohnson/llvm/llvm_15/lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp:407:12: error: comparison of unsigned expression >= 0 is always true [-Werror,-Wtautological-compare] if ((Imm >= 0x000) && (Imm <= 0x0ff)) { ~~~ ^ ~~~~~ llvm-svn: 263014
* [AMDGPU] Assembler: Support DPP instructions.Sam Kolton2016-03-098-46/+350
| | | | | | | | | | | | | | | | | | | | Supprot DPP syntax as used in SP3 (except several operands syntax). Added dpp-specific operands in td-files. Added DPP flag to TSFlags to determine if instruction is dpp in InstPrinter. Support for VOP2 DPP instructions in td-files. Some tests for DPP instructions. ToDo: - VOP2bInst: - vcc is considered as operand - AsmMatcher doesn't apply mnemonic aliases when parsing operands - v_mac_f32 - v_nop - disable instructions with 64-bit operands - change dpp_ctrl assembler representation to conform sp3 Review: http://reviews.llvm.org/D17804 llvm-svn: 263008
* [AMDGPU] Assembler: Support abs() syntax.Nikolay Haustov2016-03-091-2/+19
| | | | | | | | | Support legacy SP3 abs(v1) syntax. InstPrinter still uses |v1|. Add tests. Differential Revision: http://reviews.llvm.org/D17887 llvm-svn: 263006
* [AMDGPU] Assembler: Fix s_setpc_b64Nikolay Haustov2016-03-091-1/+1
| | | | | | | | s_setpc_b64 has just one 64-bit source which is the address of instruction to jump to. Differential Revision: http://reviews.llvm.org/D17888 llvm-svn: 263005
* [WebAssembly] Update comments about irreducible control flow.Dan Gohman2016-03-092-8/+13
| | | | llvm-svn: 262995
* [WebAssembly] Implement irreducible control flow.Dan Gohman2016-03-095-35/+297
| | | | | | | | This implements a very simple conservative transformation that doesn't require more than linear code size growth. There's room for much more optimization in this space. llvm-svn: 262982
* Revert r262759 and r262760.Quentin Colombet2016-03-081-9/+0
| | | | | | | | The fix consisting in using the library call for atomic compare and swap when the instruction is not safe to use may be incorrect. Indeed the library call may not exist on all platform. In other words, we need a better fix! llvm-svn: 262943
* [AArch64] Add MMOs to unscaled pairs.Chad Rosier2016-03-081-3/+2
| | | | | | | Test to be committed in follow up commit, per discussion in D17097. http://reviews.llvm.org/D17097 llvm-svn: 262942
* [ARM] Simplify ARMInstr*.td by getting rid of identity PatFrags (NFC)Artyom Skrobov2016-03-083-107/+74
| | | | | | | | | | Reviewers: t.p.northover, grosbach, resistor Subscribers: aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D17636 llvm-svn: 262936
* Revert r262599 "[X86][SSE] Improve vector ZERO_EXTEND by combining to ↵Hans Wennborg2016-03-081-24/+9
| | | | | | | | ZERO_EXTEND_VECTOR_INREG" This caused PR26870. llvm-svn: 262935
* AVX512: Add extract_subvector patterns v8i1->v4i1 , v4i1->v2i1.Igor Breger2016-03-081-0/+8
| | | | | | Differential Revision: http://reviews.llvm.org/D17953 llvm-svn: 262929
* [Power9] Implement new vsx instructions: load, store instructions for vector ↵Kit Barton2016-03-087-0/+214
| | | | | | | | | | | | | | | | | | | | and scalar We follow the comments mentioned in http://reviews.llvm.org/D16842#344378 to implement this new patch. This patch implements the following vsx instructions: Vector load/store: lxv lxvx lxvb16x lxvl lxvll lxvh8x lxvwsx stxv stxvb16x stxvh8x stxvl stxvll stxvx Scalar load/store: lxsd lxssp lxsibzx lxsihzx stxsd stxssp stxsibx stxsihx 21 instructions Phabricator: http://reviews.llvm.org/D16919 llvm-svn: 262906
* [WebAssembly] Update for spec change from tableswitch to br_table.Dan Gohman2016-03-085-18/+18
| | | | | | | Also note that the operand order changed; the default label is now listed after the regular labels. llvm-svn: 262903
* [AArch64] Initialize GlobalISel as part of the target initialization.Quentin Colombet2016-03-081-0/+2
| | | | llvm-svn: 262897
* A couple more UB fixes for C++14 sized deallocation.Richard Smith2016-03-081-0/+4
| | | | llvm-svn: 262891
* AMDGPU: Match more med3 integer patternsMatt Arsenault2016-03-072-0/+33
| | | | llvm-svn: 262864
* AMDGPU: Remove a fixme for ptrrtoint handlingMatt Arsenault2016-03-071-1/+0
| | | | llvm-svn: 262854
* AMDGPU: Move function only used by R600Matt Arsenault2016-03-074-18/+17
| | | | llvm-svn: 262853
* [ms-inline-asm][AVX512] Add ability to use k registers in MS inline asm + ↵Marina Yatsina2016-03-071-1/+11
| | | | | | | | | | | | | | | | | | | | | | fix bag with curly braces Until now curly braces could only be used in MS inline assembly to mark block start/end. All curly braces were removed completely at a very early stage. This approach caused bugs like: "m{o}v eax, ebx" turned into "mov eax, ebx" without any error. In addition, AVX-512 added special operands (e.g., k registers), which are also surrounded by curly braces that mark them as such. Now, we need to keep the curly braces and identify at a later stage if they are marking block start/end (if so, ignore them), or surrounding special AVX-512 operands (if so, parse them as such). This patch fixes the bug described above and enables the use of AVX-512 special operands. This commit is the the llvm part of the patch. The clang part of the review is: http://reviews.llvm.org/D17766 The llvm part of the review is: http://reviews.llvm.org/D17767 Differential Revision: http://reviews.llvm.org/D17767 llvm-svn: 262843
* [X86][AVX512] Fixed VPERMT2* shuffle mask decoding and enabled target ↵Simon Pilgrim2016-03-063-11/+9
| | | | | | | | | | | | | | shuffle combining. Patch to add support for target shuffle combining of X86ISD::VPERMV3 nodes, including support for detecting unary shuffles. This uncovered several issues with the X86ISD::VPERMV3 shuffle mask decoding of non-64 bit shuffle mask elements - the bit masking wasn't being correctly computed. Removed non-constant pool mask decode path as we have no way of testing it right now. Differential Revision: http://reviews.llvm.org/D17916 llvm-svn: 262809
OpenPOWER on IntegriCloud