path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [X86] Allow X86ISD::Wrapper to be folded into the base of gather/scatter address | Craig Topper | 2017-11-13 | 1 | -20/+35
  If the base of our gather corresponds to something contained in X86ISD::Wrapper we should be able to fold it into the address. This patch refactors some of the address matching to more fully use the X86ISelAddressMode struct and the getAddressOperands helper. A new helper function matchVectorAddress is added to call matchWrapper or fall back to matchAddressBase. We should also be able to support constant offsets from a wrapper, but I'll look into that in a future patch. We may even be able to completely reuse matchAddress here, but I wanted to start simple and work up to it.
  Differential Revision: https://reviews.llvm.org/D39927
  llvm-svn: 318057
* AMDGPU: Drop duplicate setOperationAction | Jan Vesely | 2017-11-13 | 1 | -2/+0
  These are set with other scalar int ops a few lines up.
  Differential Revision: https://reviews.llvm.org/D39928
  llvm-svn: 318051
* [X86] test/testn intrinsics lowering to IR. llvm part. | Uriel Korach | 2017-11-13 | 1 | -24/+0
  Remove builtins from llvm and add AutoUpgrade support. Also add fast-isel tests for the TEST and TESTN instructions.
  Differential Revision: https://reviews.llvm.org/D38736
  llvm-svn: 318036
* [ARM] Place jump table as the first operand in additions | Momchil Velikov | 2017-11-13 | 3 | -10/+10
  When generating table jump code for switch statements, place the jump table label as the first operand in the various addition instructions in order to enable addressing mode selectors to better match index computation and possibly fold them into the addressing mode of the table entry load instruction.
  Differential revision: https://reviews.llvm.org/D39752
  llvm-svn: 318033
* Test commit | Sander de Smalen | 2017-11-13 | 1 | -1/+1
  llvm-svn: 318027
* [x86][AVX512] Lowering shuffle i/f intrinsics to LLVM IR | Jina Nahias | 2017-11-13 | 1 | -16/+0
  This patch, together with a matching clang patch (https://reviews.llvm.org/D38672), implements the lowering of X86 shuffle i/f intrinsics to IR.
  Differential Revision: https://reviews.llvm.org/D38671
  Change-Id: I1e7d359a74743e995ec356237a85214ce55d3661
  llvm-svn: 318026
* [X86][SKX] Adding scheduling info of non-intrinsic + commutable SKX opcodes. | Gadi Haber | 2017-11-13 | 1 | -102/+102
  Updated the scheduling information of the SKX subtarget in the file X86SchedSkylakeServer.td under lib/Target/X86 to:
  1. add regular opcodes in addition to the suffixed "_Int" opcodes
  2. add the (V)MAXCPD/MAXCPS/MAXCSD/MAXCSS/MINCPD/MINCPS/MINCSD/MINCSS instructions that are equivalent to their counterparts without the 'C' as they are part of a hack to make floating point min/max commutable under fast math.
  Reviewers: zvi, RKSimon, craig.topper
  Differential Revision: https://reviews.llvm.org/D39833
  Change-Id: Ie13702a5ce1b1a08af91ca637a52b6962881e7d6
  llvm-svn: 318024
* [X86] Limit NOPs to 7 bytes when 'slm' is spelled 'silvermont'. | Craig Topper | 2017-11-13 | 1 | -1/+1
  We support 2 spellings for silvermont and we should accept both here.
  llvm-svn: 318023
* [X86] Use sse_load_f32/f64 to improve load folding of scalar vfscalefss/sd, vrcp14ss/sd, rsqrt14ss/sd instructions. | Craig Topper | 2017-11-13 | 1 | -5/+4
  llvm-svn: 318022
* [X86] Use sse_load_f32/f64 to improve load folding for scalar VFPCLASS intrinsics. | Craig Topper | 2017-11-13 | 1 | -4/+4
  llvm-svn: 318019
* AMDGPU: Preserve nuw in shl add ptr combine | Matt Arsenault | 2017-11-13 | 1 | -1/+6
  llvm-svn: 318017
* [X86] Fix SQRTSS/SQRTSD/RCPSS/RCPSD intrinsics to use sse_load_f32/sse_load_f64 to increase load folding opportunities. | Craig Topper | 2017-11-13 | 2 | -10/+13
  llvm-svn: 318016
* AMDGPU: Fix multi-use shl/add combine | Matt Arsenault | 2017-11-13 | 2 | -31/+15
  This was using a custom function that didn't handle the addressing modes properly for private. Use isLegalAddressingMode to avoid duplicating this. Additionally, skip the combine if there is only one use since the standard combine will handle it.
  llvm-svn: 318013
* [X86] Attempt to fix signed and unsigned comparison warning. | Craig Topper | 2017-11-13 | 1 | -2/+2
  llvm-svn: 318010
* [X86] Use sse_load_f32/f64 in patterns for the memory forms of VRNDSCALESS/SD. | Craig Topper | 2017-11-13 | 1 | -3/+2
  llvm-svn: 318009
* [X86] Use EVEX encoded VRNDSCALE instructions to implement the legacy round intrinsics. | Craig Topper | 2017-11-13 | 4 | -29/+55
  The VRNDSCALE instructions implement a superset of the (V)ROUND instructions. They are equivalent if the upper 4 bits of the immediate are 0. This patch lowers the legacy intrinsics to the VRNDSCALE ISD node and masks the upper bits of the immediate to 0. This allows us to take advantage of the larger register encoding space. We should maybe consider converting VRNDSCALE back to VROUND in the EVEX to VEX pass if the extended registers are not being used. I notice some load folding opportunities being missed for the VRNDSCALESS/SD instructions that I'll try to fix in future patches.
  llvm-svn: 318008
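The immediate handling described in the VRNDSCALE commit above can be sketched as follows. This is a minimal illustration and not the code from the patch: the function name `legacyRoundImmToRndScaleImm` is hypothetical, and the only fact taken from the commit message is that the two instruction families agree when the upper 4 bits of the 8-bit immediate are zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (name invented for illustration): clear the upper
// 4 bits of a legacy rounding immediate so that it can be reused as a
// VRNDSCALE immediate with ROUND-compatible behaviour.
inline std::uint8_t legacyRoundImmToRndScaleImm(std::uint8_t Imm) {
  const std::uint8_t Masked = static_cast<std::uint8_t>(Imm & 0x0F);
  assert((Masked >> 4) == 0 && "upper nibble must be clear after masking");
  return Masked;
}
```

An immediate whose upper nibble is already zero passes through unchanged, so the mask only affects inputs that the legacy instructions would not have interpreted in those bits.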
* [X86] Split VRNDSCALE/VREDUCE/VGETMANT/VRANGE ISD nodes into versions with and without the rounding operand. NFCI | Craig Topper | 2017-11-13 | 5 | -99/+157
  I want to reuse the VRNDSCALE node for the legacy SSE rounding intrinsics so that those intrinsics can use EVEX instructions. All of these nodes share tablegen multiclasses so I split them all so that they all remain similar in their implementations.
  llvm-svn: 318007
* AMDGPU: Select d16 loads into low component of register | Matt Arsenault | 2017-11-13 | 6 | -5/+147
  llvm-svn: 318005
* [X86] Add an X86ISD::RANGES opcode to use for the scalar intrinsics. | Craig Topper | 2017-11-12 | 5 | -6/+8
  This fixes a bug where we selected packed instructions for scalar intrinsics.
  llvm-svn: 317999
* [X86] Remove some no longer needed intrinsic lowering code. | Craig Topper | 2017-11-12 | 2 | -18/+1
  llvm-svn: 317997
* [llvm] Remove redundant return [NFC] | Mandeep Singh Grang | 2017-11-12 | 2 | -2/+0
  Reviewers: davidxl, olista01, Eugene.Zelenko
  Reviewed By: Eugene.Zelenko
  Subscribers: sdardis, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D39917
  llvm-svn: 317995
* [X86] Use vrndscaleps/pd for 128/256 ffloor/ftrunc/fceil/fnearbyint/frint when avx512vl is enabled. | Craig Topper | 2017-11-11 | 2 | -1/+47
  This matches what we do for scalar and 512-bit types.
  llvm-svn: 317991
* [X86] Attempt to match multiple binary reduction ops at once. NFCI | Simon Pilgrim | 2017-11-11 | 1 | -61/+67
  matchBinOpReduction currently matches against a single opcode, but we already have a case where we repeat calls to try to match against AND/OR and I'll be shortly adding another case for SMAX/SMIN/UMAX/UMIN (D39729). This NFCI patch alters matchBinOpReduction to try and pattern match against any of the provided list of candidate bin ops at once to save time.
  Differential Revision: https://reviews.llvm.org/D39726
  llvm-svn: 317985
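The idea of matching against a list of candidate opcodes in one pass, as described in the matchBinOpReduction commit above, might look roughly like this. It is a simplified sketch under stated assumptions: the reduction is modelled as a flat list of opcodes rather than a SelectionDAG, and `BinOp`, `matchReductionOpcode`, and its parameters are hypothetical names, not LLVM's API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical opcode set, for illustration only.
enum class BinOp { And, Or, SMax, SMin, UMax, UMin, Add };

// Returns true and sets Matched if every step of the reduction chain uses
// the same opcode and that opcode is one of the candidates. The chain is
// walked once, regardless of how many candidates are supplied.
bool matchReductionOpcode(const std::vector<BinOp> &Chain,
                          const std::vector<BinOp> &Candidates,
                          BinOp &Matched) {
  assert(!Candidates.empty() && "need at least one candidate opcode");
  if (Chain.empty())
    return false;
  const BinOp Root = Chain.front();
  if (std::find(Candidates.begin(), Candidates.end(), Root) ==
      Candidates.end())
    return false;
  for (BinOp Op : Chain)
    if (Op != Root)
      return false;
  Matched = Root;
  return true;
}
```

Compared with calling a single-opcode matcher once per candidate, this shape inspects the chain only once and reports which candidate matched through the out-parameter.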
* [X86] Add scalar register class versions of VRNDSCALE instructions and rename the existing versions to _Int. | Craig Topper | 2017-11-11 | 2 | -36/+56
  This is consistent with our normal implementation of scalar instructions. While there, disable load folding for the patterns with IMPLICIT_DEF unless optimizing for size, which is also our standard practice.
  llvm-svn: 317977
* [X86] Inline some SDNode operand multiclass operands that don't vary. NFC | Craig Topper | 2017-11-11 | 1 | -33/+28
  llvm-svn: 317975
* [X86] Set the execution domain for VFPCLASS to SSEPackedSingle/Double. | Craig Topper | 2017-11-11 | 1 | -1/+3
  llvm-svn: 317974
* [X86] Set the execution domain for vptest instruction to the integer domain. | Craig Topper | 2017-11-11 | 1 | -0/+3
  llvm-svn: 317973
* [X86] Correct the execution domain on ROUND/VROUND instructions. | Craig Topper | 2017-11-11 | 1 | -6/+12
  llvm-svn: 317968
* [X86] Remove the default for one of the arguments to some tablegen multiclasses. NFC | Craig Topper | 2017-11-11 | 1 | -5/+3
  No one ever uses this default and probably shouldn't since it sets the execution domain to generic.
  llvm-svn: 317967
* Recommit r317904: [Hexagon] Create HexagonISelDAGToDAG.h, NFC | Krzysztof Parzyszek | 2017-11-10 | 2 | -109/+139
  The Windows builder did not reconstruct the HexagonGenDAGISel.inc file after the TableGen binary has changed.
  llvm-svn: 317921
* AMDGPU/NFC: Split Processors.td into GCNProcessors.td and R600Processors.td | Konstantin Zhuravlyov | 2017-11-10 | 4 | -218/+258
  Differential Revision: https://reviews.llvm.org/D39880
  llvm-svn: 317920
* Revert "[Hexagon] Create HexagonISelDAGToDAG.h, NFC" | Krzysztof Parzyszek | 2017-11-10 | 2 | -139/+109
  This reverts r317904: broke Windows build.
  llvm-svn: 317916
* [X86] Merge the template method selectAddrOfGatherScatterNode into selectVectorAddr. NFCI | Craig Topper | 2017-11-10 | 1 | -25/+16
  Just need to initialize a couple variables differently based on the node type. No need for a whole separate template method.
  llvm-svn: 317915
* [RISCV] Silence an unused variable warning in release builds [NFC] | Mandeep Singh Grang | 2017-11-10 | 2 | -5/+5
  Summary: Also minor cleanups:
  1. Avoided multiple calls to Fixup.getKind()
  2. Avoided multiple calls to getFixupKindInfo()
  3. Removed a redundant return.
  Reviewers: asb, apazos
  Reviewed By: asb
  Subscribers: rbar, johnrusso, llvm-commits
  Differential Revision: https://reviews.llvm.org/D39881
  llvm-svn: 317908
* [Hexagon] Create HexagonISelDAGToDAG.h, NFC | Krzysztof Parzyszek | 2017-11-10 | 2 | -109/+139
  llvm-svn: 317904
* [RegAlloc, SystemZ] Increase number of LOCRs by passing "hard" regalloc hints. | Jonas Paulsson | 2017-11-10 | 5 | -4/+99
  * The method getRegAllocationHints() is now of bool type instead of void. If true is returned, regalloc (AllocationOrder) will *only* try to allocate the hints, as opposed to merely trying them before non-hinted registers.
  * TargetRegisterInfo::getRegAllocationHints() is implemented for SystemZ with an increase in number of LOCRs.
  In this case, it is desired to force the hints even though there is a slight increase in spilling, because if a non-hinted register would be allocated, the LOCRMux pseudo would have to be expanded with a jump sequence. The LOCR (Load On Condition) SystemZ instruction must have both operands in either the low or high part of the 64-bit register.
  Reviewers: Quentin Colombet and Ulrich Weigand
  https://reviews.llvm.org/D36795
  llvm-svn: 317879
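The difference between "hard" and ordinary hints described in the commit above can be modelled with a small sketch. Everything here is an assumption made for illustration: registers are plain integers, and `buildAllocationOrder` and its `HardHints` flag are hypothetical stand-ins for the interaction between getRegAllocationHints() and AllocationOrder, not the real LLVM interfaces.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model: returns the order in which an allocator would try
// registers. With HardHints set (the "true" return value in the commit
// message), only the hinted registers are candidates. Otherwise the hints
// are tried first and the remaining registers follow.
std::vector<int> buildAllocationOrder(const std::vector<int> &Hints,
                                      const std::vector<int> &AllRegs,
                                      bool HardHints) {
  for (int Hint : Hints) {
    (void)Hint; // Keeps release builds free of unused-variable warnings.
    assert(std::find(AllRegs.begin(), AllRegs.end(), Hint) != AllRegs.end() &&
           "hint must be an allocatable register");
  }
  std::vector<int> Order(Hints);
  if (HardHints)
    return Order;
  for (int Reg : AllRegs)
    if (std::find(Hints.begin(), Hints.end(), Reg) == Hints.end())
      Order.push_back(Reg);
  return Order;
}
```

In this model a hard hint set trades a smaller candidate list (and so possibly more spilling) for the guarantee that a non-hinted register is never chosen, which is the trade-off the commit message describes for LOCRMux.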
* [X86] Add support for combining FMADDSUB(A, B, FNEG(C))->FMSUBADD(A, B, C) | Craig Topper | 2017-11-10 | 1 | -0/+31
  Support the opposite direction as well. Also add a TODO for not being able to combine FMSUB/FNMADD/FNMSUB with FNEG.
  llvm-svn: 317878
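Why the combine in the commit above is valid can be checked with a lane-wise model. This sketch assumes the x86 lane convention (FMADDSUB subtracts in even lanes and adds in odd lanes; FMSUBADD does the opposite) and uses a separate multiply and add instead of a fused operation, so it models the algebra and not the rounding behaviour. The type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<double, 4>;

// Model of FMADDSUB: even lanes compute A*B - C, odd lanes compute A*B + C.
Vec4 fmaddsub(const Vec4 &A, const Vec4 &B, const Vec4 &C) {
  Vec4 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = (I % 2 == 0) ? A[I] * B[I] - C[I] : A[I] * B[I] + C[I];
  return R;
}

// Model of FMSUBADD: even lanes compute A*B + C, odd lanes compute A*B - C.
Vec4 fmsubadd(const Vec4 &A, const Vec4 &B, const Vec4 &C) {
  Vec4 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = (I % 2 == 0) ? A[I] * B[I] + C[I] : A[I] * B[I] - C[I];
  return R;
}

// Lane-wise negation, standing in for FNEG.
Vec4 fneg(const Vec4 &C) {
  Vec4 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = -C[I];
  return R;
}

// True if FMADDSUB(A, B, FNEG(C)) and FMSUBADD(A, B, C) agree in every lane.
bool combineHolds(const Vec4 &A, const Vec4 &B, const Vec4 &C) {
  return fmaddsub(A, B, fneg(C)) == fmsubadd(A, B, C);
}
```

Negating C swaps the add and subtract lanes, so the same check with the two functions exchanged covers the opposite direction mentioned in the commit message.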
* [AMDGPU] Fix pointer info for lowering load/store for r600 for amdgiz environment | Yaxun Liu | 2017-11-10 | 1 | -3/+7
  r600 uses dummy pointer info for lowering load/store. Since dummy pointer info assumes address space 0, this causes isel failure when temporary load/store SDNodes are generated for amdgiz environment. Since the offset is not constant, FixedStack pseudo source value cannot be used to create the pointer info. This patch creates pointer info using llvm undef value. At least this provides correct address space so that isel can be done correctly.
  Differential Revision: https://reviews.llvm.org/D39698
  llvm-svn: 317862
* [AMDGPU] Fix pointer info for pseudo source for r600 | Yaxun Liu | 2017-11-10 | 2 | -0/+21
  The pointer info for pseudo source for r600 is not correct when alloca addr space is not 0, which causes invalid SDNode for r600---amdgiz. This patch fixes that.
  Differential Revision: https://reviews.llvm.org/D39670
  llvm-svn: 317861
* [SystemZ] Add support for the "o" inline asm constraint | Ulrich Weigand | 2017-11-09 | 2 | -0/+5
  We don't really need any special handling of "offsettable" memory addresses, but since some existing code uses inline asm statements with the "o" constraint, add support for this constraint for compatibility purposes.
  llvm-svn: 317807
* [mips] Correct microMIPS's jump and add unconditional branch pseudo | Simon Dardis | 2017-11-09 | 4 | -18/+29
  Correct the definition of 'j' as being unavailable for microMIPS32R6 and provide the 'b' assembly idiom for codegen purposes for microMIPS32r3. Provide the necessary 'br' pattern for microMIPS32R6 as it no longer incorrectly uses the 'j' instruction.
  Reviewers: atanasyan
  Differential Revision: https://reviews.llvm.org/D39741
  llvm-svn: 317801
* [RISCV] MC layer support for the standard RV32A instruction set extension | Alex Bradbury | 2017-11-09 | 6 | -12/+128
  llvm-svn: 317791
* [RISCV] MC layer support for the standard RV32M instruction set extension | Alex Bradbury | 2017-11-09 | 4 | -4/+45
  llvm-svn: 317788
* Sched model improving on btver2: JFPU01 resource, vtestp* for xmm. | Andrew V. Tischenko | 2017-11-09 | 1 | -11/+26
  Differential Revision: https://reviews.llvm.org/D39802
  llvm-svn: 317785
* Add -print-schedule scheduling comments to inline asm. | Andrew V. Tischenko | 2017-11-09 | 3 | -14/+16
  Differential Revision: https://reviews.llvm.org/D39728
  llvm-svn: 317782
* [X86] Give priority to EVEX FMA instructions over FMA4 instructions. | Craig Topper | 2017-11-09 | 3 | -63/+69
  No existing processor has both so it doesn't really matter what we do here. But we were previously just relying on pattern order which gave FMA4 priority.
  llvm-svn: 317775
* Fix "default label in switch which covers all enumeration values" warning | Vitaly Buka | 2017-11-09 | 1 | -2/+0
  llvm-svn: 317771
* [X86] Make X86ISD::FMADDS3 isel patterns commutable. | Craig Topper | 2017-11-09 | 1 | -4/+4
  This was missed when FMADDS3 was split from X86ISD::FMADDS3_RND.
  llvm-svn: 317769
* AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4 | Marek Olsak | 2017-11-09 | 1 | -4/+109
  Summary: Only 56 shaders (out of 48486) are affected.
  Totals from affected shaders (changed stats only):
    SGPRS: 2420 -> 2460 (1.65 %)
    Spilled VGPRs: 94 -> 112 (19.15 %)
    Scratch size: 524 -> 528 (0.76 %) dwords per thread
    Code Size: 187400 -> 184992 (-1.28 %) bytes
  One DiRT Showdown shader spills 6 more VGPRs. One Grid Autosport shader spills 12 more VGPRs. The other 54 shaders only have a decrease in code size. (I'm ignoring the SGPR noise)
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D39012
  llvm-svn: 317755
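The percentages in the shader statistics above follow from a plain relative-change calculation. The helper below is a hypothetical sketch (its name and the two-decimal rounding are assumptions, not something the commit states), but with the before/after totals from the commit message it reproduces the reported figures.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: relative change from Before to After, in percent,
// rounded to two decimal places. Before must be nonzero.
double percentChange(double Before, double After) {
  assert(Before != 0.0 && "baseline must be nonzero");
  const double Percent = (After - Before) / Before * 100.0;
  return std::round(Percent * 100.0) / 100.0;
}
```

For example, the code size total gives (184992 - 187400) / 187400, which is about -1.28 %, and the spilled VGPR total gives (112 - 94) / 94, which is about 19.15 %.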
* AMDGPU: Lower buffer store and atomic intrinsics manually | Marek Olsak | 2017-11-09 | 5 | -20/+206
  Summary: Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every buffer store and atomic instruction.
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D39060
  llvm-svn: 317754