summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* GlobalISel: Try to make legalize rules more useful for vectorsMatt Arsenault2019-02-071-18/+7
| | | | | | | Mostly keep the existing functions on scalars, but add versions which also operate based on the vector element size. llvm-svn: 353430
* [x86] split more 256/512-bit shuffles in loweringSanjay Patel2019-02-071-1/+5
| | | | | | | | | | | | This is intentionally a small step because it's hard to know exactly where we might introduce a conflicting transform with the code that tries to form wider shuffles. But I think this is safe - if we have a wide shuffle with 2 operands, then we should do better with an extract + narrow shuffle. Differential Revision: https://reviews.llvm.org/D57867 llvm-svn: 353427
* [X86] Simplify casing. NFC.Nirav Dave2019-02-071-8/+8
| | | | llvm-svn: 353417
* [LSR] Generate cross iteration indexesSam Parker2019-02-071-0/+6
| | | | | | | | | | | | | | Modify GenerateConstantOffsetsImpl to create offsets that can be used by indexed addressing modes. If formulae can be generated which result in the constant offset being the same size as the recurrence, we can generate a pre-indexed access. This allows the pointer to be updated via the single pre-indexed access so that (hopefully) no add/subs are required to update it for the next iteration. For small cores, this can significantly improve performance DSP-like loops. Differential Revision: https://reviews.llvm.org/D55373 llvm-svn: 353403
* [ARM GlobalISel] Support G_ICMP for Thumb2Diana Picus2019-02-072-12/+25
| | | | | | | Mark as legal and use the t2* equivalents of the arm mode instructions, e.g. t2CMPrr instead of plain CMPrr. llvm-svn: 353392
* [ARM] Reformat isRedundantFlagInstr for D57833. NFCDavid Green2019-02-071-8/+4
| | | | llvm-svn: 353386
* [BPF] add code-gen support for JMP32 instructionsJiong Wang2019-02-078-46/+104
| | | | | | | | | | | | | | | | | | | | | | | JMP32 instructions has been added to eBPF ISA. They are 32-bit variants of existing BPF conditional jump instructions, but the comparison happens on low 32-bit sub-register only, therefore some unnecessary extensions could be saved. JMP32 instructions will only be available for -mcpu=v3. Host probe hook has been updated accordingly. JMP32 instructions will only be enabled in code-gen when -mattr=+alu32 enabled, meaning compiling the program using sub-register mode. For JMP32 encoding, it is a new instruction class, and is using the reserved eBPF class number 0x6. This patch has been tested by compiling and running kernel bpf selftests with JMP32 enabled. Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 353384
* AArch64: implement copy for paired GPR registers.Tim Northover2019-02-072-0/+45
| | | | | | | When doing 128-bit atomics using CASP we might need to copy a GPRPair to a different register, but that was unimplemented up to now. llvm-svn: 353383
* [X86][DAG] Avoid creating dangling bitcast.Nirav Dave2019-02-061-1/+2
| | | | | | | | combineExtractWithShuffle may leave a dangling bitcast which may prevent further optimization in later passes. Avoid constructing it unless it is used. llvm-svn: 353333
* [SystemZ] Improved handling of the @llvm.ctlz intrinsic.Jonas Paulsson2019-02-062-0/+2
| | | | | | | | | | | | | | Since SystemZ supports counting of leading zeros with the FLOGR instruction, isCheapToSpeculateCtlz() should return true, which it now does. ISD::CTLZ_ZERO_UNDEF i32 is now handled the same way as ISD::CTLZ is, which is needed since promotion to i64 is required and CTLZ_ZERO_UNDEF is only expanded to CTLZ if it is Legal or Custom. Review: Ulrich Weigand https://reviews.llvm.org/D57710 llvm-svn: 353330
* [SystemZ] Wait with VGBM selection until after DAGCombine2.Jonas Paulsson2019-02-065-41/+46
| | | | | | | | | | | | | | | | | Don't lower BUILD_VECTORs to BYTE_MASK, but instead expose the BUILD_VECTORs to the DAGCombiner and select them to VGBM in Select(). This allows the DAGCombiner to understand the constant vector values. For floating point, only all-zeros vectors are now generated with VGBM, as it turned out to be somewhat complicated to handle any arbitrary constants, while in practice this is very rare and hardly needed. The SystemZ ISD opcodes z_byte_mask, z_vzero and z_vones have been removed. Review: Ulrich Weigand https://reviews.llvm.org/D57152 llvm-svn: 353325
* AArch64: enforce even/odd register pairs for CASP instructions.Tim Northover2019-02-062-6/+8
| | | | | | | | ARMv8.1a CASP instructions need the first of the pair to be an even register (otherwise the encoding is unallocated). We enforced this during assembly, but not CodeGen before. llvm-svn: 353308
* [InlineAsm][X86] Add backend support for X86 flag output parameters.Nirav Dave2019-02-062-1/+71
| | | | | | | Allow custom handling of inline assembly output parameters and add X86 flag parameter support. llvm-svn: 353307
* [SystemZ] Do not return INT_MIN from strcmp/memcmpUlrich Weigand2019-02-064-125/+91
| | | | | | | | | | | | | | | | | | | The IPM sequence currently generated to compute the strcmp/memcmp result will return INT_MIN for the "less than zero" case. While this is in compliance with the standard, strictly speaking, it turns out that common applications cannot handle this, e.g. because they negate a comparison result in order to implement reverse compares. This patch changes code to use a different sequence that will result in -2 for the "less than zero" case (same as GCC). However, this requires that the two source operands of the compare instructions are inverted, which breaks the optimization in removeIPMBasedCompare. Therefore, I've removed this (and all of optimizeCompareInstr), and replaced it with a mostly equivalent optimization in combineCCMask at the DAGcombine level. llvm-svn: 353304
* AArch64: annotate atomics with dropped acquire semantics when printing.Tim Northover2019-02-063-62/+50
| | | | | | | | | | | A quirk of the v8.1a spec is that when the writeback regiser for an atomic read-modify-write instruction is wzr/xzr, the instruction no longer enforces acquire ordering. However, it's still written with the misleading 'a' mnemonic. So this adds an annotation when disassembling such instructions, mentioning the change. llvm-svn: 353303
* [x86] vectorize cast ops in lowering to avoid register file transfersSanjay Patel2019-02-061-0/+57
| | | | | | | | | | | | | | | The proposal in D56796 may cross the line because we're trying to avoid vectorization transforms in generic DAG combining. So this is an alternate, later, x86-specific translation of that patch. There are several potential follow-ups to enhance this: 1. Allow extraction from non-zero element index. 2. Peek through extends of smaller width integers. 3. Support x86-specific conversion opcodes like X86ISD::CVTSI2P Differential Revision: https://reviews.llvm.org/D56864 llvm-svn: 353302
* [WebAssembly] Tidy up `let` statements in .td files (NFC)Heejin Ahn2019-02-065-71/+59
| | | | | | | | | | | | | | | | | Summary: - Delete {} for one-line `let` statements - Don't indent within `let` blocks - Add comments after `let` block's closing braces Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57730 llvm-svn: 353248
* [NFC][GlobalISel]: Add a convenience method to MachineInstrBuilder to ↵Aditya Nandakumar2019-02-052-2/+2
| | | | | | | | | | | | simplify getOperand(i).getReg() https://reviews.llvm.org/D57608 It's a common pattern in GISel to have a MachineInstrBuilder from which we get various regs (commonly MIB->getOperand(0).getReg()). This adds a helper method and the above can be replaced with MIB.getReg(0). llvm-svn: 353223
* [WebAssembly] Lower memmove to memory.copyThomas Lively2019-02-053-1/+17
| | | | | | | | | | | | | | Summary: The lowering is identical to the memcpy lowering. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57727 llvm-svn: 353216
* [AMDGPU] Consider XOR in waterfall loop as a terminatorScott Linder2019-02-051-1/+1
| | | | | | | | Ensure the XOR in the waterfall loop for indirect addressing is considered a terminator. Differential Revision: https://reviews.llvm.org/D57703 llvm-svn: 353207
* AMDGPU: Fix assert on trunc from bitcast of build_vectorMatt Arsenault2019-02-051-1/+1
| | | | | | | | | The v2i64 argument is lowered to a bitcast of v4i32 build_vector. This would then attempt to use the i32-element as the source of the vector truncate. This really would need to collect 2 elements from the build_vector to produce the intended truncate. llvm-svn: 353202
* [X86][SSE] Disable ZERO_EXTEND shuffle combiningSimon Pilgrim2019-02-051-2/+2
| | | | | | rL352997 enabled ZERO_EXTEND from non-shuffle-able value types. I've disabled it for now to fix a regression identified by @asbirlea until I can fix this properly. llvm-svn: 353198
* Enable integrated assembler on MSP430 by default.Anton Korobeynikov2019-02-051-0/+1
| | | | | | | | Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D56787 llvm-svn: 353192
* [AArch64][Outliner] Don't outline BTI instructionsOliver Stannard2019-02-051-0/+8
| | | | | | | | | | | We can't outline BTI instructions, because they need to be the very first instruction executed after an indirect call or branch. If we outline them, then an indirect call might go to the branch to the outlined function, which will fault. Differential revision: https://reviews.llvm.org/D57753 llvm-svn: 353190
* [X86][AVX] Attempt to combine shuffles to subvector broadcast loadSimon Pilgrim2019-02-051-0/+18
| | | | llvm-svn: 353189
* AArch64/GlobalISel: Don't clamp from 2 to 2Matt Arsenault2019-02-051-2/+2
| | | | | | This is equivalent to clampMaxNumElements, but saves a check. llvm-svn: 353188
* [X86][SSE] Add SimplifyDemandedVectorElts support for X86ISD::BLENDVSimon Pilgrim2019-02-051-0/+21
| | | | llvm-svn: 353165
* [X86][AVX] Attempt to share broadcasts of different widths (PR39454)Simon Pilgrim2019-02-051-0/+8
| | | | | | | | If we have broadcasts of different vector widths, keep the longest vector width and extract subvectors for the shorter vectors (which should be free). Differential Revision: https://reviews.llvm.org/D57663 llvm-svn: 353154
* [CGP] Add support for sinking operands to their users, if they are free.Florian Hahn2019-02-052-0/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch improves code generation for some AArch64 ACLE intrinsics. It adds support to CGP to duplicate and sink operands to their user, if they can be folded into a target instruction, like zexts and sub into usubl. It adds a TargetLowering hook shouldSinkOperands, which looks at the operands of instructions to see if sinking is profitable. I decided to add a new target hook, as for the sinking to be profitable, at least on AArch64, we have to look at multiple operands of an instruction, instead of looking at the users of a zext for example. The sinking is done in CGP, because it works around an instruction selection limitation. If instruction selection is not limited to a single basic block, this patch should not be needed any longer. Alternatively this could be done in the LoopSink pass, which tries to undo LICM for instructions in blocks that are not executed frequently. Note that we do not force the operands to sink to have a single user, because we duplicate them before sinking. Therefore this is only desirable if they really can be done for free. Additionally we could consider the impact on live ranges later on. This should fix https://bugs.llvm.org/show_bug.cgi?id=40025. As for performance, we have internal code that uses intrinsics and can be speed up by 10% by this change. Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma, RKSimon, spatel Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D57377 llvm-svn: 353152
* [ARM GlobalISel] Support G_GEP for Thumb2Diana Picus2019-02-052-3/+3
| | | | | | Same as ARM, but use a different opcode in the instruction selection. llvm-svn: 353151
* [X86] Connect the default fpsr and dirflag clobbers in inline assembly to ↵Craig Topper2019-02-052-1/+9
| | | | | | | | | | | | | | | | | | | | | the registers we have defined for them. Summary: We don't currently map these constraints to physical register numbers so they don't make it to the MachineIR representation of inline assembly. This could have problems for proper dependency tracking in the machine schedulers though I don't have a test case that shows that. Reviewers: rnk Reviewed By: rnk Subscribers: eraman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57641 llvm-svn: 353141
* [WebAssembly] Fix indentation after adding IsCanonical property (NFC)Heejin Ahn2019-02-052-11/+11
| | | | llvm-svn: 353132
* [WebAssembly] Make disassembler always emit most canonical name.Wouter van Oortmerssen2019-02-054-11/+22
| | | | | | | | | | | | | | | | | | | | | Summary: There are a few instructions that all map to the same opcode, so when disassembling, we have to pick one. That was just the first one before (the except_ref variant in the case of "call"), now it is the one marked as IsCanonical in tablegen, or failing that, the shortest name (which is typically the "canonical" one). Also introduced a canonical "end" instruction for this purpose. Reviewers: dschuff, tlively Subscribers: sbc100, jgravelle-google, aheejin, llvm-commits, sunfish Tags: #llvm Differential Revision: https://reviews.llvm.org/D57713 llvm-svn: 353131
* [WebAssembly] memory.copyThomas Lively2019-02-056-0/+66
| | | | | | | | | | | | Summary: Depends on D57495. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish Differential Revision: https://reviews.llvm.org/D57498 llvm-svn: 353127
* AMDGPU: Don't rematerialize mov with implicit operandsMatt Arsenault2019-02-041-1/+2
| | | | | | | This was pulling the mov used for register indexing on gfx9 out of the loop. llvm-svn: 353101
* [CodeGen][ARC][SystemZ][WebAssembly] Use MachineInstr::isInlineAsm in more ↵Craig Topper2019-02-044-7/+7
| | | | | | | | | | places instead of just comparing opcode. NFCI I'm looking at adding a second INLINEASM opcode for better modeling asm-goto as a terminator. Using the existing predicate will reduce teh number of places that will need to use the new opcode. llvm-svn: 353095
* [AMDGPU] Support emitting GOT relocations for function callsScott Linder2019-02-043-41/+20
| | | | | | Differential Revision: https://reviews.llvm.org/D57416 llvm-svn: 353083
* [WebAssembly] clang-tidy (NFC)Heejin Ahn2019-02-0434-408/+398
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch fixes clang-tidy warnings on wasm-only files. The list of checks used is: `-*,clang-diagnostic-*,llvm-*,misc-*,-misc-unused-parameters,readability-identifier-naming,modernize-*` (LLVM's default .clang-tidy list is the same except it does not have `modernize-*`. But I've seen in multiple CLs in LLVM the modernize style was recommended and code was fixed based on the style, so I added it as well.) The common fixes are: - Variable names start with an uppercase letter - Function names start with a lowercase letter - Use `auto` when you use casts so the type is evident - Use inline initialization for class member variables - Use `= default` for empty constructors / destructors - Use `using` in place of `typedef` Reviewers: sbc100, tlively, aardappel Subscribers: dschuff, sunfish, jgravelle-google, yurydelendik, kripken, MatzeB, mgorny, rupprecht, llvm-commits Differential Revision: https://reviews.llvm.org/D57500 llvm-svn: 353075
* [X86] X86DAGToDAGISel::matchBitExtract(): prepare 'control' in 32 bitsRoman Lebedev2019-02-041-17/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Noticed while looking at D56052. ``` // The 'control' of BEXTR has the pattern of: // [15...8 bit][ 7...0 bit] location // [ bit count][ shift] name // I.e. 0b000000011'00000001 means (x >> 0b1) & 0b11 ``` I.e. we do not care about any of the bits aside from the low 16 bits. So there is no point in doing the `slh`,`or` in 64 bits, let's just do everything in 32 bits, and anyext if needed. We could do that in 16 even, but we intentionally don't zext to i16 (longer encoding IIRC), so i'm guessing the same applies here. Reviewers: craig.topper, andreadb, RKSimon Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D56715 llvm-svn: 353073
* [X86] Add ST0 as an implicit def/use of x87 load/store instructions during ↵Craig Topper2019-02-041-0/+4
| | | | | | | | | | | | | | | | | | | | | | FP stackifying. These instructions implicitly operate on ST0, but we don't currently add that information to the MachineInstr. We also don't add it the tablegen definitions either. For the most part this doesn't cause any problems because the stackifying occurs after register allocation. All the instructions are marked as having side effects so the postRA scheduler won't reorder them amongst themselves. But nothing stops inline assembly using X87 instructions from being reordered around other x87 instructions if that inline assembly wasn't marked volatile. The two test cases I've identified so far in PR40539 involve loads and stores used to set up the inline assembly or capture the results of the inline assembly ending up in the wrong order. This patch adds implicit ST0 uses/defs to the load/store instructions to prevent this from happening. I plan to fix all of the FP instructions, but the binops are bit trickier to get right. So I've chosen fixing the known test cases as a good first step. I think we also need to update the tablegen descriptions so MS inline assembly infers the right clobbers, but I haven't checked that yet. Differential Revision: https://reviews.llvm.org/D57644 llvm-svn: 353070
* [WebAssembly] Make segment/size/type directives optional in asmWouter van Oortmerssen2019-02-041-1/+35
| | | | | | | | | | | | | | | | | | | | Summary: These were "boilerplate" that repeated information already present in .functype and end_function, that needed to be repeated to Please the particular way our object writing works, and missing them would generate errors. Instead, we generate the information for these automatically so the user can concern itself with writing more canonical wasm functions that always work as expected. Reviewers: dschuff, sbc100 Subscribers: jgravelle-google, aheejin, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D57546 llvm-svn: 353067
* [WebAssembly] Rename relocations from R_WEBASSEMBLY_ to R_WASM_Sam Clegg2019-02-041-11/+11
| | | | | | | | | | | | | | | | See https://github.com/WebAssembly/tool-conventions/pull/95. This is less typing and IMHO more readable, and it also fits with our naming around the binary format which tends to use the short name. e.g. include/llvm/BinaryFormat/Wasm.h tools/llvm-objdump/WasmDump.cpp etc.. Differential Revision: https://reviews.llvm.org/D57611 llvm-svn: 353062
* [X86] Print all register forms of x87 fadd/fsub/fdiv/fmul as having two ↵Craig Topper2019-02-042-42/+44
| | | | | | | | | | arguments where on is %st. All of these instructions consume one encoded register and the other register is %st. They either write the result to %st or the encoded register. Previously we printed both arguments when the encoded register was written. And we printed one argument when the result was written to %st. For the stack popping forms the encoded register is always the destination and we didn't print both operands. This was inconsistent with gcc and objdump and just makes the output assembly code harder to read. This patch changes things to always print both operands making us consistent with gcc and objdump. The parser should still be able to handle the single register forms just as it did before. This also matches the GNU assembler behavior. llvm-svn: 353061
* [X86][SSE] SimplifyDemandedBitsForTargetNode - PCMPGT(0,X) sign maskSimon Pilgrim2019-02-041-0/+7
| | | | | | | | For PCMPGT(0, X) patterns where we only demand the sign bit (e.g. BLENDV or MOVMSK) then we can use X directly. Differential Revision: https://reviews.llvm.org/D57667 llvm-svn: 353051
* AMDGPU/GlobalISel: Legalize select for v4s16Matt Arsenault2019-02-041-3/+3
| | | | | | | Also add some more select tests to help show future legalization changes. llvm-svn: 353045
* [AsmPrinter] Remove hidden flag -print-schedule.Andrea Di Biagio2019-02-0414-48/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes hidden codegen flag -print-schedule effectively reverting the logic originally committed as r300311 (https://llvm.org/viewvc/llvm-project?view=revision&revision=300311). Flag -print-schedule was originally introduced by r300311 to address PR32216 (https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better testing of schedule model instruction latencies/throughputs". These days, we can use llvm-mca to test scheduling models. So there is no longer a need for flag -print-schedule in LLVM. The main use case for PR32216 is now addressed by llvm-mca. Flag -print-schedule is mainly used for debugging purposes, and it is only actually used by x86 specific tests. We already have extensive (latency and throughput) tests under "test/tools/llvm-mca" for X86 processor models. That means, most (if not all) existing -print-schedule tests for X86 are redundant. When flag -print-schedule was first added to LLVM, several files had to be modified; a few APIs gained new arguments (see for example method MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained a couple of getSchedInfoStr() methods. Method getSchedInfoStr() had to originally work for both MCInst and MachineInstr. The original implmentation of getSchedInfoStr() introduced a subtle layering violation (reported as PR37160 and then fixed/worked-around by r330615). In retrospect, that new API could have been designed more optimally. We can always query MCSchedModel to get the latency and throughput. More importantly, the "sched-info" string should not have been generated by the subtarget. Note, r317782 fixed an issue where "print-schedule" didn't work very well in the presence of inline assembly. That commit is also reverted by this change. Differential Revision: https://reviews.llvm.org/D57244 llvm-svn: 353043
* Use auto for dyn_cast case to save a line. NFCI.Simon Pilgrim2019-02-041-2/+1
| | | | llvm-svn: 353041
* [ARM] Mark 255 and 65535 as cheap for Thumb1 "And"David Green2019-02-041-3/+7
| | | | | | | | | | This prevents Constant Hoisting from pulling the constant out of the block, allowing us to still produce LDRH/UXTH nodes. LDRB/UXTB (255) is already cheap by the default getIntImmCost, but I've added it for clarity. Differential Revision: https://reviews.llvm.org/D57671 llvm-svn: 353040
* Recommit r352660 "[X86] Mark EMMS and FEMMS as clobbering MM0-7 and ST0-7."Craig Topper2019-02-042-2/+6
| | | | | | | | | | | | | | We now print ST0 as 'st' when generating the clobber list for MS inline assembly in clang. This matches what the gcc reg name list expects. Original commit message: This fixes the test case in PR35982 by preventing MMX instructions that read MM0-7 from being moved below EMMS/FEMMS by the post RA scheduler. Though as discussed in bugzilla, this is not a complete fix. There is still the possibility of reordering in IR or by the pre-RA scheduler. Differential Revision: https://reviews.llvm.org/D57298 llvm-svn: 353016
* [X86] Print %st(0) as %st when its implicit to the instruction. Continue ↵Craig Topper2019-02-048-49/+75
| | | | | | | | printing it as %st(0) when its encoded in the instruction. This is a step back from the change I made in r352985. This appears to be more consistent with gcc and objdump behavior. llvm-svn: 353015
OpenPOWER on IntegriCloud