path: root/llvm/test/CodeGen
* [X86] Remove an unnecessary 'if' that prevented treating INT64_MAX and -INT64_MAX as power of 2 minus 1 in the multiply expansion code. (Craig Topper, 2018-07-27, 1 file changed, -2/+27)
  Not sure why they were being explicitly excluded, but I believe all the math inside the 'if' works. I changed the absolute value to be uint64_t instead of int64_t so INT64_MIN+1 wouldn't signed-wrap.
  llvm-svn: 338101
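  For reference, a minimal C sketch of the identity this expansion relies on (the function name is illustrative, not part of the commit):

      #include <stdint.h>

      /* Multiply by (2^k - 1) as a shift and subtract: x * 7 == (x << 3) - x.
         The same identity applies to INT64_MAX == 2^63 - 1, which is why the
         previously excluded cases were safe to handle. */
      uint64_t mul_by_7(uint64_t x) {
          return (x << 3) - x;  /* equivalent to x * 7 */
      }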
* [X86] Add matching for another pattern of PMADDWD. (Craig Topper, 2018-07-27, 1 file changed, -0/+370)
  Summary: This is the pattern you get from the loop vectorizer for something like this:

      int16_t A[1024];
      int16_t B[1024];
      int32_t C[512];

      void pmaddwd() {
        for (int i = 0; i != 512; ++i)
          C[i] = (A[2*i]*B[2*i]) + (A[2*i+1]*B[2*i+1]);
      }

  In this case we will have (add (mul (build_vector), (build_vector)), (mul (build_vector), (build_vector))). This is different from the pattern we currently match, which has the build_vectors between an add and a single multiply. I'm not sure what C code would get you that pattern.
  Reviewers: RKSimon, spatel, zvi
  Reviewed By: zvi
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D49636
  llvm-svn: 338097
* [X86] When removing sign extends from gather/scatter indices, make sure we handle UpdateNodeOperands finding an existing node to CSE with. (Craig Topper, 2018-07-27, 1 file changed, -0/+51)
  If this happens, the operands aren't updated and the existing node is returned. Make sure we pass this existing node up to the DAG combiner so that a proper replacement happens. Otherwise we get stuck in an infinite loop with an unoptimized node.
  llvm-svn: 338090
* [SelectionDAGBuilder] Add masked loads to PendingLoads rather than calling DAG.setRoot. (Craig Topper, 2018-07-26, 3 files changed, -26/+23)
  Masked loads were calling DAG.getRoot rather than SelectionDAGBuilder::getRoot, which means the PendingLoads weren't emptied to update the root and create any needed TokenFactor. So it would be incorrect to call setRoot for the masked load. This patch instead adds the masked load to PendingLoads so that the root doesn't get updated until a store or scatter or something similar happens. Alternatively, we could call SelectionDAGBuilder::getRoot before it, but that would create unnecessary serialization.
  llvm-svn: 338085
* [AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits (Scott Linder, 2018-07-26, 1 file changed, -0/+213)
  Scale the offset of VGPR spills by the wave size when it cannot fit in the 12-bit offset immediate field and so is added to the soffset SGPR. This accounts for hardware swizzling of scratch memory.
  Differential Revision: https://reviews.llvm.org/D49448
  llvm-svn: 338060
* [RISCV] Add support for the "interrupt" attribute (Ana Pazos, 2018-07-26, 5 files changed, -0/+1317)
  - Save/restore only registers that are used. This includes callee-saved registers and caller-saved registers (arguments and temporaries), for integer and FP registers.
  - If there is a call in the interrupt handler, save/restore all caller-saved registers (arguments and temporaries) and all FP registers.
  - Emit special return instructions depending on the "interrupt" attribute type.
  Based on initial patch by Zhaoshi Zheng.
  Reviewers: asb
  Reviewed By: asb
  Subscribers: rkruppe, the_o, MartinMosbeck, brucehoult, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48411
  llvm-svn: 338047
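  A minimal usage sketch, assuming the Clang-side spelling of the attribute (the handler name and body are hypothetical):

      /* With the attribute, the compiler saves/restores only the registers
         the handler actually clobbers and emits the matching return
         instruction (e.g. mret for a machine-mode handler). */
      __attribute__((interrupt("machine")))
      void machine_irq_handler(void) {
          /* handler body */
      }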
* MacroFusion: Fix macro fusion with ExitSU failing in top-down scheduling (Matthias Braun, 2018-07-26, 1 file changed, -0/+28)
  When fusing instructions A and B, we must add all predecessors of B as predecessors of A to avoid instructions getting scheduled in between. There is a special case involving ExitSU: every other node must be scheduled before it by design and we don't need to make this explicit in the graph; however, when fusing with a different node we need to schedule every other node before the fused node too, and we need to make this explicit now. This patch adds a dependency from the fused node to all roots in the graph.
  Differential Revision: https://reviews.llvm.org/D49830
  llvm-svn: 338046
* [DAGCombine] optimizeSetCCOfSignedTruncationCheck(): handle ule,ugt CondCodes. (Roman Lebedev, 2018-07-26, 2 files changed, -12/+10)
  Summary: A follow-up for D49266 / rL337166. At least one of these cases is more canonical, so we really do have to handle it.
  https://godbolt.org/g/pkzP3X
  https://rise4fun.com/Alive/pQyhZZ
  We won't get to these cases with I1 being -1, as that will be constant-folded to true or false. I'm also not sure we actually hit the 'ule' case, but I think the worst thing that could happen is that being dead code.
  Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D49497
  llvm-svn: 338044
* [mips] Sign extend i32 return values on MIPS64 (Stefan Maksimovic, 2018-07-26, 7 files changed, -38/+37)
  Override getTypeForExtReturn so that functions returning an i32 typed value have it sign-extended on MIPS64. Also provide patterns to get rid of unneeded sign extensions for arithmetic instructions which implicitly sign-extend their results.
  Differential Revision: https://reviews.llvm.org/D48374
  llvm-svn: 338019
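  A minimal sketch of the case this covers (assumed example, not from the commit):

      #include <stdint.h>

      /* The MIPS64 N64 ABI expects 32-bit values to live in registers in
         sign-extended form. A 32-bit add is implemented with an instruction
         (addu) whose 32-bit result is already sign-extended to 64 bits, so
         no extra sll/sra pair should be emitted before the return. */
      int32_t add32(int32_t a, int32_t b) {
          return a + b;
      }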
* Revert "[COFF] Use comdat shared constants for MinGW as well"Martin Storsjo2018-07-262-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit r337951. While that kind of shared constant generally works fine in a MinGW setting, it broke some cases of inline assembly that worked before: $ cat const-asm.c int MULH(int a, int b) { int rt, dummy; __asm__ ( "imull %3" :"=d"(rt), "=a"(dummy) :"a"(a), "rm"(b) ); return rt; } int func(int a) { return MULH(a, 1); } $ clang -target x86_64-win32-gnu -c const-asm.c -O2 const-asm.c:4:9: error: invalid variant '00000001' "imull %3" ^ <inline asm>:1:15: note: instantiated into assembly here imull __real@00000001(%rip) ^ A similar error is produced for i686 as well. The same test with a target of x86_64-win32-msvc or i686-win32-msvc works fine. llvm-svn: 338018
* [X86] Don't use CombineTo to skip adding new nodes to the DAGCombiner worklist in combineMul. (Craig Topper, 2018-07-26, 1 file changed, -9/+9)
  I'm not sure if this was trying to avoid optimizing the new nodes further, or maybe to prevent a cycle if something tried to reform the multiply. But I don't think it's a reliable way to do that. If the user of the expanded multiply is visited by the DAGCombiner after this conversion happens, the DAGCombiner will check its operands, see that they haven't been visited by the DAGCombiner before, and then add the first node to the worklist. This process will repeat until all the new nodes are visited. So this seems like an unreliable prevention at best. This patch instead just returns the new nodes like any other combine. If this starts causing problems, we can try to add target-specific nodes or something to more directly prevent optimizations. Now that we handle the combine normally, we can combine any negates the mul expansion creates into their users, since those will be visited now.
  llvm-svn: 338007
* [GlobalISel] Fall back to SDISel for swifterror/swiftself attributes. (Amara Emerson, 2018-07-26, 1 file changed, -0/+23)
  We don't currently support these; fall back until we do.
  llvm-svn: 337994
* bpf: new option -bpf-expand-memcpy-in-order to expand memcpy in order (Yonghong Song, 2018-07-25, 1 file changed, -0/+116)
  Some BPF JIT backends would want to optimize memcpy in their own architecture-specific way. However, at the moment there is no way for JIT backends to see memcpy semantics reliably. This is because the LLVM BPF backend expands memcpy into load/store sequences and could possibly schedule them apart from each other further, so BPF JIT backends inside the kernel can't reliably recognize memcpy semantics by peephole-matching the BPF sequence. This patch introduces new intrinsic expand infrastructure for memcpy. To get a stable in-order load/store sequence from memcpy, we first lower memcpy into a BPF::MEMCPY node, which is then expanded into in-order load/store sequences in the expandPostRAPseudo pass, which runs after instruction scheduling. This way, kernel JIT backends can reliably recognize memcpy by scanning the BPF sequence. This new memcpy expand infrastructure is gated by a new option: -bpf-expand-memcpy-in-order.
  Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
  Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
  Signed-off-by: Yonghong Song <yhs@fb.com>
  llvm-svn: 337977
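  A sketch of the kind of source this affects (assumed example; only the -bpf-expand-memcpy-in-order flag name comes from the commit):

      #include <string.h>

      /* A small fixed-size memcpy of the kind the BPF backend expands into a
         load/store sequence. With -bpf-expand-memcpy-in-order, the expansion
         happens post-RA so the loads and stores stay in order and a kernel
         JIT can pattern-match them back into a memcpy. */
      struct ctx { char data[16]; };

      void copy_ctx(struct ctx *dst, const struct ctx *src) {
          memcpy(dst, src, sizeof(*dst));
      }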
* [SelectionDAG] try to convert funnel shift directly to rotate if legal (Sanjay Patel, 2018-07-25, 3 files changed, -12/+6)
  If the DAGCombiner's rotate matching was working as expected, I don't think we'd see any test diffs here. This sidesteps the issue of custom lowering for rotates raised in PR38243 (https://bugs.llvm.org/show_bug.cgi?id=38243) by only dealing with legal operations.
  llvm-svn: 337966
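  For context, a minimal sketch of the classic rotate idiom; a funnel shift whose two value operands are the same is exactly a rotate, which is what this change lowers directly when ROTL/ROTR are legal (function name illustrative):

      #include <stdint.h>

      uint32_t rotl32(uint32_t x, uint32_t n) {
          n &= 31;  /* avoid undefined behavior for n == 0 or n >= 32 */
          return (x << n) | (x >> ((32 - n) & 31));
      }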
* [AArch, PowerPC] add more tests for legal rotate ops; NFC (Sanjay Patel, 2018-07-25, 2 files changed, -4/+47)
  llvm-svn: 337964
* [COFF] Use comdat shared constants for MinGW as well (Martin Storsjo, 2018-07-25, 2 files changed, -19/+2)
  GNU binutils tools have no problems with this kind of shared constants, provided that we actually hook it up completely in AsmPrinter and produce a global symbol. This effectively reverts SVN r335918 by hooking the rest of it up properly. This feature was implemented originally in SVN r213006, with no reason for why it can't be used for MinGW other than the fact that GCC doesn't do it while MSVC does.
  Differential Revision: https://reviews.llvm.org/D49646
  llvm-svn: 337951
* [COFF] Hoist constant pool handling from X86AsmPrinter into AsmPrinter (Martin Storsjo, 2018-07-25, 1 file changed, -0/+11)
  In SVN r334523, the first half of comdat constant pool handling was hoisted from X86WindowsTargetObjectFile (which despite the name was only used for MSVC targets) into the arch-independent TargetLoweringObjectFileCOFF, but the other half of the handling was left behind in X86AsmPrinter::GetCPISymbol. With only half of the handling in place, inconsistent comdat sections/symbols are created, causing issues with both GNU binutils (avoided for X86 in SVN r335918) and with the MS linker, which would complain like this:

      fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4

  Differential Revision: https://reviews.llvm.org/D49644
  llvm-svn: 337950
* [ARM] Prefer lsls+lsrs over lsls+ands or lsrs+ands in Thumb1. (Eli Friedman, 2018-07-25, 1 file changed, -0/+125)
  Saves materializing the immediate for the "ands". Corresponding patterns exist for lsrs+lsls, but that seems less common in practice. Now implemented as a DAGCombine.
  Differential Revision: https://reviews.llvm.org/D49585
  llvm-svn: 337945
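  A minimal sketch of the shape this combine targets (assumed example): masking with a contiguous bit pattern. Extracting bits [15:8] can be done with a left shift that drops the high bits followed by a right shift that drops the low bits (lsls #16 then lsrs #24), instead of a shift plus an "ands" that needs the 0xff mask materialized in a register first:

      #include <stdint.h>

      uint32_t extract_byte1(uint32_t x) {
          return (x >> 8) & 0xff;  /* equivalently ((x << 16) >> 24) */
      }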
* Fix corruption of result number in LegalizeVectorOps.cpp (Ulrich Weigand, 2018-07-25, 1 file changed, -14/+14)
  When VectorLegalizer::LegalizeOp creates a new SDValue after iterating over its arguments, we need to refer to the same result number of the new node that the original value used.
  Reviewed by: cameron.mcinally
  Differential Revision: https://reviews.llvm.org/D49805
  llvm-svn: 337939
* [AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion (Stanislav Mekhanoshin, 2018-07-25, 1 file changed, -0/+43)
  Differential Revision: https://reviews.llvm.org/D49761
  llvm-svn: 337938
* [Hexagon] Properly scale bit index when extracting elements from vNi1 (Krzysztof Parzyszek, 2018-07-25, 1 file changed, -0/+18)
  For example, v = <2 x i1> is represented as bbbbaaaa in a predicate register, where b = v[1] and a = v[0]. Extracting v[1] is equivalent to extracting bit 4 from the predicate register.
  llvm-svn: 337934
* [MIPS GlobalISel] Lower pointer arguments (Petar Jovanovic, 2018-07-25, 5 files changed, -0/+323)
  Add support for lowering pointer arguments. Changing type from pointer to integer is already done in MipsTargetLowering::getRegisterTypeForCallingConv.
  Patch by Petar Avramovic.
  Differential Revision: https://reviews.llvm.org/D49419
  llvm-svn: 337912
* Fix PR34170: Crash on inline asm with 64bit output in 32bit GPR (Thomas Preud'homme, 2018-07-25, 1 file changed, -0/+42)
  Add support for inline assembly with an output operand that does not naturally go in the register class it is constrained to (e.g. a double in a 32-bit GPR, as in the PR).
  llvm-svn: 337903
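  A minimal sketch of this shape of code (not the PR's actual reproducer; function name hypothetical):

      /* A 64-bit double constrained to "r", which on a 32-bit target is a
         32-bit GPR class; the value does not naturally fit the constrained
         register class. */
      double get_value(void) {
          double d;
          __asm__("" : "=r"(d));  /* 64-bit output in a 32-bit GPR constraint */
          return d;
      }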
* [X86] Autogenerate complete checks and fix a failure introduced in r337875. (Craig Topper, 2018-07-25, 1 file changed, -57/+178)
  llvm-svn: 337889
* [x86/SLH] Teach the x86 speculative load hardening pass to harden against v1.2 BCBS attacks directly. (Chandler Carruth, 2018-07-25, 1 file changed, -16/+179)
  Attacks using Spectre v1.2 (a subset of BCBS) are described in the paper here: https://people.csail.mit.edu/vlk/spectre11.pdf

  The core idea is to speculatively store over the address in a vtable, jumptable, or other target of indirect control flow that will be subsequently loaded. Speculative execution after such a store can forward the stored value to subsequent loads, and if called or jumped to, the speculative execution will be steered to this potentially attacker-controlled address.

  Up until now, this could be mitigated by enabling retpolines. However, that is a relatively expensive technique to mitigate this particular flavor, especially because in most cases SLH will have already mitigated it. To fully mitigate this with SLH, we need to do two core things:
  1) Unfold loads from calls and jumps, allowing the loads to be post-load hardened.
  2) Force hardening of incoming registers even if we didn't end up needing to harden the load itself.

  The reason we need to do these two things is that hardening calls and jumps from this particular variant is importantly different from hardening against leak of secret data. Because the "bad" data here isn't a secret, but is in fact speculatively stored by the attacker, it may be loaded from any address, regardless of whether it is read-only memory, mapped memory, or a "hardened" address. The only 100% effective way to harden these instructions is to harden their operand itself. But to the extent possible, we'd like to take advantage of all the other hardening going on; we just need a fallback in case none of that happened to cover the particular input to the control transfer instruction.

  For users of SLH, currently they are paying 2% to 6% performance overhead for retpolines, but this mechanism is expected to be substantially cheaper. However, it is worth reminding folks that this does not mitigate all of the things retpolines do -- most notably, variant #2 is not in *any way* mitigated by this technique. So users of SLH may still want to enable retpolines, and the implementation is carefully designed to gracefully leverage retpolines to avoid the need for further hardening here when they are enabled.

  Differential Revision: https://reviews.llvm.org/D49663
  llvm-svn: 337878
* [X86] Use a shift plus an lea for multiplying by a constant that is a power of 2 plus 2/4/8. (Craig Topper, 2018-07-25, 3 files changed, -22/+154)
  The LEA allows us to combine an add and the multiply by 2/4/8 together, so we just need a shift for the larger power of 2.
  llvm-svn: 337875
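  A worked instance of the decomposition (function name illustrative): 34 == 32 + 2, so x*34 == (x << 5) + x*2; the shift handles the power of 2 and the "+ x*2" folds into a single LEA with scale 2:

      #include <stdint.h>

      uint64_t mul_by_34(uint64_t x) {
          return (x << 5) + x * 2;  /* == x * 34: one shift, one LEA */
      }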
* [X86] Expand mul by pow2 + 2 using a shift and two adds, similar to what we do for pow2 - 2. (Craig Topper, 2018-07-25, 3 files changed, -0/+147)
  llvm-svn: 337874
* [X86] Use a two lea sequence for multiply by 37, 41, and 73. (Craig Topper, 2018-07-24, 4 files changed, -59/+105)
  These fit a pattern used by 11, 21, and 19.
  llvm-svn: 337871
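  The two-LEA decompositions written out as C (function names illustrative); each term of the form n + n*2/4/8 maps onto one LEA:

      #include <stdint.h>

      uint64_t mul_by_37(uint64_t x) { return x + (x + x * 8) * 4; }  /* 37 == 9*4 + 1 */
      uint64_t mul_by_41(uint64_t x) { return x + (x + x * 4) * 8; }  /* 41 == 5*8 + 1 */
      uint64_t mul_by_73(uint64_t x) { return x + (x + x * 8) * 8; }  /* 73 == 9*8 + 1 */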
* [X86] Add test cases for multiply by 37, 41, and 73. (Craig Topper, 2018-07-24, 3 files changed, -0/+330)
  These can all be handled with 2 LEAs, similar to what we do for 11, 19, 21.
  llvm-svn: 337870
* [X86] Change multiply by 26 to use two multiplies by 5 and an add instead of multiply by 3 and 9 and a subtract. (Craig Topper, 2018-07-24, 4 files changed, -30/+34)
  Same number of operations, but ending in an add is friendlier due to it being commutable.
  llvm-svn: 337869
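  The new decomposition as a C sketch (function name illustrative): 26 == 5*5 + 1, so x*26 == (x*5)*5 + x, i.e. two LEAs followed by the commutable add:

      #include <stdint.h>

      uint64_t mul_by_26(uint64_t x) {
          uint64_t x5  = x + x * 4;    /* lea: x*5  */
          uint64_t x25 = x5 + x5 * 4;  /* lea: x*25 */
          return x25 + x;              /* add: x*26 */
      }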
* [X86] When expanding a multiply by a negative of one less than a power of 2, like 31, don't generate a negate of a subtract that we'll never optimize. (Craig Topper, 2018-07-24, 1 file changed, -12/+11)
  We generated a subtract for the power of 2 minus one, then negated the result. The negate can be optimized away by swapping the subtract operands, but DAG combine doesn't know how to do that and we don't add any of the new nodes to the worklist anyway. This patch makes us explicitly emit the swapped subtract.
  llvm-svn: 337858
* [X86] Generalize the multiply by 30 lowering to generic multiply by power of 2 minus 2. (Craig Topper, 2018-07-24, 4 files changed, -44/+206)
  Use a left shift and 2 subtracts, like we do for 30. Move this out from behind the slow-lea check since it doesn't even use an LEA. Use this for multiply by 14 as well.
  llvm-svn: 337856
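  The generalized pattern as a C sketch (function names illustrative): a (2^k - 2) multiply becomes one shift and two subtracts:

      #include <stdint.h>

      uint64_t mul_by_30(uint64_t x) { return (x << 5) - x - x; }  /* 32x - x - x */
      uint64_t mul_by_14(uint64_t x) { return (x << 4) - x - x; }  /* 16x - x - x */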
* [WebAssembly] Add tests for weaker memory consistency orderings (Heejin Ahn, 2018-07-24, 1 file changed, -0/+141)
  Summary: Currently all wasm atomic memory access instructions are sequentially consistent, so even if LLVM IR specifies weaker orderings than that, we should upgrade them to sequential ordering and treat them in the same way.
  Reviewers: dschuff
  Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
  Differential Revision: https://reviews.llvm.org/D49194
  llvm-svn: 337854
* [X86] Change multiply by 19 to use (9 * X) * 2 + X instead of (5 * X) * 4 - X. (Craig Topper, 2018-07-24, 4 files changed, -32/+28)
  The new lowering can be done in 2 LEAs. The old code took 1 LEA, 1 shift, and 1 sub.
  llvm-svn: 337851
* [X86] Add test case to show failure to combine away negates that may be created by mul by constant expansion. (Craig Topper, 2018-07-24, 1 file changed, -0/+64)
  Mul by constant can expand to a sequence that ends with a negate. If the next instruction is an add or sub, we might be able to fold the negate away. We currently fail to do this because we explicitly don't add anything to the DAG combine worklist when we expand multiplies. This is primarily to keep the multiply from being reformed, but we should consider adding the users to the worklist.
  llvm-svn: 337843
* [mips] Fix local dynamic TLS with Sym64 (Simon Atanasyan, 2018-07-24, 1 file changed, -4/+5)
  For the final DTPREL addition, rather than a lui/daddiu/daddu triple, LLVM was erroneously emitting a daddiu/daddiu pair, treating the %dtprel_hi as if it were a %dtprel_lo, since Mips::Hi expands unshifted for Sym64. Instead, use a new TlsHi node and, although unnecessary due to the exact structure of the nodes emitted, use TlsHi for local exec too to prevent future bugs. Also garbage-collect the unused TprelLo and TlsGd nodes, and TprelHi, since its functionality is provided by the new common TlsHi node.
  Patch by James Clarke.
  Differential revision: https://reviews.llvm.org/D49259
  llvm-svn: 337827
* [ARM] Disable ARMCodeGenPrepare by default (Sam Parker, 2018-07-24, 3 files changed, -11/+11)
  ARM Stage 2 builders have been suspiciously broken since the pass was committed. Disabling to hopefully fix the bots and give me time to debug.
  llvm-svn: 337821
* [x86] Clean up and convert test to use generated CHECK lines. (Chandler Carruth, 2018-07-24, 1 file changed, -70/+122)
  This test was already checking microscopic behavior of tail calls under specific conditions. This just makes the CHECK lines much more consistent, clear, and easily updated when intentional changes are made. I've also switched the test to consistently name the entry block and to order the helper declarations and comments for specific tests in the more usual locations.
  llvm-svn: 337806
* [x86] Update the CHECK lines of this test to use the latest patterns from the script. (Chandler Carruth, 2018-07-24, 1 file changed, -42/+42)
  This minimizes the diff in subsequent changes.
  llvm-svn: 337805
* AMDGPU/GlobalISel: Legalize G_INSERT (Tom Stellard, 2018-07-24, 1 file changed, -0/+123)
  Reviewers: arsenm
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D49601
  llvm-svn: 337798
* Fix typo in test/CodeGen/Mips/dins.ll (Thomas Anderson, 2018-07-23, 1 file changed, -3/+3)
  Differential Revision: https://reviews.llvm.org/D49704
  llvm-svn: 337771
* [COFF] Fix assembly output of comdat sections without an attached symbol (Martin Storsjo, 2018-07-23, 1 file changed, -0/+81)
  Since SVN r335286, the .xdata sections are produced without an attached symbol, which requires using a different syntax when printing assembly output. Instead of the usual syntax of '.section <name>,"dr",discard,<symbol>', use '.section <name>,"dr"' + '.linkonce discard' (which is what GCC uses for all assembly output). This fixes PR38254.
  Differential Revision: https://reviews.llvm.org/D49651
  llvm-svn: 337756
* [AArch64] Use MCAsmInfoMicrosoft and MCAsmInfoGNUCOFF as base classes (Martin Storsjo, 2018-07-23, 1 file changed, -0/+13)
  This matches the structure used on X86 and ARM. It requires a little bit of duplication of the parts that are equal in both AArch64 COFF variants, though. Before SVN r335286, these classes didn't add anything that MCAsmInfoCOFF didn't, but now they do. This makes AArch64 match X86 in how comdat is used for float constants for MinGW.
  Differential Revision: https://reviews.llvm.org/D49637
  llvm-svn: 337755
* Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models" (Reid Kleckner, 2018-07-23, 5 files changed, -5/+393)
  Don't try to generate large PIC code for non-ELF targets. Neither COFF nor MachO have relocations for large position independent code, and users have been using "large PIC" code models to JIT 64-bit code for a while now. With this change, if they are generating ELF code, their JITed code will truly be PIC, but if they target MachO or COFF, it will contain 64-bit immediates that directly reference external symbols. For a JIT, that's perfectly fine.
  llvm-svn: 337740
* Add inline asm aliasing test. (Nirav Dave, 2018-07-23, 1 file changed, -0/+140)
  llvm-svn: 337734
* [Hexagon] Handle unnamed globals in HexagonConstExpr (Krzysztof Parzyszek, 2018-07-23, 1 file changed, -0/+38)
  Instead of comparing names, compare positions in the parent module.
  llvm-svn: 337723
* [mips] Add more checks to the tls.ll test case. NFC (Simon Atanasyan, 2018-07-23, 1 file changed, -49/+106)
  llvm-svn: 337705
* [FPEnv] Legalize double width StrictFP vector operations (Cameron McInally, 2018-07-23, 1 file changed, -48/+940)
  Differential Revision: https://reviews.llvm.org/D48809
  llvm-svn: 337698
* [ARM] ARMCodeGenPrepare backend pass (Sam Parker, 2018-07-23, 3 files changed, -0/+905)
  Arm-specific codegen prepare is implemented to perform type promotion on icmp operands, which can enable the removal of uxtb and uxth (unsigned extend) instructions. This is possible because performing type promotion before ISel alleviates this duty from the DAG builder, which has to perform legalisation but has a limited view on data ranges. The pass visits any instruction operand of an icmp and creates a worklist to traverse the use-def tree to determine whether the values can simply be promoted. Our concern is values in the registers overflowing the narrow (i8, i16) data range, so instructions marked with nuw can be promoted easily. For add and sub instructions, we are able to use the parallel DSP instructions to operate on scalar data types and avoid overflowing bits. Underflowing adds and subs are also permitted when the result is only used by an unsigned icmp.
  Differential Revision: https://reviews.llvm.org/D48832
  llvm-svn: 337687
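  A minimal sketch of the kind of source the pass targets (assumed example; name and constant are illustrative):

      #include <stdint.h>

      /* The i8 add result feeds only an equality compare, so the value can
         be promoted to i32 and the uxtb that would otherwise be needed
         before the compare goes away. */
      int match(uint8_t a, uint8_t b) {
          uint8_t sum = a + b;
          return sum == 0x42;
      }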
* [x86/SLH] Fix a bug where we would harden tail calls twice -- once as a call, and then again as a return. (Chandler Carruth, 2018-07-23, 1 file changed, -6/+0)
  Also added a comment to try and explain better why we would be doing what we're doing when hardening the (non-call) returns.
  llvm-svn: 337673