path: root/llvm/lib
Commit log (most recent first). Each entry shows the commit message, author, date, files changed, and lines removed/added.
...
* [NFC] [PowerPC] add a routine in PPCTargetLowering to determine if a global is accessed as got-indirect or not (QingShan Zhang, 2018-12-03; 3 files, -15/+36)
  In theory, we should let the PPC target determine how to lower the TOC entry for globals, and PPCTargetLowering requires this query to do some optimization for TOC_Entry.
  Differential Revision: https://reviews.llvm.org/D54925
  llvm-svn: 348108
* [X86] Add a DAG combine to turn stores of vXi1 on pre-avx512 targets into a bitcast and a store of an iX scalar (Craig Topper, 2018-12-02; 1 file, -0/+12)
  llvm-svn: 348104
* [X86] Fix bad comment. NFC (Craig Topper, 2018-12-02; 1 file, -1/+1)
  llvm-svn: 348103
* [ValueTracking] Support funnel shifts in computeKnownBits() (Nikita Popov, 2018-12-02; 1 file, -0/+21)
  If the shift amount is known, we can determine the known bits of the output based on the known bits of the two inputs. This is essentially the same functionality as implemented in D54869, but for ValueTracking rather than InstCombine SimplifyDemandedBits.
  Differential Revision: https://reviews.llvm.org/D55140
  llvm-svn: 348091
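  For illustration, a minimal hypothetical IR sketch (not from the patch) of the kind of fact this enables. With a constant shift amount of 3, fshl on i8 computes (x << 3) | (y >> 5), so if the top three bits of %y are known zero, the low three bits of the result are known zero:

      define i8 @known_low_zero(i8 %x, i8 %b) {
        %y = and i8 %b, 31                               ; top 3 bits of %y known zero
        %f = call i8 @llvm.fshl.i8(i8 %x, i8 %y, i8 3)   ; f = (x << 3) | (y >> 5)
        %r = and i8 %f, 7                                ; known bits prove this is 0
        ret i8 %r
      }
      declare i8 @llvm.fshl.i8(i8, i8, i8)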
* [SelectionDAG] fold constant with undef vector per element (Sanjay Patel, 2018-12-02; 1 file, -10/+16)
  This makes the SDAG behavior consistent with the way we do this in IR. It's possible that we were getting the wrong answer before. For example, 'xor undef, undef --> 0' but 'xor undef, C' --> undef. But the most practical improvement is likely as shown in the tests here: for FP, we were overconstraining undef lanes to NaN, and that can prevent vector simplifications/narrowing (see D51553).
  llvm-svn: 348090
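  A hypothetical sketch of the per-element behavior, applying the IR rules quoted above lane by lane rather than forcing a single answer for the whole vector:

      define <2 x i32> @fold_per_element() {
        ; lane 0 folds as 'xor C, C --> 0'; lane 1 as 'xor undef, undef --> 0'
        %r = xor <2 x i32> <i32 7, i32 undef>, <i32 7, i32 undef>
        ret <2 x i32> %r      ; folds to zeroinitializer
      }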
* [DAGCombiner] guard against an oversized shift crash (Sanjay Patel, 2018-12-02; 1 file, -9/+14)
  This change prevents the crash noted in the post-commit comments for rL347478: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html
  We can't guarantee that an oversized shift amount is folded away, so we have to check for it. Note that I committed an incomplete fix for that crash with rL347502. But as discussed here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html ...we have to try harder. So I'm not sure how to expose the bug now (and apparently no fuzzers have found a way yet either).
  On the plus side, we have discovered that we're missing real optimizations by not simplifying nodes sooner, so the earlier fix still has value, and there's likely more value in extending that so we can simplify more opcodes and simplify when doing RAUW and/or putting nodes on the combiner worklist.
  Differential Revision: https://reviews.llvm.org/D54954
  llvm-svn: 348089
* [ValueTracking] add helper function for testing implied condition; NFCI (Sanjay Patel, 2018-12-02; 3 files, -60/+46)
  We were duplicating code around the existing isImpliedCondition() that checks for a predecessor block/dominating condition, so make that a wrapper call.
  llvm-svn: 348088
* [X86] Simplify LowerBITCAST code for v2i32/v4i16/v8i8/i64->mmx/i64/f64 bitcast (Craig Topper, 2018-12-02; 1 file, -23/+8)
  Previously this code generated its own extracts and build_vector. But we can use a simpler concat_vectors or scalar_to_vector operation and let type legalization do additional legalization of those operations.
  llvm-svn: 348087
* [X86] Add custom type legalization for v2i32/v4i16/v8i8->mmx bitcasts to avoid a store/load to/from the stack (Craig Topper, 2018-12-02; 1 file, -2/+8)
  Widen the input to a 128-bit vector by padding with undef elements. Then use a movdq2q to convert from xmm register to mmx register.
  llvm-svn: 348086
* [X86] Custom type legalize v2i32/v4i16/v8i8->i64 bitcasts in 64-bit mode similar to what's done when the destination is f64 (Craig Topper, 2018-12-02; 1 file, -3/+4)
  The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and non-truncating store on pre-avx512 targets. Once that happens, the stack store/load pair will be combined away, leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal, so it doesn't get folded away. By custom legalizing it we can avoid this churn and maybe produce better code.
  llvm-svn: 348085
* [MachineOutliner][AArch64] Improve checks for stack instructions (Jessica Paquette, 2018-12-01; 1 file, -1/+23)
  If we know that we'll definitely save LR to a register, there's no reason to pre-check whether or not a stack instruction is unsafe to fix up. This makes it so that we check for that condition before mapping instructions. This allows us to outline more, since we don't pessimise as many instructions.
  Also update some tests, since we outline more.
  llvm-svn: 348081
* [X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1 (Craig Topper, 2018-12-01; 1 file, -8/+11)
  Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant, as this is mostly used by the division-by-constant optimization.
  Pre-sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles and an xor and a copy due to tied register constraints. That seems maybe better than the 3 shuffles. With AVX we avoid the copy, so that's obviously better.
  Reviewers: spatel, RKSimon
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D55138
  llvm-svn: 348079
* [AMDGPU] Split 64-Bit XNOR to 64-Bit NOT/XOR (Graham Sellers, 2018-12-01; 2 files, -7/+55)
  The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (XOR/NOT) to turn into NOT/XOR. Handling this case with its own split means we can keep the NOT in the scalar unit. Previously, we split 64-bit XNOR into two 32-bit XNOR, then lowered. Now, we get three instructions (s_not, v_xor, v_xor) rather than four in the case where either of the sources is a scalar 64-bit.
  Add test cases to xnor.ll to attempt XNOR Vx, Sy and XNOR Sx, Vy. Also add a test that uses the opposite identity such that (~x ^ y) on the scalar unit (or vector for gfx906) can generate XNOR. This already worked, but I didn't see a test for it.
  Differential: https://reviews.llvm.org/D55071
  llvm-svn: 348075
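  A hedged sketch of the identity in IR terms (the actual split happens on machine IR, not LLVM IR): the NOT moves onto the scalar source, leaving a plain XOR for the vector unit:

      ; xnor: %r = ~(%s ^ %v)
      define i64 @xnor(i64 %s, i64 %v) {
        %x = xor i64 %s, %v
        %r = xor i64 %x, -1
        ret i64 %r
      }
      ; equivalent form after applying ~(x ^ y) == (~x ^ y):
      ;   %ns = xor i64 %s, -1    ; s_not_b64 on the scalar unit
      ;   %r  = xor i64 %ns, %v   ; one v_xor_b32 per 32-bit half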
* [SelectionDAG] Improve SimplifyDemandedBits to SimplifyDemandedVectorElts simplification (Simon Pilgrim, 2018-12-01; 1 file, -5/+3)
  D52935 introduced the ability for SimplifyDemandedBits to call SimplifyDemandedVectorElts through BITCASTs if the demanded bit mask entirely covered the sub-element. This patch relaxes this to demanding an element if we need any bit from it.
  Differential Revision: https://reviews.llvm.org/D54761
  llvm-svn: 348073
* [InstCombine] Support ssub.sat canonicalization for non-splats (Nikita Popov, 2018-12-01; 2 files, -16/+12)
  Extend the ssub.sat(X, C) -> sadd.sat(X, -C) canonicalization to also support non-splat vector constants. This is done by generalizing the implementation of the isNotMinSignedValue() helper to return true for constants that are non-splat but don't contain any signed-min elements.
  Differential Revision: https://reviews.llvm.org/D55011
  llvm-svn: 348072
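  A hypothetical before/after sketch of the canonicalization on a non-splat constant (valid because neither element is the signed minimum, so negating the constant cannot overflow):

      define <2 x i8> @before(<2 x i8> %x) {
        %r = call <2 x i8> @llvm.ssub.sat.v2i8(<2 x i8> %x, <2 x i8> <i8 1, i8 2>)
        ret <2 x i8> %r
      }
      ; canonicalized to:
      ;   %r = call <2 x i8> @llvm.sadd.sat.v2i8(<2 x i8> %x, <2 x i8> <i8 -1, i8 -2>)
      declare <2 x i8> @llvm.ssub.sat.v2i8(<2 x i8>, <2 x i8>)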
* [ThinLTO] Allow importing of functions with var args (Teresa Johnson, 2018-12-01; 1 file, -9/+2)
  Summary: Follow-up to D54270, which allowed importing of var args functions unless they called va_start. As pointed out in the post-commit comments on that patch, the inliner can handle functions that call va_start in certain situations as well. Go ahead and enable importing of all var args functions. Measurements on a large binary show that this increases imports and binary size by an insignificant amount.
  Reviewers: davidxl
  Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
  Differential Revision: https://reviews.llvm.org/D54607
  llvm-svn: 348068
* [RISCV] Remove RV64I SLLW/SRLW/SRAW patterns and add new test cases (Alex Bradbury, 2018-12-01; 1 file, -26/+1)
  As noted by Eli Friedman <https://reviews.llvm.org/D52977?id=168629#1315291>, the RV64I shift patterns for SLLW/SRLW/SRAW make some incorrect assumptions. SRAW assumed that (sext_inreg foo, i32) could only be produced when sign-extending an i32. However, it can be produced by input such as:

      define i64 @tricky_ashr(i64 %a, i64 %b) {
        %1 = shl i64 %a, 32
        %2 = ashr i64 %1, 32
        %3 = ashr i64 %2, %b
        ret i64 %3
      }

  It's important not to select sraw in the above case, because sraw only uses the lower 5 bits of the shift amount, while a shift of 32-63 would be valid. Similarly, the patterns for srlw assumed (and foo, 0xffffffff) would only be produced when zero-extending a value that was originally i32 in LLVM IR. This is obviously incorrect.
  This patch removes the SLLW/SRLW/SRAW shift patterns for the time being and adds test cases that would demonstrate a miscompile if the incorrect patterns were re-added.
  llvm-svn: 348067
* Use RequireNullTerminator=false in identify_magic (Zachary Turner, 2018-12-01; 1 file, -1/+1)
  identify_magic does not need the file to be null terminated. Passing true here causes the file reading code to decide not to use mmap in some rare cases (which happen to be true 100% of the time in PDB files), which can lead to very large files failing to load. Since it was probably just an accident that we were passing true here (since it is the default function parameter), this should be strictly an improvement.
  llvm-svn: 348059
* [NVPTX] Add lowering of i128 numbers as struct fields (Artem Belevich, 2018-12-01; 1 file, -0/+12)
  Addition to D34555: override VTs computation with ComputePTXValueVTs for struct fields.
  Author: Denys Zariaiev <denys.zariaiev@gmail.com>
  Differential Revision: https://reviews.llvm.org/D55144
  llvm-svn: 348057
* LegacyDivergenceAnalysis: fix uninitialized value (Nicolai Haehnle, 2018-11-30; 1 file, -1/+3)
  Change-Id: I014502e431a68f7beddf169f6a3d19dac5dd2c26
  llvm-svn: 348051
* AMDGPU: Divergence-driven selection of scalar buffer load intrinsics (Nicolai Haehnle, 2018-11-30; 6 files, -220/+90)
  Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis.
  If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane.
  There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads.
  Change-Id: I170e6816323beb1348677b358c9d380865cd1a19
  Reviewers: arsenm, alex-t, rampitec, tpr
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53283
  llvm-svn: 348050
* AMDGPU: Fix various issues around the VirtReg2Value mapping (Nicolai Haehnle, 2018-11-30; 2 files, -31/+47)
  Summary: The VirtReg2Value mapping is crucial for getting consistently reliable divergence information into the SelectionDAG. This patch fixes a bunch of issues that lead to incorrect divergence info and introduces tight assertions to ensure we don't regress:
  1. VirtReg2Value is generated lazily; there were some cases where a lookup was performed before all relevant virtual registers were created, leading to an out-of-sync mapping. Those cases were:
     - Complex code to lower formal arguments that generated CopyFromReg nodes from live-in registers (fixed by never querying the mapping for live-in registers).
     - Code that generates CopyToReg for formal arguments that are used outside the entry basic block (fixed by never querying the mapping for Register nodes, which don't need the divergence info anyway).
  2. For complex values that are lowered to a sequence of registers, all registers must be reflected in the VirtReg2Value mapping.
  I am not adding any new tests, since I'm not actually aware of any bugs that these problems are causing with trunk as-is. However, I recently added a test case (in r346423) which fails when D53283 is applied without this change. Also, the new assertions should provide most of the effective test coverage.
  There is one test change in sdwa-peephole.ll. The underlying issue is that since the divergence info is now correct, the DAGISel will select V_OR_B32 directly instead of S_OR_B32. This leads to an extra COPY which affects the behavior of MachineLICM in a way that ends up with the S_MOV_B32 with the constant in a different basic block than the V_OR_B32, which is presumably what defeats the peephole.
  Reviewers: alex-t, arsenm, rampitec
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D54340
  llvm-svn: 348049
* [DA] GPUDivergenceAnalysis for unstructured GPU kernels (Nicolai Haehnle, 2018-11-30; 2 files, -26/+108)
  Summary: This is patch #3 of the new DivergenceAnalysis <https://lists.llvm.org/pipermail/llvm-dev/2018-May/123606.html>. The GPUDivergenceAnalysis is intended to eventually supersede the existing LegacyDivergenceAnalysis, which produces incorrect results on unstructured Control-Flow Graphs: <https://bugs.llvm.org/show_bug.cgi?id=37185>. This patch adds the option -use-gpu-divergence-analysis to the LegacyDivergenceAnalysis to turn it into a transparent wrapper for the GPUDivergenceAnalysis.
  Reviewers: nhaehnle
  Reviewed By: nhaehnle
  Subscribers: jholewinski, jvesely, jfb, llvm-commits, alex-t, sameerds, arsenm, nhaehnle
  Differential Revision: https://reviews.llvm.org/D53493
  llvm-svn: 348048
* [MachineOutliner] Outline both register save calls + no LR save calls together (Jessica Paquette, 2018-11-30; 1 file, -32/+26)
  Instead of treating the outlined functions for these as distinct frames, they should be combined into one case. Neither allows for stack fixups, and both generate the same frame. Thus, they ought to be considered one case.
  This makes the code far easier to understand, for one thing. It also offers some small code size improvements. It's fairly rare to see a class of outlined functions that doesn't fall entirely into one variant (on CTMark anyway). It does happen from time to time though. This mostly offers some serious simplification.
  Also update the test to show the added functionality.
  llvm-svn: 348036
* AArch64: Don't emit CFI for SCS register in nounwind functions (Peter Collingbourne, 2018-11-30; 1 file, -14/+16)
  All that you can legitimately do with the CFI for a nounwind function is get a backtrace, and adjusting the SCS register is not (currently) required for this purpose.
  Differential Revision: https://reviews.llvm.org/D54988
  llvm-svn: 348035
* [Mem2Reg] Fix nondeterministic corner case (Joseph Tremoulet, 2018-11-30; 1 file, -2/+6)
  Summary: When mem2reg inserts phi nodes in blocks with unreachable predecessors, it adds undef operands for those incoming edges. When there are multiple such predecessors, the order is currently based on the address of the BasicBlocks. This change fixes that by using the BBNumbers in the sort/search predicates, as is done elsewhere in mem2reg to ensure determinism.
  Also adds a testcase with a bunch of unreachable preds, which (nondeterministically) fails without the fix.
  Reviewers: majnemer
  Reviewed By: majnemer
  Subscribers: mgrang, llvm-commits
  Differential Revision: https://reviews.llvm.org/D55077
  llvm-svn: 348024
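  A hypothetical reduction (not the committed testcase) of the shape involved: after mem2reg, the phi in %merge carries undef for each unreachable predecessor, and those incoming entries must be ordered by block number rather than by pointer value:

      define i32 @f(i1 %c) {
      entry:
        br i1 %c, label %a, label %merge
      a:
        br label %merge
      dead1:                                  ; unreachable predecessor
        br label %merge
      dead2:                                  ; unreachable predecessor
        br label %merge
      merge:
        %p = phi i32 [ 0, %entry ], [ 1, %a ], [ undef, %dead1 ], [ undef, %dead2 ]
        ret i32 %p
      }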
* [DWARFv5] Verify all-or-nothing constraint on DIFile source (Scott Linder, 2018-11-30; 1 file, -0/+18)
  Update the IR verifier to check the constraint that DIFile source is present on either all files or no files.
  Differential Revision: https://reviews.llvm.org/D54953
  llvm-svn: 348022
* [X86] Change vXi8 MULHU lowering to unpack high and low half of lanes instead of extracting and concatenating low and high half registers (Craig Topper, 2018-11-30; 1 file, -54/+47)
  This reduces the number of shuffle operations that need to be done. The splitting strategy requires the shuffle unit for the extraction and the extension. With the unpack strategy the unpacks accomplish a splitting and extending in one operation.
  llvm-svn: 348019
* [X86] Prefer lowerVectorShuffleAsBitMask over using an avx512 masked operation when avx512bw/avx512vl is enabled (Craig Topper, 2018-11-30; 1 file, -5/+5)
  This does require a constant pool load instead of loading an immediate into a gpr, moving to a k register and masking. But it's fewer instructions and more consistent with previous ISAs. It probably opens up more combine opportunities, as one of the test cases demonstrates.
  llvm-svn: 348018
* [SelectionDAG] fold FP binops with 2 undef operands to undef (Sanjay Patel, 2018-11-30; 1 file, -2/+4)
  llvm-svn: 348016
* [AMDGPU] Disable SReg Global LD/ST, perf regression (Ron Lieberman, 2018-11-30; 1 file, -0/+7)
  Differential Revision: https://reviews.llvm.org/D55093
  llvm-svn: 348014
* Revert "[BTF] Add BTF DebugInfo"Yonghong Song2018-11-308-1117/+2
| | | | | | This reverts commit 9c6b970db8bc63b28ce58a129bb1580a6a3c6caf. llvm-svn: 348004
* [BTF] Add BTF DebugInfo (Yonghong Song, 2018-11-30; 8 files, -2/+1117)
  This patch adds BPF Debug Format (BTF) as a standalone LLVM debuginfo. The BTF-related sections are directly generated from IR. The BTF debuginfo is generated only when the compilation target is BPF.

  What is BTF?
  ============
  First, BPF is a linux kernel virtual machine widely used for tracing, networking and security:
  https://www.kernel.org/doc/Documentation/networking/filter.txt
  https://cilium.readthedocs.io/en/v1.2/bpf/
  BTF is the debug info format for BPF, introduced in the linux patch https://github.com/torvalds/linux/commit/69b693f0aefa0ed521e8bd02260523b5ae446ad7#diff-06fb1c8825f653d7e539058b72c83332 in the patch set mentioned in the lwn article https://lwn.net/Articles/752047/. The BTF format is specified in the above github commit. In summary, its layout looks like:
      struct btf_header
      type subsection (a list of types)
      string subsection (a list of strings)
  With such information, the kernel and user space are able to pretty print a particular bpf map key/value. One possible example:
      Without BTF: key: [ 0x01, 0x01, 0x00, 0x00 ]
      With BTF:    key: struct t { a : 1; b : 1; c : 0 }
  where the struct is defined as struct t { char a; char b; short c; };

  How is BTF generated?
  =====================
  Currently, BTF is generated through pahole (https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=68645f7facc2eb69d0aeb2dd7d2f0cac0feb4d69) and available in pahole v1.12 (https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=4a21c5c8db0fcd2a279d067ecfb731596de822d4). Basically, the bpf program needs to be compiled with -g so dwarf sections are generated. pahole is enhanced such that a .BTF section can be generated based on dwarf. The format of this .BTF section matches the format expected by the kernel, so a bpf loader can just take the .BTF section and load it into the kernel: https://github.com/torvalds/linux/commit/8a138aed4a807ceb143882fb23a423d524dcdb35
  The .BTF section layout is also specified in this patch, in the file include/llvm/BinaryFormat/BTF.h.

  What use cases does this patch try to address?
  ==============================================
  Currently, only the bpf instruction stream is required to be passed to the kernel. The kernel verifies it, jits it if configured to do so, attaches it to a particular kernel attachment point, and later executes it when a particular event happens. This patch tries to expand BTF to support two more use cases:
  (1) BPF supports subroutine calls. During performance analysis, it would be good to differentiate which call is hot instead of just providing a virtual address. This requires passing a unique identifier for each subroutine to the kernel; the subroutine name is a natural choice.
  (2) If a particular jitted instruction is hot, we want the user to know which source line this jitted instruction belongs to. This requires the source information to be available to various profiling tools.
  Note that in a single ELF file:
  . there may be multiple loadable bpf programs,
  . for a particular to-be-loaded bpf instruction stream, its instructions may come from multiple PROGBITS sections; the bpf loader needs to merge them together into a single consecutive insn stream before loading to the kernel. For example:
      section .text:  subroutine funcFoo
      section _progA: calling funcFoo
      section _progB: calling funcFoo
  The bpf loader could construct two loadable bpf instruction streams and load them into the kernel:
      . _progA funcFoo
      . _progB funcFoo
  So per-ELF-section function offsets and instruction offsets will need to be adjusted before passing to the kernel, and the kernel essentially expects only one code section regardless of how many are in the ELF file.

  What do we propose and why?
  ===========================
  To support the above two use cases, we propose to add an additional section, .BTF.ext, to the ELF file which is the input of the bpf loader. A different section is preferred since the loader may need to manipulate it before loading part of its data to the kernel. The .BTF.ext section has a similar header to the .BTF section and contains two subsections, for func_info and line_info:
  . the func_info maps the func insn byte offset to a func type in the .BTF type subsection,
  . the line_info maps the insn byte offset to a line info,
  . both func_info and line_info subsections are organized by ELF PROGBITS AX sections.
  pahole is not a good place to implement .BTF.ext, as pahole is mostly for structure hole information and, more importantly, we want to pass the actual code to the kernel:
  . a bpf program is typically small, so the storage overhead should be small.
  . in bpf land, it is totally possible that an application loads the bpf program into the kernel and then quits, so holding debug info in the user space application is not practical, as you may not even know who loaded the bpf program.
  . having source code kept directly by the kernel would ease deployment, since the original source does not need to ship on every host and the kernel-devel package does not need to be deployed even if kernel headers are used.
  LLVM is a good place to implement this:
  . The only reliable time to get the source code is during compilation. This results in both more accurate information and easier deployment, as stated above.
  . Another consideration is JIT. Projects like bcc (https://github.com/iovisor/bcc) use MCJIT to compile a C program into bpf insns and load them into the kernel. The LLVM-generated BTF sections will be readily available for such cases as well.

  Design and implementation of emitting .BTF/.BTF.ext sections
  =============================================================
  The BTF debuginfo format is defined. Both .BTF and .BTF.ext sections are generated directly from IR when both "-target bpf" and "-g" are specified. Note that dwarf sections are still generated, as dwarf is used by user space tools like llvm-objdump etc. for the BPF target. This patch also contains tests to verify the generated .BTF and .BTF.ext sections for all supported types, func_info and line_info subsections. The patch is also tested against linux kernel bpf sample tests and selftests.
  Signed-off-by: Yonghong Song <yhs@fb.com>
  Differential Revision: https://reviews.llvm.org/D53736
  llvm-svn: 347999
* [CodeGen] Prefer static frame index for STATEPOINT liveness args (Than McIntosh, 2018-11-30; 1 file, -1/+10)
  Summary: If a given liveness arg of STATEPOINT is at a fixed frame index (e.g. a function argument passed on stack), prefer to use this fixed location even if the address is also in a register. If we use the register it will generate a spill, which is not necessary since the fixed frame index can be directly recorded in the stack map.
  Patch by Cherry Zhang <cherryyz@google.com>.
  Reviewers: thanm, niravd, reames
  Reviewed By: reames
  Subscribers: cherryyz, reames, anna, arphaman, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53889
  llvm-svn: 347998
* [SLP] PR39774: Update references of the replaced external instructions (Alexey Bataev, 2018-11-30; 1 file, -0/+2)
  Summary: An additional fix for PR39774. Need to update the references for the ReductionRoot instruction when it is replaced during the vectorization phase to avoid a compiler crash on reduction vectorization.
  Reviewers: RKSimon, spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D55017
  llvm-svn: 347997
* [AMDGPU] Combine DPP mov with use instructions (VOP1/2/3) (Valery Pykhtin, 2018-11-30; 12 files, -50/+711)
  Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses.
  Differential revision: https://reviews.llvm.org/D53762
  llvm-svn: 347993
* TableGen/ISel: Allow PatFrag predicate code to access captured operands (Nicolai Haehnle, 2018-11-30; 1 file, -0/+12)
  Summary: This simplifies writing predicates for pattern fragments that are automatically re-associated or commuted. For example, a followup patch adds patterns for fragments of the form (add (shl $x, $y), $z) to the AMDGPU backend. Such patterns are automatically commuted to (add $z, (shl $x, $y)), which makes it basically impossible to refer to $x, $y, and $z generically in the PredicateCode. With this change, the PredicateCode can refer to $x, $y, and $z simply as `Operands[i]`.
  Test confirmed that there are no changes to any of the generated files when building all (non-experimental) targets.
  Change-Id: I61c00ace7eed42c1d4edc4c5351174b56b77a79c
  Reviewers: arsenm, rampitec, RKSimon, craig.topper, hfinkel, uweigand
  Subscribers: wdng, tpr, llvm-commits
  Differential Revision: https://reviews.llvm.org/D51994
  llvm-svn: 347992
* [RISCV] Add additional CSR instruction aliases (imm. operands) (Alex Bradbury, 2018-11-30; 1 file, -0/+10)
  This patch adds CSR instruction aliases for the cases where the instruction takes an immediate operand but the alias doesn't have the i suffix. This is necessary for gas/gcc compatibility. gas doesn't do a similar conversion for fsflags or fsrm, so this should be complete.
  Differential Revision: https://reviews.llvm.org/D55008
  Patch by Luís Marques.
  llvm-svn: 347991
* Fix parenthesis warning in IVDescriptors (Renato Golin, 2018-11-30; 1 file, -3/+3)
  llvm-svn: 347990
* Add a new reduction pattern match (Renato Golin, 2018-11-30; 1 file, -6/+66)
  Adding a new reduction pattern match for vectorizing code similar to TSVC s3111:

      for (int i = 0; i < N; i++)
        if (a[i] > b)
          sum += a[i];

  This patch adds support for fadd, fsub and fmul, as well as multiple branches and different (but compatible) instructions (e.g. add+sub) in different branches.
  The difference from the previous patch (https://reviews.llvm.org/D49168) is as follows:
  - Added a check of the fast-math property of the fp instruction to the previous patch
  - Fixed/added some patterns for if-reduction.ll
  Differential Revision: https://reviews.llvm.org/D54464
  Patch by Takahiro Miyoshi <takahiro.miyoshi@linaro.org> and Masakazu Ueno <masakazu.ueno@linaro.org>
  llvm-svn: 347989
* [RISCV] Add UNIMP instruction (32- and 16-bit forms) (Alex Bradbury, 2018-11-30; 2 files, -0/+17)
  This patch adds support for UNIMP in both 32- and 16-bit forms. The 32-bit form can be seen as a variant of the ECALL/EBREAK/etc. family of instructions. The 16-bit form is just all zeroes, which isn't a valid RISC-V instruction, but still follows the 16-bit instruction form (i.e. bits 0-1 != 11).
  Until recently unimp was undocumented and supported just by binutils, which printed unimp for either the 16- or 32-bit form. Both forms are now documented <https://github.com/riscv/riscv-asm-manual/pull/20> and binutils now supports c.unimp <https://sourceware.org/ml/binutils-cvs/2018-11/msg00179.html>.
  Differential Revision: https://reviews.llvm.org/D54316
  Patch by Luís Marques.
  llvm-svn: 347988
* [SelectionDAG] Support result type promotion for FLT_ROUNDS_ (Alex Bradbury, 2018-11-30; 2 files, -0/+10)
  For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the result of ISD::FLT_ROUNDS_.
  Differential Revision: https://reviews.llvm.org/D53820
  llvm-svn: 347986
* [SelectionDAG] Support promotion of PREFETCH operands (Alex Bradbury, 2018-11-30; 2 files, -0/+15)
  For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the operands of ISD::PREFETCH.
  Differential Revision: https://reviews.llvm.org/D53281
  llvm-svn: 347980
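  For reference, a minimal hypothetical IR sketch that exercises this path: the i32 rw/locality/cache-type operands of the prefetch intrinsic below become illegal i32 constants on RV64 and must be promoted to i64 before the node can be selected:

      define void @prefetch_it(i8* %p) {
        call void @llvm.prefetch(i8* %p, i32 0, i32 3, i32 1)  ; read, high locality, data cache
        ret void
      }
      declare void @llvm.prefetch(i8*, i32, i32, i32)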
* [LoopSimplifyCFG] Update MemorySSA in terminator folding. PR39783 (Max Kazantsev, 2018-11-30; 1 file, -6/+13)
  The terminator folding transform lacks a MemorySSA update for memory Phis, while they exist within the MemorySSA analysis. They need exactly the same type of updates as regular Phis. Failing to update them properly ends up with inconsistent MemorySSA and manifests in various assertion failures. This patch adds Memory Phi updates to this transform.
  Thanks to @jonpa for finding this!
  Differential Revision: https://reviews.llvm.org/D55050
  Reviewed By: asbirlea
  llvm-svn: 347979
* [SelectionDAG] Support promotion of FRAMEADDR/RETURNADDR operands (Alex Bradbury, 2018-11-30; 2 files, -0/+10)
  For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the operand.
  Differential Revision: https://reviews.llvm.org/D53279
  llvm-svn: 347978
* [TargetLowering][RISCV] Introduce isSExtCheaperThanZExt hook and implement for RISC-V (Alex Bradbury, 2018-11-30; 4 files, -10/+27)
  DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend operands when it is able to do so. For some targets this is more expensive than a sign-extension, which is also a valid choice. Introduce the isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy == MVT::i64, as it can be performed using a single instruction.
  Differential Revision: https://reviews.llvm.org/D52978
  llvm-svn: 347977
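  Why sign-extension is also a valid choice here: for same-width operands, comparing the sign-extended values gives the same result as comparing the originals. Sign-extension preserves signed order directly, and it preserves unsigned order too, because the high half of the unsigned i32 range maps to the high end of the unsigned i64 range. A hypothetical example:

      ; on RV64, %a and %b are promoted to i64; an unsigned compare of their
      ; sign-extended values yields the same i1 as the original i32 compare,
      ; and each sign-extension is a single instruction (e.g. sext.w)
      define i1 @cmp(i32 %a, i32 %b) {
        %c = icmp ult i32 %a, %b
        ret i1 %c
      }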
* [RISCV] Introduce codegen patterns for instructions introduced in RV64I (Alex Bradbury, 2018-11-30; 2 files, -1/+99)
  As discussed in the RFC <http://lists.llvm.org/pipermail/llvm-dev/2018-October/126690.html>, 64-bit RISC-V has i64 as the only legal integer type. This patch introduces patterns to support codegen of the new instructions introduced in RV64I: addiw, addw, subw, sllw, slliw, srlw, srliw, sraw, sraiw, ld, sd.
  Custom selection code is needed for srliw, as SimplifyDemandedBits will remove lower bits from the mask, meaning the obvious pattern won't work:

      def : Pat<(sext_inreg (srl (and GPR:$rs1, 0xffffffff), uimm5:$shamt), i32),
                (SRLIW GPR:$rs1, uimm5:$shamt)>;

  This is sufficient to compile and execute all of the GCC torture suite for RV64I other than those files using frameaddr or returnaddr intrinsics (LegalizeDAG doesn't know how to promote the operands; a future patch addresses this).
  When promoting i32 sltu/sltiu operands, it would be more efficient to use sign-extension rather than zero-extension for RV64. A future patch adds a hook to allow this.
  Differential Revision: https://reviews.llvm.org/D52977
  llvm-svn: 347973
* [X86] Emit PACKUS directly from the v16i8 LowerMULH code instead of using a shuffle (Craig Topper, 2018-11-30; 1 file, -6/+1)
  llvm-svn: 347967
* [X86] Change the pre-sse4.1 code in the v16i8 MULHU lowering to be what we get after DAG combine cleans it up (Craig Topper, 2018-11-30; 1 file, -13/+18)
  Previously we emitted a punpcklbw/punpckhbw to move the byte elements into the upper half of 16-bit elements, then shifted right by 8 to zero the upper bits. After DAG combine we end up with punpcklbw/punpckhbw into the lower bits with zeros in the upper bits and no shifts. So just emit that directly.
  llvm-svn: 347966
* [ARM] Don't expand sdiv when optimising for minsize (Sjoerd Meijer, 2018-11-30; 2 files, -0/+47)
  Don't expand SDIV with an immediate that is a power of 2 if we optimise for minimum code size. For example, sdiv %1, i32 4 gets expanded to a sequence of 3 instructions, but this is suboptimal for minimum code size, so instead we just generate a MOV and an SDIV if integer division is supported.
  Differential Revision: https://reviews.llvm.org/D54546
  llvm-svn: 347965
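  A hypothetical sketch of the trade-off. At minsize, the division below stays a real divide (roughly a mov of the constant plus an sdiv when the core has hardware integer divide); without minsize it is expanded into a shift-based sequence, which is faster but larger:

      define i32 @div_by_4(i32 %x) minsize {
        %r = sdiv i32 %x, 4   ; kept as-is at minsize instead of being expanded
        ret i32 %r
      }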