path: root/llvm/lib
Commit message | Author | Age | Files | Lines
...
* [AggressiveInstCombine] convert rotate with guard branch into funnel shift (PR34924) | Sanjay Patel | 2018-12-17 | 1 | -1/+96
  Now that we have funnel shift intrinsics, it should be safe to convert this form of rotate to it. In the worst case (a target that doesn't have rotate instructions), we will expand this into a branch-less sequence of ALU ops (neg/and/and/lshr/shl/or) in the backend, so it's still very likely to be a perf improvement over the original code.

  The motivating source code pattern for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=34924

  Background: I looked at several different options before deciding where to try this - instcombine, simplifycfg, CGP - because it doesn't fit cleanly anywhere AFAIK.

  The backend (CGP, SDAG, GlobalIsel?) is too late for what we're trying to accomplish. We want to have the IR converted before we reach things like vectorization because the reduced code can make a loop much simpler to transform.

  Technically, this could be included in instcombine, but it's a large pattern match that includes control-flow, so it just felt wrong to stuff into there (although I have a draft of that patch). Similarly, this could be part of simplifycfg, but all of this pattern matching is a stretch.

  So we're left with our relatively new dumping ground for homeless transforms: aggressive-instcombine. This only runs at -O3, but that seems like a reasonable limitation given that source code has many options to avoid this pattern (including the recently added clang intrinsics for rotates).

  I'm including a PhaseOrdering test because we require the teamwork of 3 passes (aggressive-instcombine, instcombine, simplifycfg) to get this into the minimal IR form that we want. That test shows a bug with the new pass manager that's independent of this change (but it will be masked if we canonicalize harder to funnel shift intrinsics in instcombine).

  Differential Revision: https://reviews.llvm.org/D55604
  llvm-svn: 349396
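For readers unfamiliar with the pattern, the following is a minimal C++ sketch of a guarded rotate and a branch-less equivalent. The function names are hypothetical and the code is not taken from the patch or the bug report; it assumes a 32-bit value and a rotate amount below 32.

```cpp
#include <cassert>
#include <cstdint>

// Guarded rotate-left: the branch avoids the shift by 32 that would
// otherwise happen when n == 0 (undefined behavior in C++).
// Assumes n < 32.
constexpr uint32_t rotl_guarded(uint32_t x, uint32_t n) {
  if (n == 0)
    return x;
  return (x << n) | (x >> (32 - n));
}

// Branch-less form: both shift amounts are masked into [0, 31], so no
// guard is needed. This is roughly the neg/and/and/lshr/shl/or shape
// mentioned in the commit message.
constexpr uint32_t rotl_branchless(uint32_t x, uint32_t n) {
  return (x << (n & 31)) | (x >> ((0u - n) & 31));
}
```

Both functions should agree for every rotate amount in [0, 31]; the second is the kind of code a target without a rotate instruction could fall back to.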
* [SDAG] Clarify the origin of chain in REG_SEQUENCE in comment, NFC | Krzysztof Parzyszek | 2018-12-17 | 1 | -1/+3
  llvm-svn: 349391
* [SelectionDAG] Fix noop detection for vectors in AssertZext/AssertSext in getNode | Craig Topper | 2018-12-17 | 1 | -2/+2
  The assertion type is always supposed to be a scalar type. So if the result VT of the assertion is a vector, we need to get the scalar VT before we can compare them. Similarly for the assert above it.

  I don't have a test case because I don't know of any place we violate this today. A coworker found this while trying to use r347287 on the 6.0 branch without also having r336868.

  llvm-svn: 349390
* [InstCombine] don't widen an arbitrary sequence of vector ops (PR40032) | Sanjay Patel | 2018-12-17 | 2 | -12/+8
  The problem is shown specifically for a case with vector multiply here: https://bugs.llvm.org/show_bug.cgi?id=40032
  ...and this might mask the original backend bug for ARM shown in: https://bugs.llvm.org/show_bug.cgi?id=39967

  As the test diffs here show, we were not (and probably still aren't) doing these kinds of transforms in a principled way. We are producing as many or more wide instructions than we started with in some cases, so we still need to restrict/correct other transforms from overstepping.

  If there are perf regressions from this change, we can either carve out exceptions to the general IR rules, or improve the backend to do these transforms when we know the transform is profitable. That's probably similar to a change like D55448.

  Differential Revision: https://reviews.llvm.org/D55744
  llvm-svn: 349389
* Convert (CMP (srl/shl X, C), 0) to (CMP (and X, C'), 0) when only the zero flag is used. | Craig Topper | 2018-12-17 | 1 | -22/+50
  This allows a TEST to be used and can be combined with any AND that may already exist as an input to the shift.

  This was already done in EmitTest, but was easily tricked by multiple uses because the setcc might be used by multiple instructions. Once the SETCC and users are legalized then we can look for the shift to be used by a single CMP, but the CMP itself can have multiple users.

  This appears to fix the case in PR39968.

  llvm-svn: 349385
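The equivalence behind this fold can be checked with a small C++ sketch. The helper names are hypothetical, the mask C' is computed here from first principles rather than copied from the patch, and a 32-bit value with a shift amount below 32 is assumed.

```cpp
#include <cassert>
#include <cstdint>

// (x >> c) == 0 holds exactly when none of the bits that survive the
// shift are set, i.e. when x has no bits in the mask (all-ones << c).
constexpr bool srl_is_zero(uint32_t x, unsigned c) {
  return (x >> c) == 0;
}
constexpr bool srl_is_zero_masked(uint32_t x, unsigned c) {
  return (x & (UINT32_MAX << c)) == 0;
}

// (x << c) == 0 (in 32 bits) holds exactly when x has no bits in the
// mask (all-ones >> c).
constexpr bool shl_is_zero(uint32_t x, unsigned c) {
  return static_cast<uint32_t>(x << c) == 0;
}
constexpr bool shl_is_zero_masked(uint32_t x, unsigned c) {
  return (x & (UINT32_MAX >> c)) == 0;
}
```

Because only the zero/non-zero outcome is compared, the masked form gives the same answer while leaving an AND that a TEST instruction can absorb.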
* NFC: remove unused variable | JF Bastien | 2018-12-17 | 1 | -1/+0
  D55768 removed its use.
  llvm-svn: 349377
* [TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000) | Simon Pilgrim | 2018-12-17 | 3 | -51/+131
  This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen.

  I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage. At the same time, the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well.

  Differential Revision: https://reviews.llvm.org/D55768
  llvm-svn: 349374
* [InstSimplify] Simplify saturating add/sub + icmp | Nikita Popov | 2018-12-17 | 1 | -0/+66
  If a saturating add/sub has one constant operand, then we can determine the possible range of outputs it can produce, and simplify an icmp comparison based on that.

  The implementation is based on a similar existing mechanism for simplifying binary operator + icmps.

  Differential Revision: https://reviews.llvm.org/D55735
  llvm-svn: 349369
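As an illustration of the range argument, here is a hedged C++ sketch for the unsigned saturating add case on 8-bit values. The helper names are hypothetical and this is a standalone model, not LLVM code: since an unsigned saturating add of a constant C can only produce values in [C, 255], a comparison such as "result < C" is always false.

```cpp
#include <cassert>
#include <cstdint>

// Model of an 8-bit unsigned saturating add: clamp the sum to 255.
constexpr uint8_t uadd_sat(uint8_t a, uint8_t b) {
  unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
  return sum > 255u ? uint8_t{255} : static_cast<uint8_t>(sum);
}

// Exhaustively confirm that uadd_sat(x, c) is never below c, which is
// why a compare like "uadd_sat(x, c) < c" can be folded to false.
constexpr bool result_never_below(uint8_t c) {
  for (unsigned x = 0; x <= 255; ++x)
    if (uadd_sat(static_cast<uint8_t>(x), c) < c)
      return false;
  return true;
}
```

The same reasoning extends to the other saturating operations with different ranges; only the unsigned add case is modeled here.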
* FastIsel: take care to update iterators when removing instructions. | Tim Northover | 2018-12-17 | 5 | -7/+22
  We keep a few iterators into the basic block we're selecting while performing FastISel. Usually this is fine, but occasionally code wants to remove already-emitted instructions. When this happens we have to be careful to update those iterators so they're not pointing at dangling memory.

  llvm-svn: 349365
* Add missing include file. | Zachary Turner | 2018-12-17 | 1 | -0/+1
  llvm-svn: 349363
* [PDB] Add some helper functions for working with scopes. | Zachary Turner | 2018-12-17 | 2 | -2/+39
  llvm-svn: 349361
* [MS Demangler] Add a helper function to print a Node as a string. | Zachary Turner | 2018-12-17 | 1 | -0/+8
  llvm-svn: 349359
* [MIPS GlobalISel] Remove switch statement (fix r349346 for MSVC) | Petar Avramovic | 2018-12-17 | 1 | -6/+1
  Temporarily remove switch statement without any case labels in function legalizeCustom in order to fix r349346 for MSVC.

  llvm-svn: 349356
* ARM: use acquire/release instruction variants when available. | Tim Northover | 2018-12-17 | 2 | -8/+9
  These features (fairly) recently got split out into their own feature, so we should make CodeGen use them when available. The main change here is that the check used to be based on the triple, but now it's based on CPU features.

  llvm-svn: 349355
* [MCA] Add support for BeginGroup/EndGroup. | Andrea Di Biagio | 2018-12-17 | 2 | -0/+10
  llvm-svn: 349354
* Revert "DebugInfo: Assume an absence of ranges or high_pc on a CU means the CU is empty (devoid of code addresses)" | Eric Liu | 2018-12-17 | 1 | -0/+21
  This reverts commit r349333. It caused an internal test to fail. I have sent more information to the author.

  llvm-svn: 349353
* [MCA] Don't assume that createMCInstrAnalysis() always returns a valid pointer. | Andrea Di Biagio | 2018-12-17 | 1 | -8/+13
  Class InstrBuilder wrongly assumed that llvm targets were always able to return a non-null pointer when createMCInstrAnalysis() was called on them. This was causing crashes when simulating executions for targets that don't provide an MCInstrAnalysis object. This patch fixes the issue by making MCInstrAnalysis optional.

  llvm-svn: 349352
* [MIPS GlobalISel] Lower G_UADDE and narrowScalar G_ADD | Petar Avramovic | 2018-12-17 | 2 | -30/+23
  Lower G_UADDE and legalize G_ADD using narrowScalar on MIPS32.

  Differential Revision: https://reviews.llvm.org/D54580
  llvm-svn: 349346
* [AArch64] Re-run load/store optimizer after aggressive tail duplication | Alexandros Lamprineas | 2018-12-17 | 1 | -0/+6
  The Load/Store Optimizer runs before Machine Block Placement. At O3 the Tail Duplication Threshold is set to 4 instructions and this can create new opportunities for the Load/Store Optimizer. It seems worthwhile to run it once again.

  llvm-svn: 349338
* DebugInfo: Assume an absence of ranges or high_pc on a CU means the CU is empty (devoid of code addresses) | David Blaikie | 2018-12-17 | 1 | -21/+0
  GCC emitted these unconditionally on/before 4.4/March 2012
  Clang emitted these unconditionally on/before 3.5/March 2014

  This improves performance when parsing CUs (especially those using split DWARF) that contain no code ranges (such as the mini CUs that may be created by ThinLTO importing - though generally they should be/are avoided, especially for Split DWARF because it produces a lot of very small CUs, which don't scale well in a bunch of other ways too (including size)).

  llvm-svn: 349333
* [llvm-mca] Move llvm-mca library to llvm/lib/MCA. | Clement Courbet | 2018-12-17 | 22 | -0/+3196
  Summary: See PR38731.
  Reviewers: andreadb
  Subscribers: mgorny, javed.absar, tschuett, gbedwell, andreadb, RKSimon, llvm-commits
  Differential Revision: https://reviews.llvm.org/D55557
  llvm-svn: 349332
* [X86] Fix bad operand lookup for cmov introduced in r349315 | Craig Topper | 2018-12-17 | 1 | -1/+1
  The CC is operand 2 not operand 3.
  llvm-svn: 349330
* [EarlyCSE] If DI can't be salvaged, mark it as unavailable. | Davide Italiano | 2018-12-17 | 1 | -1/+2
  Fixes PR39874.
  llvm-svn: 349323
* [X86] Pull out constant splat rotation detection. | Simon Pilgrim | 2018-12-16 | 1 | -21/+28
  We had 3 different approaches - consistently use getTargetConstantBitsFromNode and allow undef elts.
  llvm-svn: 349319
* [X86] Remove truncation handling from EmitTest. Replace it with a DAG combine. | Craig Topper | 2018-12-16 | 1 | -50/+105
  I'd like to try to move a lot of the flag matching out of EmitTest and push it to isel or isel preprocessing. This is a step towards that.

  The test-shrink-bug.ll change is an improvement because we are no longer interfering with test shrink handling in isel.

  The pr34137.ll change is a regression, but the IR came from -O0 and was not reduced by InstCombine. So it contains a lot of redundancies like duplicate loads that made it combine poorly.

  llvm-svn: 349315
* [x86] increment/decrement constant vector with min/max in vsetcc lowering (PR39859) | Sanjay Patel | 2018-12-16 | 1 | -3/+16
  This is part of fixing PR39859: https://bugs.llvm.org/show_bug.cgi?id=39859

  We have a crippled vector ISA, so we have to invert a typical fold and create min/max here.

  As discussed in the bug report, we can probably do better by using saturating subtract when it's available, but we should have this improvement for the min/max patterns regardless.

  Alive proofs:
  https://rise4fun.com/Alive/zsf
  https://rise4fun.com/Alive/Qrl

  Differential Revision: https://reviews.llvm.org/D55515
  llvm-svn: 349304
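The commit message does not spell out the identity it uses. As an assumption for illustration only, one identity of this general kind rewrites an unsigned greater-than against a constant as an equality with a max against the incremented constant. The scalar sketch below (hypothetical names, 8-bit values) checks that identity; it is not the vector lowering code itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed identity: for c != 255, (x > c) is the same as
// (umax(x, c + 1) == x). Incrementing the constant turns the strict
// compare into "x >= c + 1", which a max-then-equal sequence can test.
constexpr bool ugt_via_umax(uint8_t x, uint8_t c) {
  // Precondition: c != 255, otherwise c + 1 would wrap around to 0.
  return std::max<uint8_t>(x, static_cast<uint8_t>(c + 1)) == x;
}

// Exhaustive check over all x and all valid c.
constexpr bool identity_holds() {
  for (unsigned c = 0; c < 255; ++c)
    for (unsigned x = 0; x <= 255; ++x)
      if (ugt_via_umax(static_cast<uint8_t>(x), static_cast<uint8_t>(c)) !=
          (x > c))
        return false;
  return true;
}
```

A form like this can be useful on a vector ISA that has unsigned min/max and equality compares but lacks a direct unsigned greater-than compare.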
* [DAGCombiner] allow hoisting vector bitwise logic ahead of truncates | Sanjay Patel | 2018-12-16 | 1 | -5/+2
  The transform performs a bitwise logic op in a wider type followed by truncate when both inputs are truncated from the same source type:

      logic_op (truncate x), (truncate y) --> truncate (logic_op x, y)

  There are a bunch of other checks that should prevent doing this when it might be harmful.

  We already do this transform for scalars in this spot. The vector limitation was shared with a check for the case when the operands are extended. I'm not sure if that limit is needed either, but that would be a separate patch.

  Differential Revision: https://reviews.llvm.org/D55448
  llvm-svn: 349303
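The transform is legal because bitwise logic acts on each bit independently, so truncating before or after gives the same low bits. A minimal scalar C++ sketch (hypothetical helper names, truncating 32-bit values to 8 bits) makes that concrete:

```cpp
#include <cassert>
#include <cstdint>

// logic_op (truncate x), (truncate y): two truncates, then a narrow op.
constexpr uint8_t and_of_truncs(uint32_t x, uint32_t y) {
  return static_cast<uint8_t>(static_cast<uint8_t>(x) & static_cast<uint8_t>(y));
}
constexpr uint8_t xor_of_truncs(uint32_t x, uint32_t y) {
  return static_cast<uint8_t>(static_cast<uint8_t>(x) ^ static_cast<uint8_t>(y));
}

// truncate (logic_op x, y): one wide op, then a single truncate.
constexpr uint8_t trunc_of_and(uint32_t x, uint32_t y) {
  return static_cast<uint8_t>(x & y);
}
constexpr uint8_t trunc_of_xor(uint32_t x, uint32_t y) {
  return static_cast<uint8_t>(x ^ y);
}
```

The hoisted form trades two truncates for one, which is the saving the commit is after when the wide operation is no more expensive than the narrow one.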
* [SelectionDAG] Add FSHL/FSHR support to computeKnownBits | Simon Pilgrim | 2018-12-16 | 2 | -2/+37
  Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the target's shift amount type).
  llvm-svn: 349298
* [X86] Begin cleaning up combineOr -> SHLD/SHRD. NFCI. | Simon Pilgrim | 2018-12-15 | 1 | -5/+5
  In preparation for converting to funnel shifts.
  llvm-svn: 349286
* [X86] Lower to SHLD/SHRD on slow machines for optsize | Simon Pilgrim | 2018-12-15 | 1 | -3/+3
  Use consistent rules for when to lower to SHLD/SHRD for slow machines - fixes a weird issue where funnel shift gets expanded but then X86ISelLowering's combineOr sees the optsize and combines to SHLD/SHRD, but now with the modulo amount guard.
  llvm-svn: 349285
* Add NetBSD support in needsRuntimeRegistrationOfSectionRange. | Kamil Rytarowski | 2018-12-15 | 1 | -0/+1
  Use linker script magic to get data/cnts/name start/end.
  llvm-svn: 349277
* Register kASan shadow offset for NetBSD/amd64 | Kamil Rytarowski | 2018-12-15 | 1 | -3/+7
  The NetBSD x86_64 kernel uses the 0xdfff900000000000 shadow offset.

  llvm-svn: 349276
* [CodeGen] Enhance machine PHIs optimization | Dinar Temirbulatov | 2018-12-15 | 1 | -6/+11
  Summary: Make the machine PHI optimization work when a single value register is taken from several different copies. This is the first step to fix PR38917. This change makes it possible to get rid of redundant PHIs (see opt_phis2.mir test) to make the subsequent optimizations (like CSE) possible and simpler.

  For instance, before this patch the code like this:

      %b = COPY %z
      ...
      %a = PHI %bb1, %a; %bb2, %b

  could be optimized to:

      %a = %b

  but the code like this:

      %c = COPY %z
      ...
      %b = COPY %z
      ...
      %a = PHI %bb1, %a; %bb2, %b; %bb3, %c

  would remain unchanged. With this patch the latter case will be optimized:

      %a = %z

  Committed on behalf of: Anton Afanasyev anton.a.afanasyev@gmail.com
  Reviewers: RKSimon, MatzeB
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D54839
  llvm-svn: 349271
* Fix -Wunused-variable warning. NFCI. | Simon Pilgrim | 2018-12-15 | 1 | -0/+4
  llvm-svn: 349265
* [TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts | Simon Pilgrim | 2018-12-15 | 1 | -0/+2
  Differential Revision: https://reviews.llvm.org/D55600
  llvm-svn: 349264
* [SILoadStoreOptimizer] Use std::abs to avoid truncation. | Florian Hahn | 2018-12-15 | 1 | -2/+2
  Using regular abs() causes the following warning error:

      absolute value function 'abs' given an argument of type 'int64_t' (aka 'long') but has parameter of type 'int' which may cause truncation of value [-Werror,-Wabsolute-value]
          (uint32_t)abs(Dist) > MaxDist) {
                    ^
      lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:1369:19: note: use function 'std::abs' instead

  which causes a bot to fail: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/18284/steps/bootstrap%20clang/logs/stdio

  llvm-svn: 349224
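The underlying issue can be sketched in a few lines of C++. The function name below is hypothetical and the code is not the AMDGPU source: it only shows that `std::abs` from `<cstdlib>` has overloads for the wider integer types, so a 64-bit distance keeps all of its bits, whereas the C function `abs` takes an `int`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Distance between two 64-bit offsets. std::abs selects the long or
// long long overload for int64_t, so values beyond the range of a
// 32-bit int are not truncated.
inline int64_t offset_distance(int64_t a, int64_t b) {
  return std::abs(a - b);
}
```

With the C `abs(int)` in its place, a difference such as 5000000000 would be narrowed to `int` before the absolute value is taken, which is what the -Wabsolute-value diagnostic warns about.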
* [X86] Rename hasNoSignedComparisonUses to hasNoSignFlagUses. Add the instructions that only modify the O flag to the waiver list. | Craig Topper | 2018-12-15 | 1 | -8/+14
  The only caller of this turns CMP with 0 into TEST. CMP with 0 and TEST both set OF to 0 so we should have no issues with instructions that only use OF.

  Though I don't think there's any reason we would read just OF after a compare with 0 anyway. So this probably isn't an observable change.

  llvm-svn: 349223
* [X86] Make hasNoCarryFlagUses/hasNoSignedComparisonUses take an SDValue that indicates which result is the flag result. NFCI | Craig Topper | 2018-12-15 | 1 | -20/+19
  hasNoCarryFlagUses hardcoded that the flag result is 1 and used that to filter which uses were of interest. hasNoSignedComparisonUses just assumes the only result is flags and checks whether any user of the node is a CopyToReg instruction.

  After this patch we now do a result number check in both and rely on the caller to provide the result number.

  This shouldn't change behavior; it was just an odd difference between the two functions that I noticed.

  llvm-svn: 349222
* [WebAssembly] Check if the section order is correct | Heejin Ahn | 2018-12-15 | 1 | -3/+67
  Summary: This patch checks if the section order is correct when reading a wasm object file in `WasmObjectFile` and converting YAML to wasm object in yaml2wasm. (It is not possible to check when reading YAML because it is handled exclusively by the YAML reader.)

  This checks the ordering of all known sections (core sections + known custom sections). This also adds a section ID for the DataCount section, which is scheduled to be added in the near future.

  Reviewers: sbc100
  Subscribers: dschuff, mgorny, jgravelle-google, sunfish, llvm-commits
  Differential Revision: https://reviews.llvm.org/D54924
  llvm-svn: 349221
* [NewGVN] Update use counts for SSA copies when replacing them by their operands. | Florian Hahn | 2018-12-15 | 1 | -4/+7
  The current code relies on LeaderUseCount to determine if we can remove an SSA copy, but in that case the LeaderUseCount does not refer to the SSA copy. If an SSA copy is a dominating leader, we use the operand as dominating leader instead. This means we removed a user of an SSA copy and we should decrement its use count, so we can remove the SSA copy once it becomes dead.

  Fixes PR38804.

  Reviewers: efriedma, davide
  Reviewed By: davide
  Differential Revision: https://reviews.llvm.org/D51595
  llvm-svn: 349217
* [Util] Refer to [s|z]exts of args when converting dbg.declares (fix PR35400) | Vedant Kumar | 2018-12-15 | 1 | -27/+0
  When converting dbg.declares, if the described value is a [s|z]ext, refer to the ext directly instead of referring to its operand.

  This fixes a narrowing bug (the debugger got the sign of a variable wrong, see llvm.org/PR35400).

  The main reason to refer to the ext's operand was that an optimization may remove the ext itself, leading to a dropped variable. Now that InstCombine has been taught to use replaceAllDbgUsesWith (r336451), this is less of a concern. Other passes can/should adopt this API as needed to fix dropped variable bugs.

  Differential Revision: https://reviews.llvm.org/D51813
  llvm-svn: 349214
* [NVPTX] Lower instructions that expand into libcalls. | Artem Belevich | 2018-12-14 | 1 | -4/+9
  The change is an effort to split and refactor abandoned D34708 into smaller parts.

  Here the behaviour of unsupported instructions is changed to match the behaviour of explicit intrinsics calls.

  Currently LLVM crashes with:
  > Assertion getInstruction() && "Not a call or invoke instruction!" failed.

  With this patch LLVM produces a more sensible error message:
  > Cannot select: ... i32 = ExternalSymbol'__foobar'

  Author: Denys Zariaiev <denys.zariaiev@gmail.com>
  Differential Revision: https://reviews.llvm.org/D55145
  llvm-svn: 349213
* DebugInfo: Avoid using split DWARF when the split unit would be empty. | David Blaikie | 2018-12-14 | 3 | -33/+45
  In ThinLTO many split CUs may be effectively empty because of the lack of support for cross-unit references in split DWARF.

  Using a split unit in those cases is just a waste/overhead - and turned out to be one contributor to a significant symbolizer performance issue when global variable debug info was being imported (see r348416 for the primary fix) due to symbolizers seeing CUs with no ranges, assuming there might still be addresses covered and walking into the split CU to see if there are any ranges (when that split CU was in a DWP file, that meant loading the DWP and its index, the index was extra large because of all these fractured/empty CUs... and so was very expensive to load).

  (the 3rd fix which will follow, is to assume that a CU with no ranges is empty rather than merely missing its CU level range data - and to not walk into its DIEs (split or otherwise) in search of address information that is generally not present)

  llvm-svn: 349207
* [codeview] Add begin/endSymbolRecord helpers, NFC | Reid Kleckner | 2018-12-14 | 2 | -138/+84
  Previously beginning a symbol record was excessively verbose. Now it's a bit simpler. This follows the same pattern as begin/endCVSubsection.

  llvm-svn: 349205
* DebugInfo: Move addAddrBase from DwarfUnit to DwarfCompileUnit | David Blaikie | 2018-12-14 | 4 | -12/+12
  Only CUs need an address table reference.
  llvm-svn: 349203
* [Hexagon] Add patterns for shifts of v2i16 | Krzysztof Parzyszek | 2018-12-14 | 1 | -0/+12
  This fixes https://llvm.org/PR39983.
  llvm-svn: 349202
* [GlobalISel] LegalizerHelper: Implement fewerElementsVector for G_LOAD/G_STORE | Volkan Keles | 2018-12-14 | 1 | -2/+44
  Reviewers: aemerson, dsanders, bogner, paquette, aditya_nandakumar
  Reviewed By: dsanders
  Subscribers: rovka, kristof.beyls, javed.absar, tschuett, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53728
  llvm-svn: 349200
* [Hexagon] Use IMPLICIT_DEF to any-extend 32-bit values to 64 bits | Krzysztof Parzyszek | 2018-12-14 | 1 | -23/+25
  llvm-svn: 349199
* [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. | Farhana Aleen | 2018-12-14 | 2 | -1/+362
  Summary: Promote constant offset to immediate by recomputing the relative 13bit offset from nearby instructions. E.g.

      s_movk_i32 s0, 0x1800
      v_add_co_u32_e32 v0, vcc, s0, v2
      v_addc_co_u32_e32 v1, vcc, 0, v6, vcc

      s_movk_i32 s0, 0x1000
      v_add_co_u32_e32 v5, vcc, s0, v2
      v_addc_co_u32_e32 v6, vcc, 0, v6, vcc
      global_load_dwordx2 v[5:6], v[5:6], off
      global_load_dwordx2 v[0:1], v[0:1], off
  =>
      s_movk_i32 s0, 0x1000
      v_add_co_u32_e32 v5, vcc, s0, v2
      v_addc_co_u32_e32 v6, vcc, 0, v6, vcc
      global_load_dwordx2 v[5:6], v[5:6], off
      global_load_dwordx2 v[0:1], v[5:6], off offset:2048

  Author: FarhanaAleen
  Reviewed By: arsenm, rampitec
  Subscribers: llvm-commits, AMDGPU
  Differential Revision: https://reviews.llvm.org/D55539
  llvm-svn: 349196
* [SDAG] Ignore chain operand in REG_SEQUENCE when emitting instructions | Krzysztof Parzyszek | 2018-12-14 | 1 | -0/+4
  llvm-svn: 349186