path: root/llvm/lib
Commit message | Author | Age | Files | Lines
...
* Relax load store vectorizer pointer strip checks | Stanislav Mekhanoshin | 2019-08-01 | 1 | -3/+2
  The previous change to fix a crash in the vectorizer introduced performance regressions. The condition to preserve the pointer address space during the search is too tight; we only need to match the size. Differential Revision: https://reviews.llvm.org/D65600 llvm-svn: 367624
* Changes to improve CodeView debug info type record inline comments | Nilanjana Basu | 2019-08-01 | 4 | -55/+399
  Signed-off-by: Nilanjana Basu <nilanjana.basu87@gmail.com> llvm-svn: 367623
* [WebAssembly] Fixed relocation errors having no location. | Wouter van Oortmerssen | 2019-08-01 | 1 | -0/+1
  Summary: Fixes: https://bugs.llvm.org/show_bug.cgi?id=42441
  Used to print:
    <unknown>:0: error: Cannot represent a difference across sections
  (the location was null). Now prints:
    err.s:20:3: error: Cannot represent a difference across sections
      i32.const foo-bar
      ^
  Note: I looked at adding a test for this, but I don't think it is worth it. We're not testing error formatting in the Wasm backend :) Reviewers: sbc100, jgravelle-google Subscribers: dschuff, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65602 llvm-svn: 367619
* GlobalISel: Lower scalarizing unmerge of a vector to shifts | Matt Arsenault | 2019-08-01 | 2 | -0/+36
  AMDGPU sometimes has legal s16 and <2 x s16> operations, but all registers are really 32-bit. An unmerge destination really should be widened to a 32-bit register. If widening a scalarizing vector with a target size that matches the vector size, bitcast to integer and extract the relevant bits with shifts. I'm not sure if this is the right place for this. This could arguably be part of widenScalar for the result. I also have a growing feeling that we're missing a bitcast legalize action. llvm-svn: 367604
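A minimal sketch, in plain C++ rather than GlobalISel, of the arithmetic behind this lowering: once a <2 x s16> value has been bitcast to a 32-bit integer, each element can be recovered with a shift and a truncation. It assumes element 0 occupies the low 16 bits (little-endian lane order); `unmergeV2S16` is a hypothetical helper, not an LLVM API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: split a <2 x s16> value that has been bitcast to a
// 32-bit integer into its two 16-bit elements using shifts.
// Assumes element 0 occupies the low 16 bits (little-endian lane order).
std::array<uint16_t, 2> unmergeV2S16(uint32_t packed) {
  std::array<uint16_t, 2> elts{};
  for (unsigned i = 0; i < 2; ++i) {
    // Shift the wanted element down to bit 0, then truncate to 16 bits.
    elts[i] = static_cast<uint16_t>(packed >> (16 * i));
  }
  return elts;
}
```

For example, `unmergeV2S16(0xBEEF1234u)` yields `{0x1234, 0xBEEF}`.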
* Follow up of rL367592, fix the build | Sjoerd Meijer | 2019-08-01 | 1 | -2/+0
  Some buildbots complained about: error: default label in switch which covers all enumeration values llvm-svn: 367603
* [X86] In decomposeMulByConstant, legalize the VT before querying whether the multiply is legal | Craig Topper | 2019-08-01 | 3 | -4/+15
  If a type is larger than a legal type and needs to be split, we would previously allow the multiply to be decomposed even if the split multiply is legal. Since the shift + add/sub code would also need to be split, it's not any better to decompose it. This patch figures out what type the mul will eventually be legalized to and then uses that type for the query. I tried just returning false for illegal types and letting them get handled after type legalization, but then we can't recognize an i64 constant splat on 32-bit targets since it will be destroyed by type legalization. We could special case vectors of i64 to avoid that... Differential Revision: https://reviews.llvm.org/D65533 llvm-svn: 367601
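The commit changes which type the legality query uses, not the decomposition itself. As a reminder of what "shift + add/sub" means, here is a plain C++ sketch (no LLVM APIs) of the classic case where the constant is 2^n + 1 or 2^n - 1. `mulByConstantDecomposed` is a hypothetical helper; exactly which constants X86 chooses to decompose is not modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Returns true if v is a nonzero power of two.
bool isPowerOf2(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Returns log2(v) for a power of two v.
unsigned log2Exact(uint64_t v) {
  unsigned n = 0;
  while ((v >>= 1) != 0) ++n;
  return n;
}

// Hypothetical sketch: compute x * c with one shift plus one add or sub when
// c == 2^n + 1 or c == 2^n - 1; otherwise report that no decomposition applies.
// Unsigned arithmetic wraps modulo 2^64, like an IR mul on i64.
std::optional<uint64_t> mulByConstantDecomposed(uint64_t x, uint64_t c) {
  if (c >= 2 && isPowerOf2(c - 1))   // c = 2^n + 1: (x << n) + x
    return (x << log2Exact(c - 1)) + x;
  if (isPowerOf2(c + 1))             // c = 2^n - 1: (x << n) - x
    return (x << log2Exact(c + 1)) - x;
  return std::nullopt;
}
```

For instance, `x * 9` becomes `(x << 3) + x` and `x * 7` becomes `(x << 3) - x`, while `x * 10` is left as a multiply.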
* AMDGPU: Remove v0 workaround for DS_GWS_* instructions | Matt Arsenault | 2019-08-01 | 2 | -34/+4
  Any register should work for the src field since r366067, since the used value is not pulled from the expected encoding field. llvm-svn: 367598
* CodeGen: Allow virtual registers in bundles | Matt Arsenault | 2019-08-01 | 1 | -2/+2
  The note in the documentation suggests this restriction is a compile time optimization for architectures that make heavy use of bundling. Allowing virtual registers in a bundle is useful for some (non-R600) AMDGPU use cases, and such bundles are infrequent enough that the compile-time cost should not matter. A more common AMDGPU use case has already been using virtual registers in bundles since r333691, although never calling finalizeBundle on them and manually creating the use/def list on the BUNDLE instruction. This is also relatively infrequent, and only happens for consecutive sequences of some load/store types. llvm-svn: 367597
* [SimplifyCFG] Mark missed Changed to true. | Alina Sbirlea | 2019-08-01 | 1 | -0/+1
  Summary: DominatorTree is invalid after SimplifyCFG because of a missed `Changed = true` when simplifying a branch condition and removing an edge. Resolves PR42272. Reviewers: zhizhouy, manojgupta Subscribers: jlebar, sanjoy.google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65490 llvm-svn: 367596
* [MemorySSA] Set LoopSimplify to preserve MemorySSA in the NPM, if analysis exists. | Alina Sbirlea | 2019-08-01 | 1 | -2/+11
  Summary: LoopSimplify preserves MemorySSA in the legacy pass manager, but not in the new pass manager. Update LoopSimplify to preserve MemorySSA conditionally when the analysis is available (same behavior as the legacy pass manager). Reviewers: chandlerc Subscribers: mehdi_amini, jlebar, Prazek, george.burgess.iv, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65418 llvm-svn: 367594
* AMDGPU: Use tablegen pattern for sendmsg intrinsics | Matt Arsenault | 2019-08-01 | 3 | -17/+23
  Since this now emits a direct copy to m0, SIFixSGPRCopies has to handle a physical register. llvm-svn: 367593
* [LV] Tail-Loop Folding | Sjoerd Meijer | 2019-08-01 | 2 | -54/+99
  This allows folding of the scalar epilogue loop (the tail) into the main vectorised loop body when the loop is annotated with a "vector predicate" metadata hint. To fold the tail, instructions need to be predicated (masked), enabling/disabling lanes for the remainder iterations. Differential Revision: https://reviews.llvm.org/D65197 llvm-svn: 367592
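A scalar C++ model of what tail folding means, offered as a sketch rather than the vectorizer's actual output: the loop steps by a vectorization factor `VF` with no scalar epilogue, and the last iteration disables the lanes that would run past the end using the predicate `i + lane < n`. `addOneTailFolded` is a hypothetical example loop body.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of tail folding: a loop "vectorized" by VF lanes with no
// scalar epilogue. The final iteration covers the leftover elements by masking
// off lanes whose index is past the end (the predicate i + lane < n).
template <std::size_t VF>
void addOneTailFolded(std::vector<int> &data) {
  const std::size_t n = data.size();
  for (std::size_t i = 0; i < n; i += VF) {   // trip count rounds up
    for (std::size_t lane = 0; lane < VF; ++lane) {
      bool active = i + lane < n;             // per-lane predicate (mask)
      if (active)                             // masked-off lanes do nothing
        data[i + lane] += 1;
    }
  }
}
```

With 5 elements and `VF = 4`, the second iteration runs with only lane 0 enabled, so no separate remainder loop is needed.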
* GlobalISel: Fix widenScalar for G_MERGE_VALUES to pointer | Matt Arsenault | 2019-08-01 | 1 | -1/+3
  AMDGPU testcase isn't broken now, but will be in a future patch without this. llvm-svn: 367591
* [WebAssembly] Assembler/InstPrinter: support call_indirect type index. | Wouter van Oortmerssen | 2019-08-01 | 5 | -37/+57
  A TYPE_INDEX operand (as used by call_indirect) used to be represented by the InstPrinter as a symbol (e.g. .Ltype_index0@TYPE_INDEX) which was a bit of a mismatch with the WasmObjectWriter which expects an unnamed symbol, to receive the signature from and then turn into a reloc. There was really no good way to round-trip this information. An earlier version of this patch tried to attach the signature information using a .functype, but that ran into trouble when the symbol was re-emitted without a name. Removing the name was a giant hack also. The current version changes the assembly syntax to have an inline signature spec for TYPEINDEX operands that is always unnamed, which is much more elegant both in syntax and in implementation (as now the assembler is able to follow the same path as the regular backend). Reviewers: sbc100, dschuff, aheejin, jgravelle-google, sunfish, tlively Subscribers: arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64758 llvm-svn: 367590
* [TargetLowering] SimplifyMultipleUseDemandedBits - Add ISD::INSERT_VECTOR_ELT handling | Simon Pilgrim | 2019-08-01 | 1 | -0/+10
  Allow us to peek through vector insertions to avoid dependencies on entire insertion chains. llvm-svn: 367588
* [X86][SSE] Add PEXTR*(PINSR*(v, s, c), c) -> s combine. | Simon Pilgrim | 2019-08-01 | 1 | -4/+15
  We should probably extend this to cover bitcasts as well to help other cases in promote-vec3.ll. llvm-svn: 367582
* [Attributor][FIX] Indicate a missing update change | Johannes Doerfert | 2019-08-01 | 1 | -3/+7
  Users of AAReturnedValues need to know if HasOverdefinedReturnedCalls changed from false to true as it will impact the result of the return value traversal (calls are not ignored anymore). This will be tested with the tests in D59978. llvm-svn: 367581
* [mips] Fix lowering load/store instruction in PIC case | Simon Atanasyan | 2019-08-01 | 1 | -1/+18
  If an operand of the `lw/sw` instructions is a symbol, these instructions are incorrectly lowered using a non-position-independent chain of commands. For PIC code we should use `lw/addiu` instructions with the `R_MIPS_GOT16` and `R_MIPS_LO16` relocations respectively. Instead, LLVM generates position-dependent code with the `R_MIPS_HI16` and `R_MIPS_LO16` relocations. This patch provides a fix for the bug by handling the PIC case separately in `MipsAsmParser::expandMemInst`. The main idea is to generate a chain of PIC instructions to load a symbol address into a register and then load the address content. The fix is not optimal and does not fix all PIC-related problems. This is a task for subsequent patches. Differential Revision: https://reviews.llvm.org/D65524 llvm-svn: 367580
* [X86][SSE] SimplifyMultipleUseDemandedBits - Add PEXTR/PINSR B+W handling | Simon Pilgrim | 2019-08-01 | 2 | -0/+31
  This adds SimplifyMultipleUseDemandedBitsForTargetNode X86 support and uses it to allow us to peek through vector insertions to avoid dependencies on entire insertion chains. llvm-svn: 367570
* [X86] EltsFromConsecutiveLoads - don't attempt to merge volatile loads (PR42846) | Simon Pilgrim | 2019-08-01 | 1 | -1/+4
  llvm-svn: 367556
* [RISCV] Add Custom Parser for Atomic Memory Operands | Sam Elliott | 2019-08-01 | 4 | -4/+113
  Summary: GCC accepts both (reg) and 0(reg) for atomic instruction memory operands. These instructions do not allow for an offset in their encoding, so in the latter case, the 0 is silently dropped. Due to how we have structured the RISCVAsmParser, the easiest way to add support for parsing this offset is to add a custom AsmOperand and parser. This parser drops all the parens, and just keeps the register. This commit also adds a custom printer for these operands, which matches the GCC canonical printer, printing both `(a0)` and `0(a0)` as `(a0)`. Reviewers: asb, lewis-revill Reviewed By: asb Subscribers: s.egerton, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, jfb, PkmX, jocewei, psnobl, benna, Jim, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65205 llvm-svn: 367553
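A string-level sketch of the accepted syntax and the canonical printing, assuming operands arrive as plain text such as "(a0)" or "0(a0)". This is not the RISCVAsmParser code; `parseAtomicMemOperand` and `printAtomicMemOperand` are hypothetical helpers that only illustrate the rule "an optional literal 0 offset is dropped, anything else is rejected".

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical sketch (not the RISCVAsmParser code): accept an atomic memory
// operand written as "(reg)" or "0(reg)" and return just the register name.
// Any offset other than a literal 0 is rejected, since the encoding has no
// offset field.
std::optional<std::string> parseAtomicMemOperand(const std::string &text) {
  std::string rest = text;
  if (!rest.empty() && rest[0] == '0')  // optional "0" offset, silently dropped
    rest = rest.substr(1);
  if (rest.size() < 3 || rest.front() != '(' || rest.back() != ')')
    return std::nullopt;
  return rest.substr(1, rest.size() - 2);  // register name between the parens
}

// Canonical printing: "(a0)" and "0(a0)" both print as "(a0)".
std::string printAtomicMemOperand(const std::string &reg) {
  return "(" + reg + ")";
}
```

Round-tripping "0(a0)" through the parser and printer gives "(a0)", while "4(a0)" is rejected.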
* [IR] Value: add replaceUsesWithIf() utility | Roman Lebedev | 2019-08-01 | 6 | -42/+17
  Summary: While there is always a `Value::replaceAllUsesWith()`, sometimes the replacement needs to be conditional. I have only cleaned a few cases where `replaceUsesWithIf()` could be used, to both add test coverage, and show that it is actually useful. Reviewers: jdoerfert, spatel, RKSimon, craig.topper Reviewed By: jdoerfert Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, aheejin, george.burgess.iv, asbirlea, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65528 llvm-svn: 367548
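A toy model of conditional replacement, assuming a deliberately simplified use list instead of LLVM's `Value`/`Use` classes. `ToyUse` and this `replaceUsesWithIf` are hypothetical stand-ins that only show the idea: every use of the old value is visited, but only those the predicate accepts are rewritten.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy IR (hypothetical, not LLVM's classes): each "use" records which user
// instruction refers to which value, by name.
struct ToyUse {
  std::string user;   // name of the instruction that uses the value
  std::string value;  // name of the value being used
};

// Replace uses of `from` with `to`, but only where the predicate says so.
// replaceAllUsesWith is the special case where the predicate is always true.
void replaceUsesWithIf(
    std::vector<ToyUse> &uses, const std::string &from, const std::string &to,
    const std::function<bool(const ToyUse &)> &shouldReplace) {
  for (ToyUse &u : uses)
    if (u.value == from && shouldReplace(u))
      u.value = to;
}
```

For example, passing a predicate that rejects uses by a `ret` instruction rewrites every other use of `x` while leaving the return untouched.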
* [IR] SelectInst: add swapValues() utility | Roman Lebedev | 2019-08-01 | 3 | -14/+5
  Summary: Sometimes we need to swap true-val and false-val of a `SelectInst`. Having a function for that is nicer than hand-writing it each time. Reviewers: spatel, RKSimon, craig.topper, jdoerfert Reviewed By: jdoerfert Subscribers: jdoerfert, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65520 llvm-svn: 367547
* [ARM] Fix for MVE VREV64 | David Green | 2019-08-01 | 1 | -5/+5
  The VREV64 instruction is apparently unpredictable if Qd == Qm, due to the cross-beat nature of the instruction. This adds an earlyclobber to Qd, which seems to be the same way we deal with this on other instructions like the write-back on loads and stores. Differential Revision: https://reviews.llvm.org/D65502 llvm-svn: 367544
* [AArch64] Do not allocate unnecessary emergency slot. | Sander de Smalen | 2019-08-01 | 1 | -2/+2
  Fix an issue where the compiler still allocates an emergency spill slot even though it already decided to spill an extra callee-save register to use as a scratch register. Reviewers: gberry, thegameg, mstorsjo, t.p.northover Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D65504 llvm-svn: 367540
* [MIPS GlobalISel] Fold load/store + G_GEP + G_CONSTANT | Petar Avramovic | 2019-08-01 | 1 | -2/+23
  Fold load/store + G_GEP + G_CONSTANT when the immediate in G_CONSTANT fits into a 16-bit signed integer. Differential Revision: https://reviews.llvm.org/D65507 llvm-svn: 367535
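The folding condition boils down to a range check on the constant offset. A minimal sketch, similar in spirit to LLVM's `isInt<16>` helper but written standalone; `fitsInSImm16` is a hypothetical name, and the surrounding pattern matching on G_GEP and G_CONSTANT is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the legality check: a constant G_GEP offset can be folded into a
// MIPS load/store only if it fits the 16-bit signed immediate field.
bool fitsInSImm16(int64_t offset) {
  return offset >= INT16_MIN && offset <= INT16_MAX;
}
```

Offsets from -32768 to 32767 fold; 32768 or -32769 would need a separate address computation.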
* [NFC][ARM][ParallelDSP] Getters and renaming | Sam Parker | 2019-08-01 | 1 | -16/+22
  Add a couple of getters for Reduction and do some renaming of variables around CreateSMLAD for clarity. llvm-svn: 367522
* [SelectionDAG] Use APInt::isSubsetOf/intersects to simplify some code. | Craig Topper | 2019-08-01 | 1 | -2/+2
  Also use KnownBits::isNegative/isNonNegative to further simplify. llvm-svn: 367518
* AMDGPU/SILoadStoreOptimizer: Make some functions const | Tom Stellard | 2019-08-01 | 1 | -6/+6
  Reviewers: arsenm, pendingchaos, rampitec Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65316 llvm-svn: 367517
* recommit:[PowerPC] Eliminate loads/swap feeding swap/store for vector type by using big-endian load/store | Zi Xuan Wu | 2019-08-01 | 3 | -1/+117
  In PowerPC, there is an instruction to load a vector in big-endian element order on a little-endian target. So we can combine vector load + reverse into a big-endian load to eliminate the swap instruction. Also combine vector reverse + store into a big-endian store. Differential Revision: https://reviews.llvm.org/D65063 llvm-svn: 367516
* AMDGPU/GlobalISel: Fix flat load/store of pointer types | Matt Arsenault | 2019-08-01 | 4 | -8/+13
  llvm-svn: 367513
* AMDGPU/GlobalISel: Remove manual store select code | Matt Arsenault | 2019-08-01 | 2 | -58/+23
  This regresses the weird types that are newly treated as legal load types, but fixes incorrectly using flat instructions on SI. llvm-svn: 367512
* AMDGPU/GlobalISel: Select local atomic cmpxchg | Matt Arsenault | 2019-08-01 | 3 | -28/+13
  llvm-svn: 367511
* AMDGPU/GlobalISel: Handle G_ATOMICRMW_FADD | Matt Arsenault | 2019-08-01 | 4 | -0/+6
  llvm-svn: 367509
* AMDGPU/GlobalISel: Allow selection of DS atomicrmw | Matt Arsenault | 2019-08-01 | 4 | -6/+27
  llvm-svn: 367507
* AMDGPU: Start redefining atomic PatFrags | Matt Arsenault | 2019-08-01 | 6 | -189/+210
  Start migrating to a form that will be compatible with the global isel emitter. Also should fix some overly lax checks on the memory type, which allowed mis-selecting some illegal atomics. llvm-svn: 367506
* AMDGPU: Correct FP atomic patterns | Matt Arsenault | 2019-08-01 | 3 | -9/+10
  These need to use an fadd, not an add. Also make the noret part clear in the name. llvm-svn: 367505
* AMDGPU/GlobalISel: Select simple local stores | Matt Arsenault | 2019-08-01 | 6 | -19/+52
  llvm-svn: 367504
* GlobalISel: moreElementsVector for G_LOAD/G_STORE | Matt Arsenault | 2019-08-01 | 2 | -1/+12
  AMDGPU change and test is a placeholder until a future patch with complete handling. llvm-svn: 367503
* Create unique, but identically-named ELF sections for explicitly-sectioned functions and globals when using -function-sections and -data-sections. | Peter Collingbourne | 2019-08-01 | 1 | -2/+17
  This allows functions and globals to be reordered later in the linking phase (using the -symbol-ordering-file) even though reordering will be limited to the scope of the explicit section. Patch by Rahman Lavaee! Differential Revision: https://reviews.llvm.org/D65478 llvm-svn: 367501
* Reapply "AMDGPU: Split block for si_end_cf" | Matt Arsenault | 2019-08-01 | 5 | -25/+139
  This reverts commit r359363, reapplying r357634 llvm-svn: 367500
* Fix a release-only build warning triggered by rL367485 | Philip Reames | 2019-08-01 | 1 | -0/+3
  llvm-svn: 367499
* AMDGPU/GlobalISel: Select local loads | Matt Arsenault | 2019-08-01 | 5 | -9/+108
  llvm-svn: 367498
* Revert "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG" and partial fix. | Amy Huang | 2019-07-31 | 2 | -21/+0
  Causes Windows buildbot errors. This reverts commit 6e65c34523963094acd0d6c94a5f5c64b32fe6aa and 53da7ca94343166ac68aef81db0398932fc258bb. llvm-svn: 367496
* [ARM] Lower "(x<<c) > 0x80000000U" to "lsls" on Thumb1. | Eli Friedman | 2019-07-31 | 5 | -0/+34
  This is extremely specific, but saves three instructions when it's legal. I don't think the code can be usefully generalized. Differential Revision: https://reviews.llvm.org/D65351 llvm-svn: 367492
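Why a single `lsls` can answer this comparison, worked through in C++ as a sketch under our own reading of the trick (the patch's exact instruction sequence may differ). `(x << c) > 0x80000000U` holds exactly when bit 31 of `x << c` is set and at least one lower bit is also set. Shifting by `c + 1` instead moves that bit 31 into the carry flag and leaves the lower 31 bits in the result, so the ARM "hi" condition (C set, Z clear) gives the answer. The flag model and function names are hypothetical, and `c` is assumed to be in 0..30.

```cpp
#include <cassert>
#include <cstdint>

// Flags produced by a Thumb1 "lsls rd, rm, #imm" (imm in 1..31), modeled in C++.
struct LslsFlags {
  bool carry;  // C: the last bit shifted out, i.e. bit (32 - imm) of rm
  bool zero;   // Z: result == 0
};

LslsFlags lsls(uint32_t rm, unsigned imm) {
  assert(imm >= 1 && imm <= 31);
  uint32_t result = rm << imm;
  return {((rm >> (32 - imm)) & 1u) != 0, result == 0};
}

// One "lsls #(c + 1)": C receives bit 31 of (x << c), and the result holds the
// remaining lower bits, so "hi" (C set and Z clear) equals the comparison.
bool viaLsls(uint32_t x, unsigned c) {
  assert(c <= 30);
  LslsFlags f = lsls(x, c + 1);
  return f.carry && !f.zero;
}

// Reference: the comparison as written in the source.
bool directCompare(uint32_t x, unsigned c) { return (x << c) > 0x80000000u; }
```

For example, with `c = 0`, `x = 0x80000000` sets C but also Z, so the condition is false, matching `0x80000000 > 0x80000000U` being false.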
* [ARM] Transform compare of masked value to shift on Thumb1. | Eli Friedman | 2019-07-31 | 1 | -0/+37
  Thumb1 has very limited immediate modes, so turning an "and" into a shift can save multiple instructions. It's possible to simplify the generated code for test2 and test3 in cmp-and-fold.ll a little more, but I'll implement that as a followup. Differential Revision: https://reviews.llvm.org/D65175 llvm-svn: 367491
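Two illustrative identities showing how an "and"-with-mask compare against zero can become a single shift when the mask is a contiguous run of low or high bits. These are a sketch of the general idea, not necessarily the exact patterns the patch matches; the function names are hypothetical, and `k` is assumed to be in 1..31.

```cpp
#include <cassert>
#include <cstdint>

// (x & lowMask(k)) == 0, where lowMask(k) has the k low bits set.
bool lowBitsClearViaAnd(uint32_t x, unsigned k) {
  return (x & ((1u << k) - 1)) == 0;
}
// Same test with one shift: move the k low bits to the top, dropping the rest.
bool lowBitsClearViaShift(uint32_t x, unsigned k) {
  assert(k >= 1 && k <= 31);
  return (x << (32 - k)) == 0;
}

// (x & ~lowMask(k)) == 0, i.e. every bit at position k or above is clear.
bool highBitsClearViaAnd(uint32_t x, unsigned k) {
  return (x & ~((1u << k) - 1)) == 0;
}
// Same test with one shift: shift out the low k bits; anything left was high.
bool highBitsClearViaShift(uint32_t x, unsigned k) {
  assert(k >= 1 && k <= 31);
  return (x >> k) == 0;
}
```

The shift forms avoid materializing the mask constant, which is the expensive part on Thumb1.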
* [ScalarizeMaskedMemIntrin] Bitcast the mask to the scalar domain and use scalar bit tests for the branches. | Craig Topper | 2019-07-31 | 1 | -11/+72
  X86 at least is able to use movmsk or kmov to move the mask to the scalar domain. Then we can just use test instructions to test individual bits. This is more efficient than extracting each mask element individually. I special cased v1i1 to use the previous behavior. This avoids poor type legalization of bitcast of v1i1 to i1. I've skipped expandload/compressstore as I think we need to handle constant masks for those better first. Many tests end up with duplicate test instructions due to tail duplication in the branch folding pass. But the same thing happens when constructing similar code in C. So it's not unique to the scalarization. Not sure if this lowering code will also be good for other targets, but we're only testing X86 today. Differential Revision: https://reviews.llvm.org/D65319 llvm-svn: 367489
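A scalar C++ sketch of the shape of the scalarized code after this change, assuming a 4-lane masked load whose `<4 x i1>` mask has already been moved into an integer (as movmsk or kmov would do), with lane `i` in bit `i`. `maskedLoad4` is hypothetical; the real pass emits IR basic blocks, not a C++ loop.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a scalarized 4-lane masked load. The mask has been "bitcast" to an
// integer, so each lane's branch is a single bit test instead of an element
// extract. Disabled lanes are never dereferenced; they keep the pass-through
// value. Assumes lane i corresponds to bit i of maskBits.
void maskedLoad4(const int *ptr, uint8_t maskBits, const int passthru[4],
                 int out[4]) {
  for (unsigned i = 0; i < 4; ++i) {
    if (maskBits & (1u << i))  // scalar bit test per lane
      out[i] = ptr[i];         // lane enabled: load it
    else
      out[i] = passthru[i];    // lane disabled: keep pass-through value
  }
}
```

With mask bits `0x5`, lanes 0 and 2 are loaded and lanes 1 and 3 keep their pass-through values.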
* [X86] Add DAG combine to fold any_extend_vector_inreg+truncstore to an extractelement+store | Craig Topper | 2019-07-31 | 1 | -0/+35
  We have custom code that ignores the normal promoting type legalization on less than 128-bit vector types like v4i8 to emit pavgb, paddusb, psubusb since we don't have the equivalent instruction on a larger element type like v4i32. If this operation appears before a store, we can be left with an any_extend_vector_inreg followed by a truncstore after type legalization. When truncstore isn't legal, this will normally be decomposed into shuffles and a non-truncating store. This will then combine away the any_extend_vector_inreg and shuffle leaving just the store. On avx512, truncstore is legal so we don't decompose it and we had no combines to fix it. This patch adds a new DAG combine to detect this case and emit either an extract_store for 64-bit stores or an extractelement+store for 32 and 16 bit stores. This makes the avx512 codegen match the avx2 codegen for these situations. I'm restricting to only when -x86-experimental-vector-widening-legalization is false. When we're widening we're not likely to create this any_extend_inreg+truncstore combination. This means we should be able to remove this code when we flip the default. I would like to flip the default soon, but I need to investigate some performance regressions it's causing in our branch that I wasn't seeing on trunk. Differential Revision: https://reviews.llvm.org/D65538 llvm-svn: 367488
* Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize NoSignedZerosFPMath options control | Michael Berg | 2019-07-31 | 2 | -7/+7
  Summary: Honoring no signed zeroes is also available as a user control through clang, separately from and regardless of the fastmath or UnsafeFPMath context; DAG guards should reflect this context. Reviewers: spatel, arsenm, hfinkel, wristow, craig.topper Reviewed By: spatel Subscribers: rampitec, foad, nhaehnle, wuzish, nemanjai, jvesely, wdng, javed.absar, MaskRay, jsji Differential Revision: https://reviews.llvm.org/D65170 llvm-svn: 367486
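The commit message does not list which fadd/fsub folds were migrated, so the following is only the textbook reason a no-signed-zeros guard exists: folding `x + 0.0` to `x` is wrong for `x = -0.0`, because IEEE-754 round-to-nearest gives `-0.0 + 0.0 == +0.0`. By contrast, `x - 0.0` (equivalently `x + -0.0`) returns `x` unchanged even for `-0.0`, so that fold needs no flag. The function names are hypothetical, and the sketch assumes it is compiled without `-ffast-math` or similar flags.

```cpp
#include <cassert>
#include <cmath>

// For x = -0.0, this returns +0.0, so folding it to x would flip the sign of
// zero. The fold is only valid when signed zeros may be ignored.
double addPositiveZero(double x) { return x + 0.0; }

// For x = -0.0, this returns -0.0, so folding it to x is always valid.
double subPositiveZero(double x) { return x - 0.0; }
```

Checking `std::signbit` on the results for an input of `-0.0` shows the difference: false for the add, true for the subtract.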
* [IndVars, RLEV] Support rewriting exit values in loops without known exits (prep work) | Philip Reames | 2019-07-31 | 1 | -9/+7
  This is a preparatory patch for future work on supporting exit value rewriting in loops with a mixture of computable and non-computable exit counts. The intention is to be "mostly NFC" - i.e. not enable any interesting new transforms - but in practice, there are some small output changes. The test differences are caused by cases where getSCEVAtScope can simplify a single entry phi without needing any knowledge of the loop. llvm-svn: 367485