summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [IRMover] Set Address Space for moved global valuesTeresa Johnson2019-11-051-4/+5
| | | | | | | | | | | | | | | | | | | Summary: Set Address Space when creating a new function (from another). Fix PR41154. Patch by Ehud Katz <ehudkatz@gmail.com> Reviewers: tejohnson, chandlerc Reviewed By: tejohnson Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69361
* [AMDGPU] Add missing flags to DS_RealStanislav Mekhanoshin2019-11-051-0/+2
| | | | Differential Revision: https://reviews.llvm.org/D69867
* [IRMover] Use GlobalValue::getAddressSpace instead of directly from its type ↵Teresa Johnson2019-11-051-10/+10
| | | | | | | | | | | | | | | | | | [NFC] Summary: Change the old form of G->getType()->getAddressSpace() to the new G->getAddressSpace() (underneath does the same). Patch by Ehud Katz <ehudkatz@gmail.com> Reviewers: tejohnson, chandlerc Reviewed By: tejohnson Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69550
* [mips] Fix `getRegForInlineAsmConstraint` to do not crash on empty ConstraintSimon Atanasyan2019-11-061-4/+6
|
* [LoopRotationUtils] Check values are newly inserted into maps.Alina Sbirlea2019-11-051-5/+14
| | | | | This is a cleanup that came up in D63680. All values added to the ValueMaps should be newly added.
* [Hexagon] getCompoundCandidateGroup - fix 'false' value is implicitly cast ↵Simon Pilgrim2019-11-051-5/+5
| | | | | | to unsigned warning. NFCI. Consistently return HexagonII::HCG_None.
* [X86/Atomics] Correct a few transforms for new atomic loweringPhilip Reames2019-11-051-4/+3
| | | | | | This is a partial fix for the issues described in commit message of 027aa27 (the revert of G24609). Unfortunately, I can't provide test coverage for it on it's own as the only (known) wrong example is still wrong, but due to a separate issue. These fixes are cases where when performing unrelated DAG combines, we were dropping the atomicity flags entirely.
* [MIR] Add MIR parsing for heap alloc site instruction markersAmy Huang2019-11-055-5/+39
| | | | | | | | | | | | | | | Summary: This patch adds MIR parsing and printing for heap alloc markers, which were added in D69136. They are printed as an operand similar to pre-/post-instr symbols, with a heap-alloc-marker token and a metadata node. Reviewers: rnk Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69864
* [X86] Gate select->fmin/fmax transform on NoSignedZeros instead of UnsafeFPMathBenjamin Kramer2019-11-051-8/+7
|
* [AMDGPU] Removed dead code from R600ISelLowering.cppStanislav Mekhanoshin2019-11-051-6/+1
| | | | | | | | | This was added to inhibit a warning from gcc 7.3 according to the comment. However, it triggers warning from PVS. In addition I cannot reproduce it with gcc 7.4 and I also cannot reproduce it with gcc 7.3 using compiler explorer. Differential Revision: https://reviews.llvm.org/D69863
* [X86/Atomics] (Semantically) revert G246098, switch back to the old atomic ↵Philip Reames2019-11-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | example When writing an email for a follow up proposal, I realized one of the diffs in the committed change was incorrect. Digging into it revealed that the fix is complicated enough to require some thought, so reverting in the meantime. The problem is visible in this diff (from the revert): ; X64-SSE-LABEL: store_fp128: ; X64-SSE: # %bb.0: -; X64-SSE-NEXT: movaps %xmm0, (%rdi) +; X64-SSE-NEXT: subq $24, %rsp +; X64-SSE-NEXT: .cfi_def_cfa_offset 32 +; X64-SSE-NEXT: movaps %xmm0, (%rsp) +; X64-SSE-NEXT: movq (%rsp), %rsi +; X64-SSE-NEXT: movq {{[0-9]+}}(%rsp), %rdx +; X64-SSE-NEXT: callq __sync_lock_test_and_set_16 +; X64-SSE-NEXT: addq $24, %rsp +; X64-SSE-NEXT: .cfi_def_cfa_offset 8 ; X64-SSE-NEXT: retq store atomic fp128 %v, fp128* %fptr unordered, align 16 ret void The problem here is three fold: 1) x86-64 doesn't guarantee atomicity of anything larger than 8 bytes. Some platforms observably break this guarantee, others don't, but the codegen isn't considering this, so it's wrong on at least some platforms. 2) When I started to track down the problem, I discovered that DAGCombiner had stripped the atomicity off the store entirely. This comes down to idiomatic usage of DAG.getStore passing all MMO components separately as opposed to just passing the MMO. 3) On x86 (not -64), there are cases where 8 byte atomiciy is supported, but only for floating point operations. This would seem to imply that operation typing matters for correctness, and DAGCombine happily folds away bitcasts. I'm not 100% sure there's a problem here, but I'm not entirely sure there isn't either. I plan on returning to each issue in turn; sorry for the churn here.
* [AMDGPU] Removed dead code handling M0CopyRegStanislav Mekhanoshin2019-11-051-14/+0
| | | | | | | Static analyzer complains about always false condition. See https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69860
* [X86] Specifically limit fmin/fmax commutativity to NoNaNs + NoSignedZerosBenjamin Kramer2019-11-051-2/+3
| | | | | The backend UnsafeFPMath flag is not a superset of all the others, so limit it to the exact bits needed.
* [globalisel] Rename G_GEP to G_PTR_ADDDaniel Sanders2019-11-0528-82/+84
| | | | | | | | | | | | | | | | Summary: G_GEP is rather poorly named. It's a simple pointer+scalar addition and doesn't support any of the complexities of getelementptr. I therefore propose that we rename it. There's a G_PTR_MASK so let's follow that convention and go with G_PTR_ADD Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69734
* [AMDGPU] return Fail instead of SolfFail from addOperand()Stanislav Mekhanoshin2019-11-051-1/+1
| | | | | | | | | | | | | | | | | | | | addOperand() method of AMDGPU disassembler returns SoftFail on error. All instances which may lead to that place are an impossible encdoing, not something which is possible to encode, but semantically incorrect as described for SoftFail. Then tablegen generates a check of the following form: if (Decode...(..) == MCDisassembler::Fail) { return MCDisassembler::Fail; } Since we can only return Success and SoftFail that is dead code as detected by the static code analyzer. Solution: return Fail as it should be. See https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69819
* [SLP] - Add couple safety checks to TreeEntry::dump(). NFCSergey Dmitriev2019-11-051-5/+12
| | | | | | | | | | | | | | Summary: Check for MainOp and AltOp for NULL before dereferencing or issue NULL. Reviewers: Vasilis, dtemirbulatov, RKSimon, ABataev Reviewed By: ABataev Subscribers: mehdi_amini, hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69812
* [JumpThreading] Factor out code to merge basic blocks (NFC)Kazu Hirata2019-11-051-43/+52
| | | | | | | | | | | | | | | Summary: This patch factors out code to merge a basic block with its sole successor -- partly for readability and partly to facilitate an upcoming patch of my own. Reviewers: wmi Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69852
* Remove redundant assignment. NFCI.Simon Pilgrim2019-11-051-1/+0
| | | | Fixes cppcheck warning.
* Use iterator prefix increment. NFCI.Simon Pilgrim2019-11-051-1/+1
|
* [MachineOutliner] Reduce scope of variable and stop duplicate getMF() calls. ↵Simon Pilgrim2019-11-051-3/+3
| | | | NFCI.
* [DFAPacketizer] Allow up to 64 functional unitsjmolloy2019-11-051-54/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: To drive the automaton we used a uint64_t as an action type. This contained the transition's resource requirements as a conjunction: (a OR b) AND (b OR c) We encoded this conjunction as a sequence of four 16-bit bitmasks. This limited the number of addressable functional units to 16, which is quite low and has bitten many people in the past. Instead, the DFAEmitter now generates a lookup table from InstrItinerary class (index of the ItinData inside the ProcItineraries) to an internal action index which is essentially a dense embedding of the conjunctive form. Because we never materialize the conjunctive form, we no longer have the 16 FU restriction. In this patch we limit to 64 functional units due to using a uint64_t bitmask in the DFAEmitter. Now that we've decoupled these representations we can increase this in future. Reviewers: ThomasRaoux, kparzysz, majnemer Reviewed By: ThomasRaoux Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69110
* [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)Gil Rapaport2019-11-055-125/+170
| | | | | | | This recommits 2be17087f8c38934b7fc9208ae6cf4e9b4d44f4b (reverted in d3ec06d219788801380af1948c7f7ef9d3c6100b for heap-use-after-free) with a fix in IAI's reset() which was not clearing the set of interleave groups after deleting them.
* [MachineOutliner] Fix uninitialized variable warnings. NFCI.Simon Pilgrim2019-11-051-1/+1
|
* [ObjC][ARC] Ignore lifetime markers between *ReturnValue callsFrancis Visoiu Mistrih2019-11-051-1/+28
| | | | | | | | | | | | | | | | | | | | | | When eliminating a pair of `llvm.objc.autoreleaseReturnValue` followed by `llvm.objc.retainAutoreleasedReturnValue` we need to make sure that the instructions in between are safe to ignore. Other than bitcasts and useless GEPs, it's also safe to ignore lifetime markers for both static allocas (lifetime.start/lifetime.end) and dynamic allocas (stacksave/stackrestore). These get added by the inliner as part of the return sequence and can prevent the transformation from happening in practice. Differential Revision: https://reviews.llvm.org/D69833
* [JumpThreading] Factor out common code to update the SSA form (NFC)Kazu Hirata2019-11-051-75/+46
| | | | | | | | | | | | | | | Summary: This patch factors out common code to update the SSA form in JumpThreading.cpp -- partly for readability and partly to facilitate an coming patch of my own. Reviewers: wmi Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69811
* [GVN] Fix uninitialized variable warnings. NFCI.Simon Pilgrim2019-11-052-12/+12
|
* Add missing GVN =operator. NFCI.Simon Pilgrim2019-11-051-0/+1
| | | | Fixes PVS Studio warning that the 'ValueTable' class implements a copy constructor, but lacks the '=' operator.
* [AtomicExpandPass] Silence static analyzer warnings about operator priority. ↵Dávid Bolvanský2019-11-051-1/+1
| | | | NFCI.
* [MachineScheduler] Enable AA in PostRA Machine schedulerDavid Green2019-11-051-0/+2
| | | | | | | | | | | | This adds AA to Post-RA Machine Scheduling, allowing the pass more freedom when handling memory operations. My understanding is that this was just never done, not that it is inherently incorrect to do so. The older PostRA List scheduler already makes use of AA, it's just that the MI PostRA Scheduler was never taught to use it. Differential Revision: https://reviews.llvm.org/D69814
* Fix PR40644: miscompile indexed FP constant storeThomas Preud'homme2019-11-052-0/+6
| | | | | | | | | | | | | | | | | | | | | Summary: Functions replaceStoreOfFPConstant() and OptimizeFloatStore() both replace store of float by a store of an integer unconditionally. However this generates wrong code when the store that is replaced is an indexed or truncating store. This commit solves this issue by adding an early return in these functions when the store being considered is not a normal store. Bug was only observed on out of tree targets, hence the lack of testcase in this commit. Reviewers: efriedma Subscribers: hiraditya, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68420
* [ARM] Always enable UseAA in the arm backendDavid Green2019-11-052-16/+2
| | | | | | | | | | This feature controls whether AA is used into the backend, and was previously turned on for certain subtargets to help create less constrained scheduling graphs. This patch turns it on for all subtargets, so that they can all make use of the extra information to produce better code. Differential Revision: https://reviews.llvm.org/D69796
* [Scheduling][ARM] Consistently enable PostRA Machine schedulingDavid Green2019-11-056-10/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In the ARM backend, for historical reasons we have only some targets using Machine Scheduling. The rest use the old list scheduler as they are using itinaries and the list scheduler seems to produce better code (and not crash running out of register on v6m codes). So whether to use the MIScheduler or not is checked at runtime from the subtarget features. This is fine, except for post-ra scheduling. Whether to use the old post-ra list scheduler or the post-ra machine schedule is decided as the pass manager is set up, in arms case from a newly constructed subtarget. Under some situations, like LTO, this won't include the correct cpu so can pick the wrong option. This can have a surprising effect on performance. To fix that, this patch overrides targetSchedulesPostRAScheduling and addPreSched2 in the ARM backend, adding _both_ post-ra schedulers and picking at runtime which to execute. To pick between the two I've had to add a enablePostRAMachineScheduler() method that normally returns enableMachineScheduler() && enablePostRAScheduler(), which can be overridden to enable just one of PostRAMachineScheduler vs PostRAScheduler. Thanks to David Penry for the identifying this problem. Differential Revision: https://reviews.llvm.org/D69775
* [InstCombine] dropRedundantMaskingOfLeftShiftInput(): truncation (PR42563)Roman Lebedev2019-11-051-2/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: That fold keeps growing and growing :( I think this may be one of the last pieces for it. Since D67677/D67725, the fold knowns the general form of the pattern - where some masking is needed: https://rise4fun.com/Alive/F5R https://rise4fun.com/Alive/gslRa But there is one more huge piece missing - if you are extracting some bits, it is not impossible that the origin is wider than the extraction, i.e. there may be a truncation. And we don't deal with that yet. But we can, and the generalization remains fully identical: https://rise4fun.com/Alive/Uar https://rise4fun.com/Alive/5SW After a preparatory cleanup i think the diff looks rather clean. One missing piece is that in some patterns (especially pat. b), `-1` only needs to be `-1` in final type, but that is for later.. https://bugs.llvm.org/show_bug.cgi?id=42563 Reviewers: spatel, nikic Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69125
* [RISCV] Add InstrInfo areMemAccessesTriviallyDisjoint hookLuís Marques2019-11-052-0/+63
| | | | | | | | | | | Summary: Introduces the `InstrInfo::areMemAccessesTriviallyDisjoint` hook. The test could check for instruction reorderings, but to avoid being brittle it just checks instruction dependencies. Reviewers: asb, lenary Reviewed By: lenary Tags: #llvm Differential Revision: https://reviews.llvm.org/D67046
* DWARFDebugLoclists: Make it possible to read relocated addressesPavel Labath2019-11-053-15/+17
| | | | | | | | | | | | | | | | | Summary: Handling relocations was not needed when the loclists section was a DWO-only thing. But since DWARF5, it is possible to use it in regular objects too, and the standard permits embedding addresses into the section directly. These addresses need to be relocated in unlinked files. Reviewers: JDevlieghere, dblaikie, probinson Subscribers: aprantl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68271
* Recommit "[HardwareLoops] Optimisation remarks"Sjoerd Meijer2019-11-051-23/+82
| | | | | | | | | With a few things fixed: - initialisaiton of the optimisation remark pass (this was causing the buildbot failures on PPC), - a test case. Differential Revision: https://reviews.llvm.org/D69660
* [IR] Remove switch's default block that causes clang 8 raise erroraqjune2019-11-051-2/+0
|
* [X86] Lower the cost of avx512 horizontal bool and/or reductions to ↵Craig Topper2019-11-041-0/+21
| | | | | | | | | | | | | | | | | 2*log2(bitwidth)+1 for legal types. This better represents the kshift+binop we'd get for each stage before the final extract. Its likely we'll do even better by doing a kmov and a cmp with a GPR, but this is a good start. The default handling was costing a worst case single source permute shuffle of the vector before the binop. This worst case assumes the shuffle might have to be emulated with extracts and inserts. But since we know we're doing a reduction we can assume we'll get kshift lowering. There's still some room for improvement here, but this is much better than it was.
* [IR] Add Freeze instructionaqjune2019-11-0513-64/+93
| | | | | | | | | | | | | | | | | | Summary: - Define Instruction::Freeze, let it be UnaryOperator - Add support for freeze to LLLexer/LLParser/BitcodeReader/BitcodeWriter The format is `%x = freeze <ty> %v` - Add support for freeze instruction to llvm-c interface. - Add m_Freeze in PatternMatch. - Erase freeze when lowering IR to SelDag. Reviewers: deadalnix, hfinkel, efriedma, lebedev.ri, nlopes, jdoerfert, regehr, filcab, delcypher, whitequark Reviewed By: lebedev.ri, jdoerfert Subscribers: jfb, kristof.beyls, hiraditya, lebedev.ri, steven_wu, dexonsmith, xbolva00, delcypher, spatel, regehr, trentxintong, vsk, filcab, nlopes, mehdi_amini, deadalnix, llvm-commits Differential Revision: https://reviews.llvm.org/D29011
* [BPF] fix a use after free bugYonghong Song2019-11-041-2/+8
| | | | | | | | | | | | | | Commit fff2721286e1 ("[BPF] Fix CO-RE bugs with bitfields") fixed CO-RE handling bitfield issues. But the implementation introduced a use after free bug. The "Base" of the intrinsic might be freed so later on accessing the Type of "Base" might access the freed memory. The failed test case, CodeGen/BPF/CORE/offset-reloc-middle-chain.ll is exactly used to test such a case. Similarly to previous attempt to remember Metadata etc, remember "Base" pointee Alignment in advance to avoid such use after free bug.
* [X86] Teach X86MCInstLower to swap operands of commutable instructions to ↵Craig Topper2019-11-041-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | enable 2-byte VEX encoding. Summary: The 2 source operands commutable instructions are encoded in the VEX.VVVV field and the r/m field of the MODRM byte plus the VEX.B field. The VEX.B field is missing from the 2-byte VEX encoding. If the VEX.VVVV source is 0-7 and the other register is 8-15 we can swap them to avoid needing the VEX.B field. This works as long as the VEX.W, VEX.mmmmm, and VEX.X fields are also not needed. Fixes PR36706. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68550
* [BPF] Fix CO-RE bugs with bitfieldsYonghong Song2019-11-041-38/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bitfield handling is not robust with current implementation. I have seen two issues as described below. Issue 1: struct s { long long f1; char f2; char b1:1; } *p; The current approach will generate an access bit size 56 (from b1 to the end of structure) which will be rejected as it is not power of 2. Issue 2: struct s { char f1; char b1:3; char b2:5; char b3:6: char b4:2; char f2; }; The LLVM will group 4 bitfields together with 2 bytes. But loading 2 bytes is not correct as it violates alignment requirement. Note that sometimes, LLVM breaks a large bitfield groups into multiple groups, but not in this case. To resolve the above two issues, this patch takes a different approach. The alignment for the structure is used to construct the offset of the bitfield access. The bitfield incurred memory access is an aligned memory access with alignment/size equal to the alignment of the structure. This also simplified the code. This may not be the optimal memory access in terms of memory access width. But this should be okay since extracting the bitfield value will have the same amount of work regardless of what kind of memory access width. Differential Revision: https://reviews.llvm.org/D69837
* [AArch64] Update for ExynosEvandro Menezes2019-11-041-2/+4
| | | | Fix the costs of integer division.
* [AMDGPU] Added assert in SIFoldOperands before ptr use. NFC.Stanislav Mekhanoshin2019-11-041-0/+1
|
* [AMDGPU] deduplicate tablegen predicatesStanislav Mekhanoshin2019-11-0411-38/+46
| | | | | | | | | | | | | | | We are duplicating predicates if several parts of the combined predicate list contain the same condition. Added code to deduplicate the list. We have AssemblerPredicates and AssemblerPredicate in the PredicateControl, but we never use AssemblerPredicates with an actual list, so this one is dropped. This addresses the first part of the llvm bug 43886: https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69815
* [demangle] NFC: get rid of NodeOrStringErik Pilkington2019-11-042-19/+0
| | | | | | This class was a bit overengineered, and was triggering some PVS warnings. Instead, put strings into a NameType and let clients unconditionally treat it as a Node.
* [X86] Add support for -mvzeroupper and -mno-vzeroupper to match gccCraig Topper2019-11-044-53/+68
| | | | | | | | | | | | | | | | | -mvzeroupper will force the vzeroupper insertion pass to run on CPUs that normally wouldn't. -mno-vzeroupper disables it on CPUs where it normally runs. To support this with the default feature handling in clang, we need a vzeroupper feature flag in X86.td. Since this flag has the opposite polarity of the fast-partial-ymm-or-zmm-write we used to use to disable the pass, we now need to add this new flag to every CPU except KNL/KNM and BTVER2 to keep identical behavior. Remove -fast-partial-ymm-or-zmm-write which is no longer used. Differential Revision: https://reviews.llvm.org/D69786
* [SimplifyCFG] Use a (trivially) dominanting widenable branch to remove later ↵Philip Reames2019-11-041-0/+48
| | | | | | | | | | slow path blocks This transformation is a variation on the GuardWidening transformation we have checked in as it's own pass. Instead of focusing on merge (i.e. hoisting and simplifying) two widenable branches, this transform makes the observation that simply removing a second slowpath block (by reusing an existing one) is often a very useful canonicalization. This may lead to later merging, or may not. This is a useful generalization when the intermediate block has loads whose dereferenceability is hard to establish. As noted in the patch, this can be generalized further, and will be. Differential Revision: https://reviews.llvm.org/D69689
* [DAGCombine][MSP430] use shift amount threshold in DAGCombine (2/2)Sanjay Patel2019-11-041-36/+48
| | | | | | | | | | | | | | | | | Continuation of: D69116 Contributes to a fix for PR43559: https://bugs.llvm.org/show_bug.cgi?id=43559 See also D69099 and D69116 Use the TLI hook in DAGCombine.cpp to guard against creating shift nodes that are not optimal for a target. Patch by: @joanlluch (Joan LLuch) Differential Revision: https://reviews.llvm.org/D69120
* [X86] Fix uninitialized variable warnings. NFCI.Simon Pilgrim2019-11-0410-42/+42
|
OpenPOWER on IntegriCloud