summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [X86MacroFusion] Handle branch fusion (AMD CPUs).Clement Courbet2019-03-285-56/+112
| | | | | | | | | | | | | | | | | | Summary: This adds a BranchFusion feature to replace the usage of the MacroFusion for AMD CPUs. See D59688 for context. Reviewers: andreadb, lebedev.ri Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59872 llvm-svn: 357171
* AMDGPU: Make exec mask optimzations more resistant to block splitsMatt Arsenault2019-03-283-22/+84
| | | | | | | Also improve the check for SALU instructions to also ignore implicit_def and other fake instructions. llvm-svn: 357170
* [X86] AMD Piledriver (BdVer2): fine-tune some latenciesRoman Lebedev2019-03-281-28/+50
| | | | | | | | | | | | | | Based on llvm-exegesis measurements. Now that llvm-exegesis is ~2 magnitudes faster, and is a bit smarter, it is now possible to continue cleanup of the scheduler model. With this, there are no more latency inconsistencies for the opcodes that produce stable measurements, and only a few inconsistencies for unstable measurements (MMX_* opcodes, opcodes that llvm-exegesis measures by chaining - CMP, TEST, BT, SETcc, CVT, MOV, etc.) llvm-svn: 357169
* [NFC] Format InlineFeatureIgnoreList.Clement Courbet2019-03-281-51/+52
| | | | | | To avoid more spurious clang-format changes when adding features (D59872). llvm-svn: 357168
* [DAGCombiner] Fold truncate(build_vector(x,y)) -> ↵Simon Pilgrim2019-03-281-1/+15
| | | | | | | | | | | | build_vector(truncate(x),truncate(y)) If scalar truncates are free, attempt to pre-truncate build_vectors source operands. Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering. Differential Revision: https://reviews.llvm.org/D59654 llvm-svn: 357161
* [yaml2obj][obj2yaml] - Teach yaml2obj/obj2yaml tools about STB_GNU_UNIQUE ↵George Rimar2019-03-281-2/+3
| | | | | | | | | | | | | symbols. yaml2obj/obj2yaml does not support the symbols with STB_GNU_UNIQUE yet. Currently, obj2yaml fails with llvm_unreachable when met such a symbol. I faced it when investigated the https://bugs.llvm.org/show_bug.cgi?id=41196. Differential revision: https://reviews.llvm.org/D59875 llvm-svn: 357158
* [asan] Add options -asan-detect-invalid-pointer-cmp and ↵Pierre Gousseau2019-03-281-6/+31
| | | | | | | | | | | | | -asan-detect-invalid-pointer-sub options. This is in preparation to a driver patch to add gcc 8's -fsanitize=pointer-compare and -fsanitize=pointer-subtract. Disabled by default as this is still an experimental feature. Reviewed By: morehouse, vitalybuka Differential Revision: https://reviews.llvm.org/D59220 llvm-svn: 357157
* [VPlan] Determine Vector Width programmatically.Florian Hahn2019-03-282-20/+37
| | | | | | | | | | | | | | With this change, the VPlan native path is triggered with the directive: #pragma clang loop vectorize(enable) There is no need to specify the vectorize_width(N) clause. Patch by Francesco Petrogalli <francesco.petrogalli@arm.com> Differential Revision: https://reviews.llvm.org/D57598 llvm-svn: 357156
* [X85][AVX] Add missing vXi16 broadcast fold patternsSimon Pilgrim2019-03-282-0/+24
| | | | | | | | Now that D59484 has landed its easier to add these. Added missing AVX512BW v32i16 equivalents while I was at it. llvm-svn: 357155
* [ARM GlobalISel] Fix G_STORE with s1Diana Picus2019-03-281-0/+18
| | | | | | | G_STORE for 1-bit values uses a STRBi12, which stores the whole byte. Zero out the undefined bits before writing. llvm-svn: 357154
* [ARM GlobalISel] Fix selection of G_SELECTDiana Picus2019-03-281-5/+3
| | | | | | | | | | | G_SELECT uses a 1-bit scalar for the condition, and is currently implemented with a plain CMPri against 0. This means that values such as 0x1110 are interpreted as true, when instead the higher bits should be treated as undefined and therefore ignored. Replace the CMPri with a TSTri against 0x1, which performs an implicit AND, yielding the expected result. llvm-svn: 357153
* SafepointIRVerifier port to new Pass ManagerSerguei Katkov2019-03-283-0/+13
| | | | | | | | | | | Straightforward port of StatepointIRVerifier pass to new Pass Manager framework. Reviewers: fedor.sergeev, reames Reviewed By: fedor.sergeev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D59825 llvm-svn: 357147
* [WebAssembly] Rename wasm fixup kindsSam Clegg2019-03-285-18/+13
| | | | | | | | | | | These fixup kinds are not explicitly related to the code section. They are there to signal how to apply the fixup. Also, a couple of other minor wasm cleanups. Differential Revision: https://reviews.llvm.org/D59908 llvm-svn: 357145
* [NewPM] Fix a nasty bug with analysis invalidation in the new PM.Chandler Carruth2019-03-281-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The issue here is that we actually allow CGSCC passes to mutate IR (and therefore invalidate analyses) outside of the current SCC. At a minimum, we need to support mutating parent and ancestor SCCs to support the ArgumentPromotion pass which rewrites all calls to a function. However, the analysis invalidation infrastructure is heavily based around not needing to invalidate the same IR-unit at multiple levels. With Loop passes for example, they don't invalidate other Loops. So we need to customize how we handle CGSCC invalidation. Doing this without gratuitously re-running analyses is even harder. I've avoided most of these by using an out-of-band preserved set to accumulate the cross-SCC invalidation, but it still isn't perfect in the case of re-visiting the same SCC repeatedly *but* it coming off the worklist. Unclear how important this use case really is, but I wanted to call it out. Another wrinkle is that in order for this to successfully propagate to function analyses, we have to make sure we have a proxy from the SCC to the Function level. That requires pre-creating the necessary proxy. The motivating test case now works cleanly and is added for ArgumentPromotion. Thanks for the review from Philip and Wei! Differential Revision: https://reviews.llvm.org/D59869 llvm-svn: 357137
* [ARM] Remove dead function ARMMCCodeEmitter::getSOImmOpValueSam Clegg2019-03-271-34/+0
| | | | | | | | | The last reference to this function was removed from the ARM td files in 2015 in rL225266. Differential Revision: https://reviews.llvm.org/D59868 llvm-svn: 357130
* [x86] improve AVX lowering of vector zextSanjay Patel2019-03-271-0/+20
| | | | | | | | | | | | | | | | | | | | | If we know the 2 halves of an oversized zext-in-reg are the same, don't create those halves independently. I tried several different approaches to fold this, but it's difficult to get right during legalization. In the default path, we are creating a generic shuffle that looks like an unpack high, but it can get transformed into a different mask (a blend), so it's not straightforward to match that. If we try to fold after it actually becomes an X86ISD::UNPCKH node, we can't be sure what the operand node is - it might be a generic shuffle, or it could be some x86-specific op. From the test output, we should be doing something like this for SSE4.1 as well, but I'd rather leave that as a follow-up since it involves changing lowering actions. Differential Revision: https://reviews.llvm.org/D59777 llvm-svn: 357129
* [x86] look through bitcast operand of MOVMSKSanjay Patel2019-03-271-6/+5
| | | | | | | | | This is not exactly NFC because it should make further combines of MOVMSK easier to match, but there should be no outward differences because we have isel patterns in place specifically to allow this. See: // Also support integer VTs to avoid a int->fp bitcast in the DAG. llvm-svn: 357128
* [X86ISelDAGToDAG] Move initialization of OptForSize and OptForMinSize from ↵Craig Topper2019-03-271-5/+7
| | | | | | | | PreprocessISelDAG to runOnMachineFunction. NFCI This makes more sense as a place to initialize these. I don't think runOnMachineFunction was overriden when these cached values were originally created. llvm-svn: 357123
* [DAGCombiner] Teach TokenFactor pruning to peek through lifetime nodesNirav Dave2019-03-271-0/+2
| | | | | | | | | | | | | | Summary: Lifetime nodes were inhibiting TokenFactor simplification inhibiting chain-based optimizations. Reviewers: courbet, jyknight Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59897 llvm-svn: 357121
* [LegalizeVectorTypes] Allow single loads and stores for more short vectorsJustin Bogner2019-03-272-7/+14
| | | | | | | | | | | | | | | | | | | When lowering a load or store for TypeWidenVector, the type legalizer would use a single load or store if the associated integer type was legal or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable. (See https://reviews.llvm.org/rL236528 for reference.) This applies that behaviour to vector types. If the vector type is TypePromoteInteger, the element type is going to be TypePromoteInteger as well, which will lead to have a single promoting load rather than N individual promoting loads. For instance, if we have a v3i1, we would now have a load of v4i1 instead of 3 loads of i1. Patch by Guillaume Marques. Thanks! Differential Revision: https://reviews.llvm.org/D56201 llvm-svn: 357120
* [WebAssembly] Add some whitespace to WebAssemblyFixIrreducibleControlFlowAlon Zakai2019-03-271-0/+2
| | | | | | | Differential Revision: https://reviews.llvm.org/D59855 modified: llvm/lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp llvm-svn: 357117
* Revert r356996 "[DAG] Avoid smart constructor-based dangling nodes."Nirav Dave2019-03-272-15/+0
| | | | | | | This patch appears to trigger very large compile time increases in halide builds. llvm-svn: 357116
* [ConstantRange] Add isWrappedSet() and isUpperSignWrapped()Nikita Popov2019-03-271-3/+11
| | | | | | | | | | | | | | | Split off from D59749. This adds isWrappedSet() and isUpperSignWrapped() set with the same behavior as isSignWrappedSet() and isUpperWrapped() for the respectively other domain. The methods isWrappedSet() and isSignWrappedSet() will not consider ranges of the form [X, Max] == [X, 0) and [X, SignedMax] == [X, SignedMin) to be wrapping, while isUpperWrapped() and isUpperSignWrapped() will. Also replace the checks in getUnsignedMin() and friends with method calls that implement the same logic. llvm-svn: 357112
* [CGP] Reset DT when optimizing select instructionsTeresa Johnson2019-03-271-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: A recent fix (r355751) caused a compile time regression because setting the ModifiedDT flag in optimizeSelectInst means that each time a select instruction is optimized the function walk in runOnFunction stops and restarts again (which was needed to build a new DT before we started building it lazily in r356937). Now that the DT is built lazily, a simple fix is to just reset the DT at this point, rather than restarting the whole function walk. In the future other places that set ModifiedDT may want to switch to just resetting the DT directly. But that will require an evaluation to ensure that they don't otherwise need to restart the function walk. Reviewers: spatel Subscribers: jdoerfert, llvm-commits, xur Tags: #llvm Differential Revision: https://reviews.llvm.org/D59889 llvm-svn: 357111
* [ARM] Don't confuse the scheduler for very large VLDMDIA etc.Eli Friedman2019-03-271-1/+6
| | | | | | | | | | | | | ARMBaseInstrInfo::getNumLDMAddresses is making bad assumptions about the memory operands of load and store-multiple operations. This doesn't really fix the problem properly, but it's enough to prevent crashing, at least. Fixes https://bugs.llvm.org/show_bug.cgi?id=41231 . Differential Revision: https://reviews.llvm.org/D59834 llvm-svn: 357109
* [AArch64][GlobalISel] Make G_PHI of v2s64, v4s32, v2s32 legal.Amara Emerson2019-03-271-1/+1
| | | | llvm-svn: 357108
* [ConstantRange] Rename isWrappedSet() to isUpperWrapped()Nikita Popov2019-03-272-17/+17
| | | | | | | | | | | | | | Split out from D59749. The current implementation of isWrappedSet() doesn't do what it says on the tin, and treats ranges like [X, Max] as wrapping, because they are represented as [X, 0) when using half-inclusive ranges. This also makes it inconsistent with the semantics of isSignWrappedSet(). This patch renames isWrappedSet() to isUpperWrapped(), in preparation for the introduction of a new isWrappedSet() method with corrected behavior. llvm-svn: 357107
* RegPressure: Fix crash on blocks with only dbg_valueMatt Arsenault2019-03-271-1/+7
| | | | | | | | If there were only dbg_values in the block, recede would hit the beginning of the block and try to use thet dbg_value as a real instruction. llvm-svn: 357105
* [InstCombine] Use uadd.sat and usub.sat for canonicalizationNikita Popov2019-03-271-28/+21
| | | | | | | | | | | | | Start using the uadd.sat and usub.sat intrinsics for the existing canonicalizations. These intrinsics should optimize better than expanded IR, have better handling in the X86 backend and should be no worse than expanded IR in other backends, as far as we know. rL357012 already introduced use of uadd.sat for the add+umin pattern. Differential Revision: https://reviews.llvm.org/D58872 llvm-svn: 357103
* [GlobalISel] Fix legalizer artifact combiner from crashing with invalid dead ↵Amara Emerson2019-03-271-1/+2
| | | | | | | | | | | | | | | | | | | | instructions. The artifact combiners push instructions which have been marked for deletion onto an list for the legalizer to deal with on return. However, for trunc(ext) combines the combiner routine recursively calls itself. When it does this the dead instructions list may not be empty, and the other combiners don't expect to be dealing with essentially invalid MIR (multiple vreg defs etc). This change fixes it by ensuring that the dead instructions are processed on entry into tryCombineInstruction. As a result, this fix exposed a few places in tests where G_TRUNC instructions were not being deleted even though they were dead. Differential Revision: https://reviews.llvm.org/D59892 llvm-svn: 357101
* Reapply "AMDGPU: Scavenge register instead of findUnusedReg"Matt Arsenault2019-03-271-1/+1
| | | | | | | | | | | | | This reapplies r356149, using the correct overload of findUnusedReg which passes the current iterator. This worked most of the time, because the scavenger iterator was moved at the end of the frame index loop in PEI. This would fail if the spill was the first instruction. This was further hidden by the fact that the scavenger wasn't passed in for normal frame index elimination. llvm-svn: 357098
* [X86] Add post-isel pseudos for rotate by immediate using SHLD/SHRDCraig Topper2019-03-272-10/+36
| | | | | | | | | | | | | | Haswell CPUs have special support for SHLD/SHRD with the same register for both sources. Such an instruction will go to the rotate/shift unit on port 0 or 6. This gives it 1 cycle latency and 0.5 cycle reciprocal throughput. When the register is not the same, it becomes a 3 cycle operation on port 1. Sandybridge and Ivybridge always have 1 cyc latency and 0.5 cycle reciprocal throughput for any SHLD. When FastSHLDRotate feature flag is set, we try to use SHLD for rotate by immediate unless BMI2 is enabled. But MachineCopyPropagation can look through a copy and change one of the sources to be different. This will break the hardware optimization. This patch adds psuedo instruction to hide the second source input until after register allocation and MachineCopyPropagation. I'm not sure if this is the best way to do this or if there's some other way we can make this work. Fixes PR41055 Differential Revision: https://reviews.llvm.org/D59391 llvm-svn: 357096
* [PeepholeOpt] Don't stop simplifying copies on sequence of subregsQuentin Colombet2019-03-271-6/+1
| | | | | | | | | | | | | | | This patch removes an overly conservative check that would prevent simplifying copies when the value we were tracking would go through several subregister indices. Indeed, the intend of this check was to not track values whenever we have to compose subregister, but actually what the check was doing was bailing anytime we see a second subreg, even if that second subreg would actually be the new source of truth (as opposed to a part of that subreg). Differential Revision: https://reviews.llvm.org/D59891 llvm-svn: 357095
* [AArch64][SVE] Asm: error on unexpected SVE vector register type suffixSander de Smalen2019-03-271-3/+2
| | | | | | | | | | | | | | | | | | | | | | This patch fixes an assembler bug that allowed SVE vector registers to contain a type suffix when not expected. The SVE unpredicated movprfx instruction is the only instruction affected. The following are examples of what was previously valid: movprfx z0.b, z0.b movprfx z0.b, z0.s movprfx z0, z0.s These instructions are now erroneous. Patch by Cullen Rhodes (c-rhodes) Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D59636 llvm-svn: 357094
* AMDGPU: Enable the scavenger for large framesMatt Arsenault2019-03-271-5/+14
| | | | | | | Another test is needed for the case where the scavenge fail, but there's another issue with that which needs an additional fix. llvm-svn: 357093
* AMDGPU: Add additional MIR tests for exec mask optimizationsMatt Arsenault2019-03-271-3/+11
| | | | | | | | | | Also includes one example of how this transform is unsound. This isn't verifying the copies are used in the control flow intrinisic patterns. Also add option to disable exec mask opt pass. Since this pass is unsound, it may be useful to turn it off until it is fixed. llvm-svn: 357091
* AMDGPU: Skip debug_instr when collapsing end_cfMatt Arsenault2019-03-271-3/+8
| | | | | | | Based on how these are inserted, I doubt this was causing a problem in practice. llvm-svn: 357090
* AMDGPU: Fix missing scc implicit def on s_andn2_b64_termMatt Arsenault2019-03-271-18/+13
| | | | | | | Introduce new helper class to copy properties directly from the base instruction. llvm-svn: 357089
* New methods to check for under-/overflow in the SMT APIMikhail R. Gadelha2019-03-271-0/+70
| | | | | | | | | | | | | | | | Summary: Added methods to check for under-/overflow in additions, subtractions, signed divisions/modulus, negations, and multiplications. Reviewers: ddcc, gou4shi1 Reviewed By: ddcc, gou4shi1 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59796 llvm-svn: 357088
* PEI: Delay checking requiresFrameIndexReplacementScavengingMatt Arsenault2019-03-271-4/+10
| | | | | | | | | | Currently this is called before the frame size is set on the function. For AMDGPU, the scavenger is used for large frames where part of the offset needs to be materialized in a register, so estimating the frame size is useful for knowing whether the scavenger is useful. llvm-svn: 357087
* [MCA] Fix -Wparentheses warning breaking the -Werror build.Andrea Di Biagio2019-03-271-1/+2
| | | | | | Waring was introduced at r357074. llvm-svn: 357085
* AMDGPU: Don't hardcode num defs for MUBUF instructionsMatt Arsenault2019-03-271-2/+2
| | | | | | | This shouldn't change anything since the no-ret atomics are selected later. llvm-svn: 357084
* MIR: Freeze reserved regs after parsing everythingMatt Arsenault2019-03-271-3/+8
| | | | | | | | | | | | The AMDGPU implementation of getReservedRegs depends on MachineFunctionInfo fields that are parsed from the YAML section. This was reserving the wrong register since it was setting the reserved regs before parsing the correct one. Some tests were relying on the default reserved set for the assumed default calling convention. llvm-svn: 357083
* AMDGPU: wave_barrier is not isBarrierMatt Arsenault2019-03-271-1/+0
| | | | | | | This is not a control flow instruction, so should not be marked as isBarrier. This fixes a verifier error if followed by unreachable. llvm-svn: 357081
* [BPF] use std::map to ensure consistent outputYonghong Song2019-03-271-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | The .BTF.ext FuncInfoTable and LineInfoTable contain information organized per ELF section. Current definition of FuncInfoTable/LineInfoTable is: std::unordered_map<uint32_t, std::vector<BTFFuncInfo>> FuncInfoTable std::unordered_map<uint32_t, std::vector<BTFLineInfo>> LineInfoTable where the key is the section name off in the string table. The unordered_map may cause the order of section output different for different platforms. The same for unordered map definition of std::unordered_map<std::string, std::unique_ptr<BTFKindDataSec>> DataSecEntries where BTF_KIND_DATASEC entries may have different ordering for different platforms. This patch fixed the issue by using std::map. Test static-var-derived-type.ll is modified to generate two DataSec's which will ensure the ordering is the same for all supported platforms. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 357077
* [MCA][Pipeline] Don't visit stages in reverse order when calling method ↵Andrea Di Biagio2019-03-271-3/+3
| | | | | | | | | | cycleEnd(). NFCI There is no reason why stages should be visited in reverse order. This patch allows the definition of stages that push instructions forward from their cycleEnd() routine. llvm-svn: 357074
* AMDGPU: Fix areLoadsFromSameBasePtr for DS atomicsMatt Arsenault2019-03-271-4/+11
| | | | | | The offset operand index is different for atomics. llvm-svn: 357073
* [DAGCombiner] Unify Lifetime and memory Op aliasing.Nirav Dave2019-03-272-79/+120
| | | | | | | | | | | | | | | | | | | Rework BaseIndexOffset and isAlias to fully work with lifetime nodes and fold in lifetime alias analysis. This is mostly NFC. Reviewers: courbet Reviewed By: courbet Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59794 llvm-svn: 357070
* [DAGCombine] Refactor GatherAllAliases. NFCI.Nirav Dave2019-03-271-65/+66
| | | | llvm-svn: 357069
* Re-commit r355490 "[CodeGen] Omit range checks from jump tables when ↵Hans Wennborg2019-03-272-55/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | lowering switches with unreachable default" Original commit by Ayonam Ray. This commit adds a regression test for the issue discovered in the previous commit: that the range check for the jump table can only be omitted if the fall-through destination of the jump table is unreachable, which isn't necessarily true just because the default of the switch is unreachable. This addresses the missing optimization in PR41242. > During the lowering of a switch that would result in the generation of a > jump table, a range check is performed before indexing into the jump > table, for the switch value being outside the jump table range and a > conditional branch is inserted to jump to the default block. In case the > default block is unreachable, this conditional jump can be omitted. This > patch implements omitting this conditional branch for unreachable > defaults. > > Differential Revision: https://reviews.llvm.org/D52002 > Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev llvm-svn: 357067
OpenPOWER on IntegriCloud