summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* Switch lowering: exploit unreachable fall-through when lowering case range ↵Hans Wennborg2019-03-292-3/+23
| | | | | | | | | | | | | | | | | | | | cluster In the example below, we would previously emit two range checks, one for cases 1--3 and one for 4--6. This patch makes us exploit the fact that the fall-through is unreachable and only one range check is necessary. switch i32 %i, label %default [ i32 1, label %bb1 i32 2, label %bb1 i32 3, label %bb1 i32 4, label %bb2 i32 5, label %bb2 i32 6, label %bb2 ] default: unreachable llvm-svn: 357252
* [AMDGPU][MC] Corrected handling of tied src for atomic return MUBUF opcodesDmitry Preobrazhensky2019-03-291-7/+7
| | | | | | | | | | See bug 40917: https://bugs.llvm.org/show_bug.cgi?id=40917 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D59878 llvm-svn: 357249
* [MCA] Add an experimental MicroOpQueue stage.Andrea Di Biagio2019-03-293-0/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds an experimental stage named MicroOpQueueStage. MicroOpQueueStage can be used to simulate a hardware micro-op queue (basically, a decoupling queue between 'decode' and 'dispatch'). Users can specify a queue size, as well as a optional MaxIPC (which - in the absence of a "Decoders" stage - can be used to simulate a different throughput from the decoders). This stage is added to the default pipeline between the EntryStage and the DispatchStage only if PipelineOption::MicroOpQueue is different than zero. By default, llvm-mca sets PipelineOption::MicroOpQueue to the value of hidden flag -micro-op-queue-size. Throughput from the decoder can be simulated via another hidden flag named -decoder-throughput. That flag allows us to quickly experiment with different frontend throughputs. For targets that declare a loop buffer, flag -decoder-throughput allows users to do multiple runs, each time simulating a different throughput from the decoders. This stage can/will be extended in future. For example, we could add a "buffer full" event to notify bottlenecks caused by backpressure. flag -decoder-throughput would probably go away if in future we delegate to another stage (DecoderStage?) the simulation of a (potentially variable) throughput from the decoders. For now, flag -decoder-throughput is "good enough" to run some simple experiments. Differential Revision: https://reviews.llvm.org/D59928 llvm-svn: 357248
* AMDGPU: Make sram-ecc off by default for Vega20Konstantin Zhuravlyov2019-03-291-1/+0
| | | | | | Differential Revision: https://reviews.llvm.org/D59718 llvm-svn: 357247
* [X86] Add X86TargetLowering::isCommutativeBinOp override.Simon Pilgrim2019-03-292-0/+13
| | | | | | We currently just have test coverage for PMULUDQ - will add more in the future. llvm-svn: 357244
* [SLP] Add support for swapping icmp/fcmp predicates to permit vectorizationSimon Pilgrim2019-03-291-9/+17
| | | | | | | | We should be able to match elements with the swapped predicate as well - as long as we commute the source operands. Differential Revision: https://reviews.llvm.org/D59956 llvm-svn: 357243
* [PowerPC] Add the support for __builtin_setrnd()Kang Zhang2019-03-292-0/+140
| | | | | | | | | | | | | | | | | | Summary: PowerPC64/PowerPC64le supports the builtin function __builtin_setrnd to set the floating point rounding mode. This function will use the least significant two bits of integer argument to set the floating point rounding mode. double __builtin_setrnd(int mode); The effective values for mode are: 0 - round to nearest 1 - round to zero 2 - round to +infinity 3 - round to -infinity Note that the mode argument will modulo 4, so if the int argument is greater than 3, it will only use the least significant two bits of the mode. Namely, builtin_setrnd(102)) is equal to builtin_setrnd(2). Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D59405 llvm-svn: 357241
* [ScheduleDAG] Move `Topo` and `addEdge` to base class.Clement Courbet2019-03-294-37/+29
| | | | | | | | | Some DAG mutations can only be applied to `ScheduleDAGMI`, and have to internally cast a `ScheduleDAGInstrs` to `ScheduleDAGMI`. There is nothing actually specific to `ScheduleDAGMI` in `Topo`. llvm-svn: 357239
* AMDGPU/GlobalISel: Insert waterfall loop for vector indexingMatt Arsenault2019-03-292-0/+174
| | | | | | | | The register index can only really be an SGPR. Lie that a VGPR index is legal, and then rewrite the instruction in a waterfall loop to handle the index. llvm-svn: 357235
* [PowerPC] Strength reduction of multiply by a constant by shift and add/sub ↵Zi Xuan Wu2019-03-292-0/+87
| | | | | | | | | | | | | | | | | | | | | | | in place A shift and add/sub sequence combination is faster in place of a multiply by constant. Because the cycle or latency of multiply is not huge, we only consider such following worthy patterns. ``` (mul x, 2^N + 1) => (add (shl x, N), x) (mul x, -(2^N + 1)) => -(add (shl x, N), x) (mul x, 2^N - 1) => (sub (shl x, N), x) (mul x, -(2^N - 1)) => (sub x, (shl x, N)) ``` And the cycles or latency is subtarget-dependent so that we need consider the subtarget to determine to do or not do such transformation. Also data type is considered for different cycles or latency to do multiply. Differential Revision: https://reviews.llvm.org/D58950 llvm-svn: 357233
* Revert Recommit "[DSE] Preserve basic block ordering using OrderedBasicBlock."Florian Hahn2019-03-292-54/+39
| | | | | | | | | | | | | | Another buildbot failure http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/20402 clang-9: /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/llvm/include/llvm/ADT/DenseMap.h:1228: llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::value_type* llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::operator->() const [with KeyT = const llvm::Instruction*; ValueT = unsigned int; KeyInfoT = llvm::DenseMapInfo<const llvm::Instruction*>; Bucket = llvm::detail::DenseMapPair<const llvm::Instruction*, unsigned int>; bool IsConst = false; llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::pointer = llvm::detail::DenseMapPair<const llvm::Instruction*, unsigned int>*; llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::value_type = llvm::detail::DenseMapPair<const llvm::Instruction*, unsigned int>]: Assertion `isHandleInSync() && "invalid iterator access!"' failed. 0. Program arguments: /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/stage1.install/bin/clang-9 -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -disable-free -main-file-name ArchiveCommandLine.cpp -mrelocation-model static -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu skylake-avx512 -dwarf-column-info -debugger-tuning=gdb -momit-leaf-frame-pointer -coverage-notes-file /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/sandbox/build/MultiSource/Benchmarks/7zip/Output/ArchiveCommandLine.llvm.gcno -resource-dir /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/stage1.install/lib/clang/9.0.0 -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/sandbox/build/MultiSource/Benchmarks/7zip -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/include -I ../../../include -D _GNU_SOURCE -D __STDC_LIMIT_MACROS -D NDEBUG -D BREAK_HANDLER -D UNICODE -D _UNICODE -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/C -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/CPP/myWindows -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/CPP/include_windows -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/CPP -I . -D _FILE_OFFSET_BITS=64 -D _LARGEFILE_SOURCE -D NDEBUG -D _REENTRANT -D ENV_UNIX -D _7ZIP_LARGE_PAGES -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/x86_64-linux-gnu/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/x86_64-linux-gnu/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/backward -internal-isystem /usr/local/include -internal-isystem /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/stage1.install/lib/clang/9.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O3 -std=gnu++98 -fdeprecated-macro -fdebug-compilation-dir /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/sandbox/build/MultiSource/Benchmarks/7zip -ferror-limit 19 -fmessage-length 0 -pthread -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -vectorize-loops -vectorize-slp -o Output/ArchiveCommandLine.llvm.o -x c++ /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/CPP/7zip/UI/Common/ArchiveCommandLine.cpp -faddrsig This reverts r357222 (git commit 64cccfcc72c44ea62f441b782d2177a90912769a) llvm-svn: 357227
* [WebAssembly] Merge used feature sets, update atomics linkage policyThomas Lively2019-03-297-62/+156
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: It does not currently make sense to use WebAssembly features in some functions but not others, so this CL adds an IR pass that takes the union of all used feature sets and applies it to each function in the module. This allows us to prevent atomics from being lowered away if some function has opted in to using them. When atomics is not enabled anywhere, we detect whether there exists any atomic operations or thread local storage that would be stripped and disallow linking with objects that contain atomics if and only if atomics or tls are stripped. When atomics is enabled, mark it as used but do not require it of other objects in the link. These changes allow libraries that do not use atomics to be built once and linked into both single-threaded and multithreaded binaries. Reviewers: aheejin, sbc100, dschuff Subscribers: jgravelle-google, hiraditya, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59625 llvm-svn: 357226
* Recommit "[DSE] Preserve basic block ordering using OrderedBasicBlock."Florian Hahn2019-03-282-39/+54
| | | | | | | | Recommitting after addressing a buildbot failure. This reverts commit c87869ebea000dd6483de7c7451cb36c1d36f866. llvm-svn: 357222
* [LSR] Fix signed overflow in GenerateCrossUseConstantOffsets.Florian Hahn2019-03-281-4/+10
| | | | | | | | | | | | | | | For the attached test case, unchecked addition of immediate starts and ends overflows, as they can be arbitrary i64 constants. Proof: https://rise4fun.com/Alive/Plqc Reviewers: qcolombet, gilr, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D59218 llvm-svn: 357217
* [BPF] add proper multi-dimensional array supportYonghong Song2019-03-282-35/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For multi-dimensional array like below int a[2][3]; the previous implementation generates BTF_KIND_ARRAY type like below: . element_type: int . index_type: unsigned int . number of elements: 6 This is not the best way to represent arrays, esp., when converting BTF back to headers and users will see int a[6]; instead. This patch generates proper support for multi-dimensional arrays. For "int a[2][3]", the two BTF_KIND_ARRAY types will be generated: Type #n: . element_type: int . index_type: unsigned int . number of elements: 3 Type #(n+1): . element_type: #n . index_type: unsigned int . number of elements: 2 The linux kernel already supports such a multi-dimensional array representation properly. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D59943 llvm-svn: 357215
* [MC] Fix floating-point literal lexing.Eli Friedman2019-03-282-11/+17
| | | | | | | | | | | | | | | | | | | | | This patch has three related fixes to improve float literal lexing: 1. Make AsmLexer::LexDigit handle floats without a decimal point more consistently. 2. Make AsmLexer::LexFloatLiteral print an error for floats which are apparently missing an "e". 3. Make APFloat::convertFromString use binutils-compatible exponent parsing. Together, this fixes some cases where a float would be incorrectly rejected, fixes some cases where the compiler would crash, and improves diagnostics in some cases. Patch by Brandon Jones. Differential Revision: https://reviews.llvm.org/D57321 llvm-svn: 357214
* [SelectionDAGBuilder] Fix 80 column violation. NFCCraig Topper2019-03-281-1/+2
| | | | llvm-svn: 357213
* [InterleavedAccessPass] Don't increase the number of bytes loaded.Eli Friedman2019-03-281-3/+9
| | | | | | | | | | | | | | | | | Even if the interleaving transform would otherwise be legal, we shouldn't introduce an interleaved load that is wider than the original load: it might have undefined behavior. It might be possible to perform some sort of mask-narrowing transform in some cases (using a narrower interleaved load, then extending the results using shufflevectors). But I haven't tried to implement that, at least for now. Fixes https://bugs.llvm.org/show_bug.cgi?id=41245 . Differential Revision: https://reviews.llvm.org/D59954 llvm-svn: 357212
* Revert [DSE] Preserve basic block ordering using OrderedBasicBlock.Florian Hahn2019-03-282-53/+39
| | | | | | | | | | | | This reverts r357208 (git commit c0bfd37d385c93711ef3a349599dba20e6b101ef) This causes a buildbot failure: http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/16124 FAILED: lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o /home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/install/stage2/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/IR -I/home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/llvm.src/lib/IR -Iinclude -I/home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/llvm.src/include -fPIC -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -flto=thin -O3 -UNDEBUG -fno-exceptions -fno-rtti -MD -MT lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o -MF lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o.d -o lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o -c /home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/llvm.src/lib/IR/IRBuilder.cpp clang-9: /home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/llvm.src/lib/Analysis/OrderedBasicBlock.cpp:38: bool llvm::OrderedBasicBlock::comesBefore(const llvm::Instruction *, const llvm::Instruction *): Assertion `!(LastInstFound == BB->end() && NextInstPos != 0) && "Instruction supposed to be in NumberedInsts"' failed. llvm-svn: 357211
* [DSE] Preserve basic block ordering using OrderedBasicBlock.Florian Hahn2019-03-282-39/+53
| | | | | | | | | | | | | | | | By extending OrderedBB to allow removing and replacing cached instructions, we can preserve OrderedBBs in DSE easily. This eliminates one source of quadratic compile time in DSE. Fixes PR38829. Reviewers: rnk, efriedma, hfinkel Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D59789 llvm-svn: 357208
* [MemDepAnalysis] Allow caller to pass in an OrderedBasicBlock.Florian Hahn2019-03-281-14/+19
| | | | | | | | | | | | | If the caller can preserve the OBB, we can avoid recomputing the order for each getDependency call. Reviewers: efriedma, rnk, hfinkel Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D59788 llvm-svn: 357206
* Temporarily revert "SafepointIRVerifier port to new Pass Manager"Adrian Prantl2019-03-283-13/+0
| | | | | | | | | | | to unbreak the modular bots and its follow-up commit. This reverts commit https://reviews.llvm.org/D59825 because it introduced a fatal error: cyclic dependency in module 'LLVM_intrinsic_gen': LLVM_intrinsic_gen -> LLVM_IR -> LLVM_intrinsic_gen llvm-svn: 357201
* [X86] Teach the isel optimization for (x << C1) op C2 to (x op (C2>>C1)) << ↵Craig Topper2019-03-281-23/+29
| | | | | | | | | | | | C1 to consider cases where C2>>C1 can fit an unsigned 32-bit immediate For 64-bit operations we should consider if the immediate can be made to fit in an unsigned 32-bits immedate. For OR/XOR this allows us to load the immediate with MOV32ri instead of movabsq. For AND this allows us to fold the immediate. Differential Revision: https://reviews.llvm.org/D59867 llvm-svn: 357196
* Delay initialization of three static global maps, NFCReid Kleckner2019-03-282-79/+79
| | | | | | | This avoids allocating a few KB of heap memory on startup, and instead allocates these maps lazily. I noticed this while profiling LLD. llvm-svn: 357192
* Make helper functions static. NFC.Benjamin Kramer2019-03-281-2/+2
| | | | llvm-svn: 357187
* [MIPS GlobalISel] Select float constantsPetar Avramovic2019-03-283-4/+71
| | | | | | | | Select 32 and 64 bit float constants for MIPS32. Differential Revision: https://reviews.llvm.org/D59933 llvm-svn: 357183
* [DAG] Fix Lifetime Node ID hashing.Nirav Dave2019-03-281-0/+7
| | | | llvm-svn: 357179
* [DAGCombiner] fold sext into negationSanjay Patel2019-03-281-0/+10
| | | | | | | | | | | | | | As noted in D59818: %z = zext i8 %x to i32 %neg = sub i32 0, %z %r = sext i32 %neg to i64 => %z2 = zext i8 %x to i64 %r = sub i64 0, %z2 https://rise4fun.com/Alive/KzSR llvm-svn: 357178
* [x86] avoid cmov in movmsk reductionSanjay Patel2019-03-281-19/+21
| | | | | | | | | | | | | | | | | | | | | | This is probably the least important of our movmsk problems, but I'm starting at the bottom to reduce distractions. We were creating a select_cc which bypasses the select and bitmask codegen optimizations that we have now. If we produce a compare+negate instead, we allow things like neg/sbb carry bit hacks, and in all cases we avoid a cmov. There's no partial register update danger in these sequences because we always produce the zero-register xor ahead of the 'set' if needed. There seems to be a missing fold for sext of a bool bit here: negl %ecx movslq %ecx, %rax ...but that's an independent transform. Differential Revision: https://reviews.llvm.org/D59818 llvm-svn: 357172
* [X86MacroFusion] Handle branch fusion (AMD CPUs).Clement Courbet2019-03-285-56/+112
| | | | | | | | | | | | | | | | | | Summary: This adds a BranchFusion feature to replace the usage of the MacroFusion for AMD CPUs. See D59688 for context. Reviewers: andreadb, lebedev.ri Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59872 llvm-svn: 357171
* AMDGPU: Make exec mask optimzations more resistant to block splitsMatt Arsenault2019-03-283-22/+84
| | | | | | | Also improve the check for SALU instructions to also ignore implicit_def and other fake instructions. llvm-svn: 357170
* [X86] AMD Piledriver (BdVer2): fine-tune some latenciesRoman Lebedev2019-03-281-28/+50
| | | | | | | | | | | | | | Based on llvm-exegesis measurements. Now that llvm-exegesis is ~2 magnitudes faster, and is a bit smarter, it is now possible to continue cleanup of the scheduler model. With this, there are no more latency inconsistencies for the opcodes that produce stable measurements, and only a few inconsistencies for unstable measurements (MMX_* opcodes, opcodes that llvm-exegesis measures by chaining - CMP, TEST, BT, SETcc, CVT, MOV, etc.) llvm-svn: 357169
* [NFC] Format InlineFeatureIgnoreList.Clement Courbet2019-03-281-51/+52
| | | | | | To avoid more spurious clang-format changes when adding features (D59872). llvm-svn: 357168
* [DAGCombiner] Fold truncate(build_vector(x,y)) -> ↵Simon Pilgrim2019-03-281-1/+15
| | | | | | | | | | | | build_vector(truncate(x),truncate(y)) If scalar truncates are free, attempt to pre-truncate build_vectors source operands. Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering. Differential Revision: https://reviews.llvm.org/D59654 llvm-svn: 357161
* [yaml2obj][obj2yaml] - Teach yaml2obj/obj2yaml tools about STB_GNU_UNIQUE ↵George Rimar2019-03-281-2/+3
| | | | | | | | | | | | | symbols. yaml2obj/obj2yaml does not support the symbols with STB_GNU_UNIQUE yet. Currently, obj2yaml fails with llvm_unreachable when met such a symbol. I faced it when investigated the https://bugs.llvm.org/show_bug.cgi?id=41196. Differential revision: https://reviews.llvm.org/D59875 llvm-svn: 357158
* [asan] Add options -asan-detect-invalid-pointer-cmp and ↵Pierre Gousseau2019-03-281-6/+31
| | | | | | | | | | | | | -asan-detect-invalid-pointer-sub options. This is in preparation to a driver patch to add gcc 8's -fsanitize=pointer-compare and -fsanitize=pointer-subtract. Disabled by default as this is still an experimental feature. Reviewed By: morehouse, vitalybuka Differential Revision: https://reviews.llvm.org/D59220 llvm-svn: 357157
* [VPlan] Determine Vector Width programmatically.Florian Hahn2019-03-282-20/+37
| | | | | | | | | | | | | | With this change, the VPlan native path is triggered with the directive: #pragma clang loop vectorize(enable) There is no need to specify the vectorize_width(N) clause. Patch by Francesco Petrogalli <francesco.petrogalli@arm.com> Differential Revision: https://reviews.llvm.org/D57598 llvm-svn: 357156
* [X85][AVX] Add missing vXi16 broadcast fold patternsSimon Pilgrim2019-03-282-0/+24
| | | | | | | | Now that D59484 has landed its easier to add these. Added missing AVX512BW v32i16 equivalents while I was at it. llvm-svn: 357155
* [ARM GlobalISel] Fix G_STORE with s1Diana Picus2019-03-281-0/+18
| | | | | | | G_STORE for 1-bit values uses a STRBi12, which stores the whole byte. Zero out the undefined bits before writing. llvm-svn: 357154
* [ARM GlobalISel] Fix selection of G_SELECTDiana Picus2019-03-281-5/+3
| | | | | | | | | | | G_SELECT uses a 1-bit scalar for the condition, and is currently implemented with a plain CMPri against 0. This means that values such as 0x1110 are interpreted as true, when instead the higher bits should be treated as undefined and therefore ignored. Replace the CMPri with a TSTri against 0x1, which performs an implicit AND, yielding the expected result. llvm-svn: 357153
* SafepointIRVerifier port to new Pass ManagerSerguei Katkov2019-03-283-0/+13
| | | | | | | | | | | Straightforward port of StatepointIRVerifier pass to new Pass Manager framework. Reviewers: fedor.sergeev, reames Reviewed By: fedor.sergeev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D59825 llvm-svn: 357147
* [WebAssembly] Rename wasm fixup kindsSam Clegg2019-03-285-18/+13
| | | | | | | | | | | These fixup kinds are not explicitly related to the code section. They are there to signal how to apply the fixup. Also, a couple of other minor wasm cleanups. Differential Revision: https://reviews.llvm.org/D59908 llvm-svn: 357145
* [NewPM] Fix a nasty bug with analysis invalidation in the new PM.Chandler Carruth2019-03-281-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The issue here is that we actually allow CGSCC passes to mutate IR (and therefore invalidate analyses) outside of the current SCC. At a minimum, we need to support mutating parent and ancestor SCCs to support the ArgumentPromotion pass which rewrites all calls to a function. However, the analysis invalidation infrastructure is heavily based around not needing to invalidate the same IR-unit at multiple levels. With Loop passes for example, they don't invalidate other Loops. So we need to customize how we handle CGSCC invalidation. Doing this without gratuitously re-running analyses is even harder. I've avoided most of these by using an out-of-band preserved set to accumulate the cross-SCC invalidation, but it still isn't perfect in the case of re-visiting the same SCC repeatedly *but* it coming off the worklist. Unclear how important this use case really is, but I wanted to call it out. Another wrinkle is that in order for this to successfully propagate to function analyses, we have to make sure we have a proxy from the SCC to the Function level. That requires pre-creating the necessary proxy. The motivating test case now works cleanly and is added for ArgumentPromotion. Thanks for the review from Philip and Wei! Differential Revision: https://reviews.llvm.org/D59869 llvm-svn: 357137
* [ARM] Remove dead function ARMMCCodeEmitter::getSOImmOpValueSam Clegg2019-03-271-34/+0
| | | | | | | | | The last reference to this function was removed from the ARM td files in 2015 in rL225266. Differential Revision: https://reviews.llvm.org/D59868 llvm-svn: 357130
* [x86] improve AVX lowering of vector zextSanjay Patel2019-03-271-0/+20
| | | | | | | | | | | | | | | | | | | | | If we know the 2 halves of an oversized zext-in-reg are the same, don't create those halves independently. I tried several different approaches to fold this, but it's difficult to get right during legalization. In the default path, we are creating a generic shuffle that looks like an unpack high, but it can get transformed into a different mask (a blend), so it's not straightforward to match that. If we try to fold after it actually becomes an X86ISD::UNPCKH node, we can't be sure what the operand node is - it might be a generic shuffle, or it could be some x86-specific op. From the test output, we should be doing something like this for SSE4.1 as well, but I'd rather leave that as a follow-up since it involves changing lowering actions. Differential Revision: https://reviews.llvm.org/D59777 llvm-svn: 357129
* [x86] look through bitcast operand of MOVMSKSanjay Patel2019-03-271-6/+5
| | | | | | | | | This is not exactly NFC because it should make further combines of MOVMSK easier to match, but there should be no outward differences because we have isel patterns in place specifically to allow this. See: // Also support integer VTs to avoid a int->fp bitcast in the DAG. llvm-svn: 357128
* [X86ISelDAGToDAG] Move initialization of OptForSize and OptForMinSize from ↵Craig Topper2019-03-271-5/+7
| | | | | | | | PreprocessISelDAG to runOnMachineFunction. NFCI This makes more sense as a place to initialize these. I don't think runOnMachineFunction was overriden when these cached values were originally created. llvm-svn: 357123
* [DAGCombiner] Teach TokenFactor pruning to peek through lifetime nodesNirav Dave2019-03-271-0/+2
| | | | | | | | | | | | | | Summary: Lifetime nodes were inhibiting TokenFactor simplification inhibiting chain-based optimizations. Reviewers: courbet, jyknight Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59897 llvm-svn: 357121
* [LegalizeVectorTypes] Allow single loads and stores for more short vectorsJustin Bogner2019-03-272-7/+14
| | | | | | | | | | | | | | | | | | | When lowering a load or store for TypeWidenVector, the type legalizer would use a single load or store if the associated integer type was legal or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable. (See https://reviews.llvm.org/rL236528 for reference.) This applies that behaviour to vector types. If the vector type is TypePromoteInteger, the element type is going to be TypePromoteInteger as well, which will lead to have a single promoting load rather than N individual promoting loads. For instance, if we have a v3i1, we would now have a load of v4i1 instead of 3 loads of i1. Patch by Guillaume Marques. Thanks! Differential Revision: https://reviews.llvm.org/D56201 llvm-svn: 357120
* [WebAssembly] Add some whitespace to WebAssemblyFixIrreducibleControlFlowAlon Zakai2019-03-271-0/+2
| | | | | | | Differential Revision: https://reviews.llvm.org/D59855 modified: llvm/lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp llvm-svn: 357117
OpenPOWER on IntegriCloud