summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [X86][AVX] Add lowerVectorShuffleAsLanePermuteAndPermute for v4f64 shuffles ↵Simon Pilgrim2018-10-131-0/+60
| | | | | | | | | | | | | | (PR39161) Add shuffle lowering for the case where we can shuffle the lanes into place followed by an in-lane permute. This is mainly for cases where we can have non-repeating permutes in each lane, but for now I've just enabled it for v4f64 unary shuffles to fix PR39161 - there is no test coverage for other shuffles that might benefit yet. We now have several cross-lane shuffle lowering methods that all do something similar - I've looked at merging some of these (notably by making the repeated mask mechanism in lowerVectorShuffleByMerging128BitLanes optional), but there is a lot of assertions/assumptions in the way that makes this tricky - I ended up going for adding yet another relatively simple method instead. Differential Revision: https://reviews.llvm.org/D53148 llvm-svn: 344446
* [AArch64] Swap comparison operands if that enables some folding.Arnaud A. de Grandmaison2018-10-131-12/+74
| | | | | | | | | | | | | | | | Summary: AArch64 can fold some shift+extend operations on the RHS operand of comparisons, so swap the operands if that makes sense. This provides a fix for https://bugs.llvm.org/show_bug.cgi?id=38751 Reviewers: efriedma, t.p.northover, javed.absar Subscribers: mcrosier, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53067 llvm-svn: 344439
* [WebAssembly] SIMD min and maxThomas Lively2018-10-131-7/+7
| | | | | | | | | | | | Summary: Depends on D52324 and D52764. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52325 llvm-svn: 344438
* [Intrinsic] Add llvm.minimum and llvm.maximum instrinsic functionsThomas Lively2018-10-131-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | Summary: These new intrinsics have the semantics of the `minimum` and `maximum` operations specified by the latest draft of IEEE 754-2018. Unlike llvm.minnum and llvm.maxnum, these new intrinsics propagate NaNs and always treat -0.0 as less than 0.0. `minimum` and `maximum` lower directly to the existing `fminnan` and `fmaxnan` ISel DAG nodes. It is safe to reuse these DAG nodes because before this patch were only emitted in situations where there were known to be no NaN arguments or where NaN propagation was correct and there were known to be no zero arguments. I know of only four backends that lower fminnan and fmaxnan: WebAssembly, ARM, AArch64, and SystemZ, and each of these lowers fminnan and fmaxnan to instructions that are compatible with the IEEE 754-2018 semantics. Reviewers: aheejin, dschuff, sunfish, javed.absar Subscribers: kristof.beyls, dexonsmith, kristina, llvm-commits Differential Revision: https://reviews.llvm.org/D52764 llvm-svn: 344437
* [WebAssembly][NFC] Unify ARGUMENT classesThomas Lively2018-10-135-45/+36
| | | | | | | | | | Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53172 llvm-svn: 344436
* move GetOrCreateFunctionComdat to Instrumentation.cpp/Instrumentation.hKostya Serebryany2018-10-122-18/+19
| | | | | | | | | | | | | | | | | | Summary: GetOrCreateFunctionComdat is currently used in SanitizerCoverage, where it's defined. I'm planing to use it in HWASAN as well, so moving it into a common location. NFC Reviewers: morehouse Reviewed By: morehouse Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53218 llvm-svn: 344433
* [RISCV] Eliminate unnecessary masking of promoted shift amountsAlex Bradbury2018-10-121-3/+20
| | | | | | | | | | | | | | | | SelectionDAGBuilder::visitShift will always zero-extend a shift amount when it is promoted to the ShiftAmountTy. This results in zero-extension (masking) which is unnecessary for RISC-V as the shift operations only read the lower 5 or 6 bits (RV32 or RV64). I initially proposed adding a getExtendForShiftAmount hook so the shift amount can be any-extended (D52975). @efriedma explained this was unsafe, so I have instead eliminate the unnecessary and operations at instruction selection time in a manner similar to X86InstrCompiler.td. Differential Revision: https://reviews.llvm.org/D53224 llvm-svn: 344432
* [LegalizeVectorTypes] Use TLI.getVectorIdxTy instead of DAG.getIntPtrConstant.Craig Topper2018-10-121-2/+3
| | | | | | There's no guarantee that vector indices should use pointer types. So use the correct query method. llvm-svn: 344428
* [X86] Improve type legalization of (v2i32/v4i16/v8i16 (bitcast (v2f32))) to ↵Craig Topper2018-10-121-7/+13
| | | | | | avoid a stack stack temporary. llvm-svn: 344425
* [X86] Simplify the end of custom type legalization for (v2i32/v4i16/v8i8 ↵Craig Topper2018-10-121-7/+3
| | | | | | | | (bitcast (f64))) by just emitting an EXTRACT_SUBVECTOR instead of a BUILD_VECTOR. Generic legalization should be able to finish legalizing the EXTRACT_SUBVECTOR probably by turning it into a BUILD_VECTOR. But we should emit the simplest sequence. llvm-svn: 344424
* [X86] Skip (v2i32/v4i16/v8i8 (bitcast (f64))) handling in ReplaceNodeResults ↵Craig Topper2018-10-121-8/+2
| | | | | | | | if the dest type can be widened by generic legalization. NFCI The algorithm we would do previously was identical to generic legalization. If we ever switch to legalizing integer vectors via widening we'll be able to kill off the code since it now only runs for promotion. llvm-svn: 344423
* [LegalizeVectorTypes] When widening the result of a bitcast from a scalar ↵Craig Topper2018-10-121-14/+12
| | | | | | | | | | type, use a scalar_to_vector to turn the scalar into a vector intead of a build vector full of mostly undefs. This is more consistent with what we usually do and matches some code X86 custom emits in some cases that I think I can cleanup. The MIPS test change just looks to be an instruction ordering change. llvm-svn: 344422
* Revert BTF commit series.Eli Friedman2018-10-1214-1072/+4
| | | | | | | | | | The initial patch was not reviewed, and does not have any tests; it should not have been merged. This reverts 344395, 344390, 344387, 344385, 344381, 344376, and 344366. llvm-svn: 344405
* [LegalizeVectorTypes] When widening the operands to a concat_vectors, see if ↵Craig Topper2018-10-121-5/+16
| | | | | | | | | | | | we can use the widened operand 0 if the width matches and the other operands are undef. This saves a conversion to extracts and build_vector. We already do this when both the result and the input need to be widened to the same type. This changed the sse-intrinsics-fast-isel test because we don't lower (insert_vector_elt (scalar_to_vector X), Y, 1) well. We turn it into (vector_shuffle (scalar_to_vector X), (scalar_to_vector Y), <0, 4, 2, 3>) losing track of the fact that the upper elts could be undef. We should probably find a way to prevent the scalarization of the <2 x f32> load on these tests. llvm-svn: 344404
* [LegalizeVectorTypes] When unrolling in WidenVecRes_Convert, make sure we ↵Craig Topper2018-10-121-12/+6
| | | | | | | | | | use the original vector element count. Not min of the widened result type and the possibly widened input type. If the input type is widened as well, but we still were forced to unroll, we shouldn't be considering the widened input element count. We should only create as many scalar operations as the original type called for. This will be important for an upcoming patch. llvm-svn: 344403
* Replace assert() with llvm_unreachable because it's obviously a typo.Rui Ueyama2018-10-121-1/+1
| | | | llvm-svn: 344395
* [codeview] Emit S_BUILDINFO and LF_BUILDINFO with cwd and source fileReid Kleckner2018-10-122-0/+50
| | | | | | | | | | | | Summary: We can fill in the command line and compiler path later if we want. Reviewers: zturner Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D53179 llvm-svn: 344393
* [SanitizerCoverage] Prevent /OPT:REF from stripping constructorsJonathan Metzman2018-10-121-0/+21
| | | | | | | | | | | | | | | | | | | | | Summary: Linking with the /OPT:REF linker flag when building COFF files causes the linker to strip SanitizerCoverage's constructors. Prevent this by giving the constructors WeakODR linkage and by passing the linker a directive to include sancov.module_ctor. Include a test in compiler-rt to verify libFuzzer can be linked using /OPT:REF Reviewers: morehouse, rnk Reviewed By: morehouse, rnk Subscribers: rnk, morehouse, hiraditya Differential Revision: https://reviews.llvm.org/D52119 llvm-svn: 344391
* [BPF] Use cstdint {,u}int*_t instead of linux/types.h __u32 __u16 ...Fangrui Song2018-10-122-9/+9
| | | | llvm-svn: 344387
* Disambiguate: s/make_unique/llvm::make_unique/. NFCEric Liu2018-10-121-13/+13
| | | | llvm-svn: 344385
* [BPF] Don't include linux/types.h and fix styleFangrui Song2018-10-126-192/+184
| | | | llvm-svn: 344381
* Better support for POSIX paths in PDBs.Zachary Turner2018-10-121-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This a resubmission of a patch which was previously reverted due to breaking several lld tests. The issues causing those failures have been fixed, so the patch is now resubmitted. ---Original Commit Message--- While it doesn't make a *ton* of sense for POSIX paths to be in PDBs, it's possible to occur in real scenarios involving cross compilation. The tools need to be able to handle this, because certain types of debugging scenarios are possible without a running process and so don't necessarily require you to be on a Windows system. These include post-mortem debugging and binary forensics (e.g. using a debugger to disassemble functions and examine symbols without running the process). There's changes in clang, LLD, and lldb in this patch. After this the cross-platform disassembly and source-list tests pass on Linux. Furthermore, the behavior of LLD can now be summarized by a much simpler rule than before: Unless you specify /pdbsourcepath and /pdbaltpath, the PDB ends up with paths that are valid within the context of the machine that the link is performed on. Differential Revision: https://reviews.llvm.org/D53149 llvm-svn: 344377
* [BPF] Some fixes after rL344366Fangrui Song2018-10-123-2/+5
| | | | | | | | | | * Move #include outside of namespaces * Add missing #include * Add out-of-line virtual destructor to BTFTypeEntry designated initializers should also be fixed llvm-svn: 344376
* [Support] exit with custom return code for SIGPIPENick Desaulniers2018-10-121-0/+5
| | | | | | | | | | | | | | | | | | | | | Summary: We tell the user to file a bug report on LLVM right now, and SIGPIPE isn't LLVM's fault so our error message is wrong. Allows frontends to detect SIGPIPE from writing to closed readers. This can be seen commonly from piping into head, tee, or split. Fixes PR25349, rdar://problem/14285346, b/77310947 Reviewers: jfb Reviewed By: jfb Subscribers: majnemer, kristina, llvm-commits, thakis, srhines Differential Revision: https://reviews.llvm.org/D53000 llvm-svn: 344372
* [BPF] Add BTF generation for BPF targetYonghong Song2018-10-1214-1/+1074
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BTF is the debug format for BPF, a kernel virtual machine and widely used for tracing, networking and security, etc ([1]). Currently only instruction streams are passed to kernel, the kernel verifier verifies them before execution. In order to provide better visibility of bpf programs to user space tools, some debug information, e.g., function names and debug line information are desirable for kernel so tools can get such information with better annotation for jited instructions for performance or other reasons. The dwarf is too complicated in kernel and for BPF. Hence, BTF is designed to be the debug format for BPF ([2]). Right now, pahole supports BTF for types, which are generated based on dwarf sections in the ELF file. In order to annotate performance metrics for jited bpf insns, it is necessary to pass debug line info to the kernel. Furthermore, we want to pass the actual code to the kernel because of the following reasons: . bpf program typically is small so storage overhead should be small. . in bpf land, it is totally possible that an application loads the bpf program into the kernel and then that application quits, so holding debug info by the user space application is not practical. . having source codes directly kept by kernel would ease deployment since the original source code does not need ship on every hosts and kernel-devel package does not need to be deployed even if kernel headers are used. The only reliable time to get the source code is during compilation time. This will result in both more accurate information and easier deployment as stated in the above. Another consideration is for JIT. The project like bcc use MCJIT to compile a C program into bpf insns and load them to the kernel ([3]). The generated BTF sections will be readily available for such cases as well. This patch implemented generation of BTF info in llvm compiler. The BTF related sections will be generated when both -target bpf and -g are specified. Two sections are generated: .BTF contains all the type and string information, and .BTF.ext contains the func_info and line_info. The separation is related to how two sections are used differently in bpf loader, e.g., linux libbpf ([4]). The .BTF section can be loaded into the kernel directly while .BTF.ext needs loader manipulation before loading to the kernel. The format of the each section is roughly defined in llvm:include/llvm/MC/MCBTFContext.h and from the implementation in llvm:lib/MC/MCBTFContext.cpp. A later example also shows the contents in each section. The type and func_info are gathered during CodeGen/AsmPrinter by traversing dwarf debug_info. The line_info is gathered in MCObjectStreamer before writing to the object file. After all the information is gathered, the two sections are emitted in MCObjectStreamer::finishImpl. With cmake CMAKE_BUILD_TYPE=Debug, the compiler can dump out all the tables except insn offset, which will be resolved later as relocation records. The debug type "btf" is used for BTFContext dump. Dwarf tests the debug info generation with llvm-dwarfdump to decode the binary sections and check whether the result is expected. Currently we do not have such a tool yet. We will implement btf dump functionality in bpftool ([5]) as the bpftool is considered the recommended tool for bpf introspection. The implementation for type and func_info is tested with linux kernel test cases. The line_info is visually checked with dump from linux kernel libbpf ([4]) and checked with readelf dumping section raw data. Note that the .BTF and .BTF.ext information will not be emitted to assembly code and there is no assembler support for BTF either. In the below, with a clang/llvm built with CMAKE_BUILD_TYPE=Debug, Each table contents are shown for a simple C program. -bash-4.2$ cat -n test.c 1 struct A { 2 int a; 3 char b; 4 }; 5 6 int test(struct A *t) { 7 return t->a; 8 } -bash-4.2$ clang -O2 -target bpf -g -mllvm -debug-only=btf -c test.c Type Table: [1] FUNC name_off=1 info=0x0c000001 size/type=2 param_type=3 [2] INT name_off=12 info=0x01000000 size/type=4 desc=0x01000020 [3] PTR name_off=0 info=0x02000000 size/type=4 [4] STRUCT name_off=16 info=0x04000002 size/type=8 name_off=18 type=2 bit_offset=0 name_off=20 type=5 bit_offset=32 [5] INT name_off=22 info=0x01000000 size/type=1 desc=0x02000008 String Table: 0 : 1 : test 6 : .text 12 : int 16 : A 18 : a 20 : b 22 : char 27 : test.c 34 : int test(struct A *t) { 58 : return t->a; FuncInfo Table: sec_name_off=6 insn_offset=<Omitted> type_id=1 LineInfo Table: sec_name_off=6 insn_offset=<Omitted> file_name_off=27 line_off=34 line_num=6 column_num=0 insn_offset=<Omitted> file_name_off=27 line_off=58 line_num=7 column_num=3 -bash-4.2$ readelf -S test.o ...... [12] .BTF PROGBITS 0000000000000000 0000028d 00000000000000c1 0000000000000000 0 0 1 [13] .BTF.ext PROGBITS 0000000000000000 0000034e 0000000000000050 0000000000000000 0 0 1 [14] .rel.BTF.ext REL 0000000000000000 00000648 0000000000000030 0000000000000010 16 13 8 ...... -bash-4.2$ The latest linux kernel ([6]) can already support .BTF with type information. The [7] has the reference implementation in linux kernel side to support .BTF.ext func_info. The .BTF.ext line_info support is not implemented yet. If you have difficulty accessing [6], you can manually do the following to access the code: git clone https://github.com/yonghong-song/bpf-next-linux.git cd bpf-next-linux git checkout btf The change will push to linux kernel soon once this patch is landed. References: [1]. https://www.kernel.org/doc/Documentation/networking/filter.txt [2]. https://lwn.net/Articles/750695/ [3]. https://github.com/iovisor/bcc [4]. https://github.com/torvalds/linux/tree/master/tools/lib/bpf [5]. https://github.com/torvalds/linux/tree/master/tools/bpf/bpftool [6]. https://github.com/torvalds/linux [7]. https://github.com/yonghong-song/bpf-next-linux/tree/btf Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Differential Revision: https://reviews.llvm.org/D52950 llvm-svn: 344366
* [x86] add and use fast horizontal vector math subtarget featureSanjay Patel2018-10-123-7/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | This is the planned follow-up to D52997. Here we are reducing horizontal vector math codegen by default. AMD Jaguar (btver2) should have no difference with this patch because it has fast-hops. (If we want to set that bit for other CPUs, let me know.) The code changes are small, but there are many test diffs. For files that are specifically testing for hops, I added RUNs to distinguish fast/slow, so we can see the consequences side-by-side. For files that are primarily concerned with codegen other than hops, I just updated the CHECK lines to reflect the new default codegen. To recap the recent horizontal op story: 1. Before rL343727, we were producing hops for all subtargets for a variety of patterns. Hops were likely not optimal for all targets though. 2. The IR improvement in r343727 exposed a hole in the backend hop pattern matching, so we reduced hop codegen for all subtargets. That was bad for Jaguar (PR39195). 3. We restored the hop codegen for all targets with rL344141. Good for Jaguar, but probably bad for other CPUs. 4. This patch allows us to distinguish when we want to produce hops, so everyone can be happy. I'm not sure if we have the best predicate here, but the intent is to undo the extra hop-iness that was enabled by r344141. Differential Revision: https://reviews.llvm.org/D53095 llvm-svn: 344361
* [MC][ELF] fix newly added testNick Desaulniers2018-10-121-25/+25
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Reland of - r344197 "[MC][ELF] compute entity size for explicit sections" - r344206 "[MC][ELF] Fix section_mergeable_size.ll" after being reverted in r344278 due to build breakages from not specifying a target triple. Move test from test/CodeGen/Generic/ to test/MC/ELF/. Add explicit target triple so we don't try to run this test on non ELF targets. Reported: https://reviews.llvm.org/D53056#1261707 Reviewers: fhahn, rnk, espindola, NoQ Reviewed By: fhahn, rnk Subscribers: NoQ, MaskRay, rengolin, emaste, arichardson, llvm-commits, pirama, srhines Differential Revision: https://reviews.llvm.org/D53146 llvm-svn: 344360
* Pull out repeated value types. NFCI.Simon Pilgrim2018-10-121-6/+5
| | | | llvm-svn: 344355
* Pull out repeated value types. NFCI.Simon Pilgrim2018-10-121-3/+5
| | | | llvm-svn: 344354
* Fix unused variable warning after r344348Eric Liu2018-10-121-0/+1
| | | | llvm-svn: 344350
* [SelectionDAG] Move VectorLegalizer::ExpandCTLZ codegen into ↵Simon Pilgrim2018-10-122-24/+5
| | | | | | | | SelectionDAGLegalize Generalize SelectionDAGLegalize's CTLZ expansion to handle vectors - lets VectorLegalizer::ExpandCTLZ to just pass the expansion on instead of repeating the same codegen. llvm-svn: 344349
* [X86][SSE] LowerVectorCTPOP - pull out repeated byte sum stage. Simon Pilgrim2018-10-121-51/+30
| | | | | | | | Pull out repeated byte sum stage for popcount of vector elements > 8bits. This allows us to simplify the LUT/BITMATH popcnt code to always assume vXi8 vectors, and also improves avx512bitalg codegen which only has access to vpopcntb/vpopcntw. llvm-svn: 344348
* [PowerPC] avoid masking already-zero bits in BitPermutationSelectorHiroshi Inoue2018-10-121-15/+104
| | | | | | | | | | | | | | | The current BitPermutationSelector generates a code to build a value by tracking two types of bits: ConstZero and Variable. ConstZero means a bit we need to mask off and Variable is a bit we copy from an input value. This patch add third type of bits VariableKnownToBeZero caused by AssertZext node or zero-extending load node. VariableKnownToBeZero means a bit comes from an input value, but it is known to be already zero. So we do not need to mask them. VariableKnownToBeZero enhances flexibility to group bits, since we can avoid redundant masking for these bits. This patch also renames "HasZero" to "NeedMask" since now we may skip masking even when we have zeros (of type VariableKnownToBeZero). Differential Revision: https://reviews.llvm.org/D48025 llvm-svn: 344347
* [SanitizerCoverage] Make Inline8bit and TracePC counters dead stripping ↵Max Moroz2018-10-121-3/+4
| | | | | | | | | | | | | | | | | | | | resistant. Summary: Otherwise, at least on Mac, the linker eliminates unused symbols which causes libFuzzer to error out due to a mismatch of the sizes of coverage tables. Issue in Chromium: https://bugs.chromium.org/p/chromium/issues/detail?id=892167 Reviewers: morehouse, kcc, george.karpenkov Reviewed By: morehouse Subscribers: kubamracek, llvm-commits Differential Revision: https://reviews.llvm.org/D53113 llvm-svn: 344345
* [X86][SSE] Add extract_subvector(PSHUFB) -> PSHUFB(extract_subvector()) combineSimon Pilgrim2018-10-121-0/+12
| | | | | | | | Fixes PR32160 by reducing the size of PSHUFB if we only use one of the lanes. This approach can probably be generalized to handle any target shuffle (and any subvector index) but we have no test coverage at the moment. llvm-svn: 344336
* [tblgen][llvm-mca] Add the ability to describe move elimination candidates ↵Andrea Di Biagio2018-10-121-2/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | via tablegen. This patch adds the ability to identify instructions that are "move elimination candidates". It also allows scheduling models to describe processor register files that allow move elimination. A move elimination candidate is an instruction that can be eliminated at register renaming stage. Each subtarget can specify which instructions are move elimination candidates with the help of tablegen class "IsOptimizableRegisterMove" (see llvm/Target/TargetInstrPredicate.td). For example, on X86, BtVer2 allows both GPR and MMX/SSE moves to be eliminated. The definition of 'IsOptimizableRegisterMove' for BtVer2 looks like this: ``` def : IsOptimizableRegisterMove<[ InstructionEquivalenceClass<[ // GPR variants. MOV32rr, MOV64rr, // MMX variants. MMX_MOVQ64rr, // SSE variants. MOVAPSrr, MOVUPSrr, MOVAPDrr, MOVUPDrr, MOVDQArr, MOVDQUrr, // AVX variants. VMOVAPSrr, VMOVUPSrr, VMOVAPDrr, VMOVUPDrr, VMOVDQArr, VMOVDQUrr ], CheckNot<CheckSameRegOperand<0, 1>> > ]>; ``` Definitions of IsOptimizableRegisterMove from processor models of a same Target are processed by the SubtargetEmitter to auto-generate a target-specific override for each of the following predicate methods: ``` bool TargetSubtargetInfo::isOptimizableRegisterMove(const MachineInstr *MI) const; bool MCInstrAnalysis::isOptimizableRegisterMove(const MCInst &MI, unsigned CPUID) const; ``` By default, those methods return false (i.e. conservatively assume that there are no move elimination candidates). Tablegen class RegisterFile has been extended with the following information: - The set of register classes that allow move elimination. - Maxium number of moves that can be eliminated every cycle. - Whether move elimination is restricted to moves from registers that are known to be zero. This patch is structured in three part: A first part (which is mostly boilerplate) adds the new 'isOptimizableRegisterMove' target hooks, and extends existing register file descriptors in MC by introducing new fields to describe properties related to move elimination. A second part, uses the new tablegen constructs to describe move elimination in the BtVer2 scheduling model. A third part, teaches llm-mca how to query the new 'isOptimizableRegisterMove' hook to mark instructions that are candidates for move elimination. It also teaches class RegisterFile how to describe constraints on move elimination at PRF granularity. llvm-mca tests for btver2 show differences before/after this patch. Differential Revision: https://reviews.llvm.org/D53134 llvm-svn: 344334
* [X86] Ignore float/double non-temporal loads (PR39256)Simon Pilgrim2018-10-121-0/+3
| | | | | | | | Scalar non-temporal loads were asserting instead of just being ignored. Reduced from https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10895 llvm-svn: 344331
* SCCP: avoid caching DenseMap entry that might be invalidated.Tim Northover2018-10-121-3/+5
| | | | | | | Later calls to getValueState might insert entries into the ValueState map and cause reallocation, invalidating a reference. llvm-svn: 344327
* [mips] Mark fmaxl as a long double emulation routineStefan Maksimovic2018-10-121-4/+4
| | | | | | | | | | | | | | | | | | Failure was discovered upon running projects/compiler-rt/test/builtins/Unit/divtc3_test.c in a stage2 compiler build. When compiling projects/compiler-rt/lib/builtins/divtc3.c, a call to fmaxl within the divtc3 implementation had its return values read from registers $2 and $3 instead of $f0 and $f2. Include fmaxl in the list of long double emulation routines to have its return value correctly interpreted as f128. Almost exact issue here: https://reviews.llvm.org/D17760 Differential Revision: https://reviews.llvm.org/D52649 llvm-svn: 344326
* [ThinLTO] Don't import GV which contains blockaddressEugene Leviant2018-10-122-5/+17
| | | | | | Differential revision: https://reviews.llvm.org/D53139 llvm-svn: 344325
* [DAGCombiner] rearrange extract_element+bitcast fold; NFCSanjay Patel2018-10-111-6/+8
| | | | | | | | | | I want to add another pattern here that includes scalar_to_vector, so this makes that patch smaller. I was hoping to remove the hasOneUse() check because it shouldn't be necessary for common codegen, but an AMDGPU test has a comment suggesting that the extra check makes things better on one of those targets. llvm-svn: 344320
* Revert "DwarfDebug: Pick next location in case of missing location at block ↵Matthias Braun2018-10-112-74/+41
| | | | | | | | | | | | | | | begin" It originally triggered a stepping problem in the debugger, which could be fixed by adjusting CodeGen/LexicalScopes.cpp however it seems we prefer the previous behavior anyway. See the discussion for details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181008/593833.html This reverts commit r343880. This reverts commit r343874. llvm-svn: 344318
* Revert "AMDGPU/GlobalISel: Implement select for G_INSERT"Tom Stellard2018-10-112-31/+0
| | | | | | | | This reverts commit r344310. The test case was failing on some bots. llvm-svn: 344317
* X86/TargetTransformInfo: Report div/rem constant immediate costs as TCC_FreeMatthias Braun2018-10-111-1/+5
| | | | | | | | | | | | | | | DIV/REM by constants should always be expanded into mul/shift/etc. patterns. Unfortunately the ConstantHoisting pass runs too early at a point where the pattern isn't expanded yet. However after ConstantHoisting hoisted some immediate the result may not expand anymore. Also the hoisting typically doesn't make sense because it operates on immediates that will change completely during the expansion. Report DIV/REM as TCC_Free so ConstantHoisting will not touch them. Differential Revision: https://reviews.llvm.org/D53174 llvm-svn: 344315
* merge two near-identical functions createPrivateGlobalForString into oneKostya Serebryany2018-10-113-33/+21
| | | | | | | | | | | | | | | | Summary: We have two copies of createPrivateGlobalForString (in asan and in esan). This change merges them into one. NFC Reviewers: vitalybuka Reviewed By: vitalybuka Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53178 llvm-svn: 344314
* AMDGPU/GlobalISel: Implement select for G_INSERTTom Stellard2018-10-112-0/+31
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53116 llvm-svn: 344310
* [RISCV] Fix disassembling of fence instruction with invalid fieldAna Pazos2018-10-111-0/+4
| | | | | | | | | | | | | | | | | Summary: Instruction with 0 in fence field being disassembled as fence , iorw. Printing "unknown" to match GAS behavior. This bug was uncovered by a LLVM MC Disassembler Protocol Buffer Fuzzer for the RISC-V assembly language. Reviewers: asb Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, jfb, PkmX, jocewei, asb Differential Revision: https://reviews.llvm.org/D51828 llvm-svn: 344309
* Inline variable into assert to avoid unused variable warning.Richard Trieu2018-10-111-2/+1
| | | | llvm-svn: 344308
* [X86] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector.Craig Topper2018-10-111-0/+24
| | | | | | | | | | | | On 64-bit targets the generic legalize will use an i64 load and a scalar_to_vector for us. But on 32-bit targets i64 isn't legal and the generic legalizer will end up emitting two 32-bit loads. We have DAG combines that try to put those two loads back together with pretty good success. This patch instead uses f64 to avoid the splitting entirely. I've made it do the same for 64-bit mode for consistency and to keep the load in the fp domain. There are a few things in here that look like regressions in 32-bit mode, but I believe they bring us closer to the 64-bit mode codegen. And that the 64-bit mode code could be better. I think those issues should be looked at separately. Differential Revision: https://reviews.llvm.org/D52528 llvm-svn: 344291
* [WebAssembly][NFC] Remove repetition of Defs = [ARGUMENTS] (fixed)Thomas Lively2018-10-1111-95/+5
| | | | llvm-svn: 344287
OpenPOWER on IntegriCloud