path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [NVPTX] Allow libcalls that are defined in the current module. (Justin Lebar, 2018-12-26; 8 files, -4/+202)
  The patch adds the possibility of making library calls on NVPTX. An
  important constraint on library functions is that they must be defined
  within the current module; this essentially guarantees that we produce
  valid PTX assembly (without calls to undefined functions). Anyone who wants
  to use the libcalls will probably have to link against compiler-rt or some
  other implementation.

  Currently, library calls are completely impossible because of the error
  "LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'". But we can lower
  ExternalSymbol to TargetExternalSymbol and verify that the function
  definition is available.

  There was also an issue with the DAG during legalisation. When we expand an
  instruction into a libcall, the inner call chain isn't "integrated" into
  the outer chain. Since the last "data-flow" node (the call retval load)
  sits earlier in the call chain than the CALLSEQ_END node, the latter
  becomes a leaf and therefore a dead node (and is removed quite quickly).
  The solution proposed here relies on additional data-flow pseudo nodes
  (ProxyReg) whose only purpose is to keep CALLSEQ_END alive through the
  legalisation and instruction selection phases; the pseudo instructions are
  removed before the register scheduling phase.

  Patch by Denys Zariaiev!

  Differential Revision: https://reviews.llvm.org/D34708
  llvm-svn: 350069
* [MIPS GlobalISel] Select G_SELECT (Petar Avramovic, 2018-12-25; 3 files, -0/+21)
  Add widen scalar for type index 1 (i1 condition) for G_SELECT. Select
  G_SELECT for pointer, s32 (integer) and smaller low level types on MIPS32.

  Differential Revision: https://reviews.llvm.org/D56001
  llvm-svn: 350063
* [PowerPC] Fix the bug of ISD::ADDE to set its second return type to glue (Kang Zhang, 2018-12-25; 1 file, -1/+1)
  Summary: This patch fixes a bug introduced by rL341634. In that commit, the
  return type list of ISD::ADDE was set (at line 14224) as
  SDVTList VTs = DAG.getVTList(MVT::i64, MVT::i64), but in fact the second
  return type of ISD::ADDE should be MVT::Glue, not MVT::i64.

  Reviewed By: hfinkel

  Differential Revision: https://reviews.llvm.org/D55977
  llvm-svn: 350061
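  An illustrative sketch (not part of the commit message) of the corrected
  node creation; the surrounding variable names are placeholders:

      // ISD::ADDE produces two results: the sum and a carry that downstream
      // nodes consume through the glue mechanism, so the second result type
      // must be MVT::Glue rather than a second integer.
      SDVTList VTs = DAG.getVTList(MVT::i64, MVT::Glue); // was (MVT::i64, MVT::i64)
      SDValue Adde = DAG.getNode(ISD::ADDE, DL, VTs, LHS, RHS, CarryIn);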
* [X86] Use GetDemandedBits to simplify the operands of PMULDQ/PMULUDQ. (Craig Topper, 2018-12-24; 1 file, -0/+9)
  This is an alternative to what I attempted in D56057.

  GetDemandedBits is a special version of SimplifyDemandedBits that allows
  simplifications even when the operand has other uses. GetDemandedBits will
  only do simplifications that allow a node to be bypassed. It won't create
  new nodes or alter any of the other users.

  I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits.

  Based on a patch that Simon Pilgrim sent me in email.

  Fixes PR40142.

  llvm-svn: 350059
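  A rough, hypothetical fragment showing how such a combine can use
  GetDemandedBits (variable names and surrounding logic are placeholders, not
  the code from the patch): PMULDQ/PMULUDQ read only the low 32 bits of each
  64-bit element, so either operand may be replaced by a simpler node that
  supplies just those bits.

      // Hypothetical fragment inside an X86 DAG combine for PMULDQ/PMULUDQ.
      APInt DemandedMask = APInt::getLowBitsSet(64, 32);
      if (SDValue NewLHS = DAG.GetDemandedBits(LHS, DemandedMask))
        return DAG.getNode(N->getOpcode(), SDLoc(N), N->getValueType(0),
                           NewLHS, RHS);
      if (SDValue NewRHS = DAG.GetDemandedBits(RHS, DemandedMask))
        return DAG.getNode(N->getOpcode(), SDLoc(N), N->getValueType(0),
                           LHS, NewRHS);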
* [LoopIdioms] More LocationSize::precise annotations; NFC (George Burgess IV, 2018-12-24; 1 file, -1/+2)
  Both of these places reference memset-like loops. Memset is precise. Trying
  to keep these patches super small so they're easily post-commit verifiable,
  as requested in D44748.

  llvm-svn: 350044

* [X86] Remove unused variables left after r350041. NFC (Craig Topper, 2018-12-24; 1 file, -6/+0)
  llvm-svn: 350043

* [X86] Move the optimization that turns 'CMP (AND+IMM64), 0' into SRL/SHL+TEST to X86ISelDAGToDAG. (Craig Topper, 2018-12-24; 2 files, -55/+75)
  This cleans more code out of EmitTest.

  llvm-svn: 350041

* [X86] Remove the ANDN check from EmitTest. (Craig Topper, 2018-12-24; 3 files, -46/+67)
  Remove the TESTmr isel patterns and add another postprocessing combine for
  TESTrr+ANDrm->TESTmr. We already have a postprocessing combine for
  TESTrr+ANDrr->TESTrr. With this we can give ANDN a chance to match first.
  And clean it up during post processing if we ended up with just a regular
  AND.

  This is another step towards my plan to gut EmitTest and do more flag
  handling during isel matching or by using optimizeCompare.

  llvm-svn: 350038
* [X86] Return false from hasAndNotCompare if the comparison value is a constant. (Craig Topper, 2018-12-23; 1 file, -6/+3)
  We won't end up using an ANDN instruction in this case, so we should
  generate the same code we do for pre-BMI targets.

  llvm-svn: 350018
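  A minimal sketch of the shape of the change (hedged; the real hook contains
  additional subtarget checks that are elided here):

      // hasAndNotCompare(Y) asks whether a compare against Y should be
      // rewritten to use ANDN. If Y is a constant, ANDN will not be used
      // anyway, so answer false and keep the pre-BMI code.
      if (isa<ConstantSDNode>(Y))
        return false;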
* [X86] Fix an old FIXME about folding the zero constant into the OR instruction we use for sequentially consistent fence in 32-bit mode without SSE2. (Craig Topper, 2018-12-23; 2 files, -7/+6)
  llvm-svn: 350013

* [x86] add load fold patterns for movddup with vzext_load (Sanjay Patel, 2018-12-22; 2 files, -0/+8)
  The missed load folding noticed in D55898 is visible independent of that
  change either with an adjusted IR pattern to start or with AVX2/AVX512
  (where the build vector becomes a broadcast first; movddup is not produced
  until we get into isel via tablegen patterns).

  Differential Revision: https://reviews.llvm.org/D55936
  llvm-svn: 350005

* [X86] FixupLEAs, reduce number of calls to getOperand and use X86::AddrBaseReg/AddrIndexReg, etc. instead of hardcoded constants. (Craig Topper, 2018-12-22; 1 file, -22/+29)
  Makes the code a little more readable.

  llvm-svn: 349983

* [NVPTX] Reduce stack size in NVPTXAsmPrinter::doInitialization(). (Justin Lebar, 2018-12-22; 1 file, -5/+2)
  NVPTXAsmPrinter::doInitialization() was creating an NVPTXSubtarget on the
  stack. This object is huge, about 80kb. Also it's slow to create. And it's
  all redundant; we have one in NVPTXTargetMachine anyway!

  llvm-svn: 349982

* Silence warning in assert introduced in rL349973. (Mircea Trofin, 2018-12-21; 1 file, -0/+1)
  Subscribers: llvm-commits

  Differential Revision: https://reviews.llvm.org/D56030
  llvm-svn: 349975
* [llvm] API for encoding/decoding DWARF discriminators. (Mircea Trofin, 2018-12-21; 1 file, -15/+25)
  Summary: Added a pair of APIs for encoding/decoding the 3 components of a
  DWARF discriminator described in
  http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html: the base
  discriminator, the duplication factor (useful in profile-guided
  optimization) and the copy index (used to identify copies of code in cases
  like loop unrolling).

  The encoding packs 3 unsigned values in 32 bits. This CL addresses 2 issues:
  - communicates overflow back to the user
  - supports encoding all 3 components together. Current APIs assume a
    sequencing of events. For example, creating a new discriminator based on
    an existing one by changing the base discriminator was not supported.

  Reviewers: davidxl, danielcdh, wmi, dblaikie
  Reviewed By: dblaikie
  Subscribers: zzheng, dmgreen, aprantl, JDevlieghere, llvm-commits

  Differential Revision: https://reviews.llvm.org/D55681
  llvm-svn: 349973
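  An illustrative, self-contained sketch of the packing idea (the field
  widths and function name are made up for the example; the actual encoding
  in D55681 differs, but the interface idea is the same: either all three
  components fit in 32 bits, or the overflow is reported to the caller):

      #include <cstdint>
      #include <optional>

      // Toy encoder: base discriminator, duplication factor and copy index
      // packed into fixed-width fields of one 32-bit word.
      std::optional<uint32_t> encodeDiscriminator(uint32_t Base,
                                                  uint32_t Factor,
                                                  uint32_t CopyIdx) {
        if (Base >= (1u << 12) || Factor >= (1u << 12) || CopyIdx >= (1u << 8))
          return std::nullopt; // overflow is communicated, not silently lost
        return Base | (Factor << 12) | (CopyIdx << 24);
      }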
* [X86] Add isel patterns to match BMI/TBMI instructions when lowering has turned the root nodes into one of the flag producing binops. (Craig Topper, 2018-12-21; 2 files, -2/+62)
  This fixes the patterns that have or/and as a root. 'and' is handled
  differently since they usually have a CMP wrapped around them.

  I had to look for uses of the CF flag because all these nodes have
  non-standard CF flag behavior. A real or/xor would always clear CF. In
  practice we shouldn't be using the CF flag from these nodes as far as I
  know.

  Differential Revision: https://reviews.llvm.org/D55813
  llvm-svn: 349962
* [X86] Don't allow optimizeCompareInstr to replace a CMP with BEXTR if the sign flag is used. (Craig Topper, 2018-12-21; 1 file, -6/+18)
  The BEXTR instruction documents the SF bit as undefined.

  The TBM BEXTR instruction has the same issue, but I'm not sure how to test
  it. With the control being an immediate we can determine the sign bit is 0
  or the BEXTR would have been removed.

  Fixes PR40060

  Differential Revision: https://reviews.llvm.org/D55807
  llvm-svn: 349956
* AMDGPU: Don't peel off the offset if the resulting base could possibly be negative in indirect addressing. (Changpeng Fang, 2018-12-21; 1 file, -3/+7)
  Summary: Don't peel off the offset if the resulting base could possibly be
  negative in indirect addressing, because the M0 field is unsigned. This
  patch achieves a similar goal to https://reviews.llvm.org/D55241, but keeps
  the optimization if the base is known to be unsigned.

  Reviewers: arsemn

  Differential Revision: https://reviews.llvm.org/D55568
  llvm-svn: 349951
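  An illustrative fragment of the guarded form (not the patch's exact code;
  the names are placeholders):

      // Peel the constant offset off the base only when the base is provably
      // non-negative, since the M0 register that receives it is unsigned.
      bool CanPeelOffset = CurDAG->SignBitIsZero(Base);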
* [x86] add movddup specialization for build vector lowering (PR37502) (Sanjay Patel, 2018-12-21; 1 file, -0/+20)
  This is admittedly a narrow fix for the problem:
  https://bugs.llvm.org/show_bug.cgi?id=37502
  ...but as the XOP restriction shows, it's a maze to get this right. In the
  motivating example, note that we have movddup before SSE4.1 and again with
  AVX2. That's because insertps isn't available pre-SSE41 and vbroadcast is
  (more generally) available with AVX2 (and the splat is reduced to movddup
  via isel pattern).

  Differential Revision: https://reviews.llvm.org/D55898
  llvm-svn: 349937

* [ARM] Set Defs = [CPSR] for COPY_STRUCT_BYVAL, as it clobbers CPSR. (Florian Hahn, 2018-12-21; 1 file, -1/+1)
  Fixes PR35023.

  Reviewers: MatzeB, t.p.northover, sunfish, qcolombet, efriedma
  Reviewed By: efriedma

  Differential Revision: https://reviews.llvm.org/D55909
  llvm-svn: 349935

* [GlobalISel][AArch64] Add support for widening G_FCEIL (Jessica Paquette, 2018-12-21; 1 file, -2/+8)
  This adds support for widening G_FCEIL in LegalizerHelper and
  AArch64LegalizerInfo. More specifically, it teaches the AArch64 legalizer
  to widen G_FCEIL from a 16-bit float to a 32-bit float when the subtarget
  doesn't support full FP 16.

  This also updates AArch64/f16-instructions.ll to show that we perform the
  correct transformation.

  llvm-svn: 349927

* [AArch64] Refactor Exynos predicate (NFC) (Evandro Menezes, 2018-12-21; 1 file, -4/+3)
  Change order of conditions in predicate.

  llvm-svn: 349918
* [XCore] Always use the version of computeKnownBits that returns a value. NFCI. (Simon Pilgrim, 2018-12-21; 1 file, -8/+4)
  Continues the work started by @bogner in rL340594 to remove uses of the
  KnownBits output parameter version.

  llvm-svn: 349915

* [Sparc] Always use the version of computeKnownBits that returns a value. NFCI. (Simon Pilgrim, 2018-12-21; 1 file, -2/+2)
  Continues the work started by @bogner in rL340594 to remove uses of the
  KnownBits output parameter version.

  llvm-svn: 349914

* [AMDGPU] Always use the version of computeKnownBits that returns a value. NFCI. (Simon Pilgrim, 2018-12-21; 1 file, -14/+7)
  Continues the work started by @bogner in rL340594 to remove uses of the
  KnownBits output parameter version.

  llvm-svn: 349912

* [WebAssembly] Always use the version of computeKnownBits that returns a value. NFCI. (Simon Pilgrim, 2018-12-21; 1 file, -4/+2)
  Continues the work started by @bogner in rL340594 to remove uses of the
  KnownBits output parameter version.

  llvm-svn: 349911

* [ARM] Always use the version of computeKnownBits that returns a value. NFCI. (Simon Pilgrim, 2018-12-21; 1 file, -8/+5)
  Continues the work started by @bogner in rL340594 to remove uses of the
  KnownBits output parameter version.

  llvm-svn: 349909

* [AArch64] Always use the version of computeKnownBits that returns a value. NFCI. (Simon Pilgrim, 2018-12-21; 2 files, -8/+5)
  Continues the work started by @bogner in rL340594 to remove uses of the
  KnownBits output parameter version.

  llvm-svn: 349908

* [SystemZ] Always use the version of computeKnownBits that returns a value. NFCI. (Simon Pilgrim, 2018-12-21; 2 files, -21/+14)
  Continues the work started by @bogner in rL340594 to remove uses of the
  KnownBits output parameter version.

  llvm-svn: 349906

* [Lanai] Always use the version of computeKnownBits that returns a value. NFCI. (Simon Pilgrim, 2018-12-21; 1 file, -2/+2)
  Continues the work started by @bogner in rL340594 to remove uses of the
  KnownBits output parameter version.

  llvm-svn: 349905

* [PPC] Always use the version of computeKnownBits that returns a value. NFCI. (Simon Pilgrim, 2018-12-21; 2 files, -15/+9)
  Continues the work started by @bogner in rL340594 to remove uses of the
  KnownBits output parameter version.

  llvm-svn: 349903

* [X86] Always use the version of computeKnownBits that returns a value. NFCI. (Simon Pilgrim, 2018-12-21; 3 files, -28/+18)
  Continues the work started by @bogner in rL340594 to remove uses of the old
  KnownBits output parameter version.

  llvm-svn: 349902
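  The ten entries above make the same mechanical change. A minimal
  before/after sketch with placeholder operands:

      // Before (output-parameter form being removed):
      //   KnownBits Known;
      //   DAG.computeKnownBits(Op, Known, Depth);
      // After (value-returning form):
      KnownBits Known = DAG.computeKnownBits(Op, Depth);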
* [Dwarf/AArch64] Return address signing B key dwarf support (Luke Cheeseman, 2018-12-21; 4 files, -4/+39)
  - When signing return addresses with -msign-return-address=<scope>{+<key>},
    either the A key instructions or the B key instructions can be used. To
    correctly authenticate the return address, the unwinder/debugger must
    know which key was used to sign the return address.
  - When an exception is thrown or a breakpoint is reached, it may be
    necessary to unwind the stack. To accomplish this, the unwinder/debugger
    must be able to first authenticate the return address if it has been
    signed.
  - To enable this, the augmentation string of CIEs has been extended to
    allow inclusion of a 'B' character. Functions that are signed using the B
    key variant of the instructions should have an FDE whose associated CIE
    has a 'B' in the augmentation string.
  - One must also be able to preserve these semantics when first stepping
    from a high level language into assembly and then, as a second step, into
    an object file. To achieve this, I have introduced a new assembly
    directive '.cfi_b_key_frame' that tells the assembler the current frame
    uses return address signing with the B key.
  - This ensures that the FDE is associated with a CIE that has 'B' in the
    augmentation string.

  Differential Revision: https://reviews.llvm.org/D51798
  llvm-svn: 349895
* [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm) (Simon Pilgrim, 2018-12-21; 1 file, -12/+0)
  This auto upgrades the signed SSE saturated math intrinsics to
  SADD_SAT/SSUB_SAT generic intrinsics.

  Clang counterpart: https://reviews.llvm.org/D55890

  Differential Revision: https://reviews.llvm.org/D55894
  llvm-svn: 349892
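  For reference, the semantics of signed saturating addition that the generic
  intrinsic provides, written as a small self-contained function (8-bit case;
  illustration only, not code from the patch):

      #include <cstdint>
      #include <limits>

      // Signed saturating add: the exact sum, clamped to the int8_t range.
      int8_t sadd_sat8(int8_t a, int8_t b) {
        int Sum = int(a) + int(b); // widen so the exact sum is representable
        if (Sum > std::numeric_limits<int8_t>::max())
          return std::numeric_limits<int8_t>::max();
        if (Sum < std::numeric_limits<int8_t>::min())
          return std::numeric_limits<int8_t>::min();
        return int8_t(Sum);
      }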
* [WebAssembly] Fix invalid machine instrs in -O0, verify in tests (Thomas Lively, 2018-12-21; 1 file, -2/+5)
  Reviewers: aheejin, dschuff
  Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

  Differential Revision: https://reviews.llvm.org/D55956
  llvm-svn: 349889

* AMDGPU/GlobalISel: RegBankSelect for amdgcn.wqm.vote (Matt Arsenault, 2018-12-21; 1 file, -0/+6)
  llvm-svn: 349882

* AMDGPU/GlobalISel: RegBankSelect for some fp ops (Matt Arsenault, 2018-12-21; 2 files, -0/+11)
  llvm-svn: 349880

* AMDGPU/GlobalISel: Redo legality for build_vector (Matt Arsenault, 2018-12-21; 1 file, -10/+38)
  It seems better to avoid using the callback if possible since there are
  coverage assertions which are disabled if this is used. Also fix missing
  tests. Only test the legal cases since it seems legalization for
  build_vector is quite lacking.

  llvm-svn: 349878
* [X86] Refactor hasNoCarryFlagUses and hasNoSignFlagUses in X86ISelDAGToDAG.cpp to translate opcode to condition code using the helpers in X86InstrInfo.cpp. (Craig Topper, 2018-12-21; 1 file, -75/+34)
  This shortens the switches in X86ISelDAGToDAG.cpp to only need to check the
  condition code instead of a list of opcodes.

  This also fixes a bug where the memory forms of SETcc were missing from
  hasNoCarryFlagUses.

  llvm-svn: 349868
* [X86] Add memory forms of some SETCC instructions to hasNoCarryFlagUses. (Craig Topper, 2018-12-21; 1 file, -0/+3)
  Found while working on another patch.

  llvm-svn: 349867
* [ARM] Complete the Thumb1 shift+and->shift+shift transforms. (Eli Friedman, 2018-12-20; 2 files, -7/+48)
  This saves materializing the immediate. The additional forms are less
  common (they don't usually show up for bitfield insert/extract), but
  they're still relevant.

  I had to add a new target hook to prevent DAGCombine from reversing the
  transform. That isn't the only possible way to solve the conflict, but it
  seems straightforward enough.

  Differential Revision: https://reviews.llvm.org/D55630
  llvm-svn: 349857
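  The identity behind the transform (an illustration, not the isel code): a
  mask that is a contiguous run of bits reaching bit 0 or bit 31 can be
  applied with two shifts instead of materializing the mask constant, e.g.
  for 32-bit unsigned values:

      #include <cstdint>

      // x & 0x000000FF  ==  (x << 24) >> 24
      uint32_t mask_low8(uint32_t x)  { return (x << 24) >> 24; }
      // x & 0xFF000000  ==  (x >> 24) << 24
      uint32_t mask_high8(uint32_t x) { return (x >> 24) << 24; }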
* [GlobalISel][AArch64] Add G_FCEIL to isPreISelGenericFloatingPointOpcode (Jessica Paquette, 2018-12-20; 1 file, -0/+1)
  If you don't do this, then if you hit a G_LOAD in getInstrMapping, you'll
  end up with GPRs on the G_FCEIL instead of FPRs. This causes a fallback.

  Add it to the switch, and add a test verifying that this happens.

  llvm-svn: 349822

* [MC] [AArch64] Correctly resolve ":abs_g1:3" etc. (Eli Friedman, 2018-12-20; 1 file, -2/+8)
  We have to treat constructs like this as if they were "symbolic", to use
  the correct codepath to resolve them. This mostly only affects movz etc.
  because the other uses of classifySymbolRef conservatively treat everything
  that isn't a constant as if it were a symbol.

  Differential Revision: https://reviews.llvm.org/D55906
  llvm-svn: 349800
* [MC] [AArch64] Support resolving fixups for abs_g0 etc. (Eli Friedman, 2018-12-20; 1 file, -8/+49)
  This requires a bit more code than other fixups, to distinguish between
  abs_g0/abs_g1/etc. Actually, I think some of the other fixups are missing
  some checks, but I won't try to address that here.

  I haven't seen any real-world code that uses a construct like this, but it
  clearly should work, and we're considering using it in the implementation
  of localescape/localrecover on Windows (see https://reviews.llvm.org/D53540).

  I've verified that binutils produces the same code as llvm-mc for the
  testcase. This currently doesn't include support for the *_s variants
  (that requires a bit more work to set the opcode).

  Differential Revision: https://reviews.llvm.org/D55896
  llvm-svn: 349799
* [X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (llvm) (Simon Pilgrim, 2018-12-20; 1 file, -32/+0)
  This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512
  VPROL/VPROR rotation intrinsics.

  Clang counterpart: https://reviews.llvm.org/D55937

  Differential Revision: https://reviews.llvm.org/D55938
  llvm-svn: 349795
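  For reference, the semantics of the generic funnel-shift intrinsics that
  the rotations are upgraded to (and that the Hexagon patterns further down
  match); a self-contained 32-bit sketch, illustration only:

      #include <cstdint>

      // fshl(a, b, s): conceptually concatenate a:b into a 64-bit value,
      // shift left by s, and keep the high 32 bits.
      uint32_t fshl32(uint32_t a, uint32_t b, uint32_t s) {
        s %= 32;
        return s ? (a << s) | (b >> (32 - s)) : a;
      }

      // A rotate is a funnel shift with both inputs equal.
      uint32_t rotl32(uint32_t x, uint32_t s) { return fshl32(x, x, s); }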
* [BPF] Disable relocation for .BTF.ext section (Yonghong Song, 2018-12-20; 1 file, -0/+10)
  Build llvm with assertions on, and then build bcc against this llvm. Run
  any bcc tool with debug=8 (turning on -g for clang compilation), and you
  will get the following assertion error:

    /home/yhs/work/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:888:
    void llvm::RuntimeDyldELF::resolveBPFRelocation(const llvm::SectionEntry&,
    uint64_t, uint64_t, uint32_t, int64_t):
    Assertion `Value <= (4294967295U)' failed.

  The .BTF.ext ELF section uses Fixups to get the instruction offsets. The
  data width of the Fixup is 4 bytes since we only need the insn offset
  within the section. This caused the above error, though, since R_BPF_64_32
  expects a 4-byte value and RuntimeDyld tried to resolve the actual insn
  address, which is 8 bytes.

  Actually the offset within the section is all we need. Therefore, there is
  no need to perform any kind of relocation for the .BTF.ext section, and
  such relocation will actually cause incorrect results. This patch changed
  BPFELFObjectWriter::getRelocType() such that for Fixup kind FK_Data_4, if
  the relocation Target is a temporary symbol, we skip the relocation
  (ELF::R_BPF_NONE).

  Acked-by: Alexei Starovoitov <ast@kernel.org>
  Signed-off-by: Yonghong Song <yhs@fb.com>
  llvm-svn: 349778
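  A rough sketch of the shape of that change (hedged; accessor names are from
  memory and may not match the tree exactly):

      // Inside BPFELFObjectWriter::getRelocType(): a 4-byte fixup whose
      // target is a temporary symbol is the .BTF.ext instruction-offset
      // case, so emit no relocation and keep the raw section offset.
      case FK_Data_4:
        if (const MCSymbolRefExpr *A = Target.getSymA())
          if (A->getSymbol().isTemporary())
            return ELF::R_BPF_NONE;
        return ELF::R_BPF_64_32;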
* [Hexagon] Add patterns for funnel shifts (Krzysztof Parzyszek, 2018-12-20; 2 files, -3/+96)
  llvm-svn: 349770
* [RISCV] Properly evaluate fixup_riscv_pcrel_lo12 (Alex Bradbury, 2018-12-20; 4 files, -6/+132)
  This is an update to D43157 to correctly handle fixup_riscv_pcrel_lo12.

  Notable changes:
  - Rebased onto trunk
  - Handle and test S-type
  - Test case pcrel-hilo.s is merged into relocations.s

  D43157 description: VK_RISCV_PCREL_LO has to be handled specially. The
  MCExpr inside is actually the location of an auipc instruction with a
  VK_RISCV_PCREL_HI fixup pointing to the real target.

  Differential Revision: https://reviews.llvm.org/D54029

  Patch by Chih-Mao Chen and Michael Spencer.

  llvm-svn: 349764
* [X86][AVX512] Don't custom lower v16i8 rotations. (Simon Pilgrim, 2018-12-20; 1 file, -2/+2)
  As discussed on D55747, the expansion to (wider) shifts is better on all
  AVX512 cases, not just BWI.

  llvm-svn: 349763
* [SystemZ] "Generic" vector assembler instructions should clobber CC (Ulrich Weigand, 2018-12-20; 1 file, -9/+14)
  There are several vector instructions which may or may not set the
  condition code register, depending on the value of an argument. For
  codegen, we use two versions of the instruction, one that sets CC and one
  that doesn't, which hard-code appropriate values of that argument. But we
  also have a "generic" version of the instruction that is used for the
  assembler/disassembler. These generic versions should always be considered
  to clobber CC, just to be safe.

  llvm-svn: 349761