path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* [aarch64][globalisel] Fix a crash in selectAddrModeIndexed() caused by incorrect G_FRAME_INDEX handling | Daniel Sanders | 2017-10-16 | 1 | -1/+5
| The wrong operand was being rendered to the result instruction. The crash was detected by Bitcode/simd_ops/AArch64_halide_runtime.bc.
| llvm-svn: 315890
* bpf: fix bug on silently truncating 64-bit immediate | Yonghong Song | 2017-10-16 | 2 | -3/+7
| We came across an LLVM bug when compiling some test cases: 64-bit immediates are silently truncated to 32 bits and then packed into the BPF_JMP | BPF_K encoding. This caused comparisons against the wrong value.
| This bug appears to have been introduced by r308080. The Select_Ri pattern is supposed to be lowered into J*_Ri, and the latter only supports a 32-bit immediate encoding, so Select_Ri should have the same immediate predicate check as J*_Ri.
| Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
| Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
| Reviewed-by: Yonghong Song <yhs@fb.com>
| llvm-svn: 315889
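The range check described in the BPF commit above can be sketched in isolation. This is only an illustration: the names fitsInSImm32, encodeImm32 and chooseCmpOperand are hypothetical and are not taken from the BPF backend, and the "fall back to a register operand" step is an assumption about what a selector would do when the check fails. In LLVM itself, llvm::isInt<32>() from llvm/Support/MathExtras.h performs this kind of test.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true when Imm can be represented in a sign-extended
// 32-bit immediate field without losing information.
constexpr bool fitsInSImm32(int64_t Imm) {
  return Imm >= std::numeric_limits<int32_t>::min() &&
         Imm <= std::numeric_limits<int32_t>::max();
}

// Narrow an immediate that has already passed the range check. Without the
// check, the cast below is exactly the silent truncation the commit describes.
inline int32_t encodeImm32(int64_t Imm) {
  assert(fitsInSImm32(Imm) && "immediate would be silently truncated");
  return static_cast<int32_t>(Imm);
}

enum class CmpOperand { Immediate32, Register };

// Hypothetical selection step (an assumption, not the backend's code): use
// the immediate form only when the value fits, otherwise use a register.
constexpr CmpOperand chooseCmpOperand(int64_t Imm) {
  return fitsInSImm32(Imm) ? CmpOperand::Immediate32 : CmpOperand::Register;
}
```

A value such as 0x100000001 fails the check, so under this sketch it would be compared through a register instead of being truncated to 1.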
* [PowerPC] Eliminate sign- and zero-extensions if already sign- or zero-extended | Hiroshi Inoue | 2017-10-16 | 6 | -0/+506
| This patch enables redundant sign- and zero-extension elimination in the PowerPC MI Peephole pass. If the input value of a sign- or zero-extension is known to be already sign- or zero-extended, the operation is redundant and can be eliminated.
| One common case is sign-extensions for a method parameter or for a method return value; they must be sign- or zero-extended as defined in the PPC ELF ABI. For example, for the following simple code, two extsw instructions are generated: one before the invocation of int_func and one before the return. With this patch, both extsw are eliminated.
|     void int_func(int);
|     void ii_test(int a) {
|       if (a & 1)
|         return int_func(a);
|     }
| Such redundant sign- or zero-extensions are quite common in many programs; e.g. I observed about 60,000 occurrences of the elimination while compiling LLVM and Clang.
| Differential Revision: https://reviews.llvm.org/D31319
| llvm-svn: 315888
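Why the second extension is redundant can be shown with a small scalar model. The sketch below is not the peephole pass: extsw here is a plain C++ model of the instruction's effect on a 64-bit value, and isSignExtended32 and extendIfNeeded are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Model of extsw: keep the low 32 bits and sign-extend them to 64 bits.
inline int64_t extsw(int64_t X) {
  const uint64_t Low = static_cast<uint64_t>(X) & 0xFFFFFFFFull;
  // Flipping bit 31 and subtracting 2^31 sign-extends without relying on
  // implementation-defined narrowing conversions.
  return static_cast<int64_t>(Low ^ 0x80000000ull) - 0x80000000ll;
}

// True when X already equals the sign extension of its low 32 bits.
inline bool isSignExtended32(int64_t X) { return extsw(X) == X; }

// Hypothetical peephole step: skip the extension when the input is known to
// be sign-extended already (for example an int parameter under the ABI).
// NumExtsw counts the extensions that are actually emitted.
inline int64_t extendIfNeeded(int64_t X, bool KnownSignExtended,
                              int &NumExtsw) {
  if (KnownSignExtended) {
    assert(isSignExtended32(X) && "caller claimed an ABI-extended value");
    return X;
  }
  ++NumExtsw;
  return extsw(X);
}
```

Because extsw(extsw(x)) equals extsw(x) for every input, dropping the second extension never changes the value.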
* Re-commit r315885: [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed* | Daniel Sanders | 2017-10-16 | 2 | -0/+158
| Summary: iPTR is a pointer of subtarget-specific size to any address space. Therefore type checks on this size derive the SizeInBits from a subtarget hook.
| At this point, we can import the simplest G_LOAD rules and select load instructions using them. Further patches will add support for the predicates to enable additional loads as well as the stores.
| The previous commit failed on MSVC due to a failure to convert an initializer_list to a std::vector. Hopefully, MSVC will accept this version.
| Depends on D37457
| Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
| Reviewed By: qcolombet
| Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb
| Differential Revision: https://reviews.llvm.org/D37458
| llvm-svn: 315887
* Revert r315885: [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed* | Daniel Sanders | 2017-10-16 | 2 | -158/+0
| MSVC doesn't like one of the constructors.
| llvm-svn: 315886
* [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed* | Daniel Sanders | 2017-10-16 | 2 | -0/+158
| Summary: iPTR is a pointer of subtarget-specific size to any address space. Therefore type checks on this size derive the SizeInBits from a subtarget hook.
| At this point, we can import the simplest G_LOAD rules and select load instructions using them. Further patches will add support for the predicates to enable additional loads as well as the stores.
| Depends on D37457
| Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
| Reviewed By: qcolombet
| Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb
| Differential Revision: https://reviews.llvm.org/D37458
| llvm-svn: 315885
* [Hexagon] Add LLVM_ATTRIBUTE_UNUSED to operator<<, NFC | Krzysztof Parzyszek | 2017-10-16 | 1 | -0/+8
| This should silence "unused function" warnings.
| llvm-svn: 315883
* Re-commit r315863: [globalisel][tablegen] Import ComplexPattern when used as an operator | Daniel Sanders | 2017-10-15 | 1 | -5/+8
| Summary: It's possible for a ComplexPattern to be used as an operator in a match pattern. This is used by the load/store patterns in AArch64 to name the suboperands returned by the ComplexPattern predicate so that they can be broken apart and referenced independently in the result pattern.
| This patch adds support for this in order to enable the import of load/store patterns.
| Depends on D37445
| Hopefully fixed the ambiguous constructor that a large number of bots reported.
| Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
| Reviewed By: qcolombet
| Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
| Differential Revision: https://reviews.llvm.org/D37456
| llvm-svn: 315869
* Revert r315863: [globalisel][tablegen] Import ComplexPattern when used as an operator | Daniel Sanders | 2017-10-15 | 1 | -8/+5
| A large number of bots are failing on an ambiguous constructor call.
| llvm-svn: 315866
* [globalisel][tablegen] Import ComplexPattern when used as an operator | Daniel Sanders | 2017-10-15 | 1 | -5/+8
| Summary: It's possible for a ComplexPattern to be used as an operator in a match pattern. This is used by the load/store patterns in AArch64 to name the suboperands returned by the ComplexPattern predicate so that they can be broken apart and referenced independently in the result pattern.
| This patch adds support for this in order to enable the import of load/store patterns.
| Depends on D37445
| Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
| Reviewed By: qcolombet
| Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
| Differential Revision: https://reviews.llvm.org/D37456
| llvm-svn: 315863
* [X86] Remove the SlowBTMem feature flag entirely | Craig Topper | 2017-10-15 | 4 | -66/+31
| It turns out that, for other reasons, we have no patterns for the instructions that were using this feature flag. These instructions are slow on all modern CPUs, so it seems unlikely that we will spend any effort supporting them going forward. We might as well kill off the feature flag and just fix up the comments.
| llvm-svn: 315862
* [AVX512] Don't mark EXTLOAD as legal with AVX512. Continue using custom lowering. | Craig Topper | 2017-10-15 | 2 | -59/+15
| Summary: This was impeding our ability to combine the extending shuffles with other shuffles, as you can see from the test changes.
| There's one special case that needed to be added to use VZEXT directly for v8i8->v8i64 since the custom lowering requires v64i8.
| Reviewers: RKSimon, zvi, delena
| Reviewed By: delena
| Subscribers: llvm-commits
| Differential Revision: https://reviews.llvm.org/D38714
| llvm-svn: 315860
* [X86] Add FeatureSlowBTMem to Haswell, Broadwell, Skylake, Cannonlake, and Knights Landing CPUs. | Craig Topper | 2017-10-15 | 1 | -6/+13
| Summary: I see nothing in Agner Fog's tables to indicate that this improved between Ivy Bridge and Haswell.
| It's also set for all Atom CPUs so I assume KNL should have it too.
| Reviewers: RKSimon, zvi, gadi.haber
| Reviewed By: gadi.haber
| Subscribers: llvm-commits
| Differential Revision: https://reviews.llvm.org/D38890
| llvm-svn: 315859
* Reverting r315590; it did not include changes for llvm-tblgen, which is causing link errors for several people. | Aaron Ballman | 2017-10-15 | 11 | -13/+13
| Error LNK2019 unresolved external symbol "public: void __cdecl `anonymous namespace'::MatchableInfo::dump(void)const " (?dump@MatchableInfo@?A0xf4f1c304@@QEBAXXZ) referenced in function "public: void __cdecl `anonymous namespace'::AsmMatcherEmitter::run(class llvm::raw_ostream &)" (?run@AsmMatcherEmitter@?A0xf4f1c304@@QEAAXAEAVraw_ostream@llvm@@@Z) llvm-tblgen D:\llvm\2017\utils\TableGen\AsmMatcherEmitter.obj 1
| llvm-svn: 315854
* [X86] Ignore DBG instructions in X86CmovConversion optimization to resolve PR34565 | Amjad Aboud | 2017-10-15 | 1 | -0/+31
| Differential Revision: https://reviews.llvm.org/D38359
| llvm-svn: 315851
* [X86] Lower vselect with constant condition to vector_shuffle even with AVX512 instructions. | Craig Topper | 2017-10-15 | 1 | -5/+5
| Summary: It's better to use our shuffle lowering code to handle these than loading an immediate into a k-register.
| It really feels like this should be a DAG combine optimization rather than a lowering operation, but that's a problem for another day.
| Reviewers: RKSimon, delena, zvi
| Reviewed By: delena
| Subscribers: llvm-commits
| Differential Revision: https://reviews.llvm.org/D38932
| llvm-svn: 315849
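The equivalence behind the vselect commit above can be modelled on small arrays. This is a sketch, not the X86 lowering code: the four-element width and the names vselect, selectToShuffleMask and shuffle are illustrative assumptions. It uses the usual two-input shuffle convention in which indices 0..N-1 pick from the first operand and N..2N-1 pick from the second.

```cpp
#include <array>
#include <cassert>

constexpr int NumElts = 4;
using Vec = std::array<int, NumElts>;
using Cond = std::array<bool, NumElts>;
using ShuffleMask = std::array<int, NumElts>;

// Element-wise select: lane I takes A[I] when C[I] is true, else B[I].
inline Vec vselect(const Cond &C, const Vec &A, const Vec &B) {
  Vec R{};
  for (int I = 0; I < NumElts; ++I)
    R[I] = C[I] ? A[I] : B[I];
  return R;
}

// A constant condition becomes a constant shuffle mask: index I selects
// A[I], index I + NumElts selects B[I].
inline ShuffleMask selectToShuffleMask(const Cond &C) {
  ShuffleMask M{};
  for (int I = 0; I < NumElts; ++I)
    M[I] = C[I] ? I : I + NumElts;
  return M;
}

// Two-input shuffle over the concatenation of A and B.
inline Vec shuffle(const Vec &A, const Vec &B, const ShuffleMask &M) {
  Vec R{};
  for (int I = 0; I < NumElts; ++I) {
    assert(M[I] >= 0 && M[I] < 2 * NumElts && "mask index out of range");
    R[I] = M[I] < NumElts ? A[M[I]] : B[M[I] - NumElts];
  }
  return R;
}
```

With the condition known at compile time, the mask is a constant as well, so no mask register has to be loaded in this model.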
* Remove unused variables | Vitaly Buka | 2017-10-15 | 5 | -5/+2
| llvm-svn: 315847
* [Hexagon] Mark RangeTree::dump() with LLVM_DUMP_METHOD. | Davide Italiano | 2017-10-14 | 1 | -1/+1
| GCC otherwise emits a "defined but not used" warning on the member function.
| llvm-svn: 315838
* AMDGPU: Don't use TargetStreamer if it has not been initialized | Konstantin Zhuravlyov | 2017-10-14 | 2 | -10/+16
| Fixes the cfe/trunk/test/Misc/backend-resource-limit-diagnostics.cl test after r315808.
| We may hit a few other similar issues, but I want to discuss a good solution offline.
| llvm-svn: 315830
* [X86][SSE] Don't attempt to reduce the imul vector width of odd sized vectors (PR34947) | Simon Pilgrim | 2017-10-14 | 1 | -1/+4
| llvm-svn: 315825
* Revert "[AArch64][RegisterBankInfo] Use the statically computed mappings for COPY" | Bruno Cardoso Lopes | 2017-10-14 | 1 | -32/+4
| This reverts commit r315781, breaks: http://green.lab.llvm.org/green/job/Compiler_Verifiers_GlobalISEL/9882
| llvm-svn: 315823
* AMDGPU: Bring HSA metadata on par with the specification | Konstantin Zhuravlyov | 2017-10-14 | 4 | -71/+93
| Differential Revision: https://reviews.llvm.org/D38753
| llvm-svn: 315821
* Pull out repeated calls to VT.getVectorNumElements(). NFCI. | Simon Pilgrim | 2017-10-14 | 1 | -10/+11
| llvm-svn: 315818
* Use DAG::getBitcast() helper. NFCI. | Simon Pilgrim | 2017-10-14 | 1 | -4/+4
| llvm-svn: 315815
* AMDGPU: Improve note directive verification in assembler | Konstantin Zhuravlyov | 2017-10-14 | 1 | -1/+19
| - Do not allow amd_amdgpu_isa directives on non-amdgcn architectures
| - Do not allow amd_amdgpu_hsa_metadata on non-amdhsa OSes
| - Do not allow amd_amdgpu_pal_metadata on non-amdpal OSes
| Differential Revision: https://reviews.llvm.org/D38750
| llvm-svn: 315812
* AMDGPU: Do not emit deprecated notes for code object v3 | Konstantin Zhuravlyov | 2017-10-14 | 6 | -11/+40
| Differential Revision: https://reviews.llvm.org/D38749
| llvm-svn: 315810
* AMDGPU: Add support for isa version note | Konstantin Zhuravlyov | 2017-10-14 | 6 | -10/+97
| - Emit NT_AMD_AMDGPU_ISA
| - Add assembler parsing for isa version directive
| - If isa version directive does not match command line arguments, then return error
| Differential Revision: https://reviews.llvm.org/D38748
| llvm-svn: 315808
* [X86][SSE] Support combining AND(EXTRACT(SHUF(X)), C) -> EXTRACT(SHUF(X)) | Simon Pilgrim | 2017-10-14 | 1 | -0/+39
| If we are applying a byte mask to a value extracted from a shuffle, see if we can combine the mask into the shuffle.
| Fixes the last issue with PR22415
| llvm-svn: 315807
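The idea behind the AND(EXTRACT(SHUF(X)), C) combine above can be illustrated on a 16-byte model. This is a sketch under stated assumptions and not the DAG combine itself: it assumes a byte shuffle whose mask may request a zero byte (the Zero sentinel below is hypothetical), 32-bit little-endian lanes, and a constant C whose bytes are each 0x00 or 0xFF. All function names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<uint8_t, 16>;
using ByteMask = std::array<int, 16>; // Zero = write 0, else source byte index
constexpr int Zero = -1;              // hypothetical "zero this byte" sentinel

// Byte shuffle of a single input with optional zeroing per output byte.
inline Bytes16 byteShuffle(const Bytes16 &X, const ByteMask &M) {
  Bytes16 R{};
  for (int I = 0; I < 16; ++I) {
    assert(M[I] >= Zero && M[I] < 16 && "mask index out of range");
    R[I] = M[I] == Zero ? 0 : X[static_cast<std::size_t>(M[I])];
  }
  return R;
}

// Extract one 32-bit lane, little-endian byte order.
inline uint32_t extractLane(const Bytes16 &V, int Lane) {
  uint32_t R = 0;
  for (int B = 0; B < 4; ++B)
    R |= static_cast<uint32_t>(V[4 * Lane + B]) << (8 * B);
  return R;
}

// Fold an AND with C on the extracted lane into the shuffle mask. Succeeds
// only when every byte of C is 0x00 or 0xFF; otherwise M is left untouched.
inline bool foldAndIntoShuffle(ByteMask &M, int Lane, uint32_t C) {
  ByteMask New = M;
  for (int B = 0; B < 4; ++B) {
    const uint8_t Byte = static_cast<uint8_t>((C >> (8 * B)) & 0xFF);
    if (Byte == 0x00)
      New[4 * Lane + B] = Zero;
    else if (Byte != 0xFF)
      return false;
  }
  M = New;
  return true;
}
```

After a successful fold, extracting the lane from the re-masked shuffle gives the same value as extracting and then applying the AND, so the AND is no longer needed for that lane.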
* [X86] Add patterns for vzmovl+cvtpd2dq/cvttpd2dq with a load. | Craig Topper | 2017-10-14 | 2 | -1/+19
| llvm-svn: 315802
* [X86] Add AVX512 versions of VCVTPD2PS to load folding tables. | Craig Topper | 2017-10-14 | 1 | -0/+3
| llvm-svn: 315801
* [X86] Add patterns for vzmovl+cvtpd2ps with a load. | Craig Topper | 2017-10-14 | 2 | -12/+24
| llvm-svn: 315800
* [X86] Remove some patterns for bitcasted alignednontemporalloads. | Craig Topper | 2017-10-14 | 1 | -18/+0
| These select the same instruction as the non-bitcasted pattern, so they provide no additional value.
| llvm-svn: 315799
* [X86] Remove unnecessary bitconverts as the root of patterns for zero extended VCVTPD2UDQZ128rr and VCVTTPD2UDQZ128rr. | Craig Topper | 2017-10-14 | 1 | -4/+4
| We don't need a bitconvert as a root pattern in these cases. The types in the other parts of the pattern are sufficient to express the behavior of these instructions.
| llvm-svn: 315798
* [X86] Add additional patterns for folding loads with 128-bit VCVTDQ2PD and VCVTUDQ2PD. | Craig Topper | 2017-10-14 | 1 | -0/+10
| This matches the patterns we have for the SSE/AVX version.
| This is a prerequisite for D38714.
| llvm-svn: 315797
* [X86] Add AVX512 flavors of VCVTDQ2PD plus VCVTUDQ2PD to the load folding tables. | Craig Topper | 2017-10-14 | 1 | -0/+6
| llvm-svn: 315796
* [X86] Remove TB_NO_REVERSE from VCVTDQ2PDYrr and VCVTPS2PDYrr in the load folding tables. | Craig Topper | 2017-10-14 | 1 | -2/+2
| I believe these were added incorrectly under the belief that the load size was smaller than the input register size, but that's not true.
| llvm-svn: 315795
* [X86] Add an additional isel pattern to CVTDQ2PDrm/VCVTDQ2PDrm to enable load folding without the peephole pass. | Craig Topper | 2017-10-14 | 1 | -2/+6
| This pattern is already used in the AVX512VL version of these instructions, though the AVX512VL version is missing other patterns.
| llvm-svn: 315794
* [AArch64][RegisterBankInfo] Use the statically computed mappings for COPY | Quentin Colombet | 2017-10-14 | 1 | -4/+32
| We used to rely on the generic implementation to get the mappings for COPYs. The generic implementation resorts to table lookups and dynamically allocated objects to get the valid mappings.
| Given we already know how to map G_BITCAST and have the static mappings for them, use that code path for COPY as well. This is much more efficient.
| Improves the compile time of RegBankSelect by up to 20%.
| Note: When we eventually generate all the mappings via TableGen, we won't have to do that dance to shave compile time. The intent of this change was to make sure that moving to a static structure really pays off.
| NFC.
| llvm-svn: 315781
* Revert r315763: "[Hexagon] Rangify some loops, NFC" | Krzysztof Parzyszek | 2017-10-13 | 2 | -26/+44
| Broke some builds (using libstdc++).
| llvm-svn: 315769
* [X86] Use X86ISD::VBROADCAST in place of v2f64 X86ISD::MOVDDUP when AVX2 is available | Craig Topper | 2017-10-13 | 3 | -17/+27
| This is particularly important for AVX512VL where we are better able to recognize the VBROADCAST loads to fold with other operations.
| For AVX512VL we now use X86ISD::VBROADCAST for all of the patterns and remove the 128-bit X86ISD::VMOVDDUP.
| We may be able to use this for AVX1 as well which would allow us to remove more isel patterns.
| I also had to add X86ISD::VBROADCAST as a node to call combineShuffle for so that we treat it similar to X86ISD::MOVDDUP.
| Differential Revision: https://reviews.llvm.org/D38836
| llvm-svn: 315768
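The substitution in the commit above is possible because, for a two-element double vector, duplicating the low element is the same operation as broadcasting it. The scalar model below is only an illustration: movddup and vbroadcast are plain C++ stand-ins named after the nodes, not the X86ISD nodes themselves.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V2F64 = std::array<double, 2>;

// Model of MOVDDUP on v2f64: copy the low element into both lanes.
inline V2F64 movddup(const V2F64 &V) {
  V2F64 R = {{V[0], V[0]}};
  return R;
}

// Model of a broadcast: replicate one scalar into every lane.
template <std::size_t N> std::array<double, N> vbroadcast(double S) {
  std::array<double, N> R;
  R.fill(S);
  return R;
}
```

For N = 2 the two functions always agree, which is the property the v2f64 replacement relies on in this model.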
* [Hexagon] Rangify some loops, NFC | Krzysztof Parzyszek | 2017-10-13 | 2 | -44/+26
| llvm-svn: 315763
* [globalisel][tablegen] Add support for fpimm and import of APInt/APFloat based ImmLeaf. | Daniel Sanders | 2017-10-13 | 2 | -8/+13
| Summary: There's only a tablegen testcase for IntImmLeaf and not a CodeGen one because the relevant rules are rejected for other reasons at the moment. On AArch64, it's because there's an SDNodeXForm attached to the operand. On X86, it's because the rule either emits multiple instructions or has another predicate using PatFrag which cannot easily be supported at the same time.
| Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
| Reviewed By: qcolombet
| Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
| Differential Revision: https://reviews.llvm.org/D36569
| llvm-svn: 315761
* AMDGPU: Implement hasBitPreservingFPLogic | Matt Arsenault | 2017-10-13 | 2 | -0/+6
| llvm-svn: 315754
* [Hexagon] Avoid unused variable warnings in release builds. | Benjamin Kramer | 2017-10-13 | 1 | -0/+4
| No functionality change intended.
| llvm-svn: 315749
* AMDGPU: Look for src mods before fp_extend | Matt Arsenault | 2017-10-13 | 1 | -1/+17
| When selecting modifiers for mad_mix instructions, look at fneg/fabs that occur before the conversion.
| llvm-svn: 315748
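Looking through the conversion for fneg/fabs is sound because negation and absolute value only touch the sign bit, so they give the same result whether applied before or after a widening conversion. The sketch below checks that property with float to double as a stand-in for the conversion used by mad_mix; the names Mod, fpext and modCommutesWithExtend are hypothetical, and NaN inputs are excluded by assumption because NaN never compares equal.

```cpp
#include <cassert>
#include <cmath>

enum class Mod { None, Neg, Abs };

// Widening conversion used as a stand-in for fp_extend.
inline double fpext(float X) { return static_cast<double>(X); }

inline float applyMod(float X, Mod M) {
  return M == Mod::Neg ? -X : (M == Mod::Abs ? std::fabs(X) : X);
}

inline double applyMod(double X, Mod M) {
  return M == Mod::Neg ? -X : (M == Mod::Abs ? std::fabs(X) : X);
}

// True when applying the modifier before the conversion matches applying it
// to the converted value, including the sign of zero.
inline bool modCommutesWithExtend(float X, Mod M) {
  assert(!std::isnan(X) && "NaN is outside the scope of this sketch");
  const double Before = fpext(applyMod(X, M));
  const double After = applyMod(fpext(X), M);
  return Before == After && std::signbit(Before) == std::signbit(After);
}
```

Under this model a modifier found on the narrow value can be treated as a source modifier on the extended operand without changing the result.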
* [aarch64] Support APInt and APFloat in ImmLeaf subclasses and make AArch64 use them. | Daniel Sanders | 2017-10-13 | 1 | -16/+15
| Summary: The purpose of this patch is to expose more information about ImmLeaf-like PatLeaf's so that GlobalISel can learn to import them. Previously, ImmLeaf could only be used to test int64_t's produced by sign-extending an APInt. Other tests on immediates had to use the generic PatLeaf and extract the constant using C++.
| With this patch, tablegen will know how to generate predicates for APInt and APFloat. This will allow it to 'do the right thing' for both SelectionDAG and GlobalISel, which require different methods of extracting the immediate from the IR.
| This is NFC for SelectionDAG since the new code is equivalent to the previous code. It's also NFC for FastISel because FastIselShouldIgnore is 1 for the ImmLeaf subclasses. Enabling FastIselShouldIgnore == 0 for these new subclasses will require a significant refactor of FastISel.
| For GlobalISel, it's currently NFC because the relevant code to import the affected rules is not yet present. This will be added in a later patch.
| Depends on D36086
| Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
| Reviewed By: qcolombet
| Subscribers: bjope, aemerson, rengolin, javed.absar, igorb, llvm-commits, kristof.beyls
| Differential Revision: https://reviews.llvm.org/D36534
| llvm-svn: 315747
* AMDGPU: Implement isFPExtFoldable | Matt Arsenault | 2017-10-13 | 2 | -0/+12
| This helps match v_mad_mix* in some cases.
| llvm-svn: 315744
* DAG: Add opcode and source type to isFPExtFree | Matt Arsenault | 2017-10-13 | 2 | -3/+4
| This is only currently used for mad/fma transforms. This is the only case where it should be used for AMDGPU, so add an opcode to be sure.
| llvm-svn: 315740
* [Hexagon] Minimize number of repeated constant extenders | Krzysztof Parzyszek | 2017-10-13 | 3 | -0/+1863
| Each constant extender requires an extra instruction, which adds to the code size and also reduces the number of available slots in an instruction packet. In most cases, the value of a repeated constant extender could be loaded into a register, and the instructions using the extender could be replaced with their counterparts that use that register instead.
| This patch adds a pass that tries to reduce the number of constant extenders, including extenders which differ only in an immediate offset known at compile time, e.g. @global and @global+12.
| llvm-svn: 315735
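The saving described in the Hexagon commit above can be estimated with a deliberately simplified counting model. This is not the pass's algorithm: it assumes that every use of the same base symbol can share one register no matter what the offset is, that loading the base into a register costs one extender, and that every offset fits the non-extended immediate field of the replacement instruction. ExtendedUse, extendersBefore and extendersAfter are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One instruction operand that needs a constant extender, written as
// base symbol plus compile-time offset (for example @global+12).
struct ExtendedUse {
  std::string Base;
  int Offset;
};

// Before the transformation every such use carries its own extender.
inline std::size_t extendersBefore(const std::vector<ExtendedUse> &Uses) {
  return Uses.size();
}

// Simplified model of the result: each distinct base costs one extender,
// either the single remaining use or the one load into a shared register.
inline std::size_t extendersAfter(const std::vector<ExtendedUse> &Uses) {
  std::map<std::string, std::size_t> UsesPerBase;
  for (const ExtendedUse &U : Uses) {
    assert(!U.Base.empty() && "each use must name a base symbol");
    ++UsesPerBase[U.Base];
  }
  return UsesPerBase.size();
}
```

For uses of @global, @global+12, @global+4 and @other, this model goes from four extenders to two.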
* [X86] Add initial skeleton support for knm cpu | Craig Topper | 2017-10-13 | 1 | -5/+16
| This adds Intel's Knights Mill CPU to the valid CPU names for the backend. For now it's an alias of "knl", but ultimately we need to support the AVX5124FMAPS and AVX5124VNNIW instruction sets for it.
| Differential Revision: https://reviews.llvm.org/D38811
| llvm-svn: 315722