summaryrefslogtreecommitdiffstats
path: root/llvm
Commit message (Collapse)AuthorAgeFilesLines
* [LoopInterchange] Loops with empty dependency matrix are safe.Florian Hahn2018-02-261-3/+0
| | | | | | | | | | | | | | | | | | | | The dependency matrix is only empty if no conflicting load/store instructions have been found. In that case, it is safe to interchange. For the LLVM test-suite, after this change around 1900 loops are interchanged, whereas it is 15 before this change. On cortex-a57, this gives an improvement of -0.57% on the geomean execution time of SPEC2006, SPEC2000 and the test-suite. There are a few small perf regressions, but I think we can improve on those by making the cost model better. Reviewers: karthikthecool, mcrosier Reviewed by: karthikthecool Differential Revision: https://reviews.llvm.org/D43236 llvm-svn: 326077
* The final step to close D41278 [MachineCombiner] Improve debug output (NFC).Andrew V. Tischenko2018-02-262-4/+4
| | | | | | Differential Revision: https://reviews.llvm.org/D41278 llvm-svn: 326074
* [SCEV] Factor out getUsedLoopsSerguei Katkov2018-02-262-4/+17
| | | | | | | | | | | | | | | The patch introduces the new function in ScalarEvolution to get all loops used in specified SCEV. This is a preparation for re-writing isKnownPredicate utility as described in https://reviews.llvm.org/D42417. Reviewers: sanjoy, mkazantsev, reames Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43504 llvm-svn: 326072
* [SCEV] Introduce SCEVPostIncRewriterSerguei Katkov2018-02-261-0/+41
| | | | | | | | | | | | | | | | The patch introduces the SCEVPostIncRewriter rewriter which is similar to SCEVInitRewriter but rewrites AddRec with post increment value of this AddRec. This is a preparation for re-writing isKnownPredicate utility as described in https://reviews.llvm.org/D42417. Reviewers: sanjoy, mkazantsev, reames Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43499 llvm-svn: 326071
* [XCore] Return true in enableMultipleCopyHints().Jonas Paulsson2018-02-262-3/+5
| | | | | | | | | | Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: Robert Lytton llvm-svn: 326069
* [X86] Add avx1 command line to madd.ll to show splitting and concatenating ↵Craig Topper2018-02-261-75/+421
| | | | | | 256-bit operations. llvm-svn: 326068
* [SCEV] Extends the SCEVInitRewriterSerguei Katkov2018-02-261-8/+20
| | | | | | | | | | | | | | | | | The patch introduces an additional parameter IgnoreOtherLoops to SCEVInitRewriter. if it is equal to true then rewriter will not invalidate result in case SCEV depends on other loops then specified during creation. The patch does not change the default behavior. This is a preparation for re-writing isKnownPredicate utility as described in https://reviews.llvm.org/D42417. Reviewers: sanjoy, mkazantsev, reames Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43498 llvm-svn: 326067
* [X86] Don't use getZExtValue when we have no idea how large the input ↵Craig Topper2018-02-262-2/+1051
| | | | | | elements are. llvm-svn: 326066
* [X86] Use SelectionDAG::SplitVectorOperand to simplify some code. NFCCraig Topper2018-02-261-6/+2
| | | | llvm-svn: 326065
* [X86] Simplify the ReplaceNodeResults code for X86ISD::AVG.Craig Topper2018-02-261-13/+7
| | | | | | This code seemed to try to widen to 128, 256, or 512 bit vectors, but we only create X86ISD::AVG with a power of 2 number of elements. This means the only nodes that need to be legalized are less than 128-bits and need to be widened up to 128 bits. llvm-svn: 326064
* [X86] Remove VT.isSimple() check from detectAVGPattern.Craig Topper2018-02-263-1/+549
| | | | | | Which types are considered 'simple' is a function of the requirements of all targets that LLVM supports. That shouldn't directly affect what types we are able to handle. The remainder of this code checks that the number of elements is a power of 2 and takes care of splitting down to a legal size. llvm-svn: 326063
* TableGen: Remove VarInit::getFieldTypeNicolai Haehnle2018-02-252-9/+0
| | | | | | | | | | | | | | It is redundant with the implementation in TypedInit. Change-Id: I8ab1fb5c77e4923f7eb3ffae5889f0f8af6093b4 Reviewers: arsenm, craig.topper, tra, MartinO Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D43678 llvm-svn: 326061
* TableGen: Get rid of Init::getFieldInitNicolai Haehnle2018-02-252-36/+6
| | | | | | | | | | | | | | | | | | | | Summary: FieldInit will just rely on the standardized resolving mechanism to give us DefInits for folding, thus simplifying the code. Unlike the removal of resolveListElementReference, this shouldn't have performance implications, because DefInits do not recurse inside their record. Change-Id: Id4544c774c9d9ee92f293615af6ecff706453f21 Reviewers: arsenm, craig.topper, tra, MartinO Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D43563 llvm-svn: 326060
* TableGen: Remove Init::resolveListElementReferenceNicolai Haehnle2018-02-252-170/+9
| | | | | | | | | | | | | | | | | | | | | | Summary: Resolving a VarListElementInit should just resolve the list and then take its element. This eliminates a lot of duplicated logic and simplifies the next steps of refactoring resolveReferences. This does potentially cause sub-elements of the entire list to be resolved resulting in more work, but I didn't notice a measurable change in performance, and a later patch adds a caching mechanism that covers at least the common case of `var[i]` in a more generic way. Change-Id: I7b59185b855c7368585c329c31e5be38c5749dac Reviewers: arsenm, craig.topper, tra, MartinO Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D43562 llvm-svn: 326059
* [DebugInfo] Stable sort symbols to remove non-deterministic orderingMandeep Singh Grang2018-02-251-1/+1
| | | | | | | | | | | | | | | | Summary: This fixes failure in DebugInfo/X86/multiple-aranges.ll uncovered by D39245. Reviewers: rafael, echristo, probinson Reviewed By: probinson Subscribers: probinson, llvm-commits, JDevlieghere Tags: #debug-info Differential Revision: https://reviews.llvm.org/D39950 llvm-svn: 326056
* [InstSimplify] Add test cases for removal of vector fabs on known positive.Craig Topper2018-02-251-0/+118
| | | | llvm-svn: 326050
* [InstSimplify] Remove unused parameter from test cases.Craig Topper2018-02-251-7/+7
| | | | llvm-svn: 326049
* [X86] Use SDNode instead of SDPatternOperator. NFCCraig Topper2018-02-251-7/+7
| | | | llvm-svn: 326048
* [TargetLowering] SimplifyDemandedVectorElts - pass demanded elts through ↵Simon Pilgrim2018-02-242-1/+15
| | | | | | ADD/SUB ops llvm-svn: 326044
* [TargetLowering] SimplifyDemandedVectorElts - pass demanded elts through ↵Simon Pilgrim2018-02-242-22/+10
| | | | | | TRUNCATE ops llvm-svn: 326043
* [X86] Add cvt tests to avx512vl-intrinsics-fast-isel.llCraig Topper2018-02-241-0/+1312
| | | | llvm-svn: 326042
* [X86] Allow int_x86_sse2_cvtps2dq and int_x86_avx_cvt_ps2dq_256 to select ↵Craig Topper2018-02-246-16/+30
| | | | | | EVEX encoded instructions. llvm-svn: 326041
* [X86] Remove GCCBuiltin from some intrinsics that are no longer used by clang.Craig Topper2018-02-241-18/+9
| | | | llvm-svn: 326040
* Revert "StructurizeCFG: Test for branch divergence correctly"Adam Nemet2018-02-244-103/+6
| | | | | | | | This reverts commit r325881. Breaks many bots llvm-svn: 326037
* [DebugInfo] Fix buildbot failure on non-X86 targetsScott Linder2018-02-241-0/+1
| | | | llvm-svn: 326035
* [X86][SSE] combineSubToSubus - support v8i64 handling from SSSE3Simon Pilgrim2018-02-242-242/+164
| | | | | | Our UMIN/UMAX, vector truncation and shuffle combining is good enough to efficiently handle v8i64 with the number of leading zeros that are necessary for PSUBUS. llvm-svn: 326034
* [X86][SSE] combineSubToSubus - support v8i32 handling from SSSE3 (not SSE41)Simon Pilgrim2018-02-242-80/+57
| | | | | | Now that UMIN etc are Legal/Custom for SSE2+, we can efficiently match SUBUS v8i32 cases from SSSE3 which can perform efficient truncation with PSHUFB. llvm-svn: 326033
* [X86][SSE] combineSubToSubus - begun generalizing to work with any type ↵Simon Pilgrim2018-02-242-18/+25
| | | | | | sizes with SplitBinaryOpsAndApply llvm-svn: 326030
* Fix spelling in comment. NFCI.Simon Pilgrim2018-02-241-1/+1
| | | | llvm-svn: 326029
* [Sparc] Return true in enableMultipleCopyHints().Jonas Paulsson2018-02-244-33/+25
| | | | | | | | | | Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: James Y Knight llvm-svn: 326028
* [X86] Remove GCCBuiltin from some intrinsics that are no longer used by clang.Craig Topper2018-02-241-2/+2
| | | | llvm-svn: 326026
* [X86] Use SelectionDAG::getNot instead of implementing manually. NFCCraig Topper2018-02-241-2/+1
| | | | llvm-svn: 326020
* [AMDGPU] Shrinking V_SUBBREV_U32Stanislav Mekhanoshin2018-02-243-7/+8
| | | | | | | | | | V_SUBBREV_U32 is a commute opcode for V_SUBB_U32. However, when we try to commute V_SUBB_U32 in order to shrink it we do not then process V_SUBBREV_U32 and it stay VOP3. This is fixed. Differential Revision: https://reviews.llvm.org/D43699 llvm-svn: 326011
* Fix build breakage from r326003Pavel Labath2018-02-242-3/+2
| | | | | | | | | | - an ambiguous reference to Optional<T> in llvm-dwarfdump.cpp (fixed with an explicit prefix). - a missing base class initialization in Entry copy constructor (fixed by using the implicitly default constructor, which is possible after some changes which were done during review). llvm-svn: 326006
* [llvm-objcopy] Fix typo in setSymTabAlexander Shaposhnikov2018-02-241-2/+3
| | | | | | | | | | | | This diff fixes the name of the argument of setSymTab and makes setSymTab/setStrTab private (to make the public interface a bit cleaner). Test plan: make check-all Differential revision: https://reviews.llvm.org/D43661 llvm-svn: 326005
* [WebAssembly] Add exception handling option and featureHeejin Ahn2018-02-2410-10/+30
| | | | | | | | | | | | | | Summary: Add a llc command line option and WebAssembly architecture feature for exception handling. Reviewers: dschuff Subscribers: jfb, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D43683 llvm-svn: 326004
* Implement equal_range for the DWARF v5 accelerator tablePavel Labath2018-02-244-56/+540
| | | | | | | | | | | | | | | | | | | Summary: This patch implements the name lookup functionality of the .debug_names accelerator table and hooks it up to "llvm-dwarfdump -find". To make the interface of the two kinds of accelerator tables more consistent, I've created an abstract "DWARFAcceleratorTable::Entry" class, which provides a consistent interface to access the common functionality of the table entries (such as getting the die offset, die tag, etc.). I've also modified the apple table to vend entries conforming to this interface. Reviewers: JDevlieghere, aprantl, probinson, dblaikie Subscribers: vleschuk, clayborg, echristo, llvm-commits Differential Revision: https://reviews.llvm.org/D43067 llvm-svn: 326003
* [MemorySSA] Remove a redundant dyn_cast.George Burgess IV2018-02-241-3/+2
| | | | | | StartingAccess is a MemoryUseOrDef. No need to check again. llvm-svn: 326000
* [X86] Remove checks for '(scalar_to_vector (i8 (trunc GR32:)))' from scalar ↵Craig Topper2018-02-242-14/+8
| | | | | | | | masked move patterns. This portion can be matched by other patterns. We don't need it to make the larger pattern valid. It's sufficient to have a v1i1 mask input without caring where it came from. llvm-svn: 325999
* [AMDGPU] Fixed madak.ll test on VI, added GFX10. NFC.Stanislav Mekhanoshin2018-02-231-33/+45
| | | | llvm-svn: 325995
* bpf: New disassembler testcases for 32-bit subregister supportYonghong Song2018-02-233-7/+40
| | | | | | | | | | | | This patch test disassembler output for load/store instructions when -mattr=+alu32 specified for which we want to use "w" register format. Also, this patch extended the existing insn-unit.s and insn-unit-32.s to make sure disassemblers for all other instructions are not affected. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325993
* bpf: New codegen testcases for 32-bit subregister supportYonghong Song2018-02-234-0/+541
| | | | | | | | | | This patch adds some unit tests for 32-bit subregister support. We want to make sure ALU32, subregister load/store and new peephole optimization are truely enabled once -mattr=+alu32 specified. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325992
* bpf: New optimization pass for eliminating unnecessary i32 promotionsYonghong Song2018-02-234-0/+197
| | | | | | | | | | | | | | | | | | This pass performs peephole optimizations to cleanup ugly code sequences at MachineInstruction layer. Currently, the only optimization in this pass is to eliminate type promotion sequences for zero extending 32-bit subregisters to 64-bit registers. If the compiler could prove the zero extended source come from 32-bit subregistere then it is safe to erase those promotion sequece, because the upper half of the underlying 64-bit registers were zeroed implicitly already. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325991
* bpf: New decoder namespace for 32-bit subregister load/storeYonghong Song2018-02-231-2/+43
| | | | | | | | | | | | | | | | | | | | | | | When -mattr=+alu32 passed to the disassembler, use decoder namespace for 32-bit subregister. This is to disassemble load and store instructions in preferred B format as described in previous commit: w = *(u8 *) (r + off) // BPF_LDX | BPF_B w = *(u16 *)(r + off) // BPF_LDX | BPF_H w = *(u32 *)(r + off) // BPF_LDX | BPF_W *(u8 *) (r + off) = w // BPF_STX | BPF_B *(u16 *)(r + off) = w // BPF_STX | BPF_H *(u32 *)(r + off) = w // BPF_STX | BPF_W NOTE: all other instructions should still use the default decoder namespace. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325990
* bpf: Enable 32-bit subregister support for -mattr=+alu32Yonghong Song2018-02-231-0/+2
| | | | | | | | | After all those preparation patches, now we could enable 32-bit subregister support once -mattr=+alu32 specified. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325989
* bpf: Support 32-bit subregister in various InstrInfo hooksYonghong Song2018-02-231-0/+10
| | | | | | | | | This patch support 32-bit subregister in three InstrInfo hooks, i.e. copyPhysReg, loadRegFromStackSlot and storeRegToStackSlot, Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325988
* bpf: New instruction patterns for 32-bit subregister load and storeYonghong Song2018-02-232-10/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The instruction mapping between eBPF/arm64/x86_64 are: eBPF arm64 x86_64 LD1 BPF_LDX | BPF_B ldrb movzbl LD2 BPF_LDX | BPF_H ldrh movzwl LD4 BPF_LDX | BPF_W ldr movl movzbl/movzwl/movl on x86_64 accept 32-bit sub-register, for example %eax, the same for ldrb/ldrh on arm64 which accept 32-bit "w" register. And actually these instructions only accept sub-registers. There is no point to have LD1/2/4 (unsigned) for 64-bit register, because on these arches, upper 32-bits are guaranteed to be zeroed by hardware or VM, so load into the smallest available register class is the best choice for maintaining type information. For eBPF we should adopt the same philosophy, to change current format (A): r = *(u8 *) (r + off) // BPF_LDX | BPF_B r = *(u16 *)(r + off) // BPF_LDX | BPF_H r = *(u32 *)(r + off) // BPF_LDX | BPF_W *(u8 *) (r + off) = r // BPF_STX | BPF_B *(u16 *)(r + off) = r // BPF_STX | BPF_H *(u32 *)(r + off) = r // BPF_STX | BPF_W into B: w = *(u8 *) (r + off) // BPF_LDX | BPF_B w = *(u16 *)(r + off) // BPF_LDX | BPF_H w = *(u32 *)(r + off) // BPF_LDX | BPF_W *(u8 *) (r + off) = w // BPF_STX | BPF_B *(u16 *)(r + off) = w // BPF_STX | BPF_H *(u32 *)(r + off) = w // BPF_STX | BPF_W There is no change on encoding nor how should they be interpreted, everything is as it is, load the specified length, write into low bits of the register then zeroing all remaining high bits. The only change is their associated register class and how compiler view them. Format A still need to be kept, because eBPF LLVM backend doesn't support sub-registers at default, but once 32-bit subregister is enabled, it should use format B. This patch implemented this together with all those necessary extended load and truncated store patterns. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325987
* bpf: Support i32 in getScalarShiftAmountTy methodYonghong Song2018-02-232-0/+7
| | | | | | | | | | getScalarShiftAmount method should be implemented for eBPF backend to make sure shift amount could still get correct type once 32-bit subregisters support are enabled. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325986
* bpf: Support condition comparison on i32Yonghong Song2018-02-233-27/+126
| | | | | | | | | | | | | | | | | | | We need to support condition comparison on i32. All these comparisons are supposed to be combined into BPF_J* instructions which only support i64. For ISD::BR_CC we need to promote it to i64 first, then do custom lowering. For ISD::SET_CC, just expand to SELECT_CC like what's been done for i64. For ISD::SELECT_CC, we also want to do custom lower for i32. However, after 32-bit subregister support enabled, it is possible the comparison operands are i32 while the selected value are i64, or the comparison operands are i64 while the selected value are i32. We need to define extra instruction pattern and support them in custom instruction inserter. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325985
* bpf: Handle i32 for ALU operations without ISA supportYonghong Song2018-02-231-21/+26
| | | | | | | | | | | | There is no eBPF ISA support for BSWAP, ROTR, ROTL, SREM, SDIVREM, MULHU, ADDC, ADDE etc on i32. They could be emulated by other basic BPF_ALU operations, we'd set their lowering action the same as i64. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325984
OpenPOWER on IntegriCloud