bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Add PerfJITEventListener for perf profiling support.	Andres Freund	2018-07-24	5	-1/+529
	This new JIT event listener supports generating profiling data for the linux 'perf' profiling tool, allowing it to generate function and instruction level profiles.

	Currently this functionality is not enabled by default, but must be enabled with LLVM_USE_PERF=yes. Given that the listener has no dependencies, it might be sensible to enable by default once the initial issues have been shaken out.

	I followed existing precedent in registering the listener by default in lli. Should there be a decision to enable this by default on linux, that should probably be changed.

	Please note that until https://reviews.llvm.org/D47343 is resolved, using this functionality with mcjit rather than orcjit will not reliably work.

	Disregarding the previous comment, here's an example:

	$ cat /tmp/expensive_loop.c
	bool stupid_isprime(uint64_t num)
	{
	    if (num == 2)
	        return true;
	    if (num < 1 || num % 2 == 0)
	        return false;
	    for(uint64_t i = 3; i < num / 2; i+= 2) {
	        if (num % i == 0)
	            return false;
	    }
	    return true;
	}

	int main(int argc, char **argv)
	{
	    int numprimes = 0;

	    for (uint64_t num = argc; num < 100000; num++)
	    {
	        if (stupid_isprime(num))
	            numprimes++;
	    }

	    return numprimes;
	}

	$ clang -ggdb -S -c -emit-llvm /tmp/expensive_loop.c -o /tmp/expensive_loop.ll
	$ perf record -o perf.data -g -k 1 ./bin/lli -jit-kind=mcjit /tmp/expensive_loop.ll 1
	$ perf inject --jit -i perf.data -o perf.jit.data
	$ perf report -i perf.jit.data
	-   92.59%  lli  jitted-5881-2.so  [.] stupid_isprime
	     stupid_isprime
	     main
	     llvm::MCJIT::runFunction
	     llvm::ExecutionEngine::runFunctionAsMain
	     main
	     __libc_start_main
	     0x4bf6258d4c544155
	+    0.85%  lli  ld-2.27.so  [.] do_lookup_x

	And line-level annotations also work:

	       │   for(uint64_t i = 3; i < num / 2; i+= 2) {
	       │1 30:   movq $0x3,-0x18(%rbp)
	  0.03 │1 38:   mov -0x18(%rbp),%rax
	  0.03 │        mov -0x10(%rbp),%rcx
	       │        shr $0x1,%rcx
	  3.63 │     ┌──cmp %rcx,%rax
	       │     ├──jae 6f
	       │     │   if (num % i == 0)
	  0.03 │     │  mov -0x10(%rbp),%rax
	       │     │  xor %edx,%edx
	 89.00 │     │  divq -0x18(%rbp)
	       │     │  cmp $0x0,%rdx
	  0.22 │     │↓ jne 5f
	       │     │   return false;
	       │     │  movb $0x0,-0x1(%rbp)
	       │     │↓ jmp 73
	       │     │   }
	  3.22 │1 5f:│↓ jmp 61
	       │     │   for(uint64_t i = 3; i < num / 2; i+= 2) {

	Subscribers: mgorny, llvm-commits
	Differential Revision: https://reviews.llvm.org/D44892
	llvm-svn: 337789
*	[x86/SLH] Simplify the code for hardening a loaded value. NFC.	Chandler Carruth	2018-07-24	1	-20/+15
	This is in preparation for extracting this into a re-usable utility in this code. llvm-svn: 337785
*	[x86/SLH] Remove complex SHRX-based post-load hardening.	Chandler Carruth	2018-07-24	1	-73/+10
	This code was really nasty, had several bugs in it originally, and wasn't carrying its weight. While on Zen we have all 4 ports available for SHRX, on all of the Intel parts with Agner's tables, SHRX can only execute on 2 ports, giving it 1/2 the throughput of OR. Worse, all too often this pattern required two SHRX instructions in a chain, hurting the critical path by a lot. Even if we end up needing to save/restore EFLAGS, that is no longer so bad. We pay for a uop to save the flag, but we very likely get fusion when it is used by forming a test/jCC pair or something similar. In practice, I don't expect the SHRX to be a significant savings here, so I'd like to avoid the complex code required. We can always resurrect this if/when someone has a specific performance issue addressed by it. llvm-svn: 337781
*	[DWARF] Use deque in place of SmallVector to fix use-after-free issue	Fangrui Song	2018-07-23	1	-3/+6
	Summary: SmallVector's elements are moved when the vector is resized, which invalidates existing pointers to them and causes a use-after-free. Reviewers: probinson, dblaikie Subscribers: JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D49702 llvm-svn: 337772
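The difference between the two containers can be illustrated with a minimal standard-library sketch. This is not the code from the patch: `Entry` and `firstElementStaysPut` are hypothetical names, and `std::deque` stands in for the container the commit switched to. The point it shows is that `std::deque` never relocates existing elements when new ones are appended at the end, so a pointer taken earlier stays valid, whereas a growing vector-like container may move its elements.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical stand-in for a parsed entry; not an LLVM type.
struct Entry {
  std::string Name;
};

// Returns true if a pointer taken to the first element still addresses the
// first element after `Extra` further push_back calls. std::deque guarantees
// this: inserting at either end does not invalidate references or pointers
// to existing elements (only iterators are invalidated).
inline bool firstElementStaysPut(unsigned Extra) {
  std::deque<Entry> Entries;
  Entries.push_back({"first"});
  const Entry *First = &Entries.front();
  for (unsigned I = 0; I < Extra; ++I)
    Entries.push_back({"entry" + std::to_string(I)});
  return First == &Entries.front() && First->Name == "first";
}
```

Calling `firstElementStaysPut(1000)` returns true for a deque. The same pattern with a contiguous, growable container would leave `First` dangling once the storage is reallocated.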
*	Embed a template specialization in a namespace to work around a gcc bug.	Wolfgang Pieb	2018-07-23	1	-0/+2
	llvm-svn: 337770
*	[DWARF v5] Refactor range lists dumping by using a more generic way of ↵	Wolfgang Pieb	2018-07-23	4	-190/+132
	handling tables of lists. The intent is to use it for location list tables as well. Change is almost NFC with the exception of the spelling of some strings used during dumping (all lowercase now). Reviewer: JDevlieghere Differential Revision: https://reviews.llvm.org/D49500 llvm-svn: 337763
*	[LTO] Handle __imp_ (dllimport) symbols consistently with lld	Teresa Johnson	2018-07-23	1	-1/+8
	Summary: Similar to what lld already does for dllimport symbols which are prefaced with __imp_ (see lld patch r240620), strip off the __imp_ prefix in LTO. Otherwise we can get 2 separate GlobalResolution for a single symbol, the dllimport declaration, and the definition, which leads to incorrect LTO handling. Fixes PR38105. Reviewers: pcc Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D49138 llvm-svn: 337762
*	[demangler] call terminate() if allocation failed	Erik Pilkington	2018-07-23	2	-4/+17
	We really should set *status to memory_alloc_failure, but we need to refactor the demangler a bit to properly propagate the failure up the stack. Until then, it's better to explicitly terminate than rely on a null dereference crash. rdar://31240372 llvm-svn: 337759
*	[MC] Add a separate flag for skipping comdat constant sections for MinGW. NFC.	Martin Storsjo	2018-07-23	2	-3/+13
	This actually has nothing to do with the associative comdat sections that aren't supported by GNU binutils ld. Clarify the comments from SVN r335918 and use a separate flag for it. Differential Revision: https://reviews.llvm.org/D49645 llvm-svn: 337757
*	[COFF] Fix assembly output of comdat sections without an attached symbol	Martin Storsjo	2018-07-23	1	-10/+15
	Since SVN r335286, the .xdata sections are produced without an attached symbol, which requires using a different syntax when printing assembly output. Instead of the usual syntax of '.section <name>,"dr",discard,<symbol>', use '.section <name>,"dr"' + '.linkonce discard' (which is what GCC uses for all assembly output). This fixes PR38254. Differential Revision: https://reviews.llvm.org/D49651 llvm-svn: 337756
*	[AArch64] Use MCAsmInfoMicrosoft and MCAsmInfoGNUCOFF as base classes	Martin Storsjo	2018-07-23	2	-9/+14
	This matches the structure used on X86 and ARM. This requires a little bit of duplication of the parts that are equal in both AArch64 COFF variants though. Before SVN r335286, these classes didn't add anything that MCAsmInfoCOFF didn't, but now they do. This makes AArch64 match X86 in how comdat is used for float constants for MinGW. Differential Revision: https://reviews.llvm.org/D49637 llvm-svn: 337755
*	[SelectionDAG] Reduce DanglingDebugInfo memory traffic, NFC	Vedant Kumar	2018-07-23	1	-19/+23
	This avoids approx. 2 x 10^5 DenseMap insertions in both non-debug and debug -O2 builds of the sqlite3 amalgamation. llvm-svn: 337751
*	[ThinLTO] Ensure the TargetLibraryInfo is constructed early enough	Teresa Johnson	2018-07-23	1	-0/+2
	Summary: Without this change, the WholeProgramDevirt pass, which requires the TargetLibraryInfo, will construct one from the default triple. Fixes PR38139. Reviewers: pcc Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D49278 llvm-svn: 337750
*	[DebugCounters] Keep track of total counts	George Burgess IV	2018-07-23	2	-11/+10
	This patch makes debug counters keep track of the total number of times we've called `shouldExecute` for each counter, so it's easier to build automated tooling on top of these. A patch to print these counts is coming soon. Patch by Zhizhou Yang! Differential Revision: https://reviews.llvm.org/D49560 llvm-svn: 337748
*	ConstantFolding: Avoid a crash.	Manoj Gupta	2018-07-23	1	-6/+13
	Summary: Check if the parent basic block and caller exist before calling CS.getCaller when constant folding the strip.invariant.group intrinsic. This avoids a crash when the function containing the intrinsic is being inlined. The instruction is checked for any simplification but has not yet been added to a basic block. Reviewers: Prazek, rsmith, efriedma Reviewed By: efriedma Subscribers: eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D49690 llvm-svn: 337742
*	Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code ↵	Reid Kleckner	2018-07-23	6	-29/+132
	models" Don't try to generate large PIC code for non-ELF targets. Neither COFF nor MachO have relocations for large position independent code, and users have been using "large PIC" code models to JIT 64-bit code for a while now. With this change, if they are generating ELF code, their JITed code will truly be PIC, but if they target MachO or COFF, it will contain 64-bit immediates that directly reference external symbols. For a JIT, that's perfectly fine. llvm-svn: 337740
*	Fix RegScavenger::unprocess	David Greene	2018-07-23	1	-1/+1
	RegScavenger::unprocess walks backward, so it should undo the effects of defs before undoing effects of kills. Previously it did things in the opposite order, leaving a register apparently unused (dead) in the case where an instruction both used (killed) and defined a register. Differential Revision: https://reviews.llvm.org/D42200 llvm-svn: 337735
*	[Hexagon] Handle unnamed globals in HexagonConstExpr	Krzysztof Parzyszek	2018-07-23	1	-3/+15
	Instead of comparing names, compare positions in the parent module. llvm-svn: 337723
*	[Demangle] Attempt to fix arena memory leak	Reid Kleckner	2018-07-23	1	-1/+3
	llvm-svn: 337720
*	[ARM] Use unique_ptr to fix memory leak introduced in r337701	Fangrui Song	2018-07-23	1	-11/+9
	llvm-svn: 337714
*	OpChain has subclasses, so add a virtual destructor.	Jordan Rupprecht	2018-07-23	1	-0/+1
	Summary: OpChain has subclasses, so add a virtual destructor. This fixes an issue when deleting subclasses of OpChain (see MatchSMLAD() specifically) in r337701. Reviewers: javed.absar Subscribers: llvm-commits, SjoerdMeijer, samparker Differential Revision: https://reviews.llvm.org/D49681 llvm-svn: 337713
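Why the virtual destructor matters can be shown with a small standalone sketch. The types `Chain` and `BinChain` below are hypothetical stand-ins, not the OpChain classes from the ARM backend. With a virtual destructor in the base class, destroying a derived object through a base-class pointer runs the derived destructor; without it, doing so is undefined behaviour in C++.

```cpp
#include <cassert>
#include <memory>

// Hypothetical base class; the virtual destructor is the point of the sketch.
struct Chain {
  virtual ~Chain() = default;
};

// Hypothetical subclass that records when its destructor runs.
struct BinChain : Chain {
  explicit BinChain(int &Counter) : Destroyed(Counter) {}
  ~BinChain() override { ++Destroyed; }
  int &Destroyed;
};

// Owns a BinChain through a pointer to the base class and reports how many
// times the derived destructor ran when that pointer went out of scope.
inline int destroyThroughBase() {
  int Count = 0;
  {
    std::unique_ptr<Chain> P = std::make_unique<BinChain>(Count);
  }
  return Count;
}
```

`destroyThroughBase()` returns 1 because the deletion dispatches to `~BinChain()`. This is the same situation as a list that holds base-class pointers to subclass objects.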
*	[ARM] Follow-up to r337709.	Matt Morehouse	2018-07-23	1	-2/+0
	Fix double-free. llvm-svn: 337711
*	[ARM] Add doFinalization() to ARMCodeGenPrepare pass.	Matt Morehouse	2018-07-23	1	-0/+6
	Attempt to fix the leak introduced in r337687 and make sanitizer buildbots green again. llvm-svn: 337709
*	[Legalize] Elide MERGE_VALUES created by scalarizeVectorLoad.	Nirav Dave	2018-07-23	2	-3/+10
	scalarizeVectorLoad creates MERGE_VALUES nodes which are immediately decomposed in expandLoad. Elide the node in these cases. llvm-svn: 337708
*	[ARM][NFC] ParallelDSP reorganisation	Sam Parker	2018-07-23	1	-88/+103
	In preparing to allow ARMParallelDSP pass to parallelise more than smlads, I've restructured some elements: - The ParallelMAC struct has been renamed to BinOpChain. - The BinOpChain struct holds two value lists: LHS and RHS, as well as inheriting from the OpChain base class. - The OpChain struct holds all the values of the represented chain and has had the memory locations functionality inserted into it. - ParallelMACList becomes OpChainList and it now holds pointers instead of objects. Differential Revision: https://reviews.llvm.org/D49020 llvm-svn: 337701
*	[SystemZ] Fix dumpSU() method in SystemZHazardRecognizer.	Jonas Paulsson	2018-07-23	1	-1/+5
	Two minor issues: The new MCD SchedWrite name does not contain "Unit" like all the others, so a check is needed. Also, print "LSU" instead of "LS". Review: Ulrich Weigand llvm-svn: 337700
*	[FPEnv] Legalize double width StrictFP vector operations	Cameron McInally	2018-07-23	2	-0/+70
	Differential Revision: https://reviews.llvm.org/D48809 llvm-svn: 337698
*	[ARM] ARMCodeGenPrepare backend pass	Sam Parker	2018-07-23	4	-0/+757
	Arm specific codegen prepare is implemented to perform type promotion on icmp operands, which can enable the removal of uxtb and uxth (unsigned extend) instructions. This is possible because performing type promotion before ISel alleviates this duty from the DAG builder which has to perform legalisation, but has a limited view on data ranges. The pass visits any instruction operand of an icmp and creates a worklist to traverse the use-def tree to determine whether the values can simply be promoted. Our concern is values in the registers overflowing the narrow (i8, i16) data range, so instructions marked with nuw can be promoted easily. For add and sub instructions, we are able to use the parallel dsp instructions to operate on scalar data types and avoid overflowing bits. Underflowing adds and subs are also permitted when the result is only used by an unsigned icmp. Differential Revision: https://reviews.llvm.org/D48832 llvm-svn: 337687
*	[GVN] Don't use the eliminated load as an available value in phi construction	John Brawn	2018-07-23	1	-0/+9
	In ConstructSSAForLoadSet if an available value is actually the load that we're doing SSA construction to eliminate, then we can omit it as SSAUpdate will add in the value for the phi that will be replacing it anyway. This can result in simpler IR which can allow further optimisation. Differential Revision: https://reviews.llvm.org/D44160 llvm-svn: 337686
*	[MemorySSAUpdater] Update Phi operands after trivial Phi elimination	Alexandros Lamprineas	2018-07-23	1	-15/+13
	Bug fix for PR37445. The underlying problem and its fix are similar to PR37808. The bug lies in MemorySSAUpdater::getPreviousDefRecursive(), where PhiOps is computed before the call to tryRemoveTrivialPhi() and it ends up being out of date, pointing to stale data. We have now turned each of the PhiOps into a TrackingVH<MemoryAccess>. Differential Revision: https://reviews.llvm.org/D49425 llvm-svn: 337680
*	[Support] Add a UniqueStringSaver: like StringSaver, but deduplicating.	Sam McCall	2018-07-23	1	-0/+7
	Summary: Clarify contract of StringSaver (it null-terminates, callers rely on it). Reviewers: hokein Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49596 llvm-svn: 337677
*	[NFC][MCA] ZnVer1: Update RegisterFile to identify false dependencies on ↵	Roman Lebedev	2018-07-23	1	-1/+1
	partially written registers. Summary: Pretty mechanical follow-up for D49196. As microarchitecture.pdf notes, "20 AMD Ryzen pipeline", "20.8 Register renaming and out-of-order schedulers": The integer register file has 168 physical registers of 64 bits each. The floating point register file has 160 registers of 128 bits each. "20.14 Partial register access": The processor always keeps the different parts of an integer register together. ... An instruction that writes to part of a register will therefore have a false dependence on any previous write to the same register or any part of it. Reviewers: andreadb, courbet, RKSimon, craig.topper, GGanesh Reviewed By: GGanesh Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D49393 llvm-svn: 337676
*	[GVNHoist] safeToHoistLdSt allows illegal hoisting	Alexandros Lamprineas	2018-07-23	1	-1/+1
	Bug fix for PR36787. When reasoning if it's safe to hoist a load we want to make sure that the defining memory access dominates the new insertion point of the hoisted instruction. safeToHoistLdSt calls firstInBB(InsertionPoint,DefiningAccess) which returns false if InsertionPoint == DefiningAccess, and therefore it falsely thinks it's safe to hoist. Differential Revision: https://reviews.llvm.org/D49555 llvm-svn: 337674
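The boundary case described above can be modelled with a toy sketch. This is not the actual patch or the GVNHoist API: positions inside a basic block are represented as plain integers, and `comesBefore`, `hoistLooksSafeBuggy` and `hoistLooksSafeFixed` are hypothetical names. It only illustrates how a strict "comes before" test silently lets the equal case through when its negation is read as "the definition dominates the insertion point".

```cpp
#include <cassert>

// Toy model: an instruction's position in its basic block is an index.
// Mirrors a strict ordering test that returns false when A == B.
inline bool comesBefore(int A, int B) { return A < B; }

// Buggy reading: "the insertion point is not before the definition" is taken
// to mean the definition properly precedes it, so the equal case passes.
inline bool hoistLooksSafeBuggy(int InsertPt, int Def) {
  return !comesBefore(InsertPt, Def);
}

// Stricter reading (an assumption about the intent, not the real fix):
// also reject the case where both positions are the same instruction.
inline bool hoistLooksSafeFixed(int InsertPt, int Def) {
  return InsertPt != Def && !comesBefore(InsertPt, Def);
}
```

With both positions equal to 5, `hoistLooksSafeBuggy(5, 5)` returns true while `hoistLooksSafeFixed(5, 5)` returns false. Both agree whenever the two positions differ.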
*	[x86/SLH] Fix a bug where we would harden tail calls twice -- once as	Chandler Carruth	2018-07-23	1	-1/+5
	a call, and then again as a return. Also added a comment to try and explain better why we would be doing what we're doing when hardening the (non-call) returns. llvm-svn: 337673
*	[x86/SLH] Rename and comment the main hardening function. NFC.	Chandler Carruth	2018-07-23	1	-4/+21
	This provides an overview of the algorithm used to harden specific loads. It also brings our terminology further in line with hardening rather than checking. Differential Revision: https://reviews.llvm.org/D49583 llvm-svn: 337667
*	Test commit, fix a minor typo.	Jiading Gai	2018-07-22	1	-1/+1
	llvm-svn: 337657
*	[X86] Remove the max vector width restriction from combineLoopMAddPattern ↵	Craig Topper	2018-07-22	1	-7/+1
	and rely on splitOpsAndApply to handle splitting. This seems to be a net improvement. There's still an issue under avx512f where we have a 512-bit vpaddd, but not vpmaddwd so we end up doing two 256-bit vpmaddwds and inserting the results before a 512-bit vpaddd. It might be better to do two 512-bit paddds with zeros in the upper half. Same number of instructions, but breaks a dependency. llvm-svn: 337656
*	[ORE] Move loop invariant ORE checks outside the PM loop.	Xin Tong	2018-07-22	3	-17/+31
	Summary: This takes 22ms out of ~20s compiling sqlite3.c because we call it for every unit of compilation and every pass. Reviewers: paquette, anemet Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D49586 llvm-svn: 337654
*	[SelectionDAGBuilder] Use APInt::isZero instead of comparing ↵	Craig Topper	2018-07-22	1	-1/+1
	APInt::getZExtValue to 0 in a place where we can't be sure contents of the APInt fit in a uint64_t. This is used on an extract vector element index which in most cases is going to be an i32 or i64 and the element will be a valid element number. But it is possible to construct IR with a larger type and large out of range value. llvm-svn: 337652
*	[SelectionDAGBuilder] Restrict vector reduction check to types with a power ↵	Craig Topper	2018-07-22	1	-0/+4
	of 2 number of elements. The check for the shuffles usages probably isn't correct for non power of 2 vectors. llvm-svn: 337651
*	[mips] Factor out register class selection for global base register. NFC	Simon Atanasyan	2018-07-21	1	-18/+20
	Factor out register class selection for global base register into a separate function to avoid a long chain of ternary operators. llvm-svn: 337647
*	[mips] Move out the WrapperPat declaration from the NotInMicroMips predicate	Simon Atanasyan	2018-07-21	1	-5/+4
	This is a follow-up to rL335185. That commit adds some WrapperPat patterns for the microMIPS target. But the declaration of the WrapperPat class is under the NotInMicroMips predicate, and microMIPS patterns cannot be selected because the predicate (Subtarget->inMicroMipsMode()) && (!Subtarget->inMicroMipsMode()) is always false. This change moves the WrapperPat class declaration out from the NotInMicroMips predicate and enables microMIPS WrapperPat patterns. Differential revision: https://reviews.llvm.org/D49533 llvm-svn: 337646
*	Early exit with cheaper checks	Aditya Kumar	2018-07-21	1	-13/+12
	Reviewers: sebpop,davide,fhahn,trentxintong Differential Revision: https://reviews.llvm.org/D49617 llvm-svn: 337643
*	[InstrSimplify] fold sdiv if two operands are negated and non-overflow	Chen Zheng	2018-07-21	2	-9/+17
	Differential Revision: https://reviews.llvm.org/D49382 llvm-svn: 337642
*	[ORC] Re-apply r336760 with fixes.	Lang Hames	2018-07-21	3	-4/+38
	llvm-svn: 337637
*	Re-apply r337595 with fix for LLVM_ENABLE_THREADS=Off.	Lang Hames	2018-07-20	5	-281/+518
	llvm-svn: 337626
*	Change the cap on the amount of padding for each vtable to 32-byte ↵	Peter Collingbourne	2018-07-20	1	-4/+6
	(previously it was 128-byte) We tested different cap values with a recent commit of Chromium. Our results show that the 32-byte cap yields the smallest binary and all the caps yield similar performance. Based on the results, we propose to change the cap value to 32-byte. Patch by Zhaomo Yang! Differential Revision: https://reviews.llvm.org/D49405 llvm-svn: 337622
*	AMDGPU: Use existing function to check for VGPRs	Matt Arsenault	2018-07-20	1	-16/+7
	llvm-svn: 337621
*	Revert "[X86][AVX] Convert X86ISD::VBROADCAST demanded elts combine to use ↵	Benjamin Kramer	2018-07-20	2	-48/+17
	SimplifyDemandedVectorElts" This reverts commit r337547. It triggers an infinite loop. llvm-svn: 337617
*	[COFF] Use symbolic constants instead of hardcoded numbers. NFCI.	Martin Storsjo	2018-07-20	1	-1/+6
	Patch by Martell Malone. llvm-svn: 337614