bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[CGP] Split some critical edges coming out of indirect branches	Michael Kuperstein	2017-02-24	3	-13/+26
	Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in Python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit ~100 defs of registers containing constants in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about 100 constants to the stack. The end result is that a clang-compiled Python interpreter can be ~2.5x slower on a simple Python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296060
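The splitting trick described in this commit can be illustrated on a toy control-flow graph. This is a minimal sketch, not the CodeGenPrepare implementation: the dict-based CFG, the block names, the ".split" suffix, and the helpers `predecessors`, `is_critical` and `split_indirect_target` are all assumptions made for illustration.

```python
def predecessors(cfg, block):
    """Return the blocks that branch to `block`."""
    return [b for b, succs in cfg.items() if block in succs]


def is_critical(cfg, src, dst):
    """An edge is critical when src has several successors and dst several predecessors."""
    return len(cfg[src]) > 1 and len(predecessors(cfg, dst)) > 1


def split_indirect_target(cfg, indirect_src, target):
    """Simulate splitting the edge indirect_src -> target.

    The indirectbr itself is left untouched. `target` keeps its name (and so
    its address) but becomes a head block that only jumps to a new body
    block; every direct predecessor is retargeted to that body.
    """
    body = target + ".split"          # hypothetical naming scheme
    cfg[body] = cfg[target]           # the body inherits the old successors
    cfg[target] = [body]              # the head only jumps to the body
    for block in list(cfg):
        if block in (indirect_src, target):
            continue
        cfg[block] = [body if s == target else s for s in cfg[block]]
    return body


# Toy CFG: "dispatch" ends in an indirectbr, "entry" branches directly.
cfg = {
    "entry": ["dispatch", "op_a"],
    "dispatch": ["op_a", "op_b"],
    "op_a": ["exit"],
    "op_b": ["exit"],
    "exit": [],
}
was_critical = is_critical(cfg, "dispatch", "op_a")
body = split_indirect_target(cfg, "dispatch", "op_a")
still_critical = is_critical(cfg, "dispatch", "op_a")
```

After the call, "op_a" has the indirect branch as its only predecessor, so a constant needed only on that edge can live in "op_a" rather than in "dispatch". The remaining direct edge into "op_a.split" can be split by ordinary means.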
*	[NVPTX] Added support for .f16x2 instructions.	Artem Belevich	2017-02-23	4	-36/+1629
	This patch enables support for .f16x2 operations. Added new register type Float16x2. Added support for .f16x2 instructions. Added handling of vectorized loads/stores of v2f16 values. Differential Revision: https://reviews.llvm.org/D30057 Differential Revision: https://reviews.llvm.org/D30310 llvm-svn: 296032
*	ARM: make sure FastISel bails on f64 operations for Cortex-M4.	Tim Northover	2017-02-23	1	-0/+62
	FastISel wasn't checking the isFPOnlySP subtarget feature before emitting double-precision operations, so it got completely invalid CodeGen for doubles on Cortex-M4F. The normal ISel testing wasn't spectacular either so I added a second RUN line to improve that while I was in the area. llvm-svn: 296031
*	[Hexagon] Handle saturations in Hexagon bit tracker	Krzysztof Parzyszek	2017-02-23	1	-0/+57
	llvm-svn: 296026
*	Disable TLS for stack protector on Android API<17.	Evgeniy Stepanov	2017-02-23	1	-14/+26
	The TLS slot did not exist back then. llvm-svn: 296014
*	[GlobalISel] Emit opt remarks on isel fallbacks.	Ahmed Bougacha	2017-02-23	1	-2/+10
	Having more fine-grained information on the specific construct that caused us to fall back is valuable for large-scale data collection. We still have the fallback warning, which is also used for FastISel. We still need to remove the fallback warning, and teach FastISel to also emit remarks (it currently has a combination of the warning, stats, and debug prints: the remarks could unify all three). The abort-on-fallback path could also be better handled using remarks: one could imagine a "-Rpass-error", analogous to "-Werror", which would promote missed/failed remarks to errors. It's not clear whether that would be useful for other remarks though, so we're not there yet. llvm-svn: 296013
*	Correct register pressure calculation in presence of subregs	Stanislav Mekhanoshin	2017-02-23	2	-16/+83
	If a subreg is used in an instruction it counts as a whole superreg for the purpose of register pressure calculation. This patch corrects improper register pressure calculation by examining the operand's lane mask. Differential Revision: https://reviews.llvm.org/D29835 llvm-svn: 296009
*	[Hexagon] Avoid IMPLICIT_DEFs as new-value producers	Krzysztof Parzyszek	2017-02-23	1	-0/+79
	llvm-svn: 295997
*	AMDGPU/SI: Fix trunc i16 pattern	Jan Vesely	2017-02-23	1	-31/+60
	Hit on ASICs that support 16-bit instructions. Differential Revision: https://reviews.llvm.org/D30281 llvm-svn: 295990
*	[Hexagon] Patterns for CTPOP, BSWAP and BITREVERSE	Krzysztof Parzyszek	2017-02-23	2	-36/+135
	llvm-svn: 295981
*	[ARM] GlobalISel: Lower call returns	Diana Picus	2017-02-23	1	-20/+40
	Introduce a common ValueHandler for call returns and formal arguments, and inherit two different versions for handling the differences (at the moment the only difference is the way physical registers are marked as used). llvm-svn: 295973
*	[ARM] GlobalISel: Lower call parameters in regs	Diana Picus	2017-02-23	1	-0/+77
	Add support for lowering calls with parameters that can fit into regs. Use the same ValueHandler that we used for function returns, but rename it to match its new, extended purpose. llvm-svn: 295971
*	[X86][AVX] Disable VCVTSS2SD & VCVTSD2SS memory folding and fix the register ↵	Ayman Musa	2017-02-23	2	-23/+2
	class of their first input when creating node in fast-isel. (Quick fix to buildbot failure after rL295940 commit). llvm-svn: 295970
*	Fix assertion failure in ARMConstantIslandPass.	Kristof Beyls	2017-02-23	1	-9/+44
	The ARMConstantIslandPass didn't have support for handling accesses to constant island objects through ARM::t2LDRBpci instructions. This adds support for that. This fixes PR31997. llvm-svn: 295964
*	[X86][AVX512] Change VCVTSS2SD and VCVTSD2SS node types to keep consistency ↵	Ayman Musa	2017-02-23	1	-28/+28
	between VEX/EVEX versions. AVX versions of the converts work on f32/f64 types, while AVX512 versions work on vectors. Differential Revision: https://reviews.llvm.org/D29988 llvm-svn: 295940
*	AMDGPU: Add another BFE pattern	Matt Arsenault	2017-02-23	1	-0/+163
	This is the pattern that falls out of the instruction's definition if offset == 0. llvm-svn: 295912
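For reference, an unsigned bit-field extract is commonly defined as a shift followed by a mask. The following is a minimal sketch assuming that definition with 32-bit operands. The helper names `bfe_u32` and `bfe_u32_offset0` are hypothetical, and this is not the TableGen pattern added by the commit. It only shows that with offset == 0 the shift drops out.

```python
MASK32 = 0xFFFFFFFF


def bfe_u32(src, offset, width):
    """Extract `width` bits of `src` starting at bit `offset` (unsigned, 32-bit)."""
    return ((src & MASK32) >> offset) & ((1 << width) - 1)


def bfe_u32_offset0(src, width):
    """Special case offset == 0: the shift is a no-op, leaving a plain mask."""
    return src & MASK32 & ((1 << width) - 1)
```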
*	AMDGPU: Use clamp with f64	Matt Arsenault	2017-02-22	2	-6/+18
	llvm-svn: 295908
*	AMDGPU: Fold FP clamp as modifier bit	Matt Arsenault	2017-02-22	3	-9/+331
	The manual is unclear on the details of this. It's not clear to me if denormals are not allowed with clamp, or if that is only omod. Not allowing denorms for fp16 or fp64 isn't useful so I also question if that is really a restriction. Same with whether this is valid without IEEE mode enabled. llvm-svn: 295905
*	AMDGPU : Update TrapCode based on Trap Handler ABI.	Wei Ding	2017-02-22	1	-2/+2
	Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295904
*	AMDGPU: Add replacement bfe intrinsics	Matt Arsenault	2017-02-22	2	-0/+1036
	llvm-svn: 295899
*	[AVR] Disable integrated assembler for a few tests	Dylan McKay	2017-02-22	3	-3/+3
	Fixes the build. llvm-svn: 295895
*	[Hexagon] Implement @llvm.readcyclecounter()	Krzysztof Parzyszek	2017-02-22	1	-0/+10
	llvm-svn: 295892
*	AMDGPU: Don't add emergency stack slot if all spills are SGPR->VGPR	Matt Arsenault	2017-02-22	1	-3/+2
	This should avoid reporting that any stack needs to be allocated in the case where no stack is truly used. An unused stack slot is still left around in other cases where there are real stack objects but no spilling occurs. llvm-svn: 295891
*	[Hexagon] Add intrinsics for masked vector stores	Krzysztof Parzyszek	2017-02-22	2	-0/+82
	Patch by Harsha Jagasia. llvm-svn: 295879
*	AMDGPU: Don't look at chain users when adjusting writemask	Matt Arsenault	2017-02-22	1	-0/+86
	Fixes the writemask not being adjusted when using the new intrinsics with chains. llvm-svn: 295878
*	AMDGPU: Always allocate emergency stack slot at offset 0	Matt Arsenault	2017-02-22	18	-169/+203
	This allows us to ensure that 0 is never a valid pointer to a user object, and ensures that the offset is always legal without needing a register to access it. This comes at the cost of usable offsets and wasted stack space. llvm-svn: 295877
*	AMDGPU: Change exp with compr bit printing	Matt Arsenault	2017-02-22	1	-26/+44
	llvm-svn: 295873
*	Revert "AMDGPU : Update TrapCode based on Trap Handler ABI."	Wei Ding	2017-02-22	1	-2/+2
	This reverts commit r295867. llvm-svn: 295871
*	AMDGPU : Update TrapCode based on Trap Handler ABI.	Wei Ding	2017-02-22	1	-2/+2
	Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295867
*	Bring back 2>&1 redirection for this test	Matthias Braun	2017-02-22	1	-1/+1
	llvm-svn: 295864
*	[AArch64] Extend AArch64RedundantCopyElimination to do simple copy propagation.	Geoff Berry	2017-02-22	1	-0/+295
	Summary: Extend AArch64RedundantCopyElimination to catch cases where the register that is known to be zero is COPY'd in the predecessor block. Before this change, this pass would catch cases like:
	    CBZW %W0, <BB#1>
	  BB#1:
	    %W0 = COPY %WZR // removed
	After this change, cases like the one below are also caught:
	    %W0 = COPY %W1
	    CBZW %W1, <BB#1>
	  BB#1:
	    %W0 = COPY %WZR // removed
	This change results in a 4% increase in static copies removed by this pass when compiling the llvm test-suite. It also fixes regressions caused by doing post-RA copy propagation (a separate change to be put up for review shortly).
	Reviewers: junbuml, mcrosier, t.p.northover, qcolombet, MatzeB Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D30113 llvm-svn: 295863
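The extended check can be modelled on a toy instruction list. This is a rough sketch under stated assumptions, not the pass itself: instructions are plain tuples, the successor is assumed to have the CBZ block as its only predecessor, and the helper names `known_zero_regs` and `remove_redundant_zero_copies` are made up for illustration. The backward scan is deliberately conservative and only follows copies whose source is the register tested by the CBZ.

```python
ZERO_REG = "WZR"


def known_zero_regs(pred_insts):
    """Registers known to be zero on the CBZ-taken edge out of the predecessor.

    `pred_insts` is a list of tuples ending with ("CBZ", reg); every other
    instruction has the form (opcode, dst, *srcs).
    """
    opcode, cond_reg = pred_insts[-1]
    assert opcode == "CBZ"
    known = {cond_reg}
    clobbered = set()  # registers redefined between the scan point and the CBZ
    for inst in reversed(pred_insts[:-1]):
        dst = inst[1]
        if (inst[0] == "COPY" and inst[2] in known
                and dst not in clobbered and inst[2] not in clobbered):
            known.add(dst)  # dst still holds the value that the CBZ tested
        clobbered.add(dst)
    return known


def remove_redundant_zero_copies(pred_insts, succ_insts):
    """Drop `dst = COPY WZR` in the successor while dst is still known to be zero."""
    known = known_zero_regs(pred_insts)
    kept = []
    for inst in succ_insts:
        dst = inst[1]
        if inst == ("COPY", dst, ZERO_REG) and dst in known:
            continue  # redundant: dst already holds zero
        known.discard(dst)  # any other definition invalidates the knowledge
        kept.append(inst)
    return kept
```

With the predecessor `[("COPY", "W0", "W1"), ("CBZ", "W1")]`, both W0 and W1 are known to be zero in the successor, so a leading `("COPY", "W0", "WZR")` there is dropped.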
*	MIRTests: Remove unnecessary 2>&1 redirection	Matthias Braun	2017-02-22	69	-69/+69
	llc mir output goes to stdout nowadays, so the 2>&1 is not necessary anymore for most tests. llvm-svn: 295859
*	[WebAssembly] Configure codegen to legalize f16 values.	Dan Gohman	2017-02-22	1	-0/+28
	llvm-svn: 295850
*	[DAGCombiner] revert r295336	Bill Seurer	2017-02-22	8	-55/+162
	r295336 causes a bootstrapped clang to fail for many compilations on powerpc BE. See http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315 for example. Reverting as per the developer's request. llvm-svn: 295849
*	[X86][GlobalISel] Initial implementation , select G_ADD gpr, gpr	Igor Breger	2017-02-22	3	-0/+206
	Summary: Initial implementation for X86InstructionSelector. Handle selection of COPY and G_ADD/G_SUB gpr, gpr. Reviewers: qcolombet, rovka, zvi, ab Reviewed By: rovka Subscribers: mgorny, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D29816 llvm-svn: 295824
*	[X86] Regenerate CSE test with codegen instead of just the instruction count	Simon Pilgrim	2017-02-22	1	-2/+37
	llvm-svn: 295819
*	[ARM] Fix constant islands pass.	Roger Ferrer Ibanez	2017-02-22	1	-0/+1052
	The pass tries to fix a spill of LR that turns out to be unnecessary. So it removes the tPOP but forgets to remove the tPUSH. This causes the stack to be misaligned upon returning from the function. Thus, remove the tPUSH as well in this case. Differential Revision: https://reviews.llvm.org/D30207 llvm-svn: 295816
*	[ARM] Classification Improvements to ARM Sched-Models. NFCI.	Javed Absar	2017-02-22	1	-0/+175
	This patch adds missing sched classes for Thumb2 instructions. These have been missing so far, and as a consequence, machine scheduler models for individual sub-targets have tended to be larger than they needed to be. These patches should help write schedulers better and faster in the future for ARM sub-targets. Reviewer: Diana Picus Differential Revision: https://reviews.llvm.org/D29953 llvm-svn: 295811
*	[AVX-512] Allow legacy scalar min/max intrinsics to select EVEX instructions ↵	Craig Topper	2017-02-22	2	-16/+36
	when available. This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to these nodes, as do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION. I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others. Differential revision: https://reviews.llvm.org/D30186 llvm-svn: 295810
*	AMDGPU: Add cvt.pkrtz intrinsic	Matt Arsenault	2017-02-22	8	-46/+196
	Convert llvm.SI.packf16 test uses llvm-svn: 295797
*	AMDGPU: Remove some uses of llvm.SI.export in tests	Matt Arsenault	2017-02-22	32	-1039/+921
	Merge some of the old, smaller tests into more complete versions. llvm-svn: 295792
*	AMDGPU: Remove llvm.AMDGPU.clamp intrinsic	Matt Arsenault	2017-02-21	9	-812/+776
	llvm-svn: 295789
*	AMDGPU: Redefine clamp node as clamp 0.0-1.0	Matt Arsenault	2017-02-21	3	-4/+550
	Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use them whether or not denormals are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. llvm-svn: 295788
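As a numeric illustration of why a max/min formulation is convenient here, the sketch below clamps to [0.0, 1.0] with plain max and min. It models only the arithmetic on host doubles, not the AMDGPU lowering or its NaN behaviour, and the helper name `clamp01` is hypothetical.

```python
def clamp01(x):
    """Clamp x to [0.0, 1.0] using max/min, which pass denormals through unchanged."""
    return min(max(x, 0.0), 1.0)


# 5e-324 is the smallest positive subnormal double; it survives the clamp.
tiny = clamp01(5e-324)
```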
*	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return ↵	Artem Belevich	2017-02-21	8	-36/+964
	values. Original code only used vector loads/stores for explicit vector arguments. It could also do more loads/stores than necessary (e.g. v5f32 would touch 8 f32 values). Aggregate types were loaded one element at a time, even the vectors contained within. This change attempts to generalize (and simplify) parameter space loads/stores so that vector loads/stores can be used more broadly. Functionality of the patch has been verified by compiling the thrust test suite and manually checking the differences between PTX generated by llvm with and without the patch. General algorithm:
	  * ComputePTXValueVTs() flattens input/output argument into a flat list of scalars to load/store and returns their types and offsets.
	  * VectorizePTXValueVTs() uses that data to create a vectorization plan which returns an array of flags marking boundaries of vectorized load/stores. Scalars are represented as 1-element vectors.
	  * Code that generates loads/stores implements a simple state machine that constructs a vector according to the plan.
	Differential Revision: https://reviews.llvm.org/D30011 llvm-svn: 295784
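The planning step in the algorithm above can be sketched as a greedy pass over the flattened scalars. This is an illustrative model only: the flag names ("scalar", "first", "inner", "last"), the 16-byte vector limit, the alignment rule, and the helpers `_can_vectorize` and `vectorize_plan` are assumptions, not the names or rules used in the NVPTX backend.

```python
def _can_vectorize(sizes, offsets, start, width, max_vector_bytes):
    """True if `width` scalars from `start` are same-sized, contiguous and aligned."""
    if start + width > len(sizes):
        return False
    size = sizes[start]
    vector_bytes = size * width
    if vector_bytes > max_vector_bytes or offsets[start] % vector_bytes != 0:
        return False
    return all(
        sizes[start + k] == size and offsets[start + k] == offsets[start] + k * size
        for k in range(width)
    )


def vectorize_plan(sizes, offsets, max_vector_bytes=16):
    """Mark vector boundaries over a flat list of scalars.

    sizes[i] and offsets[i] give the byte size and byte offset of scalar i.
    Returns one flag per scalar: "scalar", "first", "inner" or "last".
    """
    n = len(sizes)
    plan = ["scalar"] * n
    i = 0
    while i < n:
        for width in (4, 2):  # try the widest vector first
            if _can_vectorize(sizes, offsets, i, width, max_vector_bytes):
                plan[i] = "first"
                for j in range(i + 1, i + width - 1):
                    plan[j] = "inner"
                plan[i + width - 1] = "last"
                i += width
                break
        else:
            i += 1  # no vector fits here, keep it as a scalar access
    return plan
```

For a v5f32 argument (five 4-byte scalars at offsets 0, 4, 8, 12, 16) this yields one 4-element vector followed by a single scalar, so five values are touched rather than eight. A consumer would then walk the flags like a small state machine, starting a vector at "first" and emitting it at "last".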
*	[AArch64] Add test case for fusion of literal generation	Evandro Menezes	2017-02-21	1	-0/+46
	Add test case from https://reviews.llvm.org/D28698 that was somehow lost in transit. llvm-svn: 295775
*	[AArch64] Add test case for fusion of AES crypto operations	Evandro Menezes	2017-02-21	1	-0/+207
	Add test case from https://reviews.llvm.org/D28491 that was somehow lost in transit. llvm-svn: 295774
*	Fix PR31896.	Evgeniy Stepanov	2017-02-21	1	-0/+16
	The address of an alias of a global with an offset is incorrectly lowered as the address of the global (i.e. ignoring the offset). llvm-svn: 295762
*	AMDGPU: Remove dead declarations in tests	Matt Arsenault	2017-02-21	2	-8/+0
	llvm-svn: 295757
*	AMDGPU: Remove dead declarations from MIR tests	Matt Arsenault	2017-02-21	3	-48/+5
	llvm-svn: 295755
*	AMDGPU: Remove llvm.AMDGPU.flbit intrinsic	Matt Arsenault	2017-02-21	1	-25/+0
	llvm-svn: 295754