bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86][AVX512] Add support for variable ASHR v2i64/v4i64 support without VLX	Simon Pilgrim	2017-01-13	2	-28/+24
\| \| \| \| \| \| \| \| \| \|	Use v8i64 variable ASHR instructions if we don't have VLX. This is a reduced version of D28537 that just adds support for variable shifts - I'll continue with that patch (for just constant/uniform shifts) once I've fixed the type legalization issue in avx512-cvt.ll. Differential Revision: https://reviews.llvm.org/D28604 llvm-svn: 291901
*	[ARM] Enable objdump to construct triple for ARM	Sam Parker	2017-01-13	8	-0/+74
\| \| \| \| \| \| \| \| \| \| \| \| \|	Now that the ARMAttributeParser has been moved into the library, it has been modified so that it can parse the attributes without printing them and stores them in a map. ELFObjectFile now queries the attributes to fill out the architecture details of a provided triple for 'arm' and 'thumb' targets. llvm-objdump uses this new functionality. Differential Revision: https://reviews.llvm.org/D28281 llvm-svn: 291898
*	[X86][AVX512] Adding missing shuffle lowering to blend mask instructions	Michael Zuckerman	2017-01-13	10	-97/+279
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some shuffles can be lowered to blend mask instructions (VPBLENDMB/VPBLENDMW/VPBLENDMD/VPBLENDMQ). In this patch, I added a new pattern match for this case. Reviewers: 1. craig.topper 2. guyblank 3. RKSimon 4. igorb Differential Revision: https://reviews.llvm.org/D28483 llvm-svn: 291888
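A shuffle can become a blend-with-mask only when every output lane i takes element i from one of the two inputs, never from a different position. The sketch below is a hypothetical helper, not the X86ISelLowering code from this patch. It assumes the LLVM shuffle-mask convention (0..N-1 selects from the first input, N..2N-1 from the second, -1 is undef) and builds the per-lane mask bits that a VPBLENDM* instruction would consume.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if `mask` is a pure per-lane blend of
// inputs A and B. On success, bit i of blendMask is set when lane i is
// taken from B. Undef lanes (-1) default to A (bit left clear).
bool isBlendMask(const std::vector<int>& mask, uint64_t& blendMask) {
  const int n = static_cast<int>(mask.size());
  if (n > 64) return false;  // need one mask bit per lane
  blendMask = 0;
  for (int i = 0; i < n; ++i) {
    const int m = mask[i];
    if (m < 0 || m == i) continue;      // undef, or A[i] stays in place
    if (m == i + n) {                   // B[i] stays in place
      blendMask |= uint64_t{1} << i;
      continue;
    }
    return false;                       // element moves lanes: not a blend
  }
  return true;
}
```

For example, the 4-lane mask `{0, 5, 2, 7}` takes lanes 1 and 3 from B, so the blend mask is `0b1010`, whereas `{1, 0, 2, 3}` swaps lanes and is rejected.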
*	Track validity of pass results	Serge Pavlov	2017-01-13	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Running tests with expensive checks enabled exhibits some problems with verification of pass results. First, the pass verification may require results of analysis that are not available. For instance, verification of loop info requires results of dominator tree analysis. A pass may be marked as conserving loop info but does not need to be dependent on DominatorTreePass. When a pass manager tries to verify that loop info is valid, it needs dominator tree, but corresponding analysis may be already destroyed as no user of it remained. Another case is a pass that is skipped. For instance, entities with linkage available_externally do not need code generation and such passes are skipped for them. In this case result verification must also be skipped. To solve these problems this change introduces a special flag to the Pass structure to mark passes that have valid results. If this flag is reset, verifications dependent on the pass result are skipped. Differential Revision: https://reviews.llvm.org/D27190 llvm-svn: 291882
*	Move test of lazy BFI with ORE to a generic directory	Adam Nemet	2017-01-13	1	-0/+0
\| \| \| \|	llvm-svn: 291862
*	[asan] Don't overalign global metadata.	Evgeniy Stepanov	2017-01-12	1	-1/+1
\| \| \| \| \| \| \| \| \|	Other than on COFF with incremental linking, global metadata should not need any extra alignment. Differential Revision: https://reviews.llvm.org/D28628 llvm-svn: 291859
*	[asan] Refactor instrumentation of globals.	Evgeniy Stepanov	2017-01-12	1	-3/+2
\| \| \| \|	llvm-svn: 291858
*	[ThinLTO] Import static functions from the same module as caller	Teresa Johnson	2017-01-12	1	-5/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: We can sometimes end up with multiple copies of a local function that have the same GUID in the index. This happens when there are local functions with the same name that are in different source files with the same name (but in different directories), and they were compiled in their own directory so had the same path at compile time. In this case make sure we import the copy in the caller's module. While it isn't a correctness problem (the renamed reference which is based on the module IR hash will be unique since the module must have had an externally visible function that was imported), importing the wrong copy will result in lost performance opportunity since it won't be referenced and inlined. Reviewers: mehdi_amini Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28440 llvm-svn: 291841
*	[DebugInfo] Handle same locations in DILocation::getMergedLocation	Robert Lougher	2017-01-12	1	-0/+40
\| \| \| \| \| \| \| \| \| \| \|	Revision 289661 introduced the function DILocation::getMergedLocation for merging of debug locations. At the time it was simply a stub which always returned no location. This patch modifies getMergedLocation to handle the case where the two locations are the same or can't be discriminated. Differential Revision: https://reviews.llvm.org/D28521 llvm-svn: 291809
*	[X86] Replace AND+IMM64 with SRL/SHL	Nikolai Bozhenov	2017-01-12	3	-10/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Emit SHRQ/SHLQ instead of ANDQ with a 64 bit constant mask if the result is unused and the mask has only higher/lower bits set. For example, with this patch LLVM emits `shrq $41, %rdi; je` instead of `movabsq $0xFFFFFE0000000000, %rcx; testq %rcx, %rdi; je`. This reduces the number of instructions, code size and register pressure. The transformation is applied only for cases where the mask cannot be encoded as an immediate value within TESTQ instruction. Differential Revision: https://reviews.llvm.org/D28198 llvm-svn: 291806
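The rewrite is sound because, when only the flags matter, testing against a contiguous high-bit mask is the same as checking whether anything survives a right shift, and a contiguous low-bit mask likewise matches a left shift. The sketch below checks that equivalence in plain C++ for the mask from the example (bits 41..63) and for a hypothetical 40-bit low mask. It illustrates the arithmetic only; the actual change is an X86 DAG combine.

```cpp
#include <cassert>
#include <cstdint>

// Original form: materialize a 64-bit mask (movabsq) and test against it.
bool highBitsZeroViaMask(uint64_t x) {
  const uint64_t kMask = 0xFFFFFE0000000000ULL;  // bits 41..63 set
  return (x & kMask) == 0;
}

// Rewritten form: shift out the 41 low bits (shrq $41) and test the rest.
bool highBitsZeroViaShift(uint64_t x) { return (x >> 41) == 0; }

// Low-bit variant (hypothetical mask): only bits 0..39 set, which does not
// fit in a sign-extended 32-bit TESTQ immediate.
bool lowBitsZeroViaMask(uint64_t x) {
  return (x & 0x000000FFFFFFFFFFULL) == 0;
}

// Rewritten form: shift the 40 low bits to the top (shlq $24) and test.
bool lowBitsZeroViaShift(uint64_t x) { return (x << 24) == 0; }
```

Both pairs agree for every input, so a shift plus the zero flag can replace the `movabsq`/`testq` pair without needing a scratch register.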
*	[X86] Modify BypassSlowDivision tests to match their new names (NFC)	Nikolai Bozhenov	2017-01-12	3	-102/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- bypass-slow-division-32.ll: tests verifying correctness of divl-to-divb bypassing - bypass-slow-division-64.ll: tests verifying correctness of divq-to-divl bypassing - bypass-slow-division-tune.ll: tests verifying that bypassing is enabled only when appropriate Differential Revision: https://reviews.llvm.org/D28551 llvm-svn: 291804
*	[X86] Rename tests for bypassing slow division (NFC)	Nikolai Bozhenov	2017-01-12	3	-0/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For tests on bypassing slow division there's no need to be Atom-specific. The patch renames all tests on division bypassing and makes their names more consistent: atom-bypass-slow-division.ll -> bypass-slow-division-32.ll (tests verifying correctness of divl-to-divb bypassing) atom-bypass-slow-division-64.ll -> bypass-slow-division-64.ll (tests verifying correctness of divq-to-divl bypassing) slow-div.ll -> bypass-slow-division-tune.ll (tests verifying that bypassing is enabled only when appropriate) Differential Revision: https://reviews.llvm.org/D28197 llvm-svn: 291802
*	[X86] Tune bypassing of slow division for Intel CPUs	Nikolai Bozhenov	2017-01-12	2	-15/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	64-bit integer division in Intel CPUs is extremely slow, much slower than 32-bit division. On the other hand, 8-bit and 16-bit divisions aren't any faster. The only important exception is Atom where DIV8 is the fastest. Because of that, the patch 1) Enables bypassing of 64-bit division for Atom, Silvermont and all big cores. 2) Modifies 64-bit bypassing to use 32-bit division instead of 16-bit one. This doesn't make the shorter division slower but increases the chances of taking it. Moreover, it's much more likely to prove at compile-time that a value fits in 32 bits and doesn't require a run-time check (e.g. zext i32 to i64). Differential Revision: https://reviews.llvm.org/D28196 llvm-svn: 291800
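In source terms, bypassing means guarding the 64-bit divide with a cheap run-time check and taking a 32-bit divide when both operands fit in 32 bits. The sketch below assumes an unsigned quotient and an OR-then-shift test for the high halves. It is a hand-written C++ illustration of the idea, not the IR that the BypassSlowDivision utility emits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical source-level model of 64-to-32-bit division bypassing.
uint64_t udivWithBypass(uint64_t a, uint64_t b) {
  if (((a | b) >> 32) == 0) {
    // Fast path: neither operand has high bits set, so a 32-bit DIV
    // produces exactly the same quotient.
    return static_cast<uint32_t>(a) / static_cast<uint32_t>(b);
  }
  return a / b;  // slow path: full 64-bit DIV
}
```

When one operand is known at compile time to be a `zext` from i32, half of this check is already proven, which is why 32-bit bypassing is taken more often than the old 16-bit one.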
*	[X86] Update LLC tests for slow division bypassing (NFC)	Nikolai Bozhenov	2017-01-12	2	-74/+232
\| \| \| \| \| \| \| \| \| \| \| \|	Run update_llc_test_checks.py on CodeGen/X86/atom-bypass-slow-division.ll CodeGen/X86/atom-bypass-slow-division-64.ll CodeGen/X86/slow-div.ll Differential Revision: https://reviews.llvm.org/D28469 llvm-svn: 291799
*	AMDGPU: Skip fneg/select combine if it can fold into other	Matt Arsenault	2017-01-12	2	-0/+159
\| \| \| \|	llvm-svn: 291792
*	AMDGPU: Fold free fneg into sin	Matt Arsenault	2017-01-12	1	-0/+42
\| \| \| \|	llvm-svn: 291790
*	AMDGPU: Fold fneg into fmul_legacy	Matt Arsenault	2017-01-12	1	-0/+178
\| \| \| \|	llvm-svn: 291784
*	AMDGPU: Fold fneg into rcp	Matt Arsenault	2017-01-12	1	-0/+100
\| \| \| \|	llvm-svn: 291779
*	AMDGPU: Fold fneg into fp_round	Matt Arsenault	2017-01-12	1	-0/+172
\| \| \| \|	llvm-svn: 291778
*	AMDGPU: Fold fneg into fp_extend	Matt Arsenault	2017-01-12	1	-0/+126
\| \| \| \|	llvm-svn: 291777
*	[Devirtualization] MemDep returns non-local !invariant.group dependencies	Piotr Padlewski	2017-01-12	4	-10/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Memory Dependence Analysis was limited to return only local dependencies for invariant.group handling. Now it returns NonLocal when it finds one, and asking getNonLocalPointerDependency then returns the found dependency. Thanks to this we are able to devirtualize loops! void indirect(A &a, int n) { for (int i = 0; i < n; i++) a.foo(); } void test(int n) { A a; indirect(a, n); } After inlining, a.foo() will be changed to a direct call, even if foo and A::A() are external (but only if the vtable definition is available). Reviewers: nlewycky, dberlin, chandlerc, rsmith Subscribers: mehdi_amini, davide, llvm-commits Differential Revision: https://reviews.llvm.org/D28137 llvm-svn: 291762
*	[XRay] Implement the `llvm-xray account` subcommand	Dean Michael Berris	2017-01-12	5	-0/+169
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is the third of a multi-part change to implement subcommands for the `llvm-xray` tool. Here we define the `account` subcommand which does simple function call accounting, generating basic statistics on function calls we find in an XRay log/trace. We support text output and csv output for this subcommand. This change also supports sorting, summing, and filtering the top N results. Part of this tool will later be turned into a library that could be used for basic function call accounting. Depends on D24376. Reviewers: dblaikie, echristo Subscribers: mehdi_amini, dberris, beanz, llvm-commits Differential Revision: https://reviews.llvm.org/D24377 llvm-svn: 291749
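The core of call accounting is a per-function aggregation over call durations, followed by an optional sort and top-N cut. The sketch below assumes the trace has already been reduced to hypothetical `CallRecord` entries (function id plus duration). It computes count, min, max and sum only and does not reproduce the tool's actual data structures or output format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <utility>
#include <vector>

// Hypothetical input: one record per completed call.
struct CallRecord {
  int32_t funcId;
  uint64_t duration;  // e.g. exit TSC minus entry TSC
};

struct FuncStats {
  uint64_t count = 0;
  uint64_t min = std::numeric_limits<uint64_t>::max();
  uint64_t max = 0;
  uint64_t sum = 0;
};

// Aggregate basic statistics per function id.
std::map<int32_t, FuncStats> account(const std::vector<CallRecord>& calls) {
  std::map<int32_t, FuncStats> stats;
  for (const CallRecord& c : calls) {
    FuncStats& s = stats[c.funcId];
    ++s.count;
    s.min = std::min(s.min, c.duration);
    s.max = std::max(s.max, c.duration);
    s.sum += c.duration;
  }
  return stats;
}

using Entry = std::pair<int32_t, FuncStats>;

// Sort by total time (descending) and keep only the top n entries.
std::vector<Entry> topNBySum(const std::map<int32_t, FuncStats>& stats,
                             std::size_t n) {
  std::vector<Entry> v(stats.begin(), stats.end());
  std::sort(v.begin(), v.end(), [](const Entry& a, const Entry& b) {
    return a.second.sum > b.second.sum;
  });
  if (v.size() > n) v.erase(v.begin() + static_cast<std::ptrdiff_t>(n), v.end());
  return v;
}
```

Keeping aggregation separate from sorting and filtering mirrors the commit's note that part of the tool may later become a reusable library.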
*	[AVX-512] Improve lowering of zero_extend of v4i1 to v4i32 and v2i1 to v2i64 ↵	Craig Topper	2017-01-12	1	-153/+42
\| \| \| \| \| \|	with VLX, but no DQ or BW support. llvm-svn: 291747
*	[AVX-512] Improve lowering of sign_extend of v4i1 to v4i32 and v2i1 to v2i64 ↵	Craig Topper	2017-01-12	1	-143/+60
\| \| \| \| \| \|	when avx512vl is available, but not avx512dq. llvm-svn: 291746
*	[X86][AVX512] Fix PR31515 - Do not flip vselect condition if it's not a vXi1 ↵	Elad Cohen	2017-01-12	1	-0/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	mask r289653 added a case where `vselect <cond> <vector1> <all-zeros>` is transformed to `vselect xor(cond, DAG.getConstant(1, DL, CondVT)) <all-zeros> <vector1>`. This was not meant to catch cases where Cond is not a vXi1 mask, but it does. Moreover, when the Cond type is vXiN (N > 1), xor(cond, DAG.getConstant(1, DL, CondVT)) != NOT(cond). This patch changes the above to an xor with all-ones, and avoids entering the case for non-mask Conds. llvm-svn: 291745
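The inequality is easiest to see per lane. In a vXi32 condition produced by a vector compare, each lane is either 0 or all-ones, so only an xor with all-ones swaps the two states. An xor with 1 flips just the low bit. The scalar model below uses one hypothetical uint32_t lane to show both.

```cpp
#include <cassert>
#include <cstdint>

// One lane of a vXi32 compare result is either 0 or 0xFFFFFFFF.
uint32_t flipWithOne(uint32_t lane) { return lane ^ 1u; }              // old combine
uint32_t flipWithAllOnes(uint32_t lane) { return lane ^ 0xFFFFFFFFu; }  // true NOT
```

`flipWithOne(0xFFFFFFFF)` yields `0xFFFFFFFE`, which is still non-zero, and `flipWithOne(0)` yields `1`, which is not a valid all-ones lane. Only for vXi1 masks, where a lane is a single bit, do the two forms coincide.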
*	[AVX-512] Add more varied avx512 feature command lines to the avx512-cvt.ll ↵	Craig Topper	2017-01-12	1	-590/+954
\| \| \| \| \| \| \| \|	test to show some poor codegen examples. We're definitely doing bad things when avx512vl is enabled without avx512dq. It looks like avx512vl/dq without avx512bw may also have some issues. llvm-svn: 291744
*	Make a test actually test what it set out to test.	Chandler Carruth	2017-01-12	1	-11/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This test seems to have largely been relying on asserts being tripped. It had a very specific and somewhat uninteresting grep of the output, but it never really did anything to cause SCEV to be preserved across loop simplify, certainly not explicitly. And a later addition to it actually added CHECK lines despite the test never running FileCheck. Now we actually print SCEV before and after loop simplify to make sure it is changing and being updated, which seems much more likely to be the point of the test. llvm-svn: 291740
*	AMDGPU: Fold fneg into fma or fmad	Matt Arsenault	2017-01-12	1	-0/+308
\| \| \| \| \| \|	Patch mostly by Fiona Glaser llvm-svn: 291733
*	AMDGPU: Fold fneg into fmul	Matt Arsenault	2017-01-12	3	-12/+189
\| \| \| \| \| \|	Patch mostly by Fiona Glaser llvm-svn: 291732
*	AMDGPU: Fold fneg into fadd	Matt Arsenault	2017-01-12	1	-0/+179
\| \| \| \| \| \|	Patch mostly by Fiona Glaser llvm-svn: 291731
*	AMDGPU: Pull fneg/fabs out of a select	Matt Arsenault	2017-01-11	2	-2/+729
\| \| \| \| \| \|	Allows better source modifier usage. llvm-svn: 291729
*	AMDGPU: Fix shrinking of addc/subb.	Matt Arsenault	2017-01-11	1	-0/+292
\| \| \| \| \| \|	To shrink to VOP2 the input carry must also be VCC. llvm-svn: 291720
*	AMDGPU: Fix sext_inreg for i1 in i16	Matt Arsenault	2017-01-11	1	-0/+133
\| \| \| \| \| \| \| \|	This produces worse code when i16 is legal, mostly due to combines getting confused by conversions inserted for uniform 16-bit operations. llvm-svn: 291717
*	AMDGPU: Fix breaking VOP3 v_add_i32s	Matt Arsenault	2017-01-11	1	-0/+305
\| \| \| \| \| \| \|	This was shrinking the instruction even though the carry output register was a virtual register, not known VCC. llvm-svn: 291716
*	[asan] Set alignment of __asan_global_* globals to sizeof(GlobalStruct)	Kuba Mracek	2017-01-11	1	-1/+1
\| \| \| \| \| \| \| \|	When using profiling and ASan together (-fprofile-instr-generate -fcoverage-mapping -fsanitize=address), at least on Darwin, the section of globals that ASan emits (__asan_globals) is misaligned and starts at an odd offset. This really doesn't have anything to do with profiling, but it triggers the issue because profiling emits a string section, which can have arbitrary size. This patch changes the alignment to sizeof(GlobalStruct). Differential Revision: https://reviews.llvm.org/D28573 llvm-svn: 291715
*	AMDGPU: Fix folding immediates into mac src2	Matt Arsenault	2017-01-11	1	-0/+66
\| \| \| \| \| \| \|	Whether it is legal or not needs to check for the instruction it will be replaced with. llvm-svn: 291711
*	Add test that verifies we don't peel loops in optsize functions. NFC.	Michael Kuperstein	2017-01-11	1	-0/+39
\| \| \| \|	llvm-svn: 291708
*	LowerTypeTests: Represent the memory region size with the constant size-1.	Peter Collingbourne	2017-01-11	4	-6/+6
\| \| \| \| \| \| \| \| \|	This means that we can use a shorter instruction sequence in the case where the size is a power of two and on the boundary between two representations. Differential Revision: https://reviews.llvm.org/D28421 llvm-svn: 291706
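Storing size-1 turns an exclusive bound check (`offset < size`) into an inclusive one (`offset <= sizeM1`) that accepts exactly the same addresses. The constant itself can get smaller encodings at power-of-two boundaries. For example, on x86 a sign-extended 8-bit immediate can hold 127 but not 128. The sketch below is a simplified unsigned range check for illustration, not the instruction sequence LowerTypeTests actually emits.

```cpp
#include <cassert>
#include <cstdint>

// Range check against the exclusive size. Addresses below `base` wrap
// around to huge unsigned offsets and are rejected.
bool inRegionSize(uint64_t addr, uint64_t base, uint64_t size) {
  return addr - base < size;
}

// Equivalent check against size - 1 (inclusive upper bound).
bool inRegionSizeM1(uint64_t addr, uint64_t base, uint64_t sizeM1) {
  return addr - base <= sizeM1;
}
```

For a 128-byte region, the first form compares against 128 and the second against 127, and both accept the same set of addresses.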
*	[SCEV] Make howFarToZero max backedge-taken count check for precondition.	Eli Friedman	2017-01-11	1	-4/+2
\| \| \| \| \| \| \| \| \|	Refines max backedge-taken count if a loop like "for (int i = 0; i != n; ++i) { /* body */ }" is rotated. Differential Revision: https://reviews.llvm.org/D28536 llvm-svn: 291704
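Rotation turns `for (i = 0; i != n; ++i)` into a guarded `if (n != 0) do { ++i; } while (i != n);`, whose backedge is taken n-1 times. The bound then depends on whether the `n != 0` guard is known. This sketch assumes that guard is the precondition the commit refers to. It enumerates every n in 8-bit arithmetic as a brute-force illustration, not SCEV's formula: with the guard the maximum is 2^8-2, without it n = 0 would wrap all the way around and give 2^8-1.

```cpp
#include <cassert>
#include <cstdint>

// Count backedges of the rotated loop body for a given 8-bit trip bound n:
//   i = 0; do { ++i; } while (i != n);
// (In the real rotated loop this is only entered when n != 0.)
unsigned backedgesTaken(uint8_t n) {
  uint8_t i = 0;
  unsigned taken = 0;
  for (;;) {
    i = static_cast<uint8_t>(i + 1);  // loop body + increment
    if (i == n) break;                // exit test at the bottom
    ++taken;                          // backedge taken
  }
  return taken;
}
```

Over all n from 1 to 255 the largest count is 254, one less than the 255 reached only by the excluded n = 0 case. In 32 bits the same argument gives 2^32-2 instead of 2^32-1.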
*	[SCEV] Make howFarToZero use a simpler formula for max backedge-taken count.	Eli Friedman	2017-01-11	1	-0/+83
\| \| \| \| \| \| \| \| \|	This is both easier to understand, and produces a tighter bound in certain cases. Differential Revision: https://reviews.llvm.org/D28393 llvm-svn: 291701
*	Re-apply r291205, "LowerTypeTests: Split the pass in two: a resolution phase ↵	Peter Collingbourne	2017-01-11	7	-12/+9
\| \| \| \| \| \|	and a lowering phase.", with a fix for an off-by-one error. llvm-svn: 291699
*	NewGVN: Fix PR31594, by tracking the store count of congruence	Daniel Berlin	2017-01-11	1	-0/+119
\| \| \| \| \| \| \| \| \| \| \|	classes, and updating checking to allow for equivalence through reachability. (Sadly, the checking here is not perfect, and can't be made perfect, so we'll have to disable it after we are satisfied with correctness. Right now it is just "very unlikely" to happen.) llvm-svn: 291698
*	Resubmit "[PGO] Turn off comdat renaming in IR PGO by default"	Rong Xu	2017-01-11	5	-21/+81
\| \| \| \| \| \|	This patch resubmits the changes in r291588. llvm-svn: 291696
*	Revert "CodeGen: Allow small copyable blocks to "break" the CFG."	Kyle Butt	2017-01-11	57	-420/+205
\| \| \| \| \| \| \| \| \|	This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded. This needs a simple probability check because there are some cases where it is not profitable. llvm-svn: 291695
*	[ARM] More aggressive matching for vpadd and vpaddl.	Eli Friedman	2017-01-11	2	-18/+234
\| \| \| \| \| \| \| \| \|	The new matchers work after legalization to make them simpler, and to avoid blocking other optimizations. Differential Revision: https://reviews.llvm.org/D27779 llvm-svn: 291693
*	[SLP] Remove bogus assert.	Michael Kuperstein	2017-01-11	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \|	The removed assert seems bogus - it's perfectly legal for the roots of the vectorized subtrees to be equal even if the original scalar values aren't, if the original scalars happen to be equivalent. This fixes PR31599. Differential Revision: https://reviews.llvm.org/D28539 llvm-svn: 291692
*	[X86][XOP] Add vpermil2ps target shuffle -> insertps combine test	Simon Pilgrim	2017-01-11	1	-0/+14
\| \| \| \|	llvm-svn: 291690
*	Revert rL291205 because it breaks Chrome tests under CFI.	Ivan Krasin	2017-01-11	7	-9/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Revert LowerTypeTests: Split the pass in two: a resolution phase and a lowering phase. This change separates how type identifiers are resolved from how intrinsic calls are lowered. All information required to lower an intrinsic call is stored in a new TypeIdLowering data structure. The idea is that this data structure can either be initialized using the module itself during regular LTO, or using the module summary in ThinLTO backends. Original URL: https://reviews.llvm.org/D28341 Reviewers: pcc Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D28532 llvm-svn: 291684
*	[ARM] Fix test CodeGen/ARM/fpcmp_ueq.ll broken by rL290616	Evgeny Astigeevich	2017-01-11	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Commit rL290616 (https://reviews.llvm.org/rL290616) changed a checking command for the triple arm-apple-darwin in LLVM::CodeGen/ARM/fpcmp_ueq.ll. As a result, the test could fail on valid generated code. These changes fix the test to check only the instructions we expect. Differential Revision: https://reviews.llvm.org/D28159 llvm-svn: 291678
*	X86 CodeGen: Optimized pattern for truncate with unsigned saturation.	Elena Demikhovsky	2017-01-11	2	-0/+231
\| \| \| \| \| \| \| \| \|	DAG patterns optimization: truncate + unsigned saturation, supported by VPMOVUS* instructions in AVX-512 and by VPACKUS* instructions on SSE* targets. Differential Revision: https://reviews.llvm.org/D28216 llvm-svn: 291670
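The matched pattern is "clamp to the destination type's maximum, then truncate", which is what a single VPMOVUS* instruction does per lane (VPMOVUSDB, for instance, narrows unsigned dwords to unsigned bytes with saturation). The scalar C++ model below assumes a 32-to-8-bit narrowing and is only meant to show the per-lane semantics being recognized.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model: umin(x, 255) followed by truncation to 8 bits, per lane.
void truncUSat32To8(const uint32_t* src, uint8_t* dst, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = static_cast<uint8_t>(std::min<uint32_t>(src[i], UINT8_MAX));
}
```

A plain truncation of 70000 would keep only the low byte (112), whereas the saturating form clamps it to 255.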