bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[InstCombine] X \| C == C --> (X & ~C) == 0	Sanjay Patel	2019-02-06	1	-8/+8
\| \| \| \| \| \| \| \| \| \|	We should canonicalize to one of these forms, and compare-with-zero could be more conducive to follow-on transforms. This also leads to generally better codegen as shown in PR40611: https://bugs.llvm.org/show_bug.cgi?id=40611 llvm-svn: 353313
*	[InstCombine] add tests for PR40611 and regenerate checks; NFC	Sanjay Patel	2019-02-06	1	-294/+349
\| \| \| \| \| \|	Lots of unrelated diffs here from the newer version of the script. llvm-svn: 353312
*	[InstCombine] limit extracting shuffle transform based on uses	Sanjay Patel	2019-02-05	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	As discussed in D53037, this can lead to worse codegen, and we don't generally expect the backend to be able to optimize arbitrary shuffles. If there's only one use of the 1st shuffle, that means it's getting removed, so that should always be safe. llvm-svn: 353235
*	[InstCombine] split shuffle test to show extra use constraint; NFC	Sanjay Patel	2019-02-05	1	-3/+14
\| \| \| \| \| \| \|	As discussed in D53037, this transform can cause codegen problems if the 1st shuffle has multiple uses. llvm-svn: 353233
*	Revert "[PATCH] [TargetLibraryInfo] Update run time support for Windows"	Evandro Menezes	2019-02-04	4	-49/+73
\| \| \| \| \| \|	This reverts accidental commit ff5527718d5d3b9966f6e8948866c0dc15ffcf3c. llvm-svn: 353118
*	[PATCH] [TargetLibraryInfo] Update run time support for Windows	Evandro Menezes	2019-02-04	4	-73/+49
\| \| \| \| \| \| \| \| \| \| \| \| \|	It seems that the run time for Windows has changed and supports more math functions than before. Since LLVM requires at least VS2015, I assume that this is the run time that would be redistributed with programs built with Clang. Thus, I based this update on the header file `math.h` that accompanies it. This patch addresses the PR40541. Unfortunately, I have no access to a Windows development environment to validate it. llvm-svn: 353114
*	[InstCombine] Cleanup the TFE/LWE check in AMDGPU SimplifyDemanded	Nicolai Haehnle	2019-02-04	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The fix added in r352904 is not quite correct, or rather misleading: 1. When the texfailctrl (TFC) argument was non-constant, the fix assumed non-TFE/LWE, which is incorrect. 2. Regardless, this code path cannot even be hit for correct TFE/LWE-enabled calls, because those return a struct. Added a test case for those for completeness. Change-Id: I92d314dbc67a2670f6d7adaab765ef45f56a49cf Reviewers: hliao, dstuttard, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57681 llvm-svn: 353097
*	Commit tests for changes in revision D41608	Dmitry Venikov	2019-02-04	1	-0/+91
\| \| \| \|	llvm-svn: 353037
*	[InstCombine] Refactor test checks (NFC)	Evandro Menezes	2019-02-01	1	-198/+198
\| \| \| \|	llvm-svn: 352935
*	[Test] Update file w/update_test_checks.py to make a follow on change obvious	Philip Reames	2019-02-01	1	-29/+29
\| \| \| \|	llvm-svn: 352932
*	[InstCombine] Expand Windows test (NFC)	Evandro Menezes	2019-02-01	1	-52/+66
\| \| \| \| \| \|	Run checks for Win32 as well. llvm-svn: 352917
*	[InstCombine] Expand Windows test (NFC)	Evandro Menezes	2019-02-01	1	-21/+26
\| \| \| \| \| \|	Run checks for Win64 as well. llvm-svn: 352908
*	[InstCombine] Extra null-checking on TFE/LWE support	Michael Liao	2019-02-01	1	-0/+7
\| \| \| \| \| \| \| \|	- If that operand is not ConstantInt, skip enabling TFE/LWE. Differential Revision: https://reviews.llvm.org/D57539 llvm-svn: 352904
*	[InstCombine] Refactor test checks (NFC)	Evandro Menezes	2019-02-01	1	-16/+13
\| \| \| \|	llvm-svn: 352895
*	[InstCombine] Expand Windows test (NFC)	Evandro Menezes	2019-02-01	1	-15/+44
\| \| \| \| \| \|	Add checks for Win64 to existing cases. llvm-svn: 352892
*	[InstCombine] Refactor test checks (NFC)	Evandro Menezes	2019-02-01	1	-42/+24
\| \| \| \|	llvm-svn: 352886
*	[InstCombine] try to reduce x86 addcarry to generic uaddo intrinsic	Sanjay Patel	2019-02-01	1	-10/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we can reduce the x86-specific intrinsic to the generic op, it allows existing simplifications and value tracking folds. AFAICT, this always results in identical x86 codegen in the non-reduced case...which should be true because we semi-generically (too aggressively IMO) convert to llvm.uadd.with.overflow in CGP, so the DAG/isel must already combine/lower this intrinsic as expected. This isn't quite what was requested in: https://bugs.llvm.org/show_bug.cgi?id=40486 ...but we want to have these kinds of folds early for efficiency and to enable greater simplifications. For the case in the bug report where we have: _addcarry_u64(0, ahi, 0, &ahi) ...this gets completely simplified away in IR. Differential Revision: https://reviews.llvm.org/D57453 llvm-svn: 352870
*	[InstCombine] Missed optimization in math expression: simplify calls exp ↵	Dmitry Venikov	2019-01-31	2	-30/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	functions Summary: This patch enables folding following expressions under -ffast-math flag: exp(X) * exp(Y) -> exp(X + Y), exp2(X) * exp2(Y) -> exp2(X + Y). Motivation: https://bugs.llvm.org/show_bug.cgi?id=35594 Reviewers: hfinkel, spatel, efriedma, lebedev.ri Reviewed By: spatel, lebedev.ri Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D41342 llvm-svn: 352730
*	[InstCombine] Expand testing for Windows (NFC)	Evandro Menezes	2019-01-31	1	-48/+67
\| \| \| \| \| \|	Added the checks to the existing cases when the target is Win64. llvm-svn: 352714
*	[InstCombine] Simplify check clauses in test (NFC)	Evandro Menezes	2019-01-31	1	-71/+57
\| \| \| \|	llvm-svn: 352707
*	Add a 'dynamic' parameter to the objectsize intrinsic	Erik Pilkington	2019-01-30	6	-43/+126
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is meant to be used with clang's __builtin_dynamic_object_size. When 'true' is passed to this parameter, the intrinsic has the potential to be folded into instructions that will be evaluated at run time. When 'false', the objectsize intrinsic behaviour is unchanged. rdar://32212419 Differential revision: https://reviews.llvm.org/D56761 llvm-svn: 352664
*	[Tests] Add tests for propagation of undef elements in vector GEPs	Philip Reames	2019-01-30	1	-0/+25
\| \| \| \|	llvm-svn: 352662
*	SimplifyDemandedVectorElts for all intrinsics	Philip Reames	2019-01-30	1	-8/+8
\| \| \| \| \| \| \| \| \| \|	The point is that this simplifies integration of new intrinsics into SimplifiedDemandedVectorElts, and ensures we don't miss any existing ones. This is intended to be NFC-ish, but as seen from the diffs, can produce slightly different output. This is due to order of transforms w/in instcombine resulting in two slightly different fixed points. That's something we should fix, but isn't a problem w/this patch per se. Differential Revision: https://reviews.llvm.org/D57398 llvm-svn: 352653
*	[InstCombine][x86] add tests for addcarry intrinsic; NFC	Sanjay Patel	2019-01-30	1	-0/+36
\| \| \| \|	llvm-svn: 352627
*	Commit tests for changes in revision D41342	Dmitry Venikov	2019-01-30	2	-0/+178
\| \| \| \|	llvm-svn: 352613
*	[InstCombine] canonicalize cmp/select form of uadd saturate with constant	Sanjay Patel	2019-01-29	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I'm circling back around to a loose end from D51929. The backend (either CGP or DAG) doesn't recognize this pattern, so we end up with different asm for these IR variants. Regardless of any future changes to canonicalize to saturation/overflow intrinsics, we want to get raw IR variations into the minimal number of raw IR forms. If/when we can canonicalize to intrinsics, that will make that step easier. Pre: C2 == ~C1 %a = add i32 %x, C1 %c = icmp ugt i32 %x, C2 %r = select i1 %c, i32 -1, i32 %a => %a = add i32 %x, C1 %c2 = icmp ult i32 %x, C2 %r = select i1 %c2, i32 %a, i32 -1 https://rise4fun.com/Alive/pkH Differential Revision: https://reviews.llvm.org/D57352 llvm-svn: 352536
*	[InstCombine] regenerate test checks; NFC	Sanjay Patel	2019-01-29	1	-63/+63
\| \| \| \|	llvm-svn: 352517
*	[InstCombine] add tests for ext-of-bool + add/sub; NFC	Sanjay Patel	2019-01-29	1	-0/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We should choose one of these as canonical: %z = zext i1 %cmp to i32 %r = sub i32 %x, %z => %s = sext i1 %cmp to i32 %r = add i32 %x, %s The test comments assume that the zext form is better, but we can adjust that if we decide to go the other way. llvm-svn: 352515
*	Correct contents for r352453	Philip Reames	2019-01-29	1	-72/+48
\| \| \| \| \| \|	I had a local change I hadn't realized when submitting that auto-update. As such, the auto-update was wrong. This should fix it, and with that, it's clearly time to stop submitting changes and go to bed. llvm-svn: 352454
*	[Tests] Regen to remove future test diffs	Philip Reames	2019-01-29	1	-128/+152
\| \| \| \| \| \|	This file appears to have been manually editted at some point after being auto-updated. A future change adjusts this file slightly, and all of the updates makes the diff super confusing. llvm-svn: 352453
*	[Test] Add tests for gather/maked.load demanded elements, and convert the ↵	Philip Reames	2019-01-29	1	-17/+67
\| \| \| \| \| \|	whole file to auto generated checks. llvm-svn: 352452
*	Demanded elements support for vector GEPs	Philip Reames	2019-01-28	1	-15/+9
\| \| \| \| \| \| \| \|	GEPs can produce either scalar or vector results. If we're extracting only a subset of the vector lanes, simplifying the operands is helpful in eliminating redundant computation, and (eventually) allowing further optimizations Differential Revision: https://reviews.llvm.org/D57177 llvm-svn: 352440
*	[InstCombine] add another saturating uadd test (no undefs); NFC	Sanjay Patel	2019-01-28	1	-2/+15
\| \| \| \| \| \|	I forgot that our undef matching hasn't been completed in the previous commit. llvm-svn: 352424
*	[InstCombine] add tests for saturating uadd with constant; NFC	Sanjay Patel	2019-01-28	1	-0/+55
\| \| \| \|	llvm-svn: 352423
*	[ValueTracking] Look through casts when determining non-nullness	Johannes Doerfert	2019-01-26	4	-9/+108
\| \| \| \| \| \| \| \| \| \|	Bitcast and certain Ptr2Int/Int2Ptr instructions will not alter the value of their operand and can therefore be looked through when we determine non-nullness. Differential Revision: https://reviews.llvm.org/D54956 llvm-svn: 352293
*	Test cases for demanded elements on vector GEPs	Philip Reames	2019-01-24	1	-0/+127
\| \| \| \| \| \|	This is the first part of splitting apart https://reviews.llvm.org/D57140 into usuable pieces. Landing the tests in advance of posting a review specifically for the demanded elements part. llvm-svn: 352091
*	[InstCombine] Simplify cttz/ctlz + icmp ugt/ult	Nikita Popov	2019-01-19	1	-26/+18
\| \| \| \| \| \| \| \| \| \| \| \|	Followup to D55745, this time handling comparisons with ugt and ult predicates (which are the canonical forms for non-equality predicates). For ctlz we can convert into a simple icmp, for cttz we can convert into a mask check. Differential Revision: https://reviews.llvm.org/D56355 llvm-svn: 351645
*	[InstCombine] Don't sink dynamic allocas	Reid Kleckner	2019-01-17	1	-0/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: InstCombine's sinking algorithm only thinks about memory. It doesn't think about non-memory constraints like stack object lifetime. It can sink dynamic allocas across a stacksave call, which may be used with stackrestore, which can incorrectly reduce the lifetime of the dynamic alloca. Fixes PR40365 Reviewers: hfinkel, efriedma Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D56872 llvm-svn: 351475
*	[InstCombine]Avoid introduction of unaligned mem access	Serguei Katkov	2019-01-16	1	-90/+158
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	InstCombine is able to transform mem transfer instrinsic to alone store or store/load pair. It might result in generation of unaligned atomic load/store which later in backend will be transformed to libcall. It is not an evident gain and it is better to keep intrinsic as is and handle it at backend. Reviewers: reames, anna, apilipenko, mkazantsev Reviewed By: reames Subscribers: t.p.northover, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D56582 llvm-svn: 351295
*	Remove irrelevant references to legacy git repositories from	James Y Knight	2019-01-15	1	-1/+1
\| \| \| \| \| \| \| \| \|	compiler identification lines in test-cases. (Doing so only because it's then easier to search for references which are actually important and need fixing.) llvm-svn: 351200
*	[InstCombine] Don't undo 0 - (X * Y) canonicalization when combining subs.	Florian Hahn	2019-01-15	1	-13/+29
\| \| \| \| \| \| \| \| \|	Otherwise instcombine gets stuck in a cycle. The canonicalization was added in D55961. This patch fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=12400 llvm-svn: 351187
*	AMDGPU: Add a fast path for icmp.i1(src, false, NE)	Marek Olsak	2019-01-15	1	-0/+193
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This allows moving the condition from the intrinsic to the standard ICmp opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic is an identity for retrieving the SGPR mask. And we can also get the mask from and i1, or i1, xor i1. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52060 llvm-svn: 351150
*	[AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try	David Stuttard	2019-01-14	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission. Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda Work around for ppcle compiler bug Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b llvm-svn: 351054
*	[ConstantFolding] Fold undef for integer intrinsics	Nikita Popov	2019-01-11	1	-411/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes https://bugs.llvm.org/show_bug.cgi?id=40110. This implements handling of undef operands for integer intrinsics in ConstantFolding, in particular for the bitcounting intrinsics (ctpop, cttz, ctlz), the with.overflow intrinsics, the saturating math intrinsics and the funnel shift intrinsics. The undef behavior follows what InstSimplify does for the general cas e of non-constant operands. For the bitcount intrinsics (where InstSimplify doesn't do undef handling -- there cannot be a combination of an undef + non-constant operand) I'm using a 0 result if the intrinsic is defined for zero and undef otherwise. Differential Revision: https://reviews.llvm.org/D55950 llvm-svn: 350971
*	[SimplifyLibCalls] Fix memchr expansion for constant strings.	Eli Friedman	2019-01-09	2	-18/+22
\| \| \| \| \| \| \| \| \| \| \| \|	The C standard says "The memchr function locates the first occurrence of c (converted to an unsigned char)[...]". The expansion was missing the conversion to unsigned char. Fixes https://bugs.llvm.org/show_bug.cgi?id=39041 . Differential Revision: https://reviews.llvm.org/D55947 llvm-svn: 350775
*	[InstCombine] remove stale comments; NFC	Sanjay Patel	2019-01-08	1	-6/+0
\| \| \| \| \| \|	These changed with rL350672. llvm-svn: 350674
*	[InstCombine] canonicalize another raw IR rotate pattern to funnel shift	Sanjay Patel	2019-01-08	1	-28/+5
\| \| \| \| \| \| \| \| \|	This is matching the equivalent of the DAG expansion, so it should never end up with worse perf than the original code even if the target doesn't have a rotate instruction. llvm-svn: 350672
*	fix comment typo - NFC	Chen Zheng	2019-01-08	1	-1/+1
\| \| \| \|	llvm-svn: 350587
*	[InstCombine] Improve cttz/ctlz + icmp tests; NFC	Nikita Popov	2019-01-05	1	-83/+134
\| \| \| \| \| \| \| \|	Change part of the tests to use vectors (I'm using scalar for ugt and vector for ult), add multiuse variations, rename %lz to %tz for the cttz tests. llvm-svn: 350471
*	[InstCombine] Add cttz/ctlz + icmp ugt/ult tests; NFC	Nikita Popov	2019-01-05	1	-0/+174
\| \| \| \|	llvm-svn: 350468