bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86] FastISel fall back on !absolute_symbol GVs	Vlad Tsyrklevich	2018-08-01	1	-0/+28
	Summary: D25878, which added support for !absolute_symbol for normal X86 ISel, did not add support for materializing references to absolute symbols for X86 FastISel. This causes build failures because FastISel generates PC-relative relocations for absolute symbols. Fall back to normal ISel for references to !absolute_symbol GVs. Fix for PR38200. Reviewers: pcc, craig.topper Reviewed By: pcc Subscribers: hiraditya, llvm-commits, kcc Differential Revision: https://reviews.llvm.org/D50116 llvm-svn: 338599
*	[x86] remove stale FIXME note from test; NFC	Sanjay Patel	2018-08-01	1	-1/+1
	This was fixed with rL338592. llvm-svn: 338593
*	[SelectionDAG] fix bug in translating funnel shift with non-power-of-2 type	Sanjay Patel	2018-08-01	6	-281/+178
	The bug is visible in the constant-folded x86 tests. We can't use the negated shift amount when the type is not power-of-2: https://rise4fun.com/Alive/US1r ...so in that case, use the regular lowering that includes a select to guard against a shift-by-bitwidth. This path is improved by only calculating the modulo shift amount once now. Also, improve the rotate (with power-of-2 size) lowering to use a negate rather than subtract from bitwidth. This improves the codegen whether we have a rotate instruction or not (although we can still see that we're not matching to a legal rotate in all cases). llvm-svn: 338592
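The arithmetic point behind this fix can be sketched in plain C. This is only an illustration, not the SelectionDAG code: the names `fshl_guarded`, `rotl32`, and `neg_mod` are hypothetical, and w-bit values are assumed to sit in the low bits of a `uint32_t`. For a power-of-2 width, the negated shift amount reduced modulo the width equals the width minus the shift amount, so a rotate can use a negate. For other widths that identity fails, so the guarded form is needed.

```c
#include <stdint.h>

/* Mask selecting the low w bits, for 1 <= w <= 32. */
static uint32_t low_mask(unsigned w) {
    return w == 32 ? UINT32_MAX : ((UINT32_C(1) << w) - 1);
}

/* Funnel shift left of the w-bit pair (x, y) by z, valid for any width.
 * The modulo shift amount is computed once, and the early return plays
 * the role of the select that guards against a shift-by-bitwidth. */
uint32_t fshl_guarded(uint32_t x, uint32_t y, uint32_t z, unsigned w) {
    uint32_t m = low_mask(w);
    uint32_t s = z % w;
    x &= m;
    y &= m;
    if (s == 0)
        return x;
    return ((x << s) | (y >> (w - s))) & m;
}

/* Rotate left for a power-of-2 width (32): the right-shift amount is the
 * negated shift amount masked to the width, so no subtract is needed. */
uint32_t rotl32(uint32_t x, uint32_t z) {
    return (x << (z & 31)) | (x >> ((0u - z) & 31));
}

/* Negated shift amount reduced modulo w, computed in a 32-bit container. */
uint32_t neg_mod(uint32_t z, uint32_t w) {
    return (0u - z) % w;
}
```

With these definitions, `neg_mod(2, 8)` is 6, which matches 8 - 2, while `neg_mod(2, 7)` is 2 rather than 7 - 2 = 5.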
*	[x86] add tests to show miscompile for funnel shift with weird size; NFC	Sanjay Patel	2018-08-01	1	-0/+29
	llvm-svn: 338587
*	[ARM] Armv8.2-A FP16 vector intrinsics tests	Sjoerd Meijer	2018-08-01	1	-0/+1148
	Clang support for the Armv8.2-A FP16 vector intrinsic was committed in rC328277, but this was never followed up, i.e. the LLVM part is missing. I've raised PR38404, and this is the first step to address this. I.e., this adds tests for the Armv8.2-A FP16 vector intrinsic, and thus shows which intrinsics already work, and which need further work. Differential Revision: https://reviews.llvm.org/D50142 llvm-svn: 338568
*	[FPEnv] Widen illegal width StrictFP vector operations as needed	Cameron McInally	2018-08-01	1	-0/+1553
	Differential Revision: https://reviews.llvm.org/D49806 llvm-svn: 338562
*	[AArch64] Fix FCCMP with FP16 operands	Bryan Chan	2018-08-01	1	-0/+30
	Summary: This patch adds support for FCCMP instruction with FP16 operands, avoiding an assertion during instruction selection. Reviewers: olista01, SjoerdMeijer, t.p.northover, javed.absar Reviewed By: SjoerdMeijer Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D50115 llvm-svn: 338554
*	[AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero	Ryan Taylor	2018-08-01	1	-0/+113
	Summary: Add _L to _LZ image intrinsic table mapping to table gen. In ISelLowering, check whether the image intrinsic has lod and whether it is equal to zero; if so, remove lod and change the opcode to the equivalent mapped _LZ. Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71 Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D49483 llvm-svn: 338523
*	[SystemZ, TableGen] Fix shift count handling	Ulrich Weigand	2018-08-01	1	-0/+12
	The DAG combiner logic to simplify AND masks in shift counts is invalid. While it is true that the SystemZ shift instructions ignore all but the low 6 bits of the shift count, it is still invalid to simplify the AND masks while the DAG still uses the standard shift operators (which are not defined to match the SystemZ instruction behavior). Instead, this patch performs equivalent operations during instruction selection. For completely removing the AND, this now happens via additional DAG match patterns implemented by a multi-alternative PatFrags. For simplifying a 32-bit AND to a 16-bit AND, the existing DAG patterns were already mostly OK, they just needed an output XForm to actually truncate the immediate value. Unfortunately, the latter change also exposed a bug in TableGen: it seems XForms are currently only handled correctly for direct operands of the outermost operation node. This patch also fixes that bug by simply recurring through the whole pattern. This should be NFC for all other targets. Differential Revision: https://reviews.llvm.org/D50096 llvm-svn: 338521
*	[MIPS GlobalISel] Select global address	Petar Jovanovic	2018-08-01	5	-0/+193
	Select G_GLOBAL_VALUE for position dependent code. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D49803 llvm-svn: 338499
*	[X86] Adding more test patterns for lea-opt (PR37939)	Jatin Bhateja	2018-08-01	1	-0/+151
	Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50128 llvm-svn: 338483
*	[x86] Fix a really subtle miscompile due to a somewhat glaring bug in	Chandler Carruth	2018-08-01	1	-0/+119
	EFLAGS copy lowering. If you have a branch of LLVM, you may want to cherrypick this. It is extremely unlikely to hit this case empirically, but it will likely manifest as an "impossible" branch being taken somewhere, and will be ... very hard to debug. Hitting this requires complex conditions living across complex control flow combined with some interesting memory (non-stack) initialized with the results of a comparison. Also, because you have to arrange for an EFLAGS copy to be in just the right place, almost anything you do to the code will hide the bug. I was unable to reduce anything remotely resembling a "good" test case from the place where I hit it, and so instead I have constructed synthetic MIR testing that directly exercises the bug in question (as well as the good behavior for completeness). The issue is that we would mistakenly assume any SETcc with a valid condition and an initial operand that was a register and a virtual register at that to be a register defining SETcc... It isn't though.... This would in turn cause us to test some other bizarre register, typically the base pointer of some memory. Now, testing this register and using that to branch on doesn't make any sense. It even fails the machine verifier (if you are running it) due to the wrong register class. But it will make it through LLVM, assemble, and it looks fine... But wow do you get a very unusual and surprising branch taken in your actual code. The fix is to actually check what kind of SETcc instruction we're dealing with. Because there are a bunch of them, I just test the may-store bit in the instruction. I've also added an assert for sanity that ensures we are, in fact, defining the register operand. =D llvm-svn: 338481
*	[x86/slh] Add unwind info to several tests to make it more obvious that	Chandler Carruth	2018-08-01	1	-12/+48
	we aren't incorrectly generating any of it when doing SLH. There was a bug that only occurred with SLH that very much looked like it could be caused by bad unwind info, and so this was a prime suspect. Turns out that everything is fine, but this way we'll see if we end up, for example, putting things we shouldn't inside the prolog. llvm-svn: 338480
*	[GlobalISel][IRTranslator] Use RPO traversal when visiting blocks to translate.	Amara Emerson	2018-08-01	3	-5/+24
	Previously we were just visiting the blocks in the function in IR order, which is rather arbitrary. Therefore we wouldn't always visit defs before uses, but the translation code relies on this assumption in some places. Only codegen change seen in tests is an elision of a redundant copy. Fixes PR38396 llvm-svn: 338476
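A reverse post-order (RPO) walk can be sketched in C on a toy control-flow graph. This is a minimal illustration under stated assumptions, not the IRTranslator code: the `Block` type, the fixed `MAX_BLOCKS` and `MAX_SUCCS` limits, and the function names are all hypothetical. A depth-first search records each block after its successors, and reversing that list places a block ahead of the blocks it branches to (back edges aside).

```c
#include <stddef.h>

#define MAX_BLOCKS 16
#define MAX_SUCCS 2

/* A basic block described only by the indices of its successors. */
typedef struct {
    int succ[MAX_SUCCS];
    int num_succ;
} Block;

/* Depth-first search that appends a block after all of its successors. */
static void post_order(const Block *cfg, int b, int *visited,
                       int *order, int *count) {
    visited[b] = 1;
    for (int i = 0; i < cfg[b].num_succ; ++i) {
        int s = cfg[b].succ[i];
        if (!visited[s])
            post_order(cfg, s, visited, order, count);
    }
    order[(*count)++] = b;
}

/* Writes the RPO of the blocks reachable from entry into out and returns
 * how many blocks were written. out must hold MAX_BLOCKS entries. */
int reverse_post_order(const Block *cfg, int entry, int *out) {
    int visited[MAX_BLOCKS] = {0};
    int po[MAX_BLOCKS];
    int n = 0;
    post_order(cfg, entry, visited, po, &n);
    for (int i = 0; i < n; ++i)
        out[i] = po[n - 1 - i];
    return n;
}
```

For a diamond whose blocks are stored as entry (0), merge (1), then (2), else (3), storage order would visit the merge block before both of its predecessors, whereas the RPO from this sketch visits it last.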
*	AMDGPU: Add clamp bit to dot intrinsics	Konstantin Zhuravlyov	2018-08-01	7	-35/+155
	Differential Revision: https://reviews.llvm.org/D49874 llvm-svn: 338470
*	Revert r338354 "[ARM] Revert r337821"	Reid Kleckner	2018-07-31	3	-11/+11
	Disable ARMCodeGenPrepare by default again. It is causing verifier failures in V8 that look like: Duplicate integer as switch case switch i32 %trunc, label %if.end13 [ i32 0, label %cleanup36 i32 0, label %if.then8 ], !dbg !4981 i32 0 fatal error: error in backend: Broken function found, compilation aborted! I will continue reducing the test case and send it along. llvm-svn: 338452
*	AMDGPU: Split amdgcn/r600 fminnum/fmaxnum tests	Matt Arsenault	2018-07-31	4	-443/+667
	R600 breaks on too many things to usefully test changes with ieee_mode on vs. off. llvm-svn: 338435
*	AMDGPU: Break 64-bit arguments into 32-bit pieces	Matt Arsenault	2018-07-31	1	-7/+43
	llvm-svn: 338421
*	AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls	Matt Arsenault	2018-07-31	3	-13/+71
	This improves code for the same reasons as scalarizing 32-bit element vectors. llvm-svn: 338418
*	AMDGPU: Scalarize vector argument types to calls	Matt Arsenault	2018-07-31	3	-32/+71
	When lowering calling conventions, prefer to decompose vectors into the constituent register types. This avoids artificial constraints to satisfy a wide super-register. This improves code quality because now optimizations don't need to deal with the super-register constraint. For example the immediate folding code doesn't deal with 4 component reg_sequences, so by breaking the register down earlier the existing immediate folding code is able to work. This also avoids the need for the shader input processing code to manually split vector types. llvm-svn: 338416
*	[X86][SSE] Use ISD::MULHU for constant/non-zero ISD::SRL lowering (PR38151)	Simon Pilgrim	2018-07-31	4	-504/+235
	As was done for vector rotations, we can efficiently use ISD::MULHU for vXi8/vXi16 ISD::SRL lowering. Shift-by-zero cases are still problematic (mainly on v32i8 due to extra AND/ANDN/OR or VPBLENDVB blend masks but v8i16/v16i16 aren't great either if PBLENDW fails) so I've limited this first patch to known non-zero cases if we can't easily use PBLENDW. Differential Revision: https://reviews.llvm.org/D49562 llvm-svn: 338407
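The identity behind this lowering can be sketched for a single 16-bit lane in C. This is a scalar illustration only, not the X86 lowering code, and the helper names `mulhu16` and `srl16_via_mulhu` are hypothetical. A logical right shift by c equals the high half of an unsigned multiply by 2^(16 - c), which only fits in 16 bits when c is non-zero.

```c
#include <stdint.h>

/* High 16 bits of the unsigned 16x16 -> 32 bit product. */
uint16_t mulhu16(uint16_t a, uint16_t b) {
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* x >> c expressed as a multiply-high, valid for 1 <= c <= 15.
 * For c == 0 the scale 2^16 truncates to 0, so the result is wrong,
 * which is one way to see why shift-by-zero needs separate handling. */
uint16_t srl16_via_mulhu(uint16_t x, unsigned c) {
    uint16_t scale = (uint16_t)(1u << (16 - c));
    return mulhu16(x, scale);
}
```

For instance, `srl16_via_mulhu(0xABCD, 4)` yields `0x0ABC`, the same as `0xABCD >> 4`.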
*	[X86] Add pattern matching for PMADDUBSW	Craig Topper	2018-07-31	1	-1788/+108
	Summary: Similar to D49636, but for PMADDUBSW. This instruction has the additional complexity that the addition of the two products saturates to 16-bits rather than wrapping around. And one operand is treated as signed and the other as unsigned. A C example that triggers this pattern:
```
static const int N = 128;
int8_t A[2*N];
uint8_t B[2*N];
int16_t C[N];

void foo() {
  for (int i = 0; i != N; ++i)
    C[i] = MIN(MAX((int16_t)A[2*i]*(int16_t)B[2*i] + (int16_t)A[2*i+1]*(int16_t)B[2*i+1], -32768), 32767);
}
```
	Reviewers: RKSimon, spatel, zvi Reviewed By: RKSimon, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49829 llvm-svn: 338402
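The per-lane arithmetic described above can be written as a scalar reference in C. This is a sketch for one 16-bit result lane, not code from the patch, and the function name `pmaddubsw_lane` is hypothetical. Each product pairs an unsigned byte with a signed byte, and the sum of the two products saturates to the signed 16-bit range instead of wrapping.

```c
#include <stdint.h>

/* One result lane: u0*s0 + u1*s1 with signed 16-bit saturation. */
int16_t pmaddubsw_lane(uint8_t u0, int8_t s0, uint8_t u1, int8_t s1) {
    /* Each product fits in 16 bits, but the sum of two may not,
     * so it is accumulated in 32 bits before clamping. */
    int32_t sum = (int32_t)u0 * s0 + (int32_t)u1 * s1;
    if (sum > INT16_MAX)
        sum = INT16_MAX;
    if (sum < INT16_MIN)
        sum = INT16_MIN;
    return (int16_t)sum;
}
```

The saturation only matters at the extremes: 255 * 127 + 255 * 127 = 64770 clamps to 32767, and 255 * -128 + 255 * -128 = -65280 clamps to -32768.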
*	[X86] Add test cases that could use PMADDUBSW.	Craig Topper	2018-07-31	1	-0/+2233
	llvm-svn: 338401
*	[X86] Preserve more liveness information in emitStackProbeInline	Francis Visoiu Mistrih	2018-07-31	2	-7/+24
	This commit fixes two issues with the liveness information after the call: 1) The code always spills RCX and RDX if InProlog == true, which results in a use of undefined phys reg. 2) FinalReg, JoinReg, RoundedReg, SizeReg are not added as live-ins to the basic blocks that use them, therefore they are seen as undefined. https://llvm.org/PR38376 Differential Revision: https://reviews.llvm.org/D50020 llvm-svn: 338400
*	DAG: Fix PromoteFloatResult for fcanonicalize	Matt Arsenault	2018-07-31	1	-83/+101
	llvm-svn: 338382
*	AMDGPU: Fold undef fcanonicalize to qNaN	Matt Arsenault	2018-07-31	1	-0/+9
	We could choose a free 0 for this, but this matches the behavior for fmul undef, 1.0. Also, the NaN use is more useful for folding use operations although if it's not eliminated it is more expensive in terms of code size. llvm-svn: 338376
*	AMDGPU: Fix test check line bugs	Matt Arsenault	2018-07-31	3	-23/+32
	llvm-svn: 338374
*	[SystemZ] Improve decoding in case of instructions with four register operands.	Jonas Paulsson	2018-07-31	1	-2/+2
	Since z13, the max group size will be 2 if any μop has more than 3 register sources. This has been ignored so far in the SystemZHazardRecognizer, but is now handled by recognizing those instructions and adjusting the tracking of decoding and the cost heuristic for grouping. Review: Ulrich Weigand https://reviews.llvm.org/D49847 llvm-svn: 338368
*	[ARM] Revert r337821	Sam Parker	2018-07-31	3	-11/+11
	Re-enabling ARMCodeGenPrepare by default after failing to reproduce the bootstrap issues that I was concerned it was causing. llvm-svn: 338354
*	[X86] Stop accidentally running the Bonnell LEA fixup path on Goldmont.	Craig Topper	2018-07-31	1	-1/+0
	In one place we checked X86Subtarget.slowLEA() to decide if the pass should run. But to decide what the pass should do, we only check isSLM. This resulted in Goldmont going down the Bonnell path. llvm-svn: 338342
*	[RISCV] Fixed test case failure due to r338047	Ana Pazos	2018-07-31	1	-1/+1
	llvm-svn: 338341
*	[AArch64][GlobalISel] Add isel support for G_BLOCK_ADDR.	Amara Emerson	2018-07-31	1	-0/+64
	Also refactors some existing code to materialize addresses for the large code model so it can be shared between G_GLOBAL_VALUE and G_BLOCK_ADDR. This implements PR36390. Differential Revision: https://reviews.llvm.org/D49903 llvm-svn: 338337
*	[AArch64][GlobalISel] Make G_BLOCK_ADDR legal.	Amara Emerson	2018-07-31	1	-0/+45
	Differential Revision: https://reviews.llvm.org/D49902 llvm-svn: 338336
*	[GlobalISel] Add a G_BLOCK_ADDR opcode to handle IR blockaddress constants.	Amara Emerson	2018-07-31	1	-0/+12
	Differential Revision: https://reviews.llvm.org/D49900 llvm-svn: 338335
*	[DAGCombiner] transform sub-of-shifted-signbit to add	Sanjay Patel	2018-07-30	3	-36/+34
	This is exchanging a sub-of-1 with add-of-minus-1: https://rise4fun.com/Alive/plKAH This is another step towards improving select-of-constants codegen (see D48970). x86 is the motivating target, and those diffs all appear to be wins. PPC and AArch64 look neutral. I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but I think canonicalizing to 'add' is more likely to produce further transforms because we have more folds for 'add'. Differential Revision: https://reviews.llvm.org/D49924 llvm-svn: 338317
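One reading of the title and the "sub-of-1 with add-of-minus-1" remark is that subtracting a logically shifted sign bit (0 or 1) is replaced by adding an arithmetically shifted sign bit (0 or -1). The C sketch below checks that equivalence for 32-bit values. It is an assumption about the shape of the fold, not code from the patch, and the function names are hypothetical.

```c
#include <stdint.h>

/* x - (y >>u 31): subtracts 1 exactly when the sign bit of y is set. */
uint32_t sub_lshr_signbit(uint32_t x, uint32_t y) {
    return x - (y >> 31);
}

/* x + (y >>s 31): adds all-ones (that is, -1) when the sign bit is set.
 * The arithmetic shift is spelled out with a conditional because right
 * shifting a negative signed value is implementation-defined in C. */
uint32_t add_ashr_signbit(uint32_t x, uint32_t y) {
    uint32_t ashr = (y >> 31) ? UINT32_MAX : 0u;
    return x + ashr;
}
```

Both functions return the same value for every input, since adding `UINT32_MAX` wraps around to the same result as subtracting 1.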
*	[MachineOutliner][AArch64] Add support for saving LR to a register	Jessica Paquette	2018-07-30	3	-9/+121
	This teaches the outliner to save LR to a register rather than the stack when possible. This allows us to avoid bumping the stack in outlined functions in some cases. By doing this, in a later patch, we can teach the outliner to do something like this: f1: ... bl OUTLINED_FUNCTION ... f2: ... move LR's contents to a register bl OUTLINED_FUNCTION move the register's contents back instead of falling back to saving LR in both cases. llvm-svn: 338278
*	Add machine verifier to arm64-opt-remarks-lazy-bfi	Jessica Paquette	2018-07-30	1	-5/+8
	Previously, I thought this was a Windows failure. Then I realized it failed on every bot that used the verifier. This makes it use the verifier always, and adds that pass to the pipeline checks so that it's consistent across all bots. llvm-svn: 338272
*	[DAGCombiner] Bug 31275- Extract a shift from a constant mul or udiv if a rotate can be formed	David Bolvansky	2018-07-30	3	-123/+72
	Summary: Attempt to extract a shrl from a udiv or a shl from a mul if this allows a rotate to be formed. This targets cases where the input to a rotate pattern was a mul or udiv by a constant and InstCombine merged one of the shifts with the op. Patch by: sameconrad (Sam Conrad) Reviewers: RKSimon, craig.topper, spatel, lebedev.ri, javed.absar Reviewed By: lebedev.ri Subscribers: efriedma, kparzysz, llvm-commits Differential Revision: https://reviews.llvm.org/D47681 llvm-svn: 338270
*	Reapply "Fix crash on inline asm with 64bit matching input in 32bit GPR"	Thomas Preud'homme	2018-07-30	1	-0/+80
	This reapplies commit r338206 reverted by r338214 since the bug that r338206 uncovered has been fixed in r338268. Add support for inline assembly with matching input operands that do not naturally go in the register class they are constrained to (e.g. double in a 32-bit GPR). Note that regular input is already handled by existing code. llvm-svn: 338269
*	Fix uninitialized read in ARM's PrintAsmOperand	Thomas Preud'homme	2018-07-30	1	-0/+8
	Summary: Fix read of uninitialized RC variable in ARM's PrintAsmOperand when hasRegClassConstraint returns false. This was causing inline-asm-operand-implicit-cast test to fail in r338206. Reviewers: t.p.northover, weimingz, javed.absar, chill Reviewed By: chill Subscribers: chill, eraman, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D49984 llvm-svn: 338268
*	Attempt to fix Windows test failure caused by r338133	Jessica Paquette	2018-07-30	1	-2/+5
	It seems like the pass pipeline on Windows is slightly different than on Linux and macOS. As a result, the arm64-opt-remarks-lazy-bfi test has been failing. This switches a CHECK-NEXT to a CHECK-DAG to try and get this running properly again. It'd be nice to switch it back to a CHECK-NEXT if possible, but the CHECK-NEXT lines following the line we care about (the optimization remark emitter) do a pretty good job of enforcing the ordering we want. Hopefully this works, since I don't have a Windows machine. ;) Example failure: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/11295 llvm-svn: 338267
*	[X86] Regenerate NOBMI/BMI combine-select tests.	Simon Pilgrim	2018-07-30	1	-34/+38
	Test cleanup for D38128 llvm-svn: 338265
*	[X86] Regenerate PKU test to merge 32/64-bit rdpkru checks	Simon Pilgrim	2018-07-30	1	-11/+5
	Test cleanup for D38128 llvm-svn: 338264
*	[X86] Regenerate fast-isel tests.	Simon Pilgrim	2018-07-30	3	-48/+20
	Test cleanup for D38128 llvm-svn: 338262
*	[Hexagon] Simplify A4_rcmp[n]eqi R, 0	Krzysztof Parzyszek	2018-07-30	1	-0/+154
	Consider cases when register R is known to be zero/non-zero, or when it is defined by a C2_muxii instruction. llvm-svn: 338251
*	AMDGPU: Reduce code size with fcanonicalize (fneg x)	Matt Arsenault	2018-07-30	4	-48/+71
	When fcanonicalize is lowered to a mul, we can use -1.0 for free and avoid the cost of the bigger encoding for source modifiers. llvm-svn: 338244
*	AMDGPU: Make fneg combine handle fcanonicalize	Matt Arsenault	2018-07-30	1	-0/+21
	llvm-svn: 338243
*	[MachineOutliner][X86] Use TAILJMPd64 instead of JMP_1 for TailCall construction	Francis Visoiu Mistrih	2018-07-30	1	-1/+1
	The machine verifier asserts with: Assertion failed: (isMBB() && "Wrong MachineOperand accessor"), function getMBB, file ../include/llvm/CodeGen/MachineOperand.h, line 542. It calls analyzeBranch which tries to call getMBB if the opcode is JMP_1, but in this case we do: JMP_1 @OUTLINED_FUNCTION I believe we have to use TAILJMPd64 instead of JMP_1 since JMP_1 is used with brtarget8. Differential Revision: https://reviews.llvm.org/D49299 llvm-svn: 338237
*	AMDGPU: Force skip over s_sendmsg and exp instructions	Nicolai Haehnle	2018-07-30	3	-1/+84
	Summary: These instructions interact with hardware blocks outside the shader core, and they can have "scalar" side effects even when EXEC = 0. We don't want these scalar side effects to occur when all lanes want to skip these instructions, so always add the execz skip branch instruction for basic blocks that contain them. Also ensure that we skip scalar stores / atomics, though we don't code-gen those yet. Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48431 Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44 llvm-svn: 338235
*	[ARM] Fix over-alignment in arguments that are HA of 128-bit vectors	Petr Pavlu	2018-07-30	1	-0/+16
	Code in `CC_ARM_AAPCS_Custom_Aggregate()` is responsible for handling homogeneous aggregates for `CC_ARM_AAPCS_VFP`. When an aggregate ends up fully on stack, the function tries to pack all resulting items of the aggregate as tightly as possible according to AAPCS. Once the first item was laid out, the alignment used for consecutive items was the size of one item. This logic went wrong for 128-bit vectors because their alignment is normally only 64 bits, and so could result in inserting unexpected padding between the first and second element. The patch fixes the problem by updating the alignment with the item size only if this results in reducing it. Differential Revision: https://reviews.llvm.org/D49720 llvm-svn: 338233
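The padding problem can be modelled with a few lines of C. This is a deliberately simplified sketch, not the logic of `CC_ARM_AAPCS_Custom_Aggregate()`: sizes and alignments are in bytes, and the names `align_up` and `second_item_offset` as well as the `clamp` flag are hypothetical. With `clamp` set to 0 the second item is aligned to the item size (the old behaviour as described above), and with `clamp` set to 1 the alignment is replaced by the item size only when that reduces it.

```c
#include <stddef.h>

/* Rounds off up to the next multiple of a (a > 0). */
size_t align_up(size_t off, size_t a) {
    return (off + a - 1) / a * a;
}

/* Stack offset of the second item of an aggregate whose items have the
 * given size and natural alignment, when the first free byte is start. */
size_t second_item_offset(size_t start, size_t size, size_t align,
                          int clamp) {
    size_t first = align_up(start, align);
    size_t next_align = size;
    if (clamp && align < size)
        next_align = align; /* never raise the alignment above natural */
    return align_up(first + size, next_align);
}
```

For a 128-bit vector (size 16, alignment 8) starting at offset 8, the unclamped variant places the second item at 32, leaving 8 bytes of padding, while the clamped variant places it at 24, directly after the first item.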