bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	Merging r348444:	Tom Stellard	2018-12-07	1	-4/+68
------------------------------------------------------------------------
r348444 | matze | 2018-12-05 17:40:23 -0800 (Wed, 05 Dec 2018) | 15 lines
AArch64: Fix invalid CCMP emission
The code emitting AND-subtrees used to check whether any of the operands was an OR in order to figure out if the result needs to be negated. However, the OR could be hidden in further subtrees and not immediately visible.
Change the code so that canEmitConjunction() determines whether the result of the generated subtree needs to be negated. Clean up the emission logic to use this. I also changed the code a bit to make all negation decisions early, before we actually emit the subtrees.
This fixes http://llvm.org/PR39550
Differential Revision: https://reviews.llvm.org/D54137
------------------------------------------------------------------------
llvm-svn: 348642
*	Merging r339260:	Tom Stellard	2018-11-30	12	-222/+1446
------------------------------------------------------------------------
r339260 | syzaara | 2018-08-08 08:20:43 -0700 (Wed, 08 Aug 2018) | 13 lines
[PowerPC] Improve codegen for vector loads using scalar_to_vector
This patch aims to improve the codegen for vector loads involving the scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X) to utilize:
  LXSD and LXSDX for i64 and f64
  LXSIWAX for i32 (sign extension to i64)
  LXSIWZX for i32 and f64
Committing on behalf of Amy Kwan.
Differential Revision: https://reviews.llvm.org/D48950
------------------------------------------------------------------------
llvm-svn: 347957
*	Merging r347431:	Tom Stellard	2018-11-29	1	-8/+25
------------------------------------------------------------------------
r347431 | rnk | 2018-11-21 14:01:10 -0800 (Wed, 21 Nov 2018) | 12 lines
[mingw] Use unmangled name after the $ in the section name
GCC does it this way, and we have to be consistent. This includes stdcall and fastcall functions with suffixes. I confirmed that a fastcall function named "foo" ends up in ".text$foo", not ".text$@foo@8".
Based on a patch by Andrew Yohn!
Fixes PR39218.
Differential Revision: https://reviews.llvm.org/D54762
------------------------------------------------------------------------
llvm-svn: 347931
*	Merging r344591:	Tom Stellard	2018-11-16	1	-0/+37
------------------------------------------------------------------------
r344591 | abeserminji | 2018-10-16 01:27:28 -0700 (Tue, 16 Oct 2018) | 11 lines
[mips][micromips] Fix how values in .gcc_except_table are calculated
When a landing pad is calculated in a program compiled for microMIPS, it points to an even address. This causes a segmentation fault, because microMIPS code addresses are expected to be odd (the lowest bit selects the ISA mode).
This patch sets the last bit of the landing-pad offset to 1, so that it yields an odd address that points to the instruction exactly.
Differential Revision: https://reviews.llvm.org/D52985
------------------------------------------------------------------------
llvm-svn: 347028
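A minimal C sketch of the adjustment, for illustration only: `encode_landing_pad` is a hypothetical helper, not an LLVM function, and it assumes the offset it receives is 2-byte aligned like microMIPS instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not LLVM API): the value recorded in
 * .gcc_except_table for a landing pad. For microMIPS, bit 0 is set
 * so the resulting address is odd, as r344591 describes. */
static uint32_t encode_landing_pad(uint32_t offset, int is_micromips) {
  assert((offset & 1u) == 0 && "landing-pad offsets are 2-byte aligned");
  return is_micromips ? (offset | 1u) : offset;
}
```

For example, an offset of 0x40 would be recorded as 0x41 for microMIPS and left as 0x40 otherwise.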
*	Merging r342884:	Tom Stellard	2018-11-13	1	-0/+189
------------------------------------------------------------------------
r342884 | petarj | 2018-09-24 07:14:19 -0700 (Mon, 24 Sep 2018) | 12 lines
[Mips][FastISel] Fix selectBranch on icmp i1
r337288 tried to fix the result of icmp i1 when its input is not sanitized by falling back to DagISel. While it now produces the correct result for bit 0, the other bits can still hold arbitrary values, which MipsFastISel branch lowering does not support.
This patch fixes the issue by falling back to DagISel in this case.
Patch by Dragan Mladjenovic.
Differential Revision: https://reviews.llvm.org/D52045
------------------------------------------------------------------------
llvm-svn: 346741
*	Merging r341919:	Tom Stellard	2018-11-13	1	-4/+4
------------------------------------------------------------------------
r341919 | atanasyan | 2018-09-11 02:57:25 -0700 (Tue, 11 Sep 2018) | 18 lines
[mips] Add a pattern for 64-bit GPR variant of the `rdhwr` instruction
MIPS ISAs support a third operand for the `rdhwr` instruction only from Revision 6 onward. However, LLVM generates assembler code with the three-operand version of this instruction for any MIPS64 ISA. The third operand is always zero, so direct code generation produces correct code.
This patch fixes the bug by adding an instruction alias. The same alias already exists for the 32-bit ISA.
Ideally, we also need to reject the three-operand version of the `rdhwr` instruction in assembler code if the ISA revision is less than 6. That is a task for a separate patch.
This fixes PR38861 (https://bugs.llvm.org/show_bug.cgi?id=38861)
Differential revision: https://reviews.llvm.org/D51773
------------------------------------------------------------------------
llvm-svn: 346739
*	Merging r341221:	Tom Stellard	2018-11-13	1	-0/+68
------------------------------------------------------------------------
r341221 | atanasyan | 2018-08-31 08:57:17 -0700 (Fri, 31 Aug 2018) | 12 lines
[mips] Fix `mtc1` and `mfc1` definitions for microMIPS R6
The `mtc1` and `mfc1` definitions in MipsInstrFPU.td have the MMRel tag, but not the StdMMR6Rel tag. When these instructions are emitted for microMIPS R6 targets, neither `Mips::MipsR62MicroMipsR6` nor `Mips::Std2MicroMipsR6` can find the correct opcodes, and as a result the backend uses the mips32 variant of the instruction encoding.
The patch fixes this problem by adding the StdMMR6Rel tag, and checks the instruction encoding in the test case.
Differential revision: https://reviews.llvm.org/D51482
------------------------------------------------------------------------
llvm-svn: 346737
*	Merging r340932:	Tom Stellard	2018-11-13	1	-0/+98
------------------------------------------------------------------------
r340932 | atanasyan | 2018-08-29 07:54:01 -0700 (Wed, 29 Aug 2018) | 11 lines
[mips] Fix microMIPS unconditional branch offset handling
The MipsSEInstrInfo class defines the unconditional branches it uses internally as Mips::B and Mips::J, even for microMIPS code generation. Under some conditions this leads to a bug: for a long branch whose offset fits the MIPS jump instruction offset size but not the microMIPS jump offset size, we generate a 'short' branch and later report an 'out of range PC16 fixup' error after the check in the isBranchOffsetInRange routine.
Differential revision: https://reviews.llvm.org/D50615
------------------------------------------------------------------------
llvm-svn: 346736
*	Merging r340931:	Tom Stellard	2018-11-13	1	-5/+2
------------------------------------------------------------------------
r340931 | atanasyan | 2018-08-29 07:53:55 -0700 (Wed, 29 Aug 2018) | 6 lines
[mips] Involves microMIPS's jump in the analyzable branch set
Include microMIPS's jump in the analyzable branch set to reduce some code patterns.
Differential revision: https://reviews.llvm.org/D50613
------------------------------------------------------------------------
llvm-svn: 346735
*	Merging r340927:	Tom Stellard	2018-11-13	2	-0/+182
------------------------------------------------------------------------
r340927 | vstefanovic | 2018-08-29 07:07:14 -0700 (Wed, 29 Aug 2018) | 14 lines
[mips] Prevent shrink-wrap for BuildPairF64, ExtractElementF64 when they use $sp
For a certain combination of options, BuildPairF64_{64} and ExtractElementF64{_64} may be expanded into instructions that use the stack. Add an implicit $sp operand in such cases so that ShrinkWrapping doesn't move the prologue setup below them.
Fixes MultiSource/Benchmarks/MallocBench/cfrac for '--target=mips-img-linux-gnu -mcpu=mips32r6 -mfpxx -mnan=2008' and '--target=mips-img-linux-gnu -mcpu=mips32r6 -mfp64 -mnan=2008 -mno-odd-spreg'.
Differential Revision: https://reviews.llvm.org/D50986
------------------------------------------------------------------------
llvm-svn: 346734
*	Merging r343373:	Tom Stellard	2018-10-19	1	-11/+18
------------------------------------------------------------------------
r343373 | rksimon | 2018-09-29 06:25:22 -0700 (Sat, 29 Sep 2018) | 3 lines
[X86][SSE] Fixed issue with v2i64 variable shifts on 32-bit targets
The shift amount might have peeked through an extract_subvector, altering the number of vector elements in the 'Amt' variable, so we were incorrectly calculating the ratio when peeking through bitcasts, which resulted in incorrectly detected splats.
------------------------------------------------------------------------
llvm-svn: 344810
*	Merging r343443:	Tom Stellard	2018-10-19	1	-0/+48
------------------------------------------------------------------------
r343443 | ctopper | 2018-10-01 00:08:41 -0700 (Mon, 01 Oct 2018) | 9 lines
[X86] Stop X86DomainReassignment from creating copies between GR8/GR16 physical registers and k-registers.
We can only copy between a k-register and a GR32/GR64 register. This patch detects that the copy will be illegal and prevents the domain reassignment from happening for that closure.
This probably isn't the best fix, and we should probably figure out how to handle this correctly.
Fixes PR38803.
------------------------------------------------------------------------
llvm-svn: 344804
*	Merging r341642:	Hans Wennborg	2018-09-10	2	-0/+121
------------------------------------------------------------------------
r341642 | tnorthover | 2018-09-07 11:21:25 +0200 (Fri, 07 Sep 2018) | 8 lines
ARM: fix Thumb2 CodeGen for ldrex with folded frame-index.
Because t2LDREX (& t2STREX) were marked as AddrModeNone, but did allow a FrameIndex operand, rewriteT2FrameIndex asserted. This gives them a proper addressing-mode and tells the rewriter about it so that encodable offsets are exploited and others are rejected.
Should fix PR38828.
------------------------------------------------------------------------
llvm-svn: 341783
*	Merging r341512:	Hans Wennborg	2018-09-06	1	-2/+2
------------------------------------------------------------------------
r341512 | ctopper | 2018-09-06 04:03:14 +0200 (Thu, 06 Sep 2018) | 7 lines
[X86][Assembler] Allow %eip as a register in 32-bit mode for .cfi directives.
This basically reverts a change made in r336217, but improves the text of the error message for not allowing IP-relative addressing in 32-bit mode.
Fixes PR38826.
Patch by Iain Sandoe.
------------------------------------------------------------------------
llvm-svn: 341530
*	Merging r340959:	Hans Wennborg	2018-09-04	1	-14/+25
------------------------------------------------------------------------
r340959 | mareko | 2018-08-29 22:03:00 +0200 (Wed, 29 Aug 2018) | 9 lines
AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes
Summary: This fixes GPU hangs with OpenGL bindless handle arithmetic.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D51203
------------------------------------------------------------------------
llvm-svn: 341351
*	Merging r340417:	Hans Wennborg	2018-08-30	1	-0/+12
------------------------------------------------------------------------
r340417 | hakzsam | 2018-08-22 18:08:48 +0200 (Wed, 22 Aug 2018) | 14 lines
AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space
The 32-bit constant address space is numbered 6, so the maximum address space number is 6, not 5.
Fixes "LLVM ERROR: Pointer address space out of range".
v5: rename MAX_COMMON_ADDRESS to MAX_AMDGPU_ADDRESS
v4:
 - fix compilation issues
 - fix out of bounds access
v3: use static_assert()
v2: add a very simple test for 32-bit addr space
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630
------------------------------------------------------------------------
llvm-svn: 341041
*	Merging r340416:	Hans Wennborg	2018-08-30	1	-0/+12
------------------------------------------------------------------------
r340416 | hakzsam | 2018-08-22 18:08:43 +0200 (Wed, 22 Aug 2018) | 8 lines
AMDGPU: fix existing alias rules for constant and global
Constant and global may alias, and one rules table wasn't ordered correctly.
Pinpointed by Matt.
v2: add a test with swapped parameters
------------------------------------------------------------------------
llvm-svn: 341040
*	Merging r340641:	Hans Wennborg	2018-08-27	3	-0/+95
------------------------------------------------------------------------
r340641 | stefanp | 2018-08-24 21:38:29 +0200 (Fri, 24 Aug 2018) | 9 lines
[Exception Handling] Unwind tables are required for all functions that have an EH personality.
This patch is for defect: https://bugs.llvm.org/show_bug.cgi?id=32611
Functions may require unwind tables even if they are marked with the attribute nounwind. Any function with an EH personality may require an unwind table.
Differential Revision: https://reviews.llvm.org/D50987
------------------------------------------------------------------------
llvm-svn: 340731
*	Merging r340303:	Hans Wennborg	2018-08-21	1	-0/+110
------------------------------------------------------------------------
r340303 | ctopper | 2018-08-21 19:15:33 +0200 (Tue, 21 Aug 2018) | 9 lines
[BypassSlowDivision] Teach bypass slow division not to interfere with div by constant where constants have been constant hoisted, but not moved from their basic block
DAGCombiner doesn't pay attention to whether constants are opaque before doing the div by constant optimization. So BypassSlowDivision shouldn't introduce control flow that would make DAGCombiner unable to see an opaque constant. This can occur when a div and rem of the same constant are used in the same basic block: the constant will be hoisted, but will not leave the block.
Longer term we probably need to look into the X86 immediate cost model used by constant hoisting and maybe not mark div/rem immediates for hoisting at all.
This fixes the case from PR38649.
Differential Revision: https://reviews.llvm.org/D51000
------------------------------------------------------------------------
llvm-svn: 340359
*	Merging r339674:	Hans Wennborg	2018-08-21	1	-0/+15
------------------------------------------------------------------------
r339674 | aemerson | 2018-08-14 14:04:25 +0200 (Tue, 14 Aug 2018) | 3 lines
[GlobalISel][IRTranslator] Fix a bug in handling repeating struct types during argument lowering.
Differential Revision: https://reviews.llvm.org/D49442
------------------------------------------------------------------------
llvm-svn: 340358
*	Merging r339945:	Hans Wennborg	2018-08-17	2	-3/+317
------------------------------------------------------------------------
r339945 | ctopper | 2018-08-16 23:54:02 +0200 (Thu, 16 Aug 2018) | 9 lines
[X86] In EFLAGS copy pass, don't emit EXTRACT_SUBREG instructions since we're after peephole
Normally the peephole pass converts EXTRACT_SUBREG to COPY instructions. But we're after peephole so we can't rely on it to clean these up.
To fix this, the eflags pass now emits a COPY with a subreg input. I also noticed that in 32-bit mode we need to constrain the input to the copy to ensure the subreg is valid. Otherwise we'll fail verify-machineinstrs.
Differential Revision: https://reviews.llvm.org/D50656
------------------------------------------------------------------------
llvm-svn: 339999
*	Merging r339769:	Hans Wennborg	2018-08-16	1	-0/+56
------------------------------------------------------------------------
r339769 | nemanjai | 2018-08-15 14:58:13 +0200 (Wed, 15 Aug 2018) | 12 lines
[PowerPC] Don't run BV DAG Combine before legalization if it assumes legal types
When trying to combine a DAG that builds a vector out of sign-extensions of vector extracts, the code assumes legal input types. Due to that, we have to disable this combine prior to legalization. In some cases, the DAG will look slightly different after legalization so account for that in the matching code.
This is a fix for https://bugs.llvm.org/show_bug.cgi?id=38087
Differential Revision: https://reviews.llvm.org/D49080
------------------------------------------------------------------------
llvm-svn: 339859
*	Merging r339536:	Hans Wennborg	2018-08-16	1	-12/+47
------------------------------------------------------------------------
r339536 | ctopper | 2018-08-13 08:53:49 +0200 (Mon, 13 Aug 2018) | 3 lines
[SelectionDAG] In PromoteFloatOp_BITCAST, insert a bitcast after the fp_to_fp16 in case the result type isn't a scalar integer.
This is another variation of PR38533. In this case, the result type of the bitcast is legal and 16-bits wide, but not a scalar integer. So we need to emit the convert to i16 and then bitcast it to the true result type. This new bitcast will be further type legalized if necessary.
------------------------------------------------------------------------
llvm-svn: 339857
*	Merging r339535:	Hans Wennborg	2018-08-16	1	-0/+19
------------------------------------------------------------------------
r339535 | ctopper | 2018-08-13 08:53:47 +0200 (Mon, 13 Aug 2018) | 5 lines
[SelectionDAG] In PromoteIntRes_BITCAST, when the input is TypePromoteFloat, make sure the output type is scalar. For vectors, use a store and load of temporary.
Previously if the result type was a vector, we emitted a FP_TO_FP16 with a vector result type which isn't valid.
This is basically the opposite case of the root cause of PR38533.
------------------------------------------------------------------------
llvm-svn: 339856
*	Merging r339533:	Hans Wennborg	2018-08-16	1	-0/+11
------------------------------------------------------------------------
r339533 | ctopper | 2018-08-13 07:26:49 +0200 (Mon, 13 Aug 2018) | 5 lines
[SelectionDAG] In PromoteFloatRes_BITCAST, insert a bitcast before the fp16_to_fp in case the input type isn't an i16.
The bitcast can be further legalized as needed.
Fixes PR38533.
------------------------------------------------------------------------
llvm-svn: 339855
*	Merging r339600:	Hans Wennborg	2018-08-14	1	-0/+30
------------------------------------------------------------------------
r339600 | scott.linder | 2018-08-13 20:44:21 +0200 (Mon, 13 Aug 2018) | 8 lines
[CodeGen] Fix assert in SelectionDAG::computeKnownBits
Fix SelectionDAG::computeKnownBits asserting when handling EXTRACT_SUBVECTOR: it zero-extended the demanded-elements mask even when the mask was already as long as the source vector.
Differential Revision: https://reviews.llvm.org/D49574
------------------------------------------------------------------------
llvm-svn: 339664
*	Merging r339225:	Hans Wennborg	2018-08-13	2	-122/+307
------------------------------------------------------------------------
r339225 | thopre | 2018-08-08 11:35:26 +0200 (Wed, 08 Aug 2018) | 11 lines
Support inline asm with multiple 64bit output in 32bit GPR
Summary: Extend the fix for PR34170 to support inline assembly with multiple output operands that do not naturally go in the register class they are constrained to (e.g. a double in a 32-bit GPR, as in the PR).
Reviewers: bogner, t.p.northover, lattner, javed.absar, efriedma
Reviewed By: efriedma
Subscribers: efriedma, tra, eraman, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D45437
------------------------------------------------------------------------
llvm-svn: 339539
*	Merging r339316:	Hans Wennborg	2018-08-09	1	-0/+88
------------------------------------------------------------------------
r339316 | hahnfeld | 2018-08-09 09:45:49 +0200 (Thu, 09 Aug 2018) | 16 lines
[NVPTX] Select atomic loads and stores
According to the PTX ISA, .volatile has the same memory synchronization semantics as .relaxed.sys, so it can be used to implement monotonic atomic loads and stores. This is important for OpenMP's atomic construct where
 - 'read's and 'write's are lowered to atomic loads and stores, and
 - updates of float or double types are lowered into a cmpxchg loop.
(Note that PTX could do better because it has atom.add.f{32,64} but LLVM's atomicrmw instruction only allows integer types.)
Higher levels of atomicity (like acquire and release) need additional synchronization properties which were added with PTX ISA 6.0 / sm_70. So using these instructions still results in an error.
Differential Revision: https://reviews.llvm.org/D50391
------------------------------------------------------------------------
llvm-svn: 339338
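To make the OpenMP case concrete, here is a C11 sketch of the three operations at the source level. It is an illustration under assumptions, not what Clang or an OpenMP runtime actually emits, and the function names are hypothetical. LLVM's monotonic ordering corresponds to C11's `memory_order_relaxed`, and a float update, lacking a floating-point atomicrmw, becomes a compare-exchange loop.

```c
#include <assert.h>
#include <stdatomic.h>

/* 'read': relaxed (LLVM "monotonic") atomic load. Hypothetical name. */
static float relaxed_read(_Atomic float *p) {
  assert(p);
  return atomic_load_explicit(p, memory_order_relaxed);
}

/* 'write': relaxed atomic store. Hypothetical name. */
static void relaxed_write(_Atomic float *p, float v) {
  assert(p);
  atomic_store_explicit(p, v, memory_order_relaxed);
}

/* 'update' on a float: retry the compare-exchange until no other thread
 * changed *p between our load and our store. Returns the new value. */
static float cas_add_float(_Atomic float *p, float v) {
  assert(p);
  float old = atomic_load_explicit(p, memory_order_relaxed);
  /* On failure, 'old' is reloaded with the current value of *p. */
  while (!atomic_compare_exchange_weak_explicit(p, &old, old + v,
                                                memory_order_relaxed,
                                                memory_order_relaxed)) {
  }
  return old + v;
}
```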
*	Merging r339190:	Hans Wennborg	2018-08-08	1	-3/+90
------------------------------------------------------------------------
r339190 | jvesely | 2018-08-07 23:54:37 +0200 (Tue, 07 Aug 2018) | 12 lines
AMDGPU: Remove broken i16 ternary patterns
Fixup test to check for GCN prefix
These patterns always zero extend the result even though it might need sign extension. This has been broken since the addition of i16 support.
It has popped up in the mad_sat(char) test since the min(max()) combination is turned into v_med3, resulting in the following (incorrect) sequence:
  v_mad_i16 v2, v10, v9, v11
  v_med3_i32 v2, v2, v8, v7
Fixes mad_sat(char) piglit on VI.
Differential Revision: https://reviews.llvm.org/D49836
------------------------------------------------------------------------
llvm-svn: 339235
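A small C model of the failure. Here `med3_i32` stands in for `v_med3_i32`, and the clamp bounds -128/127 are an assumption chosen to mimic a signed-char saturation; none of this is AMDGPU code. A negative i16 result such as -3 becomes 65533 when zero-extended, so the 32-bit clamp that follows sees a large positive number.

```c
#include <assert.h>
#include <stdint.h>

/* What the broken patterns did: zero-extend the 16-bit result. */
static int32_t zext16(int16_t v) { return (int32_t)(uint16_t)v; }

/* What a signed i16 result actually needs: sign extension. */
static int32_t sext16(int16_t v) { return (int32_t)v; }

/* Median of three; with lo <= hi this clamps x to [lo, hi]. */
static int32_t med3_i32(int32_t x, int32_t lo, int32_t hi) {
  assert(lo <= hi);
  return x < lo ? lo : (x > hi ? hi : x);
}
```

With the correct extension, `med3_i32(sext16(-3), -128, 127)` stays -3; with the zero extension it saturates to 127.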
*	Merging r338915:	Hans Wennborg	2018-08-07	1	-0/+59
------------------------------------------------------------------------
r338915 | ctopper | 2018-08-03 22:14:18 +0200 (Fri, 03 Aug 2018) | 5 lines
[SelectionDAG] Teach LegalizeVectorTypes to widen the mask input to a masked store.
The mask operand is visited before the data operand so we need to be able to widen it.
Fixes PR38436.
------------------------------------------------------------------------
llvm-svn: 339106
*	Merging r338610:	Hans Wennborg	2018-08-07	2	-18/+10
------------------------------------------------------------------------
r338610 | jvesely | 2018-08-01 20:36:07 +0200 (Wed, 01 Aug 2018) | 3 lines
AMDGPU/R600: Convert kernel param loads to use PARAM_I_ADDRESS
Non ext aligned i32 loads are still optimized to use CONSTANT_BUFFER (AS 8)
------------------------------------------------------------------------
llvm-svn: 339105
*	Merging r338665:	Hans Wennborg	2018-08-07	1	-3/+22
------------------------------------------------------------------------
r338665 | lliu0 | 2018-08-02 03:54:12 +0200 (Thu, 02 Aug 2018) | 11 lines
Fix FCOPYSIGN expansion
In the expansion of FCOPYSIGN, the shift node is missing when the two operands of FCOPYSIGN are of the same size. We should always generate a shift node (if the required shift amount is not zero) to put the sign bit into the right position, regardless of the size of the underlying types.
Differential Revision: https://reviews.llvm.org/D49973
------------------------------------------------------------------------
llvm-svn: 339098
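The sign-bit movement in an integer-based copysign can be sketched in C, assuming IEEE-754 binary32/binary64 types. `move_sign_bit` and the two `copysign_*` helpers are hypothetical and are not LLVM's expansion code. The point of the sketch is that the shift direction and amount come from where the sign bits sit, and the shift must be applied whenever that amount is non-zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == 4 && sizeof(double) == 8,
              "IEEE-754 binary32/binary64 assumed");

/* Move the sign bit of a sign_width-bit pattern to bit (mag_width - 1).
 * The shift amount is the width difference, in either direction. */
static uint64_t move_sign_bit(uint64_t sign_bits, unsigned sign_width,
                              unsigned mag_width) {
  assert(sign_width >= 1 && sign_width <= 64);
  assert(mag_width >= 1 && mag_width <= 64);
  uint64_t sign = sign_bits & (UINT64_C(1) << (sign_width - 1));
  if (sign_width > mag_width) return sign >> (sign_width - mag_width);
  if (sign_width < mag_width) return sign << (mag_width - sign_width);
  return sign;
}

/* copysign with a float magnitude and a double sign source. */
static float copysign_f32_f64(float mag, double sgn) {
  uint32_t m;
  uint64_t s;
  memcpy(&m, &mag, sizeof m);
  memcpy(&s, &sgn, sizeof s);
  m = (m & UINT32_C(0x7fffffff)) | (uint32_t)move_sign_bit(s, 64, 32);
  memcpy(&mag, &m, sizeof m);
  return mag;
}

/* copysign with a double magnitude and a float sign source. */
static double copysign_f64_f32(double mag, float sgn) {
  uint64_t m;
  uint32_t s;
  memcpy(&m, &mag, sizeof m);
  memcpy(&s, &sgn, sizeof s);
  m = (m & UINT64_C(0x7fffffffffffffff)) | move_sign_bit(s, 32, 64);
  memcpy(&mag, &m, sizeof m);
  return mag;
}
```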
*	Merging r338599:	Hans Wennborg	2018-08-03	1	-0/+28
------------------------------------------------------------------------
r338599 | vlad.tsyrklevich | 2018-08-01 19:44:37 +0200 (Wed, 01 Aug 2018) | 16 lines
[X86] FastISel fall back on !absolute_symbol GVs
Summary: D25878, which added support for !absolute_symbol for normal X86 ISel, did not add support for materializing references to absolute symbols for X86 FastISel. This causes build failures because FastISel generates PC-relative relocations for absolute symbols. Fall back to normal ISel for references to !absolute_symbol GVs. Fix for PR38200.
Reviewers: pcc, craig.topper
Reviewed By: pcc
Subscribers: hiraditya, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D50116
------------------------------------------------------------------------
llvm-svn: 338847
*	Merging r338554:	Hans Wennborg	2018-08-02	1	-0/+30
------------------------------------------------------------------------
r338554 | bryanpkc | 2018-08-01 15:50:29 +0200 (Wed, 01 Aug 2018) | 11 lines
[AArch64] Fix FCCMP with FP16 operands
Summary: This patch adds support for FCCMP instruction with FP16 operands, avoiding an assertion during instruction selection.
Reviewers: olista01, SjoerdMeijer, t.p.northover, javed.absar
Reviewed By: SjoerdMeijer
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D50115
------------------------------------------------------------------------
llvm-svn: 338692
*	Merging r338658:	Hans Wennborg	2018-08-02	1	-204/+153
------------------------------------------------------------------------
r338658 | nemanjai | 2018-08-02 02:03:22 +0200 (Thu, 02 Aug 2018) | 13 lines
[PowerPC] Do not round values prior to converting to integer
Adding the FP_ROUND nodes when combining FP_TO_[SU]INT of elements feeding a BUILD_VECTOR into an FP_TO_[SU]INT of the built vector loses precision. This patch removes the code that adds these nodes to true f64 operands. It also adds patterns required to ensure the code is still vectorized rather than converting individual elements and inserting into a vector.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38342
Differential Revision: https://reviews.llvm.org/D50121
------------------------------------------------------------------------
llvm-svn: 338678
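A small C illustration of the precision loss, with scalar C conversions standing in for the vector DAG nodes (the function names are hypothetical). Rounding an f64 to f32 first keeps only a 24-bit significand, so the integer result changes once the value needs more bits than that.

```c
#include <assert.h>
#include <stdint.h>

/* What the fixed combine preserves: convert the f64 value directly. */
static int32_t fptosi_direct(double d) {
  assert(d > -1e9 && d < 1e9 && "keep the demo well inside int32_t range");
  return (int32_t)d;
}

/* What the old combine effectively computed: an FP_ROUND to f32,
 * then the conversion to integer. */
static int32_t fptosi_via_f32(double d) {
  assert(d > -1e9 && d < 1e9 && "keep the demo well inside int32_t range");
  return (int32_t)(float)d;
}
```

16777217 (2^24 + 1) is the smallest positive integer a float cannot represent, so the two paths disagree there: the direct conversion gives 16777217, the rounded one 16777216.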
*	[AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero	Ryan Taylor	2018-08-01	1	-0/+113
Summary: Add an _L to _LZ image intrinsic mapping table to TableGen. In ISelLowering, check whether the image intrinsic has an lod operand equal to zero; if so, remove the lod and change the opcode to the equivalent mapped _LZ one.
Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71
Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D49483
llvm-svn: 338523
*	[SystemZ, TableGen] Fix shift count handling	Ulrich Weigand	2018-08-01	1	-0/+12
The DAG combiner logic to simplify AND masks in shift counts is invalid. While it is true that the SystemZ shift instructions ignore all but the low 6 bits of the shift count, it is still invalid to simplify the AND masks while the DAG still uses the standard shift operators (which are not defined to match the SystemZ instruction behavior).
Instead, this patch performs equivalent operations during instruction selection. For completely removing the AND, this now happens via additional DAG match patterns implemented by a multi-alternative PatFrags. For simplifying a 32-bit AND to a 16-bit AND, the existing DAG patterns were already mostly OK; they just needed an output XForm to actually truncate the immediate value.
Unfortunately, the latter change also exposed a bug in TableGen: it seems XForms are currently only handled correctly for direct operands of the outermost operation node. This patch also fixes that bug by simply recursing through the whole pattern. This should be NFC for all other targets.
Differential Revision: https://reviews.llvm.org/D50096
llvm-svn: 338521
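The reasoning can be modeled in C. `systemz_shl_model` encodes only what the message states (the hardware uses the low 6 bits of the count), and `generic_shl` stands in for a standard shift operator such as ISD::SHL; both names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a SystemZ 64-bit shift left as described above:
 * only the low 6 bits of the count are used. */
static uint64_t systemz_shl_model(uint64_t x, uint64_t count) {
  return x << (count & 63);
}

/* A generic shift (C '<<', like ISD::SHL) is only defined for counts
 * below the bit width, so "x << (n & 63)" cannot be simplified to
 * "x << n" at this level. */
static uint64_t generic_shl(uint64_t x, uint64_t count) {
  assert(count < 64 && "shift count >= bit width is undefined");
  return x << count;
}
```

Matching `shl x, (and n, 63)` directly to the SystemZ instruction during instruction selection is safe because `systemz_shl_model(x, n)` already equals `generic_shl(x, n & 63)` for every `n`. Dropping the AND earlier, while the node is still a generic shift, is not safe.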
*	[MIPS GlobalISel] Select global address	Petar Jovanovic	2018-08-01	5	-0/+193
Select G_GLOBAL_VALUE for position dependent code.
Patch by Petar Avramovic.
Differential Revision: https://reviews.llvm.org/D49803
llvm-svn: 338499
*	[X86] Adding more test patterns for lea-opt (PR37939)	Jatin Bhateja	2018-08-01	1	-0/+151
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50128
llvm-svn: 338483
*	[x86] Fix a really subtle miscompile due to a somewhat glaring bug in	Chandler Carruth	2018-08-01	1	-0/+119
EFLAGS copy lowering.
If you have a branch of LLVM, you may want to cherry-pick this. It is extremely unlikely to hit this case empirically, but it will likely manifest as an "impossible" branch being taken somewhere, and will be ... very hard to debug.
Hitting this requires complex conditions living across complex control flow combined with some interesting memory (non-stack) initialized with the results of a comparison. Also, because you have to arrange for an EFLAGS copy to be in just the right place, almost anything you do to the code will hide the bug. I was unable to reduce anything remotely resembling a "good" test case from the place where I hit it, and so instead I have constructed synthetic MIR testing that directly exercises the bug in question (as well as the good behavior for completeness).
The issue is that we would mistakenly assume any SETcc with a valid condition and an initial operand that was a register (and a virtual register at that) to be a register-defining SETcc... It isn't, though: a SETcc can also store its result to memory, in which case that register is part of the address. This would in turn cause us to test some other bizarre register, typically the base pointer of some memory. Now, testing this register and using that to branch on doesn't make any sense. It even fails the machine verifier (if you are running it) due to the wrong register class. But it will make it through LLVM, assemble, and it looks fine... But wow do you get a very unusual and surprising branch taken in your actual code.
The fix is to actually check what kind of SETcc instruction we're dealing with. Because there are a bunch of them, I just test the may-store bit in the instruction. I've also added an assert as a sanity check that ensures we are, in fact, defining the register operand. =D
llvm-svn: 338481
*	[x86/slh] Add unwind info to several tests to make it more obvious that	Chandler Carruth	2018-08-01	1	-12/+48
we aren't incorrectly generating any of it when doing SLH.
There was a bug that only occurred with SLH that very much looked like it could be caused by bad unwind info, and so this was a prime suspect. Turns out that everything is fine, but this way we'll see if we end up, for example, putting things we shouldn't inside the prolog.
llvm-svn: 338480
*	[GlobalISel][IRTranslator] Use RPO traversal when visiting blocks to translate.	Amara Emerson	2018-08-01	3	-5/+24
Previously we were just visiting the blocks in the function in IR order, which is rather arbitrary. Therefore we wouldn't always visit defs before uses, but the translation code relies on this assumption in some places.
Only codegen change seen in tests is an elision of a redundant copy.
Fixes PR38396
llvm-svn: 338476
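In reverse post order (RPO), every block comes after its dominators, so a value's defining block, which dominates its non-PHI uses, is visited first. Below is a compact C sketch over a toy CFG; the `cfg` type and function names are made up for illustration. Within LLVM, the `ReversePostOrderTraversal` class in llvm/ADT/PostOrderIterator.h provides this ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKS 16

/* Toy CFG: block b has num_succs[b] successors listed in succs[b]. */
typedef struct {
  size_t num_blocks;
  size_t num_succs[MAX_BLOCKS];
  size_t succs[MAX_BLOCKS][2];
} cfg;

/* Depth-first search; a block is emitted after all its successors. */
static void post_order(const cfg *g, size_t b, bool *seen, size_t *out,
                       size_t *n) {
  seen[b] = true;
  for (size_t i = 0; i < g->num_succs[b]; ++i) {
    size_t s = g->succs[b][i];
    assert(s < g->num_blocks);
    if (!seen[s]) post_order(g, s, seen, out, n);
  }
  out[(*n)++] = b;
}

/* Writes the blocks reachable from the entry (block 0) in reverse post
 * order into rpo and returns how many were written. */
static size_t reverse_post_order(const cfg *g, size_t *rpo) {
  assert(g->num_blocks >= 1 && g->num_blocks <= MAX_BLOCKS);
  bool seen[MAX_BLOCKS] = {false};
  size_t po[MAX_BLOCKS];
  size_t n = 0;
  post_order(g, 0, seen, po, &n);
  for (size_t i = 0; i < n; ++i) rpo[i] = po[n - 1 - i];
  return n;
}
```

For a CFG in which entry block 0 branches to block 2 and block 2 branches to block 1, IR order visits block 1 before block 2, while RPO yields 0, 2, 1.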
*	AMDGPU: Add clamp bit to dot intrinsics	Konstantin Zhuravlyov	2018-08-01	7	-35/+155
Differential Revision: https://reviews.llvm.org/D49874
llvm-svn: 338470
*	Revert r338354 "[ARM] Revert r337821"	Reid Kleckner	2018-07-31	3	-11/+11
Disable ARMCodeGenPrepare by default again. It is causing verifier failures in V8 that look like:
  Duplicate integer as switch case
  switch i32 %trunc, label %if.end13 [
    i32 0, label %cleanup36
    i32 0, label %if.then8
  ], !dbg !4981
  i32 0
  fatal error: error in backend: Broken function found, compilation aborted!
I will continue reducing the test case and send it along.
llvm-svn: 338452
*	AMDGPU: Split amdgcn/r600 fminnum/fmaxnum tests	Matt Arsenault	2018-07-31	4	-443/+667
R600 breaks on too many things to usefully test changes with ieee_mode on vs. off.
llvm-svn: 338435
*	AMDGPU: Break 64-bit arguments into 32-bit pieces	Matt Arsenault	2018-07-31	1	-7/+43
llvm-svn: 338421
*	AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls	Matt Arsenault	2018-07-31	3	-13/+71
This improves code for the same reasons as scalarizing 32-bit element vectors.
llvm-svn: 338418
*	AMDGPU: Scalarize vector argument types to calls	Matt Arsenault	2018-07-31	3	-32/+71
When lowering calling conventions, prefer to decompose vectors into the constituent register types. This avoids artificial constraints to satisfy a wide super-register.
This improves code quality because now optimizations don't need to deal with the super-register constraint. For example, the immediate folding code doesn't deal with 4-component reg_sequences, so by breaking the register down earlier the existing immediate folding code is able to work.
This also avoids the need for the shader input processing code to manually split vector types.
llvm-svn: 338416
*	[X86][SSE] Use ISD::MULHU for constant/non-zero ISD::SRL lowering (PR38151)	Simon Pilgrim	2018-07-31	4	-504/+235
As was done for vector rotations, we can efficiently use ISD::MULHU for vXi8/vXi16 ISD::SRL lowering.
Shift-by-zero cases are still problematic (mainly on v32i8 due to extra AND/ANDN/OR or VPBLENDVB blend masks, but v8i16/v16i16 aren't great either if PBLENDW fails), so I've limited this first patch to known non-zero cases if we can't easily use PBLENDW.
Differential Revision: https://reviews.llvm.org/D49562
llvm-svn: 338407
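The identity behind the lowering, per 16-bit lane: for 1 <= s <= 15, `x >> s` equals the high half of `x * 2^(16 - s)`. A scalar C sketch follows (hypothetical function names; `mulhu16` models what PMULHUW computes per lane). For s = 0 the multiplier would be 2^16, which does not fit in a 16-bit lane. That is presumably why shift-by-zero lanes need the extra blending mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* High 16 bits of a 16x16 -> 32-bit unsigned multiply. */
static uint16_t mulhu16(uint16_t a, uint16_t b) {
  return (uint16_t)(((uint32_t)a * b) >> 16);
}

/* Logical right shift by a known non-zero amount, expressed as MULHU. */
static uint16_t srl_via_mulhu(uint16_t x, unsigned s) {
  assert(s >= 1 && s <= 15 && "s == 0 would need the multiplier 1 << 16");
  return mulhu16(x, (uint16_t)(1u << (16 - s)));
}
```

Because the multiplier is a per-lane constant, a vector of different shift amounts becomes a single vector multiply-high.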
*	[X86] Add pattern matching for PMADDUBSW	Craig Topper	2018-07-31	1	-1788/+108
Summary: Similar to D49636, but for PMADDUBSW. This instruction has the additional complexity that the addition of the two products saturates to 16 bits rather than wrapping around, and one operand is treated as signed and the other as unsigned.
A C example that triggers this pattern:
```c
#include <stdint.h>

#define N 128
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

int8_t A[2 * N];
uint8_t B[2 * N];
int16_t C[N];

void foo(void) {
  for (int i = 0; i != N; ++i)
    /* Sum of two signed x unsigned byte products, saturated to int16_t. */
    C[i] = MIN(MAX((int16_t)A[2 * i] * (int16_t)B[2 * i] +
                       (int16_t)A[2 * i + 1] * (int16_t)B[2 * i + 1],
                   -32768),
               32767);
}
```
Reviewers: RKSimon, spatel, zvi
Reviewed By: RKSimon, zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49829
llvm-svn: 338402