bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[AArch64] Optimize floating point materialization	Adhemerval Zanella	2019-02-01	7	-46/+110
	This patch changes isFPImmLegal to also return true if the value can be encoded as the immediate operand of a logical instruction, in addition to checking whether it fits the immediate field of fmov. This optimizes some floating-point materialization, including values used in isinf lowering. Reviewed By: rengolin, efriedma, evandro Differential Revision: https://reviews.llvm.org/D57044 llvm-svn: 352866
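To illustrate which bit patterns qualify as a logical-instruction immediate, here is a minimal Python sketch of the AArch64 encoding rule: the value must be a repeating 2-, 4-, 8-, 16-, 32- or 64-bit element, and that element must be a rotated contiguous run of ones. This is an independent model, not the code from the patch; the name `is_logical_imm` is hypothetical.

```python
def is_logical_imm(value, width=64):
    """Model of the AArch64 logical-immediate rule (hypothetical helper)."""
    mask = (1 << width) - 1
    value &= mask
    # All-zeros and all-ones are never encodable.
    if value == 0 or value == mask:
        return False
    # Find the smallest element size whose repetition yields the value.
    size = width
    while size > 2:
        half = size // 2
        half_mask = (1 << half) - 1
        if (value & half_mask) != ((value >> half) & half_mask):
            break
        size = half
    elem_mask = (1 << size) - 1
    elem = value & elem_mask
    # The element must be some rotation of 0b0...01...1 (a run of ones).
    for r in range(size):
        rot = ((elem >> r) | (elem << (size - r))) & elem_mask
        if rot & (rot + 1) == 0:
            return True
    return False
```

For example, the bit pattern of double-precision +infinity, `0x7FF0000000000000`, is a single run of eleven ones and passes the check, while the bit pattern of pi, `0x400921FB54442D18`, does not.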
*	GlobalISel: Fix MMO creation with non-power-of-2 mem size	Matt Arsenault	2019-01-31	1	-0/+9
	It should probably just be mandatory for getTgtMemIntrinsic to return the alignment. llvm-svn: 352817
*	[SelectionDAG] Codesize: don't expand SHIFT to SHIFT_PARTS	Sjoerd Meijer	2019-01-31	1	-0/+122
	And instead just generate a libcall. My motivating example on ARM was a simple: shl i64 %A, %B for which the code bloat is quite significant. For other targets that also accept __int128/i128, such as AArch64 and X86, it is also beneficial to generate a libcall for these cases when optimising for minsize. On these 64-bit targets, the 64-bit shifts are of course unaffected because the SHIFT/SHIFT_PARTS lowering operation action is not set to custom/expand. Differential Revision: https://reviews.llvm.org/D57386 llvm-svn: 352736
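To give a sense of the inline work that a double-width shift needs, the following Python sketch models a 64-bit left shift built from 32-bit halves. It is only an illustration of the general technique, not the SHIFT_PARTS lowering itself; the name `shl64_via_parts` is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def shl64_via_parts(lo, hi, amt):
    """Shift the 64-bit value (hi:lo) left by amt (0..63) using 32-bit pieces."""
    if amt == 0:
        return lo, hi
    if amt >= 32:
        # Everything in the low half moves into the high half.
        return 0, (lo << (amt - 32)) & MASK32
    new_lo = (lo << amt) & MASK32
    # The high half receives the bits shifted out of the low half.
    new_hi = ((hi << amt) | (lo >> (32 - amt))) & MASK32
    return new_lo, new_hi
```

Each case needs its own shifts, masks, and selects when emitted inline, which is the kind of expansion a single libcall avoids when size matters more than speed.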
*	GlobalISel: Allow bitcount ops to have different result type	Matt Arsenault	2019-01-31	1	-5/+5
	For AMDGPU the result is always 32-bit for 64-bit inputs. llvm-svn: 352717
*	GlobalISel: Fix creating MMOs with align 0	Matt Arsenault	2019-01-31	6	-29/+29
	llvm-svn: 352712
*	[GlobalISel][AArch64] Select G_FEXP	Jessica Paquette	2019-01-30	4	-1/+260
	This teaches the legalizer to handle G_FEXP in AArch64. As a result, it also allows us to select G_FEXP. It...
	- Updates the legalizer-info tests
	- Adds a test for legalizing exp
	- Updates the existing fp tests to show that we can now select G_FEXP
	https://reviews.llvm.org/D57483 llvm-svn: 352692
*	[GlobalISel][AArch64] Select G_FABS	Jessica Paquette	2019-01-30	4	-1/+172
	This adds instruction selection support for G_FABS in AArch64. It also updates the existing basic FP tests and adds a selection test for G_FABS. https://reviews.llvm.org/D57418 llvm-svn: 352684
*	[DAGCombiner] sub X, 0/1 --> add X, 0/-1	Sanjay Patel	2019-01-30	1	-5/+2
	This extends the existing transform for: add X, 0/1 --> sub X, 0/-1 ...to allow the sibling subtraction fold. This pattern could regress with the proposed change in D57401. llvm-svn: 352680
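The identity behind this fold can be checked with a small Python model of 32-bit wrapping arithmetic: subtracting a value known to be 0 or 1 gives the same result as adding its negation, which is 0 or -1 (all ones). The function names and the 32-bit width are assumptions chosen for illustration.

```python
BITS = 32
MASK = (1 << BITS) - 1

def sub_bool(x, b):
    """x - b, where b is known to be 0 or 1."""
    return (x - b) & MASK

def add_neg_bool(x, b):
    """x + (-b), where -b is 0 or all-ones (-1) in two's complement."""
    neg = (-b) & MASK
    return (x + neg) & MASK
```

Both functions agree for every `x` when `b` is 0 or 1, including the wrap-around case `x = 0, b = 1`.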
*	[AArch64][x86] add tests for add/sub signbits fold; NFC	Sanjay Patel	2019-01-30	1	-0/+31
	As discussed/shown in D57401, we are missing a fold for subtract of 0/1 --> add 0/-1. llvm-svn: 352678
*	[GlobalISel][AArch64] Add instruction selection support for @llvm.log2	Jessica Paquette	2019-01-30	4	-1/+261
	This teaches GlobalISel to emit an RTLib call for @llvm.log2 when it encounters it. It updates the existing floating point tests to show that we don't fall back on the intrinsic, and select the correct instructions. It also adds a legalizer test for G_FLOG2. https://reviews.llvm.org/D57357 llvm-svn: 352673
*	[GlobalISel][AArch64] Add instruction selection support for @llvm.sqrt	Jessica Paquette	2019-01-30	4	-0/+250
	This teaches the legalizer about G_FSQRT in AArch64. Also adds a legalizer test for G_FSQRT, a selection test for it, and updates existing floating point tests. https://reviews.llvm.org/D57361 llvm-svn: 352671
*	[GlobalISel] Add IRTranslator support for @llvm.sqrt -> G_FSQRT	Jessica Paquette	2019-01-30	1	-0/+8
	Follow-up commit to https://reviews.llvm.org/D57359. (r352668) This adds IRTranslator support for recognising a @llvm.sqrt intrinsic and translating it into a G_FSQRT. https://reviews.llvm.org/D57360 llvm-svn: 352670
*	[GlobalISel] Introduce a G_FSQRT generic instruction	Jessica Paquette	2019-01-30	1	-0/+3
	This introduces a generic instruction for computing the floating point square root of a value. Right now, we can't select @llvm.sqrt, so this is working towards fixing that. llvm-svn: 352668
*	GlobalISel: Verify pointer casts	Matt Arsenault	2019-01-29	2	-6/+5
	Not sure if the old AArch64 tests should be just deleted or not. llvm-svn: 352562
*	GlobalISel: Partially implement widenScalar for MERGE_VALUES	Matt Arsenault	2019-01-29	1	-24/+28
	llvm-svn: 352560
*	[AArch64][GlobalISel] Unmerge into scalars from a vector should use FPR bank.	Amara Emerson	2019-01-29	1	-0/+26
	This currently shows up as a selection fallback since the dest regs were given GPR banks but the source was a vector FPR reg. Differential Revision: https://reviews.llvm.org/D57408 llvm-svn: 352545
*	[AArch64] add tests for vector bool math; NFC	Sanjay Patel	2019-01-29	1	-0/+29
	llvm-svn: 352519
*	Reversing the checkin for version 352484 as tests are failing.	Ayonam Ray	2019-01-29	1	-124/+0
	llvm-svn: 352504
*	[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default	Ayonam Ray	2019-01-29	1	-0/+124
	When lowering a switch results in a jump table, a range check is performed before indexing into the table to detect switch values outside the table's range, and a conditional branch is inserted to jump to the default block. If the default block is unreachable, this conditional jump can be omitted. This patch implements omitting this conditional branch for unreachable defaults. Review ID: D52002 Reviewers: Hans Wennborg, Eli Friedman, Roman Lebedev llvm-svn: 352484
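The effect of the range check can be modelled with a short Python sketch. It is a simplified illustration of jump-table dispatch, not the SelectionDAG lowering; `lower_switch` and its parameters are hypothetical names.

```python
def lower_switch(value, low, table, default=None):
    """Dispatch through a jump table covering case values low..low+len(table)-1.

    If `default` is given, the range check and branch to the default block
    are kept. If it is None (unreachable default), the check is omitted and
    the caller guarantees that `value` is in range.
    """
    index = value - low
    if default is not None:
        # Range check plus conditional branch to the default block.
        if index < 0 or index >= len(table):
            return default
    return table[index]
```

With a reachable default, `lower_switch(99, 10, ["a", "b", "c"], "d")` returns `"d"`. With an unreachable default the comparison and branch disappear, which is the saving the commit describes.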
*	[COFF, ARM64] Don't put jump table into a separate COFF section for EK_LabelDifference32	Martin Storsjo	2019-01-29	1	-0/+48
	Windows ARM64 uses the PIC relocation model and the jump table kind EK_LabelDifference32. This produces jump table entries of the form ".word LBB123 - LJTI1_2", which represent the distance between the block and the jump table. A new relocation type (IMAGE_REL_ARM64_REL32) is needed to do the fixup correctly if they are in different COFF sections. This change places the jump table in the same COFF section as the associated code. An ideal fix could utilize the IMAGE_REL_ARM64_REL32 relocation type. Patch by Tom Tan! Differential Revision: https://reviews.llvm.org/D57277 llvm-svn: 352465
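A label-difference entry can be sketched in Python as a signed 32-bit distance from the table to the block. The addresses, the little-endian byte order, and the helper names below are assumptions made for illustration only.

```python
import struct

def encode_entry(block_addr, table_addr):
    """Encode '.word block - table' as a signed 32-bit little-endian word."""
    return struct.pack("<i", block_addr - table_addr)

def resolve_entry(entry, table_addr):
    """Recover the block address by adding the stored distance to the table address."""
    (delta,) = struct.unpack("<i", entry)
    return table_addr + delta
```

If the table moves relative to the code without the entry being fixed up, for example `resolve_entry(encode_entry(0x1000, 0x1400), 0x1500)`, the result is `0x1100` instead of `0x1000`. Keeping both in one section keeps the distance fixed, so no relocation is needed.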
*	[GlobalISel][AArch64] Add legalization for G_FLOG	Jessica Paquette	2019-01-28	4	-1/+258
	This adds support for legalizing G_FLOG into an RTLib call. It adds a legalizer test, and updates the existing floating point tests. https://reviews.llvm.org/D57347 llvm-svn: 352429
*	[GlobalISel][AArch64] Add instruction selection support for @llvm.log10	Jessica Paquette	2019-01-28	4	-1/+260
	This adds instruction selection support for @llvm.log10 in AArch64. It teaches GISel to lower it to a library call, updates the relevant tests, and adds a legalizer test for log10. https://reviews.llvm.org/D57341 llvm-svn: 352418
*	[AArch64] Add 'apple-latest' CPU alias	Francis Visoiu Mistrih	2019-01-28	2	-0/+8
	The 'apple-latest' alias is supposed to provide a CPU that contains the latest Apple processor model supported by LLVM. This is supposed to be used by tools like lldb to provide a target that supports most of the CPU features. For now, this is mapped to Cyclone. Differential Revision: https://reviews.llvm.org/D56384 llvm-svn: 352412
*	[GlobalISel] Add ISel support for @llvm.lifetime.start and @llvm.lifetime.end	Jessica Paquette	2019-01-28	4	-0/+68
	This adds ISel support for lifetime markers in opt levels above O0. It also updates the arm64-irtranslator test, and updates some AArch64 tests that use them for added coverage. It also adds a testcase taken from the X86 codegen tests which verified a bug caused by lifetime markers + stack colouring in the past. This is intended to make sure that GISel doesn't re-introduce the bug. (This is basically a straight copy from what SelectionDAG does in SelectionDAGBuilder.cpp) https://reviews.llvm.org/D57187 llvm-svn: 352410
*	[GlobalISel][AArch64] Add instruction selection support for G_FCOS and G_FSIN	Jessica Paquette	2019-01-28	6	-6/+548
	This contains all of the legalizer changes from D57197 necessary to select G_FCOS and G_FSIN. It also updates several existing IR tests in test/CodeGen/AArch64 that verify that we correctly lower the G_FCOS and G_FSIN instructions. https://reviews.llvm.org/D57197 3/3 llvm-svn: 352402
*	[GlobalISel][AArch64] Add IRTranslator support for G_FCOS and G_FSIN	Jessica Paquette	2019-01-28	1	-0/+16
	This adds IRTranslator support for the G_FCOS and G_FSIN generic instructions. https://reviews.llvm.org/D57197 2/3 llvm-svn: 352401
*	[GlobalISel] Add G_FSIN and G_FCOS generic instructions	Jessica Paquette	2019-01-28	1	-1/+7
	This introduces generic instructions for floating point sin and cos, G_FCOS and G_FSIN. It updates the tests, etc. https://reviews.llvm.org/D57197 1/3 llvm-svn: 352400
*	[AArch64][GlobalISel] Teach RBS about G_FNEG default mapping.	Amara Emerson	2019-01-28	1	-0/+14
	llvm-svn: 352340
*	[AArch64][GlobalISel] Add some missing vector support for FP arithmetic ops.	Amara Emerson	2019-01-28	4	-68/+94
	Moved the fneg lowering legalization test from AArch64 to X86, as we want to specify that it's already legal. llvm-svn: 352338
*	[AArch64][GlobalISel] Add some vector support for fp <-> int conversions.	Amara Emerson	2019-01-28	3	-10/+154
	Some unrelated, but benign, test changes as well due to the test update script. llvm-svn: 352337
*	[AArch64][GlobalISel] Fix the G_EXTLOAD combiner creating non-extending illegal instructions.	Amara Emerson	2019-01-27	2	-1/+40
	This fixes loads like 's1 = load %p (load 1 from %p)' being combined with an extend into an illegal 's8 = g_extload %p (load 1 from %p)' which doesn't do any extension, by avoiding touching those < s8 size loads. This bug was uncovered by a verifier update r351584, which I reverted to keep the bots green. llvm-svn: 352311
*	[GlobalISel][IRTranslator] Fix crash on translation of fneg.	Amara Emerson	2019-01-26	1	-0/+10
	When the fneg IR instruction was added the code to do translation wasn't tested, and tried to get an invalid operand. llvm-svn: 352296
*	GlobalISel: Fix address space limit in LLT	Matt Arsenault	2019-01-26	1	-0/+26
	The IR enforced limit for the address space is 24-bits, but LLT was only using 23-bits. Additionally, the argument to the constructor was truncating to 16-bits. A similar problem still exists for the number of vector elements. The IR enforces no limit, so if you try to use a vector with > 65535 elements the IRTranslator asserts in the LLT constructor. llvm-svn: 352264
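The effect of the narrower fields can be shown with a few lines of Python. This only models bit truncation and does not reproduce LLT's actual field layout; the helper name and the sample value are hypothetical.

```python
def truncate(value, bits):
    """Keep only the low `bits` bits of value, as a narrow field or argument would."""
    return value & ((1 << bits) - 1)

# A hypothetical address space number that needs all 24 bits.
ADDR_SPACE = 0xABCDEF
```

Stored in 24 bits the value survives intact. In 23 bits it becomes `0x2BCDEF`, and after passing through a 16-bit argument it becomes `0xCDEF`, so distinct address spaces could be silently conflated.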
*	[GISel]: Change how CSE is enabled by default for each pass	Aditya Nandakumar	2019-01-24	12	-12/+12
	https://reviews.llvm.org/D57178 Now add a hook in TargetPassConfig to query if CSE needs to be enabled. By default this hook returns false only for O0 opt level but this can be overridden by the target. As a consequence of the default of enabled for non O0, a few tests needed to be updated to not use CSE (by passing in -O0) to the run line. reviewed by: arsenm llvm-svn: 352126
*	[GlobalISel][AArch64] Add isel support for FP16 vector @llvm.ceil	Jessica Paquette	2019-01-24	4	-1/+306
	This patch adds support for vector @llvm.ceil intrinsics when full 16 bit floating point support isn't available. To do this, this patch...
	- Implements basic isel for G_UNMERGE_VALUES
	- Teaches the legalizer about 16 bit floats
	- Teaches AArch64RegisterBankInfo to respect floating point registers on G_BUILD_VECTOR and G_UNMERGE_VALUES
	- Teaches selectCopy about 16-bit floating point vectors
	It also adds
	- A legalizer test for the 16-bit vector ceil which verifies that we create a G_UNMERGE_VALUES and G_BUILD_VECTOR when full fp16 isn't supported
	- An instruction selection test which makes sure we lower to G_FCEIL when full fp16 is supported
	- A test for selecting G_UNMERGE_VALUES
	And also updates arm64-vfloatintrinsics.ll to show that the new ceiling types work as expected. https://reviews.llvm.org/D56682 llvm-svn: 352113
*	[SLH][AArch64] Remove accidentally retained -debug-only line from test.	Kristof Beyls	2019-01-23	1	-1/+0
	llvm-svn: 351932
*	[SLH] AArch64: correctly pick temporary register to mask SP	Kristof Beyls	2019-01-23	3	-47/+141
	As part of speculation hardening, the stack pointer gets masked with the taint register (X16) before a function call or before a function return. Since there are no instructions that can directly mask writing to the stack pointer, the stack pointer must first be transferred to another register, where it can be masked, before that value is transferred back to the stack pointer. Before, that temporary register was always picked to be x17, since the ABI allows clobbering x17 on any function call, resulting in the following instruction pattern being inserted before function calls and returns/tail calls:
	  mov x17, sp
	  and x17, x17, x16
	  mov sp, x17
	However, x17 can be live in those locations, for example when the call is an indirect call, using x17 as the target address (blr x17). To fix this, this patch looks for an available register just before the call or terminator instruction and uses that. In the rare case when no register turns out to be available (this situation is only encountered twice across the whole test-suite), just insert a full speculation barrier at the start of the basic block where this occurs.
	Differential Revision: https://reviews.llvm.org/D56717 llvm-svn: 351930
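The register choice described above can be sketched in Python as follows. This is a simplified model under stated assumptions: the candidate list is hypothetical, and the real pass queries liveness information rather than a plain set.

```python
def mask_sp_sequence(live_regs, candidates=("x17", "x15", "x14")):
    """Return the SP-masking sequence using the first register that is not live.

    Returns None when no candidate is free; the caller would then fall back
    to a full speculation barrier at the start of the basic block.
    """
    for tmp in candidates:
        if tmp not in live_regs:
            return [
                f"mov {tmp}, sp",
                f"and {tmp}, {tmp}, x16",  # x16 is the taint register
                f"mov sp, {tmp}",
            ]
    return None
```

For an indirect call through x17, `mask_sp_sequence({"x17"})` picks the next free candidate instead of clobbering the call target.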
*	hwasan: Move memory access checks into small outlined functions on aarch64.	Peter Collingbourne	2019-01-23	1	-0/+63
	Each hwasan check requires emitting a small piece of code like this: https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses
	The problem with this is that these code blocks typically bloat code size significantly. An obvious solution is to outline these blocks of code. In fact, this has already been implemented under the -hwasan-instrument-with-calls flag. However, as currently implemented this has a number of problems:
	- The functions use the same calling convention as regular C functions. This means that the backend must spill all temporary registers as required by the platform's C calling convention, even though the check only needs two registers on the hot path.
	- The functions take the address to be checked in a fixed register, which increases register pressure.
	Both of these factors can diminish the code size effect and increase the performance hit of -hwasan-instrument-with-calls.
	The solution that this patch implements is to involve the aarch64 backend in outlining the checks. An intrinsic and pseudo-instruction are created to represent a hwasan check. The pseudo-instruction is register allocated like any other instruction, and we allow the register allocator to select almost any register for the address to check. A particular combination of (register selection, type of check) triggers the creation in the backend of a function to handle the check for specifically that pair. The resulting functions are deduplicated by the linker. The pseudo-instruction (really the function) is specified to preserve all registers except for the registers that the AAPCS specifies may be clobbered by a call.
	To measure the code size and performance effect of this change, I took a number of measurements using Chromium for Android on aarch64, comparing a browser with inlined checks (the baseline) against a browser with outlined checks.
	Code size: Size of .text decreases from 243897420 to 171619972 bytes, or a 30% decrease.
	Performance: Using Chromium's blink_perf.layout microbenchmarks I measured a median performance regression of 6.24%.
	The fact that a perf/size tradeoff is evident here suggests that we might want to make the new behaviour conditional on -Os/-Oz. But for now I've enabled it unconditionally, my reasoning being that hwasan users typically expect a relatively large perf hit, and ~6% isn't really adding much. We may want to revisit this decision in the future, though.
	I also tried experimenting with varying the number of registers selectable by the hwasan check pseudo-instruction (which would result in fewer variants being created), on the hypothesis that creating fewer variants of the function would expose another perf/size tradeoff by reducing icache pressure from the check functions at the cost of register pressure. Although I did observe a code size increase with fewer registers, I did not observe a strong correlation between the number of registers and the performance of the resulting browser on the microbenchmarks, so I conclude that we might as well use ~all registers to get the maximum code size improvement. My results are below:
	Regs | .text size | Perf hit
	-----+------------+---------
	~all | 171619972  | 6.24%
	16   | 171765192  | 7.03%
	8    | 172917788  | 5.82%
	4    | 177054016  | 6.89%
	Differential Revision: https://reviews.llvm.org/D56954 llvm-svn: 351920
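The quoted size figures can be rechecked with a few lines of Python. The numbers come from the commit message above; the variable and function names are chosen here for illustration.

```python
# .text sizes in bytes, as reported in the commit message.
baseline = 243897420  # inlined checks
sizes = {"~all": 171619972, "16": 171765192, "8": 172917788, "4": 177054016}

def percent_decrease(before, after):
    """Relative size reduction in percent."""
    return 100.0 * (before - after) / before

# Extra bytes each restricted register set costs compared with "~all".
extra_bytes = {regs: size - sizes["~all"] for regs, size in sizes.items()}
```

`percent_decrease(baseline, sizes["~all"])` comes to roughly 29.6%, which the message rounds to 30%. Restricting the pseudo-instruction to 4 registers costs about 5.4 MB of additional .text relative to using almost all registers.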
*	GlobalISel: Allow shift amount to be a different type	Matt Arsenault	2019-01-22	3	-13/+61
	For AMDGPU the shift amount is never 64-bit, and this needs to use a 32-bit shift. X86 uses i8, but seemed to be hacking around this before. llvm-svn: 351882
*	[AArch64] Add patterns for zext/sext of shift amount.	Eli Friedman	2019-01-22	1	-9/+48
	Not sure this is the best fix, but it saves an instruction for certain constructs involving variable shifts. Differential Revision: https://reviews.llvm.org/D55572 llvm-svn: 351768
*	[AArch64] add more tests for buildvec to shuffle transform; NFC	Sanjay Patel	2019-01-21	1	-0/+419
	These are copied from the sibling x86 file. I'm not sure which of the current outputs (if any) is considered optimal, but someone more familiar with AArch may want to take a look. llvm-svn: 351754
*	[DAGCombiner] fix crash when converting build vector to shuffle	Sanjay Patel	2019-01-21	1	-0/+22
	The regression test is reduced from the example shown in D56281. This does raise a question as noted in the test file: do we want to handle this pattern? I don't have a motivating example for that on x86 yet, but it seems like we could have that pattern there too, so we could avoid the back-and-forth using a shuffle. llvm-svn: 351753
*	GlobalISel: Verify G_BITCAST	Matt Arsenault	2019-01-18	1	-4/+4
	llvm-svn: 351594
*	Fix the buildbot failure introduced by r351404	Sanjin Sijaric	2019-01-17	1	-1/+1
	EXPENSIVE_CHECKS buildbots are failing due to r351404. Add x1 as live in to the funclet basic block for SEH funclets, as well as -verify-machineinstrs to the test case that triggered the failure. llvm-svn: 351472
*	[ARM64][Windows] Share unwind codes between epilogues	Sanjin Sijaric	2019-01-17	2	-3/+228
	There are cases where we have multiple epilogues that have the exact same unwind code sequence. In that case, the epilogues can share the same unwind codes in the .xdata section. This should get us past the assert "SEH unwind data splitting not yet implemented" in many cases. We still need to add support for generating multiple .pdata/.xdata sections for those functions that need to be split into fragments. Differential Revision: https://reviews.llvm.org/D56813 llvm-svn: 351421
*	[SEH] [ARM64] Retrieve the frame pointer from SEH funclets	Sanjin Sijaric	2019-01-17	1	-0/+121
	The Windows ARM64 runtime passes the establisher frame to funclets as the first argument. llvm-svn: 351404
*	[COFF, ARM64] Implement support for SEH extensions __try/__except/__finally	Mandeep Singh Grang	2019-01-16	2	-0/+97
	Summary: This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler. We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape.
	Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric
	Reviewed By: rnk, efriedma
	Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor, kristina, javed.absar, kristof.beyls, chrib, llvm-commits
	Differential Revision: https://reviews.llvm.org/D53540 llvm-svn: 351370
*	[GISel]: Add support for CSEing continuously during GISel passes.	Aditya Nandakumar	2019-01-16	4	-0/+58
	https://reviews.llvm.org/D52803 This patch adds support to continuously CSE instructions during each of the GISel passes. It consists of a GISelCSEInfo analysis pass that can be used by the CSEMIRBuilder. llvm-svn: 351283
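The general idea of CSE-ing while building can be sketched in Python with a builder that remembers what it has already emitted. This is a toy model of the technique only; it does not reflect the GISelCSEInfo or CSEMIRBuilder interfaces, and the class and method names are hypothetical.

```python
class CSEBuilder:
    """Toy instruction builder that reuses identical earlier instructions."""

    def __init__(self):
        self.instructions = []  # emitted (result, opcode, operands) tuples
        self._seen = {}         # (opcode, operands) -> result register

    def build(self, opcode, *operands):
        key = (opcode, operands)
        if key in self._seen:
            # Same opcode and operands were emitted before: reuse the result.
            return self._seen[key]
        result = f"%{len(self.instructions)}"
        self.instructions.append((result, opcode, operands))
        self._seen[key] = result
        return result
```

Building `G_ADD %x, %y` twice through this builder yields the same result register and a single emitted instruction. A real implementation also has to invalidate entries when instructions are changed or erased.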
*	Remove irrelevant references to legacy git repositories from compiler identification lines in test-cases.	James Y Knight	2019-01-15	1	-1/+1
	(Doing so only because it's then easier to search for references which are actually important and need fixing.) llvm-svn: 351200
*	[AArch64] Adjust the feature set for Exynos	Evandro Menezes	2019-01-15	1	-0/+1
	Enable the fusion of arithmetic and logic instructions for Exynos M4. llvm-svn: 351149