bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86] Add D32039/PR31357 tests to show current BSWAP codegen	Simon Pilgrim	2017-04-19	2	-0/+255
\| \| \| \|	llvm-svn: 300672
*	[X86][SSE] Add scheduling latency/throughput tests for (most) SSE2 instructions	Simon Pilgrim	2017-04-19	1	-0/+6039
\| \| \| \|	llvm-svn: 300671
*	Revert "ARMFrameLowering: Reserve emergency spill slot for large arguments"	Renato Golin	2017-04-19	1	-94/+0
\| \| \| \| \| \|	This reverts commit r300639, as it broke self-hosting on ARM. PR32709. llvm-svn: 300668
*	[GlobalISel][X86] Split select tests. NFC.	Igor Breger	2017-04-19	7	-444/+455
\| \| \| \|	llvm-svn: 300666
*	[ARM] GlobalISel: Add support for G_MUL	Diana Picus	2017-04-19	4	-1/+326
\| \| \| \| \| \| \| \|	Support G_MUL, very similar to G_ADD and G_SUB. The only difference is in the instruction selector, where we have to select either MUL or MULv5 depending on the target. llvm-svn: 300665
*	[GlobalISel] Support vector-of-pointers in LLT	Kristof Beyls	2017-04-19	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes PR32471. As comment 10 on that bug report highlights (https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a few different defendable design tradeoffs that could be made, including not representing pointers at all in LLT. I decided to go for representing vector-of-pointer as a concept in LLT, while keeping the size of the LLT type 64 bits (this is an increase from 48 bits before). My rationale for keeping pointers explicit is that on some targets probably it's very handy to have the distinction between pointer and non-pointer (e.g. 68K has a different register bank for pointers IIRC). If we keep a scalar pointer, it probably is easiest to also have a vector-of-pointers to keep LLT relatively conceptually clean and orthogonal, while we don't have a very strong reason to break that orthogonality. Once we gain more experience on the use of LLT, we can of course reconsider this direction. Rejecting vector-of-pointer types in the IRTranslator is also an option to avoid the crash reported in PR32471, but that is only a very short-term solution; also needs quite a bit of code tweaks in places, and is probably fragile. Therefore I didn't consider this the best option. llvm-svn: 300664
*	ARMFrameLowering: Reserve emergency spill slot for large arguments	Matthias Braun	2017-04-19	1	-0/+94
\| \| \| \| \| \| \| \| \| \| \| \|	We need to reserve an emergency spill slot in cases with large argument types that could overflow immediate offsets for FP relative address calculations. rdar://31317893 Differential Revision: https://reviews.llvm.org/D31643 llvm-svn: 300639
*	[x86] add tests for potential andn optimization; NFC	Sanjay Patel	2017-04-18	1	-2/+40
\| \| \| \|	llvm-svn: 300617
*	[X86] Keep EXTRACT_VECTOR_ELT result type as f128 for Android x86_64.	Chih-Hung Hsieh	2017-04-18	2	-0/+59
\| \| \| \| \| \| \| \| \| \|	Android x86_64 target uses f128 type and stores f128 values in %xmm* registers. SoftenFloatRes_EXTRACT_VECTOR_ELT should not convert result value from f128 to i128. Differential Revision: http://reviews.llvm.org/D32102 llvm-svn: 300583
*	[X86][SSE] Add scheduling latency/throughput tests for (most) SSE1 instructions	Simon Pilgrim	2017-04-18	1	-0/+2415
\| \| \| \|	llvm-svn: 300576
*	[DAG] Improve store merge candidate pruning.	Nirav Dave	2017-04-18	2	-12/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove non-consecutive stores from store merge candidate search as they cannot be merged and will prevent us from finding subsequent mergeable store cases. Reviewers: jyknight, bogner, javed.absar, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32086 llvm-svn: 300561
*	Add base-index-based store merge test	Nirav Dave	2017-04-18	1	-0/+31
\| \| \| \|	llvm-svn: 300559
*	Add store Merge test.	Nirav Dave	2017-04-18	1	-0/+25
\| \| \| \|	llvm-svn: 300551
*	[ARM] Add hardware build attributes in assembler	Oliver Stannard	2017-04-18	1	-234/+227
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the assembler, we should emit build attributes based on the target selected with command-line options. This matches the GNU assembler's behaviour. We only do this for build attributes which describe the hardware that is expected to be available, not the ones that describe ABI compatibility. This is done by moving some of the attribute emission code to ARMTargetStreamer, so that it can be shared between the assembly and code-generation code paths. Since the assembler only creates a MCSubtargetInfo, not an ARMSubtarget, the code had to be changed to check raw features, and not use the convenience functions in ARMSubtarget. If different attributes are later specified using the .eabi_attribute directive, then they will take precedence, as happens when the same .eabi_attribute is specified twice. This must be enabled by an option, because we don't want to do this when parsing inline assembly. The attributes would match the ones emitted at the start of the file, so wouldn't actually change the emitted object file, but the extra directives would be added to every inline assembly block when emitting assembly, which we'd like to avoid. The majority of the changes in the build-attributes.ll test are just re-ordering the directives, because the hardware attributes are now emitted before the ABI ones. However, I did fix one bug which I spotted: Tag_CPU_arch_profile was not being emitted for v6M. Differential revision: https://reviews.llvm.org/D31812 llvm-svn: 300547
*	[ARM] GlobalISel: Add support for G_SUB	Diana Picus	2017-04-18	5	-0/+329
\| \| \| \| \| \| \|	Support G_SUB throughout the GlobalISel pipeline. It is exactly the same as G_ADD, nothing fancy. llvm-svn: 300546
*	Revert "[GlobalISel] Support vector-of-pointers in LLT"	Kristof Beyls	2017-04-18	1	-16/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts r300535 and r300537. The newly added tests in test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll produce slightly different code depending on which compiler LLVM was built with. E.g., either one of the following can be produced: remark: <unknown>:0:0: unable to legalize instruction: %vreg0<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg2; (in function: vector_of_pointers_extractelement) remark: <unknown>:0:0: unable to legalize instruction: %vreg2<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg0; (in function: vector_of_pointers_extractelement) Non-determinism like this is clearly a bad thing, so reverting this until I can find and fix the root cause of the non-determinism. llvm-svn: 300538
*	[ARM] Check for correct HW div when lowering divmod	Diana Picus	2017-04-18	1	-0/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For subtargets that use the custom lowering for divmod, e.g. gnueabi, we used to check if the subtarget has hardware divide and then lower to a div-mul-sub sequence if true, or to a libcall if false. However, judging by the usage of hasDivide vs hasDivideInARMMode, it seems that hasDivide only refers to Thumb. For instance, in the ARMTargetLowering constructor, the code that specifies whether to use libcalls for (S\|U)DIV looks like this: bool hasDivide = Subtarget->isThumb() ? Subtarget->hasDivide() : Subtarget->hasDivideInARMMode(); In the case of divmod for arm-gnueabi, using only hasDivide() to determine what to do means that instead of lowering to __aeabi_idivmod to get the remainder, we lower to div-mul-sub and then further lower the div to __aeabi_idiv. Even worse, if we have hardware divide in ARM but not in Thumb, we generate a libcall instead of using it (this is not an issue in practice since AFAICT none of the cores that we support have hardware divide in ARM but not Thumb). This patch fixes the code dealing with custom lowering to take into account the mode (Thumb or ARM) when deciding whether or not hardware division is available. Differential Revision: https://reviews.llvm.org/D32005 llvm-svn: 300536
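The mode-aware check described in this message can be sketched as follows. This is only an illustration: `SubtargetSketch`, `DivRemLowering`, and `chooseDivRemLowering` are hypothetical names, not the real ARMSubtarget or ARMTargetLowering code. Only the three query names and the ternary expression come from the commit message.

```cpp
// Hypothetical stand-in for the subtarget queries named in the commit
// message; this is NOT the real ARMSubtarget class.
struct SubtargetSketch {
  bool Thumb;         // are we generating Thumb code?
  bool DivideInThumb; // what hasDivide() reports (Thumb hardware divide)
  bool DivideInARM;   // what hasDivideInARMMode() reports

  constexpr bool isThumb() const { return Thumb; }
  constexpr bool hasDivide() const { return DivideInThumb; }
  constexpr bool hasDivideInARMMode() const { return DivideInARM; }
};

// Mode-aware query, mirroring the expression quoted from the
// ARMTargetLowering constructor in the message above.
constexpr bool hasHardwareDivide(const SubtargetSketch &ST) {
  return ST.isThumb() ? ST.hasDivide() : ST.hasDivideInARMMode();
}

// Illustrative outcome of the custom divmod lowering (assumed names).
enum class DivRemLowering { DivMulSub, LibCall };

constexpr DivRemLowering chooseDivRemLowering(const SubtargetSketch &ST) {
  return hasHardwareDivide(ST) ? DivRemLowering::DivMulSub
                               : DivRemLowering::LibCall;
}
```

With this check, a core that only has hardware divide in Thumb but is compiling ARM-mode code selects the libcall, where a check on `hasDivide()` alone would have selected the div-mul-sub sequence.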
*	[GlobalISel] Support vector-of-pointers in LLT	Kristof Beyls	2017-04-18	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes PR32471. As comment 10 on that bug report highlights (https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a few different defendable design tradeoffs that could be made, including not representing pointers at all in LLT. I decided to go for representing vector-of-pointer as a concept in LLT, while keeping the size of the LLT type 64 bits (this is an increase from 48 bits before). My rationale for keeping pointers explicit is that on some targets probably it's very handy to have the distinction between pointer and non-pointer (e.g. 68K has a different register bank for pointers IIRC). If we keep a scalar pointer, it probably is easiest to also have a vector-of-pointers to keep LLT relatively conceptually clean and orthogonal, while we don't have a very strong reason to break that orthogonality. Once we gain more experience on the use of LLT, we can of course reconsider this direction. Rejecting vector-of-pointer types in the IRTranslator is also an option to avoid the crash reported in PR32471, but that is only a very short-term solution; also needs quite a bit of code tweaks in places, and is probably fragile. Therefore I didn't consider this the best option. llvm-svn: 300535
*	Change the testcase tail-merge-after-mbp.ll to tail-merge-after-mbp.mir	Haicheng Wu	2017-04-17	2	-94/+105
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D32037 llvm-svn: 300506
*	[WebAssembly] Fix WebAssemblyOptimizeReturned after r300367	Jacob Gravelle	2017-04-17	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Refactoring changed paramHasAttr(1 + i) to paramHasAttr(0), fix that to paramHasAttr(i). Add more tests to WebAssemblyOptimizeReturned that catch that regression. Reviewers: dschuff Subscribers: jfb, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D32136 llvm-svn: 300502
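The indexing regression described in this message can be sketched as follows. `CallSketch` and both search functions are hypothetical and do not use the real LLVM call-site API. The sketch assumes a zero-based argument query and at most four arguments.

```cpp
#include <cstddef>

// Hypothetical model of a call site: Returned[i] is true when argument i
// carries the `returned` attribute. Assumes NumArgs <= 4.
struct CallSketch {
  bool Returned[4];
  std::size_t NumArgs;

  // Zero-based argument query (an assumption of this sketch).
  constexpr bool paramHasReturned(std::size_t ArgNo) const {
    return ArgNo < NumArgs && Returned[ArgNo];
  }
};

// Regressed form: always asks about argument 0, whatever i is.
constexpr int findReturnedArgBuggy(const CallSketch &Call) {
  for (std::size_t i = 0; i < Call.NumArgs; ++i)
    if (Call.paramHasReturned(0))
      return static_cast<int>(i);
  return -1; // no `returned` argument found
}

// Fixed form: asks about argument i.
constexpr int findReturnedArgFixed(const CallSketch &Call) {
  for (std::size_t i = 0; i < Call.NumArgs; ++i)
    if (Call.paramHasReturned(i))
      return static_cast<int>(i);
  return -1; // no `returned` argument found
}
```

When only the second argument is marked `returned`, the buggy form never finds it, which is the kind of case the added tests are meant to catch.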
*	AMDGPU: Use MachineRegisterInfo to find max used register	Matt Arsenault	2017-04-17	3	-16/+51
\| \| \| \| \| \| \| \| \| \|	Avoid looping through program to determine register counts. This avoids needing to look at regmask operands. Also fixes some counting errors with flat_scr when there are no stack objects. llvm-svn: 300482
*	AArch64: put nonlazybind special handling behind a flag for now.	Tim Northover	2017-04-17	1	-1/+9
\| \| \| \| \| \| \| \|	It's basically a terrible idea anyway but objc_msgSend gets emitted like that. We can decide on a better way to deal with it in the unlikely event that anyone actually uses it. llvm-svn: 300474
*	AArch64: support nonlazybind	Tim Northover	2017-04-17	1	-0/+32
\| \| \| \| \| \| \| \|	It's almost certainly not a good idea to actually use it in most cases (there's a pretty large code size overhead on AArch64), but we can't do those experiments until it's supported. llvm-svn: 300462
*	[X86] Remove special handling for 16 bit for A asm constraints.	Benjamin Kramer	2017-04-16	1	-1/+8
\| \| \| \| \| \| \| \| \| \|	Our 16 bit support is assembler-only + the terrible hack that is .code16gcc. Simply using 32 bit registers does the right thing for the latter. Fixes PR32681. llvm-svn: 300429
*	Use correct registers for "A" inline asm constraint	Dimitry Andric	2017-04-15	1	-0/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In PR32594, inline assembly using the 'A' constraint on x86_64 causes llvm to crash with a "Cannot select" stack trace. This is because `X86TargetLowering::getRegForInlineAsmConstraint` hardcodes that 'A' means the EAX and EDX registers. However, on x86_64 it means the RAX and RDX registers, and on 16-bit x86 (ia16?) it means the old AX and DX registers. Add new register classes in `X86RegisterInfo.td` to support these cases, and amend the logic in `getRegForInlineAsmConstraint` to cope with different subtargets. Also add a test case, derived from PR32594. Reviewers: craig.topper, qcolombet, RKSimon, ab Reviewed By: ab Subscribers: ab, emaste, royger, llvm-commits Differential Revision: https://reviews.llvm.org/D31902 llvm-svn: 300404
*	[AMDGPU] set read_only access qualifier for pointers	Stanislav Mekhanoshin	2017-04-14	1	-0/+33
\| \| \| \| \| \| \| \| \| \|	If a kernel's pointer argument is known to be readonly, set the access qualifier accordingly. This allows RT not to flush caches before dispatches. Differential Revision: https://reviews.llvm.org/D32091 llvm-svn: 300362
*	[X86][SSE] Update MOVNTDQA non-temporal loads to generic implementation (LLVM)	Simon Pilgrim	2017-04-14	4	-33/+51
\| \| \| \| \| \| \| \| \| \|	MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics. Clang companion patch: D31766. Differential Revision: https://reviews.llvm.org/D31767 llvm-svn: 300325
*	Fix for PR#30562: Selection DAG error: Detected cycle in SelectionDAG.	Andrew V. Tischenko	2017-04-14	1	-0/+22
\| \| \| \| \| \|	Patch by Dinar Temirbulatov llvm-svn: 300314
*	This patch closes PR#32216: Better testing of schedule model instruction latencies/throughputs.	Andrew V. Tischenko	2017-04-14	2	-595/+631
\| \| \| \| \| \| \| \|	The details are here: https://reviews.llvm.org/D30941 llvm-svn: 300311
*	[AMDGPU] added SIInstrInfo::getAddNoCarry() helper	Stanislav Mekhanoshin	2017-04-14	1	-119/+146
\| \| \| \| \| \| \| \|	Addressed rest of post submit comments from D31993. Differential Revision: https://reviews.llvm.org/D32057 llvm-svn: 300288
*	[AArch64] Avoid partial register writes on lane 0 of BUILD_VECTOR for i8/i16/f16	Adam Nemet	2017-04-13	3	-4/+85
\| \| \| \| \| \| \| \| \| \| \| \|	This further improves Ahmed's change in rL299482. See the new comment for the rationale. The patch recovers most of the regression for bzip2 after D31965. We're down to +2.68% from +6.97%. Differential Revision: https://reviews.llvm.org/D32028 llvm-svn: 300276
*	AMDGPU/GFX9: Do not use v_pack_b32_f16 when packing	Konstantin Zhuravlyov	2017-04-13	5	-57/+37
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D31819 llvm-svn: 300275
*	[bpf] Fix memory offset check for loads and stores	Alexei Starovoitov	2017-04-13	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the offset cannot fit into the instruction, an addition to the pointer is emitted before the actual access. However, BPF offsets are 16-bit, but LLVM treats them, for the purposes of this check, as 32 bits long. This causes the following program: int bpf_prog1(void *ign) { volatile unsigned long t = 0x8983984739ull; return *(unsigned long *)((0xffffffff8fff0002ull) + t); } To generate the following (wrong) code: 0: 18 01 00 00 39 47 98 83 00 00 00 00 89 00 00 00 r1 = 590618314553ll 2: 7b 1a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r1 3: 79 a1 f8 ff 00 00 00 00 r1 = *(u64 *)(r10 - 8) 4: 79 10 02 00 00 00 00 00 r0 = *(u64 *)(r1 + 2) 5: 95 00 00 00 00 00 00 00 exit Fix it by changing the offset check to 16-bit. Patch by Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Differential Revision: https://reviews.llvm.org/D32055 llvm-svn: 300269
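The width check at issue can be sketched as follows. `fitsSigned`, `canFoldOffset`, and `encodeOffset16` are hypothetical helpers written for this illustration, not the BPF backend's code. The constant is the one from the program in the message, reinterpreted as a signed 64-bit value.

```cpp
#include <cstdint>

// Does Value fit in a signed field that is Bits wide?
// (Own helper for this sketch, valid for 1 <= Bits <= 63.)
template <unsigned Bits>
constexpr bool fitsSigned(std::int64_t Value) {
  static_assert(Bits > 0 && Bits < 64, "unsupported field width");
  return Value >= -(std::int64_t{1} << (Bits - 1)) &&
         Value <= (std::int64_t{1} << (Bits - 1)) - 1;
}

// 0xffffffff8fff0002 viewed as a signed 64-bit integer.
constexpr std::int64_t ExampleOffset = -0x7000FFFELL;

// The offset may only be folded into the load/store if it fits the
// 16-bit offset field; otherwise an explicit add has to be emitted.
constexpr bool canFoldOffset(std::int64_t Offset) {
  return fitsSigned<16>(Offset);
}

// What a 16-bit field keeps if an oversized offset is folded anyway.
constexpr std::int16_t encodeOffset16(std::int64_t Offset) {
  return static_cast<std::int16_t>(Offset & 0xFFFF);
}
```

The example constant passes a 32-bit check but not a 16-bit one, and its low 16 bits are `0x0002`, which matches the `r1 + 2` seen in the wrong code above.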
*	[AMDGPU] Combine DS operations with offsets bigger than byte	Stanislav Mekhanoshin	2017-04-13	1	-0/+385
\| \| \| \| \| \| \| \| \|	In many cases, DS operations can be combined even if their offsets do not fit into the 8-bit encoding. All it takes is adjusting the base address. Differential Revision: https://reviews.llvm.org/D31993 llvm-svn: 300227
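The base-adjustment idea can be sketched as follows. This is an assumption-laden illustration rather than the pass's actual logic: `CombinedOffsets` and `combineOffsets` are hypothetical, offsets are taken to be in element units, and each encoded offset is assumed to be an unsigned 8-bit field.

```cpp
#include <cstdint>

// Result of trying to pair two accesses (hypothetical type).
struct CombinedOffsets {
  bool Valid;               // can the two accesses share one instruction?
  std::uint32_t BaseAdjust; // amount to add to the base address
  std::uint8_t Offset0;     // encoded offset of the first access
  std::uint8_t Offset1;     // encoded offset of the second access
};

constexpr CombinedOffsets combineOffsets(std::uint32_t Off0,
                                         std::uint32_t Off1) {
  const std::uint32_t Lo = Off0 < Off1 ? Off0 : Off1;
  const std::uint32_t Hi = Off0 < Off1 ? Off1 : Off0;

  // Both offsets already fit in 8 bits: no base adjustment is needed.
  if (Hi <= 255)
    return CombinedOffsets{true, 0, static_cast<std::uint8_t>(Off0),
                           static_cast<std::uint8_t>(Off1)};

  // Too far apart: no single base puts both within 8-bit reach.
  if (Hi - Lo > 255)
    return CombinedOffsets{false, 0, 0, 0};

  // Move the base up to the lower offset and encode the remainders.
  return CombinedOffsets{true, Lo, static_cast<std::uint8_t>(Off0 - Lo),
                         static_cast<std::uint8_t>(Off1 - Lo)};
}
```

For example, offsets 1000 and 1004 do not fit in 8 bits on their own, but after adding 1000 to the base they become 0 and 4.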
*	[Hexagon] Unxfail passing tests	Krzysztof Parzyszek	2017-04-13	2	-4/+0
\| \| \| \| \| \| \|	r300198 fixed a problem that caused two tests to be xfailed. Unxfail these tests now, since they are passing. llvm-svn: 300203
*	AMDGPU : Fix common dominator of two incoming blocks terminates with uniform branch issue.	Wei Ding	2017-04-12	4	-6/+65
\| \| \| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D31350 llvm-svn: 300142
*	AMDGPU: Fix invalid copies when copying i1 to phys reg	Matt Arsenault	2017-04-12	1	-0/+36
\| \| \| \| \| \| \|	Insert a VReg_1 virtual register so the i1 workaround pass can handle it. llvm-svn: 300113
*	[AMDGPU] Generate range metadata for workitem id	Stanislav Mekhanoshin	2017-04-12	14	-55/+141
\| \| \| \| \| \| \| \| \|	If the workgroup size is known, inform LLVM about the range returned by local id and local size queries. Differential Revision: https://reviews.llvm.org/D31804 llvm-svn: 300102
*	[GlobalISel][X86] support G_CONSTANT selection.	Igor Breger	2017-04-12	3	-0/+224
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: [GlobalISel][X86] support G_CONSTANT selection. Add regbank select tests. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: llvm-commits, dberris, rovka, kristof.beyls Differential Revision: https://reviews.llvm.org/D31974 llvm-svn: 300057
*	[AMDGPU] SDWA: make pass global	Sam Kolton	2017-04-12	1	-18/+10
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Remove checks for basic blocks. Reviewers: vpykhtin, rampitec, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31935 llvm-svn: 300040
*	[AMDGPU] Add a new pass to insert waitcnts. Leave under an option for testing.	Kannan Narayanan	2017-04-12	2	-5/+0
\| \| \| \| \| \|	Based on comments in https://reviews.llvm.org/D31161. llvm-svn: 300023
*	AMDGPU: Insert wait at start of callee functions	Matt Arsenault	2017-04-11	2	-1/+26
\| \| \| \|	llvm-svn: 300000
*	AMDGPU: Refactor SIMachineFunctionInfo slightly	Matt Arsenault	2017-04-11	1	-7/+7
\| \| \| \| \| \|	Prepare for handling non-entry functions. llvm-svn: 299999
*	AMDGPU: Refactor argument lowering	Matt Arsenault	2017-04-11	1	-2/+3
\| \| \| \| \| \| \|	Split into smaller functions and prepare for handling non-entry functions. llvm-svn: 299998
*	AMDGPU: Fix folding reg_sequence into copy to phys reg	Matt Arsenault	2017-04-11	1	-0/+13
\| \| \| \| \| \| \|	This was producing an illegal reg_sequence defining a physical register with virtual register inputs. llvm-svn: 299997
*	[DAGCombine] Add more test cases for shuffle of splat. NFC.	Zvi Rackover	2017-04-11	1	-0/+56
\| \| \| \| \| \|	Tests added contain splat-masks with undef elements. llvm-svn: 299988
*	[x86] Relax the check in areLoadsFromSameBasePtr	Easwaran Raman	2017-04-11	1	-4/+4
\| \| \| \| \| \| \| \| \|	Check if the scale operand is identical (doesn't have to be 1) and do not check the chain operand. Differential revision: https://reviews.llvm.org/D31833 llvm-svn: 299986
*	MIR: Allow parsing of empty machine functions	Justin Bogner	2017-04-11	8	-41/+17
\| \| \| \| \| \| \| \| \| \| \| \|	If you run llc -stop-after=codegenprepare and feed the resulting MIR to llc -start-after=codegenprepare, you'll have an empty machine function since we haven't run any isel yet. Of course, this only works if the MIRParser believes you that this is okay. This is essentially a revert of r241862 with a fix for the problem it was papering over. llvm-svn: 299975
*	[X86] Create the correct ADC/SBB SDNode when lowering add.	Davide Italiano	2017-04-11	1	-0/+27
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D31911 llvm-svn: 299973
*	[AMDGPU] Add A5 to data layout for amdgiz environment	Yaxun Liu	2017-04-11	2	-2/+2
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D31589 llvm-svn: 299964