path: root/llvm/test/CodeGen
Commit message | Author | Date | Files | Lines
* add skylake
  Clement Courbet, 2017-04-21 (1 file changed: -2/+3)
  llvm-svn: 300962
* add 32 bit tests
  Clement Courbet, 2017-04-21 (1 file changed: -8/+10)
  llvm-svn: 300961
* use repmovsb when optimizing for minsize
  Clement Courbet, 2017-04-21 (1 file changed: -0/+26)
  llvm-svn: 300960
* Rename FastString flag.
  Clement Courbet, 2017-04-21 (1 file changed: -2/+2)
  llvm-svn: 300959
* add more tests
  Clement Courbet, 2017-04-21 (1 file changed: -0/+4)
  llvm-svn: 300958
* X86 memcpy: use REPMOVSB instead of REPMOVS{Q,D,W} for inline copies when the subtarget has fast strings
  Clement Courbet, 2017-04-21 (1 file changed: -0/+15)
  This has two advantages:
  - Speed is improved. For example, on Haswell throughput improvements increase
    linearly with size from 256 to 512 bytes, after which they plateau
    (e.g. 1% for 260 bytes, 25% for 400 bytes, 40% for 508 bytes).
  - Code is much smaller (no need to handle boundaries).
  llvm-svn: 300957
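  An illustrative IR sketch (an assumed example, not the commit's actual test)
  of the kind of fixed-size inline copy this affects; on a fast-string
  subtarget it could now expand to a single rep movsb:

    declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i32, i1)

    define void @copy400(i8* %dst, i8* %src) {
      ; 400-byte constant-size copy: previously REPMOVSQ plus boundary
      ; handling, with fast strings a plain REP MOVSB suffices.
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 400, i32 1, i1 false)
      ret void
    }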
* [Thumb1] The recently added tADCS and tSBCS pseudo-instructions were missing `Uses = [CPSR]`
  Artyom Skrobov, 2017-04-21 (1 file changed: -0/+31)
  Summary: Thanks to Oliver Stannard for helping catch this.
  Reviewers: olista01, efriedma
  Subscribers: llvm-commits, rengolin
  Differential Revision: https://reviews.llvm.org/D31815
  llvm-svn: 300951
* Revert r300932 and r300930.
  Akira Hatanaka, 2017-04-21 (1 file changed: -64/+0)
  It seems that r300930 was creating an infinite loop in dag-combine when
  compiling the following file:
  MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c
  llvm-svn: 300940
* [AArch64] Improve code generation for logical instructions taking immediate operands.
  Akira Hatanaka, 2017-04-21 (1 file changed: -0/+64)
  This commit adds an AArch64 dag-combine that optimizes code generation for
  logical instructions taking immediate operands. The optimization uses
  demanded bits to change a logical instruction's immediate operand so that the
  immediate can be folded into the immediate field of the instruction.
  This recommits r300913, which broke bots because I didn't fix a call to
  ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of
  TargetLoweringOpt and TargetLowering.
  rdar://problem/18231627
  Differential Revision: https://reviews.llvm.org/D5591
  llvm-svn: 300930
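  An illustrative IR sketch (the function and constants are assumptions, not
  taken from the commit's tests) of the kind of case the combine targets: 253
  (0xfd) is not a valid AArch64 logical immediate, but only the low byte of the
  result is demanded, so the mask can be rewritten to 0xfffffffd, which is
  encodable directly in the AND:

    define void @and_imm(i32 %a, i8* %p) {
      ; Only the low 8 bits survive the trunc, so demanded bits allow the
      ; unencodable mask 0xfd to become the encodable mask 0xfffffffd.
      %and = and i32 %a, 253
      %t = trunc i32 %and to i8
      store i8 %t, i8* %p, align 1
      ret void
    }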
* Revert "[AArch64] Improve code generation for logical instructions taking"Akira Hatanaka2017-04-201-64/+0
| | | | | | | | This reverts r300913. This broke bots. llvm-svn: 300916
* [AArch64] Improve code generation for logical instructions taking immediate operands.
  Akira Hatanaka, 2017-04-20 (1 file changed: -0/+64)
  This commit adds an AArch64 dag-combine that optimizes code generation for
  logical instructions taking immediate operands. The optimization uses
  demanded bits to change a logical instruction's immediate operand so that the
  immediate can be folded into the immediate field of the instruction.
  rdar://problem/18231627
  Differential Revision: https://reviews.llvm.org/D5591
  llvm-svn: 300913
* ARM: lower "fence singlethread" to a pure compiler barrier.
  Tim Northover, 2017-04-20 (1 file changed: -0/+16)
  Single-threaded fences aren't required to provide any synchronization with
  other processing elements so there's no need for a DMB. They should still be
  a barrier for compiler optimizations though.
  llvm-svn: 300904
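  A minimal IR example of the construct in question (using the pre-syncscope
  syntax current at the time); the expected ARM lowering is a compiler-only
  barrier with no dmb emitted:

    define void @fence_singlethread() {
      ; Should compile to no DMB instruction, only a scheduling barrier.
      fence singlethread seq_cst
      ret void
    }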
* ARM: handle post-indexed NEON ops where the offset isn't the access width.
  Tim Northover, 2017-04-20 (7 files changed: -63/+121)
  Before, we assumed that any ConstantInt offset was precisely the access
  width, so we could use the "[rN]!" form. ISelLowering only ever created that
  kind, but further simplification during combining could lead to unexpected
  constants and incorrect codegen.
  Should fix PR32658.
  llvm-svn: 300878
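  A hedged sketch (assumed names, not the actual regression test) of the shape
  involved: a 16-byte NEON load whose pointer increment is 32 bytes, so the
  post-indexed "[rN]!" form must not be used:

    define <4 x i32> @load_then_bump(i32* %p, i32** %out) {
      %vp = bitcast i32* %p to <4 x i32>*
      %v = load <4 x i32>, <4 x i32>* %vp, align 4
      ; Increment is 8 x i32 = 32 bytes, not the 16-byte access width.
      %next = getelementptr i32, i32* %p, i32 8
      store i32* %next, i32** %out
      ret <4 x i32> %v
    }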
* CodeGen: Let frame index value type match alloca addr space
  Yaxun Liu, 2017-04-20 (1 file changed: -0/+55)
  Recently an alloca address space was added to the data layout. Due to this
  change, a pointer returned by alloca may have a different size than a pointer
  in address space 0. However, the value type of a frame index is currently
  assumed to be the same size as a pointer in address space 0. This patch fixes
  that.
  Most targets assume alloca returns a pointer in address space 0, which is the
  default alloca address space, so this is NFC for them. The AMDGCN target with
  the amdgiz environment requires this change since it assumes alloca returns a
  pointer to address space 5, whose size is 32 bits, unlike the 64-bit pointers
  in address space 0.
  Differential Revision: https://reviews.llvm.org/D32021
  llvm-svn: 300864
* [mips][msa] Mask vectors holding shift amounts
  Petar Jovanovic, 2017-04-20 (2 files changed: -0/+631)
  Mask vectors which hold shift amounts when creating ISD::SHL, ISD::SRL or
  ISD::SRA nodes. The instructions that use these nodes, and whose arguments
  are altered, are sll, srl, sra, bneg, bclr and bset. For these instructions,
  the shift amount or bit position specified in the corresponding vector
  elements is interpreted as the shift amount/bit position modulo the size of
  the element in bits.
  The problem shows up when compiling with -O2, where the instructions for the
  .w and .d formats are not generated but are instead optimized away. In that
  case, shift amounts that are either negative or greater than the element bit
  size lead to incorrect results during constant folding.
  We remedy this by masking the operands for the nodes mentioned above before
  actually creating them, so that the final result is correct before being
  placed into the constant pool.
  Patch by Stefan Maksimovic.
  Differential Revision: https://reviews.llvm.org/D31331
  llvm-svn: 300839
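  A hedged sketch (the intrinsic and constants are assumptions, not taken from
  the commit's tests) of an input where out-of-range shift amounts must be
  reduced modulo the element width before constant folding:

    declare <4 x i32> @llvm.mips.sll.w(<4 x i32>, <4 x i32>)

    define <4 x i32> @sll_w_wrap(<4 x i32> %v) {
      ; Amounts 33, -1 and 65 fall outside [0, 31] and must be masked (mod 32)
      ; before the generic ISD::SHL node is built and possibly constant-folded.
      %r = call <4 x i32> @llvm.mips.sll.w(<4 x i32> %v, <4 x i32> <i32 33, i32 -1, i32 2, i32 65>)
      ret <4 x i32> %r
    }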
* Temporarily revert r299221 to fix nondeterminism in ThinLTO builder.
  Galina Kistanova, 2017-04-19 (1 file changed: -11/+17)
  llvm-svn: 300783
* X86FrameLowering: Fix getFrameIndexReference() for 'fixed' objects
  Matthias Braun, 2017-04-19 (1 file changed: -0/+75)
  Debug information is calculated with getFrameIndexReference(), which was
  missing some logic for the fixed object cases (= parameters on the stack).
  rdar://24557797
  Differential Revision: https://reviews.llvm.org/D32204
  llvm-svn: 300781
* [DAG] add splat vector support for 'or' in SimplifyDemandedBits
  Sanjay Patel, 2017-04-19 (2 files changed: -19/+15)
  I've changed one of the tests to not fold away, but we didn't and still don't
  do the transform that the comment claims we do (and I don't know why we'd
  want to do that).
  Follow-up to:
  https://reviews.llvm.org/rL300725
  https://reviews.llvm.org/rL300763
  llvm-svn: 300772
* [DAG] add splat vector support for 'xor' in SimplifyDemandedBits
  Sanjay Patel, 2017-04-19 (5 files changed: -45/+36)
  This allows forming more 'not' ops, so we get improvements for ISAs that have
  and-not.
  Follow-up to:
  https://reviews.llvm.org/rL300725
  llvm-svn: 300763
* ARMFrameLowering: Reserve emergency spill slot for large arguments
  Matthias Braun, 2017-04-19 (1 file changed: -0/+94)
  Re-commit after revert in r300668. Changed getMaxFPOffset() to a more
  conservative heuristic instead of trying to be clever and missing for some
  exotic calling conventions.
  We need to reserve an emergency spill slot in cases with large argument types
  that could overflow immediate offsets for FP relative address calculations.
  rdar://31317893
  Differential Revision: https://reviews.llvm.org/D31643
  llvm-svn: 300761
* AMDGPU: Custom lower illegal small select types
  Matt Arsenault, 2017-04-19 (1 file changed: -117/+272)
  Promote them to i32 vectors to avoid unpacking and re-packing the vectors.
  llvm-svn: 300754
* [ARM] Use TableGen patterns to select vtbl. NFC.
  Eli Friedman, 2017-04-19 (1 file changed: -1/+1)
  Differential Revision: https://reviews.llvm.org/D32103
  llvm-svn: 300749
* Update the madd.ll test with utils/update_llc_test_checks.py (NFC)
  Dehao Chen, 2017-04-19 (1 file changed: -48/+264)
  llvm-svn: 300740
* PR32710: Disable using PMADDWD for unsigned short.
  Dehao Chen, 2017-04-19 (1 file changed: -5/+55)
  Summary: PMADDWD can only handle signed short.
  Reviewers: mkuper, wmi
  Reviewed By: mkuper
  Subscribers: andreadb, llvm-commits
  Differential Revision: https://reviews.llvm.org/D32236
  llvm-svn: 300737
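  An illustrative IR sketch (assumed, not the PR's original reproducer) of why
  the restriction matters: PMADDWD sign-extends its 16-bit inputs, so a product
  of zero-extended (unsigned) shorts would be wrong for values of 0x8000 and
  above:

    define <4 x i32> @mul_unsigned_short(<4 x i16> %a, <4 x i16> %b) {
      ; Unsigned widening multiply; selecting PMADDWD here would treat the
      ; inputs as signed words and miscompute lanes with the top bit set.
      %za = zext <4 x i16> %a to <4 x i32>
      %zb = zext <4 x i16> %b to <4 x i32>
      %m = mul <4 x i32> %za, %zb
      ret <4 x i32> %m
    }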
* AMDGPU: Don't emit amd_kernel_code_t for callable functions
  Matt Arsenault, 2017-04-19 (1 file changed: -7/+7)
  This is inserted directly in the text section. The relocation for the
  function ends up resolving to the beginning of the amd_kernel_code_t header
  rather than the actual function entry point.
  Also skip some of the initialization comments that only make sense for
  kernels.
  llvm-svn: 300736
* StructurizeCFG: Directly invert cmp instructions
  Matt Arsenault, 2017-04-19 (4 files changed: -104/+103)
  The most common case for a branch condition is a single-use compare. Directly
  invert the branch predicate rather than adding a lot of "xor i1 true"
  instructions that the DAG will have to fold later. This produces structurizer
  output that is nicer to read.
  This causes some incidental codegen changes because the DAG also swaps branch
  conditions itself and then does a poor job of dealing with those inverts.
  llvm-svn: 300732
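  A minimal sketch (hypothetical IR, not from the changed tests) of the
  before/after shape of the structurizer's output for an inverted condition:

    define void @invert_branch(i32 %x) {
    entry:
      ; Before: %c = icmp eq i32 %x, 0 followed by %inv = xor i1 %c, true
      ; After:  the compare itself is inverted and the xor disappears.
      %c.inv = icmp ne i32 %x, 0
      br i1 %c.inv, label %then, label %exit
    then:
      br label %exit
    exit:
      ret void
    }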
* ARM: TLS calling convention doesn't preserve r9 or r12 on Darwin.
  Tim Northover, 2017-04-19 (1 file changed: -0/+24)
  llvm-svn: 300726
* [DAG] add splat vector support for 'and' in SimplifyDemandedBits
  Sanjay Patel, 2017-04-19 (3 files changed: -29/+7)
  The patch itself is simple: stop discriminating against vectors in visitAnd()
  and again in SimplifyDemandedBits().
  Some notes for reference:
  1. We're not consistent about calls to SimplifyDemandedBits in the various
     visitXXX functions. Sometimes, we check if the RHS is a constant first.
     Other times (like here), we just dive in.
  2. I'd like to break the vector shackles in steps for the sake of risk
     minimization, but we could make similar simultaneous changes in other
     places if we think that would be better.
  3. I don't know what the intent of the changed tests in this patch was
     supposed to be, but since they wiggled in a positive way, I'm just going
     with that. :)
  4. In the rotate tests, note that we can see through non-splat constants.
     This is a result of D24253.
  5. My motivation for being here now is to make D31944 look better, so this is
     step 1 of N towards improving the vector codegen in that patch without
     writing any actual new code.
  Differential Revision: https://reviews.llvm.org/D32230
  llvm-svn: 300725
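  An illustrative IR sketch (an assumed example, not one of the commit's tests)
  of a splat-vector 'and' that demanded-bits analysis can now simplify just
  like the scalar case:

    define <4 x i16> @and_splat_demanded(<4 x i32> %x) {
      ; The trunc only demands the low 16 bits of each lane, and the splat
      ; mask clears exactly those bits, so the result can fold to zero.
      %a = and <4 x i32> %x, <i32 -65536, i32 -65536, i32 -65536, i32 -65536>
      %t = trunc <4 x i32> %a to <4 x i16>
      ret <4 x i16> %t
    }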
* AMDGPU: Don't align callable functions to 256
  Matt Arsenault, 2017-04-19 (2 files changed: -0/+31)
  llvm-svn: 300720
* AMDGPU: Change DivergenceAnalysis for function arguments
  Matt Arsenault, 2017-04-19 (1 file changed: -588/+738)
  Stop assuming all functions are kernels.
  llvm-svn: 300719
* [Hexagon] Generate proper offset in opt-addr-mode
  Krzysztof Parzyszek, 2017-04-19 (1 file changed: -0/+25)
  Also, make a few changes to allow using the pass in .mir testcases. Among
  other things, change the abbreviation from opt-amode to amode-opt, because
  otherwise lit would expand the "opt" part to the full path to the opt binary.
  llvm-svn: 300707
* [PowerPC] add test and auto-generate checks; NFC
  Sanjay Patel, 2017-04-19 (1 file changed: -19/+33)
  llvm-svn: 300700
* [ARM] add test and auto-generate checks; NFC
  Sanjay Patel, 2017-04-19 (1 file changed: -122/+440)
  llvm-svn: 300698
* [AVR] Remove the 'multibyte' asm test
  Dylan McKay, 2017-04-19 (1 file changed: -135/+0)
  It tests registers which are not actually used on AVR.
  llvm-svn: 300684
* [AVR] Fix the test suite
  Dylan McKay, 2017-04-19 (4 files changed: -35/+40)
  A bunch of tests failed because memory operations have been reordered. I am
  unsure which commit changed this behaviour, as the AVR build was failing at
  that point with an unrelated error. This commit just reorders some of the
  CHECK lines in some tests to suit current llc output.
  llvm-svn: 300682
* [GlobalIsel][X86] support G_TRUNC selection.
  Igor Breger, 2017-04-19 (4 files changed: -0/+299)
  Summary: [GlobalIsel][X86] support G_TRUNC selection. Add regbank-select and
  legalizer tests. Legalization of trunc i64 on 32-bit platforms is currently
  not supported.
  Reviewers: ab, zvi, rovka
  Reviewed By: zvi
  Subscribers: dberris, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D32115
  llvm-svn: 300678
* [X86] Add D32039/PR31357 tests to show current BSWAP codegen
  Simon Pilgrim, 2017-04-19 (2 files changed: -0/+255)
  llvm-svn: 300672
* [X86][SSE] Add scheduling latency/throughput tests for (most) SSE2 instructions
  Simon Pilgrim, 2017-04-19 (1 file changed: -0/+6039)
  llvm-svn: 300671
* Revert "ARMFrameLowering: Reserve emergency spill slot for large arguments"Renato Golin2017-04-191-94/+0
| | | | | | This reverts commit r300639, as it broke self-hosting on ARM. PR32709. llvm-svn: 300668
* [GlobalISel][X86] Split select tests. NFC.
  Igor Breger, 2017-04-19 (7 files changed: -444/+455)
  llvm-svn: 300666
* [ARM] GlobalISel: Add support for G_MUL
  Diana Picus, 2017-04-19 (4 files changed: -1/+326)
  Support G_MUL, very similar to G_ADD and G_SUB. The only difference is in the
  instruction selector, where we have to select either MUL or MULv5 depending
  on the target.
  llvm-svn: 300665
* [GlobalISel] Support vector-of-pointers in LLT
  Kristof Beyls, 2017-04-19 (1 file changed: -0/+16)
  This fixes PR32471.
  As comment 10 on that bug report highlights
  (https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a few
  different defensible design tradeoffs that could be made, including not
  representing pointers at all in LLT.
  I decided to go for representing vector-of-pointers as a concept in LLT,
  while keeping the size of the LLT type 64 bits (this is an increase from 48
  bits before). My rationale for keeping pointers explicit is that on some
  targets it's probably very handy to have the distinction between pointer and
  non-pointer (e.g. 68K has a different register bank for pointers IIRC). If we
  keep a scalar pointer, it is probably easiest to also have a
  vector-of-pointers, to keep LLT relatively conceptually clean and orthogonal,
  and we don't have a very strong reason to break that orthogonality. Once we
  gain more experience with the use of LLT, we can of course reconsider this
  direction.
  Rejecting vector-of-pointer types in the IRTranslator is also an option to
  avoid the crash reported in PR32471, but that would only be a very short-term
  solution; it also needs quite a few code tweaks in places and is probably
  fragile. Therefore I didn't consider this the best option.
  llvm-svn: 300664
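  A minimal IR sketch (an assumed example in the spirit of PR32471, not the
  actual reproducer) of a value whose GlobalISel type is a vector of pointers
  and therefore needs an LLT representation:

    define <2 x i8*> @pass_vector_of_pointers(<2 x i8*> %v) {
      ; With GlobalISel, %v needs an LLT such as a 2-element vector of p0
      ; rather than a plain scalar or vector of integers.
      ret <2 x i8*> %v
    }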
* ARMFrameLowering: Reserve emergency spill slot for large arguments
  Matthias Braun, 2017-04-19 (1 file changed: -0/+94)
  We need to reserve an emergency spill slot in cases with large argument types
  that could overflow immediate offsets for FP relative address calculations.
  rdar://31317893
  Differential Revision: https://reviews.llvm.org/D31643
  llvm-svn: 300639
* [x86] add tests for potential andn optimization; NFC
  Sanjay Patel, 2017-04-18 (1 file changed: -2/+40)
  llvm-svn: 300617
* [X86] Keep EXTRACT_VECTOR_ELT result type as f128 for Android x86_64.
  Chih-Hung Hsieh, 2017-04-18 (2 files changed: -0/+59)
  The Android x86_64 target uses the f128 type and stores f128 values in %xmm*
  registers. SoftenFloatRes_EXTRACT_VECTOR_ELT should not convert the result
  value from f128 to i128.
  Differential Revision: http://reviews.llvm.org/D32102
  llvm-svn: 300583
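  A minimal IR sketch (assumed, not the commit's test) of the operation in
  question; on Android x86_64 the extracted element should stay f128 in an
  %xmm register rather than being softened to i128:

    define fp128 @extract_first_f128(<2 x fp128> %v) {
      ; EXTRACT_VECTOR_ELT of an fp128 vector; the result must remain f128.
      %e = extractelement <2 x fp128> %v, i32 0
      ret fp128 %e
    }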
* [X86][SSE] Add scheduling latency/throughput tests for (most) SSE1 instructions
  Simon Pilgrim, 2017-04-18 (1 file changed: -0/+2415)
  llvm-svn: 300576
* [DAG] Improve store merge candidate pruning.
  Nirav Dave, 2017-04-18 (2 files changed: -12/+4)
  Remove non-consecutive stores from the store merge candidate search, as they
  cannot be merged and will prevent us from finding subsequent mergeable store
  cases.
  Reviewers: jyknight, bogner, javed.absar, spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D32086
  llvm-svn: 300561
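  A hedged sketch (assumed IR, not the commit's test) of the situation: an
  unrelated, non-consecutive store sits between two byte stores that are
  otherwise mergeable into a single wider store:

    define void @merge_around_gap(i8* noalias %p, i8* noalias %q) {
      ; The store to %q is not consecutive with the %p stores; it should be
      ; dropped from the candidate list instead of blocking the merge of the
      ; two byte stores into one wider store.
      store i8 1, i8* %p
      store i8 7, i8* %q
      %p1 = getelementptr i8, i8* %p, i64 1
      store i8 2, i8* %p1
      ret void
    }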
* Add base-index-based store merge test
  Nirav Dave, 2017-04-18 (1 file changed: -0/+31)
  llvm-svn: 300559
* Add store Merge test.
  Nirav Dave, 2017-04-18 (1 file changed: -0/+25)
  llvm-svn: 300551
* [ARM] Add hardware build attributes in assembler
  Oliver Stannard, 2017-04-18 (1 file changed: -234/+227)
  In the assembler, we should emit build attributes based on the target
  selected with command-line options. This matches the GNU assembler's
  behaviour. We only do this for build attributes which describe the hardware
  that is expected to be available, not the ones that describe ABI
  compatibility.
  This is done by moving some of the attribute emission code to
  ARMTargetStreamer, so that it can be shared between the assembly and
  code-generation code paths. Since the assembler only creates an
  MCSubtargetInfo, not an ARMSubtarget, the code had to be changed to check raw
  features, and not use the convenience functions in ARMSubtarget.
  If different attributes are later specified using the .eabi_attribute
  directive, then they will take precedence, as happens when the same
  .eabi_attribute is specified twice.
  This must be enabled by an option, because we don't want to do this when
  parsing inline assembly. The attributes would match the ones emitted at the
  start of the file, so wouldn't actually change the emitted object file, but
  the extra directives would be added to every inline assembly block when
  emitting assembly, which we'd like to avoid.
  The majority of the changes in the build-attributes.ll test are just
  re-ordering the directives, because the hardware attributes are now emitted
  before the ABI ones. However, I did fix one bug which I spotted:
  Tag_CPU_arch_profile was not being emitted for v6M.
  Differential revision: https://reviews.llvm.org/D31812
  llvm-svn: 300547