bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	AMDGPU: Fix unnecessary ands when packing f16 vectors	Matt Arsenault	2017-03-15	6	-6/+25
\| \| \| \| \| \| \| \| \|	computeKnownBits didn't handle fp_to_fp16 to report the high bits as 0. ARM maps the generic node to an instruction that does not modify the high bits of the register, so introduce a target node where the high bits are known 0. llvm-svn: 297873
*	ARM: avoid clobbering register in v6 jump-table expansion.	Tim Northover	2017-03-15	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \|	If we got unlucky with register allocation and actual constpool placement, we could end up producing a tTBB_JT with an index that's already been clobbered. Technically, we might be able to fix this situation up with a MOV, but I think the constant islands pass is complex enough without having to deal with more weird edge-cases. llvm-svn: 297871
*	AMDGPU: Minor SIAnnotateControlFlow cleanups	Matt Arsenault	2017-03-15	1	-31/+35
\| \| \| \| \| \|	Newline fixes, early return, range loops. llvm-svn: 297865
*	[PowerPC][Altivec] Add mfvrd and mffprd extended mnemonic	Nemanja Ivanovic	2017-03-15	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \|	mfvrd and mffprd are both alias to mfvrsd. This patch enables correct parsing of the aliases, but we still emit a mfvrsd. Committing on behalf of brunoalr (Bruno Rosa). Differential Revision: https://reviews.llvm.org/D29177 llvm-svn: 297849
*	Cyle -> Cycle; NFCI	Sanjay Patel	2017-03-15	3	-8/+8
\| \| \| \|	llvm-svn: 297846
*	Revert "[Thumb1] Fix the bug when adding/subtracting -2147483648"	Artyom Skrobov	2017-03-15	1	-4/+4
\| \| \| \| \| \|	This reverts r297820 which apparently fails on A15 hosts. llvm-svn: 297842
*	Reverted unintended commit	Simon Pilgrim	2017-03-15	1	-2/+2
\| \| \| \|	llvm-svn: 297841
*	Fix Wint-in-bool-context warning (PR32248)	Simon Pilgrim	2017-03-15	1	-2/+2
\| \| \| \|	llvm-svn: 297840
*	Reverting r297821 due to breaking lld test.	Sam Parker	2017-03-15	1	-55/+33
\| \| \| \|	llvm-svn: 297838
*	[X86][SSE] Fixed shuffle MOVSS/MOVSD combining of all zeroable inputs	Simon Pilgrim	2017-03-15	1	-4/+5
\| \| \| \| \| \| \| \|	Turns out it can happen, so the assertion was too harsh Found during fuzz testing llvm-svn: 297833
*	[Mips] Add support to match more patterns for DEXT and CINS	Petar Jovanovic	2017-03-15	5	-40/+174
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds support for recognizing more patterns to match to DEXT and CINS instructions. It finds cases where multiple instructions could be replaced with a single DEXT or CINS instruction. For example, for the following: define i64 @dext_and32(i64 zeroext %a) { entry: %and = and i64 %a, 4294967295 ret i64 %and } instead of generating: 0000000000000088 <dext_and32>: 88: 64010001 daddiu at,zero,1 8c: 0001083c dsll32 at,at,0x0 90: 6421ffff daddiu at,at,-1 94: 03e00008 jr ra 98: 00811024 and v0,a0,at 9c: 00000000 nop the following gets generated: 0000000000000068 <dext_and32>: 68: 03e00008 jr ra 6c: 7c82f803 dext v0,a0,0x0,0x20 Cases that are covered: DEXT: 1. and $src, mask where mask > 0xffff 2. zext $src zero extend from i32 to i64 CINS: 1. and (shl $src, pos), mask 2. shl (and $src, mask), pos 3. zext (shl $src, pos) zero extend from i32 to i64 Patch by Violeta Vukobrat. Differential Revision: https://reviews.llvm.org/D30464 llvm-svn: 297832
*	Align cost model columns. NFCI.	Simon Pilgrim	2017-03-15	1	-4/+4
\| \| \| \|	llvm-svn: 297824
*	[ARM] Fix for branch label disassembly for Thumb	Sam Parker	2017-03-15	1	-33/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Different MCInstrAnalysis classes for arm and thumb mode, each with their own evaluateBranch implementation. I added a test case and fixed the coff-relocations test to use '<label>:' rather than '<label>' in the CHECK-LABEL entries, since the ones without the colon would match branch targets. Might be worth noticing that llvm-objdump does not lookup the relocation and thus assigns it a target depending on the encoded immediate which #0, so it thinks it branches to the next instruction. Committed on behalf of Andre Vieira (avieira). Differential Revision: https://reviews.llvm.org/D30943 llvm-svn: 297821
*	[Thumb1] Fix the bug when adding/subtracting -2147483648	Artyom Skrobov	2017-03-15	1	-4/+4
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D30829 llvm-svn: 297820
*	[ARM] Enable SMLAL[B\|T] isel	Sam Parker	2017-03-15	4	-40/+182
\| \| \| \| \| \| \| \| \| \| \|	Enable the selection of the 64-bit signed multiply accumulate instructions which operate on 16-bit operands. These are enabled for ARMv5TE onwards for ARM and for V6T2 and other DSP enabled Thumb architectures. Differential Revision: https://reviews.llvm.org/D30044 llvm-svn: 297809
*	[globalisel] LLVM_BUILD_GLOBAL_ISEL=OFF should prevent GlobalISel ↵	Daniel Sanders	2017-03-14	2	-0/+5
\| \| \| \| \| \|	instruction selector from being declared. llvm-svn: 297786
*	[globalisel][tblgen] Add support for ComplexPatterns	Daniel Sanders	2017-03-14	6	-3/+85
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Adds a new kind of MachineOperand: MO_Placeholder. This operand must not appear in the MIR and only exists as a way of creating an 'uninitialized' operand until a matcher function overwrites it. Depends on D30046, D29712 Reviewers: t.p.northover, ab, rovka, aditya_nandakumar, javed.absar, qcolombet Reviewed By: qcolombet Subscribers: dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D30089 llvm-svn: 297782
*	[SelectionDAG] Add a signed integer absolute ISD node	Simon Pilgrim	2017-03-14	6	-152/+56
\| \| \| \| \| \| \| \| \| \| \| \|	Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering. ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns. At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom. Differential Revision: https://reviews.llvm.org/D29639 llvm-svn: 297780
*	[WebAssembly] Use LEB encoding for value types	Derek Schuff	2017-03-14	6	-251/+222
\| \| \| \| \| \| \| \| \| \| \| \|	Previously we were using the encoded LEB hex values for the value types. This change uses the decoded negative value and the LEB encoder to write them out. Differential Revision: https://reviews.llvm.org/D30847 Patch by Sam Clegg llvm-svn: 297777
*	Fix asm printing of associated sections.	Evgeniy Stepanov	2017-03-14	1	-1/+2
\| \| \| \| \| \| \|	Make MCSectionELF::AssociatedSection be a link to a symbol, because that's how it works in the assembly, and use it in the asm printer. llvm-svn: 297769
*	[ARM] Replace some C++ selection code with TableGen patterns. NFC.	Eli Friedman	2017-03-14	5	-64/+33
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D30794 llvm-svn: 297768
*	[Hexagon] Fix a condition in HexagonEarlyIfConv.cpp	Krzysztof Parzyszek	2017-03-14	1	-1/+1
\| \| \| \| \| \|	This fixes llvm.org/PR32265. llvm-svn: 297745
*	Fix typo in comment	Artyom Skrobov	2017-03-14	1	-1/+1
\| \| \| \|	llvm-svn: 297742
*	[ARM] Diagnose ARM MOVT without :lower16: or :upper16: expression	Oliver Stannard	2017-03-14	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	This instruction was missing from the list of opcodes that we check, so we were hitting an llvm_unreachable in ARMMCCodeEmitter.cpp for the ARM MOVT instruction, rather than the diagnostic that is emitted for the other MOVW/MOVT instructions. Differential revision: https://reviews.llvm.org/D30936 llvm-svn: 297739
*	De-duplicate the two implementations of ↵	Artyom Skrobov	2017-03-14	1	-13/+5
\| \| \| \| \| \| \| \| \| \| \| \|	ARMBaseInstrInfo::isProfitableToIfCvt() [NFC] Reviewers: congh, rengolin Subscribers: aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D30934 llvm-svn: 297738
*	[ARM] Move SMULW[B\|T] isel to DAG Combine	Sam Parker	2017-03-14	5	-150/+132
\| \| \| \| \| \| \| \| \| \| \| \|	Create nodes for smulwb and smulwt and move their selection from DAGToDAG to DAG combine. smlawb and smlawt can then be selected using tablegen. Added some helper functions to detect shift patterns as well as a wrapper around SimplifyDemandBits. Added a couple of extra tests. Differential Revision: https://reviews.llvm.org/D30708 llvm-svn: 297716
*	Disable Callee Saved Registers	Oren Ben Simhon	2017-03-14	3	-12/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Each Calling convention (CC) defines a static list of registers that should be preserved by a callee function. All other registers should be saved by the caller. Some CCs use additional condition: If the register is used for passing/returning arguments – the caller needs to save it - even if it is part of the Callee Saved Registers (CSR) list. The current LLVM implementation doesn’t support it. It will save a register if it is part of the static CSR list and will not care if the register is passed/returned by the callee. The solution is to dynamically allocate the CSR lists (Only for these CCs). The lists will be updated with actual registers that should be saved by the callee. Since we need the allocated lists to live as long as the function exists, the list should reside inside the Machine Register Info (MRI) which is a property of the Machine Function and managed by it (and has the same life span). The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments. The patch will also assist to implement future no_caller_saved_regsiters attribute intended for interrupt handler CC. Differential Revision: https://reviews.llvm.org/D28566 llvm-svn: 297715
*	[AVX-512] Use iPTR instead of i64 in patterns for ↵	Craig Topper	2017-03-14	1	-6/+6
\| \| \| \| \| \|	extract_subvector/insert_subvector index. llvm-svn: 297707
*	[TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved	Jonas Paulsson	2017-03-14	2	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands. Tests updates: Analysis/CostModel/X86/arith-fp.ll Transforms/LoopVectorize/AArch64/interleaved_cost.ll Transforms/LoopVectorize/ARM/interleaved_cost.ll The improvement in getOperandsScalarizationOverhead() to differentiate on constants made it necessary to update the interleaved_cost.ll tests even though they do not relate to intrinsics. Review: Hal Finkel https://reviews.llvm.org/D29540 llvm-svn: 297705
*	[AVX-512] Pre-emptively fix more places in fastisel where we might copy a ↵	Craig Topper	2017-03-14	1	-9/+28
\| \| \| \| \| \|	VK1 register into a AH/BH/CH/DH register. llvm-svn: 297704
*	In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵	Nirav Dave	2017-03-14	2	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	enabled. Recommiting with compiler time improvements Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 297695
*	[Thumb1] combine ADDC/SUBC with a negative immediate	Artyom Skrobov	2017-03-13	2	-6/+20
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: This simple optimization has been split out of https://reviews.llvm.org/D30400 Reviewers: efriedma, jmolloy Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D30829 llvm-svn: 297682
*	[AVX-512] Fix another case where we are copying from a mask register using ↵	Craig Topper	2017-03-13	1	-1/+2
\| \| \| \| \| \| \| \|	AH/BH/CH/DH with fastisel. Fixes PR32256. Still planning to do an audit for other possible cases. llvm-svn: 297678
*	[X86][MMX] Fix folding of shift value loads to cover whole 64-bits	Simon Pilgrim	2017-03-13	2	-21/+0
\| \| \| \| \| \| \| \| \| \| \| \|	rL230225 made the assumption that only the lower 32-bits of an MMX register load is used as a shift value, when in fact the whole 64-bits are reloaded and treated as a i64 to determine the shift value. This patch reverts rL230225 to ensure that the whole 64-bits of memory are folded and ensures that the upper 32-bit are zero'd for cases where the shift value has come from a scalar source. Found during fuzz testing. Differential Revision: https://reviews.llvm.org/D30833 llvm-svn: 297667
*	Revert r295004 (Add MXCSR) due to errors reported by MachineVerifier	Andrew Kaylor	2017-03-13	3	-37/+24
\| \| \| \| \| \|	I am leaving the code in clang which filters mxcsr from the clobber list because that is still technically correct and will be useful again when the MXCSR register is reintroduced. llvm-svn: 297664
*	AMDGPU: Re-use TM.getNullPointerValue	Matt Arsenault	2017-03-13	1	-10/+8
\| \| \| \|	llvm-svn: 297662
*	AMDGPU: Treat 0 as private null pointer in addrspacecast lowering	Matt Arsenault	2017-03-13	2	-8/+14
\| \| \| \|	llvm-svn: 297658
*	[Outliner] Add tail call support	Jessica Paquette	2017-03-13	2	-22/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit adds tail call support to the MachineOutliner pass. This allows the outliner to insert jumps rather than calls in areas where tail calling is possible. Outlined tail calls include the return or terminator of the basic block being outlined from. Tail call support allows the outliner to take returns and terminators into consideration while finding candidates to outline. It also allows the outliner to save more instructions. For example, in the X86-64 outliner, a tail called outlined function saves one instruction since no return has to be inserted. llvm-svn: 297653
*	[X86] Lower AVX2 gather intrinsics similar to AVX-512. Apply the same input ↵	Craig Topper	2017-03-13	3	-84/+50
\| \| \| \| \| \| \| \|	source optimizations to break execution dependencies. For AVX-512 we force the input to zero if the input is undef or the mask is all ones to break an execution dependency. This patch brings the same behavior to AVX2. llvm-svn: 297652
*	[AVX-512] If gather mask is all ones, force the input to a zero vector.	Craig Topper	2017-03-13	1	-1/+4
\| \| \| \| \| \| \| \|	We were already forcing undef inputs to become a zero vector, this now catches an all ones mask too. Ideally we'd use undef and let execution dep fix handle picking the best register/clearance for the undef, but I don't think it can handle the early clobber today. llvm-svn: 297651
*	[ARM] GlobalISel: Support SP in regbankselect	Diana Picus	2017-03-13	1	-0/+1
\| \| \| \| \| \| \|	We used to hit an unreachable in getRegBankFromRegClass when dealing with the stack pointer. This commit adds support for the GPRsp reg class. llvm-svn: 297621
*	[AArch64] Map Sched Read/Write resources for Falkor.	Balaram Makam	2017-03-13	1	-1/+183
\| \| \| \|	llvm-svn: 297611
*	ARMDisassembler: loop over ARM decode tables	Sjoerd Meijer	2017-03-13	1	-57/+20
\| \| \| \| \| \| \| \| \|	Loop over the ARM decode tables; this is a clean-up to reduce some code duplication. Differential Revision: https://reviews.llvm.org/D30814 llvm-svn: 297608
*	[AVX-512] Add VEX_WIG to VEX vcvtsd2ss/vcvtss2sd intrinsic instructions so ↵	Craig Topper	2017-03-13	1	-8/+8
\| \| \| \| \| \|	they can be correctly matched by EVEX2VEX table generation. llvm-svn: 297601
*	[AVX-512] Use sse_loadf32/f64 for vcvtss2sd and vcvtsd2ss intrinsic patterns.	Craig Topper	2017-03-13	1	-3/+2
\| \| \| \|	llvm-svn: 297600
*	[AVX-512] Use sse_load_f64/f32 in VCVTSS2SI/VCVTSD2SI patterns.	Craig Topper	2017-03-13	1	-10/+10
\| \| \| \|	llvm-svn: 297599
*	[X86] Remove unused SDTypeProfile. NFC	Craig Topper	2017-03-12	1	-2/+0
\| \| \| \|	llvm-svn: 297594
*	[X86] Lower SSE/AVX cmpps/pd intrinsics directly to X86ISD::CMPP SDNodes.	Craig Topper	2017-03-12	3	-46/+17
\| \| \| \| \| \|	This allows us to remove a duplicate set of patterns. llvm-svn: 297593
*	[AVX-512] Fix the valid immediates for the scatter/gather prefetch intrinsics.	Craig Topper	2017-03-12	1	-2/+3
\| \| \| \| \| \|	The immediate should be 1 or 2, not 0 or 1. This was found while adding bounds checking to clang. In fact the existing clang builtin test failed if we ran it all the way to assembly. llvm-svn: 297591
*	[x86] don't blindly transform SETB into SBB	Sanjay Patel	2017-03-12	1	-39/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently. This happens because we were transforming any 'setb' - even when we only wanted a single-bit result. This patch moves those transforms under visitAdd/visitSub, so we we're only creating sbb/adc when it is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that existing behavior in this patch. Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files where this transform still fires. The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate issue. Differential Revision: https://reviews.llvm.org/D30611 llvm-svn: 297586