bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86] Merged Reverse/Alternate shuffle cost tables. NFCI.	Simon Pilgrim	2017-01-04	1	-141/+81
	As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode. llvm-svn: 290956
*	[framelowering] Skip dbg values when getting next/previous instruction.	Florian Hahn	2017-01-04	1	-8/+14
	Summary: In mergeSPUpdates, debug values need to be ignored when getting the previous element, otherwise debug data could have an impact on codegen. In eliminateCallFramePseudoInstr, debug values after the erased element could have an impact on codegen and should be skipped. Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319) Reviewers: aprantl, MatzeB, mkuper Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D27688 llvm-svn: 290955
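The idea of skipping debug values when stepping to a neighbouring instruction can be sketched as follows. This is only an illustration in Python, not the LLVM code: `Instr`, `prev_non_debug`, and `next_non_debug` are hypothetical names, and a basic block is modelled as a plain list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    """Hypothetical stand-in for a machine instruction."""
    opcode: str
    is_debug_value: bool = False


def prev_non_debug(block: List[Instr], idx: int) -> Optional[int]:
    """Index of the closest non-debug instruction before idx, or None."""
    i = idx - 1
    while i >= 0 and block[i].is_debug_value:
        i -= 1
    return i if i >= 0 else None


def next_non_debug(block: List[Instr], idx: int) -> Optional[int]:
    """Index of the closest non-debug instruction after idx, or None."""
    i = idx + 1
    while i < len(block) and block[i].is_debug_value:
        i += 1
    return i if i < len(block) else None
```

With this helper, a block compiled with debug info and the same block without it resolve to the same neighbouring instruction, which is the property the commit message asks for.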
*	[LLC][MIPS] Fix crash after enabling LLVM_ENABLE_EXPENSIVE_CHECKS	Nitesh Jain	2017-01-04	2	-0/+8
	Reviewers: sdardis, vkalintiris Subscribers: jaydeep, slthakur, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D27841 llvm-svn: 290949
*	[X86][AVX512] Passing the appropriate memory operand class to INT_{U}COMIS{S|D} instructions	Ayman Musa	2017-01-04	2	-26/+43
	Replacing the memory operand in the intrinsic versions of the comis/ucomis instructions from f128mem to ssmem/sdmem accordingly. Differential Revision: https://reviews.llvm.org/D28138 llvm-svn: 290948
*	[X86] Attempt to pre-truncate arithmetic operations if useful	Simon Pilgrim	2017-01-04	1	-0/+81
	In some cases it's more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types. This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or a one-use constant we can fold). Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs. I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need. Differential Revision: https://reviews.llvm.org/D28219 llvm-svn: 290947
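The rewrite TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) is value-preserving for the opcodes named above because the low bits of the result depend only on the low bits of the inputs. A minimal Python sketch, assuming a 64-bit to 32-bit truncation and using hypothetical helper names, checks that equivalence; it says nothing about when the rewrite is profitable, which is the part the patch's heuristics decide.

```python
import random

MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def trunc32(x):
    """Keep the low 32 bits, like an i64 -> i32 truncation."""
    return x & MASK32


# The binary ops named in the commit message, on unbounded Python ints.
BINOPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


def trunc_of_binop(name, x, y):
    """TRUNC(BINOP(X, Y)): compute at 64 bits, then truncate."""
    wide = BINOPS[name](x, y) & MASK64
    return trunc32(wide)


def binop_of_trunc(name, x, y):
    """BINOP(TRUNC(X), TRUNC(Y)): truncate first, compute at 32 bits."""
    return trunc32(BINOPS[name](trunc32(x), trunc32(y)))


def forms_agree(trials=1000, seed=0):
    """Compare both forms on random 64-bit inputs for every op."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.getrandbits(64)
        y = rng.getrandbits(64)
        for name in BINOPS:
            if trunc_of_binop(name, x, y) != binop_of_trunc(name, x, y):
                return False
    return True
```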
*	[AVX-512] Add support for detecting 512-bit shuffles that contain a 128-bit subvector insertion from the lowest subvector of one of the sources.	Craig Topper	2017-01-04	1	-3/+33
	These are best handled with a vinsert32x4 or vinsert64x2 instruction. llvm-svn: 290946
*	[AVX-512] Simplify code for creating 512-bit SHUF128 operations.	Craig Topper	2017-01-04	1	-18/+11
	We don't need two loops and we can safely assume and hardcode the size of the widened mask. llvm-svn: 290942
*	[Hexagon, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).	Eugene Zelenko	2017-01-04	12	-385/+301
	llvm-svn: 290925
*	[X86] Move 128-bit shuffle mask widening check into lowerV2X128VectorShuffle to reduce code duplication.	Craig Topper	2017-01-03	1	-22/+17
	Use the now available widened mask to simplify some code inside lowerV2X128VectorShuffle. llvm-svn: 290872
*	[AVX-512] Simplify the code added in r290870 to recognize 256-bit subvector inserts and avoid calling isShuffleEquivalent on a widened mask.	Craig Topper	2017-01-03	1	-30/+7
	llvm-svn: 290871
*	[AVX-512] Teach shuffle lowering to use vinsert instructions for shuffles corresponding to 256-bit subvector inserts.	Craig Topper	2017-01-03	1	-0/+39
	llvm-svn: 290870
*	[AVX-512] Teach EVEX to VEX conversion pass to handle VINSERT and VEXTRACT instructions.	Craig Topper	2017-01-03	1	-0/+16
	llvm-svn: 290869
*	[X86] Remove trailing whitespace and an unnecessary line wrap. NFC	Craig Topper	2017-01-03	1	-37/+35
	llvm-svn: 290867
*	[X86] Fix header comment. NFC	Craig Topper	2017-01-03	1	-1/+1
	llvm-svn: 290866
*	[AVX-512] Add support for pushing bitcasts through INSERT_SUBVEC in order to select a masked operation.	Craig Topper	2017-01-03	1	-0/+23
	llvm-svn: 290865
*	[AVX-512] Remove vinsert intrinsics and autoupgrade to native shufflevectors.	Craig Topper	2017-01-03	2	-34/+2
	There are some codegen problems here that I'll try to fix in future commits. llvm-svn: 290864
*	[AVX-512] Remove vextract intrinsics and autoupgrade to native shufflevectors.	Craig Topper	2017-01-03	1	-27/+0
	This unfortunately generates some really terrible code without VLX support due to v2i1 and v4i1 not being legal. Hopefully we can improve that in future patches. llvm-svn: 290863
*	[XRay] Merge instrumentation point table emission code into AsmPrinter.	Dean Michael Berris	2017-01-03	6	-150/+2
	Summary: No need to have this per-architecture. While there, unify 32-bit ARM's behaviour with what changed elsewhere and start function names lowercase as per the coding standards. Individual entry emission code goes to the entry's own class. Fully tested on amd64, cross-builds on both ARMs and PowerPC. Reviewers: dberris Subscribers: aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D28209 llvm-svn: 290858
*	Fixed shuffle-reverse cost on AVX-512.	Elena Demikhovsky	2017-01-02	1	-0/+1
	(This change was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately). llvm-svn: 290812
*	AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.	Elena Demikhovsky	2017-01-02	2	-9/+250
	X86 target does not provide any target specific cost calculation for interleave patterns. It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost. In this patch I calculate the cost on AVX-512. This allows comparing the interleave pattern with gather/scatter and choosing the better solution (PR31426). * Shuffle-broadcast cost will be changed in Simon's upcoming patch. Differential Revision: https://reviews.llvm.org/D28118 llvm-svn: 290810
*	[AVR] Optimize 16-bit ANDs with '1'	Dylan McKay	2016-12-31	1	-0/+4
	Summary: Fixes PR 31345 Reviewers: dylanmckay Subscribers: fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D28186 llvm-svn: 290778
*	Caught a simple typo. I do not know of a way to test this, but it seems like an unlikely thing to regress in the future.	Aaron Ballman	2016-12-30	1	-1/+1
	llvm-svn: 290757
*	[AVR] Optimize 16-bit ORs with '0'	Dylan McKay	2016-12-30	1	-12/+27
	Summary: Fixes PR 31344 Authored by Anmol P. Paralkar Reviewers: dylanmckay Subscribers: fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D28121 llvm-svn: 290732
*	Revert "[COFF] Use 32-bit jump table entries in .rdata for Win64"	Reid Kleckner	2016-12-29	2	-26/+0
	This reverts commit r290694. It broke sanitizer tests on Win64. I'll probably bring this back, but the jump tables will just live in .text like they do for MSVC. llvm-svn: 290714
*	[AMDGPU][mc] Enable absolute expressions in .hsa_code_object_isa directive	Artem Tamazov	2016-12-29	1	-12/+17
	Among other things, this allows using predefined .option.machine_version_major /minor/stepping symbols in the directive. Relevant test expanded at once (also file renamed for clarity). Differential Revision: https://reviews.llvm.org/D28140 llvm-svn: 290710
*	[COFF] Use 32-bit jump table entries in .rdata for Win64	Reid Kleckner	2016-12-29	2	-0/+26
	Summary: We were already using 32-bit jump table entries, but this was a consequence of the default PIC model on Win64, and not an intentional design decision. This patch ensures that we always use 32-bit label difference jump table entries on Win64 regardless of the PIC model. This is a good idea because it saves executable size and object file size. Moving the jump tables to .rdata cleans up the disassembled object code and reduces the available ROP targets, but it requires adding one more RIP-relative lea to the code. COFF doesn't have relocations to express the difference between two arbitrary symbols, so we can't use the jump table label in the label difference like we do elsewhere. Fixes PR31488 Reviewers: majnemer, compnerd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28141 llvm-svn: 290694
*	This is a large patch for X86 AVX-512 of an optimization for reducing code size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible.	Gadi Haber	2016-12-28	7	-0/+1384
	There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers. The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled. Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky Differential Revision: https://reviews.llvm.org/D27901 llvm-svn: 290663
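The eligibility rule stated in this message (register indexes 0 - 15 only, no zmm registers, no mask k registers) can be written as a small predicate. The sketch below is an assumption-laden illustration, not the pass itself: `Reg` and `can_use_vex` are hypothetical names, and the real pass also has to know that a VEX form of the instruction exists, which is not modelled here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Reg:
    """Hypothetical register operand: class is "xmm", "ymm", "zmm" or "k"."""
    cls: str
    index: int


def can_use_vex(regs: Iterable[Reg]) -> bool:
    """Apply only the three conditions quoted in the commit message."""
    for reg in regs:
        if reg.cls in ("zmm", "k"):
            return False  # 512-bit and mask registers need EVEX
        if reg.cls in ("xmm", "ymm") and reg.index > 15:
            return False  # registers 16-31 are only reachable with EVEX
    return True
```

For example, an instruction using xmm1, xmm2 and xmm15 passes the check, while the same instruction using xmm16 or a k mask does not.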
*	[AArch64][AsmParser] Add support for parsing shift/extend operands with symbols.	Chad Rosier	2016-12-27	1	-3/+5
	Differential Revision: https://reviews.llvm.org/D27953 llvm-svn: 290609
*	[AMDGPU][llvm-mc] Predefined symbols to access register counts (.kernel.{v|s}gpr_count)	Artem Tamazov	2016-12-27	1	-7/+56
	The feature allows for conditional assembly, filling the entries of .amd_kernel_code_t etc. Symbols are defined with value 0 at the beginning of each kernel scope. After each register usage, the respective symbol is set to: value = max( value, ( register index + 1 ) ). Thus, at the end of scope the value represents a count of used registers. Kernel scopes begin at .amdgpu_hsa_kernel directive, end at the next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also a dummy scope that lies from the beginning of the source file until the first .amdgpu_hsa_kernel. Test added. Differential Revision: https://reviews.llvm.org/D27859 llvm-svn: 290608
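The counting rule above, value = max( value, ( register index + 1 ) ) per kernel scope, can be sketched in a few lines of Python. This is a toy model rather than the assembler's implementation: `count_registers` and the `<dummy>` scope label are hypothetical, only single registers such as `v3` or `s10` are recognised, and register ranges are ignored.

```python
import re

# Matches a single VGPR or SGPR reference such as "v7" or "s4".
REG_RE = re.compile(r"\b([vs])(\d+)\b")


def count_registers(source):
    """Return {scope name: {"v": count, "s": count}} for an assembly text."""
    # The dummy scope covers everything before the first kernel directive.
    scopes = {"<dummy>": {"v": 0, "s": 0}}
    current = "<dummy>"
    for line in source.splitlines():
        line = line.strip()
        if line.startswith(".amdgpu_hsa_kernel"):
            # A new kernel scope starts; its counters begin at 0.
            current = line.split()[1]
            scopes[current] = {"v": 0, "s": 0}
            continue
        for kind, index in REG_RE.findall(line):
            counts = scopes[current]
            counts[kind] = max(counts[kind], int(index) + 1)
    return scopes
```

Because the counter is a running maximum of index + 1, using only `v7` in a scope yields a count of 8, matching the "count of used registers" reading in the message.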
*	[AMDGPU] Assembler: support SDWA and DPP for VOP2b instructions	Sam Kolton	2016-12-27	3	-6/+37
	Reviewers: nhaustov, artem.tamazov, vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28051 llvm-svn: 290599
*	[AVX-512] Add all forms of VPALIGNR, VALIGND, and VALIGNQ to the load folding tables.	Craig Topper	2016-12-27	1	-2/+27
	llvm-svn: 290591
*	[AVX-512] Remove masked pmuldq and pmuludq intrinsics and autoupgrade them to unmasked intrinsics plus a select.	Craig Topper	2016-12-27	1	-12/+0
	llvm-svn: 290583
*	[AVX-512] Add 512-bit unmasked intrinsics for pmuldq and pmuludq so we can add them to InstCombine with the 128 and 256 bit versions.	Craig Topper	2016-12-27	1	-0/+2
	The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will be removed and autoupgraded. llvm-svn: 290573
*	[AVX-512] Add isel patterns to turn native masked scalar add/sub/mul/div into masked instructions.	Craig Topper	2016-12-27	1	-0/+22
	llvm-svn: 290564
*	[AVX-512] Fix some patterns to use extended register classes.	Craig Topper	2016-12-26	1	-64/+73
	llvm-svn: 290536
*	[AVX-512] Don't assume that the rounding mode argument to intrinsics is a constant.	Craig Topper	2016-12-26	1	-16/+17
	While clang will guarantee this, nothing in the backend will. A non-constant value will now result in an isel error instead of just asserting or crashing due to a bad cast during lowering. llvm-svn: 290532
*	revert commit 290516	Michael Zuckerman	2016-12-25	1	-1/+0
	llvm-svn: 290517
*	Commit try added new empty line	Michael Zuckerman	2016-12-25	1	-0/+1
	llvm-svn: 290516
*	AMDGPU: split ret/noret patterns for global atomics	Jan Vesely	2016-12-23	3	-22/+52
	Differential Revision: https://reviews.llvm.org/D27989 llvm-svn: 290435
*	[AArch64] Cortex-A57 FDIV/FSQRT scheduling fix (W-unit)	Renato Golin	2016-12-23	2	-18/+18
	According to the Cortex-A57 doc, FDIV/FSQRT instructions should use F0 unit (W-unit in AArch64SchedA57.td, the same as cryptography instructions), not F1 unit (X-unit in td, like ASIMD absolute diff accum SABA/UABA). This patch changes FDIV/FSQRT scheduling declarations to use A57UnitW instead of A57UnitX. Also, latencies for those instructions are corrected. Patch by Andrew Zhogin. llvm-svn: 290426
*	Revert r290423 because it broke the sanitizer-x86_64-linux-autoconf buildbot.	Florian Hahn	2016-12-23	1	-5/+0
	llvm-svn: 290425
*	[framelowering] Skip dbg values when getting next/previous instruction.	Florian Hahn	2016-12-23	1	-0/+5
	Summary: In mergeSPUpdates, debug values need to be ignored when getting the previous element, otherwise debug data could have an impact on codegen. In eliminateCallFramePseudoInstr, debug values after the erased element could have an impact on codegen and should be skipped. Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319) Reviewers: mkuper, MatzeB, aprantl Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D27688 llvm-svn: 290423
*	[WebAssembly] Annotate call and load/store immediates.	Dan Gohman	2016-12-23	4	-26/+36
	These will be used to guide the binary encoding of these immediates. llvm-svn: 290412
*	Enable '-Wstring-conversion' and fix some bad asserts that it helped find.	Chandler Carruth	2016-12-23	1	-1/+1
	Notable is the assert in NewGVN which had no effect because of the bug. llvm-svn: 290400
*	[AArch64][CallLowering] Constraint registers on target specific instruction	Quentin Colombet	2016-12-22	1	-1/+12
	The InstructionSelect pass will not look at target specific instructions since they are already selected. As a result, the operands of target specific instructions must be properly constrained, because it is not going to fix them. This fixes invalid register classes on call instruction. llvm-svn: 290377
*	AMDGPU: Invert cmp + select with constant	Matt Arsenault	2016-12-22	1	-0/+19
	Canonicalize a select with a constant to the false side. This enables more instruction shrinking opportunities since an inline immediate can be used for the false side of v_cndmask_b32_e32. This seems to usually be better but causes some code size regressions in some tests. llvm-svn: 290372
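Moving a constant to the false side of a select amounts to inverting the compare and swapping the two select operands. The Python sketch below illustrates that on integer compares only; the function names are hypothetical, constants are modelled as `int` and non-constants as variable names, and floating-point compares (where inversion must account for NaN) are deliberately left out.

```python
import operator

PREDICATES = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "ge": operator.ge,
    "gt": operator.gt, "le": operator.le,
}
# Logical inverse of each integer compare predicate.
INVERSE = {"eq": "ne", "ne": "eq", "lt": "ge", "ge": "lt", "gt": "le", "le": "gt"}


def is_constant(value):
    """Constants are ints; anything else is treated as a variable name."""
    return isinstance(value, int)


def canonicalize_select(pred, true_val, false_val):
    """If only the true side is constant, invert the compare and swap sides."""
    if is_constant(true_val) and not is_constant(false_val):
        return INVERSE[pred], false_val, true_val
    return pred, true_val, false_val


def evaluate_select(pred, lhs, rhs, true_val, false_val, env):
    """Evaluate select(cmp(pred, lhs, rhs), true_val, false_val)."""
    def value(v):
        return v if is_constant(v) else env[v]
    return value(true_val) if PREDICATES[pred](lhs, rhs) else value(false_val)
```

Evaluating the original and the canonicalized form on the same inputs gives the same result, so the rewrite only changes which operand slot holds the constant.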
*	[Hexagon] Add DAG mutations for machine pipeliner	Krzysztof Parzyszek	2016-12-22	2	-0/+9
	llvm-svn: 290366
*	Change the interface of TLI.isMultiStoresCheaperThanBitsMerge.	Wei Mi	2016-12-22	1	-8/+4
	This is for splitMergedValStore in DAG Combine to share the target query interface with similar logic in CodeGenPrepare. Differential Revision: https://reviews.llvm.org/D24707 llvm-svn: 290363
*	[mips] Fix compact branch hazard detection, part 2	Petar Jovanovic	2016-12-22	1	-22/+15
	Follow-up to the D27209 fix; this patch now properly handles a single transient instruction in a basic block. Patch by Aleksandar Beserminji. Differential Revision: https://reviews.llvm.org/D27856 llvm-svn: 290361
*	AMDGPU: Use i16 for i16 shift amount	Matt Arsenault	2016-12-22	2	-8/+10
	llvm-svn: 290351