This patch adds simplified support for tail calls on ARM with XRay instrumentation.
Known issue: when compiled with generic flags `-O3 -g -fxray-instrument -Wall
-std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my
specific flags like --target=armv7-linux-gnueabihf etc.), the following program
    #include <cstdio>
    #include <cassert>
    #include <xray/xray_interface.h>

    [[clang::xray_always_instrument]] void __attribute__((noinline)) fC() {
      std::printf("In fC()\n");
    }

    [[clang::xray_always_instrument]] void __attribute__((noinline)) fB() {
      std::printf("In fB()\n");
      fC();
    }

    [[clang::xray_always_instrument]] void __attribute__((noinline)) fA() {
      std::printf("In fA()\n");
      fB();
    }

    // Avoid infinite recursion in case the logging function is instrumented
    // (and therefore calls the logging function again).
    [[clang::xray_never_instrument]] void simplyPrint(int32_t functionId,
                                                      XRayEntryType xret) {
      printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret));
    }

    int main(int argc, char* argv[]) {
      __xray_set_handler(simplyPrint);
      printf("Patching...\n");
      __xray_patch();
      fA();
      printf("Unpatching...\n");
      __xray_unpatch();
      fA();
      return 0;
    }
gives the following output:
    Patching...
    XRay: functionId=3 type=0.
    In fA()
    XRay: functionId=3 type=1.
    XRay: functionId=2 type=0.
    In fB()
    XRay: functionId=2 type=1.
    XRay: functionId=1 type=0.
    XRay: functionId=1 type=1.
    In fC()
    Unpatching...
    In fA()
    In fB()
    In fC()
So for function fC() the exit sled seems to be invoked too early, before the
function actually exits: it runs before "In fC()" is printed.
Debugging shows that this happens because the printf call in fC is itself
emitted as a tail call. So the exit sled of fC is executed first, and only then
is printf jumped into. It seems we can't do anything about this with the current
approach (i.e. within the simplification described in
https://reviews.llvm.org/D23988).
Differential Revision: https://reviews.llvm.org/D25030
llvm-svn: 284456
r284450.
This is harder to do for vpermilpd, as shuffle combining turns the constant vector into an immediate, since any vpermilpd with a constant vector input can also be encoded with the immediate form.
llvm-svn: 284455
a different type than the shuffle itself.
This is especially important for 32-bit targets with 64-bit shuffle elements.
llvm-svn: 284453
entry has a different type than the shuffle itself.
Summary: This is especially important for 32-bit targets with 64-bit shuffle elements. This is similar to how PSHUFB and VPERMIL handle the same problem.
Reviewers: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25666
llvm-svn: 284451
Differential Revision: https://reviews.llvm.org/D25694
llvm-svn: 284435
load commands that use the MachO::sub_framework_command,
MachO::sub_umbrella_command, MachO::sub_library_command
and MachO::sub_client_command types, which are not used in llvm
libObject code but are used in llvm tool code.
This includes the LC_SUB_FRAMEWORK, LC_SUB_UMBRELLA,
LC_SUB_LIBRARY and LC_SUB_CLIENT load commands.
llvm-svn: 284431
llvm-svn: 284427
The scalar version of this pattern was noted in:
https://reviews.llvm.org/D25485
and fixed with:
https://reviews.llvm.org/rL284395
More refactoring of the constant/splat helpers is needed and will happen in follow-up patches.
Differential Revision: https://reviews.llvm.org/D25685
llvm-svn: 284424
If -coverage is passed, but -g is not, clang populates the PassManager
pipeline with StripSymbols(debugOnly = true).
The StripSymbols pass therefore scans the list of named metadata,
drops !llvm.dbg.cu, but leaves !llvm.gcov and !0 (the compile unit MD)
around. The verifier then runs and finds a CU that is not listed
in !llvm.dbg.cu (as it was previously dropped), which causes a crash.
So, when we strip debug info, check whether there is coverage data
and strip it as well, to avoid leaving dangling metadata around.
Differential Revision: https://reviews.llvm.org/D25689
llvm-svn: 284418
Summary: Debug info should *not* affect code generation. This patch properly handles debug info to make sure the generated code is the same with or without debug info.
Reviewers: davidxl, mzolotukhin, jmolloy
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D25286
llvm-svn: 284415
Summary:
This adds the necessary logic to support relocations to thumb functions in the COFF dynamic linker.
The jumps to function addresses are mostly blx, which requires the ISA selection bit when jumping to a thumb function.
Note: I'm determining if the relocation requires the ISA bit when creating the relocation entries and not when resolving the relocation. I have to do that because I need the ObjectFile and the actual Symbol, which are available only when creating the entries. It would require a gross refactor if I do it otherwise, but I'm okay with doing it if you think it's better.
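For context, the effect of the ISA selection bit can be sketched as follows (an illustrative C++ fragment with a made-up name, not the actual dynamic-linker code):

    #include <cstdint>

    // When the resolved symbol is a Thumb function, the branch target handed to
    // blx/bx must have its low (ISA selection) bit set so the core switches to
    // Thumb state; for ARM-state targets bit 0 stays clear.
    uint32_t resolveBranchTarget(uint32_t symbolAddr, bool targetIsThumb) {
      return targetIsThumb ? (symbolAddr | 1u) : symbolAddr;
    }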
Reviewers: peter.smith, compnerd
Subscribers: rengolin, sas
Differential Revision: https://reviews.llvm.org/D25151
llvm-svn: 284410
llvm-svn: 284406
Summary:
If we are loading an i16 value from a 32-bit memory location, then
we need to be able to truncate the loaded value to i16.
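As a rough source-level analogue (an illustration, not a test from the patch), this is the load-plus-truncate shape in question:

    #include <cstdint>

    // Read a full 32-bit word from memory, then truncate the value to 16 bits.
    uint16_t low16(const uint32_t *p) {
      return static_cast<uint16_t>(*p);  // load i32 + trunc to i16
    }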
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D25198
llvm-svn: 284397
This came up as part of:
https://reviews.llvm.org/D25485
Note that the vector case is missed because ComputeNumSignBits() is deficient for vectors.
llvm-svn: 284395
llvm-svn: 284394
llvm-svn: 284393
The patch checks that the section pointer is properly aligned.
This should be done before the getStringTable() call.
Differential revision: https://reviews.llvm.org/D25462
llvm-svn: 284387
SelectionDAG::getConstantPool will automatically determine an appropriate alignment if one is not specified. It does this by querying the type's preferred alignment. This can end up creating quite a lot of padding when the preferred alignment for vectors is 128.
In optimize-for-size mode, it makes sense to instead query the ABI type alignment which is often smaller and causes less padding.
llvm-svn: 284381
Not all ConstantExprs can be represented by a global variable (for example, most
pointer arithmetic other than addition of a constant), so we can't convert these
values from switch statements to lookup tables.
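For illustration, the transformation being guarded here is roughly the following (a C++ sketch, not code from the patch):

    // A dense switch over constant results...
    int classify(unsigned c) {
      switch (c) {
      case 0: return 10;
      case 1: return 42;
      case 2: return 7;
      case 3: return 11;
      default: return 0;
      }
    }

    // ...can be lowered to a constant table plus a bounds check, but only when
    // every result is representable as a constant initializer.
    static const int ClassifyTable[4] = {10, 42, 7, 11};
    int classifyViaTable(unsigned c) {
      return c < 4 ? ClassifyTable[c] : 0;
    }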
Differential Revision: https://reviews.llvm.org/D25550
llvm-svn: 284379
Summary: The delinearization algorithm did not consider terms which had an extension without a multiply factor, i.e. an identity factor. We lose cases where the size is of char type, where there will be no multiply factor.
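A sketch of the kind of source that hits this case (an illustrative example, assuming the usual situation where the element size of char is 1):

    // With a 1-byte element type the linearized address i * m + j contains no
    // multiply by the element size, the case the delinearization used to miss.
    void zeroRows(unsigned char *A, long n, long m) {
      for (long i = 0; i < n; ++i)
        for (long j = 0; j < m; ++j)
          A[i * m + j] = 0;
    }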
Reviewers: sanjoy, grosser
Subscribers: mzolotukhin, Eugene.Zelenko, llvm-commits, mssimpso, sanjoy, grosser
Differential Revision: https://reviews.llvm.org/D16492
llvm-svn: 284378
retain the original debug location.
CodeGenPrepare knows how to move a zext of a load into the same basic block
where the load lives. The goal is to help ISel match a zero-extending load
instead of two separate instructions.
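For reference, a zero-extending load in source terms (a minimal illustration, not taken from the patch):

    #include <cstdint>

    // On most targets this becomes a single zero-extending load (e.g. movzwl on
    // x86) rather than a 16-bit load followed by a separate zero-extension.
    uint32_t widen(const uint16_t *p) {
      return *p;  // load i16, zext to i32
    }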
CGP attempts to move a zext computation even if it lives in a basic block that
does not post-dominate the load's basic block. That means, the hoisted zext may
be speculated. Preserving the zext location would hurt the debugging experience
and the quality of sample pgo.
With this patch, when moving a zext near to its associated load, CGP no longer
propagates the zext's debug location. Instead, CGP conservatively reuses the
same debug location for the load and the zext.
An alternative approach would be to assign an artificial line-0 location to the
zext. However we don't want to over-use the 'line-0' for this particular case
because it would have a size cost in the line-table section for no additional
benefit.
Differential Revision: https://reviews.llvm.org/D25611
llvm-svn: 284377
With fix: hex-edited the precompiled inputs from other testcases to pass the new checks.
Original commit message:
[Object/ELF] - Check that e_shnum is null when e_shoff is.
Spec says (http://www.sco.com/developers/gabi/1998-04-29/ch4.eheader.html) :
e_shnum
This member holds the number of entries in the section header table. Thus the product of e_shentsize and e_shnum gives the section header table's size in bytes. If a file has no section header table, e_shnum holds the value zero.
Revealed using "id_000037,sig_11,src_000015,op_havoc,rep_8" from PR30540
That was the reason for a crash in lld on an incorrect input file.
Binary reduced using afl-min.
Differential revision: https://reviews.llvm.org/D25090
llvm-svn: 284374
It broke build bot:
http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/908/steps/test-stage1-compiler/logs/stdio
llvm-svn: 284373
Spec says (http://www.sco.com/developers/gabi/1998-04-29/ch4.eheader.html) :
e_shnum
This member holds the number of entries in the section header table. Thus the product of e_shentsize and e_shnum gives the section header table's size in bytes. If a file has no section header table, e_shnum holds the value zero.
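A minimal sketch of the constraint being enforced (an illustration assuming a Linux <elf.h>; not the actual libObject code):

    #include <elf.h>

    // Per the gABI: if a file has no section header table (e_shoff == 0),
    // then e_shnum must be zero as well.
    bool sectionCountIsConsistent(const Elf64_Ehdr &H) {
      return H.e_shoff != 0 || H.e_shnum == 0;
    }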
Revealed using "id_000037,sig_11,src_000015,op_havoc,rep_8" from PR30540
That was the reason for a crash in lld on an incorrect input file.
Binary reduced using afl-min.
Differential revision: https://reviews.llvm.org/D25090
llvm-svn: 284371
If an object has a wrong (too large) string table index and
also an incorrectly large value for the total number of sections,
then the section index passes the check:
  if (Index >= getNumSections())
    return object_error::invalid_section_index;
But the resulting pointer then points far past the end of the file data,
which results in a crash.
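A rough sketch of the stronger validation described above (parameter names are illustrative and assume a Linux <elf.h>; this is not the actual libObject code):

    #include <cstddef>
    #include <cstdint>
    #include <elf.h>

    // Checking the index against a possibly bogus section count is not enough;
    // the computed section header must also lie entirely inside the file buffer.
    const Elf64_Shdr *getSectionChecked(const uint8_t *Base, size_t FileSize,
                                        const Elf64_Ehdr *H, unsigned Index) {
      if (Index >= H->e_shnum)
        return nullptr;  // existing index check
      uint64_t Off = uint64_t(H->e_shoff) + uint64_t(Index) * H->e_shentsize;
      if (Off > FileSize || FileSize - Off < sizeof(Elf64_Shdr))
        return nullptr;  // additionally require the header to fit in the file
      return reinterpret_cast<const Elf64_Shdr *>(Base + Off);
    }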
Differential revision: https://reviews.llvm.org/D25081
llvm-svn: 284369
existing support for vpermt2var.
llvm-svn: 284357
Combining will be added in a future commit.
llvm-svn: 284356
an insert_subvector into a subvector broadcast.
Differential Revision: https://reviews.llvm.org/D25650
llvm-svn: 284353
other vpermi2var intrinsics.
llvm-svn: 284329
llvm-svn: 284328
In theory this could be generalized to move anything where
we prove the operands are available, but that would require
rewriting PRE. As NewGVN will hopefully come soon, and we're
trying to rewrite PRE in terms of NewGVN+MemorySSA, it's probably
not worth spending too much time on it. Fix provided by
Daniel Berlin!
llvm-svn: 284311
computeKnownBits only returns the bits common to every vector element, rather than looking only at the elements that are actually used
llvm-svn: 284308
llvm-svn: 284306
llvm-svn: 284305
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25526
llvm-svn: 284298
The previous names were both misleading (the MachineLegalizer actually
contained the info tables) and inconsistent with the selector & translator (in
having a "Machine" prefix). This should make everything sensible again.
The only functional change is the name of a couple of command-line options.
llvm-svn: 284287
llvm-svn: 284281
llvm-svn: 284280
llvm-svn: 284279
This is a patch to implement pr30640.
When a 64-bit constant has the same hi/lo words, we can use rldimi to copy the low word into the high word of the same register.
This optimization caused a failure in the test case bperm.ll because of a non-optimal heuristic in the function SelectAndParts64. That function chooses either AND or ROTATE to extract bit groups from a register and ORs them together. This optimization lowers the cost of loading the 64-bit constant mask used in the AND method, and so produces a different code sequence. But the ROTATE method is actually better in this test case: with ROTATE the final OR operation can be avoided, since rldimi can insert the rotated bits into the target register directly. So this patch also enhances SelectAndParts64 to prefer the ROTATE method when the two methods have the same cost and there are multiple bit groups that need to be ORed together.
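A worked illustration of what rldimi buys here (a sketch with made-up names; the real selection happens inside the PPC backend):

    #include <cstdint>

    // When the high and low 32-bit words of a 64-bit constant are identical
    // (e.g. 0x0F0F0F0F0F0F0F0F), it is enough to materialize the low word and
    // then replicate it into the high word; a single rldimi (rotate left by 32,
    // then insert) performs the r |= r << 32 step below in one instruction.
    uint64_t materializeSplatWords(uint32_t lo) {
      uint64_t r = lo;
      r |= r << 32;
      return r;
    }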
Differential Revision: https://reviews.llvm.org/D25521
llvm-svn: 284276
Summary:
We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff)
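For illustration (a sketch mirroring the `and x, 0xffffff` case from the summary, not code from the patch):

    #include <cstdint>

    // A 24-bit multiply only reads the low 24 bits of each operand, so these
    // masks become redundant once the multiply is selected as a mul24-style
    // operation.
    uint32_t mul24(uint32_t x, uint32_t y) {
      return (x & 0xffffff) * (y & 0xffffff);
    }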
Reviewers: arsenm, nhaehnle
Subscribers: tony-tye, arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl
Differential Revision: https://reviews.llvm.org/D24672
llvm-svn: 284267
X86. The pass optimizes as a unit the entire wide load + shuffles pattern
produced by interleaved vectorization. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.
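As a source-level picture of the targeted pattern (an illustrative sketch; the pass itself operates on the IR emitted by the loop vectorizer):

    #include <cstdint>

    // Four 64-bit fields interleaved in memory with a stride of 4: vectorizing
    // this loop produces wide loads followed by de-interleaving shuffles, the
    // pattern the new pass rewrites as a single unit.
    struct Quad { uint64_t a, b, c, d; };

    void accumulate(const Quad *p, uint64_t out[4], int n) {
      uint64_t a = 0, b = 0, c = 0, d = 0;
      for (int i = 0; i < n; ++i) {
        a += p[i].a;
        b += p[i].b;
        c += p[i].c;
        d += p[i].d;
      }
      out[0] = a; out[1] = b; out[2] = c; out[3] = d;
    }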
Patch by Farhana Aleen
Differential revision: http://reviews.llvm.org/D24681
llvm-svn: 284260
Summary: The hardware doesn't support this.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25523
llvm-svn: 284257
Patch by Michael LeMay
Differential revision: http://reviews.llvm.org/D19852
llvm-svn: 284254
ORing comparisons to zero.
This change adds transformations such as:
zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
To:
srl(or(ctlz(x), ctlz(y)), log2(bitsize(x)))
This optimisation is beneficial only on the Jaguar architecture, where lzcnt has good reciprocal throughput.
Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it.
For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar.
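To see why the two forms agree for 32-bit operands, here is a standalone demonstration (not code from the patch) using a portable stand-in for lzcnt with lzcnt(0) == 32:

    #include <cassert>
    #include <cstdint>

    // Portable stand-in for a 32-bit lzcnt: returns 32 when the input is zero.
    static unsigned lzcnt32(uint32_t v) {
      unsigned n = 0;
      for (uint32_t m = 0x80000000u; m != 0 && (v & m) == 0; m >>= 1)
        ++n;
      return n;
    }

    bool eitherZeroCompare(uint32_t x, uint32_t y) { return x == 0 || y == 0; }

    bool eitherZeroLzcnt(uint32_t x, uint32_t y) {
      // lzcnt of a 32-bit value is 32 only for zero, so bit 5 (value 32) of the
      // OR of the two counts is set exactly when x == 0 or y == 0.
      return ((lzcnt32(x) | lzcnt32(y)) >> 5) != 0;
    }

    int main() {
      const uint32_t samples[] = {0u, 1u, 2u, 0x80000000u, 0xffffffffu, 12345u};
      for (uint32_t x : samples)
        for (uint32_t y : samples)
          assert(eitherZeroCompare(x, y) == eitherZeroLzcnt(x, y));
      return 0;
    }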
Differential Revision: https://reviews.llvm.org/D23446
llvm-svn: 284248
llvm-svn: 284247
llvm-svn: 284245
llvm-svn: 284244
This test was apparently checking for 2 independent folds, but we have
plenty of tests for those individual folds already. We are lacking
vector tests, however, because we don't have the shift folds for vectors.
llvm-svn: 284243
llvm-svn: 284242