bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	Strip trailing whitespace (NFCI)	Simon Pilgrim	2016-10-18	1	-1/+1
	llvm-svn: 284478
*	[XRay] Support for tail calls for ARM no-Thumb	Dean Michael Berris	2016-10-18	1	-3/+10
	This patch adds simplified support for tail calls on ARM with XRay instrumentation.
	Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall -std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my specific flags like --target=armv7-linux-gnueabihf etc.), the following program
	    #include <cstdio>
	    #include <cassert>
	    #include <xray/xray_interface.h>
	    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() {
	      std::printf("In fC()\n");
	    }
	    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() {
	      std::printf("In fB()\n");
	      fC();
	    }
	    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() {
	      std::printf("In fA()\n");
	      fB();
	    }
	    // Avoid infinite recursion in case the logging function is instrumented (so calls logging
	    // function again).
	    [[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret) {
	      printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret));
	    }
	    int main(int argc, char* argv[]) {
	      __xray_set_handler(simplyPrint);
	      printf("Patching...\n");
	      __xray_patch();
	      fA();
	      printf("Unpatching...\n");
	      __xray_unpatch();
	      fA();
	      return 0;
	    }
	gives the following output:
	    Patching...
	    XRay: functionId=3 type=0.
	    In fA()
	    XRay: functionId=3 type=1.
	    XRay: functionId=2 type=0.
	    In fB()
	    XRay: functionId=2 type=1.
	    XRay: functionId=1 type=0.
	    XRay: functionId=1 type=1.
	    In fC()
	    Unpatching...
	    In fA()
	    In fB()
	    In fC()
	So for function fC() the exit sled seems to be called too much before function exit: before printing In fC().
	Debugging shows that the above happens because printf from fC is also called as a tail call. So first the exit sled of fC is executed, and only then printf is jumped into.
	So it seems we can't do anything about this with the current approach (i.e. within the simplification described in https://reviews.llvm.org/D23988 ).
	Differential Revision: https://reviews.llvm.org/D25030
	llvm-svn: 284456
*	Fix differences in codegen between Linux and Windows toolchains	Mandeep Singh Grang	2016-10-18	2	-4/+7
	Summary:
	There are differences in codegen between Linux and Windows due to:
	1. Using std::sort which uses quicksort which is a non-stable sort.
	2. Iterating over Set data structure where the iteration order is non deterministic.
	Reviewers: arsenm, grosbach, junbuml, zinob, MatzeB
	Subscribers: MatzeB, wdng
	Differential Revision: https://reviews.llvm.org/D25695
	llvm-svn: 284441
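The first cause named above can be reproduced outside LLVM. The following is a minimal sketch, not the code touched by the patch: `Entry`, `sortStable` and `sortTotal` are hypothetical names. It shows two common ways to make the result independent of the standard library's `std::sort` implementation: a stable sort that preserves the input order of equal keys, or a comparator that breaks every tie.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: (priority, name). Only the priority is the sort key.
using Entry = std::pair<int, std::string>;

// Option 1: std::stable_sort keeps equal-priority entries in input order,
// so the result does not depend on the library's sorting algorithm.
std::vector<Entry> sortStable(std::vector<Entry> v) {
  std::stable_sort(v.begin(), v.end(), [](const Entry &a, const Entry &b) {
    return a.first < b.first;
  });
  return v;
}

// Option 2: keep std::sort but make the comparator a total order by
// breaking ties on the name, so no two distinct entries compare equal.
std::vector<Entry> sortTotal(std::vector<Entry> v) {
  std::sort(v.begin(), v.end(), [](const Entry &a, const Entry &b) {
    if (a.first != b.first)
      return a.first < b.first;
    return a.second < b.second;
  });
  return v;
}
```

With the input `{1,"b"}, {0,"z"}, {1,"a"}`, `sortStable` yields z, b, a and `sortTotal` yields z, a, b on every conforming toolchain, whereas a key-only comparator passed to `std::sort` leaves the relative order of "b" and "a" unspecified.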
*	[DAG] use isConstOrConstSplat in ComputeNumSignBits to optimize SRA	Sanjay Patel	2016-10-17	1	-1/+1
	The scalar version of this pattern was noted in: https://reviews.llvm.org/D25485
	and fixed with: https://reviews.llvm.org/rL284395
	More refactoring of the constant/splat helpers is needed and will happen in follow-up patches.
	Differential Revision: https://reviews.llvm.org/D25685
	llvm-svn: 284424
*	[DAG] make isConstOrConstSplat and isConstOrConstSplatFP more accessible; NFC	Sanjay Patel	2016-10-17	2	-38/+34
	As noted in: https://reviews.llvm.org/D25685
	This is the next-to-smallest step needed to enable the ComputeNumSignBits fix in that patch. In a minor attempt to keep some structure, we're pulling the FP helper over along with its integer sibling, but clearly we can and should do more refactoring of the similar helper functions in DAGCombiner and SelectionDAG to simplify and not duplicate functionality.
	llvm-svn: 284421
*	Test commit.	Michael LeMay	2016-10-17	1	-1/+1
	llvm-svn: 284411
*	[DAG] optimize away an arithmetic-right-shift of a 0 or -1 value	Sanjay Patel	2016-10-17	1	-0/+4
	This came up as part of: https://reviews.llvm.org/D25485
	Note that the vector case is missed because ComputeNumSignBits() is deficient for vectors.
	llvm-svn: 284395
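The arithmetic behind this fold can be checked on plain 32-bit integers. This is a sketch under stated assumptions, not the DAGCombiner code: `sra` and `numSignBits` are hypothetical scalar stand-ins for the SRA node and for ComputeNumSignBits(), and `>>` on a negative signed value is assumed to be an arithmetic shift (guaranteed since C++20, and the behaviour of mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of an arithmetic shift right.
// Assumes >> on negative signed values replicates the sign bit.
constexpr int32_t sra(int32_t x, unsigned amt) { return x >> amt; }

// Hypothetical scalar model of "number of sign bits": how many of the top
// bits are copies of the sign bit (counting the sign bit itself).
constexpr unsigned numSignBits(int32_t x) {
  uint32_t u = static_cast<uint32_t>(x);
  uint32_t sign = u >> 31;
  unsigned n = 0;
  for (int bit = 31; bit >= 0; --bit) {
    if (((u >> bit) & 1u) != sign)
      break;
    ++n;
  }
  return n;
}

// A value whose 32 bits are all sign bits is either 0 or -1 ...
static_assert(numSignBits(0) == 32, "0 is all sign bits");
static_assert(numSignBits(-1) == 32, "-1 is all sign bits");
static_assert(numSignBits(1) == 31, "1 has one non-sign bit");

// ... and shifting such a value right arithmetically is a no-op.
static_assert(sra(0, 7) == 0, "sra of 0 is 0");
static_assert(sra(-1, 7) == -1, "sra of -1 is -1");
```

The checks run at compile time, so the block needs no `main` to validate the identity.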
*	[SDAG] Use ABI type alignment for constant pools when optimizing for size	James Molloy	2016-10-17	1	-1/+3
	SelectionDAG::getConstantPool will automatically determine an appropriate alignment if one is not specified. It does this by querying the type's preferred alignment. This can end up creating quite a lot of padding when the preferred alignment for vectors is 128. In optimize-for-size mode, it makes sense to instead query the ABI type alignment which is often smaller and causes less padding.
	llvm-svn: 284381
*	[CodeGenPrepare] When moving a zext near to its associated load, do not retain the original debug location.	Andrea Di Biagio	2016-10-17	1	-0/+8
	CodeGenPrepare knows how to move a zext of a load into the same basic block where the load lives. The goal is to help ISel match a zero-extending load instead of two separated instructions. CGP attempts to move a zext computation even if it lives in a basic block that does not post-dominate the load's basic block. That means, the hoisted zext may be speculated. Preserving the zext location would hurt the debugging experience and the quality of sample pgo.
	With this patch, when moving a zext near to its associated load, CGP no longer propagates the zext's debug location. Instead, CGP conservatively reuses the same debug location for the load and the zext.
	An alternative approach would be to assign an artificial line-0 location to the zext. However we don't want to over-use the 'line-0' for this particular case because it would have a size cost in the line-table section for no additional benefit.
	Differential Revision: https://reviews.llvm.org/D25611
	llvm-svn: 284377
*	[MachineMemOperand] Move synchronization scope and atomic orderings from SDNode to MachineMemOperand, and remove redundant getAtomic* member functions from SelectionDAG.	Konstantin Zhuravlyov	2016-10-15	6	-74/+57
	Differential Revision: https://reviews.llvm.org/D24577
	llvm-svn: 284312
*	GlobalISel: rename legalizer components to match others.	Tim Northover	2016-10-14	7	-80/+78
	The previous names were both misleading (the MachineLegalizer actually contained the info tables) and inconsistent with the selector & translator (in having a "Machine" prefix). This should make everything sensible again.
	The only functional change is the name of a couple of command-line options.
	llvm-svn: 284287
*	[DAG] avoid creating illegal node when transforming negated shifted sign bit	Sanjay Patel	2016-10-14	1	-2/+3
	Eli noted this potential bug in the post-commit thread for: https://reviews.llvm.org/rL284239
	...but I'm not sure how to trigger it, so there's no test case yet.
	llvm-svn: 284268
*	TargetLowering: Add SimplifyDemandedBits() helper to TargetLoweringOpt	Tom Stellard	2016-10-14	1	-2/+55
	Summary:
	The main purpose of this new helper is to enable simplifying operations that have multiple uses. SimplifyDemandedBits does not handle multiple uses currently, and this new function makes it possible to optimize:
	    and v1, v0, 0xffffff
	    mul24 v2, v1, v1 ; Multiply ignoring high 8-bits.
	To:
	    mul24 v2, v0, v0
	Where before this would not be optimized, because v1 has multiple uses.
	Reviewers: bogner, arsenm
	Subscribers: nhaehnle, wdng, llvm-commits
	Differential Revision: https://reviews.llvm.org/D24964
	llvm-svn: 284266
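Why the `and` is redundant in the example above can be seen with a scalar model. The sketch below is an assumption-laden illustration, not AMDGPU or TargetLowering code: `mul24` here is a hypothetical function that multiplies the low 24 bits of each operand and ignores the high 8 bits, which is how the message's comment describes the instruction.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 24-bit multiply: only the low 24 bits of each
// operand are demanded; the high 8 bits are ignored. The product is
// truncated to 32 bits.
constexpr uint32_t mul24(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>(static_cast<uint64_t>(a & 0xffffffu) *
                               static_cast<uint64_t>(b & 0xffffffu));
}

// "and v1, v0, 0xffffff" followed by "mul24 v2, v1, v1".
constexpr uint32_t beforeFold(uint32_t v0) {
  uint32_t v1 = v0 & 0xffffffu;
  return mul24(v1, v1);
}

// "mul24 v2, v0, v0": the mask clears only bits the multiply never reads.
constexpr uint32_t afterFold(uint32_t v0) { return mul24(v0, v0); }

static_assert(beforeFold(0xAB123456u) == afterFold(0xAB123456u),
              "masking the high 8 bits does not change the result");
static_assert(beforeFold(0xFF000001u) == 1u, "only the low 24 bits matter");
```

The mask can be dropped for this user even though `v1` feeds both operands, which is the multiple-use situation the message says the new helper is meant to handle.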
*	Add a pass to optimize patterns of vectorized interleaved memory accesses for	David L Kreitzer	2016-10-14	1	-0/+5
	X86.
	The pass optimizes as a unit the entire wide load + shuffles pattern produced by interleaved vectorization. This initial patch optimizes one pattern (64-bit elements interleaved by a factor of 4). Future patches will generalize to additional patterns.
	Patch by Farhana Aleen
	Differential revision: http://reviews.llvm.org/D24681
	llvm-svn: 284260
*	[safestack] Use non-thread-local unsafe stack pointer for Contiki OS	David L Kreitzer	2016-10-14	2	-50/+34
	Patch by Michael LeMay
	Differential revision: http://reviews.llvm.org/D19852
	llvm-svn: 284254
*	Revert "In preparation for removing getNameWithPrefix off of	Eric Christopher	2016-10-14	1	-8/+1
	TargetMachine," as it's causing sanitizer/memory issues until I can track down this set.
	This reverts commit r284203
	llvm-svn: 284252
*	[DAG] add folds for negated shifted sign bit	Sanjay Patel	2016-10-14	1	-0/+13
	The same folds exist in InstCombine already.
	This came up as part of: https://reviews.llvm.org/D25485
	llvm-svn: 284239
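The message does not spell out the folds. A plausible reading, stated here as an assumption, is the pair of identities on the shifted sign bit: negating a logical shift by bitwidth-1 gives the arithmetic shift, and vice versa. The sketch below checks that arithmetic on 32-bit scalars; `srl31`, `sra31` and `neg` are hypothetical helper names, and the signed `>>` is assumed to be arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Logical shift right by 31: yields 1 if the sign bit is set, else 0.
constexpr uint32_t srl31(uint32_t x) { return x >> 31; }

// Arithmetic shift right by 31: yields all-ones if the sign bit is set,
// else 0. Assumes two's-complement conversion and arithmetic >> on
// negative values.
constexpr uint32_t sra31(uint32_t x) {
  return static_cast<uint32_t>(static_cast<int32_t>(x) >> 31);
}

// Two's-complement negation (sub 0, x) on unsigned, which wraps.
constexpr uint32_t neg(uint32_t x) { return 0u - x; }

// Assumed fold 1: (sub 0, (srl x, 31)) == (sra x, 31)
static_assert(neg(srl31(0x80000000u)) == sra31(0x80000000u), "");
static_assert(neg(srl31(5u)) == sra31(5u), "");

// Assumed fold 2: (sub 0, (sra x, 31)) == (srl x, 31)
static_assert(neg(sra31(0x80000000u)) == srl31(0x80000000u), "");
static_assert(neg(sra31(5u)) == srl31(5u), "");
```

Either way the negation disappears and a single shift remains, which is why such a fold is profitable.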
*	Fix use-after-frees	Nicolai Haehnle	2016-10-14	1	-2/+2
	Extracted from D25313, as suggested by Justin Bogner.
	llvm-svn: 284220
*	[DAGCombiner] Teach createBuildVecShuffle to handle cases where input vectors are less than half of the output vector size.	Craig Topper	2016-10-14	1	-5/+9
	This will be needed by a future commit to support sign/zero extending from v8i8 to v8i64 which requires a sign/zero_extend_vector_inreg to be created which requires v8i8 to be concatenated up to v64i8 and goes through this code.
	llvm-svn: 284204
*	In preparation for removing getNameWithPrefix off of TargetMachine,	Eric Christopher	2016-10-14	1	-1/+8
	sink the current behavior into the callers and sink TargetMachine::getNameWithPrefix into TargetMachine::getSymbol.
	llvm-svn: 284203
*	Tidy the calls to getCurrentSection().first -> getCurrentSectionOnly to help	Eric Christopher	2016-10-14	1	-1/+1
	readability a bit.
	llvm-svn: 284202
*	[DAG] hoist DL(N) and fix formatting; NFC	Sanjay Patel	2016-10-13	1	-24/+31
	llvm-svn: 284170
*	LegalizeDAG: Implement PROMOTE for ISD::BITREVERSE	Tom Stellard	2016-10-13	1	-1/+2
	Summary:
	This operation is promoted the same way as ISD::BSWAP. This will prevent a regression in test/Target/AMDGPU/bitreverse.ll when i16 support is implemented.
	Reviewers: bogner, hfinkel
	Subscribers: hfinkel, wdng, llvm-commits
	Differential Revision: https://reviews.llvm.org/D25202
	llvm-svn: 284163
*	[safestack] Reapply r283248 after moving X86-targeted SafeStack tests into	David L Kreitzer	2016-10-13	1	-7/+6
	the X86 subdirectory.
	Original commit message:
	Requires a valid TargetMachine to be passed to the SafeStack pass.
	Patch by Michael LeMay
	Differential revision: http://reviews.llvm.org/D24896
	llvm-svn: 284161
*	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."	Nirav Dave	2016-10-13	2	-121/+272
	This reverts commit r284151 which appears to be triggering LTO failures on Hexagon
	llvm-svn: 284157
*	[RAGreedy] Empty live-ranges always succeed in last chance recoloring.	Quentin Colombet	2016-10-13	1	-1/+12
	Relax the constraint for empty live-ranges while doing last chance recoloring. Indeed, those live-ranges do not need an actual color to be found for the recoloring to work. Empty live-ranges may happen as a result of splitting/spilling.
	Unfortunately no test case for in-tree targets.
	llvm-svn: 284152
*	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.	Nirav Dave	2016-10-13	2	-272/+121
	Retrying after upstream changes.
	Simplify Consecutive Merge Store Candidate Search
	Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic.
	When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited.
	This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions).
	Additional Minor Changes:
	1. Finishes removing unused AliasLoad code
	2. Unifies the chain aggregation in the merged stores across code paths
	3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
	4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests.
	This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
	Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations
	Noteworthy tests:
	CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right.
	CodeGen/AArch64/arm64-memset-inline.ll -
	CodeGen/AArch64/arm64-stur.ll -
	CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently.
	CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself?
	CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding.
	CodeGen/X86/dag-merge-fast-accesses.ll -
	CodeGen/X86/MergeConsecutiveStores.ll -
	CodeGen/X86/stores-merging.ll -
	CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores
	CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls
	CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores
	CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are.
	CodeGen/X86/vector-idiv.ll -
	CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing.
	CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged.
	CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll - This test appears to work but no longer exhibits the spill behavior.
	Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
	Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel
	Differential Revision: https://reviews.llvm.org/D14834
	llvm-svn: 284151
*	[DAGCombiner] Add vector support to (mul (shl X, Y), Z) -> (shl (mul X, Z), Y) style combines	Simon Pilgrim	2016-10-13	1	-7/+6
	llvm-svn: 284122
*	[DAGCombiner] Add vector support to C2-(A+C1) -> (C2-C1)-A folding	Simon Pilgrim	2016-10-13	1	-5/+5
	llvm-svn: 284117
*	[DAGCombiner] Add vector support to (sub -1, x) -> (xor x, -1) canonicalization	Simon Pilgrim	2016-10-13	1	-1/+12
	Improves commutation potential
	llvm-svn: 284113
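The three DAGCombiner entries above name algebraic rewrites that hold in wrapping (modulo 2^32) arithmetic. The sketch below checks the scalar form of each on `uint32_t`; the function names are hypothetical, it says nothing about how the vector (splat/constant) matching is implemented in the patches, and the shift amount is assumed to be less than 32.

```cpp
#include <cassert>
#include <cstdint>

// (mul (shl X, Y), Z) -> (shl (mul X, Z), Y), assuming Y < 32.
constexpr uint32_t mulOfShl(uint32_t x, unsigned y, uint32_t z) {
  return (x << y) * z;
}
constexpr uint32_t shlOfMul(uint32_t x, unsigned y, uint32_t z) {
  return (x * z) << y;
}

// C2-(A+C1) -> (C2-C1)-A; with constant C1 and C2 the right-hand side
// needs only one subtraction at run time.
constexpr uint32_t subOfAdd(uint32_t a, uint32_t c1, uint32_t c2) {
  return c2 - (a + c1);
}
constexpr uint32_t subFolded(uint32_t a, uint32_t c1, uint32_t c2) {
  return (c2 - c1) - a;
}

// (sub -1, x) -> (xor x, -1); both compute the bitwise complement of x.
constexpr uint32_t subFromAllOnes(uint32_t x) { return 0xFFFFFFFFu - x; }
constexpr uint32_t xorAllOnes(uint32_t x) { return x ^ 0xFFFFFFFFu; }

static_assert(mulOfShl(0x12345u, 7, 99u) == shlOfMul(0x12345u, 7, 99u), "");
static_assert(subOfAdd(1000u, 3u, 10u) == subFolded(1000u, 3u, 10u), "");
static_assert(subFromAllOnes(0xDEADBEEFu) == xorAllOnes(0xDEADBEEFu), "");
```

The third rewrite replaces a non-commutative `sub` with a commutative `xor`, which matches the "Improves commutation potential" remark.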
*	Handle lane masks in LivePhysRegs when adding live-ins	Krzysztof Parzyszek	2016-10-12	1	-5/+12
	Differential Revision: https://reviews.llvm.org/D25533
	llvm-svn: 284076
*	Create llvm.addressofreturnaddress intrinsic	Albert Gutowski	2016-10-12	4	-2/+14
	Summary:
	We need a new LLVM intrinsic to implement MS _AddressOfReturnAddress builtin on 64-bit Windows.
	Reviewers: majnemer, rnk
	Subscribers: llvm-commits
	Differential Revision: https://reviews.llvm.org/D25293
	llvm-svn: 284061
*	[MIRParser] Parse lane masks for register live-ins	Krzysztof Parzyszek	2016-10-12	4	-24/+64
	Differential Revision: https://reviews.llvm.org/D25530
	llvm-svn: 284052
*	Do not remove implicit defs in BranchFolder	Krzysztof Parzyszek	2016-10-12	2	-55/+0
	Branch folder removes implicit defs if they are the only non-branching instructions in a block, and the branches do not use the defined registers. The problem is that in some cases these implicit defs are required for the liveness information to be correct.
	Differential Revision: https://reviews.llvm.org/D25478
	llvm-svn: 284036
*	BranchRelaxation: Unique live ins when creating block	Matt Arsenault	2016-10-12	1	-0/+1
	llvm-svn: 284018
*	[DAGCombiner] Update most ADD combines to support general vector combines	Simon Pilgrim	2016-10-12	1	-12/+54
	Add a number of helper functions to match scalar or vector equivalent constant/splat values to allow most of the combine patterns to be used by vectors.
	Differential Revision: https://reviews.llvm.org/D25374
	llvm-svn: 284015
*	[DAGCombiner] Do not remove the load of stored values when optimizations are disabled	Konstantin Zhuravlyov	2016-10-12	1	-1/+2
	This combiner breaks debug experience and should not be run when optimizations are disabled.
	For example:
	    int main() {
	      int j = 0;
	      j += 2;
	      if (j == 2)
	        return 0;
	      return 5;
	    }
	When debugging this code compiled in /O0, it should be valid to break at line "j+=2;" and edit the value of j. It should change the return value of the function.
	Differential Revision: https://reviews.llvm.org/D19268
	llvm-svn: 284014
*	[DAG] Fix crash in build_vector -> vector_shuffle combine	Michael Kuperstein	2016-10-11	1	-0/+5
	Fixes a crash in the build_vector -> vector_shuffle combine when the first vector input is twice as wide as the output, and the second input vector is even wider.
	llvm-svn: 283953
*	MIRParser: allow types on registers with a RegBank.	Tim Northover	2016-10-11	1	-1/+2
	This fixes some GlobalISel regression tests.
	llvm-svn: 283936
*	Codegen: Tail-duplicate during placement.	Kyle Butt	2016-10-11	3	-41/+329
	The tail duplication pass uses an assumed layout when making duplication decisions. This is fine, but passes up duplication opportunities that may arise when blocks are outlined. Because we want the updated CFG to affect subsequent placement decisions, this change must occur during placement.
	In order to achieve this goal, TailDuplicationPass is split into a utility class, TailDuplicator, and the pass itself. The pass delegates nearly everything to the TailDuplicator object, except for looping over the blocks in a function. This allows the same code to be used for tail duplication in both places.
	This change, in concert with outlining optional branches, allows triangle shaped code to perform much better, especially when the taken/untaken branches are correlated, as it creates a second spine when the tests are small enough.
	Issue from previous rollback fixed, and a new test was added for that case as well. Issue was worklist/scheduling/taildup issue in layout.
	Issue from 2nd rollback fixed, with 2 additional tests. Issue was tail merging/loop info/tail-duplication causing issue with loops that share a header block.
	Issue with early tail-duplication of blocks that branch to a fallthrough predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll
	Differential revision: https://reviews.llvm.org/D18226
	llvm-svn: 283934
*	Silence -Wunused-but-set-variable warning	Arnold Schwaighofer	2016-10-11	1	-0/+1
	llvm-svn: 283927
*	[DAG] add fold for masked negated sign-extended bool	Sanjay Patel	2016-10-11	1	-5/+11
	This enhances the fold added with: https://reviews.llvm.org/rL283900
	llvm-svn: 283905
*	[DAG] add fold for masked negated extended bool	Sanjay Patel	2016-10-11	1	-2/+15
	The non-obvious motivation for adding this fold (which already happens in InstCombine) is that we want to canonicalize IR towards select instructions and canonicalize DAG nodes towards boolean math. So we need to recreate some folds in the DAG to handle that change in direction.
	An interesting implementation difference for cases like this is that InstCombine generally works top-down while the DAG goes bottom-up. That means we need to detect different patterns. In this case, the SimplifyDemandedBits fold prevents us from performing a zext to sext fold that would then be recognized as a negation of a sext.
	llvm-svn: 283900
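The titles of this commit and of r283905 describe a "masked negated (sign-)extended bool" without writing the pattern out. One reading, given here as an assumption rather than as the exact pattern matched in the patches, is that negating an extended i1 and then masking with 1 gives back the zero-extended bool. The sketch checks that arithmetic on 32-bit scalars; `zext`, `sext` and `maskedNeg` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extend a bool to 32 bits: false -> 0, true -> 1.
constexpr uint32_t zext(bool b) { return b ? 1u : 0u; }

// Sign-extend a bool (i1) to 32 bits: false -> 0, true -> all ones.
constexpr uint32_t sext(bool b) { return b ? 0xFFFFFFFFu : 0u; }

// (and (sub 0, x), 1): negate, then keep only the low bit.
constexpr uint32_t maskedNeg(uint32_t x) { return (0u - x) & 1u; }

// Negation never changes the low bit, so for either extension of a bool
// the masked negation is just the zero-extended bool.
static_assert(maskedNeg(zext(false)) == zext(false), "");
static_assert(maskedNeg(zext(true)) == zext(true), "");
static_assert(maskedNeg(sext(false)) == zext(false), "");
static_assert(maskedNeg(sext(true)) == zext(true), "");
```

Under this reading both the `sub` and the `and` are removable, leaving a single extension node.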
*	[DAG] simplify logic; NFC	Sanjay Patel	2016-10-11	1	-8/+6
	llvm-svn: 283885
*	[DAG] hoist DL(N) and fix formatting; NFC	Sanjay Patel	2016-10-11	1	-25/+32
	llvm-svn: 283884
*	[DAG] fix formatting; NFC	Sanjay Patel	2016-10-11	1	-72/+68
	llvm-svn: 283878
*	Fix formatting in findRegisterUseOperandIdx. NFC.	Fraser Cormack	2016-10-11	1	-7/+5
	llvm-svn: 283860
*	Revert "Codegen: Tail-duplicate during placement."	Daniel Jasper	2016-10-11	3	-330/+41
	This reverts commit r283842.
	test/CodeGen/X86/tail-dup-repeat.ll causes an llc crash with our internal testing. I'll share a link with you.
	llvm-svn: 283857
*	Fix warning; NFC	Matthias Braun	2016-10-11	1	-2/+2
	llvm-svn: 283851
*	MIRParser: generic register operands with types	Matthias Braun	2016-10-11	2	-2/+3
	This should fix the fallout of r283848.
	llvm-svn: 283850