bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[WinEH] Make FuncletLayout more robust against catchret	David Majnemer	2015-10-01	1	-2/+59
\| \| \| \| \| \| \| \| \|	Catchret transfers control from a catch funclet to an earlier funclet. However, it is not completely clear which funclet the catchret target is part of. Make this clear by stapling the catchret target's funclet membership onto the CATCHRET SDAG node. llvm-svn: 249052
*	[SystemZ] Add some generic (floating point support) load instructions.	Jonas Paulsson	2015-10-01	8	-14/+95
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add generic instructions for load complement, load negative and load positive for fp32 and fp64, and let isel prefer them. They do not clobber CC, and so give scheduler more freedom. SystemZElimCompare pass will convert them when it can to the CC-setting variants. Regression tests updated to expect the new opcodes in places where the old ones were used. New test case SystemZ/fp-cmp-05.ll checks that SystemZElimCompare.cpp can handle the new opcodes. README.txt updated (bullet removed). Note that fp128 is not yet handled, because it is relatively rare, and is a bit trickier, because of the fact that l.dfr would operate on the sign bit of one of the subregisters of a fp128, but we would not want to copy the other sub-reg in case src and dst regs are not the same. Reviewed by Ulrich Weigand. llvm-svn: 249046
*	Fix printing of 64 bit values and make test more strict.	Rafael Espindola	2015-10-01	1	-12/+26
\| \| \| \|	llvm-svn: 249043
*	AMDGPU: Add MEM_RAT STORE_TYPED.	Tom Stellard	2015-10-01	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \|	v2: Add test (Matt). Fix capitalization of isEOP (Matt). Move pattern to class parameter (Matt). Make the instruction available to Cayman (Matt). Change name from MEM_RAT WRITE_TYPED to MEM_RAT STORE_TYPED. Patch by: Zoltan Gilian llvm-svn: 249042
*	Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"	NAKAMURA Takumi	2015-10-01	3	-107/+4
\| \| \| \| \| \|	It broke; LLVM :: CodeGen__Generic__2009-11-16-BadKillsCrash.ll llvm-svn: 249032
*	[InstCombine] Remove trivially empty lifetime start/end ranges.	Arnaud A. de Grandmaison	2015-10-01	1	-0/+93
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Some passes may open up opportunities for optimizations, leaving empty lifetime start/end ranges. For example, with the following code: void foo(char *, char *); void bar(int Size, bool flag) { for (int i = 0; i < Size; ++i) { char text[1]; char buff[1]; if (flag) foo(text, buff); // BBFoo } } the loop unswitch pass will create 2 versions of the loop, one with flag==true, and the other one with flag==false, but always leaving the BBFoo basic block, with lifetime ranges covering the scope of the for loop. Simplify CFG will then remove BBFoo in the case where flag==false, but will leave the lifetime markers. This patch teaches InstCombine to remove trivially empty lifetime marker ranges, that is ranges ending right after they were started (ignoring debug info or other lifetime markers in the range). This fixes PR24598: excessive compile time after r234581. Reviewers: reames, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13305 llvm-svn: 249018
*	[SystemZ] Add assembly instructions for obtaining clock values as well as CPU features	Ulrich Weigand	2015-10-01	2	-1/+129
\| \| \| \| \| \| \| \| \| \| \|	Provide assembler support for STCK, STCKF, STCKE, and STFLE. Author: joncmu Differential Revision: http://reviews.llvm.org/D13299 llvm-svn: 249015
*	[mips][microMIPS] Implement CACHEE, WRPGPR and WSBH instructions	Zoran Jovanovic	2015-10-01	6	-0/+23
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D10337 llvm-svn: 249004
*	[ARM] More care with Thumb1 writeback in ARMLoadStoreOptimizer	Scott Douglass	2015-10-01	1	-0/+27
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D13240 llvm-svn: 249002
*	[NaryReassociate] SeenExprs records WeakVH	Jingyue Wu	2015-10-01	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The instructions SeenExprs records may be deleted during rewriting. FindClosestMatchingDominator should ignore these deleted instructions. Fixes PR24301. Reviewers: grosser Subscribers: grosser, llvm-commits Differential Revision: http://reviews.llvm.org/D13315 llvm-svn: 248983
*	Update sample profile propagation algorithm.	Dehao Chen	2015-10-01	4	-109/+207
\| \| \| \| \| \|	http://reviews.llvm.org/D13218 llvm-svn: 248968
*	[X86] Don't custom-lower vNi32 uint_to_fp when unsafe-fp-math.	Ahmed Bougacha	2015-10-01	1	-0/+130
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The custom code produces incorrect results if later reassociated. Since r221657, on x86, vNi32 uitofp is lowered using an optimized sequence: movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...] pand %xmm0, %xmm1 por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...] psrld $16, %xmm0 por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...] addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...] addps %xmm1, %xmm0 Since r240361, the machine combiner opportunistically reassociates 2-instruction sequences (with -ffast-math). In the new code sequence, the ADDPS' are eligible. In isolation, for simple examples (without reassociable users), this makes no performance difference (the goal being to enable reassociation of longer chains). In the trivial example (just one uitofp), the reassociation doesn't happen, because (I think) it would require the emission of a separate movaps for a constantpool load (instead of folding it into addps). However, when we have multiple uitofp sequences, and the constantpool loads are CSE'd earlier, the machine combiner can do the reassociation. When the ADDPS' are reassociated, the resulting sequence isn't correct anymore, as we'd be adding large (2^39) constants with comparatively smaller values (~2^23). Given that two of the three inputs are powers of 2 larger than 2^16, and that ulp(2^39) == 2^(39-24) == 2^15, the reassociated chain will produce 0 for any input in [0, 2^14[. In my testing, it also produces wrong results for 99.5% of [0, 2^32[. Avoid this by disabling the new lowering when -ffast-math. It does mean that we'll get slower code than without it, but at least we won't get egregiously incorrect code. One might argue that, considering -ffast-math is all but meaningless, uitofp producing wrong results isn't a compiler bug. But it really is. Fixes PR24512. ...though this is really more of a workaround. Ideally, we'd have some sort of Machine FMF, but that's a problem that's not worth tackling until we do more with machine IR. llvm-svn: 248965
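The rounding argument in the message above can be checked numerically. The following is a minimal Python sketch, not the LLVM lowering itself: it emulates single-precision arithmetic with `struct`, builds the two partial floats from the constants quoted in the message, and compares the original add order with one possible reassociated order. The function names are hypothetical, and the choice of `hi + (lo + magic)` as the reassociated form is an assumption for illustration.

```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def bits_to_f32(bits):
    """Reinterpret a 32-bit integer pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# -(2^39 + 2^23), the "float -5.497642e+11" constant from the message.
MAGIC = f32(-(2.0 ** 39 + 2.0 ** 23))


def uitofp_parts(x):
    """Split a 32-bit unsigned integer into the two partial floats."""
    lo = bits_to_f32((x & 0xFFFF) | 0x4B000000)  # equals 2^23 + low 16 bits
    hi = bits_to_f32((x >> 16) | 0x53000000)     # equals 2^39 + high 16 bits * 2^16
    return lo, hi


def uitofp_original(x):
    """Add order of the quoted sequence: (hi + MAGIC) + lo."""
    lo, hi = uitofp_parts(x)
    return f32(f32(hi + MAGIC) + lo)


def uitofp_reassociated(x):
    """One possible reassociation (assumed here): hi + (lo + MAGIC)."""
    lo, hi = uitofp_parts(x)
    return f32(hi + f32(lo + MAGIC))


if __name__ == "__main__":
    for value in (1, 12345, 2 ** 14 - 1):
        print(value, uitofp_original(value), uitofp_reassociated(value))
```

In the original order the first add is exact, so only the final add rounds. In the reassociated order the small term is added to a value of magnitude near 2^39 first, where the spacing between floats is 2^15, so inputs below 2^14 are rounded away entirely.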
*	[WinEH] Emit int3 after noreturn calls on Win64	Reid Kleckner	2015-09-30	3	-4/+107
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Win64 unwinder disassembles forwards from each PC to try to determine if this PC is in an epilogue. If so, it skips calling the EH personality function for that frame. Typically, this means you cannot catch an exception in the same frame that you threw it, because 'throw' calls a noreturn runtime function. Previously we avoided this problem with the TrapUnreachable TargetOption, but that's a much bigger hammer than we need. All we need is a 1 byte non-epilogue instruction right after the call. Instead, what we got was an unconditional branch to a shared block containing the ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which added TrapUnreachable, and replaces it with something better. The new code pattern matches for invoke/call followed by unreachable and inserts an int3 into the DAG. To be 100% watertight, we would need to insert SEH_Epilogue instructions into all basic blocks ending in a call with no terminators or successors, but in practice this is unlikely to come up. llvm-svn: 248959
*	[x86] enable machine combiner reassociations for 256-bit vector logical integer insts	Sanjay Patel	2015-09-30	1	-2/+46
\| \| \| \| \| \|	llvm-svn: 248955
*	[AArch64] Remove an unnecessary run line and other cleanup. NFC.	Chad Rosier	2015-09-30	2	-99/+95
\| \| \| \| \| \| \|	Unscaled load/store combining has been enabled since the initial ARM64 port. No need for a redundant run. Also, add CHECK-LABEL directives. llvm-svn: 248945
*	[SLP] Don't vectorize loads of non-packed types (like i1, i2).	Michael Zolotukhin	2015-09-30	1	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Given an array of i2 elements, 4 consecutive scalar loads will be lowered to i8-sized loads and thus will access 4 consecutive bytes in memory. If we vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in memory. Hence, we should prohibit vectorization in such cases. PS: Initial patch was proposed by Arnold. Reviewers: aschwaighofer, nadav, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13277 llvm-svn: 248943
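The byte counts in the message above can be reproduced with a little arithmetic. This is a Python sketch with hypothetical helper names, assuming (as the message describes) that each scalar is widened to whole bytes while a vector load covers only the packed bits rounded up to bytes.

```python
import math


def bytes_for_bits(bits):
    """Whole bytes needed to hold a value of the given bit width."""
    return math.ceil(bits / 8)


def scalar_loads_bytes(elem_bits, count):
    """Memory touched by `count` consecutive scalar loads.

    Each scalar is widened to whole bytes, so elements sit one per byte slot.
    """
    return count * bytes_for_bits(elem_bits)


def vector_load_bytes(elem_bits, count):
    """Memory touched by one vector load of `count` bit-packed elements."""
    return bytes_for_bits(elem_bits * count)


def same_footprint(elem_bits, count):
    """True when the scalar and vector forms read the same number of bytes."""
    return scalar_loads_bytes(elem_bits, count) == vector_load_bytes(elem_bits, count)


if __name__ == "__main__":
    print(scalar_loads_bytes(2, 4), vector_load_bytes(2, 4))  # i2: footprints differ
    print(scalar_loads_bytes(8, 4), vector_load_bytes(8, 4))  # i8: footprints match
```

For i2 the two forms disagree (4 bytes versus 1), which is the mismatch the patch guards against. For byte-sized types such as i8 they agree.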
*	Move dw_op_minus test to DebugInfo/X86.	Evgeniy Stepanov	2015-09-30	1	-0/+0
\| \| \| \| \| \| \| \|	The test requires X86 target support, and checks the actual debug info contents, including register numbers which would be different on other platforms. llvm-svn: 248938
*	Fix debug info with SafeStack.	Evgeniy Stepanov	2015-09-30	2	-0/+167
\| \| \| \|	llvm-svn: 248933
*	[AArch64] Remove an unnecessary restriction on pre-index instructions.	Chad Rosier	2015-09-30	8	-20/+205
\| \| \| \| \| \| \| \|	Previously, the index was constrained to the size of the memory operation for no apparent reason. This change removes that constraint so that we can form pre-index instructions with any valid offset. llvm-svn: 248931
*	[PowerPC] Disable shrink wrapping	Hal Finkel	2015-09-30	1	-0/+1
\| \| \| \| \| \| \|	Shrink wrapping is causing a self-hosting failure on PPC64/Linux. Disable for now until the problem can be fixed. llvm-svn: 248924
*	SLPVectorizer: add a test to check if the minimum region size works.	Erik Eckstein	2015-09-30	1	-1/+28
\| \| \| \| \| \|	This is an addition to rL248917. llvm-svn: 248923
*	[ARM] Support for ARMv6-Z / ARMv6-ZK missing	Artyom Skrobov	2015-09-30	1	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As Richard Barton observed at http://reviews.llvm.org/D12937#inline-107121 TargetParser in LLVM has insufficient support for ARMv6Z and ARMv6ZK. In particular, there were no tests for TrustZone being supported in these architectures. The patch clears a FIXME: left by Saleem Abdulrasool in r201471, and fixes his test case which hadn't really been testing what it was claiming to test. Differential Revision: http://reviews.llvm.org/D13236 llvm-svn: 248921
*	SLPVectorizer: limit the scheduling region size per basic block.	Erik Eckstein	2015-09-30	1	-0/+66
\| \| \| \| \| \| \| \| \| \| \|	Usually large blocks are not a problem. But if a large block (> 10k instructions) contains many (potential) chains of vector instructions, and those chains are spread over a wide range of instructions, then scheduling becomes a compile time problem. This change introduces a limit for the accumulated scheduling region size of a block. For real-world functions this limit will never be exceeded (it's about 10x larger than the maximum value seen in the test-suite and external test suite). llvm-svn: 248917
*	[InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin shuffles if the shuffle mask is constant.	Andrea Di Biagio	2015-09-30	1	-0/+267
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a builtin shuffle if the mask is constant. Converting byte shuffle intrinsic calls to builtin shuffles can help finding more opportunities for combining shuffles later on in selection dag. We may end up with byte shuffles with constant masks as the result of inlining. Differential Revision: http://reviews.llvm.org/D13252 llvm-svn: 248913
*	[ARM][NEON] Use address space in vld([1234]\|[234]lane) and vst([1234]\|[234]lane) instructions	Jeroen Ketema	2015-09-30	42	-502/+662
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234], vst[234]lane ARM neon intrinsics and associates an address space with the pointer that these intrinsics take. This changes, e.g., <2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32) to <2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32) This change ensures that address spaces are fully taken into account in the ARM target during lowering of interleaved loads and stores. Differential Revision: http://reviews.llvm.org/D12985 llvm-svn: 248887
*	[X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions	Simon Pilgrim	2015-09-30	7	-2/+1234
\| \| \| \| \| \| \| \| \| \| \| \|	The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes. Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases. Differential Revision: http://reviews.llvm.org/D8690 llvm-svn: 248878
*	Add unittest for new sample profile format.	Dehao Chen	2015-09-30	2	-0/+115
\| \| \| \| \| \|	http://reviews.llvm.org/D13145 llvm-svn: 248870
*	http://reviews.llvm.org/D13145	Dehao Chen	2015-09-30	13	-68/+68
\| \| \| \| \| \|	Support hierarchical sample profile format. llvm-svn: 248865
*	[safestack] Fix a stupid mix-up in the direct-tls code path.	Evgeniy Stepanov	2015-09-30	2	-6/+38
\| \| \| \|	llvm-svn: 248863
*	[WinEH] Setup RBP correctly in Win64 funclet prologues	Reid Kleckner	2015-09-29	4	-21/+30
\| \| \| \| \| \| \|	Previously local variable captures just didn't work in 64-bit. Now we can access local variables more or less correctly. llvm-svn: 248857
*	[WinEH] Ensure that funclets obey the x64 ABI	David Majnemer	2015-09-29	3	-12/+18
\| \| \| \| \| \| \| \| \| \|	The x64 ABI requires that epilogues do not contain code other than stack adjustments and some limited control flow. However, we'd insert code to initialize the return address after stack adjustments. Instead, insert EAX/RAX with the current value before we create the stack adjustments in the epilogue. llvm-svn: 248839
*	HHVM calling conventions.	Maksim Panchenko	2015-09-29	2	-0/+248
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	HHVM calling convention, hhvmcc, is used by HHVM JIT for functions in translated cache. We currently support LLVM back end to generate code for X86-64 and may support other architectures in the future. In HHVM calling convention any GP register could be used to pass and return values, with the exception of R12 which is reserved for thread-local area and is callee-saved. Other than R12, we always pass RBX and RBP as args, which are our virtual machine's stack pointer and frame pointer respectively. When we enter translation cache via hhvmcc function, we expect the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed to standard ABI alignment. This affects stack object alignment and stack adjustments for function calls. One extra calling convention, hhvm_ccc, is used to call C++ helpers from HHVM's translation cache. It is almost identical to standard C calling convention with an exception of first argument which is passed in RBP (before we use RDI, RSI, etc.) Differential Revision: http://reviews.llvm.org/D12681 llvm-svn: 248832
*	Fix test from r248825.	Chad Rosier	2015-09-29	1	-1/+1
\| \| \| \|	llvm-svn: 248827
*	[AArch64] Add support for pre- and post-index LDPSWs.	Chad Rosier	2015-09-29	1	-0/+31
\| \| \| \|	llvm-svn: 248825
*	[WinEH] Teach AsmPrinter about funclets	David Majnemer	2015-09-29	10	-32/+164
\| \| \| \| \| \| \| \| \| \| \|	Summary: Funclets have been turned into functions by the time they hit the object file. Make sure that they have decent names for the symbol table and CFI directives explaining how to reason about their prologues. Differential Revision: http://reviews.llvm.org/D13261 llvm-svn: 248824
*	[llvm-pdbdump] Add include-only filters.	Zachary Turner	2015-09-29	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PDB files have a lot of noise in them, with hundreds (or thousands) of symbols from system libraries and compiler generated types. If you're only looking for a specific type, this can be problematic. This CL allows you to display only types, variables, or compilands matching a particular pattern. These filters can even be combined with exclude filters. Include-only filters are given priority, so that first the set of items to display is limited only to those that match the include filters, and then the set of exclude filters is applied to those. If there are no include filters specified, then it means "display everything". llvm-svn: 248822
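The filter priority described above (include filters narrow the set first, then exclude filters are applied to what remains, and no include filters means "display everything") can be sketched as follows. This is an illustrative Python model with a hypothetical `select_symbols` helper, not llvm-pdbdump's actual code or command-line interface.

```python
import re


def select_symbols(names, include=(), exclude=()):
    """Return the names that pass the include filters and then the exclude filters.

    `include` and `exclude` are iterables of regular-expression strings.
    An empty `include` list means every name is a candidate.
    """
    include_res = [re.compile(pattern) for pattern in include]
    exclude_res = [re.compile(pattern) for pattern in exclude]

    kept = []
    for name in names:
        # Step 1: include filters take priority and limit the candidate set.
        if include_res and not any(r.search(name) for r in include_res):
            continue
        # Step 2: exclude filters are applied to the remaining candidates.
        if any(r.search(name) for r in exclude_res):
            continue
        kept.append(name)
    return kept


if __name__ == "__main__":
    symbols = ["std::vector", "MyClass", "MyClassImpl", "__vc_attributes"]
    print(select_symbols(symbols, include=["MyClass"], exclude=["Impl"]))
```

With the sample list above, including `MyClass` and excluding `Impl` leaves only `MyClass`.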
*	[AArch64] Add integer pre- and post-index halfword/byte loads and stores.	Chad Rosier	2015-09-29	1	-0/+156
\| \| \| \|	llvm-svn: 248817
*	Revert r248810 which breaks tests.	Dehao Chen	2015-09-29	2	-249/+0
\| \| \| \|	llvm-svn: 248814
*	http://reviews.llvm.org/D13231	Dehao Chen	2015-09-29	2	-0/+249
\| \| \| \| \| \|	Change lookup functions to const functions. llvm-svn: 248810
*	[ValueTracking] Teach isKnownNonZero about monotonically increasing PHIs	James Molloy	2015-09-29	1	-0/+49
\| \| \| \| \| \| \| \|	If a PHI starts at a non-negative constant, monotonically increases (only adds of a constant are supported at the moment) and that add does not wrap, then the PHI is known never to be zero. llvm-svn: 248796
*	Arguments spilled on the stack before a function call may have	Jeroen Ketema	2015-09-29	1	-0/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	alignment requirements, for example in the case of vectors. These requirements are exploited by the code generator by using move instructions that have similar alignment requirements, e.g., movaps on x86. Although the code generator properly aligns the arguments with respect to the displacement of the stack pointer it computes, the displacement itself may cause misalignment. For example if we have %3 = load <16 x float>, <16 x float>* %1, align 64 call void @bar(<16 x float> %3, i32 0) the x86 back-end emits: movaps 32(%ecx), %xmm2 movaps (%ecx), %xmm0 movaps 16(%ecx), %xmm1 movaps 48(%ecx), %xmm3 subl $20, %esp <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards movaps %xmm3, (%esp) <-- movaps requires 16-byte alignment, while %esp is not aligned as such. movl $0, 16(%esp) calll __bar To solve this, we need to make sure that the computed value with which the stack pointer is changed is a multiple of the maximal alignment seen during its computation. With this change we get proper alignment: subl $32, %esp movaps %xmm3, (%esp) Differential Revision: http://reviews.llvm.org/D12337 llvm-svn: 248786
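The fix described above amounts to rounding the total stack adjustment up to the largest alignment seen while laying out the spilled arguments. The Python sketch below uses hypothetical helper names and models only the two stack-passed values from the example (a 16-byte, 16-aligned vector part and a 4-byte integer); it is not the code generator's actual logic.

```python
def align_to(value, alignment):
    """Round `value` up to the next multiple of `alignment`."""
    return (value + alignment - 1) // alignment * alignment


def stack_adjustment(args):
    """Compute the stack pointer adjustment for stack-passed arguments.

    `args` is a list of (size_in_bytes, alignment_in_bytes) pairs.
    Each argument is placed at an offset aligned to its own requirement,
    and the total is then rounded up to the maximal alignment seen.
    """
    offset = 0
    max_align = 1
    for size, alignment in args:
        offset = align_to(offset, alignment) + size
        max_align = max(max_align, alignment)
    return align_to(offset, max_align)


if __name__ == "__main__":
    # One 16-byte vector register spill followed by an i32 argument.
    print(stack_adjustment([(16, 16), (4, 4)]))
```

Without the final rounding step the two arguments need 20 bytes, which matches the misaligning `subl $20, %esp` in the message. With it the adjustment becomes 32, matching the corrected `subl $32, %esp`.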
*	[InstCombine] Improve Vector Demanded Bits Through Bitcasts	Simon Pilgrim	2015-09-29	1	-0/+44
\| \| \| \| \| \| \| \| \| \| \| \|	Currently SimplifyDemandedVectorElts can only peek through bitcasts if the vectors have the same number of elements. This patch fixes and enables some existing (disabled) code to support bitcasting to vectors with more/fewer elements. It currently only accepts cases when vectors alias cleanly (i.e. number of elements are an exact multiple of the other vector). This was added to improve the demanded vector elements support for SSE vector shifts which require the __m128i (<2 x i64>) argument type to be bitcast to the vector type for the builtin shift. I've added extra tests for various additional bitcasts. Differential Revision: http://reviews.llvm.org/D12935 llvm-svn: 248784
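The "alias cleanly" condition above can be illustrated by mapping a demanded-elements mask across a bitcast. This is a Python sketch with a hypothetical `demanded_input_elts` helper, assuming elements are laid out in order so that each wide element covers a contiguous group of narrow ones; it is not the SimplifyDemandedVectorElts implementation.

```python
def demanded_input_elts(demanded_out, n_in):
    """Map demanded elements of a bitcast result back to its input vector.

    `demanded_out` is a list of booleans, one per output element.
    `n_in` is the number of elements in the input vector.
    Returns a list of booleans for the input, or None when the element
    counts are not an exact multiple of each other.
    """
    n_out = len(demanded_out)
    if n_in == n_out:
        return list(demanded_out)
    if n_out % n_in == 0:
        # Output has more, narrower elements: an input element is demanded
        # if any of the output elements that overlap it is demanded.
        k = n_out // n_in
        return [any(demanded_out[i * k:(i + 1) * k]) for i in range(n_in)]
    if n_in % n_out == 0:
        # Output has fewer, wider elements: a demanded output element
        # demands every input element it is built from.
        k = n_in // n_out
        return [demanded_out[i // k] for i in range(n_in)]
    return None  # Vectors do not alias cleanly; give up.


if __name__ == "__main__":
    # <2 x i64> bitcast to <4 x i32>, only output element 0 demanded.
    print(demanded_input_elts([True, False, False, False], 2))
```

The SSE shift case mentioned in the message corresponds to the first branch: a `<2 x i64>` argument viewed as a vector with more elements.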
*	[WebAssembly] Rename test files to match platform naming conventions.	Dan Gohman	2015-09-29	4	-0/+0
\| \| \| \|	llvm-svn: 248783
*	[LoopUnswitch] Add block frequency analysis to recognize hot/cold regions	Chen Li	2015-09-29	1	-0/+52
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch adds block frequency analysis to LoopUnswitch pass to recognize hot/cold regions. For cold regions the pass only performs trivial unswitches since they do not increase code size, and for hot regions everything works as before. This helps to minimize code growth in cold regions and be more aggressive in hot regions. Currently the default cold regions are blocks with frequencies below 20% of function entry frequency, and it can be adjusted via -loop-unswitch-cold-block-frequency flag. The entire feature is controlled via -loop-unswitch-with-block-frequency flag and it is off by default. Reviewers: broune, silvas, dnovillo, reames Subscribers: davidxl, llvm-commits Differential Revision: http://reviews.llvm.org/D11605 llvm-svn: 248777
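The hot/cold policy described above can be sketched as a simple threshold test. This Python example uses hypothetical function names and models only what the message states: a block is cold when its frequency is below 20% of the function entry frequency, cold regions get trivial unswitches only, and hot regions behave as before.

```python
def is_cold(block_freq, entry_freq, cold_percent=20):
    """True when the block frequency is below `cold_percent` of the entry frequency.

    Integer arithmetic avoids floating-point rounding at the threshold.
    """
    return block_freq * 100 < entry_freq * cold_percent


def may_unswitch(block_freq, entry_freq, is_trivial, cold_percent=20):
    """Decide whether an unswitch is allowed under the hot/cold policy.

    Trivial unswitches do not grow code, so they are always allowed.
    Non-trivial unswitches are allowed only outside cold regions.
    """
    if is_cold(block_freq, entry_freq, cold_percent):
        return is_trivial
    return True


if __name__ == "__main__":
    print(may_unswitch(block_freq=10, entry_freq=100, is_trivial=False))
    print(may_unswitch(block_freq=50, entry_freq=100, is_trivial=False))
```

The `cold_percent` parameter stands in for the value that the message says is adjustable via the `-loop-unswitch-cold-block-frequency` flag.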
*	Move dbg.declare intrinsics when merging and replacing allocas.	Evgeniy Stepanov	2015-09-29	2	-3/+105
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Place new and update dbg.declare calls immediately after the corresponding alloca. Current code in replaceDbgDeclareForAlloca puts the new dbg.declare at the end of the basic block. LLVM codegen has problems emitting debug info in a situation when dbg.declare appears after all uses of the variable. This usually kinda works for inlining and ASan (two users of this function) but not for SafeStack (see the pending change in http://reviews.llvm.org/D13178). llvm-svn: 248769
*	[WinEH] Fix ip2state table emission with funclets	Reid Kleckner	2015-09-28	2	-33/+82
\| \| \| \| \| \| \|	Previously we were hijacking the old LandingPadInfo data structures to communicate our state numbers. Now we don't need that anymore. llvm-svn: 248763
*	[SCEV] Don't crash on pointer comparisons	Sanjoy Das	2015-09-28	1	-0/+37
\| \| \| \| \| \| \| \| \| \| \| \|	`ScalarEvolution::isImpliedCondOperandsViaNoOverflow` tries to cast the operand type of the comparison it is given to an `IntegerType`. This is incorrect because it could actually be simplifying a comparison between two pointers. Switch it to using `getTypeSizeInBits` instead, which does the right thing for both pointers and integers. Fixed PR24956. llvm-svn: 248743
*	AMDGPU: Fix splitting x16 SMRD loads	Matt Arsenault	2015-09-28	1	-0/+43
\| \| \| \| \| \| \| \|	When used recursively, this would set the kill flag on the intermediate step from first splitting x16 to x8. llvm-svn: 248741
*	AMDGPU: Fix moving SMRD loads with literal offsets on CI	Matt Arsenault	2015-09-28	1	-1/+98
\| \| \| \|	llvm-svn: 248740
*	AMDGPU: Add testcases	Matt Arsenault	2015-09-28	1	-0/+119
\| \| \| \| \| \| \|	Make sure we are testing moving users of the moved and split SMRD loads. llvm-svn: 248738