bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[AArch64] Unify the integer min/max vector selection patterns with the intrinsic ones	Silviu Baranga	2015-08-26	2	-52/+16
	Summary: This change lowers the AArch64 integer vector min/max intrinsic nodes to generic min/max nodes and replaces the intrinsic selection patterns with the generic ones. There should already be testing in place for this, so no further tests were added.
	Reviewers: jmolloy
	Subscribers: aemerson, llvm-commits, rengolin
	Differential Revision: http://reviews.llvm.org/D12276
	llvm-svn: 246030
*	FastISel: Use finishCondBranch() for ARM,Mips,PowerPC FastISel	Matthias Braun	2015-08-26	3	-10/+5
	Note that branch probabilities are now preserved after this change.
	llvm-svn: 245998
*	FastISel: Factor out common code; NFC intended	Matthias Braun	2015-08-26	2	-69/+10
	This should be no functional change, but for the record: for three cases in X86FastISel this will change the order in which the FalseMBB and TrueMBB of a conditional branch are added to the successor/predecessor lists.
	llvm-svn: 245997
*	WebAssembly: add small FIXME for AsmPrinter.	JF Bastien	2015-08-26	1	-0/+1
	Suggested by @sunfish as a follow-up to r245982.
	llvm-svn: 245996
*	Make variable argument intrinsics behave correctly in a Win64 CC function.	Charles Davis	2015-08-25	1	-10/+18
	Summary: This change makes the variable argument intrinsics, `llvm.va_start` and `llvm.va_copy`, and the `va_arg` instruction behave as they do on Windows inside a `CallingConv::X86_64_Win64` function. It's needed for a Clang patch I have to add support for GCC's `__builtin_ms_va_list` constructs.
	Reviewers: nadav, asl, eugenis
	CC: llvm-commits
	Differential Revision: http://llvm-reviews.chandlerc.com/D1622
	llvm-svn: 245990
*	WebAssembly: assert that there aren't any constant pools	JF Bastien	2015-08-25	1	-0/+7
	WebAssembly will either use globals or immediates, since it's a virtual ISA.
	llvm-svn: 245989
*	WebAssembly: emit `(func (param t) (result t))` s-expressions	JF Bastien	2015-08-25	1	-0/+61
	Summary: Match spec format: https://github.com/WebAssembly/spec/blob/master/ml-proto/test/fac.wasm
	Reviewers: sunfish
	Subscribers: llvm-commits, jfb
	Differential Revision: http://reviews.llvm.org/D12307
	llvm-svn: 245986
*	WebAssembly: comment out .globl when printing textual assembly	JF Bastien	2015-08-25	1	-1/+4
	Do the same for .weak (not implemented for now, but may as well do it). Update comment string to two semicolons.
	llvm-svn: 245982
*	make fast unaligned memory accesses implicit with SSE4.2 or SSE4a	Sanjay Patel	2015-08-25	1	-0/+7
	This is a follow-on from the discussion in http://reviews.llvm.org/D12154.
	This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has generally fast unaligned memory ops.
	A motivating use case for this change is a clang invocation that doesn't explicitly set the CPU, but does target a feature that we know only exists on a CPU that supports fast unaligned memops. For example:
		$ clang -O1 foo.c -mavx
	This resolves a difference in lowering noted in PR24449: https://llvm.org/bugs/show_bug.cgi?id=24449
	Before this patch, we used different store types depending on whether the example can be lowered as a memset or not.
	Differential Revision: http://reviews.llvm.org/D12288
	llvm-svn: 245950
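The rule this message describes (SSE4.2 or SSE4a implies fast unaligned memory accesses, even when no CPU is named) can be sketched roughly as follows. This is only an illustration: `CpuFeatures`, `resolveFeatures` and `mayUseUnalignedVectorOps` are hypothetical names invented here, not the X86 subtarget code touched by the commit.

```cpp
// Hypothetical feature record; the field names are assumptions made for
// this sketch and do not mirror LLVM's X86 subtarget class.
struct CpuFeatures {
  bool hasSSE42;   // SSE4.2 available
  bool hasSSE4A;   // AMD SSE4a available
  bool slowUAMem;  // unaligned accesses marked slow by the CPU description
};

// Apply the implication described in the commit message: a chip that has
// SSE4.2 or SSE4a is treated as having fast unaligned memory accesses.
CpuFeatures resolveFeatures(CpuFeatures f) {
  if (f.hasSSE42 || f.hasSSE4A)
    f.slowUAMem = false;
  return f;
}

// A memset/memcpy lowering could then consult the resolved flag.
bool mayUseUnalignedVectorOps(const CpuFeatures &f) {
  return !f.slowUAMem;
}
```

In this sketch a generic CPU that only gains SSE4.2 through a feature flag ends up with the slow marker cleared, which is the situation the `-mavx` example is about.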
*	[X86] Remove references to _ftol2	Michael Kuperstein	2015-08-25	5	-54/+0
	As of r245924, _ftol2 is no longer used for fptoui on MS platforms. Remove the dead code associated with it.
	llvm-svn: 245925
*	[X86] Fix fptoui conversions	Michael Kuperstein	2015-08-25	2	-69/+143
	This fixes two issues in x86 fptoui lowering.
	1) Makes conversions from f80 go through the right path on AVX-512.
	2) Implements an inline sequence for fptoui i64 instead of a library call. This improves performance by 6X on SSE3+ and 3X otherwise.
	Incidentally, it also removes the use of _ftol2 for fptoui, which was wrong to begin with, as _ftol2 converts to a signed i64, producing wrong results for values >= 2^63.
	Patch by: mitch.l.bodart@intel.com
	Differential Revision: http://reviews.llvm.org/D11316
	llvm-svn: 245924
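To illustrate why a signed conversion alone is not enough for values >= 2^63, here is a minimal sketch of one common way to build an unsigned 64-bit conversion from a signed one. It is not the instruction sequence LLVM emits for this patch; `fptoui64` is a hypothetical helper name, and the input is assumed to lie in [0, 2^64).

```cpp
#include <cstdint>

// Sketch only. Precondition (assumed): 0 <= x < 2^64.
std::uint64_t fptoui64(double x) {
  const double two63 = 9223372036854775808.0;  // 2^63, exactly representable
  if (x < two63) {
    // The value fits in a signed i64, so the signed conversion is correct.
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(x));
  }
  // Too large for a signed i64: subtract 2^63 (exact for doubles in this
  // range), convert the remainder, then set the top bit again.
  const std::int64_t low = static_cast<std::int64_t>(x - two63);
  return static_cast<std::uint64_t>(low) ^ (std::uint64_t(1) << 63);
}
```

A plain signed conversion has no valid result for the second branch, which is the class of inputs the message says was mishandled.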
*	Pass function attributes instead of boolean in isIntDivCheap().	Steve King	2015-08-25	2	-2/+4
	llvm-svn: 245921
*	Revert "Fix LLVM C API for DataLayout"	Mehdi Amini	2015-08-25	1	-8/+22
	This reverts commit 433bfd94e4b7e3cc3f8b08f8513ce47817941b0c.
	Broke some bot, have to see why it passed locally.
	From: Mehdi Amini <mehdi.amini@apple.com>
	llvm-svn: 245917
*	Fix LLVM C API for DataLayout	Mehdi Amini	2015-08-25	1	-22/+8
	We removed access to the DataLayout on the TargetMachine and deprecated the C API function LLVMGetTargetMachineData() in r243114. However, the way I tried to be backward compatible was broken: I changed the wrapper of the TargetMachine to be a structure that includes the DataLayout as well. However, the TargetMachine is also wrapped by the ExecutionEngine, in the more classic way. A client using the TargetMachine wrapped by the ExecutionEngine and trying to get the DataLayout would break.
	It seems tricky to solve the problem completely in the C API implementation. This patch tries to address this backward compatibility in a lighter way in the C++ API. The C API is restored to its original state and the removed C++ API is reintroduced, but privately. The C API is friended to the TargetMachine and should be the only consumer of this API.
	Reviewers: ributzka
	Differential Revision: http://reviews.llvm.org/D12263
	From: Mehdi Amini <mehdi.amini@apple.com>
	llvm-svn: 245916
*	[PowerPC] PPCVSXFMAMutate should ignore trivial-copy addends	Hal Finkel	2015-08-24	1	-5/+8
	We might end up with a trivial copy as the addend, and if so, we should ignore the corresponding FMA instruction. The trivial copy can be coalesced away later, so there's nothing to do here. We should not, however, assert. Fixes PR24544.
	llvm-svn: 245907
*	MachineBasicBlock: Add liveins() method returning an iterator_range	Matthias Braun	2015-08-24	4	-22/+13
	llvm-svn: 245895
*	[WebAssembly] DYNAMIC_STACKALLOC returns a pointer.	Dan Gohman	2015-08-24	1	-1/+1
	llvm-svn: 245893
*	WebAssembly: Implement call	JF Bastien	2015-08-24	10	-36/+163
	Summary: Support function calls.
	Reviewers: sunfish, sunfishcode
	Subscribers: sunfishcode, jfb, llvm-commits
	Differential revision: http://reviews.llvm.org/D12219
	llvm-svn: 245887
*	Revert two bad commits.	JF Bastien	2015-08-24	8	-96/+24
	Summary: I forgot to squash git commits before doing an svn dcommit of D12219. Reverting, and re-submitting.
	Subscribers: jfb, llvm-commits
	Differential Revision: http://reviews.llvm.org/D12298
	llvm-svn: 245886
*	Missing print.	JF Bastien	2015-08-24	2	-13/+14
	llvm-svn: 245883
*	call	JF Bastien	2015-08-24	7	-8/+144
	llvm-svn: 245882
*	[WebAssembly] Make the assembly printer indent instructions.	Dan Gohman	2015-08-24	1	-0/+2
	llvm-svn: 245875
*	[WebAssembly] CodeGen support for __builtin_wasm_page_size()	Dan Gohman	2015-08-24	2	-1/+8
	llvm-svn: 245872
*	[PPC64LE] Fix PR24546 - Swap optimization and debug values	Bill Schmidt	2015-08-24	1	-0/+3
	This patch fixes PR24546, which demonstrates a segfault during the VSX swap removal pass. The problem is that debug value instructions were not excluded from the list of instructions to be analyzed for webs of related computation. I've added the test case from the PR as a crash test in test/CodeGen/PowerPC.
	llvm-svn: 245862
*	[WebAssembly] Skeleton FastISel support	Dan Gohman	2015-08-24	5	-0/+97
	llvm-svn: 245860
*	[WebAssembly] Implement floating point rounding operators.	Dan Gohman	2015-08-24	2	-12/+16
	llvm-svn: 245859
*	[WebAssembly] Tell TargetTransformInfo about popcnt and sqrt.	Dan Gohman	2015-08-24	2	-4/+10
	llvm-svn: 245853
*	[WebAssembly] Use the checked form of MachineFunction::getSubtarget. NFC.	Dan Gohman	2015-08-24	2	-4/+3
	llvm-svn: 245852
*	[WebAssembly] Implement the is_zero_undef forms of cttz and ctlz	Dan Gohman	2015-08-24	1	-0/+6
	llvm-svn: 245851
*	[X86] Add support for mmword memory operand size for Intel-syntax x86 assembly	Michael Zuckerman	2015-08-24	1	-1/+1
	Differential Revision: http://reviews.llvm.org/D12151
	llvm-svn: 245835
*	[ARM] Use AEABI helpers for i64 div and rem	Scott Douglass	2015-08-24	2	-5/+59
	Differential Revision: http://reviews.llvm.org/D12232
	llvm-svn: 245830
*	[ARM] Refactor LowerDivRem before adding LowerREM (nfc)	Scott Douglass	2015-08-24	1	-17/+36
	Differential Revision: http://reviews.llvm.org/D12230
	llvm-svn: 245829
*	first commit to llvm	Michael Zuckerman	2015-08-24	1	-0/+1
	llvm-svn: 245825
*	Add missing break in AArch64DAGToDAGISel::Select() switch case	Mehdi Amini	2015-08-23	1	-0/+1
	Reported by coverity.
	From: Mehdi Amini <mehdi.amini@apple.com>
	llvm-svn: 245800
*	[NVPTX] Allow undef value as global initializer	Jingyue Wu	2015-08-22	1	-3/+5
	Summary: A __shared__ variable may now emit an undef value as its initializer; do not throw an error on that.
	Test Plan: test/CodeGen/NVPTX/global-addrspace.ll
	Patch by Xuetian Weng
	Reviewers: jholewinski, tra, jingyue
	Subscribers: llvm-commits, jholewinski
	Differential Revision: http://reviews.llvm.org/D12242
	llvm-svn: 245785
*	AMDGPU: Allow specifying different opcode on VI for SMRD/SMEM	Matt Arsenault	2015-08-22	2	-15/+21
	Although the basic s_load_* instructions happen to use the same opcode, some of the special case SMRD instructions have different opcodes.
	llvm-svn: 245775
*	AMDGPU: Improve accuracy of instruction rates for some FP instructions	Matt Arsenault	2015-08-22	2	-7/+27
	llvm-svn: 245774
*	AMDGPU: Use DFS to avoid second loop over function	Matt Arsenault	2015-08-22	1	-15/+13
	llvm-svn: 245772
*	AMDGPU: Make sure to run verifier after SIFixSGPRLiveRanges	Matt Arsenault	2015-08-22	1	-1/+1
	llvm-svn: 245769
*	AMDGPU: Improve debug printing in SIFixSGPRLiveRanges	Matt Arsenault	2015-08-22	1	-6/+15
	llvm-svn: 245768
*	AMDGPU: Move CI instructions into CIInstructions.td	Matt Arsenault	2015-08-22	2	-70/+69
	There are still a couple of CI patterns left in SIInstructions.
	llvm-svn: 245767
*	AMDGPU: Minor cleanups to help with f16 support	Matt Arsenault	2015-08-21	1	-9/+11
	The main change is inverting the condition for the operand class classes so that VT.Size == 16 uses VGPR_32 instead of 64.
	llvm-svn: 245764
*	AMDGPU/SI: Better handle s_wait insertion	Tom Stellard	2015-08-21	1	-2/+5
	We can wait on either VM, EXP or LGKM. The waits are independent. Without this patch, a wait inserted because of one of them would also wait for all the previous others. This patch makes s_wait only wait for the ones we need for the next instruction.
	Here's an example of the subtle perf reduction this patch solves. This is without the patch:
		buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
		buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
		s_load_dwordx4 s[44:47], s[8:9], 0xc
		s_waitcnt lgkmcnt(0)
		buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
		s_load_dwordx4 s[48:51], s[8:9], 0x10
		s_waitcnt vmcnt(1)
		buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen
	The s_waitcnt vmcnt(1) is useless. It is added because the last buffer_load_format_xyzw needs s[44:47], which was issued by the first s_load_dwordx4, so it waits for all VM operations issued before that s_load_dwordx4 to finish.
	Internally, 3 counters (for VM, EXP and LGKM) are updated after every instruction. For example, buffer_load_format_xyzw will increase the VM counter, and s_load_dwordx4 the LGKM one. Without the patch, for every defined register, the current 3 counters are stored, and are used to know how long to wait when an instruction needs the register. Because of that, the s[44:47] counter records that to use the register you need to wait for the previous buffer_load_format_xyzw. Instead, this patch stores only the counters that matter for the register, and puts zero for the other ones, since we don't need any wait for them.
	Patch by: Axel Davy
	Differential Revision: http://reviews.llvm.org/D11883
	llvm-svn: 245755
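The bookkeeping described in this message can be approximated with a small toy model. It is a sketch under simplifying assumptions, not the pass itself: `WaitModel` and `vmWaitForLastLoad` are hypothetical names, only one register (s[44:47]) is tracked, and a stored value of zero is taken to mean "no wait needed", as the message describes. The helper replays the instruction sequence from the example and reports the operand of the `s_waitcnt vmcnt(N)` that would be needed before the last load, or -1 if none is needed.

```cpp
#include <array>

// Toy model with hypothetical names; not the actual AMDGPU wait insertion.
enum Counter { VM = 0, EXP = 1, LGKM = 2 };
typedef std::array<unsigned, 3> Counts;

struct WaitModel {
  bool perCounter;  // false: scheme before the patch, true: scheme after it
  Counts issued;    // operations issued so far, per counter
  Counts defStamp;  // values stored when the tracked register was defined

  explicit WaitModel(bool pc)
      : perCounter(pc), issued{{0, 0, 0}}, defStamp{{0, 0, 0}} {}

  // An instruction that bumps counter c without defining the tracked register.
  void issue(Counter c) { ++issued[c]; }

  // An instruction that bumps counter c and defines the tracked register.
  void define(Counter c) {
    ++issued[c];
    if (perCounter) {
      defStamp = Counts{{0, 0, 0}};  // zero for counters that do not matter
      defStamp[c] = issued[c];
    } else {
      defStamp = issued;             // store all three current counters
    }
  }

  // Operand N of the wait on counter c before a use of the tracked register,
  // or -1 when no wait on that counter is required.
  int waitBeforeUse(Counter c) const {
    if (defStamp[c] == 0) return -1;
    return static_cast<int>(issued[c] - defStamp[c]);
  }
};

int vmWaitForLastLoad(bool perCounter) {
  WaitModel m(perCounter);
  m.issue(VM);     // buffer_load_format_xyzw v[8:11]
  m.issue(VM);     // buffer_load_format_xyzw v[12:15]
  m.define(LGKM);  // s_load_dwordx4 s[44:47] (the tracked register)
  m.issue(VM);     // buffer_load_format_xyzw v[16:19]
  m.issue(LGKM);   // s_load_dwordx4 s[48:51]
  return m.waitBeforeUse(VM);  // before buffer_load_format_xyzw v[20:23]
}
```

With all three counters stored, the model asks for vmcnt(1) before the last load, matching the useless wait in the listing; with only the relevant counter stored, it asks for no VM wait at all.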
*	[ARM] Fix MachO CPU Subtype selection	Vedant Kumar	2015-08-21	1	-12/+35
	Differential Revision: http://reviews.llvm.org/D12040
	llvm-svn: 245744
*	[PowerPC] PPCVSXFMAMutate should not segfault on undef input registers	Hal Finkel	2015-08-21	1	-0/+5
	When PPCVSXFMAMutate would look at the input addend register, it would get its input value number. This would fail, however, if the register was undef, causing a segfault. Don't segfault (just skip such FMA instructions). Fixes the test case from PR24542 (although that may have been over-reduced).
	llvm-svn: 245741
*	[x86] enable machine combiner reassociations for 256-bit vector min/max	Sanjay Patel	2015-08-21	1	-0/+4
	llvm-svn: 245735
*	remove 'FeatureSlowUAMem' from AMD CPUs based on 10H micro-arch or later	Sanjay Patel	2015-08-21	1	-11/+7
	See discussion in D12154 (http://reviews.llvm.org/D12154), AMD Software Optimization Guides for 10H/12H/15H/16H, and Agner Fog's experimental data.
	llvm-svn: 245733
*	[x86] invert logic for attribute 'FeatureFastUAMem'	Sanjay Patel	2015-08-21	5	-89/+98
	This is a 'no functional change intended' patch. It removes one FIXME, but adds several more.
	Motivation: the FeatureFastUAMem attribute may be too general. It is used to determine if a misaligned memory access of any size under 32 bytes is 'fast'. From the added FIXME comments, however, you can see that we're not consistent about this. Changing the name of the attribute makes the logic holes easier to see.
	Changing this to a 'slow' attribute also means we don't have to add an explicit 'fast' attribute to new chips; fast unaligned accesses have been standard for several generations of CPUs now.
	Differential Revision: http://reviews.llvm.org/D12154
	llvm-svn: 245729
*	[x86] enable machine combiner reassociations for 128-bit vector min/max	Sanjay Patel	2015-08-21	1	-0/+8
	llvm-svn: 245715
*	Fix typo - symetric -> symmetric.	Eric Christopher	2015-08-21	1	-1/+1
	llvm-svn: 245705