path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [AArch64] Unify the integer min/max vector selection patterns with the intrinsic ones | Silviu Baranga | 2015-08-26 | 2 | -52/+16
  Summary: This change lowers the aarch64 integer vector min/max intrinsic nodes to generic min/max nodes and replaces the intrinsic selection patterns with the generic ones. There should already be testing in place for this, so no further tests were added.
  Reviewers: jmolloy
  Subscribers: aemerson, llvm-commits, rengolin
  Differential Revision: http://reviews.llvm.org/D12276
  llvm-svn: 246030
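To illustrate why the intrinsic and generic nodes can share selection patterns, here is a minimal Python sketch (not LLVM code) that maps the NEON min/max intrinsics to generic smax/smin/umax/umin operations and evaluates them lane by lane on 8-bit lanes. The mapping table, the lane width, and all function names are assumptions made for illustration.

```python
# Hypothetical mapping from AArch64 NEON min/max intrinsics to generic operations.
INTRINSIC_TO_GENERIC = {
    "llvm.aarch64.neon.smax": "smax",
    "llvm.aarch64.neon.smin": "smin",
    "llvm.aarch64.neon.umax": "umax",
    "llvm.aarch64.neon.umin": "umin",
}

LANE_BITS = 8  # assumed lane width for the example


def to_signed(v, bits=LANE_BITS):
    """Reinterpret raw lane bits as a two's-complement signed value."""
    return v - (1 << bits) if v & (1 << (bits - 1)) else v


def minmax(op, a, b):
    """Apply a generic min/max operation lane by lane to two vectors of raw lane bits."""
    pick = {
        "smax": lambda x, y: x if to_signed(x) >= to_signed(y) else y,
        "smin": lambda x, y: x if to_signed(x) <= to_signed(y) else y,
        "umax": max,
        "umin": min,
    }[op]
    return [pick(x, y) for x, y in zip(a, b)]


def lower(intrinsic, a, b):
    """'Lower' an intrinsic call to its generic operation and evaluate it."""
    return minmax(INTRINSIC_TO_GENERIC[intrinsic], a, b)


a = [0x01, 0x80, 0x7F]
b = [0x02, 0x01, 0xFF]
print(lower("llvm.aarch64.neon.smax", a, b))  # [2, 1, 127]
print(lower("llvm.aarch64.neon.umax", a, b))  # [2, 128, 255]
```

Lane 1 (0x80) is the largest value under umax but the smallest under smax, which is why the signed and unsigned forms remain distinct operations.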
* FastISel: Use finishCondBranch() for ARM, Mips, PowerPC FastISel | Matthias Braun | 2015-08-26 | 3 | -10/+5
  Note that branch probabilities are now preserved after this change.
  llvm-svn: 245998
* FastISel: Factor out common code; NFC intended | Matthias Braun | 2015-08-26 | 2 | -69/+10
  This should be no functional change, but for the record: for three cases in X86FastISel this changes the order in which the FalseMBB and TrueMBB of a conditional branch are added to the successor/predecessor lists.
  llvm-svn: 245997
* WebAssembly: add small FIXME for AsmPrinter. | JF Bastien | 2015-08-26 | 1 | -0/+1
  Suggested by @sunfish as a follow-up to r245982.
  llvm-svn: 245996
* Make variable argument intrinsics behave correctly in a Win64 CC function. | Charles Davis | 2015-08-25 | 1 | -10/+18
  Summary: This change makes the variable argument intrinsics, `llvm.va_start` and `llvm.va_copy`, and the `va_arg` instruction behave as they do on Windows inside a `CallingConv::X86_64_Win64` function. It's needed for a Clang patch I have that adds support for GCC's `__builtin_ms_va_list` constructs.
  Reviewers: nadav, asl, eugenis
  CC: llvm-commits
  Differential Revision: http://llvm-reviews.chandlerc.com/D1622
  llvm-svn: 245990
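On Win64 a va_list is just a pointer into the argument area, and each variadic argument occupies one 8-byte slot, so va_arg reads a slot and advances the pointer, while va_copy only copies the pointer. The Python sketch below models that convention over a byte buffer; the class and its methods are hypothetical and only illustrate the behavior the intrinsics have to match, not how LLVM lowers them.

```python
import struct

SLOT_BYTES = 8  # each Win64 variadic argument occupies one 8-byte slot


class Win64VaList:
    """Hypothetical model of a Win64 va_list: a plain pointer into the argument area."""

    def __init__(self, memory, offset=0):
        self.memory = memory  # bytes holding the spilled/stack arguments
        self.offset = offset  # current position, like the char* va_list

    def copy(self):
        # va_copy: copying the pointer is all that is needed.
        return Win64VaList(self.memory, self.offset)

    def arg(self, fmt):
        # va_arg: read a little-endian value from the current slot, then
        # advance by a full slot regardless of the value's size.
        value = struct.unpack_from("<" + fmt, self.memory, self.offset)[0]
        self.offset += SLOT_BYTES
        return value


# Argument area holding an int32 (padded to a full slot), an int64 and a double.
area = struct.pack("<i4xqd", 7, -5, 2.5)
ap = Win64VaList(area)  # va_start: point at the first variadic slot
saved = ap.copy()       # va_copy
print(ap.arg("i"), ap.arg("q"), ap.arg("d"))  # 7 -5 2.5
print(saved.arg("i"))                          # 7
```

Under the Microsoft x64 convention, arguments whose size is not 1, 2, 4 or 8 bytes are passed by reference, so a real va_arg for such a type would read a pointer from the slot; the sketch leaves that case out.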
* WebAssembly: assert that there aren't any constant pools | JF Bastien | 2015-08-25 | 1 | -0/+7
  WebAssembly will either use globals or immediates, since it's a virtual ISA.
  llvm-svn: 245989
* WebAssembly: emit `(func (param t) (result t))` s-expressions | JF Bastien | 2015-08-25 | 1 | -0/+61
  Summary: Match spec format: https://github.com/WebAssembly/spec/blob/master/ml-proto/test/fac.wasm
  Reviewers: sunfish
  Subscribers: llvm-commits, jfb
  Differential Revision: http://reviews.llvm.org/D12307
  llvm-svn: 245986
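A tiny Python sketch of the signature format named in the subject, assuming each parameter and result type gets its own (param t) or (result t) clause; func_sexpr is a hypothetical helper, not the AsmPrinter code.

```python
def func_sexpr(params, results, name=None):
    """Format a function signature as a WebAssembly-style (func ...) s-expression."""
    parts = ["func"]
    if name is not None:
        parts.append(name)
    parts += ["(param %s)" % t for t in params]
    parts += ["(result %s)" % t for t in results]
    return "(" + " ".join(parts) + ")"


print(func_sexpr(["i32", "i32"], ["i32"]))
# (func (param i32) (param i32) (result i32))
print(func_sexpr(["i64"], ["i64"], name="$fac"))
# (func $fac (param i64) (result i64))
```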
* WebAssembly: comment out .globl when printing textual assembly | JF Bastien | 2015-08-25 | 1 | -1/+4
  Do the same for .weak (not implemented for now, but may as well do it). Update the comment string to two semicolons.
  llvm-svn: 245982
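Commenting out directives that the textual format does not accept can be sketched in Python as follows. The directive set mirrors the commit message (.globl, plus .weak for later), while the placement of the ";; " prefix and the helper name are assumptions.

```python
COMMENTED_OUT = (".globl", ".weak")  # directives the textual format does not use
COMMENT = ";;"                       # two-semicolon comment string


def print_directive(line):
    """Return the line to emit, commenting out unsupported directives."""
    text = line.strip()
    words = text.split()
    if words and words[0] in COMMENTED_OUT:
        return COMMENT + " " + text
    return text


for line in [".globl fac", ".weak helper", "(func (param i64) (result i64))"]:
    print(print_directive(line))
```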
* make fast unaligned memory accesses implicit with SSE4.2 or SSE4a | Sanjay Patel | 2015-08-25 | 1 | -0/+7
  This is a follow-on from the discussion in http://reviews.llvm.org/D12154.
  This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has generally fast unaligned memory ops.
  A motivating use case for this change is a clang invocation that doesn't explicitly set the CPU, but does target a feature that we know only exists on a CPU that supports fast unaligned memops. For example:
      $ clang -O1 foo.c -mavx
  This resolves a difference in lowering noted in PR24449: https://llvm.org/bugs/show_bug.cgi?id=24449
  Before this patch, we used different store types depending on whether the example can be lowered as a memset or not.
  Differential Revision: http://reviews.llvm.org/D12288
  llvm-svn: 245950
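As a rough model of the new rule, the Python sketch below closes a feature set under a few implications (AVX implies SSE4.2, which implies SSE2), treats SSE4.2 or SSE4a as implying fast unaligned memory ops, and only then lets an inline memset pick 16- or 32-byte vector stores. The implication table is heavily simplified and the function names are hypothetical; the real X86 lowering weighs more factors.

```python
# Simplified, hypothetical feature implications (the real hierarchy has more steps).
IMPLIES = {"avx": {"sse4.2"}, "sse4.2": {"sse2"}, "sse4a": {"sse2"}}


def expand(features):
    """Close a feature set under IMPLIES."""
    result, work = set(), list(features)
    while work:
        feature = work.pop()
        if feature not in result:
            result.add(feature)
            work.extend(IMPLIES.get(feature, ()))
    return result


def fast_unaligned_mem(features):
    # Rule from the commit: SSE4.2 or SSE4a implies fast unaligned memory ops.
    fs = expand(features)
    return "sse4.2" in fs or "sse4a" in fs


def memset_store_bytes(size, features):
    """Widest store an inline memset of `size` (> 0) bytes might use."""
    fs = expand(features)
    if fast_unaligned_mem(fs):
        if "avx" in fs and size >= 32:
            return 32
        if "sse2" in fs and size >= 16:
            return 16
    # Otherwise fall back to the largest power-of-two scalar store, up to 8 bytes.
    return min(8, 1 << (size.bit_length() - 1))


print(memset_store_bytes(64, {"avx"}))    # 32 (-mavx, no explicit CPU)
print(memset_store_bytes(64, {"sse4a"}))  # 16
print(memset_store_bytes(64, {"sse2"}))   # 8
```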
* [X86] Remove references to _ftol2 | Michael Kuperstein | 2015-08-25 | 5 | -54/+0
  As of r245924, _ftol2 is no longer used for fptoui on MS platforms. Remove the dead code associated with it.
  llvm-svn: 245925
* [X86] Fix fptoui conversions | Michael Kuperstein | 2015-08-25 | 2 | -69/+143
  This fixes two issues in x86 fptoui lowering:
  1) Makes conversions from f80 go through the right path on AVX-512.
  2) Implements an inline sequence for fptoui i64 instead of a library call. This improves performance by 6X on SSE3+ and 3X otherwise.
  Incidentally, it also removes the use of ftol2 for fptoui, which was wrong to begin with, as ftol2 converts to a signed i64, producing wrong results for values >= 2^63.
  Patch by: mitch.l.bodart@intel.com
  Differential Revision: http://reviews.llvm.org/D11316
  llvm-svn: 245924
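One common way to build an unsigned 64-bit conversion out of a signed one, shown here as a Python model rather than the actual x86 sequence from the patch: values below 2^63 convert directly, while larger values are shifted down by 2^63, converted as signed, and get the top bit restored. The same model shows why a signed-only helper such as ftol2 cannot handle inputs >= 2^63. Function names are hypothetical.

```python
TWO_63 = 2.0 ** 63


def fptosi_i64(x):
    """Model a signed float -> i64 conversion (truncates toward zero)."""
    if not -TWO_63 <= x < TWO_63:
        # A signed-only conversion cannot represent this value.
        raise OverflowError("%r is out of range for a signed i64" % x)
    return int(x)


def fptoui_i64(x):
    """Model an unsigned float -> u64 conversion for 0 <= x < 2**64."""
    if x < TWO_63:
        return fptosi_i64(x)
    # x - 2**63 is exact here because x lies in [2**63, 2**64).
    return fptosi_i64(x - TWO_63) ^ (1 << 63)


print(fptoui_i64(3.9))   # 3
print(fptoui_i64(1e19))  # 10000000000000000000
```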
* Pass function attributes instead of boolean in isIntDivCheap(). | Steve King | 2015-08-25 | 2 | -2/+4
  llvm-svn: 245921
* Revert "Fix LLVM C API for DataLayout" | Mehdi Amini | 2015-08-25 | 1 | -8/+22
  This reverts commit 433bfd94e4b7e3cc3f8b08f8513ce47817941b0c.
  It broke a bot; I have to see why it passed locally.
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 245917
* Fix LLVM C API for DataLayout | Mehdi Amini | 2015-08-25 | 1 | -22/+8
  We removed access to the DataLayout on the TargetMachine and deprecated the C API function LLVMGetTargetMachineData() in r243114. However, the way I tried to stay backward compatible was broken: I changed the wrapper of the TargetMachine to be a structure that includes the DataLayout as well. But the TargetMachine is also wrapped by the ExecutionEngine, in the more classic way, so a client using the TargetMachine wrapped by the ExecutionEngine and trying to get the DataLayout would break.
  It seems tricky to solve the problem completely in the C API implementation. This patch instead addresses backward compatibility in a lighter way, in the C++ API. The C API is restored to its original state, and the removed C++ API is reintroduced, but privately. The C API is friended to the TargetMachine and should be the only consumer of this API.
  Reviewers: ributzka
  Differential Revision: http://reviews.llvm.org/D12263
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 245916
* [PowerPC] PPCVSXFMAMutate should ignore trivial-copy addends | Hal Finkel | 2015-08-24 | 1 | -5/+8
  We might end up with a trivial copy as the addend, and if so, we should ignore the corresponding FMA instruction. The trivial copy can be coalesced away later, so there's nothing to do here. We should not, however, assert.
  Fixes PR24544.
  llvm-svn: 245907
* MachineBasicBlock: Add liveins() method returning an iterator_range | Matthias Braun | 2015-08-24 | 4 | -22/+13
  llvm-svn: 245895
* [WebAssembly] DYNAMIC_STACKALLOC returns a pointer. | Dan Gohman | 2015-08-24 | 1 | -1/+1
  llvm-svn: 245893
* WebAssembly: Implement call | JF Bastien | 2015-08-24 | 10 | -36/+163
  Summary: Support function calls.
  Reviewers: sunfish, sunfishcode
  Subscribers: sunfishcode, jfb, llvm-commits
  Differential revision: http://reviews.llvm.org/D12219
  llvm-svn: 245887
* Revert two bad commits. | JF Bastien | 2015-08-24 | 8 | -96/+24
  Summary: I forgot to squash git commits before doing an svn dcommit of D12219. Reverting, and re-submitting.
  Subscribers: jfb, llvm-commits
  Differential Revision: http://reviews.llvm.org/D12298
  llvm-svn: 245886
* Missing print. | JF Bastien | 2015-08-24 | 2 | -13/+14
  llvm-svn: 245883
* call | JF Bastien | 2015-08-24 | 7 | -8/+144
  llvm-svn: 245882
* [WebAssembly] Make the assembly printer indent instructions. | Dan Gohman | 2015-08-24 | 1 | -0/+2
  llvm-svn: 245875
* [WebAssembly] CodeGen support for __builtin_wasm_page_size() | Dan Gohman | 2015-08-24 | 2 | -1/+8
  llvm-svn: 245872
* [PPC64LE] Fix PR24546 - Swap optimization and debug values | Bill Schmidt | 2015-08-24 | 1 | -0/+3
  This patch fixes PR24546, which demonstrates a segfault during the VSX swap removal pass. The problem is that debug value instructions were not excluded from the list of instructions to be analyzed for webs of related computation.
  I've added the test case from the PR as a crash test in test/CodeGen/PowerPC.
  llvm-svn: 245862
* [WebAssembly] Skeleton FastISel support | Dan Gohman | 2015-08-24 | 5 | -0/+97
  llvm-svn: 245860
* [WebAssembly] Implement floating point rounding operators. | Dan Gohman | 2015-08-24 | 2 | -12/+16
  llvm-svn: 245859
* [WebAssembly] Tell TargetTransformInfo about popcnt and sqrt. | Dan Gohman | 2015-08-24 | 2 | -4/+10
  llvm-svn: 245853
* [WebAssembly] Use the checked form of MachineFunction::getSubtarget. NFC. | Dan Gohman | 2015-08-24 | 2 | -4/+3
  llvm-svn: 245852
* [WebAssembly] Implement the is_zero_undef forms of cttz and ctlz | Dan Gohman | 2015-08-24 | 1 | -0/+6
  llvm-svn: 245851
* [X86] Add support for mmword memory operand size for Intel-syntax x86 assembly | Michael Zuckerman | 2015-08-24 | 1 | -1/+1
  Differential Revision: http://reviews.llvm.org/D12151
  llvm-svn: 245835
* [ARM] Use AEABI helpers for i64 div and rem | Scott Douglass | 2015-08-24 | 2 | -5/+59
  Differential Revision: http://reviews.llvm.org/D12232
  llvm-svn: 245830
* [ARM] Refactor LowerDivRem before adding LowerREM (nfc) | Scott Douglass | 2015-08-24 | 1 | -17/+36
  Differential Revision: http://reviews.llvm.org/D12230
  llvm-svn: 245829
* first commit to llvm | Michael Zuckerman | 2015-08-24 | 1 | -0/+1
  llvm-svn: 245825
* Add missing break in AArch64DAGToDAGISel::Select() switch case | Mehdi Amini | 2015-08-23 | 1 | -0/+1
  Reported by Coverity.
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 245800
* [NVPTX] Allow undef value as global initializer | Jingyue Wu | 2015-08-22 | 1 | -3/+5
  Summary: A __shared__ variable may now emit an undef value as its initializer; do not throw an error on that.
  Test Plan: test/CodeGen/NVPTX/global-addrspace.ll
  Patch by Xuetian Weng
  Reviewers: jholewinski, tra, jingyue
  Subscribers: llvm-commits, jholewinski
  Differential Revision: http://reviews.llvm.org/D12242
  llvm-svn: 245785
* AMDGPU: Allow specifying different opcode on VI for SMRD/SMEM | Matt Arsenault | 2015-08-22 | 2 | -15/+21
  Although the basic s_load_* instructions happen to use the same opcode, some of the special case SMRD instructions have different opcodes.
  llvm-svn: 245775
* AMDGPU: Improve accuracy of instruction rates for some FP instructions | Matt Arsenault | 2015-08-22 | 2 | -7/+27
  llvm-svn: 245774
* AMDGPU: Use DFS to avoid second loop over function | Matt Arsenault | 2015-08-22 | 1 | -15/+13
  llvm-svn: 245772
* AMDGPU: Make sure to run verifier after SIFixSGPRLiveRanges | Matt Arsenault | 2015-08-22 | 1 | -1/+1
  llvm-svn: 245769
* AMDGPU: Improve debug printing in SIFixSGPRLiveRanges | Matt Arsenault | 2015-08-22 | 1 | -6/+15
  llvm-svn: 245768
* AMDGPU: Move CI instructions into CIInstructions.td | Matt Arsenault | 2015-08-22 | 2 | -70/+69
  There are still a couple of CI patterns left in SIInstructions.
  llvm-svn: 245767
* AMDGPU: Minor cleanups to help with f16 support | Matt Arsenault | 2015-08-21 | 1 | -9/+11
  The main change is inverting the condition for the operand class classes so that VT.Size == 16 uses VGPR_32 instead of 64.
  llvm-svn: 245764
* AMDGPU/SI: Better handle s_wait insertion | Tom Stellard | 2015-08-21 | 1 | -2/+5
  We can wait on either VM, EXP or LGKM. The waits are independent. Without this patch, a wait inserted because of one of them would also wait for all the previous others. This patch makes s_wait only wait for the ones we need for the next instruction.
  Here's an example of a subtle performance reduction this patch solves. This is without the patch:
      buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
      buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
      s_load_dwordx4 s[44:47], s[8:9], 0xc
      s_waitcnt lgkmcnt(0)
      buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
      s_load_dwordx4 s[48:51], s[8:9], 0x10
      s_waitcnt vmcnt(1)
      buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen
  The s_waitcnt vmcnt(1) is useless. It is added because the last buffer_load_format_xyzw needs s[44:47], which is written by the first s_load_dwordx4, so it also waits for all VM operations issued before that s_load_dwordx4 to finish.
  Internally, three counters (for VM, EXP and LGKM) are updated after every instruction. For example, buffer_load_format_xyzw increases the VM counter, and s_load_dwordx4 the LGKM one. Without the patch, for every defined register, the current values of all three counters are stored, and they are used to know how long to wait when an instruction needs the register. Because of that, the counters stored for s[44:47] record that using the register requires waiting for the previous buffer_load_format_xyzw.
  Instead, this patch stores only the counters that matter for the register, and puts zero for the other ones, since we don't need any wait for them.
  Patch by: Axel Davy
  Differential Revision: http://reviews.llvm.org/D11883
  llvm-svn: 245755
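The difference between the old and new bookkeeping can be modeled in a few lines of Python. This is a simplified sketch, not the actual wait-insertion pass: it assumes each counter's operations complete in order, tracks operations by sequence number, and reports for each counter how many operations may still be outstanding (the N in vmcnt(N) or lgkmcnt(N)). The class, method, and register-key names are hypothetical.

```python
COUNTERS = ("vm", "exp", "lgkm")


class WaitcntModel:
    """Hypothetical model of s_waitcnt insertion with three independent counters."""

    def __init__(self, per_counter):
        self.per_counter = per_counter               # True: behavior after the patch
        self.issued = dict.fromkeys(COUNTERS, 0)     # operations issued per counter
        self.completed = dict.fromkeys(COUNTERS, 0)  # operations known to be done
        self.pending = {}                            # register -> {counter: sequence no.}

    def issue(self, counter, defs=()):
        """Issue an instruction on `counter` that defines the registers in `defs`."""
        self.issued[counter] += 1
        for reg in defs:
            if self.per_counter:
                # After the patch: only the counter that produces the register matters.
                self.pending[reg] = {counter: self.issued[counter]}
            else:
                # Before the patch: snapshot all three counters.
                self.pending[reg] = dict(self.issued)

    def wait_for(self, uses):
        """Return {counter: allowed outstanding ops} needed before reading `uses`."""
        needed = {}
        for reg in uses:
            for counter, seq in self.pending.get(reg, {}).items():
                if seq > self.completed[counter]:
                    needed[counter] = max(needed.get(counter, 0), seq)
        waits = {}
        for counter, seq in needed.items():
            waits[counter] = self.issued[counter] - seq
            self.completed[counter] = seq  # in-order completion: ops 1..seq are done
        return waits


def example(per_counter):
    m = WaitcntModel(per_counter)
    m.issue("vm", defs=["v[8:11]"])     # buffer_load_format_xyzw
    m.issue("vm", defs=["v[12:15]"])    # buffer_load_format_xyzw
    m.issue("lgkm", defs=["s[44:47]"])  # s_load_dwordx4
    m.issue("vm", defs=["v[16:19]"])    # buffer_load_format_xyzw
    return m.wait_for(["s[44:47]"])     # the next load reads s[44:47]


print(example(per_counter=False))  # {'vm': 1, 'lgkm': 0}  (includes the useless vmcnt(1))
print(example(per_counter=True))   # {'lgkm': 0}
```

The model does not reproduce the explicit lgkmcnt(0) that appears earlier in the listing; it only shows how snapshotting all counters drags an unrelated VM wait into the s[44:47] dependency.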
* [ARM] Fix MachO CPU Subtype selection | Vedant Kumar | 2015-08-21 | 1 | -12/+35
  Differential Revision: http://reviews.llvm.org/D12040
  llvm-svn: 245744
* [PowerPC] PPCVSXFMAMutate should not segfault on undef input registers | Hal Finkel | 2015-08-21 | 1 | -0/+5
  When PPCVSXFMAMutate would look at the input addend register, it would get its input value number. This would fail, however, if the register was undef, causing a segfault. Don't segfault (just skip such FMA instructions).
  Fixes the test case from PR24542 (although that may have been over-reduced).
  llvm-svn: 245741
* [x86] enable machine combiner reassociations for 256-bit vector min/max | Sanjay Patel | 2015-08-21 | 1 | -0/+4
  llvm-svn: 245735
* remove 'FeatureSlowUAMem' from AMD CPUs based on 10H micro-arch or later | Sanjay Patel | 2015-08-21 | 1 | -11/+7
  See the discussion in D12154 (http://reviews.llvm.org/D12154), AMD Software Optimization Guides for 10H/12H/15H/16H, and Agner Fog's experimental data.
  llvm-svn: 245733
* [x86] invert logic for attribute 'FeatureFastUAMem' | Sanjay Patel | 2015-08-21 | 5 | -89/+98
  This is a 'no functional change intended' patch. It removes one FIXME, but adds several more.
  Motivation: the FeatureFastUAMem attribute may be too general. It is used to determine if any misaligned memory access smaller than 32 bytes is 'fast'. From the added FIXME comments, however, you can see that we're not consistent about this. Changing the name of the attribute makes the logic holes easier to see.
  Changing this to a 'slow' attribute also means we don't have to add an explicit 'fast' attribute to new chips; fast unaligned accesses have been standard for several generations of CPUs now.
  Differential Revision: http://reviews.llvm.org/D12154
  llvm-svn: 245729
* [x86] enable machine combiner reassociations for 128-bit vector min/max | Sanjay Patel | 2015-08-21 | 1 | -0/+8
  llvm-svn: 245715
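Why reassociation helps: a chain like max(max(max(a, b), c), d) has three dependent operations, while max(max(a, b), max(c, d)) computes the same result with a critical path of two, so two max instructions can run in parallel. The Python sketch below, with hypothetical helper names, compares the two shapes on scalar values. It illustrates the idea only; the machine combiner rewrites small local patterns and checks instruction latencies rather than rebuilding whole trees.

```python
def serial(xs):
    """Left-leaning chain: max(max(max(a, b), c), d)."""
    expr = xs[0]
    for x in xs[1:]:
        expr = ("max", expr, x)
    return expr


def balanced(xs):
    """Reassociated tree; valid because max is associative."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return ("max", balanced(xs[:mid]), balanced(xs[mid:]))


def depth(expr):
    """Length of the longest chain of dependent max operations."""
    if not isinstance(expr, tuple):
        return 0
    return 1 + max(depth(expr[1]), depth(expr[2]))


def evaluate(expr, env):
    """Evaluate an expression tree given values for its leaf names."""
    if not isinstance(expr, tuple):
        return env[expr]
    return max(evaluate(expr[1], env), evaluate(expr[2], env))


names = ["a", "b", "c", "d"]
env = {"a": 3, "b": 9, "c": -2, "d": 5}
print(depth(serial(names)), depth(balanced(names)))                    # 3 2
print(evaluate(serial(names), env) == evaluate(balanced(names), env))  # True
```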
* Fix typo - symetric -> symmetric. | Eric Christopher | 2015-08-21 | 1 | -1/+1
  llvm-svn: 245705