path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [ARM] GlobalISel: Allow i8 and i16 adds | Diana Picus | 2016-12-19 | 1 | -1/+6
| | | | | | | | | Teach the instruction selector and legalizer that it's ok to have adds with 8 or 16-bit integers. This is the second part of https://reviews.llvm.org/D27704 llvm-svn: 290105
* [ARM] GlobalISel: Select i8 and i16 copies | Diana Picus | 2016-12-19 | 1 | -2/+9
| | | | | | | | | Teach the instruction selector that it's ok to copy small values from physical registers. First part of https://reviews.llvm.org/D27704 llvm-svn: 290104
* [Power9] Processor Model for Scheduling | Ehsan Amiri | 2016-12-19 | 4 | -3/+1145
| | | | | | | | PWR9 processor model for instruction scheduling. A subsequent patch will migrate PWR9 to Post RA MIScheduler. https://reviews.llvm.org/D24525 llvm-svn: 290102
* [Hexagon] Restore minimum profit check accidentally changed in r290024 | Malcolm Parsons | 2016-12-19 | 1 | -2/+2
| | | | llvm-svn: 290100
* [ARM] GlobalISel: Lower more than 4 arguments | Diana Picus | 2016-12-19 | 1 | -10/+22
| | | | | | | | | | This adds support for lowering more than 4 arguments (although still i32 only). It uses the handleAssignments / ValueHandler infrastructure extracted from the AArch64 backend in r288658. Differential Revision: https://reviews.llvm.org/D27195 llvm-svn: 290098
* AMDGPU: [AMDGPU] Assembler: add .hsa_code_object_metadata directive for ↵ | Sam Kolton | 2016-12-19 | 4 | -72/+143
| functime metadata V2.0
|
| Summary: Added a pair of directives, .hsa_code_object_metadata and .end_hsa_code_object_metadata. Between them the user can put a YAML string that is copied directly into the generated note. E.g.:
|
| '''
| .hsa_code_object_metadata
| {
|   amd.MDVersion: [ 2, 0 ]
| }
| .end_hsa_code_object_metadata
| '''
|
| Based on D25046
| Reviewers: vpykhtin, nhaustov, yaxunl, tstellarAMD
| Subscribers: arsenm, kzhuravl, wdng, nhaehnle, mgorny, tony-tye
| Differential Revision: https://reviews.llvm.org/D27619
| llvm-svn: 290097
* [ARM] GlobalISel: Support loading from the stack | Diana Picus | 2016-12-19 | 3 | -10/+45
| | | | | | | | | | Add support for selecting simple G_LOAD and G_FRAME_INDEX instructions (32-bit scalars only). This will be useful for functions that need to pass arguments on the stack. First part of https://reviews.llvm.org/D27195. llvm-svn: 290096
* [X86] When recognizing vector loads or VZEXT_LOAD in selectScalarSSELoad ↵ | Craig Topper | 2016-12-19 | 1 | -2/+2
| | | | | | make sure we pass the load's user rather than load itself to the second operand of IsLegalToFold. llvm-svn: 290089
* [X86] Remove all of the patterns that use X86ISD:FAND/FXOR/FOR/FANDN except ↵ | Craig Topper | 2016-12-19 | 2 | -131/+42
| | | | | | | | for the ones needed for SSE1. Anything SSE2 or above uses the integer ISD opcode. This removes 11721 bytes from the DAG isel table or 2.2% llvm-svn: 290073
* Revert r289955 and r289962. This is causing lots of ASAN failures for us. | Daniel Jasper | 2016-12-18 | 1 | -22/+10
| | | | | | Not sure whether it causes an ASAN false positive, whether it actually leads to incorrect code, or whether it even exposes bad code. Hans, I'll get you instructions to reproduce this. llvm-svn: 290066
* [X86] [AVX512] Minor fix in encoding of scalar EVEX instructions. NFC. | Michael Zuckerman | 2016-12-18 | 1 | -3/+2
| | | | | | | | | | Commit on behalf of Gadi Haber. Removed the EVEX_V512 prefix from scalar EVEX instructions since HW ignores the L'L bits anyway (LIG). 4 instructions are modified. The changed encodings are validated with XED. Reviewers: delena, igorb Differential revision: https://reviews.llvm.org/D27802 llvm-svn: 290065
* [X86][SSE] Add support for combining target shuffles to SHUFPS. | Simon Pilgrim | 2016-12-18 | 1 | -2/+108
| | | | | | As discussed on D27692, the next step will be to allow cross-domain shuffles once the combined shuffle depth passes a certain point. llvm-svn: 290064
* [X86][SSE][AVX-512] Convert FAND/FOR/FXOR/FANDN nodes to integer operations ↵ | Craig Topper | 2016-12-18 | 1 | -13/+14
| | | | | | | | | | | | if they are available. This will allow a bunch of patterns to be removed. These nodes are only emitted for lowering FABS/FNEG/FNABS/FCOPYSIGN. Ideally we just wouldn't create these nodes if SSE2 or higher is available, but it was simple to just convert them in DAG combine. For SSE2, AVX, and AVX512 with DQI this is no functional change as the execution domain fixing pass ensures the right domain is selected regardless of the ISD opcode. For AVX-512 without DQI we end up using integer instructions since the floating point versions aren't available. But we were already doing that for any logical operations in code that didn't come from FABS/FNEG/FNABS/FCOPYSIGN so this seems no worse. And we get the benefit of being able to fold broadcasts now. llvm-svn: 290060
* [AVX-512] Use EVEX encoded XOR instruction for zeroing scalar registers when ↵ | Craig Topper | 2016-12-18 | 3 | -5/+22
| | | | | | | | DQI and VLX instructions are available. This can give the register allocator more registers to use. llvm-svn: 290057
* [AVX-512] Make sure VLX is also enabled before using EVEX encoded logic ops ↵ | Craig Topper | 2016-12-18 | 2 | -2/+2
| | | | | | for scalars. I missed this in r290049. llvm-svn: 290055
* [AVX-512] Use EVEX encoded logic operations for scalar types when they are ↵ | Craig Topper | 2016-12-17 | 2 | -1/+38
| | | | | | available. This gives the register allocator more registers to work with. llvm-svn: 290049
* Revert "AArch64CollectLOH: Rewrite as block-local analysis." | Matthias Braun | 2016-12-17 | 1 | -279/+841
| | | | | | | | It is still breaking Chrome. http://llvm.org/PR31361 This reverts commit r290026. llvm-svn: 290047
* [Hexagon] Other attempt to fix build with enabled asserts broken in 290024 ↵ | Eugene Zelenko | 2016-12-17 | 1 | -0/+1
| | | | | | (NFC). llvm-svn: 290028
* [Hexagon] Fix build with enabled asserts broken in 290024 (NFC). | Eugene Zelenko | 2016-12-17 | 1 | -0/+1
| | | | llvm-svn: 290027
* AArch64CollectLOH: Rewrite as block-local analysis. | Matthias Braun | 2016-12-17 | 1 | -841/+279
| | | | | | | | | | | | | | | | | Re-apply r288561: Liveness tracking should be correct now after r290014. Previously this pass was using up to 5% compile time in some cases which is a bit much for what it is doing. The pass featured a full blown data-flow analysis which in the default configuration was restricted to a single block. This rewrites the pass under the assumption that we only ever work on a single block. This is done in a single pass maintaining a state machine per general purpose register to catch LOH patterns. Differential Revision: https://reviews.llvm.org/D27329 llvm-svn: 290026
* [Hexagon] Fix some Clang-tidy modernize and Include What You Use warnings; ↵ | Eugene Zelenko | 2016-12-17 | 11 | -163/+220
| | | | | | other minor fixes (NFC). llvm-svn: 290024
* AArch64: Enable post-ra liveness updates | Matthias Braun | 2016-12-16 | 3 | -1/+13
| | | | | | Differential Revision: https://reviews.llvm.org/D27559 llvm-svn: 290014
* Implement LaneBitmask::any(), use it to replace !none(), NFCI | Krzysztof Parzyszek | 2016-12-16 | 5 | -11/+11
| | | | llvm-svn: 289974
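The idea behind replacing `!none()` with `any()` can be sketched with a simplified stand-in type. `LaneMask` and `hasLiveLanes` below are hypothetical names invented for illustration; this is not LLVM's actual `LaneBitmask` class, only an assumption about the general shape of such a helper.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical stand-in for a lane bitmask (not LLVM's class).
struct LaneMask {
  std::uint32_t bits;

  explicit LaneMask(std::uint32_t b = 0) : bits(b) {}

  // True when no lane bit is set.
  bool none() const { return bits == 0; }

  // True when at least one lane bit is set; by definition equal to !none().
  bool any() const { return bits != 0; }
};

// Call sites read as a positive question instead of a negated one.
bool hasLiveLanes(const LaneMask &m) {
  assert(m.any() == !m.none() && "any() must be the negation of none()");
  return m.any();
}
```

Because `any()` is defined as the exact negation of `none()`, swapping one for the other at call sites does not change behavior, which matches the commit's NFCI label.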
* [ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently. | Eli Friedman | 2016-12-16 | 3 | -19/+93
| | | | | | | | | | | | | | | | | | | | Currently, there are substantial problems forming vld1_dup even if the VDUP survives legalization. The lack of an actual node leads to terrible results: not only can we not form post-increment vld1_dup instructions, but we form scalar pre-increment and post-increment loads which force the loaded value into a GPR. This patch fixes that by combining the vdup+load into an ARMISD node before DAGCombine messes it up. Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable). Recommitting with a fix to avoid forming vld1dup if the type of the load doesn't match the type of the vdup (see https://llvm.org/bugs/show_bug.cgi?id=31404). Differential Revision: https://reviews.llvm.org/D27694 llvm-svn: 289972
* AMDGPU: Fix name for v_ashrrev_i16 | Matt Arsenault | 2016-12-16 | 1 | -3/+3
| | | | llvm-svn: 289967
* Fix -Wself-assign from r289955 | Hans Wennborg | 2016-12-16 | 1 | -7/+8
| | | | llvm-svn: 289962
* [X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, ↵ | Hans Wennborg | 2016-12-16 | 1 | -9/+20
| -C), COND) (PR31367)
|
| atomic_load_add returns the value before addition, but sets EFLAGS based on the result of the addition. That means it's setting the flags based on effectively subtracting C from the value at x, which is also what the outer cmp does.
|
| This targets a pattern that occurs frequently with reference counting pointers:
|
|     void decrement(long volatile *ptr) {
|       if (_InterlockedDecrement(ptr) == 0)
|         release();
|     }
|
| Clang would previously compile it (for 32-bit at -Os) as:
|
|     00000000 <?decrement@@YAXPCJ@Z>:
|        0: 8b 44 24 04          mov    0x4(%esp),%eax
|        4: 31 c9                xor    %ecx,%ecx
|        6: 49                   dec    %ecx
|        7: f0 0f c1 08          lock xadd %ecx,(%eax)
|        b: 83 f9 01             cmp    $0x1,%ecx
|        e: 0f 84 00 00 00 00    je     14 <?decrement@@YAXPCJ@Z+0x14>
|       14: c3                   ret
|
| and with this patch it becomes:
|
|     00000000 <?decrement@@YAXPCJ@Z>:
|        0: 8b 44 24 04          mov    0x4(%esp),%eax
|        4: f0 ff 08             lock decl (%eax)
|        7: 0f 84 00 00 00 00    je     d <?decrement@@YAXPCJ@Z+0xd>
|        d: c3                   ret
|
| (Equivalent variants with _InterlockedExchangeAdd, std::atomic<>'s fetch_add or pre-decrement operator generate the same code.)
|
| Differential Revision: https://reviews.llvm.org/D27781
| llvm-svn: 289955
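The commit message mentions that std::atomic<> variants follow the same pattern. A minimal portable sketch of that source-level shape is below; `RefCounted` and its fields are hypothetical names chosen for illustration, and nothing here asserts what code any particular compiler version emits for it.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference-counted object; names are illustrative only.
struct RefCounted {
  std::atomic<long> refs{1};
  bool released = false;
};

// Drops one reference and reports whether this call released the object.
// fetch_sub returns the value *before* the subtraction, so comparing the
// old value against 1 asks the same question as "is the new value zero?".
// This is the (cmp (atomic_load_add x, -C) C) shape with C == 1.
bool decrement(RefCounted &obj) {
  assert(obj.refs.load() > 0 && "decrement on an already released object");
  if (obj.refs.fetch_sub(1) == 1) {
    obj.released = true;
    return true;
  }
  return false;
}
```

Since the comparison only needs to know whether the result of the subtraction is zero, the flags set by the locked read-modify-write instruction already carry the answer, which is what makes the separate compare redundant.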
* [X86][AVX] Call lowerVectorShuffleWithSHUFPS directly instead of calling ↵ | Simon Pilgrim | 2016-12-16 | 1 | -3/+4
| | | | | | DAG.getVectorShuffle (PR27885) We've already done the hard work of ensuring the mask is safe for 'SHUFPS'. llvm-svn: 289950
* [X86][AVX512] use a single shufps for 512-bit vectors when it can save ↵ | Simon Pilgrim | 2016-12-16 | 1 | -1/+13
| | | | | | | | | | | | instructions This is the 512-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289946
* [GlobalISel] Silence unused variable warnings in Release builds. | Benjamin Kramer | 2016-12-16 | 1 | -5/+4
| | | | llvm-svn: 289941
* [ARM] GlobalISel: Select add i32, i32 | Diana Picus | 2016-12-16 | 7 | -9/+299
| | | | | | | | | | | | | Add the minimal support necessary to select a function that returns the sum of two i32 values. This includes some support for argument/return lowering of i32 values through registers, as well as the handling of copy and add instructions throughout the GlobalISel pipeline. Differential Revision: https://reviews.llvm.org/D26677 llvm-svn: 289940
* [X86][SSE] Combine shuffles to MOVSS/MOVSD whatever the domain. | Simon Pilgrim | 2016-12-16 | 1 | -9/+9
| | | | | | We already do the same thing in shuffle lowering; but don't do it if we have SSE41 (PBLEND) instead. llvm-svn: 289937
* [ARM] Expose methods to get the CCAssignFn. NFCI | Diana Picus | 2016-12-16 | 2 | -17/+21
| | | | | | | | | | | Add two public methods to ARMTargetLowering: CCAssignFnForCall and CCAssignFnForReturn, which are just calling the already existing private method CCAssignFnForNode. These will come in handy for GlobalISel on ARM. We also replace all calls to CCAssignFnForNode in ARMISelLowering.cpp, because the new methods are friendlier to the reader. llvm-svn: 289932
* Revert r289638: [PowerPC] Fix logic dealing with nop after calls (and ↵ | Chandler Carruth | 2016-12-16 | 1 | -25/+40
| | | | | | | | | | | | | tail-call eligibility) This patch appears to result in trampolines in vtables being miscompiled when they in turn tail call a method. I've posted some preliminary details about the failure on the thread for this commit and talked to Hal. He was comfortable going ahead and reverting until we sort out what is wrong. llvm-svn: 289928
* Revert 279703, it caused PR31404. | Nico Weber | 2016-12-16 | 3 | -92/+19
| | | | llvm-svn: 289923
* [Hexagon] Fix some Clang-tidy modernize and Include What You Use warnings; ↵ | Eugene Zelenko | 2016-12-16 | 6 | -151/+235
| | | | | | other minor fixes (NFC). llvm-svn: 289907
* [AArch64] Add FeatureSlowMisaligned128Store to Exynos M1 and M2 | Evandro Menezes | 2016-12-16 | 1 | -0/+2
| | | | | | | This feature now gates such stores after r289845. Thus the Exynos processors now need this feature. llvm-svn: 289898
* AMDGPU: Select branch on undef to uniform scc branch | Matt Arsenault | 2016-12-15 | 3 | -0/+21
| | | | llvm-svn: 289877
* AMDGPU: Fix asserting on returned tail calls | Matt Arsenault | 2016-12-15 | 1 | -2/+4
| | | | llvm-svn: 289868
* AMDGPU: Assembler support for vintrp instructions | Matt Arsenault | 2016-12-15 | 3 | -6/+108
| | | | llvm-svn: 289866
* [GlobalISel] Drop workaround for Legalizer member/class sharing a name. NFC. | Ahmed Bougacha | 2016-12-15 | 3 | -3/+3
| | | | | | | | MachineLegalizer used to be the name of both the class and the member, causing GCC errors. r276522 fixed that by renaming the member to just 'Legalizer'. The 'class' workaround isn't necessary anymore; drop it. llvm-svn: 289848
* [x86] use a single shufps for 256-bit vectors when it can save instructions | Sanjay Patel | 2016-12-15 | 1 | -1/+13
| | | | | | | | | | | This is the 256-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289846
* [AArch64] Guard Misaligned 128-bit store penalty by subtarget feature | Matthew Simpson | 2016-12-15 | 1 | -1/+2
| | | | | | | | | This patch checks that the SlowMisaligned128Store subtarget feature is set when penalizing such stores in getMemoryOpCost. Differential Revision: https://reviews.llvm.org/D27677 llvm-svn: 289845
* [AArch64][GlobalISel] Remove redundant RBI comments. NFC. | Ahmed Bougacha | 2016-12-15 | 1 | -20/+1
| | | | | It's brittle, and Doxygen already picks up the overridden method's comment anyway. llvm-svn: 289844
* [x86] use a single shufps when it can save instructions | Sanjay Patel | 2016-12-15 | 1 | -14/+19
| This is a tiny patch with a big pile of test changes.
|
| This partially fixes PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885
|
| My motivating case looks like this:
|
|     - vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2]
|     - vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
|     - vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
|     + vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]
|
| And this happens several times in the diffs. For chips with domain-crossing penalties, the instruction count and size reduction should usually overcome any potential domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so using shufps is a pure win.
|
| So the test case diffs all appear to be improvements except one test in vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate zero elements and one test in combine-sra.ll where multiple uses prevent the expected shuffle combining.
|
| Differential Revision: https://reviews.llvm.org/D27692
| llvm-svn: 289837
* [X86][SSE] Fix domains for scalar store instructions | Simon Pilgrim | 2016-12-15 | 1 | -0/+4
| | | | | | As discussed on D27692 llvm-svn: 289834
* [lanai] Simplify small section check in LowerGlobalAddress and treat ldata ↵ | Jacques Pienaar | 2016-12-15 | 2 | -3/+14
| | | | | | | | sections specially. Move the check for the code model into isGlobalInSmallSectionImpl and return false (not in small section) for variables placed in sections prefixed with .ldata (workaround for a tool limitation). llvm-svn: 289832
* [X86][AVX512] Moved instruction domain lookups to the right table. NFCI. | Simon Pilgrim | 2016-12-15 | 1 | -4/+4
| | | | | | Avoid duplicating instructions in the int32/int64 domains. llvm-svn: 289830
* [X86][SSE] Fix domains for VZEXT_LOAD type instructions | Simon Pilgrim | 2016-12-15 | 1 | -0/+6
| | | | | | | | Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions. Differential Revision: https://reviews.llvm.org/D27684 llvm-svn: 289825
* Fix for regression after Global Load Scalarization patch | Alexander Timofeev | 2016-12-15 | 1 | -1/+2
| | | | llvm-svn: 289822