summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* Improve ARM lowering for "icmp <2 x i64> eq".Eli Friedman2016-10-181-6/+21
| | | | | | | | | The custom lowering is pretty straightforward: basically, just AND together the two halves of a <4 x i32> compare. Differential Revision: https://reviews.llvm.org/D25713 llvm-svn: 284536
* [AArch64] Avoid materializing 0.0 when generating FP SELECTEvandro Menezes2016-10-181-0/+19
| | | | | | | | | | | Transform `a == 0.0 ? 0.0 : x` to `a == 0.0 ? a : x` and `a != 0.0 ? x : 0.0` to `a != 0.0 ? x : a` to avoid materializing 0.0 for FCSEL, since it does not have to be materialized beforehand for FCMP, as it has a form that has 0.0 as an implicit operand. Differential Revision: https://reviews.llvm.org/D24808 llvm-svn: 284531
* GlobalISel: select small binary operations on AArch64.Tim Northover2016-10-181-4/+9
| | | | | | | | AArch64 actually supports many 8-bit operations under the definition used by GlobalISel: the designated information-carrying bits of a GPR32 get the right value if you just use the normal 32-bit instruction. llvm-svn: 284526
* GlobalISel: support floating-point constants on AArch64.Tim Northover2016-10-181-7/+74
| | | | | | Patch from Ahmed Bougacha. llvm-svn: 284523
* [Hexagon] Handle block live-ins with lane masks in HexagonBlockRangesKrzysztof Parzyszek2016-10-182-11/+28
| | | | llvm-svn: 284522
* Reduce global namespace pollution. NFC.Benjamin Kramer2016-10-185-7/+12
| | | | llvm-svn: 284521
* revert r284495: [Target] remove TargetRecip classSanjay Patel2016-10-188-63/+320
| | | | | | There's something wrong with the StringRef usage while parsing the attribute string. llvm-svn: 284513
* [Target] remove TargetRecip class; move reciprocal estimate isel ↵Sanjay Patel2016-10-188-320/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | functionality to TargetLowering This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 llvm-svn: 284495
* [X86][AVX512] Add mask/maskz writemask support to constant pool shuffle ↵Simon Pilgrim2016-10-181-24/+36
| | | | | | decode commentx llvm-svn: 284488
* [mips][ias] Handle more complicated expressions for memory operandsSimon Dardis2016-10-181-2/+54
| | | | | | | | | | | | | | | | This patch teaches ias for mips to handle expressions such as (8*4)+(8*31)($sp). Such expression typically occur from the expansion of multiple macro definitions. This partially resolves PR/30383. Thanks to Sean Bruno for reporting the issue! Reviewers: zoran.jovanovic, vkalintiris Differential Revision: https://reviews.llvm.org/D24667 llvm-svn: 284485
* [mips] Fix sync instruction definitionSimon Dardis2016-10-182-2/+8
| | | | | | | | | | | | | | | | | | | | | | | The 'sync' instruction for MIPS was defined in MIPS-II as taking no operands. MIPS32 extended the define of 'sync' as taking an optional unsigned 5 bit immediate. This patch correct the definition of sync so that it is accepted with an operand of 0 or no operand for MIPS-II to MIPS-V, and a 5 bit unsigned immediate for MIPS32 and later revisions. Additionally a clear error is given when the MIPS32 version of sync is used when targeting pre MIPS32. This partially resolves PR/30714. Thanks to Daniel Sanders for reporting this issue! Reveiwers: vkalintiris Differential Revision: https://reviews.llvm.org/D25672 llvm-svn: 284483
* [mips] Macro expansion for ld, sd for O32Simon Dardis2016-10-182-0/+106
| | | | | | | | | | | | | | | | | | | | | | ld and sd when assembled for the O32 ABI expand to a pair of 32 bit word loads or stores using the specified source or destination register and the next register. This patch does not add support for the cases where the offset is greater than a 16 bit signed immediate as that would lead to a wrong/misleading error message as the assembler would report "instruction requires a CPU feature not currently enabled" for ld & sd for MIPS64 when their offset is not a signed 16 bit number. This fixes PR/29159. Thanks to Sean Bruno for reporting this issue! Reviewers: vkalintiris, seanbruno, zoran.jovanovic Differential Review: https://reviews.llvm.org/D24556 llvm-svn: 284481
* [x86][inline-asm][avx512] allow swapping of '{k<num>}' & '{z}' marksMichael Zuckerman2016-10-181-25/+65
| | | | | | | | | | | | | | | | | | | | | | | Committing on behalf of Coby Tayree: After check-all and LGTM Desc: AVX512 allows dest operand to be followed by an op-mask register specifier ('{k<num>}', which in turn may be followed by a merging/zeroing specifier ('{z}') Currently, the following forms are allowed: {k<num>} {k<num>}{z} This patch allows the following forms: {z}{k<num>} and ignores the next form: {z} Justification would be quite simple - GCC Differential Revision: http://reviews.llvm.org/D25013 llvm-svn: 284479
* [mips][FastISel] Instantiate the MipsFastISel class only for targets that ↵Vasileios Kalintiris2016-10-182-22/+13
| | | | | | | | | | | | | | | | | | support FastISel. Summary: Instead of instantiating the MipsFastISel class and checking if the target is supported in the overriden methods, we should perform that check before creating the class. This allows us to enable FastISel *only* for targets that truly support it, ie. MIPS32 to MIPS32R5. Reviewers: sdardis Subscribers: ehostunreach, llvm-commits Differential Revision: https://reviews.llvm.org/D24824 llvm-svn: 284475
* [ARM] Assign cost of scaling for Cortex-R52Javed Absar2016-10-181-1/+2
| | | | | | | | | | | | This patch assigns cost of the scaling used in addressing for Cortex-R52. On Cortex-R52 a negated register offset takes longer than a non-negated register offset, in a register-offset addressing mode. Differential Revision: http://reviews.llvm.org/D25670 Reviewer: jmolloy llvm-svn: 284460
* [X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32Simon Pilgrim2016-10-186-1/+64
| | | | | | | | | | | | As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types. This patch adds support for fptosi to 2i32 from both 2f64 and 2f32. It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch. Differential Revision: https://reviews.llvm.org/D23808 llvm-svn: 284459
* [XRay] Support for for tail calls for ARM no-ThumbDean Michael Berris2016-10-183-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds simplified support for tail calls on ARM with XRay instrumentation. Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall -std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my specific flags like --target=armv7-linux-gnueabihf etc.), the following program #include <cstdio> #include <cassert> #include <xray/xray_interface.h> [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() { std::printf("In fC()\n"); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() { std::printf("In fB()\n"); fC(); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() { std::printf("In fA()\n"); fB(); } // Avoid infinite recursion in case the logging function is instrumented (so calls logging // function again). [[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret) { printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret)); } int main(int argc, char* argv[]) { __xray_set_handler(simplyPrint); printf("Patching...\n"); __xray_patch(); fA(); printf("Unpatching...\n"); __xray_unpatch(); fA(); return 0; } gives the following output: Patching... XRay: functionId=3 type=0. In fA() XRay: functionId=3 type=1. XRay: functionId=2 type=0. In fB() XRay: functionId=2 type=1. XRay: functionId=1 type=0. XRay: functionId=1 type=1. In fC() Unpatching... In fA() In fB() In fC() So for function fC() the exit sled seems to be called too much before function exit: before printing In fC(). Debugging shows that the above happens because printf from fC is also called as a tail call. So first the exit sled of fC is executed, and only then printf is jumped into. So it seems we can't do anything about this with the current approach (i.e. within the simplification described in https://reviews.llvm.org/D23988 ). Differential Revision: https://reviews.llvm.org/D25030 llvm-svn: 284456
* [X86] Fix DecodeVPERMVMask to handle cases where the constant pool entry has ↵Craig Topper2016-10-183-30/+24
| | | | | | | | a different type than the shuffle itself. This is especially important for 32-bit targets with 64-bit shuffle elements. llvm-svn: 284453
* [AVX-512] Fix DecodeVPERMV3Mask to handle cases where the constant pool ↵Craig Topper2016-10-183-20/+27
| | | | | | | | | | | | | | entry has a different type than the shuffle itself. Summary: This is especially important for 32-bit targets with 64-bit shuffle elements.This is similar to how PSHUFB and VPERMIL handle the same problem. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25666 llvm-svn: 284451
* [AVX-512] Add support for decoding shuffle mask from constant pool for ↵Craig Topper2016-10-181-24/+53
| | | | | | masked VPERMILPS/PD. llvm-svn: 284450
* [AMDGPU] Mark .note section SHF_ALLOC so lld creates a segment for itKonstantin Zhuravlyov2016-10-171-2/+4
| | | | | | Differential Revision: https://reviews.llvm.org/D25694 llvm-svn: 284435
* GlobalISel: support wider range of load/store sizes in AArch64.Tim Northover2016-10-171-0/+8
| | | | llvm-svn: 284406
* AMDGPU/SI: LowerParameter() should be computing align based on memory typeTom Stellard2016-10-171-1/+1
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25203 llvm-svn: 284398
* AMDGPU/SI: Fix LowerParameter() for i16 argumentsTom Stellard2016-10-171-10/+18
| | | | | | | | | | | | | | Summary: If we are loading an i16 value from a 32-bit memory location, then we need to be able to truncate the loaded value to i16. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25198 llvm-svn: 284397
* [X86] Fix shuffle decoding assertions to print the right number of required ↵Craig Topper2016-10-171-8/+8
| | | | | | operands. Update the checks themselves to be >= to the same number instead of > one less than the required number. llvm-svn: 284365
* [AVX-512] Add shuffle combining support for vpermi2var shuffles derived from ↵Craig Topper2016-10-171-0/+14
| | | | | | existing support for vpermt2var. llvm-svn: 284357
* [AVX-512] Add support for turning a 256-bit load that goes to both halfs of ↵Craig Topper2016-10-162-13/+69
| | | | | | | | an insert_subvector into a subvector broadcast. Differential Revision: https://reviews.llvm.org/D25650 llvm-svn: 284353
* [AVX-512] Fix the operand order for vpermi2var_qi intrinsics to match the ↵Craig Topper2016-10-161-3/+3
| | | | | | other vpermi2var intrinsics. llvm-svn: 284329
* [AVX-512] Correct execution domain for VPERMT2PS and VPERMI2PS.Craig Topper2016-10-161-4/+4
| | | | llvm-svn: 284328
* [AVX-512] Move (v4i64 (X86SubVBroadcast (v2i64))) alternate patterns under a ↵Craig Topper2016-10-161-9/+9
| | | | | | HasVLX predicate. Similar for floating point. llvm-svn: 284327
* [ArmFastISel] Kill dead code. NFCI.Davide Italiano2016-10-161-35/+0
| | | | llvm-svn: 284320
* [MachineMemOperand] Move synchronization scope and atomic orderings from ↵Konstantin Zhuravlyov2016-10-152-7/+3
| | | | | | | | SDNode to MachineMemOperand, and remove redundant getAtomic* member functions from SelectionDAG. Differential Revision: https://reviews.llvm.org/D24577 llvm-svn: 284312
* [AVX-512] Add shuffle comments for vbroadcast instructions.Craig Topper2016-10-151-0/+43
| | | | llvm-svn: 284305
* [AVX-512] Rename VPBROADCASTI32X2 and VPBROADCASTF32X2 instruction classes ↵Craig Topper2016-10-151-4/+4
| | | | | | to match the mnemonic which does not include a 'P'. llvm-svn: 284304
* AMDGPU/SI: Handle s_getreg hazard in GCNHazardRecognizerTom Stellard2016-10-152-0/+49
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25526 llvm-svn: 284298
* GlobalISel: rename legalizer components to match others.Tim Northover2016-10-146-17/+17
| | | | | | | | | | The previous names were both misleading (the MachineLegalizer actually contained the info tables) and inconsistent with the selector & translator (in having a "Machine") prefix. This should make everything sensible again. The only functional change is the name of a couple of command-line options. llvm-svn: 284287
* [PPC] Shorter sequence to load 64bit constant with same hi/lo wordsGuozhi Wei2016-10-141-2/+23
| | | | | | | | | | | | This is a patch to implement pr30640. When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into high word of the same register. This optimization caused failure of test case bperm.ll because of not optimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and OR them together. This optimization lowers the cost of loading 64bit constant mask used in AND method, and causes different code sequence. But actually ROTATE method is better in this test case. The reason is in ROTATE method the final OR operation can be avoided since rldimi can insert the rotated bits into target register directly. So this patch also enhances SelectAndParts64 to prefer ROTATE method when the two methods have same cost and there are multiple bit groups need to be ORed together. Differential Revision: https://reviews.llvm.org/D25521 llvm-svn: 284276
* AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operationsTom Stellard2016-10-141-13/+10
| | | | | | | | | | | | | Summary: We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff) Reviewers: arsenm, nhaehnle Subscribers: tony-tye, arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D24672 llvm-svn: 284267
* The real fix for post-r284255 failuresKrzysztof Parzyszek2016-10-142-3/+2
| | | | llvm-svn: 284264
* Workaround to eliminate check-llvm failures after r284255Krzysztof Parzyszek2016-10-141-0/+1
| | | | llvm-svn: 284262
* Add a pass to optimize patterns of vectorized interleaved memory accesses forDavid L Kreitzer2016-10-144-0/+129
| | | | | | | | | | | | | X86. The pass optimizes as a unit the entire wide load + shuffles pattern produced by interleaved vectorization. This initial patch optimizes one pattern (64-bit elements interleaved by a factor of 4). Future patches will generalize to additional patterns. Patch by Farhana Aleen Differential revision: http://reviews.llvm.org/D24681 llvm-svn: 284260
* AMDGPU/SI: Don't allow unaligned scratch accessTom Stellard2016-10-144-0/+21
| | | | | | | | | | | | Summary: The hardware doesn't support this. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25523 llvm-svn: 284257
* [RDF] Switch RegisterRef to be a pair (Register, LaneMask)Krzysztof Parzyszek2016-10-148-255/+236
| | | | | | | | | Use PackedRegisterRef to store the register information in the graph nodes. This commit also removes support for virtual registers. It has never been tested or used. It will be possible to add it back if there is a need. llvm-svn: 284255
* [safestack] Use non-thread-local unsafe stack pointer for Contiki OSDavid L Kreitzer2016-10-141-0/+3
| | | | | | | | Patch by Michael LeMay Differential revision: http://reviews.llvm.org/D19852 llvm-svn: 284254
* Revert "In preparation for removing getNameWithPrefix off ofEric Christopher2016-10-141-1/+7
| | | | | | | | | TargetMachine," as it's causing sanitizer/memory issues until I can track down this set. This reverts commit r284203 llvm-svn: 284252
* [X86] Take advantage of the lzcnt instruction on btver2 architectures when ↵Pierre Gousseau2016-10-146-0/+129
| | | | | | | | | | | | | | | | ORing comparisons to zero. This change adds transformations such as: zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0)))) To: srl(or(ctlz(x), ctlz(y)), log2(bitsize(x)) This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput. Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it. For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar. Differential Revision: https://reviews.llvm.org/D23446 llvm-svn: 284248
* AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodesNicolai Haehnle2016-10-141-10/+37
| | | | | | | | | | | | | | Summary: This will be used for 64-bit MULHU, which is in turn used for the 64-bit divide-by-constant optimization (see D24822). Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25289 llvm-svn: 284224
* [mips] Fix aui/daui/dahi/dati for MIPSR6Simon Dardis2016-10-147-17/+52
| | | | | | | | | | | | For compatiblity with binutils, define these instructions to take two registers with a 16bit unsigned immediate. Both of the registers have to be same for dahi and dati. Reviewers: dsanders, zoran.jovanovic Differential Review: https://reviews.llvm.org/D21473 llvm-svn: 284218
* AMDGPU: Fix use-after-freesNicolai Haehnle2016-10-142-15/+16
| | | | | | | | | | Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25312 llvm-svn: 284215
* [x86][ms-inline-asm] use of "jmp short" in asm is not supportedMichael Zuckerman2016-10-141-0/+14
| | | | | | | | | | | | | | | | | | Committing in the name of Ziv Izhar: After check-all and LGTM . The following patch is for compatability with Microsoft. Microsoft ignores the keyword "short" when used after a jmp, for example: __asm { jmp short label label: } A test for that patch will be added in another patch, since it's located in clang's codegen tests. Link will be added shortly. link to test: https://reviews.llvm.org/D24958 Differential Revision: https://reviews.llvm.org/D24957 llvm-svn: 284211
OpenPOWER on IntegriCloud