summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Fix not expanding control flow after some kill blocksMatt Arsenault2016-07-151-7/+2
| | | | | | | | | | | | | Also stop trying to insert skip blocks at end_cf. This was inserting them at the end of the block which doesn't make sense. The skip should be inserted at the beginning of the block right after the end cf. Just remove this for now since no tests seem to stress this and I think this can be handled more generally later. Fixes bug 28550 llvm-svn: 275510
* AMDGPU: Fix trying to skip from a block with no successorsMatt Arsenault2016-07-151-2/+3
| | | | | | Found while reducing bug 28550 llvm-svn: 275509
* AMDGPU: Fix splitting kill blocks with defs before killMatt Arsenault2016-07-151-13/+3
| | | | llvm-svn: 275508
* [AArch64] Set COPY ZR isAsCheapAsAMove when needed.Haicheng Wu2016-07-151-2/+6
| | | | | | | | | If a subtarget has both ZCZeroing and CustomCheapAsMoveHandling features (now only Kryo has both), set COPY (W|X)ZR isAsCheapAsAMove. Differential Revision: http://reviews.llvm.org/D22360 llvm-svn: 275503
* [X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffle (reapplied)Simon Pilgrim2016-07-141-0/+1
| | | | | | | | This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better. This was incorrectly reverted in rL275421 during triage of PR28552. llvm-svn: 275497
* s/constexpr/LLVM_CONSTEXPR in AArch64InstrInfo.cpp.Justin Lebar2016-07-141-1/+1
| | | | | | Yet again. llvm-svn: 275463
* [Hexagon] Packetize function call arguments with tail call instructionsKrzysztof Parzyszek2016-07-143-1/+13
| | | | | | | | On Hexagon is it legal to packetize the instructions setting up call arguments with the call instruction itself. This was already done, except for tail calls. Make sure tail calls are handled as well. llvm-svn: 275458
* [AArch64] Adjust the scheduling model for Exynos-M1.Evandro Menezes2016-07-141-0/+1
| | | | | | Enable use-postra-scheduler. (NFC) llvm-svn: 275457
* [CodeGen] Refactor MachineMemOperand::Flags's target-specific flags.Justin Lebar2016-07-142-21/+7
| | | | | | | | | | | | | | | | | Summary: Make the target-specific flags in MachineMemOperand::Flags real, bona fide enum values. This simplifies users, prevents various constants from going out of sync, and avoids the false sense of security provided by declaring static members in classes and then forgetting to define them inside of cpp files. Reviewers: MatzeB Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22372 llvm-svn: 275451
* [X86][MC] Fix bracket expression parsing in intel-style assembly.Nirav Dave2016-07-141-2/+5
| | | | | | | | | | | | | | Only perform struct field check on Identifier tokens. Fixes PR28547. Reviewers: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22361 llvm-svn: 275445
* [CodeGen] Refactor MachineMemOperand's Flags enum.Justin Lebar2016-07-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Summary: - Give it a shorter name (because we're going to refer to it often from SelectionDAG and friends). - Split the flags and alignment into separate variables. - Specialize FlagsEnumTraits for it, so we can do bitwise ops on it without losing type information. - Make some enum values constants in MachineMemOperand instead. MOMaxBits should not be a valid Flag. - Simplify some of the bitwise ops for dealing with Flags. Reviewers: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D22281 llvm-svn: 275438
* ARM: fix vmov.i64 immediate validity checkTim Northover2016-07-141-1/+1
| | | | | | Typo meant we were only checking the low byte (repeatedly). llvm-svn: 275437
* Don't optimize movs to pushes in -O0 builds.Nico Weber2016-07-141-1/+1
| | | | | | https://reviews.llvm.org/D22362 llvm-svn: 275431
* Delete some trailing whitespace.Nico Weber2016-07-141-2/+2
| | | | llvm-svn: 275429
* [X86] Decode MPX BND registers.Ahmed Bougacha2016-07-142-4/+15
| | | | | | | | | | | | | We were able to assemble, but not disassemble. Note that fixupRMValue was truncating EA_REG_BND0-3 because we hit the uint8_t max. The control registers were already squarely above it, but I don't think they ever go in .r/m, only in .reg. I also did notice an extra REX.W in our encoding, but I think that's fine. llvm-svn: 275427
* [X86] Don't mark addressing mode operands as "outs". NFC-ish.Ahmed Bougacha2016-07-141-12/+12
| | | | | | | Nothing in-tree can tell the difference, but it's incorrect: the addressing mode registers aren't what's defined. llvm-svn: 275426
* [AMDGPU] Assembler: fix row_bcast parsingSam Kolton2016-07-141-0/+2
| | | | | | | | | | | | Summary: This change fix bug 28538 Reviewers: tstellarAMD, vpykhtin Subscribers: arsenm, kzhuravl Differential Revision: https://reviews.llvm.org/D22355 llvm-svn: 275422
* Revert r275411, it cause PR28552.Nico Weber2016-07-141-1/+0
| | | | llvm-svn: 275421
* Teach fast isel calls and rets about stdcall.Nico Weber2016-07-141-0/+2
| | | | | | | stdcall is callee-pop like thiscall, so the thiscall changes already did most of the work for this. This change only opts stdcall in and adds tests. llvm-svn: 275414
* Remove trailing whitespace.Simon Pilgrim2016-07-141-1/+1
| | | | llvm-svn: 275412
* [X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffleSimon Pilgrim2016-07-141-0/+1
| | | | | | This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better. llvm-svn: 275411
* [mips] SelectionDAGISel subclasses now follow the optimization level.Daniel Sanders2016-07-146-13/+18
| | | | | | | | | | | | | | | | Summary: It was recently discovered that, for Mips's SelectionDAGISel subclasses, all optimization levels caused SelectionDAGISel to behave like -O2. This change adds the necessary plumbing to initialize the optimization level. Reviewers: andrew.w.kaylor Subscribers: andrew.w.kaylor, sdardis, dean, llvm-commits, vradosavljevic, petarj, qcolombet, probinson, dsanders Differential Revision: https://reviews.llvm.org/D14900 llvm-svn: 275410
* [X86][AVX] Add support for narrowing 128-bit+ shuffle mask elements to ↵Simon Pilgrim2016-07-141-14/+25
| | | | | | | | 64-bits to allow combining Primarily this is to allow blend with zero instead of having to use vperm2f128, but we can use this in the future to deal with AVX512 cases where we need to keep the original element size to correctly fold masked operations. llvm-svn: 275406
* [X86][AVX] Add VBROADCASTF128/VBROADCASTI128 shuffle comments supportSimon Pilgrim2016-07-143-0/+23
| | | | llvm-svn: 275400
* [X86][AVX2] VBROADCASTSSrr/VBROADCASTSSYrr require AVX2 not AVXSimon Pilgrim2016-07-141-1/+1
| | | | llvm-svn: 275391
* This implements a more optimal algorithm for selecting a base constant inSjoerd Meijer2016-07-142-0/+14
| | | | | | | | | | | | | | constant hoisting. It not only takes into account the number of uses and the cost of expressions in which constants appear, but now also the resulting integer range of the offsets. Thus, the algorithm maximizes the number of uses within an integer range that will enable more efficient code generation. On ARM, for example, this will enable code size optimisations because less negative offsets will be created. Negative offsets/immediates are not supported by Thumb1 thus preventing more compact instruction encoding. Differential Revision: http://reviews.llvm.org/D21183 llvm-svn: 275382
* [AVX512] Implement EXTLOAD lowering with patterns to select existing VPMOVZX ↵Craig Topper2016-07-141-46/+76
| | | | | | instructions instead of creating CodeGenOnly instructions. llvm-svn: 275378
* [X86] Fix stupid typo in isel lowering.Eli Friedman2016-07-141-1/+1
| | | | | | | Apparently someone miscounted the number of zeros in the immediate. Fixes https://llvm.org/bugs/show_bug.cgi?id=28544 . llvm-svn: 275376
* AMDGPU/R600: Delete/rename intrinsics no longer used by mesaMatt Arsenault2016-07-147-326/+7
| | | | | | Use the replacement pass to update the tests, and delete old names. llvm-svn: 275375
* AMDGPU/R600: Remove intrinsics with no tests and no usersMatt Arsenault2016-07-144-76/+15
| | | | | | Mesa removed this path, so nothing is using these anymore. llvm-svn: 275372
* AMDGPU: Remove unused intrinsicsMatt Arsenault2016-07-142-12/+0
| | | | llvm-svn: 275371
* AMDGPU: Remove dead codeMatt Arsenault2016-07-142-10/+0
| | | | llvm-svn: 275369
* XRay: Add entry and exit sledsDean Michael Berris2016-07-145-5/+148
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: In this patch we implement the following parts of XRay: - Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches. - Supporting a function attribute named 'xray-instruction-threshold' used to determine whether a function is instrumented with a minimum number of instructions (IR instruction counts). - X86-specific nop sleds as described in the white paper. - A machine function pass that adds the different instrumentation marker instructions at a very late stage. - A way of identifying which return opcode is considered "normal" for each architecture. There are some caveats here: 1) We don't handle PATCHABLE_RET in platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time to by default be unpacked for platforms where XRay is not availble yet. 2) The generated section for X86 is different from what is described from the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper from this perspective to allow us to get richer information from the runtime library. Reviewers: sanjoy, eugenis, kcc, pcc, echristo, rnk Subscribers: niravd, majnemer, atrick, rnk, emaste, bmakam, mcrosier, mehdi_amini, llvm-commits Differential Revision: http://reviews.llvm.org/D19904 llvm-svn: 275367
* Teach fast isel about thiscall (and callee-pop) calls.Nico Weber2016-07-141-9/+8
| | | | | | http://reviews.llvm.org/D22315 llvm-svn: 275360
* Add EnableIPRA to TargetOptions, and move the cl::opt -enable-ipra to ↵Mehdi Amini2016-07-131-1/+8
| | | | | | | | | | | | TargetMachine.cpp Avoid exposing a cl::opt in a public header and instead promote this option in the API. Alternatively, we could land the cl::opt in CommandFlags.h so that it is available to every tool, but we would still have to find an option for clang. llvm-svn: 275348
* Fix a TODO in X86CallFrameOptimization to not rely on a codegen artifact.Nico Weber2016-07-131-10/+10
| | | | | | | | | This happens to make X86CallFrameOptimization in -O0 / FastISel builds as well, but it's not clear if the pass should run in that setup. http://reviews.llvm.org/D22314 llvm-svn: 275320
* AMDGPU: Remove last AMDIL intrinsicsMatt Arsenault2016-07-132-11/+1
| | | | llvm-svn: 275309
* AMDGPU/SI: Emit the number of SGPR and VGPR spillsMarek Olsak2016-07-135-0/+30
| | | | | | | | | | | | | | | | | | | | | Summary: v2: don't count SGPRs spilled to scratch twice I think this is sufficient. It doesn't count private memory usage, which happens often and uses scratch but isn't technically a spill. The private memory usage can be computed by: [scratch_per_thread - vgpr_spills - a random multiple of SGPR spills]. The fact SGPR spills add very high numbers to the scratch size make that computation a guessing game, but I don't have a solution to that. Reviewers: tstellarAMD Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D22197 llvm-svn: 275288
* [x86][SSE/AVX] optimize pcmp results better (PR28484)Sanjay Patel2016-07-131-0/+39
| | | | | | | | | | | | | | We know that pcmp produces all-ones/all-zeros bitmasks, so we can use that behavior to avoid unnecessary constant loading. One could argue that load+and is actually a better solution for some CPUs (Intel big cores) because shifts don't have the same throughput potential as load+and on those cores, but that should be handled as a CPU-specific later transformation if it ever comes up. Removing the load is the more general x86 optimization. Note that the uneven usage of vpbroadcast in the test cases is filed as PR28505: https://llvm.org/bugs/show_bug.cgi?id=28505 Differential Revision: http://reviews.llvm.org/D22225 llvm-svn: 275276
* [X86][AVX512] Add support for VPERMILPD/VPERMILPS variable shuffle mask commentsSimon Pilgrim2016-07-131-9/+26
| | | | llvm-svn: 275272
* [X86][AVX] Add support for target shuffle combining to VPERMILPS variable ↵Simon Pilgrim2016-07-132-4/+37
| | | | | | | | shuffle mask Added AVX512F VPERMILPS shuffle decoding support llvm-svn: 275270
* AMDGPU/SI: Add support for R_AMDGPU_GOTPCRELTom Stellard2016-07-135-28/+69
| | | | | | | | | | Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21484 llvm-svn: 275268
* [X86][SSE] Check for lane crossing shuffles before trying to combine to PSHUFBSimon Pilgrim2016-07-131-8/+11
| | | | | | Removes a return-on-fail that was making it tricky to add other variable mask shuffles. llvm-svn: 275262
* AMDGPU: Fold out no-op kill intrinsicsMatt Arsenault2016-07-131-0/+8
| | | | llvm-svn: 275253
* AMDGPU: WQM cleanupsMatt Arsenault2016-07-132-42/+39
| | | | | | | | - Add new TTI instruction checks - Don't use const for blocks that are mutated. - Checking isBranch and isTerminator should be redundant llvm-svn: 275252
* [X86] Remove some seemingly unnecessary patterns that supported vector ↵Craig Topper2016-07-132-33/+0
| | | | | | | | zext/sext with 256-bit source types producing a 256-bit result. These patterns just extracted the source down to 128-bits to use the instructions. AVX512 seems to have blindly copied them over for VLX, but did not create similar patterns for 512-bit sources. So I'm hoping the backend can't actually produce these cases. llvm-svn: 275240
* AMDGPU: Follow up to r275203Matt Arsenault2016-07-125-33/+101
| | | | | | I meant to squash this into it. llvm-svn: 275220
* [Power9] Add codegen for VSX word insert/extract instructionsNemanja Ivanovic2016-07-125-8/+220
| | | | | | | | | | | This patch corresponds to review: http://reviews.llvm.org/D20239 It adds exploitation of XXINSERTW and XXEXTRACTUW instructions that are useful in some cases for inserting and extracting vector elements of v4[if]32 vectors. llvm-svn: 275215
* [X86][AVX] Add support for target shuffle combining to VPERM2F128/VPERM2I128Simon Pilgrim2016-07-121-4/+28
| | | | llvm-svn: 275212
* X86FixupBWInsts: No need for forward liveness analysis.Matthias Braun2016-07-121-35/+0
| | | | | | | | | | With r274952 and r275201 in place there are no cases left where a forward liveness analysis yields different results than a backward one. So we can remove the forward stepping logic. Differential Revision: http://reviews.llvm.org/D22083 llvm-svn: 275204
OpenPOWER on IntegriCloud