summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [CodeView] Add prefix to CodeView registers.Jonas Devlieghere2018-05-291-114/+114
| | | | | | | | | | | | | Adds CVReg to CodeView register names to prevent a duplicate symbol with CR3 defined in termios.h, as suggested by Zachary on the mailing list. http://lists.llvm.org/pipermail/llvm-dev/2018-May/123372.html Differential revision: https://reviews.llvm.org/D47478 rdar://39863705 llvm-svn: 333421
* [X86] Scalar mask and scalar move optimizationsAlexander Ivchenko2018-05-294-135/+256
| | | | | | | | | | | | | | 1. Introduction of mask scalar TableGen patterns. 2. Introduction of new scalar move TableGen patterns and refactoring of existing ones. 3. Folding of pattern created by introducing scalar masking in Clang header files. Patch by tkrupa Differential Revision: https://reviews.llvm.org/D47012 llvm-svn: 333419
* [PowerPC] Fix the incorrect iterator inside peepholeLei Huang2018-05-291-6/+3
| | | | | | | | | | | | Instruction selection can insert nodes into the underlying list after the root node so iterating will thereby miss it. We should NOT assume that, the root node is the last element in the DAG nodelist. Patch by: steven.zhang (Qing Shan Zhang) Differential Revision: https://reviews.llvm.org/D47437 llvm-svn: 333415
* [AArch64][SVE] Asm: Support for AND, ORR, EOR and BIC instructions.Sander de Smalen2018-05-292-1/+47
| | | | | | | | | | | | | | | | | | | | | This patch addresses the following variants: - bitmask immediate, e.g. 'and z0.d, z0.d, #0x6'. - unpredicated data vectors, e.g. 'and z0.d, z1.d, z2.d'. - predicated data vectors, e.g. 'and z0.d, p0/m, z0.d, z1.d'. And also several aliases, such as: - ORN, alias of ORR. - EON, alias of EOR. - BIC, alias of AND (immediate variant) - MOV, alias of ORR (if unpredicated and source register operands are the same) Reviewers: rengolin, huntergr, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D47363 llvm-svn: 333414
* [AArch64] added FP16 vcvth intrinsic supportLuke Geeson2018-05-292-5/+36
| | | | | | | | | | | | | | Summary: Change-Id: I0df845749c7689dfc99150ba7c19c7d0dadbd705 Reviewers: javed.absar, SjoerdMeijer Reviewed By: SjoerdMeijer Subscribers: llvm-commits, SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46311 llvm-svn: 333410
* [mips] Emit R_MICROMIPS_GPREL16/R_MICROMIPS_SUB/R_MICROMIPS_LO16 / HI16 ↵Simon Atanasyan2018-05-294-12/+36
| | | | | | | | | | | | | relocations Emit R_MICROMIPS_GPREL16/R_MICROMIPS_SUB/R_MICROMIPS_LO16 and R_MICROMIPS_GPREL16/R_MICROMIPS_SUB/R_MICROMIPS_HI16 chains of relocations for %lo(%neg(%gp_rel())) and %hi(%neg(%gp_rel())) expressions in case of microMIPS. Differential Revision: http://reviews.llvm.org/D47220 llvm-svn: 333409
* [AArch64][SVE] Asm: Support for ADD (immediate) instructions.Sander de Smalen2018-05-294-15/+90
| | | | | | | | | | | | | | | | | | | This patch adds addsub_imm8_opt_lsl_(i8|i16|i32|i64) operands that are unsigned values in the range 0 to 255. For element widths of 16 bits or higher it may also be a signed multiple of 256 in the range 0 to 65280. Note: This also does some refactoring to reuse convenience function getShiftedVal<shift>(), and now allows AArch64 scalar 'ADD #-4096' to be accepted to be mapped to SUB #4096. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D47310 llvm-svn: 333408
* [mips] Emit R_MICROMIPS_HIGHER / R_MICROMIPS_HIGHEST relocationsSimon Atanasyan2018-05-294-4/+18
| | | | | | | | | | | Emit R_MICROMIPS_HIGHER / R_MICROMIPS_HIGHEST relocations for %higher() and %highest() expressions in case of microMIPS. These relocations do exactly the same things as R_MIPS_HIGHER / R_MIPS_HIGHEST, but for consistency it's better to write microMIPS variants. Differential Revision: http://reviews.llvm.org/D47219 llvm-svn: 333407
* [mips] Correct the predicates for a number of instructions.Simon Dardis2018-05-291-21/+25
| | | | | | | | | | Previously, their listed predicates were overridden at the scope level. Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D46947 llvm-svn: 333405
* [mips] Cleanup the code to reduce diff with the upcoming patches. NFCSimon Atanasyan2018-05-291-10/+10
| | | | llvm-svn: 333404
* [mips] Escape else-after-return. NFCSimon Atanasyan2018-05-291-62/+63
| | | | llvm-svn: 333403
* [mips] Stop parsing a .set assignment if the first argument is not an identifierSimon Atanasyan2018-05-291-3/+2
| | | | | | | | | | | | | | Before this fix the following code triggers two error messages. The second one is at least useless: test.s:1:9: error: expected identifier after .set .set 123, $a0 ^ test-set.s:1:9: error: unexpected token, expected comma .set 123, $a0 ^ llvm-svn: 333402
* [AMDGPU] Fixed build warningTim Renouf2018-05-291-4/+3
| | | | | | | | | | | | Summary: V2: Use cast instead of extra if. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47426 Change-Id: I6ac31da0306f79706960284a7ebd7b9c6237a83a llvm-svn: 333397
* [X86] Disable a DAG combine to allow packed AVX512DQ instructions to be ↵Craig Topper2018-05-291-0/+5
| | | | | | | | | | | | | | | | consistently used for i64->float/double conversions. Summary: We already get this right if the i64 didn't come from a load. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47439 llvm-svn: 333393
* [X86][Sched] Add InstRW for CLC on Intel after SNB.Clement Courbet2018-05-295-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: After SNB, Intel CPUs can rename CF independently of other EFLAGS, so the renamer can zero it for free. Note that STC still consumes resources. To reproduce: `$ llvm-exegesis -mode=uops -opcode-name=CLC` On SNB: ``` --- key: opcode_name: CLC mode: uops config: '' cpu_name: sandybridge llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: '3', value: 0.0014, debug_string: SBPort0 } - { key: '4', value: 0.0013, debug_string: SBPort1 } - { key: '5', value: 0.0003, debug_string: SBPort4 } - { key: '6', value: 0.0029, debug_string: SBPort5 } - { key: '10', value: 0.0003, debug_string: SBPort23 } error: '' info: 'instruction is serial, repeating a random one. Snippet: CLC ' ... ``` On HSW: ``` --- key: opcode_name: CLC mode: uops config: '' cpu_name: haswell llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: '3', value: 0.001, debug_string: HWPort0 } - { key: '4', value: 0.0009, debug_string: HWPort1 } - { key: '5', value: 0.0004, debug_string: HWPort2 } - { key: '6', value: 0.0006, debug_string: HWPort3 } - { key: '7', value: 0.0002, debug_string: HWPort4 } - { key: '8', value: 0.0012, debug_string: HWPort5 } - { key: '9', value: 0.0022, debug_string: HWPort6 } - { key: '10', value: 0.0001, debug_string: HWPort7 } error: '' info: 'instruction is serial, repeating a random one. Snippet: CLC ' ... ``` Reviewers: craig.topper, RKSimon Subscribers: gchatelet, llvm-commits Differential Revision: https://reviews.llvm.org/D47362 llvm-svn: 333392
* [X86] Remove masked vpermi2var/vpermt2var intrinsics and autoupgrade.Craig Topper2018-05-292-150/+1
| | | | | | We have unmasked intrinsics now and wrap them with a select. This is a net reduction of 36 intrinsics from before the unmasked intrinsics were added. llvm-svn: 333388
* [X86] Add unmasked vermi2var intrinsics so we can use explicit select ↵Craig Topper2018-05-291-0/+18
| | | | | | | | instructions for masking in clang. This will allow us to remove the 3 different flavors of masked intrinsics. I'm leaving the actual intrinsic removal for another patch. llvm-svn: 333386
* [X86] Converge X86ISD::VPERMV3 and X86ISD::VPERMIV3 to a single opcode.Craig Topper2018-05-285-82/+73
| | | | | | | | | | These do the same thing with the first and second sources swapped. They previously came from separate intrinsics that specified different masking behavior. But we can cover that with isel patterns and a single node. This is a step towards reducing the number of intrinsics needed. A bunch of tests change because we are now biased to choosing VPERMT over VPERMI when there is nothing to signal that commuting is beneficial. llvm-svn: 333383
* [X86] Fix typo in comment. NFCCraig Topper2018-05-281-2/+2
| | | | llvm-svn: 333382
* [AMDGPU] Re-enabled 128bit wide-vector generation for local addr space by ↵Farhana Aleen2018-05-281-5/+3
| | | | | | | | | | default. Summary: Bug reported here https://bugs.freedesktop.org/show_bug.cgi?id=105464 found to be resolved by some other fixes. Author: FarhanaAleen llvm-svn: 333380
* [Power9]Legalize and emit code for HW/Byte vector extract and convert to QPLei Huang2018-05-281-0/+63
| | | | | | | | | Implemente patterns to extract HWord and Byte vector elements and convert to quad-precision. Differential Revision: https://reviews.llvm.org/D46774 llvm-svn: 333377
* [PowerPC] Set isAsmParserOnly=1 for X-form TLS loads/storesZaara Syeda2018-05-282-6/+33
| | | | | | | | | | | | The X-form TLS load/store instructions added for optimizing the initial-exec sequence in https://reviews.llvm.org/rL327635 fail to assemble. llvm-mc fails with the error: invalid operand for instruction. This patch adds these instructions into a block with isAsmParserOnly, similar to how ADD8TLS_ is currently handled. Differential Revision: https://reviews.llvm.org/D47382 llvm-svn: 333374
* [Sparc] Add .uahalf and .uaword directivesDaniel Cederman2018-05-281-0/+2
| | | | | | | | | | | | | | | | | | | | | | Summary: Adding these makes it easier to assemble the output from GCC which generates a lot of .uahalf and .uaword directives. GAS treats .uahalf and .half the same unless the --enforce-aligned-data flag is used. I could not find a similar flag for LLVM so it seems that .half does not have any alignment requirement and is treated the same as .uahalf should be. If that would change later on then the tests in sparc-directives.s would fail due to bad alignment. Reviewers: jyknight, asb Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D47319 llvm-svn: 333372
* [X86] Stop forcing X86VPermi2X node index operand to match destination type ↵Craig Topper2018-05-283-40/+93
| | | | | | | | | | to make masking pattern matching easier. Add extra patterns with bitcasts instead. This basically reverts r280696 in favor of using extra patterns as mentioned as an alternative in that commit message. For now I've only added the cases we have test cases for, but it should be easy to add more in the future. This will help to convert VPERMI2PS/VPERMT2PS intrinsics to use a single ISD node opcode. And hopefully allow some intrinsics to be removed. llvm-svn: 333365
* [AMDGPU] Fixed WWM bug in block otherwise entirely in WQMTim Renouf2018-05-271-0/+5
| | | | | | | | | | | | | | | | Summary: For a block with WQM on entry and exit and containing no exact mode code, but containing some WWM code, the WQM pass forgot to process the block at all and so did not insert code to enter and leave WWM. This commit fixes that. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47027 Change-Id: I044792eead1293bed4203fb26ce75f47878afeb6 llvm-svn: 333362
* [X86] Don't hardcode scheduler classSimon Pilgrim2018-05-271-21/+24
| | | | | | Also fixes BEXTRI instruction to use WritBEXTR class, which was missed when the class was added. llvm-svn: 333360
* Revert 333358 as it's failing on some builders.David Green2018-05-271-2/+0
| | | | | | I'm guessing the tests reply on the ARM backend being built. llvm-svn: 333359
* [UnrollAndJam] Add a new Unroll and Jam passDavid Green2018-05-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a simple implementation of the unroll-and-jam classical loop optimisation. The basic idea is that we take an outer loop of the form: for i.. ForeBlocks(i) for j.. SubLoopBlocks(i, j) AftBlocks(i) Instead of doing normal inner or outer unrolling, we unroll as follows: for i... i+=2 ForeBlocks(i) ForeBlocks(i+1) for j.. SubLoopBlocks(i, j) SubLoopBlocks(i+1, j) AftBlocks(i) AftBlocks(i+1) Remainder So we have unrolled the outer loop, then jammed the two inner loops into one. This can lead to a simpler inner loop if memory accesses can be shared between the now-jammed loops. To do this we have to prove that this is all safe, both for the memory accesses (using dependence analysis) and that ForeBlocks(i+1) can move before AftBlocks(i) and SubLoopBlocks(i, j). Differential Revision: https://reviews.llvm.org/D41953 llvm-svn: 333358
* Remove boolean argument from isSuitableFromBSS.Eric Christopher2018-05-271-8/+5
| | | | | | | | The argument was used as an additional negative condition and can be expressed in the if conditional without needing to pass it down. Update bss commentary around main use. llvm-svn: 333357
* Cleanups for getKindForGlobal:Eric Christopher2018-05-271-11/+10
| | | | | | | | - Clarify block comment - Make Function/GlobalVariable split more explicit. - Move locals closer to uses. llvm-svn: 333356
* [X86] Remove masking from avx512ifma intrinsics. Use a select instead.Craig Topper2018-05-262-44/+11
| | | | | | This allows us to avoid having mask and maskz variant. Reducing from 12 intrinsics to 6. llvm-svn: 333346
* Replace AA's uses of uint64_t with LocationSize; NFC.George Burgess IV2018-05-251-1/+1
| | | | | | | | | | | | | | | | The uint64_ts that we pass around AA to represent MemoryLocation sizes are logically an Optional<uint64_t>. In D44748, we want to add an extra 'imprecise' bit to this Optional<uint64_t> to represent whether a given MemoryLocation size is an upper-bound or an exact size. For more context on why, please see D44748. That patch is quite large, but reviewers seem to be OK with the approach. In D45581 (my first attempt to split 'noise' out of D44748), reames asked that I land a precursor that is solely replacing uint64_t with LocationSize, which starts out as `using LocationSize = uint64_t;`. He also gave me the OK to submit this rename without further review. llvm-svn: 333314
* [AMDGPU][Waitcnt] Remove obsolete waitcnt optionMark Searles2018-05-251-6/+0
| | | | | | | | With the removal of the old waitcnt pass, the '-enable-si-insert-waitcnts' option is obsolete. Remove it. Differential Revision: https://reviews.llvm.org/D47378 llvm-svn: 333303
* [AMDGPU] Fixed test failure with AMDGPUPerfHintStanislav Mekhanoshin2018-05-251-8/+7
| | | | | | | We shall not keep iterator to a map while map is modified, this leads to a broken map. llvm-svn: 333298
* Fix -Winconsistent-missing-overrides in AMDGPU codeReid Kleckner2018-05-251-1/+1
| | | | llvm-svn: 333291
* [AMDGPU] Add perf hints to functionsStanislav Mekhanoshin2018-05-2510-6/+516
| | | | | | | | | | | | | | | This is adoption of HSAIL perfhint pass. Two types of hints are produced: 1. Function is memory bound. 2. Kernel can use wave limiter. Currently these hints are used in the scheduler. If a function is suspected to be memory bound we allow occupancy to decrease to 4 waves in the course of scheduling. Differential Revision: https://reviews.llvm.org/D46992 llvm-svn: 333289
* [mips] Fix the definitions of lwp, swpSimon Dardis2018-05-255-68/+26
| | | | | | | | | | | | | Rather than using a regpair operand of these instructions, use two seperate operands and a custom converter to handle the implicit second register operand. Additionally, remove the microMIPS32R6 definition as its redundant. Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D47255 llvm-svn: 333288
* [Hexagon] Fix packing source vectors in shufflevector selection Krzysztof Parzyszek2018-05-251-3/+9
| | | | | | | | When the shuffle mask selected a subvector of the second input vector, and aligning of the source was performed, the shuffle mask was updated incorrectly, resulting in an ICE further in the selection process. llvm-svn: 333279
* [X86][SNB] Fix differences between vex/non-vex XMM vector moves (PR37286)Simon Pilgrim2018-05-251-9/+1
| | | | | | | | As confirmed by llvm-exegesis, there is no scheduler difference between MOVDQA/MOVDQU and VMOVDQA/VMOVDQU xmm reg-reg moves Another chapter in the never ending crusade to remove useless InstRW overrides from the x86 scheduler models...... llvm-svn: 333271
* Fix ubsan errors introduced by r333263 re. left-shifting negative values.Sander de Smalen2018-05-251-2/+3
| | | | llvm-svn: 333270
* [AArch64][SVE] Asm: Support for DUP (immediate) instructions.Sander de Smalen2018-05-259-30/+256
| | | | | | | | | | | | | | | | | | | | | | | | | | Unpredicated copy of optionally-shifted immediate to SVE vector, along with MOV-aliases. This patch contains parsing and printing support for cpy_imm8_opt_lsl_(i8|i16|i32|i64). This operand allows a signed value in the range -128 to +127. For element widths of 16 bits or higher it may also be a signed multiple of 256 in the range -32768 to +32512. For element-width of 8 bits a range of -128 to 255 is accepted, since a copy of a byte can be considered either signed/unsigned. Note: This patch renames tryParseAddSubImm() -> tryParseImmWithOptionalShift() and moves the behaviour of trying to shift a plain immediate by an allowed shift-value to its addImmWithOptionalShiftOperands() method, so that the parsing itself is generic and allows immediates from multiple shifted operands. This is done because an immediate can be divisible by both shifted operands. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D47309 llvm-svn: 333263
* [SystemZ] Bugfix in combineSTORE().Jonas Paulsson2018-05-251-1/+1
| | | | | | | | Remember to check if store is truncating before calling combineTruncateExtract(). Review: Ulrich Weigand llvm-svn: 333262
* [AMDGPU] Fixed incorrect break from loopTim Renouf2018-05-251-2/+40
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Lower control flow did not correctly handle the case that a loop break in if/else was on a condition that was not guaranteed to be masked by exec. The first test kernel shows an example of this going wrong; after exiting the loop, exec is all ones, even if it was not before the loop. The fix is for lowering of if-break and else-break to insert an S_AND_B64 to mask the break condition with exec. This commit also includes the optimization of not inserting that S_AND_B64 if it is obviously not needed because the break condition is the result of a V_CMP in the same basic block. V2: Addressed some review comments. V3: Test fixes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D44046 Change-Id: I0fc56a01209a9e99d1d5c9b0ffd16f111caf200c llvm-svn: 333258
* [x86] invpcid LLVM intrinsicGabor Buella2018-05-255-3/+26
| | | | | | | | | | | | | | | Re-add the feature flag for invpcid, which was removed in r294561. Add an intrinsic, which always uses a 32 bit integer as first argument, while the instruction actually uses a 64 bit register in 64 bit mode for the INVPCID_TYPE argument. Reviewers: craig.topper Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D47141 llvm-svn: 333255
* AMDGPU: Remove AMDGPUMCInstLower.hTom Stellard2018-05-253-48/+23
| | | | | | | | | | | | | | | | Summary: The AMDGPUMCInstLower class is not used outside AMDGPUMCInstLower.cpp, so we don't need a header file. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47264 llvm-svn: 333254
* AMDGPU: Split R600 AsmPrinter code into its own classTom Stellard2018-05-246-161/+303
| | | | | | | | | | | | Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47245 llvm-svn: 333219
* [AArch64] Improve orr+movk sequences for MOVi64imm.Eli Friedman2018-05-241-115/+96
| | | | | | | | | | | | | | The existing code has three different ways to try to lower a 64-bit immediate to the sequence ORR+MOVK. The result is messy: it misses some possible sequences, and the order of the checks means we sometimes emit two MOVKs when we only need one. Instead, just use a simple loop to try all possible two-instruction ORR+MOVK sequences. Differential Revision: https://reviews.llvm.org/D47176 llvm-svn: 333218
* [AArch64] Take advantage of variable shift/rotate amount implicit mod operation.Geoff Berry2018-05-241-0/+111
| | | | | | | | | | | | | | | | Summary: Optimize code generated for variable shifts/rotates by taking advantage of the implicit and/mod done on the variable shift amount register. Resolves bug 27582 and bug 37421. Reviewers: t.p.northover, qcolombet, MatzeB, javed.absar Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D46844 llvm-svn: 333214
* [X86][SSE] Pull out (AND (XOR X, -1), Y) matching into a helper function. NFC.Simon Pilgrim2018-05-241-11/+26
| | | | llvm-svn: 333201
* Fix unused variable warnings. NFCI.Simon Pilgrim2018-05-241-3/+0
| | | | llvm-svn: 333195
OpenPOWER on IntegriCloud