path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* [mips][microMIPS] Implement SWP and LWP instructions | Zoran Jovanovic | 2014-12-16 | 7 | -1/+101
    Differential Revision: http://reviews.llvm.org/D5667
    llvm-svn: 224338
* Fixing -Wsign-compare warnings; NFC. | Aaron Ballman | 2014-12-16 | 1 | -1/+2
    llvm-svn: 224337
* [ARM] Prevent PerformVCVTCombine from combining a vmul/vcvt with 8 lanes | Bradley Smith | 2014-12-16 | 1 | -3/+5
    This would result in a crash since the vcvt used does not support v8i32 types.
    llvm-svn: 224332
* X86: Added FeatureVectorUAMem for all AVX architectures. | Elena Demikhovsky | 2014-12-16 | 2 | -16/+10
    According to AVX specification:
    "Most arithmetic and data processing instructions encoded using the VEX prefix and performing memory accesses have more flexible memory alignment requirements than instructions that are encoded without the VEX prefix. Specifically, With the exception of explicitly aligned 16 or 32 byte SIMD load/store instructions, most VEX-encoded, arithmetic and data processing instructions operate in a flexible environment regarding memory address alignment, i.e. VEX-encoded instruction with 32-byte or 16-byte load semantics will support unaligned load operation by default. Memory arguments for most instructions with VEX prefix operate normally without causing #GP(0) on any byte-granularity alignment (unlike Legacy SSE instructions)."

    The same for AVX-512.

    This change does not affect anything right now, because only the "memop pattern fragment" depends on FeatureVectorUAMem and it is not used in AVX patterns. All AVX patterns are based on the "unaligned load" anyway.
    llvm-svn: 224330
* ARM: diagnose deprecated syntax | Saleem Abdulrasool | 2014-12-16 | 2 | -1/+16
    The use of SP and PC in the register list for stores is deprecated on ARM (ARM ARM A.8.8.199):

        ARM deprecates the use of ARM instructions that include the SP or the PC in the list.

    Provide a deprecation warning from the assembler in the case that the syntax is ever seen.
    llvm-svn: 224319
* [PowerPC] Improve instruction selection bit-permuting operations (32-bit) | Hal Finkel | 2014-12-16 | 2 | -55/+480
    The PowerPC backend, somewhat embarrassingly, did not generate an optimal-length sequence of instructions for a 32-bit bswap. While adding a pattern for the bswap intrinsic to fix this would not have been terribly difficult, doing so would not have addressed the real problem: we had been generating poor code for many bit-permuting operations (by which I mean things like byte-swap that permute the bits of one or more inputs around in various ways). Here are some initial steps toward solving this deficiency.

    Bit-permuting operations are represented, at the SDAG level, using ISD::ROTL, SHL, SRL, AND and OR (mostly with constant second operands). Looking back through these operations, we can build up a description of the bits in the resulting value in terms of bits of one or more input values (and constant zeros). For each bit, we compute the rotation amount from the original value, and then group consecutive (value, rotation factor) bits into groups. Groups sharing these attributes are then collected and sorted, and we can then instruction select the entire permutation using a combination of masked rotations (rlwinm), imm ands (andi/andis), and masked rotation inserts (rlwimi).

    The result is that instead of lowering an i32 bswap as:

        rlwinm 5, 3, 24, 16, 23
        rlwinm 4, 3, 24, 0, 7
        rlwimi 4, 3, 8, 8, 15
        rlwimi 5, 3, 8, 24, 31
        rlwimi 4, 5, 0, 16, 31

    we now produce:

        rlwinm 4, 3, 8, 0, 31
        rlwimi 4, 3, 24, 16, 23
        rlwimi 4, 3, 24, 0, 7

    and for the 'test6' example in the PowerPC/README.txt file:

        unsigned test6(unsigned x) {
          return ((x & 0x00FF0000) >> 16) | ((x & 0x000000FF) << 16);
        }

    we used to produce:

        lis 4, 255
        rlwinm 3, 3, 16, 0, 31
        ori 4, 4, 255
        and 3, 3, 4

    and now we produce:

        rlwinm 4, 3, 16, 24, 31
        rlwimi 4, 3, 16, 8, 15

    and, as a nice bonus, this fixes the FIXME in test/CodeGen/PowerPC/rlwimi-and.ll.

    This commit does not include instruction-selection for i64 operations, those will come later.
    llvm-svn: 224318
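The new instruction sequences quoted in the commit above can be checked by modelling the two rotate instructions in C. This is a minimal sketch, not the backend's code: every function name below (`rotl32`, `ppc_mask`, `rlwinm`, `rlwimi`, `bswap32_three_insns`, `test6_two_insns`, `test6_reference`) is hypothetical, and the model assumes the usual Power ISA conventions, where bit 0 is the most significant bit and a mask from `MB` to `ME` wraps around when `MB > ME`.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by sh bits (sh is taken modulo 32). */
uint32_t rotl32(uint32_t x, unsigned sh) {
    sh &= 31;
    return (x << sh) | (x >> ((32 - sh) & 31));
}

/* Mask with ones in big-endian bit positions mb..me (bit 0 = MSB).
 * When mb > me the run of ones wraps around the word. */
uint32_t ppc_mask(unsigned mb, unsigned me) {
    uint32_t from_mb = UINT32_MAX >> mb;        /* ones in bits mb..31 */
    uint32_t to_me = UINT32_MAX << (31 - me);   /* ones in bits 0..me  */
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwinm: rotate left, then AND with the mask. */
uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me) {
    return rotl32(rs, sh) & ppc_mask(mb, me);
}

/* rlwimi: rotate left, then insert the masked bits into ra. */
uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh, unsigned mb, unsigned me) {
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}

/* The three-instruction i32 bswap sequence from the commit message. */
uint32_t bswap32_three_insns(uint32_t r3) {
    uint32_t r4 = rlwinm(r3, 8, 0, 31);   /* rlwinm 4, 3, 8, 0, 31   */
    r4 = rlwimi(r4, r3, 24, 16, 23);      /* rlwimi 4, 3, 24, 16, 23 */
    r4 = rlwimi(r4, r3, 24, 0, 7);        /* rlwimi 4, 3, 24, 0, 7   */
    return r4;
}

/* The two-instruction sequence for test6 from the commit message. */
uint32_t test6_two_insns(uint32_t r3) {
    uint32_t r4 = rlwinm(r3, 16, 24, 31); /* rlwinm 4, 3, 16, 24, 31 */
    r4 = rlwimi(r4, r3, 16, 8, 15);       /* rlwimi 4, 3, 16, 8, 15  */
    return r4;
}

/* test6 as written in PowerPC/README.txt, used as the reference. */
uint32_t test6_reference(uint32_t x) {
    return ((x & 0x00FF0000u) >> 16) | ((x & 0x000000FFu) << 16);
}
```

Under this model, `bswap32_three_insns(0x11223344)` yields `0x44332211`, and `test6_two_insns` agrees with `test6_reference`.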
* ARM: 80-column | Saleem Abdulrasool | 2014-12-16 | 1 | -4/+5
    clang-format a function with an overly long string constant. NFC.
    llvm-svn: 224314
* ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions. | Adrian Prantl | 2014-12-16 | 2 | -12/+25
    Debug info marks the first instruction without the FrameSetup flag as being the end of the function prologue. Any CFI instructions in the middle of the function prologue would cause debug info to end the prologue too early and worse, attach the line number of the CFI instruction, which incidentally is often 0.
    llvm-svn: 224294
* [Hexagon] Adding doubleword multiplies with and without accumulation. | Colin LeMahieu | 2014-12-16 | 2 | -0/+136
    llvm-svn: 224293
* [Hexagon] Adding halfword to doubleword multiplies. | Colin LeMahieu | 2014-12-15 | 1 | -0/+59
    llvm-svn: 224289
* [Hexagon] Adding logical-logical accumulation instructions and tests. | Colin LeMahieu | 2014-12-15 | 1 | -19/+40
    llvm-svn: 224288
* x86: Emit LOCK prefix after DATA16 | JF Bastien | 2014-12-15 | 1 | -4/+6
    Summary: x86 allows either ordering for the LOCK and DATA16 prefixes, but using GCC+GAS leads to different code generation than using LLVM. This change matches the order that GAS emits the x86 prefixes when a semicolon isn't used in inline assembly (see tc-i386.c comment before define LOCK_PREFIX), and helps simplify tooling that operates on the instruction's byte sequence (such as NaCl's validator). This change shouldn't have any performance impact.

    Test Plan: ninja check
    Reviewers: craig.topper, jvoung
    Subscribers: jfb, llvm-commits
    Differential Revision: http://reviews.llvm.org/D6630
    llvm-svn: 224283
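The prefix ordering described in the commit above can be illustrated with a small byte emitter. This is only a sketch of the ordering named in the commit title, not the LLVM encoder: `emit_prefixes` and the enum names are hypothetical, and the only facts relied on are the standard x86 prefix byte values (0x66 for the operand-size override, which is what DATA16 refers to, and 0xF0 for LOCK).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    PREFIX_DATA16 = 0x66, /* operand-size override prefix */
    PREFIX_LOCK = 0xF0    /* LOCK prefix */
};

/* Writes the requested prefixes into out, DATA16 first and LOCK second,
 * and returns how many bytes were written. out needs room for 2 bytes. */
size_t emit_prefixes(uint8_t *out, int has_data16, int has_lock) {
    size_t n = 0;
    if (has_data16)
        out[n++] = PREFIX_DATA16;
    if (has_lock)
        out[n++] = PREFIX_LOCK;
    return n;
}
```

A tool that scans instruction bytes, such as the validator mentioned in the commit message, then only has to recognise one ordering of the two prefixes.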
* [Hexagon] Adding a number of additional multiply forms with tests. | Colin LeMahieu | 2014-12-15 | 1 | -11/+126
    llvm-svn: 224282
* [Hexagon] Adding misc multiply encodings and tests. | Colin LeMahieu | 2014-12-15 | 1 | -0/+48
    llvm-svn: 224273
* [Hexagon] Adding doubleword accumulating multiplies of halfwords. | Colin LeMahieu | 2014-12-15 | 1 | -0/+74
    llvm-svn: 224267
* [Hexagon] Adding accumulating half word multiplies. | Colin LeMahieu | 2014-12-15 | 1 | -0/+105
    llvm-svn: 224266
* [Hexagon] Adding multiply with rnd/sat/rndsat | Colin LeMahieu | 2014-12-15 | 1 | -0/+46
    llvm-svn: 224265
* [Hexagon] Adding encoding bits for halfword multiplies. | Colin LeMahieu | 2014-12-15 | 1 | -0/+39
    llvm-svn: 224261
* [X86] Also pretty-print shuffle mask for INSERTPS rm variants. | Ahmed Bougacha | 2014-12-15 | 1 | -3/+7
    llvm-svn: 224260
* Silence more static analyzer warnings. | Michael Ilseman | 2014-12-15 | 3 | -2/+7
    Add in definedness checks for shift operators, null checks when pointers are assumed by the code to be non-null, and explicit unreachables.
    llvm-svn: 224255
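The three kinds of checks named in the commit above can be sketched in plain C. This is an illustration of the general idea only, not the code the commit touched: all names are hypothetical, and `abort()` stands in for an explicit unreachable marker.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Definedness check for a shift: shifting a 32-bit value by 32 or more
 * is undefined in C, so handle that case explicitly. */
uint32_t checked_shl32(uint32_t value, unsigned amount) {
    if (amount >= 32)
        return 0; /* every bit would be shifted out */
    return value << amount;
}

/* Null check for a pointer the surrounding code assumes is non-null. */
int first_or_default(const int *p, int fallback) {
    return p ? *p : fallback;
}

enum kind { KIND_A, KIND_B };

/* Explicit unreachable after a switch that covers every enumerator. */
char kind_letter(enum kind k) {
    switch (k) {
    case KIND_A:
        return 'a';
    case KIND_B:
        return 'b';
    }
    abort(); /* not reached for valid enum values */
}
```

Making each assumption explicit gives a static analyzer a concrete fact to reason from instead of a path it has to flag.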
* Add disassembler tests for mips3 platform. There are no functional changes. | Vladimir Medic | 2014-12-15 | 1 | -1/+2
    llvm-svn: 224253
* [X86] Break false dependencies before partial register updates when the source operand is in memory | Michael Kuperstein | 2014-12-15 | 1 | -0/+20
    Adds the various "rm" instruction variants into the list of instructions that have a partial register update. Also adds all variants of SQRTSD that were missing in the original list.

    Differential Revision: http://reviews.llvm.org/D6620
    llvm-svn: 224246
* AVX-512: Added EXPAND instructions and intrinsics. | Elena Demikhovsky | 2014-12-15 | 4 | -15/+150
    llvm-svn: 224241
* Loop Vectorizer minor changes in the code - | Elena Demikhovsky | 2014-12-14 | 1 | -5/+5
    some comments, function names, indentation.

    Reviewed here: http://reviews.llvm.org/D6527
    llvm-svn: 224218
* [PowerPC] Handle cmp op promotion for SELECT[_CC] nodes in PPCTL::DAGCombineExtBoolTrunc | Hal Finkel | 2014-12-14 | 1 | -18/+49
    PPCTargetLowering::DAGCombineExtBoolTrunc contains logic to remove unwanted truncations and extensions when dealing with nodes of the form:

        zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...)

    There was a FIXME in the implementation (now removed) regarding the fact that the function would abort the transformations if any of the non-output operands of a SELECT or SELECT_CC node would need to be promoted (because they were also output operands, for example). As a result, we continued to generate unnecessary zero-extends for code such as this:

        unsigned foo(unsigned a, unsigned b) {
          return (a <= b) ? a : b;
        }

    which would produce:

        cmplw 0, 3, 4
        isel 3, 4, 3, 1
        rldicl 3, 3, 0, 32
        blr

    and now we produce:

        cmplw 0, 3, 4
        isel 3, 4, 3, 1
        blr

    which is better in the obvious way.
    llvm-svn: 224213
* Reapply "[ARM] Combine base-updating/post-incrementing vector load/stores." | Ahmed Bougacha | 2014-12-13 | 1 | -6/+40
    r223862 tried to also combine base-updating load/stores. r224198 reverted it, as "it created a regression on the test-suite on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order in which the words are shown."

    Reapply, with a fix to ignore non-normal load/stores. Truncstores are handled elsewhere (you can actually write a pattern for those, whereas for postinc loads you can't, since they return two values), but it should be possible to also combine extloads base updates, by checking that the memory (rather than result) type is of the same size as the addend.

    Original commit message:
    We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores.

    Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset).

    Differential Revision: http://reviews.llvm.org/D6585
    llvm-svn: 224203
* Revert "[ARM] Combine base-updating/post-incrementing vector load/stores." | Renato Golin | 2014-12-13 | 1 | -38/+6
    This reverts commit r223862, as it created a regression on the test-suite on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order in which the words are shown. We'll investigate the issue and re-apply when safe.
    llvm-svn: 224198
* [PowerPC] Add a DAGToDAG peephole to remove unnecessary zero-exts | Hal Finkel | 2014-12-12 | 3 | -5/+310
    On PPC64, we end up with lots of i32 -> i64 zero extensions, not only from all of the usual places, but also from the ABI, which specifies that values passed are zero extended. Almost all 32-bit PPC instructions in PPC64 mode are defined to do *something* to the higher-order bits, and for some instructions, that action clears those bits (thus providing a zero-extended result). This is especially common after rotate-and-mask instructions. Adding an additional instruction to zero-extend the results of these instructions is unnecessary.

    This PPCISelDAGToDAG peephole optimization examines these zero-extensions, and looks back through their operands to see if all instructions will implicitly zero extend their results. If so, we convert these instructions to their 64-bit variants (which is an internal change only, the actual encoding of these instructions is the same as the original 32-bit ones) and remove the unnecessary zero-extension (changing where the INSERT_SUBREG instructions are to make everything internally consistent).
    llvm-svn: 224169
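The claim that some 32-bit instructions already leave the upper half of a 64-bit register cleared can be illustrated with a C model of `rlwinm` in 64-bit mode. This is a sketch under stated assumptions, not the peephole itself: the function names are hypothetical, and the model assumes the Power ISA definition in which the rotated low word is replicated into both halves of the register and the mask covers 64-bit positions `MB+32` through `ME+32`, wrapping when `MB > ME`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by sh bits (sh is taken modulo 32). */
uint32_t rotl32_word(uint32_t x, unsigned sh) {
    sh &= 31;
    return (x << sh) | (x >> ((32 - sh) & 31));
}

/* 64-bit mask with ones in big-endian bit positions mb..me (bit 0 = MSB),
 * wrapping around when mb > me. */
uint64_t mask64(unsigned mb, unsigned me) {
    uint64_t from_mb = UINT64_MAX >> mb;
    uint64_t to_me = UINT64_MAX << (63 - me);
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwinm as it acts on a full 64-bit register. */
uint64_t rlwinm64_model(uint64_t rs, unsigned sh, unsigned mb, unsigned me) {
    uint32_t rotated = rotl32_word((uint32_t)rs, sh);
    uint64_t replicated = ((uint64_t)rotated << 32) | rotated;
    return replicated & mask64(mb + 32, me + 32);
}

/* True when the value is already zero-extended from 32 bits. */
bool is_zero_extended(uint64_t v) {
    return (v >> 32) == 0;
}
```

With a non-wrapping mask (`MB <= ME`) the result is always zero-extended, so a following zero-extension is redundant. With a wrapping mask the upper word can be nonzero, which is consistent with the commit saying that only some instructions clear those bits.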
* [ARMConstantIsland] Insert tbb/tbh optimization where previous jump table resided. | Chad Rosier | 2014-12-12 | 1 | -1/+3
    llvm-svn: 224165
* [Hexagon] Adding double word add/min/minu/max/maxu instructions and tests. | Colin LeMahieu | 2014-12-12 | 1 | -21/+63
    llvm-svn: 224153
* [Hexagon] Adding J class call instructions. | Colin LeMahieu | 2014-12-12 | 2 | -9/+48
    llvm-svn: 224150
* [AVX512] Enabling bit logic lowering | Robert Khasanov | 2014-12-12 | 2 | -0/+9
    Added lowering tests.
    llvm-svn: 224132
* [mips] Enable code generation for MIPS-III. | Vasileios Kalintiris | 2014-12-12 | 3 | -9/+17
    Summary: This commit enables the MIPS-III target and adds support for code generation of SELECT nodes. We have to use pseudo-instructions with custom inserters for these nodes as MIPS-III CPUs do not have conditional-move instructions.

    Depends on D6212
    Reviewers: dsanders
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D6464
    llvm-svn: 224128
* [AVX512] Enabling MIN/MAX lowering. | Robert Khasanov | 2014-12-12 | 2 | -4/+19
    Added lowering tests.
    llvm-svn: 224127
* [mips] Support SELECT nodes for targets that don't have conditional-move instructions. | Vasileios Kalintiris | 2014-12-12 | 4 | -0/+129
    Summary: For Mips targets that do not have conditional-move instructions, i.e. targets before MIPS32 and MIPS-IV, we have to insert a diamond control-flow pattern in order to support SELECT nodes. In order to do that, we add pseudo-instructions with a custom inserter that emits the necessary control-flow that selects the correct value.

    With this patch we add complete support for code generation of Mips-II targets based on the LLVM test-suite.

    Reviewers: dsanders
    Reviewed By: dsanders
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D6212
    llvm-svn: 224124
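The diamond control-flow pattern mentioned in the commit above can be pictured in C with explicit branches. This is a conceptual sketch only: `select_diamond` and its labels are hypothetical, and the real change works on machine basic blocks through a custom inserter rather than on C source.

```c
#include <assert.h>

/* Computes cond ? if_true : if_false without a conditional move,
 * using a branch diamond: head -> (true arm | false arm) -> join. */
int select_diamond(int cond, int if_true, int if_false) {
    int result;

    /* Head block: branch on the condition. */
    if (cond)
        goto true_arm;

    /* False arm. */
    result = if_false;
    goto join;

true_arm:
    result = if_true;

join:
    /* Join block: in SSA form this is where the two values merge. */
    return result;
}
```

For example, `select_diamond(1, 10, 20)` returns 10 and `select_diamond(0, 10, 20)` returns 20.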
* [AVX512] Minor fix in lowering pattern for broadcast instructions. | Robert Khasanov | 2014-12-12 | 1 | -6/+5
    No functional change.
    llvm-svn: 224122
* Emit Tag_ABI_FP_16bit_format build attribute. | Charlie Turner | 2014-12-12 | 1 | -0/+7
    The __fp16 type is unconditionally exposed. Since -mfp16-format is not yet supported, there is not a user switch to change this behaviour. This build attribute should capture the default behaviour of the compiler, which is to expose the IEEE 754 version of __fp16.

    When -mfp16-format is emitted, that will be the way to control the value of this build attribute.

    Change-Id: I8a46641ff0fd2ef8ad0af5f482a6d1af2ac3f6b0
    llvm-svn: 224115
* R600: Fix min/max matching problems with unordered compares | Matt Arsenault | 2014-12-12 | 4 | -50/+60
    The returned operand needs to be permuted for the unordered compares. Also fix incorrectly producing fmin_legacy / fmax_legacy for f64, which don't exist.
    llvm-svn: 224094
* R600/SI: fmin/fmax_legacy are not associative | Matt Arsenault | 2014-12-12 | 1 | -2/+2
    llvm-svn: 224093
* R600/SI: Don't promote f32 select to i32 | Matt Arsenault | 2014-12-12 | 2 | -2/+5
    This is nice for the instruction patterns, but it complicates min / max matching. The select doesn't have the correct type and would require looking through the bitcasts for the real float operands.
    llvm-svn: 224092
* Add target hook for whether it is profitable to reduce load widths | Matt Arsenault | 2014-12-12 | 2 | -0/+26
    Add an option to disable optimization to shrink truncated larger type loads to smaller type loads. On SI this prevents using scalar load instructions in some cases, since there are no scalar extloads.
    llvm-svn: 224084
* remove function names from comments; NFC | Sanjay Patel | 2014-12-11 | 1 | -29/+23
    llvm-svn: 224080
* R600/SI: Handle physical registers in getOpRegClass | Matt Arsenault | 2014-12-11 | 1 | -2/+7
    llvm-svn: 224079
* R600/SI: Don't verify constant bus usage of flag ops | Matt Arsenault | 2014-12-11 | 1 | -2/+10
    This was checking if pseudo-operands like the source modifiers were using the constant bus, which happens to work because the values these all can be happen to be valid inline immediates.

    This fixes a later commit which starts checking the register class of the operands.
    llvm-svn: 224078
* return without temporary; NFC | Sanjay Patel | 2014-12-11 | 1 | -4/+1
    llvm-svn: 224076
* Enable MachineVerifier in debug mode for X86, ARM, AArch64, Mips. | Matthias Braun | 2014-12-11 | 4 | -20/+20
    llvm-svn: 224075
* [X86] Add a temporary testcase for PR21876/r223996. | Ahmed Bougacha | 2014-12-11 | 1 | -0/+1
    llvm-svn: 224074
* [PowerPC] Better lowering for add/or of a FrameIndex | Hal Finkel | 2014-12-11 | 2 | -30/+38
    If we have an add (or an or that is really an add), where one operand is a FrameIndex and the other operand is a small constant, we can combine the lowering of the FrameIndex (which is lowered as an add of the FI and a zero offset) with the constant operand.

    Amusingly, this is an old potential improvement entry from lib/Target/PowerPC/README.txt which had never been resolved. In short, we used to lower:

        %X = alloca { i32, i32 }
        %Y = getelementptr {i32,i32}* %X, i32 0, i32 1
        ret i32* %Y

    as:

        addi 3, 1, -8
        ori 3, 3, 4
        blr

    and now we produce:

        addi 3, 1, -4
        blr

    which is much more sensible.
    llvm-svn: 224071
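Two small facts sit behind the commit above: an `or` behaves like an `add` when its operands share no set bits, and the frame offset and the constant can then be folded into a single immediate. The C sketch below illustrates both with hypothetical helper names; it is not the SelectionDAG code, and the stack pointer value in the usage note is an assumption chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An OR equals an ADD when no bit is set in both operands,
 * because no carries can occur. */
bool or_is_add(uint32_t a, uint32_t b) {
    return (a & b) == 0;
}

/* Fold the FrameIndex offset and the small constant into one immediate,
 * as in "addi 3, 1, -8" followed by "ori 3, 3, 4" becoming "addi 3, 1, -4". */
int32_t fold_frame_offset(int32_t frame_offset, int32_t constant) {
    return frame_offset + constant;
}
```

Assuming a stack pointer of 0x1000, the slot address is 0xFF8, `or_is_add(0xFF8, 4)` holds, and `fold_frame_offset(-8, 4)` gives the single immediate -4 seen in the new output.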
* R600/SI: Use unordered equal instructions | Matt Arsenault | 2014-12-11 | 2 | -6/+2
    llvm-svn: 224067
* R600/SI: Make more unordered comparisons legal | Matt Arsenault | 2014-12-11 | 3 | -18/+9
    This saves a second compare and an and / or by using the unordered comparison instructions.
    llvm-svn: 224066
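The saving described in the commit above comes from the fact that an unordered comparison can be written as one test instead of an ordered compare combined with a separate NaN check. The C sketch below shows the equivalence for "unordered or less than". The function names are hypothetical and this is not the R600 lowering, only the floating-point identity it relies on (assuming IEEE 754 semantics, without fast-math style options).

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Two-step form: a NaN check OR-ed with an ordered less-than. */
bool ult_two_step(float a, float b) {
    return isunordered(a, b) || a < b;
}

/* Single-compare form: "unordered or less than" is the negation of
 * "ordered greater than or equal". */
bool ult_single(float a, float b) {
    return !(a >= b);
}
```

Both functions return true when either operand is NaN, true for `(1.0f, 2.0f)`, and false for `(2.0f, 1.0f)` and `(1.0f, 1.0f)`.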