summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Improve mul w/ overflow codegen, to MUL8+SETO.Ahmed Bougacha2014-10-233-4/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, @llvm.smul.with.overflow.i8 expands to 9 instructions, where 3 are really needed. This adds X86ISD::UMUL8/SMUL8 SD nodes, and custom lowers them to MUL8/IMUL8 + SETO. i8 is a special case because there is no two/three operand variants of (I)MUL8, so the first operand and return value need to go in AL/AX. Also, we can't write patterns for these instructions: TableGen refuses patterns where output operands don't match SDNode results. In this case, instructions where the output operand is an implicitly defined register. A related special case (and FIXME) exists for MUL8 (X86InstrArith.td): // FIXME: Used for 8-bit mul, ignore result upper 8 bits. // This probably ought to be moved to a def : Pat<> if the // syntax can be accepted. [(set AL, (mul AL, GR8:$src)), (implicit EFLAGS)] Ideally, these go away with UMUL8, but we still need to improve TableGen support of implicit operands in patterns. Before this change: movsbl %sil, %eax movsbl %dil, %ecx imull %eax, %ecx movb %cl, %al sarb $7, %al movzbl %al, %eax movzbl %ch, %esi cmpl %eax, %esi setne %al After: movb %dil, %al imulb %sil seto %al Also, remove a made-redundant testcase for PR19858, and enable more FastISel ALU-overflow tests for SelectionDAG too. Differential Revision: http://reviews.llvm.org/D5809 llvm-svn: 220516
* Add minnum / maxnum codegenMatt Arsenault2014-10-211-0/+2
| | | | llvm-svn: 220342
* X86AsmInstrumentation.cpp: Dissolve initializer-ranged-for. MSC17 disliked it.NAKAMURA Takumi2014-10-211-3/+3
| | | | llvm-svn: 220301
* [asan-asm-instrumentation] Fixed memory accesses with rbp as a base or an ↵Yuri Gorshenin2014-10-211-63/+121
| | | | | | | | | | | | | | index register. Summary: Fixed memory accesses with rbp as a base or an index register. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5819 llvm-svn: 220283
* Fix a bit of confusion about .set and produce more readable assembly.Rafael Espindola2014-10-211-1/+2
| | | | | | | | | | | | | | | Every target we support has support for assembly that looks like a = b - c .long a What is special about MachO is that the above combination suppresses the production of a relocation. With this change we avoid producing the intermediary labels when they don't add any value. llvm-svn: 220256
* [X86] Fix a bug in the lowering of the mask of VSELECT.Quentin Colombet2014-10-201-1/+6
| | | | | | | | | | | | | | | | X86 code to lower VSELECT messed a bit with the bits set in the mask of VSELECT when it knows it can be lowered into BLEND. Indeed, only the high bits need to be set for those and it optimizes those accordingly. However, when the mask is a compile time constant, the lowering will be handled by the generic optimizer and those modifications will generate bad code in the generic optimizer. This patch fixes that by preventing the optimization if the VSELECT will be handled by the generic optimizer. <rdar://problem/18675020> llvm-svn: 220242
* [X86] Memory folding for commutative instructions (updated)Simon Pilgrim2014-10-203-59/+102
| | | | | | | | | | | | This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning. Updated version of r219584 (reverted in r219595) - the commutation attempt now explicitly ensures that neither of the commuted source operands are tied to the destination operand / register, which was the source of all the regressions that occurred with the original patch attempt. Added additional regression test case provided by Joerg Sonnenberger. Differential Revision: http://reviews.llvm.org/D5818 llvm-svn: 220239
* [X86] Fix missed selection of non-temporal store of zero vector.Andrea Di Biagio2014-10-171-0/+8
| | | | | | | | | | When the input to a store instruction was a zero vector, the backend always selected a normal vector store regardless of the non-temporal hint. This is fixed by this patch. This fixes PR19370. llvm-svn: 220054
* [AVX512] Add DQ subvector insertsAdam Nemet2014-10-152-11/+33
| | | | | | | | | | | | | | In AVX512f we support 64x2 and 32x8 inserts via matching them to 32x4 and 64x4 respectively. These are matched by "Alt" Pat<>'s (Alt stands for alternative VTs). Since DQ has native support for these intructions, I peeled off the non-"Alt" part of the baseclass into vinsert_for_size_no_alt. The DQ instructions are derived from this multiclass. The "Alt" Pat<>'s are disabled with DQ. Fixes <rdar://problem/18426089> llvm-svn: 219874
* [AVX512] Two new attributes in X86VectorVTInfo for subvector insertAdam Nemet2014-10-152-4/+14
| | | | | | | | | The new attributes are NumElts and the CD8TupleForm. This prepares the code to enable x8 and x2 inserts. NFC, no change in X86.td.expanded except for the new attributes. llvm-svn: 219871
* [AVX512] Rename arg from Opcode32/64 to Opcode128/256 in vinsert_for_sizeAdam Nemet2014-10-151-4/+4
| | | | | | | It's the W bit that selects between 32 or 64 elt type and not the opcode. The opcode selects between the width of the insert (128 or 256). llvm-svn: 219870
* Simplify handling of --noexecstack by using getNonexecutableStackSection.Rafael Espindola2014-10-151-6/+3
| | | | llvm-svn: 219799
* Move getNonexecutableStackSection up to the base ELF class.Rafael Espindola2014-10-152-8/+0
| | | | | | The .note.GNU-stack section is not SystemZ/X86 specific. llvm-svn: 219796
* [X86][SSE] pslldq/psrldq shuffle mask decodesSimon Pilgrim2014-10-143-0/+71
| | | | | | | | Patch to provide shuffle decodes and asm comments for the sse pslldq/psrldq SSE2/AVX2 byte shift instructions. Differential Revision: http://reviews.llvm.org/D5598 llvm-svn: 219738
* [x86 asm] allow fwait alias in both At&t and Intel modes (PR21208)Hans Wennborg2014-10-141-1/+1
| | | | | | Differential Revision: http://reviews.llvm.org/D5741 llvm-svn: 219725
* [AVX512] Extended avx512_binop_rm to DQ/VL subsets.Robert Khasanov2014-10-141-0/+2
| | | | | | Added encoding tests. llvm-svn: 219686
* [AVX512] Extended avx512_binop_rm to BW/VL subsets.Robert Khasanov2014-10-141-67/+125
| | | | | | Added encoding tests. llvm-svn: 219685
* Fix a broadcast related regression on the vector shuffle lowering.Filipe Cabecinhas2014-10-131-3/+35
| | | | | | | | | | | | Summary: Test by Robert Lougher! Reviewers: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5745 llvm-svn: 219617
* [asan-asm-instrumentation] Follow-up fixes to r219602: asserts are moved intoYuri Gorshenin2014-10-131-8/+8
| | | | | | function. llvm-svn: 219610
* [asan-asm-instrumentation] Fixed memory references which includes %rsp as a ↵Yuri Gorshenin2014-10-131-111/+162
| | | | | | | | | | | | | | base or an index register. Summary: [asan-asm-instrumentation] Fixed memory references which includes %rsp as a base or an index register. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5599 llvm-svn: 219602
* Revert r219584, "[X86] Memory folding for commutative instructions."NAKAMURA Takumi2014-10-133-70/+51
| | | | | | It broke i686 selfhosting. llvm-svn: 219595
* [X86] Memory folding for commutative instructions.Simon Pilgrim2014-10-123-51/+70
| | | | | | | | | | This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning. This mainly helps the stack inliner better fold reloads of 3 (or more) operand instructions (VEX encoded SSE etc.) but by performing this in the lowest foldMemoryOperandImpl implementation it also replaces the X86InstrInfo::optimizeLoadInstr version and is now used by FastISel too. Differential Revision: http://reviews.llvm.org/D5701 llvm-svn: 219584
* Test commit access (email fix)Simon Pilgrim2014-10-111-3/+4
| | | | | | Indentation tidyup. llvm-svn: 219577
* MC: Bit pack MCSymbolData.Benjamin Kramer2014-10-111-1/+1
| | | | | | | | | On x86_64 this brings it from 80 bytes to 64 bytes. Also make any member variables private and clean up uses to go through the existing accessors. NFC. llvm-svn: 219573
* Test commit accessSimon Pilgrim2014-10-111-1/+1
| | | | | | Fix comment typo + spelling. llvm-svn: 219572
* Don't use an unqualified 'abs' function call with a builtin type.Chandler Carruth2014-10-101-2/+3
| | | | | | | | | | | | | | | | This is dangerous for numerous reasons. The primary risk here is with floating point or double types where if the wrong header files are included in a strange order this can implicitly convert to integers and then call the C abs function on the integers. There is a secondary risk that even impacts integers where if the namespace the code is written in ever defines an abs overload for types within that namespace the global abs will be hidden. The correct form is to call std::abs or write 'using std::abs' for builtin types (and only the latter is correct in any generic context). I've also added the requisite header to be a bit more explicit here. llvm-svn: 219484
* [AVX512] Extended avx512_binop_rm for AVX512VL subsets.Robert Khasanov2014-10-091-53/+76
| | | | | | | Added avx512_binop_rm_vl multiclass for VL subset Added encoding tests llvm-svn: 219390
* [AVX512] Rename AVX512_masking* to AVX512_maskable*Adam Nemet2014-10-081-69/+69
| | | | | | | | | | | | | | | | No functional change. This is the current AVX512_maskable multiclass hierarchy: maskable_custom / \ / \ maskable_common maskable_in_asm / \ / \ maskable maskable_3src llvm-svn: 219363
* [AVX512] Intrinsics for vextract*x4Adam Nemet2014-10-081-0/+23
| | | | | | | | This adds the Pat<>'s for the intrinsics. These are necessary because we don't lower these intrinsics to SDNodes but match them directly. See the rational in the previous commit. llvm-svn: 219362
* [AVX512] Add asm-only support for vextract*x4 masking variantsAdam Nemet2014-10-081-7/+19
| | | | | | | | | | | | | | | | These derive from the new asm-only masking definitions. Unfortunately I wasn't able to find a ISel pattern that we could legally generate for the masking variants. The problem is that since the destination is v4* we would need VK4 register classes and v4i1 value types to express the masking. These are however not legal types/classes in AVX512f but only in VL, so things get complicated pretty quickly. We can revisit this question later if we have a more pressing need to express something like this. So the ISel patterns are empty for the masking instructions and the next patch will add Pat<>s instead to match the intrinsics calls with instructions. llvm-svn: 219361
* [AVX512] Move DAG for all-zero node to X86VectorVTInfoAdam Nemet2014-10-081-3/+6
| | | | | | | | | | No functional change. No change in X86.td.expanded except for the appearance of the new attributes. The new attributes will be used in the subsequent patch. llvm-svn: 219360
* [AVX512] Peel off an asm-only class from AVX512_masking_common.Adam Nemet2014-10-081-10/+32
| | | | | | | | | No functional change. This enables the generation of masking instructions that don't provide a ISel pattern. llvm-svn: 219358
* [X86] Don't transform atomic-load-add into an inc/dec when inc/dec is slowRobin Morisset2014-10-081-3/+4
| | | | llvm-svn: 219357
* [X86] Avoid generating inc/dec when slow for x.atomic_store(1 + x.atomic_load())Robin Morisset2014-10-081-2/+2
| | | | | | | | | | | | | | | | | Summary: I had forgotten to check for NotSlowIncDec in the patterns that can generate inc/dec for the above pattern (added in D4796). This currently applies to Atom Silvermont, KNL and SKX. Test Plan: New checks on atomic_mi.ll Reviewers: jfb, nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5677 llvm-svn: 219336
* [AVX512] Added intrinsics for 128-, 256- and 512-bit versions of ↵Robert Khasanov2014-10-082-6/+38
| | | | | | | | | | | VPCMP/VPCMPU{BWDQ} Added CMP_MASK_CC intrinsic type. Added tests for intrinsics. Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com> llvm-svn: 219316
* [AVX512] Refactoring of avx512_binop_rm multiclass through AVX512_masking.Robert Khasanov2014-10-082-133/+60
| | | | | | | Added new argrument for AVX512_masking: InstrItinClass and bit isCommutable. No functional change. llvm-svn: 219310
* Cache TargetLowering on SelectionDAGISel and update previousEric Christopher2014-10-081-7/+6
| | | | | | calls to getTargetLowering() with the cached variable. llvm-svn: 219284
* [X86] Fix a bug with fetch_add(INT32_MIN)Robin Morisset2014-10-071-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Fix pr21099 The pseudocode of what we were doing (spread through two functions) was: if (operand.doesNotFitIn32Bits()) Opc.initializeWithFoo(); if (operand < 0) operand = -operand; if (operand.doesFitIn8Bits()) Opc.initializeWithBar(); else if (operand.doesFitIn32Bits()) Opc.initializeWithBlah(); doStuff(Opc); So for operand == INT32_MIN, Opc was never initialized because the operand changes from fitting in 32 bits to not fitting, causing the various bugs/error messages noted by pr21099. This patch adds an extra test at the beginning for this case, and an llvm_unreachable to have better error message if the operand ends up not fitting in 32-bits at the end. Test Plan: new test + make check Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5655 llvm-svn: 219257
* [asan-asm-instrumentation] CFI directives are generated for .S files.Yuri Gorshenin2014-10-073-10/+51
| | | | | | | | | | | | Summary: CFI directives are generated for .S files. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5520 llvm-svn: 219199
* [X86] Fix a bug where the disassembler was ignoring the VEX.W bit in 32-bit ↵Craig Topper2014-10-071-0/+47
| | | | | | | | | | mode for certain instructions it shouldn't. Unfortunately, this isn't easy to fix since there's no simple way to figure out from the disassembler tables whether the W-bit is being used to select a 64-bit GPR or if its a required part of the opcode. The fix implemented here just looks for "64" in the instruction name and ignores the W-bit in 32-bit mode if its present. Fixes PR21169. llvm-svn: 219194
* Formatting fixes. Most putting 'else' on the same line as the preceding ↵Craig Topper2014-10-071-38/+19
| | | | | | curly brace. llvm-svn: 219193
* Fix filename in header and use C++ version of the C header files.Craig Topper2014-10-071-5/+5
| | | | llvm-svn: 219192
* X86: Drop the isConvertibleTo3Addr bit from shufps/shufpd now that we don't ↵Benjamin Kramer2014-10-061-9/+8
| | | | | | convert them anymore. llvm-svn: 219112
* Add subtarget caches to aarch64, arm, ppc, and x86.Eric Christopher2014-10-062-0/+43
| | | | | | | | | These will make it easier to test further changes to the code generation and optimization pipelines as those are moved to subtargets initialized with target feature and target cpu. llvm-svn: 219106
* [x86] Remove the 2-addr-to-3-addr "optimization" from shufps to pshufd.Chandler Carruth2014-10-051-28/+0
| | | | | | | | | | | | | | | This trades a (register-renamer-friendly) movaps for a floating point / integer domain cross. That is a very bad trade, even on architectures where domain crossing is relatively fast. On any chip where there is even a cycle stall, this is a Very Bad Idea. It doesn't even seem likely to cause a spill to be introduced because the reason for the copy is to destructively shuffle in place. Thanks to Ben Kramer for fixing a bug in this code that my new shuffle lowering exposed and highlighting that perhaps it should just go away. =] llvm-svn: 219090
* X86: Don't drop half of the mask when converting 2-address shufps into ↵Benjamin Kramer2014-10-051-1/+1
| | | | | | | | | 3-address pshufd. It's debatable whether this transform is useful at all, but for now make sure we don't generate invalid asm. llvm-svn: 219084
* AVX-512-SKX: Added instruction VPMOVM2B/W/D/Q.Elena Demikhovsky2014-10-052-2/+50
| | | | | | This instruction allows to broadacst mask vector to data vector. llvm-svn: 219083
* [x86] Fix PR21139, one of the last remaining regressions found in theChandler Carruth2014-10-051-9/+17
| | | | | | | | | | | | | new vector shuffle lowering. This is loosely based on a patch by Marius Wachtler to the PR (thanks!). I refactored it a bi to use std::count_if and a mutable array ref but the core idea was exactly right. I also added some direct testing of this case. I believe PR21137 is now the only remaining regression. llvm-svn: 219081
* [x86] Teach the new vector shuffle lowering how to lower 128-bitChandler Carruth2014-10-051-55/+102
| | | | | | | | | | | | | | | | | | | | | | shuffles using AVX and AVX2 instructions. This fixes PR21138, one of the few remaining regressions impacting benchmarks from the new vector shuffle lowering. You may note that it "regresses" many of the vperm2x128 test cases -- these were actually "improved" by the naive lowering that the new shuffle lowering previously did. This regression gave me fits. I had this patch ready-to-go about an hour after flipping the switch but wasn't sure how to have the best of both worlds here and thought the correct solution might be a completely different approach to lowering these vector shuffles. I'm now convinced this is the correct lowering and the missed optimizations shown in vperm2x128 are actually due to missing target-independent DAG combines. I've even written most of the needed DAG combine and will submit it shortly, but this part is ready and should help some real-world benchmarks out. llvm-svn: 219079
* [x86] Enable the new vector shuffle lowering by default.Chandler Carruth2014-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update the entire regression test suite for the new shuffles. Remove most of the old testing which was devoted to the old shuffle lowering path and is no longer relevant really. Also remove a few other random tests that only really exercised shuffles and only incidently or without any interesting aspects to them. Benchmarking that I have done shows a few small regressions with this on LNT, zero measurable regressions on real, large applications, and for several benchmarks where the loop vectorizer fires in the hot path it shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy Bridge machines. Running on AMD machines shows even more dramatic improvements. When using newer ISA vector extensions the gains are much more modest, but the code is still better on the whole. There are a few regressions being tracked (PR21137, PR21138, PR21139) but by and large this is expected to be a win for x86 generated code performance. It is also more correct than the code it replaces. I have fuzz tested this extensively with ISA extensions up through AVX2 and found no crashes or miscompiles (yet...). The old lowering had a few miscompiles and crashers after a somewhat smaller amount of fuzz testing. There is one significant area where the new code path lags behind and that is in AVX-512 support. However, there was *extremely little* support for that already and so this isn't a significant step backwards and the new framework will probably make it easier to implement lowering that uses the full power of AVX-512's table-based shuffle+blend (IMO). Many thanks to Quentin, Andrea, Robert, and others for benchmarking assistance. Thanks to Adam and others for help with AVX-512. Thanks to Hal, Eric, and *many* others for answering my incessant questions about how the backend actually works. =] I will leave the old code path in the tree until the 3 PRs above are at least resolved to folks' satisfaction. Then I will rip it (and 1000s of lines of code) out. =] I don't expect this flag to stay around for very long. It may not survive next week. llvm-svn: 219046
OpenPOWER on IntegriCloud