summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86
Commit message (Collapse)AuthorAgeFilesLines
* [GlobalISel][X86] Support G_GLOBAL_VALUE operation.Igor Breger2017-07-022-8/+64
| | | | | | | | | | | | | | | | | Summary: Support G_GLOBAL_VALUE operation. For now most of the PIC configurations not implemented yet. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34738 Conflicts: test/CodeGen/X86/GlobalISel/regbankselect-X86_64.mir llvm-svn: 306972
* [GlobalISel][X86] Support vector type G_UNMERGE_VALUES selection.Igor Breger2017-07-021-0/+31
| | | | | | | | | | | | | | | | Summary: Support vector type G_UNMERGE_VALUES selection. For now G_UNMERGE_VALUES marked as legal for any type, so nothing to do in legalizer. Reviewers: t.p.northover, qcolombet, zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, guyblank, llvm-commits Differential Revision: https://reviews.llvm.org/D33665 llvm-svn: 306971
* fix trivial typos; NFCHiroshi Inoue2017-07-022-2/+2
| | | | | | suport -> support llvm-svn: 306968
* [X86] Move GISel accessor initialization from TargetMachine to Subtarget.Quentin Colombet2017-07-012-47/+55
| | | | | | NFC llvm-svn: 306921
* Make 0 argument getSubtargetImpl functions for the X86, AArch64, and PPC ↵Eric Christopher2017-06-301-0/+1
| | | | | | targets deleted so that no one is tempted to use them. llvm-svn: 306864
* [X86][SSE] Pulled common variables to top of matchUnaryPermuteVectorShuffle. ↵Simon Pilgrim2017-06-301-5/+4
| | | | | | NFCI. llvm-svn: 306847
* Revert "r306529 - [X86] Correct dwarf unwind information in function epilogue"Daniel Jasper2017-06-293-131/+8
| | | | | | | | | | I am 99% sure that this breaks the PPC ASAN build bot: http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/3112/steps/64-bit%20check-asan/logs/stdio If it doesn't go back to green, we can recommit (and fix the original commit message at the same time :) ). llvm-svn: 306676
* [GlobalISel][X86] Support vector type G_MERGE_VALUES selection.Igor Breger2017-06-291-0/+53
| | | | | | | | | | | | | | | | Summary: Support vector type G_MERGE_VALUES selection. For now G_MERGE_VALUES marked as legal for any type, so nothing to do in legalizer. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, guyblank, llvm-commits Differential Revision: https://reviews.llvm.org/D33958 llvm-svn: 306665
* [LLVM][X86][Goldmont] Adding new target-cpu: GoldmontMichael Zuckerman2017-06-292-1/+31
| | | | | | | | | | | | | | | | [LLVM SIDE] Connecting the GoldMont processor to his feature. Reviewers: 1. igorb 2. zvi 3. delena 4. RKSimon 5. craig.topper Differential Revision: https://reviews.llvm.org/D34504 llvm-svn: 306658
* Reuse existing variables. NFC.Rafael Espindola2017-06-281-11/+9
| | | | llvm-svn: 306586
* Fix PR33625.Rafael Espindola2017-06-281-1/+1
| | | | | | We were failing to convert this expression to pcrel. llvm-svn: 306573
* [GlobalISel][X86] Support bitwise operations : G_AND, G_OR, G_XORIgor Breger2017-06-281-2/+2
| | | | | | | | | | | | | | Summary: Support G_AND, G_OR, G_XOR for i8/i16/i32/i64. Selection done via TableGen'erated code. Reviewers: zvi, guyblank, aymanmus, m_zuckerman Reviewed By: aymanmus Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34605 llvm-svn: 306533
* Reverting commit 306414 on behalf of @gadi.haberMichael Zuckerman2017-06-282-5225/+1634
| | | | llvm-svn: 306532
* [X86] Correct dwarf unwind information in function epiloguePetar Jovanovic2017-06-283-8/+131
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CFI instructions that set appropriate cfa offset and cfa register are now inserted in emitEpilogue() in X86FrameLowering. Majority of the changes in this patch: 1. Ensure that CFI instructions do not affect code generation. 2. Enable maintaining correct information about cfa offset and cfa register in a function when basic blocks are reordered, merged, split, duplicated. These changes are target independent and described below. Changed CFI instructions so that they: 1. are duplicable 2. are not counted as instructions when tail duplicating or tail merging 3. can be compared as equal Add information to each MachineBasicBlock about cfa offset and cfa register that are valid at its entry and exit (incoming and outgoing CFI info). Add support for updating this information when basic blocks are merged, split, duplicated, created. Add a verification pass (CFIInfoVerifier) that checks that outgoing cfa offset and register of predecessor blocks match incoming values of their successors. Incoming and outgoing CFI information is used by a late pass (CFIInstrInserter) that corrects CFA calculation rule for a basic block if needed. That means that additional CFI instructions get inserted at basic block beginning to correct the rule for calculating CFA. Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. Patch by Violeta Vukobrat. Differential Revision: https://reviews.llvm.org/D18046 llvm-svn: 306529
* [X86][AsmParser][MS-compatability] Binary/Unary operators enhancementsCoby Tayree2017-06-271-37/+75
| | | | | | | | | | | Introducing MOD binary operator https://msdn.microsoft.com/en-us/library/hha180wt.aspx Enhancing unary operators NEG and NOT, to support more complex patterns Differential Revision: https://reviews.llvm.org/D33876 llvm-svn: 306425
* Updated and extended the information about each instruction in HSW and SNB ↵Gadi Haber2017-06-272-1634/+5225
| | | | | | | | | | | | | | | | | | | to include the following data: •static latency •number of uOps from which the instructions consists •all ports used by the instruction Reviewers:  RKSimon zvi aymanmus m_zuckerman Differential Revision: https://reviews.llvm.org/D33897 llvm-svn: 306414
* Recommitting rL305465 after fixing bug in TableGen in rL306251 & rL306371Ayman Musa2017-06-272-25/+715
| | | | | | | | | | | | | | | [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions). AVX512 compare instructions return v*i1 types. In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type. Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes. The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class. When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction. Differential Revision: https://reviews.llvm.org/D33188 llvm-svn: 306402
* Fixed the warning introduced by r306289 to make ubuntu-gcc7.1-werror bot green.Galina Kistanova2017-06-271-1/+1
| | | | llvm-svn: 306369
* AArch64: legalize G_EXTRACT operations.Tim Northover2017-06-261-4/+12
| | | | | | | This is the dual problem to legalizing G_INSERTs so most of the code and testing was cribbed from there. llvm-svn: 306328
* [inline asm] dot operator while using imm generates wrong ir + asm - llvm partMarina Yatsina2017-06-261-2/+1
| | | | | | | | | | | | | | | | | Inline asm dot operator while using imm generates wrong ir and asm This also fixes bugzilla 32987: https://bugs.llvm.org//show_bug.cgi?id=32987 The clang part of the review that contains the test can be found here: https://reviews.llvm.org/D33040 commit on behald of zizhar Differential Revision: https://reviews.llvm.org/D33039 llvm-svn: 306300
* [X86][AVX-512] Don't raise inexact in ceil, floor, round, trunc.Ahmed Bougacha2017-06-261-12/+12
| | | | | | | | | | | | The non-AVX-512 behavior was changed in r248266 to match N1778 (C bindings for IEEE-754 (2008)), which defined the four functions to not raise the inexact exception ("rint" is still defined as raising it). Update the AVX-512 lowering of these functions to match that: it should not be different. llvm-svn: 306299
* [x86] transform vector inc/dec to use -1 constant (PR33483)Sanjay Patel2017-06-261-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert vector increment or decrement to sub/add with an all-ones constant: add X, <1, 1...> --> sub X, <-1, -1...> sub X, <1, 1...> --> add X, <-1, -1...> The all-ones vector constant can be materialized using a pcmpeq instruction that is commonly recognized as an idiom (has no register dependency), so that's better than loading a splat 1 constant. AVX512 uses 'vpternlogd' for 512-bit vectors because there is apparently no better way to produce 512 one-bits. The general advantages of this lowering are: 1. pcmpeq has lower latency than a memop on every uarch I looked at in Agner's tables, so in theory, this could be better for perf, but... 2. That seems unlikely to affect any OOO implementation, and I can't measure any real perf difference from this transform on Haswell or Jaguar, but... 3. It doesn't look like it from the diffs, but this is an overall size win because we eliminate 16 - 64 constant bytes in the case of a vector load. If we're broadcasting a scalar load (which might itself be a bug), then we're replacing a scalar constant load + broadcast with a single cheap op, so that should always be smaller/better too. 4. This makes the DAG/isel output more consistent - we use pcmpeq already for padd x, -1 and psub x, -1, so we should use that form for +1 too because we can. If there's some reason to favor a constant load on some CPU, let's make the reverse transform for all of these cases (either here in the DAG or in a later machine pass). This should fix: https://bugs.llvm.org/show_bug.cgi?id=33483 Differential Revision: https://reviews.llvm.org/D34336 llvm-svn: 306289
* [X86][SSE] Remove unused memopfsf32_128/memopfsf64_128 scalar memopsSimon Pilgrim2017-06-251-10/+0
| | | | | | The 'scalar' simd bitops were dropped a while ago llvm-svn: 306248
* Strip trailing whitespace. NFCI.Simon Pilgrim2017-06-252-2/+2
| | | | llvm-svn: 306247
* [GlobalISel][X86] Support vector type G_EXTRACT selection.Igor Breger2017-06-251-0/+104
| | | | | | | | | | | | | | | | Summary: Support vector type G_EXTRACT selection. For now G_EXTRACT marked as legal for any type, so nothing to do in legalizer. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: guyblank Subscribers: guyblank, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33957 llvm-svn: 306240
* [AVX2] [TTI CostModel] Add cost of interleaved loads/stores for AVX2Dorit Nuzman2017-06-252-0/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | The cost of an interleaved access was only implemented for AVX512. For other X86 targets an overly conservative Base cost was returned, resulting in avoiding vectorization where it is actually profitable to vectorize. This patch starts to add costs for AVX2 for most prominent cases of interleaved accesses (stride 3,4 chars, for now). Note1: Improvements of up to ~4x were observed in some of EEMBC's rgb workloads; There is also a known issue of 15-30% degradations on some of these workloads, associated with an interleaved access followed by type promotion/widening; the resulting shuffle sequence is currently inefficient and will be improved by a series of patches that extend the X86InterleavedAccess pass (such as D34601 and more to follow). Note 2: The costs in this patch do not reflect port pressure penalties which can be very dominant in the case of interleaved accesses since most of the shuffle operations are restricted to a single port. Further tuning, that may incorporate these considerations, will be done on top of the upcoming improved shuffle sequences (that is, along with the abovementioned work to extend X86InterleavedAccess pass). Differential Revision: https://reviews.llvm.org/D34023 llvm-svn: 306238
* Remove redundant argument.Rafael Espindola2017-06-241-1/+1
| | | | llvm-svn: 306189
* ARM: move some logic from processFixupValue to applyFixup.Rafael Espindola2017-06-231-1/+2
| | | | | | | | | | | | processFixupValue is called on every relaxation iteration. applyFixup is only called once at the very end. applyFixup is then the correct place to do last minute changes and value checks. While here, do proper range checks again for fixup_arm_thumb_bl. We used to do it, but dropped because of thumb2. We now do it again, but use the thumb2 range. llvm-svn: 306177
* [X86] Fix SP adjustment in stack probes emitted on 32-bit Windows.whitequark2017-06-231-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit r306010 adjusted the condition as follows: - if (Is64Bit) { + if (!STI.isTargetWin32()) { The intent was to preserve the behavior on all Windows platforms but extend the behavior on 64-bit Windows platforms to every other one. (Before r306010, emitStackProbeCall only ever executed when emitting code for Windows triples.) Unfortunately, if (Is64Bit && STI.isOSWindows()) is not the same as if (!STI.isTargetWin32()) because of the way isTargetWin32() is defined: bool isTargetWin32() const { return !In64BitMode && (isTargetCygMing() || isTargetKnownWindowsMSVC()); } In practice this broke the JIT tests on 32-bit Windows, which did not satisfy the new condition: LLVM :: ExecutionEngine/MCJIT/2003-01-15-AlignmentTest.ll LLVM :: ExecutionEngine/MCJIT/2003-08-15-AllocaAssertion.ll LLVM :: ExecutionEngine/MCJIT/2003-08-23-RegisterAllocatePhysReg.ll LLVM :: ExecutionEngine/MCJIT/test-loadstore.ll LLVM :: ExecutionEngine/OrcMCJIT/2003-01-15-AlignmentTest.ll LLVM :: ExecutionEngine/OrcMCJIT/2003-08-15-AllocaAssertion.ll LLVM :: ExecutionEngine/OrcMCJIT/2003-08-23-RegisterAllocatePhysReg.ll LLVM :: ExecutionEngine/OrcMCJIT/test-loadstore.ll because %esp was not updated correctly. The failures are only visible on a MSVC 2017 Debug build, for which we do not have bots. llvm-svn: 306142
* [x86] fix value types for SBB transform (PR33560)Sanjay Patel2017-06-231-8/+13
| | | | | | | | | | | I'm not sure yet why this wouldn't fail in the simple case, but clearly I used the wrong value type with: https://reviews.llvm.org/rL306040 ...and the bug manifests with: https://bugs.llvm.org/show_bug.cgi?id=33560 llvm-svn: 306139
* Remove trailing whitespace. NFCI.Simon Pilgrim2017-06-231-1/+1
| | | | llvm-svn: 306121
* COFF: Produce an error on invalid pcrel relocs.Rafael Espindola2017-06-231-4/+13
| | | | | | | | | | X86_64 COFF only has support for 32 bit pcrel relocations. Produce an error on all others. Note that gnu as has extended the relocation values to support this. It is not clear if we should support the gnu extension. llvm-svn: 306082
* Fixed a (product) build error that was due to an unused variableFarhana Aleen2017-06-221-2/+1
| | | | | | | | | | | | | Details: There was a use but it was in the assert which was not exercised during product build. Reviewers: Andrew Kaylor Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32658 llvm-svn: 306073
* [x86] add/sub (X==0) --> sbb(cmp X, 1)Sanjay Patel2017-06-221-5/+17
| | | | | | | | | | | | | | | | | | | | | | This is very similar to the transform in: https://reviews.llvm.org/rL306040 ...but in this case, we use cmp X, 1 to set the carry bit as needed. Again, we can show that all of these are logically equivalent (although InstCombine currently canonicalizes to a form not seen here), and if we believe IACA, then this is the smallest/fastest code. Eg, with SNB: | Num Of | Ports pressure in cycles | | | Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | | --------------------------------------------------------------------- | 1 | 1.0 | | | | | | | cmp edi, 0x1 | 2 | | 1.0 | | | | 1.0 | CP | sbb eax, eax The larger motivation is to clean up all select-of-constants combining/lowering because we're missing some common cases. llvm-svn: 306072
* Supported lowerInterleavedStore() in X86InterleavedAccess.Farhana Aleen2017-06-222-31/+101
| | | | | | | | | | Reviewers: RKSimon, DavidKreitzer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32658 llvm-svn: 306068
* [AVX-512] Remove and autoupgrade the masked integer compare intrinsicsCraig Topper2017-06-221-24/+0
| | | | | | | | | | | | | | | | | Summary: These intrinsics aren't used by clang and haven't been for a while. There's some really terrible codegen in the 32-bit target for avx512bw due to i64 not being legal. But as I said these intrinsics aren't used by clang even before this patch so this codegen reflects our clang behavior today. Reviewers: spatel, RKSimon, zvi, igorb Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34389 llvm-svn: 306047
* [x86] add/sub (X==0) --> sbb(neg X)Sanjay Patel2017-06-221-3/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our handling of select-of-constants is lumpy in IR (https://reviews.llvm.org/D24480), lumpy in DAGCombiner, and lumpy in X86ISelLowering. That's why we only had the 'sbb' codegen in 1 out of the 4 tests. This is a step towards smoothing that out. First, show that all of these IR forms are equivalent: http://rise4fun.com/Alive/mx Second, show that the 'sbb' version is faster/smaller. IACA output for SandyBridge (later Intel and AMD chips are similar based on Agner's tables): This is the "obvious" x86 codegen (what gcc appears to produce currently): | Num Of | Ports pressure in cycles | | | Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | | --------------------------------------------------------------------- | 1* | | | | | | | | xor eax, eax | 1 | 1.0 | | | | | | CP | test edi, edi | 1 | | | | | | 1.0 | CP | setnz al | 1 | | 1.0 | | | | | CP | neg eax This is the adc version: | 1* | | | | | | | | xor eax, eax | 1 | 1.0 | | | | | | CP | cmp edi, 0x1 | 2 | | 1.0 | | | | 1.0 | CP | adc eax, 0xffffffff And this is sbb: | 1 | 1.0 | | | | | | | neg edi | 2 | | 1.0 | | | | 1.0 | CP | sbb eax, eax If IACA is trustworthy, then sbb became a single uop in Broadwell, so this will be clearly better than the alternatives going forward. llvm-svn: 306040
* Add a common error checking for some invalid expressions.Rafael Espindola2017-06-221-2/+1
| | | | | | | This refactors a bit of duplicated code and fixes an assertion failure on ELF. llvm-svn: 306035
* [X86] Add support for "probe-stack" attributewhitequark2017-06-223-19/+37
| | | | | | | | | | | This commit adds prologue code emission for stack probe function calls. Reviewed By: majnemer Differential Revision: https://reviews.llvm.org/D34387 llvm-svn: 306010
* [GlobalISel][X86] Support vector type G_INSERT legalization/selection.Igor Breger2017-06-222-3/+136
| | | | | | | | | | | | | | | | Summary: Support vector type G_INSERT legalization/selection. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: guyblank Subscribers: guyblank, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33956 llvm-svn: 305989
* AVX-512: Lowering Masked Gather intrinsic - fixed a bugElena Demikhovsky2017-06-225-9/+103
| | | | | | | | | | | | Masked gather for vector length 2 is lowered incorrectly for element type i32. The type <2 x i32> was automatically extended to <2 x i64> and we generated VPGATHERQQ instead of VPGATHERQD. The type <2 x float> is extended to <4 x float>, so there is no bug for this type, but the sequence may be more optimal. In this patch I'm fixing <2 x i32>bug and optimizing <2 x float> sequence for GATHERs only. The same fix should be done for Scatters as well. Differential revision: https://reviews.llvm.org/D34343 llvm-svn: 305987
* Use a MutableArrayRef. NFC.Rafael Espindola2017-06-211-3/+2
| | | | llvm-svn: 305968
* [Solaris] emit .init_array instead of .ctors on Solaris (Sparc/x86)Davide Italiano2017-06-213-0/+13
| | | | | | | | Patch by Fedor Sergeev. Differential Revision: https://reviews.llvm.org/D33868 llvm-svn: 305948
* [x86] fix formatting; NFCSanjay Patel2017-06-211-15/+13
| | | | llvm-svn: 305914
* [x86] enable CGP memcmp() expansion for 2/4/8 byte sizesSanjay Patel2017-06-203-1/+13
| | | | | | | | | There are a couple of potential improvements as seen in the IR and asm: 1. We're unnecessarily extending to a larger type to compare values. 2. The codegen for (select cond, 1, -1) could avoid a cmov. (or we could change the order of the compares, so we have a select with 0 operand) llvm-svn: 305802
* [X86][SSE] Relax 0/-1 vector element insertion to work for any vector with ↵Simon Pilgrim2017-06-201-1/+2
| | | | | | | | >=16bit elements Shuffle lowering/combining now does a good job for 256/512-bit vectors - we don't need to prevent this llvm-svn: 305801
* [X86][SSE] Dropped old INSERT_VECTOR_ELT lowering TODOSimon Pilgrim2017-06-201-2/+0
| | | | | | Target shuffle combining now supports the matching of INSERT_VECTOR_ELT/PINSRW/PINSRB for merging multiple insertions into shuffles/bitmasks. llvm-svn: 305788
* [GlobalISel][X86] fix compilation error ( -Werror=unused-function )Igor Breger2017-06-201-2/+2
| | | | llvm-svn: 305786
* [GlobalISel][X86] Get correct RegClass for given RegBank.Igor Breger2017-06-201-17/+26
| | | | | | | | | | | | | | | | | | | Summary: In some cases RegClass depends on target feature. Hight (16-31) vector registers exist only if AVX512f available. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: t.p.northover, guyblank Subscribers: guyblank, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33952 Conflicts: test/CodeGen/X86/GlobalISel/select-memop-scalar.mir llvm-svn: 305784
* Revert r304824 "Fix PR23384 (part 3 of 3)"Hans Wennborg2017-06-192-13/+0
| | | | | | | | | | | | | | | | | This seems to be interacting badly with ASan somehow, causing false reports of heap-buffer overflows: PR33514. > Summary: > The patch makes instruction count the highest priority for > LSR solution for X86 (previously registers had highest priority). > > Reviewers: qcolombet > > Differential Revision: http://reviews.llvm.org/D30562 > > From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 305720
OpenPOWER on IntegriCloud