path: root/llvm/lib/Target/X86
Commit log (message, author, date, files changed, -deleted/+added lines):
* Fix spelling in comments. NFCI. (Simon Pilgrim, 2017-07-06; 1 file, -3/+3)
  llvm-svn: 307288
* [X86][SSE4A] Add support for shuffle combining to INSERTQI. (Simon Pilgrim, 2017-07-06; 1 file, -0/+16)
  llvm-svn: 307268
* [X86][SSE] combineX86ShuffleChain - merge duplicate creations of integer mask types (Simon Pilgrim, 2017-07-06; 1 file, -20/+12)
  llvm-svn: 307257
* [X86][SSE] combineX86ShuffleChain - merge duplicate 'Zeroable' element masks (Simon Pilgrim, 2017-07-06; 1 file, -20/+12)
  llvm-svn: 307255
* [X86][SSE4A] Add support for shuffle combining to EXTRQ. (Simon Pilgrim, 2017-07-06; 1 file, -1/+28)
  llvm-svn: 307254
* [X86][SSE4A] Split EXTRQ/INSERTQ shuffle matching from lowering. NFCI. (Simon Pilgrim, 2017-07-06; 1 file, -99/+112)
  First step toward supporting shuffle combining to EXTRQ/INSERTQ.
  llvm-svn: 307250
* [GlobalISel][X86] For now don't handle not trivial function arguments lowering. (Igor Breger, 2017-07-05; 1 file, -1/+11)
  llvm-svn: 307142
* [GlobalISel][X86] Allow graceful fallback for struct/array argument/return value lowering. (Igor Breger, 2017-07-05; 2 files, -11/+26)
  Going to support it in a follow-up patch.
  llvm-svn: 307125
* [X86][SSE4A] Add support for combining from non-v16i8 EXTRQI/INSERTQI shuffles (Simon Pilgrim, 2017-07-04; 1 file, -3/+3)
  With the improved shuffle decoding we can now combine EXTRQI/INSERTQI shuffles from non-v16i8 vector types.
  llvm-svn: 307099
* Fix signed/unsigned comparison warnings (Simon Pilgrim, 2017-07-04; 1 file, -4/+4)
  llvm-svn: 307098
* [X86][SSE4A] Generalized EXTRQI/INSERTQI shuffle decodes (Simon Pilgrim, 2017-07-04; 4 files, -31/+41)
  The existing decodes only worked for v16i8 vectors; this adds support for any 128-bit vector.
  llvm-svn: 307095
* [globalisel][tablegen] Partially fix compile-time regressions by converting matcher to state-machine(s) (Daniel Sanders, 2017-07-04; 1 file, -0/+2)
  Summary: Replace the matcher if-statements for each rule with a state-machine. This significantly
  reduces compile time, memory allocations, and cumulative memory allocation when compiling
  AArch64InstructionSelector.cpp.o after r303259 is recommitted. The following patches will expand
  on this further to fully fix the regressions.
  Reviewers: rovka, ab, t.p.northover, qcolombet, aditya_nandakumar
  Reviewed By: ab
  Subscribers: vitalybuka, aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D33758
  llvm-svn: 307079
* [X86] Add comment string for broadcast loads from the constant pool. (Craig Topper, 2017-07-04; 1 file, -37/+156)
  Summary: When broadcasting from the constant pool it's useful to print out the final vector,
  similar to what we do for normal moves from the constant pool. I changed only a couple tests that
  were broadcast focused. One of them had been previously hand tweaked after running the script so
  that it could check the constant pool declaration. But I think this patch makes that unnecessary
  now since we can check the comment instead.
  Reviewers: spatel, RKSimon, zvi
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D34923
  llvm-svn: 307062
* [X86] Add RDRAND feature to GLM CPU (Craig Topper, 2017-07-04; 1 file, -0/+1)
  Summary: I believe this should be supported on GLM since RDSEED is.
  Reviewers: m_zuckerman, zvi, RKSimon
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D34828
  llvm-svn: 307060
* [X86][SSE4A] Add support for combining from EXTRQI/INSERTQI shuffles (Simon Pilgrim, 2017-07-03; 1 file, -0/+22)
  llvm-svn: 307048
* DAGCombine: Combine BUILD_VECTOR to TRUNCATE (Zvi Rackover, 2017-07-03; 1 file, -0/+13)
  Summary: Add a combine for creating a truncate to replace a build_vector composed of extracts with
  indices that form a stride-2^N series.
  Example:
    v8i32 V = ...
    v4i32 build_vector((extract_elt V, 0), (extract_elt V, 2), (extract_elt V, 4), (extract_elt V, 6))
    --> v4i32 truncate (bitcast V to v4i64)
  Related discussion in llvm-dev about canonicalizing shuffles to truncates in LLVM IR:
  http://lists.llvm.org/pipermail/llvm-dev/2017-January/108936.html
  Reviewers: spatel, RKSimon, efriedma, igorb, craig.topper, wolfgangp, delena
  Reviewed By: delena
  Subscribers: guyblank, delena, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D34077
  llvm-svn: 307036
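  The combine above keys on extract indices forming a stride-2^N series. A minimal standalone C++
  sketch of that index check (a hypothetical helper for illustration, not the actual DAGCombiner code):

    #include <cstdint>
    #include <vector>

    // Returns true if Indices is 0, S, 2*S, 3*S, ... for a power-of-two stride S >= 2,
    // the pattern the combine can rewrite as a truncate of the bitcast source vector.
    static bool isStride2NSeries(const std::vector<uint64_t> &Indices) {
      if (Indices.size() < 2 || Indices[0] != 0)
        return false;
      const uint64_t Stride = Indices[1];
      if (Stride < 2 || (Stride & (Stride - 1)) != 0) // stride must be a power of two
        return false;
      for (uint64_t I = 0; I != Indices.size(); ++I)
        if (Indices[I] != I * Stride)
          return false;
      return true;
    }

  For the example in the commit message, indices {0, 2, 4, 6} have stride 2, so the build_vector of
  four i32 extracts from v8i32 can become a v4i32 truncate of the value bitcast to v4i64.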
* [GlobalISel][X86] fix %ptr(p0) = G_CONSTANT selection. (Igor Breger, 2017-07-03; 1 file, -1/+2)
  llvm-svn: 307019
* fix trivial typos in comments; NFC (Hiroshi Inoue, 2017-07-03; 1 file, -1/+1)
  llvm-svn: 307004
* [X86][AVX512VPOPCNTDQ] Improve support for v16i8/v8i16/v16i16/ CTPOP (Simon Pilgrim, 2017-07-02; 1 file, -0/+14)
  Zero extend to v16i32/v8i64, use VPOPCNTDQ instructions and truncate back.
  llvm-svn: 306990
* [X86][SSE] Attempt to combine 64-bit and 32-bit shuffles to unary shuffles before bit shifts (Simon Pilgrim, 2017-07-02; 1 file, -32/+21)
  We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads,
  plus the destination register is destructive.
  llvm-svn: 306978
* [X86][SSE] Attempt to combine 64-bit and 16-bit shuffles to unary shuffles before bit shifts (Simon Pilgrim, 2017-07-02; 1 file, -64/+56)
  We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads,
  plus the destination register is destructive. The 32-bit shuffles are a bit tricky and will be
  dealt with in a later patch.
  llvm-svn: 306977
* [X86][CM] update add\sub costs of vectors of 64 in X86\SLM arch (Mohammed Agabaria, 2017-07-02; 1 file, -4/+9)
  This patch updates the cost of addq\subq (add\subtract of vectors of 64 bits) based on the
  performance numbers of the SLM arch.
  Differential Revision: https://reviews.llvm.org/D33983
  llvm-svn: 306974
* [GlobalISel][X86] Support G_GLOBAL_VALUE operation. (Igor Breger, 2017-07-02; 2 files, -8/+64)
  Summary: Support G_GLOBAL_VALUE operation. For now most of the PIC configurations are not
  implemented yet.
  Reviewers: zvi, guyblank
  Reviewed By: guyblank
  Subscribers: rovka, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D34738
  Conflicts: test/CodeGen/X86/GlobalISel/regbankselect-X86_64.mir
  llvm-svn: 306972
* [GlobalISel][X86] Support vector type G_UNMERGE_VALUES selection. (Igor Breger, 2017-07-02; 1 file, -0/+31)
  Summary: Support vector type G_UNMERGE_VALUES selection. For now G_UNMERGE_VALUES is marked as
  legal for any type, so nothing to do in the legalizer.
  Reviewers: t.p.northover, qcolombet, zvi, guyblank
  Reviewed By: guyblank
  Subscribers: rovka, kristof.beyls, guyblank, llvm-commits
  Differential Revision: https://reviews.llvm.org/D33665
  llvm-svn: 306971
* fix trivial typos; NFC (Hiroshi Inoue, 2017-07-02; 2 files, -2/+2)
  suport -> support
  llvm-svn: 306968
* [X86] Move GISel accessor initialization from TargetMachine to Subtarget. (Quentin Colombet, 2017-07-01; 2 files, -47/+55)
  NFC
  llvm-svn: 306921
* Make 0 argument getSubtargetImpl functions for the X86, AArch64, and PPC targets deleted so that no one is tempted to use them. (Eric Christopher, 2017-06-30; 1 file, -0/+1)
  llvm-svn: 306864
* [X86][SSE] Pulled common variables to top of matchUnaryPermuteVectorShuffle. NFCI. (Simon Pilgrim, 2017-06-30; 1 file, -5/+4)
  llvm-svn: 306847
* Revert "r306529 - [X86] Correct dwarf unwind information in function epilogue"Daniel Jasper2017-06-293-131/+8
| | | | | | | | | | I am 99% sure that this breaks the PPC ASAN build bot: http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/3112/steps/64-bit%20check-asan/logs/stdio If it doesn't go back to green, we can recommit (and fix the original commit message at the same time :) ). llvm-svn: 306676
* [GlobalISel][X86] Support vector type G_MERGE_VALUES selection.Igor Breger2017-06-291-0/+53
| | | | | | | | | | | | | | | | Summary: Support vector type G_MERGE_VALUES selection. For now G_MERGE_VALUES marked as legal for any type, so nothing to do in legalizer. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, guyblank, llvm-commits Differential Revision: https://reviews.llvm.org/D33958 llvm-svn: 306665
* [LLVM][X86][Goldmont] Adding new target-cpu: GoldmontMichael Zuckerman2017-06-292-1/+31
| | | | | | | | | | | | | | | | [LLVM SIDE] Connecting the GoldMont processor to his feature. Reviewers: 1. igorb 2. zvi 3. delena 4. RKSimon 5. craig.topper Differential Revision: https://reviews.llvm.org/D34504 llvm-svn: 306658
* Reuse existing variables. NFC.Rafael Espindola2017-06-281-11/+9
| | | | llvm-svn: 306586
* Fix PR33625.Rafael Espindola2017-06-281-1/+1
| | | | | | We were failing to convert this expression to pcrel. llvm-svn: 306573
* [GlobalISel][X86] Support bitwise operations : G_AND, G_OR, G_XORIgor Breger2017-06-281-2/+2
| | | | | | | | | | | | | | Summary: Support G_AND, G_OR, G_XOR for i8/i16/i32/i64. Selection done via TableGen'erated code. Reviewers: zvi, guyblank, aymanmus, m_zuckerman Reviewed By: aymanmus Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34605 llvm-svn: 306533
* Reverting commit 306414 on behalf of @gadi.haberMichael Zuckerman2017-06-282-5225/+1634
| | | | llvm-svn: 306532
* [X86] Correct dwarf unwind information in function epilogue (Petar Jovanovic, 2017-06-28; 3 files, -8/+131)
  CFI instructions that set appropriate cfa offset and cfa register are now inserted in
  emitEpilogue() in X86FrameLowering.
  Majority of the changes in this patch:
  1. Ensure that CFI instructions do not affect code generation.
  2. Enable maintaining correct information about cfa offset and cfa register in a function when
     basic blocks are reordered, merged, split, duplicated.
  These changes are target independent and described below.
  Changed CFI instructions so that they:
  1. are duplicable
  2. are not counted as instructions when tail duplicating or tail merging
  3. can be compared as equal
  Add information to each MachineBasicBlock about cfa offset and cfa register that are valid at its
  entry and exit (incoming and outgoing CFI info). Add support for updating this information when
  basic blocks are merged, split, duplicated, created. Add a verification pass (CFIInfoVerifier)
  that checks that outgoing cfa offset and register of predecessor blocks match incoming values of
  their successors.
  Incoming and outgoing CFI information is used by a late pass (CFIInstrInserter) that corrects the
  CFA calculation rule for a basic block if needed. That means that additional CFI instructions get
  inserted at basic block beginning to correct the rule for calculating CFA. Having CFI instructions
  in the function epilogue can cause an incorrect CFA calculation rule for some basic blocks. This
  can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some
  of the blocks have wrong cfa offset and register values set by the epilogue block above them.
  Patch by Violeta Vukobrat.
  Differential Revision: https://reviews.llvm.org/D18046
  llvm-svn: 306529
* [X86][AsmParser][MS-compatability] Binary/Unary operators enhancements (Coby Tayree, 2017-06-27; 1 file, -37/+75)
  Introducing the MOD binary operator: https://msdn.microsoft.com/en-us/library/hha180wt.aspx
  Enhancing unary operators NEG and NOT to support more complex patterns.
  Differential Revision: https://reviews.llvm.org/D33876
  llvm-svn: 306425
* Updated and extended the information about each instruction in HSW and SNB to include the following data (Gadi Haber, 2017-06-27; 2 files, -1634/+5225):
  • static latency
  • number of uOps from which the instruction consists
  • all ports used by the instruction
  Reviewers: RKSimon, zvi, aymanmus, m_zuckerman
  Differential Revision: https://reviews.llvm.org/D33897
  llvm-svn: 306414
* Recommitting rL305465 after fixing bug in TableGen in rL306251 & rL306371 (Ayman Musa, 2017-06-27; 2 files, -25/+715)
  [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right
  instructions).
  AVX512 compare instructions return v*i1 types. In cases where the number of elements in the
  returned value is less than 8, clang adds zeroes to get a mask of v8i1 type. Later on it's
  replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract
  element and shift right/left nodes. The fact that AVX512 compare instructions put the result in a
  k register and zero all its upper bits allows us to remove the extra nodes simply by copying the
  result to the required register class.
  When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked
  legal), then catch this pattern in the instruction selection phase and transform it into one
  avx512 cmp instruction.
  Differential Revision: https://reviews.llvm.org/D33188
  llvm-svn: 306402
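  A small illustration of the source pattern the recommitted change targets, assuming a compiler with
  AVX512F and AVX512VL enabled (the function name is made up for the example): comparing two
  4-element vectors yields only four meaningful mask bits, widened to an 8-bit mask whose upper bits
  the hardware already zeroes, so no extra insert/extract or shift-left/shift-right nodes are needed.

    #include <immintrin.h>

    // Compile with e.g. -mavx512f -mavx512vl. Only bits 0-3 of the returned mask
    // are meaningful; the compare writes the result into a k register with the
    // upper bits already cleared.
    __mmask8 cmp_eq_4x32(__m128i a, __m128i b) {
      return _mm_cmpeq_epi32_mask(a, b);
    }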
* Fixed the warning introduced by r306289 to make ubuntu-gcc7.1-werror bot green. (Galina Kistanova, 2017-06-27; 1 file, -1/+1)
  llvm-svn: 306369
* AArch64: legalize G_EXTRACT operations. (Tim Northover, 2017-06-26; 1 file, -4/+12)
  This is the dual problem to legalizing G_INSERTs so most of the code and testing was cribbed from there.
  llvm-svn: 306328
* [inline asm] dot operator while using imm generates wrong ir + asm - llvm part (Marina Yatsina, 2017-06-26; 1 file, -2/+1)
  Inline asm dot operator while using imm generates wrong ir and asm.
  This also fixes bugzilla 32987: https://bugs.llvm.org//show_bug.cgi?id=32987
  The clang part of the review that contains the test can be found here: https://reviews.llvm.org/D33040
  Commit on behalf of zizhar.
  Differential Revision: https://reviews.llvm.org/D33039
  llvm-svn: 306300
* [X86][AVX-512] Don't raise inexact in ceil, floor, round, trunc. (Ahmed Bougacha, 2017-06-26; 1 file, -12/+12)
  The non-AVX-512 behavior was changed in r248266 to match N1778 (C bindings for IEEE-754 (2008)),
  which defined the four functions to not raise the inexact exception ("rint" is still defined as
  raising it).
  Update the AVX-512 lowering of these functions to match that: it should not be different.
  llvm-svn: 306299
* [x86] transform vector inc/dec to use -1 constant (PR33483) (Sanjay Patel, 2017-06-26; 1 file, -0/+32)
  Convert vector increment or decrement to sub/add with an all-ones constant:
    add X, <1, 1...> --> sub X, <-1, -1...>
    sub X, <1, 1...> --> add X, <-1, -1...>
  The all-ones vector constant can be materialized using a pcmpeq instruction that is commonly
  recognized as an idiom (has no register dependency), so that's better than loading a splat 1
  constant. AVX512 uses 'vpternlogd' for 512-bit vectors because there is apparently no better way
  to produce 512 one-bits.
  The general advantages of this lowering are:
  1. pcmpeq has lower latency than a memop on every uarch I looked at in Agner's tables, so in
     theory, this could be better for perf, but...
  2. That seems unlikely to affect any OOO implementation, and I can't measure any real perf
     difference from this transform on Haswell or Jaguar, but...
  3. It doesn't look like it from the diffs, but this is an overall size win because we eliminate
     16 - 64 constant bytes in the case of a vector load. If we're broadcasting a scalar load (which
     might itself be a bug), then we're replacing a scalar constant load + broadcast with a single
     cheap op, so that should always be smaller/better too.
  4. This makes the DAG/isel output more consistent - we use pcmpeq already for padd x, -1 and
     psub x, -1, so we should use that form for +1 too because we can.
  If there's some reason to favor a constant load on some CPU, let's make the reverse transform for
  all of these cases (either here in the DAG or in a later machine pass).
  This should fix: https://bugs.llvm.org/show_bug.cgi?id=33483
  Differential Revision: https://reviews.llvm.org/D34336
  llvm-svn: 306289
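  A minimal sketch of the idiom the entry above relies on, written with SSE2 intrinsics (the function
  name is made up): comparing a register with itself materializes the all-ones (-1) constant without
  a load, and subtracting -1 increments each lane.

    #include <immintrin.h>

    // Increment each 32-bit lane of x without loading a splat-of-1 constant:
    // pcmpeq of x with itself yields all-ones lanes (-1), and x - (-1) == x + 1.
    __m128i inc_v4i32(__m128i x) {
      const __m128i AllOnes = _mm_cmpeq_epi32(x, x); // <-1, -1, -1, -1>
      return _mm_sub_epi32(x, AllOnes);              // x + <1, 1, 1, 1>
    }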
* [X86][SSE] Remove unused memopfsf32_128/memopfsf64_128 scalar memops (Simon Pilgrim, 2017-06-25; 1 file, -10/+0)
  The 'scalar' simd bitops were dropped a while ago.
  llvm-svn: 306248
* Strip trailing whitespace. NFCI. (Simon Pilgrim, 2017-06-25; 2 files, -2/+2)
  llvm-svn: 306247
* [GlobalISel][X86] Support vector type G_EXTRACT selection. (Igor Breger, 2017-06-25; 1 file, -0/+104)
  Summary: Support vector type G_EXTRACT selection. For now G_EXTRACT is marked as legal for any
  type, so nothing to do in the legalizer. Split from https://reviews.llvm.org/D33665
  Reviewers: qcolombet, t.p.northover, zvi, guyblank
  Reviewed By: guyblank
  Subscribers: guyblank, rovka, llvm-commits, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D33957
  llvm-svn: 306240
* [AVX2] [TTI CostModel] Add cost of interleaved loads/stores for AVX2 (Dorit Nuzman, 2017-06-25; 2 files, -0/+115)
  The cost of an interleaved access was only implemented for AVX512. For other X86 targets an overly
  conservative Base cost was returned, resulting in avoiding vectorization where it is actually
  profitable to vectorize. This patch starts to add costs for AVX2 for the most prominent cases of
  interleaved accesses (stride 3,4 chars, for now).
  Note 1: Improvements of up to ~4x were observed in some of EEMBC's rgb workloads; there is also a
  known issue of 15-30% degradations on some of these workloads, associated with an interleaved
  access followed by type promotion/widening; the resulting shuffle sequence is currently
  inefficient and will be improved by a series of patches that extend the X86InterleavedAccess pass
  (such as D34601 and more to follow).
  Note 2: The costs in this patch do not reflect port pressure penalties, which can be very dominant
  in the case of interleaved accesses since most of the shuffle operations are restricted to a
  single port. Further tuning, which may incorporate these considerations, will be done on top of
  the upcoming improved shuffle sequences (that is, along with the above-mentioned work to extend
  the X86InterleavedAccess pass).
  Differential Revision: https://reviews.llvm.org/D34023
  llvm-svn: 306238
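  For reference, a sketch of the kind of stride-3 interleaved byte access (e.g. packed RGB data)
  whose vectorization the new AVX2 cost entries are meant to enable; the function and names are
  illustrative only, not taken from the patch or from EEMBC.

    #include <cstddef>
    #include <cstdint>

    // Stride-3 interleaved loads/stores over packed RGB bytes: each iteration
    // touches three adjacent channels, the access pattern the interleaved-access
    // cost model now prices for AVX2.
    void brighten_rgb(uint8_t *Pixels, size_t NumPixels, uint8_t Delta) {
      for (size_t I = 0; I != NumPixels; ++I) {
        Pixels[3 * I + 0] += Delta; // R
        Pixels[3 * I + 1] += Delta; // G
        Pixels[3 * I + 2] += Delta; // B
      }
    }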
* Remove redundant argument. (Rafael Espindola, 2017-06-24; 1 file, -1/+1)
  llvm-svn: 306189
* ARM: move some logic from processFixupValue to applyFixup. (Rafael Espindola, 2017-06-23; 1 file, -1/+2)
  processFixupValue is called on every relaxation iteration. applyFixup is only called once at the
  very end. applyFixup is then the correct place to do last minute changes and value checks.
  While here, do proper range checks again for fixup_arm_thumb_bl. We used to do it, but dropped
  because of thumb2. We now do it again, but use the thumb2 range.
  llvm-svn: 306177