path: root/llvm/lib/Target/X86
Commit message (author, date, files changed, lines -/+)
...
* [X86][AVX512BW] Add support for v64i8 multiplies (Simon Pilgrim, 2016-04-10, 1 file, -3/+38)
  Extend the existing lowering of vXi8 multiplies to support v64i8 on avx512bw targets. I added the Lower512IntArith helper function to help with this - not sure how often this could be used in the future, but it seemed better than putting all that logic inside LowerMUL.
  Differential Revision: http://reviews.llvm.org/D18937
  llvm-svn: 265902
* [X86] Use for loops over types to reduce code for setting up operation actions. (Craig Topper, 2016-04-10, 1 file, -150/+97)
  llvm-svn: 265893
* [X86] Remove unnecessary setOperationAction for SRA v2i64/v4i64 when VLX is supported (Craig Topper, 2016-04-10, 1 file, -2/+0)
  This is already done for SSE2/AVX2, which VLX implies. NFC.
  llvm-svn: 265892
* [MC] support TLSDESC and TLSCALL / GNU2 tls dialect (Davide Italiano, 2016-04-09, 1 file, -0/+4)
  Differential Revision: http://reviews.llvm.org/D18885
  llvm-svn: 265881
* [x86] use BMI 'andn' for logic + compare ops (Sanjay Patel, 2016-04-09, 1 file, -3/+13)
  With BMI, we can use 'andn' to save an instruction when the result is only used in a compare. This is related to one of the potential sequences to check 'isfinite' in: https://llvm.org/bugs/show_bug.cgi?id=27164
  Differential Revision: http://reviews.llvm.org/D18910
  llvm-svn: 265875
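  An illustrative sketch of the pattern this targets, not a test from the patch; the function name and mask value are made up. With BMI enabled, a compiler may lower a masked-bits-clear test like (x & ~mask) == 0 into a single ANDN whose flags feed the branch, instead of a separate NOT and AND/TEST:

    #include <cstdint>
    #include <cstdio>

    // Candidate for ANDN: the AND-NOT result is only consumed by the compare.
    bool allMaskedBitsClear(uint64_t x, uint64_t mask) {
      return (x & ~mask) == 0;
    }

    int main() {
      std::printf("%d\n", allMaskedBitsClear(0x0fULL, 0xffULL)); // 1: every set bit of x is inside mask
      return 0;
    }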
* [X86][XOP] Support for VPPERM 2-input shuffle mask decoding (Simon Pilgrim, 2016-04-09, 3 files, -27/+127)
  This patch adds support for decoding the XOP VPPERM instruction when it represents a basic shuffle. The mask decoding required the existing MCInstrLowering code to be updated to support binary shuffles - the implementation now matches what is done in X86InstrComments.cpp.
  Differential Revision: http://reviews.llvm.org/D18441
  llvm-svn: 265874
* [X86] Use for loops over types to reduce code for setting up operation actions. NFC (Craig Topper, 2016-04-09, 1 file, -149/+78)
  llvm-svn: 265871
* [X86] Remove calls to setOperationAction that set CTLZ_ZERO_UNDEF for some vector types to Expand (Craig Topper, 2016-04-09, 1 file, -16/+0)
  Expand is already set for all operations for all vector types earlier, so this is redundant. NFC.
  llvm-svn: 265870
* [SSP] Remove llvm.stackprotectorcheck. (Tim Shen, 2016-04-08, 2 files, -3/+20)
  This is a cleanup patch for SSP support in LLVM. There is no functional change. llvm.stackprotectorcheck is not needed, because SelectionDAG isn't actually lowering it in SelectBasicBlock; rather, it adds check code in FinishBasicBlock, ignoring the position where the intrinsic is inserted (see FindSplitPointForStackProtector()).
  llvm-svn: 265851
* Rangeify a loop. NFC. (Hans Wennborg, 2016-04-08, 1 file, -4/+3)
  llvm-svn: 265846
* Remove some redundant variables from X86TargetLowering::LowerDYNAMIC_STACKALLOC (Hans Wennborg, 2016-04-08, 1 file, -3/+0)
  These are already defined, with the same values, a few lines up. NFC.
  llvm-svn: 265845
* [X86] Fix PR23155 by turning on X86FixupBWInsts by default. (Kevin B. Smith, 2016-04-08, 1 file, -1/+1)
  Differential Revision: http://reviews.llvm.org/D18866
  llvm-svn: 265830
* [X86] Tidied up shuffle decode function doxygen descriptions (Simon Pilgrim, 2016-04-08, 2 files, -30/+41)
  As discussed on D18441 - auto brief is used, so we don't need \brief; we also don't need to include the function name, and some missing descriptions were added.
  llvm-svn: 265785
* [X86]: Fix for PR27251. (Kevin B. Smith, 2016-04-07, 1 file, -3/+18)
  Differential Revision: http://reviews.llvm.org/D18850
  llvm-svn: 265690
* [X86][SSE] Add support for VZEXT constant folding (Simon Pilgrim, 2016-04-07, 1 file, -0/+18)
  llvm-svn: 265646
* [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC. (Ahmed Bougacha, 2016-04-07, 1 file, -15/+93)
  Re-apply r265450, which caused PR27245 and was reverted in r265559 because of a wrong generalization: the fetch_and_add -> add_and_fetch combine only works in specific, but pretty common, cases:
    (icmp slt x, 0) -> (icmp sle (add x, 1), 0)
    (icmp sge x, 0) -> (icmp sgt (add x, 1), 0)
    (icmp sle x, 0) -> (icmp slt (sub x, 1), 0)
    (icmp sgt x, 0) -> (icmp sge (sub x, 1), 0)
  Original message: We only generate LOCKed versions of add/sub when the result is unused. It often happens that the result is used, but only by a comparison. We can optimize those out by reusing EFLAGS, which lets us use the proper instructions, instead of having to fall back to LXADD. Instead of doing this as an MI peephole (as we do for the other non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite tricky later. This also makes it eventually possible to stop expanding and/or/xor if the only user is an icmp (also see D18141). This uses the LOCK ISD opcodes added by r262244.
  Differential Revision: http://reviews.llvm.org/D17633
  llvm-svn: 265636
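  As a concrete illustration of the case this combine targets (a sketch in plain C++, not code from the patch; the counter and function names are made up), the RMW result below is used only by a signed comparison against zero, so the backend can reuse EFLAGS from a lock sub instead of falling back to LOCK XADD plus a separate compare:

    #include <atomic>
    #include <cstdio>

    std::atomic<int> refcount{1};

    void release() {
      // old <= 0  <=>  new < 0 (the sle -> slt rewrite listed above, ignoring overflow)
      if (refcount.fetch_sub(1) <= 0) {
        std::puts("released past zero");
      }
    }

    int main() { release(); return 0; }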
* Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" (Hans Wennborg, 2016-04-07, 1 file, -14/+34)
  Third time's the charm? The previous attempt (r265345) caused ASan test failures on X86, as broken CFI caused stack traces to not work. This version of the patch makes sure not to merge with stack adjustments that have CFI, and to not add merged instructions' offsets to the CFI about to be generated. This is already covered by the lit tests; I just got the expectations wrong previously.
  llvm-svn: 265623
* NFC: make AtomicOrdering an enum class (JF Bastien, 2016-04-06, 1 file, -3/+5)
  Summary: In the context of http://wg21.link/lwg2445 C++ uses the concept of 'stronger' ordering but doesn't define it properly. This should be fixed in C++17 barring a small question that's still open. The code currently plays fast and loose with the AtomicOrdering enum. Using an enum class is one step towards tightening things. I later also want to tighten related enums, such as clang's AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI' enum). This change touches a few lines of code which can be improved later; I'd like to keep it as NFC for now as it's already quite complex. I have related changes for clang.
  As a follow-up I'll add:
    bool operator<(AtomicOrdering, AtomicOrdering) = delete;
    bool operator>(AtomicOrdering, AtomicOrdering) = delete;
    bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
    bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
  This is separate so that clang and LLVM changes don't need to be in sync.
  Reviewers: jyknight, reames
  Subscribers: jyknight, llvm-commits
  Differential Revision: http://reviews.llvm.org/D18775
  llvm-svn: 265602
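  A minimal, self-contained sketch of the pattern this is moving towards (illustrative names only; this is not LLVM's actual AtomicOrdering header): the enum class blocks implicit conversion to int, and the deleted relational operators force callers to use an explicit strength query rather than comparing enumerator values directly.

    #include <cstdio>

    enum class Ordering {
      NotAtomic, Unordered, Monotonic, Acquire, Release, AcquireRelease, SequentiallyConsistent
    };

    // Deleted comparisons: "a < b" on Ordering values no longer compiles.
    bool operator<(Ordering, Ordering) = delete;
    bool operator>(Ordering, Ordering) = delete;
    bool operator<=(Ordering, Ordering) = delete;
    bool operator>=(Ordering, Ordering) = delete;

    // Hypothetical explicit query standing in for a proper "stronger than" predicate.
    bool includesAcquire(Ordering O) {
      return O == Ordering::Acquire || O == Ordering::AcquireRelease ||
             O == Ordering::SequentiallyConsistent;
    }

    int main() {
      std::printf("%d\n", includesAcquire(Ordering::Release)); // 0
      // Ordering::Acquire < Ordering::Release;  // would not compile: operator< is deleted
      return 0;
    }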
* Revert r265450 "[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC."Hans Wennborg2016-04-061-76/+15
| | | | | | It caused ASan 32-bit tests to hang (PR27245). llvm-svn: 265559
* Revert "Re-commit r265039 "[X86] Merge adjacent stack adjustments in ↵Hans Wennborg2016-04-061-26/+13
| | | | | | | | | eliminateCallFramePseudoInstr (PR27140)"" It seems to be causing ASan tests to crash, probably due to miscompiling the run-time somehow. llvm-svn: 265551
* Faster stack-protector for Android/AArch64. (Evgeniy Stepanov, 2016-04-05, 2 files, -11/+10)
  Bionic has a defined thread-local location for the stack protector cookie. Emit a direct load instead of going through __stack_chk_guard.
  llvm-svn: 265481
* Swift Calling Convention: add swiftcc. (Manman Ren, 2016-04-05, 2 files, -0/+22)
  Differential Revision: http://reviews.llvm.org/D17863
  llvm-svn: 265480
* Revert "Fix Clang-tidy modernize-deprecated-headers warnings in remaining ↵Duncan P. N. Exon Smith2016-04-051-38/+33
| | | | | | | | | | files; other minor fixes." This reverts commit r265454 since it broke the build. E.g.: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_build/22413/ llvm-svn: 265459
* Fix Clang-tidy modernize-deprecated-headers warnings in remaining files; other minor fixes. (Eugene Zelenko, 2016-04-05, 1 file, -33/+38)
  Some Include What You Use suggestions were used too. Use anonymous namespaces in source files.
  Differential revision: http://reviews.llvm.org/D18778
  llvm-svn: 265454
* [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC. (Ahmed Bougacha, 2016-04-05, 1 file, -15/+76)
  We only generate LOCKed versions of add/sub when the result is unused. It often happens that the result is used, but only by a comparison. We can optimize those out by reusing EFLAGS, which lets us use the proper instructions, instead of having to fall back to LXADD. Instead of doing this as an MI peephole (as we do for the other non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite tricky later. This also makes it eventually possible to stop expanding and/or/xor if the only user is an icmp (also see D18141). This uses the LOCK ISD opcodes added by r262244.
  Differential Revision: http://reviews.llvm.org/D17633
  llvm-svn: 265450
* [X86] Simplify early-exit check. NFC. (Ahmed Bougacha, 2016-04-05, 1 file, -4/+4)
  llvm-svn: 265447
* fix typo; NFC (Sanjay Patel, 2016-04-05, 1 file, -1/+1)
  llvm-svn: 265442
* Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" (Hans Wennborg, 2016-04-04, 1 file, -13/+26)
  The original commit miscompiled things on 32-bit Windows, e.g. a Clang bootstrap. It turns out that mergeSPUpdates() was a bit too generous in what it interpreted as a stack adjustment, causing the following code:
    addl $12, %esp
    leal -4(%ebp), %esp
  to be "optimized" into simply:
    addl $8, %esp
  This commit tightens up mergeSPUpdates() and includes a new test (test14 in movtopush.ll) for this situation.
  llvm-svn: 265345
* ARM, AArch64, X86: Check preserved registers for tail calls. (Matthias Braun, 2016-04-04, 1 file, -7/+7)
  We can only perform a tail call to a callee that preserves all the registers that the caller needs to preserve. This situation happens with calling conventions like preserve_mostcc or cxx_fast_tls. It was explicitly handled for fast_tls and failing for preserve_most. This patch generalizes the check to any calling convention. Related to rdar://24207743
  Differential Revision: http://reviews.llvm.org/D18680
  llvm-svn: 265329
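  A hedged illustration of the constraint in plain C++ (using Clang's preserve_most attribute; the function names are made up and this is not code from the patch): the wrapper promises its own caller that it preserves most registers, but the C-convention callee is free to clobber them, so the backend must emit a normal call with register save/restore rather than a tail call.

    #include <cstdio>

    extern "C" int normal_callee(int x) { return x * 2; }  // standard C calling convention

    __attribute__((preserve_most)) int wrapper(int x) {
      // In tail position at the source level, but not eligible for a machine
      // tail call: normal_callee may clobber registers wrapper must preserve.
      return normal_callee(x + 1);
    }

    int main() {
      std::printf("%d\n", wrapper(20)); // 42
      return 0;
    }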
* Add MachineFunctionProperty checks for AllVRegsAllocated for target passes (Derek Schuff, 2016-04-04, 6 files, -0/+30)
  Summary: This adds the same checks that were added in r264593 to all target-specific passes that run after register allocation.
  Reviewers: qcolombet
  Subscribers: jyknight, dsanders, llvm-commits
  Differential Revision: http://reviews.llvm.org/D18525
  llvm-svn: 265313
* AVX-512: Truncating store for i1 vectors (Elena Demikhovsky, 2016-04-04, 1 file, -1/+62)
  Implemented truncstore for KNL and skylake-avx512. Covered vectors from v2i1 to v64i1. We save the value in bits (not in bytes) - v32i1 is saved in 4 bytes.
  Differential Revision: http://reviews.llvm.org/D18740
  llvm-svn: 265283
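  A scalar C++ sketch of the storage layout described above (conceptual only, not the AVX-512 codegen itself; the helper name is made up): a v32i1 mask is stored one bit per lane, so 32 boolean lanes take 4 bytes rather than 32.

    #include <cstdint>
    #include <cstdio>

    uint32_t packMask32(const bool (&lanes)[32]) {
      uint32_t mask = 0;
      for (int i = 0; i < 32; ++i)
        mask |= static_cast<uint32_t>(lanes[i]) << i;  // one bit per i1 lane
      return mask;
    }

    int main() {
      bool lanes[32] = {true, false, true};  // lanes 0 and 2 set
      std::printf("0x%08x stored in %zu bytes\n", packMask32(lanes), sizeof(uint32_t)); // 0x00000005 in 4 bytes
      return 0;
    }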
* [X86] Removed duplicate code. (Simon Pilgrim, 2016-04-03, 1 file, -5/+5)
  llvm-svn: 265274
* [X86][SSE] Support for MOVMSK signbit extraction instructions (Simon Pilgrim, 2016-04-03, 5 files, -45/+32)
  Add support for lowering with the MOVMSK instruction to extract vector element signbits to a GPR. This is an early step towards more optimal handling of vector comparison results.
  Differential Revision: http://reviews.llvm.org/D18741
  llvm-svn: 265266
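  For readers unfamiliar with MOVMSK, here is a small illustration through the standard SSE intrinsics (not code from the patch) of the signbit-to-GPR extraction the lowering targets: each lane's sign bit becomes one bit of an ordinary integer.

    #include <immintrin.h>
    #include <cstdio>

    int main() {
      __m128i v   = _mm_setr_epi32(-1, 2, -3, 4);            // lanes 0 and 2 are negative
      __m128i cmp = _mm_cmplt_epi32(v, _mm_setzero_si128()); // all-ones in negative lanes
      int mask = _mm_movemask_ps(_mm_castsi128_ps(cmp));     // MOVMSKPS: one bit per 32-bit lane
      std::printf("0x%x\n", mask);                           // 0x5
      return 0;
    }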
* [X86] Tidied up X86ISD instruction nodes. NFCI. (Simon Pilgrim, 2016-04-03, 1 file, -50/+59)
  Tidied up comments, stripped trailing whitespace, split apart nodes that aren't related. No change in ordering, although there is definitely some scope for it.
  llvm-svn: 265263
* AVX-512: Load and Extended Load for i1 vectors (Elena Demikhovsky, 2016-04-03, 2 files, -10/+122)
  Implemented load + {sign|zero}_extend for i1 vectors. Fixed failures in i1 vector load. Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX.
  Differential Revision: http://reviews.llvm.org/D18737
  llvm-svn: 265259
* [x86] avoid intermediate splat for non-zero memsets (PR27100) (Sanjay Patel, 2016-04-01, 1 file, -1/+2)
  Follow-up to http://reviews.llvm.org/D18566 and http://reviews.llvm.org/D18676 - where we noticed that an intermediate splat was being generated for memsets of non-zero chars. That was because we told getMemsetStores() to use a 32-bit vector element type, and it happily obliged by producing that constant using an integer multiply. The 16-byte test that was added in D18566 is now equivalent for AVX1 and AVX2 (no splats, just a vector load), but we have PR27141 to track that splat difference. Note that the SSE1 path is not changed in this patch. That can be a follow-up. This patch should resolve PR27100.
  llvm-svn: 265161
* [x86] avoid intermediate splat for non-zero memsets (PR27100) (Sanjay Patel, 2016-04-01, 1 file, -6/+8)
  Follow-up to D18566 - where we noticed that an intermediate splat was being generated for memsets of non-zero chars. That was because we told getMemsetStores() to use a 32-bit vector element type, and it happily obliged by producing that constant using an integer multiply. The tests that were added in the last patch are now equivalent for AVX1 and AVX2 (no splats, just a vector load), but we have PR27141 to track that splat difference. In the new tests, the splat via shuffling looks ok to me, but there might be some room for improvement depending on uarch there. Note that the SSE1/2 paths are not changed in this patch. That can be a follow-up. This patch should resolve PR27100.
  Differential Revision: http://reviews.llvm.org/D18676
  llvm-svn: 265148
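  An illustrative source-level example of the kind of call getMemsetStores() expands (a sketch, not one of the tests referenced above; the buffer size and fill byte are arbitrary): the intent described in these patches is that, with AVX enabled, the non-zero fill becomes a single 16-byte store of a pre-splatted constant rather than a splat materialized through an integer multiply.

    #include <cstring>
    #include <cstdio>

    void fill16(char *dst) {
      std::memset(dst, 0x2a, 16);  // small fixed-size memset with a non-zero byte
    }

    int main() {
      char buf[16];
      fill16(buf);
      std::printf("%d %d\n", buf[0], buf[15]); // 42 42
      return 0;
    }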
* [x86] Remove redundant call to setTargetDAGCombine for BUILD_VECTOR node type. (Andrea Di Biagio, 2016-04-01, 1 file, -1/+0)
  Since revision 235394, we no longer perform target-specific combines on build_vector nodes. No functional change intended.
  llvm-svn: 265138
* [X86] Introduce Lakemont CPU. (Andrey Turetskiy, 2016-04-01, 1 file, -0/+3)
  Add a new Intel MCU CPU, Lakemont, which doesn't support X87.
  Differential Revision: http://reviews.llvm.org/D18650
  llvm-svn: 265128
* Use range-based for loops. NFC. (Michael Kuperstein, 2016-04-01, 1 file, -6/+5)
  llvm-svn: 265105
* Follow-up to r265036: I got these iterators mixed up (Hans Wennborg, 2016-03-31, 1 file, -2/+2)
  llvm-svn: 265076
* Revert r265039 "[X86] Merge adjacent stack adjustments in ↵Hans Wennborg2016-03-311-19/+12
| | | | | | | | | | eliminateCallFramePseudoInstr (PR27140)" I think it might have caused these build breakages: http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/7234/steps/build%20stage%202/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-windows/builds/19566/steps/run%20tests/logs/stdio llvm-svn: 265046
* [X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140) (Hans Wennborg, 2016-03-31, 1 file, -12/+19)
  For code such as:
    void f(int, int);
    void g() { f(1, 2); }
  compiled for 32-bit X86 Linux, Clang would previously generate:
    subl $12, %esp
    subl $8, %esp
    pushl $2
    pushl $1
    calll f
    addl $16, %esp
    addl $12, %esp
    retl
  This patch fixes that by merging adjacent stack adjustments in eliminateCallFramePseudoInstr().
  Differential Revision: http://reviews.llvm.org/D18627
  llvm-svn: 265039
* Change eliminateCallFramePseudoInstr() to return an iterator (Hans Wennborg, 2016-03-31, 2 files, -8/+11)
  This will become necessary in a subsequent change to make this method merge adjacent stack adjustments, i.e. it might erase the previous and/or next instruction. It also greatly simplifies the calls to this function from PrologEpilogInserter. Previously, that had a bunch of logic to resume iteration after the call; now it just continues with the returned iterator. Note that this changes the behaviour of PEI a little. Previously, it attempted to re-visit the new instruction created by eliminateCallFramePseudoInstr(). That code was added in r36625, but I can't see any reason for it: the new instructions will obviously not be pseudo instructions, they will not have FrameIndex operands, and we have already accounted for the stack adjustment.
  Differential Revision: http://reviews.llvm.org/D18627
  llvm-svn: 265036
* [x86] use SSE/AVX ops for non-zero memsets (PR27100) (Sanjay Patel, 2016-03-31, 1 file, -5/+7)
  Move the memset check down to the CPU-with-slow-SSE-unaligned-memops case: this allows fast targets to take advantage of SSE/AVX instructions and prevents slow targets from stepping into a codegen sinkhole while trying to splat a byte into an XMM reg. Follow-on bugs exposed by the current codegen are:
    https://llvm.org/bugs/show_bug.cgi?id=27141
    https://llvm.org/bugs/show_bug.cgi?id=27143
  Differential Revision: http://reviews.llvm.org/D18566
  llvm-svn: 265029
* Prevent X86ISelLowering from merging volatile loads (Nirav Dave, 2016-03-31, 1 file, -7/+7)
  Change isConsecutiveLoads to check that loads are non-volatile, as this is a requirement for any load merges. Propagate change to two callers.
  Reviewers: RKSimon
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D18546
  llvm-svn: 265013
* [X86] Use MVT instead of EVT in code called after legalization. (Craig Topper, 2016-03-31, 1 file, -3/+3)
  llvm-svn: 264992
* [X86] Enable call frame optimization ("mov to push") not only for optsize (PR26325) (Hans Wennborg, 2016-03-30, 1 file, -4/+0)
  The size savings are significant, and from what I can tell, both ICC and GCC do this.
  Differential Revision: http://reviews.llvm.org/D18573
  llvm-svn: 264966
* CodeGen: Factor out code for tail call result compatibility check; NFC (Matthias Braun, 2016-03-30, 1 file, -35/+8)
  llvm-svn: 264959
* Silencing warnings from MSVC 2015 Update 2 (Aaron Ballman, 2016-03-30, 1 file, -4/+4)
  All of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC.
  llvm-svn: 264929
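  A stand-alone sketch of the warning pattern being silenced (illustrative code, not the LLVM lines that were changed): shifting a 32-bit value and then widening the result triggers C4334, while performing the shift in 64 bits from the start does not.

    #include <cstdint>
    #include <cstdio>

    uint64_t bitBefore(unsigned n) { return 1 << n; }    // C4334: 32-bit shift, result converted to 64 bits
    uint64_t bitAfter(unsigned n)  { return 1ULL << n; } // shift performed in 64 bits

    int main() {
      std::printf("%llu %llu\n",
                  (unsigned long long)bitBefore(10),  // 1024
                  (unsigned long long)bitAfter(40));  // 1099511627776
      return 0;
    }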