path: root/llvm/lib/Target/X86
Commit message (Author, Age; Files, Lines)
...
* return without temporary; NFC (Sanjay Patel, 2014-12-11; 1 file, -4/+1)
    llvm-svn: 224076
* Enable MachineVerifier in debug mode for X86, ARM, AArch64, Mips (Matthias Braun, 2014-12-11; 1 file, -3/+3)
    llvm-svn: 224075
* [X86] Add a temporary testcase for PR21876/r223996 (Ahmed Bougacha, 2014-12-11; 1 file, -0/+1)
    llvm-svn: 224074
* [CodeGen] Add print and verify pass after each MachineFunctionPass by default (Matthias Braun, 2014-12-11; 1 file, -23/+9)
    Previously, print+verify passes were added in a very unsystematic way, which
    is annoying when debugging: you miss intermediate steps, and bugs stay
    unnoticed when no verification is performed. To make this change practical,
    I added the possibility to explicitly disable verification. I used this
    option in all places where no verification was performed previously (because
    a lot of places actually don't pass the MachineVerifier). In the long term
    these problems should be fixed properly and verification enabled after each
    pass. I'll enable some more verification in subsequent commits.
    This is the 2nd attempt at this, after realizing that PassManager::add() may
    actually delete the pass.
    llvm-svn: 224059
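    As an illustration only (the parameter names here are assumptions, not the
    verbatim API from this commit), a target pass config could now opt a
    known-noncompliant pass out of verification:

      // C++ sketch: the default is print + verify after each pass.
      void X86PassConfig::addPreRegAlloc() {
        addPass(&MachineCSEID);
        // This pass does not yet survive the MachineVerifier, so
        // explicitly disable verification after it for now.
        addPass(&PeepholeOptimizerID, /*verifyAfter=*/false);
      }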
* This reverts commit r224043 and r224042 (Rafael Espindola, 2014-12-11; 1 file, -6/+20)
    check-llvm was failing.
    llvm-svn: 224045
* Enable MachineVerifier in debug mode for X86, ARM, AArch64, Mips (Matthias Braun, 2014-12-11; 1 file, -3/+3)
    llvm-svn: 224043
* [CodeGen] Add print and verify pass after each MachineFunctionPass by default (Matthias Braun, 2014-12-11; 1 file, -23/+9)
    Previously, print+verify passes were added in a very unsystematic way, which
    is annoying when debugging: you miss intermediate steps, and bugs stay
    unnoticed when no verification is performed. To make this change practical,
    I added the possibility to explicitly disable verification. I used this
    option in all places where no verification was performed previously (because
    a lot of places actually don't pass the MachineVerifier). In the long term
    these problems should be fixed properly and verification enabled after each
    pass. I'll enable some more verification in subsequent commits.
    llvm-svn: 224042
* [AVX512] Add support for 512b variable bit shift intrinsics (Cameron McInally, 2014-12-11; 3 files, -39/+43)
    llvm-svn: 224028
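    For reference, the C-level form of one such intrinsic (an assumed example;
    availability depends on AVX-512F header support):

      #include <immintrin.h>
      // VPSLLVD: each lane of v is shifted left by the matching lane of c.
      __m512i shift_each_lane(__m512i v, __m512i c) {
        return _mm512_sllv_epi32(v, c);
      }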
* AVX-512: Added all forms of COMPRESS instruction (Elena Demikhovsky, 2014-12-11; 5 files, -6/+160)
    + intrinsics + tests
    llvm-svn: 224019
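    A minimal usage sketch at the C level (assumes AVX-512F support):

      #include <immintrin.h>
      // VCOMPRESSPS: pack the mask-selected lanes of v toward the low
      // end; the remaining lanes are zeroed by the maskz form.
      __m512 compress_selected(__m512 v, __mmask16 m) {
        return _mm512_maskz_compress_ps(m, v);
      }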
* [X86] When converting movs to pushes, don't assume MOVmi operand is an actual immediate (Michael Kuperstein, 2014-12-11; 1 file, -11/+11)
    This should fix PR21878.
    llvm-svn: 224010
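    An assumed illustration of the hazard (AT&T syntax): the MOVmi "immediate"
    below is really a relocated symbol address, not a literal integer, so the
    conversion must not reinterpret it as one.

      movl  $external_sym, (%esp)   # operand is a symbol fixup
      movl  $42, 4(%esp)            # operand is a genuine immediate
      calll callee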
* AVX-512: Fixed a bug in lowering setcc for MVT::i1 type (Elena Demikhovsky, 2014-12-11; 1 file, -1/+4)
    llvm-svn: 224008
* [X86] Add back AVX2 VR256 PMOVX patterns (Ahmed Bougacha, 2014-12-11; 1 file, -0/+16)
    We can't reach those from zext, but other parts of the backend (the shuffle
    lowering) generate 256-bit VZEXT nodes.
    Fixes PR21876.
    llvm-svn: 223996
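    An illustrative (assumed) IR pattern: interleaving with zeros is a zero
    extension in disguise, which shuffle lowering can select as a 256-bit
    VPMOVZX even though no zext node is present.

      define <8 x i32> @zext_via_shuffle(<8 x i16> %x) {
        ; Interleave each i16 lane with a zero lane, then reinterpret
        ; the 256-bit result as <8 x i32> (little-endian zext).
        %z = shufflevector <8 x i16> %x, <8 x i16> zeroinitializer,
               <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11,
                           i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
        %r = bitcast <16 x i16> %z to <8 x i32>
        ret <8 x i32> %r
      }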
* Match new shuffle codegen for MOVHPD patterns (Sanjay Patel, 2014-12-10; 1 file, -0/+14)
    Add patterns to match SSE (shufpd) and AVX (vpermilpd) shuffle codegen when
    storing the high element of a v2f64. The existing patterns were only
    checking for an unpckh type of shuffle.
    http://llvm.org/bugs/show_bug.cgi?id=21791
    Differential Revision: http://reviews.llvm.org/D6586
    llvm-svn: 223929
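    The kind of IR this matches (assumed example): extracting and storing the
    high half of a v2f64 should select a single MOVHPD store.

      define void @store_high(<2 x double> %v, double* %p) {
        %hi = extractelement <2 x double> %v, i32 1
        store double %hi, double* %p
        ret void
      }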
* [X86] Make a code path in EltsFromConsecutiveLoads work only on vectors it expects (Michael Kuperstein, 2014-12-10; 1 file, -1/+4)
    EltsFromConsecutiveLoads was apparently only ever called for 128-bit
    vectors, and assumed this implicitly. r223518 started calling it for
    AVX-sized vectors, causing the code path that had this assumption to crash.
    This adds a check to make this path fire only for 128-bit vectors.
    Differential Revision: http://reviews.llvm.org/D6579
    llvm-svn: 223922
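    The shape of such a guard (a sketch, not the commit's verbatim diff):

      // Refuse AVX-sized vectors: this path implicitly assumes 128 bits.
      if (!VT.is128BitVector())
        return SDValue();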
* [AVX512] Added lowering for VBROADCASTSS/SD instructions (Robert Khasanov, 2014-12-09; 2 files, -1/+56)
    Lowering patterns were written through the avx512_broadcast_pat multiclass,
    as the pattern generates VBROADCAST and COPY_TO_REGCLASS nodes.
    Added lowering tests.
    llvm-svn: 223804
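    The kind of IR this lowers (assumed example): a splat of a scalar load
    becomes a single VBROADCASTSS.

      define <16 x float> @splat_load(float* %p) {
        %s = load float* %p
        %v = insertelement <16 x float> undef, float %s, i32 0
        %b = shufflevector <16 x float> %v, <16 x float> undef,
               <16 x i32> zeroinitializer
        ret <16 x float> %b
      }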
* [AVX512] Added VPBROADCAST{BWDQ} (Load with Broadcast Integer Data from General Purpose Register) encodings for AVX512-BW/VL subsets (Robert Khasanov, 2014-12-09; 1 file, -23/+33)
    Added encoding tests.
    llvm-svn: 223787
* [x86] Fix the test to actually test things for the CPU names, add the missing barcelona CPU which that test uncovered, and remove the 32-bit x86 CPUs which I really wasn't prepared to audit and test thoroughly (Chandler Carruth, 2014-12-09; 1 file, -0/+4)
    If anyone wants to clean up the 32-bit-only x86 CPUs, go for it. Also, if
    anyone else wants to try to de-duplicate the AMD CPUs, that'd be cool, but
    from the looks of it, it wouldn't save as much as it did for the Intel CPUs.
    llvm-svn: 223774
* Removing an unused variable to silence a -Wunused-but-set-variable warning. NFC (Aaron Ballman, 2014-12-09; 1 file, -2/+0)
    llvm-svn: 223773
* [x86] Bring some sanity to the x86 CPU processor definitions (Chandler Carruth, 2014-12-09; 1 file, -61/+139)
    Notably, this adds simple micro-architecture names for the Intel CPU
    variants and defines the old 'core'-based names as aliases. GCC has started
    to simplify its documented interface to use these names as well, so it
    seems like we can start to converge on a consistent pattern.
    I'd appreciate Intel double-checking the entries that aren't yet documented
    widely, especially Atom (Bonnell and Silvermont), Knights Landing, and
    Skylake. But this change shouldn't break any existing users.
    Also, ran clang-format to re-format this code, and it actually worked
    (modulo a tiny bug), so hopefully we can start to stop thinking about
    formatting this stuff.
    llvm-svn: 223769
* AVX-512: Added some comments to ERI scalar intrinsics (Elena Demikhovsky, 2014-12-09; 2 files, -6/+17)
    No functional change.
    llvm-svn: 223761
* [X86] Convert esp-relative movs of function arguments into pushes, step 1 (Michael Kuperstein, 2014-12-09; 2 files, -4/+125)
    This handles the simplest case for mov -> push conversion:
    1. x86-32 calling convention, everything is passed through the stack.
    2. There is no reserved call frame.
    3. Only registers or immediates are pushed; no attempt to combine a
       mem-reg-mem sequence into a single PUSHmm.
    Differential Revision: http://reviews.llvm.org/D6503
    llvm-svn: 223757
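    An assumed before/after sketch of the conversion (x86-32, AT&T syntax):

      Before:
        subl  $8, %esp
        movl  $42, 4(%esp)
        movl  %eax, (%esp)
        calll callee
      After:
        pushl $42
        pushl %eax
        calll callee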
* [CompactUnwind] Fix register encoding logic (Bruno Cardoso Lopes, 2014-12-08; 1 file, -1/+1)
    Fix a compact unwind encoding logic bug which would try to encode more
    callee-saved registers than it should, leading to an early bail-out in the
    encoding logic and an unnecessary fallback to DWARF frame mode.
    Also remove no-compact-unwind.ll, which was testing the wrong thing based
    on this bug, and move it to the valid 'compact unwind' tests. Added a few
    more tests too.
    llvm-svn: 223676
* [X86] Improved tablegen patterns for matching TZCNT/LZCNT (Andrea Di Biagio, 2014-12-08; 1 file, -24/+29)
    Teach ISel how to match a TZCNT/LZCNT from a conditional move if the
    condition code is X86_COND_NE. Existing tablegen patterns only allowed
    matching TZCNT/LZCNT from an X86cond with condition code equal to
    X86_COND_E. To avoid introducing extra rules, I added an 'ImmLeaf'
    definition that checks if the condition code is COND_E or COND_NE.
    llvm-svn: 223668
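    The kind of IR this now matches (assumed example): cttz with the zero case
    written "the other way around", i.e. guarded by an NE compare.

      define i32 @tzcnt32(i32 %x) {
        %cnt = call i32 @llvm.cttz.i32(i32 %x, i1 true)
        %nz = icmp ne i32 %x, 0
        %r = select i1 %nz, i32 %cnt, i32 32   ; TZCNT yields 32 for x == 0
        ret i32 %r
      }
      declare i32 @llvm.cttz.i32(i32, i1)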
* [X86] Improved lowering of packed v8i16 vector shifts by non-constant count (Andrea Di Biagio, 2014-12-08; 1 file, -10/+20)
    Before this patch, the backend sub-optimally expanded the non-constant
    shift count of a v8i16 shift into a sequence of two 'movd' plus 'movzwl'.
    With this patch, the backend checks if the target has SSE4.1; if so, it
    lets the shuffle legalizer deal with the expansion of the shift amount.
    Example:
    ;;
    define <8 x i16> @test(<8 x i16> %A, <8 x i16> %B) {
      %shamt = shufflevector <8 x i16> %B, <8 x i16> undef, <8 x i32> zeroinitializer
      %shl = shl <8 x i16> %A, %shamt
      ret <8 x i16> %shl
    }
    ;;
    Before (with -mattr=+avx):
      vmovd   %xmm1, %eax
      movzwl  %ax, %eax
      vmovd   %eax, %xmm1
      vpsllw  %xmm1, %xmm0, %xmm0
      retq
    Now:
      vpxor    %xmm2, %xmm2, %xmm2
      vpblendw $1, %xmm1, %xmm2, %xmm1
      vpsllw   %xmm1, %xmm0, %xmm0
      retq
    llvm-svn: 223660
* X86 intrinsics moved from X86ISelLowering.cpp to X86IntrinsicsInfo.h (Elena Demikhovsky, 2014-12-08; 2 files, -133/+48)
    X86ISelLowering.cpp has a long switch for intrinsics. I moved part of this
    long switch to the new intrinsics table in X86IntrinsicsInfo.h.
    No functional changes, just code and compile-time optimization.
    llvm-svn: 223641
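    The table-driven style looks roughly like this (the macro and field names
    are assumptions based on X86IntrinsicsInfo.h, not quoted from the commit):

      // Each entry maps an IR intrinsic to a lowering kind plus target
      // opcodes, replacing one case of the old switch.
      static const IntrinsicData IntrinsicsWithoutChain[] = {
        X86_INTRINSIC_DATA(sse2_psll_d, INTR_TYPE_2OP, X86ISD::VSHL, 0),
        // ...
      };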
* [X86] Refactor PMOV[SZ]Xrm to add missing AVX2 patterns (Ahmed Bougacha, 2014-12-06; 2 files, -531/+226)
    Most patterns will go away once the extload legalization changes land.
    Differential Revision: http://reviews.llvm.org/D6125
    llvm-svn: 223567
* [X86] Cleanup FCOPYSIGN lowering. NFC intended (Ahmed Bougacha, 2014-12-05; 1 file, -29/+15)
    llvm-svn: 223542
* Optimize merging of scalar loads for 32-byte vectors [X86, AVX] (Sanjay Patel, 2014-12-05; 1 file, -1/+6)
    Fix the poor codegen seen in PR21710
    (http://llvm.org/bugs/show_bug.cgi?id=21710). Before we crack 32-byte build
    vectors into smaller chunks (and then subsequently glue them back
    together), we should look for the easy case where we can just load all
    elements in a single op.
    An example of the codegen change is:
    From:
      vmovss      16(%rdi), %xmm1
      vmovups     (%rdi), %xmm0
      vinsertps   $16, 20(%rdi), %xmm1, %xmm1
      vinsertps   $32, 24(%rdi), %xmm1, %xmm1
      vinsertps   $48, 28(%rdi), %xmm1, %xmm1
      vinsertf128 $1, %xmm1, %ymm0, %ymm0
      retq
    To:
      vmovups (%rdi), %ymm0
      retq
    Differential Revision: http://reviews.llvm.org/D6536
    llvm-svn: 223518
* Use 32-bit ebp for NaCl64 in a limited case: llvm.frameaddress (Jan Wen Voung, 2014-12-05; 4 files, -4/+14)
    Summary:
    Follow-up to [x32] "Use ebp/esp as frame and stack pointer"
    (http://reviews.llvm.org/D4617).
    In that earlier patch, NaCl64 was made to always use rbp. That's needed for
    most cases because rbp should hold a full 64-bit address within the NaCl
    sandbox, so that loads/stores off of rbp don't require sandbox adjustment
    (zeroing the top 32 bits, then filling those by adding r15).
    However, llvm.frameaddress returns a pointer, and pointers are 32-bit for
    NaCl64. In this case, use ebp instead, which will make the register copy
    type-check. A similar mechanism may be needed for llvm.eh.return, but that
    is not added in this change.
    Test Plan: test/CodeGen/X86/frameaddr.ll
    Reviewers: dschuff, nadav
    Subscribers: jfb, llvm-commits
    Differential Revision: http://reviews.llvm.org/D6514
    llvm-svn: 223510
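    The affected intrinsic in IR form (assumed example):

      define i8* @frame() {
        %fp = call i8* @llvm.frameaddress(i32 0)  ; a 32-bit pointer on NaCl64
        ret i8* %fp
      }
      declare i8* @llvm.frameaddress(i32)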
* [X86] Improved lowering of packed vector shifts to vpsllq/vpsrlq (Andrea Di Biagio, 2014-12-05; 1 file, -10/+17)
    SSE2/AVX non-constant packed shift instructions only use the lower 64 bits
    of the shift count. This patch teaches function 'getTargetVShiftNode' how
    to deal with shifts where the shift count node is of type MVT::i64.
    Before this patch, function 'getTargetVShiftNode' only knew how to deal
    with shift count nodes of type MVT::i32. This forced the backend to
    wrongly truncate the shift count to MVT::i32 and then zero-extend it back
    to MVT::i64.
    llvm-svn: 223505
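    The kind of IR that now lowers cleanly (assumed example): a v2i64 shift
    whose splatted count is an i64 scalar.

      define <2 x i64> @shl_v2i64(<2 x i64> %a, i64 %amt) {
        %t = insertelement <2 x i64> undef, i64 %amt, i32 0
        %c = shufflevector <2 x i64> %t, <2 x i64> undef,
               <2 x i32> zeroinitializer
        %r = shl <2 x i64> %a, %c
        ret <2 x i64> %r
      }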
* [X86] Avoid introducing extra shuffles when lowering packed vector shifts (Andrea Di Biagio, 2014-12-05; 1 file, -15/+12)
    When lowering a vector shift node, the backend checks if the shift count is
    a shuffle with a splat mask. If so, it introduces an extra dag node to
    extract the splat value from the shuffle. The splat value is then used to
    generate a shift count for a target-specific shift.
    However, if we know that the shift count is a splat shuffle, we can use the
    splat index 'I' to extract the I-th element from the first shuffle operand.
    The advantage is that the splat shuffle may become dead since we no longer
    use it.
    Example:
    ;;
    define <4 x i32> @example(<4 x i32> %a, <4 x i32> %b) {
      %c = shufflevector <4 x i32> %b, <4 x i32> undef, <4 x i32> zeroinitializer
      %shl = shl <4 x i32> %a, %c
      ret <4 x i32> %shl
    }
    ;;
    Before this patch, llc generated the following code (-mattr=+avx):
      vpshufd  $0, %xmm1, %xmm1          # xmm1 = xmm1[0,0,0,0]
      vpxor    %xmm2, %xmm2
      vpblendw $3, %xmm1, %xmm2, %xmm1   # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
      vpslld   %xmm1, %xmm0, %xmm0
      retq
    With this patch, the redundant splat operation is removed from the code:
      vpxor    %xmm2, %xmm2
      vpblendw $3, %xmm1, %xmm2, %xmm1   # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
      vpslld   %xmm1, %xmm0, %xmm0
      retq
    llvm-svn: 223461
* Rename the x86 isTargetMacho to isTargetMachO for uniformity (Eric Christopher, 2014-12-05; 4 files, -8/+8)
    llvm-svn: 223421
* Both of these subtargets have functions that check whether or not the target is Mach-O. Use them. (Eric Christopher, 2014-12-05; 1 file, -2/+1)
    llvm-svn: 223420
* [X86] Delete dead code in fcopysign lowering. NFC (Ahmed Bougacha, 2014-12-04; 1 file, -11/+0)
    r32900 introduced custom lowering for fcopysign, with two checks to change
    the magnitude value's type if it's larger/smaller than the sign value's
    type. r32932 replaced that code for the smaller case. r43205 did the same
    for the larger case but left the old code, now dead.
    llvm-svn: 223415
* [x86] Fix isOffsetSuitableForCodeModel kernel code model offset (Bruno Cardoso Lopes, 2014-12-04; 1 file, -1/+1)
    Offset == 0 is a valid offset for the kernel code model according to the
    x86_64 System V ABI. Found by inspection, no testcase.
    llvm-svn: 223383
* [X86] Improve a dag-combine that handles a vector extract -> zext sequence (Michael Kuperstein, 2014-12-04; 1 file, -24/+51)
    The current DAG combine turns a sequence of extracts from <4 x i32>
    followed by zexts into a store followed by scalar loads. According to
    measurements by Martin Krastev (see PR 21269), for x86-64 a sequence of an
    extract, movs, and shifts gives better performance. However, for 32-bit
    x86, the previous sequence still seems better.
    Differential Revision: http://reviews.llvm.org/D6501
    llvm-svn: 223360
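    The input shape this combine targets (assumed example): lane extracts from
    a v4i32 that are each zero-extended to i64.

      define void @extracts(<4 x i32> %v, i64* %p0, i64* %p1) {
        %e0 = extractelement <4 x i32> %v, i32 0
        %e1 = extractelement <4 x i32> %v, i32 1
        %z0 = zext i32 %e0 to i64
        %z1 = zext i32 %e1 to i64
        store i64 %z0, i64* %p0
        store i64 %z1, i64* %p1
        ret void
      }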
* [X86] Simplify code. NFC (Andrea Di Biagio, 2014-12-04; 1 file, -16/+6)
    Replaced some logic that checked if a build_vector node is doing a splat of
    a non-undef value with a call to method BuildVectorSDNode::getSplatValue().
    No functional change intended.
    llvm-svn: 223354
* Masked Load / Store Intrinsics - the CodeGen part (Elena Demikhovsky, 2014-12-04; 4 files, -4/+166)
    I'm recommitting the codegen part of the patch; the vectorizer part will be
    sent for review again.
    Masked Vector Load and Store Intrinsics.
    Introduced new target-independent intrinsics in order to support masked
    vector loads and stores. The loop vectorizer optimizes loops containing
    conditional memory accesses by generating these intrinsics for existing
    targets AVX2 and AVX-512. The vectorizer asks the target about the
    availability of masked vector loads and stores. Added SDNodes for masked
    operations and lowering patterns for the X86 code generator.
    Examples:
      declare <16 x i32> @llvm.masked.load.v16i32(i8* %addr,
                 <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
      declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value,
                 i32 4, <8 x i1> %mask)
    A scalarizer for other targets (not AVX2/AVX-512) will be done in a
    separate patch.
    http://reviews.llvm.org/D6191
    llvm-svn: 223348
* [X86] Clean up whitespace as well as minor coding style (Michael Liao, 2014-12-04; 33 files, -409/+403)
    llvm-svn: 223339
* [X86] Restore X86 base pointer after call to llvm.eh.sjlj.setjmp (Michael Liao, 2014-12-04; 4 files, -0/+57)
    This patch fixes the bug described in
    http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-May/062343.html
    The fix allocates an extra slot just below the GPRs and stores the base
    pointer there. This is done only for functions containing
    llvm.eh.sjlj.setjmp that also need a base pointer. Because code containing
    llvm.eh.sjlj.setjmp saves all of the callee-save GPRs in the prologue, the
    offset to the extra slot can be computed before prologue generation runs.
    Impact at run-time on affected functions is:
    - One extra store in the prologue; the store saves the base pointer.
    - One extra load after a llvm.eh.sjlj.setjmp; the load restores the base
      pointer.
    Because the extra slot is just above a gap between frame-pointer-relative
    and base-pointer-relative chunks of memory, there is no impact on other
    offset calculations other than ensuring there is room for the extra slot.
    http://reviews.llvm.org/D6388
    Patch by Arch Robison <arch.robison@intel.com>
    llvm-svn: 223329
* Allow target to specify prefix for labels (Matt Arsenault, 2014-12-04; 1 file, -0/+2)
    Use the MCAsmInfo instead of the DataLayout, and allow specifying a custom
    prefix for labels specifically. HSAIL requires that labels begin with @
    but global symbols with &.
    llvm-svn: 223323
* fix typos, grammar, formatting; NFC (Sanjay Patel, 2014-12-03; 1 file, -22/+19)
    llvm-svn: 223276
* [X86][MC] Intel syntax: accept implicit memory operand sizes larger than 80 (Ahmed Bougacha, 2014-12-03; 1 file, -1/+1)
    The X86AsmParser Intel handling was refactored in r216481, making it try
    each different memory operand size to see which one matches. Operand sizes
    larger than 80 ("[xyz]mmword ptr") were forgotten, which led to an "invalid
    operand" error for code such as:
      movdqa [rax], xmm0
    llvm-svn: 223187
* [X86][SSE] Keep 4i32 vector insertions in integer domain on SSE4.1 targets (Simon Pilgrim, 2014-12-02; 1 file, -2/+2)
    4i32 shuffles for single insertions into zero vectors lower to X86vzmovl,
    which was using (v)blendps - causing domain-switch stalls. This patch fixes
    this by using (v)pblendw instead.
    The updated tests in test/CodeGen/X86/sse41.ll still contain a domain stall
    due to the use of insertps - I'm looking at fixing this in a future patch.
    Differential Revision: http://reviews.llvm.org/D6458
    llvm-svn: 223165
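    The affected pattern in IR form (assumed example): keeping one element and
    zeroing the rest, i.e. the X86vzmovl idiom.

      define <4 x i32> @insert_into_zero(<4 x i32> %v) {
        %r = shufflevector <4 x i32> %v, <4 x i32> zeroinitializer,
               <4 x i32> <i32 0, i32 5, i32 6, i32 7>
        ret <4 x i32> %r
      }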
* Remove unnecessary code introduced with r223101 (Philip Reames, 2014-12-02; 1 file, -10/+2)
    llvm-svn: 223132
* fix typo in comment (Sanjay Patel, 2014-12-02; 1 file, -1/+1)
    llvm-svn: 223127
* Fix variable used only in assertion (Nick Lewycky, 2014-12-02; 1 file, -1/+2)
    llvm-svn: 223101
* Try to fix a bot failure due to a variable used only in an assert (Philip Reames, 2014-12-01; 1 file, -4/+4)
    Specifically, bot lld-x86_64-darwin13; the failure resulted from change
    r223085.
    llvm-svn: 223092
* [Statepoints 2/4] Statepoint infrastructure for garbage collection: MI & x86-64 backend (Philip Reames, 2014-12-01; 4 files, -1/+145)
    This is the second patch in a small series. This patch contains the
    MachineInstruction and x86-64 backend pieces required to lower Statepoints.
    It does not include the code to actually generate the STATEPOINT machine
    instruction, and as a result, the entire patch is currently dead code. I
    will be submitting the SelectionDAG parts within the next 24-48 hours.
    Since those pieces are by far the most complicated, I wanted to minimize
    the size of that patch. That patch will include the tests which exercise
    the functionality in this patch. The entire series can be seen as one
    combined whole in http://reviews.llvm.org/D5683.
    The STATEPOINT pseudo node is generated after all gc values are explicitly
    spilled to stack slots. The purpose of this node is to wrap an actual call
    instruction while recording the spill locations of the meta arguments used
    for garbage collection and other purposes. The STATEPOINT is modeled as
    modifying all of those locations to prevent backend optimizations from
    forwarding the value from before the STATEPOINT to after the STATEPOINT.
    (Doing so would break relocation semantics for collectors which wish to
    relocate roots.)
    The implementation of STATEPOINT is closely modeled on PATCHPOINT.
    Eventually, much of the code in this patch will be removed. The long-term
    plan is to merge the functionality provided by statepoints and patchpoints.
    Merging their implementations in the backend is likely to be a good
    starting point.
    Reviewed by: atrick, ributzka
    llvm-svn: 223085
* Revert "Masked Vector Load and Store Intrinsics."Duncan P. N. Exon Smith2014-11-284-166/+4
| | | | | | | | | | | This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures. Conflicts: lib/Target/X86/X86TargetTransformInfo.cpp llvm-svn: 222936