path: root/llvm/lib/Target/X86
Commit message | Author | Age | Files | Lines
...
* [AVX512] Capitalize the Z in VEXTRACTPSzmr. | Craig Topper | 2016-05-21 | 1 | -2/+2
  Lowercase z has been primarily used to indicate the zero masking behavior, which is not the case here. NFC
  llvm-svn: 270333
* [AVX512] Rename vector extract instructions to use 'mr' instead of 'rm' to reflect the fact that memory is the destination. | Craig Topper | 2016-05-21 | 1 | -2/+2
  llvm-svn: 270332
* [AVX512] Fix copy/paste mistake I made in a comment. | Craig Topper | 2016-05-21 | 1 | -1/+1
  llvm-svn: 270331
* [Clang][AVX512][intrinsics] Fix rcp and sqrt intrinsics. | Michael Zuckerman | 2016-05-21 | 4 | -7/+10
  Differential Revision: http://reviews.llvm.org/D20438
  llvm-svn: 270322
* [Clang][AVX512][intrinsics] Fix vscalef intrinsics. | Michael Zuckerman | 2016-05-21 | 5 | -8/+11
  Differential Revision: http://reviews.llvm.org/D20324
  llvm-svn: 270321
* [AVX512] Add patterns for VEXTRACT v16i16->v8i16 and v32i8->v16i8. | Craig Topper | 2016-05-21 | 2 | -1/+9
  Disable AVX2 versions of vector extract when AVX512VL is enabled.
  llvm-svn: 270318
* [AVX512] Disable AVX2 VPERMD, VPERMQ, VPERMPS, and VPERMPD patterns when AVX512VL is enabled. | Craig Topper | 2016-05-21 | 2 | -30/+38
  Also add shuffle comment printing for AVX512VL VPERMPD/VPERMQ to keep some tests passing now that they use these instructions instead of the AVX2 ones.
  llvm-svn: 270317
* [AVX512] Disable AVX/AVX2 VBROADCASTSS/VBROADCASTSD patterns when AVX512VL is enabled. | Craig Topper | 2016-05-21 | 1 | -4/+4
  llvm-svn: 270316
* [AVX512] Disable AVX/AVX2 patterns for VPSADBW and VPMULUDQ when the AVX512VL/AVX512BWI equivalents are available. | Craig Topper | 2016-05-21 | 1 | -4/+4
  llvm-svn: 270311
* [X86] Convert some SSE2/AVX2 intrinsics to ISD opcodes during lowering instead of pattern matching the intrinsics. | Craig Topper | 2016-05-21 | 2 | -12/+24
  This unifies handling with AVX512 and allows these intrinsics to select EVEX encoded instructions to increase available registers.
  llvm-svn: 270310
* Address post-review for r270246 | David Majnemer | 2016-05-20 | 1 | -11/+13
  This gets rid of some unnecessary SmallStrings in X86TargetMachine::getSubtargetImpl. No functionality change is intended.
  llvm-svn: 270270
* [X86] Reduce memory allocations in X86TargetMachine::getSubtargetImpl | David Majnemer | 2016-05-20 | 3 | -10/+15
  We performed a number of memory allocations each time getTTI was called; remove them by using SmallString. No functionality change intended.
  llvm-svn: 270246
* fix comments; NFC | Sanjay Patel | 2016-05-20 | 1 | -9/+8
  llvm-svn: 270237
* use range-loops; NFCI | Sanjay Patel | 2016-05-20 | 1 | -4/+2
  llvm-svn: 270236
* fix documentation comments; NFC | Sanjay Patel | 2016-05-20 | 1 | -9/+8
  llvm-svn: 270234
* [X86][AVX] Generalized matching for target shuffle combines | Simon Pilgrim | 2016-05-20 | 1 | -99/+146
  This patch is a first step towards a more extendible method of matching combined target shuffle masks.
  Initially this just pulls out the existing basic mask matches and adds support for some 256/512 bit equivalents. Future patterns will require a number of features to be added but I wanted to keep this patch simple.
  I hope we can avoid duplication between shuffle lowering and combining and share more complex pattern match functions in future commits.
  Differential Revision: http://reviews.llvm.org/D19198
  llvm-svn: 270230
* Refactor X86 symbol access classification. | Rafael Espindola | 2016-05-20 | 4 | -155/+130
  This refactors the logic in X86 to avoid code duplication. It also splits it into two steps: it first decides if a symbol is local to the DSO and then uses that information to decide how to access it.
  The first part is implemented by shouldAssumeDSOLocal. It is not in any way specific to X86. In a followup patch I intend to move it to somewhere common and reuse it in other backends.
  llvm-svn: 270209
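The two-step split described in this commit can be pictured with a small Python model. This is a rough sketch and not the LLVM code: the boolean inputs, the rules inside `should_assume_dso_local`, and the "direct"/"indirect" labels are simplified assumptions made up for illustration. Only the name shouldAssumeDSOLocal and the "decide locality first, then decide access" structure come from the commit message.

```python
def should_assume_dso_local(static_relocation, has_local_linkage, has_default_visibility):
    """Step 1: decide whether a symbol is local to the DSO.

    The three rules below are illustrative assumptions, not LLVM's actual logic.
    """
    if static_relocation:
        return True
    if has_local_linkage:
        return True
    return not has_default_visibility


def classify_access(static_relocation, has_local_linkage=False, has_default_visibility=True):
    """Step 2: choose how to access the symbol using only the step-1 answer."""
    if should_assume_dso_local(static_relocation, has_local_linkage, has_default_visibility):
        return "direct"    # hypothetical label: reference the symbol directly
    return "indirect"      # hypothetical label: go through an indirection such as a GOT entry
```

Keeping step 1 free of any target-specific input is what would let it move to a common location, as the message says is planned.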
* [X86] Fix another AVX pattern to only be disabled if VLX and BWI are supported. | Craig Topper | 2016-05-20 | 1 | -1/+1
  llvm-svn: 270182
* [X86] Fix some AVX patterns to only be disabled if VLX and BWI are supported. | Craig Topper | 2016-05-20 | 1 | -20/+24
  Without this we get isel failures on the avx-intrinsics-x86.ll test in AVX512VL.
  llvm-svn: 270174
* Record a TargetMachine instead of a Reloc::Model. | Rafael Espindola | 2016-05-19 | 4 | -16/+13
  Addresses r270095's code review.
  llvm-svn: 270147
* X86: Don't reset the stack after calls that don't return (PR27117) | Hans Wennborg | 2016-05-19 | 1 | -0/+6
  Since the calls don't return, the instruction afterwards will never run, and is just taking up unnecessary space in the binary.
  Differential Revision: http://reviews.llvm.org/D20406
  llvm-svn: 270109
* Remember the relocation model. NFC. | Rafael Espindola | 2016-05-19 | 5 | -18/+16
  This avoids passing a TargetMachine in a few places.
  llvm-svn: 270095
* Style fixes. NFC. | Rafael Espindola | 2016-05-19 | 5 | -18/+16
  llvm-svn: 270093
* [X86] Enable RRL part of the LEA optimization pass for -O2. | Andrey Turetskiy | 2016-05-19 | 1 | -10/+8
  Enable "Remove Redundant LEAs" part of the LEA optimization pass for -O2. This gives a 6.4% performance improvement on Broadwell on the nnet benchmark from Coremark-pro. There is no significant effect on other benchmarks (Geekbench, Spec2000, Spec2006).
  Differential Revision: http://reviews.llvm.org/D19659
  llvm-svn: 270036
* [X86] Generalize and combine some similar type constraints and node types. | Craig Topper | 2016-05-19 | 2 | -88/+57
  No changes to the isel table size so the separation wasn't buying us anything.
  llvm-svn: 270026
* [X86] Simplify some type constraints by removing parts that were already implied. | Craig Topper | 2016-05-19 | 1 | -12/+5
  llvm-svn: 270025
* [X86] Remove some type constraint classes and use already existing stricter classes. | Craig Topper | 2016-05-19 | 1 | -16/+10
  llvm-svn: 270013
* [AVX512] Strengthen type constraints for VFIXUPIMM patterns and combine the type constraints for vector and scalar. | Craig Topper | 2016-05-19 | 1 | -7/+8
  llvm-svn: 270012
* Delete Reloc::Default. | Rafael Espindola | 2016-05-18 | 3 | -37/+43
  Having an enum member named Default is quite confusing: Is it distinct from the others?
  This patch removes that member and instead uses Optional<Reloc> in places where we have a user input that still hasn't been mapped to the default value, which now clearly has to be one of the remaining 3 options.
  llvm-svn: 269988
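The idea of replacing a `Default` enumerator with an optional value can be sketched in Python. This is an analogy under assumptions, not the LLVM change itself: the enum below mirrors the three remaining relocation models, while `effective_reloc` and its `target_default` parameter are hypothetical names introduced for illustration.

```python
from enum import Enum
from typing import Optional


class Reloc(Enum):
    # The three models that remain once Default is removed.
    STATIC = "static"
    PIC = "pic"
    DYNAMIC_NO_PIC = "dynamic-no-pic"


def effective_reloc(user_choice: Optional[Reloc], target_default: Reloc) -> Reloc:
    """Map an unset user choice (None) to the target's default model.

    `target_default` is a hypothetical parameter; how a real target picks
    its default is outside this sketch.
    """
    return target_default if user_choice is None else user_choice
```

With this shape, "no choice yet" is represented by `None` rather than by a fourth enumerator, so every `Reloc` value a later stage sees is a real model.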
* clean up; NFCI | Sanjay Patel | 2016-05-18 | 1 | -5/+4
  llvm-svn: 269962
* Re-commit r269828 "X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions" with an additional fix to make RegAllocFast ignore undef physreg uses. | Hans Wennborg | 2016-05-18 | 9 | -38/+332
  It would previously get confused about the "push %eax" instruction's use of eax. That method for adjusting the stack pointer is used in X86FrameLowering::emitSPUpdate as well, but since that runs after register-allocation, we didn't run into the RegAllocFast issue before.
  llvm-svn: 269949
* Trivial cleanups. | Rafael Espindola | 2016-05-18 | 1 | -1/+1
  This just clang formats and cleans comments in an area I am about to post a patch for review.
  llvm-svn: 269946
* Add new flag and intrinsic support for MWAITX and MONITORX instructions | Ashutosh Nema | 2016-05-18 | 7 | -15/+42
  Summary: MONITORX/MWAITX instructions provide similar capability to the MONITOR/MWAIT pair while adding a timer function, such that another termination of the MWAITX instruction occurs when the timer expires. The presence of the MONITORX and MWAITX instructions is indicated by CPUID 8000_0001, ECX, bit 29.
  The MONITORX and MWAITX instructions are intercepted by the same bits that intercept MONITOR and MWAIT. MONITORX instruction establishes a range to be monitored. MWAITX instruction causes the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events.
  Opcode of MONITORX instruction is "0F 01 FA". Opcode of MWAITX instruction is "0F 01 FB". This opcode information is used in adding tests for the disassembler.
  These instructions are enabled for AMD's bdver4 architecture.
  Patch by Ganesh Gopalasubramanian!
  Reviewers: echristo, craig.topper, RKSimon
  Subscribers: RKSimon, joker.eph, llvm-commits
  Differential Revision: http://reviews.llvm.org/D19795
  llvm-svn: 269911
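The feature bit and opcodes quoted in this message can be checked with a few lines of Python. This is only an illustrative sketch: it assumes the caller has already obtained the ECX value returned by CPUID leaf 8000_0001 by some other means, and the names `has_mwaitx` and `decode` are hypothetical helpers, not part of LLVM.

```python
MWAITX_CPUID_LEAF = 0x80000001  # "CPUID 8000_0001" in the commit message
MWAITX_ECX_BIT = 29             # feature bit quoted in the commit message

# Encodings quoted in the commit message.
OPCODES = {
    bytes([0x0F, 0x01, 0xFA]): "monitorx",
    bytes([0x0F, 0x01, 0xFB]): "mwaitx",
}


def has_mwaitx(ecx):
    """Return True if bit 29 is set in the given ECX value for the leaf above."""
    return bool((ecx >> MWAITX_ECX_BIT) & 1)


def decode(encoding):
    """Return the mnemonic for a 3-byte encoding, or None if it is neither instruction."""
    return OPCODES.get(bytes(encoding))
```

For example, `decode([0x0F, 0x01, 0xFB])` looks up the MWAITX encoding, and `has_mwaitx(1 << 29)` reports the feature as present.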
* [AVX512] Strengthen type constraints on my rounding mode inputs and some immediate inputs. | Craig Topper | 2016-05-18 | 1 | -16/+23
  llvm-svn: 269886
* [AVX512] Strengthen type checks on the X86ISD::SELECT node. | Craig Topper | 2016-05-18 | 2 | -6/+16
  Saves over 800 bytes in the DAG isel table by removing type checks for the condition operand, which is always a vector or scalar of i1 matching the number of elements in the other operands.
  llvm-svn: 269885
* Revert r269828 "X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions" | Hans Wennborg | 2016-05-17 | 9 | -332/+38
  Seems to have broken the Windows ASan bot. Reverting while investigating.
  llvm-svn: 269833
* X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions | Hans Wennborg | 2016-05-17 | 9 | -38/+332
  This patch moves the expansion of WIN_ALLOCA pseudo-instructions into a separate pass that walks the CFG and lowers the instructions based on a conservative estimate of the offset between the stack pointer and the lowest accessed stack address.
  The goal is to reduce binary size and run-time costs by removing calls to _chkstk. While it doesn't fix all the code quality problems with inalloca calls, it's an incremental improvement for PR27076.
  Differential Revision: http://reviews.llvm.org/D20263
  llvm-svn: 269828
* Simplify handling of hidden stub. | Rafael Espindola | 2016-05-17 | 6 | -51/+4
  Since r207518 they are printed exactly like non-hidden stubs on x86 and since r207517 on ARM.
  This means we can use a single set for all stubs in those platforms.
  llvm-svn: 269776
* Fix for PR27750. Correctly handle the case where the fallthrough block and target block are the same in getFallThroughMBB. | David L Kreitzer | 2016-05-17 | 1 | -5/+9
  Differential Revision: http://reviews.llvm.org/D20288
  llvm-svn: 269760
* [X86] Remove transformVSELECTtoBlendVECTOR_SHUFFLE | Michael Kuperstein | 2016-05-16 | 1 | -110/+0
  The new X86 shuffle lowering can do just fine without transforming vselects into vector_shuffles. It looks like the only thing this code does right now is cause trouble - in particular, it can lead to combine/legalization infinite loops.
  Note that it's not completely NFC, since some of the shuffle masks get inverted, which may cause slight differences further down the line. We may want to find a way to invert those masks, but that's orthogonal to this commit.
  This fixes the hang in PR27689.
  llvm-svn: 269676
* Fixed unused variable warning | Simon Pilgrim | 2016-05-16 | 1 | -1/+0
  llvm-svn: 269650
* [X86][SSSE3] Lower vector CTLZ with PSHUFB lookups | Simon Pilgrim | 2016-05-16 | 1 | -5/+119
  This patch uses PSHUFB to lower vector CTLZ and avoid (slower) scalarizations.
  The leading zero count of each 4-bit nibble of the vector is determined by using a PSHUFB lookup. Pairs of results are then repeatedly combined up to the original element width.
  Differential Revision: http://reviews.llvm.org/D20016
  llvm-svn: 269646
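The nibble-lookup-then-combine scheme described above can be modelled on a single scalar element. This is a rough Python sketch of the technique and not the code from the patch: it handles one element at a time rather than a vector, the table stands in for the PSHUFB lookup, and the function name `ctlz_via_nibble_lut` is a hypothetical helper.

```python
# Leading-zero count of every 4-bit value; this plays the role of the PSHUFB table.
NIBBLE_CTLZ = [4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]


def ctlz_via_nibble_lut(x, width):
    """Count leading zeros of a `width`-bit value (width in {8, 16, 32, 64})."""
    # Step 1: look up the count for each nibble, most significant nibble first.
    counts = [NIBBLE_CTLZ[(x >> shift) & 0xF] for shift in range(width - 4, -1, -4)]

    # Step 2: repeatedly combine adjacent (high, low) pairs. If the high half is
    # entirely zero (its count equals its own width), add the low half's count;
    # otherwise the high half's count is already the answer for the pair.
    half = 4
    while len(counts) > 1:
        counts = [hi if hi < half else half + lo
                  for hi, lo in zip(counts[0::2], counts[1::2])]
        half *= 2
    return counts[0]
```

An all-zero input yields `width`, because every pair falls through to the "high half is zero" case at each level.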
* [X86][SSE] Simplify zero'th index extract element matching | Simon Pilgrim | 2016-05-15 | 1 | -2/+3
  llvm-svn: 269615
* [X86][SSE] Removed duplicate variables. NFCI. | Simon Pilgrim | 2016-05-15 | 1 | -18/+10
  Removed duplicate getOperand / getSimpleValueType calls.
  llvm-svn: 269614
* [AVX512] Make the permd intrinsics take a 32-bit immediate to match the software spec. | Craig Topper | 2016-05-14 | 1 | -4/+4
  llvm-svn: 269579
* Fixed lowering of _comi_ intrinsics from all sets - SSE/SSE2/AVX/AVX-512 | Elena Demikhovsky | 2016-05-14 | 2 | -107/+53
  Differential Revision: http://reviews.llvm.org/D19261
  llvm-svn: 269569
* [AVX512] Fix types for pshufd intrinsics. | Craig Topper | 2016-05-14 | 1 | -3/+3
  The immediate is the second argument and the mask is the 4th argument. Also move the 128/256 tests to the right test file.
  Prior to this the immediate was a strange 16-bits and the 512-bit intrinsic couldn't receive the full 16 mask bits it needs.
  llvm-svn: 269526
* SDAG: Clean up a dead node I missed earlier in X86 | Justin Bogner | 2016-05-13 | 1 | -1/+1
  H.J. Lu pointed out that I missed this in r269236. Thanks!
  llvm-svn: 269516
* Ensure the "cld" instruction is called in the prologue of the X86 interrupt handler function. | Amjad Aboud | 2016-05-13 | 1 | -0/+12
  Differential Revision: http://reviews.llvm.org/D18725
  llvm-svn: 269413
* Fixed the callee saved registers list for X86 AllRegs calling convention. | Amjad Aboud | 2016-05-12 | 2 | -16/+26
  32-bit AllRegs:
    SSE: xmm0-xmm7
    AVX: ymm0-ymm7
    AVX512: zmm0-zmm7 + k0-k7
  64-bit AllRegs:
    SSE: xmm0-xmm15
    AVX: ymm0-ymm15
    AVX512: zmm0-zmm31 + k0-k7
  Differential Revision: http://reviews.llvm.org/D20142
  llvm-svn: 269337
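The register ranges listed in this message can be expanded programmatically, which makes the difference between the 32-bit and 64-bit lists easy to compare. The Python below is a small sketch built only from the ranges quoted above; `allregs_vector_csrs` is a hypothetical helper name, and the message does not list general-purpose registers, so none appear here.

```python
def allregs_vector_csrs(is_64bit, feature):
    """Expand the vector/mask register ranges quoted in the commit message.

    `feature` is one of "SSE", "AVX", "AVX512".
    """
    prefix = {"SSE": "xmm", "AVX": "ymm", "AVX512": "zmm"}[feature]
    if not is_64bit:
        count = 8            # xmm0-7 / ymm0-7 / zmm0-7
    elif feature == "AVX512":
        count = 32           # zmm0-zmm31
    else:
        count = 16           # xmm0-15 / ymm0-15
    regs = [f"{prefix}{i}" for i in range(count)]
    if feature == "AVX512":
        regs += [f"k{i}" for i in range(8)]  # mask registers k0-k7
    return regs
```

For instance, `allregs_vector_csrs(True, "AVX512")` expands to the 32 zmm registers followed by the 8 mask registers.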