path: root/llvm/lib/Target/X86/X86InstrInfo.cpp
Commit message | Author | Age | Files | Lines
...
* [AVX-512] Add VPTERNLOG to load folding tables. | Craig Topper | 2016-09-11 | 1 | -0/+18
    llvm-svn: 281156
* [X86] Make a helper method into a static function local to the cpp file. | Craig Topper | 2016-09-11 | 1 | -8/+10
    llvm-svn: 281154
* [CodeGen] Split out the notions of MI invariance and MI dereferenceability. | Justin Lebar | 2016-09-11 | 1 | -1/+3
    Summary:
    An IR load can be invariant, dereferenceable, neither, or both. But currently, MI's notion of invariance is IR-invariant && IR-dereferenceable.

    This patch splits up the notions of invariance and dereferenceability at the MI level. It's NFC, so adds some probably-unnecessary "is-dereferenceable" checks, which we can remove later if desired.

    Reviewers: chandlerc, tstellarAMD

    Subscribers: jholewinski, arsenm, nemanjai, llvm-commits

    Differential Revision: https://reviews.llvm.org/D23371

    llvm-svn: 281151
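As a rough illustration of the distinction described in the entry above, the sketch below models a load with two independent flags and shows the combined predicate that the older MI notion corresponded to. The `LoadFlags` struct and the free function are hypothetical stand-ins written for this note; they are not LLVM's `MachineInstr` API.

```cpp
// Toy model only: these two flags stand in for the IR-level load properties.
struct LoadFlags {
  bool Invariant;        // the loaded value never changes
  bool Dereferenceable;  // the address is known to be safe to load from
};

// The combined predicate: true only when both properties hold.
// (Hypothetical free function; it merely mirrors the name used in the log.)
bool isDereferenceableInvariantLoad(const LoadFlags &F) {
  return F.Invariant && F.Dereferenceable;
}
```

With the two flags kept separate, all four combinations (invariant, dereferenceable, neither, both) can be represented, which the single combined predicate could not express.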
* [CodeGen] Rename MachineInstr::isInvariantLoad to isDereferenceableInvariantLoad. NFC | Justin Lebar | 2016-09-10 | 1 | -1/+1
    Summary:
    I want to separate out the notions of invariance and dereferenceability at the MI level, so that they correspond to the equivalent concepts at the IR level. (Currently an MI load is MI-invariant iff it's IR-invariant and IR-dereferenceable.)

    First step is renaming this function.

    Reviewers: chandlerc

    Subscribers: MatzeB, jfb, llvm-commits

    Differential Revision: https://reviews.llvm.org/D23370

    llvm-svn: 281125
* X86: Fold tail calls into conditional branches also for 64-bit (PR26302) | Hans Wennborg | 2016-09-09 | 1 | -2/+12
    This extends the optimization in r280832 to also work for 64-bit. The only quirk is that we can't do this for 64-bit Windows (yet).

    Differential Revision: https://reviews.llvm.org/D24423

    llvm-svn: 281113
* [AVX-512] Add VPCMP instructions to the load folding tables and make them commutable. | Craig Topper | 2016-09-09 | 1 | -1/+56
    llvm-svn: 281013
* Win64: Don't use REX prefix for direct tail calls | Hans Wennborg | 2016-09-08 | 1 | -1/+0
    The REX prefix should be used on indirect jmps, but not direct ones. For direct jumps, the unwinder looks at the offset to determine if it's inside the current function.

    Differential Revision: https://reviews.llvm.org/D24359

    llvm-svn: 281003
* X86: Fold tail calls into conditional branches where possible (PR26302) | Hans Wennborg | 2016-09-07 | 1 | -0/+69
    When branching to a block that immediately tail calls, it is possible to fold the call directly into the branch if the call is direct and there is no stack adjustment, saving one byte.

    Example:

      define void @f(i32 %x, i32 %y) {
      entry:
        %p = icmp eq i32 %x, %y
        br i1 %p, label %bb1, label %bb2
      bb1:
        tail call void @foo()
        ret void
      bb2:
        tail call void @bar()
        ret void
      }

    before:

      f:
        movl 4(%esp), %eax
        cmpl 8(%esp), %eax
        jne .LBB0_2
        jmp foo
      .LBB0_2:
        jmp bar

    after:

      f:
        movl 4(%esp), %eax
        cmpl 8(%esp), %eax
        jne bar
      .LBB0_1:
        jmp foo

    I don't expect any significant size savings from this (on a Clang bootstrap I saw 288 bytes), but it does make the code a little tighter.

    This patch only does 32-bit, but 64-bit would work similarly.

    Differential Revision: https://reviews.llvm.org/D24108

    llvm-svn: 280832
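The folding condition stated in the entry above (the call is direct and there is no stack adjustment) can be sketched as a small standalone check. Everything here is a hypothetical toy model written for illustration; `TailCallBlock`, `canFoldIntoCondBranch`, and `condBranchDestination` are assumed names and do not reflect how the actual patch is implemented.

```cpp
#include <string>

// Hypothetical description of a block that consists only of a tail call.
struct TailCallBlock {
  std::string Callee;    // symbol the block jumps to
  bool IsDirectCall;     // direct (symbol) rather than indirect (register/memory)
  int StackAdjustBytes;  // stack adjustment needed before the jump
};

// Folding is only considered when the call is direct and needs no stack
// adjustment, matching the two conditions named in the commit message.
bool canFoldIntoCondBranch(const TailCallBlock &Target) {
  return Target.IsDirectCall && Target.StackAdjustBytes == 0;
}

// Destination a conditional branch should use: the callee itself when the
// fold applies, otherwise the original block label.
std::string condBranchDestination(const std::string &BlockLabel,
                                  const TailCallBlock &Target) {
  return canFoldIntoCondBranch(Target) ? Target.Callee : BlockLabel;
}
```

In the example from the commit message, the branch to `.LBB0_2` (which only does `jmp bar`) becomes `jne bar` under this rule.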
* [AVX-512] Add support for commuting masked instructions in findCommutedOpIndices. | Craig Topper | 2016-09-07 | 1 | -1/+23
    The default implementation doesn't skip the mask input or the preserved input.

    llvm-svn: 280781
* [AVX-512] Integrate mask register copying more completely into X86InstrInfo::copyPhysReg and simplify. | Craig Topper | 2016-09-05 | 1 | -68/+53
    No functional change intended.

    The code is now written in terms of source and dest classes with feature checks inside each type of copy instead of having separate functions for each feature set.

    llvm-svn: 280673
* [AVX-512] Simplify X86InstrInfo::copyPhysReg for 128/256-bit vectors with AVX512, but not VLX. | Craig Topper | 2016-09-05 | 1 | -20/+7
    We should use the VEX opcodes and trust the register allocator to not use the extended XMM/YMM register space.

    Previously we were extending to copying the whole ZMM register. The register allocator shouldn't use XMM16-31 or YMM16-31 in this configuration as the instructions to spill them aren't available.

    llvm-svn: 280648
* [X86] Remove FsVMOVAPSrm/FsVMOVAPDrm/FsMOVAPSrm/FsMOVAPDrm. | Craig Topper | 2016-09-05 | 1 | -12/+0
    Due to their placement in the td file they had lower precedence than (V)MOVSS/SD and could almost never be selected.

    The only way to select them was in AVX512 mode because EVEX VMOVSS/SD was below them and the patterns weren't qualified properly for AVX only. So if you happened to have an aligned FR32/FR64 load in AVX512 you could get a VEX encoded VMOVAPS/VMOVAPD.

    I tried to search back through history and it seems like these instructions were probably unselectable for at least 5 years, at least to the time the VEX versions were added. But I can't prove they ever were.

    llvm-svn: 280644
* [AVX-512] Add EVEX encoded scalar FMA intrinsic instructions to isNonFoldablePartialRegisterLoad. | Craig Topper | 2016-09-04 | 1 | -12/+24
    llvm-svn: 280636
* [AVX-512] Add integer ADD/SUB instructions to load folding tables. | Craig Topper | 2016-09-03 | 1 | -0/+44
    Add an AVX512 stack folding test.

    llvm-svn: 280593
* [AVX-512] Add EVEX encoded VPCMPEQ and VPCMPGT to the load folding tables. | Craig Topper | 2016-09-03 | 1 | -0/+24
    llvm-svn: 280581
* [AVX-512] Add execution domain fixing for logical operations with broadcast loads. | Craig Topper | 2016-09-02 | 1 | -0/+80
    This builds on the handling of masked ops since we need to keep element size the same.

    llvm-svn: 280464
* [X86] Loosen memory folding requirements for cvtdq2pd and cvtps2pd instructions. | Andrey Turetskiy | 2016-09-01 | 1 | -2/+2
    According to spec cvtdq2pd and cvtps2pd instructions don't require memory operand to be aligned to 16 bytes. This patch removes this requirement from the memory folding table.

    Differential Revision: https://reviews.llvm.org/D23919

    llvm-svn: 280402
* [XRay][NFC] Promote isTailCall() as virtual in TargetInstrInfo. | Dean Michael Berris | 2016-09-01 | 1 | -0/+23
    This change is broken out from D23986, where XRay detects tail call exits.

    llvm-svn: 280331
* [X86] Rename PABSB/D/W instructions to be consistent with SSE/AVX instructions instead of ending 128/256. NFC | Craig Topper | 2016-08-28 | 1 | -9/+9
    llvm-svn: 279927
* [AVX-512] Allow EVEX encoding unordered/ordered/equal/notequal VCMPPS/PD/SS/SD to be commuted just like the SSE and AVX counterparts. | Craig Topper | 2016-08-27 | 1 | -2/+18
    llvm-svn: 279914
* [X86] Enable FR32/FR64 cmpeq/cmpne/cmpunord/cmpord to be commuted. | Craig Topper | 2016-08-27 | 1 | -0/+8
    llvm-svn: 279913
* [AVX-512] Add load folding for EVEX vcmpps/pd/ss/sd. | Craig Topper | 2016-08-27 | 1 | -0/+8
    llvm-svn: 279912
* [X86][SSE] Add CMPSS/CMPSD intrinsic scalar load folding support. | Craig Topper | 2016-08-26 | 1 | -0/+4
    llvm-svn: 279806
* [X86] Fix indentation per coding standards. NFC | Craig Topper | 2016-08-25 | 1 | -9/+9
    llvm-svn: 279719
* [X86][SSE] Add MINSD/MAXSD/MINSS/MAXSS intrinsic scalar load folding support | Simon Pilgrim | 2016-08-24 | 1 | -0/+4
    These are no different in load behaviour to the existing ADD/SUB/MUL/DIV scalar ops but were missing from isNonFoldablePartialRegisterLoad

    llvm-svn: 279652
* [AVX-512] Add masked commutable floating point max/min instructions to folding tables. | Craig Topper | 2016-08-14 | 1 | -0/+24
    llvm-svn: 278628
* [AVX-512] Add masked logical operations to memory folding tables. | Craig Topper | 2016-08-14 | 1 | -2/+98
    llvm-svn: 278627
* [X86] Add a check of isCommutable at the top of X86InstrInfo::findCommutedOpIndices. | Craig Topper | 2016-08-13 | 1 | -0/+3
    Most callers don't check if the instruction is commutable before calling. This saves us the trouble of ending up in the default of the switch and having to determine if this is an FMA or not.

    llvm-svn: 278597
* X86-FMA3: Implemented commute transformation for EVEX/AVX512 FMA3 opcodes. | Vyacheslav Klochkov | 2016-08-11 | 1 | -535/+105
    This helped to improve memory-folding and register coalescing optimizations.

    Also, this patch fixed the tracker #17229.

    Reviewer: Craig Topper.

    Differential Revision: https://reviews.llvm.org/D23108

    llvm-svn: 278431
* Avoid false dependencies of undef machine operands | Marina Yatsina | 2016-08-11 | 1 | -1/+1
    This patch helps avoid false dependencies on undef registers by updating the machine instructions' undef operand to use a register that the instruction is truly dependent on, or use a register with clearance higher than Pref.

    Pseudo example:

      loop:
        xmm0 = ...
        xmm1 = vcvtsi2sdl eax, xmm0<undef>
        ... = inst xmm0
        jmp loop

    In this example, selecting xmm0 as the undef register creates false dependency between loop iterations. This false dependency cannot be solved by inserting an xor before vcvtsi2sdl because xmm0 is alive at the point of the vcvtsi2sdl instruction. Selecting a different register instead of xmm0, especially a register that is not used in the loop, will eliminate this problem.

    Differential Revision: https://reviews.llvm.org/D22466

    llvm-svn: 278321
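The selection rule described in the entry above can be sketched as a standalone function. This is a simplified model under stated assumptions: `RegCandidate`, `pickUndefReg`, the meaning given to `Clearance` (instructions since the register was last written), and the "largest clearance wins" tie-break are all hypothetical choices made for illustration, not the logic of the actual patch.

```cpp
#include <vector>

// Hypothetical candidate register for an undef operand.
struct RegCandidate {
  int Reg;             // register number (illustrative only)
  unsigned Clearance;  // assumed: instructions since the register was last written
  bool IsTrueDep;      // the instruction genuinely reads this register anyway
};

// Picks a register for the undef operand:
//  1. a register the instruction truly depends on, if one is available;
//  2. otherwise the candidate with the largest clearance above Pref;
//  3. otherwise the current register is kept.
int pickUndefReg(int CurrentReg, const std::vector<RegCandidate> &Candidates,
                 unsigned Pref) {
  for (const RegCandidate &C : Candidates)
    if (C.IsTrueDep)
      return C.Reg;

  int Best = CurrentReg;
  unsigned BestClearance = Pref;
  for (const RegCandidate &C : Candidates) {
    if (C.Clearance > BestClearance) {
      Best = C.Reg;
      BestClearance = C.Clearance;
    }
  }
  return Best;
}
```

In the loop from the commit message, a register that is not written inside the loop would have a large clearance and so would be preferred over xmm0.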
* [X86][SSE] Fix memory folding of (v)roundsd / (v)roundss | Simon Pilgrim | 2016-08-09 | 1 | -0/+8
    We only had partial memory folding support for the intrinsic definitions, which (as noted on PR27481) was causing FR32/FR64/VR128 mismatch errors with the machine verifier.

    This patch adds missing memory folding support for both intrinsics and the ffloor/fnearbyint/fceil/frint/ftrunc patterns and in doing so fixes the failing machine verifier stack folding tests from PR27481.

    Differential Revision: https://reviews.llvm.org/D23276

    llvm-svn: 278106
* [X86] Reduce duplicated code in the execution domain lookup functions by passing tables as an argument. | Craig Topper | 2016-08-09 | 1 | -37/+17
    llvm-svn: 278098
* [AVX-512] Add support for execution domain switching masked logical ops between floating point and integer domain. | Craig Topper | 2016-08-09 | 1 | -11/+137
    This switches PS<->D and PD<->Q.

    llvm-svn: 278097
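The PS<->D and PD<->Q pairing named in the entry above keeps the element width unchanged across the domain switch (32-bit with 32-bit, 64-bit with 64-bit). The sketch below shows that pairing as a toy lookup; `ElemType`, `elementBits`, and `switchDomain` are hypothetical names for illustration and are not the tables used in X86InstrInfo.cpp.

```cpp
// Hypothetical element kinds for a masked logical op.
enum class ElemType { PS, PD, D, Q };

// Element width in bits: PS and D are 32-bit, PD and Q are 64-bit.
unsigned elementBits(ElemType T) {
  return (T == ElemType::PS || T == ElemType::D) ? 32u : 64u;
}

// Swaps between the floating point and integer domains while preserving
// the element width: PS<->D and PD<->Q.
ElemType switchDomain(ElemType T) {
  switch (T) {
  case ElemType::PS: return ElemType::D;
  case ElemType::D:  return ElemType::PS;
  case ElemType::PD: return ElemType::Q;
  case ElemType::Q:  return ElemType::PD;
  }
  return T;  // unreachable for valid enumerators
}
```

Pairing by width means a per-element mask selects the same lanes before and after the switch.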
* [X86] Remove the Fv packed logical operation alias instructions. | Craig Topper | 2016-08-09 | 1 | -24/+0
    Replace them with patterns to the regular instructions.

    This enables execution domain fixing which is why the tests changed.

    llvm-svn: 278090
* X86InstrInfo: Update liveness in classifyLea() | Matthias Braun | 2016-08-09 | 1 | -8/+13
    We need to update liveness information when we create COPYs in classifyLea().

    This fixes http://llvm.org/28301

    llvm-svn: 278086
* [AVX-512] Add 512-bit logical operations to load folding tables. | Craig Topper | 2016-08-07 | 1 | -0/+16
    Add avx512f stack folding test and move some tests from the avx512vl test.

    llvm-svn: 277961
* [AVX-512] Add EVEX encoded floating point MAX/MIN instructions to the load folding tables. | Craig Topper | 2016-08-07 | 1 | -3/+35
    llvm-svn: 277960
* [X86] Add commutable floating point max/min instructions to the load folding tables. | Craig Topper | 2016-08-07 | 1 | -0/+20
    llvm-svn: 277949
* [AVX-512] Add SQRT/RCP14/RNDSCALE to hasUndefRegUpdate. | Craig Topper | 2016-08-06 | 1 | -0/+16
    llvm-svn: 277934
* [AVX-512] Add AVX-512 scalar CVT instructions to hasUndefRegUpdate. | Craig Topper | 2016-08-06 | 1 | -1/+25
    llvm-svn: 277933
* [X86] Add VRCPSSr_Int, VRSQRTSSr_Int, VSQRTSSr_Int, and VSQRTSDr_Int to hasUndefRegUpdate. | Craig Topper | 2016-08-06 | 1 | -0/+4
    llvm-svn: 277931
* [X86][SSE] Enable commutation between MOVHLPS and UNPCKHPD | Simon Pilgrim | 2016-08-06 | 1 | -0/+16
    Assuming SSE2 is available then we can safely commute between these, removing some unnecessary register moves and improving memory folding opportunities.

    VEX encoded versions don't benefit so I haven't added support to them.

    llvm-svn: 277930
* [AVX-512] Fix duplicate column in AVX512 execution dependency table that was preventing VMOVDQU32/VMOVDQA32 from being recognized. | Craig Topper | 2016-08-01 | 1 | -18/+18
    Fix a bug in the code that stops execution dependency fix from turning operations on 32-bit integer element types into operations on 64-bit integer element types.

    llvm-svn: 277327
* [X86] Move mask register handling into the main switch of getLoadStoreRegOpcode. | Craig Topper | 2016-08-01 | 1 | -22/+6
    No functional change intended.

    llvm-svn: 277318
* [AVX512] Always use EVEX encodings for 128/256-bit move instructions in getLoadStoreRegOpcode if VLX is supported. | Craig Topper | 2016-07-31 | 1 | -24/+15
    llvm-svn: 277305
* [AVX512] Add VLX packed move instructions to the execution dependency fix pass and update tests. | Craig Topper | 2016-07-31 | 1 | -5/+15
    llvm-svn: 277304
* [AVX512] Move FR32X/FR64X handling in getLoadStoreRegOpcode into the main switch. | Craig Topper | 2016-07-31 | 1 | -15/+11
    No functional change intended.

    llvm-svn: 277303
* [AVX512] Stop treating VR512 specially in getLoadStoreRegOpcode and use the regular switch which already tried to handle it, but was unreachable. | Craig Topper | 2016-07-31 | 1 | -3/+1
    This has the added benefit of enabling aligned loads/stores if the stack is aligned.

    llvm-svn: 277302
* [AVX-512] Don't let ExeDependencyFix pass convert VPANDD/Q to VPANDPS/PD unless DQI instructions are supported. | Craig Topper | 2016-07-31 | 1 | -9/+31
    Same for ANDN, OR, and XOR.

    Thanks to Igor Breger for pointing out my mistake.

    llvm-svn: 277292
* [AVX512] Mark EVEX VMOVSSrm and VMOVSDrm as canFoldAsLoad and isReMaterializable. | Craig Topper | 2016-07-29 | 1 | -0/+2
    llvm-svn: 277120