...
llvm-svn: 281156

llvm-svn: 281154
|
Summary:
An IR load can be invariant, dereferenceable, neither, or both. But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.

This patch splits up the notions of invariance and dereferenceability at
the MI level. It's NFC, so it adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D23371

llvm-svn: 281151
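The IR-level distinction being mirrored here can be sketched in a small example. This is illustrative only, not code from the patch; the function and value names are made up. `!invariant.load` marks a load whose location never changes, while dereferenceability comes from separate facts, such as a `dereferenceable(N)` attribute on the pointer:

```llvm
; Invariance and dereferenceability are independent properties of a load.
define i32 @get(i32* dereferenceable(4) %p, i32* %q) {
  ; Invariant and dereferenceable: %p is known dereferenceable(4) and the
  ; loaded location never changes.
  %a = load i32, i32* %p, !invariant.load !0
  ; Invariant but not known dereferenceable: nothing is known about %q.
  %b = load i32, i32* %q, !invariant.load !0
  %r = add i32 %a, %b
  ret i32 %r
}

!0 = !{}
```

After this patch series, the MI level tracks these as separate MachineMemOperand flags rather than folding both facts into a single invariance bit.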
|
isDereferenceableInvariantLoad. NFC

Summary:
I want to separate out the notions of invariance and dereferenceability
at the MI level, so that they correspond to the equivalent concepts at
the IR level. (Currently an MI load is MI-invariant iff it's
IR-invariant and IR-dereferenceable.)

First step is renaming this function.

Reviewers: chandlerc

Subscribers: MatzeB, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D23370

llvm-svn: 281125
|
This extends the optimization in r280832 to also work for 64-bit. The only
quirk is that we can't do this for 64-bit Windows (yet).
Differential Revision: https://reviews.llvm.org/D24423
llvm-svn: 281113
|
commutable.
llvm-svn: 281013
|
The REX prefix should be used on indirect jmps, but not direct ones.
For direct jumps, the unwinder looks at the offset to determine if
it's inside the current function.
Differential Revision: https://reviews.llvm.org/D24359
llvm-svn: 281003
|
When branching to a block that immediately tail calls, it is possible to fold
the call directly into the branch if the call is direct and there is no stack
adjustment, saving one byte.

Example:

  define void @f(i32 %x, i32 %y) {
  entry:
    %p = icmp eq i32 %x, %y
    br i1 %p, label %bb1, label %bb2
  bb1:
    tail call void @foo()
    ret void
  bb2:
    tail call void @bar()
    ret void
  }

before:

  f:
          movl 4(%esp), %eax
          cmpl 8(%esp), %eax
          jne .LBB0_2
          jmp foo
  .LBB0_2:
          jmp bar

after:

  f:
          movl 4(%esp), %eax
          cmpl 8(%esp), %eax
          jne bar
  .LBB0_1:
          jmp foo

I don't expect any significant size savings from this (on a Clang bootstrap I
saw 288 bytes), but it does make the code a little tighter.

This patch only does 32-bit, but 64-bit would work similarly.

Differential Revision: https://reviews.llvm.org/D24108

llvm-svn: 280832
|
findCommutedOpIndices. The default implementation doesn't skip the mask input or the preserved input.
llvm-svn: 280781
|
X86InstrInfo::copyPhysReg and simplify. No functional change intended.
The code is now written in terms of source and dest classes with feature checks inside each type of copy instead of having separate functions for each feature set.
llvm-svn: 280673
|
AVX512, but not VLX. We should use the VEX opcodes and trust the register
allocator not to use the extended XMM/YMM register space.

Previously we were extending the copy to use the whole ZMM register. The
register allocator shouldn't use XMM16-31 or YMM16-31 in this configuration,
as the instructions to spill them aren't available.

llvm-svn: 280648
|
placement in the td file they had lower precedence than (V)MOVSS/SD and could
almost never be selected.

The only way to select them was in AVX512 mode, because EVEX VMOVSS/SD was
below them and the patterns weren't qualified properly for AVX only. So if you
happened to have an aligned FR32/FR64 load in AVX512, you could get a
VEX-encoded VMOVAPS/VMOVAPD.

I tried to search back through history, and it seems like these instructions
were probably unselectable for at least 5 years, at least as far back as when
the VEX versions were added. But I can't prove they ever were selectable.

llvm-svn: 280644
|
isNonFoldablePartialRegisterLoad.
llvm-svn: 280636
|
AVX512 stack folding test.
llvm-svn: 280593
|
llvm-svn: 280581
|
loads. This builds on the handling of masked ops since we need to keep element size the same.
llvm-svn: 280464
|
According to the spec, the cvtdq2pd and cvtps2pd instructions don't require
the memory operand to be aligned to 16 bytes. This patch removes this
requirement from the memory folding table.

Differential Revision: https://reviews.llvm.org/D23919

llvm-svn: 280402
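As a sketch of the codegen effect (register choices illustrative, not from the patch's tests): cvtps2pd reads only the low 8 bytes of its memory operand, so the memory form imposes no 16-byte alignment requirement, and the load can fold even when alignment is unknown:

```asm
# Before: the folding table demanded a 16-byte-aligned memory operand, so an
# unaligned source stayed as a separate load:
movups   (%rdi), %xmm0
cvtps2pd %xmm0, %xmm0

# After: the load folds into the conversion regardless of alignment:
cvtps2pd (%rdi), %xmm0
```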
|
This change is broken out from D23986, where XRay detects tail call
exits.
llvm-svn: 280331
|
instructions instead of ending 128/256. NFC
llvm-svn: 279927
|
VCMPPS/PD/SS/SD to be commuted just like the SSE and AVX counterparts.
llvm-svn: 279914
|
llvm-svn: 279913
|
llvm-svn: 279912
|
llvm-svn: 279806
|
llvm-svn: 279719
|
These are no different in load behaviour to the existing ADD/SUB/MUL/DIV
scalar ops, but were missing from isNonFoldablePartialRegisterLoad.

llvm-svn: 279652
|
folding tables.
llvm-svn: 278628
|
llvm-svn: 278627
|
X86InstrInfo::findCommutedOpIndices. Most callers don't check if the instruction is commutable before calling.
This saves us the trouble of ending up in the default of the switch and having to determine if this is an FMA or not.
llvm-svn: 278597
|
This helps to improve the memory-folding and register-coalescing
optimizations. Also, this patch fixes tracker issue #17229.

Reviewer: Craig Topper.

Differential Revision: https://reviews.llvm.org/D23108

llvm-svn: 278431
|
This patch helps avoid false dependencies on undef registers by updating the
machine instructions' undef operand to use a register that the instruction is
truly dependent on, or to use a register with clearance higher than Pref.

Pseudo example:

  loop:
    xmm0 = ...
    xmm1 = vcvtsi2sdl eax, xmm0<undef>
    ... = inst xmm0
    jmp loop

In this example, selecting xmm0 as the undef register creates a false
dependency between loop iterations. This false dependency cannot be solved by
inserting an xor before vcvtsi2sdl, because xmm0 is live at the point of the
vcvtsi2sdl instruction. Selecting a different register instead of xmm0,
especially a register that is not used in the loop, will eliminate this
problem.

Differential Revision: https://reviews.llvm.org/D22466

llvm-svn: 278321
|
We only had partial memory folding support for the intrinsic definitions, and
(as noted on PR27481) this was causing FR32/FR64/VR128 mismatch errors with
the machine verifier.

This patch adds missing memory folding support for both intrinsics and the
ffloor/fnearbyint/fceil/frint/ftrunc patterns, and in doing so fixes the
failing machine-verifier stack folding tests from PR27481.

Differential Revision: https://reviews.llvm.org/D23276

llvm-svn: 278106
|
passing tables as an argument.
llvm-svn: 278098
|
between floating point and integer domain.
This switches PS<->D and PD<->Q.
llvm-svn: 278097
|
them with patterns to the regular instructions.
This enables execution domain fixing which is why the tests changed.
llvm-svn: 278090
|
We need to update liveness information when we create COPYs in
classifyLea().
This fixes http://llvm.org/28301
llvm-svn: 278086
|
stack folding test and move some tests from the avx512vl test.
llvm-svn: 277961
|
folding tables.
llvm-svn: 277960
|
tables.
llvm-svn: 277949
|
llvm-svn: 277934
|
llvm-svn: 277933
|
hasUndefRegUpdate.
llvm-svn: 277931
|
Assuming SSE2 is available, we can safely commute between these, removing some
unnecessary register moves and improving memory folding opportunities.

VEX-encoded versions don't benefit, so I haven't added support for them.

llvm-svn: 277930
|
preventing VMOVDQU32/VMOVDQA32 from being recognized. Fix a bug in the code
that stops the execution dependency fix from turning operations on 32-bit
integer element types into operations on 64-bit integer element types.

llvm-svn: 277327
|
getLoadStoreRegOpcode. No functional change intended.
llvm-svn: 277318
|
getLoadStoreRegOpcode if VLX is supported.
llvm-svn: 277305
|
pass and update tests.
llvm-svn: 277304
|
switch. No functional change intended.
llvm-svn: 277303
|
regular switch which already tried to handle it, but was unreachable. This has the added benefit of enabling aligned loads/stores if the stack is aligned.
llvm-svn: 277302
|
unless DQI instructions are supported. Same for ANDN, OR, and XOR.
Thanks to Igor Breger for pointing out my mistake.
llvm-svn: 277292
|
isReMaterializable.
llvm-svn: 277120