| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
llvm-svn: 261573
|
| |
|
|
|
|
|
|
|
|
| |
This reverts commit r261504, since it's not obvious the new name is
better:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160222/334298.html
I'll recommit if we get consensus that it's the right direction.
llvm-svn: 261567
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D17498
llvm-svn: 261521
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D17500
llvm-svn: 261520
|
| |
|
|
| |
llvm-svn: 261515
|
| |
|
|
|
|
|
|
| |
Resolve FIXME from r261504. Apparently bundled instructions are illegal
here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160215/334146.html
llvm-svn: 261507
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Delete MachineInstr::getIterator(), since the term "iterator" is
overloaded when talking about MachineInstr.
- Downcast to ilist_node in iplist::getNextNode() and getPrevNode() so
that ilist_node::getIterator() is still available.
- Add it back as MachineInstr::getInstrIterator(). This matches the
naming in MachineBasicBlock.
- Add MachineInstr::getBundleIterator(). This is explicitly called
"bundle" (not matching MachineBasicBlock) to disintinguish it clearly
from ilist_node::getIterator().
- Update all calls. Some of these I switched to `auto` to remove
boiler-plate, since the new name is clear about the type.
There was one call I updated that looked fishy, but it wasn't clear what
the right answer was. This was in X86FrameLowering::inlineStackProbe(),
added in r252578 in lib/Target/X86/X86FrameLowering.cpp. I opted to
leave the behaviour unchanged, but I'll reply to the original commit on
the list in a moment.
llvm-svn: 261504
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
I missed == and != when I removed implicit conversions between iterators
and pointers in r252380 since they were defined outside ilist_iterator.
Since they depend on getNodePtrUnchecked(), they indirectly rely on UB.
This commit removes all uses of these operators. (I'll delete the
operators themselves in a separate commit so that it can be easily
reverted if necessary.)
There should be NFC here.
llvm-svn: 261498
|
| |
|
|
| |
llvm-svn: 261494
|
| |
|
|
|
|
|
|
| |
Add support for the case where we have a consecutive load (which must include the first + last elements) with a mixture of undef/zero elements. We load the vector and then apply a shuffle to clear the zero'd elements.
Differential Revision: http://reviews.llvm.org/D17297
llvm-svn: 261490
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- Rename `"skylake"` == SkylakeServerProc to `"skylake-avx512"`
- Change `"skylake"` to denote SkylakeClientProc
- Fix the detection of cpu family 6 and model 94 to be
SkylakeClientProc instead of SkylakeServerProc
- Remove the `"cnl"` for CannonLake
Reviewers: craig.topper, delena
Subscribers: zansari, echristo, qcolombet, RKSimon, spatel, DavidKreitzer, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17090
llvm-svn: 261482
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
COFF doesn't have sections with mergeable contents. Instead, each
constant pool entry ends up in a COMDAT section. The linker, when
choosing between COMDAT sections, doesn't choose the max alignment of
the two sections. You just get whatever alignment was on the section.
If one constant needed a higher alignment in one object file from
another one, then we will get into trouble if the linker chooses the
lower alignment one.
Instead, lets promote the alignment of the constant pool entry to make
sure we don't use an under aligned constant with an instruction which
assumed otherwise.
This fixes PR26680.
llvm-svn: 261462
|
| |
|
|
|
|
|
|
| |
(PR26667)
Fixed a bug introduced by D16683 when a binary shuffle is simplified to a unary shuffle (with undef/zero sentinel mask indices) - if this resulted in only the second input being used combineX86ShuffleChain failed to take this into account and still referenced the first input.
llvm-svn: 261434
|
| |
|
|
|
|
| |
First small step towards fixing PR26667 - we need to ensure that combineX86ShuffleChain only gets called with a valid shuffle input node (a similar issue was found in D17041).
llvm-svn: 261433
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16877
llvm-svn: 261429
|
| |
|
|
|
|
|
|
|
|
| |
Handle address displacement operands of a type other than Immediate or Global in LEAs and load/stores.
Ref: https://llvm.org/bugs/show_bug.cgi?id=26575
Differential Revision: http://reviews.llvm.org/D17374
llvm-svn: 261428
|
| |
|
|
|
|
|
| |
This has no observable behavior change, it just makes the state
insertion pass look a little more like normal passes.
llvm-svn: 261420
|
| |
|
|
| |
llvm-svn: 261417
|
| |
|
|
|
|
|
|
|
|
| |
TLSADDR nodes are lowered into actuall calls inside MC. In order to prevent
shrink-wrapping from pushing prologue/epilogue past them (which result
in TLS variables being accessed before the stack frame is set up), we
put markers, so that the stack gets adjusted properly.
Thanks to Quentin Colombet for guidance/help on how to fix this problem!
llvm-svn: 261387
|
| |
|
|
|
|
| |
I stumbled upon this while debugging a lowering bug.
llvm-svn: 261371
|
| |
|
|
| |
llvm-svn: 261370
|
| |
|
|
|
|
| |
Turns out the new nop sequences aren't actually nops on x86_64 (PR26554).
llvm-svn: 261365
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When optimizing for size, sqrt calls can be incorrectly selected as
AVX512 VSQRT instructions. This is because X86InstrAVX512.td has a
`Requires<[OptForSize]>` in its `avx512_sqrt_scalar` multiclass
definition. Even if the target does not support AVX512, the class can
apparently still be chosen, leading to an incorrect selection of
`vsqrtss`.
In PR26625, this lead to an assertion: Reg >= X86::FP0 && Reg <=
X86::FP6 && "Expected FP register!", because the `vsqrtss` instruction
requires an XMM register, which is not available on i686 CPUs.
Reviewers: grosbach, resistor, joker.eph
Subscribers: spatel, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D17414
llvm-svn: 261360
|
| |
|
|
| |
llvm-svn: 261311
|
| |
|
|
|
|
|
|
|
|
| |
This is effectively NFC because Atom is the only in-order x86 subtarget currently,
but the predicate would have become wrong if any other in-order CPU came along.
See related discussion in:
http://reviews.llvm.org/D16836
llvm-svn: 261275
|
| |
|
|
|
|
| |
Cleanup for upcoming Clang warning -Wcomma. No functionality change intended.
llvm-svn: 261270
|
| |
|
|
|
|
|
|
|
|
| |
If we know that all of our successors want to be in the exact same
state, it makes sense to hoist the state transition into their common
predecessor.
Differential Revision: http://reviews.llvm.org/D17391
llvm-svn: 261262
|
| |
|
|
| |
llvm-svn: 261255
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
In r260133, LLVM was changed to no longer extend i8/i16 return values,
as it's not required by the ABI. However, code was found in the wild
that relies on the old behaviour on Darwin, so this commit reverts
back to that old behaviour for Darwin.
On other platforms, it's less likely that code would be depending on
the old behaviour, as GCC and MSVC haven't been extending such return
values.
llvm-svn: 261235
|
| |
|
|
|
|
|
|
| |
In cases where the PSHUFB shuffle mask is shared it might not be bitcasted to a vXi8 byte vector. This patch adds support for decoding these wider shuffle masks from the ConstantPool.
The test case in question makes use of this to recognise the shuffle mask is an unary UNPCKL pattern and simplifies accordingly.
llvm-svn: 261201
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D17024
llvm-svn: 261198
|
| |
|
|
| |
llvm-svn: 261128
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
32-bit x86 Windows targets use a linked-list of nodes allocated on the
stack, referenced to via thread-local storage. The personality routine
interprets one of the fields in the node as a 'state number' which
indicates where the personality routine should transfer control.
State transitions are possible only before call-sites which may throw
exceptions. Our previous scheme had us update the state number before
all call-sites which may throw.
Instead, we can try to minimize the number of times we need to store by
reasoning about the nearest store which dominates the current call-site.
If the last store agrees with the current call-site, then we know that
the state-update is redundant and can be elided.
This is largely straightforward: an RPO walk of the blocks allows us to
correctly forward propagate the information when the function is a DAG.
Currently, loops are not handled optimally and may trigger superfluous
state stores.
Differential Revision: http://reviews.llvm.org/D16763
llvm-svn: 261122
|
| |
|
|
|
|
|
|
|
|
|
| |
Bug description:
The bug was discovered when test was compiled with -O0.
In case scatter result is DAG root , VectorLegalizer failed (assert) due to LowerMSCATTER() return kmask as result.
Change LowerMSCATTER() to return chain as original node do.
Differential Revision: http://reviews.llvm.org/D17331
llvm-svn: 261090
|
| |
|
|
|
|
|
|
|
|
|
|
| |
AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back.
This patch adds the ability to lower using the bit-blend patterns before defaulting to the splitting behaviour.
Part 2 of 2
Differential Revision: http://reviews.llvm.org/D17292
llvm-svn: 261082
|
| |
|
|
|
|
|
|
|
|
|
|
| |
AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back.
This patch adds the ability to lower using the bit-mask patterns before defaulting to the splitting behaviour. In some cases this ends up matching what AVX2 would do anyhow or what AVX1 does on the split vectors.
Part 1 of 2
Differential Revision: http://reviews.llvm.org/D17292
llvm-svn: 261081
|
| |
|
|
|
|
|
|
| |
Avoid reuse of operand variables, keep them local to a particular lowering - the operand collection is unique to each case anyhow.
Renamed from V to Ops to more closely match their purpose.
llvm-svn: 261078
|
| |
|
|
|
|
| |
Asserts are still firing in Chromium builds. PR26575.
llvm-svn: 261058
|
| |
|
|
|
|
|
|
|
| |
__chkstk clobbers EAX. If EAX is live across the prologue, then we have
to take extra steps to save it. We already had code to do this if EAX
was a register parameter. This change adapts it to work when shrink
wrapping is used.
llvm-svn: 261039
|
| |
|
|
| |
llvm-svn: 261025
|
| |
|
|
|
|
| |
I suspect this is what let PR26110 lie dormant for so long.
llvm-svn: 261024
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, we sometimes miscompile this vector pattern:
(c ? -v : v)
We lower it to (because "c" is <4 x i1>, lowered as a vector mask):
(~c & v) | (c & -v)
When we have SSSE3, we incorrectly lower that to PSIGN, which does:
(c < 0 ? -v : c > 0 ? v : 0)
in other words, when c is either all-ones or all-zero:
(c ? -v : 0)
While this is an old bug, it rarely triggers because the PSIGN combine
is too sensitive to operand order. This will be improved separately.
Note that the PSIGN tests are also incorrect. Consider:
%b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31>
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = and <4 x i32> %a, %0
%2 = and <4 x i32> %b.lobit, %sub
%cond = or <4 x i32> %1, %2
ret <4 x i32> %cond
if %b is zero:
%b.lobit = <4 x i32> zeroinitializer
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = <4 x i32> %a
%2 = <4 x i32> zeroinitializer
%cond = or <4 x i32> %a, zeroinitializer
ret <4 x i32> %a
whereas we currently generate:
psignd %xmm1, %xmm0
retq
which returns 0, as %xmm1 is 0.
Instead, use a pure logic sequence, as described in:
https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate
Fixes PR26110.
Differential Revision: http://reviews.llvm.org/D17181
llvm-svn: 261023
|
| |
|
|
| |
llvm-svn: 261021
|
| |
|
|
|
|
| |
This makes it IMO more readable and reduces indentation.
llvm-svn: 261020
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16877
llvm-svn: 260979
|
| |
|
|
|
|
|
|
|
|
| |
Add a missing check for a type of address displacement operand of the load/store instruction being a candidate for LEA substitution.
Ref: https://llvm.org/bugs/show_bug.cgi?id=26575
Differential Revision: http://reviews.llvm.org/D17261
llvm-svn: 260959
|
| |
|
|
| |
llvm-svn: 260943
|
| |
|
|
| |
llvm-svn: 260942
|
| |
|
|
| |
llvm-svn: 260940
|
| |
|
|
|
|
|
|
| |
locality and code size from SP/FP offset encoding.
Differential Revision: http://reviews.llvm.org/D15393
llvm-svn: 260917
|