| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
shuffle lowering to allow much better blend matching.
Specifically, with the new structure the code seems clearer to me and we
can correctly hit the cases where merging two 128-bit lanes is a clear
win and can be shuffled cheaply afterward.
llvm-svn: 222539
|
offsets for code models other than small/medium. For JIT applications,
memory layout is less controlled and can result in truncations
otherwise.
Patch from Akos Kiss.
Differential Revision: http://reviews.llvm.org/D6079
llvm-svn: 222538
|
a bunch more improvements.
Non-lane-crossing is fine; the key is that lane merging only makes sense
for single-input shuffles. Not sure why I got so turned around here. The
code all works, I was just using the wrong model for it.
This only updates v4 and v8 lowering. The v16 and v32 lowering requires
restructuring the entire check sequence.
llvm-svn: 222537
|
operands are zero.
Before this patch, the DAGCombiner only tried to convert build_vector dag nodes
into shuffles if all operands were either extract_vector_elt or undef.
This patch improves that logic and teaches the DAGCombiner how to deal with
build_vector dag nodes where one or more operands are zero. A build_vector
dag node with some zero operands is turned into a shuffle only if the resulting
shuffle mask is legal for the target.
llvm-svn: 222536
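
As a rough illustration of the transformation this commit describes, the Python sketch below models a build_vector whose operands are extracts from a single source vector, zeros, or undef, and derives an equivalent two-input shuffle mask (second input: an all-zero vector). The function names, the operand encoding, and the `is_legal_mask` callback are hypothetical stand-ins, not the DAGCombiner's actual interfaces.

```python
def build_vector_to_shuffle_mask(operands, is_legal_mask=lambda mask: True):
    """Return a shuffle mask for shuffle(src, zero_vector), or None.

    Each operand is ("extract", i), "zero", or "undef" (hypothetical encoding).
    Mask indices 0..n-1 select from src, n..2n-1 select from the zero vector,
    and -1 marks an undef lane.
    """
    n = len(operands)
    mask = []
    for pos, op in enumerate(operands):
        if op == "undef":
            mask.append(-1)
        elif op == "zero":
            mask.append(n + pos)  # lane `pos` of the all-zero vector
        elif isinstance(op, tuple) and op[0] == "extract" and 0 <= op[1] < n:
            mask.append(op[1])
        else:
            return None  # unsupported operand: keep the build_vector
    # Only convert when the target accepts the resulting mask.
    return mask if is_legal_mask(mask) else None


def apply_shuffle(src, mask):
    """Evaluate shuffle(src, zeros, mask); undef lanes become None."""
    both = list(src) + [0] * len(src)
    return [None if m < 0 else both[m] for m in mask]
```

For example, the operands `[("extract", 0), "zero", ("extract", 2), "zero"]` yield the mask `[0, 5, 2, 7]` in this model, and a target that rejects the mask leaves the node untouched.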
|
lanes.
By special casing these we can often either reduce the total number of
shuffles significantly or reduce the number of (high latency on Haswell)
AVX2 shuffles that potentially cross 128-bit lanes. Even when these
don't actually cross lanes, they have much higher latency to support
that. Doing two of them and a blend is worse than doing a single insert
across the 128-bit lanes to blend and then doing a single interleaved
shuffle.
While this seems like a narrow case, it kept cropping up on me and the
difference is *huge* as you can see in many of the test cases. I first
hit this trying to perfectly fix the interleaving shuffle patterns used
by Halide for AVX2.
llvm-svn: 222533
|
This patch simplifies the logic that combines a pair of shuffle nodes into
a single shuffle if there is a legal mask. Also added comments to better
describe the algorithm. No functional change intended.
llvm-svn: 222522
|
positive numbers
Differential Revision: http://reviews.llvm.org/D5938
llvm-svn: 222521
|
variable-sized dynamic allocas. Patch by Max Ostapenko.
Reviewed at http://reviews.llvm.org/D6055
llvm-svn: 222519
|
llvm-svn: 222516
|
divisor into FMULs by the reciprocal.
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)
A hook is added to allow the target to control whether it needs to do such a combine.
Reviewed in http://reviews.llvm.org/D6334
llvm-svn: 222510
|
llvm-svn: 222509
|
This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in
the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on
SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM,
there is a store moved out of the inner loop) and a potential speedup on
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it
makes some code look cleaner, and synchronizing the backends in this regard
seems like a generally good thing.
llvm-svn: 222504
|
llvm-svn: 222502
|
llvm-svn: 222500
|
The alloca's type is irrelevant; only those types which are used in a
load or store of the exact size of the slice should be considered.
This manifested as an assertion failure when we compared the various
types: we had a size mismatch.
This fixes PR21480.
llvm-svn: 222499
|
now that the old JIT has been removed.
llvm-svn: 222494
|
MSVC can't parse this pattern for range-based for loops.
llvm-svn: 222491
|
match the custom lowering.
<rdar://problem/19026326>
llvm-svn: 222489
|
correctly. This helps with catching problems caused by IRBuilder abuse
such as the one fixed in CFE r222487.
llvm-svn: 222488
|
Follow up to r221940, where I must not have caught em all. NFC
llvm-svn: 222481
|
These recently all grew a unique_ptr<TargetLoweringObjectFile> member in
r221878. When anyone calls a virtual method of a class, clang-cl
requires all virtual methods to be semantically valid. This includes the
implicit virtual destructor, which triggers instantiation of the
unique_ptr destructor, which fails because the type being deleted is
incomplete.
This is just part of the ongoing saga of PR20337, which is affecting
Blink as well. Because the MSVC ABI doesn't have key functions, we end
up referencing the vtable and implicit destructor on any virtual call
through a class. We don't actually end up emitting the dtor, so it'd be
good if we could avoid this unneeded type completion work.
llvm-svn: 222480
|
llvm-svn: 222475
|
Code seems cleaner and easier to understand this way
This is basically r222416, after fixes for MSVC's lack of standard
support and a few cleanups (got rid of a warning).
Thanks Nakamura Takumi and Nico Weber for the MSVC fixes.
llvm-svn: 222472
|
have a circular dependency.
llvm-svn: 222458
|
now that the legacy JIT has been removed.
llvm-svn: 222453
|
Currently LoopUnroll generates a prologue loop before the main loop
body to execute the first N%UnrollFactor iterations. This loop is also
used if the trip count can overflow, which is determined by a runtime check.
However, we've been mistakenly optimizing this loop into linear code for
UnrollFactor = 2, not taking into account that it also serves as a safe
version of the loop if its trip count overflows.
llvm-svn: 222451
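
The structure this commit describes can be sketched roughly in Python. This is a simplified model under stated assumptions: `run_with_runtime_unrolling` and the `trip_count_overflows` flag are hypothetical names that stand in for the generated code and its runtime check, and Python integers do not overflow, so the overflow case is simulated by the flag.

```python
def run_with_runtime_unrolling(body, trip_count, unroll_factor=2,
                               trip_count_overflows=False):
    """Call body(i) for i in 0..trip_count-1 using a prologue + unrolled loop."""
    i = 0
    # Prologue loop: normally runs trip_count % unroll_factor iterations.
    # When the runtime check reports an overflowing trip count, the prologue
    # doubles as the safe version of the loop and runs every iteration.
    prologue_iters = (trip_count if trip_count_overflows
                      else trip_count % unroll_factor)
    while i < prologue_iters:
        body(i)
        i += 1
    # Main loop: the remaining iteration count is a multiple of unroll_factor.
    while i < trip_count:
        for k in range(unroll_factor):
            body(i + k)
        i += unroll_factor
```

With `unroll_factor=2` the prologue runs at most one iteration in the normal case, which is why straightening it into linear code looks tempting; the overflow path shows why it has to remain a real loop.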
|
Windows itanium targets the MSVCRT, and the stack probe symbol is provided by
MSVCRT. This corrects the emission of stack probes on i686-windows-itanium.
llvm-svn: 222439
|
As dump() methods should be. To allow that, do not store the DWARFFormValue
objects used for the dump in the header data.
Per Alexey's suggestion!
llvm-svn: 222436
|
llvm-svn: 222435
|
These fields would need to be explicitly deleted before we RAUW the temporary
node anyway (this was done in cfe commit r222373). Instead, do not create
these useless nodes in the first place.
llvm-svn: 222434
|
fixing them introduced bugs
llvm-svn: 222428
|
llvm-svn: 222426
|
- Use LLVM_DELETED_FUNCTION.
- Don't use member initializers.
- Don't use initializer list.
llvm-svn: 222422
|
Code seems cleaner and easier to understand this way
llvm-svn: 222416
|
code R_ARM_PLT32
llvm-svn: 222414
|
llvm-svn: 222412
|
"global-init", "global-init-src" and "global-init-type" were originally
used to blacklist entities in the ASan init-order checker. However, they
were never documented, and were later replaced by the "=init" category.
Old blacklist entries should be converted as follows:
* global-init:foo -> global:foo=init
* global-init-src:bar -> src:bar=init
* global-init-type:baz -> type:baz=init
llvm-svn: 222401
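
The conversion rules listed above are mechanical, so an old blacklist could be migrated with a small script along these lines. This is a hedged sketch: `convert_entry` is a hypothetical helper, and it assumes one entry per line, with blank lines and `#` comment lines passed through unchanged.

```python
# Old prefix -> new prefix, per the conversion list in the commit message.
OLD_TO_NEW_PREFIX = {
    "global-init": "global",
    "global-init-src": "src",
    "global-init-type": "type",
}


def convert_entry(line):
    """Rewrite one blacklist line; lines that need no change are returned as-is."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line  # blank or comment line (assumed syntax)
    prefix, sep, pattern = stripped.partition(":")
    new_prefix = OLD_TO_NEW_PREFIX.get(prefix)
    if not sep or new_prefix is None:
        return line  # not one of the removed prefixes
    return f"{new_prefix}:{pattern}=init"
```

Applying `convert_entry` to each line of a blacklist file and writing the results back would cover the three cases above while leaving other entries alone.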
|
llvm-svn: 222399
|
operands are"
This reverts commit r222142. This is causing/exposing an execution-time regression
in spec2006/gcc and coremark on AArch64/A57/Ofast.
Conflicts:
test/Transforms/Reassociate/optional-flags.ll
llvm-svn: 222398
|
llvm-svn: 222396
|
llvm-svn: 222386
|
A long sequence of || or && could lead to a stack explosion.
llvm-svn: 222384
|
- Show "Considering..." message after flipping so you actually see the final
destination vreg as destination.
- Add a message on final join, so you can grep for "Success" messages to obtain
a list of which register got merged with which.
llvm-svn: 222382
|
llvm-svn: 222381
|
llvm-svn: 222380
|
llvm-svn: 222379
|
No functional change intended.
llvm-svn: 222376
|
This patch improves the lowering of v4f32 and v4i32 build_vector dag nodes
that are known to have at least two non-zero elements.
With this patch, a build_vector that performs a blend with zero is
converted into a shuffle. This is done to let the shuffle legalizer expand
the dag node in an optimal way. For example, if we know that a build_vector
performs a blend with zero, we can try to lower it as a movq/blend instead of
always selecting an insertps.
This patch also improves the logic that lowers a build_vector into an insertps
with zero masking. See for example the extra test cases added to test sse41.ll.
Differential Revision: http://reviews.llvm.org/D6311
llvm-svn: 222375
|
As detailed at http://llvm.org/PR20728, due to an internal overflow in
APFloat::multiplySignificand the APFloat::fusedMultiplyAdd method can return
incorrect results for x87DoubleExtended (x86_fp80) values. This commonly
manifests as incorrect constant folding of libm fmal calls on x86. E.g.
fmal(1.0L, 1.0L, 3.0L) == 0.0L (should be 4.0L)
This patch fixes PR20728 by adding an extra bit to the significand for
intermediate results of APFloat::multiplySignificand, avoiding the overflow.
llvm-svn: 222374
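
A quick way to sanity-check constant folding of fused multiply-add results is to compute the product and sum exactly and round only once at the end. The Python sketch below does that with exact rationals. Note the assumptions: it works on IEEE double values rather than x87DoubleExtended, and `fma_reference` is a hypothetical helper, not part of APFloat.

```python
from fractions import Fraction


def fma_reference(a, b, c):
    """Compute a * b + c exactly, then round once to the nearest double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)
```

In this model `fma_reference(1.0, 1.0, 3.0)` gives `4.0`, matching the expected value quoted in the commit message. The single final rounding is what distinguishes a fused operation from a separate multiply and add, where the product is rounded before the addition.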
|
A register operand that has a common sub-class with its instruction's
defined register class is not always legal. For example,
SReg_32 and M0Reg both have a common sub-class, but we can't
use an SReg_32 in instructions that expect an M0Reg.
This prevents the llvm.SI.sendmsg.ll test from failing when the fold
operand pass is added.
llvm-svn: 222368
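
The distinction drawn in this message can be shown with a toy model in which a register class is simply a set of registers. Everything here is an assumption for illustration: the register names are invented, and the two helper functions are hypothetical and do not mirror the backend's real API.

```python
# Toy register classes (made-up contents): M0Reg is contained in SReg_32.
SREG_32 = frozenset({"s0", "s1", "m0"})
M0REG = frozenset({"m0"})


def has_common_subclass(a, b):
    """The weak test: the two classes share at least one register."""
    return bool(a & b)


def is_operand_class_legal(operand_class, required_class):
    """A stricter test: every register the operand may be assigned is allowed."""
    return operand_class <= required_class
```

In this model SReg_32 and M0Reg pass the common-sub-class test, yet an SReg_32 operand fails the stricter check against M0Reg because it could be assigned `s0` or `s1`.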
|