| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
llvm-svn: 222554
|
| |
|
|
| |
llvm-svn: 222553
|
| |
|
|
|
|
| |
Debug output is shown if any of the -debug-only arguments match.
llvm-svn: 222547
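The "any argument matches" rule above can be sketched roughly as follows. `DebugTypeFilter`, `add`, and `matches` are hypothetical names used only for illustration; they are not LLVM's actual implementation or API.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the names collected from repeated -debug-only
// options. This is an illustration, not LLVM's real data structure.
struct DebugTypeFilter {
  std::vector<std::string> enabled;

  // Record one name taken from a -debug-only argument.
  void add(const std::string &name) { enabled.push_back(name); }

  // True if `type` equals any of the recorded names.
  bool matches(const std::string &type) const {
    return std::find(enabled.begin(), enabled.end(), type) != enabled.end();
  }
};
```

With "isel" and "regalloc" registered, a query for either name succeeds and a query for any other name fails.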
|
| |
|
|
| |
llvm-svn: 222546
|
| |
|
|
| |
llvm-svn: 222545
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen
for Sandy Bridge and Ivy Bridge. There is no functionality change intended for
those chips. Previously, the absence of AVX2 was being used as a proxy to detect
this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2
that do not have the 32-byte unaligned access slowdown.
Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).
Differential Revision: http://reviews.llvm.org/D6355
llvm-svn: 222544
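The difference between the old proxy and the new explicit flag can be sketched as below. The struct, its fields, and both function names are assumptions made for illustration and do not correspond to the actual X86 subtarget code.

```cpp
// Hypothetical CPU description; the field names are illustrative only.
struct CpuFeatures {
  bool hasAVX;
  bool hasAVX2;
  bool slowUnaligned32; // assumed name for the new feature flag
};

// Old behaviour as described above: "AVX without AVX2" was used as a proxy.
bool avoidUnaligned32ByteAccessOld(const CpuFeatures &cpu) {
  return cpu.hasAVX && !cpu.hasAVX2;
}

// New behaviour as described above: consult a dedicated flag instead.
bool avoidUnaligned32ByteAccessNew(const CpuFeatures &cpu) {
  return cpu.slowUnaligned32;
}
```

Under these assumptions, a Sandy Bridge-like description gives the same answer both ways, while a btver2-like description (AVX, no AVX2, no slowdown) is no longer penalised.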
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
relative offsets for code models other than small/medium. For JIT application, memory layout is less controlled and can result in truncations otherwise."
This reverts commit r222538.
It's causing test failures for CFI, at least on Darwin:
http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental/1189/
http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_check/1391/
Note that the previous incremental build was on r222537, and the CFI
tests weren't failing:
http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental/1188/
llvm-svn: 222542
|
| |
|
|
|
|
|
|
|
|
| |
shuffle lowering to allow much better blend matching.
Specifically, with the new structure the code seems clearer to me and we
can correctly hit the cases where merging two 128-bit lanes is a clear
win and can be shuffled cheaply afterward.
llvm-svn: 222539
|
| |
|
|
|
|
|
|
|
|
|
|
| |
offsets for code models other than small/medium. For JIT application,
memory layout is less controlled and can result in truncations
otherwise.
Patch from Akos Kiss.
Differential Revision: http://reviews.llvm.org/D6079
llvm-svn: 222538
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
a bunch more improvements.
Non-lane-crossing is fine, the key is that lane merging only makes sense
for single-input shuffles. Not sure why I got so turned around here. The
code all works, I was just using the wrong model for it.
This only updates v4 and v8 lowering. The v16 and v32 lowering requires
restructuring the entire check sequence.
llvm-svn: 222537
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
operands are zero.
Before this patch, the DAGCombiner only tried to convert build_vector dag nodes
into shuffles if all operands were either extract_vector_elt or undef.
This patch improves that logic and teaches the DAGCombiner how to deal with
build_vector dag nodes where one or more operands are zero. A build_vector
dag node with some zero operands is turned into a shuffle only if the resulting
shuffle mask is legal for the target.
llvm-svn: 222536
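One way to picture the conversion described above is to treat the result as a two-input shuffle of the source vector and an all-zero vector. The sketch below is a standalone model under that assumption: `OperandKind`, `Operand`, and `buildShuffleMask` are hypothetical names, all extracts are assumed to come from one source vector, and the target legality check is not modelled.

```cpp
#include <vector>

// Simplified model of a build_vector operand (not an LLVM type).
enum class OperandKind { Extract, Undef, Zero };

struct Operand {
  OperandKind kind;
  int index; // source element index; only meaningful for Extract
};

// Builds a mask for shuffle(source, zero_vector) with N elements per input.
// Mask values 0..N-1 pick from the source, N..2N-1 pick from the zero
// vector, and -1 marks an undefined lane.
std::vector<int> buildShuffleMask(const std::vector<Operand> &ops) {
  const int n = static_cast<int>(ops.size());
  std::vector<int> mask;
  mask.reserve(ops.size());
  for (int i = 0; i < n; ++i) {
    switch (ops[i].kind) {
    case OperandKind::Extract:
      mask.push_back(ops[i].index);
      break;
    case OperandKind::Undef:
      mask.push_back(-1);
      break;
    case OperandKind::Zero:
      mask.push_back(n + i); // lane i of the all-zero second input
      break;
    }
  }
  return mask;
}
```

A real combine would then ask the target whether the resulting mask is legal before emitting the shuffle.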
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
lanes.
By special casing these we can often either reduce the total number of
shuffles significantly or reduce the number of (high latency on Haswell)
AVX2 shuffles that potentially cross 128-bit lanes. Even when these
don't actually cross lanes, they have much higher latency to support
that. Doing two of them and a blend is worse than doing a single insert
across the 128-bit lanes to blend and then doing a single interleaved
shuffle.
While this seems like a narrow case, it kept cropping up on me and the
difference is *huge* as you can see in many of the test cases. I first
hit this trying to perfectly fix the interleaving shuffle patterns used
by Halide for AVX2.
llvm-svn: 222533
|
| |
|
|
| |
llvm-svn: 222528
|
| |
|
|
|
|
|
|
|
| |
merging 128-bit subvectors and also shuffling all the elements of those
subvectors. Currently we generate pretty bad code for many of these, but
I'm testing a patch that should dramatically improve this in addition to
making the shuffle lowering robust to other changes.
llvm-svn: 222525
|
| |
|
|
|
|
|
|
| |
This patch simplifies the logic that combines a pair of shuffle nodes into
a single shuffle if there is a legal mask. Also added comments to better
describe the algorithm. No functional change intended.
llvm-svn: 222522
|
| |
|
|
|
|
|
|
| |
positive numbers
Differential Revision: http://reviews.llvm.org/D5938
llvm-svn: 222521
|
| |
|
|
|
|
|
|
| |
variable-sized dynamic allocas. Patch by Max Ostapenko.
Reviewed at http://reviews.llvm.org/D6055
llvm-svn: 222519
|
| |
|
|
| |
llvm-svn: 222516
|
| |
|
|
|
|
|
|
|
|
|
|
| |
divisor into FMULs by the reciprocal.
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)
A hook is added to let the target control whether to perform this combine.
Reviewed in http://reviews.llvm.org/D6334
llvm-svn: 222510
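At the source level, the rewrite in the example above looks roughly like this. Both function names are hypothetical. The two versions can round differently in general, which is an assumption about why such a combine would be gated rather than applied unconditionally.

```cpp
#include <utility>

// Before: two divisions by the same divisor.
std::pair<double, double> divideBoth(double a, double b, double d) {
  return {a / d, b / d};
}

// After: one division to form the reciprocal, then two multiplications.
// The results may differ from divideBoth in the last bit for some inputs.
std::pair<double, double> divideBothViaReciprocal(double a, double b,
                                                  double d) {
  const double recip = 1.0 / d;
  return {a * recip, b * recip};
}
```

For a divisor such as 4.0, whose reciprocal is exactly representable, both versions produce identical results.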
|
| |
|
|
| |
llvm-svn: 222509
|
| |
|
|
|
|
| |
RegisterInfo output file.
llvm-svn: 222508
|
| |
|
|
|
|
|
| |
The logic for detecting EOF was wrong and would fail if we ever requested
more than 16k past the last read position.
llvm-svn: 222505
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in
the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on
SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM,
there is a store moved out of the inner loop) and a potential speedup on
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it
makes some code look cleaner, and synchronizing the backends in this regard
seems like a generally good thing.
llvm-svn: 222504
|
| |
|
|
| |
llvm-svn: 222502
|
| |
|
|
| |
llvm-svn: 222500
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The alloca's type is irrelevant, only those types which are used in a
load or store of the exact size of the slice should be considered.
This manifested as an assertion failure when we compared the various
types: we had a size mismatch.
This fixes PR21480.
llvm-svn: 222499
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous description of the noalias attribute did not accurately specify
the implemented semantics, and the terminology used differed unnecessarily
from that used by the C specification to define the semantics of restrict. For
the argument attribute, the semantics can be precisely specified in terms of
objects accessed through pointers based on the arguments, and this is now what
is done.
Saying that the semantics are 'slightly weaker' than that provided by C99
restrict is not really useful without further elaboration, so that has been
removed from the sentence.
noalias on a return value is really used to mean that the function is
malloc-like (and, in fact, we use this attribute to represent
__attribute__((malloc)) in Clang), and this is a stronger guarantee than that
provided by restrict (because it is a property of the pointed-to memory region,
not just a guarantee on object access). Clarifying this is relevant to fixing
(and was motivated by the discussion on) PR21556.
llvm-svn: 222497
|
| |
|
|
|
|
| |
now that the old JIT has been removed.
llvm-svn: 222494
|
| |
|
|
|
|
| |
MSVC can't parse this pattern for range-based for loops.
llvm-svn: 222491
|
| |
|
|
|
|
|
|
| |
match the custom lowering.
<rdar://problem/19026326>
llvm-svn: 222489
|
| |
|
|
|
|
|
| |
correctly. This helps with catching problems caused by IRBuilder abuse
such as the one fixed in CFE r222487.
llvm-svn: 222488
|
| |
|
|
|
|
| |
Follow up to r221940, where I must not have caught em all. NFC
llvm-svn: 222481
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These recently all grew a unique_ptr<TargetLoweringObjectFile> member in
r221878. When anyone calls a virtual method of a class, clang-cl
requires all virtual methods to be semantically valid. This includes the
implicit virtual destructor, which triggers instantiation of the
unique_ptr destructor, which fails because the type being deleted is
incomplete.
This is just part of the ongoing saga of PR20337, which is affecting
Blink as well. Because the MSVC ABI doesn't have key functions, we end
up referencing the vtable and implicit destructor on any virtual call
through a class. We don't actually end up emitting the dtor, so it'd be
good if we could avoid this unneeded type completion work.
llvm-svn: 222480
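The incomplete-type problem described above is commonly avoided by declaring the destructor in the class and defining it only where the pointee type is complete. The following is a minimal single-file sketch of that pattern; `Owner` and `Payload` are hypothetical names, not the LLVM classes involved.

```cpp
#include <memory>

class Payload; // forward declaration: Payload is incomplete here

class Owner {
public:
  Owner();
  ~Owner(); // declared only; an implicit destructor would need Payload here
  int value() const;

private:
  std::unique_ptr<Payload> payload;
};

// In a real project the code below would live in the .cpp file that
// includes the full definition of Payload.
class Payload {
public:
  int value = 42;
};

Owner::Owner() : payload(new Payload()) {}
Owner::~Owner() = default; // unique_ptr's deleter is instantiated here
int Owner::value() const { return payload->value; }
```

Because the destructor is defined after `Payload` is complete, code that only sees the `Owner` declaration never has to instantiate the `unique_ptr` destructor.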
|
| |
|
|
| |
llvm-svn: 222475
|
| |
|
|
|
|
|
|
|
|
| |
Code seems cleaner and easier to understand this way
This is basically r222416, after fixes for MSVC's lack of standard
support and a few cleanups (got rid of a warning).
Thanks Nakamura Takumi and Nico Weber for the MSVC fixes.
llvm-svn: 222472
|
| |
|
|
|
|
| |
have a circular dependency.
llvm-svn: 222458
|
| |
|
|
|
|
| |
now that the legacy JIT has been removed.
llvm-svn: 222453
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Currently LoopUnroll generates a prologue loop before the main loop
body to execute the first N % UnrollFactor iterations. This loop is also
used if the trip count can overflow, which is determined by a runtime check.
However, we've been mistakenly optimizing this loop into straight-line code for
UnrollFactor = 2, not taking into account that it also serves as a safe
version of the loop if its trip count overflows.
llvm-svn: 222451
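The prologue-plus-main-loop shape described above can be sketched by hand for an unroll factor of 2. This is only an illustration of the structure: `sumUnrolledBy2` is a hypothetical function, and the runtime overflow check and fallback path from the commit are not modelled.

```cpp
#include <cstddef>
#include <vector>

// Sums the elements using a prologue loop followed by a main loop
// unrolled by a factor of 2.
long sumUnrolledBy2(const std::vector<int> &data) {
  const std::size_t n = data.size();
  const std::size_t unrollFactor = 2;
  long total = 0;
  std::size_t i = 0;

  // Prologue: runs the first n % unrollFactor iterations. It is kept as a
  // real loop here rather than flattened into straight-line code.
  for (std::size_t extra = n % unrollFactor; extra != 0; --extra)
    total += data[i++];

  // Main loop: the remaining count is a multiple of unrollFactor, so
  // data[i + 1] is always in range.
  for (; i < n; i += unrollFactor) {
    total += data[i];
    total += data[i + 1];
  }
  return total;
}
```

Keeping the prologue as a loop matters in the scenario the commit describes, where the same loop doubles as the safe version when the trip count overflows.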
|
| |
|
|
|
|
|
|
|
| |
If the template specializations for externally managed sets in
PostOrderIterator fall too far out of sync with each other, this unit
test will fail to build. This is especially useful for developers who
may not build Clang (the only in-tree user) every time.
llvm-svn: 222447
|
| |
|
|
|
|
|
|
|
| |
po_iterator_storage's insertEdge was updated to reflect the API
changes from many of our insert methods in r222334, however the
template specialization for external storage was not updated. This
updates the specialization.
llvm-svn: 222446
|
| |
|
|
|
|
|
| |
Windows itanium targets the MSVCRT, and the stack probe symbol is provided by
MSVCRT. This corrects the emission of stack probes on i686-windows-itanium.
llvm-svn: 222439
|
| |
|
|
|
|
|
|
|
| |
As dump() methods should be. To allow that, do not store the DWARFFormValue
objects used for the dump in the header data.
Per Alexey's suggestion!
llvm-svn: 222436
|
| |
|
|
| |
llvm-svn: 222435
|
| |
|
|
|
|
|
|
| |
These fields would need to be explicitly deleted before we RAUW the temporary
node anyway (this was done in cfe commit r222373). Instead, do not create
these useless nodes in the first place.
llvm-svn: 222434
|
| |
|
|
| |
llvm-svn: 222430
|
| |
|
|
|
|
| |
fixing them introduced bugs
llvm-svn: 222428
|
| |
|
|
| |
llvm-svn: 222426
|
| |
|
|
|
|
|
|
| |
- Use LLVM_DELETED_FUNCTION.
- Don't use member initializers.
- Don't use initializer list.
llvm-svn: 222422
|
| |
|
|
|
|
| |
Code seems cleaner and easier to understand this way
llvm-svn: 222416
|
| |
|
|
|
|
| |
code R_ARM_PLT32
llvm-svn: 222414
|