| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
|
Summary:
Implemented an InstCombine transformation that takes a blendv* intrinsic
call and translates it into an IR select, if the mask is constant.
This will eventually get lowered into blends with immediates if possible,
or pblendvb (with an option to further optimize if we can transform the
pblendvb into a blend+immediate instruction, depending on the selector).
It will also enable optimizations by the IR passes, which give up on
sight of the intrinsic.
Both the transformation and the lowering of its result to asm got shiny
new tests.
The transformation is a bit convoluted because of blendvp[sd]'s
definition:
Its mask is a floating-point value! This forces us to convert it and
extract the highest bit. I suppose this happened because the mask has type
__m128 in Intel's intrinsic and v4sf (for blendps) in gcc's builtin.
I will send an email to llvm-dev to discuss if we want to change this or
not.
Reviewers: grosbach, delena, nadav
Differential Revision: http://reviews.llvm.org/D3859
llvm-svn: 209643
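A minimal sketch of the mask handling described above, written as a Python model rather than as InstCombine code. It assumes the blendvps convention that a lane takes the second operand when the sign bit of the corresponding 32-bit float mask element is set. The helper names (sign_bit, mask_to_select_bits, blend) are hypothetical and exist only for illustration.

```python
import struct


def sign_bit(x: float) -> bool:
    """Return the top bit of x's IEEE-754 single-precision encoding."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bool(bits >> 31)


def mask_to_select_bits(mask):
    """Turn a constant float mask into per-lane booleans (the select condition)."""
    return [sign_bit(m) for m in mask]


def blend(a, b, mask):
    """Model of the blend: a lane takes b when its mask sign bit is set, else a."""
    cond = mask_to_select_bits(mask)
    return [bv if c else av for av, bv, c in zip(a, b, cond)]


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
mask = [-0.0, 0.0, -1.5, 7.0]  # only the sign bit of each element matters
result = blend(a, b, mask)
```

Note that -0.0 selects the second operand even though it compares equal to 0.0, which is why the mask has to be inspected bitwise rather than compared against zero.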
|
llvm-svn: 209640
|
No functionality change.
llvm-svn: 209639
|
This seems to match what gcc does for ppc and what every other llvm
backend does.
llvm-svn: 209638
|
This commit is debatable. There are two possible approaches, neither
of which is really satisfactory:
1. Use "@foo(i1 zeroext)" to mean an extension to 32-bits on Darwin,
and 8 bits otherwise.
2. Redefine "@foo(i1)" to mean that the i1 is extended by the caller
to 8 bits. This goes against the spirit of "zeroext" I think, but
it's a bit of a vague construct anyway (by definition you're going
to extend to the amount required by the ABI, that's why it's the
ABI!).
This implements option 2. The DAG machinery really isn't set up for the
first (there's a fairly strong assumption that "zeroext" goes to at
least the smallest register size), and even if it were, the resulting
DAG looks like it would be inferior in many cases.
Theoretically we could add AssertZext nodes in the consumers of
ABI-passed values too now, but this actually seems to make the code
worse in practice by making truncation proceed in two steps. The code
produced is equally valid if we continue to assume only the low bit is
defined.
Should fix PR19850
llvm-svn: 209637
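A rough Python model of option 2, for illustration only. It assumes a 32-bit argument register in which the caller zero-extends the i1 to 8 bits and leaves bits 8..31 unspecified. The function names are hypothetical. It shows why a consumer that keeps trusting only the low bit still reads the right value.

```python
def caller_pass_i1(flag: bool, upper_garbage: int) -> int:
    """Build the register the callee sees: the low 8 bits hold the
    zero-extended i1, bits 8..31 hold whatever was there before."""
    return ((upper_garbage << 8) | int(flag)) & 0xFFFFFFFF


def callee_low_bit(reg: int) -> bool:
    """Conservative consumer: assume only bit 0 is defined."""
    return bool(reg & 1)


def callee_low_byte(reg: int) -> bool:
    """Consumer that relies on the caller's extension to 8 bits."""
    return (reg & 0xFF) != 0


# Both readings agree for every flag value and any upper-bit garbage.
agree = all(
    callee_low_bit(caller_pass_i1(f, g)) == callee_low_byte(caller_pass_i1(f, g)) == f
    for f in (False, True)
    for g in (0x000000, 0xABCDEF, 0xFFFFFF)
)
```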
|
We can eliminate the custom C++ code in favour of some TableGen to
check the same things. Functionality should be identical, except for a
buffer overrun that was present in the C++ code and meant WebKit
failed if any small argument needed to be passed on the stack.
llvm-svn: 209636
|
llvm-svn: 209634
|
llvm-svn: 209628
|
optimization pass.
Add tests for the following transform:
str X, [x0, #32]
...
add x0, x0, #32
->
str X, [x0, #32]!
with X being either w1, x1, s0, d0 or q0.
llvm-svn: 209627
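A small Python model of why the folded form is equivalent, offered as a sketch rather than as the pass's logic. It assumes the elided instructions between the store and the add neither read nor write x0, and it models memory as a dictionary keyed by address. All names are hypothetical.

```python
def run_unfused(x0: int, value: int, mem: dict):
    """str X, [x0, #32] followed later by add x0, x0, #32."""
    mem = dict(mem)
    mem[x0 + 32] = value  # store at base + offset, base unchanged
    x0 = x0 + 32          # separate base update
    return x0, mem


def run_pre_indexed(x0: int, value: int, mem: dict):
    """str X, [x0, #32]! : update the base first, then store through it."""
    mem = dict(mem)
    x0 = x0 + 32          # writeback of the new base
    mem[x0] = value       # store at the updated base
    return x0, mem


before = run_unfused(0x1000, 42, {})
after = run_pre_indexed(0x1000, 42, {})
```

Both sequences leave the same value at the same address and the same final base register, which is the property the tests for this transform rely on.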
|
Cover the following cases:
ldr X, [x0, #32]
...
add x0, x0, #32
->
ldr X, [x0, #32]!
with X being either w1, x1, s0, d0 or q0.
llvm-svn: 209624
|
see PR17409
llvm-svn: 209623
|
llvm-svn: 209619
|
We have a couple of regression tests for load/store pairing, but (to my knowledge) there are no regression tests for the load/store + add/sub folding.
As a first step towards increased test coverage of this area, this commit adds a test for one instance of a load + add to pre-indexed load transformation.
llvm-svn: 209618
|
programmatically
and via the command line, mirroring similar functionality in LoopUnroll. In
situations where clients used custom unrolling thresholds, their intent could
previously be foiled by LoopRotate having a hardcoded threshold.
llvm-svn: 209617
|
This was previously regressed/broken by r192749 (reverted due to this
issue in r192938) and I was about to break it again by accident with
some more invasive changes that deal with the subprogram lists. So to
avoid that and further issues - here's a test.
It's a pretty basic test - in both r192749 and my impending case, this
test would crash, but checking the basics (that we put a subprogram in
just one of the two CUs) seems like a good start.
We still get this wrong in weird ways if the linkonce-odr function
happens to not be identical in the metadata (because it's defined in two
different files (hence the # line directives in this test), etc) even
though it meets the language requirements (identical token stream) for
such a thing. That results in two subprogram DIEs, but only one of them
gets the parameter and high/low pc information, etc. We probably need to
use the DIRef infrastructure to deduplicate functions as we do types to
address this issue - or perhaps teach the BC linker to remove the
duplicate entries in subprogram lists?
llvm-svn: 209614
|
Post commit review feedback from Manman called this out, but it looks
like it slipped through the cracks.
llvm-svn: 209611
|
Thanks to David Blaikie for the suggestion.
llvm-svn: 209610
|
llvm-svn: 209609
|
llvm-svn: 209608
|
Remove the use of the std::function and replace the capturing lambda with a
non-capturing one, opting to pass the user data down to the context. This is
needed as std::function is not yet available on all hosted platforms (it
requires RTTI, which breaks on Windows).
Thanks to Nico Rieck for pointing this out!
llvm-svn: 209607
|
Move the implementation of the Win64 EH printer from the COFFDumper into its own
class. This is in preparation for adding support to print ARM EH information.
The only real change here is in printUnwindInfo where we now lambda lift the
implicit this parameter for the resolveFunction. Also set up the printing to
handle ARM. This sets the stage for introducing ARM EH printing.
llvm-svn: 209606
|
This inlines the single use function in preparation for splitting the Win64EH
printing out of the COFFDumper into its own entity.
llvm-svn: 209605
|
Make the use of the cache more transparent to the users. There is no reason
that the cached entries really need to be passed along. The overhead for doing
so is minimal: a single extra parameter. This requires that some standalone
functions be brought into the COFFDumper class so that they may access the
cache.
llvm-svn: 209604
|
Switch to use references for parameters that are guaranteed to be non-null.
Simplifies the code slightly in preparation for another change.
llvm-svn: 209603
|
It seems my previous fix was insufficient: we were still not adding the
inlined function to the abstract scope list. That meant it wasn't
flagged as inline, didn't have nested lexical scopes in the abstract
definition, and didn't have abstract variables, so the inlined variable
didn't reference an abstract variable and was instead described
completely inline.
llvm-svn: 209602
|
straight to llvm-dwarfdump
We still do temporary files in many cases, just updating this particular
one because I was debugging it and made this change while doing so.
llvm-svn: 209601
|
Currently we look at the Aliasee to decide what type of export
directive to use. It seems better to use the type of the alias
directly. This is similar to how we handle the alias having the
same address but other attributes (linkage, visibility) from the
aliasee.
With this patch it is now possible to do things like
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc"
@foo = global [6 x i8] c"\B8*\00\00\00\C3", section ".text", align 16
@f = dllexport alias i32 (), [6 x i8]* @foo
!llvm.module.flags = !{!0}
!0 = metadata !{i32 6, metadata !"Linker Options", metadata !1}
!1 = metadata !{metadata !2, metadata !3}
!2 = metadata !{metadata !"/DEFAULTLIB:libcmt.lib"}
!3 = metadata !{metadata !"/DEFAULTLIB:oldnames.lib"}
llvm-svn: 209600
|
The " at the end of the line makes sure we matched the entire directive.
llvm-svn: 209599
|
This extension point allows adding passes that perform peephole optimizations
similar to the instruction combiner. These passes will be inserted after
each instance of the instruction combiner pass.
Differential Revision: http://reviews.llvm.org/D3905
llvm-svn: 209595
|
llvm-svn: 209589
|
Sort the source files. NFC.
llvm-svn: 209587
|
llvm-svn: 209586
|
The code emitted is what would be expected for the small model, so it
shouldn't be used when objects can be a full 64 bits away.
This fixes MCJIT tests on Linux.
llvm-svn: 209585
|
No functionality change.
llvm-svn: 209581
|
This makes front/back symmetric with begin/end, avoiding some confusion.
Added instr_front/instr_back for the old behavior, corresponding to
instr_begin/instr_end. Audited all three in-tree users of back(), all
of them look like they don't want to look inside bundles.
Fixes an assertion (PR19815) when generating debug info on mips, where a
delay slot was bundled at the end of a branch.
llvm-svn: 209580
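A toy Python model of the distinction, not the MachineBasicBlock API itself. It assumes a block is a flat instruction list in which an instruction can be marked as bundled with its predecessor, and that bundle-level iteration visits only the first instruction of each bundle. The Instr and Block classes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    bundled_with_pred: bool = False  # True for instructions inside a bundle


class Block:
    def __init__(self, instrs):
        self.instrs = list(instrs)

    def bundle_heads(self):
        """Instructions visible at bundle level (begin/end style iteration)."""
        return [i for i in self.instrs if not i.bundled_with_pred]

    def front(self):
        return self.bundle_heads()[0]

    def back(self):
        return self.bundle_heads()[-1]

    def instr_front(self):
        """Instruction-level view (instr_begin/instr_end style iteration)."""
        return self.instrs[0]

    def instr_back(self):
        return self.instrs[-1]


# A branch with its delay slot bundled at the end of the block.
block = Block([
    Instr("add"),
    Instr("branch"),
    Instr("delay_slot_nop", bundled_with_pred=True),
])
```

In this model back() yields the branch while instr_back() yields the delay-slot instruction, which matches the kind of mismatch the mips case above describes.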
|
This commit starts with a "git mv ARM64 AArch64" and continues out
from there, renaming the C++ classes, intrinsics, and other
target-local objects for consistency.
"ARM64" test directories are also moved, and tests that began their
life in ARM64 use an arm64 triple, those from AArch64 use an aarch64
triple. Both should be equivalent though.
This finishes the AArch64 merge, and everyone should feel free to
continue committing as normal now.
llvm-svn: 209577
|
I'm doing this in two phases for a better "git blame" record. This
commit removes the previous AArch64 backend and redirects all
functionality to ARM64. It also deduplicates test-lines and removes
orphaned AArch64 tests.
The next step will be "git mv ARM64 AArch64" and rewire most of the
tests.
Hopefully LLVM is still functional, though it would be even better if
no-one ever had to care because the rename happens straight
afterwards.
llvm-svn: 209576
|
directory".
It didn't match on non-English versions of Windows.
llvm-svn: 209570
|
sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformation in Scalar
Evolution.
That helps SLP-vectorizer to recognize consecutive loads/stores.
<rdar://problem/14860614>
llvm-svn: 209568
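A brute-force check of the identity on 8-bit values, as a sketch only. The commit text here does not state the side condition under which the split is valid. This example assumes a positive start C1 that is smaller than a power-of-two step C2, so adding C1 never carries into the sign bit. The helpers sext, addrec and split_holds are hypothetical names.

```python
def sext(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def addrec(start: int, step: int, i: int, bits: int) -> int:
    """Value of the recurrence {start,+,step} at iteration i, wrapping at `bits`."""
    return (start + step * i) & ((1 << bits) - 1)


def split_holds(c1: int, c2: int, bits: int = 8) -> bool:
    """Check sext{c1,+,c2} == sext(c1) + sext{0,+,c2} for every iteration
    in one full wrap-around period of the narrow type."""
    return all(
        sext(addrec(c1, c2, i, bits), bits)
        == sext(c1, bits) + sext(addrec(0, c2, i, bits), bits)
        for i in range(1 << bits)
    )


ok_case = split_holds(3, 8)   # start below a power-of-two step
bad_case = split_holds(1, 3)  # step is not a power of two
```

With C1 = 1 and C2 = 3 the split fails at the iteration where {0,+,3} equals 127: adding the start flips the sign bit of the narrow value, so the two sides differ after extension.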
|
After the load/store refactoring, we were sometimes trying to feed a
GPR64 into a 32-bit register offset operand. This failed in
copyPhysReg.
llvm-svn: 209566
|
In an effort to fix inlined debug info in situations where the out of
line definition of a function precedes any inlined usage, the order in
which some attributes are added to subprogram DIEs may change. (in
essence, definition-necessary attributes like DW_AT_low_pc/high_pc will
be added immediately, but the names, types, and other features will be
delayed to module end where they may either be added to the subprogram
DIE or instead reference an abstract definition for those values)
These tests can be generalized to be resilient to this change. 5 or so
tests actually have to be incompatibly changed to cope with this
reordering and will go along with the change that affects the order.
llvm-svn: 209554
|
It's an unnecessary detail for this test and just gets in the way when
making unrelated changes to the output in this test.
llvm-svn: 209553
|
llvm-svn: 209551
|
llvm-svn: 209550
|
No functionality change.
llvm-svn: 209548
|
same scope as the abstract definition.
This seems like a simple cleanup/improved consistency, but also helps
lay the foundation to fix the bug mentioned in the test case: concrete
definitions preceding any inlined usage aren't properly split into
concrete + abstract (because they're not known to need it until it's too
late).
Once we start deferring this choice until later, we won't have the
choice to put concrete definitions for inlined subroutines in a
different scope from concrete definitions for non-inlined subroutines
(since we won't know at time-of-construction which one it'll be). This
change brings those two cases into alignment ahead of that future
change/fix.
llvm-svn: 209547
|
This is a follow-up to r209358: PR19799: Indvars miscompile due to an
incorrect max backedge taken count from SCEV.
That fix was incomplete as pointed out by Arnold and Michael Z. The
code was also too confusing. It needed a careful rewrite with more
unit tests. This version will also happen to optimize more cases.
<rdar://17005101> PR19799: Indvars miscompile...
llvm-svn: 209545
|
This reverts part of commit r209538.
llvm-svn: 209544
|
This matches both what we do for the non-thread case and what gcc does.
With this patch clang would match gcc's behaviour in
static __thread int a = 42;
extern __thread int b __attribute__((alias("a")));
int *f(void) { return &a; }
int *g(void) { return &b; }
if not for PR19843. Manually writing the IL does produce the same access modes.
It is also a step in the direction of fixing PR19844.
llvm-svn: 209543
|
llvm-svn: 209539
|