Summary:
SLP Vectorization of intrinsics (r203707) has exposed cases where the
expansion of vector bswap is failing (PR19151).
Reviewers: hfinkel
CC: chandlerc
Differential Revision: http://llvm-reviews.chandlerc.com/D3104
llvm-svn: 204163
Teach the DAGCombiner to fold a logic op (AND/OR/XOR) of two vector shuffles that share one operand into a single shuffle, according to the following rules:
1) (AND (shuf (A, C, Mask), shuf (B, C, Mask))) -> shuf (AND (A, B), C, Mask)
2) (OR (shuf (A, C, Mask), shuf (B, C, Mask))) -> shuf (OR (A, B), C, Mask)
3) (XOR (shuf (A, C, Mask), shuf (B, C, Mask))) -> shuf (XOR (A, B), V_0, Mask)
4) (AND (shuf (C, A, Mask), shuf (C, B, Mask))) -> shuf (C, AND (A, B), Mask)
5) (OR (shuf (C, A, Mask), shuf (C, B, Mask))) -> shuf (C, OR (A, B), Mask)
6) (XOR (shuf (C, A, Mask), shuf (C, B, Mask))) -> shuf (V_0, XOR (A, B), Mask)
In rules 3 and 6, the shared operand C cancels with itself under XOR, so its position is filled by the zero vector V_0.
llvm-svn: 204160
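To see why rule 1 above is sound, here is a small standalone sketch (toy C++, not the DAGCombiner implementation) that checks the identity on 4-lane vectors, using shufflevector's lane-selection semantics (mask index i < 4 selects from the first operand, i >= 4 selects lane i - 4 of the second):

#include <array>
#include <cassert>

using Vec4 = std::array<unsigned, 4>;

Vec4 shuf(const Vec4 &X, const Vec4 &Y, const std::array<int, 4> &Mask) {
  Vec4 R{};
  for (int i = 0; i < 4; ++i)
    R[i] = Mask[i] < 4 ? X[Mask[i]] : Y[Mask[i] - 4];
  return R;
}

Vec4 vand(const Vec4 &X, const Vec4 &Y) {
  Vec4 R{};
  for (int i = 0; i < 4; ++i)
    R[i] = X[i] & Y[i];
  return R;
}

int main() {
  Vec4 A{0xF0, 0x0F, 0xFF, 0x00}, B{0x3C, 0xC3, 0xA5, 0x5A},
       C{0x11, 0x22, 0x33, 0x44};
  std::array<int, 4> Mask{0, 5, 2, 7}; // mixes lanes of both operands
  // Rule 1: AND'ing two shuffles that share C equals shuffling
  // AND(A, B) with C under the same mask, since C & C == C lane-wise.
  assert(vand(shuf(A, C, Mask), shuf(B, C, Mask)) ==
         shuf(vand(A, B), C, Mask));
}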
llvm-svn: 204071
Rather than LegalizeAction::Expand, this needs LegalizeAction::Promote to get
promoted to fp_to_sint v8f32->v8i32. This is a legal operation on AVX.
For that to work properly, we also need to teach the legalizer about the
specific promotion required here. The default vector promotion uses
bitcasting to a vector type of the same total size. We want to promote the
vector element type, effectively widening the operation and then truncating
the result. This is analogous to the current logic of how int_to_fp is
promoted.
The change also factors out some code from the int_to_fp promotion code to
ValueType::widenIntegerVectorElementType. This is now shared between
int_to_fp and fp_to_int.
The custom lowering of fp_to_sint v8f32->v8i16 in X86 is no longer needed; it
can now go through the new target-independent fp_to_*int promotion logic.
I also checked that no other target uses Promote for these ops yet, so there
shouldn't be any unexpected change in behavior.
Fixes <rdar://problem/16202247>
llvm-svn: 204058
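The element-type promotion can be sketched on scalars standing in for vector lanes (illustrative toy code, not the legalizer itself): converting through the wider legal integer type and truncating agrees with the direct conversion for in-range values.

#include <cassert>
#include <cstdint>

int16_t fp_to_sint_direct(float F) { return static_cast<int16_t>(F); }

int16_t fp_to_sint_promoted(float F) {
  int32_t Wide = static_cast<int32_t>(F); // widened fp_to_sint
  return static_cast<int16_t>(Wide);      // truncate back to i16
}

int main() {
  // All values fit in i16, so both conversion routes are well defined.
  for (float F : {-32768.0f, -1.5f, 0.0f, 42.9f, 32767.0f})
    assert(fp_to_sint_direct(F) == fp_to_sint_promoted(F));
}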
operator* on the by-operand iterators now returns a MachineOperand& rather than
a MachineInstr&. At this point they almost behave like normal iterators!
Again, this requires making some existing loops more verbose, but should pave
the way for the big range-based for-loop cleanups in the future.
llvm-svn: 203865
for use with C++11 range-based for-loops.
The gist of phase 1 is to remove the skipInstruction() and skipBundle()
methods from these iterators, instead splitting each iterator into a version
that walks operands, a version that walks instructions, and a version that
walks bundles. This has the result of making some "clever" loops in lib/CodeGen
more verbose, but also makes their iterator invalidation characteristics much
more obvious to the casual reader. (Making them concise again in the future is a
good motivating case for a pre-incrementing range adapter!)
Phase 2 of this undertaking will consist of removing the getOperand() method,
and changing operator*() of the operand-walker to return a MachineOperand&. At
that point, it should be possible to add range views for them that work as one
might expect.
llvm-svn: 203757
In some cases the include is pushed "downstream" (or removed if
unused).
llvm-svn: 203644
The code added nothing but potentially disabled move semantics and made
types non-trivially copyable.
llvm-svn: 203563
The syntax for "cmpxchg" should now look something like:
cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic
where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).
rdar://problem/15996804
llvm-svn: 203559
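This mirrors C++11's compare_exchange, which likewise takes a success ordering and a weaker failure ordering ("monotonic" in IR corresponds to std::memory_order_relaxed):

#include <atomic>

int main() {
  std::atomic<int> Addr{3};
  int Expected = 42;
  // Matches the IR example: compare against 42 and store 3 on success,
  // with acquire on success and relaxed (monotonic) on failure. The
  // failure ordering may be no stronger than the success ordering and
  // cannot be a release ordering, since no store takes place.
  Addr.compare_exchange_strong(Expected, 3, std::memory_order_acquire,
                               std::memory_order_relaxed);
}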
llvm-svn: 203514
This requires a number of steps.
1) Move value_use_iterator into the Value class as an implementation
detail
2) Change it to actually be a *Use* iterator rather than a *User*
iterator.
3) Add an adaptor which is a User iterator that always looks through the
Use to the User.
4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
5) Add the range adaptors as Value::uses() and Value::users().
6) Update *all* of the callers to correctly distinguish between whether
they wanted a use_iterator (and to explicitly dig out the User when
needed), or a user_iterator which makes the Use itself totally
opaque.
Because #6 requires churning essentially everything that walked the
Use-Def chains, I went ahead and added all of the range adaptors and
switched them to range-based loops where appropriate. Also because the
renaming requires at least churning every line of code, it didn't make
any sense to split these up into multiple commits -- all of which would
touch all of the same lines of code.
The result is still not quite optimal. The Value::use_iterator is a nice
regular iterator, but Value::user_iterator is an iterator over User*s
rather than over the User objects themselves. As a consequence, it fits
a bit awkwardly into the range-based world and it has the weird
extra-dereferencing 'operator->' that so many of our iterators have.
I think this could be fixed by providing something which transforms
a range of T&s into a range of T*s, but that *can* be separated into
another patch, and it isn't yet 100% clear whether this is the right
move.
However, this change gets us most of the benefit and cleans up
a substantial amount of code around Use and User. =]
llvm-svn: 203364
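A toy sketch of the distinction (simplified stand-in types, not LLVM's actual classes): uses() hands out the operand slots themselves, while users() looks through each Use to the owning User.

#include <cstdio>
#include <vector>

struct User { const char *Name; }; // stand-in for an instruction

struct Use {
  User *Parent;  // the User that owns this operand slot
  int OperandNo; // which operand of Parent this Use occupies
};

struct Value {
  std::vector<Use> UseList;
  // use_iterator view: yields the Use, exposing the operand slot.
  const std::vector<Use> &uses() const { return UseList; }
  // user_iterator view: looks through each Use to the owning User.
  std::vector<User *> users() const {
    std::vector<User *> Result;
    for (const Use &U : UseList)
      Result.push_back(U.Parent);
    return Result;
  }
};

int main() {
  User Add{"add"}, Mul{"mul"};
  Value V;
  V.UseList = {{&Add, 0}, {&Mul, 1}};
  for (const Use &U : V.uses()) // explicit access to the Use
    std::printf("used as operand #%d of %s\n", U.OperandNo, U.Parent->Name);
  for (User *U : V.users())     // the Use itself stays opaque
    std::printf("user: %s\n", U->Name);
}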
class.
llvm-svn: 203339
This is already done for shifts. Allow it for rotations as well. E.g.:
(rotl:i32 x, (trunc (and y, 31))) -> (rotl:i32 x, (and (trunc y), 31))
Use the newly factored-out distributeTruncateThroughAnd.
With this patch and some X86.td tweaks, we should be able to remove the
redundant masking of the rotation amount as in the example above; the
hardware performs this masking implicitly.
The testcase will be added as part of the X86 patch.
llvm-svn: 203316
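A scalar sketch of why the distribution is safe (toy code, assuming 32-bit lanes): truncating before or after the mask yields the same rotate amount, because the mask value 31 fits in the narrow type.

#include <cassert>
#include <cstdint>

// Rotate left on a 32-bit lane; the amount is taken modulo 32.
uint32_t rotl32(uint32_t X, unsigned Amt) {
  Amt &= 31;
  return Amt == 0 ? X : (X << Amt) | (X >> (32 - Amt));
}

int main() {
  const uint32_t X = 0xDEADBEEF;
  for (uint64_t Y = 0; Y < 4096; Y += 37) {
    unsigned TruncOfAnd = static_cast<unsigned>(Y & 31); // trunc (and y, 31)
    unsigned AndOfTrunc = static_cast<uint32_t>(Y) & 31; // (and (trunc y), 31)
    assert(rotl32(X, TruncOfAnd) == rotl32(X, AndOfTrunc));
  }
}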
This is the new idiom:
x<<(y&31) | x>>((0-y)&31)
which is recognized as:
x ROTL (y&31)
The change refines matchRotateSub: when checking
Neg & (OpSize - 1) == (OpSize - Pos) & (OpSize - 1), if Pos has the form
Pos' & (OpSize - 1), we can simply use Pos' instead of Pos.
llvm-svn: 203315
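The identity can be checked in standalone C++ (a sketch assuming 32-bit lanes); note that both shift amounts in the idiom stay within [0, 31], so no undefined shift by 32 occurs.

#include <cassert>
#include <cstdint>

// Reference rotate-left by Amt & 31, written to avoid a shift by 32.
uint32_t rotl32(uint32_t X, uint32_t Amt) {
  Amt &= 31;
  return Amt == 0 ? X : (X << Amt) | (X >> (32 - Amt));
}

int main() {
  const uint32_t X = 0x12345678;
  for (uint32_t Y = 0; Y < 256; ++Y) {
    // The new idiom: when Y & 31 == 0, both shift amounts are 0 and
    // the two halves coincide (X | X == X); otherwise they are
    // k and 32 - k, giving the usual rotate.
    uint32_t Idiom = (X << (Y & 31)) | (X >> ((0 - Y) & 31));
    assert(Idiom == rotl32(X, Y));
  }
}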
Slightly change the wording in the function comment; originally it could be
misunderstood as saying that we turn the input into two subsequent rotates.
Also better connect the comment, which talks about Mask, with the code, which
used LoBits, by renaming the variable to MaskLoBits.
llvm-svn: 203314
be split and the result type widened.
When the condition of a vselect has to be split, it makes no sense to widen the
vselect and thereby widen the condition; we would end up in an endless loop of
widening (the vselect result type) and splitting (the condition mask type).
Instead, split both the condition and the vselect and widen the result.
I ran this over the test suite with i686 and mattr=+sse and saw no regressions.
Fixes PR18036.
llvm-svn: 203311
This patch teaches the DAGCombiner how to fold a binary OR between two
shufflevector into a single shuffle vector when possible.
The rules are:
1. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask1)
2. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf B, A, Mask2)
The DAGCombiner can take advantage of the fact that OR is commutative and
compute two possible shuffle masks (Mask1 and Mask2) for the resulting
shuffle node.
Before folding a dag according to either rule 1 or 2, DAGCombiner verifies
that the resulting shuffle mask is legal for the target.
The DAGCombiner first tries to fold according to rule 1; if that is not
possible, it tries rule 2. If both Mask1 and Mask2 are illegal, we
conservatively don't fold the OR.
llvm-svn: 203156
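A toy sketch of how such a combined mask can be built (illustrative only, not the DAGCombiner code): each lane takes A's element where MA selects a real element, and otherwise takes B's element, renumbered by +N for the second shuffle operand. It assumes every lane gets a real element from exactly one of the two shuffles.

#include <array>
#include <cassert>

constexpr int N = 4;
using Vec = std::array<unsigned, N>;
using Mask = std::array<int, N>;

Vec shuf(const Vec &X, const Vec &Y, const Mask &M) {
  Vec R{};
  for (int i = 0; i < N; ++i)
    R[i] = M[i] < N ? X[M[i]] : Y[M[i] - N];
  return R;
}

int main() {
  Vec A{1, 2, 3, 4}, B{5, 6, 7, 8}, Zero{};
  // Each shuffle pulls some lanes from its vector, the rest from V_0.
  Mask MA{0, N, 2, N}; // lanes 0 and 2 from A; lanes 1 and 3 are zero
  Mask MB{N, 1, N, 3}; // lanes 1 and 3 from B; lanes 0 and 2 are zero
  Vec SA = shuf(A, Zero, MA), SB = shuf(B, Zero, MB);
  Vec OrResult{};
  for (int i = 0; i < N; ++i)
    OrResult[i] = SA[i] | SB[i];

  // Build Mask1: take A's lane where MA selects A, else B's lane (+N).
  Mask Mask1{};
  for (int i = 0; i < N; ++i)
    Mask1[i] = MA[i] < N ? MA[i] : N + MB[i];
  assert(shuf(A, B, Mask1) == OrResult);
}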
This appears to work only for global loads; private and local address spaces
break for other reasons.
llvm-svn: 203135
already lives.
llvm-svn: 203046
already lived there and it is where it belongs -- this is the in-memory
debug location representation.
This is just cleanup -- Modules can actually cope with this, but that
doesn't make it right. After chatting with folks that have out-of-tree
stuff, going ahead and moving the rest of the headers seems preferable.
llvm-svn: 202960
Patchpoints already did this. Doing it for stackmaps is a convenience
for the runtime in the event that it needs to scratch register to
patch or perform a runtime call thunk.
Unlike patchpoints, we just assume the AnyRegCC calling convention. This is
the only language- and target-independent calling convention specific to
stackmaps, so it makes sense, although the calling convention is not currently
used to select the scratch registers.
llvm-svn: 202943
llvm-svn: 202932
selection dag (PR19012)
In X86SelectionDagInfo::EmitTargetCodeForMemcpy we check with MachineFrameInfo
to make sure that ESI isn't used as a base pointer register before we choose to
emit rep movs (which clobbers ESI).
The problem is that MachineFrameInfo wouldn't know about dynamic allocas or
inline asm that clobbers the stack pointer until SelectionDAGBuilder has
encountered them.
This patch fixes the problem by checking for such things when building the
FunctionLoweringInfo.
Differential Revision: http://llvm-reviews.chandlerc.com/D2954
llvm-svn: 202930
Currently this code is duplicated across visitSHL, visitSRA and visitSRL. The
plan is to add rotates as clients to this new function.
There is no functional change intended here.
llvm-svn: 202908
abstracting between a CallInst and an InvokeInst, both of which are IR
concepts.
llvm-svn: 202816
The old implementation is no longer needed in C++11.
llvm-svn: 202644
Remove the old functions.
llvm-svn: 202636
of boilerplate.
No intended functionality change.
llvm-svn: 202588
This extract-and-trunc vector optimization cannot work for i1 values as
currently implemented, and so I'm disabling this for now for i1 values. In the
future, this can be fixed properly.
Soon I'll commit support for i1 CR bit tracking in the PowerPC backend, and
this will be covered by one of the existing regression tests.
llvm-svn: 202449
llvm-svn: 202074
llvm-svn: 202073
shifted mask rather than masking and shifting separately.
The patch adds this transformation to the DAGCombiner:
(shl (and (setcc:i8v16 ...) N01C) N1C) -> (and (setcc:i8v16 ...) N01C<<N1C)
<rdar://problem/16054492>
Patch by Adam Nemet <anemet@apple.com>
llvm-svn: 201906
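A scalar sketch of why the fold holds (toy code): a setcc lane is all-ones or all-zeros, so masking the lane and then shifting equals masking with the pre-shifted constant.

#include <cassert>
#include <cstdint>

int main() {
  const uint8_t C = 0x01; // N01C: the mask applied to the setcc result
  const unsigned S = 5;   // N1C: the shift amount
  // A setcc lane is either all-zeros or all-ones; for such lanes,
  // (Lane & C) << S == Lane & (C << S).
  for (uint8_t Lane : {uint8_t(0x00), uint8_t(0xFF)}) {
    uint8_t MaskThenShift = static_cast<uint8_t>((Lane & C) << S);
    uint8_t ShiftedMask = static_cast<uint8_t>(Lane & (C << S));
    assert(MaskThenShift == ShiftedMask);
  }
}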
llvm-svn: 201870
This is quite a bit less confusing now that TargetData was renamed DataLayout.
llvm-svn: 201606
TargetData was renamed DataLayout back in r165242.
llvm-svn: 201581
This fix checks the original LLVM IR node to identify opaque constants by
looking for the bitcast-constant pattern. Originally we looked at the generated
SDNode, but this might lead to incorrect results. The SDNode could have been
generated by a constant expression that was folded to a constant.
This fixes <rdar://problem/16050719>
llvm-svn: 201291
We are now no longer relying on the target-specific call lowering implementation
to lower a stackmap intrinsic call. Instead we perform the call lowering in a
target-independent way directly in the stackmap lowering code. This simplifies
the code and removes the need to fix up the code after the target-specific call
lowering.
llvm-svn: 201263
documentation)
The ID type for the stackmap and patchpoint intrinsics is in both cases i64.
This fixes a zero extend in the SelectionDAGBuilder that still used i32. This
also updates the target independent instructions STACKMAP and PATCHPOINT to use
the correct type.
llvm-svn: 201262
BUILD_VECTOR nodes, e.g.:
(concat_vectors (BUILD_VECTOR a1, a2, a3, a4), (BUILD_VECTOR b1, b2, b3, b4))
->
(BUILD_VECTOR a1, a2, a3, a4, b1, b2, b3, b4)
This fixes an issue with AVX, where a sequence was not recognized as a 256-bit
vbroadcast due to the concat_vectors.
llvm-svn: 201158
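The fold can be illustrated with a toy sketch (standalone code, not the DAGCombiner): concatenating two vectors built from scalars yields exactly the vector built from all eight scalars.

#include <array>
#include <cassert>

int main() {
  std::array<int, 4> A{1, 2, 3, 4}, B{5, 6, 7, 8}; // two BUILD_VECTORs
  std::array<int, 8> Concat{};
  for (int i = 0; i < 4; ++i) { // concat_vectors of the two halves
    Concat[i] = A[i];
    Concat[i + 4] = B[i];
  }
  // A single wide BUILD_VECTOR of all eight scalars is equivalent.
  std::array<int, 8> BuildVector{1, 2, 3, 4, 5, 6, 7, 8};
  assert(Concat == BuildVector);
}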
opaque constants.
During DAGCombine visitShiftByConstant assumes that certain binary operations
with only constant operands can always be folded successfully. This is no longer
true when the constant is opaque. This commit fixes visitShiftByConstant by not
performing the optimization for opaque constants. Otherwise we would end up in
an infinite DAGCombine loop.
llvm-svn: 200900
llvm-svn: 200888
On R600, some address spaces have more strict alignment
requirements than others.
llvm-svn: 200887
from X86 matcher table.
llvm-svn: 200821
ISD::BSWAP was missing from the list of node types that should be expanded
element-wise.
llvm-svn: 200705
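Element-wise expansion can be sketched as follows (toy 4-lane code, not the legalizer): the vector bswap becomes a scalar bswap applied to each extracted lane.

#include <array>
#include <cassert>
#include <cstdint>

// Scalar byte swap on a 32-bit value.
uint32_t bswap32(uint32_t X) {
  return (X >> 24) | ((X >> 8) & 0xFF00) | ((X << 8) & 0xFF0000) | (X << 24);
}

int main() {
  std::array<uint32_t, 4> V{0x11223344, 0xAABBCCDD, 0x00000001, 0xFF000000};
  std::array<uint32_t, 4> R{};
  for (size_t i = 0; i < V.size(); ++i)
    R[i] = bswap32(V[i]); // scalar bswap per extracted lane
  assert(R[0] == 0x44332211 && R[2] == 0x01000000);
}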
Calls with inalloca are lowered by skipping all stores for arguments
passed in memory and the initial stack adjustment to allocate argument
memory.
Now the frontend is responsible for the memory layout, and the backend
doesn't have to do any work. As a result these changes are pretty
minimal.
Reviewers: echristo
Differential Revision: http://llvm-reviews.chandlerc.com/D2637
llvm-svn: 200596
Allocas marked inalloca are never static, but we were trying to put them
into the static alloca map if they were in the entry block. Also add an
assertion in x86 fastisel.
llvm-svn: 200593
when the input is a concat_vectors and the insert replaces one of the
concat halves:
Lower half: fold (insert_subvector (concat_vectors X, Y), Z) -> (concat_vectors Z, Y)
Upper half: fold (insert_subvector (concat_vectors X, Y), Z) -> (concat_vectors X, Z)
This can be seen with the following IR:
define <8 x float> @lower_half(<4 x float> %v1, <4 x float> %v2, <4 x float> %v3) {
  %1 = shufflevector <4 x float> %v1, <4 x float> %v2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %2 = tail call <8 x float> @llvm.x86.avx.vinsertf128.ps.256(<8 x float> %1, <4 x float> %v3, i8 0)
The vinsertf128 intrinsic is converted into an insert_subvector node
in SelectionDAGBuilder.cpp.
Using AVX, without the patch this generates two vinsertf128 instructions:
vinsertf128 $1, %xmm1, %ymm0, %ymm0
vinsertf128 $0, %xmm2, %ymm0, %ymm0
With the patch this is optimized into:
vinsertf128 $1, %xmm1, %ymm2, %ymm0
Patch by Robert Lougher.
llvm-svn: 200506
they're not legal.
llvm-svn: 200503
When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.
The previous attempt at r200431 was reverted at r200434 because of two
test-case failures. I modified my patch a little, but forgot to re-run
"make check-all".
The test case CodeGen/ARM/lsr-unfolded-offset.ll is updated because of the
patch's impact on branch probability, which causes changes in spill placement.
llvm-svn: 200502
llvm-svn: 200434