| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
| |
When compiling without SSE2, isTruncStoreLegal(F64, F32) would return Legal, whereas with SSE2 it would return Expand. And since the Target doesn't seem to actually handle a truncstore for double -> float, it would just output a store of a full double in the space for a float hence overwriting other bits on the stack.
Patch by Luqman Aden!
llvm-svn: 217410
|
| |
|
|
| |
llvm-svn: 217381
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Assert in scheduler from an inserted copy_to_regclass from
a constant.
This only seems to break sometimes when a constant initializer
address is forced into VGPRs in a non-entry block. No test
since the only case I've managed to hit only happens with a future
patch, and that case will also not be a problem once scalar instructions
are used in non-entry blocks.
llvm-svn: 217380
|
| |
|
|
| |
llvm-svn: 217379
|
| |
|
|
|
|
|
|
| |
Only handles LDS atomics for now, and will be used
to replace atomics with no uses with the no return
versions.
llvm-svn: 217378
|
| |
|
|
|
|
|
| |
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!
Phabricator Review: http://reviews.llvm.org/D5103
llvm-svn: 217371
|
| |
|
|
|
|
|
| |
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!
Phabricator Review: http://reviews.llvm.org/D5103
llvm-svn: 217370
|
| |
|
|
|
|
| |
Another trivial spelling change.
llvm-svn: 217364
|
| |
|
|
|
|
|
| |
I hadn't actually run all the tests yet and these combines have somewhat
surprisingly far reaching effects.
llvm-svn: 217333
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
support for MOVDDUP which is really important for matrix multiply style
operations that do lots of non-vector-aligned load and splats.
The original motivation was to add support for MOVDDUP as the lack of it
regresses matmul_f64_4x4 by 5% or so. However, all of the rules here
were somewhat suspicious.
First, we should always be using the floating point domain shuffles,
regardless of how many copies we have to make as a movapd is *crazy*
faster than the domain switching cost on some chips. (Mostly because
movapd is crazy cheap.) Because SHUFPD can't do the copy-for-free trick
of the PSHUF instructions, there is no need to avoid canonicalizing on
UNPCK variants, so do that canonicalizing. This also ensures we have the
chance to form MOVDDUP. =]
Second, we assume SSE2 support when doing any vector lowering, and given
that we should just use UNPCKLPD and UNPCKHPD as they can operate on
registers or memory. If vectors get spilled or come from memory at all
this is going to allow the load to be folded into the operation. If we
want to optimize for encoding size (the only difference, and only
a 2 byte difference) it should be done *much* later, likely after RA.
llvm-svn: 217332
|
| |
|
|
| |
llvm-svn: 217323
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
parsing (and latent bug in the instruction definitions).
This is effectively a revert of r136287 which tried to address
a specific and narrow case of immediate operands failing to be accepted
by x86 instructions with a pretty heavy hammer: it introduced a new kind
of operand that behaved differently. All of that is removed with this
commit, but the test cases are both preserved and enhanced.
The core problem that r136287 and this commit are trying to handle is
that gas accepts both of the following instructions:
insertps $192, %xmm0, %xmm1
insertps $-64, %xmm0, %xmm1
These will encode to the same byte sequence, with the immediate
occupying an 8-bit entry. The first form was fixed by r136287 but that
broke the prior handling of the second form! =[ Ironically, we would
still emit the second form in some cases and then be unable to
re-assemble the output.
The reason why the first instruction failed to be handled is because
prior to r136287 the operands ere marked 'i32i8imm' which forces them to
be sign-extenable. Clearly, that won't work for 192 in a single byte.
However, making thim zero-extended or "unsigned" doesn't really address
the core issue either because it breaks negative immediates. The correct
fix is to make these operands 'i8imm' reflecting that they can be either
signed or unsigned but must be 8-bit immediates. This patch backs out
r136287 and then changes those places as well as some others to use
'i8imm' rather than one of the extended variants.
Naturally, this broke something else. The custom DAG nodes had to be
updated to have a much more accurate type constraint of an i8 node, and
a bunch of Pat immediates needed to be specified as i8 values.
The fallout didn't end there though. We also then ceased to be able to
match the instruction-specific intrinsics to the instructions so
modified. Digging, this is because they too used i32 rather than i8 in
their signature. So I've also switched those intrinsics to i8 arguments
in line with the instructions.
In order to make the intrinsic adjustments of course, I also had to add
auto upgrading for the intrinsics.
I suspect that the intrinsic argument types may have led everything down
this rabbit hole. Pretty happy with the result.
llvm-svn: 217310
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
computation was totally wrong, but somehow it didn't really show up with
llc.
I've added an assert that triggers on multiple existing test cases and
updated one of them to show the correct value.
There appear to still be more bugs lurking around insertps's mask. =/
However, note that this only really impacts the new vector shuffle
lowering.
llvm-svn: 217289
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
MipsAsmParser. No functional changes.
Summary: Found a couple of cases where unsigned was still being used. These two should be the last ones in the (entire) Mips backend.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D5028
llvm-svn: 217257
|
| |
|
|
|
|
|
| |
This fixes hitting the same negative base offset problem
that was already fixed for regular loads and stores.
llvm-svn: 217256
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Patch by Vasileios Kalintiris.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D5173
llvm-svn: 217255
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: Use the naming convention from the LLVM Coding Standards.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D4972
llvm-svn: 217254
|
| |
|
|
|
|
|
|
| |
round halfway cases away from zero
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 217250
|
| |
|
|
|
|
|
|
|
|
| |
We must constrain the destination register class of legalized operands
to a VGPR class or else the illegal operand may be folded back into
the instruction by the register coalescer.
This fixes a bug in add.ll that will be uncovered by future commits.
llvm-svn: 217249
|
| |
|
|
|
|
| |
https://bugs.freedesktop.org/show_bug.cgi?id=83416
llvm-svn: 217248
|
| |
|
|
|
|
|
|
|
|
| |
shuffle lowering for integer vectors and share it from v4i32, v8i16, and
v16i8 code paths.
Ironically, the SSE2 v16i8 code for this is now better than the SSSE3!
=] Will have to fix the SSSE3 code next to just using a single pshufb.
llvm-svn: 217240
|
| |
|
|
|
|
| |
No change in behaviour (hopefully).
llvm-svn: 217233
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patched by Sergey Dmitrouk.
This pass tries to make consecutive compares of values use same operands to
allow CSE pass to remove duplicated instructions. For this it analyzes
branches and adjusts comparisons with immediate values by converting:
GE -> GT
GT -> GE
LT -> LE
LE -> LT
and adjusting immediate values appropriately. It basically corrects two
immediate values towards each other to make them equal.
llvm-svn: 217220
|
| |
|
|
|
|
|
|
| |
This fixes an issue where MS inline assembly containing xgetbv wouldn't
be marked as clobbering EAX:EDX. Test for that forthcoming on the Clang
side.
llvm-svn: 217173
|
| |
|
|
|
|
|
|
|
|
| |
Follow up to r217138, extending the logic to other NEON-immediate instructions.
As before, the instruction already performs the correct operation and we're
just using a different type for convenience, so we want a true nop-cast.
Patch by Asiri Rathnayake.
llvm-svn: 217159
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Standards. No functional changes.
Summary: There are still some functions which should be renamed, but they are inherited from the generic MC classes.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D5068
llvm-svn: 217145
|
| |
|
|
|
|
| |
warning with MSVC. NFC.
llvm-svn: 217143
|
| |
|
|
|
|
|
|
|
|
|
|
| |
We were materialising big-endian constants using DAG nodes with types different
from what was requested, followed by a bitcast. This is fine on little-endian
machines where bitcasting is a nop, but we need a slightly different
representation for big-endian. This adds a new set of NVCAST (natural-vector
cast) operations which are always nops.
Patch by Asiri Rathnayake.
llvm-svn: 217138
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vzext patterns and insert-element patterns that for SSE4 have dedicated
instructions.
With this we can enable the experimental mode in a regression test that
happens to cover some of the past set of issues. You can see that the
new logic does significantly better here on the floating point cases.
A follow-up to this change and the previous ones will hoist the logic
into helpers so it can be shared across element type sizes as in this
particular case it generalizes cleanly.
llvm-svn: 217136
|
| |
|
|
|
|
| |
After commit 217131.
llvm-svn: 217134
|
| |
|
|
|
|
| |
Used binary search over the tables.
llvm-svn: 217131
|
| |
|
|
| |
llvm-svn: 217119
|
| |
|
|
|
|
|
|
|
| |
This change adds support for immediate and shift-left folding into logical
operations.
This fixes rdar://problem/18223183.
llvm-svn: 217118
|
| |
|
|
|
|
|
|
|
|
|
| |
abilities of INSERTPS which are really powerful and come up in very
important contexts such as forming diagonal matrices, etc.
With this I ended up being able to remove the somewhat weird helper
I added for INSERTPS because we can collapse the entire state to a no-op
mask. Added a bunch of tests for inserting into a zero-ish vector.
llvm-svn: 217117
|
| |
|
|
| |
llvm-svn: 217109
|
| |
|
|
|
|
|
| |
Also fix bug this exposed where when legalizing an immediate
operand, a v_mov_b32 would be created with a VSrc dest register.
llvm-svn: 217108
|
| |
|
|
|
|
|
|
|
|
| |
'insertps' patterns.
This replaces two shuffles with a single insertps in very common cases.
My next patch will extend this to leverage the zeroing capabilities of
insertps which will allow it to be used in a much wider set of cases.
llvm-svn: 217100
|
| |
|
|
|
|
|
|
|
|
| |
an immediate operand when we don't have instruction-specific comments.
This ensures that instruction-specific comments are attached to the same
line as the instruction which is important for using them to write
readable and maintainable tests. My next commit will just such a test.
llvm-svn: 217099
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Split shouldExpandAtomicInIR() into different versions for Stores/Loads/RMWs/CmpXchgs.
Makes runOnFunction cleaner (no more redundant checking/casting), and will help moving
the X86 backend to this pass.
This requires a way of easily detecting which instructions are atomic.
I followed the pattern of mayReadFromMemory, mayWriteOrReadMemory, etc.. in making
isAtomic() a method of Instruction implemented by a switch on the opcodes.
Test Plan: make check
Reviewers: jfb
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D5035
llvm-svn: 217080
|
| |
|
|
| |
llvm-svn: 217077
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
target-independent insertLeading/TrailingFence (in AtomicExpandPass)
Fixes two latent bugs:
- There was no fence inserted before expanded seq_cst load (unsound on Power)
- There was only a fence release before seq_cst stores (again unsound, in particular on Power)
It is not even clear if this is correct on ARM swift processors (where release fences are
DMB ishst instead of DMB ish). This behaviour is currently preserved on ARM Swift
as it is not clear whether it is incorrect. I would love to get documentation stating
whether it is correct or not.
These two bugs were not triggered because Power is not (yet) using this pass, and these
behaviours happen to be (mostly?) working on ARM
(although they completely butchered the semantics of the llvm IR).
See:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075821.html
for an example of the problems that can be caused by the second of these bugs.
I couldn't see a way of fixing these in a completely target-independent way without
adding lots of unnecessary fences on ARM, hence the target-dependent parts of this
patch.
This patch implements the new target-dependent parts only for ARM (the default
of not doing anything is enough for AArch64), other architectures will use this
infrastructure in later patches.
llvm-svn: 217076
|
| |
|
|
|
|
|
|
|
|
| |
This is the final round of renaming. This changes tblgen to emit lower-case
function names for FastEmitInst_* and FastEmit_*, and updates all its uses
in the source code.
Reviewed by Eric
llvm-svn: 217075
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit renames the following public FastISel functions:
LowerArguments -> lowerArguments
SelectInstruction -> selectInstruction
TargetSelectInstruction -> fastSelectInstruction
FastLowerArguments -> fastLowerArguments
FastLowerCall -> fastLowerCall
FastLowerIntrinsicCall -> fastLowerIntrinsicCall
FastEmitZExtFromI1 -> fastEmitZExtFromI1
FastEmitBranch -> fastEmitBranch
UpdateValueMap -> updateValueMap
TargetMaterializeConstant -> fastMaterializeConstant
TargetMaterializeAlloca -> fastMaterializeAlloca
TargetMaterializeFloatZero -> fastMaterializeFloatZero
LowerCallTo -> lowerCallTo
Reviewed by Eric
llvm-svn: 217074
|
| |
|
|
| |
llvm-svn: 217071
|
| |
|
|
|
|
| |
on the machine function.
llvm-svn: 217070
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Things got a little bit messy over the years and it is time for a little bit
spring cleaning.
This first commit is focused on the FastISel base class itself. It doxyfies all
comments, C++11fies the code where it makes sense, renames internal methods to
adhere to the coding standard, and clang-formats the files.
Reviewed by Eric
llvm-svn: 217060
|
| |
|
|
| |
llvm-svn: 217054
|
| |
|
|
| |
llvm-svn: 217041
|
| |
|
|
|
|
|
|
|
|
| |
This fixes a crash in the OpenCV test:
ImgprocWarpResizeArea/Resize.Mat/16
There is no test case for this, because this failure depends on a
specific ordering of the loads, which could easily change.
llvm-svn: 217040
|
| |
|
|
|
|
| |
No functionality change. Changes made by clang-tidy + some manual cleanup.
llvm-svn: 217028
|