llvm-svn: 219357
Summary:
I had forgotten to check for NotSlowIncDec in the patterns that can generate
inc/dec for the above pattern (added in D4796).
This currently applies to Atom Silvermont, KNL and SKX.
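
As an illustration (mine, not from the commit), the kind of C++ source
pattern whose atomic memory operation can lower to inc/dec:

#include <atomic>

std::atomic<int> counter;

void bump() {
  // May lower to "lock incl" on x86; on CPUs where inc/dec is slow
  // (NotSlowIncDec fails), "lock addl $1" should be used instead.
  counter.fetch_add(1, std::memory_order_relaxed);
}
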
Test Plan: New checks on atomic_mi.ll
Reviewers: jfb, nadav
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5677
llvm-svn: 219336
VPCMP/VPCMPU{BWDQ}
Added CMP_MASK_CC intrinsic type.
Added tests for intrinsics.
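
As a rough illustration (not from the patch), the new comparisons are
reachable from C++ via compiler intrinsics, assuming an AVX-512 target:

#include <immintrin.h>

__mmask16 cmp_lt(__m512i a, __m512i b) {
  return _mm512_cmp_epi32_mask(a, b, _MM_CMPINT_LT); // VPCMPD producing a mask
}
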
Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com>
llvm-svn: 219316
Added new arguments to AVX512_masking: InstrItinClass and bit isCommutable.
No functional change.
llvm-svn: 219310
Patch by Charlie Turner.
llvm-svn: 219301
This must be enforced for all v6M cores, not just the Cortex-M0,
regardless of the user-specified alignment.
Patch by Charlie Turner.
llvm-svn: 219300
This switch can be reduced to a simpler if/else statement.
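
A hypothetical sketch of the shape of such a reduction (Kind, A, and B
are invented names, not from the patch):

enum Kind { A, B, C };

bool isInteresting(Kind K) {
  // was: a switch with a default returning false and two cases returning true
  return K == A || K == B;
}
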
Patch by Charlie Turner.
llvm-svn: 219299
calls to getTargetLowering() with the cached variable.
llvm-svn: 219284
Phabricator Revision: http://reviews.llvm.org/D5589
Patch by Balaram Makam <bmakam@codeaurora.org>!!
llvm-svn: 219276
Summary:
Fix pr21099
The pseudocode of what we were doing (spread through two functions) was:
if (operand.doesNotFitIn32Bits())
  Opc.initializeWithFoo();
if (operand < 0)
  operand = -operand;
if (operand.doesFitIn8Bits())
  Opc.initializeWithBar();
else if (operand.doesFitIn32Bits())
  Opc.initializeWithBlah();
doStuff(Opc);
So for operand == INT32_MIN, Opc was never initialized because the operand changes
from fitting in 32 bits to not fitting, causing the various bugs/error messages
noted by pr21099.
This patch adds an extra test at the beginning for this case, and an
llvm_unreachable to give a better error message if the operand ends up
not fitting in 32 bits at the end.
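
One possible shape of the fixed flow, in the same pseudocode (same
hypothetical names as above):

if (operand.doesNotFitIn32Bits() || operand == INT32_MIN) {
  Opc.initializeWithFoo();
} else {
  if (operand < 0)
    operand = -operand; // safe now: the result still fits in 32 bits
  if (operand.doesFitIn8Bits())
    Opc.initializeWithBar();
  else if (operand.doesFitIn32Bits())
    Opc.initializeWithBlah();
  else
    llvm_unreachable("operand must fit in 32 bits here");
}
doStuff(Opc);
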
Test Plan: new test + make check
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5655
llvm-svn: 219257
llvm-svn: 219256
llvm-svn: 219255
llvm-svn: 219254
llvm-svn: 219253
llvm-svn: 219242
llvm-svn: 219238
The main reason for this is that the MCAsmInfo class,
which we were previously using as the base class, sets
PrivateGlobalPrefix to "L", which causes all global
functions that start with L to be treated as local symbols.
MCAsmInfoELF sets PrivateGlobalPrefix to ".L", which is what
we want, and it is probably a good idea to use this as the
base class anyway, since we are emitting ELF binaries.
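
A minimal sketch of the change described (FooMCAsmInfo is a placeholder
name, not the actual class):

#include "llvm/MC/MCAsmInfoELF.h"

// was: class FooMCAsmInfo : public MCAsmInfo { ... };
class FooMCAsmInfo : public llvm::MCAsmInfoELF {
public:
  FooMCAsmInfo() {
    // MCAsmInfoELF sets PrivateGlobalPrefix to ".L", so global functions
    // whose names merely start with 'L' are no longer treated as local.
  }
};
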
llvm-svn: 219237
Added a FIXME comment instead; we need to handle the case where the
two DS instructions being compared have different numbers of operands.
llvm-svn: 219236
Summary: CFI directives are generated for .S files.
Reviewers: eugenis
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5520
llvm-svn: 219199
Summary:
According to the ABI documentation, f128 and {f128} should both be returned
in $f0 and $f2. However, this doesn't match GCC's behaviour which is to
return f128 in $f0 and $f2, but {f128} in $f0 and $f1.
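
For illustration only, assuming a MIPS64 target where long double maps
to an IEEE f128 (my example, not from the patch):

long double ret_scalar();          // f128: returned in $f0 and $f2

struct Wrapped { long double v; }; // single-element aggregate, {f128} in IR
Wrapped ret_wrapped();             // GCC returns this in $f0 and $f1
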
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5578
llvm-svn: 219196
mode for certain instructions it shouldn't.
Unfortunately, this isn't easy to fix since there's no simple way to figure
out from the disassembler tables whether the W-bit is being used to select
a 64-bit GPR or if it's a required part of the opcode. The fix implemented
here just looks for "64" in the instruction name and ignores the W-bit in
32-bit mode if it's present.
Fixes PR21169.
llvm-svn: 219194
curly brace.
llvm-svn: 219193
llvm-svn: 219192
sign-/zero-extends.
The code already folds sign-/zero-extends, but only if they are arguments to
mul and shift instructions. This extends the code to also fold them when they
are direct inputs.
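
A hypothetical illustration, assuming an AArch64 target:

char load_byte(char *base, unsigned idx) {
  // The zero-extend of 'idx' is a direct input to the address add (no
  // mul/shift involved), so it can now be folded into the addressing
  // mode, e.g. "ldrb w0, [x0, w1, uxtw]".
  return base[idx];
}
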
llvm-svn: 219187
Tiny enhancement to the address computation code to also fold sub instructions
if the rhs is constant and can be folded into the offset.
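
A hypothetical illustration:

int load_before(int *p) {
  // The subtraction has a constant rhs, so it can fold into the
  // addressing-mode offset: a load at [p, #-16].
  return *(p - 4);
}
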
llvm-svn: 219186
This commit fixes an issue with sign-/zero-extending loads that was discovered
by Richard Barton.
We now use the correct load instructions for sign-extending loads to 64 bits.
Also updated and added more unit tests.
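
A hypothetical illustration, again assuming an AArch64 target:

long sext_load(int *p) {
  // A sign-extending load of an i32 to 64 bits should use a single
  // "ldrsw" rather than a plain load plus a separate sign-extend.
  return *p;
}
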
llvm-svn: 219185
llvm-svn: 219172
llvm-svn: 219128
This instruction form is handled by different AsmOperands now, so the code is
completely dead (and wrong anyway).
llvm-svn: 219127
convert them anymore.
llvm-svn: 219112
These will make it easier to test further changes to the
code generation and optimization pipelines as those are
moved to subtargets initialized with target feature and
target cpu.
llvm-svn: 219106
This trades a (register-renamer-friendly) movaps for a floating point
/ integer domain cross. That is a very bad trade, even on architectures
where domain crossing is relatively fast. On any chip where there is
even a cycle stall, this is a Very Bad Idea. It doesn't even seem likely
to cause a spill to be introduced because the reason for the copy is to
destructively shuffle in place.
Thanks to Ben Kramer for fixing a bug in this code that my new shuffle
lowering exposed and highlighting that perhaps it should just go away.
=]
llvm-svn: 219090
3-address pshufd.
It's debatable whether this transform is useful at all, but for now make sure
we don't generate invalid asm.
llvm-svn: 219084
This instruction broadcasts a mask vector to a data vector.
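
Illustrative use via compiler intrinsics (assuming AVX-512 CD support;
my example, not from the patch):

#include <immintrin.h>

__m512i splat_mask(__mmask16 m) {
  return _mm512_broadcastmw_epi32(m); // VPBROADCASTMW2D: mask -> data vector
}
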
llvm-svn: 219083
new vector shuffle lowering.
This is loosely based on a patch by Marius Wachtler to the PR (thanks!).
I refactored it a bit to use std::count_if and a mutable ArrayRef, but
the core idea was exactly right. I also added some direct testing of
this case.
I believe PR21137 is now the only remaining regression.
llvm-svn: 219081
shuffles using AVX and AVX2 instructions. This fixes PR21138, one of the
few remaining regressions impacting benchmarks from the new vector
shuffle lowering.
You may note that it "regresses" many of the vperm2x128 test cases --
these were actually "improved" by the naive lowering that the new
shuffle lowering previously did. This regression gave me fits. I had
this patch ready-to-go about an hour after flipping the switch but
wasn't sure how to have the best of both worlds here and thought the
correct solution might be a completely different approach to lowering
these vector shuffles.
I'm now convinced this is the correct lowering and the missed
optimizations shown in vperm2x128 are actually due to missing
target-independent DAG combines. I've even written most of the needed
DAG combine and will submit it shortly, but this part is ready and
should help some real-world benchmarks out.
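
For reference, an illustrative use of the instruction being targeted
(not from the patch):

#include <immintrin.h>

__m256 mix_lanes(__m256 a, __m256 b) {
  // VPERM2F128: low lane from the high half of a, high lane from the
  // low half of b (immediate 0x21).
  return _mm256_permute2f128_ps(a, b, 0x21);
}
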
llvm-svn: 219079
llvm-svn: 219073
llvm-svn: 219071
llvm-svn: 219069
weirdness exposed by it.
llvm-svn: 219068
NFC.
llvm-svn: 219061
Update the entire regression test suite for the new shuffles. Remove
most of the old testing, which was devoted to the old shuffle lowering
path and is no longer really relevant. Also remove a few other random
tests that only exercised shuffles incidentally or without any
interesting aspects to them.
Benchmarking that I have done shows a few small regressions with this on
LNT, zero measurable regressions on real, large applications, and for
several benchmarks where the loop vectorizer fires in the hot path it
shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy
Bridge machines. Running on AMD machines shows even more dramatic
improvements.
When using newer ISA vector extensions the gains are much more modest,
but the code is still better on the whole. There are a few regressions
being tracked (PR21137, PR21138, PR21139) but by and large this is
expected to be a win for x86 generated code performance.
It is also more correct than the code it replaces. I have fuzz tested
this extensively with ISA extensions up through AVX2 and found no
crashes or miscompiles (yet...). The old lowering had a few miscompiles
and crashers after a somewhat smaller amount of fuzz testing.
There is one significant area where the new code path lags behind and
that is in AVX-512 support. However, there was *extremely little*
support for that already and so this isn't a significant step backwards
and the new framework will probably make it easier to implement lowering
that uses the full power of AVX-512's table-based shuffle+blend (IMO).
Many thanks to Quentin, Andrea, Robert, and others for benchmarking
assistance. Thanks to Adam and others for help with AVX-512. Thanks to
Hal, Eric, and *many* others for answering my incessant questions about
how the backend actually works. =]
I will leave the old code path in the tree until the 3 PRs above are at
least resolved to folks' satisfaction. Then I will rip it (and 1000s of
lines of code) out. =] I don't expect this flag to stay around for very
long. It may not survive next week.
llvm-svn: 219046
llvm-svn: 219045
It turns out this combine was always somewhat flawed -- there are cases
where nested VZEXT nodes *can't* be combined: if their types have
a mismatch that can be observed in the result. While none of these show
up currently, once I switch to the new vector shuffle lowering a few
test cases actually form such nested VZEXT nodes. I've not come up with
any IR pattern that I can sensibly write to exercise this, but it will
be covered by tests once I flip the switch.
llvm-svn: 219044
nodes to the DAG combining of them.
This will allow the combine to fire on both old vector shuffle lowering
and the new vector shuffle lowering and generally seems like a cleaner
design. I've trimmed down the code a bit and tried to make it and the
surrounding combine fairly clean while moving it around.
llvm-svn: 219042
llvm-svn: 219038
llvm-svn: 219037
Re-add the tests since they were deleted at some point.
llvm-svn: 219036
the various ways in which blends can be used to do vector element
insertion for lowering with the scalar math instruction forms that
effectively re-blend with the high elements after performing the
operation.
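
To illustrate the "re-blend" behavior these patterns target (my
example, not from the patch):

#include <immintrin.h>

__m128 add_low(__m128 a, __m128 b) {
  // ADDSS semantics: element 0 becomes a0 + b0, while elements 1-3 are
  // passed through from a -- effectively a blend with the high elements.
  return _mm_add_ss(a, b);
}
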
This then allows me to bail on the element insertion lowering path when
we have SSE4.1 and are going to be doing a normal blend, which in turn
restores the last of the blends lost from the new vector shuffle
lowering when I got it to prioritize insertion in other cases (for
example when we don't *have* a blend instruction).
Without the patterns, using blends here would have regressed
sse-scalar-fp-arith.ll *completely* with the new vector shuffle
lowering. For completeness, I've added RUN-lines with the new lowering
here. This is somewhat superfluous as I'm about to flip the default, but
hey, it shows that this actually significantly changed behavior.
The patterns I've added are just ridiculously repetitive. Suggestions on
making them better are very much welcome. In particular, handling the
commuted form of the v2f64 patterns is somewhat obnoxious.
llvm-svn: 219033
perform a load to use blendps rather than movss when it is available.
For non-loads, blendps is *much* faster. It can execute on two ports in
Sandy Bridge and Ivy Bridge, and *three* ports on Haswell. This fixes
one of the "regressions" from aggressively taking the "insertion" path
in the new vector shuffle lowering.
This does highlight one problem with blendps -- it isn't commuted as
heavily as it should be. That's future work though.
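
An illustration of the equivalence being exploited (my example, not
from the patch):

#include <immintrin.h>

__m128 insert_low(__m128 a, __m128 b) {
  // Same result as the register form of MOVSS: element 0 from b,
  // elements 1-3 from a. BLENDPS can execute on more ports.
  return _mm_blend_ps(a, b, 0x1);
}
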
llvm-svn: 219022