To be used by D19781.
Differential Revision: http://reviews.llvm.org/D19801
llvm-svn: 273039
|
Add AVX512 anyext patterns for i16 and i64, modeled on the existing i8 and
i32 patterns.
llvm-svn: 273038
|
As a developer tool, it makes sense for it to use the new relocations.
llvm-svn: 273019
|
so we can emit native IR from clang.
Clang-side sibling commit to follow.
llvm-svn: 273002
|
This will (hopefully very temporarily) break clang.
The clang side of this should be the next commit.
llvm-svn: 272932
|
llvm-svn: 272928
|
llvm-svn: 272921
|
When calculating a square root using Newton-Raphson with two constants,
a naive implementation uses five multiplications (four to compute the
reciprocal square root and one more to compute the square root itself).
However, after some reassociation and CSE the same result can be obtained
with only four multiplications. Unfortunately, there is no reliable way to do
such a reassociation in the backend, so the patch modifies the NR code itself
to directly build optimal code for SQRT instead of relying on any further
reassociation.
Patch by Nikolai Bozhenov!
Differential Revision: http://reviews.llvm.org/D21127
llvm-svn: 272920
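A minimal scalar sketch of the reassociated form (illustrative only; the pass
builds the equivalent DAG nodes, and %est stands for a hardware rsqrt
estimate). The naive form a * (est * (1.5 - 0.5 * a * est * est)) needs five
multiplies; this version needs four:

  define float @sqrt_nr_step(float %a, float %est) {
    %t = fmul fast float %a, %est          ; mul 1: t = a * est
    %u = fmul fast float %t, %est          ; mul 2: u = t * est
    %h = fmul fast float %u, 5.000000e-01  ; mul 3: h = 0.5 * u
    %d = fsub fast float 1.500000e+00, %h  ; d = 1.5 - h
    %r = fmul fast float %t, %d            ; mul 4: sqrt(a) ~= t * d
    ret float %r
  }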
|
Follow-up to:
http://reviews.llvm.org/rL272806
http://reviews.llvm.org/rL272807
llvm-svn: 272907
|
builtin to IR
The clang side of this was r272840:
http://reviews.llvm.org/rL272840
A follow-up step would be to auto-upgrade and remove these LLVM intrinsics completely.
Differential Revision: http://reviews.llvm.org/D21269
llvm-svn: 272841
|
Missed this in r272806, r272807.
llvm-svn: 272834
|
Ideally, we can get rid of most x86 LLVM intrinsics by transforming them to IR (some of that happened
with http://reviews.llvm.org/rL272807), but while that work is in progress it costs little to keep some
simple folds in the backend as a backstop.
This fixes:
https://llvm.org/bugs/show_bug.cgi?id=27924
Differential Revision: http://reviews.llvm.org/D21356
llvm-svn: 272828
|
to 32 bits. This is in response to a comment by Eli Friedman.
llvm-svn: 272814
|
This allows us to emit native IR in Clang (next commit).
Also, update the intrinsic tests to show that codegen already knows how to handle
the IR that Clang will soon produce.
llvm-svn: 272806
|
Differential Revision: http://reviews.llvm.org/D21144
llvm-svn: 272801
|
Differential Revision: http://reviews.llvm.org/D21085
llvm-svn: 272797
|
SELECT behavior.
Use BLENDM instead of a masked move instruction.
Differential Revision: http://reviews.llvm.org/D21001
llvm-svn: 272763
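For reference, a minimal sketch (types chosen arbitrarily, not taken from the
review) of the generic masked-select pattern this lowering affects; on AVX-512
such a select can be emitted as a blend-under-mask rather than a masked move:

  ; A vector select driven by a k-mask-style predicate.
  define <16 x i32> @masked_blend(<16 x i1> %mask, <16 x i32> %a, <16 x i32> %b) {
    %r = select <16 x i1> %mask, <16 x i32> %a, <16 x i32> %b
    ret <16 x i32> %r
  }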
|
Summary:
... when the offset is not statically known.
Prioritize addresses relative to the stack pointer in the stackmap, but
fall back gracefully to other addressing modes if the offset from the
stack pointer is not a known constant.
Patch by Oscar Blumberg!
Reviewers: sanjoy
Subscribers: llvm-commits, majnemer, rnk, sanjoy, thanm
Differential Revision: http://reviews.llvm.org/D21259
llvm-svn: 272756
|
Nearly all the changes to this pass have been done while maintaining and
updating other parts of LLVM. LLVM has had another pass, SROA, which
has superseded ScalarReplAggregates for quite some time.
Differential Revision: http://reviews.llvm.org/D21316
llvm-svn: 272737
|
llvm-svn: 272733
|
profile.
Summary: With a runtime profile, we have more confidence in branch probabilities, so during basic block layout we use a lower hot-probability threshold so that blocks can be laid out optimally.
Reviewers: djasper, davidxl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20991
llvm-svn: 272729
|
llvm-svn: 272714
|
If a local_unnamed_addr attribute is attached to a global, the address
is known to be insignificant within the module. It is distinct from the
existing unnamed_addr attribute in that it only describes a local property
of the module rather than a global property of the symbol.
This attribute is intended to be used by the code generator and LTO to allow
the linker to decide whether the global needs to be in the symbol table. It is
possible to exclude a global from the symbol table if three things are true:
- This attribute is present on every instance of the global (which means that
the normal rule that the global must have a unique address can be broken without
being observable by the program by performing comparisons against the global's
address)
- The global has linkonce_odr linkage (which means that each linkage unit must have
its own copy of the global if it requires one, and the copy in each linkage unit
must be the same)
- It is a constant or a function (which means that the program cannot observe that
the unique-address rule has been broken by writing to the global)
Although this attribute could in principle be computed from the module
contents, LTO clients (i.e. linkers) will normally need to be able to compute
this property as part of symbol resolution, and it would be inefficient to
materialize every module just to compute it.
See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
for earlier discussion.
Part of the fix for PR27553.
Differential Revision: http://reviews.llvm.org/D20348
llvm-svn: 272709
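A minimal, hypothetical example of the new syntax on a global that satisfies
the three conditions above:

  ; linkonce_odr constant whose address is insignificant within this module;
  ; the linker may therefore omit it from the symbol table.
  @kTable = linkonce_odr local_unnamed_addr constant [2 x i32] [i32 1, i32 2]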
|
from i8 or i16
For a <N x i32> mul, pmuludq will be used on targets without SSE4.1, which
often introduces many extra pack and unpack instructions in the vectorized loop
body because pmuludq produces <N/2 x i64> values. However, when the operands
of the <N x i32> mul are extended from smaller types such as i8 or i16, the mul
may be shrunk to use pmullw + pmulhw/pmulhuw instead of pmuludq, which
generates better code. Targets with SSE4.1 support pmulld, so no shrinking is
needed there.
Differential Revision: http://reviews.llvm.org/D20931
llvm-svn: 272694
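A hypothetical input pattern (not taken from the commit's tests) that this
shrinking targets; the 32-bit multiply below can be implemented with 16-bit
multiplies on pre-SSE4.1 targets because its operands are extended from i16:

  define <8 x i32> @mul_from_i16(<8 x i16> %a, <8 x i16> %b) {
    %xa = sext <8 x i16> %a to <8 x i32>
    %xb = sext <8 x i16> %b to <8 x i32>
    %m  = mul <8 x i32> %xa, %xb           ; candidate for pmullw + pmulhw
    ret <8 x i32> %m
  }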
|
Change EmitGlobalVariable to check that the final assembler section is in BSS
before using the .lcomm/.comm directive. This prevents globals from being
placed into .bss erroneously when -data-sections is used.
This fixes PR26570.
Reviewers: echristo, rafael
Subscribers: llvm-commits, mehdi_amini
Differential Revision: http://reviews.llvm.org/D21146
llvm-svn: 272674
|
using MOVNTSD/MOVNTSS
llvm-svn: 272651
|
llvm-svn: 272643
|
when KMOVB is not available. This has better behavior with respect to partial register stalls, since it won't need to preserve the upper 16 bits of the GPR.
llvm-svn: 272626
|
KMOVW/KMOVB without going through an EXTRACT_SUBREG and a MOVZX.
llvm-svn: 272625
|
instruction. A follow-up patch will remove that instruction; the tests are added first to make the upcoming change more obvious.
llvm-svn: 272624
|
In rL272580 I accidentally added a test case to test/CodeGen when
test/Transforms/DeadStoreElimination/ is a better place for it.
llvm-svn: 272581
|
Summary:
AAResults::callCapturesBefore would previously ignore operand
bundles. It was possible for a later instruction to miss its memory
dependency on a call site that would only access the pointer through a
bundle.
Patch by Oscar Blumberg!
Reviewers: sanjoy
Differential Revision: http://reviews.llvm.org/D21286
llvm-svn: 272580
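An illustrative (hypothetical) IR shape for the bug: the call's only use of %p
is inside an operand bundle, so ignoring bundles could make the load appear
independent of the call:

  declare void @callee()

  define i32 @bundle_dep(i32* %p) {
    store i32 1, i32* %p
    ; %p reaches the call only through the operand bundle ("capture" is an
    ; arbitrary tag used purely for illustration).
    call void @callee() [ "capture"(i32* %p) ]
    %v = load i32, i32* %p
    ret i32 %v
  }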
|
llvm-svn: 272577
|
The need for these intrinsics has been obviated by r272564 which
reimplements their functionality using generic IR.
llvm-svn: 272566
|
tested
llvm-svn: 272542
|
autoupgrade them to selects and shufflevector.
llvm-svn: 272527
|
SSE1 (PR28044)
This patch is intended to solve:
https://llvm.org/bugs/show_bug.cgi?id=28044
By changing the definition of X86ISD::CMPP to use float types, we allow it to be created
and pass legalization for an SSE1-only target where v4i32 is not legal.
The motivational trail for this change includes:
https://llvm.org/bugs/show_bug.cgi?id=28001
and eventually makes this trigger:
http://reviews.llvm.org/D21190
That is, after this step, we should be free to have Clang generate FP compare IR instead of x86
intrinsics for SSE C packed compare intrinsics. (We can auto-upgrade and remove the LLVM
sse.cmp intrinsics as a follow-up step.) Once we're generating vector IR instead of x86
intrinsics, a big pile of generic optimizations can trigger.
Differential Revision: http://reviews.llvm.org/D21235
llvm-svn: 272511
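For context, a sketch of the kind of vector-compare IR Clang could then emit
for an SSE packed-compare builtin such as _mm_cmplt_ps (names and types
assumed, not taken from the review):

  define <4 x float> @cmplt_ps(<4 x float> %a, <4 x float> %b) {
    %c = fcmp olt <4 x float> %a, %b
    %s = sext <4 x i1> %c to <4 x i32>       ; all-ones / all-zeros lanes
    %r = bitcast <4 x i32> %s to <4 x float>
    ret <4 x float> %r
  }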
|
shufflevector.
llvm-svn: 272510
|
A lot of the codegen is pretty awful for these, as they are mostly implemented as generic bit-twiddling ops.
llvm-svn: 272508
|
since they are autoupgraded to shufflevector.
llvm-svn: 272494
|
The script now replaces '.LCPI888_8'-style asm symbols with the {{\.LCPI.*}} regex pattern; this helps stop hardcoded symbols in 32-bit x86 tests from changing with every edit of the file.
Refreshed some tests to demonstrate the new check.
llvm-svn: 272488
|
PSHUFB can speed up BITREVERSE of byte vectors by performing a LUT lookup on the low/high nibbles separately and ORing the results. Wider integer vector types are already BSWAP'd beforehand, so they also make use of this approach.
llvm-svn: 272477
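The lowering applies to the generic intrinsic; a minimal byte-vector example
(a sketch, not taken from the commit's tests) that would now select the
PSHUFB nibble-LUT sequence on such targets:

  declare <16 x i8> @llvm.bitreverse.v16i8(<16 x i8>)

  define <16 x i8> @rev_bytes(<16 x i8> %x) {
    %r = call <16 x i8> @llvm.bitreverse.v16i8(<16 x i8> %x)
    ret <16 x i8> %r
  }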
|
llvm-svn: 272474
|
llvm-svn: 272473
|
Ensure that PALIGNR/PSLLDQ/PSRLDQ are byte vectors so that they can be correctly decoded for target shuffle combining
llvm-svn: 272471
|
llvm-svn: 272469
|
allows us to create 512-bit PSHUFLW/PSHUFHW.
llvm-svn: 272450
|
The vector cases don't change because we already have folds in X86ISelLowering
to look through and remove bitcasts.
llvm-svn: 272427
|
llvm-svn: 272416
|
llvm-svn: 272411