|  | Commit message (Collapse) | Author | Age | Files | Lines | 
|---|
| | 
| 
| 
| | llvm-svn: 302815 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | We don't use it and it was removed in gfx9, and the encoding
bit repurposed.
Additionally actually using it requires changing the output register
class, which wasn't done anyway.
llvm-svn: 302814 | 
| | 
| 
| 
| 
| 
| 
| | This allows folding source modifiers in more f16 cases.
Makes it easier to select per-component packed neg modifiers.
llvm-svn: 302813 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | Earlier fix D32572 introduced a bug where live-ins were calculated
for basic block instead of scheduling region. This change fixes it.
Differential Revision: https://reviews.llvm.org/D33086
llvm-svn: 302812 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The approach I followed was to emit the remark after getTreeCost concludes
that SLP is profitable.  I initially tried emitting them after the
vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely
remember missing a few cases for example in HorizontalReduction::tryToReduce.
ORE is placed in BoUpSLP so that it's available from everywhere (notably
HorizontalReduction::tryToReduce).
We use the first instruction in the root bundle as the locator for the remark.
In order to get a sense how far the tree is spanning I've include the size of
the tree in the remark.  This is not perfect of course but it gives you at
least a rough idea about the tree.  Then you can follow up with -view-slp-tree
to really see the actual tree.
llvm-svn: 302811 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch is the first in a series of patches to provide code gen for
doing compares in GPRs when the compare result is required in a GPR.
It adds the infrastructure to select GPR sequences for i1->i32 and i1->i64
extensions. This first patch handles equality comparison on i32 operands with
the result sign or zero extended.
Differential Revision: https://reviews.llvm.org/D31847
llvm-svn: 302810 | 
| | 
| 
| 
| | llvm-svn: 302808 | 
| | 
| 
| 
| | llvm-svn: 302806 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch extends llvm-ir to allow attributes to be set on global variables.
An RFC was sent out earlier by my colleague James Molloy: http://lists.llvm.org/pipermail/cfe-dev/2017-March/053100.html
A key part of that proposal was to extend LLVM-IR to carry attributes on global variables.
This generic feature could be useful for multiple purposes.
In our present context, it would be useful to carry user specified sections for bss/rodata/data.
Reviewed by: Jonathan Roelofs, Reid Kleckner
Differential Revision: https://reviews.llvm.org/D32009
llvm-svn: 302794 | 
| | 
| 
| 
| 
| 
| | Now it handle by TableGen.
llvm-svn: 302793 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Introduce LoopVectorizationPlanner.executePlan(), replacing ILV.vectorize() and
refactoring ILV.vectorizeLoop(). Method collectDeadInstructions() is moved from
ILV to LVP. These changes facilitate building VPlans and using them to generate
code, following https://reviews.llvm.org/D28975 and its tentative breakdown.
Method ILV.createEmptyLoop() is renamed ILV.createVectorizedLoopSkeleton() to
improve clarity; it's contents remain intact.
Differential Revision: https://reviews.llvm.org/D32200
llvm-svn: 302790 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | It turned out that MSan was incorrectly calculating the shadow for int comparisons: it was done by truncating the result of (Shadow1 OR Shadow2) to i1, effectively rendering all bits except LSB useless.
This approach doesn't work e.g. in the case where the values being compared are even (i.e. have the LSB of the shadow equal to zero).
Instead, if CreateShadowCast() has to cast a bigger int to i1, we replace the truncation with an ICMP to 0.
This patch doesn't affect the code generated for SPEC 2006 binaries, i.e. there's no performance impact.
For the test case reported in PR32842 MSan with the patch generates a slightly more efficient code:
  orq     %rcx, %rax
  jne     .LBB0_6
, instead of:
  orl     %ecx, %eax
  testb   $1, %al
  jne     .LBB0_6
llvm-svn: 302787 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | manages to form a VSELECT with a non-i1 element type condition. Those
are technically allowed in SDAG (at least, the generic type legalization
logic will form them and I wouldn't want to try to audit everything te
preclude forming them) so we need to be able to lower them.
This isn't too hard to implement. We mark VSELECT as custom so we get
a chance in C++, add a fast path for i1 conditions to get directly
handled by the patterns, and a fallback when we need to manually force
the condition to be an i1 that uses the vptestm instruction to turn
a non-mask into a mask.
This, unsurprisingly, generates awful code. But it at least doesn't
crash. This was actually impacting open source packages built with LLVM
for AVX-512 in the wild, so quickly landing a patch that at least stops
the immediate bleeding.
I think I've found where to fix the codegen quality issue, but less
confident of that change so separating it out from the thing that
doesn't change the result of any existing test case but causes mine to
not crash.
llvm-svn: 302785 | 
| | 
| 
| 
| | llvm-svn: 302784 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This is the same as r292827 for AArch64: we widen 8- and 16-bit ADD, SUB
and MUL to 32 bits since we only have TableGen patterns for 32 bits.
See the commit message for r292827 for more details.
At this point we could just remove some of the tests for regbankselect
and instruction-select, since we're not going to see any narrow
operations at those levels anymore. Instead I decided to update them
with G_ANYEXT/G_TRUNC operations, so we can validate the full sequences
generated by the legalizer.
llvm-svn: 302782 | 
| | 
| 
| 
| 
| 
| | Conversion rules allow automatic casting of nullptr to any pointer type.
llvm-svn: 302780 | 
| | 
| 
| 
| | llvm-svn: 302779 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | G_ANYEXT can be introduced by the legalizer when widening scalars. Add
support for it in the register bank info (same mapping as everything
else) and in the instruction selector.
When selecting it, we treat it as a COPY, just like G_TRUNC. On this
occasion we get rid of some assertions in selectCopy so we can reuse it.
This shouldn't be a problem at the moment since we're not supporting any
complicated cases (e.g. FPR, different register banks). We might want to
separate the paths when we do.
llvm-svn: 302778 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary: support G_ICMP for scalar types i8/i16/i64.
Reviewers: zvi, guyblank
Reviewed By: guyblank
Subscribers: rovka, kristof.beyls, llvm-commits, krytarowski
Differential Revision: https://reviews.llvm.org/D32995
llvm-svn: 302774 | 
| | 
| 
| 
| 
| 
| | Turns out udivrem can write its output to the same location as one of its inputs so the extra temporary isn't needed.
llvm-svn: 302772 | 
| | 
| 
| 
| 
| 
| | writing back over the original value.
llvm-svn: 302770 | 
| | 
| 
| 
| | llvm-svn: 302769 | 
| | 
| 
| 
| | llvm-svn: 302768 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Move getX86ConditionCode() from X86FastISel.cpp to X86InstrInfo.cpp so it can be used by GloabalIsel instruction selector.
This is a pre-commit for a patch I'm working on to support G_ICMP. NFC.
Reviewers: zvi, guyblank, delena
Reviewed By: guyblank, delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33038
llvm-svn: 302767 | 
| | 
| 
| 
| 
| 
| 
| 
| | This time it actually occurred to me to change the #defines
to actually test the pre-processed out codepath.  Hopefully
this time it works.
llvm-svn: 302752 | 
| | 
| 
| 
| 
| 
| 
| | TaskGroup and Latch need to be in llvm::parallel::detail, not
in llvm::detail.
llvm-svn: 302751 | 
| | 
| 
| 
| | llvm-svn: 302749 | 
| | 
| 
| 
| 
| 
| | Differential Revision: https://reviews.llvm.org/D33024
llvm-svn: 302748 | 
| | 
| 
| 
| | llvm-svn: 302747 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This reverts r302712.
The change fails with ASAN enabled:
ERROR: AddressSanitizer: use-after-poison on address ... at ...
READ of size 2 at ... thread T0
  #0 ... in llvm::SDNode::getNumValues() const <snip>/include/llvm/CodeGen/SelectionDAGNodes.h:855:42
  #1 ... in llvm::SDNode::hasAnyUseOfValue(unsigned int) const <snip>/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7270:3
  #2 ... in llvm::SDValue::use_empty() const <snip> include/llvm/CodeGen/SelectionDAGNodes.h:1042:17
  #3 ... in (anonymous namespace)::DAGCombiner::MergeConsecutiveStores(llvm::StoreSDNode*) <snip>/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:12944:7
Reviewers: niravd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33081
llvm-svn: 302746 | 
| | 
| 
| 
| | llvm-svn: 302744 | 
| | 
| 
| 
| 
| 
| | In an attempt to reduce the confusion.
llvm-svn: 302742 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | // (X ^ C1) | C2 --> (X | C2) ^ (C1&~C2)
This canonicalization was added at:
https://reviews.llvm.org/rL7264 
By moving xors out/down, we can more easily combine constants. I'm adding
tests that do not change with this patch, so we can verify that those kinds
of transforms are still happening.
This is no-functional-change-intended because there's a later fold:
// (X^C)|Y -> (X|Y)^C iff Y&C == 0
...and demanded-bits appears to guarantee that any fold that would have
hit the fold we're removing here would be caught by that 2nd fold.
Similar reasoning was used in:
https://reviews.llvm.org/rL299384
The larger motivation for removing this code is that it could interfere with 
the fix for PR32706:
https://bugs.llvm.org/show_bug.cgi?id=32706
Ie, we're not checking if the 'xor' is actually a 'not', so we could reverse
a 'not' optimization and cause an infinite loop by altering an 'xor X, -1'. 
Differential Revision: https://reviews.llvm.org/D33050
llvm-svn: 302733 | 
| | 
| 
| 
| 
| 
| 
| | VOP3P instructions can encode access to either
half of the register.
llvm-svn: 302730 | 
| | 
| 
| 
| 
| 
| 
| | Flat instructions gain an immediate offset, and 2 new
sets of segment specific flat instructions are added.
llvm-svn: 302729 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | numbers to really do what the comment says
r271020 added an early out to skip the signed multiply portion of ConstantRange::multiply. The comment says we don't need to do signed multiply if the range is only positive numbers, but the implemented check only ensures that the start of the range is positive. It doesn't look at the end of the range.
This patch checks the end of the range instead. Because Upper is one more than the end we have to see if its positive or if its one past the last positive number.
llvm-svn: 302717 | 
| | 
| 
| 
| 
| 
| | shorten code.
llvm-svn: 302716 | 
| | 
| 
| 
| 
| 
| 
| | This is nice as is, but it will be used in my next patch to
fix a bug. Suggested by Daniel Berlin.
llvm-svn: 302714 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Allow consecutive stores whose values come from consecutive loads to
merged in the presense of other uses of the loads. Previously this was
disallowed as in general the merged load cannot be shared with the
other uses. Merging N stores into 1 may cause as many as N redundant
loads. However in the context of caching this should have neglible
affect on memory pressure and reduce instruction count making it
almost always a win.
Fixes PR32086.
Reviewers: spatel, jyknight, andreadb, hfinkel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30471
llvm-svn: 302712 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This fixes a ubsan bot failure after r302597, which made getProfileCount
non-static, but ended up invoking it on a null ProfileSummaryInfo object
in some cases from buildModuleSummaryIndex.
Most testing passed because the non-static getProfileCount currently
doesn't access any member variables, but I found this when testing a
follow on patch (D32877) that adds a member variable access.
llvm-svn: 302705 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | method directly. Do a better job of reusing allocations while looping. NFCI
This lets toString take advantage of the degenerate case checks in udivrem and is just generally cleaner.
One minor downside of this is that the divisor APInt now needs to be the same size as Tmp which requires an additional allocation. But we were doing a poor job of reusing allocations before so the new code should still be an improvement.
llvm-svn: 302704 | 
| | 
| 
| 
| 
| 
| | divide code. Use Lo_32/Hi_32/Make_64 helpers instead of casts and shifts. NFCI
llvm-svn: 302703 | 
| | 
| 
| 
| | llvm-svn: 302702 | 
| | 
| 
| 
| 
| 
| | in the function. NFC
llvm-svn: 302701 | 
| | 
| 
| 
| | llvm-svn: 302699 | 
| | 
| 
| 
| 
| 
| 
| 
| | For stores, check if the stored value is defined by a floating point
instruction and if yes, we return a default mapping with FPR instead
of GPR.
llvm-svn: 302679 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The new experimental reduction intrinsics can now be used, so I'm enabling this
for AArch64. We will need this for SVE anyway, so it makes sense to do this for
NEON reductions as well.
The existing code to match shufflevector patterns are replaced with a direct
lowering of the reductions to AArch64-specific nodes. Tests updated with the
new, simpler, representation.
Differential Revision: https://reviews.llvm.org/D32247
llvm-svn: 302678 | 
| | 
| 
| 
| 
| 
| 
| 
| | This adds a few missing instructions for the assembler and
disassembler.  Those should be the last missing general-
purpose (Chapter 7) instructions for the z10 ISA.
llvm-svn: 302667 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | This adds the remaining general arithmetic instructions
for assembler / disassembler use.  Most of these are not
useful for codegen; a few might be, and those are listed
in the README.txt for future improvements.
llvm-svn: 302665 | 
| | 
| 
| 
| | llvm-svn: 302660 |