| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag.
This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used.
I had to avoid selecting DEC is the result isn't used. This will become a SUB immediate which will turned into a CMP later by optimizeCompareInstr. This lead to the one test change where we use a CMP instead of a DEC for an overflow intrinsic since we only checked the flag.
This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes.
Differential Revision: https://reviews.llvm.org/D55975
llvm-svn: 350245
|
| |
|
|
|
|
|
|
| |
Sometimes it's useful to be able to output demangled names without
tag specifiers like "struct", "class", etc. This patch adds a
flag enabling this.
llvm-svn: 350241
|
| |
|
|
|
|
|
|
|
|
|
|
| |
DIV or REM node, replace the users of the corresponding REM or DIV node if it exists.
Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced.
Improves the test case from PR38217. There may be additional opportunities after this.
Differential Revision: https://reviews.llvm.org/D56145
llvm-svn: 350239
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
also promote the input type if necessary
By also promoting the input type we get a better idea for what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized, sometime later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized forcing a truncate into the input of the sign extend using a replace all uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined.
By creating the extract with the correct scalar type when we create legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate which can be optimized by getNode to maybe remove the truncate. And then a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend.
This fixes the regression on X86 in D56156.
Differential Revision: https://reviews.llvm.org/D56176
llvm-svn: 350236
|
| |
|
|
|
|
|
|
|
|
|
|
| |
(sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them.
If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead.
The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.
Differential Revision: https://reviews.llvm.org/D56156
llvm-svn: 350235
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PPCPreEmitPeephole pass.
PPCPreEmitPeephole will convert a BC to B when the conditional branch is
based on a constant CR by CRSET or CRUNSET. This is added in
https://reviews.llvm.org/rL343100.
When the conditional branch is known to be always taken, all branches will
be removed and a new unconditional branch will be inserted. However, when
SeenUse is false the original patch will not remove the branches, but still
insert the new unconditional branch, update the successors and create
inconsistent IR. Compiling the synthetic testcase included can show the
problem we run into.
The patch simply removes the SeenUse condition when adding branches into
InstrsToErase set.
Differential Revision: https://reviews.llvm.org/D56041
llvm-svn: 350223
|
| |
|
|
|
|
|
|
|
|
| |
Peek through shift modulo masks while matching double shift patterns.
I was hoping to delay this until I could remove the X86 code with generic funnel shift matching (PR40081) but this will do for now.
Differential Revision: https://reviews.llvm.org/D56199
llvm-svn: 350222
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Motivated by the discussion in D38499, this patch updates BasicAA to support
arbitrary pointer sizes by switching most remaining non-APInt calculations to
use APInt. The size of these APInts is set to the maximum pointer size (maximum
over all address spaces described by the data layout string).
Most of this translation is straightforward, but this patch contains a fix for
a bug that revealed itself during this translation process. In order for
test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit
pointers, the intermediate calculations must be performed using 64-bit
integers. This is because, as noted in the patch, when GetLinearExpression
decomposes an expression into C1*V+C2, and we then multiply this by Scale, and
distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even
through C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can
overflow. If this happens, later logic will draw invalid conclusions from the
(base) offset value. Thus, when initially applying the APInt conversion,
because the maximum pointer size in this test is 32 bits, it started failing.
Suspicious, I created a 64-bit version of this test (included here), and that
failed (miscompiled) on trunk for a similar reason (the multiplication can
overflow).
After fixing this overflow bug, the first test case (at least) in
Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was
relying on having 64-bit intermediate values to have BasicAA return an accurate
result. In order to fix this problem, and because I believe that it is not
uncommon to use i64 indexing expressions in 32-bit code (especially portable
code using int64_t), it seems reasonable to always use at least 64-bit
integers. In this way, we won't regress our analysis capabilities (and there's
a command-line option added, so experimenting with this should be easy).
As pointed out by Eli during the review, there are other potential overflow
conditions that this patch does not address. Fixing those is left to follow-up
work.
Patch by me with contributions from Michael Ferguson (mferguson@cray.com).
Differential Revision: https://reviews.llvm.org/D38662
llvm-svn: 350220
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GlobalVariable
Summary:
Extend Module::getOrInsertGlobal to accept a callback for creating a new
GlobalVariable if necessary instead of calling the GV constructor
directly using default arguments. Additionally overload
getOrInsertGlobal for the previous default behavior.
Reviewers: chandlerc
Subscribers: hiraditya, llvm-commits, bollu
Differential Revision: https://reviews.llvm.org/D56130
llvm-svn: 350219
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Common code used by the default resource strategy to select pipeline resources
has been moved to an helper function.
The new selection logic has been slightly rewritten to get rid of a redundant
zero check on the `ReadyMask` value. Before this patch, method select internally
called function `PowerOf2Floor` to compute the next ready pipeline resource.
However, `PowerOf2Floor` forces an implicit (redundant) zero check on the input
value. By construction, `ReadyMask` can never be zero. This patch replaces the
call to `PowerOf2Floor` with an equivalent block of code which avoids the
redundant zero check. This gives a minor 3-3.5% speedup on a release build.
No functional change intended.
llvm-svn: 350218
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Use isBaseWithConstantOffset() which handles OR as an operand
to llvm.amdgcn.raw.buffer.load and llvm.amdgcn.raw.buffer.store.
Change-Id: Ifefb9dc5ded8710d333df07ab1900b230e33539a
Reviewers: nhaehnle, mareko, arsenm
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D55999
llvm-svn: 350208
|
| |
|
|
|
|
|
|
| |
SMUL/UMUL. Remove the second result from X86ISD::UMUL.
All of these use custom isel so we can pretty easily detect the differences in the custom code in X86ISelDAGToDAG. The ISD opcodes just need to express the desired semantics not the details of how they would be selected by isel. So unifying them lets us remove the special casing from lowering.
llvm-svn: 350206
|
| |
|
|
|
|
|
|
| |
These require a different X86ISD node to be created than i16/i32/i64. I guess no one wanted to add the special code for that except in LowerXALUO. But now LowerXALUO, LowerSELECT, and LowerBRCOND all use a common helper function so they all share the special code.
Unfortunately, there are no test changes because we seem to correct the miss in a DAG combine later. I did verify it manually using test cases from xmulo.ll
llvm-svn: 350205
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The final piece of IR-level analysis to allow this was committed with:
rL350188
Using the intrinsics should improve transforms based on cost models
like vectorization and inlining.
The backend should be prepared too, so we can now canonicalize more
sequences of shift/logic to the intrinsics and know that the end
result should be equal or better to the original code even if the
target does not have an actual rotate instruction.
llvm-svn: 350199
|
| |
|
|
|
|
|
|
|
|
|
|
| |
in LowerBRCOND and LowerSELECT to avoid some duplicated code.
This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO so they always pick the same operation for overflowing instructions.
This is inspired by the helper functions used by ARM and AArch64 for the same purpose.
The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO.
llvm-svn: 350198
|
| |
|
|
| |
llvm-svn: 350197
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Add accessors so the performance improvement from this setting is accessible to third parties.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56179
llvm-svn: 350196
|
| |
|
|
|
|
| |
Preliminary commit as suggested in D56011.
llvm-svn: 350193
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.
BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.
The previous attempt to land this lead to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now and the added test case demanstrates such an
instance.
Differential Revision: https://reviews.llvm.org/D55563
llvm-svn: 350188
|
| |
|
|
|
|
| |
tests.
llvm-svn: 350187
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
default
During the lowering of a switch that would result in the generation of a jump
table, a range check is performed before indexing into the jump table, for the
switch value being outside the jump table range and a conditional branch is
inserted to jump to the default block. In case the default block is
unreachable, this conditional jump can be omitted. This patch implements
omitting this conditional branch for unreachable defaults.
Review Reference: D52002
llvm-svn: 350186
|
| |
|
|
|
|
|
|
|
| |
-X * Y --> -(X * Y)
X * -Y --> -(X * Y)
Differential Revision: https://reviews.llvm.org/D55961
llvm-svn: 350185
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D56168
llvm-svn: 350179
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D56169
llvm-svn: 350178
|
| |
|
|
|
|
| |
Mentioned on D56167.
llvm-svn: 350176
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D56128
llvm-svn: 350174
|
| |
|
|
|
|
|
|
|
|
| |
MSan used to report false positives in the case the argument of
llvm.is.constant intrinsic was uninitialized.
In fact checking this argument is unnecessary, as the intrinsic is only
used at compile time, and its value doesn't depend on the value of the
argument.
llvm-svn: 350173
|
| |
|
|
|
|
|
|
| |
bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) transform.
Found while trying out some other changes so I don't really have a test case.
llvm-svn: 350172
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D56131
llvm-svn: 350169
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
that bad machine code
Summary:
For SDAG, we pretend patchpoints aren't special at all until we emit the code for the pseudo.
Then the verifier runs and it seems like we have a use of an undefined register (the register will
be reserved later, but the verifier doesn't know that).
So this patch call setUsesTOCBasePtr before emit the code for the pseudo, so verifier can know
X2 is a reserved register.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D56148
llvm-svn: 350165
|
| |
|
|
|
|
|
|
| |
-This line, and those below, will be ignored--
M lib/Support/Error.cpp
llvm-svn: 350162
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch is created to fix the Bugzilla bug 39815:
https://bugs.llvm.org/show_bug.cgi?id=39815
This patch is to support promotion integer result for the instruction ADDE, SUBE.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D56119
llvm-svn: 350161
|
| |
|
|
|
|
|
|
| |
This seems to be getting in the way more than its helping. This does mean we stop scalarizing some cases, but I'm not convinced the scalarization was really better.
Some of the changes to vsel-cmp-load.ll are a regression but D56156 should fix it.
llvm-svn: 350159
|
| |
|
|
|
|
|
|
|
|
| |
16i16/v32i8 to v4i64 when v4i64 needs splitting.
This allows us to sign extend to v4i32 first. And then share that extension to implement the final steps to v4i64 using a pcmpgt and punpckl and punpckh.
We already do something similar for SIGN_EXTEND with -x86-experimental-vector-widening-legalization.
llvm-svn: 350158
|
| |
|
|
|
|
|
|
|
|
|
| |
We have some unfortunate code in the back end that defines a bunch of register
sets for the Asm Parser. Every time another class is needed in the parser, we
have to add another one of those definitions with explicit lists of registers.
This NFC patch simply provides macros to use to condense that code a little bit.
Differential revision: https://reviews.llvm.org/D54433
llvm-svn: 350156
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A recent patch has added custom legalization of vector conversions of
v2i16 -> v2f64. This just rounds it out for other types where the input vector
has an illegal (narrower) type than the result vector. Specifically, this will
handle the following conversions:
v2i8 -> v2f64
v4i8 -> v4f32
v4i16 -> v4f32
Differential revision: https://reviews.llvm.org/D54663
llvm-svn: 350155
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current CRBIT spill pseudo-op expansion creates a KILL instruction
that kills the CRBIT and defines the enclosing CR field. However, this
paints a false picture to the register allocator that all bits in the CR
field are killed so copies of other bits out of the field become dead and
removable.
This changes the expansion to preserve the KILL flag on the CRBIT as an
implicit use and to treat the CR field as an undef input.
Thanks to Hal Finkel for the review and Uli Weigand for implementation input.
Differential revision: https://reviews.llvm.org/D55996
llvm-svn: 350153
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The following code requests 64-bit PC-relative relocations unsupported
by MIPS ABI. Now it triggers an assertion. It's better to show an error
message.
```
foo:
.quad bar - foo
```
llvm-svn: 350152
|
| |
|
|
| |
llvm-svn: 350151
|
| |
|
|
| |
llvm-svn: 350145
|
| |
|
|
| |
llvm-svn: 350144
|
| |
|
|
| |
llvm-svn: 350142
|
| |
|
|
|
|
|
|
| |
widening legalization.
This was tricking us into making these operations and then letting them get scalarized later. But I can't prove that the scalarized version is actually better.
llvm-svn: 350141
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2i8/v2i16/v2i32 multiply.
Previously we emitted a multiply and some masking that was supposed to matched to PMULUDQ, but the masking could sometimes be removed before we got a chance to match it. So instead just emit the PMULUDQ directly.
Remove the DAG combine that was added when the ReplaceNodeResults code was originally added. Add a new DAG combine to avoid regressions in shrink_vmul.ll
Some of the shrink_vmul.ll test cases now pick PMULUDQ instead of PMADDWD/PMULLD, but I think this should be an improvement on most CPUs.
I think all of this can go away if/when we switch to -x86-experimental-vector-widening-legalization
llvm-svn: 350134
|
| |
|
|
|
|
|
| |
Added -verify-loop-lcssa to test cases.
Updated comments in ConnectProlog.
llvm-svn: 350131
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.
This patch also moves to FeatureSB the old FeatureSpecRestrict.
Reviewers: pbarrio, olista01, t.p.northover, LukeCheeseman
Differential Revision: https://reviews.llvm.org/D55921
llvm-svn: 350126
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the last one in a series of patches to support better code generation for bitfield insert.
BitPermutationSelector already support ISD::ZERO_EXTEND but not TRUNCATE.
This patch adds support for ISD:TRUNCATE in BitPermutationSelector.
For example of this test case,
struct s64b {
int a:4;
int b:16;
int c:24;
};
void bitfieldinsert64b(struct s64b *p, unsigned char v) {
p->b = v;
}
the selection DAG loos like:
t14: i32,ch = load<(load 4 from %ir.0)> t0, t2, undef:i64
t18: i32 = and t14, Constant:i32<-1048561>
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t22: i64 = AssertZext t4, ValueType:ch:i8
t23: i32 = truncate t22
t16: i32 = shl nuw nsw t23, Constant:i32<4>
t19: i32 = or t18, t16
t20: ch = store<(store 4 into %ir.0)> t14:1, t19, t2, undef:i64
By handling truncate in the BitPermutationSelector, we can use information from AssertZext when selecting t19 and skip the mask operation corresponding to t18.
So the generated sequences with and without this patch are
without this patch
rlwinm 5, 5, 0, 28, 11 # corresponding to t18
rlwimi 5, 4, 4, 20, 27
with this patch
rlwimi 5, 4, 4, 12, 27
Differential Revision: https://reviews.llvm.org/D49076
llvm-svn: 350118
|
| |
|
|
| |
llvm-svn: 350117
|
| |
|
|
|
|
|
|
|
| |
Deletion of dead blocks in arbitrary order may lead to failure
of assertion in `DeleteDeadBlock` that requires that we have
deleted all predecessors before we can delete the current block.
We should instead delete them in RPO order.
llvm-svn: 350116
|
| |
|
|
|
|
|
|
| |
If we are changing the MI operand from Reg to Imm, we need also handle its implicit use if have.
Differential Revision: https://reviews.llvm.org/D56078
llvm-svn: 350115
|