| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
| |
No observable changes, spotted while looking at the scheduling description.
llvm-svn: 296277
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current pattern for extract bits in range is typically:
Mask.lshr(BitOffset).trunc(SubSizeInBits);
Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable.
This is another of the compile time issues identified in PR32037 (see also D30265).
This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.
Differential Revision: https://reviews.llvm.org/D30336
llvm-svn: 296272
|
| |
|
|
| |
llvm-svn: 296271
|
| |
|
|
| |
llvm-svn: 296270
|
| |
|
|
|
|
| |
using the scalar register class. We only have patterns for the masked intrinsics.
llvm-svn: 296264
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 296252
|
| |
|
|
| |
llvm-svn: 296207
|
| |
|
|
|
|
|
| |
This replaces the __stack_pointer variable which was allocated in linear
memory.
llvm-svn: 296201
|
| |
|
|
|
|
|
|
|
|
|
|
| |
For example, avoid (single shift):
r0 = and(##536870908,lsr(r0,#3))
r0 = memw(r1+r0<<#0)
in favor of (two shifts):
r0 = lsr(r0,#5)
r0 = memw(r1+r0<<#2)
llvm-svn: 296196
|
| |
|
|
|
|
|
|
|
| |
With the "wasm32-unknown-unknown-wasm" triple, this allows writing out
simple wasm object files, and is another step in a larger series toward
migrating from ELF to general wasm object support. Note that this code
and the binary format itself is still experimental.
llvm-svn: 296190
|
| |
|
|
| |
llvm-svn: 296187
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D29958
llvm-svn: 296186
|
| |
|
|
|
|
|
|
| |
This reverts commit r296009. It broke one out of tree target and also
does not account for all partial lines added or removed when calculating
PressureDiff.
llvm-svn: 296182
|
| |
|
|
| |
llvm-svn: 296172
|
| |
|
|
|
|
| |
Hopefully placates gcc with -Werror.
llvm-svn: 296153
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
subrange
The current pattern for extract bits in range is typically:
Mask.lshr(BitOffset).trunc(SubSizeInBits);
Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable.
This is another of the compile time issues identified in PR32037 (see also D30265).
This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.
Differential Revision: https://reviews.llvm.org/D30336
llvm-svn: 296147
|
| |
|
|
|
|
|
|
|
|
|
| |
Provide a 64-bit pattern to use SUBFIC for subtracting from a 16-bit immediate.
The corresponding pattern already exists for 32-bit integers.
Committing on behalf of Hiroshi Inoue.
Differential Revision: https://reviews.llvm.org/D29387
llvm-svn: 296144
|
| |
|
|
|
|
|
|
|
|
|
| |
Emit clrrdi (extended mnemonic for rldicr) for AND-ing with masks that
clear bits from the right hand size.
Committing on behalf of Hiroshi Inoue.
Differential Revision: https://reviews.llvm.org/D29388
llvm-svn: 296143
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current pattern for extract bits in range is typically:
Mask.lshr(BitOffset).trunc(SubSizeInBits);
Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable.
This is another of the compile time issues identified in PR32037 (see also D30265).
This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.
Differential Revision: https://reviews.llvm.org/D30336
llvm-svn: 296141
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This time with the missing files.
Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the store can cause the
atomic sequence to fail.
This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.
This resolves PR/32020.
Thanks to James Cowgill for reporting the issue!
Reviewers: slthakur
Differential Revision: https://reviews.llvm.org/D30257
llvm-svn: 296134
|
| |
|
|
|
|
| |
This reverts r296132. I forgot to include the tests.
llvm-svn: 296133
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the store can cause the
atomic sequence to fail.
This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.
This resolves PR/32020.
Thanks to James Cowgill for reporting the issue!
Reviewers: slthakur
Differential Revision: https://reviews.llvm.org/D30257
llvm-svn: 296132
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This isn't testable for AArch64 by itself so this patch also adds
support for constant immediates in the pattern and physical
register uses in the result.
The new IntOperandMatcher matches the constant in patterns such as
'(set $rd:GPR32, (G_XOR $rs:GPR32, -1))'. It's always safe to fold
immediates into an instruction so this is the first rule that will match
across multiple BB's.
The Renderer hierarchy is responsible for adding operands to the result
instruction. Renderers can copy operands (CopyRenderer) or add physical
registers (in particular %wzr and %xzr) to the result instruction
in any order (OperandMatchers now import the operand names from
SelectionDAG to allow renderers to access any operand). This allows us to
emit the result instruction for:
%1 = G_XOR %0, -1 --> %1 = ORNWrr %wzr, %0
%1 = G_XOR -1, %0 --> %1 = ORNWrr %wzr, %0
although the latter is untested since the matcher/importer has not been
taught about commutativity yet.
Added BuildMIAction which can build new instructions and mutate them where
possible. W.r.t the mutation aspect, MatchActions are now told the name of
an instruction they can recycle and BuildMIAction will emit mutation code
when the renderers are appropriate. They are appropriate when all operands
are rendered using CopyRenderer and the indices are the same as the matcher.
This currently assumes that all operands have at least one matcher.
Finally, this change also fixes a crash in
AArch64InstructionSelector::select() caused by an immediate operand
passing isImm() rather than isCImm(). This was uncovered by the other
changes and was detected by existing tests.
Depends on D29711
Reviewers: t.p.northover, ab, qcolombet, rovka, aditya_nandakumar, javed.absar
Reviewed By: rovka
Subscribers: aemerson, dberris, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D29712
llvm-svn: 296131
|
| |
|
|
|
|
| |
Noticed while profiling PR32037, the target shuffle ops were being stored in SmallVector<*,8> types but the combiner could store as many as 16 ops at maximum depth (2 per depth).
llvm-svn: 296130
|
| |
|
|
| |
llvm-svn: 296128
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously LLVM was assuming 32-bit signed immediates which results in and with
a bitmask that has bit 31 set to incorrectly include bits 63-32 in the result.
After applying this patch I can now compile all of the FreeBSD mips assembly
code with clang.
This issue also affects the nor, slt and sltu macros and I will fix those in a
separate review.
Patch By: Alexander Richardson
Commit message reformatted by sdardis.
Reviewers: atanasyan, theraven, sdardis
Differential Revision: https://reviews.llvm.org/D30298
llvm-svn: 296125
|
| |
|
|
|
|
| |
Same as selecting G_LOAD.
llvm-svn: 296122
|
| |
|
|
|
|
| |
Same as the ones for loads.
llvm-svn: 296115
|
| |
|
|
|
|
| |
Allow the same types that we allow for loads.
llvm-svn: 296108
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make the MIPS disassembler consistent with the other targets in returning
a Size of zero when the input buffer cannot contain an instruction due
to it's size. Previously it reported the minimum instruction size when
it failed due to the buffer not being big enough for an instruction
causing llvm-objdump to crash when disassembling all sections.
Reviewers: slthakur
Differential Revision: https://reviews.llvm.org/D29984
llvm-svn: 296105
|
| |
|
|
|
|
| |
This reverts commit r296103 because the test broke on one of the bots. Sorry!
llvm-svn: 296104
|
| |
|
|
|
|
| |
Allow the same types that we allow for loads.
llvm-svn: 296103
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current pattern for setting bits in range is typically:
Mask |= APInt::getBitsSet(MaskSizeInBits, LoPos, HiPos);
Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation memory for the temporary variable.
This is one of the key compile time issues identified in PR32037.
This patch adds the APInt::setBits() helper method which avoids the temporary memory allocation completely, this first implementation uses setBit() internally instead but already significantly reduces the regression in PR32037 (~10% drop). Additional optimization may be possible.
I investigated whether there is need for APInt::clearBits() and APInt::flipBits() equivalents but haven't seen these patterns to be particularly common, but reusing the code would be trivial.
Differential Revision: https://reviews.llvm.org/D30265
llvm-svn: 296102
|
| |
|
|
| |
llvm-svn: 296095
|
| |
|
|
|
|
| |
opcodes into separate packed and scalar opcodes. This is more consistent with the rest of the ISD opcodes. NFC
llvm-svn: 296094
|
| |
|
|
|
|
|
|
| |
intrinsics with select.
Clang has been emitting cltz intrinsics for a while now.
llvm-svn: 296091
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The Fuchsia ABI defines slots from the thread pointer where the
stack-guard value for stack-protector, and the unsafe stack pointer
for safe-stack, are stored. This parallels the Android ABI support.
Patch by Roland McGrath
Differential Revision: https://reviews.llvm.org/D30237
llvm-svn: 296081
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables support for .f16x2 operations.
Added new register type Float16x2.
Added support for .f16x2 instructions.
Added handling of vectorized loads/stores of v2f16 values.
Differential Revision: https://reviews.llvm.org/D30057
Differential Revision: https://reviews.llvm.org/D30310
llvm-svn: 296032
|
| |
|
|
|
|
|
|
|
|
|
| |
FastISel wasn't checking the isFPOnlySP subtarget feature before emitting
double-precision operations, so it got completely invalid CodeGen for doubles
on Cortex-M4F.
The normal ISel testing wasn't spectacular either so I added a second RUN line
to improve that while I was in the area.
llvm-svn: 296031
|
| |
|
|
| |
llvm-svn: 296026
|
| |
|
|
|
|
|
|
|
|
|
| |
In the bit tracker, references to other bit values in which the register
is 0 are prohibited. This means that generating self-referential register
cells like { w:32 [0-15]:s[0-15] [16-31]:s[15] } is impossible. In order
to get a self-referential cell, it had to be stored into a map and then
reloaded from it. To avoid this step, add a function that will set the
register to a given value without going through the map.
llvm-svn: 296025
|
| |
|
|
|
|
|
| |
Clang issues warning about hidden overload. That was intended, so
add "using AMDGPUGenRegisterInfo::getRegUnitWeight;" to mute it.
llvm-svn: 296021
|
| |
|
|
|
|
| |
The TLS slot did not exist back then.
llvm-svn: 296014
|
| |
|
|
|
|
|
|
|
|
| |
If a subreg is used in an instruction it counts as a whole superreg
for the purpose of register pressure calculation. This patch corrects
improper register pressure calculation by examining operand's lane mask.
Differential Revision: https://reviews.llvm.org/D29835
llvm-svn: 296009
|
| |
|
|
| |
llvm-svn: 295997
|
| |
|
|
|
|
|
|
| |
Hit on ASICs that support 16bit instructions.
Differential Revision: https://reviews.llvm.org/D30281
llvm-svn: 295990
|
| |
|
|
| |
llvm-svn: 295981
|
| |
|
|
|
|
|
|
| |
Introduce a common ValueHandler for call returns and formal arguments, and
inherit two different versions for handling the differences (at the moment the
only difference is the way physical registers are marked as used).
llvm-svn: 295973
|
| |
|
|
|
|
|
|
| |
Add support for lowering calls with parameters than can fit into regs. Use the
same ValueHandler that we used for function returns, but rename it to match its
new, extended purpose.
llvm-svn: 295971
|
| |
|
|
|
|
|
|
| |
class of their first input when creating node in fast-isel.
(Quick fix to buildbot failure after rL295940 commit).
llvm-svn: 295970
|